*You will need to register or log in to view the course*. You can select the button shown at the top of this page to log on.

Please note some of these films require the Flash player plugin. If you experience problems accessing any videos, please email info-cmm@bristol.ac.uk.

- Using quantitative data in research (watch video introduction)
- Introduction to quantitative data analysis (watch video introduction)
- Multiple regression
- Multilevel structures and classifications (watch video introduction)
- Introduction to multilevel modelling
- Regression models for binary responses
- Multilevel models for binary responses
- Multilevel modelling in practice: Research questions, data preparation and analysis
- Single-level and multilevel models for ordinal responses
- Single-level and multilevel models for nominal responses
- Three-level multilevel models
- Cross-classified multilevel models
- Multiple membership multilevel models
- Missing Data
- Multilevel Modelling of Repeated Measures Data

- Using quantitative data in research - concepts (PDF, 64kB) - (sample PDF, 0.1 mb)

In Module 1 we look at quantitative research and how we collect data, in order to provide a firm foundation for the analyses covered in later modules. The aims of Module 1 are:

- To give a broad overview of how research questions might be answered through quantitative analysis. Such questions as the following are explored: How does quantitative analysis relate to other methods of inquiry? Why is it required and what sorts of evidence can it supply?
- To introduce the vocabulary of quantitative analysis and specify the common terminology to be used in later modules. Of particular importance is the operational definition of research concepts (how we get from real world characteristics to numbers in our data set) and how this leads to observable variables at different levels of measurement.
- To introduce sources of data and concepts relating to how it may be possible to generalise results from samples of various kinds to the populations they are drawn from.
- To discuss how variables are defined, what different types there are, and how this may influence how they are analysed.
- To give some emphasis to certain ideas, such as the nature of variability or the recognition of hierarchical units of analysis, that are central to multilevel modelling.

- C 1.0: Using quantitative data in research: concepts and definitions
- C 1.1: The uses of statistical analysis in research
- C 1.2: Research design and generalisation
- C 1.3: Data: units of analysis and statistical variables
- C 1.3.1: Example of data from the 2002 European Social Surveys
- C 1.3.2: Units of analysis
- C 1.3.3: Operational definition of research concepts
- C 1.3.4: Variables
- C 1.3.5: Naming variables
- C 1.3.6: Variable description
- C 1.3.7: Categories and values
- C 1.3.8: A classification scheme for variables
- C 1.3.9: Measurement error

- C 1.4: Data hierarchies

- Introduction to quantitative data analysis - concepts (PDF, 79kB) - (sample PDF, 0.1 mb)

- C 2.0: Introduction
- C 2.1: Univariate data summary
- Why do we need to explore our data?
- What questions might we ask?
- Level of measurement and data exploration
- C 2.1.1: Frequency distributions
- C 2.1.1.1: Categorical data
- C 2.1.1.2: Continuous data
- C 2.1.1.3: Symmetric distributions for continuous data: the normal curve
- C 2.1.2: Summary statistics of key features of distributions
- C 2.1.2.1: Measures of central tendency
- C 2.1.2.2: Measures of variation or dispersion

- C 2.2: Comparisons and relationships: the role of variability
- Comparison as the fundamental purpose of statistical analysis
- C 2.2.1: Comparing subgroups as a form of relationship
- C 2.2.2: Relationships and their direction
- C 2.2.3: The role of variability
- C 2.2.4: Homoscedasticity and heteroscedasticity

- C 2.3: Some other examples of relationships and the role of explained variability
- C 2.3.1: Extending to two categorical explanators for a continuous response: the possibility of interactions
- C 2.3.2: Variability of categorical responses
- C 2.3.3: Where both response and explanatory variables are continuous
- C 2.3.4: Patterns other than straight lines
- C 2.3.5: Relationship between a continuous response and combinations of categorical and continuous explanators: Progress of students in schools

- C 2.4: Working towards the idea of a formal statistical model
- What are models?
- Why use models?
- Patterns (explained variability) as ingredients of a model
- Residual variation (unexplained variability) as an ingredient of a model
- Heteroscedasticity as a feature of interest
- Residual variation and multilevel modelling

- C 2.5: Comments on statistical inference; uncertainty, estimation and hypothesis testing

- Multiple regression - concepts (PDF, 81kB) - (sample PDF, 0.1 mb)
- 3 - MLwiN practical (PDF, 121kB) - (sample PDF, 0.1 mb)
- 3 - R practical (PDF, 67kB) - (sample PDF, 0.1 mb)
- 3 - SPSS practical (PDF, 1,114kB) - (sample PDF, 1.08 mb)
- 3 - Stata practical (PDF, 90kB) - (sample PDF, 0.1 mb)

Multiple regression is a technique used to study the relationship between an outcome variable and a set of explanatory or predictor variables. Module 3 covers the following topics:

- Regression with a single continuous explanatory variable
- Comparing groups: regression with a single categorical explanatory variable (dummy variables)
- Regression with more than one explanatory variable (multiple regression), including:
- A discussion of the idea of statistical control
- The multiple regression model for continuous and categorical explanatory variables
- Modelling non-linear relationships

- Interaction effects (allowing the effect of one explanatory variable X_{1} to depend on the value of another, X_{2})
- Allowing the slope of the relationship between Y and X_{1} to vary across groups defined by a categorical variable X_{2} by 1) fitting separate models for each value of X_{2}, and 2) fitting an interaction between X_{1} and X_{2}
- Testing for interaction effects
- Checking model assumptions in multiple regression
- Checking the normality assumption
- Checking the homoskedasticity (equal residual variance) assumption
- Outliers

The ideas are illustrated in analyses of hedonistic attitudes in Europe (using data from the European Social Surveys) and of trends in educational attainment (using data from the Scottish Youth Cohort Study).
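As a rough illustration of the interaction topic listed above, the following sketch fits a pooled regression with a continuous X_{1}, a 0/1 dummy X_{2} and their product, so that the slope of Y on X_{1} can differ between the two groups. The data are hypothetical and noise-free, and the helper names (`solve`, `ols`) are assumptions made for this example; it is not taken from the module's MLwiN, R, SPSS or Stata practicals.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: x1 is continuous, x2 is a 0/1 dummy for group membership.
x1 = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
# Generated without noise so the fitted coefficients can be checked by eye.
y = [1.0 + 2.0 * a + 3.0 * b + 0.5 * a * b for a, b in zip(x1, x2)]

# Design matrix columns: intercept, x1, x2, interaction x1 * x2.
design = [[1.0, a, float(b), a * b] for a, b in zip(x1, x2)]
beta = ols(design, y)

slope_group0 = beta[1]            # slope of y on x1 when x2 = 0
slope_group1 = beta[1] + beta[3]  # slope of y on x1 when x2 = 1
print(beta, slope_group0, slope_group1)
```

The interaction coefficient is the difference between the two group-specific slopes, which is the quantity a test for an interaction effect examines.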

- C 3.0: Introduction
- C 3.1: Regression with a single continuous explanatory variable
- C 3.2: Comparing Groups: Regression with a single categorical explanatory variable
- C 3.3: Regression with more than one explanatory variable (multiple regression)
- C 3.4: Interaction effects
- C 3.4.1: Model with fixed slopes across groups
- C 3.4.2: Fitting separate models for each group
- C 3.4.3: Allowing for varying slopes in a pooled analysis: interaction effects
- C 3.4.4: Testing for interaction effects
- C 3.4.5: Another example: allowing age effects to be different in different countries

- C 3.5: Checking model assumptions in multiple regression

- Multilevel structures and classifications - concepts (PDF, 71kB) - (sample PDF, 0.1 mb)

Multilevel modelling is designed to explore and analyse data that come from populations which have a complex structure. This module aims to introduce:

- a range of multilevel structures and classifications and how they correspond to real-world situations, research designs, and/or social-science research problems;
- the different types of data frames associated with each structure and how subscripts are used to represent structure;
- targets of inference;
- the distinction between levels and variables, and fixed and random classifications;
- the notion that multilevel structures are likely to generate dependent, correlated data that requires explicit modelling;
- the difference between long and wide forms of data structures;
- the advantages, both technical and substantive, of using a multilevel model, and the disadvantages of not doing so.
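The distinction between wide and long data forms mentioned above can be sketched as follows. In the wide form there is one row per individual with one column per measurement occasion; in the long form there is one row per occasion, nested within individuals. The records, column names and the helper `wide_to_long` are hypothetical and chosen only for illustration.

```python
# Wide form: one row per person, one column per measurement occasion.
wide = [
    {"person": 1, "score_t1": 10, "score_t2": 12, "score_t3": 15},
    {"person": 2, "score_t1": 8, "score_t2": 9, "score_t3": 11},
]


def wide_to_long(records, id_key, prefix):
    """Reshape wide records into long form: one row per person-occasion."""
    long_rows = []
    for rec in records:
        for key, value in rec.items():
            if key.startswith(prefix):
                occasion = int(key[len(prefix):])
                long_rows.append(
                    {id_key: rec[id_key], "occasion": occasion, "score": value}
                )
    return long_rows


# Long form: occasions (level 1) nested within persons (level 2).
long_rows = wide_to_long(wide, "person", "score_t")
for row in long_rows:
    print(row)
```

In the long form the person identifier acts as the higher-level classification, which is the layout multilevel software generally expects for repeated measurements.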

- C 4.0.1: Introduction
- C 4.1: Two-level hierarchical structures
- C 4.1.1: Students within schools
- C 4.1.2: Issues of sample size
- C 4.1.3: Variables and levels, fixed and random classifications
- C 4.1.4: Other examples of a two-level structure
- C 4.1.5: Repeated measurements within individuals, panel data
- C 4.1.6: Multivariate responses within individuals
- C 4.1.7: Two-stage sample survey design
- C 4.1.8: An experimental design in which the intervention is at the higher level

- C 4.2: Three-level structures
- C 4.3: Four-level structures
- C 4.4: Non-hierarchical structures
- C 4.5: Combining structures: Hierarchies, cross-classifications and multiple membership relationships
- C 4.6: Spatial structures
- C 4.7: Multilevel structures and classifications: Summary

- Introduction to multilevel modelling - concepts (PDF, 81kB) - (sample PDF, 0.1 mb)
- 5 - MLwiN practical (PDF, 322kB) - (sample PDF, 0.3 mb)
- 5 - R practical (PDF, 74kB) - (sample PDF, 0.1 mb)
- 5 - SPSS practical (PDF, 1,071kB) - (sample PDF, 1.04 mb)
- 5 - Stata practical (PDF, 93kB) - (sample PDF, 0.1 mb)

In the social, medical and biological sciences multilevel or hierarchical structures are the norm. Examples include individuals nested within geographical areas or institutions (e.g. schools or employers), and repeated observations over time on individuals. Other examples of hierarchical and non-hierarchical structures were given in Module 4. When individuals form groups or clusters, we might expect that two randomly selected individuals from the same group will tend to be more alike than two individuals selected from different groups. For example, children learn in classes and features of their class, such as teacher characteristics and the ability of other children in the class, are likely to influence a child's educational attainment. Because of these class effects, we would expect test scores for children in the same class to be more alike than scores for children from different classes. Multilevel models - also known as **hierarchical linear models, mixed models, random effects models** and **variance components models** - can be used to analyse data with a hierarchical structure. Throughout this module we refer to the lowest level of observation in the hierarchy (e.g. student) as level 1, and the group or cluster (e.g. class) as level 2.

The ideas are illustrated in analyses of hedonistic attitudes in 20 European countries (using data from the European Social Surveys) and of between-school variation in trends in students' educational attainment (using data from the Scottish Youth Cohort Study). The same datasets are analysed in Module 3 using multiple regression, ignoring country and school effects respectively. In this module, we emphasise the substantive insights that can be gained from a multilevel modelling approach.
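The idea that scores for children in the same class tend to be more alike than scores for children from different classes can be quantified by splitting the total variance into a between-class (level 2) part and a within-class (level 1) part. The sketch below uses a simple one-way ANOVA (method of moments) estimator on hypothetical, balanced data. This is an assumption made for brevity; it is not the estimation procedure used by multilevel software such as MLwiN.

```python
# Hypothetical test scores: three classes (level 2), two students each (level 1).
classes = {
    "A": [10.0, 12.0],
    "B": [20.0, 22.0],
    "C": [30.0, 32.0],
}

n = 2                      # students per class (balanced design assumed)
j = len(classes)           # number of classes
all_scores = [s for scores in classes.values() for s in scores]
grand_mean = sum(all_scores) / len(all_scores)
class_means = {c: sum(s) / len(s) for c, s in classes.items()}

# Between-class and within-class mean squares from a one-way ANOVA.
ss_between = sum(n * (m - grand_mean) ** 2 for m in class_means.values())
ss_within = sum(
    (s - class_means[c]) ** 2 for c, scores in classes.items() for s in scores
)
ms_between = ss_between / (j - 1)
ms_within = ss_within / (len(all_scores) - j)

# Method-of-moments variance components.
sigma2_e = ms_within                     # level 1 (student) variance
sigma2_u = (ms_between - ms_within) / n  # level 2 (class) variance

# Share of total variance lying between classes (intraclass correlation).
icc = sigma2_u / (sigma2_u + sigma2_e)
print(sigma2_u, sigma2_e, icc)
```

A value close to 1 means that almost all variation lies between classes, while a value close to 0 means that class membership tells us little about a student's score.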

- C 5.0: What is multilevel modelling?
- C 5.1: Comparing groups using multilevel modelling
- C 5.2: Multilevel regression with a level 1 explanatory variable: Random intercept models
- C 5.3: Allowing for different slopes across groups: Random slope models
- C 5.3.1: Random slope model
- C 5.3.2: Example of random slope for a continuous explanatory variable: allowing the relationship between hedonism and age to differ across countries
- C 5.3.3: Centring in a random slope model
- C 5.3.4: Example of random slope for a dichotomous explanatory variable: allowing the relationship between hedonism and gender to differ across countries
- C 5.3.5: Between-group variance as a function of explanatory variables

- C 5.4: Adding Level 2 Explanatory variables
- C 5.5: Complex level 1 variance
- Understanding Module 5 quiz
- Application 1: The use of performance indicators in education

- Regression models for binary responses - concepts (PDF, 75kB) - (sample PDF, 0.1 mb)
- 6 - MLwiN practical (PDF, 135kB) - (sample PDF, 0.1 mb)
- 6 - R practical (PDF, 68kB) - (sample PDF, 0.1 mb)
- 6 - Stata practical (PDF, 83kB) - (sample PDF, 0.1 mb)

In Module 3 we considered multiple linear regression models for the relationship between a continuous response variable and a set of explanatory variables, which may be continuous or categorical. Regression models need to be adapted to handle categorical response variables and, in this module, we consider methods for a particular type of categorical variable: binary or dichotomous responses, that is, variables with only two categories.

The ideas are illustrated in analyses of voting intentions in the 2004 US general election (using data from the National Annenberg Election Study) and uptake of antenatal care in Bangladesh (using data from the 2004 Bangladesh Demographic and Health Survey).
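To make the link between a logit model, predicted probabilities and odds ratios concrete, the sketch below evaluates a logit model with one binary explanatory variable. The coefficient values are hypothetical and are not estimates from the election or antenatal care data.

```python
import math


def inv_logit(eta):
    """Convert a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients of the model logit(p) = b0 + b1 * x, with x = 0 or 1.
b0, b1 = -1.0, 0.8

p0 = inv_logit(b0)       # predicted probability when x = 0
p1 = inv_logit(b0 + b1)  # predicted probability when x = 1

odds0 = p0 / (1.0 - p0)
odds1 = p1 / (1.0 - p1)
odds_ratio = odds1 / odds0  # equals exp(b1) under the logit model

discrete_change = p1 - p0   # change in probability for x going from 0 to 1
print(p0, p1, odds_ratio, discrete_change)
```

The odds ratio is constant whatever the values of other predictors, whereas the change in predicted probability depends on where on the probability scale the comparison is made.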

- C 6.0: Introduction to modelling binary responses
- C 6.1: Preliminaries: Mean and variance of binary data
- C 6.2: Moving towards a regression model for y: The linear probability model
- C 6.3: Generalised linear models
- C 6.4: Latent variable representation of a generalised linear model
- C 6.5: Application of logit and probit models to state differences in US voting intentions
- C 6.6: Adding further predictors in the analysis of US voting intentions
- C 6.6.1: Interpretation of a logit model using odds ratios
- C 6.6.2: Interpretation of logit and probit models using predicted response probabilities
- Predicted probabilities for each individual
- Predicted probabilities for 'ideal' or 'typical' individuals
- Predicted probabilities when varying the value of one x at a time, holding other x constant
- Partial change in the probability for a one-unit change in x (marginal effects)
- Discrete change in the predicted probability for a discrete change in x

- C 6.7: Interaction effects
- C 6.8: Modelling proportions

- Multilevel models for binary responses - concepts (PDF, 115kB) - (sample PDF, 0.1 mb)
- 7 - MLwiN practical (PDF, 210kB) - (sample PDF, 0.2 mb)
- 7 - R practical (PDF, 86kB) - (sample PDF, 0.1 mb)
- 7 - Stata practical (PDF, 98kB) - (sample PDF, 0.1 mb)

In Module 6 we saw how multiple linear regression models for continuous responses can be generalised to handle binary responses. In this module, we consider how these methods can be extended for the analysis of clustered binary data. We show that many of the extensions to the basic multilevel model introduced in Module 5 - for example random slopes and contextual effects - apply also to binary responses. However, there are some important new issues to consider in the interpretation and estimation of multilevel binary response models.

The ideas are illustrated in analyses of voting intentions in the 2004 US general election (using data from the National Annenberg Election Study) and uptake of antenatal care in Bangladesh (using data from the 2004 Bangladesh Demographic and Health Survey).
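For orientation, a two-level random intercept logit model of the kind this module extends from Module 6 can be written as below. The notation is a conventional choice assumed here rather than quoted from the module materials: y_{ij} is the binary response for individual i in cluster j, x_{ij} is an explanatory variable, and u_{j} is the cluster random effect.

```latex
\begin{align*}
y_{ij} \mid \pi_{ij} &\sim \text{Bernoulli}(\pi_{ij}), \\
\operatorname{logit}(\pi_{ij})
  = \log\!\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)
  &= \beta_0 + \beta_1 x_{ij} + u_j, \\
u_j &\sim N(0, \sigma_u^2).
\end{align*}
```

Here \beta_1 is a cluster-specific effect: it describes the change in the log-odds for a one-unit change in x_{ij} while holding the cluster effect u_{j} fixed.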

- C 7.0: Introduction
- C 7.1: Two-level random intercept model for binary responses
- C 7.2: Latent Variable Representation of a random intercept model for binary responses
- C 7.3: Population-averaged and cluster-specific effects
- C 7.4: Predicted Probabilities from a multilevel model
- C 7.5: A Two-level random slope model
- C 7.6: Adding level 2 explanatory variables: Contextual effects
- C 7.7: Estimation of binary response models

- Module 8 - Multilevel modelling in practice: Research questions, data preparation and analysis - concepts (PDF, 144kB) - (sample PDF, 0.1 mb)

In this module we consider the whole process of conducting a research project using multilevel modelling, taking as an example a study of ethnic differences in educational achievement and progress. The research process starts with the formulation of research questions as hypotheses that can be tested using multilevel models. The next step is to prepare the dataset for analysis, which includes decisions about issues such as coding variables, deriving new variables and handling missing data. The analysis then begins with a detailed exploration of the data before fitting multilevel models, building model complexity gradually. We show how the research process is iterative with the results from initial analyses leading to refinements in the original research questions.

This module builds on Module 5: Introduction to multilevel modelling.

- C 8.1 Introduction and Features of the Module
- C 8.2 Framing the Initial Research Questions
- C 8.3 Data Considerations
- C 8.4 Preliminaries to Research: Getting to Know the Data
- C 8.5 Getting Data Ready for Analysis
- C 8.6 Initial Benchmark Analyses and Model Building Strategies
- C 8.6.1 Model building strategies
- C 8.6.2 Analysis of school and LEA differences in achievement: Approaching research question Q1
- C 8.6.3 Initial models with school type and LEA
- C 8.6.4 An initial model for ethnicity effects: preliminaries to research question Q2
- C 8.6.5 Basic progress models: control for KS2 achievement
- C 8.6.6 Residuals from the basic progress models
- C 8.6.7 Ethnicity and progress: research question Q2 pursued

- C 8.7 Developing the Model for Progress to KS3
- C 8.8 Interaction Effects: Does the Effect on KS3 Outcomes of Earlier KS2 Achievement Differ in Pattern Across Ethnic Groups?
- C 8.9 Random Coefficients: School Effects on Ethnic Group Differences in Progress
- C 8.10 Contextual Effects of Ethnicity: the Role of School Ethnic Composition
- C 8.11 The Final Model
- References and Further Reading

- 9 - Concepts (PDF, 136kB) - (sample PDF, 1.7 mb)
- 9 - MLwiN practical (PDF, 228kB) - (sample PDF, 1.5 mb)
- 9 - Stata practical (sample PDF, 0.5 mb)

- C 9.0: Introduction
- C 9.1: Cumulative logit model for single-level data
- C 9.2: Continuation ratio logit model
- C 9.3: Random intercept cumulative logit model
- C 9.4: Random slope cumulative logit model
- C 9.5: Contextual effects

- 10 - Concepts (PDF, 1,534kB) - (sample PDF, 1.5 mb)
- 10 MLwiN practical (PDF, 2,321kB) - (sample PDF, 2.3 mb)

- C10.0: Introduction
- C10.1: Multinomial Logit Model for Single-Level Data
- C10.2: Example: Means of Travel to Work
- C10.3: Random Intercept Multinomial Logit Model
- C10.4: Contextual Effects
- C10.5: Conditional Logit Models: Incorporating Characteristics of Response Alternatives
- C10.5.1 Latent variable formulation of the multinomial logit model
- C10.5.2 Conditional logit model
- C10.5.3 General discrete choice model: Combining the multinomial and conditional logit models
- C10.5.4 Link between conditional/multinomial logit and Poisson regression
- C10.5.5 Multilevel conditional logit model

- 11 - Concepts (PDF, 753kB) - (sample PDF, 0.8 mb)
- 11 MLwiN practical (PDF, 1,108kB) - (sample PDF, 1.1 mb)
- 11 Stata practical (PDF, 825kB) - (sample PDF, 0.8 mb)

- C11.0: Introduction
- C11.1 Understanding Three-Level Data Structures
- C11.2: A Three-Level Variance Components Model
- C11.2.1 Specifying the three-level model
- C11.2.2 Interpretation of the intercept and the random effects
- C11.2.3 Testing for cluster effects
- C11.2.4 Calculating coverage intervals, variance partition coefficients (VPCs) and intraclass correlation coefficients (ICCs)
- C11.2.5 Predicting and examining cluster effects
- C11.2.6 Example: Students nested within school-cohorts nested within schools

- C11.3 Adding Predictor Variables
- C11.4 Adding Random Coefficients
- C11.5 Adding Further Levels

- 12 - Concepts - (sample PDF, 1.0 mb)
- 12 - MLwiN practical - (sample PDF, 1.2 mb)
- 12 - Stata practical - (sample PDF, 0.8 mb)

- C12.0: Introduction
- C12.1 Understanding Cross-Classified Data Structures
- C12.2 A Cross-Classified Variance Components Model
- C12.2.1 Specifying the two-way cross-classified model
- C12.2.2 Interpretation of the intercept and the random effects
- C12.2.3 Testing for cluster effects
- C12.2.4 Calculating coverage intervals, variance partition coefficients (VPCs) and intraclass correlation coefficients (ICCs)
- C12.2.5 Predicting and examining cluster effects
- C12.2.6 Example: Secondary schools crossed with primary schools

- C12.3 Adding a Random Interaction Classification
- C12.4 Adding Predictor Variables
- C12.5 Adding Random Coefficients
- C12.6 Adding Further Classifications

- 13 - Concepts - (sample PDF, 1.0 mb)
- 13 - MLwiN practical - (sample PDF, 1.0 mb)
- 13 - Stata practical - (sample PDF, 0.8 mb)

- C13.0: Introduction
- C13.1 Understanding Multiple Membership Data Structures
- C13.2 A Multiple Membership Variance Components Model
- C13.2.1 Specifying the two-level multiple membership model
- C13.2.2 Interpretation of the intercept and the random effects
- C13.2.3 Testing for cluster effects
- C13.2.4 Calculating coverage intervals, variance partition coefficients (VPCs) and intraclass correlation coefficients (ICCs)
- C13.2.5 Predicting and examining cluster effects
- C13.2.6 Example: Students as multiple members of schools

- C13.3 Exploring Alternative Multiple Membership Weighting Schemes
- C13.4 Adding Predictor Variables
- C13.5 Adding Random Coefficients
- C13.6 Adding Further Classifications

- 14 - Concepts - (sample PDF, 0.6 mb)
- 14 - MLwiN practical - (sample PDF, 0.7 mb)
- 14 - Stata practical - (sample PDF, 0.4 mb)

- C14.0: Introduction
- C14.1: The Model of Interest
- C14.2 Investigating Missingness
- C14.3 Ad-hoc Methods
- C14.4: Complete Records Analysis
- C14.5: Multiple Imputation
- C14.5.1: Creating the imputations
- C14.5.2: Choosing variables to include in the imputation model
- C14.5.3: Proper imputation
- C14.5.4: Using the imputed datasets
- C14.5.5: How many imputations should we generate?
- C14.5.6: Class size data
- C14.5.7: Assumptions made by multiple imputation
- C14.5.8: Software

- C14.6 Inverse Probability Weighting
- C14.7: Multilevel and Longitudinal Studies
- C14.8: Summary and Conclusions

- 15 - Concepts - (sample PDF, 0.4 mb)
- 15 - MLwiN practical - (sample PDF, 0.5 mb)
- 15 - Stata practical - (sample PDF, 0.2 mb)

- 15.0: Introduction
- 15.1: Overview of Repeated Measures Data and Methods
- 15.2: Introduction to Growth Curve Models
- 15.3: Linear Growth Model for Continuous Repeated Measures
- 15.4: Nonlinear Growth
- 15.5: Adding Explanatory Variables: Fitting Group-specific Growth Curves
- 15.6: Residual Autocorrelation
- 15.7: Introduction to Dynamic Models
- 15.8: The Initial Conditions Problem
- 15.9: Advanced Topics