REALCOM: Developing multilevel models for REAListically COMplex social science data

This software specialises in three areas: models with responses at several levels of a data hierarchy, multilevel structural equation models, and measurement error modelling. The models developed under the project were estimated using Markov Chain Monte Carlo (MCMC) estimation.


REALCOM downloads


We no longer support the original mixed-responses module of REALCOM, as its functionality is included in REALCOM-Impute. For this reason the original REALCOM installer has been split into realcom-factor and realcom-measerr. Please note that these installers no longer contain the training manual, so you will need to download this from the link above.

Note: During installation you may get a message on your screen: ".Net Framework is not installed - do you want to stop this installation and install .Net first?". Answer: You do not need .Net to run the application. Further instructions are available in the training manual on page 2 (page 5 of the PDF).

Other materials


Bug fixes



Previous bugs (earlier versions of the software)


To resolve the following bugs please ensure you have the latest version of REALCOM.



The research project


The project developed new methodology and associated training materials in the following areas of multilevel modelling: structural equation models, measurement errors and multivariate mixed response types at more than one level of the data hierarchy. The models developed under the project were estimated using Markov Chain Monte Carlo (MCMC) estimation.

The methodology builds upon that already implemented in MLwiN, which is described in the MLwiN manuals. The training materials are written in MATLAB and are available as free-standing programs. They are designed to interface with MLwiN in terms of data transfer but have their own graphical user interfaces for setting up models and displaying results. There is a set of training materials (PDF, 791kB) which provides an introduction to the methodology and a guide to using the software.

The methods have been applied to a variety of problems, including flexible prediction models, multiple imputation for missing data in multilevel models, and misclassification errors in social status data.

Three repeated 1-day workshops were held in Bristol, London and Birmingham, June/July 2007.

The ESRC has rated this project as outstanding. The outstanding grade indicates that a project has fully met its objectives and has provided an exceptional research contribution well above average or very high in relation to the level of award. Go to ESRC award details.

The methodology builds upon that already implemented in MLwiN version 2.02 which is described in the MLwiN manuals. The training materials are written in MATLAB and are available as free-standing programs. They are designed to interface with MLwiN in terms of data transfer but have their own graphical user interfaces for setting up models and displaying results.


Measurement errors


Many of the variables used in the social and medical sciences are measured with error. Such errors can arise from unreliable measuring instruments or problems with variable definitions, or they can simply reflect temporal fluctuations, for example within individual units. The errors we are concerned with are essentially considered as random, and are distinct from systematic errors, which can lead to biases.

There is a large statistical literature on the modelling of such errors, mostly dealing with the case of continuously distributed variables in single level linear and non-linear models. Fuller (2006) provides a comprehensive treatment. In this work we extend existing work based upon MCMC estimation for multilevel models (Browne et al., 2001), which is incorporated in the MLwiN software (Browne, 2004). We deal with the 2-level case in detail, with extensions to three levels being relatively straightforward. Extensions to handle cross classified and multiple membership models (Goldstein, 2003, Chapters 11 and 12) also involve just the addition of appropriate sampling steps within the MCMC algorithm. The consequences of ignoring measurement errors are well known: they typically lead to underestimation of coefficients and biased standard errors. In multilevel models we will also obtain biased estimates of covariance matrices.
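The underestimation of coefficients mentioned above can be illustrated with a small simulation. The sketch below is not part of REALCOM and is deliberately single level: it assumes a simple linear regression with one predictor observed with independent random error, and all function names and parameter values are chosen purely for illustration. Under these assumptions the slope estimated from the error-prone predictor shrinks towards zero by roughly the reliability, that is, the true variance divided by the true variance plus the error variance.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, beta=1.0, sd_true=1.0, sd_error=1.0, seed=1):
    """Return (slope using the true predictor, slope using the observed predictor)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_true) for _ in range(n)]
    # Observed predictor = true value + independent random measurement error.
    x_obs = [x + rng.gauss(0.0, sd_error) for x in x_true]
    # The response depends on the true predictor, not the observed one.
    y = [beta * x + rng.gauss(0.0, 0.5) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)


slope_true, slope_obs = simulate_attenuation()
# Reliability under the assumed variances: 1 / (1 + 1) = 0.5.
reliability = 1.0 ** 2 / (1.0 ** 2 + 1.0 ** 2)
print(slope_true, slope_obs, reliability)
```

With equal true and error variances the reliability is 0.5, so the slope based on the observed predictor should come out close to half of the true coefficient.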

The innovations introduced are to handle correlated measurement errors and also misclassification errors in binary predictor variables. The main example is taken from a study of class size which involves both continuous predictors with correlated measurement errors and a binary predictor with misclassification error.
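A similar effect occurs when a binary predictor is misclassified. The following is a minimal single-level sketch under stated assumptions, not the REALCOM algorithm: the binary predictor has prevalence 0.5 and each recorded value is flipped independently with the same probability, and the names `mean_difference`, `simulate_misclassification` and `flip_prob` are hypothetical. In this symmetric case the estimated effect is expected to shrink by a factor of about 1 - 2 * flip_prob.

```python
import random


def mean_difference(z, y):
    """Difference in mean response between the z == 1 and z == 0 groups."""
    ones = [b for a, b in zip(z, y) if a == 1]
    zeros = [b for a, b in zip(z, y) if a == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def simulate_misclassification(n=40000, beta=2.0, flip_prob=0.1, seed=7):
    """Return (effect using the true indicator, effect using the misclassified one)."""
    rng = random.Random(seed)
    z_true = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    # Each recorded value is flipped independently with probability flip_prob.
    z_obs = [1 - z if rng.random() < flip_prob else z for z in z_true]
    # The response depends on the true group membership.
    y = [beta * z + rng.gauss(0.0, 1.0) for z in z_true]
    return mean_difference(z_true, y), mean_difference(z_obs, y)


effect_true, effect_obs = simulate_misclassification()
print(effect_true, effect_obs)
```

With a true effect of 2.0 and a 10% flip rate, the effect estimated from the misclassified indicator should be near 2.0 * (1 - 0.2) = 1.6.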



Latent variable (factor) models


The MATLAB routines that have been developed extend existing models for multilevel factor analysis in the following ways. First, they allow certain constraints across parameters that are important for interpretation. Second, they allow different ways of specifying level 2 latent variables. Third, they use MCMC estimation rather than maximum likelihood (ML). One problem with ML estimation is that it becomes very slow when the number of parameters in the model becomes large, with computing time typically increasing factorially with the number of parameters; MCMC estimation avoids this kind of dependence on the number of parameters.

The workshops presented two examples, one from demography and one from education, that illustrate, for two levels, how to set up and analyse such models.
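For orientation, a generic 2-level model with a single factor at each level can be written as follows. This is a sketch of the general form of such models, not necessarily the exact parameterisation used in the MATLAB routines, and the notation is an assumption: r indexes responses, i indexes level 1 units and j indexes level 2 units.

```latex
y_{rij} = \beta_r
        + \lambda^{(2)}_r \nu^{(2)}_j
        + \lambda^{(1)}_r \nu^{(1)}_{ij}
        + u_{rj} + e_{rij},
\qquad
\nu^{(2)}_j \sim N(0, \sigma^2_{\nu 2}), \quad
\nu^{(1)}_{ij} \sim N(0, \sigma^2_{\nu 1}), \quad
u_{rj} \sim N(0, \sigma^2_{ur}), \quad
e_{rij} \sim N(0, \sigma^2_{er})
```

Here the \lambda terms are factor loadings and the \nu terms are the latent variables (factors) at each level. Some constraint is needed for identification at each level, for example fixing a factor variance to 1 or fixing one loading.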



Responses at more than one level


Multivariate models, including those which incorporate a multilevel structure, are traditionally confined to responses at the lowest level of the data hierarchy and usually also deal with Normally distributed responses. One exception to the latter, implemented in MLwiN, is where the responses are all binary, or a mixture of binary and Normal. Browne (2004) discusses such models and gives examples. There are also some examples of the use of Normal responses jointly at levels 1 and 2: Steele et al. (2007) model pupil and school level Normal responses in a multiprocess model for evaluating the impact of school resources on student achievement, and Goldstein (1989) fits a model with repeated measures on individuals during growth (level 1) jointly with their adult height (level 2) as the basis for an efficient prediction model. The MATLAB routines additionally allow any of the responses to be ordered or unordered categorical variables. This is particularly useful when we wish to carry out multiple imputation for missing data, where missingness may occur in continuous or discrete data. Examples are given using growth data and class size data.


Papers




The REALCOM team


Harvey Goldstein, (project director), Jon Rasbash, Fiona Steele (co-directors), Christopher Charlton (research officer), Hilary Browne (web developer), Sophie Pollard (project assistant)

This three-year ESRC-funded research project developed multilevel modelling techniques, software and training materials in three areas: models with responses at several levels of a data hierarchy, multilevel structural equation models, and measurement error modelling. The models developed under the project were estimated using Markov Chain Monte Carlo (MCMC) estimation.


Missing data


Missing data are a persistent problem in social and other datasets. A standard technique for handling missing values efficiently is multiple imputation, and the software REALCOM-IMPUTE is unique in that it has been designed to implement this procedure for 2-level data. As well as dealing with 2-level data, it can properly handle categorical data, whether in the response or the predictor variables of a model. An interface is provided with MLwiN that allows users to carry out the full procedure and fit their final model semi-automatically.
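In multiple imputation the model of interest is fitted to each completed dataset and the results are then combined. The sketch below shows the standard combining step (Rubin's rules) for a single parameter. It is an illustration only and is not taken from REALCOM-IMPUTE or MLwiN; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one parameter's estimates from m imputed datasets.

    estimates: the point estimate from each completed dataset
    variances: the squared standard error from each completed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)          # average of the m estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance, so the pooled standard error reflects the uncertainty due to the missing values.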


References


Blatchford, P., Goldstein, H., Martin, C. and Browne, W. (2002). A study of class size effects in English school reception year classes. British Educational Research Journal 28: 169-185.

Browne, W. J. (2004). MCMC estimation in MLwiN. Version 2.0. London, Institute of Education.

Browne, W., Goldstein, H., Woodhouse, G. and Yang, M. (2001). An MCMC algorithm for adjusting for errors in variables in random slopes multilevel models. Multilevel modelling newsletter 13(1): 4-9.

Fuller, W. A. (2006). Measurement Error Models. New York, Wiley.

Goldstein, H. (1989). Models for Multilevel Response variables with an application to Growth Curves. Multilevel Analysis of Educational Data. R. D. Bock. New York, Academic Press: 107-125.

Goldstein, H. (2003). Multilevel Statistical Models. Third edition. London, Edward Arnold.

Goldstein, H. and Browne, W. (2005). Multilevel factor analysis models for continuous and discrete data. Contemporary Psychometrics. A Festschrift to Roderick P. McDonald. A. Olivares and J. J. McArdle. Mahwah, NJ, Lawrence Erlbaum.

Lawley, D. N. and Maxwell, A. E. (1971). Factor analysis as a statistical method. London, Butterworth.

Mathworks (2004). Matlab.

McDonald, R. P. and Goldstein, H. (1989). Balanced versus unbalanced designs for linear structural relations in two-level data. British Journal of Mathematical and Statistical Psychology 42: 215-232.

Rabe-Hesketh, S., Pickles, A. and Skrondal, A. (2001). GLLAMM: a general class of multilevel models and a STATA program. Multilevel modelling newsletter 13(1): 17-23.

Steele, F., Vignoles, A. and Jenkins, A. (2007). The Impact of School Resources on Pupil Attainment: A Multilevel Simultaneous Equation Modelling Approach. Journal of the Royal Statistical Society, A. 170.

