Unit information: Statistical Methods 1 in 2021/22

Please note: The information shown for future academic years may change due to developments in the relevant academic field. Optional unit availability varies depending on staffing, student choice, and timetabling constraints.

Unit name Statistical Methods 1
Unit code MATHM0041
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 1 (weeks 1 - 12)
Unit director Dr. Song Liu
Open unit status Not open
Pre-requisites

None

Co-requisites

None

School/department School of Mathematics
Faculty Faculty of Science

Description

This unit covers the topic of prediction, from the initial consultation with the client all the way through to delivering an effective prediction algorithm and quantifying its out-of-sample performance. Prediction is an important activity in its own right, but we also use it to illustrate many of the major topics in computational statistics. These include statistical optimality, the limitations of naive approaches such as nearest-neighbour and cross-validation, the Normal Linear Model for regression, the concepts of prior, posterior, and predictive distributions, regression modelling with basis expansions, extensions to data-dependent regressors, and the treatment of more complex parameters through optimization.
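The limitation of nearest-neighbour methods mentioned above has a simple numerical demonstration. The following sketch (illustrative only, not part of the unit materials) shows that in high dimensions the distances from a query point to uniformly drawn points concentrate, so the "nearest" neighbour is barely nearer than the farthest:

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_distance_ratio(d, n=500):
    """Ratio of nearest to farthest distance from the centre of [0, 1]^d
    for n uniform points.  As d grows the ratio approaches 1: all points
    become roughly equidistant, which is what undermines nearest-neighbour
    prediction in high dimensions."""
    X = rng.random((n, d))
    dist = np.linalg.norm(X - 0.5, axis=1)  # distance to the centre point
    return dist.min() / dist.max()

ratio_low = nn_distance_ratio(2)      # low dimension: nearest is much nearer
ratio_high = nn_distance_ratio(1000)  # high dimension: ratio close to 1
```

With d = 2 the ratio is close to 0; with d = 1000 it is close to 1, so the neighbourhood structure that nearest-neighbour prediction relies on has effectively vanished.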

In its most sophisticated form, the extended Normal Linear Model provides a powerful and computationally tractable platform for regression and optimal prediction, but does not provide similar benefits for classification. The later part of the unit considers the challenges presented by classification, and the various approaches that are used to approximate the predictive distribution. These approaches, including numerical optimization of penalized likelihoods and approximate numerical integration, are core tools in computational statistics and machine learning.
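As one concrete instance of the penalized-likelihood ("plug-in") route, the sketch below fits an L2-penalized logistic regression, which is the MAP estimate under a Gaussian prior on the weights. This is an illustrative example, not unit material: it substitutes a generic BFGS optimizer (scipy) for the coordinate-descent solver used by glmnet:

```python
import numpy as np
from scipy.optimize import minimize

def fit_penalized_logistic(X, y, lam=1.0):
    """Ridge-penalized logistic regression: a 'plug-in' classifier.

    Maximizes the Bernoulli log-likelihood minus an L2 penalty,
    i.e. the log-posterior under a N(0, 1/lam) prior on the weights.
    """
    n, p = X.shape

    def neg_log_posterior(w):
        z = X @ w
        log1pexp = np.logaddexp(0.0, z)    # log(1 + exp(z)), stably
        nll = np.sum(log1pexp - y * z)     # negative log-likelihood
        return nll + 0.5 * lam * w @ w     # L2 (Gaussian-prior) penalty

    res = minimize(neg_log_posterior, np.zeros(p), method="BFGS")
    return res.x

# Tiny synthetic check: two well-separated clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
X = np.hstack([np.ones((100, 1)), X])      # intercept column
y = np.concatenate([np.zeros(50), np.ones(50)])
w_hat = fit_penalized_logistic(X, y)
acc = np.mean((X @ w_hat > 0) == y)
```

The "integrate-out" alternative replaces the single optimized weight vector with an (approximate) average over the posterior, e.g. via a Laplace approximation around this same MAP estimate.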

Intended learning outcomes

By the end of the unit students should be able to:

  • Formulate a prediction problem with a client, either regression or classification, including a discussion about an appropriate loss function, and about out-of-sample performance.
  • Demonstrate both theoretically and numerically how the curse of dimensionality undermines naive approaches to prediction.
  • Describe the purpose of a parametric model, and explain the benefits and limitations of the ‘integrate-out’ versus ‘plug-in’ treatments of the model parameters.
  • For the Normal Linear Model for regression, derive the explicit forms for the posterior and predictive (‘integrate-out’) distributions, and the marginal likelihood as a function of hyperparameters, and code these into an efficient and numerically stable prediction algorithm.
  • State the discriminative modelling framework for classification, and contrast it with regression, highlighting the additional challenges that classification brings.
  • Outline numerical approximation methods which can be used for both ‘plug-in’ and ‘integrate-out’ approaches to classification, and code these into an algorithm based on existing tools, such as the Generalized Linear Model and ‘glmnet’ in R.
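The Normal Linear Model computations in the fourth outcome can be sketched briefly. This is not unit code: it assumes a zero-mean Gaussian prior with variance tau2, a known noise variance sigma2, and uses a Cholesky factorization of the posterior precision for numerical stability:

```python
import numpy as np

def nlm_posterior_predict(X, y, X_star, sigma2=1.0, tau2=1.0):
    """Posterior and 'integrate-out' predictive for the Normal Linear Model.

    Model: y = X w + e, e ~ N(0, sigma2 I), prior w ~ N(0, tau2 I).
    Returns the predictive mean and variance at new inputs X_star.
    """
    p = X.shape[1]
    A = X.T @ X / sigma2 + np.eye(p) / tau2    # posterior precision
    L = np.linalg.cholesky(A)                  # A = L L^T, stable solves
    b = X.T @ y / sigma2
    m = np.linalg.solve(L.T, np.linalg.solve(L, b))  # posterior mean
    V = np.linalg.solve(L, X_star.T)           # L^{-1} X*^T
    pred_mean = X_star @ m                     # predictive mean
    pred_var = sigma2 + np.sum(V * V, axis=0)  # sigma2 + x*^T A^{-1} x*
    return pred_mean, pred_var

# Small synthetic check: recover y = 1 + 2x from noisy data
rng = np.random.default_rng(1)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 1))])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=200)
X_star = np.array([[1.0, 0.0], [1.0, 1.0]])
pred_mean, pred_var = nlm_posterior_predict(X, y, X_star, sigma2=0.01, tau2=10.0)
```

The same Cholesky factor also gives the log-determinant needed to evaluate the marginal likelihood as a function of (sigma2, tau2), which is part of what makes this model computationally convenient.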

Teaching details

Some lab-based instruction

Assessment details

Formative: homework each week.

Summative:

  1. A personal portfolio of notes, code snippets, and vignettes, 30%.
  2. Two pieces of assessed coursework, 20% each.
  3. A group project, 30%.

Reading and References

T. Hastie, R. Tibshirani, and J. Friedman (2017), The Elements of Statistical Learning, 2nd edition, Springer.

Feedback