# Unit information: Statistical Methods 1 in 2019/20

Please note: due to alternative arrangements for teaching and assessment in place from 18 March 2020 to mitigate the restrictions arising from COVID-19, information shown for 2019/20 may not always be accurate.

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up-to-date information.

| Unit name | Statistical Methods 1 |
| --- | --- |
| Unit code | MATHM0041 |
| Credit points | 20 |
| Level | M/7 |
| Teaching block | Teaching Block 1 (weeks 1–12) |
| Unit director | Dr. Song Liu |
| Open unit status | Not open |
| Pre-requisites | None |
| Co-requisites | None |
| School | School of Mathematics |
| Faculty | Faculty of Science |

## Description

This unit covers the topic of prediction, from the initial consultation with the client all the way through to delivering an effective prediction algorithm and quantifying its out-of-sample performance. Prediction is an important activity in its own right, but we also use it to illustrate many of the major topics in computational statistics. These include statistical optimality, the limitations of naive approaches such as nearest-neighbour and cross-validation, the Normal Linear Model for regression, the concepts of prior, posterior, and predictive distributions, regression modelling with basis expansions, extensions to data-dependent regressors, and the treatment of more complex parameters through optimization.
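The naive nearest-neighbour approach and the quantification of out-of-sample performance mentioned above can be illustrated with a short sketch. The unit's computing is done in R, but the idea is language-agnostic; the following is an illustrative Python sketch on synthetic data (the function name `one_nn_predict` and the toy problem are made up for illustration).

```python
import random

random.seed(0)

def one_nn_predict(train, x):
    """Predict the response of the single nearest training point (1-NN)."""
    return min(train, key=lambda pair: (pair[0] - x) ** 2)[1]

# Synthetic 1-D regression problem: y = x^2 plus Gaussian noise.
data = [(x, x * x + random.gauss(0, 0.1))
        for x in (random.uniform(-1, 1) for _ in range(200))]

# Hold out part of the data to estimate out-of-sample performance.
train, test = data[:150], data[150:]

# Out-of-sample performance under squared-error loss.
mse = sum((one_nn_predict(train, x) - y) ** 2 for x, y in test) / len(test)
print(f"held-out mean squared error: {mse:.4f}")
```

In one dimension the nearest neighbour is close and the held-out error is small; the same scheme degrades sharply as the input dimension grows, which is the curse of dimensionality the unit examines.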

In its most sophisticated form, the extended Normal Linear Model provides a powerful and computationally tractable platform for regression and optimal prediction, but it does not provide similar benefits for classification. The latter part of the unit considers the challenges presented by classification, and the various approaches that are used to approximate the predictive distribution. These approaches, including numerical optimization of penalized likelihoods and approximate numerical integration, are core tools in computational statistics and machine learning.
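To make the "numerical optimization of penalized likelihoods" concrete, here is an illustrative Python sketch (not the unit's own code, which uses R) of a plug-in classifier: a one-parameter logistic model fitted by gradient ascent on an L2-penalized log-likelihood. All names and the synthetic data are assumptions made for illustration.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic binary data: P(y = 1 | x) = sigmoid(3x).
xs = [random.uniform(-2, 2) for _ in range(300)]
ys = [1 if random.random() < sigmoid(3.0 * x) else 0 for x in xs]

# 'Plug-in' approach: maximise the L2-penalised log-likelihood
#   sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ] - 0.5 * lam * w**2
# by gradient ascent, then predict with the single fitted w.
w, lam, step, n = 0.0, 1.0, 0.5, len(xs)
for _ in range(500):
    grad = sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys)) / n - lam * w / n
    w += step * grad

# Plug-in predictive probability at a new input x* = 1.0.
p_star = sigmoid(w * 1.0)
print(f"fitted w = {w:.2f}, plug-in P(y=1 | x*=1) = {p_star:.2f}")
```

The 'integrate-out' alternative replaces the single fitted `w` with an average over an (approximate) posterior, which is where the approximate numerical integration mentioned above comes in.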

## Intended learning outcomes

By the end of the unit students should be able to:

• Formulate a prediction problem with a client, either regression or classification, including a discussion about an appropriate loss function, and about out-of-sample performance.
• Demonstrate both theoretically and numerically how the curse of dimensionality undermines naive approaches to prediction.
• Describe the purpose of a parametric model, and explain the benefits and limitations of the ‘integrate-out’ versus ‘plug-in’ treatments of the model parameters.
• For the Normal Linear Model for regression, derive the explicit forms for the posterior and predictive (‘integrate-out’) distributions, and the marginal likelihood as a function of hyperparameters, and code these into an efficient and numerically-stable prediction algorithm.
• State the discriminative modelling framework for classification and contrast it with regression, highlighting the additional challenges that classification brings.
• Outline numerical approximation methods which can be used for both ‘plug-in’ and ‘integrate-out’ approaches to classification, and code these into an algorithm based on existing tools, such as the Generalized Linear Model and ‘glmnet’ in R.
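The "curse of dimensionality" outcome above has a simple numerical demonstration: with a fixed number of points, the nearest neighbour of a query drifts far away as the dimension grows. The following Python sketch (illustrative only; the function name `nearest_gap` is made up) shows this on uniform points in the unit hypercube.

```python
import random

random.seed(3)

def nearest_gap(dim, n=200):
    """Distance from the cube's centre to its nearest neighbour among
    n points drawn uniformly from the unit hypercube of given dimension."""
    pts = [[random.random() for _ in range(dim)] for _ in range(n)]
    centre = [0.5] * dim
    return min(sum((a - b) ** 2 for a, b in zip(p, centre)) ** 0.5
               for p in pts)

gaps = {d: nearest_gap(d) for d in (1, 2, 10, 50)}
for d, g in gaps.items():
    print(f"dim {d:>2}: nearest neighbour at distance {g:.3f}")
```

Even with the same sample size, the nearest neighbour in 50 dimensions is orders of magnitude further away than in one dimension, which is why naive nearest-neighbour prediction fails in high dimensions.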
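The closed-form posterior and predictive distributions of the Normal Linear Model, referred to in the outcomes above, can be sketched in the simplest conjugate case: a single regressor with known noise variance. This is an illustrative Python sketch of that special case only (the unit treats the general multivariate model); the variable names and toy data are assumptions.

```python
import random

random.seed(2)

# Model: y = w x + eps, eps ~ N(0, sigma2), with prior w ~ N(0, tau2).
sigma2, tau2, true_w = 0.25, 4.0, 1.5
xs = [random.uniform(-1, 1) for _ in range(50)]
ys = [true_w * x + random.gauss(0, sigma2 ** 0.5) for x in xs]

# Conjugacy gives the posterior w | y ~ N(m, v) in closed form.
v = 1.0 / (1.0 / tau2 + sum(x * x for x in xs) / sigma2)
m = v * sum(x * y for x, y in zip(xs, ys)) / sigma2

# 'Integrate-out' predictive at a new input x*: N(m x*, sigma2 + v x*^2).
x_star = 0.5
pred_mean = m * x_star
pred_var = sigma2 + v * x_star ** 2
print(f"posterior mean {m:.2f}; predictive N({pred_mean:.2f}, {pred_var:.3f})")
```

Note that the predictive variance `sigma2 + v * x_star ** 2` exceeds the noise variance alone: integrating out `w` propagates parameter uncertainty into the prediction, which the plug-in approach ignores.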

## Teaching details

Some lab-based instruction

## Assessment details

Formative: homework each week.

Summative:

1. A personal portfolio of notes, code snippets, and vignettes, 30%.
2. Two pieces of assessed coursework, 20% each.
3. A group project, 30%.