# Unit information: Statistical Methods 1 in 2022/23

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up-to-date information.

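• Unit name: Statistical Methods 1
• Unit code: MATHM0041
• Credit points: 20
• Level of study: M/7
• Teaching block: Teaching Block 1 (weeks 1 - 12)
• Unit director: Dr. Song Liu
• Open unit status: Not open
• Pre-requisite units: None
• Co-requisite units: None
• Units you may not take alongside this one: None
• School/department: School of Mathematics
• Faculty: Faculty of Science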

## Unit Information

This unit covers the topic of prediction, from the initial consultation with the client all the way through to delivering an effective prediction algorithm and quantifying its out-of-sample performance. Prediction is an important activity in its own right, but we also use it to illustrate many of the major topics in computational statistics. These include statistical optimality, the limitations of naive approaches such as nearest-neighbour and cross-validation, the Normal Linear Model for regression, the concepts of prior, posterior, and predictive distributions, regression modelling with basis expansions, extensions to data-dependent regressors, and the treatment of more complex parameters through optimization.
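
One way to see why nearest-neighbour prediction struggles as the number of regressors grows is the following minimal sketch. It assumes regressors drawn uniformly on the unit cube, which is an illustrative choice and not something specified by the unit; the function names `edge_length` and `nearest_neighbour_distance` are hypothetical helpers introduced here.

```python
import math
import random


def edge_length(fraction: float, dim: int) -> float:
    """Edge of a sub-cube of [0, 1]^dim that holds `fraction` of the cube's volume."""
    return fraction ** (1.0 / dim)


def nearest_neighbour_distance(n_points: int, dim: int, rng: random.Random) -> float:
    """Distance from the cube's centre to the nearest of n_points uniform draws."""
    centre = [0.5] * dim
    best = math.inf
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        best = min(best, math.dist(centre, point))
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    for dim in (1, 2, 10, 100):
        edge = edge_length(0.01, dim)
        dist = nearest_neighbour_distance(500, dim, rng)
        print(f"dim={dim:3d}  edge for 1% of data={edge:.3f}  nearest distance={dist:.3f}")
```

With ten regressors, a "local" cube that captures 1% of uniformly spread data already has to span about 63% of the range of each regressor, so the nearest neighbours are no longer near in any useful sense.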

In its most sophisticated form, the extended Normal Linear Model provides a powerful and computationally tractable platform for regression and optimal prediction, but does not provide similar benefits for classification. The later part of the unit considers the challenges presented by classification, and the various approaches that are used to approximate the predictive distribution. These approaches, including numerical optimization of penalized likelihoods and approximate numerical integration, are core tools in computational statistics and machine learning.

## Your learning on this unit

By the end of the unit, students should be able to:

• Formulate a prediction problem with a client, either regression or classification, including a discussion about an appropriate loss function, and about out-of-sample performance.
• Demonstrate both theoretically and numerically how the curse of dimensionality undermines naive approaches to prediction.
• Describe the purpose of a parametric model, and explain the benefits and limitations of the ‘integrate-out’ and ‘plug-in’ treatments of the model parameters.
• For the Normal Linear Model for regression, derive the explicit forms for the posterior and predictive (‘integrate-out’) distributions, and the marginal likelihood as a function of hyperparameters, and code these into an efficient and numerically stable prediction algorithm.
• State the discriminative modelling framework for classification, and contrast it with regression, highlighting the additional challenges that classification brings over regression.
• Outline numerical approximation methods which can be used for both ‘plug-in’ and ‘integrate-out’ approaches to classification, and code these into an algorithm based on existing tools, such as the Generalized Linear Model and ‘glmnet’ in R.
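
As a point of reference for the Normal Linear Model outcome above, the standard conjugate results can be sketched as follows. This assumes a known noise variance $\sigma^2$ and a Gaussian prior on the regression coefficients; the symbols $m_0$, $V_0$, $m_n$, $V_n$, and $x_*$ are notation chosen here for illustration and are not taken from the unit materials, which may use a different parameterisation (for example, an unknown variance).

$$
\begin{aligned}
y \mid \beta &\sim \mathcal{N}\left(X\beta,\ \sigma^2 I_n\right), \qquad \beta \sim \mathcal{N}\left(m_0,\ V_0\right), \\
V_n &= \left(V_0^{-1} + \sigma^{-2} X^\top X\right)^{-1}, \qquad m_n = V_n\left(V_0^{-1} m_0 + \sigma^{-2} X^\top y\right), \\
\beta \mid y &\sim \mathcal{N}\left(m_n,\ V_n\right), \\
y_* \mid y &\sim \mathcal{N}\left(x_*^\top m_n,\ \sigma^2 + x_*^\top V_n x_*\right), \\
y &\sim \mathcal{N}\left(X m_0,\ \sigma^2 I_n + X V_0 X^\top\right).
\end{aligned}
$$

The last line is the marginal likelihood, viewed as a function of the hyperparameters $(m_0, V_0, \sigma^2)$. In code, the matrix inverses are usually replaced by a Cholesky factorisation and triangular solves, which is one common route to the numerical stability the outcome refers to.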

## How you will learn

Some lab-based instruction.

## How you will be assessed

Formative: homework each week.

Summative:

1. A personal portfolio of notes, code snippets, and vignettes, 30%.
2. Two pieces of assessed coursework, 20% each.
3. A group project, 30%.

## Resources

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. MATHM0041).

## How much time the unit requires

Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.