# Unit information: Statistical Machine Learning in 2022/23

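• Unit name: Statistical Machine Learning
• Unit code: MATH30028
• Credit points: 20
• Level of study: H/6
• Teaching block(s): Teaching Block 2 (weeks 13 - 24)
• Unit director: Professor Anthony Lee
• Open unit status: Not open
• Pre-requisite units: Statistics 2 or Econometrics 1. Students are expected to be familiar with programming, and assessed coursework will involve substantial amounts of programming in a high-level language such as R.
• Co-requisite units: N/A
• Units you may not take alongside this one: N/A
• School/department: School of Mathematics
• Faculty: Faculty of Science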

## Unit Information

Unit Aims

Why is this unit important?

Statistical machine learning is an increasingly important approach to extracting valuable information from data. In particular, and when used appropriately, it allows for a data-driven approach to solving various problems that cannot be solved from first principles alone.

This unit will develop theoretically and practically a selection of fundamental machine learning problems and commonly used solutions.

How does this unit fit into your programme of study?

Although there are several statistical units available to Mathematics students, there is not at present a statistical machine learning unit. We expect that such a unit would complement very well the other statistical units while also being of interest to some students who are less focused on statistics. This unit would also be very suitable for BSc Data Science students.

In contrast to statistical units that cover some similar ideas, there is a stronger emphasis on algorithms in this machine learning unit.

## An overview of content

Machine learning is concerned with algorithms that process relevant data and then perform some task. Often, performance of machine learning algorithms is measured statistically, and the algorithms themselves are heavily influenced by statistical ideas. For example, after observing several (x,y) pairs an algorithm may be able to predict with high accuracy the corresponding value of y for an unseen x. When the data is complex and/or high-dimensional, a number of statistical and algorithmic issues arise: a sufficiently rich class of statistical models must be used effectively and irrelevant data should be identified and then discarded.
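As a minimal sketch of the (x, y) prediction example above, the following fits a straight line to a handful of observed pairs by ordinary least squares and then predicts y for an unseen x. The straight-line model, the made-up data, and the names `fit_line` and `predict` are illustrative assumptions, not part of the unit materials.

```python
# Sketch only: learn y ≈ intercept + slope * x from observed (x, y) pairs,
# then predict y for an x that was not in the training data.

def fit_line(xs, ys):
    """Ordinary least squares with one input variable; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sample covariance and variance, up to a common factor that cancels.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at a new input x."""
    intercept, slope = model
    return intercept + slope * x


# Made-up training pairs lying roughly on y = 1 + 2x (assumed for illustration).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1]

model = fit_line(train_x, train_y)
prediction = predict(model, 5.0)  # x = 5.0 was never observed
print(prediction)
```

With complex or high-dimensional data a single straight line is rarely rich enough, which is where the model classes and selection methods covered in the unit come in.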

## How will students, personally, be different as a result of the unit?

Students will understand the statistical approach to analyzing data, and how it can be used to effectively perform tasks under appropriate assumptions. This will then enable students to formulate various real-life problems as statistical learning tasks and use common techniques to develop solutions.

By the end of the course the students should be able to:

• Perform linear and non-linear regression and classification in certain settings.
• Implement and understand some standard clustering algorithms for unlabelled data.
• Perform appropriate dimension reduction techniques on certain types of data.
• Understand the statistical issues involved when selecting appropriate models, and implement some powerful and general methods for model selection.
• Understand why artificial neural networks can be useful for some problems.

## Learning Outcomes

Students will be able to:

• formulate models for supervised learning, and fit them using common methods.
• choose between different models using data.
• perform some unsupervised learning and dimension reduction tasks.

## How you will learn

In addition to lectures introducing the concepts and various algorithms, students will learn:

• in a problem-based manner, engaging with the assessed coursework and implementing the algorithms.
• in an inquiry-based manner, thinking about how the algorithms covered may be applied to new problems.
• interactively, if they choose to work in a pair on assessed coursework.

## How you will be assessed

Formative tasks:

• Labs where students can work on their assessed work, ask questions of the instructor, and work on any other non-assessed example problems that have been given.
• Implementing algorithms covered in the lectures, for example, would generally be a helpful formative task.

Summative tasks, which count towards the unit mark:

• 50% from continuous coursework that involves programming and mathematical derivations and is essential to the intended learning outcomes for the unit.
• 50% by written examination.

## When assessment does not go to plan

Students will be offered reassessment for the exam and for coursework.

## Resources

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. MATH30028).

## How much time the unit requires

Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.