Skip to main content

Unit information: Stochastic Optimisation in 2019/20

Please note: Due to alternative arrangements for teaching and assessment in place from 18 March 2020 to mitigate against the restrictions in place due to COVID-19, information shown for 2019/20 may not always be accurate.

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up to date information.

Unit name Stochastic Optimisation
Unit code MATH30021
Credit points 20
Level of study H/6
Teaching block(s) Teaching Block 1 (weeks 1 - 12)
Unit director Dr. Tadic
Open unit status Not open

MATH11300 Probability 1 (or MATH10013 Probability and Statistics) and MATH20008 Probability 2



School/department School of Mathematics
Faculty Faculty of Science

Description including Unit Aims

Unit Aims

The unit deals with the study of optimisation under uncertainty. It introduces some of the main modelling frameworks within which a wide variety of such problems can be set, before going on to study algorithms for their solution, and the analysis of these algorithms.

Unit Description

Stochastic optimisation covers a broad framework of problems at the interface of applied probability and optimisation. The unit will cover both static and dynamic problems. Static problems involve the optimisation of functions whose values are available only through noise-corrupted observations. Dynamic problems involve sequential decision-making to optimise some measure of long-term reward in a stochastic system evolving over time. The two main models studied in this context will be multi-armed bandit problems and Markov decision processes.

The unit will emphasise theoretical analysis of algorithms and derivation of optimal algorithms, as well as applications.

Intended Learning Outcomes

Students who successfully complete this unit should be able to:

  • recognise and construct appropriate formal multi-armed bandit (MAB) and Markov decision process (MDP) models from informal problem descriptions;
  • use a variety of probability inequalities to prove bounds on algorithms;
  • construct appropriate optimality equations for MDPs and prove the existence of solutions;
  • use appropriate computational techniques to solve MABs and MDPs.

Teaching Information

Lectures, supported by problem and solution sheets.

Assessment Information


10 problem sheets, approximately one per week, with feedback provided on selected problems. Full solutions will be provided for all problems.


100% Exam

Reading and References


  • S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, 2012
  • P. Auer, N. Cesa-Bianchi and P. Fischer, Finite Time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002
  • E. Kaufmann, N. Korda and R. Munos, Thompson Sampling: An Asymptotically Optimal Finite Time Analysis, Algorithmic Learning Theory, 2012
  • J. Tsitsiklis, A Short Proof of the Gittins Index Theorem, Annals of Applied Probability, 1994
  • M. L. Puterman, Markov Decision Processes, Wiley, 2005
  • D. P. Bertsekas, Dynamic Programming and Optimal Control, vols. 1 and 2, Athena Scientific, 2005 and 2007
  • J. C. Spall, Introduction to Stochastic Search and Optimization, Wiley, 2005.