Unit information: Stochastic Optimisation in 2020/21


Unit name Stochastic Optimisation
Unit code MATH30021
Credit points 20
Level of study H/6
Teaching block(s) Teaching Block 1 (weeks 1 - 12)
Unit director Dr. Ayalvadi Ganesh
Open unit status Not open

Pre-requisites MATH11300 Probability 1 (or MATH10013 Probability and Statistics) and MATH20008 Probability 2



School/department School of Mathematics
Faculty Faculty of Science

Description including Unit Aims

Lecturers: Vladislav Tadic and Ayalvadi Ganesh

Unit Aims

The unit studies optimisation under uncertainty. It introduces some of the main modelling frameworks within which a wide variety of such problems can be set, before going on to study algorithms for their solution, and the analysis of these algorithms.

Unit Description

Stochastic optimisation covers a broad framework of problems at the interface of applied probability and optimisation. The unit will cover both static and dynamic problems. Static problems involve the optimisation of functions whose values are available only through noise-corrupted observations. Dynamic problems involve sequential decision-making to optimise some measure of long-term reward in a stochastic system evolving over time. The two main models studied in this context will be multi-armed bandit problems and Markov decision processes.

The unit will emphasise theoretical analysis of algorithms and derivation of optimal algorithms, as well as applications.
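As an informal illustration of the sequential setting described above, the sketch below implements the UCB1 index policy of Auer, Cesa-Bianchi and Fischer (listed in the reading list) for Bernoulli-reward bandits. The function name, arm parameters, and horizon are illustrative choices, not part of the unit material:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run the UCB1 policy on Bernoulli arms with the given (unknown to
    the policy) success probabilities; return total reward and pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm has been pulled
    sums = [0.0] * k      # cumulative reward from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play each arm once to initialise
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as an arm is sampled more often
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Over a long horizon the pull counts concentrate on the arm with the highest mean, which is the behaviour the regret analysis covered in the unit quantifies.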

Intended Learning Outcomes

Students who successfully complete this unit should be able to:

  • recognise and construct appropriate formal multi-armed bandit (MAB) and Markov decision process (MDP) models from informal problem descriptions;
  • use a variety of probability inequalities to prove bounds on algorithms;
  • construct appropriate optimality equations for MDPs and prove the existence of solutions;
  • use appropriate computational techniques to solve MABs and MDPs.
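One standard computational technique for MDPs of the kind listed above is value iteration on the Bellman optimality equation. The sketch below solves a small finite discounted MDP; the data layout (`P[a][s][t]`, `R[a][s]`), discount factor, and tolerance are illustrative assumptions:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve a finite discounted MDP by value iteration.
    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[a][s] is the expected one-step reward for action a in s.
    Returns the (approximate) optimal value function and a greedy policy."""
    n = len(R[0])
    V = [0.0] * n
    while True:
        # Apply the Bellman optimality operator once
        V_new = [max(R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n))
                     for a in range(len(R)))
                 for s in range(n)]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            break
        V = V_new
    # Extract a greedy policy from the converged value function
    policy = [max(range(len(R)),
                  key=lambda a: R[a][s]
                  + gamma * sum(P[a][s][t] * V[t] for t in range(n)))
              for s in range(n)]
    return V, policy
```

The iteration is a contraction with modulus `gamma`, so it converges geometrically to the unique fixed point of the optimality equation; this is the existence-of-solutions argument the learning outcomes refer to.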

Teaching Information

The unit will be taught through a combination of

  • synchronous online and, if subsequently possible, face-to-face lectures
  • asynchronous online materials, including narrated presentations and worked examples
  • guided asynchronous independent activities such as problem sheets and/or other exercises
  • synchronous weekly group problem/example classes, workshops and/or tutorials
  • synchronous weekly group tutorials
  • synchronous weekly office hours

Assessment Information

90% Timed, open-book examination
10% Coursework

Raw scores on the examinations will be determined according to the marking scheme written on the examination paper. The marking scheme, indicating the maximum score per question, is a guide to the relative weighting of the questions. Raw scores are moderated as described in the Undergraduate Handbook.

If you fail this unit and are required to resit, reassessment is by a written examination in the August/September Resit and Supplementary exam period.

Reading and References


  • S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Foundations and Trends in Machine Learning, 2012
  • P. Auer, N. Cesa-Bianchi and P. Fischer, Finite Time Analysis of the Multiarmed Bandit Problem, Machine Learning, 2002
  • E. Kaufmann, N. Korda and R. Munos, Thompson Sampling: An Asymptotically Optimal Finite Time Analysis, Algorithmic Learning Theory, 2012
  • J. Tsitsiklis, A Short Proof of the Gittins Index Theorem, Annals of Applied Probability, 1994
  • M. L. Puterman, Markov Decision Processes, Wiley, 2005
  • D. P. Bertsekas, Dynamic Programming and Optimal Control, vols. 1 and 2, Athena Scientific, 2005 and 2007
  • J. C. Spall, Introduction to Stochastic Search and Optimization, Wiley, 2005