Unit name | Statistical Computing and Empirical Methods |
---|---|
Unit code | EMATM0061 |
Credit points | 20 |
Level of study | M/7 |
Teaching block(s) | Teaching Block 1 (weeks 1 - 12) |
Unit director | Dr. Reeve |
Open unit status | Not open |
Pre-requisites | Students taking this course are expected to have a strong background in CS/SE. It is also expected that students taking this course have an understanding of mathematical topics such as basic calculus, linear algebra and probability typically covered within A level Mathematics (or equivalent). |
Co-requisites | None |
School/department | Department of Engineering Mathematics |
Faculty | Faculty of Engineering |

The aim of this unit is to provide students with a broad introduction to the principles of statistical computing and empirical methods using the R programming language. We will cover topics such as data wrangling and data exploration, statistical significance testing, parameter estimation, experimental design and regression analysis.

Many of these topics are commonly taught in STEM subjects such as physics, psychology, or engineering mathematics, but are very rarely covered in any depth on Computer Science (CS) or Software Engineering (SE) degrees. For that reason, this unit is aimed primarily at postgraduate students with a strong background in CS/SE. Students are also expected to understand mathematical topics such as basic calculus, linear algebra and probability, as typically covered within A level Mathematics (or equivalent).

On successful completion of this unit, students should be able to:

- Select and successfully apply appropriate statistical significance tests to evaluate a research hypothesis. Appreciate the importance of test size and power and have the ability to investigate these concepts empirically through simulation studies.
- Demonstrate their ability to select and employ appropriate tools to perform a variety of data wrangling tasks including the gathering and cleaning of tabular data sets.
- Critically appraise scientific conclusions drawn from data, with reference to concepts from the theory of experimental design such as selection bias, confounding variables, and measurement errors. In addition, students should understand the merits of designed experiments compared with observational studies. Students should also understand basic algorithmic approaches to sequential experimental design, including the exploration-exploitation trade-off.
- Understand the maximum likelihood approach to estimating the parameters of a statistical model and apply these concepts to basic supervised learning approaches. In addition, students should be able to apply interval estimators to reflect the level of confidence in the value of a parameter and understand the connection between interval estimation and hypothesis testing.
- Demonstrate an understanding of the basic probabilistic concepts necessary for developing a clear understanding of basic statistical techniques used in Data Science. This includes concepts such as probability mass functions, probability density functions, discrete and continuous random variables, expectation, variance and covariance. In addition, students should understand the concept of a conditional probability and be able to state and apply Bayes' theorem. Students should also have a basic familiarity with commonly used distributions such as the Gaussian, the chi-squared and Student's t-distribution.

Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures, practical activities and self-directed exercises.

Examination (50%), Coursework (50%)

The examination will contain a variety of questions intended to assess all of the topics covered within the course.

The coursework will take the form of a Data Science report produced using R Markdown. This will allow you to demonstrate your data wrangling and statistical skills in greater detail by focusing on a specific area of interest.

- D. Montgomery (2019). *Design and Analysis of Experiments*. 9th Edition. Wiley.
- M. Pett (2015). *Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions*. 2nd Edition. Sage Publications.
- J. VanderPlas (2016). *Python Data Science Handbook: Essential Tools for Working with Data*. O'Reilly.
- B. Baumer, D. Kaplan, & N. Horton (2017). *Modern Data Science with R*. CRC Press.
- P. Bruce, A. Bruce, & P. Gedeck (2020). *Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python*. 2nd Edition. O'Reilly.
- A. Field & G. Hole (2003). *How to Design & Report Experiments*. SAGE Publications.
- R. Mitchell (2018). *Web Scraping With Python*. Second Edition. O'Reilly.