Unit information: Statistical Computing and Empirical Methods in 2021/22

Please note: The information shown for future academic years may change due to developments in the relevant academic field. Optional unit availability varies depending on staffing, student choice and timetabling constraints.

Unit name: Statistical Computing and Empirical Methods
Unit code: EMATM0061
Credit points: 20
Level of study: M/7
Teaching block(s): Teaching Block 1 (weeks 1 - 12)
Unit director: Dr. Reeve
Open unit status: Not open
Pre-requisites

Students taking this course are expected to have a strong background in CS/SE. They are also expected to have an understanding of mathematical topics such as basic calculus, linear algebra and probability, as typically covered within A-level Mathematics (or equivalent).

Co-requisites

None

School/department: Department of Engineering Mathematics
Faculty: Faculty of Engineering

Description

The aim of this unit is to provide students with a broad introduction to the principles of statistical computing and empirical methods using the R programming language. We will cover topics such as data wrangling and data exploration, statistical significance testing, parameter estimation, experimental design and regression analysis.

Many of these topics are commonly taught in STEM subjects such as physics, psychology, or engineering mathematics, but are very rarely covered in any depth on Computer Science (CS) or Software Engineering (SE) degrees. For that reason, this unit is aimed primarily at postgraduate students with a strong background in CS/SE. Students are also expected to have an understanding of mathematical topics such as basic calculus, linear algebra and probability, as typically covered within A-level Mathematics (or equivalent).

Intended learning outcomes

On successful completion of this unit, students should be able to:

  1. Select and successfully apply appropriate statistical significance tests to evaluate a research hypothesis. Appreciate the importance of test size and power and have the ability to investigate these concepts empirically through simulation studies.
  2. Demonstrate their ability to select and employ appropriate tools to perform a variety of data wrangling tasks including the gathering and cleaning of tabular data sets.
  3. Critically appraise scientific conclusions drawn from data, with reference to concepts from the theory of experimental design such as selection bias, confounding variables, and measurement errors. In addition, students should understand the merits of designed experiments relative to observational studies. Students should also understand basic algorithmic approaches to sequential experimental design, including the exploration-exploitation trade-off.
  4. Understand the maximum likelihood approach to estimating the parameters of a statistical model and apply these concepts to basic supervised learning approaches. In addition, students should be able to apply interval estimators to reflect the level of confidence in the value of a parameter and understand the connection between interval estimation and hypothesis testing.
  5. Demonstrate an understanding of the basic probabilistic concepts needed to develop a clear understanding of the basic statistical techniques used in Data Science. This includes concepts such as probability mass functions, probability density functions, discrete and continuous random variables, expectation, variance and covariance. In addition, students should understand the concept of a conditional probability and be able to state and apply Bayes' theorem. Students should also have a basic familiarity with commonly used distributions such as the Gaussian, the chi-squared and Student's t-distribution.

Teaching details

Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures, practical activities and self-directed exercises.

Assessment Details

Examination (50%), Coursework (50%)

The examination will contain a variety of questions intended to assess all of the topics covered within the course.

The coursework will take the form of a Data Science report produced using R Markdown. This will allow you to demonstrate your data wrangling and statistical skills in greater detail by focusing on a specific area of interest.

Reading and References

  • D. Montgomery (2019) Design and Analysis of Experiments, 9th Edition, Wiley.
  • M. Pett (2015) Nonparametric Statistics for Health Care Research: Statistics for Small Samples and Unusual Distributions, 2nd Edition, SAGE Publications.
  • J. VanderPlas (2016) Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly.
  • B. Baumer, D. Kaplan, & N. Horton (2017) Modern Data Science with R, CRC Press.
  • P. Bruce, A. Bruce, & P. Gedeck (2020) Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd Edition, O’Reilly.
  • A. Field & G. Hole (2003) How to Design & Report Experiments, SAGE Publications.
  • R. Mitchell (2018) Web Scraping with Python, 2nd Edition, O’Reilly.