Unit information: Data Science Toolbox in 2020/21

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up-to-date information.

Unit name Data Science Toolbox
Unit code MATHM0029
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 4 (weeks 1-24)
Unit director Dr. Lawson
Open unit status Not open
Pre-requisites

MATH11300 Probability 1, MATH11400 Statistics 1 and MATH20800 Statistics 2

Co-requisites

MATHM0028 Introduction to Mathematical Cybersecurity

School/department School of Mathematics
Faculty Faculty of Science

Description including Unit Aims

Unit Aims

The purpose of this unit is to provide all students with theoretical and (especially) practical data science literacy relevant to cybersecurity.

Unit Description

This unit will cover the following topics.

  1. Exploratory Data Analysis tools (including data summaries; regression; visualisation; clustering; statistical testing; outlier detection) using appropriate languages such as R and Python.
  2. Applied Machine Learning (including fitting Random Forests, topic models and neural networks; cross-validation; interpretation of performance metrics).
  3. Handling Big Data (including the use of command line tools; data processing algorithms, for example, Bloom filters and streaming summarisation; introduction to computational complexity; Big Data platforms, for example, Hadoop and Spark).
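
As an illustration of one of the data processing algorithms named in topic 3, a Bloom filter can be sketched in a few lines of Python. This is a minimal sketch and not unit material: the class name BloomFilter, the parameters num_bits and num_hashes, the use of salted SHA-256 as the hash family, and the example IP addresses are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (illustrative sketch).

    Membership tests never give false negatives but may give false positives.
    """

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index
        # (an assumed hash family, picked only because it is in the standard library).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical example data: source addresses observed in a log stream.
seen = BloomFilter()
for ip in ["10.0.0.1", "10.0.0.2", "192.168.1.7"]:
    seen.add(ip)

print("10.0.0.1" in seen)  # an added item is always reported as present
```

The memory used is fixed by num_bits regardless of how many items are added, which is what makes the structure attractive for streaming data. The cost is a false-positive rate that grows as the filter fills.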

This unit will be partly assessed by coursework with a focus on real cybersecurity datasets.

Intended Learning Outcomes

By the end of the unit, students will:

  • Be able to access and process cybersecurity data into a format suitable for mathematical reasoning
  • Be able to use and apply basic machine learning tools
  • Be able to make and report appropriate inferences from the results of applying basic tools to data
  • Be able to use high throughput computing infrastructure and understand appropriate algorithms
  • Be able to reason about problems involving real data and match them to appropriate theoretical methods and available methodology, in order to make correct inferences and decisions
  • Be able to work as part of a team to apply mathematical methods to difficult data science problems

Teaching Information

The unit will be taught through a combination of

  • synchronous online and, if subsequently possible, face-to-face lectures
  • asynchronous online materials, including narrated presentations and worked examples
  • guided asynchronous independent activities such as problem sheets and/or other exercises
  • synchronous weekly group problem/example classes, workshops and/or tutorials
  • synchronous weekly group tutorials
  • synchronous weekly office hours

Assessment Information

  • 50% Timed, open-book examination (to assess the underlying ideas and interpretation)
  • 50% Practical Assignments (The first practical is formative; the remaining 5 are assessed and marked on the best 4 out of 5. All ILOs will be examined across the pieces of coursework, though some pieces of coursework will focus more on one particular ILO.)

Reading and References

Recommended

  • Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning, 2nd edition, Springer, 2009
  • Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, Learning Spark: Lightning-fast Big Data Analysis, O'Reilly Media, 2015
  • John W. Tukey, Exploratory Data Analysis, Addison-Wesley, 1977
  • scikit-learn online tutorials: http://scikit-learn.org/stable/tutorial/index.html