Unit information: Data Science Toolbox in 2018/19

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up to date information.

Unit name Data Science Toolbox
Unit code MATHM0029
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 4 (weeks 1-24)
Unit director Dr. Lawson
Open unit status Not open
Pre-requisites

Probability 1, Statistics 1 and Statistics 2 (or equivalent)

Co-requisites

Introduction to Mathematical Cybersecurity

School/department School of Mathematics
Faculty Faculty of Science

Description including Unit Aims

The purpose of this unit is to provide all students with theoretical and (especially) practical data science literacy relevant to cybersecurity. It will cover the following topics.

  1. Exploratory Data Analysis tools (including data summaries; regression; visualisation; clustering; statistical testing; outlier detection) using appropriate languages such as R and Python.
  2. Applied Machine Learning (including fitting Random Forests, topic models and neural networks; cross-validation; interpretation of performance metrics).
  3. Handling Big Data (including the use of command line tools; data processing algorithms, for example, Bloom filters and streaming summarisation; introduction to computational complexity; Big Data platforms, for example, Hadoop and Spark).
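
As an illustration of one of the data processing algorithms named in topic 3, the following is a minimal Python sketch of a Bloom filter, a compact set-membership structure that never gives false negatives but may give false positives. It is not taken from the unit materials: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, the use of salted SHA-256 as the hash family, and the example IP addresses are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative fixed-size Bloom filter (hypothetical, not from the unit)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every position associated with the item.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Present only if all positions are set; may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


# Made-up example data: record source addresses seen in a log stream.
seen = BloomFilter()
for ip in ["10.0.0.1", "10.0.0.2", "192.168.1.7"]:
    seen.add(ip)

print("10.0.0.1" in seen)  # True: added items are always reported present
```

Memory use is fixed by `num_bits` regardless of how many items are added, which is why structures of this kind suit streaming data; the cost is a false-positive rate that grows as the filter fills.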

This unit will be partly assessed by coursework with a focus on real cybersecurity datasets.

Intended Learning Outcomes

ILO1 Be able to access and process cybersecurity data into a format suitable for mathematical reasoning

ILO2 Be able to use and apply basic machine learning tools

ILO3 Be able to make and report appropriate inferences from the results of applying basic tools to data

ILO4 Be able to use high throughput computing infrastructure and understand appropriate algorithms

ILO5 Be able to reason about and conceptually align problems involving real data to appropriate theoretical methods and available methodology to correctly make inferences and decisions

Teaching Information

2 lectures per week for 12 weeks, plus a 2-hour practical every two weeks. Lectures include 24 hours of new material.

Assessment Information

Ongoing practical assignments (50%) - marked on the best 4 out of 5 assessed practicals. Practicals will be held and submitted every two weeks. The first practical is formative; the remaining 5 are assessed.

Exam (50%) - to assess the underlying ideas and interpretation

All ILOs will be examined across the pieces of coursework, though some pieces of coursework will focus more on one particular ILO.

Reading and References

Tukey, J. W. Exploratory data analysis, Addison-Wesley, 1977.

Hastie, T., R. Tibshirani and J. Friedman. The elements of statistical learning, 2nd edition, Springer New York, 2009.

Karau, H., A. Konwinski, P. Wendell and M. Zaharia. Learning Spark: lightning-fast big data analysis, O'Reilly Media, 2015.

Scikit-learn online tutorials: http://scikit-learn.org/stable/tutorial/index.html
