| Unit name | Data Science Toolbox |
|---|---|
| Unit code | MATHM0029 |
| Credit points | 20 |
| Level of study | M/7 |
| Teaching block(s) | Teaching Block 1 (weeks 1-12) |
| Unit director | Dr. Lawson |
| Open unit status | Not open |
| Units you must take before you take this one (pre-requisite units) | The equivalent of MATH10013 Probability and Statistics is essential. Some coding experience is also essential. MATH20800 Statistics 2 is highly desirable. Exceptions would require strong motivation and an understanding of at least linear algebra and probability. |
| Units you must take alongside this one (co-requisite units) | None |
| Units you may not take alongside this one | None |
| School/department | School of Mathematics |
| Faculty | Faculty of Science |

**Why is this unit important?**

Data Science is at the heart of the modern digital economy. Its promise is huge and expanding, but to see through the hype to what is truly achievable requires an ability to comprehend both the big picture and the mathematical details. Using Data Science responsibly and effectively is one of the simplest ways that Mathematicians can make a difference in modern society.

Data Science Toolbox provides a “literacy” of statistical and machine learning methods to make students proficient with practical data science problems. The approach is to focus on the broad spectrum of techniques that are truly useful in “real-world” data science, and to develop a maturity of approach that will support independent learning for students either entering a “technical expert” role in the data science industry or considering an applied research role.

**How does this unit fit into your programme of study?**

This unit welcomes students with a background in any area of mathematics with a real drive to learn Data Science. It aims to be a “melting pot” of ideas in which students work together, sharing coding and domain knowledge acquired elsewhere, to practically explore the most useful and influential areas of Machine Learning, Visualisation and Data Engineering. It is likely to be valuable for anyone analysing data or using simulation in their Project. Some of the mathematical ideas will be easier for those with a statistics background, but the majority of content is advanced and from other fields, with mathematical ideas spanning algorithms, linear algebra, and machine learning. Therefore, the course provides breadth, with a depth of knowledge acquired through independently chosen coursework.

**Overview**

Students will encounter the core methods needed to handle modern data science, structured as follows:

- **Statistical foundations**: Exploratory Data Analysis, Modern Regression, Testing and Model Selection; Latent Structure, Principal Components Analysis and Clustering; Non-Parametrics, the Kernel Trick, Missing Data, Scalable Bayesian methods and Regularisation.
- **Machine Learning tools**: Modern Classification including k-Nearest Neighbours, Random Forests, Decision Trees, Bagging, Boosting and Stacking; assessing performance through Receiver Operating Characteristic (ROC) curves; Topic Models; Algorithms for Data Science and Parallel Algorithms; the Perceptron and Neural Networks.
- **Ethics and Privacy**.

These topics are linked to their mathematical foundations where possible, and the insights inspired by a mathematical understanding are emphasised throughout. Extensive workshops in R and Python will build proficiency through the use of well-established libraries, and real-world practical examples will be explored during deeper group projects.

**Growth on the Unit**

During the unit, students will learn to work collaboratively using GitHub, to take part in and design classification competitions, and to exploit the wealth of prior work in data science. During this process students will undertake independent research into subjects of their choice to build a public portfolio of work that demonstrates their newfound data science agility.

**Learning Outcomes**

- ILO1 Be able to **use and apply basic machine learning** tools.
- ILO2 Be able to **make and report appropriate inferences** from the results of applying basic tools to data.
- ILO3 Be able to reason about and **conceptually align problems** involving real data to appropriate theoretical methods and available methodology to correctly make inferences and decisions.
- ILO4 Be able to **effectively use online collaboration tools** to apply mathematical methods to difficult data science problems.

You will learn through:

- Lecture-style presentation of conceptual material.
- Weekly workshop-style coding implementation. This is an opportunity to practice your coding in a supported, low-risk environment.
- Ongoing formative worksheets. These provide constant feedback on your conceptual understanding.
- Ongoing summative worksheets. These are assessed as part of the portfolio, so that conceptual material is examined without relying on continual submission of coursework.
- Ongoing group projects. These allow a deeper understanding of specific topics of individuals’ choice. They include a personal reflection, both to encourage growth in team-support and technical data science skills and to evidence individual contributions.
- A final portfolio, replacing the exam, to provide a more in-depth exploration of concepts and cement your knowledge.

By learning with your fellow students in groups, you will learn faster and be able to experience a wider range of coding practice and machine-learning, whilst minimising the hard job of data preparation and cleaning.

**Tasks which help you learn and prepare you for summative tasks (formative):**

You will have one formative group assessment to get used to GitHub and the idea of independent group assessments. You will also undertake continual formative assessments for the conceptual content each week.

**Tasks which count towards your unit mark (summative):**

There are two summative group assessments each contributing 30% of your mark, one on a real-world classification problem and one on a real-world machine-learning problem.

There is a final individually assessed portfolio contributing 40% of your mark containing work performed during the Unit. There is no exam.

**When assessment does not go to plan**

Group assessments contain mechanisms to a) allow for differing levels of effort within a group, and b) reward individual contributions via an individual reflection. If a whole group fails, an appropriate group task will be set and reassessed for all group members. If a single student fails a group assessment or is unable to participate for an evidenced reason, an individual reassessment will be set.

Assessed projects should rarely fail, because the success of the machine-learning algorithm is not the goal and does not affect grades; it is the process of learning that is important. Students are expected to arrive with differing levels of coding skill, and including a reflective assessment element allows growth to be rewarded.

If this unit has a Resource List, you will normally find a link to it in the Blackboard area for the unit. Sometimes there will be a separate link for each weekly topic.

If you are unable to access a list through Blackboard, you can also find it via the Resource Lists homepage. Search for the list by the unit name or code (e.g. MATHM0029).

**How much time the unit requires**

Each credit equates to 10 hours of total student input. For example, a 20-credit unit will take you 200 hours of study to complete. Your total learning time is made up of contact time, directed learning tasks, independent learning and assessment activity.

See the University Workload statement relating to this unit for more information.

**Assessment**

The Board of Examiners will consider all cases where students have failed or not completed the assessments required for credit.
The Board considers each student's outcomes across all the units which contribute to each year's programme of study. For appropriate assessments, if you have self-certificated your absence, you will normally be required to complete the assessment the next time it runs (for assessments at the end of TB1 and TB2 this is usually in the next re-assessment period).

The Board of Examiners will take into account any exceptional circumstances and operates within the Regulations and Code of Practice for Taught Programmes.