
Unit information: Applied Data Science in 2016/17

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up to date information.

Unit name Applied Data Science
Unit code COMSM0017
Credit points 10
Level of study M/7
Teaching block(s) Teaching Block 2 (weeks 13 - 24)
Unit director Professor Peter Flach
Open unit status Not open
Pre-requisites

The unit assumes a good working knowledge of key machine learning and data mining techniques, for instance as acquired in COMS30301 Introduction to Machine Learning, and programming skills in a major language.

Co-requisites

None

School/department Department of Computer Science
Faculty Faculty of Engineering

Description including Unit Aims

This unit introduces key data science concepts and their application to support data-driven approaches to problem solving.

The aim of this unit is to allow students to acquire fundamental skills covering the full data science pipeline, including data pre-processing, manipulation, integration, storage, exploration, visualisation and privacy. Students will study techniques to transform raw data into advanced representations that enable a deeper understanding of the original data:

  • Data ingress and pre-processing
  • Data storage and data management
  • Data transformation and integration
  • Data exploration and visualisation
  • Data sharing, privacy and anonymisation

Students will also gain practical skills in handling structured and unstructured data, with hands-on experience of software tools widely used in real-world settings.

Intended Learning Outcomes

On completion of the unit, students will:

  • Acquire a working knowledge of practical data science, applied to real-world problems.
  • Be able to start from raw data and deliver a representation that allows a better understanding of the topics in the data.
  • Have experience of using software tools for data pre-processing and management.
  • Acquire first-hand experience in specific techniques for data storage.
  • Understand the differences between visualisation strategies for exploring data efficiently.
  • Have learnt how to present and interpret data for a non-technical audience.
  • Be able to share data under privacy constraints.
  • Have practised teamwork and time management.

Teaching Information

This unit involves lectures that cover recent advances in applied data science. Topics are addressed from a practical, hands-on point of view, so that students from different backgrounds can understand the fundamentals of the data science techniques they will implement in the coursework.

In addition, there will be weekly Q&A sessions in which students can get help, advice and feedback on their current progress with the coursework.

Assessment Information

100% coursework.

Assessment will be through a significant data science project, which will be carried out in groups of 4-5 students. The projects will be based on real-life data provided by a number of domain experts. Groups will need to pitch for 2 projects, after which the allocation is made. The groups do their software development on a platform such as GitHub, and can request formative feedback on their progress up to three times before the final submission, at a time of their choosing. 1-2 weeks before the final submission there will be a workshop where all groups present their proposed solution to the entire cohort and the domain experts, so that they can incorporate any further formative feedback into their final submission. The final submission is due at the end of the teaching block and will be summatively assessed on all intended learning outcomes, as they correspond to different stages of the data science pipeline.

Reading and References

Mining of Massive Datasets, Anand Rajaraman and Jeffrey David Ullman, Cambridge University Press, 2011.

Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth, MIT Press, 2001.

Information Visualization, Colin Ware, Morgan Kaufmann, 2012.

Additional reading material in the form of research papers, online resources, etc.
