
Unit information: Advanced Data Analytics in 2021/22

Please note: It is possible that the information shown for future academic years may change due to developments in the relevant academic field. Optional unit availability varies depending on staffing, student choice and timetabling constraints.

Unit name Advanced Data Analytics
Unit code COMSM0088
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 2 (weeks 13 - 24)
Unit director Professor Nabney
Open unit status Not open
Pre-requisites

EMATM0048 (SDPA) or EMATM0061 (SCEM)

Co-requisites

EMATM0044 (INAI) and COMSM0089 (INDA)

School/department Department of Computer Science
Faculty Faculty of Engineering

Description

Visual analytics couples the visual representation of data with analytical processes to support complex decision making and understanding. A picture may be worth a thousand words, but only if it is well designed to represent data faithfully and meaningfully. This unit will enable students to create powerful analyses of data and communicate them effectively to non-specialists.

This unit extends the material taught in the co-requisite unit Introduction to Data Analytics by giving students a solid grounding in contemporary advanced machine learning. In visual analytics, such methods serve as useful tools to change the data representation, e.g. through dimensionality reduction, or as a way of analysing visual data in a framework of statistical pattern recognition; in text analytics, such methods serve to produce powerful analyses that traditional methods fail to deliver.

Machine learning topics covered by this unit include: principles of Statistical Pattern Recognition (probabilistic models for data, curse of dimensionality, generalisation error, bias-variance dilemma); linear models (Probabilistic Principal Component Analysis; Discriminant Analysis); generalised dissimilarity mappings and neighbour embedding techniques; Gaussian Processes; latent variable models (Gaussian Mixture Models, Generative Topographic Mapping and Gaussian Process Latent Variable Model); Bayesian model regularisation and combination; feature selection; challenges of large datasets and potential solutions. The text analytics methods taught include rule-based approaches, traditional machine learning techniques, and also current leading techniques such as those based on deep-learning neural networks.

Throughout the unit there is a focus on understanding theory and modelling principles in order to apply them effectively to represent and analyse data.

Intended learning outcomes

Students will be able to

  1. Apply established text analysis methods on large-scale text-data sources.
  2. Define the types and semantics of data.
  3. Build machine learning models for data and explain their operation in terms of a statistical pattern recognition framework.
  4. Use Bayesian regularisation and variational methods to fit models.
  5. Create user-focused visualisations of numerical, categorical, time series, and network data using visualization tools such as those available in the public domain via Python and Tableau.

Teaching details

Problem-based learning combining lecture elements with practical individual work.

Assessment Details

Mid-term coursework (25%): design and implement a system for automated analysis of a substantial text corpus and write a report on the findings from deploying this (ILO 1).

Final coursework (75%): Create a visualisation of key features of a medium-sized real-world dataset, analyse and evaluate the representation through a user trial, and report on conclusions relating them to the theory of information visualization (ILO 2, 3, 4, & 5).

Reading and References

  • Bird, S., Klein, E., & Loper, E. Natural Language Processing with Python. O’Reilly, 2009.
  • Bishop, C. Pattern Recognition and Machine Learning. Springer, 2006.
  • Liu, B. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, 2015.
  • Manning, C. & Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
  • Munzner, T. Visualization Analysis and Design. CRC Press, 2014.
  • Nabney, I. Netlab: Algorithms for Pattern Recognition. Springer, 2004.
  • Rao, D. & McMahon, B. Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O’Reilly, 2019.
  • Ware, C. Visual Thinking: For Design. Elsevier, 2010.
