Unit information: Introduction to Data Analytics in 2021/22

Please note: The information shown for future academic years may change due to developments in the relevant academic field. Optional unit availability varies depending on staffing, student choice and timetabling constraints.

Unit name Introduction to Data Analytics
Unit code COMSM0089
Credit points 10
Level of study M/7
Teaching block(s) Teaching Block 2 (weeks 13 - 24)
Unit director Dr. Simpson
Open unit status Not open
Pre-requisites

EMATM0048 (SDPA) or EMATM0061 (SCEM)

Co-requisites

EMATM0044 (INAI)

School/department Department of Computer Science
Faculty Faculty of Engineering

Description

The sheer volume and complexity of available digital data mean that traditional manual techniques and stand-alone applications for data analysis are often no longer sufficient to process such data and provide useful information. The vast volumes of digital data in the form of human-readable natural-language text (e.g. in English, Spanish or Chinese) enable large-scale language analysis techniques that are heavily rooted in statistics and machine learning. For example, the availability of large-scale sources of text data, such as those found on social media websites, opens up new opportunities for estimating the sentiment or opinions of large groups of people. At the same time, making sense of digital data is often possible only when it is distilled and displayed via an appropriate visualisation technique, and contemporary visualisation techniques also often rely on machine learning and statistical methods. This unit gives students a grounding in the fundamentals of both visual analytics and text analytics: the science of information visualisation (primarily concerned with the way that data is represented visually) and the science of extracting useful information from bodies of natural-language text.

Information visualisation topics covered by this unit include: data types and their representations, non-vectoral data, human requirements for visual analytics, scientific visualisation, visualisation quality metrics, Shneiderman’s mantra (overview first, zoom and filter, details on demand), and practical visualisation tools.

Text analytics topics covered by this unit include methods for unsupervised and supervised text mining, including text pre-processing, structured data extraction, clustering of documents, classification of documents, and sentiment analysis using different techniques.

Intended learning outcomes

Students will be able to:

  1. Select and employ appropriate techniques for structured data extraction and text pre-processing.
  2. Write programs and deploy library code for various techniques for statistical text analysis.
  3. Define and apply the principles of information visualisation.
  4. Analyse the design of visual representations of data in terms of human perception and cognition.

Teaching details

Problem-based learning combining lecture elements with practical individual work.

Assessment Details

Coursework: students will develop a system for automated gathering and analysis of a substantial text corpus and write a report on their findings (ILO 1, 2). The report should include appropriate use of visualisations, with an accompanying rationale/commentary explaining why the chosen methods were selected.

Reading and References

  • Bengfort, B., Bilbro, R., & Ojeda, T. Applied Text Analysis with Python. O'Reilly, 2018.
  • Dale, K. Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data, O'Reilly, 2016.
  • Eisenstein, J. Introduction to Natural Language Processing, MIT Press, 2019.
  • Heer, J., Bostock, M., & Ogievetsky, V. (2010) "A Tour Through the Visualization Zoo" Communications of the ACM, 53(6):59-67.
  • Manning, C., Raghavan, P., & Schütze, H. Introduction to Information Retrieval, Cambridge University Press, 2008.
  • Tufte, E. The Visual Display of Quantitative Information. 2nd edition, Graphics Press, 200
