Unit information: Text Analytics in 2020/21

Unit name Text Analytics
Unit code COMSM0037
Credit points 10
Level of study M/7
Teaching block(s) Teaching Block 2 (weeks 13 - 24)
Unit director Dr. Simpson
Open unit status Not open

Software Development: Programming and Algorithms


Introduction to Artificial Intelligence EMATM0044

School/department Department of Computer Science
Faculty Faculty of Engineering


The sheer volume and complexity of online natural-language text data mean that traditional manual techniques and stand-alone applications are often no longer sufficient to process and analyse this data and extract useful information. Furthermore, the availability of large-scale sources of text data, such as those found on social media websites, opens up new opportunities for estimating the sentiment or opinions of large groups of people.

This unit aims to provide students with a thorough grounding in the computational analysis of large-scale natural-language texts. The unit will cover methods for unsupervised and supervised text mining, including text pre-processing, structured data extraction, document clustering, document classification, and sentiment analysis. The methods taught include rule-based approaches, traditional machine learning techniques, and more recent techniques such as those based on deep neural networks.

Intended learning outcomes

Students will be able to:

  1. Demonstrate an understanding of the theory and terminology of empirical modelling of natural language.
  2. Select and employ appropriate techniques for structured data extraction and text pre-processing.
  3. Write programs and deploy library code for various statistical text analysis techniques.
  4. Apply established text analysis methods to large-scale text data sources.

Teaching details

Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures, practical activities and self-directed exercises.

Assessment details

80% coursework, 20% in-class tests

Reading and References

  • Bengfort, Benjamin, Bilbro, Rebecca, and Ojeda, Tony. Applied Text Analysis with Python. O'Reilly, 2018.
  • Bird, Steven, Klein, Ewan, and Loper, Edward. Natural Language Processing with Python. O'Reilly, 2009.
  • Liu, Bing. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, 2015.
  • Manning, Christopher and Schütze, Hinrich. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
  • Manning, Christopher, Raghavan, Prabhakar, and Schütze, Hinrich. Introduction to Information Retrieval. Cambridge University Press, 2008.
  • Rao, Delip and McMahan, Brian. Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O'Reilly, 2019.