Unit information: Large-Scale Data Engineering in 2021/22

Please note: The information shown for future academic years may change due to developments in the relevant academic field. Optional unit availability varies depending on staffing, student choice and timetabling constraints.

Unit name Large-Scale Data Engineering
Unit code EMATM0051
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 1 (weeks 1 - 12)
Unit director Mr. Forsyth
Open unit status Not open
Pre-requisites

None

Co-requisites

None

School/department Department of Engineering Mathematics
Faculty Faculty of Engineering

Description

This unit aims to give a comprehensive overview of elastically scalable and remotely-accessed "cloud" computing services such as those offered by Amazon, Google, and Microsoft, and associated technologies for dealing with very-large-scale bodies of data.

The unit commences with a discussion of the economics that have driven the rapid development and adoption of cloud computing in a variety of industries. It then explores the provisioning of cloud services, moving from infrastructure-as-a-service (IaaS), through platform-as-a-service (PaaS) and software-as-a-service (SaaS), to "serverless" functions-as-a-service (FaaS). The open-source Hadoop "ecosystem" of cloud service projects is introduced, and various cloud data-storage and data-processing technologies are surveyed, with evaluation of their strengths and weaknesses. The unit closes with an overview of best practices in the use and management of Big Data.

Intended learning outcomes

On successful completion of the unit, students will be able to:

  1. Explain the economic factors and economies of scale that have driven the development of cloud computing;
  2. Compare and appropriately select among the various cloud computing services offered by major providers such as Amazon, Google and Microsoft, and have direct experience of initiating, running, managing and closing remotely accessed computational resources via X-as-a-Service access models;
  3. Demonstrate competence as a practitioner of cloud computing architecture with reference to fundamental concepts such as availability, reliability, scalability, elasticity, security, cost effectiveness and automation;
  4. Demonstrate the combination and use of cloud computing technologies such as in-memory compute and stream-processing in high-performance and high-throughput applications;
  5. Apply effective methods to store, manage, process and secure data at very large scale (‘Big Data’).

Teaching details

Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures, group work, practical activities and self-directed exercises.

Assessment details

Coursework 1: Design, implement and optimise an effective cloud architecture for an existing data processing application. (ILO 1-5; 100%)

Reading and References

  • Akidau, Tyler, Chernyak, Slava, and Lax, Reuven. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing. O'Reilly, 2018.
  • Karau, Holden. Learning Spark: Lightning-Fast Big Data Analysis. O'Reilly, 2015.
  • Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly, 2017.
  • Lakshmanan, Valliappa. Data Science on the Google Cloud Platform. O'Reilly, 2017.
  • Perkins, Luc, Redmond, Eric, and Wilson, Jim. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement. Second edition. O'Reilly, 2018.
  • Piper, Ben, and Clinton, David. AWS Certified Cloud Practitioner Study Guide: CLF-C01 Exam. Sybex, 2019.
  • White, Tom. Hadoop: The Definitive Guide. O'Reilly, 2015.
  • Wittig, Michael. Amazon Web Services in Action. Manning, 2018.
