Unit name | Large-Scale Data Engineering |
---|---|
Unit code | EMATM0051 |
Credit points | 20 |
Level of study | M/7 |
Teaching block(s) | Teaching Block 1 (weeks 1 - 12) |
Unit director | Mr. Alan Forsyth |
Open unit status | Not open |
Pre-requisites | None |
Co-requisites | Software Development: Programming and Algorithms or MATHM0039. Technology, Innovation, Business, and Society |
School/department | School of Engineering Mathematics and Technology |
Faculty | Faculty of Engineering |
This unit aims to give a comprehensive overview of elastically scalable, remotely accessed "cloud" computing services such as those offered by Amazon, Google, and Microsoft, and of associated technologies for dealing with very-large-scale bodies of data. The unit commences with a discussion of the economics that have driven the rapid development and adoption of cloud computing in a variety of industries; it then explores the provisioning of cloud services, moving from infrastructure-as-a-service (IaaS), through platform-as-a-service (PaaS) and software-as-a-service (SaaS), to "serverless" functions-as-a-service (FaaS). The open-source Hadoop "ecosystem" of cloud service projects is introduced, and various cloud data-storage and data-processing technologies (e.g. "NoSQL" and "NewSQL" databases, graph databases, stream-processing systems, etc.) are surveyed, with evaluation of their strengths and weaknesses. The unit closes with a discussion of current research issues.
By the end of the unit students will be able to:
1. Explain the economic factors and economies of scale that have driven the development of cloud computing;
2. Compare and appropriately select among the various cloud computing services offered by major providers such as Amazon, Google, Microsoft, and Oracle, and have direct experience of initiating, running and managing, and closing remotely accessed computational resources via X-as-a-Service access models;
3. Demonstrate competence as a practitioner of cloud database programming with reference to the "NoSQL" approach (such as MongoDB, Cassandra, and CouchDB), to "NewSQL" cloud databases with relational functionality, and to graph databases (such as Neo4j or Giraph);
4. Reflect on experience of small-group teamwork using contemporary software development techniques such as "pair programming";
5. Refer to at least one case-study of a contemporary successful company whose business model is dependent on cloud services and relate this success to their implementation and use of large-scale data engineering;
6. Demonstrate the combination and use of cloud computing technologies such as in-memory compute and stream-processing in high-performance and high-throughput applications; and
7. Identify and discuss current research issues in large-scale data engineering.
Teaching will be delivered through a combination of synchronous and asynchronous sessions, including lectures, group work, practical activities and self-directed exercises.
Coursework (100%)