Unit information: SWBio DTP: Data Science and Machine Learning for the Biosciences in 2020/21

Unit name SWBio DTP: Data Science and Machine Learning for the Biosciences
Unit code BIOCM0022
Credit points 20
Level of study M/7
Teaching block(s) Teaching Block 1 (weeks 1 - 12)
Unit director Dr. Barker
Open unit status Not open
Pre-requisites

None

Co-requisites

BIOCM0010 SWBio DTP: Statistics and Bioinformatics

BIOCM0013 SWBio DTP: Science in Society, Business and Industry

BIOCM0021 SWBio DTP: Rotation Project 1

BIOCM0020 SWBio DTP: Rotation Project 2

School/department School of Biochemistry
Faculty Faculty of Life Sciences

Description

The key aim of this unit is to introduce doctoral students to the basics of coding, machine learning and the general principles of data science, as applied to the analysis of data from the biosciences. Students are assumed to have minimal previous coding experience, although they will have made limited use of R in the co-requisite unit BIOCM0010, which is taken before this unit. By the end of the unit, students are expected to be able to complete a short coding project that manipulates data relevant to their doctoral research.

The specific aims of this unit are:

  • to introduce students to programming, primarily using Python;
  • to familiarise students with data analysis using Python modules such as Pandas, NumPy and Matplotlib;
  • to develop an understanding of the software engineering process, including design, documentation, testing and version control;
  • to cover the basic theory and application of elementary machine learning techniques using TensorFlow, as applied to image analysis;
  • to familiarise students with the data science principles underlying model generation for deep learning applications, and with ethics in machine learning;
  • (optional) to introduce intermediate to advanced programming in Python;
  • (optional) to introduce parallel programming using task-based, message-passing and shared-memory models.

Intended learning outcomes

  1. An understanding of the process by which software is assembled and operated, as implemented within the Python platform.
  2. Familiarity with core data analysis modules implemented in Python, including Pandas, NumPy, Matplotlib and Scikit-learn.
  3. Ability to write, run and debug simple Python scripts for the analysis of biological data.
  4. An understanding of the basic principles of machine learning and their application to research data, for example in image processing.
  5. Demonstrated competence in combining sequential segments of code to provide in-depth analysis of large data sets.
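
As a rough illustration of outcomes 3 and 5, the sketch below chains three small steps (load, group, summarise) into one short analysis script. It is not taken from the unit materials: the data, column names and function names are all hypothetical, and it uses only the Python standard library rather than Pandas so that it runs without extra installation.

```python
import csv
import io
import statistics

# Hypothetical example data: expression readings under two conditions.
RAW_CSV = """sample,condition,expression
s1,control,2.0
s2,control,4.0
s3,treated,6.0
s4,treated,10.0
"""


def load_rows(text):
    """Step 1: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_condition(rows):
    """Step 2: collect the expression values recorded for each condition."""
    groups = {}
    for row in rows:
        groups.setdefault(row["condition"], []).append(float(row["expression"]))
    return groups


def summarise(groups):
    """Step 3: compute the mean expression for each condition."""
    return {name: statistics.mean(values) for name, values in groups.items()}


# Each step feeds the next, so the full analysis is a short pipeline.
summary = summarise(group_by_condition(load_rows(RAW_CSV)))
print(summary)
```

Keeping each step in its own function makes it easier to test and debug the stages separately, which is the kind of practice the debugging log in the individual project is likely to draw on.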

Teaching details

This unit comprises one intensive week of teaching, made up of lectures, workshops and practical activities, including some small-group activities. This is followed by recommended and self-directed study to prepare students for the assessments.

Assessment Details

This is a pass/fail unit; each assessment is marked against pass/fail criteria.

There are two assessments:

(1) A short group project on day 3, including a verbal presentation to the whole cohort to which all group members must contribute (30%).

(2) An individual short project involving the development of simple software for elementary analysis of a large data set from the student's area of doctoral research (70%). Half of this component is for submission of functioning code with a documented log of debugging steps, and half is for a short written summary of the code and the main outcomes of the analysis.

Reading and References

Suite of online training modules for Python:

https://milliams.gitlab.io/beginning_python

Python Essential Reference, 4th Edition. David Beazley. Addison-Wesley Professional (July 19, 2009). ISBN 0672329786.

Learning Python, 5th Edition. Mark Lutz. O’Reilly Publishing (June 2013). ISBN 978-1-449-35573-9.
