Unit information: Computational Bioinformatics in 2014/15

Please note: you are viewing unit and programme information for a past academic year. Please see the current academic year for up to date information.

Unit name Computational Bioinformatics
Unit code COMS30003
Credit points 10
Level of study H/6
Teaching block(s) Teaching Block 2 (weeks 13 - 24)
Unit director Professor Gough
Open unit status Open
Pre-requisites None
Co-requisites None

School/department Department of Computer Science
Faculty Faculty of Engineering

Description

This is an introductory unit for students of computer science with an interest in BioTech or Big Data (e.g. DNA/genomic sequence data) and for science students from any discipline, including biology, who can demonstrate some ability with computers. The unit will cover the principles of computational bioinformatics tools and provide practical training in their use (via the web and by running them on a local computer). Commonly used informatics techniques such as relational databases and scripting/parsing languages will also be explained and discussed in the context of bioinformatics.

Computer science, and science in general, is becoming increasingly driven by big data. The two outstanding areas of challenge for humanity in the coming decades will be web/media and biological data; within years most of us will have our own personal genome sequenced and stored on a computer. As a vehicle for learning, the unit will focus on protein sequences. Initially it will cover analysis of both the 2D sequence and the 3D structure. Following this it will cover how proteins fit into genomics at the DNA sequence level and lead to large-scale comparative genomics between thousands of genomes. The unit will be put in perspective with a vision of the future directions in which bioinformatics is moving, including discussion of current and upcoming disruptive technologies that will shape the world into which graduating students will emerge.

The unit will cover:

  1. Some basics of molecular biology: no previous knowledge of biology is required; we will cover the basics necessary to understand the sequence and genomic data types, and how they relate to each other. This is intended to be a fun and stimulating part of the unit, and will not be heavily assessed.
  2. Sequence analysis: some of the largest datasets are of biological sequence data, which are ideal for handling computationally. We will learn the main skills, tools and datasets for sequence analysis. This will form the main component of the assessment.
  3. The future of informatics: an important aspect of this introductory unit is giving a view of where we currently stand in this incredibly fast-changing area (in 2000 the first human genome, in 2020 your genome), and identifying the opportunities open to students completing this unit.

Intended learning outcomes

After successful completion of this unit students will: have an overview of the fundamentals of the molecular biology and evolution of proteins at the atomic and genetic level, be able to use downloadable or online web resources and bioinformatics tools to carry out advanced sequence analysis and comparative genomics, and be able to do basic scripting for automating data processing tasks and handling different data formats. This unit should be a good foundation in the practical application of bioinformatics, opening up a broad range of employment opportunities in BioTech or biological research.
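As an illustration of the kind of basic scripting and data-format handling meant here, the following is a minimal sketch, not taken from the unit materials. It assumes Python as the scripting language and FASTA as the data format (a widely used plain-text format for sequences); the function name `parse_fasta` and the example records are hypothetical and made up for illustration.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a mapping from record identifier to sequence.

    The identifier is taken to be the first whitespace-delimited token
    after '>' on a header line. Sequence lines are concatenated.
    If an identifier appears twice, the later record replaces the earlier.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence data belonging to the most recent header.
            records[current_id] += line
    return records


# Hypothetical input, invented purely for illustration.
EXAMPLE = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2
MSDNE
"""

if __name__ == "__main__":
    # Report each record's identifier and sequence length.
    for name, seq in parse_fasta(EXAMPLE.splitlines()).items():
        print(name, len(seq))
```

For real work an established library would normally be preferable to a hand-written parser; the sketch only shows the style of short automation script the outcome describes.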

Teaching details

2 lectures per week supported by a combination of laboratory and problem classes

Assessment Details

100% coursework consisting of two assignments assessing both the theoretical and practical content of the unit and involving a combination of software development and report writing.

Reading and References

This unit doesn't require a particular book or reading material, but below are some recommendations for those interested in additional reading.

On dynamic programming and the basics of hidden Markov models and sequence comparison methods: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. This book is available in the library.

For a good index of papers and tools/resources on the available biological databases: Nucleic Acids Research 2013 Database Summary Paper Category list. http://www.oxfordjournals.org/nar/database/c/

Feedback