Exabyte Informatics

Biometric penguin recognition system in operation at Robben Island, South Africa

Exabyte Informatics is concerned with the challenges and opportunities arising from having unprecedented amounts of data available in computer-readable form. The aim of the Exabyte Informatics research theme is to advance computing as the language of 21st-century science.

The size of these huge data repositories, such as the internet, is measured in exabytes. One exabyte (10^18 bytes) is equivalent to around 50,000 years of DVD-quality video, for example.
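The video comparison can be checked with a rough calculation. The sketch below assumes an average DVD-quality bitrate of about 5 megabits per second; that figure is an assumption made for illustration, not one stated on this page, and real DVD bitrates vary. The function and constant names are likewise illustrative.

```python
# Back-of-envelope check of the "50,000 years of DVD-quality video" figure.
# ASSUMPTION: DVD-quality video averages about 5 megabits per second;
# real bitrates vary, so treat the result as an order-of-magnitude estimate.

EXABYTE_BYTES = 10**18
ASSUMED_BITRATE_BITS_PER_S = 5_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_of_video(num_bytes: int, bitrate_bits_per_s: float) -> float:
    """Return how many years of continuous video fit in num_bytes."""
    seconds = num_bytes * 8 / bitrate_bits_per_s
    return seconds / SECONDS_PER_YEAR


years = years_of_video(EXABYTE_BYTES, ASSUMED_BITRATE_BITS_PER_S)
print(f"{years:,.0f} years")
```

Under that assumed bitrate the result comes out close to 50,000 years, consistent with the figure quoted above.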

The Exabyte Informatics theme is inherently interdisciplinary, based around a core hub of research activities in the Department of Computer Science. It draws on the state-of-the-art high-performance computing (HPC) infrastructure at the University, as well as a petascale storage facility currently being commissioned.

Fundamental to the exabyte approach is the idea that data passes through a continuous cycle of computational operations, each of which enriches it. For instance, terabytes of video footage of penguin colonies can be used to train penguin recognition models, which can then be applied back to the footage so that individual penguins can be tracked and their behaviour analysed. The key computational methods used are known as annotation, integration and mining.
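One way to picture such a cycle is as three small functions applied in sequence. The sketch below is a toy illustration only: the `Frame` type, the pattern-to-ID lookup standing in for a trained recognition model, the camera names and the sample data are all hypothetical, and the reading of "annotation", "integration" and "mining" is one plausible interpretation rather than the theme's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

# Toy annotate -> integrate -> mine cycle. All names and data are hypothetical.


@dataclass(frozen=True)
class Frame:
    camera: str
    timestamp: int      # seconds since start of recording
    chest_pattern: str  # stand-in for real image features


# Stand-in for a trained recognition model: maps a feature to an individual.
KNOWN_PATTERNS = {"spots-a": "penguin-1", "spots-b": "penguin-2"}


def annotate(frames):
    """Attach an individual ID to each frame the stand-in model recognises."""
    return [(f, KNOWN_PATTERNS[f.chest_pattern])
            for f in frames if f.chest_pattern in KNOWN_PATTERNS]


def integrate(annotated):
    """Merge sightings from all cameras into one time-ordered track per individual."""
    tracks = defaultdict(list)
    for frame, penguin_id in annotated:
        tracks[penguin_id].append((frame.timestamp, frame.camera))
    return {pid: sorted(sightings) for pid, sightings in tracks.items()}


def mine(tracks):
    """Derive a simple behavioural summary: count of camera-to-camera moves."""
    return {pid: sum(1 for (_, a), (_, b) in zip(s, s[1:]) if a != b)
            for pid, s in tracks.items()}


frames = [
    Frame("cam-north", 0, "spots-a"),
    Frame("cam-south", 40, "spots-a"),
    Frame("cam-north", 10, "spots-b"),
    Frame("cam-north", 90, "spots-a"),
    Frame("cam-south", 20, "unknown"),  # not recognised, so never annotated
]

moves = mine(integrate(annotate(frames)))
print(moves)  # {'penguin-1': 2, 'penguin-2': 0}
```

In a real system each stage would write its results back alongside the raw data, so that later passes (for example, retraining the recognition model) start from the enriched version.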

By supporting these computational methods, exabyte informatics offers the opportunity to exploit the synergies arising from similar types of data occurring in very different disciplines. For instance, image data plays a major role in areas as varied as astronomy, theatre studies, neuroscience and medieval studies. Interaction networks are another type of data that occurs in a wide range of domains.

Recent, ongoing and planned research projects include:

  • biometric penguin recognition and tracking system
  • capture and computer-generated synthesis of insect motion
  • scientific discovery of metabolic networks
  • computational analysis of bibliographic and social networks
  • structural analysis of high-resolution scans of medieval documents
  • automated discovery of social insect behavioural and decision-making models