Image & Video Content Analysis

The group has expertise in automated content analysis for the annotation and retrieval of image sequences from databases and archives, image and video segmentation and tracking, data fusion, data authentication, and 3D processing and view synthesis.

The group has developed a number of schemes for video parsing and classification of both uncompressed and compressed video sequences. A recent innovation is the use of complex wavelets for texture analysis and segmentation, which was refined further in EPSRC Autoarch (GR/M84183/01) to incorporate spectral clustering. The method is unsupervised and integrates intensity, texture and colour information via a texture watershed to provide a perceptual gradient function. Compared with previous approaches, it offers low complexity, orientation adaptivity and control of over-segmentation, and it competes closely with the state of the art in machine-learned boundary detection.

Texture gradient-based segmentation

This work continued in ICBR (Intelligent Content-Based Retrieval), a large collaborative project funded by 3CRL, in which the group, together with the Department of Computer Science and partner companies Matrix Data and Granada-ITV, has worked on efficient automatic segmentation and classification methods for wildlife archives. These are based on determining colour, texture, shape and motion parameters to identify scene content, context and activity. A number of feature extraction schemes that are invariant to rotation, scale and illumination have been developed alongside high-level semantic models. In collaboration with Texas Instruments, the group has also developed application-specific search engines based on MPEG-7 features. Current research is addressing region tracking and replacement for broadcast applications, including selective advertising. Here, particle filtering has been employed to provide a robust region tracking framework incorporating occlusion and illumination compensation.

Image region tracking and replacement
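
As an illustration of the particle filtering idea mentioned above, the following is a minimal sketch of a bootstrap particle filter that tracks the centre of a region along one axis. It is not the group's tracker: the constant-velocity motion model, the Gaussian likelihood, the function name `track_region_centre` and all parameter values are assumptions chosen for brevity. Occlusion is handled here only in the simplest possible way, by skipping the measurement update when a frame has no observation.

```python
import math
import random


def track_region_centre(observations, n_particles=500, velocity=1.0,
                        motion_sigma=0.5, obs_sigma=1.0, seed=0):
    """Bootstrap particle filter for a 1-D region centre (illustrative only).

    observations: measured positions, one per frame; None marks an occluded frame.
    Returns the weighted-mean position estimate for every frame.
    """
    rng = random.Random(seed)
    # Assumed prior: the region starts somewhere in [-10, 10].
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: constant-velocity motion plus Gaussian process noise.
        particles = [p + velocity + rng.gauss(0.0, motion_sigma) for p in particles]
        if z is None:
            # Occluded frame: no measurement, so every particle is equally likely.
            weights = [1.0] * n_particles
        else:
            # Update: Gaussian likelihood of the measurement given each particle.
            weights = [math.exp(-0.5 * ((z - p) / obs_sigma) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample in proportion to the weights (multinomial resampling).
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

A real region tracker would carry a higher-dimensional state (position, scale, appearance) and a likelihood based on image features, but the predict, weight and resample cycle is the same.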

The above work also forms the basis of research in region-based image fusion, detection and tracking, the aim of which is to develop and integrate multi-dimensional fusion algorithms that fully exploit the potential of distributed, hyperspectral and/or multi-modality sensors (Figure 3). The work is conducted within the Data and Information Fusion Defence Technology Centre (project 2.1), funded by the MOD, and is currently being evaluated by QinetiQ for an airborne EO/IR target recognition system. It is a unique collaboration between the Departments of Electrical and Electronic Engineering, Mathematics and Experimental Psychology, combining analytical and practical skills with user interface expertise.

Also within the DTC (project 2.2), joint segmentation and tracking based on particle filtering has been undertaken and incorporated into a demonstrator by General Dynamics. The group has been invited to join two cluster projects in DIF DTC Phase II (Tracking and Multi-Dimensional Fusion).

Wavelet-based sensor fusion (right) of visual (left) and thermal (middle) surveillance images. (Source data courtesy of the TNO Human Factors Research Institute.)
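
The general principle of wavelet-domain fusion can be sketched as follows. This is a simplified illustration, not the group's method: it substitutes a one-level real Haar transform for the complex wavelets used in the work described here, and it applies a common textbook rule (average the low-pass band, keep the larger-magnitude detail coefficient). The function names are hypothetical, and images are assumed to be equal-sized lists of rows with even height and width.

```python
def haar2d(img):
    """One-level 2-D Haar analysis of an image with even height and width."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4)  # local average
            rows[1].append((a - b + d - e) / 4)  # left/right difference
            rows[2].append((a + b - d - e) / 4)  # top/bottom difference
            rows[3].append((a - b - d + e) / 4)  # diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def inverse_haar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            img[2 * r][2 * c] = s + x + y + z
            img[2 * r][2 * c + 1] = s - x + y - z
            img[2 * r + 1][2 * c] = s + x - y - z
            img[2 * r + 1][2 * c + 1] = s - x - y + z
    return img


def fuse(img_a, img_b):
    """Fuse two registered images in the Haar domain."""
    fused = []
    for i, (band_a, band_b) in enumerate(zip(haar2d(img_a), haar2d(img_b))):
        if i == 0:
            rule = lambda p, q: (p + q) / 2               # average the low-pass band
        else:
            rule = lambda p, q: p if abs(p) >= abs(q) else q  # keep the stronger detail
        fused.append([[rule(p, q) for p, q in zip(ra, rb)]
                      for ra, rb in zip(band_a, band_b)])
    return inverse_haar2d(*fused)
```

With this rule, an edge that is visible in only one modality survives in the fused result, while the overall brightness is a compromise between the two inputs.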

The group's work has also addressed problems in medical image processing. In collaboration with local hospitals, new confocal microscopy techniques for analysis and diagnosis have been developed, together with a novel framework for multimodality image fusion using multiscale (complex wavelet) edges. New methods for guided navigation through medical volumes have resulted in a novel virtual liver biopsy system and in the world's first stereo eye-tracker for region-enhanced medical displays.


Wavelet-based image fusion of medical imaging modalities

Digitally encoded video material is becoming increasingly important in medical and law enforcement applications, where large quantities of information can accumulate rapidly and, for legal reasons, cannot be disposed of. For such material to be legally admissible, however, a means of authenticating the content must be provided. The group has researched several possible approaches to this problem. Digital watermarking has shown significant promise, since it can offer information on the location and type of attack. The group has generated two patent applications in this field, both of which employ digital watermarking techniques to detect and classify the location and type of any 'attack'.


Image authentication using digital watermarks (attack: person deleted from centre)
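
How a watermark can localise an attack may be illustrated with a generic block-wise fragile watermark. This sketch is not the scheme in the group's patent applications, and it does not classify the type of attack. It assumes an 8-bit greyscale image stored as a list of rows whose dimensions are multiples of the block size; the names `embed` and `find_tampered_blocks` and the simple keyed SHA-256 signature are hypothetical choices. Each block's least significant bits carry a signature of that block's remaining bits, so any block whose signature no longer matches is flagged.

```python
import hashlib

BLOCK = 4  # block side in pixels, so each block carries 16 signature bits


def _blocks(img):
    """Block coordinates in row-major order."""
    return [(br, bc) for br in range(len(img) // BLOCK)
            for bc in range(len(img[0]) // BLOCK)]


def _pixels(br, bc):
    """Pixel coordinates of one block in row-major order."""
    return [(r, c) for r in range(br * BLOCK, (br + 1) * BLOCK)
            for c in range(bc * BLOCK, (bc + 1) * BLOCK)]


def _signature_bits(img, br, bc, key):
    """Keyed hash of a block's position and the 7 most significant bits of its pixels."""
    data = bytearray(key) + br.to_bytes(4, "big") + bc.to_bytes(4, "big")
    for r, c in _pixels(br, bc):
        data.append(img[r][c] >> 1)
    digest = hashlib.sha256(bytes(data)).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(BLOCK * BLOCK)]


def embed(img, key):
    """Return a copy of img whose least significant bits hold each block's signature."""
    marked = [row[:] for row in img]
    for br, bc in _blocks(img):
        bits = _signature_bits(img, br, bc, key)
        for (r, c), bit in zip(_pixels(br, bc), bits):
            marked[r][c] = (img[r][c] & ~1) | bit
    return marked


def find_tampered_blocks(img, key):
    """List the blocks whose stored signature does not match their content."""
    tampered = []
    for br, bc in _blocks(img):
        stored = [img[r][c] & 1 for r, c in _pixels(br, bc)]
        if stored != _signature_bits(img, br, bc, key):
            tampered.append((br, bc))
    return tampered
```

Embedding changes each pixel by at most one grey level. Because only 16 signature bits are stored per block in this sketch, a modified block can escape detection with a small probability (about 1 in 65,536); a practical scheme would use a longer signature and a proper message authentication code.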

The demand for dramatic special effects in film and broadcast production has increased the need for hybrid synthetic and natural content, object-based decomposition methods and 'virtual' view synthesis. These, combined with motion effects based on high frame rate and/or multi-camera capture, are beginning to transform film and television content creation. One significant example has been developed by our partners, Snell and Wilcox Limited, who have pioneered a motion interpolation technique for post-production known as FloMo. New algorithms have been developed that produce very high quality motion-interpolated "inbetweening", as seen in the 'bullet time' sequences in the film "The Matrix" (1999). The group has significant expertise in image and video interpolation and in the recovery of depth maps from two or more views. This has included efficient and disparity-accurate view synthesis methods for multiview autostereoscopic displays (EPSRC LINK 3D-Intercam) and improved, scalable compression methods for the coding of multiview content (Figure 6). High-speed search methods for dense disparity estimation, based on genetic algorithms and wavelet pyramids, have recently been proposed.

Current research in this area is investigating methods for occlusion detection, occlusion concealment and segmentation. We have exploited geometric aspects of the scene, together with the information present in a multiple-camera set-up, for enhanced performance. The use of object extraction methods for multiple-camera configurations has also resulted in improved segmentation. These approaches enable robust disparity estimation and mitigate the artefacts introduced by occlusions and camera parameter errors. Our time-slice rig with 24 HD cameras has given the group a unique opportunity to capture high quality video material for algorithm development and performance evaluation in this field. The group, in collaboration with Heriot-Watt University, is currently investigating techniques based on wavelet transforms and volumetric representations for multi-view compression and view synthesis (EPSRC Multiview).


Disparity-based view synthesis: left, right and synthesised intermediate views
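
The relationship between disparity estimation and view synthesis can be sketched on a single rectified scanline. This is a deliberately basic illustration: it uses an exhaustive sum-of-absolute-differences search in place of the genetic-algorithm and wavelet-pyramid search methods mentioned above, it ignores occlusions, and the function names and parameters are hypothetical. A point at position x in the left view is assumed to appear at x - d in the right view, so it is placed at x - alpha * d in an intermediate view.

```python
def estimate_disparity(left, right, max_d, half=1):
    """Per-pixel disparity on one scanline by exhaustive SAD block matching.

    Pixels too close to the borders to test every candidate are left as None.
    """
    n = len(left)
    disp = [None] * n
    for x in range(half + max_d, n - half):
        best_d, best_cost = 0, float("inf")
        for d in range(max_d + 1):
            # Compare the window around x in the left view with the
            # window around x - d in the right view.
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-half, half + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp[x] = best_d
    return disp


def synthesise_view(left, disp, alpha=0.5):
    """Forward-warp the left scanline to a viewpoint a fraction alpha towards the right.

    Positions that receive no pixel are left as None (holes).
    """
    n = len(left)
    view = [None] * n
    for x, d in enumerate(disp):
        if d is None:
            continue
        target = x - int(round(alpha * d))
        if 0 <= target < n:
            view[target] = left[x]
    return view


# Synthetic example: the right view sees the same scene shifted by 4 samples.
scene = [i * i for i in range(40)]
left_line, right_line = scene[0:32], scene[4:36]
disparity = estimate_disparity(left_line, right_line, max_d=6)
middle_line = synthesise_view(left_line, disparity)
```

In real multi-camera footage the holes left by forward warping correspond to occluded regions, which is where the occlusion detection and concealment work described above comes in.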