
Research projects

Multicue Vision

Nabil Ouerhani

Project program

CSEM-IMT common research program

Keywords

Early vision, visual attention, attentive vision, focus of attention, parallel visual processing, real-time vision

Project description

The goal is to investigate and develop methods and algorithms for the processing of visual signals according to the multicue paradigm. It is planned to implement the necessary means to perform experiments and to validate the concept of asynchronous multicue segmentation. Actions are performed at two levels.

Cue level. The purpose of the cue level is to provide the cue signals. It is planned to work with the same cues as those already selected by CSEM for its architecture, i.e. gradient magnitude and direction, as well as their time derivatives. Activities at this level are less fundamental and mainly concern the development of experimental tools.
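The cue signals named above can be sketched in software as follows. This is an illustrative NumPy sketch, not the CSEM implementation: the function name and the frame-difference approximation of the time derivatives are assumptions.

```python
import numpy as np

def gradient_cues(frame, prev_frame):
    """Compute illustrative cue signals: gradient magnitude and
    direction, plus their frame-to-frame time derivatives."""
    gy, gx = np.gradient(frame.astype(float))
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)

    pgy, pgx = np.gradient(prev_frame.astype(float))
    prev_magnitude = np.hypot(pgx, pgy)
    prev_direction = np.arctan2(pgy, pgx)

    # Time derivatives approximated by differences between frames;
    # the direction difference is wrapped back into [-pi, pi].
    d_magnitude = magnitude - prev_magnitude
    d_direction = np.angle(np.exp(1j * (direction - prev_direction)))
    return magnitude, direction, d_magnitude, d_direction
```

On the real hardware these operations run per pixel in parallel; the sketch computes the same quantities over the whole array at once.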

Focus of attention level. The purpose of this level is to combine the different cues in order to extract the most conspicuous information. Relevant cues and significant locations are made available in temporal order according to their importance. Significant developments are planned at this level in order to adapt the known, relatively heavy algorithms to hardware that offers only limited computational power.
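Delivering the significant locations in order of importance is commonly done with a winner-take-all selection followed by inhibition of return. The sketch below illustrates that generic scheme only; the function name, the inhibition radius, and the rectangular suppression window are all assumptions, not details of the project's algorithm.

```python
import numpy as np

def ranked_foci(saliency, n_spots=3, radius=2):
    """Return the n most conspicuous locations in decreasing order
    of saliency (winner-take-all with inhibition of return)."""
    s = saliency.astype(float).copy()
    spots = []
    for _ in range(n_spots):
        # Winner-take-all: pick the current global maximum
        y, x = np.unravel_index(np.argmax(s), s.shape)
        spots.append((y, x, s[y, x]))
        # Inhibition of return: suppress a window around the winner
        y0, y1 = max(0, y - radius), y + radius + 1
        x0, x1 = max(0, x - radius), x + radius + 1
        s[y0:y1, x0:x1] = -np.inf
    return spots
```

The suppression step is what produces the "timely ordered" stream of attention spots: each call to the maximum yields the next-most-important, not-yet-attended location.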

Multicue visual attention

The visual attention system developed in this project takes advantage of cues available from conventional video sources (that is, motion, color and contrast), in addition to depth [1]. As each feature contributes differently to the saliency map, the features must be selected in a way that optimizes a global criterion based on a specific quality/cost ratio measure. This criterion should weigh the motion channel more heavily, as this cue strongly contributes to attracting attention in a visual scene. Because motion cues play such an important role in the proposed system, special care must be taken to extract this key feature robustly. For instance, methods simply based on image differences are not recommended because they cannot provide the motion's direction. Conversely, refined methods based on relaxation principles are also rejected because of their complexity. As a good compromise, a multiresolution scheme based on standard motion-detection methods has been selected for recovering a wide range of velocities.
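The weighted combination of cue maps into one saliency map can be sketched as below. The weight values (with motion deliberately above the static cues) and the min-max normalization are illustrative choices, not the values or the criterion actually used in the project.

```python
import numpy as np

def saliency_map(cues, weights):
    """Combine cue maps into a single saliency map: each map is
    rescaled to [0, 1], then summed with per-cue weights."""
    total = np.zeros_like(next(iter(cues.values())), dtype=float)
    for name, cue in cues.items():
        lo, hi = cue.min(), cue.max()
        if hi > lo:
            norm = (cue - lo) / (hi - lo)
        else:
            norm = np.zeros_like(cue, dtype=float)
        total += weights[name] * norm
    return total / sum(weights.values())

# Illustrative weights: motion attracts attention most strongly
weights = {"motion": 0.5, "color": 0.2, "contrast": 0.2, "depth": 0.1}
```

Dividing by the weight sum keeps the result in [0, 1] regardless of the chosen weights.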
Figure 1. Multicue visual attention model
Figure 2. Three main spots of attention of a dynamic real scene

Another aspect of main concern is related to the natural capacity of the human visual system to process visual information in channels with different spatial sensitivity. This calls for an extension of the visual attention model to treat multiple spatial scales. Practically, this is realized by representing each visual feature by a vector, where each component represents the feature at a given scale.
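The multi-scale representation described above can be sketched by building such a vector of maps from a single feature map. For simplicity this sketch obtains coarser scales by 2x2 block averaging; the project's model may well use proper Gaussian pyramids instead, so the function name and downsampling method are assumptions.

```python
import numpy as np

def feature_pyramid(feature_map, n_scales=3):
    """Represent one visual feature as a vector of maps, one
    component per spatial scale (finest first)."""
    scales = [feature_map.astype(float)]
    for _ in range(n_scales - 1):
        f = scales[-1]
        # Crop to even dimensions, then average 2x2 blocks
        h, w = f.shape[0] // 2 * 2, f.shape[1] // 2 * 2
        f = f[:h, :w]
        coarser = f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        scales.append(coarser)
    return scales
```

Each list element corresponds to one component of the feature vector mentioned in the text, i.e. the feature observed at one spatial scale.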

Visual Attention in Real-Time

The visual attention model was implemented on a highly parallel Single Instruction Multiple Data (SIMD) architecture. Generally, a SIMD machine consists of a 2D array of identical processing units which all perform the same operation on a 2D array of data. Unlike a conventional sequential processor, the computation time of such a parallel architecture is independent of the image size.
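In software, the SIMD principle corresponds to array-wide vectorized operations: one instruction is conceptually applied to every pixel in a single step. A minimal emulation (the function name is hypothetical):

```python
import numpy as np

def simd_threshold(image, level):
    """One 'instruction' applied to every pixel at once, mirroring
    how a SIMD array processes the whole image in a single step."""
    return (image > level).astype(np.uint8)
```

On a true SIMD array this step takes the same time whatever the image size, since one processing unit is assigned per pixel; the NumPy emulation only mimics the programming model, not the timing.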


Figure 3. Real-time visual attention implemented on a prototype SIMD architecture

ProtoEye is a CSEM image processing ASIC based on the SIMD principle. It consists of a 35 x 35 array of mixed analog-digital cells. The digital part of a cell, working on 4-bit words, performs all operations needed to transform single images and to combine pairs of images. The analog part consists essentially of a diffusion network, which efficiently performs the time-consuming task of low- and high-pass spatial filtering of images. Thus, a processor is assigned to each pixel of the image. Four ProtoEye chips are connected together to process 64 x 64 gray-level images provided by a CMOS camera. The whole architecture is controlled by a general-purpose microcontroller (sequencer) running at a frequency of 4 MHz, yielding an effective performance of over 8 Giga operations per second. In addition to its high performance, the image processing platform is fully programmable.
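The diffusion network's role, spatial low-pass filtering performed over the whole array at once, can be emulated in software as a 4-neighbour averaging step. This is an illustrative digital sketch of the effect, not a model of the chip's analog circuit; the function name and edge handling are assumptions.

```python
import numpy as np

def simd_diffuse(image):
    """One diffusion step: every pixel is replaced by the average of
    its 4 neighbours, computed for the whole array simultaneously."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    return (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
```

Repeating this step spreads each pixel's value over a growing neighbourhood, which is the low-pass behaviour the analog network delivers in a single, much faster operation.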

References

[1] N. Ouerhani and H. Hügli, "Computing visual attention from scene depth", Proc. ICPR 2000, IEEE Computer Society Press, pp. 375-378, September 2000.

[2] N. Ouerhani, N. Archip, H. Hügli and P.-J. Erard, "Visual Attention Guided Seed Selection for Color Image Segmentation", Proc. Conf. Computer Analysis of Images and Patterns, CAIP 2001, Warsaw, Poland, September 5-7, 2001, Lecture Notes in Computer Science, LNCS-2124, Springer Verlag, pp. 630-637.

[3] N. Ouerhani, J. Bracamonte, H. Hügli, M. Ansorge and F. Pellandini, "Adaptive Color Image Compression Based on Visual Attention", Proc. 11th Int. Conf. on Image Analysis and Processing, ICIAP 2001, September 26-28, 2001, IEEE Computer Society Press, pp. 416-421.

[4] H. Hügli, N. Ouerhani and P.-Y. Burgi, "Multicue Visual Attention", CSEM Scientific and Technical Report, 2000.

hu / 23.05.2005