Next: Robot controller Up: Software Architecture Previous: Software Architecture

Vision Services

There are four cameras on board Spinoza: a colour camera (``top'') on the pan-tilt unit (PTU) provides a pointable colour input useful for tracking, and three monochrome cameras (``left'', ``right'', and ``upper'') in a static ``L'' configuration are used for stereo ranging. The first DSP, called the grabber, grabs colour (RGB) images from the top camera and stereo images from the three monochrome cameras. The stereo images are passed on to the second DSP for processing, while the colour information is used to find coloured blobs. Timely delivery of blob information is assured by having the VIP TIM probe the grabber for blob information during stereo computation, which takes much longer than blob detection. The interconnections between these components and the robot controller are shown in Figure 8.

The grabber regularly switches between two three-input camera configurations. In one, the three inputs are the left, right, and upper cameras (the trinocular inputs); in the other, the top RGB camera supplies three signals containing the Red, Green, and Blue separations.

Colour blob tracking is performed by first segmenting a colour image into a binary map. The centroid of all ``on'' pixels is taken as the centroid of the target; speed requirements necessitate this simplification. While the blob is being detected, the DSP concurrently passes the stereo images on to the VIP TIM, which performs trinocular stereo.
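The centroid step above can be sketched as follows. This is a minimal illustration, not the Spinoza implementation: the row-major image layout and the function name are assumptions.

```c
/* Sketch of the blob-tracking simplification: given a binary map
 * produced by colour segmentation, take the centroid of all "on"
 * pixels as the target position.  Layout and names are assumptions. */
typedef struct { double x, y; int found; } Centroid;

Centroid blob_centroid(const unsigned char *mask, int w, int h)
{
    long sx = 0, sy = 0, n = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (mask[y * w + x]) { sx += x; sy += y; n++; }

    Centroid c = { 0.0, 0.0, 0 };
    if (n > 0) { c.x = (double)sx / n; c.y = (double)sy / n; c.found = 1; }
    return c;
}
```

A single pass over the image with two accumulators is what makes this fast enough for tracking at frame rate, at the cost of ignoring blob shape entirely.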

  
Figure 8: Vision Server

The reliability of stereo data is paramount in obstacle avoidance, so stereo is computed in trinocular form, which requires slightly more computation but yields a useful increase in reliability [8]. Dense stereo [BulLitPog89a, OkuKan93a] permits obstacle avoidance without the segmentation or interpretation that line-based stereo would require [18]. Trinocular stereo compares image patches over a fixed range of disparities among three cameras roughly aligned in an ``L'' shape. Horizontal scene structures may be ambiguous in the left-right comparison, but are separated by the upper-right comparison. The two comparisons together form a combined measure of support for a particular depth.
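The combination of the two baselines can be sketched as below. This is an assumed, simplified formulation: a sum-of-absolute-differences (SAD) cost over a square window, with the horizontal (left-right) and vertical (upper-right) costs summed per candidate disparity. Window size, shift directions, and array layout are all assumptions, not details from the paper.

```c
#include <limits.h>
#include <stdlib.h>

/* SAD between a window of image a centred at (x0, y0) and a window
 * of image b centred at (x1, y1); r is the window radius. */
static long sad(const unsigned char *a, const unsigned char *b,
                int w, int x0, int y0, int x1, int y1, int r)
{
    long s = 0;
    for (int dy = -r; dy <= r; dy++)
        for (int dx = -r; dx <= r; dx++)
            s += labs((long)a[(y0 + dy) * w + (x0 + dx)] -
                      (long)b[(y1 + dy) * w + (x1 + dx)]);
    return s;
}

/* Best disparity at pixel (x, y) of the right image, searching
 * d = 0..dmax.  Each d is scored along both baselines; the lowest
 * combined cost wins, so a horizontal edge that is ambiguous
 * left-right is still disambiguated by the upper-right comparison. */
int trinocular_disparity(const unsigned char *right,
                         const unsigned char *left,
                         const unsigned char *upper,
                         int w, int h, int x, int y, int dmax, int r)
{
    int best = 0;
    long best_cost = LONG_MAX;
    for (int d = 0; d <= dmax; d++) {
        if (x - r < 0 || y - r < 0 || x + d + r >= w || y + d + r >= h)
            break;  /* stay inside all three images */
        long cost = sad(right, left,  w, x, y, x + d, y, r)    /* horizontal */
                  + sad(right, upper, w, x, y, x, y + d, r);   /* vertical   */
        if (cost < best_cost) { best_cost = cost; best = d; }
    }
    return best;
}
```

Because both costs must agree, a match that is plausible along only one baseline accumulates a large combined cost and is rejected, which is the source of the reliability gain.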

The VIP TIM DSP corrects for warping of the images due to lens distortion and adjusts the geometry so that the epipolar lines are aligned with the x and y axes. The cameras are calibrated [12] and the image correction mapping is computed off line using Matlab. Images are first smoothed, then down-sampled and corrected via a large table. This implements a ``soft'' calibration that can be redone on demand.
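The run-time half of this scheme amounts to one table lookup per output pixel, along the lines of the sketch below. The table format (one precomputed source index per output pixel, nearest-neighbour sampling) is an assumption; the actual mapping comes from the off-line calibration.

```c
/* Sketch of table-driven image correction: the off-line calibration
 * bakes undistortion and epipolar alignment into a lookup table, so
 * at run time each output pixel is filled by a single indexed read.
 * table[i] holds the source-image index for output pixel i.
 * Nearest-neighbour lookup is an assumed simplification. */
void remap(const unsigned char *src, unsigned char *dst,
           const int *table, int npixels)
{
    for (int i = 0; i < npixels; i++)
        dst[i] = src[table[i]];
}
```

Keeping all the geometry in the table is what makes the calibration ``soft'': recomputing the table off line changes the correction without touching the run-time code.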

Stereo is then computed with a multi-baseline correlation method [17], implemented using the A110 convolver to perform the stereo correlation.

This replaces stereo previously implemented on a Datacube system, which could operate at 15 Hz but does not fit into an embedded system [13]. Optical flow [4] can be implemented on the VIP TIM in a similar fashion to stereo, to support obstacle avoidance based on flow [5].

  
Figure 9: Results of the stereo algorithm

Figure 9 presents an example of the results obtained by the stereo algorithm. Brighter shades of grey represent points in the scene that are closer to the robot, while darker shades represent points further away. Black areas of the image represent points for which the distance cannot be determined accurately. The system processes 128x128 pixel images at 20 disparities at 2 Hz.
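The grey-scale convention of Figure 9 can be expressed as a small mapping like the one below. The linear scaling and the use of a negative value as the ``no reliable match'' marker are assumptions for illustration.

```c
/* Sketch of the Figure 9 display mapping: larger disparities (closer
 * points) render brighter, and pixels with no reliable depth render
 * black.  Scaling and the invalid marker (-1) are assumptions. */
unsigned char depth_to_grey(int disparity, int dmax)
{
    if (disparity < 0 || dmax <= 0)
        return 0;  /* distance not determined: black */
    /* Scale valid disparities into 1..255 so they never collide
     * with the black reserved for invalid pixels. */
    return (unsigned char)(1 + (disparity * 254) / dmax);
}
```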






Vladimir Tucakov
Tue Oct 8 14:08:29 PDT 1996