5 Implementation and Experiments
The system was implemented on a Sun SPARCstation 10 with a Data Cell
S2200 frame grabber. The manipulator is a Scorbot ER-7 robot arm, which
has 5 degrees of freedom and a parallel-jawed gripper.
The robot has its own 68000-based controller which implements
the low-level control loop and provides a Cartesian kinematic model.
Images are obtained from two inexpensive CCD cameras placed 1m-3m
from the robot's workspace. The angle between the cameras is in the range of
15-30 degrees (figure 4).
Figure 4: The experimental setup. Uncalibrated stereo cameras viewing a robot gripper and target object.
When the system is started up, it begins by opening and closing the jaws of
the gripper. By observing the image difference, it is able to locate the
gripper and set up a pair of affine trackers as instances of a hand-made 2D
template. The trackers will then follow the
gripper's movements continuously. Stereo tracking runs on the
Sun at over 10 Hz.
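The motion-differencing step can be sketched as follows. This is a minimal illustration, not the original implementation: the function name, threshold value and synthetic frames are all assumptions.

```python
import numpy as np

def locate_moving_region(frame_a, frame_b, threshold=25):
    """Locate a moving object (here, the opening/closing gripper jaws)
    as the centroid of pixels that change between two frames.
    Threshold and names are illustrative assumptions."""
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    mask = diff > threshold            # pixels that changed significantly
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                    # no motion detected
    return xs.mean(), ys.mean()        # centroid of the changed region

# Synthetic example: a bright patch appears between the two frames.
a = np.zeros((100, 100), dtype=np.uint8)
b = a.copy()
b[40:60, 40:60] = 200
print(locate_moving_region(a, b))      # centroid near (49.5, 49.5)
```

In the real system this centroid would seed the placement of the affine trackers on the gripper.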
The robot moves to four preset points to calibrate the system in
terms of the controller's coordinate space.
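Under affine stereo the concatenated image coordinates of a point (u, v in each camera) are an affine function of its robot coordinates, so four non-coplanar preset points give exactly enough equations to fix the map. A minimal sketch, assuming a least-squares formulation; the function names are illustrative, not from the original system:

```python
import numpy as np

def calibrate_affine_stereo(robot_pts, image_pts):
    """Fit the affine projection q = P x + p0 mapping robot coordinates
    x (3-vectors) to concatenated stereo image coordinates q (4-vectors).
    Four non-coplanar points give exactly 16 equations for the 16
    unknowns; lstsq also accepts more points."""
    X = np.hstack([robot_pts, np.ones((len(robot_pts), 1))])   # N x 4
    M, *_ = np.linalg.lstsq(X, image_pts, rcond=None)          # 4 x 4
    return M[:3].T, M[3]                                       # P (4x3), p0 (4,)

def image_to_robot(P, p0, q):
    """Invert the affine map by least squares: recover robot coordinates
    from a 4-vector of stereo image measurements."""
    x, *_ = np.linalg.lstsq(P, q - p0, rcond=None)
    return x

# Synthetic check with a made-up affine camera pair:
P_true = np.array([[1., 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]])
p0_true = np.array([0.5, -0.5, 1.0, 2.0])
robot = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
P, p0 = calibrate_affine_stereo(robot, robot @ P_true.T + p0_true)
print(image_to_robot(P, p0, P_true @ np.array([0.3, -0.2, 0.7]) + p0_true))
# recovers approximately [0.3, -0.2, 0.7]
```

Because the map is fitted in the controller's own coordinate space, any linear change to that space is absorbed the next time the four points are visited.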
A target object is found by similar means - observing the image
changes when it is placed in the
manipulator's workspace. Alternatively it may be selected from a monitor screen
using the mouse. There is no pre-defined model of the target shape, so
a pair of `expanding' B-spline snakes 
are used to locate the contours delimiting the target surface in each of the
images. The snakes are then converted to a pair of affine trackers.
The target surface is then tracked, to compensate for unexpected motions of
either the target or the two cameras.
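The contour-finding step can be illustrated with a much-simplified stand-in for the expanding snake: rays cast outward from a seed point at the object centre stop at the first strong intensity change. This radial search is only a sketch; the original system fits B-spline snakes, and the parameter values here are assumptions.

```python
import numpy as np

def expand_contour(image, centre, n_rays=16, edge_threshold=50, max_radius=200):
    """Crude 'expanding' contour search: from a seed point, step outward
    along each of n_rays directions until a strong intensity change marks
    the object boundary. Returns a list of (y, x) boundary samples."""
    cy, cx = centre
    contour = []
    for theta in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        dy, dx = np.sin(theta), np.cos(theta)
        prev = image[cy, cx]
        for r in range(1, max_radius):
            y = int(round(cy + r * dy))
            x = int(round(cx + r * dx))
            if not (0 <= y < image.shape[0] and 0 <= x < image.shape[1]):
                break                              # ray left the image
            if abs(int(image[y, x]) - int(prev)) > edge_threshold:
                contour.append((y, x))             # boundary sample found
                break
            prev = image[y, x]
    return contour
```

A real snake would then be fitted through these samples and tracked from frame to frame.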
By introducing modifications and offsets to the feedback mechanism
(which would otherwise
try to superimpose the gripper and the target), two `behaviours' have
been implemented.
The tracking behaviour causes the gripper to follow the target continuously,
hovering a few centimetres above it (figure 5).
The grasping behaviour causes the gripper
to approach the
target from above (to avoid collisions) with the gripper turned through an
angle of 90 degrees, to grasp it normal to its visible surface.
Figure 5: The robot tracking its quarry, guided by the position and
orientation of the target contour (view through the left camera). On the
target surface is an affine snake: an affine tracker obtained by
`expanding' a B-spline snake from the centre of the object. A slight
offset has been introduced into the control loop to cause the gripper to
hover above it. Last frame: one of the cameras has been rotated and
zoomed, but the system continues to operate successfully with visual
feedback.
Affine stereo and visual feedback used to grasp a planar object.
Without feedback control,
the robot locates its target only approximately (typically to within
5cm in a 50cm workspace), reflecting the approximate nature of
affine stereo and of calibration from only four points. With a feedback
gain of 0.75 the gripper converges on its target in
three or four control iterations. If the system is not disturbed,
the gripper takes a straight-line path.
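The convergence rate can be illustrated with a purely proportional model of the feedback loop, in which each iteration removes a fraction `gain` of the remaining positioning error. This is a simplification: real convergence also depends on tracking noise and kinematic error, and the tolerance value below is an assumption.

```python
def iterations_to_converge(initial_error_cm, gain, tolerance_cm=0.5):
    """Under a purely proportional model, the residual error after each
    control iteration shrinks geometrically as (1 - gain)^n. Count the
    iterations needed to fall below the tolerance."""
    error, n = initial_error_cm, 0
    while error > tolerance_cm:
        error *= (1.0 - gain)   # fraction of error left after one correction
        n += 1
    return n

# A 5cm initial error (the open-loop accuracy quoted above) with gain
# 0.75 falls below 0.5cm in two iterations of this idealised model;
# the three or four iterations observed in practice reflect the
# unmodelled noise and distortion.
print(iterations_to_converge(5.0, 0.75))
```

The same model shows why a lower gain (used when perspective effects are strong) costs extra iterations: halving the gain roughly doubles the number of corrections needed.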
The system has so far demonstrated its robustness by continuing to
track and grasp objects despite:
- Linear offsets or scalings of the controller's coordinate system,
which are absorbed by the self-calibration process with complete
transparency.
- Slight nonlinear distortions to the kinematics, which are corrected
for by the visual feedback loop, though large errors introduce a risk
of ringing and instability unless the gain is reduced.
- Small translations (e.g. 20cm), rotations (e.g. 30 degrees) and
zooms (e.g. 200% change in focal length) of the cameras, even after
the system has self-calibrated.
Large disturbances to camera geometry cause the gripper to take a
curved path towards the target, and require more control iterations to
converge.
The condition of weak perspective throughout the robot's workspace does
not seem to be essential for image-based control, and the system can
function with the cameras as close as 1.5 metres (the robot's reach is
a little under 1 metre). However, the feedback gain must then be
reduced, or the system will overshoot on motions towards the cameras.
Figure 5 shows four frames from a tracking sequence (all
taken through the same camera). The cameras are about two metres from
the workspace. Tracking of position and orientation is maintained even
when one of the cameras is rotated about its optical axis and zoomed
(figure 5, bottom right).