The image coordinates of a world feature in two images are not
independent, but related by an *epipolar constraint*.
Consider the family of planes passing through the optical centres of
both cameras. Each plane projects to an *epipolar line* in each image.
If a feature lies upon a particular line in the left image, the
corresponding feature must lie upon the line in the right image that
is the projection of the same plane.
The constraint reflects the
redundancy inherent in deriving four image coordinates from points in a
three-dimensional world. Most correspondence algorithms
exploit this constraint, which reduces the search for matching
features to a single dimension, and identifying it is an important
aspect of any calibration scheme.

In affine stereo, the epipolar planes are considered to be parallel,
and the constraint takes the form of a single linear relation among
the four image coordinates. With the full perspective model, the lines
need not be parallel, and converge to a point called the *epipole*
(the projection of one camera centre on the other camera's image plane).
The constraint may be obtained from calibration data,
for instance by rearranging the model to predict one
image coordinate as a function of the other three.
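
As an illustration of the affine case, the following Python sketch fits the linear relation from four reference correspondences and uses it to predict the vertical right-image coordinate of a further point. The camera matrices, the point coordinates and all function names are assumptions made for the example; they are not taken from the calibration described here.

```python
import math

# Sketch only: cameras, points and helper names are illustrative assumptions.

def project_affine(M, t, X):
    """Affine camera: image coordinates are linear in world coordinates."""
    return tuple(sum(M[r][k] * X[k] for k in range(3)) + t[r] for r in range(2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_affine_epipolar(left_pts, right_pts):
    """Fit v' = a*u + b*v + c*u' + d from exactly four correspondences."""
    A = [[u, v, u2, 1.0] for (u, v), (u2, _) in zip(left_pts, right_pts)]
    b = [v2 for (_, v2) in right_pts]
    return solve_linear(A, b)

def predict_v_right(coeffs, u, v, u2):
    """Predict the right-image v' from the other three image coordinates."""
    a, b, c, d = coeffs
    return a * u + b * v + c * u2 + d

# Two hypothetical affine cameras, the right one turned by 20 degrees.
theta = math.radians(20.0)
M_LEFT, T_LEFT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], (0.0, 0.0)
M_RIGHT = [[math.cos(theta), 0.0, math.sin(theta)], [0.1, 1.0, 0.05]]
T_RIGHT = (0.2, -0.1)

# Four non-coplanar reference points (corners of a unit cube about the origin).
REFERENCE = [(-0.5, -0.5, -0.5), (0.5, -0.5, -0.5),
             (-0.5, 0.5, -0.5), (-0.5, -0.5, 0.5)]
left = [project_affine(M_LEFT, T_LEFT, X) for X in REFERENCE]
right = [project_affine(M_RIGHT, T_RIGHT, X) for X in REFERENCE]
COEFFS = fit_affine_epipolar(left, right)

# Check the constraint on a point that was not used for fitting.
test_point = (0.3, -0.2, 0.4)
u, v = project_affine(M_LEFT, T_LEFT, test_point)
u2, v2 = project_affine(M_RIGHT, T_RIGHT, test_point)
residual = predict_v_right(COEFFS, u, v, u2) - v2
print("epipolar residual:", residual)
```

With noise-free affine cameras the residual is zero up to rounding. With more than four reference points, or with noisy data, the exact solve would be replaced by a least-squares fit.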

Figure 7 compares the epipolar line structure predicted by both affine and full perspective stereo models (after calibration using linear least squares). In this setup, in which the camera distance is about 2 metres, both models give similar epipolar accuracy. Furthermore, the affine model can predict epipolar lines using just 4 reference points; perspective stereo requires a minimum of 6.

[Reference and test points are confined to a unit cube centred about the origin: 6 reference points lie within the cube, and test points are distributed uniformly within it. The cameras face the origin from a distance of 3 to 24 units, angled 20 degrees apart (their focal length is proportional to distance, to normalize image size).]
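
One way to set up such a simulated camera pair is sketched below in Python. The pinhole model, the placement of the cameras at -10 and +10 degrees in the horizontal plane, and the names `make_camera` and `project` are assumptions for illustration; the text specifies only the distances, the 20 degree separation and the focal length rule.

```python
import math

def make_camera(distance, yaw_deg):
    """Pinhole camera in the X-Z plane at `distance`, looking at the origin.

    The focal length is set equal to the distance so that image size
    stays roughly constant as the camera moves away (assumed constant
    of proportionality: 1).
    """
    yaw = math.radians(yaw_deg)
    centre = (distance * math.sin(yaw), 0.0, distance * math.cos(yaw))
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    up = (0.0, 1.0, 0.0)
    forward = (-math.sin(yaw), 0.0, -math.cos(yaw))  # towards the origin
    return {"centre": centre, "axes": (right, up, forward), "focal": distance}

def project(camera, X):
    """Perspective projection of world point X into the camera image."""
    rel = tuple(X[k] - camera["centre"][k] for k in range(3))
    x, y, depth = (sum(axis[k] * rel[k] for k in range(3))
                   for axis in camera["axes"])
    return (camera["focal"] * x / depth, camera["focal"] * y / depth)

# Stereo pair angled 20 degrees apart, at the nearest simulated range.
left_cam = make_camera(3.0, -10.0)
right_cam = make_camera(3.0, 10.0)
corner = (0.5, 0.5, 0.5)
print(project(left_cam, corner), project(right_cam, corner))
```

Both cameras image the origin at their principal point, and a point in the plane through the origin facing a camera keeps the same image coordinates at every distance, which is the normalization the focal length rule is meant to provide.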

Without noise or other disturbances, perspective stereo estimates absolute and relative positions with complete accuracy. At close range affine stereo performs poorly, but the error decreases in inverse proportion to camera distance (figure 8).
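
The inverse dependence on distance can be made plausible with a small Python sketch that compares a perspective projection with its affine (weak-perspective) approximation for a single point. It measures the image-plane discrepancy only, not the reconstruction error plotted in figure 8, and the camera placement on the Z axis and the function names are assumptions for illustration.

```python
def perspective_u(x, z, distance):
    """Camera on the +Z axis at `distance`, focal length equal to distance."""
    return distance * x / (distance - z)

def affine_u(x, z, distance):
    """Weak-perspective approximation: depth variation in the scene is ignored."""
    return x

def approximation_error(x, z, distance):
    """Image-plane gap between the two models, equal to |x*z / (distance - z)|."""
    return abs(perspective_u(x, z, distance) - affine_u(x, z, distance))

# A corner of the unit cube, viewed over the simulated range of distances.
for distance in (3.0, 6.0, 12.0, 24.0):
    err = approximation_error(0.5, 0.5, distance)
    print(distance, err, err * distance)
```

The product of error and distance tends to x*z as the distance grows, so under these assumptions the discrepancy falls off approximately as the reciprocal of camera distance.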

Accuracy is also somewhat dependent on the number and configuration of the reference points used in calibration, and there is a limited improvement as the unit cube is sampled more regularly.

Adding 1% Gaussian noise to the image coordinates of the
reference points causes both systems to lose accuracy. Perspective stereo
is more sensitive to noise because of its nonlinearity and greater
degrees of freedom, and is *less* accurate than
the affine stereo approximation at large viewing distances
(figure 9).
Viewing a larger number of reference points reduces the effects of noise
and restores the accuracy of perspective stereo.

In a laboratory or industrial environment, cameras may be disturbed from time to time and subjected to small rotations and translations. If this happens after calibration, it gives rise to a corresponding error in stereo reconstruction.

Table 1 shows the average change in perceived relative position when one camera is rotated or translated a small distance around or along each principal axis. The two systems degrade comparably with small movements, the worst of which is rotation about the optical axis. Perspective stereo is more sensitive to larger movements, and to rotations and translations in the epipolar plane (in which a small error can induce large changes of perceived depth), because it distorts nonlinearly.
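
The sensitivity to rotation in the epipolar plane can be illustrated with a planar Python sketch: two lines of sight that meet at the origin are intersected again after one of them is turned by a small angle. The reduction to two dimensions, the choice of the origin as the viewed point and all names are assumptions for illustration; the sketch does not reproduce the averages reported in Table 1.

```python
import math

def camera_ray(distance, yaw_deg, rotation_error_deg=0.0):
    """Centre and line of sight (X-Z plane) of a camera aimed at the origin.

    `rotation_error_deg` turns the line of sight, modelling a camera that
    has rotated in the epipolar plane since calibration.
    """
    yaw = math.radians(yaw_deg)
    centre = (distance * math.sin(yaw), distance * math.cos(yaw))
    aim = math.radians(yaw_deg + rotation_error_deg)
    direction = (-math.sin(aim), -math.cos(aim))
    return centre, direction

def intersect_rays(p, d, q, e):
    """Intersection of the lines p + s*d and q + t*e (Cramer's rule)."""
    det = e[0] * d[1] - d[0] * e[1]
    rx, rz = q[0] - p[0], q[1] - p[1]
    s = (e[0] * rz - e[1] * rx) / det
    return (p[0] + s * d[0], p[1] + s * d[1])

def perceived_shift(distance, rotation_error_deg):
    """Displacement of the triangulated point, which truly lies at the origin."""
    left_c, left_d = camera_ray(distance, -10.0)
    right_c, right_d = camera_ray(distance, 10.0, rotation_error_deg)
    x, z = intersect_rays(left_c, left_d, right_c, right_d)
    return math.hypot(x, z)

shift = perceived_shift(3.0, 1.0)
lateral = 3.0 * math.sin(math.radians(1.0))
print("shift along the left line of sight:", shift)
print("lateral shift of the ray at the target:", lateral)
```

Because the two lines of sight meet at only 20 degrees, the triangulated point slides along the undisturbed ray by several times the sideways movement of the disturbed one, which is the amplification of depth error referred to above.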

When Gaussian noise is added to the image coordinates of the points whose relative position is to be estimated (after accurate calibration), the effect is comparable on both systems, and their performance converges at large camera distance (figure 10).