The relation between image plane coordinates (

This is the usual camera model for many stereo vision systems. Although it neglects effects such as lens distortion, which are significant in some high-accuracy applications [15], it correctly predicts image distortion due to perspective effects: for example, parallel 3D lines project to image lines that intersect at a vanishing point, and the cross ratio (not the simple ratio) of lengths is invariant under this transformation.
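The cross-ratio invariance is easy to verify numerically. The sketch below uses a hypothetical 1D projective (Möbius) map standing in for the projection of a line; the particular coefficients and sample points are illustrative only.

```python
# Cross-ratio invariance under a 1D projective (Mobius) transformation.
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by their line parameters."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def proj(x):
    # a hypothetical 1D projective map x -> (2x + 1) / (0.5x + 3)
    return (2 * x + 1) / (0.5 * x + 3)

pts = [0.0, 1.0, 2.0, 5.0]
before = cross_ratio(*pts)
after = cross_ratio(*[proj(x) for x in pts])
# 'before' and 'after' agree exactly; a simple length ratio would not.
```

Note that the simple ratio of lengths, by contrast, is only preserved under affine maps, not projective ones.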

This formulation assumes that images are not distorted by variations in depth, and is known as *weak perspective*.

The entire projection, again incorporating scaling and
shearing of pixel coordinates, may now be written very simply as a
linear mapping:

The eight coefficients

There are many situations in computer vision where an object must be
*tracked* as it moves across a view. Here we consider the simple,
but not uncommon, case where the object is small and has planar
faces.

We can define a coordinate system centred about the object face
itself so that it
lies within the *xy* plane. If the object is small compared to the
camera distance, we again have weak perspective, and a special case
of (4):

We see that the transformation from a plane in the world to the image plane is a 2D affine transformation.

This is a powerful constraint that can be exploited when tracking a planar object. It tells us that the shape of the image will deform only affinely as the object moves, and that there will exist an affine transformation between any two views of the same plane.
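The six parameters of this affine transformation can be recovered from three or more tracked point correspondences by least squares. The sketch below is illustrative: the "true" motion used to synthesise the second view is a made-up example, not values from the text.

```python
import numpy as np

# Five tracked points on the planar face as seen in the first view.
view1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.3]])

# A hypothetical affine motion (2x2 linear part plus translation),
# used here only to synthesise the second view.
A_true = np.array([[1.1, 0.2], [-0.1, 0.9]])
t_true = np.array([3.0, -1.0])
view2 = view1 @ A_true.T + t_true

# Least-squares fit: stack [x, y, 1] rows and solve for a 3x2 parameter matrix.
X = np.hstack([view1, np.ones((len(view1), 1))])
M, *_ = np.linalg.lstsq(X, view2, rcond=None)
A_est, t_est = M[:2].T, M[2]   # recovered linear part and translation
```

With more than three correspondences the system is overdetermined, so the least-squares fit also averages out measurement noise.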

where

Once the coefficients are known, world coordinates can be obtained by inverting (6), using a least-squares method to resolve the redundant information. Errors in calibration will manifest themselves as a linear distortion of the perceived coordinate frame.
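A minimal sketch of this least-squares inversion, assuming (6) has the linear form u = Cp + d, where p is the world point and u stacks the image measurements from both views (so there are four equations for three unknowns); the calibration values here are synthetic placeholders.

```python
import numpy as np

# Hypothetical calibration: a 4x3 matrix C and offset d mapping a world
# point p to the stacked image coordinates of both views (assumed form of (6)).
rng = np.random.default_rng(0)
C = rng.normal(size=(4, 3))
d = rng.normal(size=4)

p_true = np.array([1.0, 2.0, 0.5])
u = C @ p_true + d            # four measurements, three unknowns: redundant

# Invert by least squares; the redundant equation is resolved in the fit.
p_est, *_ = np.linalg.lstsq(C, u - d, rcond=None)
```

Any miscalibration of C or d shows up, as the text notes, as a linear distortion of the recovered coordinates rather than a random error.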

**Note:**

Under weak perspective any two views of the same planar surface will
be related by an affine transformation
that maps one image to the other. This consists of a
pure 2D translation encoding the displacement of the centroid
and a 2D tensor - the disparity gradient tensor
- which represents the distortion in image shape.
This transformation can be used to
recover surface orientation [2].
Surface orientation in space is most conveniently represented by a
surface normal vector **n**. We can obtain it as the vector
product of two non-collinear vectors in the plane, which can,
of course, be obtained from three pairs of image points.
There is, however, no redundancy in the data, and this method
would be sensitive to image measurement error. A better
approach is to exploit all the information available in the
affine transform (the disparity field).

Consider the standard unit vectors **X** and **Y**
in one image
and suppose they are the projections of some vectors on the object surface.
If the linear mapping between images is represented by a 2×3
matrix **A**, then the first two columns of **A** will
be the corresponding vectors in the other image.
As the centroid of the plane maps to both image centroids, we can
use it, together with the above pairs of vectors, to find three points in
space on the plane (by inverting (6)) and hence the surface
orientation.
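The final step can be sketched as follows. The three 3D points below stand in for the back-projected centroid and the back-projections of the two points whose image offsets are **X** and **Y**; their coordinates are hypothetical, since the inversion of (6) depends on the calibration.

```python
import numpy as np

# Three non-collinear points assumed already recovered on the object plane
# (hypothetical coordinates standing in for the back-projected points).
p0 = np.array([0.0, 0.0, 1.0])   # centroid of the plane
p1 = np.array([1.0, 0.0, 1.5])   # point whose image offset is X
p2 = np.array([0.0, 1.0, 2.0])   # point whose image offset is Y

# Surface normal from the vector product of two in-plane vectors.
n = np.cross(p1 - p0, p2 - p0)
n = n / np.linalg.norm(n)        # unit surface normal
```

The normal is defined only up to sign by the cross product; in practice the ambiguity is resolved by requiring the surface to face the camera.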