## 2 Geometrical framework

### 2.1 Viewing the plane

Consider a pinhole camera viewing a plane.
The viewing transformation is a plane collineation
between some world coordinate system (*X*,*Y*),
and image plane coordinates (*u*,*v*), thus:

*s* (*u*, *v*, 1)<sup>T</sup> = **T** (*X*, *Y*, 1)<sup>T</sup>   (1)

where *s* is a scale factor that varies for each point,
and **T** is a 3×3 transformation matrix.
The system is homogeneous, so we can fix *t*<sub>33</sub> = 1 without
loss of generality, leaving 8 degrees
of freedom. To solve for **T** we must observe at least four
points. By assigning arbitrary world coordinates to these points
(e.g. (0,0), (0,1), (1,1), (1,0)), we define a
new coordinate system on the plane, which we call
*working plane coordinates*.
Now, given the image coordinates of a point anywhere on the plane, along
with the image coordinates of the four reference points, it is possible
to invert the relation and recover the point's working plane
coordinates, which are invariant to the choice of camera location [7].
We use the same set of reference points for a stereo pair of views,
and compute two transformations **T** and **T'**, one for each
camera.
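As a concrete sketch of this step (Python/NumPy; the function names and the linear-system formulation are ours, not from the original), the 8 unknowns of **T** can be solved from four point correspondences, and the inverse mapping then recovers working plane coordinates for any image point:

```python
import numpy as np

def homography_from_4_points(img_pts, world_pts=((0, 0), (0, 1), (1, 1), (1, 0))):
    """Estimate the 3x3 collineation T mapping working plane coordinates to
    image coordinates, with t33 fixed to 1 (8 unknowns, 2 equations per point)."""
    A, b = [], []
    for (X, Y), (u, v) in zip(world_pts, img_pts):
        # s*(u,v,1)^T = T*(X,Y,1)^T, with t33 = 1, gives two linear equations:
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y]); b.append(u)
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y]); b.append(v)
    t = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(t, 1.0).reshape(3, 3)

def to_working_plane(T, uv):
    """Invert the collineation: image point -> working plane coordinates."""
    X = np.linalg.solve(T, [uv[0], uv[1], 1.0])
    return X[:2] / X[2]  # dehomogenise
```

Running the same estimation with the second camera's images of the four reference points yields **T′**.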

### 2.2 Recovering the indicated point in stereo

With natural human pointing behaviour, the hand is used to
define a line in space, passing through the base and tip of the index
finger. This line will not generally be in the target plane but
intersects the plane at some point. It is this point (the
*piercing point* or *indicated point*) that we aim to recover.
Let the pointing finger lie along the line *l*<sub>w</sub>
in space (see figure 1).
Viewed by a camera, it appears on line *l*<sub>i</sub>
in the image, which is also the projection of a *plane*, **P**, passing
through the image line and the optical centre of the camera.
This plane intersects the ground plane **G** along line
*l*<sub>gp</sub>.
We know that *l*<sub>w</sub> lies in **P**, and the
indicated point in *l*<sub>gp</sub>,
but from one view we cannot see exactly where.

**Figure 1: Relation between lines in the world, image and ground planes.**
Projection of the finger's image line *l*<sub>i</sub> onto the
ground plane yields a constraint line *l*<sub>gp</sub>
on which the indicated point must lie.

Note that the line *l*<sub>i</sub> is an image of line
*l*<sub>gp</sub>; that is,
*l*<sub>i</sub> = **T** *l*<sub>gp</sub>,
where **T** is the projective
transformation from equation (1). If the four reference points are
visible, this transformation can be inverted to find
*l*<sub>gp</sub> in terms
of the working plane coordinates. The indicated point is constrained
to lie upon this line on the target surface.
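One convenient formulation of this transfer (a Python/NumPy sketch; function names are ours): since points map as x<sub>image</sub> ∼ **T** x<sub>plane</sub>, incidence *l* · x = 0 is preserved when a line is carried back with the transpose, *l*<sub>gp</sub> ∼ **T**<sup>T</sup> *l*<sub>i</sub>, so no explicit matrix inverse is needed:

```python
import numpy as np

def image_line_through(p1, p2):
    """Homogeneous line through two image points, e.g. finger base and tip."""
    return np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])

def constraint_line(T, l_image):
    """Transfer an image line to the working plane.

    Points map as x_image ~ T @ x_plane, so the incidence relation
    l_image . x_image = 0 becomes (T^T @ l_image) . x_plane = 0:
    the working-plane line is l_gp = T^T @ l_image.
    """
    return T.T @ l_image
```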
Repeating the above procedure with the second camera **C′** gives us
another view *l*<sub>i</sub>′
of the finger, and another line of constraint
*l*<sub>gp</sub>′.
The two constraint lines will intersect at a point on the
target plane, which is the indicated point. Its position can now be
found relative to the four reference points.
Figure 2
shows the lines of pointing in a pair of
images, and the intersecting constraint lines in a 'canonical' view of
the working plane (in which the reference point quadrilateral is
transformed to a square).
This is a variation of a method employed by Quan and Mohr
[8],
who present an analysis based on cross-ratios.
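The intersection itself is one line of homogeneous algebra. A minimal sketch (Python/NumPy; the function name is ours), assuming the two constraint lines are given as homogeneous 3-vectors in working plane coordinates:

```python
import numpy as np

def indicated_point(l_gp, l_gp_prime):
    """Intersect the two constraint lines: in homogeneous coordinates,
    the intersection of lines l and l' is their cross product."""
    x = np.cross(l_gp, l_gp_prime)
    return x[:2] / x[2]  # dehomogenise to working plane coordinates
```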

By transforming this point with projections **T** and **T'**,
the indicated point can be projected back into image coordinates.
Although the working plane coordinates of the indicated point depend
on the configuration of the reference points, its back-projections into
the images do not.
Because all calculations are restricted to the image and ground
planes, explicit 3-D reconstruction is avoided and no camera
calibration is necessary. By tracking at least four points on the
target plane, the system can be made insensitive to camera
motions.
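Back-projection is just the forward map of equation (1); a sketch (Python/NumPy, applied once with **T** and once with **T′** to mark the point in each image):

```python
import numpy as np

def back_project(T, XY):
    """Project a working-plane point into an image: s(u,v,1)^T = T(X,Y,1)^T."""
    x = T @ [XY[0], XY[1], 1.0]
    return x[:2] / x[2]  # dehomogenise to pixel coordinates
```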

**Figure 2: Pointing at the plane.**
By taking the lines of pointing in left and right views (a, c),
transforming them into the canonical frame defined by the four corners of
the grey rectangle (b), and finding the intersection of the lines,
the indicated point can be determined; this is then projected back into the
images.
