In computer vision, the essential matrix is a $3 \times 3$ matrix, $\mathbf{E}$, which relates corresponding points in stereo images, assuming that the cameras satisfy the pinhole camera model.
Function
More specifically, if $\mathbf{y}$ and $\mathbf{y}'$ are homogeneous normalized image coordinates in image 1 and image 2, respectively, then

$(\mathbf{y}')^{T} \, \mathbf{E} \, \mathbf{y} = 0$

if $\mathbf{y}$ and $\mathbf{y}'$ correspond to the same 3D point in the scene.
The above relation which defines the essential matrix was published in 1981 by Longuet-Higgins, introducing the concept to the computer vision community. Hartley & Zisserman's book reports that an analogous matrix appeared in photogrammetry long before that. Longuet-Higgins' paper includes an algorithm for estimating $\mathbf{E}$ from a set of corresponding normalized image coordinates, as well as an algorithm for determining the relative position and orientation of the two cameras given that $\mathbf{E}$ is known.
Use
The essential matrix can be seen as a precursor to the fundamental matrix. Both matrices can be used for establishing constraints between matching image points, but the essential matrix can only be used in relation to calibrated cameras since the inner camera parameters must be known in order to achieve the normalization. If, however, the cameras are calibrated the essential matrix can be useful for determining both the relative position and orientation between the cameras and the 3D position of corresponding image points.
Derivation and definition
This derivation follows the paper by Longuet-Higgins.
Two normalized cameras project the 3D world onto their respective image planes. Let the 3D coordinates of a point P be $(x_1, x_2, x_3)$ and $(x'_1, x'_2, x'_3)$ relative to each camera's 3D coordinate system.
A homogeneous representation of the two image coordinates is then given by

$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \frac{1}{x_3} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ and $\mathbf{y}' = \begin{pmatrix} y'_1 \\ y'_2 \\ 1 \end{pmatrix} = \frac{1}{x'_3} \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix}$

which also can be written more compactly as

$\mathbf{y} = \frac{1}{x_3} \, \tilde{\mathbf{x}}$ and $\mathbf{y}' = \frac{1}{x'_3} \, \tilde{\mathbf{x}}'$

where $\tilde{\mathbf{x}} = (x_1, x_2, x_3)^{T}$ and $\tilde{\mathbf{x}}' = (x'_1, x'_2, x'_3)^{T}$ are the 3D coordinates of P in the two camera coordinate systems.
Another consequence of the normalized cameras is that their respective coordinate systems are related by means of a translation and rotation. This implies that the two sets of 3D coordinates are related as

$\tilde{\mathbf{x}}' = \mathbf{R} \, (\tilde{\mathbf{x}} - \mathbf{t})$

where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{t}$ is a 3-dimensional translation vector.
Define the essential matrix as

$\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$

where $[\mathbf{t}]_{\times}$ is the $3 \times 3$ skew-symmetric matrix representation of the cross product, i.e., $[\mathbf{t}]_{\times} \, \mathbf{v} = \mathbf{t} \times \mathbf{v}$ for any 3-dimensional vector $\mathbf{v}$.
To see that this definition of the essential matrix describes a constraint on corresponding image coordinates, multiply $\mathbf{E}$ from the left and right with the 3D coordinates of point P in the two different coordinate systems:

$(\tilde{\mathbf{x}}')^{T} \, \mathbf{E} \, \tilde{\mathbf{x}} = (\tilde{\mathbf{x}} - \mathbf{t})^{T} \, \mathbf{R}^{T} \, \mathbf{R} \, [\mathbf{t}]_{\times} \, \tilde{\mathbf{x}} = (\tilde{\mathbf{x}} - \mathbf{t})^{T} \, [\mathbf{t}]_{\times} \, \tilde{\mathbf{x}} = 0$

This uses the following steps:
- Insert the above relation between $\tilde{\mathbf{x}}'$ and $\tilde{\mathbf{x}}$ and the definition of $\mathbf{E}$ in terms of $\mathbf{R}$ and $\mathbf{t}$.
- $\mathbf{R}^{T} \mathbf{R} = \mathbf{I}$ since $\mathbf{R}$ is a rotation matrix.
- Properties of the matrix representation of the cross product: $[\mathbf{t}]_{\times} \, \tilde{\mathbf{x}} = \mathbf{t} \times \tilde{\mathbf{x}}$ is perpendicular to both $\mathbf{t}$ and $\tilde{\mathbf{x}}$, so its scalar product with $\tilde{\mathbf{x}} - \mathbf{t}$ vanishes.
Finally, it can be assumed that both $x_3 > 0$ and $x'_3 > 0$, since the point P would otherwise not be visible in both cameras. Inserting $\tilde{\mathbf{x}} = x_3 \, \mathbf{y}$ and $\tilde{\mathbf{x}}' = x'_3 \, \mathbf{y}'$ into the relation above and dividing by $x_3 \, x'_3$ gives

$(\mathbf{y}')^{T} \, \mathbf{E} \, \mathbf{y} = 0$

which is the constraint that the essential matrix defines between corresponding image points.
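As a concrete numerical check, the constraint can be verified for a synthetic camera pair. The rotation, translation, and 3D point below are arbitrary example values, not taken from the text; the conventions are those above ($\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$, $\tilde{\mathbf{x}}' = \mathbf{R}(\tilde{\mathbf{x}} - \mathbf{t})$):

```python
import numpy as np

def skew(t):
    """Matrix representation of the cross product: skew(t) @ v == t x v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose (example values): rotation about the y-axis.
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.2, 0.1])

E = R @ skew(t)                 # essential matrix, E = R [t]_x

x = np.array([0.5, -0.3, 4.0])  # a 3D point in camera 1's system (x3 > 0)
xp = R @ (x - t)                # the same point in camera 2's system

y = x / x[2]                    # normalized homogeneous image coordinates
yp = xp / xp[2]

residual = yp @ E @ y           # (y')^T E y, zero for noise-free data
print(residual)
```

With noise-free data the residual is zero up to floating-point rounding; with real detections it is only approximately zero.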
Properties of the essential matrix
Not every arbitrary $3 \times 3$ matrix can be an essential matrix for some stereo cameras. To see this, notice that $\mathbf{E}$ is defined as the product of one rotation matrix and one skew-symmetric matrix, both $3 \times 3$. A skew-symmetric matrix has two singular values which are equal and a third which is zero, and multiplication by a rotation matrix does not change the singular values. Consequently:

If the essential matrix $\mathbf{E}$ is written in terms of its singular value decomposition as

$\mathbf{E} = \mathbf{U} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T}$

then two of the singular values in $\boldsymbol{\Sigma}$ must be equal and the third must be zero.
The constraints can also be expressed as

$\det \mathbf{E} = 0$

and

$2 \, \mathbf{E} \, \mathbf{E}^{T} \, \mathbf{E} - \operatorname{tr}(\mathbf{E} \, \mathbf{E}^{T}) \, \mathbf{E} = \mathbf{0}$
Here the last equation is a matrix constraint, which can be seen as 9 constraints, one for each matrix element. These constraints are often used for determining the essential matrix from five corresponding point pairs.
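These internal constraints are easy to check numerically. In the sketch below, $\mathbf{E}$ is built from an arbitrary example rotation and translation, so it satisfies the constraints by construction:

```python
import numpy as np

def skew(t):
    """Matrix representation of the cross product with t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# An example essential matrix built from a hypothetical pose.
theta = 0.5
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.3, -0.7, 1.1])
E = R @ skew(t)

s = np.linalg.svd(E, compute_uv=False)           # two equal values, one zero
det_E = np.linalg.det(E)                         # det E = 0
cubic = 2 * E @ E.T @ E - np.trace(E @ E.T) * E  # the 9 matrix constraints
print(s, det_E)
```

An arbitrary $3 \times 3$ matrix will in general fail all three checks, which is exactly why estimated matrices must be projected back onto the essential-matrix constraints.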
The essential matrix has five or six degrees of freedom, depending on whether or not it is seen as a projective element. The rotation matrix $\mathbf{R}$ contributes three degrees of freedom and the translation vector $\mathbf{t}$ another three. If the essential matrix is considered as a projective element, however, its overall scaling is irrelevant, which removes one degree of freedom and leaves five.
Estimation of the essential matrix
Given a set of corresponding image points it is possible to estimate an essential matrix which satisfies the defining epipolar constraint for all the points in the set. However, if the image points are subject to noise, which is the common case in any practical situation, it is not possible to find an essential matrix which satisfies all constraints exactly.
Depending on how the error related to each constraint is measured, it is possible to determine or estimate an essential matrix which optimally satisfies the constraints for a given set of corresponding image points. The most straightforward approach is to set up a total least squares problem, commonly known as the eight-point algorithm.
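The total least squares approach can be sketched as follows. The data here are synthetic and noise-free, generated from an arbitrary example pose; with real measurements the coordinates should additionally be rescaled for numerical conditioning, as in the normalized eight-point algorithm:

```python
import numpy as np

def eight_point(y, yp):
    """Estimate E from N >= 8 correspondences of normalized image
    coordinates (N x 2 arrays) via total least squares."""
    A = np.column_stack([
        yp[:, 0] * y[:, 0], yp[:, 0] * y[:, 1], yp[:, 0],
        yp[:, 1] * y[:, 0], yp[:, 1] * y[:, 1], yp[:, 1],
        y[:, 0], y[:, 1], np.ones(len(y)),
    ])  # each row encodes (y')^T E y = 0, linear in the 9 entries of E
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)  # null direction of A (smallest singular value)
    # Enforce the internal constraints: two equal singular values, one zero.
    U, s, Vt2 = np.linalg.svd(E)
    s_avg = (s[0] + s[1]) / 2
    return U @ np.diag([s_avg, s_avg, 0.0]) @ Vt2

# Synthetic correspondences from a hypothetical pose (example values).
rng = np.random.default_rng(0)
theta = 0.2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.5, 0.2])
skew = lambda v: np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0.0]])
E_true = R @ skew(t)

X = rng.uniform([-1, -1, 4], [1, 1, 8], (20, 3))  # points in front of both cameras
Xp = (X - t) @ R.T                                # same points in camera 2's system
y = X[:, :2] / X[:, 2:]
yp = Xp[:, :2] / Xp[:, 2:]

E_est = eight_point(y, yp)
```

The estimate is determined only up to scale (and sign), since the epipolar constraint is homogeneous in $\mathbf{E}$.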
Determining R and t from E
Given that the essential matrix has been determined for a stereo camera pair, for example using the estimation method above, this information can be used to determine also the rotation and translation (up to a scaling) between the two cameras' coordinate systems. In these derivations $\mathbf{E}$ is seen as a projective element rather than as a matrix with a well-determined scaling.
The following method for determining $\mathbf{R}$ and $\mathbf{t}$ is based on performing a singular value decomposition of $\mathbf{E}$; see Hartley & Zisserman for other methods.
Finding one solution
An SVD of $\mathbf{E}$ gives

$\mathbf{E} = \mathbf{U} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T}$

where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal $3 \times 3$ matrices and $\boldsymbol{\Sigma}$ is a $3 \times 3$ diagonal matrix with

$\boldsymbol{\Sigma} = \begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 0 \end{pmatrix}$

The diagonal entries of $\boldsymbol{\Sigma}$ are the singular values of $\mathbf{E}$ which, according to the internal constraints of the essential matrix, must consist of two identical and one zero value. Define

$\mathbf{W} = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

and make the following ansatz:

$[\mathbf{t}]_{\times} = \mathbf{V} \, \mathbf{W} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T}$

$\mathbf{R} = \mathbf{U} \, \mathbf{W}^{-1} \, \mathbf{V}^{T}$
Since $\boldsymbol{\Sigma}$ may not completely fulfill the internal constraints when it has been estimated from real-world data, the alternative

$[\mathbf{t}]_{\times} = \mathbf{V} \, \mathbf{W} \, \operatorname{diag}(1, 1, 0) \, \mathbf{V}^{T}$

may help.
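The ansatz translates directly into code. This is a sketch under the conventions above ($\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$), with the sign of $\mathbf{V}$ adjusted where necessary so that $\mathbf{R}$ is a proper rotation; the example pose is arbitrary:

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

def decompose(E):
    """One (R, [t]_x) pair from an SVD of E, following the ansatz
    [t]_x = V W Sigma V^T,  R = U W^{-1} V^T."""
    U, s, Vt = np.linalg.svd(E)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        Vt = -Vt                # flip the sign of V so that det(R) = +1
    V = Vt.T
    Sigma = np.diag([s[0], s[1], 0.0])
    t_cross = V @ W @ Sigma @ V.T
    R = U @ W.T @ Vt            # W^{-1} = W^T since W is a rotation
    return R, t_cross

# Example: recover the factors of a synthetic essential matrix.
theta = 0.4
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
t_true = np.array([0.6, 0.0, 0.8])  # unit length
skew = lambda v: np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0.0]])
E = R_true @ skew(t_true)

R_est, t_cross = decompose(E)
```

The product `R_est @ t_cross` reproduces $\mathbf{E}$ up to the undetermined projective scale and sign.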
Showing that it is valid
First, these expressions for $[\mathbf{t}]_{\times}$ and $\mathbf{R}$ do satisfy the defining equation for the essential matrix:

$\mathbf{R} \, [\mathbf{t}]_{\times} = \mathbf{U} \, \mathbf{W}^{-1} \, \mathbf{V}^{T} \, \mathbf{V} \, \mathbf{W} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T} = \mathbf{U} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T} = \mathbf{E}$

Second, it must be shown that this $[\mathbf{t}]_{\times}$ is the matrix representation of the cross product for some vector $\mathbf{t}$. Since

$(\mathbf{V} \, \mathbf{W} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T})^{T} = \mathbf{V} \, \boldsymbol{\Sigma} \, \mathbf{W}^{T} \, \mathbf{V}^{T} = - \mathbf{V} \, \mathbf{W} \, \boldsymbol{\Sigma} \, \mathbf{V}^{T}$

it is the case that $[\mathbf{t}]_{\times}$ is skew-symmetric. According to the general properties of the matrix representation of the cross product it then follows that $[\mathbf{t}]_{\times}$ must be the cross product operator of exactly one vector $\mathbf{t}$.
Third, it must also be shown that the above expression for $\mathbf{R}$ is a rotation matrix. It is the product of three orthogonal matrices and is therefore itself orthogonal. To be a proper rotation it must in addition satisfy $\det \mathbf{R} = 1$; since the SVD leaves the signs of $\mathbf{U}$ and $\mathbf{V}$ undetermined, this can be assured by negating $\mathbf{V}$ (or $\mathbf{U}$) if necessary, which leaves $[\mathbf{t}]_{\times}$ unchanged.
Finding all solutions
So far one possible solution for $\mathbf{R}$ and $\mathbf{t}$ has been established given $\mathbf{E}$. It is, however, not the only possible solution. Since the scaling of $\mathbf{E}$ as a projective element is undetermined, so is the sign of $\mathbf{t}$, and the ansatz can equally well be made with $\mathbf{W}^{T}$ instead of $\mathbf{W}$, which produces a different rotation matrix that also satisfies the defining equation.
For the subsequent analysis of the solutions, however, the exact scaling of $\mathbf{t}$ is not as important as the direction in which it points. In the following it is therefore assumed that $\mathbf{t}$ is normalized to unit length.
To summarize, given $\mathbf{E}$ there are two opposite directions which are possible for $\mathbf{t}$ and two different rotations which are compatible with this essential matrix. In total this gives four classes of solutions for the rotation and translation between the two camera coordinate systems.
It turns out, however, that only one of the four classes of solutions can be realized in practice. Given a pair of corresponding image coordinates, three of the solutions will always produce a 3D point which lies behind at least one of the two cameras and therefore cannot be seen. Only one of the four classes will consistently produce 3D points which are in front of both cameras. This must then be the correct solution. Still, however, it has an undetermined positive scaling related to the translation component.
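The four solution classes can be enumerated explicitly. The sketch below assumes the conventions above ($\mathbf{E} = \mathbf{R} \, [\mathbf{t}]_{\times}$), so the unit vector $\mathbf{t}$ spans the right null space of $\mathbf{E}$; the example pose is arbitrary:

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

def four_solutions(E):
    """All four (R, t) classes compatible with E: two rotations
    (from W and W^T in the ansatz) times the two signs of t."""
    U, _, Vt = np.linalg.svd(E)
    t = Vt[2]                   # E t = 0: third right singular vector, unit length
    solutions = []
    for Wk in (W, W.T):
        R = U @ Wk @ Vt
        if np.linalg.det(R) < 0:
            R = -R              # ensure a proper rotation
        solutions.extend([(R, t), (R, -t)])
    return solutions

# Example: the true pose is one of the four classes.
theta = 0.25
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.0, 0.6, 0.8])  # unit length
skew = lambda v: np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0.0]])
E = R_true @ skew(t_true)

sols = four_solutions(E)
```

Selecting among the four candidates is then done by triangulating one or more correspondences and keeping the solution that places the points in front of both cameras.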
It should be noted that the above determination of $\mathbf{R}$ and $\mathbf{t}$ assumes that $\mathbf{E}$ satisfies the internal constraints of the essential matrix. If this is not the case, which normally happens when $\mathbf{E}$ has been estimated from noisy image data, it must first be projected onto the set of valid essential matrices, for example by replacing its singular values with two equal values and a zero as described above.
3D points from corresponding image points
The problem to be solved here is how to compute the 3D coordinates $(x_1, x_2, x_3)$ of a point P from its corresponding normalized image coordinates $(y_1, y_2)$ and $(y'_1, y'_2)$, given that the essential matrix has been determined for the camera pair and the corresponding rotation $\mathbf{R}$ and translation $\mathbf{t}$ have been derived from it.
Let $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$ denote the three rows of the rotation matrix:

$\mathbf{R} = \begin{pmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{pmatrix}$
Combining the above relations between 3D coordinates in the two coordinate systems and the mapping between 3D and 2D points described earlier gives

$y'_1 = \frac{x'_1}{x'_3} = \frac{\mathbf{r}_1 \cdot (\tilde{\mathbf{x}} - \mathbf{t})}{\mathbf{r}_3 \cdot (\tilde{\mathbf{x}} - \mathbf{t})} = \frac{\mathbf{r}_1 \cdot (x_3 \, \mathbf{y} - \mathbf{t})}{\mathbf{r}_3 \cdot (x_3 \, \mathbf{y} - \mathbf{t})}$

or, solving for $x_3$,

$x_3 = \frac{(\mathbf{r}_1 - y'_1 \, \mathbf{r}_3) \cdot \mathbf{t}}{(\mathbf{r}_1 - y'_1 \, \mathbf{r}_3) \cdot \mathbf{y}}$

Once $x_3$ is determined, the other two coordinates can be computed as

$x_1 = y_1 \, x_3, \quad x_2 = y_2 \, x_3$
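The closed-form depth computation translates directly into code. The pose and 3D point below are arbitrary example values, and the conventions are those used throughout ($\tilde{\mathbf{x}}' = \mathbf{R}(\tilde{\mathbf{x}} - \mathbf{t})$):

```python
import numpy as np

def triangulate(R, t, y, yp):
    """Longuet-Higgins' closed form: 3D point in camera 1's system from
    normalized homogeneous image coordinates y, yp (3-vectors, last entry 1)."""
    r1, r3 = R[0], R[2]   # first and third rows of R
    x3 = (r1 - yp[0] * r3) @ t / ((r1 - yp[0] * r3) @ y)
    return x3 * y         # (x1, x2, x3) = x3 * (y1, y2, 1)

# Example pose and point (arbitrary values, point in front of both cameras).
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, 0.1, 0.0])
X = np.array([0.4, -0.2, 5.0])

Xp = R @ (X - t)          # the point in camera 2's system
y = X / X[2]
yp = Xp / Xp[2]

X_rec = triangulate(R, t, y, yp)
```

With noise-free data the reconstruction is exact; with noisy detections it is only approximate, which is why the alternative expressions discussed next can be combined.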
The above derivation is not unique. It is also possible to start with an expression for $y'_2$ and derive a corresponding expression for $x_3$.
In the ideal case, when the camera maps the 3D points according to a perfect pinhole camera and the resulting 2D points can be detected without any noise, the two expressions for $x_3$ are equal. In practice, however, they are not, and it may be advantageous to combine the two estimates, for example by averaging them.
There are also other possible extensions of the above computations. The derivation above starts with an expression for the primed image coordinates and derives 3D coordinates in the unprimed system. It is also possible to start with unprimed image coordinates and obtain primed 3D coordinates, which finally can be transformed into unprimed 3D coordinates. Again, in the ideal case the result should be equal to the above expressions, but in practice they may deviate.
A final remark relates to the fact that if the essential matrix is determined from corresponding image coordinates, which often is the case when 3D points are determined in this way, the translation vector $\mathbf{t}$ is known only up to an unknown positive scaling. As a consequence, the reconstructed 3D points, too, are determined only up to the same unknown scaling.