5.3.1
Figure: Pose clusters in one dimensional parameter space. Poses ,
,
and
due to pairs 1, 2, 4 and 9 agree to a
transformation and poses
,
,
,
and
agree
to another. Poses
and
forms isolated points so that pair
6 and pair 10 are outliers.
The problem of the pose clustering is stated as follows: Let a set of
corresponding points be given as the data, , where
's are the model features,
are the
scene features (Note that in this section, the upper case notations are for models and the lower case notations for the scene.)
and
indexes the set of the matched pairs. Let
be the geometric transformation from
to
and consider
the set
as a configuration of the ``pose field''.
In the case of noiseless, perfect correspondences, the following m
equations, each transforming a model feature to a scene feature, should
hold simultaneously
We want to find the optimal pose configuration in the MAP sense, i.e.
.
Assume that each 's is confined to a certain class
of pose
transformations such that the the admissible pose space is
. This imposes constraints on the parameters governing
. The needed number of transformation parameters (degree of
freedom) depends on the class of transformations and the adopted
representation for the pose transformation. In the case of the 3D-3D
Euclidean transformation, for example, it can consists of an orthogonal
rotation
followed by a translation
, i.e.
; the
relation between the corresponding points is
. The simple matrix representation needs 12 parameters: 9 elements
in the rotation matrix
plus 3 elements in the translation vector
. The rotation angle representation needs six parameters: 3 for
the three rotation angles and 3 for the translation. Quaternions
provide still another choice. A single pair
alone is usually insufficient to determine a pose
transformation
; more are needed for the pose to be fully
determined.
If all the pairs in the data, d, are inliers and are due to a single
transformation, then all ,
, which are points in the pose
space, must be close to each other; and the errors
,
where
is the Euclidean distance, must all be small.
Complication increases when there are multiple pose clusters and outlier
pairs. When there are multiple poses,
's should form distinct
clusters. In this case, the set f is divided into subset, each giving
a consistent pose transformation from a partition of
to a
partition of
. Fig.5.12 illustrates a case in
which there are two pose clusters and some outliers. Outlier pairs, if
contained in the data, should be excluded from the pose estimation
because they can cause large errors. Multiple pose identification with
outlier detection has a close affinity to the prototypical problem of
image restoration involving discontinuities [Geman and Geman 1984] and to
that of matching overlapping objects using data containing spurious
features [Li 1994a].
Now we derive the MAP-MRF formulation. The neighborhood system is define by
where is some suitably defined measure of distance
between model features and the scope r may be reasonably related to
the size of the largest model object. We consider cliques of up to order
two and so the clique set
where
is the set of single-site (first order) cliques and
pair-site (second
order) cliques.
Under a single pose transformation, nearby model features are likely to
appear together in the scene whereas distantly apart model features tend
less likely. This is the coherence of spatial features. We characterize
this using the Markovianity condition . The positivity condition
also hold for all
where BbbF is the set of admissible transformations.
The MRF configuration f follows a Gibbs distribution. The two-site
potentials determine interactions between the individual 's. They
may be defined as
where is a norm in the pose space and
is some
function. To be able to separate different pose clusters, the function
should stop increasing as
becomes very
large. A choice is
where is a threshold parameter; this is the same as that
used in the line process model for image restoration with
discontinuities [Geman and Geman 1984,Maroquin et al. 1987,Blake and Zisserman 1987]. It may
be any APF defined in (3.28). Its value reflects the cost
associated with the pair of pose labels,
and
, and will be
large when
and
belong to different clusters. But it
cannot be arbitrarily large since a large value, as might be given by a
quadratic g, tends to force the
and
to stay in one
cluster, as the result of energy minimization, even when they should
not. Using an APF imposes piecewise smoothness.
The single-site potentials may be used to force
to stay
in the admissible set
if such a force is needed. For example,
assume
is a 2D-2D Euclidean transformation. Then, the
rotation matrix
must be orthogonal. The
unary potential for the orthogonality constraint can be expressed by
. It has the value of
zero only when
is orthogonal. If no scale change is allowed, then
the scaling factor should be exactly one, and an additional term
can be added where
is the determinant.
Adding these two gives the single-site potential as
where a and b are the weighting factors. In this case,
imposes the orthogonality. It is also possible to define
for
other classes of transformations. Summing all prior clique potentials
yields the following prior energy
The above defines the prior distribution .
The likelihood function is derived below. Assume that the features are
point locations and that they are subject to the additive noise model,
, where
is a vector
of i.i.d. Gaussian noise. Then the distribution of the data d
conditional on the configuration f is
where the likelihood energy is
The location is the conditional ``mean'' of the random
variable
. The quantity,
, reflects the error between
the location
predicated by
and the actual location
.
After that, the posterior energy follows immediately as . The optimal solution is
. As the result of energy minimization, inlier pairs undergone the
same pose transformation will form a cluster whereas outlier pairs will
form isolated points in the pose space, as illustrated in
Fig.5.12.