Pose Clustering and Estimation

Next: Simultaneous Matching and Pose Up: Pose Computation Previous: Pose Computation

5.3.1

Pose Clustering and Estimation

Figure: Pose clusters in one dimensional parameter space. Poses , , and due to pairs 1, 2, 4 and 9 agree to a transformation and poses , , , and agree to another. Poses and forms isolated points so that pair 6 and pair 10 are outliers.

The problem of the pose clustering is stated as follows: Let a set of corresponding points be given as the data, , where 's are the model features, are the scene features (Note that in this section, the upper case notations are for models and the lower case notations for the scene.) and indexes the set of the matched pairs. Let be the geometric transformation from to and consider the set as a configuration of the ``pose field''. In the case of noiseless, perfect correspondences, the following m equations, each transforming a model feature to a scene feature, should hold simultaneously

We want to find the optimal pose configuration in the MAP sense, i.e.

Assume that each 's is confined to a certain class of pose transformations such that the the admissible pose space is . This imposes constraints on the parameters governing . The needed number of transformation parameters (degree of freedom) depends on the class of transformations and the adopted representation for the pose transformation. In the case of the 3D-3D Euclidean transformation, for example, it can consists of an orthogonal rotation followed by a translation , i.e. ; the relation between the corresponding points is . The simple matrix representation needs 12 parameters: 9 elements in the rotation matrix plus 3 elements in the translation vector . The rotation angle representation needs six parameters: 3 for the three rotation angles and 3 for the translation. Quaternions provide still another choice. A single pair alone is usually insufficient to determine a pose transformation ; more are needed for the pose to be fully determined.

If all the pairs in the data, d, are inliers and are due to a single transformation, then all , , which are points in the pose space, must be close to each other; and the errors , where is the Euclidean distance, must all be small. Complication increases when there are multiple pose clusters and outlier pairs. When there are multiple poses, 's should form distinct clusters. In this case, the set f is divided into subset, each giving a consistent pose transformation from a partition of to a partition of . Fig.5.12 illustrates a case in which there are two pose clusters and some outliers. Outlier pairs, if contained in the data, should be excluded from the pose estimation because they can cause large errors. Multiple pose identification with outlier detection has a close affinity to the prototypical problem of image restoration involving discontinuities [Geman and Geman 1984] and to that of matching overlapping objects using data containing spurious features [Li 1994a].

Now we derive the MAP-MRF formulation. The neighborhood system is define by

where is some suitably defined measure of distance between model features and the scope r may be reasonably related to the size of the largest model object. We consider cliques of up to order two and so the clique set where is the set of single-site (first order) cliques and pair-site (second order) cliques.

Under a single pose transformation, nearby model features are likely to appear together in the scene whereas distantly apart model features tend less likely. This is the coherence of spatial features. We characterize this using the Markovianity condition . The positivity condition also hold for all where BbbF is the set of admissible transformations.

The MRF configuration f follows a Gibbs distribution. The two-site potentials determine interactions between the individual 's. They may be defined as

where is a norm in the pose space and is some function. To be able to separate different pose clusters, the function should stop increasing as becomes very large. A choice is

where is a threshold parameter; this is the same as that used in the line process model for image restoration with discontinuities [Geman and Geman 1984,Maroquin et al. 1987,Blake and Zisserman 1987]. It may be any APF defined in (3.28). Its value reflects the cost associated with the pair of pose labels, and , and will be large when and belong to different clusters. But it cannot be arbitrarily large since a large value, as might be given by a quadratic g, tends to force the and to stay in one cluster, as the result of energy minimization, even when they should not. Using an APF imposes piecewise smoothness.

The single-site potentials may be used to force to stay in the admissible set if such a force is needed. For example, assume is a 2D-2D Euclidean transformation. Then, the rotation matrix must be orthogonal. The unary potential for the orthogonality constraint can be expressed by . It has the value of zero only when is orthogonal. If no scale change is allowed, then the scaling factor should be exactly one, and an additional term can be added where is the determinant. Adding these two gives the single-site potential as

where a and b are the weighting factors. In this case, imposes the orthogonality. It is also possible to define for other classes of transformations. Summing all prior clique potentials yields the following prior energy

The above defines the prior distribution .

The likelihood function is derived below. Assume that the features are point locations and that they are subject to the additive noise model, , where is a vector of i.i.d. Gaussian noise. Then the distribution of the data d conditional on the configuration f is

where the likelihood energy is

The location is the conditional ``mean'' of the random variable . The quantity, , reflects the error between the location predicated by and the actual location .

After that, the posterior energy follows immediately as . The optimal solution is . As the result of energy minimization, inlier pairs undergone the same pose transformation will form a cluster whereas outlier pairs will form isolated points in the pose space, as illustrated in Fig.5.12.

Next: Simultaneous Matching and Up: Pose Computation Previous: Pose Computation