5.3.2 Simultaneous Matching and Pose

In the previous pose estimation formulation, a set of matched pairs is assumed to be available. Here we consider the situation in which matching has not been done, so pose must be computed during matching. Estimating pose during matching is the practical choice when invariants are unavailable or difficult to compute, e.g., because the class of transformations is nonlinear or involves projection. In the following, an MRF model for simultaneous 3D-from-2D matching and pose is derived without using view invariants. As in [Wells III 1991], matching and pose are sought jointly. The formulation is an extension of that given in Section 5.2.

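Let $\mathcal{S} = \{1, \dots, m\}$ index a set of $m$ points on a 3D model object, $\{\mathbf{X}_i \mid i \in \mathcal{S}\}$. Let $\mathcal{L} = \{1, \dots, M\}$ be the label set indexing a set of $M$ scene points in 2D, $\{\mathbf{x}_I \mid I \in \mathcal{L}\}$, and let $\mathcal{L}^+ = \{0\} \cup \mathcal{L}$ be the augmented set, with 0 representing the NULL label. Let $f = \{f_1, \dots, f_m\}$, $f_i \in \mathcal{L}^+$, denote the matching from the model points to the scene points or NULL. When $i$ is assigned the virtual point 0, $f_i = 0$, there is no corresponding point in the physically existing point set $\mathcal{L}$. Let $T$ be the projective pose transformation from the 3D model points $\mathbf{X}_i$ to the matched 2D image points $\mathbf{x}_{f_i}$ ($f_i \neq 0$). Under an exact pose, $T(\mathbf{X}_i) = \mathbf{x}_{f_i}$ for all $i$ with $f_i \neq 0$.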
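A minimal Python sketch of this notation follows, assuming the projective pose $T$ is parameterized as a 3×4 camera matrix `P` acting on homogeneous coordinates (the text does not fix a parameterization). The matching $f$ is stored as a sequence with 0 for NULL and 1-based scene indices otherwise. `project` and `exact_pose_holds` are hypothetical helper names; the check tests the exact-pose condition $T(\mathbf{X}_i) = \mathbf{x}_{f_i}$ only at non-NULL sites.

```python
import numpy as np

NULL = 0  # label 0 is the virtual NULL point

def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply a 3x4 projective camera matrix P (assumed pose model) to (m, 3) points."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates, (m, 4)
    uvw = Xh @ P.T                                  # (m, 3)
    return uvw[:, :2] / uvw[:, 2:3]                 # perspective division -> (m, 2)

def exact_pose_holds(P, X, x, f, tol=1e-9):
    """True if T(X_i) == x_{f_i} for every site i with f_i != NULL.

    X: (m, 3) model points; x: (M, 2) scene points;
    f: length-m sequence with entries in {0, 1, ..., M} (1-based scene indices).
    """
    proj = project(P, X)
    for i, fi in enumerate(f):
        if fi != NULL and np.linalg.norm(proj[i] - x[fi - 1]) > tol:
            return False
    return True

# Usage: pinhole camera [I | 0]; the scene holds only the images of model points 3 and 1
P = np.hstack([np.eye(3), np.zeros((3, 1))])
X = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 2.0], [2.0, 0.0, 4.0]])
x = project(P, X)[[2, 0]]
f = [2, NULL, 1]  # point 1 -> scene 2, point 2 -> NULL, point 3 -> scene 1
```

Model point 2 is assigned NULL because its image is absent from the scene, so it places no constraint on the pose.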

Now we derive the MAP-MRF formulation. The neighborhood system is defined by

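The single-site potential is an extension of (5.11):

$$V_1(f_i, T) = \begin{cases} v_{10} & \text{if } f_i = 0 \\ V_T(T) & \text{otherwise} \end{cases}$$

where $v_{10}$ is a constant. The function $V_T(T)$ encodes prior knowledge about $T$. It may include prior terms, such as those used in the previous subsection for the admissibility of pose transformations. If the p.d.f. of the pose is known, e.g., to be a normal distribution centered at a known mean pose (as assumed in [Wells III 1991]), then the prior $p(T) \propto e^{-V_T(T)}$ is a multivariate Gaussian, i.e., $V_T(T)$ is, up to an additive constant, a quadratic form in the pose parameters. The pair-site potential is defined as in (5.12):

$$V_2(f_i, f_{i'}) = \begin{cases} v_{20} & \text{if } f_i = 0 \text{ or } f_{i'} = 0 \\ 0 & \text{otherwise} \end{cases}$$

where $v_{20}$ is a constant.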
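The prior part of the energy can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the single-site potential charges $v_{10}$ for a NULL assignment and $V_T(T)$ otherwise; the pair-site potential charges $v_{20}$ when either site of a pair is NULL; the neighborhood is taken to be fully connected, with pair terms summed over ordered pairs $(i, i')$; and $V_T$ is the quadratic potential of a Gaussian prior over a hypothetical pose-parameter vector `theta` with mean `theta_mean` and covariance `cov`. All function names are hypothetical.

```python
import itertools
import numpy as np

NULL = 0  # label 0 is the virtual NULL point

def pose_prior_potential(theta, theta_mean, cov):
    """V_T(T) = 0.5 (theta - theta_mean)^T cov^{-1} (theta - theta_mean).

    Equals -log of a Gaussian density on theta up to an additive constant.
    """
    d = np.asarray(theta, dtype=float) - np.asarray(theta_mean, dtype=float)
    return 0.5 * float(d @ np.linalg.solve(cov, d))

def prior_energy(f, theta, v10, v20, theta_mean, cov):
    """Sum of single-site and pair-site prior potentials for labeling f and pose theta."""
    vT = pose_prior_potential(theta, theta_mean, cov)
    # Single-site: v10 for a NULL assignment, the pose-prior potential otherwise
    u1 = sum(v10 if fi == NULL else vT for fi in f)
    # Pair-site over a fully connected neighborhood, summed over ordered pairs (i, i')
    u2 = sum(v20 for i, j in itertools.permutations(range(len(f)), 2)
             if f[i] == NULL or f[j] == NULL)
    return u1 + u2
```

Summing over ordered pairs counts each unordered pair twice; that factor can equally be absorbed into $v_{20}$.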

The likelihood function characterizes the distribution of the errors; it is determined by the observation model and the noise in it. Given $f_i \neq 0$, the model point $\mathbf{X}_i$ is projected to the point $T(\mathbf{X}_i)$ by the projective transformation $T$. In the inexact situation, $T(\mathbf{X}_i) \neq \mathbf{x}_{f_i}$, where $\mathbf{x}_{f_i}$ is the corresponding image point actually observed.

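Assume the following additive noise model:

$$\mathbf{x}_{f_i} = T(\mathbf{X}_i) + \mathbf{e}_i$$

where $\mathbf{e}_i$ is a vector of i.i.d. Gaussian noise with zero mean and variance $\sigma^2$ in each coordinate. Then the likelihood function is

$$p(\mathbf{x}_{f_i} \mid f_i, T) = \frac{1}{2\pi\sigma^2}\, e^{-V_1(\mathbf{x}_{f_i} \mid f_i, T)}$$

where

$$V_1(\mathbf{x}_{f_i} \mid f_i, T) = \frac{\|\mathbf{x}_{f_i} - T(\mathbf{X}_i)\|^2}{2\sigma^2}$$

is the unary likelihood potential. The joint likelihood is then

$$p(d_1 \mid f, T) = \prod_{i \in \mathcal{S},\, f_i \neq 0} p(\mathbf{x}_{f_i} \mid f_i, T)$$

where $d_1 = \{\mathbf{x}_{f_i} \mid f_i \neq 0\}$ denotes the set of unary properties.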
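The unary likelihood terms translate directly into code. The Python sketch below assumes isotropic noise with per-coordinate standard deviation `sigma`, and 1-based scene indices with 0 for NULL; the function names are hypothetical. It works in log space so that the product over sites becomes a sum and small densities do not underflow.

```python
import numpy as np

def unary_potential(x_obs, x_proj, sigma):
    """V_1(x_{f_i} | f_i, T) = ||x_{f_i} - T(X_i)||^2 / (2 sigma^2)."""
    r = np.asarray(x_obs, dtype=float) - np.asarray(x_proj, dtype=float)
    return float(r @ r) / (2.0 * sigma**2)

def unary_log_likelihood(x_obs, x_proj, sigma):
    """log p(x_{f_i} | f_i, T) for isotropic 2D Gaussian noise: -V_1 - log(2 pi sigma^2)."""
    return -unary_potential(x_obs, x_proj, sigma) - np.log(2.0 * np.pi * sigma**2)

def joint_unary_log_likelihood(f, x, proj, sigma):
    """log p(d_1 | f, T): sum over sites with f_i != 0 (NULL sites contribute nothing).

    x: (M, 2) scene points; proj: (m, 2) projected model points T(X_i).
    """
    return sum(unary_log_likelihood(x[fi - 1], proj[i], sigma)
               for i, fi in enumerate(f) if fi != 0)
```

Because the noise is isotropic, the 2D density factors into two 1D Gaussians, one per image coordinate.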

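We also make use of distances as additional binary constraints. The distance between two model points $\mathbf{X}_i$ and $\mathbf{X}_{i'}$ in 3D is projected to the distance

$$\|T(\mathbf{X}_i) - T(\mathbf{X}_{i'})\|$$

in 2D and is observed as $d_{f_i f_{i'}} = \|\mathbf{x}_{f_i} - \mathbf{x}_{f_{i'}}\|$ between the matched image points. Its p.d.f. can be derived from the distribution of the projected points given in (5.39), as follows. Let $X_1, X_2$ denote the coordinates of $\mathbf{x}_{f_i}$ and $X_3, X_4$ those of $\mathbf{x}_{f_{i'}}$, with means $(\mu_1, \mu_2) = T(\mathbf{X}_i)$ and $(\mu_3, \mu_4) = T(\mathbf{X}_{i'})$ and common noise variance $\sigma^2$. These random variables are assumed independent, so their joint conditional p.d.f. is

$$p(X_1, X_2, X_3, X_4 \mid f_i, f_{i'}, T) = \prod_{k=1}^{4} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(X_k - \mu_k)^2}{2\sigma^2}\right)$$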

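Introduce new random variables, $Z_1, \dots, Z_4$, as

$$Z_k = g_k(X_1, X_2, X_3, X_4), \qquad k = 1, \dots, 4$$

each of which is a function of the X variables. Here $Z_1 = \sqrt{(X_1 - X_3)^2 + (X_2 - X_4)^2}$ is the observed 2D distance, and $Z_2, Z_3, Z_4$ are auxiliary variables chosen so that the mapping is one-to-one. Note that we are deriving the p.d.f. of $Z_1$. The inverse of $g = (g_1, \dots, g_4)$, denoted by $h$, is determined by

$$X_k = h_k(Z_1, Z_2, Z_3, Z_4), \qquad k = 1, \dots, 4$$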

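The Jacobian of the inverse is defined to be the determinant

$$J = \det\left[\frac{\partial h_k}{\partial Z_l}\right]_{k,l = 1}^{4}$$

which is a function of the Z variables. The joint conditional p.d.f. of the Z variables can be derived from the joint p.d.f. (5.43) using the following relation [Grimmett 1982]:

$$p_Z(Z_1, \dots, Z_4 \mid f_i, f_{i'}, T) = p\big(h_1(Z), \dots, h_4(Z) \mid f_i, f_{i'}, T\big)\, |J|$$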
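For a concrete instance of this construction (one admissible choice among many; the particular auxiliary variables are our assumption, not the text's), take polar coordinates for the difference vector and keep the coordinates of $\mathbf{x}_{f_{i'}}$ as auxiliaries. Writing $\nu = \|T(\mathbf{X}_i) - T(\mathbf{X}_{i'})\|$, the change of variables and its Jacobian are:

$$
\begin{aligned}
&Z_1 = \sqrt{(X_1 - X_3)^2 + (X_2 - X_4)^2}, \quad Z_2 = \operatorname{atan2}(X_2 - X_4,\, X_1 - X_3), \quad Z_3 = X_3, \quad Z_4 = X_4,\\
&X_1 = Z_3 + Z_1 \cos Z_2, \quad X_2 = Z_4 + Z_1 \sin Z_2, \quad X_3 = Z_3, \quad X_4 = Z_4,\\
&J = \det\begin{bmatrix}
\cos Z_2 & -Z_1 \sin Z_2 & 1 & 0\\
\sin Z_2 & Z_1 \cos Z_2 & 0 & 1\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix} = Z_1 .
\end{aligned}
$$

Integrating out $Z_3, Z_4$ convolves the two Gaussians, so the difference vector has covariance $2\sigma^2 I$. Integrating out the angle $Z_2$ with $\int_0^{2\pi} e^{a\cos(\phi - \phi_0)}\,d\phi = 2\pi I_0(a)$ then gives a Rice density:

$$
p(Z_1 \mid f_i, f_{i'}, T) = \frac{Z_1}{2\sigma^2}\exp\left(-\frac{Z_1^2 + \nu^2}{4\sigma^2}\right) I_0\left(\frac{Z_1 \nu}{2\sigma^2}\right), \qquad Z_1 \ge 0,
$$

where $I_0$ is the modified Bessel function of the first kind of order zero. The dependence on $T$ enters only through $\nu$.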

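The conditional distribution of $Z_1$ is then the conditional marginal

$$p(Z_1 \mid f_i, f_{i'}, T) = \int\!\!\int\!\!\int p_Z(Z_1, Z_2, Z_3, Z_4 \mid f_i, f_{i'}, T)\, dZ_2\, dZ_3\, dZ_4$$

which is a function of $T$. This gives the binary likelihood potential $V_2(d_{f_i f_{i'}} \mid f_i, f_{i'}, T) = -\ln p(Z_1 = d_{f_i f_{i'}} \mid f_i, f_{i'}, T)$. The joint p.d.f. of the set of binary features, $d_2 = \{d_{f_i f_{i'}}\}$, is approximated by the "pseudo-likelihood"

$$p(d_2 \mid f, T) \approx \prod_{i \in \mathcal{S},\, f_i \neq 0}\; \prod_{i' \in \mathcal{N}_i,\, f_{i'} \neq 0} p(d_{f_i f_{i'}} \mid f_i, f_{i'}, T)$$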
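Carrying the marginalization through for isotropic noise (our derivation, assuming per-coordinate standard deviation $\sigma$) gives a Rice density: the difference $\mathbf{x}_{f_i} - \mathbf{x}_{f_{i'}}$ is Gaussian with mean $T(\mathbf{X}_i) - T(\mathbf{X}_{i'})$ and covariance $2\sigma^2 I$, so its length is Rice-distributed with $\nu = \|T(\mathbf{X}_i) - T(\mathbf{X}_{i'})\|$ and $s^2 = 2\sigma^2$. The Python sketch below (function names hypothetical) evaluates that density, defines the binary likelihood potential as its negative log, and checks the density against its normalization and against a Monte Carlo simulation of the additive noise model.

```python
import numpy as np

def distance_pdf(z, mu_i, mu_j, sigma):
    """Rice density of the observed distance z = ||x_{f_i} - x_{f_i'}||.

    mu_i, mu_j are the projected points T(X_i), T(X_i'); each observation adds
    independent N(0, sigma^2) noise per coordinate, so the difference vector has
    covariance 2 sigma^2 I and its length is Rice(nu, s^2 = 2 sigma^2).
    """
    z = np.asarray(z, dtype=float)
    nu = np.linalg.norm(np.asarray(mu_i, dtype=float) - np.asarray(mu_j, dtype=float))
    s2 = 2.0 * sigma**2
    return (z / s2) * np.exp(-(z**2 + nu**2) / (2.0 * s2)) * np.i0(z * nu / s2)

def binary_potential(z, mu_i, mu_j, sigma):
    """V_2(z | f_i, f_i', T) = -ln p(z | f_i, f_i', T)."""
    return -np.log(distance_pdf(z, mu_i, mu_j, sigma))

# Sanity checks: normalization and agreement with simulated noisy observations
mu_i, mu_j, sigma = np.array([0.0, 0.0]), np.array([3.0, 4.0]), 1.0  # nu = 5
z = np.linspace(0.0, 20.0, 20001)
p = distance_pdf(z, mu_i, mu_j, sigma)
dz = z[1] - z[0]
mass = dz * (p.sum() - 0.5 * (p[0] + p[-1]))  # trapezoid rule, should be ~1
pdf_mean = dz * ((z * p).sum() - 0.5 * (z[0] * p[0] + z[-1] * p[-1]))

rng = np.random.default_rng(0)
n = 200_000
xi = mu_i + sigma * rng.standard_normal((n, 2))  # noisy observations of T(X_i)
xj = mu_j + sigma * rng.standard_normal((n, 2))  # noisy observations of T(X_i')
mc_mean = np.linalg.norm(xi - xj, axis=1).mean()
```

`np.i0` overflows for arguments beyond roughly 700, so for very small `sigma` or large distances a log-domain evaluation (for example with the exponentially scaled `scipy.special.i0e`) would be safer. When $\nu = 0$ the density reduces to a Rayleigh distribution.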

The joint p.d.f. of $d = \{d_1, d_2\}$ is approximated by

$$p(d \mid f, T) \approx p(d_1 \mid f, T)\, p(d_2 \mid f, T)$$

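Now the posterior energy can be obtained as

$$U(f, T \mid d) = \sum_{i \in \mathcal{S}} V_1(f_i, T) + \sum_{i \in \mathcal{S}} \sum_{i' \in \mathcal{N}_i} V_2(f_i, f_{i'}) + \sum_{i \in \mathcal{S},\, f_i \neq 0} V_1(\mathbf{x}_{f_i} \mid f_i, T) + \sum_{i \in \mathcal{S},\, f_i \neq 0}\; \sum_{i' \in \mathcal{N}_i,\, f_{i'} \neq 0} V_2(d_{f_i f_{i'}} \mid f_i, f_{i'}, T)$$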

The optimal solution is $(f^*, T^*) = \arg\min_{f, T} U(f, T \mid d)$. The non-NULL labels in $f^*$ represent the matching from the considered model object to the scene, and $T^*$ is the corresponding pose transformation. Model points assigned the NULL label have no counterpart among the observed scene points, while scene points not matched to any model point are either spurious or due to other model objects. Another round of matching-pose computation may then be performed on these remaining points in terms of another model object.
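To make the search for $(f^*, T^*)$ concrete, the following Python sketch assembles a posterior energy of the kind described in this section and minimizes it by brute force over all labelings and a finite list of candidate poses. It is a toy under stated assumptions, not the method of [Wells III 1991] or of this text: the pose is a hypothetical 3×4 camera matrix; the neighborhood is fully connected; the single-site term is $v_{10}$ for NULL and a pose-prior value `vT` otherwise; the pair-site term is $v_{20}$ when either end is NULL; the unary potential is $\|\mathbf{x}_{f_i} - T(\mathbf{X}_i)\|^2 / (2\sigma^2)$ with its normalizer dropped; and the binary potential is the negative log of the Rice density of the observed distance. All function names are hypothetical.

```python
import itertools
import numpy as np

NULL = 0  # label 0 is the virtual NULL point

def project(P, X):
    """Apply a 3x4 projective camera matrix P (assumed pose model) to (m, 3) points."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    uvw = Xh @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def rice_pdf(z, nu, s2):
    """Density of ||a - b|| when a - b ~ N(delta, s2 I) in 2D with ||delta|| = nu."""
    return (z / s2) * np.exp(-(z**2 + nu**2) / (2 * s2)) * np.i0(z * nu / s2)

def posterior_energy(f, P, X, x, sigma, v10, v20, vT=0.0):
    """U(f, T | d) = prior terms + unary and binary likelihood potentials."""
    proj = project(P, X)
    U = 0.0
    for i, fi in enumerate(f):
        if fi == NULL:
            U += v10
        else:
            r = x[fi - 1] - proj[i]
            U += vT + r @ r / (2 * sigma**2)  # normalizer log(2 pi sigma^2) absorbed into v10
    # Fully connected neighborhood: ordered pairs (i, i'), i != i'
    for i, j in itertools.permutations(range(len(f)), 2):
        if f[i] == NULL or f[j] == NULL:
            U += v20
            continue
        z = np.linalg.norm(x[f[i] - 1] - x[f[j] - 1])  # observed 2D distance
        nu = np.linalg.norm(proj[i] - proj[j])          # distance predicted by the pose
        with np.errstate(divide="ignore"):              # z = 0 (shared scene point) -> +inf
            U += -np.log(rice_pdf(z, nu, 2 * sigma**2))
    return U

def exhaustive_map(poses, X, x, sigma, v10, v20):
    """Brute-force argmin of U over all (M+1)^m labelings and the candidate poses."""
    best = (np.inf, None, None)
    for k, P in enumerate(poses):
        for f in itertools.product(range(len(x) + 1), repeat=len(X)):
            U = posterior_energy(f, P, X, x, sigma, v10, v20)
            if U < best[0]:
                best = (U, f, k)
    return best

# Usage: the scene is a shuffled, noise-free view under P_true; P_shift is a wrong pose
X = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 2.0], [2.0, 0.0, 4.0]])
P_true = np.hstack([np.eye(3), np.zeros((3, 1))])
P_shift = np.hstack([np.eye(3), np.array([[0.1], [0.0], [0.0]])])
x = project(P_true, X)[[2, 0, 1]]
U_best, best_f, best_k = exhaustive_map([P_true, P_shift], X, x, sigma=0.05, v10=1.0, v20=1.0)
```

Exhaustive search costs $(M+1)^m$ energy evaluations per candidate pose, so it is only feasible for tiny problems like this one; it is shown here because it makes the definition of the MAP solution explicit.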
