5.3.2
In the previous pose estimation formulation, a set of matched pairs is assumed to be available. Here we consider the situation in which the matching has not been done and the pose has to be estimated during the matching. Pose estimation during matching is practiced when invariants are unavailable or difficult to compute, e.g. because the class of transformations is not linear or involves projections. In the following, an MRF model for simultaneous 3D-from-2D matching and pose estimation is derived without using view-invariants. Matching and pose are jointly sought as in [Wells III 1991]. The formulation is an extension of that given in Section 5.2.
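Let $\mathcal{S} = \{1, \ldots, m\}$ index a set of $m$ points on a 3D model object, $\{P_i \mid i \in \mathcal{S}\}$. Let $\mathcal{L} = \{1, \ldots, M\}$ be the label set indexing a set of $M$ scene points in 2D, $\{p_I \mid I \in \mathcal{L}\}$, and $\mathcal{L}^+ = \{0\} \cup \mathcal{L}$ be the augmented set with 0 representing the NULL label. Let $f = \{f_1, \ldots, f_m\}$, $f_i \in \mathcal{L}^+$, denote the matching from $\{P_i\}$ to $\{p_I\} \cup \{\mathrm{NULL}\}$. When $i$ is assigned the virtual point 0, $f_i = 0$, it means that no corresponding point is found in the physically existing point set $\{p_I \mid I \in \mathcal{L}\}$.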
Let $T$ be the projective pose transformation from the 3D model points $P_i$ to the matched 2D image points $p_{f_i}$ ($f_i \neq 0$). Under an exact pose, we have $p_{f_i} = T(P_i)$ for all $i$ for which $f_i \neq 0$.
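The notation can be made concrete with a small sketch. It stores a matching as a list in which 0 stands for the NULL label and checks the exact-pose relation for the non-NULL sites. The pinhole parametrization of the pose (rotation about the optical axis, translation, focal length) and all function and variable names are assumptions made for illustration; the text only requires the pose to be a projective transformation.

```python
import math

NULL = 0  # virtual label: no corresponding scene point

def project(P, pose):
    """Hypothetical pose T: rotate about the z-axis, translate, then
    perspective-divide. pose = (theta, tx, ty, tz, focal)."""
    theta, tx, ty, tz, focal = pose
    x, y, z = P
    c, s = math.cos(theta), math.sin(theta)
    xc = c * x - s * y + tx
    yc = s * x + c * y + ty
    zc = z + tz
    return (focal * xc / zc, focal * yc / zc)

# m = 3 model points (sites), stored 0-based
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
pose = (0.0, 0.0, 0.0, 4.0, 2.0)

# M = 2 scene points with labels 1..M; label 0 is reserved for NULL
scene = {1: project(model[1], pose), 2: project(model[0], pose)}

# Matching f: first site -> label 2, second site -> label 1, third site -> NULL
f = [2, 1, NULL]

def residuals(f, model, scene, pose):
    """Distance between the matched scene point and T(P_i) for non-NULL sites."""
    out = []
    for P, label in zip(model, f):
        if label == NULL:
            continue
        u, v = project(P, pose)
        x, y = scene[label]
        out.append(math.hypot(x - u, y - v))
    return out

print(residuals(f, model, scene, pose))  # zero residuals under an exact pose
```

Swapping the two non-NULL labels in this toy example yields non-zero residuals, which is the kind of discrepancy the likelihood terms later penalize.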
Now we derive the MAP-MRF formulation. The neighborhood system is defined by
\[ \mathcal{N}_i = \{ i' \in \mathcal{S} \mid \|P_i - P_{i'}\|^2 \le r,\ i' \neq i \} \]
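The single-site potential is an extension of (5.11):
\[ V_1(f_i, T) = \begin{cases} v_{10} & \text{if } f_i = 0 \\ v(T) & \text{otherwise} \end{cases} \]
where $v_{10}$ is a constant. The function $v(T)$ encodes the prior knowledge about $T$. It may include prior terms, such as those used in the previous subsection for the admissibility of pose transformations. If the p.d.f. of the pose is known, e.g. to be a normal distribution centered at a known mean pose (as assumed in [Wells III 1991]), then $v(T)$ is a multivariate Gaussian function. The pair-site potential is defined in the same way as (5.12):
\[ V_2(f_i, f_{i'}) = \begin{cases} v_{20} & \text{if } f_i = 0 \text{ or } f_{i'} = 0 \\ 0 & \text{otherwise} \end{cases} \]
where $v_{20}$ is a constant.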
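The prior part of the energy can be sketched as follows. The sketch assumes that the single-site potential takes a constant `v10` for a NULL label and a pose-dependent value `v_of_T` otherwise, that the pair-site potential takes a constant `v20` when either of two neighboring sites is NULL and zero otherwise, and that neighbors are model points within a squared distance `r` of each other. These forms and all names are illustrative assumptions, not a definitive implementation.

```python
NULL = 0

def neighbors(model, r):
    """Neighbor lists: other model points within squared distance r (assumed form)."""
    m = len(model)
    return [
        [j for j in range(m)
         if j != i and sum((a - b) ** 2 for a, b in zip(model[i], model[j])) <= r]
        for i in range(m)
    ]

def prior_energy(f, nbrs, v10, v20, v_of_T):
    """Sum of single-site and pair-site prior potentials for a labeling f.

    Pairs are visited as ordered pairs (i, j) with j in nbrs[i], so every
    unordered neighboring pair contributes twice.
    """
    energy = 0.0
    for i, fi in enumerate(f):
        energy += v10 if fi == NULL else v_of_T
        for j in nbrs[i]:
            if fi == NULL or f[j] == NULL:
                energy += v20
    return energy

model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
nbrs = neighbors(model, r=2.0)
print(nbrs)  # the distant fourth point has no neighbors
print(prior_energy([1, 2, NULL, NULL], nbrs, v10=1.0, v20=0.5, v_of_T=0.0))
```

With positive `v10` and `v20`, every NULL assignment is penalized, so a NULL label is only preferred when the data terms of all real labels are worse.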
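The likelihood function characterizes the distribution of the errors and relates to the observation model and the noise in it. Given $f_i \neq 0$, the model point $P_i = (x_i, y_i, z_i)$ is projected to a point $\hat{p}_{f_i} = (\hat{x}_{f_i}, \hat{y}_{f_i}) = T(P_i)$ by the projective transformation $T$. In the inexact situation, $\hat{p}_{f_i} \neq p_{f_i}$, where $p_{f_i} = (x_{f_i}, y_{f_i})$ is the corresponding image point actually observed.
Assume the following additive noise model
\[ p_{f_i} = T(P_i) + e_i = \hat{p}_{f_i} + e_i \]
where $e_i \sim N(0, \sigma^2)$ is a vector of i.i.d. Gaussian noise.
Then the likelihood function is
\[ p(p_{f_i} \mid f_i, T) = \frac{1}{2\pi\sigma^2} \exp\{-V_1(p_{f_i} \mid f_i, T)\} \]
where
\[ V_1(p_{f_i} \mid f_i, T) = \frac{(x_{f_i} - \hat{x}_{f_i})^2 + (y_{f_i} - \hat{y}_{f_i})^2}{2\sigma^2} \]
is the unary likelihood potential. The joint likelihood is then $p(d_1 \mid f, T) = \prod_{i \in \mathcal{S},\, f_i \neq 0} p(p_{f_i} \mid f_i, T)$, where $d_1$ denotes the set of unary properties.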
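A minimal sketch of the unary likelihood under the additive Gaussian noise model, assuming isotropic noise with a single standard deviation `sigma` in both image coordinates. The function names and the toy data are hypothetical; the projected points are passed in directly so that no particular pose parametrization is implied.

```python
import math

NULL = 0

def unary_potential(p_obs, p_hat, sigma):
    """Unary likelihood potential: squared residual divided by 2*sigma^2."""
    dx = p_obs[0] - p_hat[0]
    dy = p_obs[1] - p_hat[1]
    return (dx * dx + dy * dy) / (2.0 * sigma ** 2)

def unary_likelihood(p_obs, p_hat, sigma):
    """2D isotropic Gaussian density of the observed point around its projection."""
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-unary_potential(p_obs, p_hat, sigma)) / norm

def joint_neg_log_likelihood(f, projected, scene, sigma):
    """Negative log of the joint unary likelihood: a sum over non-NULL sites."""
    total = 0.0
    for i, label in enumerate(f):
        if label == NULL:
            continue
        total += unary_potential(scene[label], projected[i], sigma)
        total += math.log(2.0 * math.pi * sigma ** 2)
    return total

projected = [(0.0, 0.0), (1.0, 0.0)]      # projections T(P_i) of two model points
scene = {1: (0.3, 0.4), 2: (1.0, 0.0)}    # observed scene points by label
f = [1, 2]
sigma = 0.5

nll = joint_neg_log_likelihood(f, projected, scene, sigma)
print(unary_potential(scene[1], projected[0], sigma), nll)
```

Because the joint likelihood is a product over non-NULL sites, its negative logarithm is a sum of unary potentials plus a constant per matched site, which is why the potentials can be added directly into an energy.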
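We also make use of the distances as additional binary constraints. The distance, $\|P_i - P_{i'}\|$, between two model points in 3D is projected to the distance
\[ \hat{d}_2(f_i, f_{i'}) = \sqrt{(\hat{x}_{f_i} - \hat{x}_{f_{i'}})^2 + (\hat{y}_{f_i} - \hat{y}_{f_{i'}})^2} \]
in 2D. The p.d.f. of the corresponding observed distance $d_2(f_i, f_{i'})$ between $p_{f_i}$ and $p_{f_{i'}}$ can be derived, based on the distribution of the projected points given in (5.39), in the following way.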
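Let $X = (x_{f_i}, y_{f_i}, x_{f_{i'}}, y_{f_{i'}})$ collect the coordinates of the two observed image points. These random variables are assumed independent, so their joint conditional p.d.f. is
\[ p_X(X \mid f_i, f_{i'}, T) = p(p_{f_i} \mid f_i, T)\, p(p_{f_{i'}} \mid f_{i'}, T) \]
Introduce new random variables, $Z = (z_1, z_2, z_3, z_4)$, as
\[ z_1 = \sqrt{(x_{f_i} - x_{f_{i'}})^2 + (y_{f_i} - y_{f_{i'}})^2}, \quad z_2 = y_{f_i}, \quad z_3 = x_{f_{i'}}, \quad z_4 = y_{f_{i'}} \]
each of which is a function of the $X$ variables. Note that we are deriving the p.d.f. of $z_1$. The inverse of $Z = Z(X)$, denoted by $X = X(Z)$, is determined by
\[ x_{f_i} = z_3 \pm \sqrt{z_1^2 - (z_2 - z_4)^2}, \quad y_{f_i} = z_2, \quad x_{f_{i'}} = z_3, \quad y_{f_{i'}} = z_4 \]
The Jacobian of the inverse is defined to be the determinant
\[ J = \det\left[\frac{\partial X}{\partial Z}\right] = \pm\frac{z_1}{\sqrt{z_1^2 - (z_2 - z_4)^2}} \]
which is a function of the $Z$ variables. The joint conditional p.d.f. $p_Z(Z \mid f_i, f_{i'}, T)$ for $Z$ can be derived from the joint p.d.f. of $X$ (5.43) using the following relation [Grimmett 1982]
\[ p_Z(Z \mid f_i, f_{i'}, T) = p_X(X(Z) \mid f_i, f_{i'}, T)\, |J| \]
where the right-hand side is summed over the two branches ($\pm$) of the inverse. The conditional distribution of $z_1 = d_2(f_i, f_{i'})$ is then the conditional marginal
\[ p(d_2(f_i, f_{i'}) \mid f_i, f_{i'}, T) = \int\!\!\int\!\!\int p_Z(z_1, z_2, z_3, z_4 \mid f_i, f_{i'}, T)\, dz_2\, dz_3\, dz_4 \]
which is a function of $d_2(f_i, f_{i'})$ and of the projected points $\hat{p}_{f_i}$ and $\hat{p}_{f_{i'}}$. This gives the binary likelihood potential $V_2(d_2(f_i, f_{i'}) \mid f_i, f_{i'}, T)$. The joint p.d.f. of the set of binary features, $d_2$, is approximated by the ``pseudo-likelihood''
\[ p(d_2 \mid f, T) \approx \prod_{i \in \mathcal{S},\, f_i \neq 0}\ \prod_{i' \in \mathcal{N}_i,\, f_{i'} \neq 0} p(d_2(f_i, f_{i'}) \mid f_i, f_{i'}, T) \]
The joint p.d.f. of $d = \{d_1, d_2\}$ is approximated by
\[ p(d \mid f, T) \approx p(d_1 \mid f, T)\, p(d_2 \mid f, T) \]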
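The marginal of the distance can also be evaluated numerically. The sketch below does not follow the change-of-variables derivation in the text; it swaps in a standard closed form instead. If the two observed points carry independent isotropic Gaussian noise with the same standard deviation `sigma`, their difference vector is Gaussian with variance `2 * sigma**2` per component, so its length follows a Rice distribution whose location parameter is the projected distance. Taking the binary potential as the negative log of this density is an assumption, as are all names used here.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0 via its power series (adequate for moderate x)."""
    term, total, k = 1.0, 1.0, 0
    q = (x / 2.0) ** 2
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def distance_pdf(d, d_hat, sigma):
    """Rice density of the observed distance d >= 0 given the projected distance d_hat."""
    s2 = 2.0 * sigma ** 2  # per-component variance of the difference vector
    return (d / s2) * math.exp(-(d * d + d_hat * d_hat) / (2.0 * s2)) * bessel_i0(d * d_hat / s2)

def binary_potential(d, d_hat, sigma):
    """Binary likelihood potential taken as -log of the distance density (d > 0)."""
    return -math.log(distance_pdf(d, d_hat, sigma))

def integrate(fn, a, b, n=4000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * (0.5 * fn(a) + sum(fn(a + k * h) for k in range(1, n)) + 0.5 * fn(b))

sigma, d_hat = 1.0, 5.0
mass = integrate(lambda d: distance_pdf(d, d_hat, sigma), 0.0, 25.0)
print(mass)  # sanity check: the density should integrate to approximately 1
print(binary_potential(5.0, d_hat, sigma), binary_potential(9.0, d_hat, sigma))
```

The potential grows as the observed distance moves away from the projected one, which is the behavior the binary constraint is meant to encode. The series for `bessel_i0` is only suitable for moderate arguments; very small `sigma` relative to the distances would call for a scaled Bessel routine.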
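Now the posterior energy can be obtained as
\[ E(f, T) = \sum_{i \in \mathcal{S}} V_1(f_i, T) + \sum_{i \in \mathcal{S}} \sum_{i' \in \mathcal{N}_i} V_2(f_i, f_{i'}) + \sum_{i \in \mathcal{S},\, f_i \neq 0} V_1(p_{f_i} \mid f_i, T) + \sum_{i \in \mathcal{S},\, f_i \neq 0}\ \sum_{i' \in \mathcal{N}_i,\, f_{i'} \neq 0} V_2(d_2(f_i, f_{i'}) \mid f_i, f_{i'}, T) \]
The optimal solution is $(f^*, T^*) = \arg\min_{f, T} E(f, T)$. The non-NULL labels in $f^*$ represent the matching from the considered model object to the scene, and $T^*$ determines the pose transformation therein. The model points which are assigned the NULL label are either spurious or due to other model objects. Another round of the matching-pose operation may be performed on these remaining points in terms of another model object.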
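To show how matching and pose are sought jointly, the following toy sketch minimizes a reduced version of the posterior energy by exhaustive search. Several simplifications are assumed and are not part of the text: the pose is chosen from a short list of candidate poses, the pose prior is flat (zero), the binary likelihood terms are omitted, and the pose is a hypothetical translation followed by a pinhole projection. Exhaustive search is used only because the example is tiny; the text does not prescribe a minimization algorithm at this point.

```python
import itertools

NULL = 0

def project(P, pose):
    """Hypothetical pose T: translate, then pinhole-project. pose = (tx, ty, tz, focal)."""
    tx, ty, tz, focal = pose
    return (focal * (P[0] + tx) / (P[2] + tz), focal * (P[1] + ty) / (P[2] + tz))

def energy(f, pose, model, scene, nbrs, v10, v20, sigma):
    """Reduced posterior energy: prior potentials plus unary likelihood potentials."""
    e = 0.0
    for i, fi in enumerate(f):
        if fi == NULL:
            e += v10  # single-site prior for a NULL label
        else:
            u, v = project(model[i], pose)
            x, y = scene[fi]
            e += ((x - u) ** 2 + (y - v) ** 2) / (2.0 * sigma ** 2)
        # pair-site prior over ordered neighbor pairs with at least one NULL
        e += sum(v20 for j in nbrs[i] if fi == NULL or f[j] == NULL)
    return e

def minimize(model, scene, poses, nbrs, v10, v20, sigma):
    """Exhaustive search over candidate poses and all labelings (toy sizes only)."""
    labels = [NULL] + sorted(scene)
    best = None
    for pose in poses:
        for f in itertools.product(labels, repeat=len(model)):
            e = energy(f, pose, model, scene, nbrs, v10, v20, sigma)
            if best is None or e < best[0]:
                best = (e, f, pose)
    return best

model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
nbrs = [[1, 2], [0, 2], [0, 1]]
# Scene: two points generated by the second candidate pose, plus one unrelated point
scene = {1: (0.75, 0.0), 2: (0.25, 0.0), 3: (2.0, 2.0)}
poses = [(0.0, 0.0, 4.0, 2.0), (0.5, 0.0, 4.0, 2.0)]

best_energy, best_f, best_pose = minimize(model, scene, poses, nbrs, v10=1.0, v20=0.2, sigma=0.05)
print(best_energy, best_f, best_pose)
```

In this example the third model point has no supporting scene point, so the minimizer assigns it the NULL label while the other two sites select the labels consistent with the second candidate pose.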