7.3.1
An object or a scene is represented by a set of features, where the features are attributed by their properties and constrained to one another by contextual relations. Let a set of $m$ features (sites) in the scene be indexed by $\mathcal{S} = \{1, \ldots, m\}$, a set of $M$ features (labels) in the considered model object by $\mathcal{L} = \{1, \ldots, M\}$, and everything in the scene not modeled by labels in $\mathcal{L}$ by $\{0\}$, which is a virtual NULL label. The set union $\mathcal{L}^+ = \mathcal{L} \cup \{0\}$ is the augmented label set. The structure of the scene is denoted by $\mathcal{G} = (\mathcal{S}, d)$ and that of the model object by $\mathcal{G}' = (\mathcal{L}, D)$, where $d$ denotes the visual constraints on features in $\mathcal{S}$ and $D$ describes the visual constraints on features in $\mathcal{L}$; the constraints can be, e.g., properties and relations between features.
Let object recognition be posed as assigning a label from $\mathcal{L}^+$ to each of the sites in $\mathcal{S}$ so as to satisfy the constraints. The labeling (configuration) of the sites is defined by $f = \{f_1, \ldots, f_m\}$, in which $f_i \in \mathcal{L}^+$ is the label assigned to site $i$. A pair $(i, f_i)$ is a match or correspondence. Under contextual constraints, a configuration $f$ can be interpreted as a mapping from the structure of the scene $\mathcal{G} = (\mathcal{S}, d)$ to the structure of the model object $\mathcal{G}' = (\mathcal{L}, D)$. Therefore, such a mapping is denoted as a triple $(\mathcal{G}, f, \mathcal{G}')$.
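As a minimal sketch of the notation above, the sites, the augmented label set, and a labeling can be held in plain Python containers. Encoding the NULL label as 0, the sizes m = 4 and M = 3, and the helper names `is_valid_labeling` and `matches` are assumptions made for illustration only.

```python
# Illustrative sketch: all names and values here are assumptions.
NULL = 0  # assumed encoding of the virtual NULL label

m, M = 4, 3                          # made-up numbers of sites and labels
sites = list(range(1, m + 1))        # the m scene sites
labels = list(range(1, M + 1))       # the M model labels (NULL excluded)
augmented_labels = [NULL] + labels   # augmented label set

# A labeling f assigns one label from the augmented set to every site.
f = {1: 2, 2: NULL, 3: 1, 4: 3}


def is_valid_labeling(f, sites, augmented_labels):
    """Check that every site has exactly one label from the augmented set."""
    return set(f) == set(sites) and all(f[i] in augmented_labels for i in sites)


def matches(f):
    """Return the non-NULL correspondences (i, f_i), ordered by site."""
    return [(i, fi) for i, fi in sorted(f.items()) if fi != NULL]
```

Here site 2 is explained by nothing in the model object, so it carries the NULL label and does not appear among the matches.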
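The observation $d = (d_1, d_2)$, which consists of the features extracted from the image, provides two sources of constraints: unary properties $d_1$ for single-site features, such as color and size, and binary relations $d_2$ for pair-site features, such as angle and distance. More specifically, each site $i \in \mathcal{S}$ is associated with a set of $K_1$ properties $\{d_1^{(k)}(i) \mid k = 1, \ldots, K_1\}$ and each pair $(i, i')$ of sites with a set of $K_2$ relations $\{d_2^{(k)}(i, i') \mid k = 1, \ldots, K_2\}$. In the model object library, we have model features $\{D_1^{(k)}(I) \mid k = 1, \ldots, K_1,\ I \in \mathcal{L}\}$ and $\{D_2^{(k)}(I, I') \mid k = 1, \ldots, K_2,\ I, I' \in \mathcal{L}\}$ (note that $\mathcal{L}$ excludes the NULL label). According to (5.14), under the labeling $f$, the observation $d$ is a noise-contaminated version of the corresponding model features $D$:
$$d_1(i) = D_1(f_i) + e(i), \qquad d_2(i, i') = D_2(f_i, f_{i'}) + e(i, i'),$$
where $f_i, f_{i'} \neq 0$ are non-NULL matches and $e$ is white Gaussian noise; that is, $d_1(i)$ and $d_2(i, i')$ follow white Gaussian distributions with conditional means $D_1(f_i)$ and $D_2(f_i, f_{i'})$, respectively.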
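The observation model can be illustrated with a small simulation that adds Gaussian noise to the model features of the non-NULL matches. The sketch assumes NULL is encoded as 0, uses one noise standard deviation per feature type (`sigma1`, `sigma2`), and stores features in dictionaries; these choices and all names are hypothetical and not part of the model definition.

```python
import random

NULL = 0  # assumed encoding of the NULL label


def simulate_observation(f, D1, D2, sigma1, sigma2, seed=0):
    """Draw unary and binary observations for the non-NULL matches in f.

    f  : dict site -> label
    D1 : dict label -> list of model properties
    D2 : dict (label, label) -> list of model relations
    Sites labeled NULL are not covered by the noise model and are skipped.
    """
    rng = random.Random(seed)
    matched = [i for i in sorted(f) if f[i] != NULL]
    # Unary properties: model property plus zero-mean Gaussian noise.
    d1 = {i: [rng.gauss(mu, sigma1) for mu in D1[f[i]]] for i in matched}
    d2 = {}
    for i in matched:
        for j in matched:
            # Assumption: pairs without a model relation (for example two
            # sites given the same label) are simply left out.
            if i != j and (f[i], f[j]) in D2:
                d2[(i, j)] = [rng.gauss(mu, sigma2) for mu in D2[(f[i], f[j])]]
    return d1, d2


# Made-up model with two labels, two properties each, and one relation.
D1_model = {1: [0.5, 1.0], 2: [2.0, 3.0]}
D2_model = {(1, 2): [1.5], (2, 1): [1.5]}
labeling = {1: 2, 2: NULL, 3: 1}
d1_obs, d2_obs = simulate_observation(labeling, D1_model, D2_model, 0.1, 0.1)
```

Setting both standard deviations to zero returns the model features themselves, which is a quick way to check that the conditional means are wired up as intended.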
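The posterior energy $E(f) = U(f \mid d)$ takes the form shown in (5.18), rewritten below:
$$E(f) = \sum_{i \in \mathcal{S}} V_1(f_i) + \sum_{i \in \mathcal{S}} \sum_{i' \in \mathcal{N}_i} V_2(f_i, f_{i'}) + \sum_{i \in \mathcal{S}:\, f_i \neq 0} V_1(d_1(i) \mid f_i) + \sum_{i \in \mathcal{S}:\, f_i \neq 0} \ \sum_{i' \in \mathcal{S} - \{i\}:\, f_{i'} \neq 0} V_2(d_2(i, i') \mid f_i, f_{i'})$$
The first and second summations are due to the joint prior probability of the MRF labels $f$; the third and fourth are due to the conditional p.d.f. of $d$, or the likelihood of $f$. Refer to (5.11), (5.12), (5.16) and (5.17).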
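The four summations can be sketched as a short function. The potential forms used here are assumptions for illustration: a constant penalty (`v10`, `v20`) whenever a NULL label is involved in the prior terms, and Gaussian quadratic terms with standard deviations `sigma1` and `sigma2` in the likelihood terms. The definitions in (5.11), (5.12), (5.16) and (5.17) are authoritative; the names, the dictionary layout, and the encoding of NULL as 0 are hypothetical.

```python
NULL = 0  # assumed encoding of the NULL label


def sq_dist(a, b):
    """Sum of squared differences between two equal-length feature lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def posterior_energy(f, neighbors, d1, d2, D1, D2,
                     v10=1.0, v20=1.0, sigma1=1.0, sigma2=1.0):
    """Sum two prior terms and two likelihood terms for the labeling f.

    Assumes D2 holds an entry for every ordered pair of labels that occurs
    among the non-NULL matches, and d2 for every ordered pair of those sites.
    """
    sites = sorted(f)
    # Prior, single-site: assumed penalty v10 for each NULL label.
    prior1 = sum(v10 for i in sites if f[i] == NULL)
    # Prior, pair-site: assumed penalty v20 if either neighbor is NULL.
    prior2 = sum(v20 for i in sites for j in neighbors[i]
                 if f[i] == NULL or f[j] == NULL)
    # Likelihood, unary: quadratic misfit over non-NULL sites.
    like1 = sum(sq_dist(d1[i], D1[f[i]]) / (2 * sigma1 ** 2)
                for i in sites if f[i] != NULL)
    # Likelihood, binary: quadratic misfit over ordered non-NULL pairs.
    like2 = sum(sq_dist(d2[(i, j)], D2[(f[i], f[j])]) / (2 * sigma2 ** 2)
                for i in sites if f[i] != NULL
                for j in sites if j != i and f[j] != NULL)
    return prior1 + prior2 + like1 + like2


# Made-up example: three fully connected sites, two model labels.
neighbors = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
D1 = {1: [1.0], 2: [2.0]}
D2 = {(1, 2): [1.0], (2, 1): [1.0]}
d1 = {1: [1.0], 2: [3.0], 3: [9.0]}
d2 = {(1, 2): [2.0], (2, 1): [1.0]}

energy_matched = posterior_energy({1: 1, 2: 2, 3: NULL}, neighbors, d1, d2, D1, D2)
energy_all_null = posterior_energy({1: NULL, 2: NULL, 3: NULL}, neighbors, d1, d2, D1, D2)
```

In this toy setting the labeling that matches sites 1 and 2 has lower energy than the all-NULL labeling, because its small likelihood misfit costs less than the NULL penalties it avoids. The balance depends entirely on the assumed penalty and noise parameters.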