9.2
Mean field annealing provides another approach for solving combinatorial optimization problems using real computation. Peterson and Soderberg (1989) derive an alternative iteration equation which is consistent with the continuous RL representation.
Let be an unambiguous
labeling assignment in
defined in
(8.24); each
takes the value of one of
the vectors
,
, and
.
Denote the energy with the RL representation still by
. The Gibbs
distribution of the energy under temperature T>0 is given as
where the partition function
sums over . The statistical mean of the distribution is
defined as
As the temperature approaches zero, the mean of the Gibbs distribution
approaches its mode or the global energy minimum
This suggests that instead of minimizing directly, we could try
to evaluate the mean field
at a sufficiently high
temperature and then track it down using the continuation method
[Wasserstrom 1973] as the temperature is lowered towards zero.
The analysis of the partition function is central in mean field theory
because once it is calculated, all statistic information about the
system can be deduced from it. The mean field trick for calculating the
partition function is to (1) rewrite the sum over a discrete space as an
integral over a pair of continuous variables p and q,
and
having the same dimensionality as f, and then
(2) evaluate its integrand at the saddle point (saddle point
approximation). As can be seen later,
p and q corresponds to a labeling assignment as defined in
(8.20) and a gradient as defined in
(8.29), respectively. In the following, f, p and
q are considered as
matrices. To do step (1), we begin by
rewriting any function
as the following integral
where is the Dirac delta function and
is the
dimensional real space. The following definition of
the Dirac delta function is used in the above
where denotes transpose,
is the
dimensional imaginary space and C is a normalizing constant.
Therefore,
Using the above formula, we can rewrite the partition function, which
is the sum of the function as
The summation inside the integrand can be written as
Therefore,
where
is called the effective energy. .
The above integral expression of Z is exact but too complicated for precise calculation. The calculation may be approximated based on the following heuristic argument: The double integral is dominated by saddle points in the integration intervals and therefore the integral is approximated by
where is a saddle point of
. The saddle points are
among the roots of the equations
where is the gradient with respect to p.
This yields the mean field theory equations
The p matrix represents a labeling assignment in the feasibility space defined by (8.19). The mean field theory equations for combinatorial minimization have been derived in many places e.g. in [Yuille 1990,Simic 1990,Elfadel 1993,Haykin 1994,Yang and Kittler 1994].
Note that in terms of the RL representation, the energy can be written
as and so q can be computed as
( cf. 8.29). Now we obtain the following fixed-point iteration
In iterative computation, the temperature T in q is decreased
towards as iteration goes. The unambiguity of p is achieved when
. In [Peterson and Soderberg 1989], the convergence is judged by a
quantity defined as
also called the ``saturation'' of f. The iteration
(9.22) is considered converged if . Analog
network implementation schemes for mean field algorithms are
investigated in [Elfadel 1993]
It is worth pointing out that the idea of mean field annealing for
global optimization has been proposed earlier by [Pinkus 1968] as the
integral limit method. There, the global minimum is expressed in closed
form as a limit of an integral. Assume
that is the closure of a bounded domain in
. Suppose
that
is continuous on
and has a unique global minimum
. Pinkus shows that the following
integral
approaches to the global minimum as
. Obviously, the
above integral is the continuous version of the mean field defined
in (9.9) and the idea of the integral limit is similar
to that of the mean field annealing. Theoretical analysis shows that
this approximation has rather good asymptotic properties but its
realization may need some heuristic trials. Computationally, the values
can be obtained analytically only in exceptional cases and a
numerical method has to be used in general.