9.2
Mean field annealing provides another approach for solving combinatorial optimization problems using real computation. Peterson and Soderberg (1989) derive an alternative iteration equation which is consistent with the continuous RL representation.
Let
be an unambiguous
labeling assignment in
defined in
(8.24); each
takes the value of one of
the vectors
,
, and
.
Denote the energy with the RL representation still by
. The Gibbs
distribution of the energy under temperature T>0 is given as

where the partition function

sums over
. The statistical mean of the distribution is
defined as
As the temperature approaches zero, the mean of the Gibbs distribution
approaches its mode or the global energy minimum

This suggests that instead of minimizing
directly, we could try
to evaluate the mean field
at a sufficiently high
temperature and then track it down using the continuation method
[Wasserstrom 1973] as the temperature is lowered towards zero.
The analysis of the partition function is central in mean field theory
because once it is calculated, all statistic information about the
system can be deduced from it. The mean field trick for calculating the
partition function is to (1) rewrite the sum over a discrete space as an
integral over a pair of continuous variables p and q,
and
having the same dimensionality as f, and then
(2) evaluate its integrand at the saddle point (saddle point
approximation). As can be seen later,
p and q corresponds to a labeling assignment as defined in
(8.20) and a gradient as defined in
(8.29), respectively. In the following, f, p and
q are considered as
matrices. To do step (1), we begin by
rewriting any function
as the following integral

where
is the Dirac delta function and
is the
dimensional real space. The following definition of
the Dirac delta function is used in the above

where
denotes transpose,
is the
dimensional imaginary space and C is a normalizing constant.
Therefore,

Using the above formula, we can rewrite the partition function, which
is the sum of the function
as

The summation inside the integrand can be written as

Therefore,

where

is called the effective energy. .
The above integral expression of Z is exact but too complicated for precise calculation. The calculation may be approximated based on the following heuristic argument: The double integral is dominated by saddle points in the integration intervals and therefore the integral is approximated by

where
is a saddle point of
. The saddle points are
among the roots of the equations

where
is the gradient with respect to p.
This yields the mean field theory equations

The p matrix represents a labeling assignment in the feasibility space defined by (8.19). The mean field theory equations for combinatorial minimization have been derived in many places e.g. in [Yuille 1990,Simic 1990,Elfadel 1993,Haykin 1994,Yang and Kittler 1994].
Note that in terms of the RL representation, the energy can be written
as
and so q can be computed as

( cf. 8.29). Now we obtain the following fixed-point iteration
In iterative computation, the temperature T in q is decreased
towards
as iteration goes. The unambiguity of p is achieved when
. In [Peterson and Soderberg 1989], the convergence is judged by a
quantity defined as
also called the ``saturation'' of f. The iteration
(9.22) is considered converged if
. Analog
network implementation schemes for mean field algorithms are
investigated in [Elfadel 1993]
It is worth pointing out that the idea of mean field annealing for
global optimization has been proposed earlier by [Pinkus 1968] as the
integral limit method. There, the global minimum is expressed in closed
form as a limit of an integral. Assume
that
is the closure of a bounded domain in
. Suppose
that
is continuous on
and has a unique global minimum
. Pinkus shows that the following
integral
approaches to the global minimum
as
. Obviously, the
above integral is the continuous version of the mean field defined
in (9.9) and the idea of the integral limit is similar
to that of the mean field annealing. Theoretical analysis shows that
this approximation has rather good asymptotic properties but its
realization may need some heuristic trials. Computationally, the values
can be obtained analytically only in exceptional cases and a
numerical method has to be used in general.