9.1.2
Figure 9.2: The simulated annealing algorithm.
Consider a system in which any f in the configuration space
has probability
where T>0 is the temperature of the system. In the limit as
,
is a uniform distribution on
; for T=1,
; as
,
is concentrated on the peak(s) of
. Of course, the same is also true with a posterior probability
.
The SA algorithm is described in Fig.9.2. SA applies a
sampling algorithm, such as Metropolis, successively at decreasing values
of temperature T. Initially, T is set very high and f is set to a
random configuration. At a fixed T, the sampling is according to the
Gibbs distribution . After
the Metropolis algorithm converges to the equilibrium at current T,
T is decreased according to a carefully chosen schedule. This iterates
until T is close to 0 when the system is ``frozen'' near the minimum
of
. The cooling schedule, specified by a decrement function and a
final value, plays an important part; see below.
Two convergence theorems are proven by [Geman and Geman 1984]. The first concerns the convergence of the Metropolis algorithm. It says that if every configuration is visited infinitely often, the distribution of generated configurations is guaranteed to converge to the Boltzmann ( i.e.
Gibbs) distribution.
The second is about SA. It states that if the decreasing sequence of temperatures satisfy
and
where , then the system converges to
the global minimum regardless of the initial configuration
.
Note that the above conditions are sufficient but not necessary for the
convergence.
Unfortunately, the schedule (9.4) is too slow to be of practical use. In practice, heuristic, faster schedules have to be used instead. Geman and Geman (1984) adopt the following
where the constant C is set or
for their problem.
Kirkpatrick et al.
(1983) choose
where typically takes a value between
and
. The
initial temperature is set high enough so that essentially all
configuration changes are accepted. At each temperature, enough
configurations are tried that either there are 10m (10 times the
number of sites) accepted configuration transitions or the number of
tries exceeds 100m. The system is frozen and annealing stops if the
desired number of acceptances is not achieved at three successive
temperatures. In [Kato et al. 1993a], a multi-temperature annealing
scheme is proposed for annealing in multi-resolution computation; there,
the temperature in the Gibbs distribution is related not only to the
time but also to the scale.
The two sampling procedures for SA, i.e. Metropolis algorithm [Metropolis and etal 1953] or Gibbs sampler [Geman and Geman 1984], are proven to be asymptotically equivalent for SA performed on lattices but this is not generally true for non-lattice structures [Chiang and Chow 1992]. The interested reader is referred to [Kirkpatrick et al. 1982,Geman and Geman 1984,Aarts 1989] for devoted discussions on SA.
A performance comparison between stochastic SA and deterministic GNC for continuous visual reconstruction is given by [Blake 1989]. Based on his experiments under controlled conditions, Black argues that GNC excels SA both in computational efficiency and problem-solving power. An experimental comparison between SA and deterministic RL algorithms for combinatorial optimization will be presented in Section 9.5.