6.2.3 Expectation-Maximization
The expectation-maximization (EM) algorithm [Dempster et al. 1977] is a general technique for finding maximum likelihood (ML) estimates with incomplete data. In the classic ML, the estimate is obtained from the complete data as

$$\theta^* = \arg\max_{\theta} \ln P(d_{\mathrm{com}} \mid \theta)$$

In EM, the complete data is considered to consist of two parts, $d_{\mathrm{com}} = \{d_{\mathrm{obs}}, d_{\mathrm{mis}}\}$, of which only $d_{\mathrm{obs}}$ is observed while $d_{\mathrm{mis}}$ is missing (or unobservable, hidden). With only the incomplete data $d_{\mathrm{obs}}$, an EM procedure attempts to solve the following ML estimation problem

$$\theta^* = \arg\max_{\theta} \ln P(d_{\mathrm{obs}} \mid \theta)$$

which is more general than the classic ML.
Intuitively, we may envisage a maximum likelihood procedure that, after some initialization for $f$ and $\theta$, iterates between the following two steps until convergence: (1) estimate the missing part as $f = \arg\max_{f} P(f \mid d, \theta)$ given the current estimate $\theta$, and then use it to augment the observed data $d$ to form the complete data set $\{f, d\}$; and (2) estimate $\theta$, with $\{f, d\}$, by maximizing the complete-data log likelihood $\ln P(f, d \mid \theta)$. The simultaneous labeling-estimation algorithm described in the previous two subsections is based on such an idea.
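As an illustration, the following is a minimal sketch of this alternating procedure; `estimate_f` and `estimate_theta` are assumed user-supplied routines (e.g., a MAP labeler and a complete-data ML estimator), not part of the text:

```python
def alternate(theta0, d, estimate_f, estimate_theta, n_iter=50):
    """Ad hoc alternation: fill in f from theta, then re-estimate theta."""
    theta = theta0
    for _ in range(n_iter):
        f = estimate_f(d, theta)        # step (1): fill in the missing part
        theta = estimate_theta(f, d)    # step (2): complete-data ML estimate
    return f, theta
```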
However, we cannot work directly with this log likelihood because it is a random function of the missing variables $f$ (the reason why the above procedure is ad hoc). The idea of the EM algorithm is to use the expectation of the complete-data log likelihood, $E\left[\ln P(f, d \mid \theta)\right]$, which formalizes the above intuitive procedure.
Related to parameter estimation in MRF models, the missing part corresponds to the unobservable labeling $f$, $d_{\mathrm{mis}} = f$, and the observed part is the given data, $d_{\mathrm{obs}} = d$. The complete-data log likelihood is then denoted $\ln P(f, d \mid \theta)$. The EM algorithm consists of the following two steps for each iteration:
(1) The E-step. Compute the following conditional expectation of the log likelihood:

$$Q(\theta \mid \theta^t) = E\left[\ln P(f, d \mid \theta) \mid d, \theta^t\right]$$

(2) The M-step. Maximize $Q(\theta \mid \theta^t)$ to obtain the next estimate:

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta \mid \theta^t)$$

If, in the M-step, the next estimate $\theta^{t+1}$ is chosen only to ensure $Q(\theta^{t+1} \mid \theta^t) \ge Q(\theta^t \mid \theta^t)$ (not necessarily to maximize $Q$), then the algorithm is called a generalized EM (GEM).
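The iteration can be sketched as follows, assuming user-supplied `e_step` and `m_step` routines (illustrative names, not from the text):

```python
import numpy as np

def em(theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop: iterate E- and M-steps until theta stabilizes."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        Q = e_step(theta)                    # E-step: build Q(. | theta^t)
        theta_new = np.asarray(m_step(Q))    # M-step: maximize Q over theta
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta
```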
The E-step computes the conditional expectation of the unobservable labels $f$, given the observed data $d$ and the current estimate $\theta^t$, and then substitutes the expectations for the labels. The M-step performs maximum likelihood estimation as if there were no missing data, i.e., as if it had been filled in by the expectations.
The above describes just an instance of the EM procedure. More generally, the probability $P(f \mid d, \theta)$ can be replaced by any complete-data sufficient statistics. When the prior distribution $P(\theta)$ is known, the M-step can be modified to maximize, instead of the likelihood, the expected posterior $E\left[\ln P(\theta \mid f, d) \mid d, \theta^t\right]$ or, equivalently, $Q(\theta \mid \theta^t) + \ln P(\theta)$; the two differ only by a term independent of $\theta$.
EM is shown to converge to the ML estimate at least locally under some moderate conditions [Wu 1983]. Global convergence has also been proven for special cases, e.g., in [Redner and Walker 1984]. However, the convergence of the EM algorithm can be painfully slow.
Let us be more specific about the expectation $Q(\theta \mid \theta^t)$. For $f$ taking discrete values, which is the case for discrete restoration and for segmentation, the expectation is

$$Q(\theta \mid \theta^t) = \sum_{f \in \mathbb{F}} \ln\left[P(f, d \mid \theta)\right] P(f \mid d, \theta^t)$$

For continuous $f$, which is the case for continuous restoration, it is calculated from the p.d.f.'s

$$Q(\theta \mid \theta^t) = \int_{f \in \mathbb{F}} \ln\left[p(f, d \mid \theta)\right] p(f \mid d, \theta^t)\, df$$
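For a toy problem small enough to enumerate every labeling, the discrete-case sum can be computed directly. A sketch follows, where `joint(f, d, theta)` is an assumed user-supplied function returning $P(f, d \mid \theta)$:

```python
import itertools
import numpy as np

def Q_discrete(theta, theta_t, d, joint, labels=(0, 1)):
    """Brute-force Q(theta | theta_t) over all labelings of len(d) sites."""
    configs = list(itertools.product(labels, repeat=len(d)))
    # Posterior P(f | d, theta_t) obtained by normalizing the joint over f.
    post = np.array([joint(f, d, theta_t) for f in configs])
    post /= post.sum()
    return sum(p * np.log(joint(f, d, theta)) for f, p in zip(configs, post))
```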
Consider the homogeneous and isotropic auto-logistic MRF model described by (6.7) and (6.8). We have

$$P(f \mid \theta) = \frac{1}{Z} \exp\left\{ -\sum_{i} \alpha f_i - \sum_{i} \sum_{i' \in \mathcal{N}_i} \beta f_i f_{i'} \right\}$$

Assume that the data is obtained according to $d_i = f_i + e_i$ with independent Gaussian noise $e_i \sim N(0, \sigma^2)$. Then

$$p(d \mid f, \theta) = \prod_{i} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(d_i - f_i)^2}{2\sigma^2} \right\}$$

where $\theta = \{\alpha, \beta, \sigma^2\}$. The expectation is

$$Q(\theta \mid \theta^t) = \sum_{f \in \mathbb{F}} \ln\left[ p(d \mid f, \theta)\, P(f \mid \theta) \right] P(f \mid d, \theta^t)$$
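A sketch of this complete-data likelihood on a tiny one-dimensional chain, where the partition function $Z$ can still be evaluated exactly by enumeration (all names illustrative; `theta = (alpha, beta, sigma2)`):

```python
import itertools
import numpy as np

def energy(f, alpha, beta):
    """Auto-logistic energy on a chain; each neighbor pair counted once."""
    f = np.asarray(f, dtype=float)
    return alpha * f.sum() + beta * np.sum(f[:-1] * f[1:])

def joint(f, d, theta):
    """P(f, d | theta) = P(f | theta) * p(d | f, theta) for binary f."""
    alpha, beta, sigma2 = theta
    # Exact partition function: sum e^{-U(g)} over all 2^n labelings g.
    Z = sum(np.exp(-energy(g, alpha, beta))
            for g in itertools.product((0, 1), repeat=len(d)))
    prior = np.exp(-energy(f, alpha, beta)) / Z
    resid = np.asarray(d, dtype=float) - np.asarray(f, dtype=float)
    lik = np.prod(np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))
    return prior * lik
```

This `joint` can be plugged into the `Q_discrete` sketch above; its exponential cost in the number of sites is exactly the difficulty discussed next.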
A main difficulty in applying EM to MRFs is again the complexity in the computation of $Q(\theta \mid \theta^t)$, due to the need to evaluate the partition function for $P(f \mid \theta)$. This difficulty may be overcome by using the pseudo-likelihood method [Chalmond 1989] or the mean field approximation [Zhang 1992; Zhang 1993].
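As a sketch of the pseudo-likelihood idea (not the specific scheme of [Chalmond 1989]), the prior $P(f \mid \theta)$ is replaced by a product of local conditionals, each normalized over a single site so that the global $Z$ never appears; for the binary auto-logistic chain above:

```python
import numpy as np

def log_pseudo_likelihood(f, alpha, beta):
    """Sum of log local conditionals ln P(f_i | f_{N_i}) on a chain."""
    f = np.asarray(f, dtype=float)
    # Neighbor sums s_i = sum of labels adjacent to site i (ends padded).
    s = np.zeros_like(f)
    s[1:] += f[:-1]
    s[:-1] += f[1:]
    # For f_i in {0, 1}: P(f_i = 1 | .) = e^{eta_i} / (1 + e^{eta_i}),
    # with eta_i = -(alpha + beta * s_i) under the e^{-U} convention
    # used in the energy() sketch above.
    eta = -(alpha + beta * s)
    return np.sum(f * eta - np.log1p(np.exp(eta)))
```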