6.2.3 Expectation-Maximization
The expectation-maximization (EM) algorithm [Dempster et al. 1977] is a general technique for finding maximum likelihood (ML) estimates with incomplete data. In the classic ML, the estimate is obtained from the complete data as

$$\theta^* = \arg\max_{\theta} \ln P(d_{\mathrm{com}} \mid \theta)$$

In EM, the complete data is considered to consist of two parts, $d_{\mathrm{com}} = \{d_{\mathrm{obs}}, d_{\mathrm{mis}}\}$, of which only $d_{\mathrm{obs}}$ is observed while $d_{\mathrm{mis}}$ is missing (or unobservable, hidden). With only the incomplete data $d_{\mathrm{obs}}$, an EM procedure attempts to solve the following ML estimation problem

$$\theta^* = \arg\max_{\theta} \ln P(d_{\mathrm{obs}} \mid \theta)$$

which is more general than the classic ML.
Intuitively, we may envisage a maximum likelihood procedure that, after some initialization for $f$ and $\theta$, iterates between the following two steps until convergence: (1) estimate the missing part as $f = \arg\max_{f} P(f \mid d, \theta)$ given the current estimate $\theta$, and then use it to augment the observed data $d$ to form the complete data set $\{f, d\}$; and (2) estimate $\theta$, with $\{f, d\}$, by maximizing the complete-data log likelihood $\ln P(f, d \mid \theta)$. The simultaneous labeling-estimation algorithm described in the previous two subsections is based on such an idea.
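As an illustration, the following is a minimal sketch of this alternating procedure; `estimate_f` and `estimate_theta` are assumed user-supplied routines (e.g., a MAP labeler and a complete-data ML estimator), not part of the text:

```python
def alternate(theta0, d, estimate_f, estimate_theta, n_iter=50):
    """Ad hoc alternation: fill in f from theta, then re-estimate theta."""
    theta = theta0
    for _ in range(n_iter):
        f = estimate_f(d, theta)        # step (1): fill in the missing part
        theta = estimate_theta(f, d)    # step (2): complete-data ML estimate
    return f, theta
```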
However, we cannot work directly with this log likelihood because it is a random function of the missing variables $f$ (the reason why the above procedure is ad hoc). The idea of the EM algorithm is to use the expectation of the complete-data log likelihood, $E\left[\ln P(f, d \mid \theta)\right]$, which formalizes the above intuitive procedure.
Related to parameter estimation in MRF models, the missing part corresponds to the unobservable labeling $f$, $d_{\mathrm{mis}} = f$, and the observed part is the given data, $d_{\mathrm{obs}} = d$. The complete-data log likelihood is then denoted $\ln P(f, d \mid \theta)$. The EM algorithm consists of the following two steps for each iteration:
(1) The E-step. Compute the following conditional expectation of the log likelihood:

$$Q(\theta \mid \theta^t) = E\left[\ln P(f, d \mid \theta) \mid d, \theta^t\right]$$

(2) The M-step. Maximize $Q(\theta \mid \theta^t)$ to obtain the next estimate:

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta \mid \theta^t)$$

If, in the M-step, the next estimate $\theta^{t+1}$ is chosen only to ensure $Q(\theta^{t+1} \mid \theta^t) \ge Q(\theta^t \mid \theta^t)$ (not necessarily to maximize $Q$), then the algorithm is called a generalized EM (GEM).
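The iteration can be sketched as follows, assuming user-supplied `e_step` and `m_step` routines (illustrative names, not from the text):

```python
import numpy as np

def em(theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop: iterate E- and M-steps until theta stabilizes."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        Q = e_step(theta)                    # E-step: build Q(. | theta^t)
        theta_new = np.asarray(m_step(Q))    # M-step: maximize Q over theta
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta
```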
The E-step computes the conditional expectation of the unobservable labels $f$, given the observed data $d$ and the current estimate $\theta^t$, and then substitutes the expectations for the labels. The M-step performs maximum likelihood estimation as if there were no missing data, i.e., as if it had been filled in by the expectations.
The above describes just an instance of the EM procedure. More generally, the probability $P(f \mid d, \theta)$ can be replaced by any complete-data sufficient statistics. When the prior distribution $P(\theta)$ is known, the M-step can be modified to maximize, instead of the likelihood, the expected posterior $E\left[\ln P(\theta \mid f, d) \mid d, \theta^t\right]$ or, equivalently, $Q(\theta \mid \theta^t) + \ln P(\theta)$; the two differ only by a term independent of $\theta$.
EM is shown to converge to the ML estimate at least locally under some moderate conditions [Wu 1983]. Global convergence has also been proven for special cases, e.g., in [Redner and Walker 1984]. However, the convergence of the EM algorithm can be painfully slow.
Let us be more specific about the expectation $Q(\theta \mid \theta^t)$. For $f$ taking discrete values, which is the case for discrete restoration and for segmentation, the expectation is

$$Q(\theta \mid \theta^t) = \sum_{f \in \mathbb{F}} \ln\left[P(f, d \mid \theta)\right] P(f \mid d, \theta^t)$$

For continuous $f$, which is the case for continuous restoration, it is calculated from the p.d.f.'s

$$Q(\theta \mid \theta^t) = \int_{f \in \mathbb{F}} \ln\left[p(f, d \mid \theta)\right] p(f \mid d, \theta^t)\, df$$
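For a toy problem small enough to enumerate every labeling, the discrete-case sum can be computed directly. A sketch follows, where `joint(f, d, theta)` is an assumed user-supplied function returning $P(f, d \mid \theta)$:

```python
import itertools
import numpy as np

def Q_discrete(theta, theta_t, d, joint, labels=(0, 1)):
    """Brute-force Q(theta | theta_t) over all labelings of len(d) sites."""
    configs = list(itertools.product(labels, repeat=len(d)))
    # Posterior P(f | d, theta_t) obtained by normalizing the joint over f.
    post = np.array([joint(f, d, theta_t) for f in configs])
    post /= post.sum()
    return sum(p * np.log(joint(f, d, theta)) for f, p in zip(configs, post))
```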
Consider the homogeneous and isotropic auto-logistic MRF model described by (6.7) and (6.8). We have

$$P(f \mid \theta) = \frac{1}{Z} \exp\left\{ -\sum_{i} \alpha f_i - \sum_{i} \sum_{i' \in \mathcal{N}_i} \beta f_i f_{i'} \right\}$$

Assume that the data is obtained according to $d_i = f_i + e_i$ with independent Gaussian noise $e_i \sim N(0, \sigma^2)$. Then

$$p(d \mid f, \theta) = \prod_{i} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(d_i - f_i)^2}{2\sigma^2} \right\}$$

where $\theta = \{\alpha, \beta, \sigma^2\}$. The expectation is

$$Q(\theta \mid \theta^t) = \sum_{f \in \mathbb{F}} \ln\left[ p(d \mid f, \theta)\, P(f \mid \theta) \right] P(f \mid d, \theta^t)$$
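A sketch of this complete-data likelihood on a tiny one-dimensional chain, where the partition function $Z$ can still be evaluated exactly by enumeration (all names illustrative; `theta = (alpha, beta, sigma2)`):

```python
import itertools
import numpy as np

def energy(f, alpha, beta):
    """Auto-logistic energy on a chain; each neighbor pair counted once."""
    f = np.asarray(f, dtype=float)
    return alpha * f.sum() + beta * np.sum(f[:-1] * f[1:])

def joint(f, d, theta):
    """P(f, d | theta) = P(f | theta) * p(d | f, theta) for binary f."""
    alpha, beta, sigma2 = theta
    # Exact partition function: sum e^{-U(g)} over all 2^n labelings g.
    Z = sum(np.exp(-energy(g, alpha, beta))
            for g in itertools.product((0, 1), repeat=len(d)))
    prior = np.exp(-energy(f, alpha, beta)) / Z
    resid = np.asarray(d, dtype=float) - np.asarray(f, dtype=float)
    lik = np.prod(np.exp(-resid**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))
    return prior * lik
```

This `joint` can be plugged into the `Q_discrete` sketch above; its exponential cost in the number of sites is exactly the difficulty discussed next.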
A main difficulty in applying EM to MRFs is again the complexity in the computation of $Q(\theta \mid \theta^t)$, due to the need to evaluate the partition function for $P(f \mid \theta)$. This difficulty may be overcome by using the pseudo-likelihood method [Chalmond 1989] or the mean field approximation [Zhang 1992; Zhang 1993].
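As a sketch of the pseudo-likelihood idea (not the specific scheme of [Chalmond 1989]), the prior $P(f \mid \theta)$ is replaced by a product of local conditionals, each normalized over a single site so that the global $Z$ never appears; for the binary auto-logistic chain above:

```python
import numpy as np

def log_pseudo_likelihood(f, alpha, beta):
    """Sum of log local conditionals ln P(f_i | f_{N_i}) on a chain."""
    f = np.asarray(f, dtype=float)
    # Neighbor sums s_i = sum of labels adjacent to site i (ends padded).
    s = np.zeros_like(f)
    s[1:] += f[:-1]
    s[:-1] += f[1:]
    # For f_i in {0, 1}: P(f_i = 1 | .) = e^{eta_i} / (1 + e^{eta_i}),
    # with eta_i = -(alpha + beta * s_i) under the e^{-U} convention
    # used in the energy() sketch above.
    eta = -(alpha + beta * s)
    return np.sum(f * eta - np.log1p(np.exp(eta)))
```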