
6.1.1 Maximum Likelihood

Given a realization f of a single MRF, the maximum likelihood (ML) estimate $\theta^*$ maximizes the conditional probability $P(f \mid \theta)$ (the likelihood of $\theta$), that is,

\[
\theta^* = \arg\max_{\theta} P(f \mid \theta)
\]

or its log likelihood $\ln P(f \mid \theta)$. Note that in this case, f is the data used for the estimation. When the prior p.d.f. of the parameters, $p(\theta)$, is known, the MAP estimate, which maximizes the posterior density

\[
P(\theta \mid f) \propto p(\theta)\, P(f \mid \theta),
\]

can be sought. When the prior p.d.f. is unknown, a rather diffuse choice has to be made; the prior is assumed to be flat when prior information is totally unavailable. In this case the MAP estimation reduces to the ML estimation.

Let us review the idea of the ML parameter estimation using a simple example. Let $f = \{f_1, \ldots, f_m\}$ be a realization of $m$ independent, identically distributed (i.i.d.) Gaussian random variables with the two parameters, the mean $\mu$ and the variance $\sigma^2$, that is, $f_i \sim N(\mu, \sigma^2)$. We want to estimate the two parameters, $\theta = \{\mu, \sigma^2\}$, from f. The likelihood function of $\theta$ for fixed f is

\[
P(f \mid \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left\{-\frac{(f_i - \mu)^2}{2\sigma^2}\right\}
  = (2\pi\sigma^2)^{-m/2}
  \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(f_i - \mu)^2\right\}
\]

A necessary condition for maximizing $P(f \mid \theta)$, or equivalently maximizing $\ln P(f \mid \theta)$, is $\partial \ln P(f \mid \theta)/\partial \mu = 0$ and $\partial \ln P(f \mid \theta)/\partial \sigma^2 = 0$. Solving this, we find the ML estimate as

\[
\hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} f_i
\]

and

\[
\hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m}(f_i - \hat{\mu})^2
\]

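To make the closed-form estimates concrete, the following is a minimal Python sketch (using NumPy, which the text does not assume; the function name gaussian_ml_estimate is ours for illustration) that computes $\hat{\mu}$ and $\hat{\sigma}^2$ from a sample and checks them against the generating parameters.

\begin{verbatim}
import numpy as np

def gaussian_ml_estimate(f):
    """ML estimates of (mu, sigma^2) for i.i.d. Gaussian samples f_1,...,f_m."""
    f = np.asarray(f, dtype=float)
    m = f.size
    mu_hat = f.sum() / m                      # sample mean
    var_hat = np.sum((f - mu_hat) ** 2) / m   # ML variance (divides by m, not m-1)
    return mu_hat, var_hat

# Usage: draw m samples from N(mu, sigma^2) and recover the parameters.
rng = np.random.default_rng(0)
f = rng.normal(loc=2.0, scale=1.5, size=10_000)
print(gaussian_ml_estimate(f))   # approximately (2.0, 2.25)
\end{verbatim}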
A more general result can be obtained for the m-variate Gaussian distribution which, when restricted by the Markovianity, is also known as the auto-normal [Besag 1974] MRF (cf. Section 1.3.1). Assuming a zero mean, $\mu = 0$, in (1.47), and thus $\theta$ consisting only of $\{B, \sigma^2\}$ where $B$ is the $m \times m$ interaction matrix, the log likelihood function for the auto-normal field is

\[
\ln P(f \mid \theta) = -\frac{f^{T} B f}{2\sigma^2}
  + \frac{1}{2}\ln \det(B) - \frac{m}{2}\ln(2\pi\sigma^2)
\]

where f is considered as a vector. Maximizing this needs to evaluate the determinant $\det(B)$. The ML estimation for the Gaussian cases is usually less involved than that for general MRFs.
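As a sketch of the computation involved (not part of the text; the function name and the use of NumPy are illustrative assumptions), the log likelihood above can be evaluated as follows. The slogdet call makes the $\det(B)$ evaluation explicit, which is the costly step for large m.

\begin{verbatim}
import numpy as np

def auto_normal_log_likelihood(f, B, sigma2):
    """log P(f | theta) for a zero-mean auto-normal (Gaussian) MRF.

    f      : length-m vector (the field scanned into a vector)
    B      : m x m interaction matrix; must be symmetric positive definite
             for the density to be proper
    sigma2 : variance parameter
    """
    f = np.asarray(f, dtype=float)
    m = f.size
    sign, logdet = np.linalg.slogdet(B)   # evaluating det(B) is the costly part
    assert sign > 0, "B must be positive definite"
    return (-f @ B @ f / (2.0 * sigma2)
            + 0.5 * logdet
            - 0.5 * m * np.log(2.0 * np.pi * sigma2))
\end{verbatim}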

The ML estimation for an MRF in general needs to evaluate the normalizing partition function in the corresponding Gibbs distribution. Consider a homogeneous and isotropic auto-logistic model in the 4-neighborhood system with the parameters $\theta = \{\alpha, \beta\}$ (this MRF will also be used in subsequent sections for illustrations). According to (1.39) and (1.40), its energy function and conditional probability are, respectively,

\[
U(f \mid \theta) = \sum_{\{i\} \in \mathcal{C}_1} \alpha f_i
  + \sum_{\{i,i'\} \in \mathcal{C}_2} \beta f_i f_{i'}
\]

and

\[
P(f_i \mid f_{\mathcal{N}_i}, \theta) =
  \frac{\exp\!\left\{-\alpha f_i - \sum_{i' \in \mathcal{N}_i} \beta f_i f_{i'}\right\}}
       {\sum_{f_i \in \{0,1\}} \exp\!\left\{-\alpha f_i - \sum_{i' \in \mathcal{N}_i} \beta f_i f_{i'}\right\}}
\]

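The two formulae above can be spelled out for a binary image on a rectangular lattice. The following Python sketch (illustrative only; the array layout and function names are our assumptions) computes $U(f \mid \theta)$ by counting each horizontal and vertical pair clique once, and evaluates the site-wise conditional probability $P(f_i = 1 \mid f_{\mathcal{N}_i}, \theta)$ under the convention $P(f) \propto \exp\{-U(f)\}$.

\begin{verbatim}
import numpy as np

def energy(f, alpha, beta):
    """U(f | theta) for the homogeneous, isotropic auto-logistic model
    on a binary lattice f (values 0/1) with the 4-neighborhood system.
    Each pair clique {i, i'} is counted once (right and down neighbors)."""
    single = alpha * f.sum()
    pair = beta * (np.sum(f[:, :-1] * f[:, 1:]) + np.sum(f[:-1, :] * f[1:, :]))
    return single + pair

def conditional_prob_one(f, i, j, alpha, beta):
    """P(f_ij = 1 | f_{N_ij}, theta), using P(f) proportional to exp(-U(f))."""
    h, w = f.shape
    s = sum(f[r, c] for r, c in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
            if 0 <= r < h and 0 <= c < w)    # sum of the 4-neighbor labels
    e1 = np.exp(-(alpha + beta * s))         # unnormalized probability of f_ij = 1
    e0 = 1.0                                 # exp(0), the label f_ij = 0
    return e1 / (e0 + e1)
\end{verbatim}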
The likelihood function is in the Gibbs form

\[
P(f \mid \theta) = Z^{-1}(\theta)\, e^{-U(f \mid \theta)}
\]

where the partition function

\[
Z(\theta) = \sum_{f \in \mathbb{F}} e^{-U(f \mid \theta)}
\]

is also a function of $\theta$. Maximizing $P(f \mid \theta)$ with respect to $\theta$ requires evaluating the partition function $Z(\theta)$. However, the computation of $Z(\theta)$ is intractable even for moderately sized problems because the configuration space $\mathbb{F}$ contains a combinatorial number of elements. This is a main difficulty in parameter estimation for MRFs. Approximate formulae will be used for solving this problem.
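A brute-force evaluation of $Z(\theta)$ makes the combinatorial difficulty tangible. The sketch below (illustrative only, for the binary auto-logistic model above) enumerates all $2^m$ configurations of a tiny lattice; this is already expensive at $4 \times 4$ and hopeless at realistic image sizes.

\begin{verbatim}
import itertools
import numpy as np

def partition_function(shape, alpha, beta):
    """Z(theta) = sum over all 2^m binary configurations f of exp(-U(f | theta)),
    with U the auto-logistic energy above. Exhaustive enumeration is feasible
    only for toy lattices; this is the intractability discussed in the text."""
    h, w = shape
    Z = 0.0
    for labels in itertools.product((0, 1), repeat=h * w):   # |F| = 2^m terms
        f = np.array(labels, dtype=float).reshape(h, w)
        U = alpha * f.sum() + beta * (np.sum(f[:, :-1] * f[:, 1:])
                                      + np.sum(f[:-1, :] * f[1:, :]))
        Z += np.exp(-U)
    return Z

# A 4 x 4 lattice already needs 2^16 = 65,536 energy evaluations;
# a modest 32 x 32 image would need 2^1024 terms.
print(partition_function((4, 4), alpha=0.5, beta=-1.0))
\end{verbatim}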

The approximate formulae are based on the conditional probabilities $P(f_i \mid f_{\mathcal{N}_i})$, $i \in \mathcal{S}$. (The notation of the parameters, $\theta$, on which the probabilities are conditioned, is dropped temporarily for clarity.) Write the energy function in the form

\[
U(f) = U_i(f_i, f_{\mathcal{N}_i}) + U(f_{\mathcal{S} \setminus \{i\}})
\]

Here, $U_i(f_i, f_{\mathcal{N}_i})$ depends on the configuration on the cliques involving i and $\mathcal{N}_i$, in which the labels $f_i$ and $f_{\mathcal{N}_i}$ are mutually dependent, while $U(f_{\mathcal{S} \setminus \{i\}})$ does not depend on $f_i$. If only single- and pair-site cliques are considered, then

\[
U_i(f_i, f_{\mathcal{N}_i}) = V_1(f_i) + \sum_{i' \in \mathcal{N}_i} V_2(f_i, f_{i'})
\]

The conditional probability (1.29) can be written as

\[
P(f_i \mid f_{\mathcal{N}_i}) =
  \frac{\exp\{-U_i(f_i, f_{\mathcal{N}_i})\}}
       {\sum_{f_i \in \mathbb{L}} \exp\{-U_i(f_i, f_{\mathcal{N}_i})\}}
\]

Based on this, approximate formulae for the joint probability $P(f)$ are given in the following subsections.
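The last formula is the building block of the approximations that follow. As an illustrative sketch (the function names and calling convention are ours), the conditional probability can be computed directly from user-supplied single-site and pair-site potentials $V_1$ and $V_2$; the auto-logistic model above is recovered with $V_1(f_i) = \alpha f_i$ and $V_2(f_i, f_{i'}) = \beta f_i f_{i'}$.

\begin{verbatim}
import math

def conditional_prob(fi, neighbor_labels, labels, V1, V2):
    """P(f_i | f_{N_i}) built from the local energy
    U_i(f_i, f_{N_i}) = V1(f_i) + sum_{i' in N_i} V2(f_i, f_{i'}).
    V1 and V2 are user-supplied clique potential functions;
    `labels` is the label set L."""
    def U_i(x):
        return V1(x) + sum(V2(x, fn) for fn in neighbor_labels)
    num = math.exp(-U_i(fi))
    den = sum(math.exp(-U_i(x)) for x in labels)
    return num / den

# Usage: the auto-logistic model with alpha = 0.5, beta = -1.0 over L = {0, 1}.
alpha, beta = 0.5, -1.0
p = conditional_prob(1, [1, 0, 1, 1], labels=(0, 1),
                     V1=lambda x: alpha * x,
                     V2=lambda x, y: beta * x * y)
print(p)
\end{verbatim}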


