1.4.4
In formal models, as opposed to heuristic ones, an energy function is formulated based on an established criterion. Because of inevitable uncertainties in vision processes, principles from statistics, probability and information theory are often used as the formal basis.
When we have knowledge of the data distribution but no appreciable prior information about the quantity being estimated, we may use the maximum likelihood (ML) criterion. When the situation is the opposite, that is, when we have only prior information, we may use the maximum entropy criterion. Distributions of higher entropy are more likely because nature can generate them in more ways, and the maximum entropy criterion simply takes this fact into account [Jaynes 1982].
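The counting argument behind maximum entropy can be checked numerically. The sketch below is an illustration, not part of the original text. For N draws over K outcomes, the number of ordered sequences that produce a histogram (n_1, ..., n_K) is the multinomial coefficient W = N! / (n_1! ... n_K!), and (1/N) log W approaches the Shannon entropy of the empirical distribution n_i / N as N grows. The two histograms are arbitrary choices made only for illustration.

```python
import math

def log_multiplicity(counts):
    """Natural log of N! / prod(n_i!): the number of ordered sequences
    of length N = sum(counts) that produce this histogram."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution n_i / N."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

# Two hypothetical histograms over K = 3 outcomes, each with N = 3000 draws.
uniform = [1000, 1000, 1000]
skewed = [2000, 700, 300]

for name, h in [("uniform", uniform), ("skewed", skewed)]:
    n = sum(h)
    print(f"{name}: H = {entropy(h):.4f} nats, "
          f"(1/N) log W = {log_multiplicity(h) / n:.4f}")
```

Because log W is roughly N times the entropy, the higher-entropy histogram can be generated by exponentially more sequences than the lower-entropy one, which is the sense in which nature "can generate them in more ways".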
Neither of these two methods is adequate for problems where both the prior and the likelihood distributions are known. With both sources of information available, the best estimate we can obtain is the one that maximizes a Bayes criterion. Two forms of such an estimate are often used in practice: the maximum a posteriori (MAP) estimate, which maximizes the posterior probability, and the posterior mean. The maximizer of the posterior marginals (MPM) [Marroquin 1985; Marroquin et al. 1987] provides an alternative Bayes estimator. Although there have been philosophical and scientific controversies about their appropriateness in inference and decision making (see [Clark and Yuille 1990] for a short review), Bayes criteria are among the most popular ones in computer vision; in fact, MAP is the most popular criterion in optimization-based MRF modeling. The equivalence theorem between Markov random fields and Gibbs distributions established in Section 1.2.4 provides a convenient way of specifying the joint prior probability, resolving a difficult issue in MAP-MRF labeling.
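The three Bayes estimators can disagree even on a tiny problem. The following is a minimal sketch with made-up numbers: two sites, each taking a binary label, a hypothetical prior that favors equal neighboring labels, and a hypothetical likelihood for one fixed observation. The posterior is formed by Bayes' rule and the MAP, posterior-mean and MPM estimates are computed by brute-force enumeration, which is only feasible because the label space here has four elements.

```python
import itertools

# Hypothetical toy problem: two sites, each with a binary label, so a
# labeling f = (f1, f2) takes one of four values.
labelings = list(itertools.product([0, 1], repeat=2))

# Hypothetical prior P(f): favors equal neighboring labels (smoothness).
prior = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}

# Hypothetical likelihood P(d | f) for one fixed observation d.
likelihood = {(0, 0): 0.4, (0, 1): 0.075, (1, 0): 0.45, (1, 1): 0.25}

# Bayes' rule: P(f | d) = P(d | f) P(f) / P(d), with P(d) the normalizer.
joint = {f: likelihood[f] * prior[f] for f in labelings}
evidence = sum(joint.values())
posterior = {f: joint[f] / evidence for f in labelings}

# MAP: the single labeling with the largest posterior probability.
f_map = max(labelings, key=posterior.get)

# Posterior mean: expected label at each site (need not be a valid label).
f_mean = tuple(sum(posterior[f] * f[i] for f in labelings) for i in range(2))

def marginal(site, label):
    """Posterior marginal P(f_site = label | d)."""
    return sum(p for f, p in posterior.items() if f[site] == label)

# MPM: at each site independently, the label with the largest marginal.
f_mpm = tuple(max([0, 1], key=lambda lab: marginal(i, lab)) for i in range(2))

print("MAP:", f_map)
print("posterior mean:", f_mean)
print("MPM:", f_mpm)
```

With these numbers the posterior is (0.40, 0.05, 0.30, 0.25) over (0,0), (0,1), (1,0), (1,1). MAP picks (0, 0), MPM picks (1, 0) because the marginal of the first site favors label 1 (0.55), and the posterior mean (0.55, 0.30) is not a binary labeling at all.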
In the principle of minimum description length (MDL) [Rissanen 1978; Rissanen 1983], the optimal solution to a problem is the one that needs the smallest vocabulary in a given language to explain the input data. MDL is closely related to statistical criteria such as ML and MAP [Rissanen 1983]. For example, if the prior probability of a solution is related to its description length and the likelihood of the data is related to the description error, then MDL is equivalent to MAP. However, MDL is more natural and intuitive when prior probabilities are not well defined. MDL has been used for vision problems at different levels, such as segmentation [Leclerc 1989; Pentland 1990; Darrell et al. 1990; Dengler 1991; Keeler 1991] and object recognition [Breuel 1993].
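The MDL-MAP correspondence follows from the fact that an ideal code spends -log2 P bits on an event of probability P. Writing f for a candidate solution and d for the data (notation assumed here), the total description length -log2 P(f) - log2 P(d | f) equals -log2 P(f | d) - log2 P(d); since P(d) does not depend on f, minimizing the description length is the same as maximizing the posterior. The sketch below checks this on hypothetical candidates whose bit counts are invented for illustration.

```python
import math

# Hypothetical candidate descriptions of the same data, e.g. segmentations
# with different numbers of regions. For each: bits needed to encode the
# model itself, and bits needed to encode the data's residual error given it.
candidates = {
    "1 region": {"model_bits": 2.0, "error_bits": 40.0},
    "2 regions": {"model_bits": 5.0, "error_bits": 22.0},
    "4 regions": {"model_bits": 9.0, "error_bits": 20.0},
}

def description_length(c):
    """Total code length in bits: model plus residual error."""
    return c["model_bits"] + c["error_bits"]

# MDL: choose the shortest total description.
mdl_choice = min(candidates, key=lambda k: description_length(candidates[k]))

# Reinterpret code lengths as probabilities: P(f) proportional to
# 2**-model_bits and P(d | f) proportional to 2**-error_bits.
unnormalized = {k: 2.0 ** -c["model_bits"] * 2.0 ** -c["error_bits"]
                for k, c in candidates.items()}
z = sum(unnormalized.values())
posterior = {k: v / z for k, v in unnormalized.items()}

# MAP: choose the candidate with the highest posterior probability.
map_choice = max(posterior, key=posterior.get)

print("MDL choice:", mdl_choice)
print("MAP choice:", map_choice)
for k, c in candidates.items():
    print(f"{k}: {description_length(c):.0f} bits, "
          f"-log2 posterior = {-math.log2(posterior[k]):.3f}")
```

The one-region model is cheap to describe but leaves a large error (42 bits in total), while the four-region model fits slightly better but costs more model bits (29 bits); both criteria select the two-region model (27 bits). The normalizer z only shifts every -log2 posterior by the same constant, so it never changes the ranking.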