7.1 Introduction
Object recognition systems almost inevitably involve parameters such as thresholds, bounds and weights [Grimson 1990]. In optimization-based object recognition, where the optimal recognition solution is explicitly defined as the global extreme of an objective function, these parameters can be part of the definition of the objective function by which the global cost (or gain) of the solution is measured. The selection of the parameters is crucial for a system to perform successfully.
Among all admissible parameter estimates, only a subset lead to desirable, correct recognition solutions. Among the correct estimates, a smaller number are better in the sense that they lead to correct solutions for a larger variety of data sets. One of them may be optimal in the sense that it makes the vision procedure the most stable to uncertainties and the least prone to local optima in the search for the global optimum.
The manual method performs parameter estimation in an ad hoc way, by trial and error: a combination of parameters is selected, the objective function is optimized, the optimum is compared with the result the designer perceives as desirable, and the selection is adjusted; this process is repeated until a satisfactory choice, one that makes the optimum consistent with the desirable result, is found. This is a process of supervised learning from examples. When the objective function takes the right functional form, a correct manual selection can be made for a small number of data sets. However, there is no reason to believe that a manual selection is optimal, or even good, and such empirical methods have been criticized for their ad hoc nature.
This chapter develops an automated, optimal approach for parameter estimation (in this chapter, parameter ``selection'', ``estimation'' and ``learning'' are used interchangeably) in optimization-based object recognition. A theory of parameter estimation based on supervised learning is presented. The learning is ``supervised'' because exemplars are given: each exemplary instance represents a desirable recognition result, where a recognition result is a labeling of the scene in terms of the model objects. Correctness and optimality are proposed as the two-level criteria for evaluating parameter estimates.
A correct selection of parameters ensures that the configuration given by each exemplary instance is embedded as the unique global energy minimum; in other words, if the selection is incorrect, the exemplary configuration does not correspond to a global minimum. While a correct estimate can be learned from the exemplars, it is generally not the only correct one. Instability is defined as a measure of the ease with which the global minimum is replaced by a non-exemplar labeling after a perturbation of the input. The optimality criterion minimizes the instability, so as to maximize the ability of the estimated parameters to generalize to situations not directly represented by the exemplars.
Combining the two criteria gives a constrained minimization problem: minimize the instability subject to the correctness constraints. A non-parametric algorithm is presented for learning an estimate that is both correct and optimal. It makes no assumption about the underlying distributions and is useful when the amount of training data obtained from the exemplars is small or when the underlying parametric models are inaccurate. The estimate thus obtained is optimal with respect to the training data.
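To make the two criteria concrete, suppose (as an illustrative assumption, not the chapter's exact formulation) that the energy is linear in the parameters, E(f; theta) = theta . U(f), where U(f) collects feature statistics of a labeling f. Correctness then reduces to the linear inequalities theta . (U(f) - U(f*)) > 0 for every non-exemplar labeling f, so that the exemplar f* has strictly the lowest energy, and the smallest such margin can serve as an (inverse) measure of instability. A minimal perceptron-style sketch of learning such a theta:

```python
import numpy as np

def learn_correct_parameters(u_exemplar, u_others, lr=0.1, epochs=1000):
    """Search for a parameter vector theta satisfying the correctness
    constraints, under a hypothetical linear energy E(f; theta) = theta.U(f).

    Correctness: theta . (U(f) - U(f*)) > 0 for every competitor f,
    i.e. the exemplar labeling f* is the unique global energy minimum.
    """
    diffs = np.asarray(u_others, dtype=float) - np.asarray(u_exemplar, dtype=float)
    # Scale of theta is irrelevant to the minimization, so fix |theta| = 1.
    theta = np.ones(diffs.shape[1]) / np.sqrt(diffs.shape[1])
    for _ in range(epochs):
        margins = diffs @ theta           # one margin per competing labeling
        violated = margins <= 0
        if not violated.any():            # all correctness constraints hold
            break
        # Nudge theta toward satisfying the violated constraints.
        theta = theta + lr * diffs[violated].sum(axis=0)
        theta = theta / np.linalg.norm(theta)
    # The minimal margin is a crude (inverse) instability measure:
    # the smaller it is, the easier a perturbation flips the global minimum.
    return theta, float((diffs @ theta).min())

# Toy example: two feature statistics, exemplar vs. three competing labelings.
u_star = np.array([1.0, 2.0])
u_others = [np.array([2.0, 2.5]), np.array([1.5, 3.0]), np.array([3.0, 2.2])]
theta, margin = learn_correct_parameters(u_star, u_others)
```

A positive minimal margin certifies a correct estimate on the training data; the chapter's algorithm goes further and minimizes the instability over all correct estimates, which this simple update rule does not attempt.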
The theory is applied to the specific MRF recognition model proposed in [Li 1994a]. The objective function in this model is the posterior energy of an MRF; the form of the energy function has been derived, but it involves parameters that must be estimated. The optimal recognition solution is the maximum a posteriori (MAP) configuration of the MRF. Experiments show promising results: the optimal estimate serves well for recognizing other scenes and objects.
A parametric method based on maximum likelihood is also described for computing the optimal parameter estimate under the Gaussian-MRF assumption. It takes advantage of the assumption and may be useful when the size of training data is sufficiently large. The parameter estimate thus computed is optimal with respect to the assumption.
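As a point of reference for the parametric route, under a Gaussian assumption on the feature residuals (observed features minus model features under the exemplar labeling) the maximum-likelihood estimates have a well-known closed form: the sample mean and the 1/N sample covariance. The sketch below is a generic illustration of that closed form, not the chapter's specific estimator:

```python
import numpy as np

def ml_gaussian_estimate(residuals):
    """Closed-form ML estimates of a Gaussian noise model.

    Illustrative only: given N i.i.d. residual vectors assumed Gaussian,
    the ML estimates are the sample mean and the biased (1/N, not
    1/(N-1)) sample covariance.
    """
    r = np.asarray(residuals, dtype=float)
    mu = r.mean(axis=0)
    centered = r - mu
    sigma = centered.T @ centered / len(r)   # ML uses 1/N
    return mu, sigma

# Hypothetical residuals between observed and model feature vectors.
residuals = np.array([[0.1, -0.2], [0.0, 0.1], [-0.1, 0.1], [0.2, 0.0]])
mu, sigma = ml_gaussian_estimate(residuals)
```

When the training set is large enough for the Gaussian assumption to be trusted, such closed-form estimates are cheap to compute; when it is small, the non-parametric approach of the previous paragraphs avoids committing to the assumption.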
Although automated, optimal parameter selection for object recognition in high-level vision is an important and long-standing problem, reports on this topic are rare. Work has been done in related areas: in [Poggio and Edelman 1990], for recognizing 3D objects from different viewpoints, a function mapping any viewpoint to a standard view is learned from a set of perspective views; in [Weng et al. 1993], a network structure is introduced for automated learning to recognize 3D objects; in [Pope and Lowe 1993], a numerical graph representation of an object model is learned from features computed from training images; and a recent work [Pelillo and Refice 1994] proposes a procedure for learning compatibility coefficients for relaxation labeling by minimizing a quadratic error function. Automated, optimal parameter estimation for low-level problems has seen significant progress. MRF parameter selection has been dealt with in statistics [Besag 1974, Besag 1975] and in applications such as image restoration, reconstruction and texture analysis [Cross and Jain 1983, Cohen and Cooper 1987, Derin and Elliott 1987, Qian and Titterington 1989, Zhang 1988, Nadabar and Jain 1982]. The problem has also been addressed from the regularization viewpoint [Wahba 1980, Geiger and Poggio 1987, Shahraray and Anderson 1989, Thomson et al. 1991].
The chapter is organized as follows: Section 7.2 presents the theory; Section 7.3 applies it to an MRF recognition model; Section 7.4 presents experimental results; and Section 7.5 draws conclusions.