
8.3.4 RL Using the Lagrange-Hopfield Method

In this subsection, we use a so-called Lagrange-Hopfield method to find the RL solution after posing it as a constrained minimization problem. The method combines the Lagrange multiplier technique with the graded Hopfield neural network. In terms of the optimized objective value, the algorithm performs almost as well as simulated annealing, as shown by the experimental results in [Li 1995d]. Moreover, the resulting algorithm is fully distributed and is suitable for analog implementation.

Let us state the RL problem more formally as the following constrained minimization:

\min_{p} \; E(p)

subject to

\sum_{I \in \mathcal{L}} p_i(I) = 1, \quad \forall i \in \mathcal{S}    (8.72)

p_i(I) \ge 0, \quad \forall i \in \mathcal{S}, \ \forall I \in \mathcal{L}    (8.73)

where p = \{ p_i(I) \mid i \in \mathcal{S}, \ I \in \mathcal{L} \} is the set of labeling assignment variables. Equations (8.72) and (8.73) define the feasible space. The final solution must be unambiguous:

p_i(I) \in \{0, 1\}, \quad \forall i \in \mathcal{S}, \ \forall I \in \mathcal{L}    (8.75)
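To make the formulation concrete, the following Python sketch builds a small instance of this constrained problem. The quadratic form of E(p), the compatibility array r, and the sizes m and M are illustrative assumptions; the subsection only requires some differentiable E(p).

import numpy as np

# Toy instance of the constrained RL problem: m sites, M labels.
# E is an assumed quadratic labeling energy; any differentiable E(p) fits the formulation.
m, M = 4, 3
rng = np.random.default_rng(0)
r = rng.normal(size=(m, M, m, M))           # assumed compatibility coefficients

def E(p):
    # E(p) = -sum_{i,I} sum_{j,J} r[i,I,j,J] * p[i,I] * p[j,J]
    return -np.einsum('iI,iIjJ,jJ->', p, r, p)

def C(p):
    # Equality constraints (8.72): C_i(p) = sum_I p_i(I) - 1 = 0 for every site i.
    return p.sum(axis=1) - 1.0

p0 = np.full((m, M), 1.0 / M)               # feasible but ambiguous starting labeling
print(E(p0), C(p0))                         # C(p0) is all zeros; p0 also satisfies (8.73)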

 

To solve this, we can use the following method.

First, we introduce internal variables u_i(I) \in (-\infty, +\infty) for i \in \mathcal{S} and I \in \mathcal{L}, and relate them to the p_i(I) via

p_i(I) = \phi_T\bigl( u_i(I) \bigr)    (8.76)

where \phi_T is a sigmoid function

\phi_T(u) = \frac{1}{1 + e^{-u/T}}

controlled by a temperature parameter T > 0. With the introduction of the u variables, the energy function can be considered as E(p(u)). This treatment confines p_i(I) to the range (0,1) and thus imposes the constraints of (8.73). In the very high gain limit, when T \to 0^{+}, p_i(I) is forced to be 0 or 1 depending on whether u_i(I) is positive or negative, thus imposing the constraints of (8.75).
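A minimal numerical sketch of this transformation (the function name phi_T follows the notation above; the sample values of u and T are arbitrary):

import numpy as np

def phi_T(u, T):
    # Sigmoid p = 1 / (1 + exp(-u/T)); maps any real u into the open interval (0, 1).
    z = np.clip(u / T, -500, 500)            # clip only to avoid overflow warnings
    return 1.0 / (1.0 + np.exp(-z))

u = np.array([-2.0, -0.1, 0.1, 2.0])
for T in (1.0, 0.1, 0.01):
    print(T, phi_T(u, T))
# As T -> 0+, each output saturates to 0 (u < 0) or 1 (u > 0), which is how the
# high-gain limit imposes the unambiguity constraints (8.75).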

Next, we use the Lagrange multiplier method to impose the constraints of (8.72). Define the following Lagrange function

L(p, \gamma) = E(p) + \sum_{i \in \mathcal{S}} \gamma_i \, C_i(p)

where C_i(p) = \sum_{I \in \mathcal{L}} p_i(I) - 1 and the \gamma_i are called Lagrange multipliers. It is a function of the m \times |\mathcal{L}| variables of p and the m variables of \gamma = \{ \gamma_i \}. Penalty terms can be added to give an augmented Lagrange function [Powell 1969, Hestenes 1969]

L_{\beta}(p, \gamma) = E(p) + \sum_{i \in \mathcal{S}} \gamma_i \, C_i(p) + \frac{\beta}{2} \sum_{i \in \mathcal{S}} \bigl[ C_i(p) \bigr]^2

where \beta > 0 is a finite weight. Our experiments show that the penalty terms are generally necessary for damping oscillations and ensuring the convergence of the algorithm described below.
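A sketch of the augmented Lagrange function as just defined; the toy energy E, the constraint residuals C, and the numerical values of gamma and beta are illustrative assumptions:

import numpy as np

# L_beta(p, gamma) = E(p) + sum_i gamma_i * C_i(p) + (beta/2) * sum_i C_i(p)^2
m, M = 4, 3
rng = np.random.default_rng(0)
r = rng.normal(size=(m, M, m, M))            # assumed compatibilities, as before

def E(p):
    return -np.einsum('iI,iIjJ,jJ->', p, r, p)

def C(p):
    return p.sum(axis=1) - 1.0               # C_i(p) = sum_I p_i(I) - 1

def L_aug(p, gamma, beta):
    c = C(p)
    return E(p) + gamma @ c + 0.5 * beta * (c @ c)

p = rng.random((m, M))                       # generally infeasible: C(p) != 0
gamma = np.zeros(m)
print(L_aug(p, gamma, beta=10.0) - E(p))     # the extra terms vanish only when C(p) = 0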

For p^* to be a local minimum subject to the constraints, it is necessary that (p^*, \gamma^*) be a stationary point of the Lagrange function:

\frac{\partial L(p^*, \gamma^*)}{\partial p_i(I)} = 0, \qquad \frac{\partial L(p^*, \gamma^*)}{\partial \gamma_i} = 0

If (p^*, \gamma^*) is a saddle point for which

L(p^*, \gamma) \le L(p^*, \gamma^*) \le L(p, \gamma^*)

then p^* is a local minimum of E(p) satisfying C_i(p^*) = 0.

The following dynamics for the p and \gamma variables can be used to find such a saddle point

\frac{d\, p_i(I)}{dt} = -\frac{\partial L_{\beta}(p, \gamma)}{\partial p_i(I)} = -\bigl[ q_i(I) + \gamma_i + \beta\, C_i(p) \bigr]    (8.82)

\frac{d\, \gamma_i}{dt} = +\frac{\partial L_{\beta}(p, \gamma)}{\partial \gamma_i} = C_i(p)    (8.83)

where q_i(I) = \partial E(p) / \partial p_i(I) is defined in (8.29) and C_i(p) = \sum_{I \in \mathcal{L}} p_i(I) - 1 for i \in \mathcal{S}. It performs energy descent on p but ascent on \gamma. The convergence of this system has been obtained in []. Its application in relaxation labeling was suggested in [Ullman 1979b] without using the u variables. It has also been used for solving the traveling salesman problem [Platt and Barr 1988, Wacholder et al. 1989].
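In discrete time, this descent-ascent behaviour can be sketched as below. This plain version works directly on p (without the u variables); the toy energy, the step size, and the crude clipping of p to [0,1] are assumptions of the sketch rather than part of the method.

import numpy as np

# Euler-discretized descent on p and ascent on gamma for L_beta (cf. (8.82)-(8.83)).
m, M = 4, 3
rng = np.random.default_rng(1)
r = rng.normal(size=(m, M, m, M))
r = 0.5 * (r + np.transpose(r, (2, 3, 0, 1)))       # symmetric compatibilities (assumed)

def grad_E(p):                                      # q_i(I) = dE/dp_i(I) for E = -p.r.p
    return -2.0 * np.einsum('iIjJ,jJ->iI', r, p)

def C(p):
    return p.sum(axis=1) - 1.0

p = np.full((m, M), 1.0 / M)
gamma = np.zeros(m)
beta, step = 10.0, 0.01
for _ in range(2000):
    g = grad_E(p) + (gamma + beta * C(p))[:, None]  # dL_beta/dp_i(I)
    p = np.clip(p - step * g, 0.0, 1.0)             # descend on p (crudely clipped here)
    gamma = gamma + step * C(p)                     # ascend on gamma
print(np.abs(C(p)).max())                           # residual of the equality constraints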

In our method, the updating of the labeling variables is performed on u rather than on p. Equ. (8.82) is replaced by

\frac{d\, u_i(I)}{dt} = -\Bigl[ \frac{1}{m}\, q_i(I) + \gamma_i + \beta\, C_i(p) \Bigr]    (8.84)

where q_i(I) is weighted by a factor of 1/m to achieve some invariance to m. Note that d p_i(I) / d u_i(I) = \phi_T'\bigl( u_i(I) \bigr) is always positive, so a descent step in u also decreases L_{\beta} with respect to p.
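The positivity of this derivative is easy to verify numerically; in the sketch below, phi_T and the sample values are the same kind of illustrative assumptions as before:

import numpy as np

# dp/du = phi_T'(u) = phi_T(u) * (1 - phi_T(u)) / T > 0 for every u, so stepping u
# along -dL/dp never increases L_beta to first order.
T = 0.1

def phi_T(u):
    return 1.0 / (1.0 + np.exp(-np.clip(u / T, -500, 500)))

def dphi_T(u):
    p = phi_T(u)
    return p * (1.0 - p) / T

u = np.linspace(-3.0, 3.0, 7)
print(dphi_T(u) > 0)                 # True everywhere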

A damping term -u_i(I)/\tau, where \tau > 0, may be added to (8.84) as in the graded Hopfield neural networks [Hopfield 1984]. This is equivalent to adding an energy term

\frac{1}{\tau} \sum_{i \in \mathcal{S}} \sum_{I \in \mathcal{L}} \int^{p_i(I)} \phi_T^{-1}(p) \, dp

to E, where the inverse of the sigmoid function is

\phi_T^{-1}(p) = T \ln \frac{p}{1 - p}

The term reaches the minimum of zero only when all the p_i(I)'s are either 0 or 1. This means that minimizing this term with \tau > 0 in effect leads to the satisfaction of (8.75). However, this term is not necessary in our model because the satisfaction of (8.72) and (8.73) under a small T is enough for the convergence to (8.75). The competitive mechanism in (8.73) will make the winner take all.
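A quick numerical check that phi_T^{-1} as written really inverts the sigmoid above (T and the sample values are arbitrary):

import numpy as np

T = 0.2

def phi_T(u):
    return 1.0 / (1.0 + np.exp(-u / T))

def phi_T_inv(p):
    return T * np.log(p / (1.0 - p))

u = np.array([-1.5, -0.2, 0.0, 0.3, 2.0])
print(np.allclose(phi_T_inv(phi_T(u)), u))   # True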

In summary, the algorithm consists of the three equations (8.83), (8.76), and (8.84). In discrete form, the updating is performed as

u_i^{(t+1)}(I) \leftarrow u_i^{(t)}(I) - \mu \Bigl[ \frac{1}{m}\, q_i(I) + \gamma_i^{(t)} + \beta\, C_i\bigl( p^{(t)} \bigr) \Bigr]

where \mu > 0 is a step size factor,

p_i^{(t+1)}(I) \leftarrow \phi_T\bigl( u_i^{(t+1)}(I) \bigr)

and

\gamma_i^{(t+1)} \leftarrow \gamma_i^{(t)} + \beta\, C_i\bigl( p^{(t+1)} \bigr)

where T is decreasing and \beta is increasing during the iterations. The above is formulated for minimizing the energy function E(p). To maximize a gain function G(p) under the same constraints, we can simply let E(p) = -G(p) and then apply the same updating equations.
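Putting the three updates together, here is a compact, self-contained sketch of the whole scheme on a toy problem. The quadratic energy, the schedules for T and beta, the step size mu, the damped multiplier step, and the iteration count are all illustrative assumptions, not values prescribed by the text.

import numpy as np

# Lagrange-Hopfield style relaxation labeling on a toy problem, following the three
# updates summarized above: a u-step, a sigmoid mapping back to p, and a gamma-step,
# with T decreased and beta increased over the iterations.
m, M = 6, 4                                        # sites and labels (arbitrary sizes)
rng = np.random.default_rng(2)
r = rng.normal(size=(m, M, m, M))
r = 0.5 * (r + np.transpose(r, (2, 3, 0, 1)))      # symmetric compatibilities (assumed)

def E(p):                                          # assumed quadratic labeling energy
    return -np.einsum('iI,iIjJ,jJ->', p, r, p)

def grad_E(p):                                     # q_i(I) = dE/dp_i(I)
    return -2.0 * np.einsum('iIjJ,jJ->iI', r, p)

def C(p):                                          # C_i(p) = sum_I p_i(I) - 1
    return p.sum(axis=1) - 1.0

def phi_T(u, T):
    return 1.0 / (1.0 + np.exp(-np.clip(u / T, -500, 500)))

u = 0.01 * rng.standard_normal((m, M))
gamma = np.zeros(m)
mu = 0.02                                          # step size factor (assumed)
T, beta = 1.0, 1.0
p = phi_T(u, T)
for t in range(5000):
    c = C(p)
    u = u - mu * (grad_E(p) / m + (gamma + beta * c)[:, None])
    p = phi_T(u, T)
    gamma = gamma + mu * beta * C(p)               # damped multiplier step (for stability)
    T = max(0.999 * T, 0.01)                       # decreasing temperature
    beta = min(1.001 * beta, 20.0)                 # increasing penalty weight

print("constraint residual:", float(np.abs(C(p)).max()))
print("hardened labels    :", p.argmax(axis=1), " E(p):", round(float(E(p)), 3))
# To maximize a gain function G(p) instead, set E(p) = -G(p) and reuse the same updates.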


