## Mini courses svan 2016 mc5 class 01 stochastic optimal control

**Optimal control theory**, an extension of the calculus of variations, is a mathematical optimization method for deriving control policies. The method is largely due to the work of Lev Pontryagin and Richard Bellman in the 1950s, after contributions to calculus of variations by Edward J. McShane. Optimal control can be seen as a control strategy in control theory.

## Contents

## Spin dynamics introduction to optimal control theory part i

## General method

Optimal control deals with the problem of finding a control law for a given system such that a certain optimality criterion is achieved. A control problem includes a cost functional that is a function of state and control variables. An **optimal control** is a set of differential equations describing the paths of the control variables that minimize the cost function. The optimal control can be derived using Pontryagin's maximum principle (a necessary condition also known as Pontryagin's minimum principle or simply Pontryagin's Principle), or by solving the Hamilton–Jacobi–Bellman equation (a sufficient condition).

We begin with a simple example. Consider a car traveling in a straight line on a hilly road. The question is, how should the driver press the accelerator pedal in order to *minimize* the total traveling time? Clearly in this example, the term *control law* refers specifically to the way in which the driver presses the accelerator and shifts the gears. The *system* consists of both the car and the road, and the *optimality criterion* is the minimization of the total traveling time. Control problems usually include ancillary constraints. For example, the amount of available fuel might be limited, the accelerator pedal cannot be pushed through the floor of the car, speed limits, etc.

A proper cost function will be a mathematical expression giving the traveling time as a function of the speed, geometrical considerations, and initial conditions of the system. It is often the case that the constraints are interchangeable with the cost function.

Another optimal control problem is to find the way to drive the car so as to minimize its fuel consumption, given that it must complete a given course in a time not exceeding some amount. Yet another control problem is to minimize the total monetary cost of completing the trip, given assumed monetary prices for time and fuel.

A more abstract framework goes as follows. Minimize the continuous-time cost functional

subject to the first-order dynamic constraints (the **state equation**)

the algebraic *path constraints*

and the boundary conditions

where
*state*,
*control*,
*endpoint cost* and *Lagrangian*, respectively. Furthermore, it is noted that the path constraints are in general *inequality* constraints and thus may not be active (i.e., equal to zero) at the optimal solution. It is also noted that the optimal control problem as stated above may have multiple solutions (i.e., the solution may not be unique). Thus, it is most often the case that any solution
*locally minimizing*.

## Linear quadratic control

A special case of the general nonlinear optimal control problem given in the previous section is the *linear quadratic* (LQ) optimal control problem. The LQ problem is stated as follows. Minimize the *quadratic* continuous-time cost functional

Subject to the *linear* first-order dynamic constraints

and the initial condition

A particular form of the LQ problem that arises in many control system problems is that of the *linear quadratic regulator* (LQR) where all of the matrices (i.e.,
*constant*, the initial time is arbitrarily set to zero, and the terminal time is taken in the limit
*infinite horizon*). The LQR problem is stated as follows. Minimize the infinite horizon quadratic continuous-time cost functional

Subject to the *linear time-invariant* first-order dynamic constraints

and the initial condition

In the finite-horizon case the matrices are restricted in that
*constant*. These additional restrictions on
*bounded*, the additional restriction is imposed that the pair
*controllable*. Note that the LQ or LQR cost functional can be thought of physically as attempting to minimize the *control energy* (measured as a quadratic form).

The infinite horizon problem (i.e., LQR) may seem overly restrictive and essentially useless because it assumes that the operator is driving the system to zero-state and hence driving the output of the system to zero. This is indeed correct. However the problem of driving the output to a desired nonzero level can be solved *after* the zero output one is. In fact, it can be proved that this secondary LQR problem can be solved in a very straightforward manner. It has been shown in classical optimal control theory that the LQ (or LQR) optimal control has the feedback form

where

and

For the finite horizon LQ problem, the Riccati equation is integrated backward in time using the terminal boundary condition

For the infinite horizon LQR problem, the differential Riccati equation is replaced with the *algebraic* Riccati equation (ARE) given as

Understanding that the ARE arises from infinite horizon problem, the matrices
*constant*. It is noted that there are in general multiple solutions to the algebraic Riccati equation and the *positive definite* (or positive semi-definite) solution is the one that is used to compute the feedback gain. The LQ (LQR) problem was elegantly solved by Rudolf Kalman.

## Numerical methods for optimal control

Optimal control problems are generally nonlinear and therefore, generally do not have analytic solutions (e.g., like the linear-quadratic optimal control problem). As a result, it is necessary to employ numerical methods to solve optimal control problems. In the early years of optimal control (c. 1950s to 1980s) the favored approach for solving optimal control problems was that of *indirect methods*. In an indirect method, the calculus of variations is employed to obtain the first-order optimality conditions. These conditions result in a two-point (or, in the case of a complex problem, a multi-point) boundary-value problem. This boundary-value problem actually has a special structure because it arises from taking the derivative of a Hamiltonian. Thus, the resulting dynamical system is a Hamiltonian system of the form

where

is the *augmented Hamiltonian* and in an indirect method, the boundary-value problem is solved (using the appropriate boundary or *transversality* conditions). The beauty of using an indirect method is that the state and adjoint (i.e.,

The approach that has risen to prominence in numerical optimal control over the past two decades (i.e., from the 1980s to the present) is that of so-called *direct methods*. In a direct method, the state and/or control are approximated using an appropriate function approximation (e.g., polynomial approximation or piecewise constant parameterization). Simultaneously, the cost functional is approximated as a *cost function*. Then, the coefficients of the function approximations are treated as optimization variables and the problem is "transcribed" to a nonlinear optimization problem of the form:

Minimize

subject to the algebraic constraints

Depending upon the type of direct method employed, the size of the nonlinear optimization problem can be quite small (e.g., as in a direct shooting or quasilinearization method), moderate (e.g. pseudospectral optimal control) or may be quite large (e.g., a direct collocation method). In the latter case (i.e., a collocation method), the nonlinear optimization problem may be literally thousands to tens of thousands of variables and constraints. Given the size of many NLPs arising from a direct method, it may appear somewhat counter-intuitive that solving the nonlinear optimization problem is easier than solving the boundary-value problem. It is, however, the fact that the NLP is easier to solve than the boundary-value problem. The reason for the relative ease of computation, particularly of a direct collocation method, is that the NLP is *sparse* and many well-known software programs exist (e.g., SNOPT) to solve large sparse NLPs. As a result, the range of problems that can be solved via direct methods (particularly direct *collocation methods* which are very popular these days) is significantly larger than the range of problems that can be solved via indirect methods. In fact, direct methods have become so popular these days that many people have written elaborate software programs that employ these methods. In particular, many such programs include *DIRCOL*, SOCS, OTIS, GESOP/ASTOS, DITAN. and PyGMO/PyKEP. In recent years, due to the advent of the MATLAB programming language, optimal control software in MATLAB has become more common. Examples of academically developed MATLAB software tools implementing direct methods include *RIOTS*,*DIDO*, *DIRECT*, and *GPOPS,* while an example of an industry developed MATLAB tool is *PROPT*. These software tools have increased significantly the opportunity for people to explore complex optimal control problems both for academic research and industrial problems. Finally, it is noted that general-purpose MATLAB optimization environments such as TOMLAB have made coding complex optimal control problems significantly easier than was previously possible in languages such as C and FORTRAN.

## Discrete-time optimal control

The examples thus far have shown continuous time systems and control solutions. In fact, as optimal control solutions are now often implemented digitally, contemporary control theory is now primarily concerned with discrete time systems and solutions. The Theory of Consistent Approximations provides conditions under which solutions to a series of increasingly accurate discretized optimal control problem converge to the solution of the original, continuous-time problem. Not all discretization methods have this property, even seemingly obvious ones. For instance, using a variable step-size routine to integrate the problem's dynamic equations may generate a gradient which does not converge to zero (or point in the right direction) as the solution is approached. The direct method *RIOTS* is based on the Theory of Consistent Approximation.

## Examples

A common solution strategy in many optimal control problems is to solve for the costate (sometimes called the shadow price)

Having obtained

## Finite time

Consider the problem of a mine owner who must decide at what rate to extract ore from his mine. He owns rights to the ore from date