In the mathematical theory of probability, the drift-plus-penalty method is used for optimization of queueing networks and other stochastic systems.
Contents
- Methodology
- Origins and applications
- How it works
- The stochastic optimization problem
- Virtual queues
- The drift plus penalty expression
- Drift plus penalty algorithm
- Approximate scheduling
- Performance analysis
- Average penalty analysis
- Average queue size analysis
- Probability 1 convergence
- Treatment of queueing systems
- Convex functions of time averages
- Delay tradeoffs and related work
- Extensions to non iid event processes
- Extensions to variable frame length systems
- Application to convex programming
- Drift plus penalty algorithm for convex programming
- Drift plus penalty algorithm for linear programming
- Related links
- References
The technique is for stabilizing a queueing network while also minimizing the time average of a network penalty function. It can be used to optimize performance objectives such as time average power, throughput, and throughput utility. In the special case when there is no penalty to be minimized, and when the goal is to design a stable routing policy in a multi-hop network, the method reduces to backpressure routing. The drift-plus-penalty method can also be used to minimize the time average of a stochastic process subject to time average constraints on a collection of other stochastic processes. This is done by defining an appropriate set of virtual queues. It can also be used to produce time averaged solutions to convex optimization problems.
Methodology
The drift-plus-penalty method applies to queueing systems that operate in discrete time with time slots t in {0, 1, 2, ...}. First, a non-negative function L(t) is defined as a scalar measure of the state of all queues at time t. The function L(t) is typically defined as the sum of the squares of all queue sizes at time t, and is called a Lyapunov function. The Lyapunov drift is defined:
Every slot t, the current queue state is observed and control actions are taken to greedily minimize a bound on the following drift-plus-penalty expression:
where p(t) is the penalty function and V is a non-negative weight. The V parameter can be chosen to ensure the time average of p(t) is arbitrarily close to optimal, with a corresponding tradeoff in average queue size. Like backpressure routing, this method typically does not require knowledge of the probability distributions for job arrivals and network mobility.
Origins and applications
When
The theory was developed primarily for optimizing communication networks, including wireless networks, ad-hoc mobile networks, and other computer networks. However, the mathematical techniques can be applied to optimization and control for other stochastic systems, including renewable energy allocation in smart power grids and inventory control for product assembly systems.
How it works
This section shows how to use the drift-plus-penalty method to minimize the time average of a function p(t) subject to time average constraints on a collection of other functions. The analysis below is based on the material in.
The stochastic optimization problem
Consider a discrete time system that evolves over normalized time slots t in {0, 1, 2, ...}. Define p(t) as a function whose time average should be minimized, called a penalty function. Suppose that minimization of the time average of p(t) must be done subject to time-average constraints on a collection of K other functions:
Every slot t, the network controller observes a new random event. It then makes a control action based on knowledge of this event. The values of p(t) and y_i(t) are determined as functions of the random event and the control action on slot t:
The small case notation p(t), y_i(t) and upper case notation P(), Y_i() is used to distinguish the penalty values from the function that determines these values based on the random event and control action for slot t. The random event
As an example in the context of communication networks, the random event
For simplicity of exposition, assume the P() and Y_i() functions are bounded. Further assume the random event process
It is assumed throughout that this problem is feasible. That is, it is assumed that there exists an algorithm that can satisfy all of the K desired constraints.
The above problem poses each constraint in the standard form of a time average expectation of an abstract process y_i(t) being non-positive. There is no loss of generality with this approach. For example, suppose one desires the time average expectation of some process a(t) to be less than or equal to a given constant c. Then a new penalty function y(t) = a(t) − c can be defined, and the desired constraint is equivalent to the time average expectation of y(t) being non-positive. Likewise, suppose there are two processes a(t) and b(t) and one desires the time average expectation of a(t) to be less than or equal to that of b(t). This constraint is written in standard form by defining a new penalty function y(t) = a(t) − b(t). The above problem seeks to minimize the time average of an abstract penalty function p'(t)'. This can be used to maximize the time average of some desirable reward function r(t) by defining p(t) = −r('t).
Virtual queues
For each constraint i in {1, ..., K}, define a virtual queue with dynamics over slots t in {0, 1, 2, ...} as follows:
Initialize Qi(0) = 0 for all i in {1, ..., K}. This update equation is identical to that of a virtual discrete time queue with backlog Q_i(t) and with y_i(t) being the difference between new arrivals and new service opportunities on slot t. Intuitively, stabilizing these virtual queues ensures the time averages of the constraint functions are less than or equal to zero, so the desired constraints are satisfied. To see this precisely, note that (Eq. 1) implies:
Therefore:
Summing the above over the first t slots and using the law of telescoping sums implies:
Dividing by t and taking expectations implies:
Therefore, the desired constraints of the problem are satisfied whenever the following holds for all i in {1, ..., K}:
A queue Q_i(t) that satisfies the above limit equation is said to be mean rate stable.
The drift-plus-penalty expression
To stabilize the queues, define the Lyapunov function L(t) as a measure of the total queue backlog on slot t:
Squaring the queueing equation (Eq. 1) results in the following bound for each queue i in {1, ..., K}:
Therefore,
It follows that
Now define B as a positive constant that upper bounds the first term on the right-hand-side of the above inequality. Such a constant exists because the y_i(t) values are bounded. Then:
Adding Vp(t) to both sides results in the following bound on the drift-plus-penalty expression:
The drift-plus-penalty algorithm (defined below) makes control actions every slot t that greedily minimize the right-hand-side of the above inequality. Intuitively, taking an action that minimizes the drift alone would be beneficial in terms of queue stability but would not minimize time average penalty. Taking an action that minimizes the penalty alone would not necessarily stabilize the queues. Thus, taking an action to minimize the weighted sum incorporates both objectives of queue stability and penalty minimization. The weight V can be tuned to place more or less emphasis on penalty minimization, which results in a performance tradeoff.
Drift-plus-penalty algorithm
Let
Given these observations for slot t, greedily choose a control action
Then update the queues for each i in {1, ..., K} according to (Eq. 1). Repeat this procedure for slot t+1.
Note that the random event and queue backlogs observed on slot t act as given constants when selecting the control action for the slot t minimization. Thus, each slot involves a deterministic search for the minimizing control action over the set A. A key feature of this algorithm is that it does not require knowledge of the probability distribution of the random event process.
Approximate scheduling
The above algorithm involves finding a minimum of a function over an abstract set A. In general cases, the minimum might not exist, or might be difficult to find. Thus, it is useful to assume the algorithm is implemented in an approximate manner as follows: Define C as a non-negative constant, and assume that for all slots t, the control action
Such a control action is called a C-additive approximation. The case C = 0 corresponds to exact minimization of the desired expression on every slot t.
Performance analysis
This section shows the algorithm results in a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size.
Average penalty analysis
Define an
The expectations above are with respect to the random variable
Let
Assume the drift-plus-penalty action
where the last inequality follows because action
Notice that the
Dividing the above by
Thus, the time average expected penalty can be made arbitrarily close to the optimal value
Average queue size analysis
Assume now there exists an
An argument similar to the one in the previous section shows:
A telescoping series argument similar to the one in the previous section can thus be used to show the following for all t>0:
This shows that average queue size is indeed O(V).
Probability 1 convergence
The above analysis considers time average expectations. Related probability 1 performance bounds for infinite horizon time average queue size and penalty can be derived using the drift-plus-penalty method together with martingale theory.
Treatment of queueing systems
The above analysis considers constrained optimization of time averages in a stochastic system that did not have any explicit queues. Each time average inequality constraint was mapped to a virtual queue according to (Eq. 1). In the case of optimizing a queueing network, the virtual queue equations in (Eq. 1) are replaced by the actual queueing equations.
Convex functions of time averages
A related problem is the minimization of a convex function of time averages subject to constraints, such as:
where
Such problems of optimizing convex functions of time averages can be transformed into problems of optimizing time averages of functions via auxiliary variables (see Chapter 5 of the Neely textbook). The latter problems can then be solved by the drift-plus-penalty method as described in previous subsections. An alternative primal-dual method makes decisions similar to drift-plus-penalty decisions, but uses a penalty defined by partial derivatives of the objective function
Delay tradeoffs and related work
The mathematical analysis in the previous section shows that the drift-plus-penalty method produces a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size. This method, together with the O(1/V), O(V) tradeoff, was developed in Neely and Neely, Modiano, Li in the context of maximizing network utility subject to stability.
A related algorithm for maximizing network utility was developed by Eryilmaz and Srikant. The Eryilmaz and Srikant work resulted in an algorithm very similar to the drift-plus-penalty algorithm, but used a different analytical technique. That technique was based on Lagrange multipliers. A direct use of the Lagrange multiplier technique results in a worse tradeoff of O(1/V), O(V2). However, the Lagrange multiplier analysis was later strengthened by Huang and Neely to recover the original O(1/V), O(V) tradeoffs, while showing that queue sizes are tightly clustered around the Lagrange multiplier of a corresponding deterministic optimization problem. This clustering result can be used to modify the drift-plus-penalty algorithm to enable improved O(1/V), O(log2(V)) tradeoffs. The modifications can use either place-holder backlog or Last-in-First-Out (LIFO) scheduling.
When implemented for non-stochastic functions, the drift-plus-penalty method is similar to the dual subgradient method of convex optimization theory, with the exception that its output is a time average of primal variables, rather than the primal variables themselves. A related primal-dual technique for maximizing utility in a stochastic queueing network was developed by Stolyar using a fluid model analysis. The Stolyar analysis does not provide analytical results for a performance tradeoff between utility and queue size. A later analysis of the primal-dual method for stochastic networks proves a similar O(1/V), O(V) utility and queue size tradeoff, and also shows local optimality results for minimizing non-convex functions of time averages, under an additional convergence assumption. However, this analysis does not specify how much time is required for the time averages to converge to something close to their infinite horizon limits. Related primal-dual algorithms for utility maximization without queues were developed by Agrawal and Subramanian and Kushner and Whiting.
Extensions to non-i.i.d. event processes
The drift-plus-penalty algorithm is known to ensure similar performance guarantees for more general ergodic processes
Extensions to variable frame length systems
The drift-plus-penalty method can be extended to treat systems that operate over variable size frames. In that case, the frames are labeled with indices r in {0, 1, 2, ...} and the frame durations are denoted {T[0], T[1], T[2], ...}, where T[r] is a non-negative real number for each frame r. Let
where Q[r] is the vector of queue backlogs at the beginning of frame r. In the special case when all frames are the same size and are normalized to 1 slot length, so that T[r] = 1 for all r, the above minimization reduces to the standard drift-plus-penalty technique. This frame-based method can be used for constrained optimization of Markov decision problems (MDPs) and for other problems involving systems that experience renewals.
Application to convex programming
Let x = (x1, ..., xN) be an N-dimensional vector of real numbers, and define the hyper-rectangle A by:
where xmin, i, xmax, i are given real numbers that satisfy
This can be solved by the drift-plus-penalty method as follows: Consider the special case of a deterministic system with no random event process
and define the action space as the N-dimensional hyper-rectangle A. Define penalty and constraint functions as:
Define the following time averages:
Now consider the following time average optimization problem:
By Jensen's inequality the following holds for all slots t>0:
From this, it can be shown that an optimal solution to the time-average problem (Eq. 8)–(Eq. 9) can be achieved by solutions of the type x(t) = x* for all slots t, where x* is a vector that solves the convex program (Eq. 6)–(Eq. 7). Further, any time-averaged vector
Drift-plus-penalty algorithm for convex programming
Every slot t, choose vector
Then update the queues according to:
The time average vector
This algorithm is similar to the standard dual subgradient algorithm of optimization theory, using a fixed stepsize of 1/V. However, a key difference is that the dual subgradient algorithm is typically analyzed under restrictive strict convexity assumptions that are needed for the primal variables x(t) to converge. There are many important cases where these variables do not converge to the optimal solution, and never even get near the optimal solution (this is the case for most linear programs, as shown below). On the other hand, the drift-plus-penalty algorithm does not require strict convexity assumptions. It ensures that the time averages of the primals converge to a solution that is within O(1/V) of optimality, with O(V) bounds on queue sizes (it can be shown that this translates into an O(V2) bound on convergence time).
Drift-plus-penalty algorithm for linear programming
Consider the special case of a linear program. Specifically, suppose:
for given real-valued constants (c1, …, cN), (ain), (b1, …, bK). Then the above algorithm reduces to the following: Every slot t and for each variable n in {1, …, N}, choose xn(t) in [xmin,n, xmax,n] to minimize the expression:
Then update queues Qi(t) as before. This amounts to choosing each variable xi(t) according to the simple bang-bang control policy:
Since the primal variables xi(t) are always either xmin,i or xmax,i, they can never converge to the optimal solution if the optimal solution is not a vertex point of the hyper-rectangle A. However, the time-averages of these bang-bang decisions indeed converge to an O(1/V) approximation of the optimal solution. For example, suppose that xmin,1 = 0, xmax,1 = 1, and suppose that all optimal solutions to the linear program have x1 = 3/4. Then roughly 3/4 of the time the bang-bang decision for the first variable will be x1(t) = 1, and the remaining time it will be x1(t) = 0.