Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to alternatives such as gradient descent.
Contents
- Overview
- The basic iteration
- The neighbours of a state
- Acceptance probabilities
- The annealing schedule
- Pseudocode
- Selecting the parameters
- Transition probabilities
- Efficient candidate generation
- Barrier avoidance
- Cooling schedule
- Restarts
- Related methods
- References
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. Both are attributes of the material that depend on its thermodynamic free energy. Heating and cooling the material affects both the temperature and the thermodynamic free energy. The approach of reducing the minimization of a function of many variables to the statistical mechanics of equilibration (annealing) of a mathematically equivalent artificial multiatomic system was first formulated by Armen G. Khachaturyan, Svetlana V. Semenovskaya, and Boris K. Vainshtein in 1979 and again in 1981. These authors used computer simulations mimicking the annealing and cooling of such a system to find its global minimum.
Simulated annealing interprets slow cooling as a slow decrease in the probability of accepting worse solutions as it explores the solution space. Accepting worse solutions is a fundamental property of metaheuristics because it allows for a more extensive search for the optimal solution.
The simulation can be performed either by solving kinetic equations for density functions (A. G. Khachaturyan, S. V. Semenovskaya, and B. K. Vainshtein in 1979 and 1981) or by the stochastic sampling method described independently by Scott Kirkpatrick, C. Daniel Gelatt and Mario P. Vecchi in 1983, by Vlado Černý in 1985, and by Svetlana V. Semenovskaya, Karen A. Khachaturyan and Armen G. Khachaturyan in 1985. The method is an adaptation of the Metropolis–Hastings algorithm, a Monte Carlo method to generate sample states of a thermodynamic system, invented by M. N. Rosenbluth and published by N. Metropolis et al. in 1953.
Overview
Simulated annealing models an optimization problem on a physical system: each candidate solution corresponds to a state s of the system, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.
The basic iteration
At each step, the SA heuristic considers some neighbouring state s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.
The neighbours of a state
The neighbours of a state are new states of the problem that are produced after altering a given state in some well-defined way. For example, in the traveling salesman problem each state is typically defined as a permutation of the cities to be visited. The neighbours of a state are the set of permutations that are produced, for example, by reversing the order of any two successive cities. The well-defined way in which the states are altered in order to find neighbouring states is called a "move" and different moves give different sets of neighbouring states. These moves usually result in minimal alterations of the last state, as the previous example depicts, in order to help the algorithm keep the better parts of the solution and change only the worse parts. In the traveling salesman problem, the parts of the solution are the city connections.
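As a concrete illustration, a minimal Python sketch of such a consecutive-city move might look like the following; the list-of-indices tour representation and the function name are choices made here for illustration, not part of any standard interface:

```python
import random

def consecutive_swap(tour):
    """Return a neighbouring tour obtained by swapping two successive cities.

    `tour` is a list of city indices; the move touches only one adjacent pair,
    so most of the current solution is preserved.
    """
    n = len(tour)
    i = random.randrange(n)          # pick a position at random
    j = (i + 1) % n                  # its successor (wrapping around the tour)
    neighbour = tour[:]              # copy, so the current state is untouched
    neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
    return neighbour
```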
Searching for neighbours of a state is fundamental to optimization because the final solution will come after a tour of successive neighbours. Simple heuristics move by finding better neighbour after better neighbour and stop when they reach a solution that has no better neighbours. The problem with this approach is that the neighbours of a state are not guaranteed to contain any of the existing better solutions, so failure to find a better solution among them does not guarantee that no better solution exists. This is why the best solution found by such algorithms is called a local optimum, in contrast with the actual best solution, which is called a global optimum. Metaheuristics use the neighbours of a solution as a way to explore the solution space, and although they prefer better neighbours, they also accept worse neighbours in order to avoid getting stuck in local optima; given enough running time, such an algorithm can in principle find the global optimum.
Acceptance probabilities
The probability of making the transition from the current state s to a candidate new state s' is specified by an acceptance probability function P(e, e', T), which depends on the energies e = E(s) and e' = E(s') of the two states and on a global time-varying parameter T called the temperature. States with a smaller energy are better than those with a greater energy. The function P must be positive even when e' is greater than e; this prevents the method from becoming stuck at a local minimum that is worse than the global one.
When T tends to zero, the probability P(e, e', T) must tend to zero if e' > e and to a positive value otherwise. For sufficiently small values of T, the system increasingly favours moves that go "downhill" (to lower energy values) and avoids those that go "uphill"; with T = 0 the procedure reduces to the greedy algorithm, which makes only downhill transitions.
In the original description of SA, the probability P(e, e', T) was equal to 1 when e' < e, i.e. the procedure always moved downhill when it found a way to do so, irrespective of the temperature. Many descriptions and implementations of SA still take this condition as part of the method's definition, although it is not essential for the method to work.
The P function is usually chosen so that the probability of accepting a move decreases as the difference e' − e increases, i.e. small uphill moves are more likely than large ones. This requirement, too, is not strictly necessary, provided the requirements above are met.
Given these properties, the temperature T plays a crucial role in controlling the evolution of the state s with regard to its sensitivity to variations in energy: for large T, the evolution is sensitive only to coarse energy variations, while for small T it becomes sensitive to finer variations.
The annealing schedule
The name and inspiration of the algorithm demand an interesting feature related to temperature variation to be embedded in the operational characteristics of the algorithm, namely a gradual reduction of the temperature as the simulation proceeds. The algorithm starts with T set to a high value (or infinity) and then decreases it at each step according to some annealing schedule, which may be specified by the user but must end with T = 0 towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the steepest descent heuristic.
For any given finite problem, the probability that the simulated annealing algorithm terminates with a global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space.
Pseudocode
The following pseudocode presents the simulated annealing heuristic as described above. It starts from a state s0 and continues to either a maximum of kmax steps or until a state with an energy of emin or less is found. In the process, the call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random(0, 1) should pick and return a value in the range [0, 1], uniformly at random. The annealing schedule is defined by the call temperature(r), which should yield the temperature to use, given the fraction r of the time budget that has been expended so far.
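Rendered in Python rather than pseudocode, a minimal sketch of that loop might look like this. The interface follows the description above (neighbour(s), random(0, 1), temperature(r), the energy function E, and the acceptance function P); the best-state bookkeeping anticipates the Restarts section below, and all other details are illustrative assumptions:

```python
import random

def simulated_annealing(s0, E, neighbour, P, temperature, kmax, emin):
    """Sketch of the loop described above.

    s0          -- initial state
    E           -- energy (goal) function to minimise
    neighbour   -- returns a randomly chosen neighbour of a state
    P           -- acceptance probability function P(e, e_new, T)
    temperature -- annealing schedule: temperature for a given fraction r
                   of the time budget already expended
    kmax        -- maximum number of steps
    emin        -- stop early once a state with energy <= emin is found
    """
    s, e = s0, E(s0)
    s_best, e_best = s, e                     # best solution seen so far
    for k in range(kmax):
        if e <= emin:                         # good enough for the application
            break
        T = temperature(k / kmax)             # fraction of budget expended
        s_new = neighbour(s)
        e_new = E(s_new)
        # Accept the candidate with probability P; uniform random(0, 1) draw.
        if P(e, e_new, T) >= random.random():
            s, e = s_new, e_new
        if e_new < e_best:                    # track the best state found
            s_best, e_best = s_new, e_new
    return s_best, e_best
```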
Selecting the parameters
In order to apply the SA method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function E(), the candidate generator procedure neighbour(), the acceptance probability function P(), and the annealing schedule temperature() together with the initial temperature. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices of these parameters that will be good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.
Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states and whose edges are the candidate moves. An essential requirement for the neighbour() function is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum (in other words, the diameter of the search graph must be small). In the traveling salesman example above, for instance, the search space for n = 20 cities has n! = 2,432,902,008,176,640,000 (2.4 quintillion) states; yet the neighbour generator that swaps two consecutive cities can get from any state (tour) to any other state in at most n(n − 1)/2 = 190 steps (this is the sum 1 + 2 + … + (n − 1)).
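The figures quoted above are easy to verify directly; a quick check in Python:

```python
import math

n = 20
print(math.factorial(n))   # 2432902008176640000 possible tours (states)
print(n * (n - 1) // 2)    # 190: worst-case number of consecutive swaps
```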
Transition probabilities
To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the transition probabilities that result from the various design choices made in the implementation of the algorithm. For each edge (s, s') of the search graph, the transition probability is the probability that the SA algorithm will move to state s' when its current state is s. This probability depends on the current temperature as specified by temperature(), on the order in which the candidate moves are generated by the neighbour() function, and on the acceptance probability function P(). (Note that the transition probability is not simply P(e, e', T), because the candidates are tested serially.)
Acceptance probabilities
The specification of neighbour(), P(), and temperature() is partially redundant. In practice, it's common to use the same acceptance function P() for many problems, and adjust the other two functions according to the specific problem.
In the formulation of the method by Kirkpatrick et al., the acceptance probability function P(e, e', T) was defined as 1 if e' < e, and exp(−(e' − e)/T) otherwise. This formula was superficially justified by analogy with the transitions of a physical system; it corresponds to the Metropolis–Hastings algorithm in the case where the proposal distribution is symmetric. However, this acceptance probability is often used for simulated annealing even when the neighbour() function, which plays the role of the proposal distribution in Metropolis–Hastings, is not symmetric or not probabilistic at all.
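A direct Python transcription of that acceptance rule (a sketch, with names chosen here for illustration) is shown below. Passed as P to the loop sketched earlier, it always accepts downhill moves and occasionally accepts uphill moves while the temperature is high:

```python
import math

def acceptance_probability(e, e_new, T):
    """Kirkpatrick-style acceptance rule described above.

    Downhill moves (e_new < e) are always accepted; uphill moves are accepted
    with probability exp(-(e_new - e) / T), which shrinks as the uphill step
    grows or as the temperature T falls.
    """
    if e_new < e:
        return 1.0
    if T <= 0:               # at zero temperature, only downhill moves remain
        return 0.0
    return math.exp(-(e_new - e) / T)
```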
Efficient candidate generation
When choosing the candidate generator neighbour(), one must consider that after a few iterations of the SA algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state s' is likely to be similar to that of the current state. This heuristic (which is the main principle of the Metropolis–Hastings algorithm) tends to exclude very good candidate moves as well as very bad ones; however, the former are usually much rarer than the latter, so the heuristic is generally quite effective.
In the traveling salesman problem above, for example, swapping two consecutive cities in a low-energy tour is expected to have a modest effect on its energy (length), whereas swapping two arbitrary cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbour generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with n − 1 swaps, instead of n(n − 1)/2).
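For contrast with the consecutive-swap generator sketched earlier, the arbitrary-swap move could be written as follows; again, the representation and names are illustrative:

```python
import random

def arbitrary_swap(tour):
    """Alternative neighbour generator: swap two cities chosen anywhere in the tour.

    Unlike the consecutive swap, this move rewires up to four tour edges at
    unrelated places, so on a good (short) tour it is far more likely to
    lengthen the tour than to shorten it.
    """
    i, j = random.sample(range(len(tour)), 2)
    neighbour = tour[:]
    neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
    return neighbour
```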
A more precise statement of the heuristic is that one should try first candidate states s' for which P(E(s), E(s'), T) is large. For the acceptance function above, this means that E(s') − E(s) should be on the order of T or less.
Barrier avoidance
When choosing the candidate generator neighbour(), one must also try to reduce the number of "deep" local minima: states (or sets of connected states) that have much lower energy than all of their neighbouring states. Such "closed catchment basins" of the energy function may trap the SA algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential in the energy difference between the surrounding states and the bottom of the basin).
As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of SA by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours A and B, with nearly equal lengths, such that (1) A is optimal, (2) every sequence of city-pair swaps that converts A into B goes through tours that are much longer than both, and (3) A can be transformed into B by flipping (reversing the order of) a set of consecutive cities. In this example, A and B lie in different "deep basins" if the generator performs only random pair-swaps, but they lie in the same basin if the generator performs random segment-flips.
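A segment-flip move of the kind just described can be sketched as follows; as before, the tour representation and function name are illustrative:

```python
import random

def segment_flip(tour):
    """Neighbour generator that reverses a randomly chosen run of consecutive cities.

    Reversing a segment rewires only the two edges at its ends, so it tends to
    merge the "deep basins" that trap a generator based on random pair-swaps.
    """
    n = len(tour)
    i, j = sorted(random.sample(range(n), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```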
Cooling schedule
The physical analogy that is used to justify SA assumes that the cooling rate is low enough for the probability distribution of the current state to be near thermodynamic equilibrium at all times. Unfortunately, the relaxation time—the time one must wait for the equilibrium to be restored after a change in temperature—strongly depends on the "topography" of the energy function and on the current temperature. In the SA algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as black box functions to the SA algorithm. Therefore, the ideal cooling rate cannot be determined beforehand, and should be empirically adjusted for each problem. Adaptive simulated annealing algorithms address this problem by connecting the cooling schedule to the search progress.
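For concreteness, one common (but by no means canonical) choice is a geometric, or exponential, schedule expressed in terms of the fraction r of the budget already spent; the constants here are arbitrary assumptions to be tuned per problem. Passed as the temperature() argument of the loop above, it starts exploratory and becomes nearly greedy towards the end:

```python
def temperature(r, T0=10.0, T_end=0.01):
    """Exponential cooling schedule: T0 at r = 0, decaying towards T_end at r = 1.

    r is the fraction of the time budget expended so far, matching the
    interface described in the pseudocode section. T0 and T_end are
    problem-dependent tuning constants.
    """
    return T0 * (T_end / T0) ** r
```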
Restarts
Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This process is called restarting of simulated annealing. To do this we set s and e to sbest and ebest and perhaps restart the annealing schedule. The decision to restart could be based on several criteria. Notable among these include restarting based on a fixed number of steps, based on whether the current energy is too high compared to the best energy obtained so far, restarting randomly, etc.
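As an illustration, one hypothetical restart rule (both triggers and constants are chosen here purely for the sketch) could be grafted onto the main loop like this:

```python
def maybe_restart(s, e, s_best, e_best, steps_since_improvement,
                  patience=1000, gap=0.10):
    """Return the state to continue from, possibly jumping back to the best one.

    Restart if no improvement has been seen for `patience` steps, or if the
    current energy exceeds the best energy found so far by more than `gap`
    (as a relative margin, assuming positive energies). Both criteria and
    constants are illustrative.
    """
    stagnating = steps_since_improvement >= patience
    too_high = e > e_best * (1 + gap) if e_best > 0 else False
    if stagnating or too_high:
        return s_best, e_best      # jump back to the best solution seen
    return s, e
```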