The multiplicative weight update method is a meta-algorithm: an algorithmic technique which "maintains a distribution on a certain set of interest, and updates it iteratively by multiplying the probability mass of elements by suitably chosen factors based on feedback obtained by running another algorithm on the distribution". It was discovered repeatedly in very diverse fields such as machine learning (AdaBoost, Winnow, Hedge), optimization (solving LPs), theoretical computer science (devising fast algorithms for LPs and SDPs), and game theory.
Contents
- Name
- History and Background
- General Setup
- Halving Algorithm
- Weighted Majority Algorithm
- Randomized Weighted Majority Algorithm
- Applications
- Solving Zero-Sum Games Approximately (Oracle Algorithm)
- Machine Learning
- Winnow Algorithm
- Hedge Algorithm
- AdaBoost Algorithm
- Problem
- Assumption
- Solution
- Operations Research and On-line Statistical Decision Making
- Computational Geometry
- References
Name
"Multiplicative weights" refers to the iterative update rule used in algorithms derived from the multiplicative weight update method. The method goes by different names in the different fields where it was discovered or rediscovered.
History and Background
The earliest known version of this technique was an algorithm named "Fictitious Play", which was proposed in game theory in the early 1950s. Grigoriadis and Khachiyan applied a randomized variant of "Fictitious Play" to solve two-player zero-sum games efficiently using the multiplicative weights algorithm. In this case, a player allocates higher weight to the actions that had a better outcome and chooses a strategy relying on these weights. In machine learning, Littlestone applied the earliest form of the multiplicative weights update rule in his famous Winnow algorithm, which is similar to Minsky and Papert's earlier perceptron learning algorithm. Later, Littlestone and Warmuth generalized the Winnow algorithm to the Weighted Majority algorithm. Freund and Schapire followed in their steps and generalized it further in the form of the Hedge algorithm.
The multiplicative weights algorithm is also widely applied in computational geometry, for example in Clarkson's algorithm for linear programming (LP) with a bounded number of variables in linear time. Later, Bronnimann and Goodrich employed analogous methods to find set covers for hypergraphs with small VC dimension.
In operations research and on-line statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.
In computer science, some researchers have previously observed the close relationships between multiplicative update algorithms used in different contexts. Young discovered the similarities between fast LP algorithms and Raghavan's method of pessimistic estimators for derandomization of randomized rounding algorithms; Klivans and Servedio linked boosting algorithms in learning theory to proofs of Yao's XOR Lemma; Garg and Khandekar defined a common framework for convex optimization problems that contains Garg-Konemann and Plotkin-Shmoys-Tardos as subcases.
General Setup
A binary decision needs to be made based on n experts’ opinions to attain an associated payoff. In the first round, all experts’ opinions have the same weight, and the decision maker makes the first decision based on the majority of the experts' predictions. Then, in each successive round, the decision maker updates the weight of each expert's opinion depending on the correctness of that expert's prior predictions. Real-life examples include predicting whether it will rain tomorrow or whether the stock market will go up or down.
Halving Algorithm
Consider a sequential game played between an adversary and an aggregator who is advised by N experts. The goal is for the aggregator to make as few mistakes as possible. Assume that among the N experts there is one who always gives the correct prediction. In the halving algorithm, only the consistent experts are retained: experts who make a mistake are dismissed. In each round, the aggregator takes a majority vote among the remaining experts. Therefore, every time the aggregator makes a mistake, at least half of the remaining experts are dismissed, so the aggregator makes at most log2(N) mistakes.
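The halving rule can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `halving_predict` and `run_halving`, the 0/1 encoding of predictions, and breaking ties toward 1 are all assumptions made for the example.

```python
def halving_predict(active, predictions):
    """Majority vote among the experts that have never been wrong so far."""
    votes_for_one = sum(predictions[i] for i in active)
    # Ties are broken toward 1 (an arbitrary choice for this sketch).
    return 1 if 2 * votes_for_one >= len(active) else 0


def run_halving(expert_predictions, outcomes):
    """Return the number of mistakes made by the aggregator.

    expert_predictions[t][i] is expert i's 0/1 prediction in round t,
    outcomes[t] is the true 0/1 answer revealed after the round.
    """
    n = len(expert_predictions[0])
    active = set(range(n))  # experts still consistent with every outcome
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        if halving_predict(active, preds) != truth:
            mistakes += 1
        # Dismiss every expert that was wrong in this round.
        active = {i for i in active if preds[i] == truth}
    return mistakes
```

With 8 experts, one of whom is always right, the returned count can never exceed log2(8) = 3, because each mistake removes at least half of the active set.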
Weighted Majority Algorithm
Unlike the halving algorithm, which dismisses experts who have made mistakes, the weighted majority algorithm discounts their advice. Given the same "expert advice" setup, suppose we have n decisions and need to select one decision in each round. In each round, every decision incurs a cost, and all costs are revealed after the choice is made. The cost is 0 if the expert is correct, and 1 otherwise. The algorithm's goal is to limit its cumulative loss to roughly that of the best expert. The naive algorithm that follows a plain majority vote in every iteration does not work, since the majority of the experts can be wrong consistently, every time. The weighted majority algorithm corrects this trivial algorithm by keeping a weight for each expert, so that experts with more past mistakes count for less. When no expert is always correct, this leads to fewer mistakes than the halving algorithm.
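Initialization: Fix an $\eta \le 1/2$. With each expert $i$, associate the weight $w_i^{1} := 1$.
For $t = 1, 2, \ldots, T$:
1. Make the prediction given by the weighted majority of the experts' predictions, based on their weights $w_1^{t}, \ldots, w_n^{t}$. That is, choose 0 or 1 depending on which prediction has the higher total weight of experts advising it (breaking ties arbitrarily).
2. For every expert $i$ that predicted wrongly, decrease its weight for the next round by multiplying it by a factor of $(1-\eta)$: $w_i^{t+1} = (1-\eta)\, w_i^{t}$ (update rule).
If $\eta = 0$, the weight of the expert's advice remains the same. As $\eta$ increases, the weight of a mistaken expert's advice decreases faster.
After $T$ steps, let $m_i^{T}$ be the number of mistakes of expert $i$ and $M^{T}$ be the number of mistakes the algorithm has made. Then the following bound holds for every $i$:
$M^{T} \le 2(1+\eta)\, m_i^{T} + \frac{2 \ln n}{\eta}$.
In particular, this holds for i which is the best expert. Since the best expert will have the least $m_i^{T}$, it gives the best bound on the number of mistakes made by the algorithm as a whole.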
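The following Python sketch illustrates the deterministic weighted majority rule. It assumes the common formulation in which all weights start at 1 and the weight of every wrong expert is multiplied by (1 - eta) with eta at most 1/2; the function name, the 0/1 encoding, and breaking ties toward 1 are choices made for this example only.

```python
import math


def weighted_majority(expert_predictions, outcomes, eta=0.5):
    """Run weighted majority; return (algorithm mistakes, per-expert mistakes).

    expert_predictions[t][i] is expert i's 0/1 prediction in round t,
    outcomes[t] is the true 0/1 answer revealed after the round.
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n
    algo_mistakes = 0
    expert_mistakes = [0] * n
    for preds, truth in zip(expert_predictions, outcomes):
        weight_one = sum(w for w, p in zip(weights, preds) if p == 1)
        weight_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if weight_one >= weight_zero else 0  # ties go to 1
        if guess != truth:
            algo_mistakes += 1
        # Multiplicative update: discount every expert that was wrong.
        for i, p in enumerate(preds):
            if p != truth:
                weights[i] *= 1.0 - eta
                expert_mistakes[i] += 1
    return algo_mistakes, expert_mistakes


def mistake_bound(best_expert_mistakes, n, eta):
    """Standard guarantee 2(1 + eta) m + 2 ln(n) / eta for eta <= 1/2."""
    return 2 * (1 + eta) * best_expert_mistakes + 2 * math.log(n) / eta
```

Comparing the first return value with `mistake_bound(min(expert_mistakes), n, eta)` on any input sequence is a quick sanity check of the guarantee.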
Randomized Weighted Majority Algorithm
Given the same setup with N experts, consider the special situation where the weighted proportions of experts predicting positive and negative are both close to 50%. Then there might be a tie. The randomized algorithm keeps the weight update rule of the weighted majority algorithm but randomizes its predictions. It computes the weighted fractions of experts predicting positive and negative, and then makes a random decision based on the computed fraction:
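predict $f(x) = 1$ with probability $q_1 / W$, and $f(x) = 0$ otherwise,
where $W = \sum_i w_i = q_0 + q_1$, and $q_0$ and $q_1$ are the total weights of the experts predicting 0 and 1, respectively.
The number of mistakes made by the Randomized Weighted Majority Algorithm is bounded as:
$E[\#\text{mistakes of the learner}] \le \alpha_\beta \cdot (\#\text{mistakes of the best expert}) + c_\beta \ln N$,
where $\alpha_\beta = \frac{\ln(1/\beta)}{1-\beta}$, $c_\beta = \frac{1}{1-\beta}$, and $\beta \in (0,1)$ is the factor by which the weight of a mistaken expert is multiplied.
Note that only the learning algorithm is randomized. The underlying assumption is that the examples and experts’ predictions are not random. The only randomness is in how the learner makes its own prediction. In this randomized algorithm, $\alpha_\beta \to 1$ as $\beta \to 1$.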
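A small Python sketch of the randomized variant follows. It assumes that a wrong expert's weight is multiplied by a factor `beta` in (0, 1); the names `beta` and `seed` and both function names are assumptions of this example. The second function computes the expected number of mistakes exactly, since the probability of erring in a round equals the weighted fraction of wrong experts.

```python
import math
import random


def randomized_weighted_majority(expert_predictions, outcomes, beta=0.5, seed=0):
    """Simulate one run; return the number of mistakes of the learner."""
    rng = random.Random(seed)
    weights = [1.0] * len(expert_predictions[0])
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        total = sum(weights)
        q1 = sum(w for w, p in zip(weights, preds) if p == 1)
        # Predict 1 with probability equal to the weighted fraction voting 1.
        guess = 1 if rng.random() < q1 / total else 0
        if guess != truth:
            mistakes += 1
        weights = [w * beta if p != truth else w
                   for w, p in zip(weights, preds)]
    return mistakes


def expected_mistakes(expert_predictions, outcomes, beta=0.5):
    """Return (expected learner mistakes, mistakes of the best expert)."""
    n = len(expert_predictions[0])
    weights = [1.0] * n
    wrong_counts = [0] * n
    expected = 0.0
    for preds, truth in zip(expert_predictions, outcomes):
        total = sum(weights)
        wrong = sum(w for w, p in zip(weights, preds) if p != truth)
        expected += wrong / total  # chance of a mistake in this round
        for i, p in enumerate(preds):
            if p != truth:
                weights[i] *= beta
                wrong_counts[i] += 1
    return expected, min(wrong_counts)
```

The value returned by `expected_mistakes` can be compared against (m ln(1/beta) + ln N) / (1 - beta), where m is the number of mistakes of the best expert.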
Applications
The multiplicative weights method is usually used to solve a constrained optimization problem. Let each expert correspond to a constraint in the problem, and let the events represent the points in the area of interest. The penalty of an expert corresponds to how well its constraint is satisfied at the point represented by an event.
Solving Zero-Sum Games Approximately (Oracle Algorithm):
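Suppose we are given a distribution $P$ on experts. Let $A$ be the payoff matrix of a finite two-player zero-sum game with $n$ rows.
When the row player $p_r$ uses plan $i$ and the column player $p_c$ uses plan $j$, the payoff of player $p_c$ is $A(i,j) := A_{ij}$, assuming $A(i,j) \in [0,1]$.
If player $p_r$ chooses action $i$ from a distribution $P$ over the rows, then the expected payoff of player $p_c$ when selecting action $j$ is $A(P,j) = E_{i \sim P}[A(i,j)]$.
Hence, in order to maximize $A(P,j)$, player $p_c$ should choose the plan $j$ attaining $\max_j A(P,j)$. Similarly, if player $p_c$ chooses $j$ from a distribution $Q$ over the columns, the expected payoff is $A(i,Q) = E_{j \sim Q}[A(i,j)]$, and player $p_r$ should choose the plan $i$ that minimizes it. By John von Neumann's min-max theorem, we obtain:
$\min_P \max_j A(P,j) = \max_Q \min_i A(i,Q)$,
where $P$ ranges over the distributions over rows and $i$ over the rows, and $Q$ ranges over the distributions over columns and $j$ over the columns.
Then, let $\lambda^*$ denote the common value of the above quantities, also called the "value of the game", and let $\delta > 0$ be an error parameter. Solving the zero-sum game up to an additive error of $\delta$ means finding mixed strategies $p$ and $q$ such that $\lambda^* - \delta \le \min_i A(i,q)$ and $\max_j A(p,j) \le \lambda^* + \delta$.
Therefore, there is an algorithm solving the zero-sum game up to an additive factor of δ using $O(\log_2(n)/\delta^2)$ calls to ORACLE, with an additional processing time of $O(n)$ per call.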
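As an illustration, the sketch below runs multiplicative weights for the row player against a best-response column oracle. It assumes payoffs in [0, 1] paid to the column player, an error parameter `delta` of at most 1, a learning rate of `delta / 2`, and `ceil(4 ln(n) / delta^2)` rounds; these constants and all function names are choices made for this example, not part of the original text.

```python
import math


def solve_zero_sum(A, delta):
    """Approximate mixed strategies (p over rows, q over columns).

    A[i][j] in [0, 1] is the payoff to the column player, so the row
    player wants to minimize it. Assumes 0 < delta <= 1.
    """
    n_rows, n_cols = len(A), len(A[0])
    eta = delta / 2.0
    rounds = max(1, math.ceil(4.0 * math.log(n_rows) / delta ** 2))
    weights = [1.0] * n_rows
    avg_p = [0.0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(rounds):
        total = sum(weights)
        p = [w / total for w in weights]
        # Oracle step: the column player best-responds to p.
        payoffs = [sum(p[i] * A[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        j_best = max(range(n_cols), key=lambda j: payoffs[j])
        col_counts[j_best] += 1
        for i in range(n_rows):
            avg_p[i] += p[i] / rounds
            # Rows that paid more against j_best lose more weight.
            weights[i] *= 1.0 - eta * A[i][j_best]
    q = [c / rounds for c in col_counts]
    return avg_p, q


def game_bounds(A, p, q):
    """Return (min_i A(i, q), max_j A(p, j)), which bracket the game value."""
    lower = min(sum(q[j] * A[i][j] for j in range(len(q)))
                for i in range(len(p)))
    upper = max(sum(p[i] * A[i][j] for i in range(len(p)))
                for j in range(len(q)))
    return lower, upper
```

The averaged row strategy and the empirical distribution of the oracle's answers are returned because the gap between the two values from `game_bounds` certifies how close both strategies are to optimal.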
Machine Learning
In machine learning, Littlestone and Warmuth generalized the Winnow algorithm to the Weighted Majority algorithm. Later, Freund and Schapire generalized it in the form of the Hedge algorithm. The AdaBoost algorithm, formulated by Yoav Freund and Robert Schapire, also employs the multiplicative weight update method.
Winnow Algorithm
As far as is currently known, the first use of the multiplicative weight update method in machine learning was in Littlestone's Winnow algorithm. It is used in machine learning to solve a linear program.
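Given $m$ labeled examples $(a_1, l_1), \ldots, (a_m, l_m)$, where the $a_j \in \mathbb{R}^n$ are feature vectors and the $l_j \in \{-1, 1\}$ are their labels.
The aim is to find non-negative weights such that for all examples, the sign of the weighted combination of the features matches its label. That is, require that $l_j\, a_j \cdot x \ge 0$ for all $j$. Without loss of generality, assume the total weight is 1, so that the weights form a distribution. Redefining $a_j$ to be $l_j a_j$ for notational convenience, the problem reduces to finding a solution to the following LP:
$\forall j = 1, \ldots, m: \; a_j \cdot x \ge 0$, $\quad \mathbf{1} \cdot x = 1$, $\quad \forall i: \; x_i \ge 0$.
This is a general form of LP.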
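One way to attack such a feasibility problem with multiplicative weights is sketched below, in the spirit of Winnow. The sketch assumes that each example vector has already been multiplied by its label, so that a weight vector x is acceptable when every dot product is non-negative. The learning rate, the scaling by the largest absolute entry, and the function names are assumptions of this example.

```python
def dot(a, x):
    """Inner product of two equal-length sequences."""
    return sum(ai * xi for ai, xi in zip(a, x))


def winnow_lp(examples, eta=0.1, max_rounds=1000):
    """Search for a distribution x with dot(a, x) >= 0 for every example a.

    examples[j] is the feature vector of example j, already multiplied by
    its +1/-1 label. Returns x, or None if max_rounds is exhausted.
    """
    n = len(examples[0])
    # Scale so that every multiplicative factor stays positive.
    rho = max(abs(v) for a in examples for v in a) or 1.0
    weights = [1.0] * n
    for _ in range(max_rounds):
        total = sum(weights)
        x = [w / total for w in weights]
        # Look for an example that the current weights get wrong.
        violated = next((a for a in examples if dot(a, x) < 0), None)
        if violated is None:
            return x
        # Boost features that help the violated example, shrink the others.
        weights = [w * (1.0 + eta * v / rho)
                   for w, v in zip(weights, violated)]
    return None
```

Returning `None` after `max_rounds` only means the sketch gave up; it is not a proof that the system is infeasible.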
Hedge Algorithm
The Hedge algorithm is similar to the Weighted Majority algorithm; however, their exponential update rules differ. It is generally used to solve the problem of binary allocation, in which we need to allocate different portions of resources among N different options. The loss of every option is revealed at the end of every iteration. The goal is to reduce the total loss suffered for a particular allocation. The allocation for the following iteration is then revised by a multiplicative update, based on the loss suffered in the current iteration.
Analysis
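Assume the learning rate $\eta > 0$ and costs $m_i^t \in [-1, 1]$, and for $t \in [T]$ let $p^t$ be the distribution picked by Hedge. Then for all experts $i$,
$\sum_{t \le T} p^t \cdot m^t \le \sum_{t \le T} m_i^t + \frac{\ln N}{\eta} + \eta T$.
Initialization: Fix an $\eta > 0$. With each expert $i$, associate the weight $w_i^1 := 1$.
For $t = 1, 2, \ldots, T$:
1. Pick the distribution $p_i^t = w_i^t / \Phi_t$, where $\Phi_t = \sum_i w_i^t$.
2. Observe the cost vector $m^t$ of the decisions.
3. Set $w_i^{t+1} = w_i^t \exp(-\eta\, m_i^t)$.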
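A minimal Python sketch of a Hedge-style allocation loop is given below. It assumes losses in [0, 1] that are revealed for every option at the end of each round, and the exponential update exp(-eta * loss); the function name and the returned tuple are choices made for this example.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge over a sequence of loss vectors.

    loss_rounds[t][i] is the loss of option i in round t.
    Returns (total expected loss of the allocations, final allocation).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total_loss = 0.0
    for losses in loss_rounds:
        norm = sum(weights)
        p = [w / norm for w in weights]  # allocation for this round
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        # Exponential multiplicative update after the losses are revealed.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    norm = sum(weights)
    return total_loss, [w / norm for w in weights]
```

For T rounds and N options, the first return value can be checked against the loss of the best single option plus ln(N) / eta + eta * T.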
AdaBoost Algorithm
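This algorithm maintains a set of weights $w^t$ over the training examples. In every iteration $t$, a distribution $p^t$ is computed by normalizing these weights. This distribution is fed to the weak learner, which generates a hypothesis $h_t$ that (hopefully) has small error with respect to the distribution. Using the new hypothesis $h_t$, AdaBoost generates the next weight vector $w^{t+1}$, and the process repeats. After $T$ such iterations, the final hypothesis $h_f$ is output; it combines the outputs of the $T$ weak hypotheses using a weighted majority vote.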
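The loop of reweighting examples and combining weak hypotheses can be sketched as follows. The sketch uses the common formulation with labels in {-1, +1} and hypothesis weight alpha = 0.5 ln((1 - err) / err), with one-dimensional threshold stumps as the weak learner; the stump learner and all function names are assumptions of this example rather than part of the original description.

```python
import math


def stump(x, thr, sign):
    """Threshold rule: predict sign if x >= thr, otherwise -sign."""
    return sign if x >= thr else -sign


def train_stump(xs, ys, p):
    """Weak learner: the stump with the smallest error under distribution p."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(pi for x, y, pi in zip(xs, ys, p)
                      if stump(x, thr, sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, thr, sign) weak hypotheses."""
    weights = [1.0] * len(xs)
    ensemble = []
    for _ in range(rounds):
        total = sum(weights)
        p = [w / total for w in weights]  # normalize weights
        err, thr, sign = train_stump(xs, ys, p)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Multiplicative update: misclassified examples gain weight.
        weights = [w * math.exp(-alpha * y * stump(x, thr, sign))
                   for w, x, y in zip(weights, xs, ys)]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of the weak hypotheses."""
    score = sum(alpha * stump(x, thr, sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1
```

On data such as `xs = [1, 2, 3, 4, 5, 6]` with labels `[1, 1, -1, -1, 1, 1]`, no single stump is correct on every point, but the weighted vote of three stumps is.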
Problem
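Given an $m \times n$ matrix $A$ and a vector $b \in \mathbb{R}^m$, is there an $x$ such that $Ax \ge b$?
$\exists?\, x : Ax \ge b$ (1)
Assumption
Using the oracle algorithm in solving the zero-sum problem, with an error parameter $\epsilon > 0$, the output would either be a point $x$ such that $Ax \ge b - \epsilon$ or a proof that no such $x$ exists, i.e., that there is no solution to this linear system of inequalities.
Solution
Given a vector $p \in \Delta_m$, that is, a probability distribution over the $m$ constraints, solve the following relaxed problem:
$\exists?\, x : p^{\mathsf T} A x \ge p^{\mathsf T} b$ (2)
If there exists an $x$ satisfying (1), then $x$ satisfies (2) for all $p \in \Delta_m$. The contrapositive of this statement is also true: if (2) has no solution for some $p$, then (1) has no solution.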
Operations Research and On-line Statistical Decision Making
In operations research and on-line statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.
Computational Geometry
The multiplicative weights algorithm is also widely applied in computational geometry, for example in Clarkson's algorithm for linear programming (LP) with a bounded number of variables in linear time. Later, Bronnimann and Goodrich employed analogous methods to find set covers for hypergraphs with small VC dimension.