The Bradley–Terry model is a probability model that can predict the outcome of a comparison. Given a pair of individuals i and j drawn from some population, it estimates the probability that the pairwise comparison i > j turns out true, as
Contents
where pi is a positive real-valued score assigned to individual i. The comparison i > j can be read as "i is preferred to j", "i ranks higher than j", or "i beats j", depending on the application.
For example, pi may represent the skill of a team in a sports tournament, estimated from the number of times i has won a match.
History and applications
The model is named after R. A. Bradley and M. E. Terry, who presented it in 1952, although it had already been studied by Zermelo in the 1920s.
Real-world applications of the model include estimation of the influence of statistical journals, or ranking documents by relevance in machine-learned search engines. In the latter application,
Definition
The Bradley–Terry model can be parametrized in various ways. One way to do so is to pick a single parameter per observation, leading to a model of n parameters p1, ..., pn. Another variant, in fact the version considered by Bradley and Terry, uses exponential score functions
or, using the logit (and disallowing ties),
reducing the model to logistic regression on pairs of individuals.
Estimating the parameters
The following algorithm computes the parameters pi of the basic version of the model from a sample of observations. Formally, it computes a maximum likelihood estimate, i.e., it maximizes the likelihood of the observed data. The algorithm dates back to the work of Zermelo.
The observations required are the outcomes of previous comparisons, for example, pairs (i, j) where i beats j. Summarizing these outcomes as wij, the number of times i has beaten j, we obtain the log-likelihood of the parameter vector p = p1, ..., pn as
Denote the number of comparisons "won" by i as Wi, and the number of comparisons made between i and j as Nij. Starting from an arbitrary vector p, the algorithm iteratively performs the update
for all i. After computing all of the new parameters, they should be renormalized,
This estimation procedure improves the log-likelihood in every iteration, and eventually converges to a unique maximum.