In statistical classification, the Bayes classifier is the classifier that minimizes the probability of misclassification.
Suppose a pair $(X, Y)$ takes values in $\mathbb{R}^d \times \{1, 2, \dots, K\}$, where $Y$ is the class label of $X$. This means that the conditional distribution of $X$, given that the label $Y$ takes the value $r$, is given by
\[
X \mid Y = r \sim P_r \qquad \text{for } r = 1, 2, \dots, K,
\]
where "$\sim$" means "is distributed as", and where $P_r$ denotes a probability distribution.
A classifier is a rule that assigns to an observation $X = x$ a guess or estimate of what the unobserved label $Y = r$ actually was. In theoretical terms, a classifier is a measurable function $C : \mathbb{R}^d \to \{1, 2, \dots, K\}$, with the interpretation that $C$ classifies the point $x$ to the class $C(x)$. The probability of misclassification, or risk, of a classifier $C$ is defined as
\[
\mathcal{R}(C) = \operatorname{P}\{C(X) \neq Y\}.
\]
The Bayes classifier is
\[
C^{\text{Bayes}}(x) = \underset{r \in \{1, 2, \dots, K\}}{\operatorname{argmax}} \; \operatorname{P}(Y = r \mid X = x).
\]
In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively, in this case the posterior probability $\operatorname{P}(Y = r \mid X = x)$. The Bayes classifier is a useful benchmark in statistical classification.
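As a concrete illustration, here is a minimal Python sketch of the Bayes rule for a hypothetical two-class problem in which the class-conditional distributions are known one-dimensional Gaussians. The specific means, variances, and priors are illustrative assumptions, not part of the definition above.

```python
import math

# Illustrative (assumed) model:
#   X | Y = 1 ~ N(0, 1),   X | Y = 2 ~ N(2, 1),   priors 0.5 each.
PRIORS = {1: 0.5, 2: 0.5}
PARAMS = {1: (0.0, 1.0), 2: (2.0, 1.0)}  # (mean, std) per class

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def bayes_classifier(x):
    """argmax_r P(Y = r | X = x); by Bayes' rule the posterior is
    proportional to prior times class-conditional density."""
    return max(PRIORS, key=lambda r: PRIORS[r] * normal_pdf(x, *PARAMS[r]))

print(bayes_classifier(0.2))  # 1  (closer to the class-1 mean)
print(bayes_classifier(1.8))  # 2  (closer to the class-2 mean)
```

With equal priors and equal variances this rule reduces to assigning $x$ to the class whose mean is nearer, so the decision boundary sits at the midpoint $x = 1$.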
The excess risk of a general classifier $C$ (possibly depending on some training data) is defined as $\mathcal{R}(C) - \mathcal{R}(C^{\text{Bayes}})$. This non-negative quantity is important for assessing the performance of different classification techniques. A classifier is said to be consistent if its excess risk converges to zero as the size of the training data set tends to infinity.
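The notions of excess risk and consistency can be made concrete with a small simulation. The sketch below uses an illustrative model (not taken from the text): two equally likely classes with $X \mid Y = r \sim N(\mu_r, 1)$, $\mu_1 = 0$, $\mu_2 = 2$. A hypothetical "plug-in" classifier estimates the class means from training data, while the Bayes classifier uses the true means; the gap between their estimated risks is a Monte Carlo estimate of the excess risk.

```python
import random
import statistics

MU = {1: 0.0, 2: 2.0}  # assumed true class means

def sample(n_per_class):
    """Draw n_per_class labelled points from each class of the model."""
    return [(random.gauss(MU[y], 1.0), y)
            for y in (1, 2) for _ in range(n_per_class)]

def classify(x, means):
    # Equal priors and equal variances: pick the class with the nearer mean.
    return min(means, key=lambda r: abs(x - means[r]))

def risk(means, test_set):
    """Monte Carlo estimate of P{C(X) != Y} on a held-out sample."""
    return sum(classify(x, means) != y for x, y in test_set) / len(test_set)

random.seed(0)
test_set = sample(25_000)           # 50,000 held-out points in total
bayes_risk = risk(MU, test_set)     # estimate of R(C_Bayes)
for n in (5, 50, 5_000):
    train = sample(n)
    est = {r: statistics.mean(x for x, y in train if y == r) for r in (1, 2)}
    excess = risk(est, test_set) - bayes_risk
    print(n, round(excess, 4))      # excess risk typically shrinks with n
```

Because the estimated means converge to the true means as the training set grows, the plug-in rule converges to the Bayes rule and its excess risk tends to zero, which is exactly the consistency property described above (up to Monte Carlo noise in the estimates).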