In statistical classification the Bayes classifier minimizes the probability of misclassification.
Suppose a pair                     (        X        ,        Y        )                 takes values in                                           R                                d                          ×        {        1        ,        2        ,        …        ,        K        }                , where                     Y                 is the class label of                     X                . This means that the conditional distribution of X, given that the label Y takes the value r is given by
                    X        ∣        Y        =        r        ∼                  P                      r                                   for 
                    r        =        1        ,        2        ,        …        ,        K                where "                    ∼                " means "is distributed as", and where                               P                      r                                   denotes a probability distribution.
A classifier is a rule that assigns to an observation X=x a guess or estimate of what the unobserved label Y=r actually was. In theoretical terms, a classifier is a measurable function                     C        :                              R                                d                          →        {        1        ,        2        ,        …        ,        K        }                , with the interpretation that C classifies the point x to the class C(x). The probability of misclassification, or risk, of a classifier C is defined as
                                          R                          (        C        )        =        P                {        C        (        X        )        ≠        Y        }        .                The Bayes classifier is
                              C                      Bayes                          (        x        )        =                              argmax                          r              ∈              {              1              ,              2              ,              …              ,              K              }                                      P                (        Y        =        r        ∣        X        =        x        )        .                In practice, as in most of statistics, the difficulties and subtleties are associated with modeling the probability distributions effectively—in this case,                     P                (        Y        =        r        ∣        X        =        x        )                . The Bayes classifier is a useful benchmark in statistical classification.
The excess risk of a general classifier                     C                 (possibly depending on some training data) is defined as                                           R                          (        C        )        −                              R                          (                  C                      Bayes                          )        .                 Thus this non-negative quantity is important for assessing the performance of different classification techniques. A classifier is said to be consistent if the excess risk converges to zero as the size of the training data set tends to infinity.