The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index that is adjusted for the chance grouping of elements is called the adjusted Rand index. From a mathematical standpoint, the Rand index is related to accuracy, but it is applicable even when class labels are not used.
Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$ and two partitions of $S$ to compare, $X = \{X_1, \ldots, X_r\}$, a partition of $S$ into $r$ subsets, and $Y = \{Y_1, \ldots, Y_s\}$, a partition of $S$ into $s$ subsets, define the following:
$a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
$b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
$c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
$d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

The Rand index, $R$, is:
$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$

Intuitively, $a + b$ can be considered as the number of agreements between $X$ and $Y$, and $c + d$ as the number of disagreements between $X$ and $Y$.
Since the denominator is the total number of pairs, the Rand index is the fraction of all pairs on which the two clusterings agree, or equivalently the probability that $X$ and $Y$ agree on a randomly chosen pair.
The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.
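A small worked example may make the pair counting concrete. The two partitions below are chosen purely for illustration and do not come from the original text. With $n = 5$ there are $\binom{5}{2} = 10$ pairs, and each pair falls into exactly one of the four categories:

```latex
% Illustrative partitions of S = {o_1, ..., o_5}
X = \{\{o_1, o_2, o_3\}, \{o_4, o_5\}\}, \qquad
Y = \{\{o_1, o_2\}, \{o_3, o_4, o_5\}\}, \qquad
\binom{5}{2} = 10

a = |\{\{o_1, o_2\}, \{o_4, o_5\}\}| = 2
b = |\{\{o_1, o_4\}, \{o_1, o_5\}, \{o_2, o_4\}, \{o_2, o_5\}\}| = 4
c = |\{\{o_1, o_3\}, \{o_2, o_3\}\}| = 2
d = |\{\{o_3, o_4\}, \{o_3, o_5\}\}| = 2

R = \frac{a + b}{\binom{5}{2}} = \frac{2 + 4}{10} = 0.6
```

The four counts sum to 10, confirming that every pair is classified exactly once.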
In mathematical terms, $a$, $b$, $c$, $d$ are defined as follows:

$a = |S^*|$, where $S^* = \{(o_i, o_j) \mid o_i, o_j \in X_k,\ o_i, o_j \in Y_l\}$
$b = |S^*|$, where $S^* = \{(o_i, o_j) \mid o_i \in X_{k_1},\ o_j \in X_{k_2},\ o_i \in Y_{l_1},\ o_j \in Y_{l_2}\}$
$c = |S^*|$, where $S^* = \{(o_i, o_j) \mid o_i, o_j \in X_k,\ o_i \in Y_{l_1},\ o_j \in Y_{l_2}\}$
$d = |S^*|$, where $S^* = \{(o_i, o_j) \mid o_i \in X_{k_1},\ o_j \in X_{k_2},\ o_i, o_j \in Y_l\}$

for some $1 \le i < j \le n$, $1 \le k, k_1, k_2 \le r$, $k_1 \ne k_2$, $1 \le l, l_1, l_2 \le s$, $l_1 \ne l_2$. Requiring $i < j$ counts each unordered pair once, so that $a + b + c + d = \binom{n}{2}$.
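The set definitions translate directly into a brute-force count over all unordered pairs. The following is a minimal Python sketch, assuming each clustering is encoded as a list whose entry i is the cluster label of element $o_{i+1}$; the function names `pair_counts` and `rand_index` are hypothetical, chosen for illustration. It takes $O(n^2)$ time, which is adequate for small $n$.

```python
from itertools import combinations


def pair_counts(labels_x, labels_y):
    """Return (a, b, c, d) for two clusterings given as per-element label lists.

    labels_x[i] and labels_y[i] are the cluster labels of the same element
    in partitions X and Y (a label-list encoding assumed for this sketch).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same n elements")
    a = b = c = d = 0
    # combinations() yields each unordered pair (i, j) with i < j exactly once.
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # together in X and together in Y
        elif not same_x and not same_y:
            b += 1  # split in X and split in Y
        elif same_x:
            c += 1  # together in X, split in Y
        else:
            d += 1  # split in X, together in Y
    return a, b, c, d


def rand_index(labels_x, labels_y):
    """Fraction of pairs on which the two clusterings agree: (a + b) / C(n, 2)."""
    a, b, c, d = pair_counts(labels_x, labels_y)
    total = a + b + c + d  # equals n choose 2
    if total == 0:
        raise ValueError("need at least two elements")
    return (a + b) / total
```

For example, `rand_index([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])` classifies the ten pairs as $(a, b, c, d) = (2, 4, 2, 2)$ and returns 0.6. Only label equality is compared, so renaming clusters does not change the result.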
Adjusted Rand index
The adjusted Rand index is the corrected-for-chance version of the Rand index. While the Rand index can only take values between 0 and 1, the adjusted Rand index can be negative when the index is less than its expected value.
Given a set $S$ of $n$ elements, and two groupings or partitions (e.g. clusterings) of these points, namely $X = \{X_1, X_2, \ldots, X_r\}$ and $Y = \{Y_1, Y_2, \ldots, Y_s\}$, the overlap between $X$ and $Y$ can be summarized in a contingency table $[n_{ij}]$ where each entry $n_{ij}$ denotes the number of objects in common between $X_i$ and $Y_j$: $n_{ij} = |X_i \cap Y_j|$.
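Building the contingency table only requires counting how often each pair of labels co-occurs. A minimal Python sketch, assuming per-element label lists as input and labels that can be sorted (the name `contingency_table` is hypothetical):

```python
from collections import Counter


def contingency_table(labels_x, labels_y):
    """Build [n_ij] with n_ij = |X_i ∩ Y_j| from per-element label lists.

    Rows follow the sorted labels of X and columns the sorted labels of Y;
    sortable labels are an assumption of this sketch.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same n elements")
    rows = sorted(set(labels_x))
    cols = sorted(set(labels_y))
    # Map each (x_label, y_label) pair to the number of elements carrying it.
    counts = Counter(zip(labels_x, labels_y))
    # Counter returns 0 for label pairs that never co-occur.
    return [[counts[(r, c)] for c in cols] for r in rows]
```

For instance, `contingency_table([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])` gives `[[2, 1], [0, 2]]`, whose entries sum to $n = 5$.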
The adjusted form of the Rand index, the adjusted Rand index, is

$$\text{AdjustedIndex} = \frac{\text{Index} - \text{ExpectedIndex}}{\text{MaxIndex} - \text{ExpectedIndex}},$$

more specifically
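$$\text{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \Big/ \binom{n}{2}}$$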
where $n_{ij}$ are the entries of the contingency table, and $a_i = \sum_j n_{ij}$ and $b_j = \sum_i n_{ij}$ are its row and column sums.
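The formula can be evaluated directly from a contingency table. The sketch below is a minimal Python version, assuming the table is given as a rectangular list of rows; `adjusted_rand_index` is a hypothetical name. The denominator is zero only when both partitions are all singletons or both are a single cluster (and are therefore identical); the formula is undefined there, and returning 1.0 is a convention assumed in this sketch. For practical use, scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity from label arrays.

```python
from math import comb


def adjusted_rand_index(table):
    """Adjusted Rand index from a contingency table [n_ij] given as a list of rows."""
    row_sums = [sum(row) for row in table]        # a_i = sum_j n_ij
    col_sums = [sum(col) for col in zip(*table)]  # b_j = sum_i n_ij
    n = sum(row_sums)
    if n < 2:
        raise ValueError("need at least two elements")
    total_pairs = comb(n, 2)
    index = sum(comb(n_ij, 2) for row in table for n_ij in row)
    sum_a = sum(comb(a_i, 2) for a_i in row_sums)
    sum_b = sum(comb(b_j, 2) for b_j in col_sums)
    # Degenerate case checked in exact integer arithmetic:
    # MaxIndex == ExpectedIndex  <=>  C(n,2) * (sum_a + sum_b) == 2 * sum_a * sum_b.
    if total_pairs * (sum_a + sum_b) == 2 * sum_a * sum_b:
        return 1.0  # convention assumed here for identical trivial partitions
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)
```

For the partitions $X = \{\{o_1, o_2, o_3\}, \{o_4, o_5\}\}$ and $Y = \{\{o_1, o_2\}, \{o_3, o_4, o_5\}\}$ the table is `[[2, 1], [0, 2]]`. Their Rand index is 0.6, yet `adjusted_rand_index` returns 1/6 (about 0.167), because much of the raw agreement is expected by chance. The table `[[1, 1], [1, 1]]` gives -0.5, an index below its expected value.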