In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that plays the role, in the general theory of Markov processes, that the transition matrix does in the theory of Markov processes with a finite state space.
Let $(X, \mathcal{A})$ and $(Y, \mathcal{B})$ be measurable spaces. A Markov kernel with source $(X, \mathcal{A})$ and target $(Y, \mathcal{B})$ is a map $\kappa : X \times \mathcal{B} \to [0, 1]$ with the following properties:
- The map $x \mapsto \kappa(x, B)$ is $\mathcal{A}$-measurable for every $B \in \mathcal{B}$.
- The map $B \mapsto \kappa(x, B)$ is a probability measure on $(Y, \mathcal{B})$ for every $x \in X$.
(I.e., it associates to each point $x \in X$ a probability measure $\kappa(x, \cdot)$ on $(Y, \mathcal{B})$ such that, for every measurable set $B \in \mathcal{B}$, the map $x \mapsto \kappa(x, B)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.)
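On a finite state space the measurability condition is automatic, so a minimal Python sketch can illustrate just the two defining properties; the two-point spaces and the concrete weights below are illustrative assumptions, not part of the definition.

```python
# A minimal sketch, assuming X = Y = {0, 1} with their power sets as
# sigma-algebras; the numerical weights are illustrative only.
def kappa(x: int, B: set[int]) -> float:
    """kappa(x, B): the mass that the measure kappa(x, .) assigns to B."""
    weights = {0: {0: 0.3, 1: 0.7},   # kappa(0, {y}) for y in Y
               1: {0: 0.6, 1: 0.4}}   # kappa(1, {y}) for y in Y
    return sum(weights[x][y] for y in B)

# Second property: kappa(x, .) is a probability measure on Y for every x.
assert all(abs(kappa(x, {0, 1}) - 1.0) < 1e-12 for x in (0, 1))
```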
Simple random walk: Take $X = Y = \mathbb{Z}$ and $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{Z})$. Then the Markov kernel $\kappa$ with
$$\kappa(x, B) = \frac{1}{2} \mathbf{1}_B(x-1) + \frac{1}{2} \mathbf{1}_B(x+1), \quad \forall x \in \mathbb{Z}, \; \forall B \in \mathcal{P}(\mathbb{Z}),$$
describes the transition rule for the random walk on $\mathbb{Z}$, where $\mathbf{1}$ is the indicator function.
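As a sketch of how this kernel reads in code, the following Python snippet (the function name is an illustrative choice) evaluates $\kappa(x, B)$ for a finite set $B$:

```python
def random_walk_kernel(x: int, B: set[int]) -> float:
    """kappa(x, B) = 1/2 * 1_B(x - 1) + 1/2 * 1_B(x + 1)."""
    return 0.5 * ((x - 1) in B) + 0.5 * ((x + 1) in B)

# For every x, kappa(x, .) is a probability measure on Z:
assert random_walk_kernel(0, {-1, 1}) == 1.0   # all mass sits on the two neighbours
assert random_walk_kernel(0, {1}) == 0.5       # half the mass steps to the right
```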
Galton-Watson process: Take $X = Y = \mathbb{N}$, $\mathcal{A} = \mathcal{B} = \mathcal{P}(\mathbb{N})$. Then
$$\kappa(x, B) = \begin{cases} \mathbf{1}_B(0) & x = 0, \\ P[\xi_1 + \cdots + \xi_x \in B] & \text{else,} \end{cases}$$
with i.i.d. random variables $\xi_i$.
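Here $\kappa(x, \cdot)$ is the distribution of the total offspring of $x$ individuals. A hedged Monte Carlo sketch, assuming Poisson-distributed offspring (an illustrative choice; the definition works for any offspring law):

```python
import numpy as np

rng = np.random.default_rng(0)

def galton_watson_kernel(x: int, B: set[int], mean_offspring: float = 1.5,
                         n_samples: int = 100_000) -> float:
    """Monte Carlo estimate of kappa(x, B) = P[xi_1 + ... + xi_x in B]."""
    if x == 0:
        return float(0 in B)   # once the population is 0, all mass stays on 0
    totals = rng.poisson(mean_offspring, size=(n_samples, x)).sum(axis=1)
    return float(np.isin(totals, list(B)).mean())

# Roughly exp(-1.5) = 0.22 chance that a single Poisson(1.5) parent has no offspring.
print(galton_watson_kernel(1, {0}))
```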
General Markov processes with finite state space: Take $X = Y$, $\mathcal{A} = \mathcal{B} = \mathcal{P}(X) = \mathcal{P}(Y)$ and $|X| = |Y| = n$. Then the transition rule can be represented as a stochastic matrix $(K_{ij})_{1 \le i, j \le n}$ with $\sum_{j \in Y} K_{ij} = 1$ for every $i \in X$. In the convention of Markov kernels we write
$$\kappa(i, B) = \sum_{j \in B} K_{ij}, \quad \forall i \in X, \; \forall B \in \mathcal{B}.$$
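A small Python sketch of this finite-state convention; the $3 \times 3$ matrix is an illustrative example, not taken from the article:

```python
import numpy as np

# A row-stochastic matrix on the state space {0, 1, 2}; each row sums to 1.
K = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

def kernel_from_matrix(i: int, B: set[int]) -> float:
    """kappa(i, B) = sum over j in B of K[i, j]."""
    return float(sum(K[i, j] for j in B))

assert abs(kernel_from_matrix(1, {0, 1, 2}) - 1.0) < 1e-12   # kappa(i, Y) = 1
```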
Construction of a Markov kernel: If $\nu$ is a finite measure on $(Y, \mathcal{B})$ and $k : X \times Y \to \mathbb{R}_{+}$ is a measurable function with respect to the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}$ and has the property
$$\int_Y k(x, y) \, \nu(\mathrm{d}y) = 1 \quad \text{for all } x \in X,$$
then the mapping $\kappa : X \times \mathcal{B} \to [0, 1]$,
$$\kappa(x, B) = \int_B k(x, y) \, \nu(\mathrm{d}y),$$
defines a Markov kernel.
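A sketch of this construction, assuming (as an illustration) $Y = [0, 1]$, $\nu$ the Lebesgue measure on $[0, 1]$ (which is finite), $X = [0, \infty)$, and the density $k(x, y) = (x + 1)\, y^{x}$, which integrates to $1$ in $y$ for every $x$; on intervals the resulting kernel has the closed form $\kappa(x, (a, b]) = b^{x+1} - a^{x+1}$:

```python
def k(x: float, y: float) -> float:
    """Density k(x, y) = (x + 1) * y**x with respect to nu = Lebesgue on [0, 1]."""
    return (x + 1.0) * y ** x

def kappa(x: float, a: float, b: float) -> float:
    """kappa(x, (a, b]) = integral over (a, b] of k(x, y) dy = b**(x+1) - a**(x+1)."""
    return b ** (x + 1.0) - a ** (x + 1.0)

# kappa(x, .) is a probability measure on [0, 1] for every x:
assert all(abs(kappa(x, 0.0, 1.0) - 1.0) < 1e-12 for x in (0.0, 0.5, 3.0))
```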
Let $(X, \mathcal{A}, P)$ be a probability space and $\kappa$ a Markov kernel from $(X, \mathcal{A})$ to some $(Y, \mathcal{B})$.
Then there exists a unique measure $Q$ on $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ such that
$$Q(A \times B) = \int_A \kappa(x, B) \, \mathrm{d}P(x), \quad \forall A \in \mathcal{A}, \; \forall B \in \mathcal{B}.$$
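A finite toy sketch of this measure $Q$, with $P$ a probability vector on a three-point space and $\kappa$ given by a stochastic matrix (both illustrative choices), so the integral becomes a sum:

```python
import numpy as np

P = np.array([0.5, 0.3, 0.2])             # probability measure P on X = {0, 1, 2}
K = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])            # kappa(x, {y}) = K[x, y]

def Q(A: set[int], B: set[int]) -> float:
    """Q(A x B) = sum over x in A of kappa(x, B) * P({x})."""
    return float(sum(P[x] * sum(K[x, y] for y in B) for x in A))

# Q is a probability measure on the product space X x Y:
assert abs(Q({0, 1, 2}, {0, 1, 2}) - 1.0) < 1e-12
```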
Let $(S, Y)$ be a Borel space, $X$ an $(S, Y)$-valued random variable on the measure space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G} \subseteq \mathcal{F}$ a sub-$\sigma$-algebra.
Then there exists a Markov kernel $\kappa$ from $(\Omega, \mathcal{G})$ to $(S, Y)$ such that $\kappa(\cdot, B)$ is a version of the conditional expectation $E[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}]$ for every $B \in Y$, i.e.
$$P[X \in B \mid \mathcal{G}] = E[\mathbf{1}_{\{X \in B\}} \mid \mathcal{G}] = \kappa(\omega, B), \quad P\text{-a.s. for all } B \in Y.$$
It is called the regular conditional distribution of $X$ given $\mathcal{G}$; it is not uniquely defined, but any two versions agree for $P$-almost every $\omega$.
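A hedged finite sketch of such a kernel: $\Omega$ is a four-point space with uniform $P$, $\mathcal{G}$ is generated by the partition $\{\{0,1\}, \{2,3\}\}$, and $X$ takes the two values shown; all concrete choices are illustrative. The kernel conditions on the partition cell containing $\omega$:

```python
from fractions import Fraction

Omega = [0, 1, 2, 3]
P = {w: Fraction(1, 4) for w in Omega}        # uniform probability measure
X = {0: "a", 1: "b", 2: "a", 3: "a"}          # an S-valued random variable, S = {"a", "b"}
cells = [{0, 1}, {2, 3}]                      # partition generating the sub-sigma-algebra G

def kappa(omega: int, B: set) -> Fraction:
    """kappa(omega, B) = P[X in B | cell of the partition containing omega]."""
    cell = next(c for c in cells if omega in c)
    return sum(P[w] for w in cell if X[w] in B) / sum(P[w] for w in cell)

# kappa(., B) is G-measurable (constant on cells) and a version of P[X in B | G]:
assert kappa(0, {"a"}) == Fraction(1, 2) and kappa(3, {"a"}) == Fraction(1, 1)
```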