The definition of a Markov chain has evolved during the 20th century. In 1953 the term Markov chain was used for stochastic processes with a discrete or continuous index set, living on a countable or finite state space; see Doob or Chung. Since the late 20th century it has become more common to consider a Markov chain as a stochastic process with a discrete index set, living on a measurable state space.
Denote by $(E, \Sigma)$ a measurable space and by $p$ a Markov kernel with source and target $(E, \Sigma)$. A stochastic process $(X_n)_{n \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ is called a time-homogeneous Markov chain with Markov kernel $p$ and start distribution $\mu$ if

$$\mathbb{P}[X_0 \in A_0, X_1 \in A_1, \dots, X_n \in A_n] = \int_{A_0} \dots \int_{A_{n-1}} p(y_{n-1}, A_n) \, p(y_{n-2}, dy_{n-1}) \dots p(y_0, dy_1) \, \mu(dy_0)$$

is satisfied for any $n \in \mathbb{N}$ and $A_0, \dots, A_n \in \Sigma$. For any Markov kernel and any probability measure one can construct an associated Markov chain.
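On a finite state space this construction can be carried out directly: sample $X_0$ from the start distribution, then repeatedly sample $X_{k+1}$ from the kernel evaluated at the current state. The following sketch assumes a hypothetical two-state space $E = \{0, 1\}$, with the kernel represented as a row-stochastic matrix; all numerical values are illustrative, not taken from the text.

```python
import random

# Hypothetical two-state example E = {0, 1}: p[x][y] plays the role of
# the kernel p(x, {y}), so each row sums to 1; mu is the start distribution.
p = [[0.9, 0.1],
     [0.5, 0.5]]
mu = [0.5, 0.5]

def sample(dist):
    """Draw an index according to the probability vector dist."""
    u, acc = random.random(), 0.0
    for i, w in enumerate(dist):
        acc += w
        if u < acc:
            return i
    return len(dist) - 1  # guard against floating-point rounding

def markov_chain(p, mu, n):
    """Simulate X_0, ..., X_n: X_0 ~ mu, then X_{k+1} ~ p(X_k, .)."""
    x = sample(mu)
    path = [x]
    for _ in range(n):
        x = sample(p[x])
        path.append(x)
    return path

path = markov_chain(p, mu, 10)  # one realization of X_0, ..., X_10
```

Each step depends only on the current state, which is exactly the time-homogeneous Markov property encoded by the product formula above.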
For any measure $\mu \colon \Sigma \to [0, \infty]$ and any $\mu$-integrable function $f \colon E \to \mathbb{R} \cup \{\infty, -\infty\}$ we denote the Lebesgue integral by $\int_E f(x) \, \mu(dx)$. For the measure $\nu_x \colon \Sigma \to [0, \infty]$ defined by $\nu_x(A) := p(x, A)$ we use the notation

$$\int_E f(y) \, p(x, dy) := \int_E f(y) \, \nu_x(dy).$$
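When $E$ is finite, the integral against $\nu_x = p(x, \cdot)$ reduces to a weighted sum of the values of $f$. A minimal sketch, assuming a hypothetical two-state kernel and test function:

```python
# Hypothetical two-state kernel: p[x][y] stands for p(x, {y}).
p = [[0.9, 0.1],
     [0.5, 0.5]]
f = [2.0, 10.0]  # values f(0), f(1) of a function f: E -> R

def integrate_kernel(p, x, f):
    """Finite-state version of ∫_E f(y) p(x, dy) = sum_y f(y) p(x, {y})."""
    return sum(w * fy for w, fy in zip(p[x], f))

val = integrate_kernel(p, 0, f)  # 0.9 * 2.0 + 0.1 * 10.0
```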
If $\mu$ is a Dirac measure in $x$, we denote the Markov chain associated with a Markov kernel $p$ and start distribution $\mu$ by $(X_n)_{n \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, \mathbb{P}_x)$, and we write

$$\mathbb{E}_x[X] = \int_\Omega X(\omega) \, \mathbb{P}_x(d\omega)$$

for the expectation of a $\mathbb{P}_x$-integrable function $X$. By definition, we then have $\mathbb{P}_x[X_0 = x] = 1$.
For any measurable function $f \colon E \to [0, \infty]$ we have the relation

$$\int_E f(y) \, p(x, dy) = \mathbb{E}_x[f(X_1)].$$
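This relation can be checked empirically on a finite state space: the exact integral is a weighted sum, and the expectation $\mathbb{E}_x[f(X_1)]$ can be estimated by averaging $f(X_1)$ over many one-step simulations started in $X_0 = x$. The two-state kernel and function below are hypothetical; the agreement is up to Monte Carlo error.

```python
import random

# Hypothetical two-state kernel and test function.
p = [[0.9, 0.1],
     [0.5, 0.5]]
f = [2.0, 10.0]
x = 0  # Dirac start: X_0 = x almost surely under P_x

exact = sum(w * fy for w, fy in zip(p[x], f))  # ∫_E f(y) p(x, dy)

random.seed(0)
trials = 100_000
total = 0.0
for _ in range(trials):
    x1 = 0 if random.random() < p[x][0] else 1  # one step: X_1 ~ p(x, .)
    total += f[x1]
estimate = total / trials  # Monte Carlo estimate of E_x[f(X_1)]
```

By the law of large numbers the estimate converges to the exact integral as the number of trials grows.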
For a Markov kernel $p$ with start distribution $\mu$ one can introduce a family of Markov kernels $(p_n)_{n \in \mathbb{N}}$ by

$$p_{n+1}(x, A) := \int_E p_n(y, A) \, p(x, dy)$$

for $n \in \mathbb{N}$, $n \geq 1$, and $p_1 := p$. For the associated Markov chain $(X_n)_{n \in \mathbb{N}}$ according to $p$ and $\mu$ one obtains

$$\mathbb{P}[X_0 \in A, X_n \in B] = \int_A p_n(x, B) \, \mu(dx).$$
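On a finite state space the defining recursion for $p_{n+1}$ is exactly matrix multiplication, so $p_n$ corresponds to the $n$-th power of the kernel matrix. A sketch under that assumption, with a hypothetical two-state kernel and start distribution:

```python
# Hypothetical two-state kernel and start distribution.
p = [[0.9, 0.1],
     [0.5, 0.5]]
mu = [0.5, 0.5]

def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kernel_power(p, n):
    """p_n via the recursion p_{n+1}(x, A) = ∫_E p_n(y, A) p(x, dy),
    which on a finite space reads p_{n+1} = p · p_n (matrix product)."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(p, result)
    return result

p2 = kernel_power(p, 2)
# P[X_0 = 0, X_2 = 1] with A = {0}, B = {1}: mu({0}) * p_2(0, {1})
prob = mu[0] * p2[0][1]
```

Each $p_n$ is again row-stochastic, i.e. a Markov kernel in its own right.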
A probability measure $\mu$ is called a stationary measure of a Markov kernel $p$ if

$$\int_A \mu(dx) = \int_E p(x, A) \, \mu(dx)$$

holds for any $A \in \Sigma$. If $(X_n)_{n \in \mathbb{N}}$ on $(\Omega, \mathcal{F}, \mathbb{P})$ denotes the Markov chain according to a Markov kernel $p$ with stationary measure $\mu$, then all $X_n$ have the same probability distribution, namely

$$\mathbb{P}[X_n \in A] = \mu(A)$$

for any $A \in \Sigma$.
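On a finite state space the stationarity condition reads $\sum_x \mu(\{x\}) \, p(x, A) = \mu(A)$, so a candidate measure can be verified by applying the kernel once. A sketch with a hypothetical two-state kernel whose stationary measure $\mu = (5/6, 1/6)$ can be found by hand from the balance equations:

```python
# Hypothetical two-state kernel; mu = (5/6, 1/6) solves the balance
# equation mu({y}) = sum_x mu({x}) p(x, {y}) for this particular kernel.
p = [[0.9, 0.1],
     [0.5, 0.5]]
mu = [5 / 6, 1 / 6]

def apply_kernel(mu, p):
    """Finite-state version of A ↦ ∫_E p(x, A) mu(dx), evaluated pointwise."""
    n = len(mu)
    return [sum(mu[x] * p[x][y] for x in range(n)) for y in range(n)]

mu_next = apply_kernel(mu, p)
# Stationarity: mu_next equals mu, hence P[X_n in A] = mu(A) for every n.
```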
A Markov kernel $p$ is called reversible with respect to a probability measure $\mu$ if

$$\int_A p(x, B) \, \mu(dx) = \int_B p(x, A) \, \mu(dx)$$

holds for any $A, B \in \Sigma$. Setting $A = E$ shows that if $p$ is reversible with respect to $\mu$, then $\mu$ must be a stationary measure of $p$: the right-hand side becomes $\int_B p(x, E) \, \mu(dx) = \mu(B)$ since $p(x, E) = 1$, while the left-hand side is $\int_E p(x, B) \, \mu(dx)$, which is exactly the stationarity condition.
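On a finite state space, taking $A = \{x\}$ and $B = \{y\}$ reduces reversibility to the detailed-balance condition $\mu(\{x\}) \, p(x, \{y\}) = \mu(\{y\}) \, p(y, \{x\})$ for all states $x, y$. A sketch with a hypothetical two-state kernel that happens to be reversible with respect to $\mu = (5/6, 1/6)$:

```python
# Hypothetical two-state kernel; mu is a probability measure on E = {0, 1}.
p = [[0.9, 0.1],
     [0.5, 0.5]]
mu = [5 / 6, 1 / 6]

def is_reversible(p, mu, tol=1e-9):
    """Check detailed balance: mu({x}) p(x, {y}) == mu({y}) p(y, {x})."""
    n = len(mu)
    return all(abs(mu[x] * p[x][y] - mu[y] * p[y][x]) <= tol
               for x in range(n) for y in range(n))

reversible = is_reversible(p, mu)
# Summing detailed balance over x gives sum_x mu({x}) p(x, {y}) = mu({y}),
# i.e. mu is stationary -- the finite-state form of setting A = E above.
```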