In formal language theory, the Chomsky–Schützenberger representation theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger about representing a given context-free language in terms of two simpler languages. These two simpler languages, namely a regular language and a Dyck language, are combined by means of an intersection and a homomorphism.
A few notions from formal language theory are in order. A context-free language is regular, if can be described by a regular expression, or, equivalently, if it is accepted by a finite automaton. A homomorphism is based on a function
h
which maps symbols from an alphabet
Γ
to words over another alphabet
Σ
; If the domain of this function is extended to words over
Γ
in the natural way, by letting
h
(
x
y
)
=
h
(
x
)
h
(
y
)
for all words
x
and
y
, this yields a homomorphism
h
:
Γ
∗
→
Σ
∗
. A matched alphabet
T
∪
T
¯
is an alphabet with two equal-sized sets; it is convenient to think of it as a set of parentheses types, where
T
contains the opening parenthesis symbols, whereas the symbols in
T
¯
contains the closing parenthesis symbols. For a matched alphabet
T
∪
T
¯
, the Dyck language
D
T
is given by
D
T
=
{
w
∈
(
T
∪
T
¯
)
∗
∣
w
is a correctly nested sequence of parentheses
}
words that are well-nested parentheses over
T
∪
T
¯
.
Chomsky–Schützenberger theorem. A language
L over the alphabet
Σ
is context-free if and only if there exists
a matched alphabet
T
∪
T
¯
a regular language
R
over
T
∪
T
¯
,
and a homomorphism
h
:
(
T
∪
T
¯
)
∗
→
Σ
∗
such that
L
=
h
(
D
T
∩
R
)
.
Proofs of this theorem are found in several textbooks, e.g. Autebert, Berstel & Boasson (1997) or Davis, Sigal & Weyuker (1994).