![]() | ||
In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages. It generalizes the pumping lemma for regular languages.
Contents
Formal statement
If a language L is context-free, then there exists some integer p ≥ 1 (called a "pumping length") such that every string s in L that has a length of p or more symbols (i.e. with |s| ≥ p) can be written as
s = uvwxywith substrings u, v, w, x and y, such that
1. |vwx| ≤ p,2. |vx| ≥ 1, and3. uvnwxny is in L for all n ≥ 0.Below is a formal expression of the Pumping Lemma.
Informal statement and explanation
The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.
The property is a property of all strings in the language that are of length at least p, where p is a constant—called the pumping length—that varies between context-free languages.
Say s is a string of length at least p that is in the language.
The pumping lemma states that s can be split into five substrings, s = uvwxy, where vx is non-empty and the length of vwx is at most p, such that repeating v and x any (and the same) number of times in s produces a string that is still in the language (it is possible and often useful to repeat zero times, which removes v and x from the string). This process of "pumping up" additional copies of v and x is what gives the pumping lemma its name.
Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having p equal to the maximum string length in L plus one. As there are no strings of this length the pumping lemma is not violated.
Usage of the lemma
The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.
For example, the language L = { anbncn | n > 0 } can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string s = apbpcp in L. The pumping lemma tells us that s can be written in the form s = uvwxy, where u, v, w, x, and y are substrings, such that |vwx| ≤ p, |vx| ≥ 1, and uviwxiy is in L for every integer i ≥ 0. By the choice of s and the fact that |vwx| ≤ p, it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:
- vwx = aj for some j ≤ p.
- vwx = ajbk for some j and k with j+k ≤ p.
- vwx = bj for some j ≤ p.
- vwx = bjck for some j and k with j+k ≤ p.
- vwx = cj for some j ≤ p.
For each case, it is easily verified that uviwxiy does not contain equal numbers of each letter for any i ≠ 1. Thus, uv2wx2y does not have the form aibici. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.
While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free.
On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example L = { bjckdl | j, k, l ∈ ℕ } ∪ { aibjcjdj | i, j ∈ ℕ, i≥1 }: for s=bjckdl with e.g. j≥1 choose vwx to consist only of b’s, for s=aibjcjdj choose vwx to consist only of a’s; in both cases all pumped strings are still in L. There are more powerful proof techniques available, such as Ogden's lemma, but also these techniques do not give a complete characterization of the context-free languages.