About the Pumping Lemma Here I try to explain the Pumping Lemma in plain English. Still some knowledge of Computational Theory is assumed.

To prove that a Language is Regular, one can construct a DFA, an NFA or a Regular Expression for that Language. If any of these can accept or express the exact set of strings in the Language, then it can be said that the Language is Regular.

Such proofs are called proof by construction.

To prove that a language is not Regular is a different matter. The fact that a DFA, NFA or a Regular Expression has yet to be found to accept or express all strings in the language is not sufficient evidence. It is always possible that such a device still exists but has yet to be found! In order to prove that a language is not Regular, we will use a different mechanism. We will consider a property that is true of all Regular Languages. This property is known as the Pumping Lemma. We will then assume that a language which we suspect is not Regular to be Regular, and then demonstrate that the properties defined by the Pumping Lemma do not hold. In other words, we will utilize proof by contradiction.

The idea of the Pumping Lemma is as follows:

Consider a DFA that consists of three states.
We know that when we run the Machine on a string, a transition is made for each character in the string. Each transition leaves us in a predetermined state. This means that for every character in a string, we have visited a (not necessarily unique) state. If our DFA has three states, and we run a string of length five on it, (ignoring the start state) five states have been reached. If our DFA consists of only three states, the pigeon-hole principle states that at least on state must have been visited multiple times. If this is true, we know that there is at least one cycle in our DFA. By cycle, we mean, a sequence of states that start and end with the same state. If we look at the DFA, we can find a sequence of characters that will bring us through this cycle. Since we return to the same state each time we go through the cycle, it can be seen that if a cycle can be traversed once in a DFA, it can be traversed infinitely many times. In other words; If a DFA has a cycle that, when traversed, keeps open the possibility of exiting and reaching an accept state, any string that has more characters than the number of states in the DFA will have a substring which is infinitely repeatable. When this substring is repeated (pumped), the resulting string will also be in the language of the DFA. Since this is true of all DFA's, it is true of Regular Languages.

To prove that a language is not Regular, we show that the Pumping Lemma does not hold. To do this, find a string which is known to be larger than the Pumping L ength. Show that there is no substring that can be infinitely repeated while keeping the resulting string in the language. If you can not show this, try another string. There may be some strings in non-regular languages that are perfectly pumpable! Failure to find such a string proves nothing about the language being Regular or Non-Regular. The Pumping Lemma can not be used to prove that a language is Regular.

The Formal definition of the Pumping Lemma can be found on page 78 of Michael Sipser's Introduction to the Thoery of Computation.

It says that if a language is Regular, a Pumping Length can be found. Remember, intuitively, we consider this number to be the number of states in our Machine.
If a string is longer than this length, it is possible to subdivide it into three parts:
x, y and z

x consists of all characters parsed before the DFA cycle.
y consists of all characters parsed in the DFA cycle.
z consists of all characters parsed after the DFA cycle.


There are three properties that this string, under the subdivision will have. Here I explain their signifigance.
  • for all i >= 0, x(y^i)z will be a string that is in t he language.
    This makes since because the cycle can be traversed any number of times.
    (By y^i, we mean y*)
  • |y| > 0
    If the length of y were 0, this would mean that there is nothing to pump ! Essentially, there is no part of the string that has gone through the cycle.
  • |xy| < = p
    If the length of xy were greater than p, we have made a bad choice for y.
    xy is longer than the number of states, y is our cycle, bu t before we have finished traversing it, we have already seen some state more than once. There must be a cycle somewhere earlier in the string. That earlier cycle should be our y!


When proving that the language is not regular, it is not sufficient to choose one arbitrary subdivision of the string into x,y,z and show that y can not be pumped. It must be shown that for all possible subdivisions of the string into x,y,z, pumping y will always, at some point, result in a string which is not in the language.