The language defined on the alphabet {a,b} consisting of the words a^n b^n. This language is not a regular language, leading to the common assertion that regular expressions can't count.

The language "anbn" is one of the classic languages used to demonstrate the limits of regular expressions and context-free grammars.

The term "language" here refers to the computer science theory definition of a language - not the linguistic or programming term that we are familiar with. The language here is something more basic from which programming languages and linguistic languages are built from.

A language is a sequence of tokens.

Languages can be grouped into the Chomsky hierarchy of languages:

Each type of language has a comparable "machine" that can recognize if the sequence of tokens fits that language or not. English is certainly not a regular language, or even context-free - neither finite state automata nor pushdown automata can recognize if a particular sequence of tokens (words or letters) is valid English or not. Frankly, most people can't identify if sequence of words is valid English or not - but thats a different matter.

Returning back to the language at hand, "anbn" we have some number of 'a's followed by the same number of 'b's. Thats it - thats the entire language.

Some examples of valid strings within this language are:

There are an infinite number of acceptable strings. Likewise, there are strings that are not acceptable within this language (of which there are also an infinite number):

This language can be recognized by the context-free gammar:

S ::= (null) | aSb
This simply says that you match an 'a' on the left, a 'b' on the right and stick another matching token in the middle, or you match nothing.

This language, however is not a regular language and there are several proofs that explore this:

The statement that "regular expressions can't count" is quite true. However, the perl language regular expression engine is a bit more powerful than the standard regular language. While this particular language is beyond the grasp of the perl regular expression language without "cheating" (invoking perl commands from within the expression itself with the 'e' switch), perl can handle a superficialy similar language of "anban" where we have a 'b' separating two strings of 'a's that are the same length.

The reason that perl can match this is that perl regular expressions have (among other things) the ability to reference backwards a previously matched string. In this case, that is the string of 'a's.

This ability is best demonstrated in the perl regular expression that matches prime numbers in unary demonstrated and created by Abigail as part of her .sig:

perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'
Seee "1^p" p prime language for more on this paticular language.

Computer Science 520 - Introduction to Theoretical Computer Science, University Madison, Wisconsin

Log in or register to write something here or to contact authors.