"a^n b^n" language - Everything2.com

The language "aⁿbⁿ" is one of the classic languages used to demonstrate the limits of regular expressions and context-free grammars.

The term "language" here refers to the computer science theory definition of a language - not the linguistic or programming term that we are familiar with. The language here is something more basic from which programming languages and linguistic languages are built from.

A language is a sequence of tokens.

Languages can be grouped into the Chomsky hierarchy of languages:

Each type of language has a comparable "machine" that can recognize if the sequence of tokens fits that language or not. English is certainly not a regular language, or even context-free - neither finite state automata nor pushdown automata can recognize if a particular sequence of tokens (words or letters) is valid English or not. Frankly, most people can't identify if sequence of words is valid English or not - but thats a different matter.

Returning back to the language at hand, "aⁿbⁿ" we have some number of 'a's followed by the same number of 'b's. Thats it - thats the entire language.

Some examples of valid strings within this language are:

(null)
ab
aabb
aaabbb
aaaaaaaaaabbbbbbbbbb

There are an infinite number of acceptable strings. Likewise, there are strings that are not acceptable within this language (of which there are also an infinite number):

a
aab
ababa
abba

This language can be recognized by the context-free gammar:

S ::= (null) | aSb

This simply says that you match an 'a' on the left, a 'b' on the right and stick another matching token in the middle, or you match nothing.

This language, however is not a regular language and there are several proofs that explore this:

The statement that "regular expressions can't count" is quite true. However, the perl language regular expression engine is a bit more powerful than the standard regular language. While this particular language is beyond the grasp of the perl regular expression language without "cheating" (invoking perl commands from within the expression itself with the 'e' switch), perl can handle a superficialy similar language of "aⁿbaⁿ" where we have a 'b' separating two strings of 'a's that are the same length.

m/^(a*)b\1$/

The reason that perl can match this is that perl regular expressions have (among other things) the ability to reference backwards a previously matched string. In this case, that is the string of 'a's.

This ability is best demonstrated in the perl regular expression that matches prime numbers in unary demonstrated and created by Abigail as part of her .sig:

Abigail
-- 
perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/'

Seee "1^p" p prime language for more on this paticular language.

Computer Science 520 - Introduction to Theoretical Computer Science, University Madison, Wisconsin

pumping lemma proof that the "a^n b^n" language is not regular	"1^p" p prime language	regular language	Linear-bounded automaton
Myhill Theorem proof that the "a^n b^n" language is not regular	The Value of a College Degree - Computer Science	Abigail	the expressive power of regular expressions
context-free grammar	Chomsky hierarchy	Correcting BBC newsreaders on their grammar	Turing Machine
Language recognition and generation in Prolog	Pushdown Automaton	finite state automaton	Neelix
Primality testing with Perl regexs	Wolfram's classes of cellular automata	The Tipping Point	Mastering Regular Expressions
Zeromancer	language family