A type of grammar first described by linguist Noam Chomsky that is used for describing programming languages.

The grammar consists of rules such as "a sentence is a noun phrase followed by a verb phrase" rendered as S -> NP VP. Another rule might be "a noun phrase consists of an article followed by zero or more adjectives followed by a noun."

source: Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists, by Dennis Shasha and Cathy Lazere

Formally, a context-free grammar is a finite set of rules. Each rule consists of a symbol (called the left hand side), an arrow (for cosmetic purposes only), and a sequence of symbols (called the right hand side). There are two kinds of symbols: terminals and nonterminals. The nonterminals are those symbols that occur at the left hand side of rules.

In general discussions about context-free grammars, it is customary to use lowercase letters to denote terminals and uppercase letters to denote nonterminals. For example,


  A -> aAb
  A ->

This is a context-free grammar with one nonterminal (A), and two rules, with 3 and 0 symbols on the right hand side, respectively.

The purpose of grammars is to describe languages, that is, sets of possible sequences of symbols. Context-free grammars describe languages in a generative way: pick a nonterminal (in this case, A) and apply the rules until there are no nonterminals left, and you have a sequence of (terminal) symbols; do this in every possible way, and you have a set of such sequences. This set can be infinitely large.

The example grammar, for instance, describes the language "all sequences consisting of a number of 'a's followed by an equal number of 'b's".

A context-free language is a language that can be described by a context-free grammar. Not all languages are context-free: the language "all sequences consisting of a number of 'a's followed by an equal number of 'b's followed by an equal number of 'c's" isn't.

What context-free grammars are basically capable of doing is expressing languages with 'nested' structures, where items on the left hand must be paired off with items on the right in a systematically nested way. This is considered to be a basic feature of natural languages.

The formalism was invented in the late 50s as a way to express the structure of natural language, and a way to define the structure of computer programming languages. But it is not powerful enough to describe either of them in full.

Context Free Grammars:
A Context Free Grammar is a 4-tuple.
( V , SIGMA , R , S )
V = Set of Variables or Symbols
SIGMA = The Alphabet (terminals)
R = Set of Rules
S = Start Rule

A Rule is of the form:
Symbol -> String
Where 'Symbol' is a member of V and 'String' is a string over the set V U SIGMA. An example rule would be:
S -> aBa
This can be read as 'S may be replaced with aBa'
By convention Symbols are capitalized and terminals are not.
It is possible to have two rules that share a Symbol on the left. For instance, consider the two rules:
S->bSb
S->aBa
In a case such as this, S can be replaced with either one of {bSb,aBa}
There is a shorthand for this.
S->bSb | aBa

If a Language L is a Context Free Language iff there is some Context Free Grammar, G, that for every string w in L, there is a derivation of w in G. A derivation is a sequence of substitutions (as permitted by the set of rules) to obtain a string.

Example(pg.92):
A CFG Gis defined by:

V = {A,B}
SIGMA = {0,1,#}
R = {
      A -> 0A1 ,
      A -> B ,
      B -> #
     }
S = A


0#1 is a string in the Language of G
This is because, starting with A, there is a sequence of substitutions that will end with the string 0#1
A derivation is of 0#1 is described as follows:
  • Start with A
  • Look in the set of rules,
    Find an A on the left-hand side of a rule. There are two possible substitutions
  • We will replace A with the right-hand side of one of the rule s.
    Our A is now 0A1
  • It is time for another replacement. 0 and 1 are terminals. They are nonreplacable. Remember SIGMA and V are disjoint set s.
    The only thing to replace is the A.
  • This time we replace A with B by the second rule from our set.
    This leaves us with the string 0B1.
  • B is the only symbol (or variable) and there is only one rule for replacing B.
    This leaves us with the string 0#1


We can write the above sequence of substitiutions as a flat derivation, in the following way:

A => 0A1 => 0B1 => 0#1

This derivation is read as:
A yeilds 0A1, which yeilds 0B1, which yeilds 0#1

A flat derivation for the string 000#111 is explained on page 92 of Michael Sipsers An Introduction to the Theory of Computation and is written as follows:

A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111

The formal definition of a context-free grammar (CFG) is a 4-tuple (V, Σ, R, S), where

  1. V is a finite set of variables
  2. Σ is a finite set of terminals
  3. R is a finite set of rules of the form {A s | A V, s (V Σ)*}
    (variable string of variables and terminals)
  4. S V is the start variable

Let u, v and w be elements of (V Σ)* (strings of variables and terminals). If A w is a rule of the grammar (A R), then uAv yields uwv. This is generally written uAv uwv. Furthermore, one can write u ⇒* v if u = v or there is a sequence u1, u2,,uk such that k ≥ 0 and u1 u2 uk v. One says that v is derivable from u.

The language of a grammar is {w Σ* | S ⇒* w}


Introduction to the Theory of Computation (ISBN 0-534-94728-X) was consulted in the creation of this writeup.

Log in or register to write something here or to contact authors.