For a formal grammar `G` = { `N`, `Σ`, `P`, `S` }, where...

`N` is a finite set of *nonterminal symbols*
`Σ` is a finite set of *terminal symbols*, disjoint from `N`
`P` is a finite set of *production rules*, each consisting of
- A finite string of symbols on the left (with at least one of them being nonterminal in
`N`)
- A finite string of symbols on the right

- A special starting symbol
`S` which is nonterminal in `N`

...the formal language `L`(`G`) is the set of all finite strings of symbols which can be *produced* by...

- Starting with a string consisting of just an
`S`
- Applying a production rule to the current string, to turn it into a new string
- Repeating step 2 until there are no nonterminal symbols left in the string.

At this point, all the symbols in the string are terminal in `Σ`, which means no more production rules can be applied. So, the sentence itself is terminal.

While `G` is finite, `L` may be finite, countably infinite, or empty. `L` is distinct from the set of *all* finite strings of terminal symbols in Σ, because `L` only contains the *producible* strings. `L` is the set of all terminal strings.

### The relationship between `G` and `L`

It is a relatively straightforward recursive process to derive all the the members of `L` from `G`.

- Stage 1: start with
`S`
- Stage 2: take each production rule in turn. Try to apply it to
`S`. Sometimes the rule is not applicable, but sometimes a new string is produced. Some of these are terminal. Add these to `L`. The rest are nonterminal. Hand these off to Stage 3.
- Stage 3: take each production rule in turn. Take each nonterminal string from Stage 2 in turn. Try to apply the rule to the string. This produces many, many more strings. Some of these are terminal. Add these to
`L`. The rest are nonterminal. Hand these off to Stage 4...
- Continue...

This process may continue forever, as we know, because `L` may be infinite.

But what about reversing this process: turning a set of strings `L` back into a finite formal grammar `G`? This is much harder.

If `L` is finite, e.g....

`L` = {

`f``j``j``w`

`a`

`m``o``o``c``o``w`

`f``l``o``i``p``o``j``w``e``r``p``i``n``g``e``l``r``n``v`

`k``k``k``k``k``k``k`

`p``e``a``s`

...

`z``z``z``z``z`

}

...then in theory we can just enumerate all the members of `L` and have a grammar which produces each one in turn...

`G` = {

`N` = { `S` }

`Σ` = { `a`, `b`, ..., `z` } [every terminal character seen in `L`]

`S` = `S`, obviously

`P` = {

`S` → `f``j``j``w`

`S` → `a`

`S` → `m``o``o``c``o``w`

`S` → `f``l``o``i``p``o``j``w``e``r``p``i``n``g``e``l``r``n``v`

`S` → `k``k``k``k``k``k``k`

`S` → `p``e``a``s`

...

`S` → `z``z``z``z``z`

}

}

However, this is trivial, and not terribly helpful, because `G` is as huge as `L` and therefore doesn't tell us anything about `L`, which is the point of the exercise.

If we wish to make a *small* formal grammar from `L`, then the task becomes substantially more difficult. If `L` is infinite, then the task can even become impossible.

#### Proof that some languages have no formal grammar

If a *language* is a set of finite strings, then a language `L` is a subset of the set (call it `V`) of all possible finite strings. This means that the set of *all* languages is the set of *all* possible subsets of `V`. This is called the power set of `V`, which is written P(`V`). A power set is always strictly larger than the original set. `V` is countably infinite, meaning that P(`V`) is uncountably infinite. There are uncountably many languages.

However, every *formal* grammar `G` is finite. This means that the set of all possible formal grammars is countably infinite. Each formal grammar generates a single language `L`, which means, even including duplicates, that there only exists a countably infinite number of formal languages.

This means that uncountably many languages must *not* be formal languages; they cannot be generated by a finite formal grammar.