The mathematical theory
of formal language
Language has form and meaning; formal language theory is only concerned with its form (i.e. its syntax). It defines a language as a (possibly infinite) set of strings, i.e., sequences of symbols. It is not at all concerned with what is expressed by such sequences.
Furthermore, formal language theory really doesn't study individual languages,
but rather, the mechanisms by which languages can be described, the relationships between these mechanisms and between the classes of languages they can describe. Examples of such mechanisms are regular expressions, grammars, and automata. These mechanisms are themselves languages in the more conventional sense of the word: they consist of expressions (sentences, utterances) that carry meaning.
The best known classes of languages are those of the Chomsky hierarchy.
Formal language theory studies parsing mechanisms: membership tests for a string and a language.
The focus is on complexity: how efficiently membership can be tested for given classes of languages.
It also deals with language comparison: to decide whether two languages are the same, or whether one includes the other. This turns out to be undecidable or very expensive for many classes of languages. Most important in practice are comparison algorithms on finite automata, not least because the techniques used here are applicable to many related problems, such as type checking and string matching.
Another topic of study in formal language theory is closure properties, that is, whether or not a certain operation on languages - e.g., string reversal or the union of two languages - on languages in a given class always produces a result that is still in that class.