Formally, a context-free grammar
is a finite set of rules.
Each rule consists of a symbol (called the left hand side),
an arrow (for cosmetic purposes only), and a sequence of symbols (called the right hand side). There are two kinds of symbols: terminals and nonterminals. The nonterminals are those symbols that occur on the left hand side of rules.
In general discussions about context-free grammars, it is customary to use lowercase letters to denote terminals and uppercase letters to denote nonterminals.
A -> aAb
A ->
This is a context-free grammar with one nonterminal (A), and two rules, with 3 and 0 symbols on the right hand side, respectively.
The purpose of grammars is to describe languages, that is, sets of possible sequences of symbols.
Context-free grammars describe languages in a generative way: pick a nonterminal (in this case, A) and apply the rules until there are no nonterminals left, and you have a sequence of (terminal) symbols; do this in every possible way, and you have a set of such sequences. This set can be infinitely large.
The example grammar, for instance, describes the language "all sequences consisting of a number of 'a's followed by an equal number of 'b's".
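That description of the language can also be tested directly, without the grammar. A minimal sketch in Python (the function name is my own invention):

```python
import re

def in_language(s):
    """True iff s consists of n 'a's followed by exactly n 'b's."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))

print(in_language("aabb"), in_language("aab"))  # True False
```

Note that the counting step (comparing the two lengths) is exactly what a plain regular expression cannot do on its own; keeping matched 'a's and 'b's in balance is what makes the language context-free rather than regular.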
A context-free language is a language that can be described
by a context-free grammar. Not all languages are context-free: the language "all sequences consisting of a number of 'a's followed by an equal number of 'b's followed by an equal number of 'c's" isn't.
What context-free grammars are basically capable of is expressing languages with 'nested' structures, in which items on the left must be paired off with items on the right in a systematically nested way. This is considered to be a basic feature of natural languages.
The formalism was invented in the late 1950s as a way to express the structure of natural language and to define the structure of computer programming languages, but it is not powerful enough to describe either of them in full.