A revolutionary parsing algorithm for context-free grammars, published in 1986 by Jay Earley. It has the ability to accept or reject any input for a grammar correctly. With a small augmentation, it can also extract all possible parse trees for that input.

It operates on a set of parsing states, which include the following:

The rule: This is a standard context-free production.
The dot: A position within the rule that indicates how much of the text of that rule has been seen so far.
The reference: A pointer to the set which generated this rule.

The algorithm works as follows:

Start with a "start state," usually something like S → statement. For each item in the current set (where capitals represent nonterminals, and lowercase terminals, and • the dot):

If it matches "A → P • Q R", then add all rules which match "Q → S" to the current set (unless it's already there). In english, add all rules whose left side are the same as the nonterminal to the right of the dot. This stage is called the "Predictor."

If it matches "A → P • q R" and the next input symbol matches q, then add "A → P q • R" to the following set. That is, if we're expecting a terminal, find out if the input matches what we're expecting. This stage is called the "Scanner."

If it matches "A → R •" then add all items from this rule's reference set matching "B → P • A Q" to the current set as "B → P A • Q". Basically, if we've matched an entire rule, update all rules expecting the nonterminal we completed. This stage is called the "Completer."

If the input has been exhausted, and the final set contains something that would complete your start state (eg. "S → statement •", then the input is accepted. Otherwise it is rejected.

An earley parser that shows the parse trees can be found at http://fibonaci.babylonia.flatirons.org/earley.html.

Log in or register to write something here or to contact authors.