shift/reduce parser (thing) by Iconoplast

There are a variety of parsers out there for dealing with programming languages. These parsers can be grouped into a few types. One common type is the shift/reduce parser. Before getting into an explanation of what it is and how it works, we need to talk about grammars, rules, and all that sort of stuff.

Computer language rules are typically laid out in a BNF-style. These rules in turn get transformed into functions and such in the compiler, where each function knows a bit about one rule. Several functions are then typically combined into an entire rule. They all work together to make sense of the source code file. The compiler first scans the source code input file, breaks it up into tokens, and tries to match all those tokens against its rules.

Now, some definitions. A shift takes place whenever the parser gets a token of input and it doesn't complete any of the rules it knows about. That token is then shifted onto a stack of tokens. A reduction occurs whenever the parser gets a token that completes a rule. It is then able to pop all the necissary tokens off the stack and replace them with just the completed rule.

An example will clear this all up. Let's say that we're a parser and that we're given the expression "a = 40 + 7". How would we handle it? First, I'll tell you what our rules are:

statement ::= NAME = expression
expression ::= NUMBER + NUMBER

So we've got this stream of input coming in. The first token we get is "a". The parser checks the list of rules it's got and sees that "a" doesn't complete any of them. However, it is a NAME. So it shifts a NAME token onto the stack. Next is an equals sign, which doesn't complete anything. Shift it as well.

The same goes for the "40", plus sign, and "7". Once the 7 has been shifted, though, the parse stack contains everything needed to form an expression (two numbers and a plus sign). So those three tokens are popped off and reduced into an expression. That is then shifted onto the stack. Then, the parser sees a NAME, an equals sign, and an expression. So it can pop all those off, reduce them to a statement, and shift that back on.

Since we have thoughtfully defined a statement as the start symbol (the main rule for a grammar), everything is cool. If some additional tokens were left on the stack, that would be an error.

It should be noted that shift-reduce parsers do have a few problems. For example, they cannot deal with ambiguous grammars. If your grammar has multiple rules with the same right-hand side, the shift-reduce parser will complain. Additionally, if it cannot decide whether to shift or reduce, that will cause a problem as well. The C grammar has one such problem in the if-then-else statement.

Examples of errors to be forthcoming, just as soon as I make sense of all the information. Some of this material was taken from the lex & yacc book by o'Reilly. It's good, thorough, and loaded with examples.

Shift/Reduce Conflict	Dead Letter Office	BNF	ambiguous grammar
parser	Lex	yyerror	ambiguous
stack	grammar	Python	Natural Language Processing
C