Common problem encountered when using yacc(1) (or its GNU-decendant, Bison) to generate a parser for a given computer language.

To understand what is going on here, it is necessary to have some understanding of how yacc-generated parsers work. The yyparse() function generated by yacc attempts to reduce a string of "tokens" read from a source file down to one single token, according to a set of "rules" supplied by the user in a manner not dissimilar to Backus-Naur form. To do this, it creates a "stack" onto which tokens can be placed, and proceeds according to the following algorithm:

  1. Read a token from the file and store it in the variable yychar (this is called the "look-ahead" token)
  2. If there is a rule by which the top n tokens on the stack can be reduced down to a single token, and there is neither a rule which requires the look-ahead token nor the possibility of a syntax error should the reduction occur, perform the reduction, and any semantic actions associated with it.
  3. Repeat step 2 until no further reductions can be performed.
  4. Shift the look-ahead token onto the top of the stack.
  5. Return to step 1, and repeat until end of file.
(actually, that's a bit of an oversimplification, but it'll do for the purposes of this writeup.)

A shift/reduce conflict occurs at step 2 of the above process, when it is ambiguous whether or not a reduction is required. For instance, consider the following hypothetical fragment from a grammar for the C language:


if_statement    :   IF '(' expression ')' statement_list
                |   IF '(' expression ')' statement_list ELSE statement_list
		;

In other words, an if_statement consists of the keyword "if", an expession in brackets, and a list of statements, followed optionally by an "else" keyword and another list of statements (assuming that statement_list has been defined elsewhere as either a single statement or several statements between braces). Then consider the following ambiguous, but ANSI-legal, code fragment:

    if (x>y) if (x<z) foo(); else bar();

The parser will read "if (x>y)" and expect a statement as the next token. It finds another "if" and reads this in, followed by the rest of the statement up to the "else". It then faces the problem of deciding whether to reduce "if (x<z) foo();" to an if_statement and then shifting the "else", or of the "else" should be shifted first: both would be equally valid under the grammar as stated, but reduction would result in this:

    if (x>y) {
        if (x<z) {
            foo();
         }
    } else {
        bar();
    }

associating the "else" with the first "if", whereas shifting the "else" produces

    if (x>y) {
        if (x<z) {
            foo();
        } else {
            bar();
        }
    }

with the "else" associated with the second "if".

Sadly such problems are nigh-on impossible to prevent totally (the c-parse.y file found in the GCC source produces 51 such conflicts when processed with Bison 1.28), but the default behaviour of both yacc and bison is to shift, as this produces the correct result in C and similar languages, and print a warning to stderr that a possible conflict exists. Examination of the y.output file produced should then help locate the conflict and enable easy altering of the source file if this is the wrong behaiviour.

Log in or register to write something here or to contact authors.