yyerror is the user-supplied error reporting function called by yacc parsers.
All identifiers in yacc parsers have a "yy" or "YY" prefix [1].
yyerror's prototype looks like this:
int yyerror( char * s );
There's a certain Earth shoes and gold medallions flavor about that. It dates from the mid-1970s, back when C programmers went in for a lot of global variables, reentrancy had about the same cachet as safe sex, and the const keyword didn't exist yet. You've read Tales of the City, haven't you? Imagine those people writing code. Historically, yacc parsers are not reentrant.
When a yacc parser runs into a string of tokens that doesn't match any of its rules, that's a syntax error. The parser then calls yyerror, and passes it a string that describes what went wrong. The string is usually something informative like "Parse error.", unless you #define YYERROR_VERBOSE [3]. In any case, the string doesn't say anything about where the error happened. You've got to fill that part in yourself. The obvious way to do that is to have the lexer keep track of the starting line number and column for each token it reads, and then put them in file-scope static variables before it returns.
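Here's a minimal sketch of that bookkeeping. Every name in it except yyerror (track_position, format_error, and the four static variables) is made up for illustration, and it assumes the lexer calls track_position with the text of everything it matches, whitespace included. With flex, a hook like YY_USER_ACTION is one place such a call could live.

```c
#include <stdio.h>

/* Where the most recent token started. File-scope statics, so any
 * function in this file can see them. Not reentrant. */
static int tok_line = 1;
static int tok_col  = 1;

/* Where the lexer's read position is right now. */
static int cur_line = 1;
static int cur_col  = 1;

/* Hypothetical helper: the lexer calls this with the text of each
 * match. It records where the match started, then advances the read
 * position past it. */
void track_position( const char * text )
{
    tok_line = cur_line;
    tok_col  = cur_col;
    for ( ; *text; ++text ) {
        if ( *text == '\n' ) {
            ++cur_line;
            cur_col = 1;
        } else {
            ++cur_col;
        }
    }
}

/* Hypothetical helper: prefix the parser's message with the saved
 * position, in "line:column: message" form. */
int format_error( char * buf, size_t size, const char * s )
{
    return snprintf( buf, size, "%d:%d: %s", tok_line, tok_col, s );
}

/* The function the parser actually calls. */
int yyerror( char * s )
{
    char buf[ 256 ];
    format_error( buf, sizeof buf, s );
    fprintf( stderr, "%s\n", buf );
    return 0;
}
```

The position reported is the start of the last token the lexer handed over, which is normally the token the parser choked on.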
That's adequate, but it still isn't reentrant. The rest of this writeup will discuss reentrancy, and will be specific to Bison.
Bison will give you a reentrant parser if you use their nonstandard %pure_parser directive. If you #define YYPARSE_PARAM, yyparse is declared with a single void * parameter that you provide when you call it. That pointer will be visible in all of your rules (all that code goes in one big function, in the generated parser). If you #define YYLEX_PARAM, the pointer will also be passed into yylex. In spite of all that, yyerror still gets called with that one stupid pointer to character. This is not good design. There's no clean way to give yyerror the information it needs. We're lucky, though: This is C. There's a dirty way. In C, there's always a dirty way.
It looks like this in your .y file:
/* Tell Bison to make yyparse() reentrant. */
%pure_parser
%{
/* Tell Bison to give yyparse() and yylex() a void *
 * parameter.
 */
#define YYPARSE_PARAM voidparam
#define YYLEX_PARAM voidparam
/* Now yyparse() will look like this:
 *     int yyparse( void * voidparam ) { ... }
 */
/* The generated parser calls yyerror() by name from inside
 * yyparse(), so a macro can hand it the voidparam in scope there.
 */
#define yyerror( msg ) dirty_yyerror( voidparam, msg )
int dirty_yyerror( void * yyparse_param, char * msg );
%}
The parser doesn't have a pointer to yyerror. It calls it by name. So you make yyerror a macro that calls some other function, and quietly passes to that other function the void * parameter to yyparse that we discussed above. Since this is all about reentrancy, the logical thing to do with that pointer is make it a pointer to a struct or class which owns your lexer, along with whatever other state you'd like to keep handy. Your "real" yyerror function can get whatever it needs just by asking. Everybody's thread-safe and nobody gets hurt.
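For the sake of argument, here's a sketch of what the "real" function might look like. The struct parse_context and its fields are hypothetical. The only things taken from the .y fragment above are the name dirty_yyerror and its signature. It assumes the lexer, which gets the same pointer through YYLEX_PARAM, keeps line and column up to date.

```c
#include <stdio.h>

/* Hypothetical per-parse state. One of these per call to yyparse(),
 * so two threads parsing at once never share anything. */
struct parse_context {
    const char * filename;      /* what we're parsing */
    int line;                   /* start of the current token, */
    int column;                 /* maintained by the lexer */
    int error_count;            /* syntax errors seen so far */
    char last_error[ 256 ];     /* most recent formatted message */
};

/* The function the yyerror() macro really calls. */
int dirty_yyerror( void * yyparse_param, char * msg )
{
    struct parse_context * ctx = yyparse_param;

    ctx->error_count++;
    snprintf( ctx->last_error, sizeof ctx->last_error, "%s:%d:%d: %s",
              ctx->filename, ctx->line, ctx->column, msg );
    fprintf( stderr, "%s\n", ctx->last_error );
    return 0;
}
```

The caller fills in one of these structs and passes its address to yyparse. Afterward it can check error_count and last_error without touching a single global.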
It sounds like a hassle, but it's quicker than writing a shift/reduce parser by hand just to get the error-reporting right and it's not as crazy as hacking up bison.simple.
1 GNU's version of yacc, which they call Bison [2], allows the user to supply a different prefix, which lets you have multiple parsers without name conflicts. This is done with the command line option --name-prefix or -p.
2 The reasons for this should be painfully obvious.
3 This feature appears to be unique to Bison; I can't find any mention of it in the O'Reilly Book.
- References:
- The Bison Manual, version 1.25, by Charles Donnelly and Richard Stallman, copyright © 1988, 89, 90, 91, 92, 93, 1995 Free Software Foundation, Inc.
- lex & yacc, Second Edition, by John R Levine, Tony Mason, and Doug Brown, O'Reilly & Associates, Inc., 1990, 1992.