Command Line Options

Usage: yooparse [options] file
  -h, -u, -?    print this message
  -c class      specify the C++ class name
  -d            generate token definition header file
  -l            generate an LR(1) table instead of the default LALR(1) table
  -v            generate detailed LR state information
  -w            do not generate warning messages

Input File Format Overview

All parser classes are child classes of yoogroup::YooParse<>, which in turn is a child class of yoogroup::YooLex<>.

In all sections of the input file, line comments (//) and block comments (/* ... */) are allowed. %% is used as the section separator. The overview of the sections is as follows:

// section 1
%%
// section 2 prolog

// section 2 grammars here

// section 2 epilog.
%%
// section 3


A configuration is started by %option at the beginning of a line in section 1. All options are case sensitive.

Configuration Explanation
ccext = "name" specifies the C++ source file extension.
ccfile = "name" specifies the C++ source file name with extension.
class = "name" specifies the class name.
compact uses a more compact representation of the DFA table. This is done by replacing error entries with the default reduce, even if the lookahead does not match, so use this option with caution. I think that Yacc/Bison use this mode by default. Default reduce can also be turned on without this option by calling yySetDefaultReduce (true), but if you are using default reduces anyway, this option can save some space.
hhext = "name" specifies the default header file extension.
hhfile = "name" specifies the header file name with extension.
kernel In the DFA state debug file, only prints kernel items for each DFA state item set. By default, closure items are printed as well.
lalr tells YooParse to generate an LALR(1) parser. This is the default.
lr tells YooParse to generate a full LR(1) parser.
main tells YooParse to generate a default main function.
namespace = "name" specifies the namespace for the class.
nola In the DFA state debug file, do not print the lookaheads for each LR(1) item. By default, the lookaheads for each item are printed.
token tells YooParse to generate a token definition file. By default, YooParse generates a file named class + "_tokens" + hhext.
token = "name" tells YooParse to generate a token definition file and specifies the output file name with extension.
token_namespace = "name" specifies the namespace for the token definitions. If not specified, it is the same as the class namespace. To force the token definitions into the default namespace, specify this option at a later place.
verbose tells YooParse to generate a DFA state debug file. By default, YooParse generates a file named class + ".output".
verbose = "name" tells YooParse to generate a DFA state debug file and specifies the output file name with extension.
yytext = "name" specifies the _yyText data type. Same as the YooLex option.
yyvalue = "name" specifies the _yyValue data type.
Note: this data type must have a default constructor. Since this data type is used inside containers, std::auto_ptr<> cannot be used. For automatic memory management, use smart pointers like boost::shared_ptr<>.
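
As an illustration of the note above, here is a minimal sketch of a value type that is default-constructible and safe to keep inside containers. The names Node, Value and pushAndCopy are hypothetical and not part of YooParse, and std::shared_ptr (C++11) is assumed here as a stand-in for boost::shared_ptr<>, which has the same shared-ownership copy semantics.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>

// Hypothetical AST node; not part of YooParse.
struct Node {
    std::string text;
    explicit Node(const std::string& t) : text(t) {}
};

// Candidate value type: a default-constructed Value is an empty pointer,
// and copies share ownership, so it can be stored in standard containers.
typedef std::shared_ptr<Node> Value;

// Mimics a parser pushing a value onto a stack and copying it out again.
Value pushAndCopy(std::list<Value>& stack, const std::string& text) {
    stack.push_back(Value());                     // default-constructed slot
    stack.back() = std::make_shared<Node>(text);  // fill the slot
    Value copy = stack.back();                    // copy shares the same Node
    assert(copy.use_count() == 2);                // stack entry + copy
    return copy;
}
```

A type like Value could then be named in the yyvalue option; the Node is freed automatically once the last copy goes away.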

Section 1 Description

The purpose of section 1 is to enter configuration parameters for YooParse, include some C/C++ code, and define terminals, precedence and associativity rules. Here are the tags that start these actions.

%{ starts a code block, which is terminated with %}
%left specifies left associativity on the terminal as well as the precedence level
%nonassoc specifies non-associativity on the terminal as well as the precedence level
%option YooParse configurations. See above.
%right specifies right associativity on the terminal as well as the precedence level
%start specifies the start non-terminal instead of the first one encountered.
%token specifies a terminal. The value of this terminal is automatically assigned.
Note for %left, %right and %nonassoc: Terminals specified on the same line have the same precedence level. Terminals specified on later lines have higher precedence.

Section 2 Description

Section 2 contains three parts: prolog, grammar rules and epilog. The grammar rules are written the same way as in yacc/bison.

%prec <terminal name> is supported.

The prolog and epilog subsections are used to insert code at the beginning and at the end of the yyParse () function, respectively.

Section 3 Description

All code in this section is echoed as is at the bottom of the generated source code, so feel free to put class functions and main here.


Reduce/Reduce Conflicts

An example of such a conflict is:
A : B | C ;
B : a ;
C : a ;
There is a reduce/reduce conflict with $ (EOF) as the lookahead, since a can be reduced to either B or C. By default, the rule specified earlier is reduced.

Shift/Reduce Conflicts

An example of such a conflict is:
statement : if statement
	  | if statement else statement
	  | ;
This produces a DFA state with the following item set:
	statement : if statement . , $/else
	statement : if statement . else statement , $/else
There is a shift/reduce conflict on the else lookahead terminal. By default, shift takes precedence over reduce. Associativity and precedence can be used to change this default rule.
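
The following sketch shows how precedence and associativity are conventionally used to settle a shift/reduce conflict in yacc-style generators. It assumes YooParse follows the same scheme, which this reference does not spell out. The types and the function resolveShiftReduce are hypothetical names for illustration only.

```cpp
#include <cassert>

enum Assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum Action { ACTION_SHIFT, ACTION_REDUCE, ACTION_ERROR };

// level 0 means "no precedence declared"; a higher level binds tighter.
struct Prec {
    int   level;
    Assoc assoc;
};

// rule:      precedence of the rule that could be reduced
// lookahead: precedence of the terminal that could be shifted
Action resolveShiftReduce(const Prec& rule, const Prec& lookahead) {
    assert(rule.level >= 0 && lookahead.level >= 0);
    if (rule.level == 0 || lookahead.level == 0)
        return ACTION_SHIFT;                  // nothing declared: default to shift
    if (rule.level > lookahead.level)
        return ACTION_REDUCE;                 // rule binds tighter
    if (rule.level < lookahead.level)
        return ACTION_SHIFT;                  // lookahead binds tighter
    // Same level means same declaration line, hence same associativity.
    switch (lookahead.assoc) {
    case ASSOC_LEFT:     return ACTION_REDUCE;
    case ASSOC_RIGHT:    return ACTION_SHIFT;
    case ASSOC_NONASSOC: return ACTION_ERROR;
    default:             return ACTION_SHIFT;
    }
}
```

With no declarations the function returns shift, which matches the default described above for the if/else example.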

Parsing Algorithms


SLR stands for Simple LR. It improves on the LR(0) parsing algorithm by looking at one lookahead token to avoid erroneous reduces, and thus avoids certain reduce/reduce and shift/reduce conflicts. In an LR(0) item set, a reduce to a non-terminal A is performed only if the lookahead is in FOLLOW (A). Reduce/reduce and shift/reduce conflicts can still arise if another non-terminal can be reduced, or another item shifts, on the same lookahead token.


LALR stands for Lookahead LR. It improves on SLR by doing a more careful lookahead analysis. In SLR, the lookahead set for each LR item that reduces to a non-terminal A is always FOLLOW (A). In LALR, the lookahead sets are subsets of FOLLOW (A). Example:

A : a a | B b b | b B a ;
B : a ;
In the grammar above:
FOLLOW (B) := { a, b }
But, the resulting states do not always have FOLLOW (B) attached to items that contain B:
state 0:
	A : . a a , $
	A : . B b b , $
	A : . b B a , $
	B : . a , b
state 1:
	A : a . a , $
	B : a . , b
state 2:
	A : b . B a , $
	B : . a , a
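In state 1, for example, SLR would reduce B on either a or b and therefore conflict with the shift on a, whereas the LALR lookahead b leaves no conflict. The analysis is not easy to do by hand; this is what YooParse is for. The advantage of LALR is that it eliminates many of the reduce/reduce and shift/reduce conflicts of SLR without additional space cost.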
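
To make the FOLLOW set used in this example concrete, the sketch below computes FIRST and FOLLOW for the example grammar by fixed-point iteration. It is a simplified illustration, not YooParse's implementation: it assumes a grammar with no empty productions, treats symbols starting with an upper-case letter as non-terminals, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> SymbolSet;
typedef std::map<std::string, SymbolSet> SetMap;

struct Production {
    std::string lhs;
    std::vector<std::string> rhs;
};

// Build one production from a null-terminated array of symbol names.
Production makeProduction(const char* lhs, const char* const* rhs) {
    Production p;
    p.lhs = lhs;
    for (; *rhs; ++rhs) p.rhs.push_back(*rhs);
    return p;
}

// The example grammar:  A : a a | B b b | b B a ;   B : a ;
std::vector<Production> exampleGrammar() {
    static const char* const aa[]  = { "a", "a", 0 };
    static const char* const Bbb[] = { "B", "b", "b", 0 };
    static const char* const bBa[] = { "b", "B", "a", 0 };
    static const char* const a[]   = { "a", 0 };
    std::vector<Production> g;
    g.push_back(makeProduction("A", aa));
    g.push_back(makeProduction("A", Bbb));
    g.push_back(makeProduction("A", bBa));
    g.push_back(makeProduction("B", a));
    return g;
}

// Convention for this sketch only: upper-case first letter = non-terminal.
bool isNonTerminal(const std::string& s) {
    return !s.empty() && s[0] >= 'A' && s[0] <= 'Z';
}

// FIRST sets; without empty productions only the first RHS symbol matters.
SetMap computeFirst(const std::vector<Production>& g) {
    SetMap first;
    for (bool changed = true; changed; ) {
        changed = false;
        for (std::size_t i = 0; i < g.size(); ++i) {
            assert(!g[i].rhs.empty());      // no empty productions supported
            const std::string& x = g[i].rhs[0];
            SymbolSet add;
            if (isNonTerminal(x)) add = first[x];
            else add.insert(x);
            SymbolSet& dst = first[g[i].lhs];
            std::size_t before = dst.size();
            dst.insert(add.begin(), add.end());
            if (dst.size() != before) changed = true;
        }
    }
    return first;
}

// FOLLOW sets; "$" marks end of input and follows the start symbol.
SetMap computeFollow(const std::vector<Production>& g, const std::string& start) {
    SetMap first = computeFirst(g);
    SetMap follow;
    follow[start].insert("$");
    for (bool changed = true; changed; ) {
        changed = false;
        for (std::size_t i = 0; i < g.size(); ++i) {
            const std::vector<std::string>& rhs = g[i].rhs;
            for (std::size_t k = 0; k < rhs.size(); ++k) {
                if (!isNonTerminal(rhs[k])) continue;
                SymbolSet add;
                if (k + 1 == rhs.size()) add = follow[g[i].lhs];  // at end of rule
                else if (isNonTerminal(rhs[k + 1])) add = first[rhs[k + 1]];
                else add.insert(rhs[k + 1]);
                SymbolSet& dst = follow[rhs[k]];
                std::size_t before = dst.size();
                dst.insert(add.begin(), add.end());
                if (dst.size() != before) changed = true;
            }
        }
    }
    return follow;
}
```

Calling computeFollow (exampleGrammar (), "A") yields the set { a, b } for B, the lookahead set that SLR would attach to every item reducing to B.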


A full LR(1) table is also called canonical LR. L indicates that input is processed from left to right, and R indicates a rightmost derivation. The full LR(1) table can be 5-10 times bigger than the LALR table, but it can eliminate further reduce/reduce and shift/reduce conflicts, and it can detect errors before performing erroneous reduces.

Data Types

The following data types are used in YooParse. The data type is defined in yoogroup::YooParse<> unless mentioned otherwise.

YYParserState Parser stack value data type.
YYParserStateList Parser stack type. Equivalent to std::list<YYParserState>
YYValueType _yyValue data type.
YYTextType _yyText data type. Defined in yoogroup::YooLex<>.


The following macros are defined in the generated C++ source file. These macros can be accessed in section 3.

YYPARSE_DFA(outState,inState,lookahead) DFA state lookup macro
YYPARSE_GOTO(outState,inState,reducedSymbol) GOTO state lookup macro
YYPARSE_TRANSLATE(terminal) Does terminal->internal representation translation


protected virtual bool
YooParse<>::yyParseError (int yychar);
Error recovery function. Override this function if you don't like the default method.
protected void
YooParse<>::yyCheckErrorToken ();
Push an ERROR token onto the parser state stack.
public bool
YooParse<>::yyGetDefaultReduce () const;
Return true if default reduce is enabled.
public bool
YooParse<>::yySetDefaultReduce (bool defaultReduce);
Set true to force default reduce. Return old value.
public virtual int
class::yyParse ()
The parser function.
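
The default reduce behaviour controlled by yySetDefaultReduce () and the compact option can be pictured with the sketch below. It is a conceptual model only and assumes nothing about the table layout YooParse actually generates. StateRow, Action and lookup are hypothetical names.

```cpp
#include <cassert>
#include <map>

enum ActionKind { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

// target is the next state for a shift, or the rule number for a reduce.
struct Action {
    ActionKind kind;
    int        target;
};

// One parser state: explicit actions keyed by lookahead terminal,
// plus the rule to use as default reduce (-1 if the state has none).
struct StateRow {
    std::map<int, Action> actions;
    int defaultRule;
};

// Look up the action for a lookahead. With defaultReduce enabled, a
// lookahead without an explicit entry triggers the default reduce
// instead of an immediate error.
Action lookup(const StateRow& row, int lookahead, bool defaultReduce) {
    assert(row.defaultRule >= -1);
    std::map<int, Action>::const_iterator it = row.actions.find(lookahead);
    if (it != row.actions.end())
        return it->second;
    Action a;
    if (defaultReduce && row.defaultRule >= 0) {
        a.kind = ACT_REDUCE;
        a.target = row.defaultRule;
    } else {
        a.kind = ACT_ERROR;
        a.target = -1;
    }
    return a;
}
```

In this model, an unexpected lookahead is reported immediately when default reduce is off, and only after the extra reduce when it is on, which is why the compact option should be used with caution.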

$Id: reference.html,v 1.3 2002/07/27 03:47:33 coconut Exp $