Reference

Command Line Options
Input File Format Overview
Warnings
- Reduce/Reduce Conflicts
- Shift/Reduce Conflicts
Data Types
Macros
Functions
Parsing Algorithms

Command Line Options

```
Usage: yooparse [options] file
Options:
  -h, -u, -?    print this message
  -c class      specify the C++ class name
  -d            generate token definition header file
  -l            generate a LR(1) table instead of the default LALR(1) table
  -v            generate a detailed LR states information
  -w            do not generate warning messages
```

Input File Format Overview

All parser classes are subclasses of yoogroup::YooParse<>, which in turn is a subclass of yoogroup::YooLex<>.

Line comments (//) and block comments (/* ... */) are allowed in all sections of the input file. %% separates the sections, which are laid out as follows:

```
//section 1

%%

%{
// section 2 prolog
%}

// section 2 grammars here

%{
// section 2 epilog.
%}

%%

// section 3
```

Configurations

In section 1, each configuration line starts with %option at the beginning of the line. All options are case sensitive.

Configuration	Explanation
ccext = "name"	specifies the C++ source file extension.
ccfile = "name"	specifies the C++ source file name, with extension.
class = "name"	specifies the class name.
compact	uses a more compact representation of the DFA table. This is done by replacing error entries with the state's default reduce, even when the lookahead does not match, so use this option with caution. Yacc/Bison appear to use this mode by default. Default reduces can be turned on or off without this option by calling `yySetDefaultReduce (true)` or `yySetDefaultReduce (false)`, but if you use default reduces anyway, this option saves some space.
hhext = "name"	specifies the header file extension.
hhfile = "name"	specifies the header file name, with extension.
kernel	in the DFA state debug file, prints only the kernel items of each DFA state's item set. By default, closure items are printed as well.
lalr	tells YooParse to generate an LALR(1) parser. This is the default.
lr	tells YooParse to generate a full LR(1) parser.
main	tells YooParse to generate a default main function.
namespace = "name"	specifies the namespace for the class.
nola	in the DFA state debug file, does not print the lookaheads of each LR(1) item. By default, the lookaheads of each item are printed.
token	tells YooParse to generate a token definition file. The default file name is class + "_tokens" + hhext.
token = "name"	tells YooParse to generate a token definition file and specifies the output file name, with extension.
token_namespace = "name"	specifies the namespace for the token definitions. If not specified, it is the same as the class namespace. To force the token definitions into the default namespace, specify this option at a later point.
verbose	tells YooParse to generate a DFA state debug file. The default file name is class + ".output".
verbose = "name"	tells YooParse to generate a DFA state debug file and specifies the output file name, with extension.
yytext = "name"	specifies the `_yyText` data type. Same as the YooLex option.
yyvalue = "name"	specifies the `_yyValue` data type. Note: this data type must have a default constructor. Because values of this type are stored in containers, `std::auto_ptr<>` cannot be used. For automatic memory management, use a smart pointer such as `boost::shared_ptr<>`.
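The yyvalue requirements can be illustrated with a small value type. This is a sketch under assumptions: `Node`, `ExprValue` and `pushCopy` are hypothetical names, and `std::shared_ptr<>` (the C++11 standard counterpart of `boost::shared_ptr<>`) is used so that the example needs no Boost. `std::auto_ptr<>` transfers ownership when copied, which is why it misbehaves inside containers; it was deprecated in C++11 and removed in C++17.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node type; not part of YooParse.
struct Node {
    std::string label;
    explicit Node(std::string l) : label(std::move(l)) {}
};

// Hypothetical value type, e.g. for %option yyvalue = "ExprValue".
// It has a default constructor and ordinary copy semantics, so it can be
// stored in standard containers such as a parser value stack.
struct ExprValue {
    std::shared_ptr<Node> node;  // null when default-constructed
    int number = 0;
};

// Copies a value into a container, the way a parser stack would.
// Copying a shared_ptr adds an owner instead of stealing ownership,
// which is exactly what std::auto_ptr<> gets wrong.
inline std::vector<ExprValue> pushCopy(std::vector<ExprValue> stack, const ExprValue& v) {
    stack.push_back(v);
    assert(stack.back().node == v.node);  // both copies point at the same node
    return stack;
}
```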

Section 1 Description

Section 1 is used to enter configuration parameters for YooParse, include C/C++ code, and define terminals along with their precedence and associativity rules. The following directives start these actions:

`%{`	starts a code block, which is terminated with `%}`.
`%left`	specifies left associativity for the listed terminals, as well as their precedence level.
`%nonassoc`	specifies non-associativity for the listed terminals, as well as their precedence level.
`%option`	YooParse configuration. See above.
`%right`	specifies right associativity for the listed terminals, as well as their precedence level.
`%start`	specifies the start non-terminal. By default, the first non-terminal encountered is used.
`%token`	declares a terminal. Its value is assigned automatically.
Note for %left, %right and %nonassoc: terminals specified on the same line have the same precedence level, and terminals specified on later lines have higher precedence.
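The precedence rule for %left, %right and %nonassoc can be modeled as a small table builder. This is a hypothetical sketch, not YooParse's code: `PrecLine`, `PrecInfo` and `assignPrecedence` are illustrative names, and each `PrecLine` stands for one declaration line in section 1.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of %left / %right / %nonassoc; not YooParse's code.
enum class Assoc { Left, Right, NonAssoc };

struct PrecInfo {
    int level;    // higher level = higher precedence
    Assoc assoc;
};

// One declaration line, e.g. %left '+' '-'
struct PrecLine {
    Assoc assoc;
    std::vector<std::string> terminals;
};

// Terminals on the same line share a level; every later line gets a
// higher level than all earlier lines.
inline std::map<std::string, PrecInfo> assignPrecedence(const std::vector<PrecLine>& lines) {
    std::map<std::string, PrecInfo> table;
    int level = 0;
    for (const PrecLine& line : lines) {
        assert(!line.terminals.empty());  // each line names at least one terminal
        ++level;                          // new line, new (higher) level
        for (const std::string& t : line.terminals) {
            table[t] = PrecInfo{level, line.assoc};
        }
    }
    return table;
}
```

Declaring `%left '+' '-'` followed by `%left '*' '/'` therefore places * and / one level above + and -.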

Section 2 Description

Section 2 contains three parts: the prolog, the grammar rules, and the epilog. The grammar rules use the same syntax as yacc/bison.

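The %prec <terminal name> directive is supported: a rule followed by %prec takes the precedence of the named terminal, as in yacc/bison.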

The prolog and epilog subsections insert code at the beginning and at the end of the yyParse () function, respectively.

Section 3 Description

All code in this section is copied verbatim to the end of the generated source file, so this is a good place for class member function definitions and main ().

Warnings

Reduce/Reduce Conflicts

An example of such a conflict is:

```
A : B | C ;
B : a ;
C : a ;
```

After reading a, there is a reduce/reduce conflict on the $ (EOF) lookahead, since both B : a and C : a can be reduced. By default, the rule specified earlier in the grammar is reduced.

Shift/Reduce Conflicts

An example of such a conflict is the dangling else:

```
statement : if statement
          | if statement else statement
          | ';'
          ;
```

The resulting DFA contains a state with the following item set:

```
statement : if statement . , $/else
statement : if statement . else statement , $/else
```

There is a shift/reduce conflict on the else lookahead. By default, shift takes precedence over reduce, so each else binds to the nearest unmatched if. Associativity and precedence can be used to change this default.

Right associativity favors shifts over reduces, as in the dangling-else example above. This matches the default behavior.
Left associativity favors reduces over shifts. Example:
```
E : E + E | INTEGER ;
```
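The expression 5 + 3 + 2 would be evaluated as 5 + (3 + 2) under the default rule, which behaves like right associativity. Specifying left associativity on the '+' terminal makes the parser reduce instead, so the expression is evaluated as (5 + 3) + 2, the conventional grouping. The choice matters for operators such as -: 10 - 3 - 2 is 5 when grouped left but 9 when grouped right.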
Precedence favors shifts or reduces based on the lookahead. Example:
```
E : E + E | E * E | INTEGER ;
```
Consider the expressions 5+3*2 and 5*3+2. If * is given higher precedence than +, the first expression is evaluated as 5+(3*2), because the parser shifts on the * lookahead, and the second is evaluated as (5*3)+2, because the parser reduces on the + lookahead.
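The resolution rules above can be summarized in a single decision function. This sketch follows the yacc/bison convention, which the rules above appear to match, but it is an assumption that YooParse applies them exactly this way: a rule's precedence comes from its last terminal or from its %prec terminal, a missing precedence falls back to the default shift, and %nonassoc at equal levels turns the action into an error. `Prec`, `Action` and `resolveShiftReduce` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical names; models the yacc/bison convention, assumed here.
enum class Assoc { Left, Right, NonAssoc };
enum class Action { Shift, Reduce, Error };

// level 0 means "no precedence declared"; higher levels bind tighter.
struct Prec {
    int level;
    Assoc assoc;
};

// rulePrec:  precedence of the rule that could be reduced (its last terminal,
//            or the terminal named with %prec).
// tokenPrec: precedence of the lookahead terminal that could be shifted.
inline Action resolveShiftReduce(const Prec& rulePrec, const Prec& tokenPrec) {
    assert(rulePrec.level >= 0 && tokenPrec.level >= 0);
    if (rulePrec.level == 0 || tokenPrec.level == 0)
        return Action::Shift;                 // no precedence: default shift
    if (tokenPrec.level > rulePrec.level)
        return Action::Shift;                 // lookahead binds tighter, e.g. * after E + E
    if (tokenPrec.level < rulePrec.level)
        return Action::Reduce;                // rule binds tighter, e.g. + after E * E
    switch (tokenPrec.assoc) {                // same level: associativity decides
        case Assoc::Left:     return Action::Reduce;  // (5 + 3) + 2
        case Assoc::Right:    return Action::Shift;   // 5 + (3 + 2)
        case Assoc::NonAssoc: return Action::Error;   // chaining is rejected
    }
    return Action::Error;                     // not reached
}
```

For 5+3*2, the stack holds E + E with * as the lookahead, so the function compares the level of + (the rule) with the level of * (the token) and shifts.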

Parsing Algorithms

SLR(1)

SLR stands for Simple LR. It improves on the LR(0) parsing algorithm by consulting one lookahead token before reducing, which avoids some erroneous reduces and thus some reduce/reduce and shift/reduce conflicts. Specifically, in an LR(0) item set, a completed item A : α . is reduced only if the lookahead is in FOLLOW (A), the set of terminals that can follow A. Reduce/reduce and shift/reduce conflicts still arise when another completed item can be reduced, or another item can shift, on the same lookahead token.
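The FOLLOW sets that SLR relies on can be computed with the standard fixed-point iteration. The sketch below is the textbook algorithm, not YooParse's code; the `Production` struct, the `computeFollow` name and the "$" end marker are assumptions. It handles empty productions by tracking nullable non-terminals. Applied to the grammar from the LALR(1) example below (A : a a | B b b | b B a ; B : a ;), it yields FOLLOW (B) = { a, b }.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Production {
    std::string lhs;
    std::vector<std::string> rhs;  // empty vector = empty production
};

using SymbolSets = std::map<std::string, std::set<std::string>>;

// Textbook FOLLOW computation by fixed-point iteration.
// The first production's left-hand side is the start symbol; "$" is EOF.
inline SymbolSets computeFollow(const std::vector<Production>& g) {
    assert(!g.empty());
    std::set<std::string> nonterminals;
    for (const auto& p : g) nonterminals.insert(p.lhs);
    auto isNT = [&](const std::string& s) { return nonterminals.count(s) > 0; };

    std::set<std::string> nullable;
    SymbolSets first, follow;
    follow[g.front().lhs].insert("$");

    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& p : g) {
            // Update FIRST(lhs) and nullable(lhs) from this production.
            bool allNullable = true;
            for (const auto& s : p.rhs) {
                if (!isNT(s)) {
                    changed |= first[p.lhs].insert(s).second;
                    allNullable = false;
                    break;
                }
                for (const auto& t : first[s]) changed |= first[p.lhs].insert(t).second;
                if (!nullable.count(s)) { allNullable = false; break; }
            }
            if (allNullable) changed |= nullable.insert(p.lhs).second;

            // Update FOLLOW: scan right to left, carrying what can follow each position.
            std::set<std::string> trailer = follow[p.lhs];
            for (auto it = p.rhs.rbegin(); it != p.rhs.rend(); ++it) {
                if (isNT(*it)) {
                    for (const auto& t : trailer) changed |= follow[*it].insert(t).second;
                    if (nullable.count(*it)) {
                        trailer.insert(first[*it].begin(), first[*it].end());
                    } else {
                        trailer = first[*it];
                    }
                } else {
                    trailer = {*it};
                }
            }
        }
    }
    return follow;
}
```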

LALR(1)

LALR stands for Look-Ahead LR. It improves on SLR with a more precise lookahead analysis. In SLR, the lookaheads of every LR item for a non-terminal A are always FOLLOW (A). In LALR, each item's lookaheads are a subset of FOLLOW (A) that depends on the state. Example:

```
A : a a | B b b | b B a ;
B : a ;
```

In the grammar above, FOLLOW (B) = { a, b }. However, the resulting states do not always attach all of FOLLOW (B) to the items for B:

```
state 0:
	A : . a a , $
	A : . B b b , $
	A : . b B a , $
	B : . a , b
state 1:
	A : a . a , $
	B : a . , b
state 2:
	A : b . B a , $
	B : . a , a
...
```

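This analysis is tedious to do by hand, which is what YooParse is for. In this example, SLR would report a shift/reduce conflict in state 1 on the lookahead a, because a is in FOLLOW (B); with LALR, the only lookahead for B : a . is b, so the conflict disappears. LALR eliminates many such SLR conflicts without additional space, since an LALR(1) table has the same number of states as the LR(0) automaton that SLR uses.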
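The lookaheads in the states above can be reproduced with the canonical LR(1) closure and GOTO construction. This is a sketch of the textbook technique, not YooParse's implementation, which may differ. Assumptions: `Lr1`, `Item` and `gotoOn` are hypothetical names, production 0 is the augmented start production S' : A, and empty productions are not supported, which keeps the FIRST computation short. With the productions numbered 0 S' : A, 1 A : a a, 2 A : B b b, 3 A : b B a, 4 B : a, state 0 is `closure({[0, 0, $]})`, and states 1 and 2 are its GOTO on a and b.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct Production {
    std::string lhs;
    std::vector<std::string> rhs;
};

// An LR(1) item: production index, dot position, one lookahead terminal.
struct Item {
    int prod;
    int dot;
    std::string la;
    bool operator<(const Item& o) const {
        return std::tie(prod, dot, la) < std::tie(o.prod, o.dot, o.la);
    }
};

using ItemSet = std::set<Item>;

class Lr1 {
public:
    // g[0] must be the augmented start production, e.g. S' : A.
    // Simplification for this sketch: no empty productions.
    explicit Lr1(std::vector<Production> g) : g_(std::move(g)) {
        for (const auto& p : g_) { assert(!p.rhs.empty()); nts_.insert(p.lhs); }
        for (bool changed = true; changed;) {          // FIRST sets by fixed point
            changed = false;
            for (const auto& p : g_)
                for (const auto& t : firstOf(p.rhs[0]))
                    changed |= first_[p.lhs].insert(t).second;
        }
    }

    // For [A : x . B y, a], add [B : . w, b] for every b in FIRST(y a).
    ItemSet closure(ItemSet items) const {
        std::vector<Item> work(items.begin(), items.end());
        while (!work.empty()) {
            Item it = work.back();
            work.pop_back();
            const auto& rhs = g_[it.prod].rhs;
            const int n = static_cast<int>(rhs.size());
            if (it.dot >= n || !nts_.count(rhs[it.dot])) continue;
            std::set<std::string> las = it.dot + 1 < n ? firstOf(rhs[it.dot + 1])
                                                       : std::set<std::string>{it.la};
            for (int i = 0; i < static_cast<int>(g_.size()); ++i) {
                if (g_[i].lhs != rhs[it.dot]) continue;
                for (const auto& la : las) {
                    Item added{i, 0, la};
                    if (items.insert(added).second) work.push_back(added);
                }
            }
        }
        return items;
    }

    // GOTO(I, X): advance the dot over X, then take the closure.
    ItemSet gotoOn(const ItemSet& items, const std::string& x) const {
        ItemSet moved;
        for (const auto& it : items) {
            const auto& rhs = g_[it.prod].rhs;
            if (it.dot < static_cast<int>(rhs.size()) && rhs[it.dot] == x)
                moved.insert(Item{it.prod, it.dot + 1, it.la});
        }
        return closure(moved);
    }

private:
    std::set<std::string> firstOf(const std::string& s) const {
        if (!nts_.count(s)) return {s};                 // a terminal is its own FIRST
        auto f = first_.find(s);
        return f == first_.end() ? std::set<std::string>{} : f->second;
    }

    std::vector<Production> g_;
    std::set<std::string> nts_;
    std::map<std::string, std::set<std::string>> first_;
};
```

In state 0 the item B : . a gets only b as its lookahead, even though a is also in FOLLOW (B).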

LR(1)

The full LR(1) parser is also called canonical LR. The L indicates that the input is read from left to right, and the R indicates that the parser produces a rightmost derivation in reverse. The full LR(1) table can be 5-10 times bigger than the LALR table, but it avoids the reduce/reduce conflicts that LALR can introduce when it merges states with identical cores (merging never introduces shift/reduce conflicts), and it detects errors before performing any erroneous reduce.

Data Types

The following data types are used in YooParse. Each is defined in yoogroup::YooParse<> unless noted otherwise.

YYParserState	Parser stack value data type.
YYParserStateList	Parser stack type. Equivalent to `std::list<YYParserState>`.
YYValueType	`_yyValueType` data type.
YYTextType	`_yyText` data type. Defined in `yoogroup::YooLex<>`.

Macros

The following macros are defined in the generated C++ source file and can be used in section 3.

`YYPARSE_DFA(outState,inState,lookahead)`	DFA state lookup macro
`YYPARSE_GOTO(outState,inState,reducedSymbol)`	GOTO state lookup macro
`YYPARSE_GOTO_BASEADD`	Internal use
`YYPARSE_TRANSLATE(terminal)`	Translates a terminal to its internal representation

Functions

`protected virtual bool YooParse<>::yyParseError (int yychar);`	Error recovery function. Override this function to replace the default recovery method.
`protected void YooParse<>::yyCheckErrorToken ();`	Pushes an ERROR token onto the parser state stack.
`public bool YooParse<>::yyGetDefaultReduce () const;`	Returns true if default reduces are enabled.
`public bool YooParse<>::yySetDefaultReduce (bool defaultReduce);`	Pass true to force default reduces, false to turn them off. Returns the previous value.
`public virtual int class::yyParse ();`	The parser function.

$Id: reference.html,v 1.3 2002/07/27 03:47:33 coconut Exp $
