Token sets




In special cases, the choice of the next branch in the grammar can depend on a look-ahead in the text or on semantic predicates. In most cases, however, which are discussed here, the alternative chosen is the one whose first symbol is recognized next in the text. Apart from the preference rules already explained, this can depend on the set of tokens that are tested: tokens that aren't tested cannot cause any conflicts. For this reason, both the project options and the local options offer the possibility of testing only the expected tokens. However, it may also be desirable to detect conflicts early. For example, the reserved words of a programming language must not be used as variable names:


double int;  // error


In this case, the option to test only expected tokens has to be deactivated.
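The effect of this option can be illustrated with a small Python sketch. It is not the TextTransformer scanner: the token definitions, the helper names `scan` and `token_after_type`, and the tie-break (a keyword wins over ID when both match the same length) are assumptions made for the illustration. When only the expected token ID is tested after `double`, the text `int` is accepted as a name; when all tokens are tested, it is recognized as the keyword and the conflict surfaces.

```python
import re

# Hypothetical token definitions for this sketch. Keywords are listed before
# ID so that, on a tie in match length, the keyword wins (an assumed
# preference rule, standing in for the ones described in the documentation).
TOKENS = {
    "double": re.compile(r"double\b"),
    "int": re.compile(r"int\b"),
    "ID": re.compile(r"[A-Za-z_]\w*"),
    ";": re.compile(r";"),
}


def scan(text, pos, candidates):
    """Return (name, lexeme) of the longest match among the candidate tokens."""
    best = None
    for name in TOKENS:  # definition order implements the tie-break
        if name not in candidates:
            continue
        m = TOKENS[name].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def token_after_type(text, expected_only):
    """Scan 'double <name>;' and return the token recognized for <name>."""
    _, lexeme = scan(text, 0, {"double"})
    pos = len(lexeme)
    while text[pos] == " ":
        pos += 1
    # With the option on, only the expected token ID is tested here.
    candidates = {"ID"} if expected_only else set(TOKENS)
    name, _ = scan(text, pos, candidates)
    return name


print(token_after_type("double int;", expected_only=True))   # ID: accepted
print(token_after_type("double int;", expected_only=False))  # int: conflict
```

In the second call the parser, expecting a name, receives the keyword `int` and can report the error immediately.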

However, token sets also have to be considered when productions are used outside of the main parser.


Token sets of inclusions, sub-parsers and in a look-ahead


In the case of comments, it is obvious that conflicts with the tokens of the main parser are unwanted.


CppComment ::= "/*" ( SKIP | STRING )* "*/"

// ! this definition isn't appropriate for nested comments


/* int iCount : counter */


If the keyword int were recognized here, the comment couldn't be parsed.

In the TextTransformer, inclusions can therefore form a separate production system whose token set is independent of that of the main parser.
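Why a separate token set helps can be seen in a simplified Python sketch. The token lists `MAIN_TOKENS` (a fragment of a hypothetical C++ main parser) and `COMMENT_TOKENS` (the comment inclusion's own set) are assumptions, and the helper `recognized_tokens` only approximates SKIP by passing over every character that does not start a token. With the main parser's set, the keyword `int` appears inside the comment, where the comment production expects nothing but skipped text and the closing `*/`.

```python
import re

# Hypothetical token sets for this sketch.
MAIN_TOKENS = ["/*", "*/", "int", ";"]
COMMENT_TOKENS = ["/*", "*/"]


def recognized_tokens(text, token_set):
    """List the tokens of token_set found in text, left to right.

    Characters that do not start a token are passed over, roughly as SKIP
    would do. Word-like tokens get word-boundary anchors so that "int" is
    not found inside an identifier such as "print".
    """
    patterns = []
    for tok in token_set:
        pat = re.escape(tok)
        if tok[0].isalnum():
            pat = r"\b" + pat
        if tok[-1].isalnum():
            pat += r"\b"
        patterns.append(pat)
    return re.findall("|".join(patterns), text)


comment = "/* int iCount : counter */"
print(recognized_tokens(comment, MAIN_TOKENS))     # ['/*', 'int', '*/']
print(recognized_tokens(comment, COMMENT_TOKENS))  # ['/*', '*/']
```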

Exactly the opposite applies to look-ahead productions: here it is usually desired that the same tokens are recognized as in the main parser.


The following rules allow the token sets to be adapted flexibly to their respective purposes:


1. Every production that isn't called by another production, i.e. every start production, is the base of an independent production system with its own token set. This token set is the union of all tokens occurring in the system. Look-ahead productions used in the system are not regarded as called here, so they are not automatically part of the system. The start rule of the main parser forms the first system.


2. If a production is used in several systems, the token sets of these systems are united.




Prod1 ::= IF( Prod2() ) Prod2 ELSE Prod3 END

Prod2 ::= "a" "b" 

Prod3 ::= IF( !Prod4() ) "c" "d" ELSE  ID+ END

Prod4 ::= SKIP "h"


The rules Prod2 and Prod3 are reached from the start rule Prod1. The token set of this system is: "a", "b", "c", "d", ID.


Prod2 is used as a look-ahead. However, since this production is also part of the system of the start rule, the set of tokens tested by global scanners in Prod2 is identical to that of the main system: "a", "b", "c", "d", ID.


The look-ahead with Prod4, however, is independent of the main system. Prod4 therefore forms a system of its own, whose token set consists only of "h".


Now let's extend Prod4 to:


Prod4 ::= SKIP "h" Prod2?


Prod2 would then be part of both systems. In this case, the token sets of the two systems are united in accordance with rule 2 above: "a", "b", "c", "d", "h", ID.
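Rules 1 and 2 can be sketched in Python as follows. This is an illustrative model under assumptions, not the TextTransformer implementation: each production is reduced to the tokens it uses directly, the productions it calls, and the productions it uses only in look-aheads; the names `EXAMPLE`, `EXTENDED` and `token_sets` are hypothetical. Look-aheads are deliberately not followed when collecting a system, so a production used only as a look-ahead (Prod4) becomes a start production of its own.

```python
from itertools import combinations

# Hypothetical grammar model of Prod1..Prod4: tokens used directly,
# productions called, and productions used only as look-aheads (IF(...)).
EXAMPLE = {
    "Prod1": {"tokens": set(), "calls": ["Prod2", "Prod3"], "lookaheads": ["Prod2"]},
    "Prod2": {"tokens": {"a", "b"}, "calls": [], "lookaheads": []},
    "Prod3": {"tokens": {"c", "d", "ID"}, "calls": [], "lookaheads": ["Prod4"]},
    "Prod4": {"tokens": {"h"}, "calls": [], "lookaheads": []},
}
# The extended version: Prod4 ::= SKIP "h" Prod2?
EXTENDED = {**EXAMPLE,
            "Prod4": {"tokens": {"h"}, "calls": ["Prod2"], "lookaheads": []}}


def token_sets(grammar):
    # Rule 1: a production that no other production calls starts a system.
    # Look-aheads don't count as calls, so "lookaheads" is never followed.
    called = {c for prod in grammar.values() for c in prod["calls"]}
    starts = [name for name in grammar if name not in called]

    members, tokens = {}, {}
    for start in starts:
        seen, stack = set(), [start]
        while stack:
            name = stack.pop()
            if name not in seen:
                seen.add(name)
                stack.extend(grammar[name]["calls"])
        members[start] = seen
        # The system's token set is the union of the tokens of its productions.
        tokens[start] = set().union(*(grammar[n]["tokens"] for n in seen))

    # Rule 2: systems sharing a production get the union of their token sets
    # (repeated until stable, so chains of shared productions are covered).
    changed = True
    while changed:
        changed = False
        for a, b in combinations(starts, 2):
            if members[a] & members[b] and tokens[a] != tokens[b]:
                tokens[a] = tokens[b] = tokens[a] | tokens[b]
                changed = True
    return tokens
```

For `EXAMPLE` this yields {"a", "b", "c", "d", "ID"} for Prod1 and {"h"} for Prod4; for `EXTENDED`, both systems share Prod2 and both end up with {"a", "b", "c", "d", "h", "ID"}, matching the sets given above.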


This extension of Prod4 can have the consequence that, for example, "a" is no longer skipped by SKIP in the text "a h".
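A simplified model makes this consequence plausible. The sketch assumes, for illustration only, that SKIP advances until any token of the active token set starts at the current position, and that the parse then requires this token to be the one that follows SKIP; the function name `skip_then_expect` is hypothetical. ID is left out because it is not a literal string; in this model it would match "a" as well and not change the result.

```python
def skip_then_expect(text, token_set, expected):
    """Simplified, hypothetical model of SKIP followed by the token `expected`.

    SKIP advances one character at a time until some token of token_set
    starts at the current position. The parse then succeeds only if that
    token is `expected`.
    """
    pos = 0
    while pos < len(text):
        # Longest token of the set starting here, if any.
        found = max((t for t in token_set if text.startswith(t, pos)),
                    key=len, default=None)
        if found is not None:
            return found == expected
        pos += 1
    return False


print(skip_then_expect("a h", {"h"}, "h"))                      # True
print(skip_then_expect("a h", {"a", "b", "c", "d", "h"}, "h"))  # False
```

With Prod4's original token set, "a " is skipped and "h" is found; with the united set, "a" is recognized as a token at once, so SKIP stops before it.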



This page belongs to the TextTransformer Documentation
