First set

Top  Previous  Next

Glossary > First set


Each node in the TETRA syntax tree and each symbol of the grammar are characterized by two sets of token:  the first set and the follow set. The first set is of special interest for a top down analysis, because it determines the progress of the analysis.


The first set of a node contains all token, by which the parser will be guided to the node.

If the actual text isn't matched by one of the token of the first set, an other node will be chosen, namely the node with the first set that contains a token, which matches. If there is no such alternative the parsing is stopped with an error message. To guarantee that the decision between alternatives is definite, the first sets of the alternatives must be disjunctive, i.e.: there may not be a token contained in two alternative first sets.


So the first sets are essential for the progress of the text analysis and the calculation of these sets is the central task of the TextTransformer. To calculate these sets, several cases have to be distinguished:


The simplest case is, a node of a terminal symbol. Here the first set exactly contains this symbol.




The first set of "TETRA" is the set with the Token "TETRA" as its only element.


The first set of a concatenation of symbols is the first set of its first member (if its not nullable, see below)




The first set of "TETRA" "means" "Texttransformer" ...  is the set with the token "TETRA" as its only element.



More elements are contained in nodes, which contain alternatives. The first set of such a node consists in the union of the first sets of the alternatives.




The first set of  ( "TETRA" ... | "Texttransformer" ... ) is the set of the token  "TETRA" and "Texttransformer".




Quite difficult the calculation of first sets of nullable nodes can be. These are the options "(...)? and optional repetitions "(...)*". At first all the tokens belong to the first set, by which the alternative chains inside of the structure can begin. For:


( "very"+ | "super" )? "good"


this is the set of the token "very" and "super". Because the node is optional, the parsing must continue, even if  "very" or "super" are not found at the actual position. The parsing shall continue with the token "good", or more general, it shall be continued with one of the token, which can follow the nullable structure. The first set of a nullable structure must be united with its follow set: in this case the result is {"very", "super", "good"}. (Again, all set must be disjunctive, see above)


The follow set is the set of all the tokens, which can follow a node.


Semantic actions also are presented in the syntax tree as nodes. They are nullable nodes. Their first set exactly is their follow set.






This page belongs to the TextTransformer Documentation

Home  Content  German