Parser algorithm

Top  Previous  Next

Algorithms > Parser algorithm

 

Preliminary remark:

The TextTransformer is an interpreted parser generator. It is a parser, which parses production scripts to create a new parser. The created parser in turn parses input text to transform it. Because the parser of the TextTransformer is created by itself, the algorithms of parsing scripts and of parsing input text are the same.

 

The task of a parser is the syntactical analysis of text. The parser uses a rule system (grammar), which defines the valid possibilities of token sequences. At every new step of parsing is tested, whether the actual evaluated token is a token, which may follow or not. If not, the parsing failed; if yes, the parsing will be continue through the alternative of the grammar, which is determined by the token. The decision about the validity of the next token is made by means of precomputed token sets: the first sets of the possible following nodes, including the SKIP alternatives. (The computation of the token sets is done in the actions, while parsing a production script.)  Here the function of the parser is crossed over with the scanners, because these token sets are exactly the sets of the single scanners, which were the theme in the previous page. In contrast to other parser/scanners combinations the set of tokens, which will be tested at this point of the grammar can be limited to the set of allowed tokens. This results into an increased velocity of parsing, especially, if the lexicon is big.

 

The parsing algorithm is as follows:

 

1. At the beginning of the text analysis no token is recognized. Because of that, a local (interface) scanner compares the beginning of the text with the set of token, by which the text can start, according to the grammar.

 

2. if a match is found, the parser steps to the node with a first set that contains the evaluated token.  This may be repeated through a sequence of nullable nodes, until the node is reached, which represents the token itself.

 

3. Each terminal node has a scanner, which evaluates the next token. But now there are two cases:

a) inside of the production, which contains the actual terminal symbol there always are other terminal symbols situated, which must be passed before it is possible to leave the actual production. In this case the next token will be evaluated and continued as described at point 2,

b) inside of the production, which contains the actual terminal symbol nothing follows or only nullable structures are following. In this case, the set of possible followers of the terminal symbol depends on the followed of the production. But a production can be called at different places of the grammar, and at every place may be followed by different token sets. In this case the TextTransformer will test at first only the incomplete set of those token, which can follow inside of the actual production. Further point 3 will be executed.

 

3. After each call of a production inside the body of a different production a local scanner tests the token, which may follow the call inside of the outer production. Then will be continued as described at point 2.

 

 

To summarize, following token sets are needed:

The first set of the start rule

(The first sets of those productions, for which interfaces are created)

The follow sets of the terminal symbols (token)

The follow sets of the calls to non-terminal symbols (productions)

 

 

 



This page belongs to the TextTransformer Documentation

Home  Content  German