Algorithms > Parser algorithm

Parser algorithm

Top Previous Next

The task of a parser is the syntactical analysis of text. The parser uses a rule system (grammar), which defines the valid possibilities of token sequences. At every new step of parsing is tested, whether the actual evaluated token is a token, which may follow or not. If not, the parsing failed; if yes, the parsing will be continue through the alternatives of the grammar, which is determined by the token.

The decision about the validity of the next token is made by means of precomputed token sets: the first sets of the possible following nodes, including the SKIP alternatives. Here the function of the parser is crossed over with the scanners, because these token sets are exactly the sets of the single scanners, which were the theme in the previous section. In the project options the set of tokens, which will be tested can be limited to the set of allowed tokens. This results into an increased velocity of parsing, especially, if the lexicon is big.

The parsing algorithm is as follows:

1. Testing the first set of the startrule

At the beginning of the text analysis no token is recognized. Because of that, a local (interface) scanner compares the beginning of the text with the set of tokens, by which the text can start, according to the grammar.

2. Going to the expected terminal symbol

If a match is found, the parser steps to the node with a first set that contains the evaluated token. This may be repeated through a sequence of nullable nodes, until the node is reached, which represents the token itself.

3. Testing the followers inside and outside of the production

Each terminal node has a scanner, which evaluates the next token. But now there are two cases:

a) inside of the production, which contains the actual terminal symbol there always are other terminal symbols situated, which must be passed before it is possible to leave the actual production. In this case the next token will be evaluated and continued as described at point 2,

b) inside of the production, which contains the actual terminal symbol nothing follows or only nullable structures are following. In this case, the set of possible followers of the terminal symbol depends on the followed of the production. But a production can be called at different places of the grammar, and at every place may be followed by different token sets. In this case the TextTransformer will test at first only the incomplete set of those token, which can follow inside of the actual production. Further point 3 will be executed.

Note:

The TextTransformer is an interpreted parser generator. It is a parser, which parses production scripts to create a new parser. The created parser in turn parses input text to transform it. Because the parser of the TextTransformer is created by itself, the algorithms of parsing scripts and of parsing input text are the same.

This page belongs to the TextTransformer Documentation

Home Content German