Looking-ahead

Which production, or which branch within a production, is used to continue the analysis of a text always depends on the tokens that follow in the text.

A parser works most efficiently if the next token alone makes this decision possible. If a look-ahead of a single token always suffices for analyzing the text, the parser is called LL(1)-conformant. It is part of the developer's craft to formulate the grammar (the set of productions) so that the parser becomes LL(1)-conformant.

TextTransformer offers considerable support for this task, since it automatically generates notes and error messages if the grammar is not LL(1). However, TETRA also permits a look-ahead of arbitrarily many tokens if this should be required. The productions that are already defined are reused for such a look-ahead: a production can be applied to a text on a trial basis to test whether it can parse the text or not. Depending on the result of this test, the analysis of the text can then be continued in different ways. (No semantic actions are executed during the tentative application of productions.)

The look-ahead can be illustrated concretely with the example of a formal salutation at the beginning of a letter. The salutation takes one of the following two forms:

Dear Mr NAME

Dear Mrs NAME

To parse these short texts, one might at first think of formulating the following productions (the character "|" separates alternatives from each other and can be read as "or"):

Salutation ::= SalutationBeginning NAME

SalutationBeginning ::= "Dear" "Mr"

| "Dear" "Mrs"

A look-ahead of two words is required here. After the word "Dear" has been recognized, the following word decides which alternative of SalutationBeginning has to be chosen, and the parser must then go back to the position before the word "Dear" to start the actual processing of the text.
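The conflict can be made visible with a small check. The following Python sketch is not part of TextTransformer; the function name `first_token_conflicts` and the list-based representation of alternatives are assumptions chosen for illustration. It reports every first token that is shared by more than one alternative, which is exactly the situation in which one token of look-ahead cannot decide.

```python
def first_token_conflicts(alternatives):
    """Return the first tokens that start more than one alternative.

    Hypothetical helper: each alternative is a list of literal tokens.
    A non-empty result means one token of look-ahead is not enough.
    """
    seen = set()
    conflicts = set()
    for alternative in alternatives:
        first = alternative[0]
        if first in seen:
            conflicts.add(first)
        seen.add(first)
    return conflicts


# Both alternatives of SalutationBeginning start with "Dear".
salutation_beginning = [["Dear", "Mr"], ["Dear", "Mrs"]]

# Alternatives that start with different tokens do not conflict.
distinct_starts = [["Mr"], ["Mrs"]]

print(first_token_conflicts(salutation_beginning))  # {'Dear'}
print(first_token_conflicts(distinct_starts))       # set()
```

A real LL(1) test also has to handle nonterminals and empty alternatives; this sketch only covers alternatives that begin with a literal token.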

The following productions are better:

Salutation ::= "Dear" Gender NAME

Gender ::= "Mr"

| "Mrs"

Here the next word always decides how to continue within the productions.
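How such a grammar is processed with a single token of look-ahead can be sketched as a small hand-written parser in Python. This is only an illustration, not the code that TextTransformer generates; the function name `parse_salutation`, the whitespace tokenization, and the assumption that NAME is any single word are all hypothetical.

```python
def parse_salutation(text):
    """Parse 'Dear Mr NAME' or 'Dear Mrs NAME' with one token of look-ahead.

    Assumptions for this sketch: tokens are separated by whitespace and
    NAME is any single word.
    """
    tokens = text.split()
    pos = 0

    def peek():
        # The look-ahead: the next token, without consuming it.
        return tokens[pos] if pos < len(tokens) else None

    def expect(literal):
        nonlocal pos
        if peek() != literal:
            raise SyntaxError(f"expected {literal!r}, found {peek()!r}")
        pos += 1

    # Salutation ::= "Dear" Gender NAME
    expect("Dear")

    # Gender ::= "Mr" | "Mrs"; the next token alone selects the alternative.
    if peek() == "Mr":
        expect("Mr")
        gender = "Mr"
    elif peek() == "Mrs":
        expect("Mrs")
        gender = "Mrs"
    else:
        raise SyntaxError(f"expected 'Mr' or 'Mrs', found {peek()!r}")

    # NAME: here simply the next word.
    name = peek()
    if name is None:
        raise SyntaxError("expected NAME, found end of text")
    pos += 1

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return gender, name


print(parse_salutation("Dear Mrs Smith"))  # ('Mrs', 'Smith')
```

The parser never steps back in the text: every decision is made by inspecting `peek()` once.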

A surprisingly large number of texts can be parsed according to this LL(1) principle if the rules are designed accordingly.

Nevertheless, there are cases in which a look-ahead of only one token does not suffice. In such cases TextTransformer makes it possible to look ahead arbitrarily far in the text. For example, it might be necessary to know before parsing a sentence whether it is an interrogative sentence or not. However, an interrogative sentence can only be identified by the question mark at its end. This could be handled as follows:

IF( InterrogativeSentence() )

InterrogativeSentence

ELSE

NormalSentence

END

InterrogativeSentence ::= InterrogativeSentenceWordOrder "?"

NormalSentence ::= NormalSentenceWordOrder ( "." | "!" )
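The idea of applying a production on a trial basis can be imitated in a few lines of Python. The sketch below is not TextTransformer code, and all names in it (`tokenize`, `words_then`, `sentence`, the `actions` list) are assumptions made for illustration. The two word-order productions are not modelled: any run of words is accepted in their place. A production called with `actions=None` runs as a pure test, so no semantic action is executed during the look-ahead.

```python
import re

PUNCTUATION = (".", "!", "?")


def tokenize(text):
    """Split a text into words and the punctuation marks '.', '!' and '?'."""
    return re.findall(r"[A-Za-z']+|[.!?]", text)


def words_then(tokens, pos, terminators):
    """Skip at least one word; return the index of the closing mark or None."""
    start = pos
    while pos < len(tokens) and tokens[pos] not in PUNCTUATION:
        pos += 1
    if pos == start or pos == len(tokens) or tokens[pos] not in terminators:
        return None
    return pos


def interrogative_sentence(tokens, pos, actions=None):
    """Stand-in for: InterrogativeSentence ::= ...WordOrder "?" """
    end = words_then(tokens, pos, ("?",))
    if end is None:
        return None
    if actions is not None:  # semantic action, skipped in a trial run
        actions.append(("question", tokens[pos:end]))
    return end + 1


def normal_sentence(tokens, pos, actions=None):
    """Stand-in for: NormalSentence ::= ...WordOrder ( "." | "!" ) """
    end = words_then(tokens, pos, (".", "!"))
    if end is None:
        return None
    if actions is not None:
        actions.append(("statement", tokens[pos:end]))
    return end + 1


def sentence(tokens, pos, actions):
    """IF( InterrogativeSentence() ) ... ELSE ... END"""
    # Trial application: test only, no actions, position unchanged.
    if interrogative_sentence(tokens, pos) is not None:
        return interrogative_sentence(tokens, pos, actions)
    return normal_sentence(tokens, pos, actions)


actions = []
sentence(tokenize("Is it late?"), 0, actions)
sentence(tokenize("It is late."), 0, actions)
print(actions)
```

The price of the arbitrary look-ahead is visible in `sentence`: an interrogative sentence is scanned twice, once for the test and once for the real parse.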

This page belongs to the TextTransformer Documentation
