Parser state



 

At any moment, the state of the parsing process is characterized by the current position in the input text and by the tokens and productions recognized so far. The interpreter can access some properties of the current state. The state as a whole is represented by the variable xState.

 

Remark on the names xState and State:

 

During the development of the TextTransformer, the name State was sometimes used instead of xState. The prefix 'x' indicates that the variable is a parameter. State was also used as a class element. Since the parser state now exists only as a parameter, the name xState is used throughout. State can still be used as a synonym for xState; for the interpreter it makes no difference which name is used. See also: xState as parameter of a call of a class method

 

 

Single properties of the state can be investigated by the following instructions:

 

unsigned int size() const

unsigned int length(int sub = 0) const

str str(int sub) const

str str() const

str text(unsigned int from) const

str text(unsigned int from, unsigned int to) const

str copy() const

int itg() const

int itg(int sub) const

double dbl() const

double dbl(int sub) const

 

str next_str() const

str next_copy() const

str next_str(int sub) const

unsigned int next_size() const

unsigned int next_length(int sub = 0) const

 

str lp_str() const

str lp_str(int sub) const

str lp_copy() const 

unsigned int lp_length(int sub = 0) const

 

str la_str() const

str la_copy() const

str la_str(int sub) const

unsigned int la_length(int sub = 0) const

 

int LastSym() const

unsigned int Line() const

int Col() const

unsigned int Position() const

unsigned int LastPosition() const

unsigned int NextPosition() const

void SetPosition(unsigned int xi );

 

bool IsSubCall() const

str ProductionName() const

str BranchName() const

 


 

int GetState()

void SetState(int xeState);

 

 

Example:

 

Source text:  one two three four

Production: "one" "two" "three" "four"

 

If "two" was the last token recognized, the following values hold:

 

0123456789...

one two three four

 

xState.str()              : two

xState.copy()             :  two

xState.length()           : 3

xState.size()             : 1

xState.Line()             : 1

xState.Col()              : 8

xState.LastPosition()     : 4

xState.Position()         : 7

xState.NextPosition()     : 8
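The values above follow from a few simple relations: Position() is LastPosition() + length(), and copy() is the ignored text before the token followed by the token itself. The following C++ sketch is a hypothetical model, not TextTransformer code: the struct TokenState and the function NextPosition are invented for illustration, positions are assumed to be 0-based character offsets, and blanks are assumed to be the only ignored characters. Line() and Col() are not modeled.

```cpp
#include <string>

// Hypothetical model of the values in the example (not TextTransformer code).
// Assumption: positions are 0-based character offsets into the source text.
struct TokenState {
    std::string text;       // whole source text
    unsigned lastEnd;       // end of the previously recognized token
    unsigned lastPosition;  // LastPosition(): first character of the token
    unsigned length;        // length(): length of the token

    // Position() == LastPosition() + length()
    unsigned Position() const { return lastPosition + length; }
    // str(): the token itself
    std::string str() const { return text.substr(lastPosition, length); }
    // str(-1): ignored text between the previous token and this one
    std::string ignored() const { return text.substr(lastEnd, lastPosition - lastEnd); }
    // copy() == str(-1) + str()
    std::string copy() const { return ignored() + str(); }
};

// Hypothetical: the next token starts after the blanks following Position().
unsigned NextPosition(const TokenState& s) {
    unsigned p = s.Position();
    while (p < s.text.size() && s.text[p] == ' ') ++p;
    return p;
}
```

For the example, TokenState s{"one two three four", 3, 4, 3} gives str() == "two", copy() == " two", Position() == 7 and NextPosition(s) == 8.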

 

 

unsigned int size() const

 

Returns the number of sub-expressions taking part in the current recognition, including the whole match (the sub-expression with index 0). This holds even if no matches were found.

 

 

str str(int sub) const

 

Returns what matched: item 0 represents the whole match, item 1 the first sub-expression, and so on. Called without an argument, str() returns the whole match (sub == 0).
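The indexing of sub-expressions follows the convention of regular-expression match results. As an analogy only, the following standard C++ sketch uses std::smatch, whose size(), str(n) and length(n) use the same numbering (index 0 is the whole match, 1 the first sub-expression). The helper Match is hypothetical, and the behavior of xState when nothing matched may differ from that of std::smatch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (standard C++, not TextTransformer): searches `pattern`
// in `input` and fills `m`; returns true on success.
bool Match(const std::string& input, const std::string& pattern, std::smatch& m) {
    return std::regex_search(input, m, std::regex(pattern));
}
// For the pattern "(\\d+)-(\\d+)" applied to "id 12-345":
//   m.size()    == 3   (whole match + 2 sub-expressions)
//   m.str(0)    == "12-345"
//   m.str(1)    == "12"
//   m.length(2) == 3
```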

 

 

str text(unsigned int from) const

str text(unsigned int from, unsigned int to) const

 

The function text returns parts of the source text.

Called with one parameter, it returns the text from position from up to the end of the currently recognized token. With the second parameter, the end of the section can be set explicitly.

If from is greater than the length of the source text, or if from is greater than to, an empty string is returned. If only to is greater than the length of the source text, the text from from to the end of the source text is returned.
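The boundary rules can be summarized in a small C++ model. This is a sketch under assumptions, not the TextTransformer implementation: Text and TextUntilCurrent are hypothetical names, positions are assumed to be 0-based, to is assumed to be exclusive, and currentEnd stands for Position(), the end of the current token.

```cpp
#include <string>

// Hypothetical model of text(from, to).
// Assumptions: 0-based positions, `to` is exclusive.
std::string Text(const std::string& source, unsigned from, unsigned to) {
    if (from > source.size() || from > to) return "";       // empty string
    if (to > source.size()) to = static_cast<unsigned>(source.size());  // clamp
    return source.substr(from, to - from);
}

// Hypothetical model of text(from): up to the end of the current token,
// where currentEnd plays the role of Position().
std::string TextUntilCurrent(const std::string& source, unsigned currentEnd, unsigned from) {
    return Text(source, from, currentEnd);
}
```

With the source text of the example above, Text(src, 8, 100) yields "three four", because only to exceeds the length of the text.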

 

 

 

 

str copy() const

 

Returns the text from the end of the previously recognized token to the end of the currently recognized token, i.e. the token together with any ignored characters preceding it. xState.copy() is equivalent to xState.str(-1) + xState.str().

 

 

unsigned int length(int sub = 0) const

 

Returns the length of the matched sub-expression; by default (sub == 0), the length of the whole match.

 

 

int LastSym() const

 

Returns the internal number assigned to the last recognized token.

 

 

unsigned int Line() const

 

Returns the line number of the last recognized token. (The line count begins with one.)

 

 

int Col() const

 

Returns the column number of the last recognized token. (The column count begins with one.)

 

 

unsigned int LastPosition() const

 

Returns the position of the last recognized token, that is, the number of characters from the beginning of the parsed text to the first character of the recognized token.

 

 

unsigned int Position() const

 

Returns the position where the last recognized token ends. This position equals LastPosition() + length().

 

 

unsigned int NextPosition() const

 

Returns the position where the next token begins, that is, the number of characters from the beginning of the parsed text to the first character of the next expected token.

 

If a SKIP token was recognized last, Position() and NextPosition() are identical.

Trailing spaces in the text covered by SKIP can be removed with trim_right_copy.

 

 

void SetPosition(unsigned int xi );

 

SetPosition sets the current position directly, as a number of characters from the start of the text. As a result, the next token is recomputed with the current scanner. This can be useful, for example, if the text contains meta-information about the lengths of its components.
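The following C++ sketch illustrates the idea behind SetPosition outside the TextTransformer: a length field tells where the next component starts, so the reader jumps there instead of scanning the payload. The record format "<length>:<payload>" and the function ReadRecords are hypothetical. In a script, the computed position would be passed to xState.SetPosition; here the positions are only computed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical format: a decimal length, ':', then exactly that many characters.
// The payload may contain ':' because it is skipped by length, not scanned.
std::vector<std::string> ReadRecords(const std::string& text) {
    std::vector<std::string> records;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = 0;
        while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
            len = len * 10 + static_cast<std::size_t>(text[pos] - '0');
            ++pos;
        }
        if (pos >= text.size() || text[pos] != ':') break;  // malformed header
        ++pos;                                               // skip ':'
        if (pos + len > text.size()) break;                  // truncated payload
        records.push_back(text.substr(pos, len));
        pos += len;  // jump to the next record (the "SetPosition" step)
    }
    return records;
}
```

For "5:hello3:a:c" this yields the records "hello" and "a:c".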

 

 

 

bool IsSubCall() const

 

Returns true if the production currently being executed was invoked from the interpreter. In this case a temporary parser-state variable xState is used (in contrast to the plugin). Within the productions of the main parser, this function returns false.

 

str ProductionName() const

 

Returns the name of the current production.

 

The return type is always std::string, even when Unicode parsers are created.

 

This function is not thread-safe.

 

 

str BranchName() const

 

Returns the name of the last branch (alternative, option etc.) in the grammar.

 

The return type is always std::string, even when Unicode parsers are created.

 

This function is not thread-safe.

---------------------------------------------------------------------------

 

int itg() const

int itg(int sub) const

double dbl() const

double dbl(int sub) const

 

The functions itg and dbl directly convert the text of the last recognized token into an integer or a double value, respectively. itg also returns the correct integer value for text that can be interpreted as an octal or hexadecimal number.
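A comparable conversion can be sketched with the C++ standard library. Itg and Dbl are hypothetical names, and it is an assumption that itg recognizes the usual C prefixes ("0x" for hexadecimal, a leading "0" for octal); the text above only states that octal and hexadecimal text is converted correctly. Note that std::stoi and std::stod throw for non-numeric text, which may not match the behavior of itg and dbl.

```cpp
#include <string>

// Base 0 lets std::stoi detect the radix from the prefix, like strtol:
// "0x1F" -> hexadecimal, "017" -> octal, "42" -> decimal.
int Itg(const std::string& token) { return std::stoi(token, nullptr, 0); }

// Plain decimal floating-point conversion.
double Dbl(const std::string& token) { return std::stod(token); }
```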

 

----------------------------------------------------------------------------

 

str lp_str() const

str lp_copy() const

str lp_str(int sub) const

unsigned int lp_length(int sub = 0) const

 

These methods concern the part of the text that was recognized by the last call of a production ("lp" stands for "last production").

lp_str returns this text without the ignored characters at its beginning, while lp_copy returns the whole text. The methods lp_str(int sub) and lp_length(int sub) are formally equivalent to str(int sub) and length(int sub), but they can be called only with the indices -1 and 0. With index -1, you get the information about the ignored text at the beginning of the production.

 

Example:

 

Prod1 ::=  Prod3 Prod2 {{cout << xState.lp_copy(); }}

Prod2 ::=  {{cout << xState.lp_copy(); }} Prod3+

Prod3 ::=  ID

 

Input: a b c

Output: a b c

 

Prod3, called in Prod1, recognizes "a", which is then printed at the beginning of Prod2. Prod2 then recognizes " b c", which is printed at the end of Prod1.
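The behavior of this example can be reproduced with a small hand-written recursive-descent parser in C++. This is a simulation under assumptions, not TextTransformer code: ExampleParser and LastProduction are hypothetical, ID is assumed to be a run of letters, blanks are assumed to be the only ignored characters, and Prod2 uses Prod3+ as in the grammar above. Each production records its span so that lp_copy (with leading blanks) and lp_str (without them) can be derived.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Span of the text recognized by the last production call.
struct LastProduction {
    std::size_t copyStart;  // start including leading ignored blanks
    std::size_t strStart;   // start of the first token
    std::size_t end;        // end of the production
};

// Hypothetical simulation of the grammar Prod1 / Prod2 / Prod3.
class ExampleParser {
public:
    explicit ExampleParser(std::string text) : text_(std::move(text)) {}

    // Parses Prod1 and returns everything printed by the semantic actions.
    std::string Run() { Prod1(); return out_.str(); }

    // Counterpart of lp_str(): last production's text without leading blanks.
    std::string lp_str() const { return text_.substr(lp_.strStart, lp_.end - lp_.strStart); }

private:
    std::string text_;
    std::size_t pos_ = 0;
    LastProduction lp_{};
    std::ostringstream out_;

    // Counterpart of lp_copy(): last production's text including leading blanks.
    std::string lp_copy() const { return text_.substr(lp_.copyStart, lp_.end - lp_.copyStart); }

    bool IsIdChar(std::size_t p) const {
        return p < text_.size() && std::isalpha(static_cast<unsigned char>(text_[p]));
    }

    // True if another ID follows after optional blanks.
    bool AtId() const {
        std::size_t p = pos_;
        while (p < text_.size() && text_[p] == ' ') ++p;
        return IsIdChar(p);
    }

    // Prod3 ::= ID
    void Prod3() {
        std::size_t copyStart = pos_;
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;  // ignored blanks
        std::size_t strStart = pos_;
        while (IsIdChar(pos_)) ++pos_;
        lp_ = {copyStart, strStart, pos_};
    }

    // Prod2 ::= {{cout << xState.lp_copy(); }} Prod3+
    void Prod2() {
        std::size_t copyStart = pos_;
        out_ << lp_copy();  // prints what Prod3 recognized
        Prod3();
        std::size_t strStart = lp_.strStart;
        while (AtId()) Prod3();
        lp_ = {copyStart, strStart, pos_};
    }

    // Prod1 ::= Prod3 Prod2 {{cout << xState.lp_copy(); }}
    void Prod1() {
        std::size_t copyStart = pos_;
        Prod3();
        std::size_t strStart = lp_.strStart;
        Prod2();
        out_ << lp_copy();  // prints what Prod2 recognized
        lp_ = {copyStart, strStart, pos_};
    }
};
```

ExampleParser("a b c").Run() returns "a b c": "a" is printed by the action in Prod2 and " b c" by the action at the end of Prod1.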

 

 

----------------------------------------------------------------------------

 

str la_str() const

str la_copy() const

str la_str(int sub) const

unsigned int la_length(int sub = 0) const

 

These methods concern the part of the text that was recognized by the last call of a look-ahead parser ("la" stands for "look-ahead").

la_str returns this text without the ignored characters at its beginning, while la_copy returns the whole text. The methods la_str(int sub) and la_length(int sub) are formally equivalent to str(int sub) and length(int sub), but they can be called only with the indices -1 and 0. With index -1, you get the information about the ignored text at the beginning of the look-ahead.

 

--------------------------------------------------------------------------

 

unsigned int next_size() const

unsigned int next_length(int sub = 0) const

str next_str(int sub) const

str next_str() const

str next_copy() const

 

These functions are analogous to the functions whose names do not start with "next_"; they return the corresponding values for the token expected next. The point at which the next token is determined has changed during the development of the TextTransformer and may change again. These functions should therefore be used with caution.

 

 

----------------------------------------------------------------------------

 

int GetState()

void SetState(int xeState);

 

The condition of the parser state is characterized by one of the following integer values, which GetState returns and SetState sets:

 

  typedef enum { epCleared,

                 epExpectingToken,

                 epExpectingSKIP,

                 epExpectingBreak,

                 epExpectingEOF,

                 epNoProgress,

                 epStopped,

                 epExpectationError,

                 epUnexpectedError,

                 epSkipMatchedNeatless,

                 epUnknownError,

                 epParsedIncomplete,

                 epUnknown

               } EPState;

 

Experienced users can try to manipulate these values in OnParseError to implement some error recovery.
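Because EPState is a plain enumeration without explicit initializers, C++ numbers its enumerators 0, 1, 2, ... in the order listed. Assuming the list above is complete and in its original order, this is how the int returned by GetState can be read. The helper EPStateName is hypothetical and not part of the TextTransformer; it only illustrates mapping the value to a name, for example for logging in OnParseError.

```cpp
#include <string>

typedef enum { epCleared,              // 0
               epExpectingToken,       // 1
               epExpectingSKIP,        // 2
               epExpectingBreak,       // 3
               epExpectingEOF,         // 4
               epNoProgress,           // 5
               epStopped,              // 6
               epExpectationError,     // 7
               epUnexpectedError,      // 8
               epSkipMatchedNeatless,  // 9
               epUnknownError,         // 10
               epParsedIncomplete,     // 11
               epUnknown               // 12
             } EPState;

// Hypothetical helper: maps an int state value to the enumerator name.
std::string EPStateName(int state) {
    static const char* const names[] = {
        "epCleared", "epExpectingToken", "epExpectingSKIP", "epExpectingBreak",
        "epExpectingEOF", "epNoProgress", "epStopped", "epExpectationError",
        "epUnexpectedError", "epSkipMatchedNeatless", "epUnknownError",
        "epParsedIncomplete", "epUnknown"};
    if (state < 0 || state > epUnknown) return "invalid";
    return names[state];
}
```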

 

 



This page belongs to the TextTransformer Documentation
