Parser state
At every moment, the state of the parsing process is characterized by the current position in the input text and by the tokens and productions recognized so far. The interpreter can access some properties of the current state. The state as a whole is represented by the variable xState.
Remark on the names xState and State:
During the development of TextTransformer, the name State was also used instead of xState. The leading 'x' indicates that the variable is a parameter. Formerly, State also existed as a class element; now the parser state exists only as a parameter, so the name xState is used throughout. State can still be used as a synonym for xState, however: for the interpreter it makes no difference which of the two names is used. See also: xState as parameter of a call of a class method
Individual properties of the state can be queried with the following instructions:

unsigned int size() const
unsigned int length(int sub = 0) const
str str(int sub) const
str str() const
str text(unsigned int from) const
str text(unsigned int from, unsigned int to) const
str copy() const
int itg() const
int itg(int sub) const
double dbl() const
double dbl(int sub) const

str next_str() const
str next_copy() const
str next_str(int sub) const
unsigned int next_size() const
unsigned int next_length(int sub = 0) const

str lp_str() const
str lp_str(int sub) const
str lp_copy() const
unsigned int lp_length(int sub = 0) const

str la_str() const
str la_copy() const
str la_str(int sub) const
unsigned int la_length(int sub = 0) const

int LastSym() const
unsigned int Line() const
int Col() const
unsigned int Position() const
unsigned int LastPosition() const
unsigned int NextPosition() const
void SetPosition(unsigned int xi);

bool IsSubCall() const
str ProductionName() const
str BranchName() const

int GetState()
void SetState(int xeState);
Example:
Source text: one two three four
Production: "one" "two" "three" "four"

If "two" was recognized last, the following values hold:

0123456789...
one two three four

xState.str() : two
xState.copy() : two
xState.length() : 3
xState.size() : 1
xState.Line() : 1
xState.Col() : 8
xState.LastPosition() : 4
xState.Position() : 7
xState.NextPosition() : 8
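The position values in this example can be reproduced with a small stand-alone model. The sketch below is not TextTransformer code: TokenSpan and locate are hypothetical names, and the model simply assumes that blanks between tokens are the ignored characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the position values reported by xState.
// It only mirrors the documented example; it is not the real parser state.
struct TokenSpan {
    std::size_t last_position;  // first character of the token (LastPosition)
    std::size_t position;       // one past the last character (Position)
    std::size_t next_position;  // first character of the next token (NextPosition)
    std::size_t length;         // number of matched characters (length)
};

// Find `token` at or after `search_from`, assuming blanks are ignorable.
TokenSpan locate(const std::string& source, const std::string& token,
                 std::size_t search_from) {
    TokenSpan s;
    const std::size_t start = source.find(token, search_from);
    if (start == std::string::npos) {
        // Token not present: report the end of the text everywhere.
        s.last_position = s.position = s.next_position = source.size();
        s.length = 0;
        return s;
    }
    s.last_position = start;
    s.length = token.size();
    s.position = start + token.size();
    const std::size_t next = source.find_first_not_of(' ', s.position);
    s.next_position = (next == std::string::npos) ? source.size() : next;
    return s;
}

// Small self-check against the values listed in the example above.
void demo_positions() {
    const TokenSpan two = locate("one two three four", "two", 3);
    assert(two.last_position == 4);
    assert(two.position == two.last_position + two.length);
    assert(two.next_position == 8);
}
```

In this model, Position is always LastPosition plus length, and NextPosition differs from Position only by the skipped blanks.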
unsigned int size() const
Returns the number of sub-expressions that take part in the current match, including the whole match (the sub-expression with index zero). This is the case even if no match was found.
str str(int sub) const
Returns the matched text: index 0 represents the whole match, index 1 the first sub-expression, and so on. The default is the whole match (sub == 0).
str text(unsigned int from) const
str text(unsigned int from, unsigned int to) const
The function text returns parts of the source text. Called with only one parameter, it delivers the text from the position "from" to the end of the currently recognized token. The second parameter sets the end of the text section explicitly. If from is greater than the length of the source text, or if from is greater than to, an empty string is returned. If only to is greater than the length of the source text, the text from the position "from" to the end of the source text is returned.
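The boundary rules for the two-parameter form can be written out as a short sketch. The free function text_range below is a hypothetical model, not part of TextTransformer. It assumes that the empty-string case is meant for from being greater than to, and that to is an exclusive end position (like the value of Position()).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of xState.text(from, to); `to` is assumed to be
// exclusive, i.e. one past the last character that is returned.
std::string text_range(const std::string& source, std::size_t from,
                       std::size_t to) {
    // An out-of-range start or a reversed range yields an empty string.
    if (from > source.size() || from > to) {
        return std::string();
    }
    // An end position beyond the text is clamped to the end of the text.
    if (to > source.size()) {
        to = source.size();
    }
    return source.substr(from, to - from);
}

// Small self-check using the source text of the example above.
void demo_text_range() {
    const std::string src = "one two three four";
    assert(text_range(src, 4, 7) == "two");
    assert(text_range(src, 4, 1000) == "two three four");
}
```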
str copy() const
Returns the text from the end of the previously recognized token to the end of the currently recognized token. xState.copy() is equivalent to xState.str(-1) + xState.str().
unsigned int length(int sub = 0) const
Returns the length of the matched sub expression, defaults to the length of the whole match (sub == 0).
int LastSym() const
Returns the internally assigned number of the last recognized token.
unsigned int Line() const
Returns the line number of the last recognized token. (The line count begins with one.)
int Col() const
Returns the column number of the last recognized token. (The column count begins with one.)
unsigned int LastPosition() const
Returns the position of the last recognized token, that is, the number of characters from the beginning of the parsed text to the first character of the recognized token.
unsigned int Position() const
Returns the position where the last recognized token ends. This position is equal to LastPosition() + length().
unsigned int NextPosition() const
Returns the position where the next token begins, that is, the number of characters from the beginning of the parsed text to the first character of the expected next token.
If a SKIP token was recognized last, Position() and NextPosition() are identical. The spaces at the end of the text covered by SKIP can be removed with trim_right_copy.
void SetPosition(unsigned int xi );
With SetPosition the current position can be set directly, as a number of characters from the start of the text. As a result of this call, the next token is determined anew with the current scanner. This can be useful, for example, if the text contains meta information about the lengths of its components.
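The length-metadata use case can be illustrated with a stand-alone sketch. Everything in it is an assumption made for illustration: MockState is a hypothetical stand-in that only tracks a character position, and the record format "<length>:<payload>" is invented. It is not the TextTransformer parser state.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the parser state: it only tracks a position.
struct MockState {
    std::string source;
    std::size_t pos;
    explicit MockState(std::string s) : source(std::move(s)), pos(0) {}
    std::size_t Position() const { return pos; }
    // Clamp to the end of the text, as a defensive choice of this sketch.
    void SetPosition(std::size_t xi) {
        pos = (xi <= source.size()) ? xi : source.size();
    }
};

// Read records of the invented form "<length>:<payload>". The payload is
// skipped by position, so it may contain any characters, including ':'.
std::vector<std::string> read_records(MockState& st) {
    std::vector<std::string> records;
    const std::string& src = st.source;
    while (st.Position() < src.size()) {
        std::size_t i = st.Position();
        std::size_t len = 0;
        bool any_digit = false;
        while (i < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[i]))) {
            len = len * 10 + static_cast<std::size_t>(src[i] - '0');
            any_digit = true;
            ++i;
        }
        if (!any_digit || i >= src.size() || src[i] != ':') break;
        const std::size_t start = i + 1;
        if (len > src.size() - start) break;  // declared length too large
        records.push_back(src.substr(start, len));
        st.SetPosition(start + len);          // jump over the payload
    }
    return records;
}

// Small self-check: the first payload contains the separator itself.
void demo_records() {
    MockState st("3:a:b2:xy");
    const std::vector<std::string> r = read_records(st);
    assert(r.size() == 2 && r[0] == "a:b" && r[1] == "xy");
}
```

The point of the sketch is that the payload is never scanned for tokens; the position is moved past it using the declared length.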
bool IsSubCall() const
Returns true if the production currently being executed was invoked from the interpreter. In that case a temporary parser-state variable xState is used (in contrast to the plugin). Within the productions of the main parser this function returns false.
str ProductionName() const
Returns the name of the current production.
The return type is always std::string, even when Unicode parsers are created.
This function is not thread-safe.
str BranchName() const
Returns the name of the last branch (alternative, option etc.) in the grammar.
The return type is always std::string, even when Unicode parsers are created.
This function is not thread-safe.
---------------------------------------------------------------------------
int itg() const
int itg(int sub) const
double dbl() const
double dbl(int sub) const
The functions itg and dbl directly convert the text of the last recognized token into an integer or a double value. itg also returns a correct integer value for text that can be interpreted as an octal or hexadecimal number.
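A comparable conversion can be sketched with the C++ standard library. The helpers to_integer and to_double below are hypothetical names and only an analogue: whether itg accepts exactly the same prefixes as std::stol with base 0 ("0x" for hexadecimal, a leading "0" for octal) is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical analogue of itg(): base 0 lets std::stol detect
// hexadecimal ("0x...") and octal ("0...") notation automatically.
// std::stol throws std::invalid_argument if no conversion is possible.
long to_integer(const std::string& token) {
    return std::stol(token, nullptr, 0);
}

// Hypothetical analogue of dbl().
double to_double(const std::string& token) {
    return std::stod(token);
}

// Small self-check of the three integer notations.
void demo_conversions() {
    assert(to_integer("42") == 42);
    assert(to_integer("0x1F") == 31);
    assert(to_integer("017") == 15);
}
```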
----------------------------------------------------------------------------
str lp_str() const
str lp_copy() const
str lp_str(int sub) const
unsigned int lp_length(int sub = 0) const
These methods concern the part of the text that was recognized by the last call of a production ("lp" stands for "last production"). lp_str returns this text without the ignored characters at its beginning, while lp_copy returns the whole text. The methods lp_str(int sub) and lp_length(int sub) are formally equivalent to the methods str(int sub) and length(int sub), but they can be called only with the indices -1 and 0. With the index -1 you get information about the ignored text of the production.
Example:
Prod1 ::= Prod3 Prod2 {{cout << xState.lp_copy(); }}
Prod2 ::= {{cout << xState.lp_copy(); }} Prod1+
Prod3 ::= ID

Input ::= a b c
Output ::= a b c

Prod3 in Prod1 recognizes "a", which is then printed in Prod2. Prod2 then recognizes " b c", which is output at the end of Prod1.
----------------------------------------------------------------------------
str la_str() const
str la_copy() const
str la_str(int sub) const
unsigned int la_length(int sub = 0) const
These methods concern the part of the text that was recognized by the last call of a look-ahead parser ("la" stands for "look-ahead"). la_str returns this text without the ignored characters at its beginning, while la_copy returns the whole text. The methods la_str(int sub) and la_length(int sub) are formally equivalent to the methods str(int sub) and length(int sub), but they can be called only with the indices -1 and 0. With the index -1 you get information about the ignored text of the look-ahead.
--------------------------------------------------------------------------
unsigned int next_size() const
unsigned int next_length(int sub = 0) const
str next_str(int sub) const
str next_str() const
str next_copy() const
This group of functions is analogous to the functions whose names do not start with "next_". They return the corresponding values for the token expected next. The point in time at which the next token is determined has changed during the development of TextTransformer, and it may change again. These functions should therefore be used with caution.
----------------------------------------------------------------------------
int GetState()
void SetState(int xeState);
Some integer values characterize the condition of the parser state:

typedef enum {
  epCleared,
  epExpectingToken,
  epExpectingSKIP,
  epExpectingBreak,
  epExpectingEOF,
  epNoProgress,
  epStopped,
  epExpectationError,
  epUnexpectedError,
  epSkipMatchedNeatless,
  epUnknownError,
  epParsedIncomplete,
  epUnknown
} EPState;

Experienced users can try to manipulate these values in OnParseError to implement some error recovery.
This page belongs to the TextTransformer Documentation