Analysis

The source text is analyzed in two steps: lexical analysis and syntactical analysis.

The lexical analysis breaks the source text into words, punctuation marks, and so on. More generally, lexical analysis is the recognition of so-called tokens. Tokens, also called terminal symbols, consist of one or more characters. Such a sequence can be regarded as a pattern of characters, and such patterns can be described in general by so-called regular expressions.
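As a minimal sketch of this idea, the following Python example describes three token classes with regular expressions and reports every token found in a text. The class names (NUMBER, NAME, OPERATOR), the patterns, and the function find_tokens are assumptions chosen for illustration; they are not taken from TextTransformer. In this sketch, characters that match no pattern are simply passed over.

```python
import re

# Hypothetical token classes, each described by a regular expression.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # integer or decimal number
    ("NAME", r"[A-Za-z_]\w*"),      # identifier
    ("OPERATOR", r"[-+*/=()]"),     # single-character operator or bracket
]

# Combine all patterns into one expression with a named group per class.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)

def find_tokens(text):
    """Return (kind, lexeme) pairs for every token found in text."""
    return [(match.lastgroup, match.group()) for match in MASTER.finditer(text)]

print(find_tokens("x = 3.14 * r"))
```

The order of the patterns matters in such a scheme: the first alternative that matches at a position wins.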

Depending on the kind of text, these tokens can denote different things. In a mathematical text, the tokens are names, numbers, and operators; in natural-language texts, the basic elements are words, groups of words, sentences, or parts of words; and in records, they are the individual fields.

The lexical analysis also removes meaningless characters from the text, such as spaces, tabs, comments, etc.
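The removal of meaningless characters can be sketched as a scanning loop that first tries a "skip" pattern and only then a token pattern. The comment convention (from '#' to the end of the line), the token pattern, and the function name tokenize are assumptions made for this example only.

```python
import re

# Assumed convention: whitespace and '#' comments carry no meaning.
SKIP = re.compile(r"[ \t\r\n]+|#[^\n]*")
# Assumed tokens: integers, names, and a few operators.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[-+*/=()]")

def tokenize(text):
    """Return the list of tokens in text, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        skipped = SKIP.match(text, pos)
        if skipped:
            pos = skipped.end()      # ignore meaningless characters
            continue
        found = TOKEN.match(text, pos)
        if not found:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(found.group())
        pos = found.end()
    return tokens

print(tokenize("a = 1  # set a\n\tb = a + 2"))
```

Unlike a scanner that silently passes over unknown input, this variant reports any character that is neither skippable nor part of a token.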

The syntactical analysis evaluates the order in which the tokens appear in the text.

This order is defined by sequences of alternatives of tokens, which follow one another repeatedly in the text. For example, a text can simply be regarded as a sequence of lines, or as repeated occurrences of groups of words separated by punctuation marks. The text can also obey a grammar described by complex rules.
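The second example above, groups of words separated by punctuation marks, can be sketched as a small hand-written parser. The two rules assumed here are text = { group punctuation } and group = word { word }; the set of punctuation marks and the function names are likewise assumptions for illustration and do not reflect how TextTransformer itself is implemented.

```python
import re

PUNCTUATION = {".", ",", ";", "!", "?"}

def tokenize(text):
    """Lexical step: split text into words and punctuation marks."""
    return re.findall(r"\w+|[.,;!?]", text)

def parse_text(tokens):
    """Syntactical step for: text = { group punctuation }, group = word { word }."""
    groups = []
    pos = 0
    while pos < len(tokens):
        # group = word { word }
        group = []
        while pos < len(tokens) and tokens[pos] not in PUNCTUATION:
            group.append(tokens[pos])
            pos += 1
        if not group:
            raise SyntaxError(f"expected a word, found {tokens[pos]!r}")
        # every group must be closed by a punctuation mark
        if pos == len(tokens):
            raise SyntaxError("expected a punctuation mark at end of text")
        groups.append((group, tokens[pos]))
        pos += 1
    return groups

print(parse_text(tokenize("Hello world, this is text.")))
```

The parser accepts a token sequence only if it follows the assumed rules; a text that starts with a punctuation mark or ends without one is rejected with an error.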

A syntactical rule is called a production or, in contrast to the terminal symbols (tokens), a non-terminal symbol.

This page belongs to the TextTransformer Documentation
