Tokens and lexemes




Tokens are the elements into which lexical analysis divides a text. Typical tokens are words, numbers, punctuation characters, etc. The tokens of a programming language, for example, include keywords like "double" or "while". In the case of keywords there is a 1:1 relationship between tokens and lexemes. A lexeme is a section of text that represents a token. In the case of a number, by contrast, many lexemes represent the same token; for example: "12", "14.8" or "1001". Such general tokens are described by patterns of text. Inside the TextTransformer, these patterns are described by means of regular expressions.
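The distinction can be sketched with a small regex-based tokenizer. This is only an illustration, not the TextTransformer's own implementation; the token names and patterns are chosen for this example.

```python
import re

# Each token is described by a regular expression pattern.
# KEYWORD has a 1:1 token/lexeme relationship; NUMBER matches
# many different lexemes ("12", "14.8", "1001", ...).
TOKEN_PATTERNS = [
    ("KEYWORD", r"\b(?:double|while)\b"),
    ("NUMBER",  r"\d+(?:\.\d+)?"),
    ("WORD",    r"[A-Za-z_]\w*"),
    ("PUNCT",   r"[^\w\s]"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_PATTERNS))

def tokenize(text):
    """Yield (token, lexeme) pairs found in the text."""
    for m in MASTER.finditer(text):
        yield m.lastgroup, m.group()

print(list(tokenize("while 12 14.8")))
# [('KEYWORD', 'while'), ('NUMBER', '12'), ('NUMBER', '14.8')]
```

Note that "12" and "14.8" are different lexemes but yield the same token, NUMBER, while "while" always corresponds to exactly one lexeme.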

A problem arises with overlapping tokens, for example "<" and "<=". In such conflicting cases the TextTransformer automatically chooses the longer lexeme: "<=".


Which parts of a text count as tokens ultimately depends on the interpretation of the text.

This page belongs to the TextTransformer Documentation
