How to begin with a new project?


Before you begin to develop your own projects, you should read the introduction and experiment with the wizard for new projects and with the examples.

For many programming languages and formats, ready-made grammars can be found on the Internet. If such a description exists, you can often translate it into the syntax of the TextTransformer quite easily. A semi-automatic translator for Coco/R grammars is included among the examples of the TextTransformer package. The examples "E-mail address" and "XML" demonstrate how available syntax specifications can be converted into TextTransformer programs.

If no syntax description exists, you have to create one yourself. The following rules and experiences can serve as a guide when constructing a new project.

1. Set the required project options!

For example, it is very important to select, right at the beginning of the development of a new project, the characters that have no meaning for parsing the texts. By default, the line feed and the line break characters are among them. This setting must be changed if line breaks have to be recognized.
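The effect of this option can be illustrated outside of TextTransformer. The following Python sketch is only an analogy: the function `tokenize` and its parameter `ignorable` are hypothetical names and not part of TextTransformer, and the sketch assumes that all ignorable characters are whitespace. It shows that line breaks never reach the parser as long as they belong to the ignored characters.

```python
import re

# A token is a line break, a run of non-whitespace, or a single other whitespace character.
TOKEN = re.compile(r"\r?\n|\S+|\s")

def tokenize(text, ignorable=" \t\r\n"):
    """Split text into tokens, silently skipping all characters in `ignorable`.

    Hypothetical helper for illustration only; not a TextTransformer function.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in ignorable:
            pos += 1  # ignored characters never become tokens
            continue
        match = TOKEN.match(text, pos)  # always matches: every character is \S or \s
        tokens.append(match.group())
        pos = match.end()
    return tokens

text = "first line\nsecond line"
with_default = tokenize(text)                      # line break is ignored
with_line_breaks = tokenize(text, ignorable=" \t")  # line break becomes a token
```

With the default set, the two lines merge into one stream of four words. With the reduced set, the line break appears as a token of its own between `line` and `second`, so a production can refer to it.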

Another important decision is whether all literal tokens should be tested or not. Rule of thumb: for formalized languages with defined keywords at significant positions, all literals should be tested; otherwise, only the expected literals should be tested. If necessary, the local options can be adapted accordingly as well.

2. First design the parser without semantic actions!

When constructing the parser, it will often be necessary or appropriate to rearrange productions and to simplify complex productions by defining sub-productions. If the parser already contained semantic code, this code would have to be adapted anew with each of these changes.

3. Develop top down!

Start with the most general production, the start rule that is to recognize the complete text, and then break the start rule down into sub-productions that recognize the principal parts of the text. The sub-productions can then be refined further according to the same principle. If, for example, a book is to be parsed, the start rule would be:

Book ::= SKIP // recognizes the whole text

After the first improvement:

Book ::= SKIP? Chapter+

Chapter ::= TITLE SKIP

TITLE here stands for a regular expression that unambiguously distinguishes a chapter heading from other text components.

Remark: Such an expression certainly does not exist for all books. The book is used here as an example of a text structure that everybody knows. The book parser works only for syntactically ideal books (e.g. TITLE ::= \d\.[^\r\n]+ // if you take the text of this page as a "book").
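The example expression can be tried out with any regex engine. The sketch below uses Python's `re` module, which is an assumption made for illustration; the regex dialect of TextTransformer may differ in details. The sample lines are taken from this page.

```python
import re

# The TITLE expression from the remark above.
TITLE = re.compile(r"\d\.[^\r\n]+")

lines = [
    "1. Set the required project options!",
    "3. Develop top down!",
    "Start with the most general production.",
]

# Keep only the lines that begin with a chapter heading.
headings = [line for line in lines if TITLE.match(line)]
```

The two numbered lines are accepted as headings and the ordinary sentence is not. The expression also shows why the parser only works for "ideal" books: anchored at the start of a line, it covers single-digit chapter numbers only, and a body line that begins with a digit and a dot would be mistaken for a heading.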

The Chapter production can now be refined further:

Chapter ::= TITLE Paragraph+

Paragraph ::= EOL+ SKIP

EOL ::= \r?\n // end of line
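To make the structure of these productions more tangible, here is a rough Python analogue with one function per production. It is not TextTransformer code. In particular, SKIP is approximated as "everything up to the next EOL or TITLE", which is an assumption of this sketch, and the helper `skip_until` is a hypothetical name. An EOL directly in front of a title yields a paragraph with empty text, which the sketch simply drops.

```python
import re

TITLE = re.compile(r"\d\.[^\r\n]+")               # chapter heading
EOL = re.compile(r"\r?\n")                        # end of line
EOL_OR_TITLE = re.compile(r"\r?\n|\d\.[^\r\n]+")  # tokens that end a SKIP in a paragraph

def skip_until(stop, text, pos):
    """Approximate SKIP: return the start of the next `stop` match, or the end of text."""
    match = stop.search(text, pos)
    return match.start() if match else len(text)

def parse_paragraph(text, pos):
    """Paragraph ::= EOL+ SKIP"""
    match = EOL.match(text, pos)
    if match is None:
        raise SyntaxError(f"EOL expected at offset {pos}")
    while match is not None:          # consume one or more line ends
        pos = match.end()
        match = EOL.match(text, pos)
    end = skip_until(EOL_OR_TITLE, text, pos)
    return text[pos:end], end

def parse_chapter(text, pos):
    """Chapter ::= TITLE Paragraph+"""
    match = TITLE.match(text, pos)
    if match is None:
        raise SyntaxError(f"TITLE expected at offset {pos}")
    title, pos = match.group(), match.end()
    body, pos = parse_paragraph(text, pos)   # at least one paragraph is required
    paragraphs = [body]
    while EOL.match(text, pos):              # further paragraphs start with EOL
        body, pos = parse_paragraph(text, pos)
        paragraphs.append(body)
    # Drop empty paragraphs produced by an EOL directly before the next title.
    return {"title": title, "paragraphs": [p for p in paragraphs if p]}, pos

def parse_book(text):
    """Book ::= SKIP? Chapter+"""
    pos = skip_until(TITLE, text, 0)         # optional text before the first chapter
    chapter, pos = parse_chapter(text, pos)
    chapters = [chapter]
    while pos < len(text):
        chapter, pos = parse_chapter(text, pos)
        chapters.append(chapter)
    return chapters

sample = (
    "Preface text\n"
    "1. First chapter\n"
    "Some text.\n"
    "\n"
    "More text.\n"
    "2. Second chapter\n"
    "Final text."
)
book = parse_book(sample)
```

For the sample text, `book` contains two chapters, the first with two paragraphs and the second with one. Because each production is a separate function, the sketch can be run on test texts after every refinement step, in the same way as the grammar itself.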

The advantage of this top-down procedure is that, at every stage of development, the current parser can be tested on all "books". Possible faults can thus be discovered at an early stage of development.

Note: With the transformation manager, many examples can be tested as a batch. If such a test fails, the corresponding text can be opened in the IDE with a click.

4. Choose the kind of transformation!

In principle, there are three ways in which the parser can be extended into a transformation program. They differ in what is done with the recognized text sections.

a) Text sections are immediately processed and written into the output.

b) Text sections are written into variables, and these are returned or passed as (reference) parameters to other productions, where they can be evaluated or combined into new values.

c) A parse tree is produced, and the text sections are processed after the complete text has been parsed.

The last method is the most flexible, since in principle all text sections remain accessible and since the parse tree can produce different output depending on the function table used. If a translator is to be developed that converts one format into several output formats, the use of a parse tree is nearly indispensable. The development of such a translator is, however, much more difficult than processing the source text directly with one of the other two methods.
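The idea of one parse tree with several function tables can be sketched in Python. This is an analogy only: the tuple-based tree, the function `render` and the two dictionaries are hypothetical constructions and do not reproduce the tree or function table types of TextTransformer.

```python
# A small hand-built parse tree: each node is (label, text) or (label, list of child nodes).
tree = ("Book", [
    ("Chapter", [
        ("Title", "1. First chapter"),
        ("Paragraph", "Some text."),
    ]),
])

def render(node, table):
    """Walk the tree and apply the function registered for each node label."""
    label, content = node
    if isinstance(content, str):
        inner = content
    else:
        inner = "".join(render(child, table) for child in content)
    return table[label](inner)

# One table per output format; the tree itself stays unchanged.
html_table = {
    "Book": lambda s: f"<body>{s}</body>",
    "Chapter": lambda s: f"<section>{s}</section>",
    "Title": lambda s: f"<h1>{s}</h1>",
    "Paragraph": lambda s: f"<p>{s}</p>",
}
text_table = {
    "Book": lambda s: s,
    "Chapter": lambda s: s + "\n",
    "Title": lambda s: s.upper() + "\n",
    "Paragraph": lambda s: s + "\n",
}

html_output = render(tree, html_table)
text_output = render(tree, text_table)
```

Adding a further output format only requires a further table. The price is the additional work of building the tree and of writing a function for every node type.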

If the order in which the processed text sections are to be output is approximately identical to the order in which they were recognized, the first method, direct output, is recommended. If recognized text parts must be rearranged, or if the processing of a part depends on text that is found later, the second method is recommended.
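The difference between the first two methods can be shown with a small Python sketch. The record format "Last, First" and both function names are assumptions chosen for illustration; in a TextTransformer project the corresponding code would be placed in the semantic actions of the productions.

```python
import io

# Text sections as a parser might have recognized them, in source order.
records = ["Doe, John", "Roe, Jane"]

def direct_output(sections, out):
    """Method a): process each section immediately and write it to the output."""
    for section in sections:
        out.write(section.upper() + "\n")

def via_variables(sections):
    """Method b): store the parts in variables and recombine them in a new order."""
    lines = []
    for section in sections:
        last, first = [part.strip() for part in section.split(",")]
        lines.append(f"{first} {last}")
    return "\n".join(lines) + "\n"

out = io.StringIO()
direct_output(records, out)
direct_result = out.getvalue()
rearranged_result = via_variables(records)
```

The direct method needs no intermediate storage, but the output order follows the input order. As soon as the first name has to appear before the last name, the parts must be held in variables until both are known.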

Once you have decided on the kind of transformation, different wizards can help you to insert parameters, variable declarations or tree nodes into the productions.

5. Make a copy program before writing the definitive transformation code!

This rule applies only to projects in which the source text is to be modified in a few significant places. If you first make a program that simply copies the source text, you can easily find out whether the output is complete by comparing the source text with the target text.
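The principle can be sketched in Python. The split into whitespace and non-whitespace pieces stands in for the real parser, and the replacement of the word "colour" is an arbitrary example of a modification; both are assumptions of this sketch.

```python
import re

# Every character of the text belongs to exactly one piece.
PIECE = re.compile(r"\s+|\S+")

def copy_program(text):
    """Recognize the text piece by piece and write every piece out unchanged."""
    return "".join(match.group() for match in PIECE.finditer(text))

def transform(text):
    """Same traversal as the copy program, but with one modification added."""
    return "".join(
        "color" if match.group() == "colour" else match.group()
        for match in PIECE.finditer(text)
    )

source = "The colour  of\nthe sky.\n"
copied = copy_program(source)
target = transform(source)
```

As long as `copied` equals `source`, nothing is lost on the way through the parser. The modification is then added to a program that is known to be complete, so any remaining difference between source and target is an intended change.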


This page belongs to the TextTransformer Documentation
