Character references


 

Inside an XML element:

 

<text> ... </text>

 

the characters:

 

< > " ' & $

 

may not be used literally. They have to be coded either as a named entity or as a decimal entity:

 

Character   Named entity   Decimal entity
<           &lt;           &#60;
>           &gt;           &#62;
&           &amp;          &#38;
"           &quot;         &#34;
'           &apos;         &#39;
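The coding described by the table can be sketched in standard C++ as follows. This is only an illustration: escape_xml is a hypothetical helper name and is not part of TextTransformer. It replaces each of the five characters from the table with its named entity and copies everything else unchanged.

```cpp
#include <string>

// Hypothetical helper (not part of TextTransformer): replaces the five
// characters from the table with their named entities.
std::string escape_xml(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;  // all other characters pass through
        }
    }
    return out;
}
```

For example, escape_xml("a < b") yields "a &lt; b".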

 

 

The mstrstr class variable m_EntityRefs, which holds the values of the first and second columns, is used to decode the named entities. Inside the Reference production, m_EntityRefs translates each named entity into the corresponding character.
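The lookup can be sketched in standard C++ with a std::map standing in for the mstrstr variable. The names entity_refs and decode_named_entities are hypothetical, and the choice to key the map by the entity name without "&" and ";" is an assumption; the actual layout of m_EntityRefs in the project may differ.

```cpp
#include <map>
#include <string>

// Stand-in for the m_EntityRefs class variable (assumed layout):
// entity name without '&' and ';' -> character.
const std::map<std::string, std::string>& entity_refs()
{
    static const std::map<std::string, std::string> refs = {
        {"lt", "<"}, {"gt", ">"}, {"amp", "&"}, {"quot", "\""}, {"apos", "'"}
    };
    return refs;
}

// Hypothetical helper: decodes every "&name;" whose name is in the map.
// Unknown or unterminated references are copied through unchanged.
std::string decode_named_entities(const std::string& text)
{
    const auto& refs = entity_refs();
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        if (text[pos] == '&') {
            std::size_t end = text.find(';', pos + 1);
            if (end != std::string::npos) {
                auto it = refs.find(text.substr(pos + 1, end - pos - 1));
                if (it != refs.end()) {
                    out += it->second;   // known entity: emit the character
                    pos = end + 1;       // continue after the ';'
                    continue;
                }
            }
        }
        out += text[pos++];              // ordinary character or unknown reference
    }
    return out;
}
```

The text is decoded in a single pass, so an already decoded "&" is never interpreted a second time: "&amp;lt;" becomes "&lt;", not "<".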

The corresponding decimal entities are handled in the action for the token:

 

CharRef ::= &#(\d+);|&#x([0-9a-fA-F]+);

 

Special characters that do not belong to the first 128 characters of the ASCII set often have to be coded too.

Whether and how this is necessary depends on the encoding attribute in XMLDecl. A complete XML parser would need access to many encoding tables. For this demonstration it is assumed that the standard character set for Western Europe and Latin America (ISO 8859-1) is used. The characters can then be translated according to the numbering of the ANSI table.

The regular expression CharRef recognizes either a character in decimal coding, delivering the decimal number as the first sub-expression, or a character in hexadecimal coding, delivering the hexadecimal number as the second sub-expression.

 

{{
if(xState.length(1))                     // first sub-expression matched: decimal
   return ctos(xState.itg(1));
else if(xState.length(2))                // second sub-expression matched: hexadecimal
   return ctos(hstoi(xState.str(2)));
else
{
   throw CTT_Error("unknown char reference");
   return str(); // formal return type
}
}}
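For readers without TextTransformer at hand, the same logic can be sketched in standard C++ with std::regex. The function name decode_char_ref is hypothetical, and the restriction to code points 0 to 255 is an assumption that follows the ISO 8859-1 premise above. It is not how the TextTransformer action itself is implemented.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the CharRef action: returns the character for a
// decimal ("&#60;") or hexadecimal ("&#x3C;") reference. ISO 8859-1 is
// assumed, so only code points 0..255 are accepted.
std::string decode_char_ref(const std::string& ref)
{
    static const std::regex char_ref(R"(&#(\d+);|&#x([0-9a-fA-F]+);)");
    std::smatch m;
    if (!std::regex_match(ref, m, char_ref))
        throw std::invalid_argument("unknown char reference");

    // Sub-expression 1 holds decimal digits, sub-expression 2 hex digits.
    const bool is_decimal = m[1].matched;
    const std::string digits = is_decimal ? m[1].str() : m[2].str();

    // std::stoul itself throws std::out_of_range for absurdly long numbers.
    const unsigned long code = std::stoul(digits, nullptr, is_decimal ? 10 : 16);
    if (code > 255)
        throw std::out_of_range("char reference outside ISO 8859-1");

    return std::string(1, static_cast<char>(static_cast<unsigned char>(code)));
}
```

Both decode_char_ref("&#60;") and decode_char_ref("&#x3C;") return "<". A string that matches neither alternative raises std::invalid_argument, which mirrors the CTT_Error branch of the action.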

 

 

 



This page belongs to the TextTransformer Documentation
