Identifier


 

The following predefined tokens can be inserted into a project from a pop-up menu.

 

 

ID:        [a-zA-Z_]\w*

 

An identifier begins with a letter of the alphabet or an underscore, followed by any number of alphanumeric characters or underscores (= \w). It is important that an identifier cannot begin with a digit; otherwise the token would recognize numbers too. In most cases it is advisable to define a separate token for numbers.
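The effect of the leading character class can be tried outside TextTransformer. The following is a minimal sketch in Python, not part of the product; the helper name is_identifier is hypothetical, and re.ASCII is an assumption made so that \w covers only letters, digits and the underscore.

```python
import re

# The ID pattern from the text; re.ASCII limits \w to [a-zA-Z0-9_] (assumption).
ID = re.compile(r"[a-zA-Z_]\w*", re.ASCII)

# What the token would look like without the restriction on the first character.
ANY_WORD = re.compile(r"\w+", re.ASCII)


def is_identifier(text: str) -> bool:
    """Return True if the whole text is matched by the ID pattern."""
    return ID.fullmatch(text) is not None


print(is_identifier("_count1"))              # True
print(is_identifier("1count"))               # False: begins with a digit
print(ANY_WORD.fullmatch("123") is not None) # True: numbers would be recognized too
```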

 

 

URI_WS_DELIM

URI_QUOTE_DELIM

URI_ANGLE_DELIM

 

These are regular expressions for Uniform Resource Identifiers (URIs), for example website addresses. URIs are described in RFC 3986:

 

http://www.apps.ietf.org/rfc/rfc3986.html

 

The three expressions are variants of an expression given on the page mentioned above. Unlike that expression, they do not match empty text: a URI has to start with the "scheme" part described there, i.e. with one or more characters followed by a colon. In addition, a URI has to be delimited from the surrounding text. The three expressions differ in how this delimiting is done.

 

 

URI_WS_DELIM : (([^:/?#]+):)(//([^/?#\\s]*))?([^?#\\s]*)(\\?([^#\\s]*))?(#([^\\s]*))?

 

This URI is delimited from the rest of the text by whitespace, so the URI itself must not contain whitespace. For example, URI_WS_DELIM recognizes the following text:

 

http://www.ics.uci.edu/pub/ietf/uri/#Related
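As a rough illustration outside TextTransformer, the pattern can be tried with Python's re module. This sketch makes several assumptions: the doubled backslashes become single ones inside a raw string, the fragment class is written as [^\s]* by analogy with the other two variants, and the hypothetical helper find_uris tests whitespace-separated words one at a time instead of tokenizing a text the way TextTransformer does.

```python
import re

# URI_WS_DELIM with single backslashes in a raw string (assumption, see above).
URI_WS_DELIM = re.compile(
    r"(([^:/?#]+):)(//([^/?#\s]*))?([^?#\s]*)(\?([^#\s]*))?(#([^\s]*))?"
)


def find_uris(text: str) -> list[str]:
    """Return the whitespace-delimited words that the pattern matches in full."""
    return [word for word in text.split() if URI_WS_DELIM.fullmatch(word)]


sample = "See http://www.ics.uci.edu/pub/ietf/uri/#Related for details."
print(find_uris(sample))
# ['http://www.ics.uci.edu/pub/ietf/uri/#Related']

# The scheme part is mandatory, so empty text is not matched.
print(URI_WS_DELIM.fullmatch("") is None)  # True
```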

 

 

URI_QUOTE_DELIM : "(([^:/?#]+):)(//([^/?#"]*))?([^?#"]*)(\\?([^#"]*))?(#([^"]*))?"

 

This URI is delimited from the rest of the text by double quotes, so it may contain whitespace. For example, URI_QUOTE_DELIM recognizes the following text:

 

"http://www.ics.uci.edu/pub/ietf/uri/#Related"

 

 

URI_ANGLE_DELIM : <(([^:/?#]+):)(//([^/?#>]*))?([^?#>]*)(\\?([^#>]*))?(#([^>]*))?>

 

This URI is delimited from the rest of the text by angle brackets, so it may contain whitespace. For example, URI_ANGLE_DELIM recognizes the following text:

 

<http://www.ics.uci.edu/pub/ietf/uri/#Related>
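The two explicitly delimited variants can be compared in a short Python sketch. It is an illustration only: the patterns are written with single backslashes in raw strings, and the sample texts with the address www.example.org are hypothetical, chosen so that the path contains a space.

```python
import re

# Quote- and angle-delimited variants, single backslashes in raw strings.
URI_QUOTE_DELIM = re.compile(
    r'"(([^:/?#]+):)(//([^/?#"]*))?([^?#"]*)(\?([^#"]*))?(#([^"]*))?"'
)
URI_ANGLE_DELIM = re.compile(
    r"<(([^:/?#]+):)(//([^/?#>]*))?([^?#>]*)(\?([^#>]*))?(#([^>]*))?>"
)

# Hypothetical sample texts; the path contains a space.
quoted = 'link = "http://www.example.org/my files/index.html#top"'
angled = "see <http://www.example.org/my files/index.html#top> for details"

q = URI_QUOTE_DELIM.search(quoted)
a = URI_ANGLE_DELIM.search(angled)

print(q.group(0))  # "http://www.example.org/my files/index.html#top"
print(a.group(0))  # <http://www.example.org/my files/index.html#top>
print(a.group(5))  # /my files/index.html
```

Because the delimiters are part of the match, the matched text includes the quotes or angle brackets; the URI itself is available through the sub-expressions.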

 

 

 

Remark:

 

In the example above, the sub-expressions match the following parts:

 

     $1 = http:

     $2 = http

     $3 = //www.ics.uci.edu

     $4 = www.ics.uci.edu

     $5 = /pub/ietf/uri/

     $6 = <undefined>

     $7 = <undefined>

     $8 = #Related

     $9 = Related

 

Here <undefined> indicates that the component is not present. The values of the five components described in RFC 3986 can therefore be determined as

 

     scheme    = $2

     authority = $4

     path      = $5

     query     = $7

     fragment  = $9
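The mapping from sub-expressions to components can be checked with a small Python sketch. It is not TextTransformer code: the helper name split_uri is hypothetical, the pattern is URI_ANGLE_DELIM with single backslashes in a raw string, and Python reports a sub-expression that did not take part in the match as None.

```python
import re

# URI_ANGLE_DELIM with single backslashes in a raw string.
URI_ANGLE_DELIM = re.compile(
    r"<(([^:/?#]+):)(//([^/?#>]*))?([^?#>]*)(\?([^#>]*))?(#([^>]*))?>"
)


def split_uri(text: str):
    """Return the five RFC 3986 components of an angle-delimited URI, or None."""
    m = URI_ANGLE_DELIM.fullmatch(text)
    if m is None:
        return None
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),     # None if the URI has no query
        "fragment": m.group(9),  # None if the URI has no fragment
    }


parts = split_uri("<http://www.ics.uci.edu/pub/ietf/uri/#Related>")
print(parts)
# {'scheme': 'http', 'authority': 'www.ics.uci.edu', 'path': '/pub/ietf/uri/',
#  'query': None, 'fragment': 'Related'}
```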

 



This page belongs to the TextTransformer Documentation
