Character classes


 

Character classes are written with the syntax "[:classname:]" inside a set declaration. For example, "[[:space:]]" is the set of all white space characters.

"[[:digit:],]" is the set of all digits plus the comma.
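As a rough illustration of what such a set matches, here is a minimal Python sketch. Python's "re" module has no "[:classname:]" syntax, so the set "[[:digit:],]" is written out as the assumed ASCII equivalent "[0-9,]". The sample text and variable names are made up for the example and are not part of TextTransformer.

```python
import re

# Assumed ASCII equivalent of the set [[:digit:],]: any digit or a comma.
DIGIT_OR_COMMA = re.compile(r"[0-9,]+")

# Hypothetical sample input.
text = "Total: 1,234,567 units"

# Find the first run of characters that all belong to the set.
match = DIGIT_OR_COMMA.search(text)
number_text = match.group(0)
print(number_text)
```

The set matches one character at a time. The trailing "+" is what lets the pattern cover a whole run of digits and commas.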

 

The available character classes are:

 

 

alnum    Any alphanumeric character: alpha and digit (*)
alpha    Any alphabetic character: a-z and A-Z, umlauts etc. (*)
blank    Any blank character: a space, a non-breaking space (decimal 160) or a tab
cntrl    Any control character
digit    Any digit 0-9
graph    Any graphical character: all characters except cntrl
lower    Any lower case character a-z (*)
print    Any printable character: graph and blank
punct    Any punctuation character
space    Any white space character (space, tab, carriage return, line feed, ...)
upper    Any upper case character A-Z (*)
xdigit   Any hexadecimal digit: 0-9, a-f and A-F
word     Any word character: all alphanumeric characters plus the underscore (*)

      

 

(*) Depending on the locale settings of your computer, other characters might be recognized as well. Try it in the dialog for the calculation of character classes!
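The class names can be pictured as shorthand for character ranges. The following Python sketch expands some of them into ASCII-only ranges so that they work with Python's "re" module, which does not know the "[:classname:]" syntax. The table ASCII_CLASSES and the helper expand_classes are hypothetical names invented for this illustration. The ranges ignore the locale-dependent characters marked with (*), and the form feed and vertical tab in "space" are an assumption about what the "..." in the list covers.

```python
import re

# Assumed ASCII-only bodies for some of the class names listed above.
# Locale-dependent additions (umlauts etc.) are deliberately left out.
ASCII_CLASSES = {
    "alnum": r"A-Za-z0-9",
    "alpha": r"A-Za-z",
    "blank": r" \t\xa0",
    "digit": r"0-9",
    "lower": r"a-z",
    "space": r" \t\r\n\f\v",
    "upper": r"A-Z",
    "xdigit": r"0-9A-Fa-f",
    "word": r"A-Za-z0-9_",
}


def expand_classes(pattern):
    """Replace every [:classname:] in pattern with its ASCII range text.

    Hypothetical helper; raises KeyError for a class name not in the table.
    """
    return re.sub(r"\[:([a-z]+):\]", lambda m: ASCII_CLASSES[m.group(1)], pattern)


hex_set = expand_classes("[[:xdigit:]]+")  # the class name becomes an explicit range
is_hex = re.fullmatch(hex_set, "1aF9") is not None
print(hex_set, is_hex)
```

Because only the inner "[:classname:]" part is replaced, the surrounding set brackets and any extra members such as the comma in "[[:digit:],]" stay in place.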

 

Some shortcuts can be used in place of the character classes:

 

 

\w    [:word:]
\W    ^[:word:]
\s    [:space:]
\S    ^[:space:]
\d    [:digit:]
\D    ^[:digit:]
\l    [:lower:]
\L    ^[:lower:]
\u    [:upper:]
\U    ^[:upper:]
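The idea that a shortcut and its class describe the same set can be checked in a small Python sketch. It is only an analogy: Python's "re" module offers \w, \W, \s, \S, \d and \D but has no counterpart to \l, \L, \u and \U, so those four are omitted. The explicit sets are assumed ASCII equivalents, and the flag re.ASCII restricts the shortcuts to the same ASCII ranges. EQUIVALENTS, SAMPLE and agree are hypothetical names.

```python
import re

# Pairs of (shortcut, assumed explicit ASCII set).
EQUIVALENTS = [
    (r"\w", r"[A-Za-z0-9_]"),
    (r"\W", r"[^A-Za-z0-9_]"),
    (r"\s", r"[ \t\n\r\f\v]"),
    (r"\S", r"[^ \t\n\r\f\v]"),
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
]

# Hypothetical sample containing letters, digits, an underscore, a tab and a newline.
SAMPLE = "Width_2 = 40\tpx\n"


def agree(shortcut, explicit, text=SAMPLE):
    """Return True if both patterns find the same matches in text."""
    return (re.findall(shortcut, text, flags=re.ASCII)
            == re.findall(explicit, text, flags=re.ASCII))


all_agree = all(agree(s, e) for s, e in EQUIVALENTS)
digits = re.findall(r"\d", SAMPLE)
print(all_agree, digits)
```

The upper case shortcut is always the complement of the lower case one, which corresponds to the "^" in the right-hand column of the list above.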

                       



This page belongs to the TextTransformer Documentation
