Regular expression

From UNL Wiki

Revision as of 18:00, 22 March 2010 by Martins (Talk | contribs)


Regular expressions, also referred to as regex or regexp, provide a concise and flexible means of matching strings of text, such as particular characters, words, or patterns of characters. In the UNLarium framework, regular expressions follow the syntax of the PCRE library and must be enclosed in slashes (/ /). They are used mainly to enhance the power of Ph-rules.

Main features

Characters
a	match the character a
3	match the digit 3
\n	newline (NL, LF)
\r	return (CR)
\f	form feed (FF)
\t	tab (TAB)
\x3C	character with the hex code 3C
\u561A	character with the hex code 561A
\e	escape character (alias \u001B)
\c…	control character
Wildcards
.	match any character
\…	quote a single metacharacter: \. matches a literal dot instead of any character, and \\ matches a single backslash
\w	alphanumeric + underscore (shortcut for [0-9a-zA-Z_])
\W	any character not covered by \w
\d	numeric (shortcut for [0-9])
\D	any character not covered by \d
\s	whitespace (shortcut for [ \t\n\r\f])
\S	any character not covered by \s
[…]	any character listed: [a5!d-g] means a, 5, ! and d, e, f, g
[^…]	any character not listed: [^a5!d-g] means anything but a, 5, ! and d, e, f, g
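The wildcards and character classes above can be tried out with a short sketch. It uses Python's re module as a stand-in for PCRE, which is an assumption: the two engines agree on these shorthands for ASCII input, but they are not the same library. The sample string is made up for illustration.

```python
import re

# Hypothetical sample text; Python's re module stands in for PCRE here.
text = "a5! d_g\t42"

# re.ASCII keeps \w, \d and \s limited to the ASCII sets listed above.
words = re.findall(r"\w+", text, flags=re.ASCII)   # runs of [0-9a-zA-Z_]
digits = re.findall(r"\d", text, flags=re.ASCII)   # single digits
listed = re.findall(r"[a5!d-g]", text)             # a, 5, ! and d..g
not_listed = re.findall(r"[^a5!d-g\s]", text)      # everything else, minus whitespace

# An escaped dot is a literal dot; a bare dot matches any character.
dot_literal = re.findall(r"\.", "a.b")
dot_any = re.findall(r".", "a.b")

print(words, digits, listed, not_listed, dot_literal, dot_any)
```

Inside a character class, the shorthand \s can be combined with listed characters, which is how the negated class above skips both the listed characters and whitespace.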
Boundaries
\b	matches at a word boundary (spot between \w and \W)
\B	matches at any position that is not a word boundary
^	matches at the beginning of the entire string, or of each line in multiline mode (m)
\A	matches at the beginning of the entire string
$	matches at the end of the entire string, or of each line in multiline mode (m)
\Z	matches at the end of the entire string, ignoring a trailing \n
\z	matches at the end of the entire string
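A minimal sketch of how the boundary assertions behave, again using Python's re module as an assumed stand-in for PCRE. The end-of-string assertions \Z and \z are left out on purpose because Python defines them differently from PCRE; the sample text and the helper name starts are hypothetical.

```python
import re

# Hypothetical two-line sample text.
text = "cat concat\ncatalog"

def starts(pattern, flags=0):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, flags)]

whole_word = starts(r"\bcat\b")               # only the standalone word "cat"
inside_word = starts(r"\Bcat")                # only the "cat" inside "concat"
string_start = starts(r"^cat")                # ^ without m: start of the string
line_starts = starts(r"^cat", re.MULTILINE)   # ^ with m: start of every line
anchored = starts(r"\Acat", re.MULTILINE)     # \A ignores multiline mode
line_ends = re.findall(r"\w+$", text, re.MULTILINE)  # last word of each line

print(whole_word, inside_word, string_start, line_starts, anchored, line_ends)
```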
Quantifiers
?	match 1 or 0 times
*	0 or more times
+	1 or more times
{n}	exactly n times
{n,}	at least n times
{n,m}	at least n but not more than m times, as often as possible
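The quantifiers above can be compared side by side with a small sketch. Python's re module is used as an assumed stand-in for PCRE, and the sample list and the helper full_matches are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical samples: "abc" followed by 0, 1, 2, 3 and 6 letters "a".
samples = ["abc", "abca", "abcaa", "abcaaa", "abcaaaaaa"]

def full_matches(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

optional = full_matches(r"abca?")       # 0 or 1 "a"
star = full_matches(r"abca*")           # 0 or more
plus = full_matches(r"abca+")           # 1 or more
exactly3 = full_matches(r"abca{3}")     # exactly 3
at_least3 = full_matches(r"abca{3,}")   # 3 or more
range_2_5 = full_matches(r"abca{2,5}")  # between 2 and 5

# "As often as possible": on a run of seven "a", {2,5} takes five, not two.
greedy = re.search(r"a{2,5}", "aaaaaaa").group()

print(optional, star, plus, exactly3, at_least3, range_2_5, greedy)
```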
Grouping
(…)	group a subpattern so that a quantifier applies to the group as a whole, as in (a){3}
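A short sketch of what grouping changes, assuming Python's re module as a stand-in for PCRE. Without parentheses a quantifier binds only to the single preceding character; with them it binds to the whole group. Groups also capture the text they matched, which holds in both engines.

```python
import re

# Without a group, + repeats only the final "c".
ungrouped_tail = re.fullmatch(r"abc+", "abccc") is not None     # True
ungrouped_repeat = re.fullmatch(r"abc+", "abcabc") is not None  # False

# With a group, + repeats the whole sequence "abc".
grouped_repeat = re.fullmatch(r"(abc)+", "abcabc") is not None  # True

# A group also captures what it matched: here, the two digits after "a".
captured = re.search(r"a(\d{2})", "xa42y").group(1)

print(ungrouped_tail, ungrouped_repeat, grouped_repeat, captured)
```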

Examples

RegEx	Description	Matches
/abc/	match the sequence "abc"	abc
/abc./	match the sequence "abc" plus one character	abca, abcb, abcc, abcd, abce, ...
/abc(a)?/	match the sequence "abc" plus zero or one character "a"	abc, abca
/abc(a)*/	match the sequence "abc" plus zero or more characters "a"	abc, abca, abcaa, abcaaa, abcaaaa, abcaaaaa, ...
/abc(a)+/	match the sequence "abc" plus one or more characters "a"	abca, abcaa, abcaaa, abcaaaa, ...
/abc(a){3}/	match the sequence "abc" plus three characters "a"	abcaaa
/abc(a){3,}/	match the sequence "abc" plus at least three characters "a"	abcaaa, abcaaaa, abcaaaaa, abcaaaaaa, ...
/abc(a){2,5}/	match the sequence "abc" plus two to five characters "a"	abcaa, abcaaa, abcaaaa, abcaaaaa
/a[bcd]e/	match "a" plus "b", "c" or "d", plus "e"	abe, ace, ade
/a[^bcd]e/	match "a" plus any character that is not "b", "c" or "d", plus "e"	aae, aee, afe, age, ahe, ...
/a\d/	match "a" plus any single digit	a0, a1, a2, a3, a4, a5, a6, a7, a8, a9
/a(\d){2}/	match "a" plus any two digits	a00, a01, a02, a03, a04, ...
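Some rows of the table can be checked mechanically. The sketch below is only an illustration under stated assumptions: strip_slashes is a hypothetical helper, not part of any UNL tool, that removes the / / delimiters, and Python's re module stands in for PCRE.

```python
import re

def strip_slashes(rule_regex):
    """Hypothetical helper: drop the surrounding / / delimiters."""
    if len(rule_regex) < 2 or rule_regex[0] != "/" or rule_regex[-1] != "/":
        raise ValueError("expected a pattern of the form /.../")
    return rule_regex[1:-1]

# Patterns and a few of their listed matches, copied from the table above.
table = {
    r"/abc./": ["abca", "abcb"],
    r"/a[bcd]e/": ["abe", "ace", "ade"],
    r"/a[^bcd]e/": ["aae", "afe"],
    r"/a\d/": ["a0", "a9"],
    r"/a(\d){2}/": ["a00", "a04"],
}

# For each row, confirm that every listed string matches the whole pattern.
results = {
    rx: all(re.fullmatch(strip_slashes(rx), s) for s in strings)
    for rx, strings in table.items()
}

# A negated class must reject the characters it lists.
rejected = re.fullmatch(strip_slashes(r"/a[^bcd]e/"), "abe") is None

print(results, rejected)
```

The sketch uses fullmatch so that each string is tested as a whole; a search-style match would also accept longer strings that merely contain the pattern.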
