Perl Quick Reference Card Page 2

Alphanumeric Regex Metasymbols  161-162

Symbol     Atomic  Meaning
\0         Yes     Match the null character (ASCII NUL).
\NNN       Yes     Match the character given in octal, up to \377.
\n         Yes     Match nth previously captured string (decimal).
\a         Yes     Match the alarm character (BEL).
\A         No      True at the beginning of a string.
\b         Yes     Match the backspace character (BS).
\b         No      True at a word boundary.
\B         No      True when not at a word boundary.
\cX        Yes     Match the control character Ctrl-X (\cZ).
\C         Yes     Match one byte (C char) even in utf8 (dangerous).
\d         Yes     Match any digit character.
\D         Yes     Match any non-digit character.
\e         Yes     Match the escape character (ASCII ESC, not \).
\E         -       End case (\L, \U) or quotemeta (\Q) translation.
\f         Yes     Match the form feed character (FF).
\G         No      True at end-of-match position of prior m//g.
\l         -       Lowercase the next character only.
\L         -       Lowercase till \E.
\n         Yes     Match the newline character (usually NL, but CR on Macs).
\N{NAME}   Yes     Match the named char (\N{greek:Sigma}).
\p{PROP}   Yes     Match any character with named property.
\P{PROP}   Yes     Match any character without the named property.
\Q         -       Quote (de-meta) metacharacters till \E.
\r         Yes     Match the return character (usually CR, but NL on Macs).
\s         Yes     Match any whitespace character.
\S         Yes     Match any nonwhitespace character.
\t         Yes     Match the tab character (HT).
\u         -       Titlecase next character only.
\U         -       Uppercase (not titlecase) till \E.
\w         Yes     Match any “word” character (alphanum plus “_”).
\W         Yes     Match any nonword character.
\xHEX      Yes     Match the character given one or two hex digits.
\x{abcd}   Yes     Match the character given in hexadecimal.
\X         Yes     Match Unicode “combining character sequence” string.
\z         No      True at end of string only.
\Z         No      True at end of string or before optional newline.

Classic Character Classes  167

Symbol  Meaning                As Bytes        As utf8
\d      Digit                  [0-9]           \p{IsDigit}
\D      Nondigit               [^0-9]          \P{IsDigit}
\s      Whitespace             [ \t\n\r\f]     \p{IsSpace}
\S      Nonwhitespace          [^ \t\n\r\f]    \P{IsSpace}
\w      Word character         [a-zA-Z0-9_]    \p{IsWord}
\W      Non-(word character)   [^a-zA-Z0-9_]   \P{IsWord}

Composite Unicode Properties  168-169

Property  Equivalent
IsASCII   [\x00-\x7f]
IsAlnum   [\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}\p{IsNd}]
IsAlpha   [\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}]
IsCntrl   \p{IsC}
IsDigit   \p{IsNd}
IsGraph   [^\pC\p{IsSpace}]
IsLower   \p{IsLl}
IsPrint   \P{IsC}
IsPunct   \p{IsP}
IsSpace   [\t\n\f\r\p{IsZ}]
IsUpper   [\p{IsLu}\p{IsLt}]
IsWord    [_\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}\p{IsNd}]
IsXDigit  [0-9a-fA-F]

Perl also provides the following composites:

Property  Meaning                             Normative
IsC       Crazy control characters and such   Yes
IsL       Letters                             Partly
IsM       Marks                               Yes
IsN       Numbers                             Yes
IsP       Punctuation                         No
IsS       Symbols                             No
IsZ       Separators (Zeparators?)            Yes

POSIX-Style Character Classes  174-175

Class   Meaning
alnum   Any alphanumeric, that is an alpha or a digit.
alpha   Any letter. (That's a lot more letters than you think, unless
        you're thinking Unicode, in which case it's still a lot.)
ascii   Any character with an ordinal value between 0 and 127.
cntrl   Any control character. Usually characters that don't
        produce output as such, but instead control the terminal
        somehow; for example, newline, form feed, and backspace.
digit   A character representing a decimal digit, such as 0 to 9.
        (Includes other characters under Unicode.) Equivalent to \d.
graph   Any alphanumeric or punctuation character.
lower   A lowercase letter.
print   Any alphanumeric or punctuation character or space.
punct   Any punctuation character.
space   Any space character. Includes tab, newline, form feed, and
        carriage return (and a lot more under Unicode.) Equivalent
        to \s.
upper   Any uppercase (or titlecase) letter.
word    Any identifier character, either an alnum or underline.
xdigit  Any hexadecimal digit. Equivalent to [0-9a-fA-F].

You can negate the POSIX character classes by prefixing the class
name with a ^ following the [:. (This is a Perl extension.)
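
A few of the entries above are easier to tell apart when run side by side.
The following is a minimal sketch, not part of the original card; the
variable names and sample strings are made up for illustration. It
contrasts \Z with \z, \b with \B, quoted with unquoted interpolation
(\Q ... \E), and a POSIX class with its negated form.

```perl
use strict;
use warnings;

# \Z is true at end of string or before an optional final newline;
# \z is true at end of string only.
my $line      = "done\n";
my $Z_matches = ($line =~ /done\Z/) ? 1 : 0;   # 1: trailing newline tolerated
my $z_matches = ($line =~ /done\z/) ? 1 : 0;   # 0: newline follows "done"

# \b is true at a word boundary; \B is true when not at one.
my $whole_word  = ("a cat sat"   =~ /\bcat\b/) ? 1 : 0;   # 1
my $inside_word = ("concatenate" =~ /\Bcat\B/) ? 1 : 0;   # 1

# \Q ... \E quotes metacharacters, so the "." in $literal stays literal.
my $literal  = "a.c";
my $quoted   = ("abc" =~ /^\Q$literal\E\z/) ? 1 : 0;   # 0: needs a real "."
my $unquoted = ("abc" =~ /^$literal\z/)     ? 1 : 0;   # 1: "." matches "b"

# POSIX-style classes live inside a bracketed class; [:^digit:] negates.
my $all_digits = ("2024" =~ /^[[:digit:]]+\z/)  ? 1 : 0;   # 1
my $no_digits  = ("perl" =~ /^[[:^digit:]]+\z/) ? 1 : 0;   # 1

print "Z=$Z_matches z=$z_matches\n";
print "word=$whole_word inside=$inside_word\n";
print "quoted=$quoted unquoted=$unquoted\n";
print "digits=$all_digits nondigits=$no_digits\n";
```

The flags are stored as 0 or 1 rather than used directly so that each
result can be printed and compared; in ordinary code the match would
usually appear straight in an if condition.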
