Regular Expression Examples
"Regular Expressions" refer to a standardized pattern-matching syntax. It can look something like "/[a-z_.\-]{3,8}/", difficult to grasp upon first exposure. However, regular expressions are a powerful tool for processing text.Regular expressions are supported by the perl programming language, MySQL database(see below), and Vim text editor(see below). All are open source and available for Linux. This page has examples of regular expressions for each of these software tools.
Perl regular expressions
In Perl, a pattern-matching expression uses an equal sign (=) with a tilde (~) after it, which looks like this:
$string =~ /regular-expression/
The string to be tested goes on the left,
and the pattern to test for goes on the right.
The pattern is enclosed by a delimiting character.
Other delimiters are possible (for example m{...}), although the standard
convention is to use a forward slash ("/"), like above.
The example below will exit if '404' is present in
the string $text:
if($text =~ /404/) { exit }
The pattern to be tested can have special meta-characters to represent character classes. Some of the basic meta-characters in Perl's implementation of regular expressions are:
\s one whitespace character
(space, return, tab, etc.)
\S one non-whitespace character
(a-z, 0-9, etc.)
. one character, whitespace or not
(except a newline, unless the /s modifier is used)
\d a digit(0-9)
\n a newline
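The character classes above can be tried out on a short string. This is a minimal sketch; the sample line and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line, invented for this illustration.
my $line = "Error 404\tnot found\n";

# \d matches a single digit, so this captures only the first digit.
my ($digit) = $line =~ /(\d)/;

# \s matches one whitespace character (here, the tab after "404").
my $has_tab_after_code = $line =~ /404\s/ ? 1 : 0;

# \S matches one non-whitespace character; the first one is "E".
my ($first_visible) = $line =~ /(\S)/;

# . matches any single character except a newline (unless /s is used).
my $dot_matches_newline = "\n" =~ /./ ? 1 : 0;

print "digit=$digit tab=$has_tab_after_code ",
      "first=$first_visible dot_nl=$dot_matches_newline\n";
# prints: digit=4 tab=1 first=E dot_nl=0
```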
Elements in a pattern can be given a quantity, or anchored to a position, with the meta-characters below:
+ matches the preceding character
one or more times. So while \d matches one
digit, \d+ will match a series of consecutive digits.
* matches the preceding character
zero or more times, making its presence optional.
{2} matches the preceding character exactly 2 times
{12,} matches the preceding character 12 or more times
^ anchor pattern to the beginning of the string.
/^prefix/ will only match if "prefix" is at
the beginning of a string (usually this is the beginning of
a line of text).
$ anchor pattern to the end of a string.
/html$/ will only match if "html"
is at the end of a string.
\ A backslash is used to force a
meta-character to be interpreted as a normal
character. So to match ".html" you would use
"\.html" to make the "." a literal period
instead of a meta-character.
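The quantifiers, anchors, and backslash escape above can be combined in one small test. This is only a sketch; the describe() helper and the filenames passed to it are hypothetical.

```perl
use strict;
use warnings;

# describe() is a hypothetical helper that reports which patterns match.
sub describe {
    my ($name) = @_;
    my @facts;
    push @facts, 'prefix'       if $name =~ /^prefix/;   # ^ anchors to the start
    push @facts, 'html'         if $name =~ /\.html$/;   # \. is a literal period
    push @facts, 'digits=' . $1 if $name =~ /(\d+)/;     # + means one or more
    push @facts, 'four-digits'  if $name =~ /\d{4}/;     # {4} means exactly four
    return join(', ', @facts);
}

print describe('prefix_notes.txt'), "\n";   # prefix
print describe('photo2024.html'), "\n";     # html, digits=2024, four-digits
print describe('xhtml'), "\n";              # (empty: there is no literal period)
```

Note that "xhtml" would match /html$/ but not /\.html$/, which is the reason for escaping the period.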
Perl regexp examples
Below are some regular expressions I have found useful in Perl scripts that generate webpages.
URL-ify filenames
Terms like "Linden Hills" and "Lake Calhoun" are categories in the Phototour of Minneapolis website. However, those terms won't work in URLs, because they contain a space. I also standardize all my webpage filenames to lower-case (for consistency). The following regular expression and function convert phrases like the two above into filenames like "linden_hills.html" and "lake_calhoun.html".
$file =~ s/ /_/g; #this is a substitution
$file = lc($file); #'lc()' returns all lower-case
$file .= ".html"; #'.=' adds to a string
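The three statements can be wrapped in a subroutine for reuse. This is a sketch; the name urlify() is an assumption and does not come from the original scripts.

```perl
use strict;
use warnings;

# urlify() is a hypothetical name; the body is the three statements above.
sub urlify {
    my ($file) = @_;
    $file =~ s/ /_/g;    # replace every space with an underscore
    $file = lc($file);   # lower-case the whole name
    $file .= ".html";    # append the extension
    return $file;
}

print urlify("Linden Hills"), "\n";   # linden_hills.html
print urlify("Lake Calhoun"), "\n";   # lake_calhoun.html
```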
Modify navigation links
It is a standard convention to have a row or column of internal links on webpages, to make navigating among pages as easy as possible. Assume you have a list of 5 internal hyperlinks, similar to the following:
[ home ] [ all ] [ all2 ] [ all3 ] [ all4 ]
Assume the HTML code for these links is stored in the string variable $links, and is re-used among all web pages.
However, when a visitor is reading a given page, the link for that page should be listed but not hyperlinked, to indicate the visitor's current position within the website. So the links on the page "all3.html" should look like this:
[ home ] [ all ] [ all2 ] [ all3 ] [ all4 ]
The regular expression below is one way to effect this. Assume the HTML code for your hyperlinks is in a variable "$links", and "$file" is the name of the current webpage (such as "index.html" or "all.html").
$links =~ s/<a href="$file">(.*?)<\/a>/$1/i;
The "(.*?)" part of the regular expression
will match everything until the first "</a>"
that's encountered. The question mark makes this pattern "non-greedy";
that is, matching the fewest characters possible
rather than the most. If the question mark were
absent, the pattern would match everything until the last
"</a>", instead of the first one.
Break long strings of text
If a single line of text is several screens wide (so that it runs off the page) it can make HTML source code awkward to read. There are several ways to deal with this in Perl (such as using the split function or the Text::Wrap
module); I use the regexp below. It wraps a single line of
text with a newline at the first blank space after every 70 characters.
$text =~ s/.{70,}? /$&\n /sg;
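A minimal sketch of the wrapping rule described above (break at the first space after at least 70 characters), written with the quantifier {70,}? and a capture group. The input text is invented, and this variant drops the space at the break instead of keeping it.

```perl
use strict;
use warnings;

# Hypothetical input: 40 five-letter words on one line (239 characters).
my $text = join(' ', ('lorem') x 40);

# .{70,}? matches at least 70 characters, then as few more as needed
# to reach the next space; that space is replaced by a newline.
(my $wrapped = $text) =~ s/(.{70,}?) /$1\n/sg;

my @lines = split /\n/, $wrapped;
printf "%d lines, lengths: %s\n",
    scalar(@lines), join(',', map { length } @lines);
# prints: 4 lines, lengths: 71,71,71,23
```

The last line is left alone because fewer than 70 characters remain.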
MySQL regular expressions
MySQL allows you to use regular expressions in "select where" clauses. For example, the statement below would fetch all names where the last name begins with "G":
mysql>select first_name,last_name from staff where last_name regexp "^G";
To make a regexp case sensitive, use "regexp binary":
mysql>select first_name,last_name from staff where last_name regexp binary "^G";
I have found regexp select statements useful for databases which have fields of comma-separated numbers. If I need to select all rows with a given number in such a comma-separated list, a regular expression using word boundaries would work (these markers apply to MySQL versions before 8.0.4; later versions use "\\b" for both boundaries):
| [[:<:]] | left word boundary |
| [[:>:]] | right word boundary |
The statement below will locate all rows with number "23" in their comma-separated list:
mysql>select id,photos from categories where photos regexp "[[:<:]]23[[:>:]]";
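To see why the word boundaries matter, the same test can be sketched in Perl, where \b serves as the boundary marker on both sides. The rows below are invented stand-ins for the "photos" column.

```perl
use strict;
use warnings;

# Hypothetical rows: id => comma-separated list of photo numbers.
my %photos = (
    1 => '3,23,41',
    2 => '123,230',   # contains "23" only inside other numbers
    3 => '23',
    4 => '7,8,23',
);

# \b23\b matches "23" only when it stands alone as a whole number.
my @hits = sort { $a <=> $b }
           grep { $photos{$_} =~ /\b23\b/ } keys %photos;

print "@hits\n";   # 1 3 4
```

Without the boundaries, row 2 would also match, because "123" and "230" both contain "23".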
MySQL supports the following regexp meta-characters "normally":
^ and $ -- anchors for the beginning and end of string.
*?+ -- quantifiers for zero or more, zero or one, one or more.
. -- any one character, including a newline.
[a-zA-Z] -- character classes, such as a-zA-Z.
[^0-9] -- exclude a character class, such as 0-9
See Appendix H of the MySQL manual (linked to below)
for details on MySQL's regexp support.
Vim regular expressions
Vim's regexp support requires
escaping of many regexp meta-characters with
a backslash. For example, the
quantifier "+" must be "\+", and capturing a match
in a substitution uses escaped parentheses "\(...\)".
Some other Vim regexp tips:
| s/pattern/&/ | in the replacement, & stands for the entire matched text |
| s/\(pattern\)/\1/ | in the replacement, \1 stands for the first captured group |
| \{-} | quantifier to match the shortest, instead of longest (greedy), version of the pattern |
| :%s/<.\{-}>//g | matches (and removes) all HTML tags |
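For comparison, the Vim idioms in the table have close counterparts in Perl syntax: $& for &, $1 for \1, unescaped parentheses for \(...\), and *? for \{-}. The strings below are invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sample strings.
my $html = '<p>Hello <b>world</b></p>';
my $name = 'Smith, Jane';

# Perl counterpart of Vim's :%s/<.\{-}>//g (non-greedy, removes each tag).
(my $plain = $html) =~ s/<.*?>//g;

# The greedy form removes everything from the first "<" to the last ">".
(my $greedy = $html) =~ s/<.*>//g;

# Captured groups: Vim's \(...\) and \1 become (...) and $1 in Perl.
(my $swapped = $name) =~ s/(\w+), (\w+)/$2 $1/;

# Whole match: Vim's & becomes $& in Perl.
(my $quoted = 'world') =~ s/\w+/"$&"/;

print "$plain | [$greedy] | $swapped | $quoted\n";
# prints: Hello world | [] | Jane Smith | "world"
```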

The below will locate all rows with number "23" in their comma-seperated list:
mysql>select id,photos from categories where photos regexp "[[:<:]]34[[>:]]";
It works but I don't know why it said that it will be looking for 23 and they end up looking for 34 :-)
Anyways the problem is missing a colon on the second delimiter. It should read [[:>:]]
"The below will locate all rows with number "23" in their comma-seperated list:
mysql>select id,photos from categories where photos regexp "[[::]]";"
Hm, didn't work for me. Too bad, its elegant simplicity was appealing ;)