Regular Expressions Cookbook 2nd Edition Code Samples

This HTML file contains all the blocks with regular expressions and source code from the second edition of Regular Expressions Cookbook. If you have purchased the book, you can use this file to easily copy and paste the regular expressions and source code snippets. This file was extracted from the book's DocBook XML source files using PowerGREP.

2. Basic Regular Expression Skills

2.1. Match Literal Text

Solution

The punctuation characters in the ASCII table are: !"#\$%&'\(\)\*\+,-\./:;<=>\?@\[\\]\^_`\{\|}~

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

Block escape

The punctuation characters in the ASCII table are: \Q!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~\E

Regex options: None
Regex flavors: Java 6, PCRE, Perl

Case-insensitive matching

ascii

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(?i)ascii

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

2.2. Match Nonprintable Characters

Solution

\a\e\f\n\r\t\v

Regex options: None
Regex flavors: .NET, Java, PCRE, Python, Ruby

\x07\x1B\f\n\r\t\v

Regex options: None
Regex flavors: .NET, Java, JavaScript, Python, Ruby

\a\e\f\n\r\t\x0B

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Variations on Representations of Nonprinting Characters

The 26 control characters

\cG\x1B\cL\cJ\cM\cI\cK

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Ruby 1.9

The 7-bit character set

\x07\x1B\x0C\x0A\x0D\x09\x0B

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.3. Match One of Many Characters

Solution

Calendar with misspellings

c[ae]l[ae]nd[ae]r

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal character

[a-fA-F0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonhexadecimal character

[^a-fA-F0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

Shorthands

[a-fA-F\d]

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Case insensitivity

(?i)[A-F0-9]

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

(?i)[^A-F0-9]

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Flavor-Specific Features

.NET character class subtraction

[a-zA-Z0-9-[g-zG-Z]]

Regex options: None
Regex flavors: .NET 2.0 or later

2.4. Match Any Character

Solution

Any character except line breaks

'.'

Regex options: None (the “dot matches line breaks” option must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Any character including line breaks

'.'

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

'[\s\S]'

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

(?s)'.'

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python

(?m)'.'

Regex options: None
Regex flavors: Ruby

2.5. Match Something at the Start and/or the End of a Line

Solution

Start of the subject

^alpha

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\Aalpha

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

End of the subject

omega$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

omega\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Start of a line

^begin

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

End of a line

end$

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

(?m)^begin

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python

(?m)end$

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python

2.6. Match Whole Words

Solution

Word boundaries

\bcat\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonboundaries

\Bcat\B

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.7. Unicode Code Points, Categories, Blocks, and Scripts

Solution

Unicode code point

\u2122

Regex options: None
Regex flavors: .NET, Java, JavaScript, Python, Ruby 1.9

\U00002122

Regex options: None
Regex flavors: Python

\x{2122}

Regex options: None
Regex flavors: Java 7, PCRE, Perl

\u{2122}

Regex options: None
Regex flavors: Ruby 1.9

Unicode category

\p{Sc}

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Ruby 1.9

Unicode block

\p{IsGreekExtended}

Regex options: None
Regex flavors: .NET, Perl

\p{InGreekExtended}

Regex options: None
Regex flavors: Java, XRegExp, Perl

Unicode script

\p{Greek}

Regex options: None
Regex flavors: XRegExp, PCRE, Perl, Ruby 1.9

\p{IsGreek}

Regex options: None
Regex flavors: Java 7, Perl

Unicode grapheme

\X

Regex options: None
Regex flavors: PCRE, Perl

(?>\P{M}\p{M}*)

Regex options: None
Regex flavors: .NET, Java, Ruby 1.9

(?:\P{M}\p{M}*)

Regex options: None
Regex flavors: XRegExp

Variations

Character classes

[\p{Pi}\p{Pf}\u2122]

Regex options: None
Regex flavors: .NET, Java, XRegExp, Ruby 1.9

[\p{Pi}\p{Pf}\x{2122}]

Regex options: None
Regex flavors: Java 7, PCRE, Perl

Listing all characters

[\u1F00-\u1FFF]

Regex options: None
Regex flavors: .NET, Java, JavaScript, Python, Ruby 1.9

[\x{1F00}-\x{1FFF}]

Regex options: None
Regex flavors: Java 7, PCRE, Perl

[\u0370-\u0373\u0375-\u0377\u037A-\u037D\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126\U00010140-\U0001018A\U0001D200-\U0001D245]

Regex options: None
Regex flavors: Python

[\u0370-\u0373\u0375-\u0377\u037A-\u037D\u0384\u0386\u0388-\u038A\u038C\u038E-\u03A1\u03A3-\u03E1\u03F0-\u03FF\u1D26-\u1D2A\u1D5D-\u1D61\u1D66-\u1D6A\u1DBF\u1F00-\u1F15\u1F18-\u1F1D\u1F20-\u1F45\u1F48-\u1F4D\u1F50-\u1F57\u1F59\u1F5B\u1F5D\u1F5F-\u1F7D\u1F80-\u1FB4\u1FB6-\u1FC4\u1FC6-\u1FD3\u1FD6-\u1FDB\u1FDD-\u1FEF\u1FF2-\u1FF4\u1FF6-\u1FFE\u2126]

Regex options: None
Regex flavors: .NET, Java, JavaScript, Python, Ruby 1.9

[\x{0370}-\x{0373}\x{0375}-\x{0377}\x{037A}-\x{037D}\x{0384}\x{0386}\x{0388}-\x{038A}\x{038C}\x{038E}-\x{03A1}\x{03A3}-\x{03E1}\x{03F0}-\x{03FF}\x{1D26}-\x{1D2A}\x{1D5D}-\x{1D61}\x{1D66}-\x{1D6A}\x{1DBF}\x{1F00}-\x{1F15}\x{1F18}-\x{1F1D}\x{1F20}-\x{1F45}\x{1F48}-\x{1F4D}\x{1F50}-\x{1F57}\x{1F59}\x{1F5B}\x{1F5D}\x{1F5F}-\x{1F7D}\x{1F80}-\x{1FB4}\x{1FB6}-\x{1FC4}\x{1FC6}-\x{1FD3}\x{1FD6}-\x{1FDB}\x{1FDD}-\x{1FEF}\x{1FF2}-\x{1FF4}\x{1FF6}-\x{1FFE}\x{2126}\x{10140}-\x{10178}\x{10179}-\x{10189}\x{1018A}\x{1D200}-\x{1D245}]

Regex options: None
Regex flavors: Java 7, PCRE, Perl

2.8. Match One of Several Alternatives

Solution

Mary|Jane|Sue

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.9. Group and Capture Parts of the Match

Solution

\b(Mary|Jane|Sue)\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(\d\d\d\d)-(\d\d)-(\d\d)\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

Noncapturing groups

\b(?:Mary|Jane|Sue)\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Group with mode modifiers

\b(?i:Mary|Jane|Sue)\b

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

sensitive(?i:caseless)sensitive

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

(?i:…)

2.10. Match Previously Matched Text Again

Solution

\b\d\d(\d\d)-\1-\1\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.11. Capture and Name Parts of the Match

Solution

Named capture

\b(?<year>\d\d\d\d)-(?<month>\d\d)-(?<day>\d\d)\b

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

\b(?'year'\d\d\d\d)-(?'month'\d\d)-(?'day'\d\d)\b

Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

\b(?P<year>\d\d\d\d)-(?P<month>\d\d)-(?P<day>\d\d)\b

Regex options: None
Regex flavors: PCRE 4 and later, Perl 5.10, Python

Named backreferences

\b\d\d(?<magic>\d\d)-\k<magic>-\k<magic>\b

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

\b\d\d(?'magic'\d\d)-\k'magic'-\k'magic'\b

Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

\b\d\d(?P<magic>\d\d)-(?P=magic)-(?P=magic)\b

Regex options: None
Regex flavors: PCRE 4 and later, Perl 5.10, Python

2.12. Repeat Part of the Regex a Certain Number of Times

Solution

Googol

\b\d{100}\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal number

\b[a-f0-9]{1,8}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal number with optional suffix

\b[a-f0-9]{1,8}h?\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Floating-point number

\d*\.\d+(e\d+)?

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.13. Choose Minimal or Maximal Repetition

Solution

<p>.*?</p>

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

<p>.*</p>

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.14. Eliminate Needless Backtracking

Solution

\b\d++\b

Regex options: None
Regex flavors: Java, PCRE, Perl 5.10, Ruby 1.9

\b(?>\d+)\b

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

2.15. Prevent Runaway Repetition

Solution

<html>(?>.*?<head>)(?>.*?<title>)(?>.*?</title>)(?>.*?</head>)(?>.*?<body[^>]*>)(?>.*?</body>).*?</html>

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, PCRE, Perl, Ruby

Discussion

<html>.*?<head>.*?<title>.*?</title>.*?</head>.*?<body[^>]*>.*?</body>.*?</html>

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.16. Test for a Match Without Adding It to the Overall Match

Solution

(?<=<b>)\w+(?=</b>)

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

Discussion

Matching the same text twice

(?=\p{Thai})\p{N}

Regex options: None
Regex flavors: PCRE, Perl, Ruby 1.9

Lookaround is atomic

(?=(\d+))\w+\1

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Alternative to Lookbehind

<b>\K\w+(?=</b>)

Regex options: Case insensitive
Regex flavors: PCRE 7.2, Perl 5.10

Solution Without Lookbehind

(<b>)(\w+)(?=</b>)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

var mainregexp = /\w+(?=<\/b>)/;
var lookbehind = /<b>$/;
if (match = mainregexp.exec("My <b>cat</b> is furry")) {
    // Found a word before a closing tag </b>
    var potentialmatch = match[0];
    var leftContext = match.input.substring(0, match.index);
    if (lookbehind.exec(leftContext)) {
        // Lookbehind matched:
        // potentialmatch occurs between a pair of <b> tags
    } else {
        // Lookbehind failed: potentialmatch is no good
    }
} else {
    // Unable to find a word before a closing tag </b>
}

2.17. Match One of Two Alternatives Based on a Condition

Solution

\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}(?(1)|(?!))(?(2)|(?!))(?(3)|(?!))

Regex options: None
Regex flavors: .NET, PCRE, Perl, Python

\b(?:(?:(one)|(two)|(three))(?:,|\b)){3,}

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

2.18. Add Comments to a Regular Expression

Solution

\d{4}    # Year
-        # Separator
\d{2}    # Month
-        # Separator
\d{2}    # Day

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

(?#Year)\d{4}(?#Separator)-(?#Month)\d{2}-(?#Day)\d{2}

Regex options: None
Regex flavors: .NET, XRegExp, PCRE, Perl, Python, Ruby

(?x)\d{4}    # Year
-            # Separator
\d{2}        # Month
-            # Separator
\d{2}        # Day

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

2.19. Insert Literal Text into the Replacement Text

Solution

$%\*$$1\1

\$%\\*\$1\\1

$%\*\$1\\1

\$%\*\$1\\1

$%\*$1\\1

Discussion

.NET and JavaScript

$$%\*$$1\1

2.20. Insert the Regex Match into the Replacement Text

Solution

Regular expression

http:\S+

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

<a href="$&">$&</a>

<a href="$0">$0</a>

<a href="\0">\0</a>

<a href="\&">\&</a>

<a href="\g<0>">\g<0></a>

2.21. Insert Part of the Regex Match into the Replacement Text

Solution

Regular expression

\b(\d{3})(\d{3})(\d{4})\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

($1) $2-$3

(${1}) ${2}-${3}

(\1) \2-\3

Solution Using Named Capture

Regular expression

\b(?<area>\d{3})(?<exchange>\d{3})(?<number>\d{4})\b

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

\b(?'area'\d{3})(?'exchange'\d{3})(?'number'\d{4})\b

Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

\b(?P<area>\d{3})(?P<exchange>\d{3})(?P<number>\d{4})\b

Regex options: None
Regex flavors: PCRE, Perl 5.10, Python

Replacement

(${area}) ${exchange}-${number}

(\g<area>) \g<exchange>-\g<number>

(\k<area>) \k<exchange>-\k<number>

(\k'area') \k'exchange'-\k'number'

($+{area}) $+{exchange}-$+{number}

($1) $2-$3

2.22. Insert Match Context into the Replacement Text

Solution

$`$_$'

\`\`\&\'\'

$`$`$&$'$'

3. Programming with Regular Expressions

3.1. Literal Regular Expressions in Source Code

Solution

C#

"[$\"'\n\\d/\\\\]"

@"[$""'\n\d/\\]"

VB.NET

"[$""'\n\d/\\]"

Java

"[$\"'\n\\d/\\\\]"

JavaScript

/[$"'\n\d\/\\]/

XRegExp

"[$\"'\n\\d/\\\\]"

PHP

'%[$"\'\n\d/\\\\]%'

Perl

/[\$"'\n\d\/\\]/

m![\$"'\n\d/\\]!

s![\$"'\n\d/\\]!!

Python

r"""[$"'\n\d/\\]"""

"[$\"'\n\\d/\\\\]"

Ruby

/[$"'\n\d\/\\]/

%r![$"'\n\d/\\]!

3.2. Import the Regular Expression Library

Solution

C#

using System.Text.RegularExpressions;

VB.NET

Imports System.Text.RegularExpressions

XRegExp

<script src="xregexp-all-min.js"></script>

var XRegExp = require('xregexp').XRegExp;

Java

import java.util.regex.*;

Python

import re

3.3. Create Regular Expression Objects

Solution

C#

Regex regexObj = new Regex("regex pattern");

try {
    Regex regexObj = new Regex(UserInput);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

Dim RegexObj As New Regex("regex pattern")

Try
    Dim RegexObj As New Regex(UserInput)
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

Pattern regex = Pattern.compile("regex pattern");

try {
	Pattern regex = Pattern.compile(userInput);
} catch (PatternSyntaxException ex) {
	// Syntax error in the regular expression
}

Matcher regexMatcher = regex.matcher(subjectString);

regexMatcher.reset(anotherSubjectString);

JavaScript

var myregexp = /regex pattern/;

var myregexp = new RegExp(userinput);

XRegExp

var myregexp = XRegExp("regex pattern");

Perl

$myregex = qr/regex pattern/

$myregex = qr/$userinput/

Python

reobj = re.compile("regex pattern")

reobj = re.compile(userinput)

Ruby

myregexp = /regex pattern/;

myregexp = Regexp.new(userinput);

Compiling a Regular Expression Down to CIL

C#

Regex regexObj = new Regex("regex pattern", RegexOptions.Compiled);

VB.NET

Dim RegexObj As New Regex("regex pattern", RegexOptions.Compiled)

3.4. Set Regular Expression Options

Solution

C#

Regex regexObj = new Regex("regex pattern",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase |
    RegexOptions.Singleline | RegexOptions.Multiline);

VB.NET

Dim RegexObj As New Regex("regex pattern",
    RegexOptions.IgnorePatternWhitespace Or RegexOptions.IgnoreCase Or
    RegexOptions.Singleline Or RegexOptions.Multiline)

Java

Pattern regex = Pattern.compile("regex pattern",
    Pattern.COMMENTS | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE |
    Pattern.DOTALL | Pattern.MULTILINE);

JavaScript

var myregexp = /regex pattern/im;

var myregexp = new RegExp(userinput, "im");

XRegExp

var myregexp = XRegExp("regex pattern", "xism");

PHP

regexstring = '/regex pattern/xism';

Perl

m/regex pattern/xism;

Python

reobj = re.compile("regex pattern",
    re.VERBOSE | re.IGNORECASE |
    re.DOTALL | re.MULTILINE)

Ruby

myregexp = /regex pattern/xim;

myregexp = Regexp.new(userinput,
    Regexp::EXTENDED or Regexp::IGNORECASE or
    Regexp::MULTILINE);

3.5. Test If a Match Can Be Found Within a Subject String

Solution

C#

bool foundMatch = Regex.IsMatch(subjectString, "regex pattern");

bool foundMatch = false;
try {
    foundMatch = Regex.IsMatch(subjectString, UserInput);
} catch (ArgumentNullException ex) {
    // Cannot pass null as the regular expression or subject string
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Regex regexObj = new Regex("regex pattern");
bool foundMatch = regexObj.IsMatch(subjectString);

bool foundMatch = false;
try {
    Regex regexObj = new Regex(UserInput);
    try {
        foundMatch = regexObj.IsMatch(subjectString);
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the regular expression or subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

Dim FoundMatch = Regex.IsMatch(SubjectString, "regex pattern")

Dim FoundMatch As Boolean
Try
    FoundMatch = Regex.IsMatch(SubjectString, UserInput)
Catch ex As ArgumentNullException
    'Cannot pass Nothing as the regular expression or subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Dim RegexObj As New Regex("regex pattern")
Dim FoundMatch = RegexObj.IsMatch(SubjectString)

Dim FoundMatch = RegexObj.IsMatch(SubjectString)

Dim FoundMatch As Boolean
Try
    Dim RegexObj As New Regex(UserInput)
    Try
        FoundMatch = Regex.IsMatch(SubjectString)
    Catch ex As ArgumentNullException
        'Cannot pass Nothing as the regular expression or subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

Pattern regex = Pattern.compile("regex pattern");
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.find();

boolean foundMatch = false;
try {
	Pattern regex = Pattern.compile(UserInput);
	Matcher regexMatcher = regex.matcher(subjectString);
	foundMatch = regexMatcher.find();
} catch (PatternSyntaxException ex) {
	// Syntax error in the regular expression
}

JavaScript

if (/regex pattern/.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}

PHP

if (preg_match('/regex pattern/', $subject)) {
    # Successful match
} else {
    # Match attempt failed
}

Perl

if (m/regex pattern/) {
    # Successful match
} else {
    # Match attempt failed
}

if ($subject =~ m/regex pattern/) {
    # Successful match
} else {
    # Match attempt failed
}

$regex = qr/regex pattern/;
if ($subject =~ $regex) {
    # Successful match
} else {
    # Match attempt failed
}

Python

if re.search("regex pattern", subject):
    # Successful match
else:
    # Match attempt failed

reobj = re.compile("regex pattern")
if reobj.search(subject):
    # Successful match
else:
    # Match attempt failed

Ruby

if subject =~ /regex pattern/
    # Successful match
else
    # Match attempt failed
end

if /regex pattern/ =~ subject
    # Successful match
else
    # Match attempt failed
end

3.6. Test Whether a Regex Matches the Subject String Entirely

Solution

C#

bool foundMatch = Regex.IsMatch(subjectString, @"\Aregex pattern\Z");

Regex regexObj = new Regex(@"\Aregex pattern\Z");
bool foundMatch = regexObj.IsMatch(subjectString);

VB.NET

Dim FoundMatch = Regex.IsMatch(SubjectString, "\Aregex pattern\Z")

Dim RegexObj As New Regex("\Aregex pattern\Z")
Dim FoundMatch = RegexObj.IsMatch(SubjectString)

Dim FoundMatch = RegexObj.IsMatch(SubjectString)

Java

boolean foundMatch = subjectString.matches("regex pattern");

Pattern regex = Pattern.compile("regex pattern");
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.matches(subjectString);

JavaScript

if (/^regex pattern$/.test(subject)) {
    // Successful match
} else {
    // Match attempt failed
}

PHP

if (preg_match('/\Aregex pattern\Z/', $subject)) {
    # Successful match
} else {
    # Match attempt failed
}

Perl

if ($subject =~ m/\Aregex pattern\Z/) {
    # Successful match
} else {
    # Match attempt failed
}

Python

if re.match(r"regex pattern\Z", subject):
    # Successful match
else:
    # Match attempt failed

reobj = re.compile(r"regex pattern\Z")
if reobj.match(subject):
    # Successful match
else:
    # Match attempt failed

Ruby

if subject =~ /\Aregex pattern\Z/
    # Successful match
else
    # Match attempt failed
end

3.7. Retrieve the Matched Text

Solution

C#

string resultString = Regex.Match(subjectString, @"\d+").Value;

string resultString = null;
try {
    resultString = Regex.Match(subjectString, @"\d+").Value;
} catch (ArgumentNullException ex) {
    // Cannot pass null as the regular expression or subject string
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Match(subjectString).Value;

string resultString = null;
try {
    Regex regexObj = new Regex(@"\d+");
    try {
        resultString = regexObj.Match(subjectString).Value;
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

Dim ResultString  = Regex.Match(SubjectString, "\d+").Value

Dim ResultString As String = Nothing
Try
    ResultString = Regex.Match(SubjectString, "\d+").Value
Catch ex As ArgumentNullException
    'Cannot pass Nothing as the regular expression or subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Dim RegexObj As New Regex("\d+")
Dim ResultString = RegexObj.Match(SubjectString).Value

Dim ResultString As String = Nothing
Try
    Dim RegexObj As New Regex("\d+")
    Try
        ResultString = RegexObj.Match(SubjectString).Value
    Catch ex As ArgumentNullException
        'Cannot pass Nothing as the subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

String resultString = null;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    resultString = regexMatcher.group();
}

String resultString = null;
try {
    Pattern regex = Pattern.compile("\\d+");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        resultString = regexMatcher.group();
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

JavaScript

var result = subject.match(/\d+/);
if (result) {
    result = result[0];
} else {
    result = '';
}

PHP

if (preg_match('/\d+/', $subject, $groups)) {
    $result = $groups[0];
} else {
    $result = '';
}

Perl

if ($subject =~ m/\d+/) {
    $result = $&;
} else {
    $result = '';
}

Python

matchobj = re.search("regex pattern", subject)
if matchobj:
    result = matchobj.group()
else:
    result = ""

reobj = re.compile("regex pattern")
matchobj = reobj.search(subject)
if match:
    result = matchobj.group()
else:
    result = ""

Ruby

if subject =~ /regex pattern/
    result = $&
else
    result = ""
end

matchobj = /regex pattern/.match(subject)
if matchobj
    result = matchobj[0]
else
    result = ""
end

3.8. Determine the Position and Length of the Match

Solution

C#

int matchstart, matchlength = -1;
Match matchResult = Regex.Match(subjectString, @"\d+");
if (matchResult.Success) {
    matchstart = matchResult.Index;
    matchlength = matchResult.Length;
}

int matchstart, matchlength = -1;
Regex regexObj = new Regex(@"\d+");
Match matchResult = regexObj.Match(subjectString).Value;
if (matchResult.Success) {
    matchstart = matchResult.Index;
    matchlength = matchResult.Length;
}

VB.NET

Dim MatchStart = -1
Dim MatchLength = -1
Dim MatchResult = Regex.Match(SubjectString, "\d+")
If MatchResult.Success Then
    MatchStart = MatchResult.Index
    MatchLength = MatchResult.Length
End If

Dim MatchStart = -1
Dim MatchLength = -1
Dim RegexObj As New Regex("\d+")
Dim MatchResult = Regex.Match(SubjectString, "\d+")
If MatchResult.Success Then
    MatchStart = MatchResult.Index
    MatchLength = MatchResult.Length
End If

Java

int matchStart, matchLength = -1;
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
	matchStart = regexMatcher.start();
	matchLength = regexMatcher.end() - matchStart;
}

JavaScript

var matchstart = -1;
var matchlength = -1;
var match = /\d+/.exec(subject);
if (match) {
    matchstart = match.index;
    matchlength = match[0].length;
}

PHP

if (preg_match('/\d+/', $subject, $groups, PREG_OFFSET_CAPTURE)) {
    $matchstart = $groups[0][1];
    $matchlength = strlen($groups[0][0]);
}

Perl

if ($subject =~ m/\d+/g) {
    $matchstart = $-[0];
    $matchlength = $+[0] - $-[0];
}

Python

matchobj = re.search(r"\d+", subject)
if matchobj:
    matchstart = matchobj.start()
    matchlength = matchobj.end() - matchstart

reobj = re.compile(r"\d+")
matchobj = reobj.search(subject)
if matchobj:
    matchstart = matchobj.start()
    matchlength = matchobj.end() - matchstart

Ruby

if subject =~ /regex pattern/
    matchstart = $~.begin()
    matchlength = $~.end() - matchstart
end

matchobj = /regex pattern/.match(subject)
if matchobj
    matchstart = matchobj.begin()
    matchlength = matchobj.end() - matchstart
end

3.9. Retrieve Part of the Matched Text

Solution

C#

string resultString = Regex.Match(subjectString,
                      "http://([a-z0-9.-]+)").Groups[1].Value;

Regex regexObj = new Regex("http://([a-z0-9.-]+)");
string resultString = regexObj.Match(subjectString).Groups[1].Value;

VB.NET

Dim ResultString = Regex.Match(SubjectString,
                   "http://([a-z0-9.-]+)").Groups(1).Value

Dim RegexObj As New Regex("http://([a-z0-9.-]+)")
Dim ResultString = RegexObj.Match(SubjectString).Groups(1).Value

Java

String resultString = null;
Pattern regex = Pattern.compile("http://([a-z0-9.-]+)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    resultString = regexMatcher.group(1);
}

JavaScript

var result;
var match = /http:\/\/([a-z0-9.-]+)/.exec(subject);
if (match) {
    result = match[1];
} else {
    result = "";
}

PHP

if (preg_match('%http://([a-z0-9.-]+)%', $subject, $groups)) {
    $result = $groups[1];
} else {
    $result = '';
}

Perl

if ($subject =~ m!http://([a-z0-9.-]+)!) {
    $result = $1;
} else {
    $result = '';
}

Python

matchobj = re.search("http://([a-z0-9.-]+)", subject)
if matchobj:
    result = matchobj.group(1)
else:
    result = ""

reobj = re.compile("http://([a-z0-9.-]+)")
matchobj = reobj.search(subject)
if match:
    result = matchobj.group(1)
else:
    result = ""

Ruby

if subject =~ %r!http://([a-z0-9.-]+)!
    result = $1
else
    result = ""
end

matchobj = %r!http://([a-z0-9.-]+)!.match(subject)
if matchobj
    result = matchobj[1]
else
    result = ""
end

Named Capture

C#

string resultString = Regex.Match(subjectString,
               "http://(?<domain>[a-z0-9.-]+)").Groups["domain"].Value;

Regex regexObj = new Regex("http://(?<domain>[a-z0-9.-]+)");
string resultString = regexObj.Match(subjectString).Groups["domain"].Value;

VB.NET

Dim ResultString = Regex.Match(SubjectString,
                   "http://(?<domain>[a-z0-9.-]+)").Groups("domain").Value

Dim RegexObj As New Regex("http://(?<domain>[a-z0-9.-]+)")
Dim ResultString = RegexObj.Match(SubjectString).Groups("domain").Value

Java

String resultString = null;
Pattern regex = Pattern.compile("http://(?<domain>[a-z0-9.-]+)");
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    resultString = regexMatcher.group("domain");
}

XRegExp

var result;
var match = XRegExp.exec(subject, 
                         XRegExp("http://(?<domain>[a-z0-9.-]+)"));
if (match) {
    result = match.domain;
} else {
    result = "";
}

PHP

if (preg_match('%http://(?P<domain>[a-z0-9.-]+)%', $subject, $groups)) {
    $result = $groups['domain'];
} else {
    $result = '';
}

Perl

if ($subject =~ '!http://(?<domain>[a-z0-9.-]+)%!) {
    $result = $+{'domain'};
} else {
    $result = '';
}

Python

matchobj = re.search("http://(?P<domain>[a-z0-9.-]+)", subject)
if matchobj:
    result = matchobj.group("domain")
else:
    result = ""

3.10. Retrieve a List of All Matches

Solution

C#

MatchCollection matchlist = Regex.Matches(subjectString, @"\d+");

Regex regexObj = new Regex(@"\d+");
MatchCollection matchlist = regexObj.Matches(subjectString);

VB.NET

Dim MatchList = Regex.Matches(SubjectString, "\d+")

Dim RegexObj As New Regex("\d+")
Dim MatchList = RegexObj.Matches(SubjectString)

Java

List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    resultList.add(regexMatcher.group());
}

JavaScript

var list = subject.match(/\d+/g);

PHP

preg_match_all('/\d+/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];

Perl

@result = $subject =~ m/\d+/g;

Python

result = re.findall(r"\d+", subject)

reobj = re.compile(r"\d+")
result = reobj.findall(subject)

Ruby

result = subject.scan(/\d+/)

3.11. Iterate over All Matches

Solution

C#

Match matchResult = Regex.Match(subjectString, @"\d+");
while (matchResult.Success) {
    // Here you can process the match stored in matchResult
    matchResult = matchResult.NextMatch();
}

Regex regexObj = new Regex(@"\d+");
matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    // Here you can process the match stored in matchResult
    matchResult = matchResult.NextMatch();
}

VB.NET

Dim MatchResult = Regex.Match(SubjectString, "\d+")
While MatchResult.Success
    'Here you can process the match stored in MatchResult
    MatchResult = MatchResult.NextMatch
End While

Dim RegexObj As New Regex("\d+")
Dim MatchResult = RegexObj.Match(SubjectString)
While MatchResult.Success
    'Here you can process the match stored in MatchResult
    MatchResult = MatchResult.NextMatch
End While

Java

Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // Here you can process the match stored in regexMatcher
}

JavaScript

var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
  // Don't let browsers get stuck in an infinite loop
  if (match.index == regex.lastIndex) regex.lastIndex++;
  // Here you can process the match stored in the match variable
}

var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
  // Here you can process the match stored in the match variable
}

XRegExp

XRegExp.forEach(subject, /\d+/, function(match) {
  // Here you can process the match stored in the match variable
});

PHP

preg_match_all('/\d+/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
    # Matched text = $result[0][$i];
}

Perl

while ($subject =~ m/\d+/g) {
    # matched text = $&
}

Python

for matchobj in re.finditer(r"\d+", subject):
    # Here you can process the match stored in the matchobj variable

reobj = re.compile(r"\d+")
for matchobj in reobj.finditer(subject):
    # Here you can process the match stored in the matchobj variable

Ruby

subject.scan(/\d+/) {|match|
    # Here you can process the match stored in the match variable
}

Discussion

Ruby

subject.scan(/(a)(b)(c)/) {|a, b, c|
    # a, b, and c hold the text matched by the three capturing groups
}

subject.scan(/(a)(b)(c)/) {|abc|
    # abc[0], abc[1], and abc[2] hold the text
    # matched by the three capturing groups
}

3.12. Validate Matches in Procedural Code

Solution

C#

StringCollection resultList = new StringCollection();
Match matchResult = Regex.Match(subjectString, @"\d+");
while (matchResult.Success) {
    if (int.Parse(matchResult.Value) % 13 == 0) {
        resultList.Add(matchResult.Value);
    }
    matchResult = matchResult.NextMatch();
}

StringCollection resultList = new StringCollection();
Regex regexObj = new Regex(@"\d+");
matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    if (int.Parse(matchResult.Value) % 13 == 0) {
        resultList.Add(matchResult.Value);
    }
    matchResult = matchResult.NextMatch();
}

VB.NET

Dim ResultList = New StringCollection
Dim MatchResult = Regex.Match(SubjectString, "\d+")
While MatchResult.Success
    If Integer.Parse(MatchResult.Value) Mod 13 = 0 Then
        ResultList.Add(MatchResult.Value)
    End If
    MatchResult = MatchResult.NextMatch
End While

Dim ResultList = New StringCollection
Dim RegexObj As New Regex("\d+")
Dim MatchResult = RegexObj.Match(SubjectString)
While MatchResult.Success
    If Integer.Parse(MatchResult.Value) Mod 13 = 0 Then
        ResultList.Add(MatchResult.Value)
    End If
    MatchResult = MatchResult.NextMatch
End While

Java

List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (Integer.parseInt(regexMatcher.group()) % 13 == 0) {
        resultList.add(regexMatcher.group());
    }
}

JavaScript

var list = [];
var regex = /\d+/g;
var match = null;
while (match = regex.exec(subject)) {
    // Don't let browsers get stuck in an infinite loop
    if (match.index == regex.lastIndex) regex.lastIndex++;
    // Here you can process the match stored in the match variable
    if (match[0] % 13 == 0) {
        list.push(match[0]);
    }
}

XRegExp

var list = [];
XRegExp.forEach(subject, /\d+/, function(match) {
   if (match[0] % 13 == 0) {
       list.push(match[0]);
   }
});

PHP

preg_match_all('/\d+/', $subject, $matchdata, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matchdata[0]); $i++) {
    if ($matchdata[0][$i] % 13 == 0) {
      $list[] = $matchdata[0][$i];
    }
}

Perl

while ($subject =~ m/\d+/g) {
    if ($& % 13 == 0) {
        push(@list, $&);
    }
}

Python

list = []
for matchobj in re.finditer(r"\d+", subject):
    if int(matchobj.group()) % 13 == 0:
       list.append(matchobj.group())

list = []
reobj = re.compile(r"\d+")
for matchobj in reobj.finditer(subject):
    if int(matchobj.group()) % 13 == 0:
       list.append(matchobj.group())

Ruby

list = []
subject.scan(/\d+/) {|match|
    list << match if (Integer(match) % 13 == 0)
}

3.13. Find a Match Within Another Match

Solution

C#

StringCollection resultList = new StringCollection();
Regex outerRegex = new Regex("<b>(.*?)</b>", RegexOptions.Singleline);
Regex innerRegex = new Regex(@"\d+");
// Find the first section
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
    // Get the matches within the section
	Match innerMatch = innerRegex.Match(outerMatch.Groups[1].Value);
	while (innerMatch.Success) {
		resultList.Add(innerMatch.Value);
		innerMatch = innerMatch.NextMatch();
	}
	// Find the next section
    outerMatch = outerMatch.NextMatch();
}

VB.NET

Dim ResultList = New StringCollection
Dim OuterRegex As New Regex("<b>(.*?)</b>", RegexOptions.Singleline)
Dim InnerRegex As New Regex("\d+")
'Find the first section
Dim OuterMatch = OuterRegex.Match(SubjectString)
While OuterMatch.Success
    'Get the matches within the section
    Dim InnerMatch = InnerRegex.Match(OuterMatch.Groups(1).Value)
    While InnerMatch.Success
        ResultList.Add(InnerMatch.Value)
        InnerMatch = InnerMatch.NextMatch
    End While
    OuterMatch = OuterMatch.NextMatch
End While

Java

List<String> resultList = new ArrayList<String>();
Pattern outerRegex = Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL);
Pattern innerRegex = Pattern.compile("\\d+");
Matcher outerMatcher = outerRegex.matcher(subjectString);
while (outerMatcher.find()) {
    Matcher innerMatcher = innerRegex.matcher(outerMatcher.group(1));
    while (innerMatcher.find()) {
        resultList.add(innerMatcher.group());
    }
}

List<String> resultList = new ArrayList<String>();
Pattern outerRegex = Pattern.compile("<b>(.*?)</b>", Pattern.DOTALL);
Pattern innerRegex = Pattern.compile("\\d+");
Matcher outerMatcher = outerRegex.matcher(subjectString);
Matcher innerMatcher = innerRegex.matcher(subjectString);
while (outerMatcher.find()) {
    innerMatcher.region(outerMatcher.start(1), outerMatcher.end(1));
    while (innerMatcher.find()) {
        resultList.add(innerMatcher.group());
    }
}

JavaScript

var result = [];
var outerRegex = /<b>([\s\S]*?)<\/b>/g;
var innerRegex = /\d+/g;
var outerMatch;
var innerMatches;
while (outerMatch = outerRegex.exec(subject)) {
    if (outerMatch.index == outerRegex.lastIndex)
        outerRegex.lastIndex++;
    innerMatches = outerMatch[1].match(innerRegex);
    if (innerMatches) {
        result = result.concat(innerMatches);
    }
}

XRegExp

var result = XRegExp.matchChain(subject, [
    {regex: XRegExp("<b>(.*?)</b>", "s"), backref: 1},
    /\d+/
]);

var result = [];
var outerRegex = XRegExp("<b>(.*?)</b>", "s");
var innerRegex = /\d+/g;
XRegExp.forEach(subject, outerRegex, function(outerMatch) {
    var innerMatches = outerMatch[1].match(innerRegex);
    if (innerMatches) {
        result = result.concat(innerMatches);
    }
});

PHP

$list = array();
preg_match_all('%<b>(.*?)</b>%s', $subject, $outermatches,
               PREG_PATTERN_ORDER);
for ($i = 0; $i < count($outermatches[0]); $i++) {
    if (preg_match_all('/\d+/', $outermatches[1][$i], $innermatches,
                       PREG_PATTERN_ORDER)) {
        $list = array_merge($list, $innermatches[0]);
    }
}

Perl

while ($subject =~ m!<b>(.*?)</b>!gs) {
    push(@list, ($1 =~ m/\d+/g));
}

Python

list = []
innerre = re.compile(r"\d+")
for outermatch in re.finditer("(?s)<b>(.*?)</b>", subject):
    list.extend(innerre.findall(outermatch.group(1)))

Ruby

list = []
subject.scan(/<b>(.*?)<\/b>/m) {|outergroups|
    list += outergroups[1].scan(/\d+/)
}

Discussion

\d+(?=(?:(?!<b>).)*</b>)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

3.14. Replace All Matches

Solution

C#

string resultString = Regex.Replace(subjectString, "before", "after");

string resultString = null;
try {
    resultString = Regex.Replace(subjectString, "before", "after");
} catch (ArgumentNullException ex) {
    // Cannot pass null as the regular expression, subject string,
    // or replacement text
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Regex regexObj = new Regex("before");
string resultString = regexObj.Replace(subjectString, "after");

string resultString = null;
try {
    Regex regexObj = new Regex("before");
    try {
        resultString = regexObj.Replace(subjectString, "after");
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string or replacement text
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

Dim ResultString = Regex.Replace(SubjectString, "before", "after")

Dim ResultString As String = Nothing
Try
    ResultString = Regex.Replace(SubjectString, "before", "after")
Catch ex As ArgumentNullException
    'Cannot pass null as the regular expression, subject string,
    'or replacement text
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Dim RegexObj As New Regex("before")
Dim ResultString = RegexObj.Replace(SubjectString, "after")

Dim ResultString As String = Nothing
Try
    Dim RegexObj As New Regex("before")
    Try
        ResultString = RegexObj.Replace(SubjectString, "after")
    Catch ex As ArgumentNullException
       'Cannot pass null as the subject string or replacement text
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

String resultString = subjectString.replaceAll("before", "after");

try {
    String resultString = subjectString.replaceAll("before", "after");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
    // Syntax error in the replacement text (unescaped $ signs?)
} catch (IndexOutOfBoundsException ex) {
    // Non-existent backreference used the replacement text
}

Pattern regex = Pattern.compile("before");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("after");

String resultString = null;
try {
    Pattern regex = Pattern.compile("before");
    Matcher regexMatcher = regex.matcher(subjectString);
    try {
        resultString = regexMatcher.replaceAll("after");
    } catch (IllegalArgumentException ex) {
        // Syntax error in the replacement text (unescaped $ signs?)
    } catch (IndexOutOfBoundsException ex) {
        // Non-existent backreference used the replacement text
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

JavaScript

result = subject.replace(/before/g, "after");

PHP

$result = preg_replace('/before/', 'after', $subject);

Perl

s/before/after/g;

$subject =~ s/before/after/g;

($result = $subject) =~ s/before/after/g;

Python

result = re.sub("before", "after", subject)

reobj = re.compile("before")
result = reobj.sub("after", subject)

Ruby

result = subject.gsub(/before/, 'after')

Discussion

PHP

$regex[0] = '/a/';
$regex[1] = '/b/';
$regex[2] = '/c/';
$replace[2] = '3';
$replace[1] = '2';
$replace[0] = '1';

echo preg_replace($regex, $replace, "abc");
ksort($replace);
echo preg_replace($regex, $replace, "abc");

3.15. Replace Matches Reusing Parts of the Match

Solution

C#

string resultString = Regex.Replace(subjectString, @"(\w+)=(\w+)",
                                                   "$2=$1");

Regex regexObj = new Regex(@"(\w+)=(\w+)");
string resultString = regexObj.Replace(subjectString, "$2=$1");

VB.NET

Dim ResultString = Regex.Replace(SubjectString, "(\w+)=(\w+)", "$2=$1")

Dim RegexObj As New Regex("(\w+)=(\w+)")
Dim ResultString = RegexObj.Replace(SubjectString, "$2=$1")

Java

String resultString = subjectString.replaceAll("(\\w+)=(\\w+)", "$2=$1");

Pattern regex = Pattern.compile("(\\w+)=(\\w+)");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("$2=$1");

JavaScript

result = subject.replace(/(\w+)=(\w+)/g, "$2=$1");

PHP

$result = preg_replace('/(\w+)=(\w+)/', '$2=$1', $subject);

Perl

$subject =~ s/(\w+)=(\w+)/$2=$1/g;

Python

result = re.sub(r"(\w+)=(\w+)", r"\2=\1", subject)

reobj = re.compile(r"(\w+)=(\w+)")
result = reobj.sub(r"\2=\1", subject)

Ruby

result = subject.gsub(/(\w+)=(\w+)/, '\2=\1')

Named Capture

C#

string resultString = Regex.Replace(subjectString,
                      @"(?<left>\w+)=(?<right>\w+)", "${right}=${left}");

Regex regexObj = new Regex(@"(?<left>\w+)=(?<right>\w+)");
string resultString = regexObj.Replace(subjectString, "${right}=${left}");

VB.NET

Dim ResultString = Regex.Replace(SubjectString,
                   "(?<left>\w+)=(?<right>\w+)", "${right}=${left}")

Dim RegexObj As New Regex("(?<left>\w+)=(?<right>\w+)")
Dim ResultString = RegexObj.Replace(SubjectString, "${right}=${left}")

Java 7

String resultString = subjectString.replaceAll(
                      "(?<left>\\w+)=(?<right>\\w+)", "${right}=${left}");

Pattern regex = Pattern.compile("(?<left>\\w+)=(?<right>\\w+)");
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = regexMatcher.replaceAll("${right}=${left}");

XRegExp

var re = XRegExp("(?<left>\\w+)=(?<right>\\w+)", "g");
var result = XRegExp.replace(subject, re, "${right}=${left}");

PHP

$result = preg_replace('/(?P<left>\w+)=(?P<right>\w+)/',
                       '$2=$1', $subject);

Perl

$subject =~ s/(?<left>\w+)=(?<right>\w+)/$+{right}=$+{left}/g;

Python

result = re.sub(r"(?P<left>\w+)=(?P<right>\w+)", r"\g<right>=\g<left>",
                subject)

reobj = re.compile(r"(?P<left>\w+)=(?P<right>\w+)")
result = reobj.sub(r"\g<right>=\g<left>", subject)

Ruby

result = subject.gsub(/(?<left>\w+)=(?<right>\w+)/, '\k<left>=\k<right>')

3.16. Replace Matches with Replacements Generated in Code

Solution

C#

string resultString = Regex.Replace(subjectString, @"\d+",
                      new MatchEvaluator(ComputeReplacement));

Regex regexObj = new Regex(@"\d+");
string resultString = regexObj.Replace(subjectString,
                      new MatchEvaluator(ComputeReplacement));

public String ComputeReplacement(Match matchResult) {
    int twiceasmuch = int.Parse(matchResult.Value) * 2;
    return twiceasmuch.ToString();
}

VB.NET

Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = Regex.Replace(SubjectString, "\d+", MyMatchEvaluator)

Dim RegexObj As New Regex("\d+")
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = RegexObj.Replace(SubjectString, MyMatchEvaluator)

Public Function ComputeReplacement(ByVal MatchResult As Match) As String
    Dim TwiceAsMuch = Int.Parse(MatchResult.Value) * 2;
    Return TwiceAsMuch.ToString();
End Function

Java

StringBuffer resultString = new StringBuffer();
Pattern regex = Pattern.compile("\\d+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    Integer twiceasmuch = Integer.parseInt(regexMatcher.group()) * 2;
    regexMatcher.appendReplacement(resultString, twiceasmuch.toString());
}
regexMatcher.appendTail(resultString);

JavaScript

var result = subject.replace(/\d+/g, function(match) {
    return match * 2;
});

PHP

$result = preg_replace_callback('/\d+/', 'compute_replacement', $subject);

function compute_replacement($groups) {
    return $groups[0] * 2;
}

$result = preg_replace_callback(
    '/\d+/',
    create_function(
        '$groups',
        'return $groups[0] * 2;'
    ),
    $subject
);

Perl

$subject =~ s/\d+/$& * 2/eg;

Python

result = re.sub(r"\d+", computereplacement, subject)

reobj = re.compile(r"\d+")
result = reobj.sub(computereplacement, subject)

def computereplacement(matchobj):
    return str(int(matchobj.group()) * 2)

Ruby

result = subject.gsub(/\d+/) {|match|
    Integer(match) * 2
}

3.17. Replace All Matches Within the Matches of

Solution

C#

Regex outerRegex = new Regex("<b>.*?</b>", RegexOptions.Singleline);
Regex innerRegex = new Regex("before");
string resultString = outerRegex.Replace(subjectString,
                      new MatchEvaluator(ComputeReplacement));

public String ComputeReplacement(Match matchResult) {
    // Run the inner search-and-replace on each match of the outer regex
    return innerRegex.Replace(matchResult.Value, "after");
}

VB.NET

Dim OuterRegex As New Regex("<b>.*?</b>", RegexOptions.Singleline)
Dim InnerRegex As New Regex("before")
Dim MyMatchEvaluator As New MatchEvaluator(AddressOf ComputeReplacement)
Dim ResultString = OuterRegex.Replace(SubjectString, MyMatchEvaluator)

Public Function ComputeReplacement(ByVal MatchResult As Match) As String
    'Run the inner search-and-replace on each match of the outer regex
    Return InnerRegex.Replace(MatchResult.Value, "after");
End Function

Java

StringBuffer resultString = new StringBuffer();
Pattern outerRegex = Pattern.compile("<b>.*?</b>");
Pattern innerRegex = Pattern.compile("before");
Matcher outerMatcher = outerRegex.matcher(subjectString);
while (outerMatcher.find()) {
    outerMatcher.appendReplacement(resultString,
      innerRegex.matcher(outerMatcher.group()).replaceAll("after"));
}
outerMatcher.appendTail(resultString);

JavaScript

var result = subject.replace(/<b>.*?<\/b>/g, function(match) {
    return match.replace(/before/g, "after");
});

PHP

$result = preg_replace_callback('%<b>.*?</b>%',
                                replace_within_tag, $subject);

function replace_within_tag($groups) {
    return preg_replace('/before/', 'after', $groups[0]);
}

Perl

$subject =~ s%<b>.*?</b>%($match = $&) =~ s/before/after/g; $match;%eg;

Python

innerre = re.compile("before")
def replacewithin(matchobj):
    return innerre.sub("after", matchobj.group())

result = re.sub("<b>.*?</b>", replacewithin, subject)

Ruby

innerre = /before/
result = subject.gsub(/<b>.*?<\/b>/) {|match|
    match.gsub(innerre, 'after')
}

3.18. Replace All Matches Between the Matches of Another Regex

Solution

C#

string resultString = null;
Regex outerRegex = new Regex("<[^<>]*>");
Regex innerRegex = new Regex("\"([^\"]*)\"");
// Find the first section
int lastIndex = 0;
Match outerMatch = outerRegex.Match(subjectString);
while (outerMatch.Success) {
    // Search and replace through the text between this match,
    // and the previous one
	string textBetween =
	    subjectString.Substring(lastIndex, outerMatch.Index - lastIndex);
	resultString += innerRegex.Replace(textBetween, "\u201C$1\u201D");
	lastIndex = outerMatch.Index + outerMatch.Length;
	// Copy the text in the section unchanged
	resultString += outerMatch.Value;
	// Find the next section
	outerMatch = outerMatch.NextMatch();
}
// Search and replace through the remainder after the last regex match
string textAfter = subjectString.Substring(lastIndex,
                   subjectString.Length - lastIndex);
resultString += innerRegex.Replace(textAfter, "\u201C$1\u201D");

VB.NET

Dim ResultString As String = Nothing
Dim OuterRegex As New Regex("<[^<>]*>")
Dim InnerRegex As New Regex("""([^""]*)""")
'Find the first section
Dim LastIndex = 0
Dim OuterMatch = OuterRegex.Match(SubjectString)
While OuterMatch.Success
    'Search and replace through the text between this match, 
    'and the previous one
    Dim TextBetween = SubjectString.Substring(LastIndex, 
                      OuterMatch.Index - LastIndex);
    ResultString += InnerRegex.Replace(TextBetween, 
                    ChrW(&H201C) + "$1" + ChrW(&H201D))
    LastIndex = OuterMatch.Index + OuterMatch.Length
    'Copy the text in the section unchanged
    ResultString += OuterMatch.Value
    'Find the next section
    OuterMatch = OuterMatch.NextMatch
End While
'Search and replace through the remainder after the last regex match
Dim TextAfter = SubjectString.Substring(LastIndex,
                                        SubjectString.Length - LastIndex);
ResultString += InnerRegex.Replace(TextAfter, 
                ChrW(&H201C) + "$1" + ChrW(&H201D))

Java

StringBuffer resultString = new StringBuffer();
Pattern outerRegex = Pattern.compile("<[^<>]*>");
Pattern innerRegex = Pattern.compile("\"([^\"]*)\"");
Matcher outerMatcher = outerRegex.matcher(subjectString);
int lastIndex = 0;
while (outerMatcher.find()) {
    // Search and replace through the text between this match,
    // and the previous one
    String textBetween = subjectString.substring(lastIndex,
                                                 outerMatcher.start());
    Matcher innerMatcher = innerRegex.matcher(textBetween);
    resultString.append(innerMatcher.replaceAll("\u201C$1\u201D"));
    lastIndex = outerMatcher.end();
    // Append the regex match itself unchanged
    resultString.append(outerMatcher.group());
}
// Search and replace through the remainder after the last regex match
String textAfter = subjectString.substring(lastIndex);
Matcher innerMatcher = innerRegex.matcher(textAfter);
resultString.append(innerMatcher.replaceAll("\u201C$1\u201D"));

JavaScript

var result = "";
var outerRegex = /<[^<>]*>/g;
var innerRegex = /"([^"]*)"/g;
var outerMatch = null;
var lastIndex = 0;
while (outerMatch = outerRegex.exec(subject)) {
    if (outerMatch.index == outerRegex.lastIndex) outerRegex.lastIndex++;
    // Search and replace through the text between this match,
    // and the previous one
    var textBetween = subject.slice(lastIndex, outerMatch.index);
    result += textBetween.replace(innerRegex, "\u201C$1\u201D");
    lastIndex = outerMatch.index + outerMatch[0].length;
    // Append the regex match itself unchanged
    result += outerMatch[0];
}
// Search and replace through the remainder after the last regex match
var textAfter = subject.slice(lastIndex);
result += textAfter.replace(innerRegex, "\u201C$1\u201D");

PHP

$result = '';
$lastindex = 0;
while (preg_match('/<[^<>]*>/', $subject, $groups, PREG_OFFSET_CAPTURE, 
                 $lastindex)) {
    $matchstart = $groups[0][1];
    $matchlength = strlen($groups[0][0]);
    // Search and replace through the text between this match,
    // and the previous one
    $textbetween = substr($subject, $lastindex, $matchstart-$lastindex);
    $result .= preg_replace('/"([^"]*)"/', '“$1”', $textbetween);
    // Append the regex match itself unchanged
    $result .= $groups[0][0];
    // Move the starting position for the next match
    $lastindex = $matchstart + $matchlength;
    if ($matchlength == 0) {
        // Don't get stuck in an infinite loop
        // if the regex allows zero-length matches
        $lastindex++;
    }
}
// Search and replace through the remainder after the last regex match
$textafter = substr($subject, $lastindex);
$result .= preg_replace('/"([^"]*)"/', '“$1”', $textafter);

Perl

use encoding "utf-8";
$result = '';
while ($subject =~ m/<[^<>]*>/g) {
    $match = $&;
    $textafter = $';
    ($textbetween = $`) =~ s/"([^"]*)"/\x{201C}$1\x{201D}/g;
    $result .= $textbetween . $match;
}
$textafter =~ s/"([^"]*)"/\x{201C}$1\x{201D}/g;
$result .= $textafter;

Python

innerre = re.compile('"([^"]*)"')
result = "";
lastindex = 0;
for outermatch in re.finditer("<[^<>]*>", subject):
    # Search and replace through the text between this match,
    # and the previous one
    textbetween = subject[lastindex:outermatch.start()]
    result += innerre.sub(u"\u201C\\1\u201D", textbetween)
    lastindex = outermatch.end()
    # Append the regex match itself unchanged
    result += outermatch.group()
# Search and replace through the remainder after the last regex match
textafter = subject[lastindex:]
result += innerre.sub(u"\u201C\\1\u201D", textafter)

Ruby

result = '';
textafter = ''
subject.scan(/<[^<>]*>/) {|match|
    textafter = $'
    textbetween = $`.gsub(/"([^"]*)"/, '“\1”')
    result += textbetween + match
}
result += textafter.gsub(/"([^"]*)"/, '“\1”')

Discussion

Python

print result.encode('1252')

3.19. Split a String

Solution

C#

string[] splitArray = Regex.Split(subjectString, "<[^<>]*>");

string[] splitArray = null;
try {
    splitArray = Regex.Split(subjectString, "<[^<>]*>");
} catch (ArgumentNullException ex) {
     // Cannot pass null as the regular expression or subject string
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Regex regexObj = new Regex("<[^<>]*>");
string[] splitArray = regexObj.Split(subjectString);

string[] splitArray = null;
try {
    Regex regexObj = new Regex("<[^<>]*>");
    try {
        splitArray = regexObj.Split(subjectString);
    } catch (ArgumentNullException ex) {
        // Cannot pass null as the subject string
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

VB.NET

Dim SplitArray = Regex.Split(SubjectString, "<[^<>]*>")

Dim SplitArray As String()
Try
    SplitArray = Regex.Split(SubjectString, "<[^<>]*>")
Catch ex As ArgumentNullException
    'Cannot pass null as the regular expression or subject string
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Dim RegexObj As New Regex("<[^<>]*>")
Dim SplitArray = RegexObj.Split(SubjectString)

Dim SplitArray As String()
Try
    Dim RegexObj As New Regex("<[^<>]*>")
    Try
        SplitArray = RegexObj.Split(SubjectString)
    Catch ex As ArgumentNullException
        'Cannot pass null as the subject string
    End Try
Catch ex As ArgumentException
    'Syntax error in the regular expression
End Try

Java

String[] splitArray = subjectString.split("<[^<>]*>");

try {
    String[] splitArray = subjectString.split("<[^<>]*>");
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Pattern regex = Pattern.compile("<[^<>]*>");
String[] splitArray = regex.split(subjectString);

String[] splitArray = null;
try {
    Pattern regex = Pattern.compile("<[^<>]*>");
    splitArray = regex.split(subjectString);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

JavaScript

result = subject.split(/<[^<>]*>/);

XRegExp

result = XRegExp.split(subject, /<[^<>]*>/);

PHP

$result = preg_split('/<[^<>]*>/', $subject);

Perl

@result = split(m/<[^<>]*>/, $subject);

Python

result = re.split("<[^<>]*>", subject))

reobj = re.compile("<[^<>]*>")
result = reobj.split(subject)

Ruby

result = subject.split(/<[^<>]*>/)

3.20. Split a String, Keeping the Regex Matches

Solution

C#

string[] splitArray = Regex.Split(subjectString, "(<[^<>]*>)");

Regex regexObj = new Regex("(<[^<>]*>)");
string[] splitArray = regexObj.Split(subjectString);

VB.NET

Dim SplitArray = Regex.Split(SubjectString, "(<[^<>]*>)")

Dim RegexObj As New Regex("(<[^<>]*>)")
Dim SplitArray = RegexObj.Split(SubjectString)

Java

List<String> resultList = new ArrayList<String>();
Pattern regex = Pattern.compile("<[^<>]*>");
Matcher regexMatcher = regex.matcher(subjectString);
int lastIndex = 0;
while (regexMatcher.find()) {
    resultList.add(subjectString.substring(lastIndex,
                                           regexMatcher.start()));
    resultList.add(regexMatcher.group());
    lastIndex = regexMatcher.end();
}
resultList.add(subjectString.substring(lastIndex));

JavaScript

result = subject.split(/(<[^<>]*>)/);

XRegExp

result = XRegExp.split(subject, /(<[^<>]*>)/);

PHP

$result = preg_split('/(<[^<>]*>)/', $subject, -1,
                     PREG_SPLIT_DELIM_CAPTURE);

Perl

@result = split(m/(<[^<>]*>)/, $subject);

Python

result = re.split("(<[^<>]*>)", subject))

reobj = re.compile("(<[^<>]*>)")
result = reobj.split(subject)

Ruby

list = []
lastindex = 0;
subject.scan(/<[^<>]*>/) {|match|
    list << subject[lastindex..$~.begin(0)-1];
    list << $&
    lastindex = $~.end(0)
}
list << subject[lastindex..subject.length()]

3.21. Search Line by Line

Solution

C#

string[] lines = Regex.Split(subjectString, "\r?\n");

Regex regexObj = new Regex("regex pattern");
for (int i = 0; i < lines.Length; i++) {
    if (regexObj.IsMatch(lines[i])) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}

VB.NET

Dim Lines = Regex.Split(SubjectString, "\r?\n")

Dim RegexObj As New Regex("regex pattern")
For i As Integer = 0 To Lines.Length - 1
    If RegexObj.IsMatch(Lines(i)) Then
        'The regex matches Lines(i)
    Else
        'The regex does not match Lines(i)
    End If
Next

Java

String[] lines = subjectString.split("\r?\n");

Pattern regex = Pattern.compile("regex pattern");
Matcher regexMatcher = regex.matcher("");
for (int i = 0; i < lines.length; i++) {
    regexMatcher.reset(lines[i]);
    if (regexMatcher.find()) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}

JavaScript

var lines = subject.split(/\r?\n/);

var regexp = /regex pattern/;
for (var i = 0; i < lines.length; i++) {
    if (lines[i].match(regexp)) {
        // The regex matches lines[i]
    } else {
        // The regex does not match lines[i]
    }
}

PHP

$lines = preg_split('/\r?\n/', $subject)

foreach ($lines as $line) {
    if (preg_match('/regex pattern/', $line)) {
        // The regex matches $line
    } else {
        // The regex does not match $line
    }
}

Perl

@lines = split(m/\r?\n/, $subject)

foreach $line (@lines) {
    if ($line =~ m/regex pattern/) {
        # The regex matches $line
    } else {
        # The regex does not match $line
    }
}

Python

lines = re.split("\r?\n", subject)

reobj = re.compile("regex pattern")
for line in lines[:]:
    if reobj.search(line):
        # The regex matches line
    else:
        # The regex does not match line

Ruby

lines = subject.split(/\r?\n/)

re = /regex pattern/
lines.each { |line|
    if line =~ re
        # The regex matches line
    else
        # The regex does not match line
}

3.22. Construct a Parser

Problem

table %First table%
  row cell %A1% cell %B1% cell%C1%cell%D1%
  ROW row CELL %The previous row was blank%
  cell %B3%
  row
    cell %A4% %second line%
    cEll %B4%
         %second line%
    cell %C4
second line%
  row cell %%%string%%%
    cell %%
    cell %%%%
    cell %%%%%%

Solution

C#

static RECTable ImportTable(string fileContents) {
  RECTable table = null;
  RECRow row = null;
  RECCell cell = null;
  Regex regexObj = new Regex(
      @"  \b(?<keyword>table|row|cell)\b
      | %(?<string>[^%]*(?:%%[^%]*)*)%
      | (?<error>\S+)",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
  Match match = regexObj.Match(fileContents);
  while (match.Success) {
    if (match.Groups["keyword"].Success) {
      string keyword = match.Groups["keyword"].Value.ToLower();
      if (keyword == "table") {
        table = new RECTable();
        row = null;
        cell = null;
      } else if (keyword == "row") {
        if (table == null)
          throw new Exception("Invalid data: row without table");
        row = table.addRow();
        cell = null;
      } else if (keyword == "cell") {
        if (row == null)
          throw new Exception("Invalid data: cell without row");
        cell = row.addCell();
      } else {
        throw new Exception("Parser bug: unknown keyword");
      }
    } else if (match.Groups["string"].Success) {
      string content = match.Groups["string"].Value.Replace("%%", "%");
      if (cell != null)
        cell.addContent(content);
      else if (row != null)
        throw new Exception("Invalid data: string after row keyword");
      else if (table != null)
        table.addCaption(content);
      else
        throw new Exception("Invalid data: string before table keyword");
    } else if (match.Groups["error"].Success) {
      throw new Exception("Invalid data: " + match.Groups["error"].Value);
    } else {
      throw new Exception("Parser bug: no capturing group matched");
    }
    match = match.NextMatch();
  }
  if (table == null)
    throw new Exception("Invalid data: table keyword missing");
  return table;
}

VB.NET

Function ImportTable(ByVal FileContents As String)
  Dim Table As RECTable = Nothing
  Dim Row As RECRow = Nothing
  Dim Cell As RECCell = Nothing
  Dim RegexObj As New Regex(
      "  \b(?<keyword>table|row|cell)\b" & _
      "| %(?<string>[^%]*(?:%%[^%]*)*)%" & _
      "| (?<error>\S+)",
      RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
  Dim MatchResults As Match = RegexObj.Match(FileContents)
  While MatchResults.Success
    If MatchResults.Groups("keyword").Success Then
      Dim Keyword As String = MatchResults.Groups("keyword").Value
      Keyword = Keyword.ToLower()
      If Keyword = "table" Then
        Table = New RECTable
        Row = Nothing
        Cell = Nothing
      ElseIf Keyword = "row" Then
        If Table Is Nothing Then
          Throw New Exception("Invalid data: row without table")
        End If
        Row = Table.addRow
        Cell = Nothing
      ElseIf Keyword = "cell" Then
        If Row Is Nothing Then
          Throw New Exception("Invalid data: cell without row")
        End If
        Cell = Row.addCell
      Else
        Throw New Exception("Parser bug: unknown keyword")
      End If
    ElseIf MatchResults.Groups("string").Success Then
      Dim Content As String = MatchResults.Groups("string").Value
      Content = Content.Replace("%%", "%")
      If Cell IsNot Nothing Then
        Cell.addContent(Content)
      ElseIf Row IsNot Nothing Then
        Throw New Exception("Invalid data: string after row keyword")
      ElseIf Table IsNot Nothing Then
        Table.addCaption(Content)
      Else
        Throw New Exception("Invalid data: string before table keyword")
      End If
    ElseIf MatchResults.Groups("error").Success Then
      Throw New Exception("Invalid data")
    Else
      Throw New Exception("Parser bug: no capturing group matched")
    End If
    MatchResults = MatchResults.NextMatch()
  End While
  If Table Is Nothing Then
    Throw New Exception("Invalid data: table keyword missing")
  End If
  Return Table
End Function

Java

RECTable ImportTable(String fileContents) throws Exception {
  RECTable table = null;
  RECRow row = null;
  RECCell cell = null;
  final int groupkeyword = 1;
  final int groupstring = 2;
  final int grouperror = 3;
  Pattern regex = Pattern.compile(
      "  \\b(table|row|cell)\\b\n" +
      "| %([^%]*(?:%%[^%]*)*)%\n" +
      "| (\\S+)",
      Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
  Matcher regexMatcher = regex.matcher(fileContents);
  while (regexMatcher.find()) {
    if (regexMatcher.start(groupkeyword) >= 0) {
      String keyword = regexMatcher.group(groupkeyword).toLowerCase();
      if (keyword.equals("table")) {
        table = new RECTable();
        row = null;
        cell = null;
      } else if (keyword.equals("row")) {
        if (table == null)
          throw new Exception("Invalid data: row without table");
        row = table.addRow();
        cell = null;
      } else if (keyword.equals("cell")) {
        if (row == null)
          throw new Exception("Invalid data: cell without row");
        cell = row.addCell();
      } else {
        throw new Exception("Parser bug: unknown keyword");
      }
    } else if (regexMatcher.start(groupstring) >= 0) {
      String content = regexMatcher.group(groupstring);
      content = content.replaceAll("%%", "%");
      if (cell != null)
        cell.addContent(content);
      else if (row != null)
        throw new Exception("Invalid data: String after row keyword");
      else if (table != null)
        table.addCaption(content);
      else
        throw new Exception("Invalid data: String before table keyword");
    } else if (regexMatcher.start(grouperror) >= 0) {
      throw new Exception("Invalid data: " + 
                          regexMatcher.group(grouperror));
    } else {
      throw new Exception("Parser bug: no capturing group matched");
    }
  }
  if (table == null)
    throw new Exception("Invalid data: table keyword missing");
  return table;
}

JavaScript

function importTable(fileContents) {
  var table = null;
  var row = null;
  var cell = null;
  var groupkeyword = 1;
  var groupstring = 2;
  var grouperror = 3;
  var myregexp = /\b(table|row|cell)\b|%([^%]*(?:%%[^%]*)*)%|(\S+)/ig;
  var match;
  var keyword;
  var content;
  while (match = myregexp.exec(fileContents)) {
    if (match[groupkeyword] !== undefined) {
      keyword = match[groupkeyword].toLowerCase();
      if (keyword == "table") {
        table = new RECTable();
        row = null;
        cell = null;
      } else if (keyword == "row") {
        if (!table)
          throw new Error("Invalid data: row without table");
        row = table.addRow();
        cell = null;
      } else if (keyword == "cell") {
        if (!row)
          throw new Error("Invalid data: cell without row");
        cell = row.addCell();
      } else {
        throw new Error("Parser bug: unknown keyword");
      }
    } else if (match[groupstring] !== undefined) {
      content = match[groupstring].replace(/%%/g, "%");
      if (cell)
        cell.addContent(content);
      else if (row)
        throw new Error("Invalid data: string after row keyword");
      else if (table)
        table.addCaption(content);
      else
        throw new Error("Invalid data: string before table keyword");
    } else if (match[grouperror] !== undefined) {
      throw new Error("Invalid data: " + match[grouperror]);
    } else {
      throw new Error("Parser bug: no capturing group matched");
    }
  }
  if (!table)
    throw new Error("Invalid data: table keyword missing");
  return table;
}

XRegExp

function importTable(fileContents) {
  var table = null;
  var row = null;
  var cell = null;
  var myregexp = XRegExp("(?ix)\\b(?<keyword>table|row|cell)\\b" +
                         "   | %(?<string>[^%]*(?:%%[^%]*)*)%" +
                         "   | (?<error>\\S+)");
  XRegExp.forEach(fileContents, myregexp, function(match) {
    var keyword;
    var content;
    if (match.keyword !== undefined) {
      keyword = match.keyword.toLowerCase();
      if (keyword == "table") {
        table = new RECTable();
        row = null;
        cell = null;
      } else if (keyword == "row") {
        if (!table)
          throw new Error("Invalid data: row without table");
        row = table.addRow();
        cell = null;
      } else if (keyword == "cell") {
        if (!row)
          throw new Error("Invalid data: cell without row");
        cell = row.addCell();
      } else {
        throw new Error("Parser bug: unknown keyword");
      }
    } else if (match.string !== undefined) {
      content = match.string.replace(/%%/g, "%");
      if (cell)
        cell.addContent(content);
      else if (row)
        throw new Error("Invalid data: string after row keyword");
      else if (table)
        table.addCaption(content);
      else
        throw new Error("Invalid data: string before table keyword");
    } else if (match.error !== undefined) {
      throw new Error("Invalid data: " + match.error);
    } else {
      throw new Error("Parser bug: no capturing group matched");
    }
  });
  if (!table)
    throw new Error("Invalid data: table keyword missing");
  return table;
}

Perl

sub importtable {
  my $filecontents = shift;
  my $table;
  my $row;
  my $cell;
  while ($filecontents =~ 
          m/  \b(table|row|cell)\b
          | %([^%]*(?:%%[^%]*)*)%
          | (\S+)/ixg) {
    if (defined($1)) { # Keyword
      my $keyword = lc($1);
      if ($keyword eq "table") {
        $table = new RECTable();
        undef $row;
        undef $cell;
      } elsif ($keyword eq "row") {
        if (!defined($table)) {
          die "Invalid data: row without table";
        }
        $row = $table->addRow();
        undef $cell;
      } elsif ($keyword eq "cell") {
        if (!defined($row)) {
          die "Invalid data: cell without row";
        }
        $cell = $row->addCell();
      } else {
        die "Parser bug: unknown keyword";
      }
    } elsif (defined($2)) { # String
      my $content = $2;
      $content =~ s/%%/%/g;
      if (defined($cell)) {
        $cell->addContent($content);
      } elsif (defined($row)) {
        die "Invalid data: string after row keyword";
      } elsif (defined($table)) {
        $table->addCaption($content);
      } else {
        die "Invalid data: string before table keyword";
      }
    } elsif (defined($3)) { # Error
      die "Invalid data: $3";
    } else {
      die "Parser bug: no capturing group matched";
    }
  }
  if (!defined(table)) {
    die "Invalid data: table keyword missing";
  }
  return table;
}

Python

def importtable(filecontents):
  table = None
  row = None
  cell = None
  for match in re.finditer(
    r"""(?ix)\b(?P<keyword>table|row|cell)\b
             | %(?P<string>[^%]*(?:%%[^%]*)*)%
             | (?P<error>\S+)""", filecontents):
    if match.group("keyword") != None:
      keyword = match.group("keyword").lower()
      if keyword == "table":
        table = RECTable()
        row = None
        cell = None
      elif keyword == "row":
        if table == None:
          raise Exception("Invalid data: row without table")
        row = table.addRow()
        cell = None
      elif keyword == "cell":
        if row == None:
          raise Exception("Invalid data: cell without row")
        cell = row.addCell()
      else:
        raise Exception("Parser bug: unknown keyword")
    elif match.group("string") != None:
      content = match.group("string").replace("%%", "%")
      if cell != None:
        cell.addContent(content)
      elif row != None:
        raise Exception("Invalid data: string after row keyword")
      elif table != None:
        table.addCaption(content)
      else:
        raise Exception("Invalid data: string before table keyword")
    elif match.group("error") != None:
      raise Exception("Invalid data: " + match.group("error"))
    else:
      raise Exception("Parser bug: no capturing group matched")
  if table == None:
    raise Exception("Invalid data: table keyword missing")
  return table

PHP

function importTable($fileContents) {
  preg_match_all(
    '/  \b(?P<keyword>table|row|cell)\b
      | (?P<string>%[^%]*(?:%%[^%]*)*%)
      | (?P<error>\S+)/ix',
    $fileContents, $matches, PREG_PATTERN_ORDER);
  $table = NULL;
  $row = NULL;
  $cell = NULL;
  for ($i = 0; $i < count($matches[0]); $i++) {
    if ($matches['keyword'][$i] != NULL) {
      $keyword = strtolower($matches['keyword'][$i]);
      if ($keyword == "table") {
        $table = new RECTable();
        $row = NULL;
        $cell = NULL;
      } elseif ($keyword == "row") {
        if ($table == NULL)
          throw new Exception("Invalid data: row without table");
        $row = $table->addRow();
        $cell = NULL;
      } elseif ($keyword == "cell") {
        if ($row == NULL)
          throw new Exception("Invalid data: cell without row");
        $cell = $row->addCell();
      } else {
        throw new Exception("Parser bug: unknown keyword");
      }
    } elseif ($matches['string'][$i] != NULL) {
      $content = $matches['string'][$i];
      $content = substr($content, 1, strlen($content)-2);
      $content = str_replace('%%', '%', $content);
      if ($cell != NULL)
        $cell->addContent($content);
      elseif ($row != NULL)
        throw new Exception("Invalid data: string after row keyword");
      elseif ($table != NULL)
        $table->addCaption($content);
      else
        throw new Exception("Invalid data: string before table keyword");
    } elseif ($matches['error'][$i] != NULL) {
      throw new Exception("Invalid data: " + $matches['error'][$i]);
    } else {
      throw new Exception("Parser bug: no capturing group matched");
    }
  }
  if ($table == NULL)
    throw new Exception("Invalid data: table keyword missing");
  return $table;
}

Ruby

def importtable(filecontents)
  table = nil
  row = nil
  cell = nil
  groupkeyword = 0;
  groupstring = 1;
  grouperror = 2;
  regexp = /  \b(table|row|cell)\b
            | %([^%]*(?:%%[^%]*)*)%
            | (\S+)/ix
  filecontents.scan(regexp) do |match|
    if match[groupkeyword]
      keyword = match[groupkeyword].downcase
      if keyword == "table"
        table = RECTable.new()
        row = nil
        cell = nil
      elsif keyword == "row"
        if table.nil?
          raise "Invalid data: row without table"
        end
        row = table.addRow()
        cell = nil
      elsif keyword == "cell"
        if row.nil?
          raise "Invalid data: cell without row"
        end
        cell = row.addCell()
      else
        raise "Parser bug: unknown keyword"
      end
    elsif not match[groupstring].nil?
      content = match[groupstring].gsub("%%", "%")
      if not cell.nil?
        cell.addContent(content)
      elsif not row.nil?
        raise "Invalid data: string after row keyword"
      elsif not table.nil?
        table.addCaption(content)
      else
        raise "Invalid data: string before table keyword"
      end
    elsif not match[grouperror].nil?
      raise "Invalid data: " + match.group("error")
    else
      raise "Parser bug: no capturing group matched"
    end
  end
  if table.nil?
    raise "Invalid data: table keyword missing"
  end
  return table
end

Discussion

  \b(?<keyword>table|row|cell)\b
| %(?<string>[^%]*(?:%%[^%]*)*)%
| (?<error>\S+)

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

  \b(?P<keyword>table|row|cell)\b
| %(?P<string>[^%]*(?:%%[^%]*)*)%
| (?P<error>\S+)

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python

  \b(table|row|cell)\b
| %([^%]*(?:%%[^%]*)*)%
| (\S+)

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

\b(table|row|cell)\b|%([^%]*+(?:%%[^%]*+)*+)%|(\S+)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4. Validation and Formatting

4.1. Validate Email Addresses

Solution

Simple

^\S+@\S+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A\S+@\S+\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Simple, with restrictions on characters

^[A-Z0-9+_.-]+@[A-Z0-9.-]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A[A-Z0-9+_.-]+@[A-Z0-9.-]+\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Simple, with all valid local part characters

^[A-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[A-Z0-9.-]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A[A-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[A-Z0-9.-]+\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

No leading, trailing, or consecutive dots

^[A-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\.[A-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@[A-Z0-9-]+(?:\.[A-Z0-9-]+)*$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A[A-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\.[A-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@[A-Z0-9-]+(?:\.[A-Z0-9-]+)*\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Top-level domain has two to six letters

^[\w!#$%&'*+/=?`{|}~^-]+(?:\.[\w!#$%&'*+/=?`{|}~^-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A[\w!#$%&'*+/=?`{|}~^-]+(?:\.[\w!#$%&'*+/=?`{|}~^-]+)*@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

4.2. Validate and Format North American Phone Numbers

Solution

Regular expression

^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

($1) $2-$3

(\1) \2-\3

C# example

Regex phoneRegex =
    new Regex(@"^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$");

if (phoneRegex.IsMatch(subjectString)) {
    string formattedPhoneNumber =
        phoneRegex.Replace(subjectString, "($1) $2-$3");
} else {
    // Invalid phone number
}

JavaScript example

var phoneRegex = /^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$/;

if (phoneRegex.test(subjectString)) {
    var formattedPhoneNumber =
        subjectString.replace(phoneRegex, "($1) $2-$3");
} else {
    // Invalid phone number
}

Discussion

^        # Assert position at the beginning of the string.
\(       # Match a literal "("
  ?      #   between zero and one time.
(        # Capture the enclosed match to backreference 1:
  [0-9]  #   Match a digit
    {3}  #     exactly three times.
)        # End capturing group 1.
\)       # Match a literal ")"
  ?      #   between zero and one time.
[-. ]    # Match one hyphen, dot, or space
  ?      #   between zero and one time.
…       # [Match the remaining digits and separator.]
$        # Assert position at the end of the string.

Variations

Eliminate invalid phone numbers

^\(?([2-9][0-8][0-9])\)?[-. ]?([2-9][0-9]{2})[-. ]?([0-9]{4})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Find phone numbers in documents

\(?\b([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Allow a leading “1”

^(?:\+?1[-. ]?)?\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Allow seven-digit phone numbers

^(?:\(?([0-9]{3})\)?[-. ]?)?([0-9]{3})[-. ]?([0-9]{4})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4.3. Validate International Phone Numbers

Solution

Regular expression

^\+(?:[0-9] ?){6,14}[0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

JavaScript example

function validate(phone) {
    var regex = /^\+(?:[0-9] ?){6,14}[0-9]$/;

    if (regex.test(phone)) {
        // Valid international phone number
    } else {
        // Invalid international phone number
    }
}

Discussion

^         # Assert position at the beginning of the string.
\+        # Match a literal "+" character.
(?:       # Group but don't capture:
  [0-9]   #   Match a digit.
  \x20    #   Match a space character
    ?     #     between zero and one time.
)         # End the noncapturing group.
  {6,14}  #   Repeat the group between 6 and 14 times.
[0-9]     # Match a digit.
$         # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Validate international phone numbers in EPP format

^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4.4. Validate Traditional Date Formats

Solution

^[0-3]?[0-9]/[0-3]?[0-9]/(?:[0-9]{2})?[0-9]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[0-3][0-9]/[0-3][0-9]/(?:[0-9][0-9])?[0-9][0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])/[0-9]{4}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9]))/(?:[0-9]{2})?[0-9]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:
  # m/d or mm/dd
  (1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])
|
  # d/m or dd/mm
  (3[01]|[12][0-9]|0?[1-9])/(1[0-2]|0?[1-9])
)
# /yy or /yyyy
/(?:[0-9]{2})?[0-9]{2}$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])|(3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9]))/[0-9]{4}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:
  # mm/dd
  (1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])
|
  # dd/mm
  (3[01]|[12][0-9]|0[1-9])/(1[0-2]|0[1-9])
)
# /yyyy
/[0-9]{4}$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

\b(1[0-2]|0[1-9])/(3[01]|[12][0-9]|0[1-9])/[0-9]{4}\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4.5. Validate Traditional Date Formats, Excluding Invalid Dates

Solution

C#

^(?<month>[0-3]?[0-9])/(?<day>[0-3]?[0-9])/(?<year>(?:[0-9]{2})?[0-9]{2})$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10

DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<month>[0-3]?[0-9])/(?<day>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}

^(?<day>[0-3]?[0-9])/(?<month>[0-3]?[0-9])/(?<year>(?:[0-9]{2})?[0-9]{2})$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10

DateTime foundDate;
Match matchResult = Regex.Match(SubjectString,
    "^(?<day>[0-3]?[0-9])/(?<month>[0-3]?[0-9])/" +
    "(?<year>(?:[0-9]{2})?[0-9]{2})$");
if (matchResult.Success) {
    int year = int.Parse(matchResult.Groups["year"].Value);
    if (year < 50) year += 2000;
    else if (year < 100) year += 1900;
    try {
        foundDate = new DateTime(year,
            int.Parse(matchResult.Groups["month"].Value),
            int.Parse(matchResult.Groups["day"].Value));
    } catch {
        // Invalid date
    }
}

Perl

^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) 
{
    $month = $1;
    $day = $2;
    $year = $3;
    $year += 2000 if $year < 50;
    $year += 1900 if $year < 100;
    if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 ||
                                          $year % 400 == 0)) {
    	$validdate = 1 if $day >= 1 && $day <= 29;
    } elsif ($month >= 1 && $month <= 12) {
        $validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
    }
}

@daysinmonth = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
$validdate = 0;
if ($subject =~ m!^([0-3]?[0-9])/([0-3]?[0-9])/((?:[0-9]{2})?[0-9]{2})$!) 
{
    $day = $1;
    $month = $2;
    $year = $3;
    $year += 2000 if $year < 50;
    $year += 1900 if $year < 100;
    if ($month == 2 && $year % 4 == 0 && ($year % 100 != 0 ||
                                          $year % 400 == 0)) {
    	$validdate = 1 if $day >= 1 && $day <= 29;
    } elsif ($month >= 1 && $month <= 12) {
        $validdate = 1 if $day >= 1 && $day <= $daysinmonth[$month-1];
    }
}

Pure regular expression

^(?:
  # February (29 days every year)
  (?<month>0?2)/(?<day>[12][0-9]|0?[1-9])
|
  # 30-day months
  (?<month>0?[469]|11)/(?<day>30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (?<month>0?[13578]|1[02])/(?<day>3[01]|[12][0-9]|0?[1-9])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$

Regex options: Free-spacing
Regex flavors: .NET, Perl 5.10, Ruby 1.9

^(?:
  # February (29 days every year)
  (0?2)/([12][0-9]|0?[1-9])
|
  # 30-day months
  (0?[469]|11)/(30|[12][0-9]|0?[1-9])
|
  # 31-day months
  (0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:(0?2)/([12][0-9]|0?[1-9])|(0?[469]|11)/(30|[12][0-9]|0?[1-9])|(0?[13578]|1[02])/(3[01]|[12][0-9]|0?[1-9]))/((?:[0-9]{2})?[0-9]{2})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:
  # February (29 days every year)
  (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
|
  # 30-day months
  (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[469]|11)
|
  # 31-day months
  (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[13578]|1[02])
)
# Year
/(?<year>(?:[0-9]{2})?[0-9]{2})$

Regex options: Free-spacing
Regex flavors: .NET, Perl 5.10, Ruby 1.9

^(?:
  # February (29 days every year)
  ([12][0-9]|0?[1-9])/(0?2)
|
  # 30-day months
  (30|[12][0-9]|0?[1-9])/([469]|11)
|
  # 31-day months
  (3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02])
)
# Year
/((?:[0-9]{2})?[0-9]{2})$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:([12][0-9]|0?[1-9])/(0?2)|(30|[12][0-9]|0?[1-9])/([469]|11)|(3[01]|[12][0-9]|0?[1-9])/(0?[13578]|1[02]))/((?:[0-9]{2})?[0-9]{2})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

# 2 May 2007 till 29 August 2008
^(?:
  # 2 May 2007 till 31 December 2007
  (?:
    # 2 May till 31 May
    (?<day>3[01]|[12][0-9]|0?[2-9])/(?<month>0?5)/(?<year>2007)
  |
    # 1 June till 31 December
    (?:
      # 30-day months
      (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[69]|11)
    |
      # 31-day months
      (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[78]|1[02])
    )
    /(?<year>2007)
  )
|
  # 1 January 2008 till 29 August 2008
  (?:
    # 1 August till 29 August
    (?<day>[12][0-9]|0?[1-9])/(?<month>0?8)/(?<year>2008)
  |
    # 1 Janary till 30 June
    (?:
      # February
      (?<day>[12][0-9]|0?[1-9])/(?<month>0?2)
    |
      # 30-day months
      (?<day>30|[12][0-9]|0?[1-9])/(?<month>0?[46])
    |
      # 31-day months
      (?<day>3[01]|[12][0-9]|0?[1-9])/(?<month>0?[1357])
    )
    /(?<year>2008)
  )
)$

Regex options: Free-spacing
Regex flavors: .NET, Perl 5.10, Ruby 1.9

4.6. Validate Traditional Time Formats

Solution

^(1[0-2]|0?[1-9]):([0-5]?[0-9])( ?[AP]M)?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])( ?[AP]M)?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

4.7. Validate ISO 8601 Dates and Times

Solution

Dates

^([0-9]{4})-(1[0-2]|0[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})-(?<month>1[0-2]|0[1-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?P<year>[0-9]{4})-(?P<month>1[0-2]|0[1-9])$

Regex options: None
Regex flavors: PCRE, Python

^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[01]|0[1-9]|[12][0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})-?(?<month>1[0-2]|0[1-9])-?(?<day>3[01]|0[1-9]|[12][0-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^([0-9]{4})(-?)(1[0-2]|0[1-9])\2(3[01]|0[1-9]|[12][0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})(?<hyphen>-?)(?<month>1[0-2]|0[1-9])\k<hyphen>(?<day>3[01]|0[1-9]|[12][0-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?P<year>[0-9]{4})(?P<hyphen>-?)(?P<month>1[0-2]|0[1-9])(?P=hyphen)(?<day>3[01]|0[1-9]|[12][0-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^([0-9]{4})-?(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})-?(?<day>36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$

Regex options: None
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

Weeks

^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})-?W(?<week>5[0-3]|[1-4][0-9]|0[1-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])-?([1-7])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})-?W(?<week>5[0-3]|[1-4][0-9]|0[1-9])-?(?<day>[1-7])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

Times

^(2[0-3]|[01][0-9]):?([0-5][0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<hour>2[0-3]|[01][0-9]):?(?<minute>[0-5][0-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<hour>2[0-3]|[01][0-9]):?(?<minute>[0-5][0-9]):?(?<second>[0-5][0-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(Z|[+-](?:2[0-3]|[01][0-9])(?::?(?:[0-5][0-9]))?)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])(Z|[+-](?:2[0-3]|[01][0-9])(?::?(?:[0-5][0-9]))?)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<hour>2[0-3]|[01][0-9]):?(?<minute>[0-5][0-9]):?(?<second>[0-5][0-9])(?<timezone>Z|[+-](?:2[0-3]|[01][0-9])(?::?(?:[0-5][0-9]))?)$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

Date and time

^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>[0-9]{4})-?(?<month>1[0-2]|0[1-9])-?(?<day>3[01]|0[1-9]|[12][0-9]) (?<hour>2[0-3]|[01][0-9]):?(?<minute>[0-5][0-9]):?(?<second>[0-5][0-9])$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^([0-9]{4})(-)?(1[0-2]|0[1-9])(?(2)-)(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9])(?(2):)([0-5][0-9])(?(2):)([0-5][0-9])$

Regex options: None
Regex flavors: .NET, PCRE, Perl, Python

^(?<year>[0-9]{4})(?<hyphen>-)?(?<month>1[0-2]|0[1-9])(?(hyphen)-)(?<day>3[01]|0[1-9]|[12][0-9]) (?<hour>2[0-3]|[01][0-9])(?(hyphen):)(?<minute>[0-5][0-9])(?(hyphen):)(?<second>[0-5][0-9])$

Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10

^(?P<year>[0-9]{4})(?P<hyphen>-)?(?P<month>1[0-2]|0[1-9])(?(hyphen)-)(?P<day>3[01]|0[1-9]|[12][0-9]) (?P<hour>2[0-3]|[01][0-9])(?(hyphen):)(?P<minute>[0-5][0-9])(?(hyphen):)(?P<second>[0-5][0-9])$

Regex options: None
Regex flavors: PCRE, Perl 5.10, Python

^(?:([0-9]{4})-?(1[0-2]|0[1-9])-?(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9]):?([0-5][0-9]):?([0-5][0-9])|([0-9]{4})(1[0-2]|0[1-9])(3[01]|0[1-9]|[12][0-9]) (2[0-3]|[01][0-9])([0-5][0-9])([0-5][0-9]))$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

XML Schema dates and times

^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?<month>1[0-2]|0[1-9])-(?<day>3[01]|0[1-9]|[12][0-9])(?<timezone>Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<hour>2[0-3]|[01][0-9]):(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])(?<frac>\.[0-9]+)?(?<timezone>Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?<month>1[0-2]|0[1-9])-(?<day>3[01]|0[1-9]|[12][0-9])T(?<hour>2[0-3]|[01][0-9]):(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])(?<ms>\.[0-9]+)?(?<timezone>Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

4.8. Limit Input to Alphanumeric Characters

Solution

Regular expression

^[A-Z0-9]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Ruby example

if subject =~ /^[A-Z0-9]+$/i
    puts "Subject is alphanumeric"
else
    puts "Subject is not alphanumeric"
end

Discussion

^         # Assert position at the beginning of the string.
[A-Z0-9]  # Match a character from A to Z or from 0 to 9
  +       #   between one and unlimited times.
$         # Assert position at the end of the string.

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Limit input to ASCII characters

^[\x00-\x7F]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Limit input to ASCII noncontrol characters and line breaks

^[\n\r\x20-\x7E]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Limit input to shared ISO-8859-1 and Windows-1252 characters

^[\x00-\x7F\xA0-\xFF]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Limit input to alphanumeric characters in any language

^[\p{L}\p{M}\p{Nd}]+$

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Ruby 1.9

^[^\W_]+$

Regex options: Unicode
Regex flavors: Python

4.9. Limit the Length of Text

Solution

Regular expression

^[A-Z]{1,10}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Perl example

if ($ARGV[0] =~ /^[A-Z]{1,10}$/) {
    print "Input is valid\n";
} else {
    print "Input is invalid\n";
}

Discussion

^         # Assert position at the beginning of the string.
[A-Z]     # Match one letter from A to Z
  {1,10}  #   between 1 and 10 times.
$         # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Limit the length of an arbitrary pattern

^(?=.{1,10}$).*

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?=[\S\s]{1,10}$)[\S\s]*

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Limit the number of nonwhitespace characters

^\s*(?:\S\s*){10,100}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[\p{Z}\s]*(?:[^\p{Z}\s][\p{Z}\s]*){10,100}$

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Ruby 1.9

Limit the number of words

^\W*(?:\w+\b\W*){10,100}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[^\p{L}\p{M}\p{Nd}\p{Pc}]*(?:[\p{L}\p{M}\p{Nd}\p{Pc}]+\b[^\p{L}\p{M}\p{Nd}\p{Pc}]*){10,100}$

Regex options: None
Regex flavors: .NET, Java, Perl

^[^\p{L}\p{M}\p{Nd}\p{Pc}]*(?:[\p{L}\p{M}\p{Nd}\p{Pc}]+(?:[^\p{L}\p{M}\p{Nd}\p{Pc}]+|$)){10,100}$

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Ruby 1.9

^\s*(?:\S+(?:\s+|$)){10,100}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, Perl, PCRE, Python, Ruby

4.10. Limit the Number of Lines in Text

Solution

Regular expression

\A(?>[^\r\n]*(?>\r\n?|\n)){0,4}[^\r\n]*\z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

\A(?:[^\r\n]*(?:\r\n?|\n)){0,4}[^\r\n]*\Z

Regex options: None

^(?:[^\r\n]*(?:\r\n?|\n)){0,4}[^\r\n]*$

Regex options: None (“^ and $ match at line breaks” must not be set)

PHP (PCRE) example

if (preg_match('/\A(?>[^\r\n]*(?>\r\n?|\n)){0,4}[^\r\n]*\z/', 
               $_POST['subject'])) {
    print 'Subject contains five or fewer lines';
} else {
    print 'Subject contains more than five lines';
}

Discussion

\A          # Assert position at the beginning of the string.
(?>         # Group but don't capture or keep backtracking positions:
  [^\r\n]*  #   Match zero or more characters except CR and LF.
  (?>       #   Group but don't capture or keep backtracking positions:
    \r\n?   #     Match a CR, with an optional following LF (CRLF).
   |        #    Or:
    \n      #     Match a standalone LF character.
  )         #   End the noncapturing, atomic group.
){0,4}      # End group; repeat between zero and four times.
[^\r\n]*    # Match zero or more characters except CR and LF.
\z          # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Ruby

Variations

Working with esoteric line separators

\A(?>\V*\R){0,4}\V*\z

Regex options: None

\A(?>[^\n-\r\x85\x{2028}\x{2029}]*(?>\r\n?|[\n-\f\x85\x{2028}\x{2029}])){0,4}[^\n-\r\x85\x{2028}\x{2029}]*\z

Regex options: None
Regex flavors: Java 7, PCRE, Perl

\A(?>[^\n-\r\u0085\u2028\u2029]*(?>\r\n?|[\n-\f\u0085\u2028\u2029])){0,4}[^\n-\r\u0085\u2028\u2029]*\z

Regex options: None
Regex flavors: .NET, Java, Ruby 1.9

\A(?>[^\n-\r\x85\u2028\u2029]*(?>\r\n?|[\n-\f\x85\u2028\u2029])){0,4}[^\n-\r\x85\u2028\u2029]*\z

Regex options: None
Regex flavors: .NET, Java

\A(?:[^\n-\r\x85\u2028\u2029]*(?:\r\n?|[\n-\f\x85\u2028\u2029])){0,4}[^\n-\r\x85\u2028\u2029]*\Z

Regex options: None

^(?:[^\n-\r\x85\u2028\u2029]*(?:\r\n?|[\n-\f\x85\u2028\u2029])){0,4}[^\n-\r\x85\u2028\u2029]*$

Regex options: None (“^ and $ match at line breaks” must not be set)

4.11. Validate Affirmative Responses

Solution

Regular expression

^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

JavaScript example

var yes = /^(?:1|t(?:rue)?|y(?:es)?|ok(?:ay)?)$/i;

if (yes.test(subject)) {
    alert("Yes");
} else {
    alert("No");
}

Discussion

^            # Assert position at the beginning of the string.
(?:          # Group but don't capture:
  1          #   Match "1".
 |           #  Or:
  t(?:rue)?  #   Match "t", optionally followed by "rue".
 |           #  Or:
  y(?:es)?   #   Match "y", optionally followed by "es".
 |           #  Or:
  ok(?:ay)?  #   Match "ok", optionally followed by "ay".
)            # End the noncapturing group.
$            # Assert position at the end of the string.

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

4.12. Validate Social Security Numbers

Solution

Regular expression

^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Python example

if re.match(r"^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$", sys.argv[1]):
    print "SSN is valid"
else:
    print "SSN is invalid"

Discussion

^            # Assert position at the beginning of the string.
(?!000|666)  # Assert that neither "000" nor "666" can be matched here.
[0-8]        # Match a digit between 0 and 8.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!00)       # Assert that "00" cannot be matched here.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!0000)     # Assert that "0000" cannot be matched here.
[0-9]{4}     # Match a digit, exactly four times.
$            # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Find Social Security numbers in documents

\b(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4.13. Validate ISBNs

Solution

Regular expressions

^(?:ISBN(?:-10)?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$)[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^
(?:ISBN(?:-10)?:?\ )?     # Optional ISBN/ISBN-10 identifier.
(?=                       # Basic format pre-checks (lookahead):
  [0-9X]{10}$             #   Require 10 digits/Xs (no separators).
 |                        #  Or:
  (?=(?:[0-9]+[-\ ]){3})  #   Require 3 separators
  [-\ 0-9X]{13}$          #     out of 13 characters total.
)                         # End format pre-checks.
[0-9]{1,5}[-\ ]?          # 1-5 digit group identifier.
[0-9]+[-\ ]?[0-9]+[-\ ]?  # Publisher and title identifiers.
[0-9X]                    # Check digit.
$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:ISBN(?:-13)?:? )?(?=[0-9]{13}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)97[89][- ]?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^
(?:ISBN(?:-13)?:?\ )?     # Optional ISBN/ISBN-13 identifier.
(?=                       # Basic format pre-checks (lookahead):
  [0-9]{13}$              #   Require 13 digits (no separators).
 |                        #  Or:
  (?=(?:[0-9]+[-\ ]){4})  #   Require 4 separators
  [-\ 0-9]{17}$           #     out of 17 characters total.
)                         # End format pre-checks.
97[89][-\ ]?              # ISBN-13 prefix.
[0-9]{1,5}[-\ ]?          # 1-5 digit group identifier.
[0-9]+[-\ ]?[0-9]+[-\ ]?  # Publisher and title identifiers.
[0-9]                     # Check digit.
$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^
(?:ISBN(?:-1[03])?:?\ )?  # Optional ISBN/ISBN-10/ISBN-13 identifier.
(?=                       # Basic format pre-checks (lookahead):
  [0-9X]{10}$             #   Require 10 digits/Xs (no separators).
 |                        #  Or:
  (?=(?:[0-9]+[-\ ]){3})  #   Require 3 separators
  [-\ 0-9X]{13}$          #     out of 13 characters total.
 |                        #  Or:
  97[89][0-9]{10}$        #   978/979 plus 10 digits (13 total).
 |                        #  Or:
  (?=(?:[0-9]+[-\ ]){4})  #   Require 4 separators
  [-\ 0-9]{17}$           #     out of 17 characters total.
)                         # End format pre-checks.
(?:97[89][-\ ]?)?         # Optional ISBN-13 prefix.
[0-9]{1,5}[-\ ]?          # 1-5 digit group identifier.
[0-9]+[-\ ]?[0-9]+[-\ ]?  # Publisher and title identifiers.
[0-9X]                    # Check digit.
$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

JavaScript example, with checksum validation

var subject = document.getElementById("isbn").value;

// Checks for ISBN-10 or ISBN-13 format
var regex = /^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$/;

if (regex.test(subject)) {
    // Remove non ISBN digits, then split into an array
    var chars = subject.replace(/[- ]|^ISBN(?:-1[03])?:?/g, "").split("");
    // Remove the final ISBN digit from `chars`, and assign it to `last`
    var last = chars.pop();
    var sum = 0;
    var check, i;

    if (chars.length == 9) {
        // Compute the ISBN-10 check digit
        chars.reverse();
        for (i = 0; i < chars.length; i++) {
            sum += (i + 2) * parseInt(chars[i], 10);
        }
        check = 11 - (sum % 11);
        if (check == 10) {
            check = "X";
        } else if (check == 11) {
            check = "0";
        }
    } else {
        // Compute the ISBN-13 check digit
        for (i = 0; i < chars.length; i++) {
            sum += (i % 2 * 2 + 1) * parseInt(chars[i], 10);
        }
        check = 10 - (sum % 10);
        if (check == 10) {
            check = "0";
        }
    }

    if (check == last) {
        alert("Valid ISBN");
    } else {
        alert("Invalid ISBN check digit");
    }
} else {
    alert("Invalid ISBN");
}

Python example, with checksum validation

import re
import sys

subject = sys.argv[1]

# Checks for ISBN-10 or ISBN-13 format
regex = re.compile("^(?:ISBN(?:-1[03])?:? )?(?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]$")

if regex.search(subject):
    # Remove non ISBN digits, then split into a list
    chars = list(re.sub("[- ]|^ISBN(?:-1[03])?:?", "", subject))
    # Remove the final ISBN digit from `chars`, and assign it to `last`
    last = chars.pop()

    if len(chars) == 9:
        # Compute the ISBN-10 check digit
        val = sum((x + 2) * int(y) for x,y in enumerate(reversed(chars)))
        check = 11 - (val % 11)
        if check == 10:
            check = "X"
        elif check == 11:
            check = "0"
    else:
        # Compute the ISBN-13 check digit
        val = sum((x % 2 * 2 + 1) * int(y) for x,y in enumerate(chars))
        check = 10 - (val % 10)
        if check == 10:
            check = "0"

    if (str(check) == last):
        print("Valid ISBN")
    else:
        print("Invalid ISBN check digit")
else:
    print("Invalid ISBN")

Discussion

ISBN-10 checksum

Step 1:
sum = 10×0 + 9×5 + 8×9 + 7×6 + 6×5 + 5×2 + 4×0 + 3×6 + 2×8
    =    0 +  45 +  72 +  42 +  30 +  10 +   0 +  18 +  16
    = 233
Step 2:
    233 ÷ 11 = 21, remainder 2
Step 3:
    11 − 2 = 9
Step 4:
    9 [no substitution required]

ISBN-13 checksum

Step 1:
sum = 1×9 + 3×7 + 1×8 + 3×0 + 1×5 + 3×9 + 1×6 + 3×5 + 1×2 + 3×0 + 1×6 + 3×8
    =   9 +  21 +   8 +   0 +   5 +  27 +   6 +  15 +   2 +   0 +   6 +  24
    = 123
Step 2:
    123 ÷ 10 = 12, remainder 3
Step 3:
    10 − 3 = 7
Step 4:
    7 [no substitution required]

Variations

Find ISBNs in documents

\bISBN(?:-1[03])?:? (?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)(?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Eliminate incorrect ISBN identifiers

^
(?:ISBN(-1(?:(0)|3))?:?\ )?
(?(1)
  (?(2)
    # ISBN-10
    (?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$)
    [0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]
   |
    # ISBN-13
    (?=[0-9]{13}$|(?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)
    97[89][- ]?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9]
  )
 |
  # No explicit identifier; allow ISBN-10 or ISBN-13
  (?=[0-9X]{10}$|(?=(?:[0-9]+[- ]){3})[- 0-9X]{13}$|97[89][0-9]{10}$|
    (?=(?:[0-9]+[- ]){4})[- 0-9]{17}$)
  (?:97[89][- ]?)?[0-9]{1,5}[- ]?[0-9]+[- ]?[0-9]+[- ]?[0-9X]
)
$

Regex options: Free-spacing
Regex flavors: .NET, PCRE, Perl, Python

4.14. Validate ZIP Codes

Solution

Regular expression

^[0-9]{5}(?:-[0-9]{4})?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

VB.NET example

If Regex.IsMatch(subjectString, "^[0-9]{5}(?:-[0-9]{4})?$") Then
    Console.WriteLine("Valid ZIP code")
Else
    Console.WriteLine("Invalid ZIP code")
End If

Discussion

^           # Assert position at the beginning of the string.
[0-9]{5}    # Match a digit, exactly five times.
(?:         # Group but don't capture:
  -         #   Match a literal "-".
  [0-9]{4}  #   Match a digit, exactly four times.
)           # End the noncapturing group.
  ?         #   Make the group optional.
$           # Assert position at the end of the string.

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

4.15. Validate Canadian Postal Codes

Solution

^(?!.*[DFIOQU])[A-VXY][0-9][A-Z] ?[0-9][A-Z][0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4.16. Validate U.K. Postcodes

Solution

^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

^(?:(?:[A-PR-UWYZ][0-9]{1,2}|[A-PR-UWYZ][A-HK-Y][0-9]{1,2}|[A-PR-UWYZ][0-9][A-HJKSTUW]|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]) [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4.17. Find Addresses with Post Office Boxes

Solution

Regular expression

^(?:Post(?:al)? (?:Office )?|P[. ]?O\.? )?Box\b

Regex options: Case insensitive, ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

C# example

Regex regexObj = new Regex(
    @"^(?:Post(?:al)? (?:Office )?|P[. ]?O\.? )?Box\b",
    RegexOptions.IgnoreCase | RegexOptions.Multiline
);
if (regexObj.IsMatch(subjectString) {
    Console.WriteLine("The value does not appear to be a street address");
} else {
    Console.WriteLine("Good to go");
}

Discussion

^                # Assert position at the beginning of a line.
(?:              # Group but don't capture:
  Post(?:al)?\   #   Match "Post " or "Postal ".
  (?:Office\ )?  #   Optionally match "Office ".
 |               #  Or:
  P[.\ ]?        #   Match "P" and an optional period or space character.
  O\.?\          #   Match "O", an optional period, and a space character.
)?               # Make the group optional.
Box              # Match "Box".
\b               # Assert position at a word boundary.

Regex options: Case insensitive, ^ and $ match at line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

4.18. Reformat Names From “FirstName LastName” to “LastName, FirstName”

Solution

Regular expression

^(.+?) ([^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

$2, $1$3

\2, \1\3

JavaScript example

function formatName(name) {
    return name.replace(/^(.+?) ([^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$/i,
                        "$2, $1$3");
}

Discussion

^              # Assert position at the beginning of the string.
(              # Capture the enclosed match to backreference 1:
  .+?          #   Match one or more characters, as few times as possible.
)              # End the capturing group.
\              # Match a literal space character.
(              # Capture the enclosed match to backreference 2:
  [^\s,]+      #   Match one or more non-whitespace/comma characters.
)              # End the capturing group.
(              # Capture the enclosed match to backreference 3:
  ,?\          #   Match ", " or " ".
  (?:          #   Group but don't capture:
    [JS]r\.?   #     Match "Jr", "Jr.", "Sr", or "Sr.".
   |           #    Or:
    III?       #     Match "II" or "III".
   |           #    Or:
    IV         #     Match "IV".
  )            #   End the noncapturing group.
)?             # Make the group optional.
$              # Assert position at the end of the string.

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

List surname particles at the beginning of the name

^(.+?) ((?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n) )*[^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

$2, $1$3

\2, \1\3

4.19. Validate Password Complexity

Solution

Length between 8 and 32 characters

^.{8,32}$

Regex options: Dot matches line breaks (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^[\s\S]{8,32}$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

ASCII visible and space characters only

^[\x20-\x7E]+$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

One or more uppercase letters

[A-Z]

Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\p{Lu}

Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9

One or more lowercase letters

[a-z]

Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\p{Ll}

Regex options: None (“case insensitive” must not be set)
Regex flavors: .NET, Java, PCRE, Perl, Ruby 1.9

One or more numbers

[0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

One or more special characters

[ !"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

[^A-Za-z0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Disallow three or more sequential identical characters

(.)\1\1

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

([\s\S])\1\1

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Example JavaScript solution, basic

function validate(password) {
    var minMaxLength = /^[\s\S]{8,32}$/,
        upper = /[A-Z]/,
        lower = /[a-z]/,
        number = /[0-9]/,
        special = /[ !"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]/;

    if (minMaxLength.test(password) &&
        upper.test(password) &&
        lower.test(password) &&
        number.test(password) &&
        special.test(password)
    ) {
        return true;
    }

    return false;
}

Example JavaScript solution, with x out of y validation

function validate(password) {
    var minMaxLength = /^[\s\S]{8,32}$/,
        upper = /[A-Z]/,
        lower = /[a-z]/,
        number = /[0-9]/,
        special = /[^A-Za-z0-9]/,
        count = 0;

    if (minMaxLength.test(password)) {
        // Only need 3 out of 4 of these to match
        if (upper.test(password)) count++;
        if (lower.test(password)) count++;
        if (number.test(password)) count++;
        if (special.test(password)) count++;
    }

    return count >= 3;
}

Example JavaScript solution, with password security ranking

var rank = {
    TOO_SHORT: 0,
    WEAK: 1,
    MEDIUM: 2,
    STRONG: 3,
    VERY_STRONG: 4
};

function rankPassword(password) {
    var upper = /[A-Z]/,
        lower = /[a-z]/,
        number = /[0-9]/,
        special = /[^A-Za-z0-9]/,
        minLength = 8,
        score = 0;

    if (password.length < minLength) {
        return rank.TOO_SHORT; // End early
    }

    // Increment the score for each of these conditions
    if (upper.test(password)) score++;
    if (lower.test(password)) score++;
    if (number.test(password)) score++;
    if (special.test(password)) score++;

    // Penalize if there aren't at least three char types
    if (score < 3) score--;

    if (password.length > minLength) {
        // Increment the score for every 2 chars longer than the minimum
        score += Math.floor((password.length - minLength) / 2);
    }

    // Return a ranking based on the calculated score
    if (score < 3) return rank.WEAK; // score is 2 or lower
    if (score < 4) return rank.MEDIUM; // score is 3
    if (score < 6) return rank.STRONG; // score is 4 or 5
    return rank.VERY_STRONG; // score is 6 or higher
}

// Test it...
var result = rankPassword("password1"),
    labels = ["Too Short", "Weak", "Medium", "Strong", "Very Strong"];

alert(labels[result]); // -> Weak

Variations

Validate multiple password rules with a single regex

^(?=.{8,32}$)(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).*

Regex options: Dot matches line breaks (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

4.20. Validate Credit Card Numbers

Solution

Strip spaces and hyphens

[ -]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Validate the number

^(?:
(?<visa>4[0-9]{12}(?:[0-9]{3})?) |
(?<mastercard>5[1-5][0-9]{14}) |
(?<discover>6(?:011|5[0-9]{2})[0-9]{12}) |
(?<amex>3[47][0-9]{13}) |
(?<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |
(?<jcb>(?:2131|1800|35[0-9]{3})[0-9]{11})
)$

Regex options: Free-spacing
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?:
(?P<visa>4[0-9]{12}(?:[0-9]{3})?) |
(?P<mastercard>5[1-5][0-9]{14}) |
(?P<discover>6(?:011|5[0-9]{2})[0-9]{12}) |
(?P<amex>3[47][0-9]{13}) |
(?P<diners>3(?:0[0-5]|[68][0-9])[0-9]{11}) |
(?P<jcb>(?:2131|1800|35[0-9]{3})[0-9]{11})
)$

Regex options: Free-spacing
Regex flavors: PCRE, Python

^(?:
(4[0-9]{12}(?:[0-9]{3})?) |          # Visa
(5[1-5][0-9]{14}) |                  # MasterCard
(6(?:011|5[0-9]{2})[0-9]{12}) |      # Discover
(3[47][0-9]{13}) |                   # AMEX
(3(?:0[0-5]|[68][0-9])[0-9]{11}) |   # Diners Club
((?:2131|1800|35[0-9]{3})[0-9]{11})  # JCB
)$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9]{2})[0-9]{12})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35[0-9]{3})[0-9]{11}))$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Example web page with JavaScript

<html>
<head>
<title>Credit Card Test</title>
</head>

<body>
<h1>Credit Card Test</h1>

<form>
<p>Please enter your credit card number:</p>

<p><input type="text" size="20" name="cardnumber"
  onkeyup="validatecardnumber(this.value)"></p>

<p id="notice">(no card number entered)</p>
</form>

<script>
function validatecardnumber(cardnumber) {
  // Strip spaces and dashes
  cardnumber = cardnumber.replace(/[ -]/g, '');
  // See if the card is valid
  // The regex will capture the number in one of the capturing groups
  var match = /^(?:(4[0-9]{12}(?:[0-9]{3})?)|(5[1-5][0-9]{14})|(6(?:011|5[0-9]{2})[0-9]{12})|(3[47][0-9]{13})|(3(?:0[0-5]|[68][0-9])[0-9]{11})|((?:2131|1800|35[0-9]{3})[0-9]{11}))$/.exec(cardnumber);
  if (match) {
    // List of card types, in the same order as the regex capturing groups
    var types = ['Visa', 'MasterCard', 'Discover', 'American Express',
                 'Diners Club', 'JCB'];
    // Find the capturing group that matched
    // Skip the zeroth element of the match array (the overall match)
    for (var i = 1; i < match.length; i++) {
      if (match[i]) {
        // Display the card type for that group
        document.getElementById('notice').innerHTML = types[i - 1];
        break;
      }
    }
  } else {
    document.getElementById('notice').innerHTML = '(invalid card number)';
  }
}
</script>
</body>
</html>

Discussion

Validate the number

^(?:
4[0-9]{12}(?:[0-9]{3})? |         # Visa
5[1-5][0-9]{14} |                 # MasterCard
3[47][0-9]{13}                    # AMEX
)$

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Extra Validation with the Luhn Algorithm

function luhn(cardnumber) {
  // Build an array with the digits in the card number
  var digits = cardnumber.split('');
  for (var i = 0; i < digits.length; i++) {
    digits[i] = parseInt(digits[i], 10);
  }  
  // Run the Luhn algorithm on the array
  var sum = 0;
  var alt = false;
  for (i = digits.length - 1; i >= 0; i--) {
    if (alt) {
      digits[i] *= 2;
      if (digits[i] > 9) {
        digits[i] -= 9;
      }
    }
    sum += digits[i];
    alt = !alt;
  }
  // Check the result
  if (sum % 10 == 0) {
    document.getElementById('notice').innerHTML += '; Luhn check passed';
  } else {
    document.getElementById('notice').innerHTML += '; Luhn check failed';
  }
}

4.21. European VAT Numbers

Solution

Strip whitespace and punctuation

[-. ]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Validate the number

^(
(AT)?U[0-9]{8} |                              # Austria
(BE)?0[0-9]{9} |                              # Belgium
(BG)?[0-9]{9,10} |                            # Bulgaria
(CY)?[0-9]{8}L |                              # Cyprus
(CZ)?[0-9]{8,10} |                            # Czech Republic
(DE)?[0-9]{9} |                               # Germany
(DK)?[0-9]{8} |                               # Denmark
(EE)?[0-9]{9} |                               # Estonia
(EL|GR)?[0-9]{9} |                            # Greece
(ES)?[0-9A-Z][0-9]{7}[0-9A-Z] |               # Spain
(FI)?[0-9]{8} |                               # Finland
(FR)?[0-9A-Z]{2}[0-9]{9} |                    # France
(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3}) | # United Kingdom
(HU)?[0-9]{8} |                               # Hungary
(IE)?[0-9]S[0-9]{5}L |                        # Ireland
(IT)?[0-9]{11} |                              # Italy
(LT)?([0-9]{9}|[0-9]{12}) |                   # Lithuania
(LU)?[0-9]{8} |                               # Luxembourg
(LV)?[0-9]{11} |                              # Latvia
(MT)?[0-9]{8} |                               # Malta
(NL)?[0-9]{9}B[0-9]{2} |                      # Netherlands
(PL)?[0-9]{10} |                              # Poland
(PT)?[0-9]{9} |                               # Portugal
(RO)?[0-9]{2,10} |                            # Romania
(SE)?[0-9]{12} |                              # Sweden
(SI)?[0-9]{8} |                               # Slovenia
(SK)?[0-9]{10}                                # Slovakia
)$

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^((AT)?U[0-9]{8}|(BE)?0[0-9]{9}|(BG)?[0-9]{9,10}|(CY)?[0-9]{8}L|(CZ)?[0-9]{8,10}|(DE)?[0-9]{9}|(DK)?[0-9]{8}|(EE)?[0-9]{9}|(EL|GR)?[0-9]{9}|(ES)?[0-9A-Z][0-9]{7}[0-9A-Z]|(FI)?[0-9]{8}|(FR)?[0-9A-Z]{2}[0-9]{9}|(GB)?([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})|(HU)?[0-9]{8}|(IE)?[0-9]S[0-9]{5}L|(IT)?[0-9]{11}|(LT)?([0-9]{9}|[0-9]{12})|(LU)?[0-9]{8}|(LV)?[0-9]{11}|(MT)?[0-9]{8}|(NL)?[0-9]{9}B[0-9]{2}|(PL)?[0-9]{10}|(PT)?[0-9]{9}|(RO)?[0-9]{2,10}|(SE)?[0-9]{12}|(SI)?[0-9]{8}|(SK)?[0-9]{10})$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

5. Words, Lines, and Special Characters

5.1. Find a Specific Word

Solution

\bcat\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

// 8-bit-wide letter characters
var pL = "A-Za-z\xAA\xB5\xBA\xC0-\xD6\xD8-\xF6\xF8-\xFF",
    pattern = "([^{L}]|^)cat([^{L}]|$)".replace(/{L}/g, pL),
    regex = new RegExp(pattern, "gi");

// replace cat with dog, and put back any
// additional matched characters
subject = subject.replace(regex, "$1dog$2");

5.2. Find Any of Multiple Words

Solution

Using alternation

\b(?:one|two|three)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Example JavaScript solution

var subject = "One times two plus one equals three.";

// Solution 1:

var regex = /\b(?:one|two|three)\b/gi;

subject.match(regex);
// Returns an array with four matches: ["One","two","one","three"]

// Solution 2 (reusable):

// This function does the same thing but accepts an array of words to
// match. Any regex metacharacters within the accepted words are escaped
// with a backslash before searching.

function matchWords(subject, words) {
    var regexMetachars = /[(){[*+?.\\^$|]/g;

    for (var i = 0; i < words.length; i++) {
        words[i] = words[i].replace(regexMetachars, "\\$&");
    }

    var regex = new RegExp("\\b(?:" + words.join("|") + ")\\b", "gi");

    return subject.match(regex) || [];
}

matchWords(subject, ["one","two","three"]);
// Returns an array with four matches: ["One","two","one","three"]

5.3. Find Similar Words

Solution

Color or colour

\bcolou?r\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Bat, cat, or rat

\b[bcr]at\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Words ending with “phobia”

\b\w*phobia\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Steve, Steven, or Stephen

\bSte(?:ven?|phen)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations of “regular expression”

\breg(?:ular expressions?|ex(?:ps?|e[sn])?)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

Variations of “regular expression”

\b              # Assert position at a word boundary.
reg             # Match "reg".
(?:             # Group but don't capture:
  ular\         #   Match "ular ".
  expressions?  #   Match "expression" or "expressions".
 |              #  Or:
  ex            #   Match "ex".
  (?:           #   Group but don't capture:
    ps?         #     Match "p" or "ps".
   |            #    Or:
    e[sn]       #     Match "es" or "en".
  )?            #   End the group and make it optional.
)               # End the group.
\b              # Assert position at a word boundary.

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

5.4. Find All Except a Specific Word

Solution

\b(?!cat\b)\w+

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

\b     # Assert position at a word boundary.
(?!    # Not followed by:
  cat  #   Match "cat".
  \b   #   Assert position at a word boundary.
)      # End the negative lookahead.
\w+    # Match one or more word characters.

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Find words that don’t contain another word

\b(?:(?!cat)\w)+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

5.5. Find Any Word Not Followed by a Specific Word

Solution

\b\w+\b(?!\W+cat\b)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

\b\w+\b(?=\W+cat\b)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

5.6. Find Any Word Not Preceded by a Specific Word

Solution

Words not preceded by “cat”

(?<!\bcat\W+)\b\w+

Regex options: Case insensitive

(?<!\bcat\W{1,9})\b\w+

Regex options: Case insensitive
Regex flavors: .NET, Java

(?<!\bcat\W)\b\w+

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python

(?<!\Wcat\W)(?<!^cat\W)\b\w+

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

Simulate lookbehind

var subject = "My cat is fluffy.",
    mainRegex = /\b\w+/g,
    lookbehind = /\bcat\W+$/i,
    lookbehindType = false, // false for negative, true for positive
    matches = [],
    match,
    leftContext;

while (match = mainRegex.exec(subject)) {
    leftContext = subject.substring(0, match.index);

    if (lookbehindType == lookbehind.test(leftContext)) {
        matches.push(match[0]);
    } else {
        mainRegex.lastIndex = match.index + 1;
    }
}

// matches:  ["My", "cat", "fluffy"]

Variations

(?<=\bcat\W+)\w+

Regex options: Case insensitive

(?<=\bcat\W{1,9})\w+

Regex options: Case insensitive
Regex flavors: .NET, Java

(?<=\bcat\W)\w+

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python

(?:(?<=\Wcat\W)|(?<=^cat\W))\w+

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

\bcat\W+\K\w+

Regex options: Case insensitive
Regex flavors: PCRE 7.2, Perl 5.10

5.7. Find Words Near Each Other

Solution

\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:
  word1                 # first term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word2                 # second term
|                       #   or, the same pattern in reverse:
  word2                 # second term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word1                 # first term
)\b

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Using a conditional

\b(?:word1|(word2))\W+(?:\w+\W+){0,5}?(?(1)word1|word2)\b

Regex options: None
Regex flavors: .NET, PCRE, Perl, Python

\b(?:(?<w1>word1)|(?<w2>word2))\W+(?:\w+\W+){0,5}?(?(w2)(?&w1)|(?&w2))\b

Regex options: None
Regex flavors: PCRE 7, Perl 5.10

Match three or more words near each other

\b(?:(?>(word1)|(word2)|(word3)|(?(1)|(?(2)|(?(3)|(?!))))\w+)\b\W*?){3,8}(?(1)(?(2)(?(3)|(?!))|(?!))|(?!))

Regex options: Case insensitive
Regex flavors: .NET, PCRE, Perl

\b(?:(?:(word1)|(word2)|(word3)|(?(1)|(?(2)|(?(3)|(?!))))\w+)\b\W*?){3,8}(?(1)(?(2)(?(3)|(?!))|(?!))|(?!))

Regex options: Case insensitive
Regex flavors: .NET, PCRE, Perl, Python

\b(?:(?>word1()|word2()|word3()|(?>\1|\2|\3)\w+)\b\W*?){3,8}\1\2\3

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Ruby

\b(?:(?:word1()|word2()|word3()|(?:\1|\2|\3)\w+)\b\W*?){3,8}\1\2\3

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\b(?:(?>word1()|word2()|word3()|word4()|(?>\1|\2|\3|\4)\w+)\b\W*?){4,9}\1\2\3\4

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Ruby

\b(?:(?:word1()|word2()|word3()|word4()|(?:\1|\2|\3|\4)\w+)\b\W*?){4,9}\1\2\3\4

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Multiple words, any distance from each other

^(?=.*?\bword1\b)(?=.*?\bword2\b).*

Regex options: Case insensitive, dot matches line breaks (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(?=[\s\S]*?\bword1\b)(?=[\s\S]*?\bword2\b)[\s\S]*

Regex options: Case insensitive (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

5.8. Find Repeated Words

Solution

\b([A-Z]+)\s+\1\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

(?<![\p{L}\p{M}\-'\u2019])([\-'\u2019]?(?:[\p{L}\p{M}][\-'\u2019]?)+)\s+\1(?![\p{L}\p{M}\-'\u2019])

Regex options: Case insensitive
Regex flavors: .NET, Java, Ruby 1.9

(?<![\p{L}\p{M}\-'\x{2019}])([\-'\x{2019}]?(?:[\p{L}\p{M}][\-'\x{2019}]?)+)\s+\1(?![\p{L}\p{M}\-'\x{2019}])

Regex options: Case insensitive
Regex flavors: Java 7, PCRE, Perl

5.9. Remove Duplicate Lines

Solution

Option 1: Sort lines and remove adjacent duplicates

^(.*)(?:(?:\r?\n|\r)\1)+$

Regex options: ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

$1

\1

Option 2: Keep the last occurrence of each duplicate line in an unsorted file

^([^\r\n]*)(?:\r?\n|\r)(?=.*^\1$)

Regex options: Dot matches line breaks, ^ and $ match at line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$)

Regex options: ^ and $ match at line breaks (“dot matches line breaks” must not be set)

Option 3: Keep the first occurrence of each duplicate line in an unsorted file

^([^\r\n]*)$(.*?)(?:(?:\r?\n|\r)\1$)+

Regex options: Dot matches line breaks, ^ and $ match at line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

^(.*)$([\s\S]*?)(?:(?:\r?\n|\r)\1$)+

Regex options: ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

$1$2

\1\2

Discussion

Option 3: Keep the first occurrence of each duplicate line in an unsorted file

value1
value2
value2
value3
value3
value1
value2

5.10. Match Complete Lines That Contain a Word

Solution

^.*\berror\b.*$

Regex options: Case insensitive, ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

^.*\b(one|two|three)\b.*$

Regex options: Case insensitive, ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).+$

Regex options: Case insensitive, ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

5.11. Match Complete Lines That Do Not Contain a Word

Solution

^(?:(?!\berror\b).)*$

Regex options: Case insensitive, ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

5.12. Trim Leading and Trailing Whitespace

Solution

\A\s+

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^\s+

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\s+\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\s+$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Discussion

sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

// Add the trim method for browsers that don't already include it
if (!String.prototype.trim) {
    String.prototype.trim = function() {
        return this.replace(/^\s+/, "").replace(/\s+$/, "");
    };
}

5.13. Replace Repeated Whitespace with a Single Space

Solution

Clean any whitespace characters

\s+

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Clean horizontal whitespace characters

[ \t\xA0]+

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby 1.8

[ \t\u00A0]+

Regex options: None
Regex flavors: .NET, Java, JavaScript, Python, Ruby 1.9

\h+

Regex options: None
Regex flavors: PCRE 7.2, Perl 5.10

5.14. Escape Regular Expression Metacharacters

Solution

Regular expression

[[\]{}()*+?.\\|^$\-,&#\s]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

\$&

\$0

\\$&

\\$0

\\\0

\\\&

\\\g<0>

Example JavaScript function

RegExp.escape = function(str) {
    return str.replace(/[[\]{}()*+?.\\|^$\-,&#\s]/g, "\\$&");
};

// Test it...
var str = "<Hello World.>";
var escapedStr = RegExp.escape(str);
alert(escapedStr == "<Hello\\ World\\.>"); // -> true

6. Numbers

6.1. Integer Numbers

Solution

\b[0-9]+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A[0-9]+\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

(?<=^|\s)[0-9]+(?=$|\s)

Regex options: None
Regex flavors: .NET, Java, PCRE, Ruby 1.9

(?:^|(?<=\s))[0-9]+(?=$|\s)

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

(^|\s)([0-9]+)(?=$|\s)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

[+-]?\b[0-9]+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A[+-]?[0-9]+\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[+-]?[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

([+-] *)?\b[0-9]+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.2. Hexadecimal Numbers

Solution

\b[0-9A-F]+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-Fa-f]+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A[0-9A-F]+\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[0-9A-F]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\b0x[0-9A-F]+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

&H[0-9A-F]+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-F]+H\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-F]{2}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-F]{4}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-F]{8}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9A-F]{16}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:[0-9A-F]{2})+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.3. Binary Numbers

Solution

\b[01]+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A[01]+\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[01]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\b0b[01]+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[01]+B\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[01]{8}\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[01]{16}\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:[01]{8})+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.4. Octal Numbers

Solution

\b0[0-7]*\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A0[0-7]*\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^0[0-7]*$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\b0o[0-7]+\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.5. Decimal Numbers

Solution

\b(0|[1-9][0-9]*)\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A(0|[1-9][0-9]*)\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(0|[1-9][0-9]*)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

6.6. Strip Leading Zeros

Solution

Regular expression

\b0*([1-9][0-9]*|0)\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement

$1

\1

Getting the numbers in Perl

while ($subject =~ m/\b0*([1-9][0-9]*|0)\b/g) {
    push(@list, $1);
}

Stripping leading zeros in PHP

$result = preg_replace('/\b0*([1-9][0-9]*|0)\b/', '$1', $subject);

6.7. Numbers Within a Certain Range

Solution

^(1[0-2]|[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(2[0-4]|1[0-9]|[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3[01]|[12][0-9]|[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(5[0-3]|[1-4][0-9]|[1-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[1-5]?[0-9]$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(100|[1-9]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(100|[1-9][0-9]?)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(12[0-6]|1[01][0-9]|[4-9][0-9]|3[2-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(12[0-7]|1[01][0-9]|[1-9]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(12[0-7]|1[01][0-9]|[1-9]?[0-9]|-(12[0-8]|1[01][0-9]|[1-9]?[0-9]))$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|[1-9][0-9]?)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(19|20)[0-9]{2}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|[1-9][0-9]{1,3}|[0-9]|-(3276[0-8]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|[1-9][0-9]{1,3}|[0-9]))$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

[45][0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

4[4-9]|5[0-5]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

3[4-9]|[45][0-9]|6[0-5]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

1[0-2]|[1-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8[5-9]|9[0-9]|10[0-9]|11[0-7]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6553[0-5]|655[0-2][0-9]|65[0-4][0-9][0-9]|6[0-4][0-9][0-9][0-9]|[1-5][0-9][0-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|[1-9][0-9][0-9]|[1-9][0-9]|[0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{3}|[1-9][0-9]{2}|[1-9][0-9]|[0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6(?:553[0-5]|55[0-2][0-9]|5[0-4][0-9]{2}|[0-4][0-9]{3})|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.8. Hexadecimal Numbers Within a Certain Range

Solution

^[1-9a-c]$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(1[0-8]|[1-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(1[0-9a-f]|[1-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3[0-5]|[12][0-9a-f]|[1-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(3[0-9a-b]|[12]?[0-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(6[0-4]|[1-5]?[0-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(6[0-4]|[1-5][0-9a-f]|[1-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(7[0-9a-e]|[2-6][0-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[1-7]?[0-9a-f]$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[1-9a-f]?[0-9a-f]$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(16[0-9a-e]|1[0-5][0-9a-f]|[1-9a-f][0-9a-f]?)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(83[0-3]|8[0-2][0-9a-f]|7[7-9a-f][0-9a-f]|76[c-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^([1-7][0-9a-f]{3}|[1-9a-f][0-9a-f]{1,2}|[0-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^([1-9a-f][0-9a-f]{1,3}|[0-9a-f])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.9. Integer Numbers with Separators

Solution

\b[0-9]+(_+[0-9]+)*\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b0x[0-9A-F]+(_+[0-9A-F]+)*\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b0b[01]+(_+[01]+)*\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

6.10. Floating-Point Numbers

Solution

^[-+][0-9]+\.[0-9]+[eE][-+]?[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+][0-9]+\.[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+]?[0-9]+\.[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+]?[0-9]*\.[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][-+]?[0-9]+)?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[-+]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+)?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

[-+]?(\b[0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][-+]?[0-9]+\b)?

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.11. Numbers with Thousand Separators

Solution

^[0-9]{1,3}(,[0-9]{3})*\.[0-9]+$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^([0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?|\.[0-9]+)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?\b|\.[0-9]+\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

6.12. Add Thousand Separators to Numbers

Solution

Basic solution

[0-9](?=(?:[0-9]{3})+(?![0-9]))

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

$&,

$0,

\0,

\&,

\g<0>,

Match separator positions only, using lookbehind

(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

Variations

Don’t add commas after a decimal point

[0-9](?=(?:[0-9]{3})+(?![0-9]))(?<!\.[0-9]+)

Regex options: None
Regex flavors: .NET

[0-9](?=(?:[0-9]{3})+(?![0-9]))(?<!\.[0-9]{1,100})

Regex options: None
Regex flavors: .NET, Java

$0,

\b(?<!\.)[0-9]{4,}

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

subject.gsub(/\b(?<!\.)[0-9]{4,}/) {|match|
    match.gsub(/[0-9](?=(?:[0-9]{3})+(?![0-9]))/, '\0,')
}

subject.replace(/(^|[^0-9.])([0-9]{4,})/g, function($0, $1, $2) {
    return $1 + $2.replace(/[0-9](?=(?:[0-9]{3})+(?![0-9]))/g, "$&,");
});

subject.split(").reverse().join(").replace(/[0-9]{3}(?=[0-9])(?![0-9]*\.)/g, "$&,").split(").reverse().join(");

function commafy(num) {
    num = String(num);
    var numParts = /^([0-9]+)(\.[0-9]+)?$/.exec(num);
    var result = numParts[1].replace(/[0-9](?=(?:[0-9]{3})+(?![0-9]))/, "$&,");
    if (numParts[2]) {
        result += numParts[2];
    }
    return result;
}

// Test it...
commafy(10000); // "10,000"
commafy(10000.1234); // "10,000.1234"
commafy(.1234); // "0.1234"
commafy("a"); // Throws a TypeError

6.13. Roman Numerals

Solution

^[MDCLXVI]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?=[MDCLXVI])M*(C[MD]|D?C*)(X[CL]|L?X*)(I[XV]|V?I*)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?=[MDCLXVI])M*D?C{0,4}L?X{0,4}V?I{0,4}$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Convert Roman Numerals to Decimal

sub roman2decimal {
    my $roman = shift;
    if ($roman =~
        m/^(?=[MDCLXVI])
          (M*)               # 1000
          (C[MD]|D?C{0,3})   # 100
          (X[CL]|L?X{0,3})   # 10
          (I[XV]|V?I{0,3})   # 1
          $/ix)
    {
        # Roman numeral found
        my %r2d = ('I' =>    1, 'IV' =>   4, 'V' =>   5, 'IX' =>   9,
                   'X' =>   10, 'XL' =>  40, 'L' =>  50, 'XC' =>  90,
                   'C' =>  100, 'CD' => 400, 'D' => 500, 'CM' => 900,
                   'M' => 1000);
        my $decimal = 0;
        while ($roman =~ m/[MDLV]|C[MD]?|X[CL]?|I[XV]?/ig) {
            $decimal += $r2d{uc($&)};
        }
        return $decimal;
    } else {
        # Not a Roman numeral
        return 0;
    }
}

7. Source Code and Log Files

7.1. Keywords

Solution

\b(?:end|in|inline|inherited|item|object)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?>end|in(?:line|herited)?|item|object)\b

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Variations

object Button1: TButton
    Caption = 'The end is near'
end

\b(end|in|inline|inherited|item|object)\b|'[^'\r\n]*(?:''[^'\r\n]*)*'

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.2. Identifiers

Solution

\b[a-z_][0-9a-z_]{0,31}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.3. Numeric Constants

Solution

\b(?:(?<dec>[1-9][0-9]*)
   | (?<oct>0[0-7]*)
   | 0x(?<hex>[0-9A-F]+)
   | 0b(?<bin>[01]+)
  )(?<L>L)?\b

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

\b(?:(?P<dec>[1-9][0-9]*)
   | (?P<oct>0[0-7]*)
   | 0x(?P<hex>[0-9A-F]+)
   | 0b(?P<bin>[01]+)
  )(?P<L>L)?\b

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4, Perl 5.10, Python

\b(?:([1-9][0-9]*)|(0[0-7]*)|0x([0-9A-F]+)|0b([01]+))(L)?\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.4. Operators

Solution

[-+*/=<>%&^|!~?]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.5. Single-Line Comments

Solution

//.*

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.6. Multiline Comments

Solution

/\*.*?\*/

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

/\*[\s\S]*?\*/

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

/\*.*?(?:\*/)?

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

/\*[\s\S]*?(?:\*/)?

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.7. All Comments

Solution

(?-s://.*)|(?s:/\*.*?\*/)

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl

(?-m://.*)|(?m:/\*.*?\*/)

Regex options: None
Regex flavors: Ruby

//[^\r\n]*|/\*.*?\*/

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

//.*|/\*[\s\S]*?\*/

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.8. Strings

Solution

"[^"\r\n]*(?:""[^"\r\n]*)*"

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

'[^'\r\n]*(?:''[^'\r\n]*)*'

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

"[^"\r\n]*(?:""[^"\r\n]*)*"|'[^'\r\n]*(?:''[^'\r\n]*)*'

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

"[^"]*(?:""[^"]*)*"

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

"[^"\r\n]*(?:""[^"\r\n]*)*"?

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.9. Strings with Escapes

Solution

"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)"

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)'

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)"|'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)'

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

"[^"\\\r\n]*(?:\\(?:.|\r?\n)[^"\\\r\n]*)"

Regex options: None (make sure “dot matches line breaks” is off)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

"[^"\\]*(?:\\.[^"\\]*)*"

Regex options: None
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

"[^"\\]*(?:\\[\s\S][^"\\]*)*"

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.10. Regex Literals

Solution

(?<=[=:(,](?:\s*!)?\s*)/[^/\\\r\n]*(?:\\.[^/\\\r\n]*)*/

Regex options: None
Regex flavors: .NET

[=:(,](?:\s*!)?\s*\K/[^/\\\r\n]*(?:\\.[^/\\\r\n]*)*/

Regex options: None
Regex flavors: PCRE 7.2, Perl 5.10

(?<=[=:(,](?:\s{0,10}+!)?\s{0,10})/[^/\\\r\n]*(?:\\.[^/\\\r\n]*)*/

Regex options: None
Regex flavors: .NET, Java

[=:(,](?:\s*!)?+\s*(/[^/\\\r\n]*(?:\\.[^/\\\r\n]*)*/)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.11. Here Documents

Solution

<<(["']?)([A-Za-z]+)\b\1.*?^\2\b

Regex options: Dot matches line breaks, ^ and $ match at line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<<(["']?)([A-Za-z]+)\b\1[\s\S]*?^\2\b

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.12. Common Log Format

Solution

^(?<client>\S+) \S+ (?<userid>\S+) \[(?<datetime>[^\]]+)\] "(?<method>[A-Z]+) (?<request>[^ "]+)? HTTP/[0-9.]+" (?<status>[0-9]{3}) (?<size>[0-9]+|-)

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?P<client>\S+) \S+ (?P<userid>\S+) \[(?P<datetime>[^\]]+)\] "(?P<method>[A-Z]+) (?P<request>[^ "]+)? HTTP/[0-9.]+" (?P<status>[0-9]{3}) (?P<size>[0-9]+|-)

Regex options: ^ and $ match at line breaks
Regex flavors: PCRE 4, Perl 5.10, Python

^(\S+) \S+ (\S+) \[([^\]]+)\] "([A-Z]+) ([^ "]+)? HTTP/[0-9.]+" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

^(?<client>\S+) \S+ (?<userid>\S+) \[(?<day>[0-9]{2})/(?<month>[A-Za-z]+)/(?<year>[0-9]{4}):(?<hour>[0-9]{2}):(?<min>[0-9]{2}):(?<sec>[0-9]{2}) (?<zone>[-+][0-9]{4})\] "(?<method>[A-Z]+) (?<file>[^#? "]+)(?<parameters>[#?][^ "]*)? HTTP/[0-9.]+" (?<status>[0-9]{3}) (?<size>[0-9]+|-)

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?P<client>\S+) \S+ (?P<userid>\S+) \[(?P<day>[0-9]{2})/(?P<month>[A-Za-z]+)/(?P<year>[0-9]{4}):(?P<hour>[0-9]{2}):(?P<min>[0-9]{2}):(?P<sec>[0-9]{2}) (?P<zone>[-+][0-9]{4})\] "(?P<method>[A-Z]+) (?P<file>[^#? "]+)(?P<parameters>[#?][^ "]*)? HTTP/[0-9.]+" (?P<status>[0-9]{3}) (?P<size>[0-9]+|-)

Regex options: ^ and $ match at line breaks
Regex flavors: PCRE 4, Perl 5.10, Python

^(\S+) \S+ (\S+) \[([0-9]{2})/([A-Za-z]+)/([0-9]{4}):([0-9]{2}):([0-9]{2}):([0-9]{2}) ([\-+][0-9]{4})\] "([A-Z]+) ([^#? "]+)([#?][^ "]*)? HTTP/[0-9.]+" ([0-9]{3}) ([0-9]+|-)

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.13. Combined Log Format

Solution

^(?<client>\S+) \S+ (?<userid>\S+) \[(?<datetime>[^\]]+)\] "(?<method>[A-Z]+) (?<request>[^ "]+)? HTTP/[0-9.]+" (?<status>[0-9]{3}) (?<size>[0-9]+|-) "(?<referrer>[^"]*)" "(?<useragent>[^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?P<client>\S+) \S+ (?P<userid>\S+) \[(?P<datetime>[^\]]+)\] "(?P<method>[A-Z]+) (?P<request>[^ "]+)? HTTP/[0-9.]+" (?P<status>[0-9]{3}) (?P<size>[0-9]+|-) "(?P<referrer>[^"]*)" "(?P<useragent>[^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: PCRE 4, Perl 5.10, Python

^(\S+) \S+ (\S+) \[([^\]]+)\] "([A-Z]+) ([^ "]+)? HTTP/[0-9.]+" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)" "([^"]*)" "([^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

7.14. Broken Links Reported in Web Logs

Solution

"(?:GET|POST) (?<file>[^#? "]+)(?:[#?][^ "]*)? HTTP/[0-9.]+" 404 (?:[0-9]+|-) "(?<referrer>http://www\.yoursite\.com[^"]*)"

Regex options: None
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

"(?:GET|POST) (?P<file>[^#? "]+)(?:[#?][^ "]*)? HTTP/[0-9.]+" 404 (?:[0-9]+|-) "(?P<referrer>http://www\.yoursite\.com[^"]*)"

Regex options: None
Regex flavors: PCRE 4, Perl 5.10, Python

"(?:GET|POST) ([^#? "]+)(?:[#?][^ "]*)? HTTP/[0-9.]+" 404 (?:[0-9]+|-) "(http://www\.yoursite\.com[^"]*)"

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

^(?<client>\S+) \S+ (?<userid>\S+) \[(?<datetime>[^\]]+)\] "(?<method>[A-Z]+) (?<request>[^ "]+)? HTTP/[0-9.]+" (?<status>404) (?<size>[0-9]+|-) "(?<referrer>http://www\.yoursite\.com[^"]*)" "(?<useragent>[^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

^(?<client>\S+) \S+ (?<userid>\S+) \[(?<datetime>[^\]]+)\] "(?<method>[A-Z]+) (?<request>[^ "]+)? HTTP/[0-9.]+" (?<status>404) (?<size>[0-9]+|-) "(?<referrer>http://www\.yoursite\.com[^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

"(?<method>GET|POST) (?<request>[^ "]+)? HTTP/[0-9.]+" (?<status>404) (?<size>[0-9]+|-) "(?<referrer>http://www\.yoursite\.com[^"]*)"

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9

8. URLs, Paths, and Internet Addresses

8.1. Validating URLs

Solution

^(https?|ftp|file)://.+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A(https?|ftp|file)://.+\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
([/?].*)?                  # Path and/or parameters
\Z                         # Anchor

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+([/?].+)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A                             # Anchor
((https?|ftp)://|(www|ftp)\.)  # Scheme or subdomain
[a-z0-9-]+(\.[a-z0-9-]+)+      # Domain
([/?].*)?                      # Path and/or parameters
\Z                             # Anchor

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^((https?|ftp)://|(www|ftp)\.)[a-z0-9-]+(\.[a-z0-9-]+)+([/?].*)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A                         # Anchor
(https?|ftp)://            # Scheme
[a-z0-9-]+(\.[a-z0-9-]+)+  # Domain
(/[\w-]+)*                 # Path
/[\w-]+\.(gif|png|jpg)     # File
\Z                         # Anchor

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(https?|ftp)://[a-z0-9-]+(\.[a-z0-9-]+)+(/[\w-]+)*/[\w-]+\.(gif|png|jpg)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.2. Finding URLs Within Full Text

Solution

\b(https?|ftp|file)://\S+

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b((https?|ftp|file)://|(www|ftp)\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.3. Finding Quoted URLs in Full Text

Solution

\b(?:(?:https?|ftp|file)://|(www|ftp)\.)[-A-Z0-9+&@#/%?=~_|$!:,.;]*
                                        [-A-Z0-9+&@#/%=~_|$]
|"(?:(?:https?|ftp|file)://|(www|ftp)\.)[^"\r\n]+"
|'(?:(?:https?|ftp|file)://|(www|ftp)\.)[^'\r\n]+'

Regex options: Free-spacing, case insensitive, dot matches line breaks, anchors match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.4. Finding URLs with Parentheses in Full Text

Solution

\b(?:(?:https?|ftp|file)://|www\.|ftp\.)
  (?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*
  (?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\b(?:(?:https?|ftp|file)://|www\.|ftp\.)(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$])

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software)
http://msdn.microsoft.com/en-us/library/aa752574(VS.85).aspx

RegexBuddy's website (at http://www.regexbuddy.com) is really cool.

[-A-Z0-9+&@#/%=~_|$?!:,.]

\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#/%=~_|$?!:,.]

[A-Z0-9+&@#/%=~_|$]

\([-A-Z0-9+&@#/%=~_|$?!:,.]*\)|[A-Z0-9+&@#/%=~_|$]

8.5. Turn URLs into Links

Solution

<a href="$&">$&</a>

<a href="$0">$0</a>

<a href="\0">\0</a>

<a href="\&">\&</a>

<a href="\g<0>">\g<0></a>

8.6. Validating URNs

Solution

\Aurn:
# Namespace Identifier
[a-z0-9][a-z0-9-]{0,31}:
# Namespace Specific String
[a-z0-9()+,\-.:=@;$_!*'%/?#]+
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^urn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\burn:
# Namespace Identifier
[a-z0-9][a-z0-9-]{0,31}:
# Namespace Specific String
[a-z0-9()+,\-.:=@;$_!*'%/?#]+

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\burn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]+

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\burn:
# Namespace Identifier
[a-z0-9][a-z0-9-]{0,31}:
# Namespace Specific String
[a-z0-9()+,\-.:=@;$_!*'%/?#]*[a-z0-9+=@$/]

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\burn:[a-z0-9][a-z0-9-]{0,31}:[a-z0-9()+,\-.:=@;$_!*'%/?#]*[a-z0-9+=@$/]

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

The URN is urn:nid:nss, isn't it?

8.7. Validating Generic URLs

Solution

\A
(# Scheme
 [a-z][a-z0-9+\-.]*:
 (# Authority & path
  //
  ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  ([a-z0-9\-._~%]+                            # Named host
  |\[[a-f0-9:.]+\]                            # IPv6 host
  |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
  (:[0-9]+)?                                  # Port
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
 |# Path without authority
  (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 (# Relative path
  [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?
 |# Absolute path
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?
 )
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\A
(# Scheme
 (?<scheme>[a-z][a-z0-9+\-.]*):
 (# Authority & path
  //
  (?<user>[a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  (?<host>[a-z0-9\-._~%]+                            # Named host
  |       \[[a-f0-9:.]+\]                            # IPv6 host
  |       \[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
  (?<port>:[0-9]+)?                                  # Port
  (?<path>(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)        # Path
 |# Path without authority
  (?<path>/?[a-z0-9\-._~%!$&'()*+,;=:@]+
          (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 (?<path>
  # Relative path
  [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?
 |# Absolute path
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?
 )
)
# Query
(?<query>\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(?<fragment>\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Perl 5.10, Ruby 1.9

\A
(# Scheme
 (?<scheme>[a-z][a-z0-9+\-.]*):
 (# Authority & path
  //
  (?<user>[a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  (?<host>[a-z0-9\-._~%]+                            # Named host
  |       \[[a-f0-9:.]+\]                            # IPv6 host
  |       \[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
  (?<port>:[0-9]+)?                                  # Port
  (?<hostpath>(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)    # Path
 |# Path without authority
  (?<schemepath>/?[a-z0-9\-._~%!$&'()*+,;=:@]+
                (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 (?<relpath>
  # Relative path
  [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?
 |# Absolute path
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?
 )
)
# Query
(?<query>\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(?<fragment>\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

\A
(# Scheme
 (?P<scheme>[a-z][a-z0-9+\-.]*):
 (# Authority & path
  //
  (?P<user>[a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
  (?P<host>[a-z0-9\-._~%]+                            # Named host
  |       \[[a-f0-9:.]+\]                             # IPv6 host
  |       \[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])   # IPvFuture host
  (?P<port>:[0-9]+)?                                  # Port
  (?P<hostpath>(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)    # Path
 |# Path without authority
  (?P<schemepath>/?[a-z0-9\-._~%!$&'()*+,;=:@]+
                 (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
 )
|# Relative URL (no scheme or authority)
 (?P<relpath>
  # Relative path
  [a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?
 |# Absolute path
  (/[a-z0-9\-._~%!$&'()*+,;=:@]+)+/?
 )
)
# Query
(?P<query>\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(?P<fragment>\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python

^([a-z][a-z0-9+\-.]*:(\/\/([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])(:[0-9]+)?(\/[a-z0-9\-._~%!$&'()*+,;=:@]+)*\/?|(\/?[a-z0-9\-._~%!$&'()*+,;=:@]+(\/[a-z0-9\-._~%!$&'()*+,;=:@]+)*\/?)?)|([a-z0-9\-._~%!$&'()*+,;=@]+(\/[a-z0-9\-._~%!$&'()*+,;=:@]+)*\/?|(\/[a-z0-9\-._~%!$&'()*+,;=:@]+)+\/?))
(\?[a-z0-9\-._~%!$&'()*+,;=:@\/?]*)?(#[a-z0-9\-._~%!$&'()*+,;=:@\/?]*)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.8. Extracting the Scheme from a URL

Solution

Extract the scheme from a URL known to be valid

^([a-z][a-z0-9+\-.]*):

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Extract the scheme while validating the URL

\A
([a-z][a-z0-9+\-.]*):
(# Authority & path
 //
 ([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
 ([a-z0-9\-._~%]+                            # Named host
 |\[[a-f0-9:.]+\]                            # IPv6 host
 |\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
 (:[0-9]+)?                                  # Port
 (/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
|# Path without authority
 (/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?
)
# Query
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
# Fragment
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?
\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([a-z][a-z0-9+\-.]*):(//([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])(:[0-9]+)?(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|(/?[a-z0-9\-._~%!$&'()*+,;=:@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?)?)(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?(#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.9. Extracting the User from a URL

Solution

Extract the user from a URL known to be valid

^[a-z0-9+\-.]+://([a-z0-9\-._~%!$&'()*+,;=]+)@

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Extract the user while validating the URL

\A
[a-z][a-z0-9+\-.]*://                       # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+)@               # User
([a-z0-9\-._~%]+                            # Named host
|\[[a-f0-9:.]+\]                            # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
(:[0-9]+)?                                  # Port
(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Query
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Fragment
\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+)@([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])(:[0-9]+)?(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?(#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.10. Extracting the Host from a URL

Solution

Extract the host from a URL known to be valid

\A
[a-z][a-z0-9+\-.]*://               # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?      # User
([a-z0-9\-._~%]+                    # Named or IPv4 host
|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])   # IPv6+ host

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Extract the host while validating the URL

\A
[a-z][a-z0-9+\-.]*://                       # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
([a-z0-9\-._~%]+                            # Named host
|\[[a-f0-9:.]+\]                            # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
(:[0-9]+)?                                  # Port
(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Query
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Fragment
\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])(:[0-9]+)?(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?(#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.11. Extracting the Port from a URL

Solution

Extract the port from a URL known to be valid

\A
[a-z][a-z0-9+\-.]*://               # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?      # User
([a-z0-9\-._~%]+                    # Named or IPv4 host
|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])   # IPv6+ host
:(?<port>[0-9]+)                    # Port number

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

\A
[a-z][a-z0-9+\-.]*://               # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?      # User
([a-z0-9\-._~%]+                    # Named or IPv4 host
|\[[a-z0-9\-._~%!$&'()*+,;=:]+\])   # IPv6+ host
:(?P<port>[0-9]+)                   # Port number

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE, Perl 5.10, Python

^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&'()*+,;=:]+\]):([0-9]+)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Extract the port while validating the URL

\A
[a-z][a-z0-9+\-.]*://                       # Scheme
([a-z0-9\-._~%!$&'()*+,;=]+@)?              # User
([a-z0-9\-._~%]+                            # Named host
|\[[a-f0-9:.]+\]                            # IPv6 host
|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\])  # IPvFuture host
:([0-9]+)                                   # Port
(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?          # Path
(\?[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Query
(\#[a-z0-9\-._~%!$&'()*+,;=:@/?]*)?         # Fragment
\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[a-z][a-z0-9+\-.]*:\/\/([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-f0-9:.]+\]|\[v[a-f0-9][a-z0-9\-._~%!$&'()*+,;=:]+\]):([0-9]+)(\/[a-z0-9\-._~%!$&'()*+,;=:@]+)*\/?(\?[a-z0-9\-._~%!$&'()*+,;=:@\/?]*)?(#[a-z0-9\-._~%!$&'()*+,;=:@\/?]*)?$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.12. Extracting the Path from a URL

Solution

\A
# Skip over scheme and authority, if any
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
# Path
([a-z0-9\-._~%!$&'()*+,;=:@/]*)

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?([a-z0-9\-._~%!$&'()*+,;=:@/]*)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A
# Skip over scheme and authority, if any
([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?
# Path
(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)
# Query, fragment, or end of URL
([#?]|\Z)

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?(/?[a-z0-9\-._~%!$&'()*+,;=@]+(/[a-z0-9\-._~%!$&'()*+,;=:@]+)*/?|/)([#?]|$)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\A
# Skip over scheme and authority, if any
(?>([a-z][a-z0-9+\-.]*:(//[^/?#]+)?)?)
# Path
([a-z0-9\-._~%!$&'()*+,;=:@/]+)

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Ruby

8.13. Extracting the Query from a URL

Solution

^[^?#]+\?([^#]+)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.14. Extracting the Fragment from a URL

Solution

#(.+)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.15. Validating Domain Names

Solution

^([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b((?=[a-z0-9-]{1,63}\.)[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.16. Matching IPv4 Addresses

Solution

Regular expression

^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Perl

if ($subject =~ m/^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})/)
{
    $ip = $1 << 24 | $2 << 16 | $3 << 8 | $4;
}

8.17. Matching IPv6 Addresses

Solution

Standard notation

^(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}\Z

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

(?<![:.\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\w])

Regex options: Case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

\b(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Mixed notation

^(?:[A-F0-9]{1,4}:){6}(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(?<![:.\w])(?:[A-F0-9]{1,4}:){6}(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?![:.\w])

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:[A-F0-9]{1,4}:){6}(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Standard or mixed notation

\A                                                       # Start of string
(?:[A-F0-9]{1,4}:){6}                                        # 6 words
(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}                               # 2 words
|  (?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}  # or 4 bytes
   (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
)\Z                                                      # End of string

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:[A-F0-9]{1,4}:){6}(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

(?<![:.\w])                                              # Anchor address
(?:[A-F0-9]{1,4}:){6}                                        # 6 words
(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}                               # 2 words
|  (?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}  # or 4 bytes
   (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
)(?![:.\w])                                              # Anchor address

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

\b                                                       # Word boundary
(?:[A-F0-9]{1,4}:){6}                                        # 6 words
(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}                               # 2 words
|  (?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}  # or 4 bytes
   (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
)\b                                                      # Word boundary

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\b(?:[A-F0-9]{1,4}:){6}(?:[A-F0-9]{1,4}:[A-F0-9]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Compressed notation

\A(?:
 # Standard
 (?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
 # Compressed with at most 7 colons
|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
    \Z) # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
 # Compressed with 8 colons
|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7}
)\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}$)(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7})$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

(?<![:.\w])(?:
 # Standard
 (?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
 # Compressed with at most 7 colons
|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
    (?![:.\w])) # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
 # Compressed with 8 colons
|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7}
)(?![:.\w])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

(?:
 # Standard
 (?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
 # Compressed with at most 7 colons
|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
    (?![:.\w])) # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
 # Compressed with 8 colons
|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7}
)(?![:.\w])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

(?:(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}(?![:.\w]))(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7})(?![:.\w])

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Compressed mixed notation

\A
(?:
 # Non-compressed
 (?:[A-F0-9]{1,4}:){6}
 # Compressed with at most 6 colons
|(?=(?:[A-F0-9]{0,4}:){0,6}
    (?:[0-9]{1,3}\.){3}[0-9]{1,3}  # and 4 bytes
    \Z)                            # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
 # Compressed with 7 colons and 5 numbers
|::(?:[A-F0-9]{1,4}:){5}
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)|::(?:[A-F0-9]{1,4}:){5})(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

(?<![:.\w])
(?:
 # Non-compressed
 (?:[A-F0-9]{1,4}:){6}
 # Compressed with at most 6 colons
|(?=(?:[A-F0-9]{0,4}:){0,6}
    (?:[0-9]{1,3}\.){3}[0-9]{1,3}  # and 4 bytes
    (?![:.\w]))                    # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
 # Compressed with 7 colons and 5 numbers
|::(?:[A-F0-9]{1,4}:){5}
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
(?![:.\w])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

(?:
 # Non-compressed
 (?:[A-F0-9]{1,4}:){6}
 # Compressed with at most 6 colons
|(?=(?:[A-F0-9]{0,4}:){0,6}
    (?:[0-9]{1,3}\.){3}[0-9]{1,3}  # and 4 bytes
    (?![:.\w]))                    # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
 # Compressed with 7 colons and 5 numbers
|::(?:[A-F0-9]{1,4}:){5}
)
# 255.255.255.
(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}
# 255
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
(?![:.\w])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?![:.\w]))(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)|::(?:[A-F0-9]{1,4}:){5})(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?![:.\w])

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Standard, mixed, or compressed notation

\A(?:
# Mixed
 (?:
  # Non-compressed
  (?:[A-F0-9]{1,4}:){6}
  # Compressed with at most 6 colons
 |(?=(?:[A-F0-9]{0,4}:){0,6}
     (?:[0-9]{1,3}\.){3}[0-9]{1,3}  # and 4 bytes
     \Z)                            # and anchored
  # and at most 1 double colon
  (([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
  # Compressed with 7 colons and 5 numbers
 |::(?:[A-F0-9]{1,4}:){5}
 )
 # 255.255.255.
 (?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}
 # 255
 (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
|# Standard
 (?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
|# Compressed with at most 7 colons
 (?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
    \Z)  # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
 # Compressed with 8 colons
|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7}
)\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)|::(?:[A-F0-9]{1,4}:){5})(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])|(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}$)(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7})$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

(?<![:.\w])(?:
# Mixed
 (?:
  # Non-compressed
  (?:[A-F0-9]{1,4}:){6}
  # Compressed with at most 6 colons
 |(?=(?:[A-F0-9]{0,4}:){0,6}
     (?:[0-9]{1,3}\.){3}[0-9]{1,3}  # and 4 bytes
     (?![:.\w]))                    # and anchored
  # and at most 1 double colon
  (([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
  # Compressed with 7 colons and 5 numbers
 |::(?:[A-F0-9]{1,4}:){5}
 )
 # 255.255.255.
 (?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}
 # 255
 (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
|# Standard
 (?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
|# Compressed with at most 7 colons
 (?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
    (?![:.\w]))  # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
 # Compressed with 8 colons
|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7}
)(?![:.\w])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9

(?:
 # Mixed
 (?:
  # Non-compressed
  (?:[A-F0-9]{1,4}:){6}
  # Compressed with at most 6 colons
 |(?=(?:[A-F0-9]{0,4}:){0,6}
     (?:[0-9]{1,3}\.){3}[0-9]{1,3}  # and 4 bytes
     (?![:.\w]))                    # and anchored
  # and at most 1 double colon
  (([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)
  # Compressed with 7 colons and 5 numbers
 |::(?:[A-F0-9]{1,4}:){5}
 )
 # 255.255.255.
 (?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}
 # 255
 (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
|# Standard
 (?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
|# Compressed with at most 7 colons
 (?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
    (?![:.\w]))  # and anchored
 # and at most 1 double colon
 (([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)
 # Compressed with 8 colons
|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7}
)(?![:.\w])

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

(?:(?:(?:[A-F0-9]{1,4}:){6}|(?=(?:[A-F0-9]{0,4}:){0,6}(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?![:.\w]))(([0-9A-F]{1,4}:){0,5}|:)((:[0-9A-F]{1,4}){1,5}:|:)|::(?:[A-F0-9]{1,4}:){5})(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}|(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}(?![:.\w]))(([0-9A-F]{1,4}:){1,7}|:)((:[0-9A-F]{1,4}){1,7}|:)|(?:[A-F0-9]{1,4}:){7}:|:(:[A-F0-9]{1,4}){7})(?![:.\w])

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

Compressed notation

(
  ([0-9A-F]{1,4}:){1,7}  # 1 to 7 words to the left
| :                      # or a double colon at the start
)
(
  (:[0-9A-F]{1,4}){1,7}  # 1 to 7 words to the right
| :                      # or a double colon at the end
)

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\A(?:a{7}x
 |  a{6}xb?
 |  a{5}xb{0,2}
 |  a{4}xb{0,3}
 |  a{3}xb{0,4}
 |  a{2}xb{0,5}
 |  axb{0,6}
 |  xb{0,7}
)\Z

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

\A
  (?=[abx]{1,8}\Z)
  a{0,7}xb{0,7}
\Z

Regex options: Free-spacing
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Standard, mixed, or compressed notation

^(6words|compressed6words)ip4$

^(8words|compressed8words)$

^((6words|compressed6words)ip4|8words|compressed8words)$

^((6words|compressed6words)ip4|(8words|compressed8words))$

8.18. Validate Windows Paths

Solution

Drive letter paths

\A
[a-z]:\\                    # Drive
(?:[^\\/:*?"<>|\r\n]+\\)*   # Folder
[^\\/:*?"<>|\r\n]*          # File
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^[a-z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Drive letter and UNC paths

\A
(?:[a-z]:|\\\\[a-z0-9_.$\ -]+\\[a-z0-9_.$\ -]+)\\   # Drive
(?:[^\\/:*?"<>|\r\n]+\\)*                            # Folder
[^\\/:*?"<>|\r\n]*                                   # File
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Drive letter, UNC, and relative paths

\A
(?:(?:[a-z]:|\\\\[a-z0-9_.$\ -]+\\[a-z0-9_.$\ -]+)\\|  # Drive
   \\?[^\\/:*?"<>|\r\n]+\\?)                            # Relative path
(?:[^\\/:*?"<>|\r\n]+\\)*                               # Folder
[^\\/:*?"<>|\r\n]*                                      # File
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:(?:[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\|\\?[^\\/:*?"<>|\r\n]+\\?)(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

8.19. Split Windows Paths into Their Parts

Solution

Drive letter paths

\A
(?<drive>[a-z]:)\\
(?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
(?<file>[^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

\A
(?P<drive>[a-z]:)\\
(?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
(?P<file>[^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python

\A
([a-z]:)\\
((?:[^\\/:*?"<>|\r\n]+\\)*)
([^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([a-z]:)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Drive letter and UNC paths

\A
(?<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
(?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
(?<file>[^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

\A
(?P<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
(?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
(?P<file>[^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python

\A
([a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
((?:[^\\/:*?"<>|\r\n]+\\)*)
([^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Drive letter, UNC, and relative paths

\A
(?<drive>[a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)
(?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
(?<file>[^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

\A
(?P<drive>[a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)
(?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
(?P<file>[^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python

\A
([a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)
((?:[^\\/:*?"<>|\r\n]+\\)*)
([^\\/:*?"<>|\r\n]*)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^([a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Discussion

Drive letter, UNC, and relative paths

\A
(?:
   (?<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
   (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
   (?<file>[^\\/:*?"<>|\r\n]*)
|  (?<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
   (?<file2>[^\\/:*?"<>|\r\n]*)
|  (?<relativefile>[^\\/:*?"<>|\r\n]+)
)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9

\A
(?:
   (?P<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
   (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
   (?P<file>[^\\/:*?"<>|\r\n]*)
|  (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
   (?P<file2>[^\\/:*?"<>|\r\n]*)
|  (?P<relativefile>[^\\/:*?"<>|\r\n]+)
)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python

\A
(?:
   ([a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
   ((?:[^\\/:*?"<>|\r\n]+\\)*)
   ([^\\/:*?"<>|\r\n]*)
|  (\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
   ([^\\/:*?"<>|\r\n]*)
|  ([^\\/:*?"<>|\r\n]+)
)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

^(?:([a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)|(\\?(?:[^\\/:*?"<>|\r\n]+\\)+)([^\\/:*?"<>|\r\n]*)|([^\\/:*?"<>|\r\n]+))$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\A
(?:
   (?<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
   (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
   (?<file>[^\\/:*?"<>|\r\n]*)
|  (?<folder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
   (?<file>[^\\/:*?"<>|\r\n]*)
|  (?<file>[^\\/:*?"<>|\r\n]+)
)
\Z

Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Perl 5.10, Ruby 1.9

8.20. Extract the Drive Letter from a Windows Path

Solution

^([a-z]):

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.21. Extract the Server and Share from a UNC Path

Solution

^\\\\([a-z0-9_.$ -]+)\\([a-z0-9_.$ -]+)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.22. Extract the Folder from a Windows Path

Solution

^([a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)?((?:\\|^)(?:[^\\/:*?"<>|\r\n]+\\)+)

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

^([a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)?((?:\\?(?:[^\\/:*?"<>|\r\n]+\\)+)

8.23. Extract the Filename from a Windows Path

Solution

[^\\/:*?"<>|\r\n]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.24. Extract the File Extension from a Windows Path

Solution

\.[^.\\/:*?"<>|\r\n]+$

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

8.25. Strip Invalid Characters from Filenames

Solution

Regular expression

[\\/:"*?<>|]+

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

9. Markup and Data Formats

Basic Rules for Formats Covered in This Chapter

<a href="http://www.regexcookbook.com"
    title = 'Regex Cookbook'>Click me!</a>

<!-- this is a comment -->
<!-- so is this, but this comment
    spans more than one line -->

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<!DOCTYPE html>

<!DOCTYPE example [
  <!ENTITY copy "&#169;">
  <!ENTITY copyright-notice "Copyright &copy; 2012, O'Reilly">
]>

aaa,b b,"""c"" cc"
1,,"333, three,
still more threes"

333, three,
still more threes

; last modified 2012-02-14

[user]
name=J. Random Hacker

[post]
title = How do I love thee, regular expressions?
content = "Let me count the ways..."

9.1. Find XML-Style Tags

Solution

Quick and dirty

<[^>]*>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Allow > in attribute values

<(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(?: [^>"']   # Non-quoted character
  | "[^"]*"  # Double-quoted attribute value
  | '[^']*'  # Single-quoted attribute value
)*
>

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

(X)HTML tags (loose)

</?([A-Za-z][^\s>/]*)(?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*(?:>|$)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
/?                  # Permit closing tags
([A-Za-z][^\s>/]*)  # Capture the tag name to backreference 1
(?:                 # Attribute value branch:
  = \s*             #   Signals the start of an attribute value
  (?: "[^"]*"       #   Double-quoted attribute value
    | '[^']*'       #   Single-quoted attribute value
    | [^\s>]+       #   Unquoted attribute value
  )
|                   # Non-attribute-value branch:
  [^>]              #   Character outside of an attribute value
)*
(?:>|$)             # End of the tag or string

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

(X)HTML tags (strict)

<(?:([A-Z][-:A-Z0-9]*)(?:\s+[A-Z][-:A-Z0-9]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^"'`=<>\s]+))?)*\s*/?|/([A-Z][-:A-Z0-9]*)\s*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(?:                    # Branch for opening tags:
  ([A-Z][-:A-Z0-9]*)   #   Capture the opening tag name to backreference 1
  (?:                  #   This group permits zero or more attributes
    \s+                #   Whitespace to separate attributes
    [A-Z][-:A-Z0-9]*   #   Attribute name
    (?: \s*=\s*        #   Attribute name-value delimiter
      (?: "[^"]*"      #   Double-quoted attribute value
        | '[^']*'      #   Single-quoted attribute value
        | [^"'`=<>\s]+ #   Unquoted attribute value (HTML)
      )
    )?                 #   Permit attributes without a value (HTML)
  )*
  \s*                  #   Permit trailing whitespace
  /?                   #   Permit self-closed tags
|                      # Branch for closing tags:
  /
  ([A-Z][-:A-Z0-9]*)   #   Capture the closing tag name to backreference 2
  \s*                  #   Permit trailing whitespace
)
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

XML tags (strict)

<(?:([_:A-Z][-.:\w]*)(?:\s+[_:A-Z][-.:\w]*\s*=\s*(?:"[^"]*"|'[^']*'))*\s*/?|/([_:A-Z][-.:\w]*)\s*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(?:                  # Branch for opening tags:
  ([_:A-Z][-.:\w]*)  #   Capture the opening tag name to backreference 1
  (?:                #   This group permits zero or more attributes
    \s+              #   Whitespace to separate attributes
    [_:A-Z][-.:\w]*  #   Attribute name
    \s*=\s*          #   Attribute name-value delimiter
    (?: "[^"]*"      #   Double-quoted attribute value
      | '[^']*'      #   Single-quoted attribute value
    )
  )*
  \s*                #   Permit trailing whitespace
  /?                 #   Permit self-closed tags
|                    # Branch for closing tags:
  /
  ([_:A-Z][-.:\w]*)  #   Capture the closing tag name to backreference 2
  \s*                #   Permit trailing whitespace
)
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Discussion

Allow > in attribute values

<(?>(?:(?>[^>"']+)|"[^"]*"|'[^']*')*)>

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

<(?:[^>"']++|"[^"]*"|'[^']*')*+>

Regex options: None
Regex flavors: Java, PCRE, Perl 5.10, Ruby 1.9

(X)HTML tags (loose)

<([A-Za-z][^\s>/]*)(?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>/])*(?:>|$)

<([A-Za-z][^\s>/]*)(?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*(?:/>|$)

<([A-Za-z][^\s>/]*)(?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*(?:>|$)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

</([A-Za-z][^\s>/]*)(?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*(?:>|$)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

</?([A-Za-z](?>[^\s>/]*))(?>=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*(?:>|$)

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

</?([A-Za-z][^\s>/]*+)(?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*+(?:>|$)

Regex options: None
Regex flavors: Java, PCRE, Perl 5.10, Ruby 1.9

</?([A-Za-z](?=([^\s>/]*))\2)(?=((?:=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)|[^>])*))\3(?:>|$)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(X)HTML tags (strict)

<([A-Z][-:A-Z0-9]*)(?:\s+[A-Z][-:A-Z0-9]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^"'`=<>\s]+))?)*\s*/?>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

</([A-Z][-:A-Z0-9]*)\s*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

XML tags (strict)

<([_:A-Z][-.:\w]*)(?:\s+[_:A-Z][-.:\w]*\s*=\s*(?:"[^"]*"|'[^']*'))*\s*/?>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

</([_:A-Z][-.:\w]*)\s*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Skip Tricky (X)HTML and XML Sections

Outer regex for (X)HTML

<!--.*?-->|<!\[CDATA\[.*?]]>|<(script|style|textarea|title|xmp)\b(?:[^>"']|"[^"]*"|'[^']*')*>.*?</\1\s*>|<plaintext\b(?:[^>"']|"[^"]*"|'[^']*')*>.*

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

# Comment
<!-- .*? -->
|
# CDATA section
<!\[CDATA\[ .*? ]]>
|
# Special element and its content
<( script | style | textarea | title | xmp )\b
  (?:[^>"']|"[^"]*"|'[^']*')*
> .*? </\1\s*>
|
# <plaintext/> continues until the end of the string
<plaintext\b
  (?:[^>"']|"[^"]*"|'[^']*')*
> .*

Regex options: Case insensitive, dot matches line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<!--[\s\S]*?-->|<!\[CDATA\[[\s\S]*?]]>|<(script|style|textarea|title|xmp)\b(?:[^>"']|"[^"]*"|'[^']*')*>[\s\S]*?</\1\s*>|<plaintext\b(?:[^>"']|"[^"]*"|'[^']*')*>[\s\S]*

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Outer regex for XML

<!--.*?--\s*>|<!\[CDATA\[.*?]]>|<!DOCTYPE\s(?:[^<>"']|"[^"]*"|'[^']*'|<!(?:[^>"']|"[^"]*"|'[^']*')*>)*>

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

# Comment
<!-- .*? --\s*>
|
# CDATA section
<!\[CDATA\[ .*? ]]>
|
# Document type declaration
<!DOCTYPE\s
    (?: [^<>"']  # Non-special character
      | "[^"]*"  # Double-quoted value
      | '[^']*'  # Single-quoted value
      | <!(?:[^>"']|"[^"]*"|'[^']*')*>  # Markup declaration
    )*
>

Regex options: Case insensitive, dot matches line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<!--[\s\S]*?--\s*>|<!\[CDATA\[[\s\S]*?]]>|<!DOCTYPE\s(?:[^<>"']|"[^"]*"|'[^']*'|<!(?:[^>"']|"[^"]*"|'[^']*')*>)*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

9.2. Replace Tags with

Solution

<(/?)b\b((?:[^>"']|"[^"]*"|'[^']*')*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(/?)             # Capture the optional leading slash to backreference 1
b \b             # Tag name, with word boundary
(                # Capture any attributes, etc. to backreference 2
    (?: [^>"']   # Any character except >, ", or '
      | "[^"]*"  # Double-quoted attribute value
      | '[^']*'  # Single-quoted attribute value
    )*
)
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<$1strong$2>

<\1strong\2>

<$1strong>

<\1strong>

Variations

Replace a list of tags

<(/?)([bi]|em|big)\b((?:[^>"']|"[^"]*"|'[^']*')*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(/?)              # Capture the optional leading slash to backreference 1
([bi]|em|big) \b  # Capture the tag name to backreference 2
(                 # Capture any attributes, etc. to backreference 3
    (?: [^>"']    # Any character except >, ", or '
      | "[^"]*"   # Double-quoted attribute value
      | '[^']*'   # Single-quoted attribute value
    )*
)
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<$1strong$3>

<\1strong\3>

<$1strong>

<\1strong>

9.3. Remove All XML-Style Tags Except and

Solution

Solution 1: Match tags except and

</?(?!(?:em|strong)\b)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

< /?                   # Permit closing tags
(?!
    (?: em | strong )  # List of tags to avoid matching
    \b                 # Word boundary avoids partial word matches
)
[a-z]                  # Tag name initial character must be a-z
(?: [^>"']             # Any character except >, ", or '
  | "[^"]*"            # Double-quoted attribute value
  | '[^']*'            # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Solution 2: Match tags except and , and any tags that contain attributes

</?(?!(?:em|strong)\s*>)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

< /?                   # Permit closing tags
(?!
    (?: em | strong )  # List of tags to avoid matching
    \s* >              # Only avoid tags if they contain no attributes
)
[a-z]                  # Tag name initial character must be a-z
(?: [^>"']             # Any character except >, ", or '
  | "[^"]*"            # Double-quoted attribute value
  | '[^']*'            # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

Whitelist specific attributes

<(?!(?:em|strong|a(?:\s+(?:href|title)\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>)[a-z](?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

< /?          # Permit closing tags
(?!
  (?: em      # Dont match <em>
    | strong  #   or <strong>
    | a       #   or <a>
      (?:     # Only avoid matching <a> tags that use only
        \s+   #   href and/or title attributes
        (?:href|title)
        \s*=\s*
        (?:"[^"]*"|'[^']*')  # Quoted attribute value
      )*
  )
  \s* >       # Only avoid matching these tags when they're
)             #   limited to any attributes permitted above
[a-z]         # Tag name initial character must be a-z
(?: [^>"']    # Any character except >, ", or '
  | "[^"]*"   # Double-quoted attribute value
  | '[^']*'   # Single-quoted attribute value
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

9.4. Match XML Names

Solution

XML 1.0 names (approximate)

^[:_\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nl}][:_\-.\p{L}\p{M}\p{Nd}\p{Nl}]*$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Ruby 1.9

XML 1.1 names (exact)

^[:_A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD][:_\-.A-Za-z0-9\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u036F\u0370-\u037D\u037F-\u1FFF\u200C\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]*$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, Python, Ruby 1.9

^[:_A-Za-z\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}][:_\-.A-Za-z0-9\x{B7}\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{36F}\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}\x{200D}\x{203F}\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}]*$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: Java 7, PCRE, Perl

Discussion

XML 1.0 names

^                                   # Start of string
[:_\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nl}]  # Initial name character
[:_\-.\p{L}\p{M}\p{Nd}\p{Nl}]*      # Subsequent name characters (optional)
$                                   # End of string

Regex options: Free-spacing (“^ and $ match at line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Ruby 1.9

^[:_\p{L&}\p{Lo}\p{Nl}][:_\-.\pL\pM\p{Nd}\p{Nl}]*$

Regex options: None (“^ and $ match at line breaks” must not be set)
Regex flavors: PCRE, Perl

^[:_\p{L}\p{Nl}-[\p{Lm}]][:_\-.\p{L}\p{M}\p{Nd}\p{Nl}]*$

Regex options: None (“^ and $ match at line breaks” must not be set)

^[:_\pL\p{Nl}&&[^\p{Lm}]][:_\-.\pL\pM\p{Nd}\p{Nl}]*$

Regex options: None (“^ and $ match at line breaks” must not be set)

Variations

[^\d\s"'/<=>][^\s"'/<=>]*

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(?!\d)[^\s"'/<=>]+

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

9.5. Convert Plain Text to HTML by Adding and Tags

Solution

Step 2: Replace all line breaks with

\r\n?|\n

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\R

Regex options: None
Regex flavors: PCRE 7, Perl 5.10

<br>

Step 3: Replace double tags with

<br>\s*<br>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

</p><p>

Example JavaScript solution

function htmlFromPlainText(subject) {
    // Step 1 (plain text searches)
    subject = subject.replace(/&/g, "&amp;").
                      replace(/</g, "&lt;").
                      replace(/>/g, "&gt;");

    // Step 2
    subject = subject.replace(/\r\n?|\n/g, "<br>");

    // Step 3
    subject = subject.replace(/<br>\s*<br>/g, "</p><p>");

    // Step 4
    subject = "<p>" + subject + "</p>";

    return subject;
}

// Run some tests...
htmlFromPlainText("Test.");            // -> "<p>Test.</p>"
htmlFromPlainText("Test.\n");          // -> "<p>Test.<br></p>"
htmlFromPlainText("Test.\n\n");        // -> "<p>Test.</p><p></p>"
htmlFromPlainText("Test1.\nTest2.");   // -> "<p>Test1.<br>Test2.</p>"
htmlFromPlainText("Test1.\n\nTest2."); // -> "<p>Test1.</p><p>Test2.</p>"
htmlFromPlainText("< AT&T >");         // -> "<p>&lt; AT&amp;T &gt;</p>"

9.6. Decode XML Entities

Solution

Regular expression

&(?:#([0-9]+)|#x([0-9a-fA-F]+)|([0-9a-zA-Z]+));

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Example JavaScript solution

// Accepts the match ($0) and backreferences; returns replacement text
function callback($0, $1, $2, $3) {
    var charCode;

    // Name lookup object that maps to decimal character codes
    // Equivalent hexadecimal numbers are listed in comments
    var names = {
        quot: 34, // 0x22
        amp: 38, // 0x26
        apos: 39, // 0x27
        lt: 60, // 0x3C
        gt: 62 // 0x3E
    };

    // Decimal character reference
    if ($1) {
        charCode = parseInt($1, 10);
    // Hexadecimal character reference
    } else if ($2) {
        charCode = parseInt($2, 16);
    // Named entity with a lookup mapping
    } else if ($3 && ($3 in names)) {
        charCode = names[$3];
    // Invalid or unknown entity name
    } else {
        return $0; // Return the match unaltered
    }

    // Return a literal character
    return String.fromCharCode(charCode);
}

// Replace all entities with literal text
subject = subject.replace(
        /&(?:#([0-9]+)|#x([0-9a-fA-F]+)|([0-9a-zA-Z]+));/g,
        callback);

Discussion

"&lt; &bogus; dec &#65;&#0065; &amp;lt; hex &#x41;&#x041; &gt;"

"< &bogus; dec AA &lt; hex AA >"

9.7. Find a Specific Attribute in XML-Style Tags

Solution

Tags that contain an id attribute (quick and dirty)

<[^>]+\sid\b[^>]*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<         # Start of the tag
[^>]+     # Tag name, attributes, etc.
\s id \b  # The target attribute name, as a whole word
[^>]*     # The remainder of the tag, including the id attribute's value
>         # End of the tag

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Tags that contain an id attribute (more reliable)

<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*("[^"]*"|'[^']*')(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(?: [^>"']             # Tag and attribute names, etc.
  | "[^"]*"            #   and quoted attribute values
  | '[^']*'
)+?
\s id                  # The target attribute name, as a whole word
\s* = \s*              # Attribute name-value delimiter
( "[^"]*" | '[^']*' )  # Capture the attribute value to backreference 1
(?: [^>"']             # Any remaining characters
  | "[^"]*"            #   and quoted attribute values
  | '[^']*'
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<div> tags that contain an id attribute

<div\s(?:[^>"']|"[^"]*"|'[^']*')*?\bid\s*=\s*("[^"]*"|'[^']*')(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<div \s                # Tag name and following whitespace character
(?: [^>"']             # Tag and attribute names, etc.
  | "[^"]*"            #   and quoted attribute values
  | '[^']*'
)*?
\b id                  # The target attribute name, as a whole word
\s* = \s*              # Attribute name-value delimiter
( "[^"]*" | '[^']*' )  # Capture the attribute value to backreference 1
(?: [^>"']             # Any remaining characters
  | "[^"]*"            #   and quoted attribute values
  | '[^']*'
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Tags that contain an id attribute with the value “my-id”

<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*(?:"my-id"|'my-id')(?:[^>"']|"[^"]*"|'[^']*')*>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<
(?: [^>"']     # Tag and attribute names, etc.
  | "[^"]*"    #   and quoted attribute values
  | '[^']*'
)+?
\s id          # The target attribute name, as a whole word
\s* = \s*      # Attribute name-value delimiter
(?: "my-id"    # The target attribute value
  | 'my-id' )  #   surrounded by single or double quotes
(?: [^>"']     # Any remaining characters
  | "[^"]*"    #   and quoted attribute values
  | '[^']*'
)*
>

Regex options: Case insensitive, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Tags that contain “my-class” within their class attribute value

<(?:[^>"']|"[^"]*"|'[^']*')+>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

["'\s]my-class["'\s]

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

(?:^|\s)my-class(?:\s|$)

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

9.8. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It

Solution

Solution 1, simplistic

<table\b(?![^>]*?\scellspacing\b)([^>]*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<table \b            # Match "<table", as a complete word
(?!                  # Not followed by:
  [^>]*?             #   Any attributes, etc.
  \s cellspacing \b  #   "cellspacing", as a complete word
)
([^>]*)              # Capture attributes, etc. to backreference 1
>

Regex options: Case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Solution 2, more reliable

<table\b(?!(?:[^>"']|"[^"]*"|'[^']*')*?\scellspacing\b)((?:[^>"']|"[^"]*"|'[^']*')*)>

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<table \b  # Match "<table", as a complete word
(?!  # Not followed by: Any attributes, etc., then "cellspacing"
  (?:[^>"']|"[^"]*"|'[^']*')*?
  \s cellspacing \b
)
(  # Capture attributes, etc. to backreference 1
  (?:[^>"']|"[^"]*"|'[^']*')*
)
>

Regex options: Case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Insert the new attribute

<table cellspacing="0"$1>

<table cellspacing="0"\1>

9.9. Remove XML-Style Comments

Solution

<!--.*?-->

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<!--[\s\S]*?-->

Regex options: None

Discussion

When comments can’t be removed

<(script|style|textarea|title|xmp)\b(?:[^>"']|"[^"]*"|'[^']*')*>.*?</\1\s*>|<plaintext\b(?:[^>"']|"[^"]*"|'[^']*')*>.*|<[a-z](?:[^>"']|"[^"]*"|'[^']*')*>|<!\[CDATA\[.*?]]>

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

# Special element: tag and its content
<( script | style | textarea | title | xmp )\b
  (?:[^>"']|"[^"]*"|'[^']*')*
> .*? </\1\s*>
|
# <plaintext/> continues until the end of the string
<plaintext\b
  (?:[^>"']|"[^"]*"|'[^']*')*
> .*
|
# Standard element: tag only
<[a-z]  # Tag name initial character
  (?:[^>"']|"[^"]*"|'[^']*')*
>
|
# CDATA section
<!\[CDATA\[ .*? ]]>

Regex options: Case insensitive, dot matches line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<(script|style|textarea|title|xmp)\b(?:[^>"']|"[^"]*"|'[^']*')*>[\s\S]*?</\1\s*>|<plaintext\b(?:[^>"']|"[^"]*"|'[^']*')*>[\s\S]*|<[a-z](?:[^>"']|"[^"]*"|'[^']*')*>|<!\[CDATA\[[\s\S]*?]]>

Regex options: Case insensitive

Variations

Find valid XML comments

<!--[^-]*(?:-[^-]+)*--\s*>

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

<!--(?>-?[^-]+)*--\s*>

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Ruby

Find valid HTML comments

<!--(?!-?>)[^-]*(?:-[^-]+)*-->

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

9.10. Find Words Within XML-Style Comments

Problem

        This "TODO" is not within a comment, but the next one is. <!-- 
        TODO
        : Come up with a cooler comment for this example. -->

Solution

Two-step approach

<!--.*?-->

Regex options: Dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

<!--[\s\S]*?-->

Regex options: None

\bTODO\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Single-step approach

\bTODO\b(?=(?:(?!<!--).)*?-->)

Regex options: Case insensitive, dot matches line breaks
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

\bTODO\b(?=(?:(?!<!--)[\s\S])*?-->)

Regex options: Case insensitive

Discussion

Single-step approach

\b TODO \b      # Match the characters "TODO", as a complete word
(?=             # Followed by:
  (?:           #   Group but don't capture:
    (?! <!-- )  #     Not followed by: "<!--"
    .           #     Match any single character
  )*?           #   Repeat zero or more times, as few as possible (lazy)
  -->           #   Match the characters "-->"
)

Regex options: Dot matches line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Variations

(?<=<!--(?:(?!-->).)*?)\bTODO\b(?=(?:(?!<!--).)*?-->)

Regex options: Case insensitive, dot matches line breaks

9.11. Change the Delimiter Used in CSV Files

Solution

(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

( , | \r?\n | ^ )   # Capture the leading field delimiter to backref 1
(                   # Capture a single field to backref 2:
  [^",\r\n]+        #   Unquoted field
|                   #  Or:
  " (?:[^"]|"")* "  #   Quoted field (may contain escaped double quotes)
)?                  # The group is optional because fields may be empty

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Example web page with JavaScript

<html>
<head>
<title>Change CSV delimiters from commas to tabs</title>
</head>

<body>
<p>Input:</p>
<textarea id="input" rows="5" cols="75"></textarea>

<p><input type="button" value="Replace" onclick="commasToTabs()"></p>

<p>Output:</p>
<textarea id="output" rows="5" cols="75"></textarea>

<script>
function commasToTabs() {
    var input  = document.getElementById("input"),
        output = document.getElementById("output"),
        regex  = /(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?/g,
        result = "",
        match;

    while (match = regex.exec(input.value)) {
        // Check the value of backreference 1
        if (match[1] == ",") {
            // Add a tab (in place of the matched comma) and backreference
            // 2 to the result. If backreference 2 is undefined (because
            // the optional, second capturing group did not participate in
            // the match), use an empty string instead.
            result += "\t" + (match[2] || "");
        } else {
            // Add the entire match to the result
            result += match[0];
        }

        // If there is an empty match, prevent some browsers from getting
        // stuck in an infinite loop
        if (match.index == regex.lastIndex) {
            regex.lastIndex++;
        }
    }

    output.value = result;
}
</script>
</body>
</html>

9.12. Extract CSV Fields from a Specific Column

Solution

(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

( , | \r?\n | ^ )   # Capture the leading field delimiter to backref 1
(                   # Capture a single field to backref 2:
  [^",\r\n]+        #   Unquoted field
|                   #  Or:
  " (?:[^"]|"")* "  #   Quoted field (may contain escaped double quotes)
)?                  # The group is optional because fields may be empty

Regex options: Free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Example web page with JavaScript

<html>
<head>
<title>Extract the third column from a CSV string</title>
</head>

<body>
<p>Input:</p>
<textarea id="input" rows="5" cols="75"></textarea>

<p><input type="button" value="Extract Column 3"
    onclick="displayCsvColumn(2)"></p>

<p>Output:</p>
<textarea id="output" rows="5" cols="75"></textarea>

<script>
function displayCsvColumn(index) {
    var input = document.getElementById("input"),
        output = document.getElementById("output"),
        columnFields = getCsvColumn(input.value, index);

    if (columnFields.length > 0) {
        // Show each record on its own line, separated by a line feed (\n)
        output.value = columnFields.join("\n");
    } else {
        output.value = "[No data found to extract]";
    }
}

// Return an array of CSV fields at the provided, zero-based index
function getCsvColumn(csv, index) {
    var regex = /(,|\r?\n|^)([^",\r\n]+|"(?:[^"]|"")*")?/g,
        result = [],
        columnIndex = 0,
        match;

    while (match = regex.exec(csv)) {
        // Check the value of backreference 1. If it's a comma,
        // increment columnIndex. Otherwise, reset it to zero.
        if (match[1] == ",") {
            columnIndex++;
        } else {
            columnIndex = 0;
        }
        if (columnIndex == index) {
            // Add the field (backref 2) at the end of the result array
            result.push(match[2]);
        }

        // If there is an empty match, prevent some browsers from getting
        // stuck in an infinite loop
        if (match.index == regex.lastIndex) {
            regex.lastIndex++;
        }
    }

    return result;
}
</script>
</body>
</html>

Variations

Match a CSV record and capture the field in column 1 to backreference 1

^([^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?)*

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match a CSV record and capture the field in column 2 to backreference 1

^(?:[^",\r\n]+|"(?:[^"]|"")*")?,([^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?)*

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Match a CSV record and capture the field in column 3 or higher to backreference 1

^(?:[^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?){1},([^",\r\n]+|"(?:[^"]|"")*")?(?:,(?:[^",\r\n]+|"(?:[^"]|"")*")?)*

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Replacement string

$1

\1

9.13. Match INI Section Headers

Solution

^\[[^\]\r\n]+]

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Variations

^\[Section1]

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

9.14. Match INI Section Blocks

Solution

^\[[^\]\r\n]+](?:\r?\n(?:[^[\r\n].*)?)*

Regex options: ^ and $ match at line breaks (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^ \[ [^\]\r\n]+ ]  # Match a section header
(?:                # Followed by the rest of the section:
  \r?\n            #   Match a line break character sequence
  (?:              #   After each line starts, match:
    [^[\r\n]       #     Any character except "[" or a line break character
    .*             #     Match the rest of the line
  )?               #   The group is optional to allow matching empty lines
)*                 # Continue until the end of the section

Regex options: ^ and $ match at line breaks, free-spacing (“dot matches line breaks” must not be set)
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby

Discussion

[Section1]
Item1=Value1
Item2=[Value2]

; [SectionA]
; The SectionA header has been commented out

ItemA=ValueA ; ItemA is not commented out, and is part of Section1

[Section2]
Item3=Value3
Item4 = Value4

9.15. Match INI Name-Value Pairs

Solution

^([^=;\r\n]+)=([^;\r\n]*)

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

^               # Start of a line
( [^=;\r\n]+ )  # Capture the name to backreference 1
=               # Name-value delimiter
( [^;\r\n]* )   # Capture the value to backreference 2

Regex options: ^ and $ match at line breaks, free-spacing
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby