How to use regular expressions in Perl Part 3: Global Modifiers, Backreferences, and Character Classes

Introduction:

Perl is commonly used for file processing, and its regular expressions provide pattern matching and extraction. Regular expressions can be applied to any data in Perl, not just the contents of files, and the language's support for them is extensive.

Requirements:

A Perl interpreter, such as ActiveState Perl. Perl may be installed by default on Linux.

Procedure:

You can terminate your regular expression with modifiers such as g and c. The g modifier is used to match multiple times in the same data set:

$data =~ /stringToMatch/g;

When you match against a scalar, such as the above, in scalar context, g remembers the position where the last match ended. If you match again, such as in a while loop, the search resumes where it left off. If a match fails, however, the position is reset. To keep the last position even after a failure, add the c modifier. Note that $1 is only set when the pattern contains a capture group, so the example wraps the pattern in parentheses:

while ($data =~ /(stringToMatch)/gc) {
    print $1;
}
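
As a minimal sketch of how the remembered position behaves, the following uses Perl's built-in pos() function to inspect it. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $data = "cat dog cat bird";   # hypothetical sample data
my @positions;

# Scalar-context /g: each iteration resumes after the previous match.
while ($data =~ /cat/g) {
    push @positions, pos($data);   # offset just past the current match
}
print "matches ended at: @positions\n";

# The loop ended on a failed match without /c, so pos() was reset.
my $after_plain = defined pos($data) ? pos($data) : 'undef';

# With /c, a failed match leaves pos() where the last success put it.
my $first  = $data =~ /cat/g;      # succeeds, pos() is now 3
my $second = $data =~ /horse/gc;   # fails, but pos() stays at 3
my $after_c = pos($data);
print "after plain failure: $after_plain, after gc failure: $after_c\n";
```

The c modifier is mainly useful when you try several different patterns in turn against the same string and do not want one failed attempt to send the search back to the beginning.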

You can also control exactly where the next match is allowed to begin. If you are matching something of a particular length and want to step ahead by the number of characters in each match, you can use \G, which matches only at the position where the previous g match on that string ended. \G is only reliably supported at the very start of the regular expression. For instance:

while ($data =~ /\G(stringToMatch)/g) {
    print $1;
}
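
To illustrate the difference \G makes, here is a short sketch with invented sample data. With \G, the loop stops at the first position where the pattern does not match immediately. Without it, the search skips ahead to the next occurrence.

```perl
use strict;
use warnings;

my $data = "abababXab";   # hypothetical sample data
my @found;

# \G pins each attempt to the spot where the previous match ended,
# so the loop stops as soon as the text at that spot is not "ab".
while ($data =~ /\G(ab)/g) {
    push @found, $1;
}
my $anchored = scalar @found;

# Without \G, the search skips over "X" and finds the final "ab" too.
my $unanchored = () = $data =~ /ab/g;

print "with \\G: $anchored, without: $unanchored\n";
```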

If you are extracting matches from a string, you can refer to them with backreferences within the same regular expression. You can use \g1, \g2, etc. to do so:

$data =~ /stringToMatch(stringToExtract)secondStringToMatch\g1/;

Where \g1 appears, the regular expression must match the same text that the first set of parentheses extracted. If you have further extractions, \g2 and so on can be used within the regular expression as backreferences to them.
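
A small sketch of backreferences in action follows. The sample strings are hypothetical, and the \g1 syntax assumes a reasonably recent Perl interpreter, since very old versions do not support it.

```perl
use strict;
use warnings;

# Hypothetical sample: the same word appears before and after "=".
my $data = "key=key";

# (\w+) extracts a word; \g1 then requires that same text again.
my $repeated = $data =~ /(\w+)=\g1/ ? $1 : '';
print "repeated: $repeated\n";

# Different text on each side of "=" does not satisfy \g1.
my $mismatch = "key=val" =~ /(\w+)=\g1/ ? 1 : 0;

# Two extractions, referenced in reverse order by \g2 and \g1.
my $swapped = "ab-ba" =~ /(a)(b)-\g2\g1/ ? 1 : 0;
print "mismatch: $mismatch, swapped: $swapped\n";
```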

Perl also supports character classes such as \d for a digit, \s for a whitespace character, and \w for a word character; \D, \S, and \W for the inverses of \d, \s, and \w; and \N, the inverse of \n, which will not match a "\n" newline whether or not the regular expression is terminated with the s modifier.
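
The character classes above can be tried out with a short sketch like the one below. The sample line is made up, and \N assumes a reasonably recent Perl interpreter.

```perl
use strict;
use warnings;

my $line = "id 42\n";   # hypothetical sample data

my ($digits) = $line =~ /(\d+)/;     # first run of digits
my ($word)   = $line =~ /(\w+)/;     # first run of word characters
my ($rest)   = $line =~ /\s(\S+)/;   # non-whitespace after a whitespace

# Under the s modifier, "." matches the newline but \N still does not.
my ($dot)         = $line =~ /(.+)/s;
my ($non_newline) = $line =~ /(\N+)/s;

printf "digits=%s word=%s rest=%s dot_len=%d N_len=%d\n",
    $digits, $word, $rest, length($dot), length($non_newline);
```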
