How to Understand Basic Regular Expression Metacharacters

Introduction:

Regular expressions are patterns used to match characters within text. Metacharacters are reserved symbols that carry a special meaning inside a pattern instead of standing for themselves. They fall into a basic set and an extended set. The basic metacharacters are understood by nearly every regular expression engine, while the extended ones may have to be enabled (with grep, for example, through the -E option). Typical uses of regular expressions include finding and replacing text.

Requirements:

A regular expression engine, made available by some tool or language. The Linux command-line utility grep, for instance, supports regular expressions, and the Perl language has regular expression matching built in.

Procedure:

The basic regular expression metacharacters are:

. [ ] [^] ^ $ \n *

This list does not include the metacharacters that must be escaped with a backslash to take on their special meaning (such as the grouping operators \( and \) in POSIX basic syntax). When used with a regular expression tool or engine, each of the metacharacters above serves a specific function during pattern matching. A complete regular expression can consist of a single character or a string of characters, and can include any of these metacharacters.

When text is matched, each metacharacter stands for a particular search criterion. The “.” metacharacter represents any single character. The “[ ]” metacharacters enclose a set of characters, and the whole bracket expression matches any one character from that set. The characters can be listed individually, or, if there is a dash between two of them, such as:

[a-z]

every character in the range is included in the set. If the dash is the first or last character inside the brackets, it is treated as a literal “-” character. The “]” character can also be included literally, but only if it is the first character after the “[” (or after the “^” of a negated set).
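The behaviour of “.” and “[ ]” described above can be tried out with a short sketch. It uses Python's re module purely as a convenient engine (an assumption, since the article itself names only grep and Perl); the sample strings are made up for illustration.

```python
import re

sample = "cat cot cut c-t ct"

# "." stands for any single character, so "c.t" needs exactly one
# character between "c" and "t" ("ct" has none and is skipped).
dot_matches = re.findall(r"c.t", sample)

# "[aeiou]" stands for exactly one character taken from the listed set.
set_matches = re.findall(r"c[aeiou]t", sample)

# "[a-z-]": "a-z" is a range, the trailing "-" is a literal hyphen.
range_matches = re.findall(r"[a-z-]", "A-b")

# "[]x]": "]" placed first inside the brackets is a literal "]".
bracket_matches = re.findall(r"[]x]", "a]x")

print(dot_matches)
print(set_matches)
print(range_matches)
print(bracket_matches)
```

Note that in Python the “.” does not match a newline unless the re.DOTALL flag is given; other engines may differ on this point.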

[^ ]

The bracket expression matches any single character that is not listed after the “^”. This is the inverse of “[ ]”. The “^” only has this meaning when it is the first character inside the brackets.
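As a minimal sketch of the negated set, again assuming Python's re module as the engine and using made-up input strings:

```python
import re

# "[^0-9]" matches any single character that is NOT a digit.
non_digits = re.findall(r"[^0-9]", "a1b2-3")

# When "^" is not the first character in the brackets it is an
# ordinary member of the set, so "[0-9^]" matches digits and "^".
caret_literal = re.findall(r"[0-9^]", "2^3")

print(non_digits)
print(caret_literal)
```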

^

This anchors the match to the beginning of the string, or of the line if the tool is line-based.

$

This anchors the match to the end of the string or line (the opposite end of the string/line from “^”).
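The “^” and “$” anchors can be illustrated with a small sketch that imitates a line-based tool such as grep by testing each line separately. Python's re module is assumed as the engine, and the log lines are invented for the example.

```python
import re

lines = [
    "error: disk full",
    "warning: error ignored",
    "no error",
    "error",
]

# "^error" only matches where "error" sits at the start of the line.
starting = [ln for ln in lines if re.search(r"^error", ln)]

# "error$" only matches where "error" sits at the end of the line.
ending = [ln for ln in lines if re.search(r"error$", ln)]

# Using both anchors requires the whole line to be exactly "error".
whole = [ln for ln in lines if re.search(r"^error$", ln)]

print(starting)
print(ending)
print(whole)
```

The anchors match positions, not characters, which is why “^error$” selects only the line that contains nothing else.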

\n

The “n” above stands for a digit (1-9). This is called a backreference, and it matches the same text that was matched earlier by the nth grouped subexpression in the pattern, counting from the first group up to the ninth.
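A backreference can be sketched as follows. One assumption to keep in mind: Python's re module, used here for convenience, writes groups with plain parentheses, whereas POSIX basic syntax (as in plain grep) writes them as \( and \). The “\1” itself works the same way in both.

```python
import re

# "([a-z])\1": one lowercase letter, followed by the same letter again.
doubled = [m.group(0) for m in re.finditer(r"([a-z])\1", "bookkeeper")]

# The backreference repeats the matched TEXT, not the pattern:
# "([0-9])-\1" needs the same digit on both sides of the hyphen.
same_digit = re.search(r"([0-9])-\1", "3-3") is not None
different_digit = re.search(r"([0-9])-\1", "3-4") is not None

print(doubled)
print(same_digit, different_digit)
```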

*

This is a quantifier: whatever immediately precedes it (a character, metacharacter, or grouped expression) is matched zero or more times. There is no upper limit on the number of times it can be matched.
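The “*” quantifier can be tried with a brief sketch, assuming Python's re module as the engine. The helper name matches_whole and the test strings are hypothetical, chosen only for illustration.

```python
import re

def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "ab*c": an "a", then zero or more "b" characters, then a "c".
zero_b = matches_whole(r"ab*c", "ac")        # zero repetitions are allowed
many_b = matches_whole(r"ab*c", "abbbbc")    # no upper limit
wrong = matches_whole(r"ab*c", "adc")        # "d" is not a "b"

# "*" also applies to a preceding metacharacter or bracket expression:
# "[0-9][0-9]*" is one digit followed by zero or more further digits.
numbers = re.findall(r"[0-9][0-9]*", "7 42 1000")

print(zero_b, many_b, wrong)
print(numbers)
```

Because zero repetitions are allowed, a pattern such as “b*” on its own can match an empty string, which is why the example above requires one digit before the starred bracket expression.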
