Regular Expressions
What Is a Regular Expression?
A regular expression is a compact notation for describing a set of strings — a language. Rather than listing every string in the set individually, a regular expression defines a pattern, and any string that matches the pattern is a member of the described set.
Example from the AQA spec: the regular expression a(a|b)* generates the set {a, aa, ab, aaa, aab, aba, …} — every string that starts with 'a' and is followed by any sequence of a's and b's (including no further characters).
Regular expressions are used throughout computing:
- Compilers use them in lexical analysis to recognise tokens such as identifiers, keywords, and literals
- Text editors and command-line tools use them for pattern matching and search-replace operations
- Input validation (checking that a postcode or email address matches the expected format) is often implemented using regular expressions
A regular expression is a way of describing a set. Every string in that set matches the regular expression. Strings outside the set do not match.
The five metacharacters assessed by AQA are: *, +, ?, |, and ( ). Any other metacharacter used in an exam question will be defined within that question.
The Five Metacharacters
These five metacharacters are the building blocks of AQA regular expressions. Any character that is not a metacharacter is a literal — it matches itself exactly.
| Metacharacter | Name | Meaning | Example |
|---|---|---|---|
* | Star / Kleene star | 0 or more repetitions of the preceding element | ab* matches a, ab, abb, abbb, … |
+ | Plus | 1 or more repetitions of the preceding element | ab+ matches ab, abb, abbb, … (not a alone) |
? | Question mark | 0 or 1 repetitions — makes the preceding element optional | colou?r matches color or colour |
| | Pipe / alternation | Either the left or the right — logical OR | cat|dog matches cat or dog |
( ) | Parentheses | Grouping — applies an operator to a sub-expression | (ab)* matches ε, ab, abab, ababab, … |
The difference between * and +:
a*includes the empty string (zero repetitions)a+requires at least one a — the empty string does not match
How ? works:
u?means "zero or one u" — the element is optionalcolou?r=colo(u?)r= "colo, then optionally u, then r"
Grouping with parentheses:
- Without grouping:
ab*means "a, then zero or more b's" — the*applies only to b - With grouping:
(ab)*means "zero or more copies of ab" — the*applies to the whole group ab
Reading Regular Expressions
The key skill is parsing a regular expression left to right, respecting operator precedence: *, +, ? bind tightly to the single preceding element (or group); | has lower precedence and separates alternatives.
Worked examples:
a(a|b)*
a→ a literal 'a' (required)(a|b)→ either an 'a' or a 'b'(a|b)*→ zero or more repetitions of (a|b)- Full match: one 'a', followed by any sequence (including empty) of a's and b's
- Matches: a, aa, ab, aaa, aab, aba, abba, …
- Does NOT match: b, ba, empty string
(0|1)*1
(0|1)*→ any sequence of 0s and 1s (including empty)- Followed by a literal 1
- Full match: any binary string that ends with 1
- Matches: 1, 01, 11, 001, 101, 111, …
- Does NOT match: 0, 10, 100, empty string
ab?c
a→ literal ab?→ optional b (0 or 1 times)c→ literal c- Matches: ac, abc
- Does NOT match: abbc, a, bc
4 more slides
Continue this lesson
Create a free account to unlock all 7 slides, track your progress, and ask the AI tutor for help.
Related lessons
7 Slides