Advanced

Regular Expressions

AicademyAicademy
·A-Level Computer Science·AQA 7517·7 slides
4.4.2.3 Regular expressions·4.4.2.4 Regular language

What Is a Regular Expression?

A regular expression is a compact notation for describing a set of strings — a language. Rather than listing every string in the set individually, a regular expression defines a pattern, and any string that matches the pattern is a member of the described set.

Example from the AQA spec: the regular expression a(a|b)* generates the set {a, aa, ab, aaa, aab, aba, …} — every string that starts with 'a' and is followed by any sequence of a's and b's (including no further characters).

Regular expressions are used throughout computing:

  • Compilers use them in lexical analysis to recognise tokens such as identifiers, keywords, and literals
  • Text editors and command-line tools use them for pattern matching and search-replace operations
  • Input validation (checking that a postcode or email address matches the expected format) is often implemented using regular expressions

A regular expression is a way of describing a set. Every string in that set matches the regular expression. Strings outside the set do not match.

The five metacharacters assessed by AQA are: *, +, ?, |, and ( ). Any other metacharacter used in an exam question will be defined within that question.

The Five Metacharacters

These five metacharacters are the building blocks of AQA regular expressions. Any character that is not a metacharacter is a literal — it matches itself exactly.

MetacharacterNameMeaningExample
*Star / Kleene star0 or more repetitions of the preceding elementab* matches a, ab, abb, abbb, …
+Plus1 or more repetitions of the preceding elementab+ matches ab, abb, abbb, … (not a alone)
?Question mark0 or 1 repetitions — makes the preceding element optionalcolou?r matches color or colour
|Pipe / alternationEither the left or the right — logical ORcat|dog matches cat or dog
( )ParenthesesGrouping — applies an operator to a sub-expression(ab)* matches ε, ab, abab, ababab, …

The difference between * and +:

  • a* includes the empty string (zero repetitions)
  • a+ requires at least one a — the empty string does not match

How ? works:

  • u? means "zero or one u" — the element is optional
  • colou?r = colo(u?)r = "colo, then optionally u, then r"

Grouping with parentheses:

  • Without grouping: ab* means "a, then zero or more b's" — the * applies only to b
  • With grouping: (ab)* means "zero or more copies of ab" — the * applies to the whole group ab

Reading Regular Expressions

The key skill is parsing a regular expression left to right, respecting operator precedence: *, +, ? bind tightly to the single preceding element (or group); | has lower precedence and separates alternatives.

Worked examples:

a(a|b)*

  • a → a literal 'a' (required)
  • (a|b) → either an 'a' or a 'b'
  • (a|b)* → zero or more repetitions of (a|b)
  • Full match: one 'a', followed by any sequence (including empty) of a's and b's
  • Matches: a, aa, ab, aaa, aab, aba, abba, …
  • Does NOT match: b, ba, empty string

(0|1)*1

  • (0|1)* → any sequence of 0s and 1s (including empty)
  • Followed by a literal 1
  • Full match: any binary string that ends with 1
  • Matches: 1, 01, 11, 001, 101, 111, …
  • Does NOT match: 0, 10, 100, empty string

ab?c

  • a → literal a
  • b? → optional b (0 or 1 times)
  • c → literal c
  • Matches: ac, abc
  • Does NOT match: abbc, a, bc

4 more slides

Continue this lesson

Create a free account to unlock all 7 slides, track your progress, and ask the AI tutor for help.

Related lessons

7 Slides

Lesson

Finite State Machines

A-Level Computer Science · AQA 7517

4 hours ago