Advanced

BNF and Syntax Diagrams

AicademyAicademy
·A-Level Computer Science·AQA 7517·6 min
4.4.2.5 Backus-Naur Form (BNF) / syntax diagrams·4.4.3 Context-free languages

BNF: Terminals and Non-Terminals

Backus-Naur Form (BNF) is a formal notation for describing the grammar (syntax rules) of a language. It is used to define programming languages, file formats, and communication protocols.

A BNF grammar consists of production rules of the form:

<non-terminal> ::= definition

Terminals are the actual symbols of the language — they cannot be broken down further. In BNF they are written as literal values:

0 | 1 | 2 | a | b | + | (    ← these are all terminals

Non-terminals are syntactic categories — placeholders that expand into further rules. They are written in angle brackets:

<digit>  <letter>  <expression>  <identifier>    ← non-terminals

::= means "is defined as". | means "or" (alternative).

Example — defining a single digit:

<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

This says: a <digit> is exactly one of the ten decimal digit characters.

BNF Production Rules

A grammar for simple identifiers (a letter optionally followed by letters or digits):

<letter>     ::= a | b | c | d | e | f | g | h | i | j | k | l | m |
                 n | o | p | q | r | s | t | u | v | w | x | y | z
<digit>      ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<identifier> ::= <letter> | <identifier><letter> | <identifier><digit>

Derivation — showing that cat3 is a valid <identifier>, one non-terminal replaced per step:

<identifier>
→ <identifier><digit>         (rule: <identifier> ::= <identifier><digit>)
→ <identifier>3               (rule: <digit> ::= 3)
→ <identifier><letter>3       (rule: <identifier> ::= <identifier><letter>)
→ <identifier>t3              (rule: <letter> ::= t)
→ <identifier><letter>t3      (rule: <identifier> ::= <identifier><letter>)
→ <identifier>at3             (rule: <letter> ::= a)
→ <letter>at3                 (rule: <identifier> ::= <letter>)
→ cat3                        (rule: <letter> ::= c)

Derivation shows step-by-step replacement of non-terminals until only terminals remain.

Recursive BNF Rules

BNF rules can be recursive — a non-terminal may appear on both sides of a rule. This allows infinite languages to be described with a finite grammar.

Integer grammar:

<digit>   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<integer> ::= <digit> | <integer><digit>

The recursive rule <integer> ::= <integer><digit> means: an integer is either a single digit, or an integer followed by a digit. This generates all non-empty sequences of digits: 0, 1, …, 9, 00, 01, …, 99, 100, …

Arithmetic expression grammar:

<expr>   ::= <term> | <expr> + <term> | <expr> - <term>
<term>   ::= <factor> | <term> * <factor> | <term> / <factor>
<factor> ::= <number> | ( <expr> )
<number> ::= <digit> | <number><digit>
<digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

This grammar handles operator precedence correctly: * and / bind more tightly than + and - because <term> is nested inside <expr>.

Without brackets, many languages cannot be described by regular expressions alone — recursive BNF (context-free grammars) are needed. This is why compilers use BNF rather than regular expressions to parse arithmetic.

Syntax Diagrams

A syntax diagram (railroad diagram) is a visual alternative to BNF. It represents the same grammar as a flowchart-like diagram.

Notation:

  • Rectangular box: non-terminal (expand further using its own diagram)
  • Rounded box (oval/circle): terminal (literal character or token)
  • Flow follows arrows; branches represent alternatives; loops represent repetition

Example — <digit> syntax diagram:

──→ ( 0 ) ──┐
    ( 1 ) ──┤
    ( 2 ) ──┤
    ...     ┤──→
    ( 9 ) ──┘

Rounded boxes (0 through 9) are terminals; the branching represents the | alternatives.

Example — <integer> syntax diagram:

──→ [ digit ] ──┬──→ (exit)
        ↑       │
        └───────┘

Enter, process one [digit], then either exit or loop back to process another [digit]. This represents one or more digits — matching the recursive BNF rule <integer> ::= <digit> | <integer><digit>.

Worth saving these ideas?

Turn what you've read into instant revision cards. Free to get started.

Make flashcards

Context-Free Languages and Parse Trees

A grammar written in BNF generates a context-free language — a language where each production rule replaces a single non-terminal regardless of the surrounding context.

Context-free languages are more expressive than regular languages:

  • Regular languages: described by regular expressions and accepted by FSMs
  • Context-free languages: described by BNF grammars; include arithmetic expressions, most programming language syntax

Parse tree — a tree that represents the derivation of a string from a grammar. Each internal node is a non-terminal; each leaf is a terminal.

Example — parse 3 + 4 using <expr> ::= <number> | <expr> + <number>:

         <expr>
        /   |   \
    <expr>  +  <number>
      |            |
  <number>       <digit>
      |            |
   <digit>         4
      |
      3

Parse trees make the structure of an expression explicit. Compilers build parse trees (or equivalent internal representations) during syntax analysis.

Ambiguous grammar — a grammar where a string has more than one parse tree. Ambiguous grammars cause problems for compilers because the meaning of an expression becomes unclear.

Common Exam Mistakes

1. Using the wrong brackets for terminals vs non-terminals

Terminals are written as literal characters (unbracketed or quoted); non-terminals are in angle brackets <like this>. Writing <0> instead of 0 for a digit terminal conflates the two.

2. Stopping a derivation at a non-terminal

A complete derivation ends when every symbol is a terminal. If your derivation still contains <digit> or <identifier>, you have not finished — continue applying rules.

3. Confusing recursion with circular definition

A recursive rule like <integer> ::= <digit> | <integer><digit> is not circular — the base case <integer> ::= <digit> terminates the recursion. A truly circular definition with no base case generates nothing.

4. Swapping the symbols for rectangular and rounded boxes in syntax diagrams

In a syntax diagram, rectangular (square-cornered) boxes contain non-terminals; rounded (oval) boxes contain terminals. Getting these backwards is a mark-scheme error.

5. Claiming BNF can describe any language

BNF describes context-free languages. Some languages (like those requiring context-sensitive rules, e.g. C's type declarations) require more powerful grammars. BNF cannot express every possible language.

Generate revision on any topic you study

Type any topic you're studying and Aicademy generates a complete lesson, quiz, and flashcard set — personalised to your level.

Lessons on anything

Structured, level-matched lessons on any topic you study

Practice quizzes

Find out what you actually know before the exam does

Flashcard sets

Lock in key concepts with instant revision cards

Ask Aica

Stuck on something? Get a clear explanation, any time

Prev

Regular Expressions

Next

Complexity and Big-O Notation

Related lessons

6 Slides

Lesson

Sets and Formal Languages

A-Level Computer Science · AQA 7517

10 hours ago

7 Slides

Lesson

Regular Expressions

A-Level Computer Science · AQA 7517

7 days ago