BNF and Syntax Diagrams
BNF: Terminals and Non-Terminals
Backus-Naur Form (BNF) is a formal notation for describing the grammar (syntax rules) of a language. It is used to define programming languages, file formats, and communication protocols.
A BNF grammar consists of production rules of the form:
<non-terminal> ::= definition
Terminals are the actual symbols of the language — they cannot be broken down further. In BNF they are written as literal values:
0 | 1 | 2 | a | b | + | ( ← these are all terminals
Non-terminals are syntactic categories — placeholders that expand into further rules. They are written in angle brackets:
<digit> <letter> <expression> <identifier> ← non-terminals
::= means "is defined as". | means "or" (alternative).
Example — defining a single digit:
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
This says: a <digit> is exactly one of the ten decimal digit characters.
BNF Production Rules
A grammar for simple identifiers (a letter optionally followed by letters or digits):
<letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m |
n | o | p | q | r | s | t | u | v | w | x | y | z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<identifier> ::= <letter> | <identifier><letter> | <identifier><digit>
Derivation — showing that cat3 is a valid <identifier>, one non-terminal replaced per step:
<identifier>
→ <identifier><digit> (rule: <identifier> ::= <identifier><digit>)
→ <identifier>3 (rule: <digit> ::= 3)
→ <identifier><letter>3 (rule: <identifier> ::= <identifier><letter>)
→ <identifier>t3 (rule: <letter> ::= t)
→ <identifier><letter>t3 (rule: <identifier> ::= <identifier><letter>)
→ <identifier>at3 (rule: <letter> ::= a)
→ <letter>at3 (rule: <identifier> ::= <letter>)
→ cat3 (rule: <letter> ::= c)
Derivation shows step-by-step replacement of non-terminals until only terminals remain.
Recursive BNF Rules
BNF rules can be recursive — a non-terminal may appear on both sides of a rule. This allows infinite languages to be described with a finite grammar.
Integer grammar:
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<integer> ::= <digit> | <integer><digit>
The recursive rule <integer> ::= <integer><digit> means: an integer is either a single digit, or an integer followed by a digit. This generates all non-empty sequences of digits: 0, 1, …, 9, 00, 01, …, 99, 100, …
Arithmetic expression grammar:
<expr> ::= <term> | <expr> + <term> | <expr> - <term>
<term> ::= <factor> | <term> * <factor> | <term> / <factor>
<factor> ::= <number> | ( <expr> )
<number> ::= <digit> | <number><digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
This grammar handles operator precedence correctly: * and / bind more tightly than + and - because <term> is nested inside <expr>.
Without brackets, many languages cannot be described by regular expressions alone — recursive BNF (context-free grammars) are needed. This is why compilers use BNF rather than regular expressions to parse arithmetic.
Syntax Diagrams
A syntax diagram (railroad diagram) is a visual alternative to BNF. It represents the same grammar as a flowchart-like diagram.
Notation:
- Rectangular box: non-terminal (expand further using its own diagram)
- Rounded box (oval/circle): terminal (literal character or token)
- Flow follows arrows; branches represent alternatives; loops represent repetition
Example — <digit> syntax diagram:
──→ ( 0 ) ──┐
( 1 ) ──┤
( 2 ) ──┤
... ┤──→
( 9 ) ──┘
Rounded boxes (0 through 9) are terminals; the branching represents the | alternatives.
Example — <integer> syntax diagram:
──→ [ digit ] ──┬──→ (exit)
↑ │
└───────┘
Enter, process one [digit], then either exit or loop back to process another [digit]. This represents one or more digits — matching the recursive BNF rule <integer> ::= <digit> | <integer><digit>.
Worth saving these ideas?
Turn what you've read into instant revision cards. Free to get started.
Context-Free Languages and Parse Trees
A grammar written in BNF generates a context-free language — a language where each production rule replaces a single non-terminal regardless of the surrounding context.
Context-free languages are more expressive than regular languages:
- Regular languages: described by regular expressions and accepted by FSMs
- Context-free languages: described by BNF grammars; include arithmetic expressions, most programming language syntax
Parse tree — a tree that represents the derivation of a string from a grammar. Each internal node is a non-terminal; each leaf is a terminal.
Example — parse 3 + 4 using <expr> ::= <number> | <expr> + <number>:
<expr>
/ | \
<expr> + <number>
| |
<number> <digit>
| |
<digit> 4
|
3
Parse trees make the structure of an expression explicit. Compilers build parse trees (or equivalent internal representations) during syntax analysis.
Ambiguous grammar — a grammar where a string has more than one parse tree. Ambiguous grammars cause problems for compilers because the meaning of an expression becomes unclear.
Common Exam Mistakes
1. Using the wrong brackets for terminals vs non-terminals
Terminals are written as literal characters (unbracketed or quoted); non-terminals are in angle brackets <like this>. Writing <0> instead of 0 for a digit terminal conflates the two.
2. Stopping a derivation at a non-terminal
A complete derivation ends when every symbol is a terminal. If your derivation still contains <digit> or <identifier>, you have not finished — continue applying rules.
3. Confusing recursion with circular definition
A recursive rule like <integer> ::= <digit> | <integer><digit> is not circular — the base case <integer> ::= <digit> terminates the recursion. A truly circular definition with no base case generates nothing.
4. Swapping the symbols for rectangular and rounded boxes in syntax diagrams
In a syntax diagram, rectangular (square-cornered) boxes contain non-terminals; rounded (oval) boxes contain terminals. Getting these backwards is a mark-scheme error.
5. Claiming BNF can describe any language
BNF describes context-free languages. Some languages (like those requiring context-sensitive rules, e.g. C's type declarations) require more powerful grammars. BNF cannot express every possible language.
Generate revision on any topic you study
Type any topic you're studying and Aicademy generates a complete lesson, quiz, and flashcard set — personalised to your level.
Lessons on anything
Structured, level-matched lessons on any topic you study
Practice quizzes
Find out what you actually know before the exam does
Flashcard sets
Lock in key concepts with instant revision cards
Ask Aica
Stuck on something? Get a clear explanation, any time
Regular Expressions
Complexity and Big-O Notation
Related lessons
6 Slides
7 Slides