Intermediate

Representation Errors and Precision

Aicademy

·A-Level Computer Science·AQA 7517·6 min

4.5.4.5 Rounding errors·4.5.4.6 Absolute and relative errors·4.5.4.7 Range and precision·4.5.4.9 Underflow and overflow

Rounding Errors

Not every real number can be represented exactly in binary with a finite number of bits. When a value cannot be stored exactly, it is rounded to the nearest representable value — introducing a rounding error.

Classic example: $0.1_{10}$ in binary.

Converting 0.1 by repeated multiplication by 2:

Step	Value	Bit
$0.1 \times 2$	$0.2$	0
$0.2 \times 2$	$0.4$	0
$0.4 \times 2$	$0.8$	0
$0.8 \times 2$	$1.6$	1
$0.6 \times 2$	$1.2$	1
$0.2 \times 2$	$0.4$	0 (repeats)

$0.1_{10} = 0.0001\overline{1001}_{2}$, a repeating binary fraction, similar to $1/3 = 0.\overline{3}$ in decimal.

No finite number of bits can represent 0.1 exactly. Any floating-point or fixed-point system that stores a finite number of bits must approximate it.
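The repeated-multiplication method in the table can be sketched in Python. This is a minimal illustration, not part of the specification: the function name `binary_fraction_bits` is made up for this example, and `Fraction` is used so that the doubling itself is done exactly rather than in floating point.

```python
from fractions import Fraction

def binary_fraction_bits(value, count):
    """Return the first `count` bits after the binary point for 0 <= value < 1."""
    bits = []
    for _ in range(count):
        value *= 2                # shift the next binary digit into the units place
        bit = int(value >= 1)     # that digit is the integer part (0 or 1)
        bits.append(bit)
        value -= bit              # keep only the fractional part for the next step
    return bits

bits = binary_fraction_bits(Fraction(1, 10), 12)
print("0." + "".join(str(b) for b in bits))  # 0.000110011001
```

However many bits are requested, the remainder never reaches zero, so the pattern keeps repeating.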

Consequence: rounding errors accumulate in repeated calculations. Each stored copy of $0.1$ is slightly off, and each addition may round again, so adding $0.1$ ten times in floating point may not give exactly $1.0$.
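A short check of this effect, assuming Python floats are IEEE 754 double-precision values (true for CPython on mainstream hardware):

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1          # each 0.1 is already an approximation

print(total)              # 0.9999999999999999
print(total == 1.0)       # False
# Comparing with a tolerance is the usual way to cope with rounding error.
print(math.isclose(total, 1.0))  # True
```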

Absolute and Relative Errors

Absolute error is the magnitude of the difference between the true value and the stored approximation:

$\text{Absolute error} = |\text{true value} - \text{stored value}|$

Relative error expresses the error as a proportion of the true value:

$\text{Relative error} = \frac{\text{absolute error}}{|\text{true value}|}$

Worked examples:

Example 1 — true value = 1000, stored value = 999:

Absolute error = $|1000 - 999| = 1$
Relative error = $1/1000 = 0.001$ (0.1%)

Example 2 — true value = 0.01, stored value = 0.009:

Absolute error = $|0.01 - 0.009| = 0.001$
Relative error = $0.001/0.01 = 0.1$ (10%)
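The two definitions translate directly into code. The sketch below reproduces both worked examples; the function names are illustrative, and `Decimal` is used so that the check is not itself disturbed by binary rounding.

```python
from decimal import Decimal

def absolute_error(true_value, stored_value):
    """Magnitude of the difference between the true and stored values."""
    return abs(true_value - stored_value)

def relative_error(true_value, stored_value):
    """Absolute error as a proportion of the true value."""
    return absolute_error(true_value, stored_value) / abs(true_value)

# Example 1: true value 1000, stored value 999
print(absolute_error(Decimal("1000"), Decimal("999")))   # 1
print(relative_error(Decimal("1000"), Decimal("999")))   # 0.001

# Example 2: true value 0.01, stored value 0.009
print(absolute_error(Decimal("0.01"), Decimal("0.009")))  # 0.001
print(relative_error(Decimal("0.01"), Decimal("0.009")))  # 0.1
```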

Key insight: the same absolute error has a much larger relative impact for small numbers. A $\pm 1$ error in a value of $10000$ is trivial; a $\pm 1$ error in a value of 5 is a 20% error.

Floating-point arithmetic maintains roughly constant relative error (errors scale with magnitude), while fixed-point arithmetic maintains constant absolute error.

Range and Precision

Precision is the smallest difference between two representable values — how finely spaced the representable numbers are.

Range is the span from the most negative to the most positive representable value.

For any fixed total bit count, precision and range trade off against each other:

More bits allocated to the mantissa → finer precision, smaller range (fewer bits remain for the exponent)
More bits allocated to the exponent → larger range, coarser precision

Fixed point:

Constant absolute precision (spacing between adjacent values is always $2^{-f}$, where $f$ is the number of fractional bits)
Limited range: determined by the number of integer bits
Best for: financial calculations, sensor readings — any context where the value range is known and bounded

Floating point:

Variable absolute precision: precision decreases for very large numbers (the representable values become further apart)
Large range: controlled by the exponent
Best for: scientific computing, graphics, physics simulations — anywhere values span many orders of magnitude
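The widening gaps between floating-point values can be observed directly. This sketch assumes Python 3.9 or later, where `math.ulp(x)` returns the spacing between `x` and the next representable float, and assumes IEEE 754 doubles.

```python
import math

for x in (1.0, 1000.0, 1e15):
    gap = math.ulp(x)      # distance from x to the next larger representable float
    print(x, gap, gap / x)  # value, absolute spacing, relative spacing

# The absolute spacing grows with the size of the number...
print(math.ulp(1e15) > math.ulp(1000.0) > math.ulp(1.0))  # True
```

The absolute gap grows with the value, while the relative gap (`gap / x`) stays within a factor of two of the same tiny figure.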

Example: fixed point with 4 fractional bits. Adjacent values differ by $2^{-4} = 0.0625$. The precision near 1000 and near 0.001 is the same absolute amount: 0.0625. Relative to a value of 0.001 that spacing is very coarse: $0.0625/0.001 = 62.5$, so the gap between representable values is 62.5 times the value itself.
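A small sketch of a fixed-point format with 4 fractional bits shows the same behaviour. The helper `to_fixed` is a hypothetical name for this illustration; it rounds a value to the nearest multiple of the step size.

```python
FRACTIONAL_BITS = 4
STEP = 2 ** -FRACTIONAL_BITS   # 0.0625, the spacing between adjacent values

def to_fixed(value):
    """Round `value` to the nearest representable multiple of STEP."""
    return round(value / STEP) * STEP

print(to_fixed(1000.03))  # 1000.0  (error 0.03 is tiny relative to 1000)
print(to_fixed(0.1))      # 0.125   (error 0.025 is 25% of the true value)
print(to_fixed(0.001))    # 0.0     (the value is lost entirely)
```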

Overflow and Underflow

Overflow occurs when a calculation produces a result whose magnitude is too large to represent in the available bits.

In unsigned binary: result exceeds $2^{n} - 1$. The carry out of the MSB is lost; the result wraps around.
In floating-point: the exponent exceeds its maximum representable value.

Example: 8-bit unsigned overflow. $200 + 100 = 300$, but $300 > 255$, so the result wraps: $300 - 256 = 44$.

  1100 1000   (200)
+ 0110 0100   (100)
-----------
1 0010 1100   → carry out discarded → 0010 1100 = 44 (incorrect)
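Python integers do not overflow, so the 8-bit behaviour has to be simulated. In this sketch the function name `add_unsigned` is illustrative; masking with `0xFF` keeps only the low 8 bits, which is what discarding the carry out of the MSB amounts to.

```python
BITS = 8
MASK = (1 << BITS) - 1    # 255, i.e. 1111 1111

def add_unsigned(a, b):
    """Add two 8-bit unsigned values, returning (wrapped result, carry-out flag)."""
    total = a + b
    carry_out = total > MASK          # True when the sum needs a 9th bit
    return total & MASK, carry_out    # keep the low 8 bits only

result, carry = add_unsigned(200, 100)
print(result, carry)             # 44 True
print(format(result, "08b"))     # 00101100
```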

Underflow occurs when the result of a floating-point calculation is too small in magnitude to represent — its absolute value is smaller than the smallest non-zero representable value. The result rounds to zero.

This commonly occurs when two very small numbers are multiplied together.
Underflow in floating point: the exponent drops below its minimum representable value.
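A quick demonstration, assuming IEEE 754 double-precision floats (the specific values are chosen only for illustration):

```python
tiny = 1e-200
product = tiny * tiny     # the true result, 1e-400, is too small to represent

print(tiny != 0.0)        # True: tiny itself is stored as a non-zero value
print(product)            # 0.0
print(product == 0.0)     # True: the product has underflowed to zero
```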

Condition	What happens	Effect
Integer overflow	MSB carry lost; wraps around	Silent incorrect result
Float overflow	Exponent too large	Result cannot be represented; IEEE 754 hardware typically returns ±infinity
Float underflow	Exponent too small	Result rounds to zero
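Floating-point overflow can be observed in the same way. This sketch assumes IEEE 754 doubles, as used by CPython on mainstream hardware, where multiplying past the largest finite value yields infinity rather than an error.

```python
import math
import sys

largest = sys.float_info.max    # largest finite double
result = largest * 2            # needs an exponent larger than the format allows

print(math.isinf(result))       # True
print(result > largest)         # True: infinity compares greater than any finite value
```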

Fixed Point vs Floating Point: Comparison

	Fixed point	Floating point
Precision	Constant absolute	Roughly constant relative (absolute spacing grows with magnitude)
Range	Limited by integer bits	Large (controlled by exponent)
Rounding errors	Same absolute size everywhere; relative error worst near zero	Present throughout; absolute error grows with magnitude
Overflow	At range boundary	When exponent overflows
Underflow	Not applicable	When exponent underflows (rounds to zero)
Speed	Faster — simpler hardware	Slower — complex arithmetic unit
Use cases	Finance, embedded systems	Science, graphics, general purpose

In practice: modern hardware implements IEEE 754 floating-point arithmetic in dedicated FPUs (floating-point units). Fixed-point arithmetic is used in embedded systems (microcontrollers without FPUs) and financial systems (where amounts must be stored as exact decimal values).
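One common way to get exact results for money is to store amounts as whole numbers of the smallest unit, which is effectively decimal fixed point. The sketch below assumes amounts in pence; `format_pence` is a hypothetical helper for display only.

```python
def format_pence(pence):
    """Display an integer number of pence as pounds with two decimal places."""
    return f"{pence // 100}.{pence % 100:02d}"

# Binary floating point: 0.10 and 0.20 are both approximations.
print(0.10 + 0.20 == 0.30)          # False

# Integer pence: every amount is stored exactly.
total_pence = 10 + 20
print(total_pence == 30)            # True
print(format_pence(total_pence))    # 0.30
```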

Common Exam Mistakes

1. Claiming floating point has more precision than fixed point

Floating point has greater range but not necessarily greater precision. For numbers close to zero, floating point can represent very small values; for large numbers, the representable values become more widely spaced, reducing absolute precision.

2. Confusing absolute and relative error

Absolute error has the same units as the value. Relative error is dimensionless (a ratio). A question asking for "relative error" expects a ratio or percentage, not the raw difference.

3. Thinking overflow always produces an error message

In many hardware and language contexts, integer overflow silently wraps around (produces an incorrect result without any warning). Floating-point overflow may produce a special value like infinity rather than raising an exception.

4. Confusing underflow with a small number being stored correctly

Underflow is a specific condition where the result rounds to zero because it is smaller than the smallest representable non-zero value. A small but representable non-zero number is not underflow.