Intermediate

Character Encoding and Error Detection

Aicademy
A-Level Computer Science · AQA 7517 · 5 min
4.5.5.1 Character form of a decimal digit · 4.5.5.2 ASCII and Unicode · 4.5.5.3 Error checking and correction

Character vs Binary Representation

Computers store everything as binary patterns, but the meaning of a pattern depends on how it is interpreted.

The decimal digit 5 can be stored in two completely different ways:

| Representation | Stored value | Meaning |
|---|---|---|
| Pure binary integer | 0000 0101 | The number five (used in arithmetic) |
| ASCII character code | 0011 0101 (= 53₁₀) | The character '5' (used in text) |

0011 0101 read as a binary integer = 32 + 16 + 4 + 1 = 53, not 5. Treating a text character as a number produces the wrong result, and vice versa.
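The difference can be checked with a minimal Python sketch. It uses the built-in `ord` function, and the variable names are illustrative only:

```python
# The character '5' and the integer 5 are stored as different bit patterns.
char_code = ord("5")                # character code of the text character '5'
digit_value = char_code - ord("0")  # subtracting the code of '0' gives the numeric value

print(char_code, format(char_code, "08b"))      # 53 00110101
print(digit_value, format(digit_value, "08b"))  # 5 00000101
```

Subtracting the code of '0' works because the digit characters occupy consecutive codes (48 to 57).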

Common character codes (decimal):

  • '0' through '9' = 48 through 57
  • 'A' through 'Z' = 65 through 90
  • 'a' through 'z' = 97 through 122
  • 'A' + 32 = 'a': the upper- and lower-case forms of a letter differ by exactly 32 in ASCII (only bit 5, the bit worth 32, changes)
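As a rough illustration of the bit-5 relationship, the sketch below changes case with bitwise operations. The function names `to_lower` and `to_upper` are hypothetical, and the code assumes its input is a single ASCII letter:

```python
CASE_BIT = 0b0010_0000  # bit 5, worth 32

def to_lower(letter: str) -> str:
    """Set bit 5 to turn an upper-case ASCII letter into lower case."""
    return chr(ord(letter) | CASE_BIT)

def to_upper(letter: str) -> str:
    """Clear bit 5 to turn a lower-case ASCII letter into upper case."""
    return chr(ord(letter) & ~CASE_BIT)

print(to_lower("A"), to_upper("a"))  # a A
print(ord("a") - ord("A"))           # 32
```

In real programs the built-in `str.lower()` and `str.upper()` are the better choice, because they also handle non-ASCII letters.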

ASCII

ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding covering 128 characters.

  • 128 characters = 2⁷ values (codes 0–127)
  • Covers: English letters (upper and lower), digits 0–9, punctuation, and control characters (newline, tab, etc.)
  • Extended ASCII uses 8 bits (256 characters), adding accented characters and symbols — but different standards define the extra 128 differently (not standardised)

Limitations of ASCII:

  • Only covers the English alphabet
  • Cannot represent Chinese, Arabic, Cyrillic, emoji, or most mathematical symbols
  • 128 characters is far too few for international use
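One way to see the limitation is to test whether a piece of text fits in the 7-bit range. This is a small sketch; `fits_in_ascii` is a hypothetical helper name, not a standard function:

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code in the 7-bit range 0-127."""
    return all(ord(ch) < 128 for ch in text)

print(fits_in_ascii("Hello, world!"))  # True
print(fits_in_ascii("caf\u00e9"))      # False: the accented e has code 233
```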

Unicode

Unicode is a universal character standard that aims to assign a unique code point to every character in every writing system in the world.

| Encoding | Storage | Characters supported |
|---|---|---|
| ASCII | 7 bits | 128 |
| Extended ASCII | 8 bits | 256 |
| Unicode (UTF-8) | 1–4 bytes (variable) | Over 1.1 million code points |
| Unicode (UTF-16) | 2 or 4 bytes | Same code points |
| Unicode (UTF-32) | 4 bytes (fixed) | Same code points |

UTF-8 is the dominant encoding on the web. It is backward-compatible with ASCII (the first 128 Unicode code points match ASCII exactly) and uses variable-length encoding — common characters use 1 byte, rarer ones use more.
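The variable-length behaviour can be observed with Python's built-in `str.encode`. The sample characters below are chosen purely for illustration, and the `-le` codec variants are used so that no byte-order mark is added to the counts:

```python
# One sample character for each UTF-8 length: A, é, € and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-le"))
    utf32 = len(ch.encode("utf-32-le"))
    print(f"U+{ord(ch):04X}: UTF-8={utf8}, UTF-16={utf16}, UTF-32={utf32} bytes")

# Backward compatibility: an ASCII character has the same single byte in UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```

The UTF-8 lengths come out as 1, 2, 3 and 4 bytes, while UTF-32 always uses 4.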

Why Unicode was introduced: ASCII could not represent international alphabets, most mathematical symbols, or emoji. A single standard was needed so text could be exchanged reliably between systems worldwide.

Parity Bits

A parity bit is an extra bit added to a data word to make the total number of 1-bits either even or odd.

  • Even parity: the parity bit is chosen so the total count of 1s in the transmitted data (including the parity bit) is even
  • Odd parity: total count of 1s is odd

Example — transmit 0110 1001 (four 1-bits) using even parity:

  • Count of 1s in data = 4 (already even)
  • Parity bit = 0
  • Transmitted: 0 0110 1001

If one bit is corrupted during transmission, the count of 1s becomes odd — the receiver detects an error.
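The even-parity example can be sketched in Python as follows. Bits are held as strings for readability, the function names are hypothetical, and the parity bit is placed first to match the transmitted word in the example:

```python
def even_parity_bit(data_bits: str) -> str:
    """Return the parity bit that makes the total number of 1s even."""
    return "1" if data_bits.count("1") % 2 else "0"

def passes_even_parity(word: str) -> bool:
    """Check a received word (parity bit included) for an even count of 1s."""
    return word.count("1") % 2 == 0

data = "01101001"
sent = even_parity_bit(data) + data
print(sent)                           # 001101001
print(passes_even_parity(sent))       # True

# Flip one bit (index 4) to simulate corruption in transit.
corrupted = sent[:4] + ("1" if sent[4] == "0" else "0") + sent[5:]
print(passes_even_parity(corrupted))  # False
```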

Limitations of parity:

  • Reliably detects only an odd number of bit errors, such as a single flipped bit; two simultaneous errors cancel out and go undetected
  • Can detect errors but cannot correct them — it only flags that something went wrong
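The two-error weakness can be demonstrated with a short sketch. The helper names are hypothetical, and the word used is the even-parity transmission from the example above:

```python
def passes_even_parity(word: str) -> bool:
    """Check a received word (parity bit included) for an even count of 1s."""
    return word.count("1") % 2 == 0

def flip(word: str, index: int) -> str:
    """Return a copy of word with the bit at index inverted."""
    flipped = "1" if word[index] == "0" else "0"
    return word[:index] + flipped + word[index + 1:]

sent = "001101001"
two_errors = flip(flip(sent, 2), 7)    # corrupt two different bits

print(two_errors != sent)              # True: the data has changed
print(passes_even_parity(two_errors))  # True: yet the check still passes
```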

Majority Voting

Majority voting achieves error correction (not just detection) by transmitting each bit multiple times.

  • Each bit is transmitted three times: 1 is sent as 1 1 1; 0 is sent as 0 0 0
  • The receiver takes the majority value: 1 1 0 → 1 (two 1s outweigh the corrupted 0)

| Transmitted | Received | Majority | Corrected? |
|---|---|---|---|
| 1 1 1 | 1 1 0 | 1 | Yes, 1 error corrected |
| 0 0 0 | 0 1 0 | 0 | Yes, 1 error corrected |
| 1 1 1 | 0 0 1 | 0 | No, 2 errors give the wrong result |

Trade-off: majority voting triples the data volume to transmit. It corrects single-bit errors but fails for two or more errors in the same triple.
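A minimal sketch of triple-repetition majority voting, assuming bits are passed as strings and the received length is a multiple of three. The function names are hypothetical:

```python
def encode_triple(bits: str) -> str:
    """Repeat every bit three times before transmission."""
    return "".join(bit * 3 for bit in bits)

def decode_triple(received: str) -> str:
    """Split into groups of three and keep the majority value of each group."""
    triples = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if t.count("1") >= 2 else "0" for t in triples)

print(encode_triple("10"))      # 111000
print(decode_triple("110010"))  # 10 (one error in each triple, both corrected)
print(decode_triple("001000"))  # 00 (two errors in the first triple, wrong result)
```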

Checksums and Check Digits

Checksum — a value computed from a block of data by applying a function (typically summing bytes). The sender includes the checksum; the receiver recomputes it and compares.

Example — simple 8-bit checksum: data bytes 45, 32, 67, 12

  • Sum = 45 + 32 + 67 + 12 = 156
  • Checksum = 156 mod 256 = 156
  • Transmitted: 45 32 67 12 156
  • If any byte changes, the recomputed checksum will likely differ
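The checksum example can be reproduced with a short sketch. It assumes the simple scheme described above (sum of the bytes, kept to 8 bits with mod 256), and `checksum8` is a hypothetical name:

```python
def checksum8(data: list[int]) -> int:
    """Sum the byte values and keep only the lowest 8 bits."""
    return sum(data) % 256

data = [45, 32, 67, 12]
check = checksum8(data)
print(check)  # 156

# A single changed byte alters the sum, so the mismatch is detected.
print(checksum8([45, 33, 67, 12]) == check)  # False

# Two changes that cancel out leave the sum unchanged and go undetected.
print(checksum8([46, 31, 67, 12]) == check)  # True
```

The last case shows why a changed byte only "likely" produces a different checksum.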

Check digit — a single digit appended to an identifier (barcode, ISBN, bank card number), computed from the preceding digits.

ISBN-13 check digit: multiply the first 12 digits alternately by 1 and 3, sum the products, and subtract the last digit of the sum from 10. If the result is 10, the check digit is 0.

Example: ISBN 978-0-306-40615-?

Digits: 9 7 8 0 3 0 6 4 0 6 1 5

Weights: 1 3 1 3 1 3 1 3 1 3 1 3

Products: 9, 21, 8, 0, 3, 0, 6, 12, 0, 18, 1, 15 → sum = 93

Check digit = 10 − (93 mod 10) = 10 − 3 = 7

ISBN-13: 9780306406157 — the final 7 is the check digit.
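The ISBN-13 calculation can be sketched as a small function. The name `isbn13_check_digit` is hypothetical, hyphens in the input are ignored, and the final `% 10` covers the case where the subtraction gives 10:

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    digits = [int(ch) for ch in first12 if ch.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting with 1 for the first digit.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10

print(isbn13_check_digit("978-0-306-40615"))  # 7
```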

Common Exam Mistakes

1. Confusing ASCII code for '5' with binary 5

'5' in ASCII = 53 = 0011 0101. Binary 5 = 0000 0101. These are completely different bit patterns. Treating a character digit as a binary integer gives the wrong value.

2. Claiming parity can correct errors

Parity bits can only detect errors, not correct them. To correct errors, majority voting or more sophisticated error-correcting codes (e.g. Hamming codes) are needed.

3. Stating UTF-8 always uses 4 bytes

UTF-8 uses variable-length encoding: 1 byte for ASCII-compatible characters (0–127), up to 4 bytes for rare characters. Claiming it always uses 4 bytes (like UTF-32) is incorrect.

4. Confusing Unicode with UTF-8

Unicode is a standard that assigns code points to characters. UTF-8, UTF-16, and UTF-32 are different ways of encoding (storing) those code points as bytes. Unicode is the what; UTF-8/UTF-16/UTF-32 are the how.
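To make the what/how split concrete, the sketch below takes one character (the euro sign, chosen only as an illustration) and shows that its code point stays the same while its stored bytes depend on the encoding. It assumes Python 3.8 or later for the separator argument of `bytes.hex`:

```python
ch = "\u20ac"  # the euro sign

print(f"U+{ord(ch):04X}")               # U+20AC (the code point: the "what")
print(ch.encode("utf-8").hex(" "))      # e2 82 ac
print(ch.encode("utf-16-be").hex(" "))  # 20 ac
print(ch.encode("utf-32-be").hex(" "))  # 00 00 20 ac
```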
