Intermediate

Data Compression and Encryption

Aicademy
A-Level Computer Science · AQA 7517 · 6 min
4.5.6.9 Data compression · 4.5.6.10 Encryption

Why Compress Data?

Uncompressed data files can be very large. Compression reduces the number of bits needed to represent data, which cuts both:

  • Storage requirements (fit more data on a disk)
  • Transmission time (send files faster over a network)

A 1920×1080 uncompressed 24-bit image takes about 6 MB (1920 × 1080 pixels × 3 bytes per pixel ≈ 6.2 MB). The same image as a JPEG may be 200–500 KB, a reduction of roughly 10–30×.
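The file-size arithmetic can be checked with a short calculation. This is a minimal sketch: the constant names and the `compression_ratio` helper are illustrative, and sizes are in decimal units (1 MB = 1,000,000 bytes).

```python
# Uncompressed size of a 1920x1080 image at 24 bits (3 bytes) per pixel.
WIDTH, HEIGHT, BITS_PER_PIXEL = 1920, 1080, 24

size_bytes = WIDTH * HEIGHT * BITS_PER_PIXEL // 8
size_mb = size_bytes / 1_000_000  # decimal megabytes


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Return how many times smaller the compressed file is."""
    return original_bytes / compressed_bytes


print(size_bytes)                                          # 6220800
print(round(size_mb, 1))                                   # 6.2
print(round(compression_ratio(size_bytes, 500_000), 1))    # 12.4
print(round(compression_ratio(size_bytes, 200_000), 1))    # 31.1
```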

Two fundamentally different approaches:

| | Lossless | Lossy |
|---|---|---|
| Data recovered? | Yes, perfectly | No, some data permanently lost |
| Compression ratio | Moderate (2–5×) | High (10–100×) |
| Suitable for | Text, executables, source code, medical images | Photos (JPEG), audio (MP3), video (H.264) |
| Reversible? | Yes | No |

Lossless Compression: Run-Length Encoding

Run-length encoding (RLE) replaces consecutive repeated values with a (value, count) pair.

Example — compressing a bitmap row:

Original (15 values): A A A A B B B C C D D D D D D
RLE:                   (A,4) (B,3) (C,2) (D,6)  = 8 values

8 stored values instead of 15: a reduction of nearly 50%.

RLE is most effective when the data contains long runs of the same value. It performs poorly on complex images with many colour changes (it may even increase the file size).
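A minimal sketch of RLE, assuming the data is a simple sequence of symbols. The function names `rle_encode` and `rle_decode` are illustrative. It reproduces the bitmap row above and shows how data with no runs ends up larger after encoding.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


row = list("AAAABBBCCDDDDDD")
encoded = rle_encode(row)
print(encoded)  # [('A', 4), ('B', 3), ('C', 2), ('D', 6)]
assert rle_decode(encoded) == row  # lossless: the original row comes back exactly

# Data with no runs: every value becomes its own pair, so the output doubles.
no_runs = list("ABCDEFGH")
print(len(no_runs), 2 * len(rle_encode(no_runs)))  # 8 16
```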

Dictionary-based compression (e.g. LZW, used in GIF): builds a dictionary of repeated patterns found in the data. Longer repeated substrings are replaced with short dictionary codes. Effective on text and structured data. PNG uses a related dictionary-based method, DEFLATE (LZ77 combined with Huffman coding), rather than LZW.

Both RLE and dictionary-based compression are lossless — the original data can be perfectly reconstructed.
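The dictionary idea can be sketched with a simplified LZW. This is an illustration only: it assumes the text contains characters with codes below 256, stores codes as plain Python integers rather than packed bits, and the function names are hypothetical. Real file formats add details such as variable code widths and dictionary resets.

```python
def lzw_compress(text: str) -> list[int]:
    """Simplified LZW: emit one dictionary code per longest known substring."""
    dictionary = {chr(i): i for i in range(256)}  # start with single characters
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)  # learn the new pattern
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the text by reconstructing the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the entry that is being defined at this step.
            entry = previous + previous[0]
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)


text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), len(codes))  # 24 16
assert lzw_decompress(codes) == text  # lossless round trip
```

The repeated substrings ("TOBEOR", "NOT") are what allow 24 characters to be written as 16 codes; text with no repetition would gain nothing.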

Lossy Compression

Lossy compression permanently discards some data to achieve much higher compression ratios.

How it works (JPEG example):

  1. The image is divided into 8×8 blocks of pixels
  2. Each block is transformed into frequency components (DCT)
  3. High-frequency detail (fine textures), which the human eye is least sensitive to, is stored less precisely or discarded
  4. Remaining values are encoded efficiently
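The transform-and-discard idea in steps 2 and 3 can be illustrated in one dimension. This is a simplified sketch, not the JPEG algorithm itself: it transforms a single row of 8 hypothetical brightness values instead of an 8×8 block, and it simply zeroes the high-frequency coefficients, whereas real JPEG divides them by a quantisation table. All function names are illustrative.

```python
import math

N = 8  # JPEG works on 8x8 blocks; this sketch uses one row of 8 pixels


def _scale(k: int) -> float:
    """Normalisation factor that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(samples):
    """DCT-II: pixel values -> frequency coefficients (low frequencies first)."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                        for n, x in enumerate(samples))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse transform (DCT-III): frequency coefficients -> pixel values."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def discard_high_frequencies(coeffs, keep):
    """Zero every coefficient after the first `keep` (the lossy step)."""
    return coeffs[:keep] + [0.0] * (len(coeffs) - keep)


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical brightness values
coeffs = dct(row)
approx = idct(discard_high_frequencies(coeffs, keep=4))
print([round(v) for v in approx])  # close to `row`, but not identical
```

Keeping all 8 coefficients reproduces the row exactly; keeping only the first 4 gives an approximation, and the discarded coefficients cannot be recovered from it.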

The discarded data cannot be recovered. Each time a JPEG is edited and re-saved it is compressed again, so the visible degradation ("compression artefacts") accumulates.

Use cases:

  • JPEG: photographs — lossy is acceptable because photos contain more detail than the eye can discern
  • MP3: audio — removes sounds outside human hearing range and quieter sounds masked by louder ones
  • Video codecs (H.264, H.265): store only changes between frames; discard imperceptible detail
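The "store only changes between frames" idea can be sketched with a toy example. It assumes each frame is a flat list of pixel values and records only the pixels that differ from the previous frame. Real codecs such as H.264 use motion-compensated prediction, which is far more sophisticated; the function names here are illustrative.

```python
def frame_delta(previous, current):
    """Return {pixel_index: new_value} for the pixels that changed."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta.items():
        frame[i] = value
    return frame


frame1 = [10, 10, 10, 10, 200, 200, 10, 10]
frame2 = [10, 10, 10, 10, 10, 200, 200, 10]  # the bright patch moved one pixel right
delta = frame_delta(frame1, frame2)
print(delta)  # {4: 10, 6: 200}
assert apply_delta(frame1, delta) == frame2
```

Only 2 of the 8 pixels need storing for the second frame, which is why footage with little movement compresses so well.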

Why not use lossless for everything? Lossless compression ratios are usually too low for video, and often for audio that has to be streamed: a 2-hour film in a lossless format would be hundreds of gigabytes.

Encryption: Core Concepts

Encryption transforms plaintext (readable data) into ciphertext (unreadable data) using a cipher (algorithm) and a key. Only authorised parties with the key can decrypt it.

| Term | Meaning |
|---|---|
| Plaintext | The original readable message |
| Ciphertext | The encrypted (unreadable) output |
| Cipher | The algorithm used to encrypt and decrypt |
| Key | A value that controls the cipher's operation |
| Encryption | Plaintext → Ciphertext (using key) |
| Decryption | Ciphertext → Plaintext (using key) |

The same plaintext encrypted with different keys produces different ciphertext. Without the key, the ciphertext should be computationally infeasible to reverse.

The Caesar Cipher

The Caesar cipher is a substitution cipher that shifts each letter of the plaintext by a fixed number of positions in the alphabet.

Key = shift amount (1–25).

Example — encrypt HELLO with key = 3 (shift right by 3):

Plaintext:   H E L L O
Shift +3:    K H O O R
Ciphertext:  KHOOR

Decryption: shift back by 3.
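A minimal sketch of the shift, assuming only the letters A–Z are encrypted and everything else is left unchanged. The function name `caesar` is illustrative. A negative shift decrypts.

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter A-Z by `shift` places, wrapping round after Z."""
    result = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)


ciphertext = caesar("HELLO", 3)
print(ciphertext)              # KHOOR
print(caesar(ciphertext, -3))  # HELLO
```

The `% 26` is what wraps the alphabet, so with a shift of 3 the letter X becomes A.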

Weaknesses:

  • Only 25 possible keys — an attacker can try all 25 in seconds (exhaustive key search)
  • Vulnerable to frequency analysis: in English text, 'E' is most common; if 'X' is most common in the ciphertext, the key is likely 19 (X is 19 places after E in the alphabet)
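Both attacks fit in a few lines. This is a sketch under stated assumptions: the message and the helper name `shift_letters` are hypothetical, the text is upper-case English, and the frequency attack assumes the most common ciphertext letter stands for 'E', which holds for this message but can fail on short texts.

```python
from collections import Counter


def shift_letters(text: str, shift: int) -> str:
    """Caesar-shift the letters A-Z in `text`; other characters pass through."""
    return "".join(
        chr((ord(ch) - ord("A") + shift) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text
    )


plaintext = "MEET ME AT THE SECRET TREE"  # hypothetical message
ciphertext = shift_letters(plaintext, 3)

# Exhaustive key search: decrypt with every possible key and inspect the results.
candidates = {key: shift_letters(ciphertext, -key) for key in range(1, 26)}
assert candidates[3] == plaintext  # only one candidate reads as English

# Frequency analysis: assume the most common ciphertext letter stands for 'E'.
letter_counts = Counter(ch for ch in ciphertext if ch.isalpha())
most_common_letter = letter_counts.most_common(1)[0][0]
guessed_key = (ord(most_common_letter) - ord("E")) % 26
print(most_common_letter, guessed_key)  # H 3
```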

The Caesar cipher provides essentially no security by modern standards.

The Vernam Cipher (One-Time Pad)

The Vernam cipher (one-time pad) XORs each bit of the plaintext with the corresponding bit of a secret key.

Conditions for perfect security:

  1. The key must be truly random (not pseudo-random)
  2. The key must be at least as long as the message
  3. The key must be used exactly once and never reused
  4. The key must be kept completely secret

Worked example:

Plaintext:  1011 0011
Key:        0110 1100
Ciphertext: 1101 1111  (XOR each bit)

Decrypt:
Ciphertext: 1101 1111
Key:        0110 1100
Plaintext:  1011 0011  ✓
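The worked example can be reproduced with Python's `^` (XOR) operator. This is a minimal sketch with an illustrative helper, `xor_bytes`. For the longer message it draws the key from `secrets.token_bytes`, the operating system's cryptographic random source, which stands in here for the truly random key that the cipher strictly requires.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with the corresponding byte of `key`."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# The worked example, held as a single byte.
plaintext = bytes([0b10110011])
key = bytes([0b01101100])
ciphertext = xor_bytes(plaintext, key)
print(format(ciphertext[0], "08b"))                   # 11011111
print(format(xor_bytes(ciphertext, key)[0], "08b"))   # 10110011

# A fresh key, as long as the message, to be used once and then destroyed.
message = b"HELLO"
pad = secrets.token_bytes(len(message))
assert xor_bytes(xor_bytes(message, pad), pad) == message
```

Encryption and decryption are the same operation because XORing with the same key twice cancels out.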

Why it is provably perfectly secure: if the key is truly random, every plaintext of the same length is equally likely to have produced the given ciphertext, so the ciphertext reveals nothing about the message. No amount of computation can determine the plaintext without the key.

Practical limitations: securely distributing a key as long as the message is difficult. One-time pads are used in high-security diplomatic communications but are impractical for general use.

Computational security: all other ciphers (AES, RSA, etc.) rely on computational hardness — breaking them requires more computation than is currently feasible, but they are not mathematically proven unbreakable (unlike the Vernam cipher).

Common Exam Mistakes

1. Claiming lossy compression can be reversed

Lossy compression permanently discards data. The original file cannot be recovered from a lossy-compressed version. "Decompressing" a JPEG gives the approximation stored in the file, not the original photograph.

2. Stating the Caesar cipher is secure because the alphabet is large

What matters is the number of possible keys, and 25 is tiny. An attacker tests all 25 possible shifts in seconds. Security depends on key space size, not alphabet size.

3. Confusing the Vernam cipher with general XOR encryption

XOR with a short repeated key is not a Vernam cipher and is easily broken. The Vernam cipher requires a key as long as the message, used only once. Short or repeated keys destroy the perfect security guarantee.
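A short sketch shows why. The key and messages below are hypothetical, and the helper names are illustrative. When two messages are encrypted with the same repeated key, XORing the ciphertexts cancels the key entirely, and a known fragment of plaintext reveals the key itself.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


def repeat_key(key: bytes, length: int) -> bytes:
    """Stretch a short key to `length` bytes by repeating it."""
    return (key * (length // len(key) + 1))[:length]


key = b"K3Y"  # hypothetical short key, reused for both messages
message1 = b"ATTACK AT DAWN"
message2 = b"RETREAT AT TEN"
c1 = xor_bytes(message1, repeat_key(key, len(message1)))
c2 = xor_bytes(message2, repeat_key(key, len(message2)))

# The key cancels out: c1 XOR c2 equals message1 XOR message2.
assert xor_bytes(c1, c2) == xor_bytes(message1, message2)

# Known plaintext: guessing that message1 starts with "ATT" exposes the key.
recovered = xor_bytes(c1[:3], message1[:3])
print(recovered)  # b'K3Y'
```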

4. Overstating the security of AES/RSA

AES and RSA are computationally secure — no practical attack is known with current technology. They are not mathematically proven unbreakable. Only the Vernam cipher (with correct use) is provably perfectly secure.
