Intermediate

Data Compression and Encryption

Aicademy

·A-Level Computer Science·AQA 7517·6 min

4.5.6.9 Data compression·4.5.6.10 Encryption

Why Compress Data?

Uncompressed data files are large. Compression reduces the number of bits needed to represent data, reducing both:

Storage requirements (fit more data on a disk)
Transmission time (send files faster over a network)

A 1920×1080 uncompressed 24-bit image takes about 6 MB. The same image as a JPEG may be 200–500 KB — a 10–30× reduction.
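The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that ignores file headers and assumes 1 KB = 1000 bytes; the function name `uncompressed_bytes` is made up for illustration.

```python
def uncompressed_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes, ignoring any file header."""
    return width * height * bits_per_pixel // 8

raw = uncompressed_bytes(1920, 1080, 24)
print(raw)  # 6220800 bytes, i.e. about 6.2 MB

# Compression ratio for the JPEG sizes quoted in the text (assumed 1 KB = 1000 bytes)
for jpeg_kb in (200, 500):
    print(jpeg_kb, "KB ->", round(raw / (jpeg_kb * 1000)), "times smaller")
```

The ratios come out at roughly 31 and 12, in line with the approximate 10–30× range quoted.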

Two fundamentally different approaches:

	Lossless	Lossy
Data recovered?	Yes — perfectly	No — some data permanently lost
Compression ratio	Moderate (2–5×)	High (10–100×)
Suitable for	Text, executables, source code, medical images	Photos (JPEG), audio (MP3), video (H.264)
Reversible?	Yes	No

Lossless Compression: Run-Length Encoding

Run-length encoding (RLE) replaces consecutive repeated values with a (value, count) pair.

Example — compressing a bitmap row:

Original (15 values): A A A A B B B C C D D D D D D
RLE:                   (A,4) (B,3) (C,2) (D,6)  = 8 values

8 stored values versus 15 original: a reduction of about 47%.

RLE is most effective when the data contains long runs of the same value. It performs poorly on complex images with many colour changes (it may even increase the file size).
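The bitmap row example could be implemented along these lines. This is a minimal sketch rather than a real file format: the function names `rle_encode` and `rle_decode` are illustrative, and the pairs are kept as Python tuples instead of being packed into bytes.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of equal values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

row = list("AAAABBBCCDDDDDD")
encoded = rle_encode(row)
print(encoded)  # [('A', 4), ('B', 3), ('C', 2), ('D', 6)]
assert rle_decode(encoded) == row  # lossless: the original row comes back exactly

# With no runs at all, every value costs a whole pair, so the output grows.
print(rle_encode(list("ABCD")))  # [('A', 1), ('B', 1), ('C', 1), ('D', 1)]
```

The last line shows the poor case described above: four values become eight stored values.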

Dictionary-based compression (e.g. LZW, used in GIF; PNG uses DEFLATE, a related dictionary-style method): builds a dictionary of repeated patterns found in the data. Longer repeated substrings are replaced with short dictionary codes. Effective on text and structured data.

Both RLE and dictionary-based compression are lossless — the original data can be perfectly reconstructed.
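To make the dictionary idea concrete, here is a simplified LZW sketch. It assumes the input contains only characters with code points below 256, and it leaves the output as a list of Python integers rather than packing the codes into bits as a real GIF encoder would. The names `lzw_compress` and `lzw_decompress` are illustrative.

```python
def lzw_compress(text: str) -> list[int]:
    """Replace repeated substrings with dictionary codes."""
    dictionary = {chr(i): i for i in range(256)}  # start with every single character
    current = ""
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                      # keep extending the match
        else:
            codes.append(dictionary[current])        # emit the longest known match
            dictionary[candidate] = len(dictionary)  # learn the new pattern
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same dictionary while decoding, so it never needs to be sent."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # The code refers to the pattern that is being defined right now.
            entry = previous + previous[0]
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "characters ->", len(codes), "codes")  # 24 characters -> 16 codes
assert lzw_decompress(codes) == text
```

The decompressor never receives the dictionary; it reconstructs it from the codes themselves, which is why only the codes need to be stored.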

Lossy Compression

Lossy compression permanently discards some data to achieve much higher compression ratios.

How it works (JPEG example):

The image is divided into 8×8 blocks of pixels
Each block is transformed into frequency components (DCT)
High-frequency detail (fine textures), to which the human eye is least sensitive, is stored less precisely or discarded
Remaining values are encoded efficiently

The discarded data is gone. Repeatedly saving a JPEG increases the visible degradation ("compression artefacts").
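The transform-and-discard idea can be sketched in one dimension. The code below is a simplification, not the JPEG algorithm itself: it works on a single row of 8 samples instead of an 8×8 block, and it simply zeroes the highest frequencies, whereas real JPEG divides coefficients by a quantisation table. The sample values and the function names are made up for illustration.

```python
import math

N = 8  # one row of 8 samples stands in for an 8x8 block

def _scale(k: int) -> float:
    """Normalisation factor that makes the transform exactly invertible."""
    return math.sqrt((1 if k == 0 else 2) / N)

def dct(samples):
    """DCT-II: turn samples into frequency coefficients (index 0 = lowest frequency)."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                        for n, x in enumerate(samples))
        for k in range(N)
    ]

def idct(coeffs):
    """Inverse transform: turn frequency coefficients back into samples."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]

def lossy_round_trip(samples, keep):
    """Keep only the `keep` lowest-frequency coefficients, then reconstruct."""
    coeffs = dct(samples)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated)

row = [52, 55, 61, 66, 70, 61, 64, 73]   # illustrative brightness values
exact = lossy_round_trip(row, keep=8)    # nothing discarded
approx = lossy_round_trip(row, keep=4)   # upper half of the frequencies discarded

print([round(v) for v in exact])         # matches the original row
print([round(v, 1) for v in approx])     # close to the original, but not equal
```

With every coefficient kept the row is recovered; once coefficients are zeroed, the original values cannot be recovered from what remains.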

Use cases:

JPEG: photographs — lossy is acceptable because photos contain more detail than the eye can discern
MP3: audio — removes sounds outside human hearing range and quieter sounds masked by louder ones
Video codecs (H.264, H.265): store only changes between frames; discard imperceptible detail

Why not use lossless for everything? Lossless compression ratios are usually too low for video, and often for streamed audio: a 2-hour film in lossless format would be hundreds of gigabytes.
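A rough calculation supports the "hundreds of gigabytes" figure. The frame rate (24 frames per second), the 1920×1080 resolution and the 3× lossless ratio are assumptions chosen for illustration; audio is ignored.

```python
BYTES_PER_FRAME = 1920 * 1080 * 3   # 24-bit colour, 3 bytes per pixel
FPS = 24                            # assumed frame rate
SECONDS = 2 * 60 * 60               # 2-hour film
LOSSLESS_RATIO = 3                  # assumed, within the 2-5x range for lossless

raw_bytes = BYTES_PER_FRAME * FPS * SECONDS
lossless_bytes = raw_bytes / LOSSLESS_RATIO

print(round(raw_bytes / 1e9), "GB uncompressed")        # about 1075 GB
print(round(lossless_bytes / 1e9), "GB lossless (3x)")  # about 358 GB
```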

Encryption: Core Concepts

Encryption transforms plaintext (readable data) into ciphertext (unreadable data) using a cipher (algorithm) and a key. Only authorised parties with the key can decrypt it.

Term	Meaning
Plaintext	The original readable message
Ciphertext	The encrypted (unreadable) output
Cipher	The algorithm used to encrypt and decrypt
Key	A value that controls the cipher's operation
Encryption	Plaintext → Ciphertext (using key)
Decryption	Ciphertext → Plaintext (using key)

The same plaintext encrypted with different keys produces different ciphertext. Without the key, the ciphertext should be computationally infeasible to reverse.

The Caesar Cipher

The Caesar cipher is a substitution cipher that shifts each letter of the plaintext by a fixed number of positions in the alphabet.

Key = shift amount (1–25).

Example — encrypt HELLO with key = 3 (shift right by 3):

Plaintext	H	E	L	L	O
Shift +3	K	H	O	O	R
Ciphertext	K	H	O	O	R

Decryption: shift back by 3.
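The HELLO example can be reproduced with a short function. This is a minimal sketch that handles only the letters A–Z and leaves spaces and punctuation unchanged; the function name `caesar` is illustrative.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...Z"

def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` places, wrapping round from Z to A."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation alone
    return "".join(result)

ciphertext = caesar("HELLO", 3)
print(ciphertext)               # KHOOR
print(caesar(ciphertext, -3))   # HELLO (decrypt by shifting back)
```

The `% 26` is what makes the alphabet wrap round, so a letter such as Y shifted by 3 becomes B.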

Weaknesses:

Only 25 possible keys — an attacker can try all 25 in seconds (exhaustive key search)
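Vulnerable to frequency analysis: in English text, 'E' is the most common letter, so if 'X' is the most common letter in the ciphertext, E has probably been shifted to X and the key is likely 19 (counting from A = 0, E is 4 and X is 23; 23 - 4 = 19)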

The Caesar cipher provides essentially no security by modern standards.
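Both weaknesses are easy to demonstrate. The sketch below repeats a small `caesar` helper so that it runs on its own; `brute_force` and `guess_key` are illustrative names. The frequency guess assumes the most common ciphertext letter stands for 'E', which is only reliable for reasonably long English messages.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Shift letters A-Z by `shift` places; other characters are unchanged."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )

def brute_force(ciphertext: str) -> dict[int, str]:
    """Exhaustive key search: undo every possible shift from 1 to 25."""
    return {key: caesar(ciphertext, -key) for key in range(1, 26)}

def guess_key(ciphertext: str) -> int:
    """Frequency analysis: assume the most common letter is an encrypted 'E'."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("E")) % 26

# Exhaustive search: one of only 25 candidates is the readable plaintext.
print(brute_force("KHOOR")[3])   # HELLO

# Frequency analysis: here 'X' is the most common ciphertext letter.
secret = caesar("MEET ME HERE AT SEVEN", 19)
print(guess_key(secret))         # 19
```

In practice an attacker would scan the 25 candidates by eye or check them against a word list.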

The Vernam Cipher (One-Time Pad)

The Vernam cipher (one-time pad) XORs each bit of the plaintext with the corresponding bit of a secret key.

$\text{ciphertext}_{i} = \text{plaintext}_{i} \oplus \text{key}_{i}$

Conditions for perfect security:

The key must be truly random (not pseudo-random)
The key must be at least as long as the message
The key must be used exactly once and never reused
The key must be kept completely secret

Worked example:

Plaintext:  1011 0011
Key:        0110 1100
Ciphertext: 1101 1111  (XOR each bit)

Decrypt:
Ciphertext: 1101 1111
Key:        0110 1100
Plaintext:  1011 0011  ✓
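The worked example translates directly into code. This is a minimal sketch operating on bytes; the function name `vernam` is illustrative. Note the assumption in the second half: Python's `secrets` module draws on the operating system's random source, which stands in here for the truly random pad that the cipher strictly requires.

```python
import secrets

def vernam(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte. The same call decrypts."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

# The worked example: one byte of plaintext, one byte of key.
plaintext = bytes([0b10110011])
key = bytes([0b01101100])
ciphertext = vernam(plaintext, key)
print(format(ciphertext[0], "08b"))       # 11011111
assert vernam(ciphertext, key) == plaintext

# A longer message with a fresh pad of the same length (used once, then destroyed).
message = b"ATTACK AT DAWN"
pad = secrets.token_bytes(len(message))   # assumption: OS randomness as the pad source
assert vernam(vernam(message, pad), pad) == message
```

Encryption and decryption are the same operation because XORing with the same key twice cancels out.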

Why it is provably perfectly secure: if the key is truly random, any plaintext of the same length could have produced the given ciphertext under some key, and each is equally likely. The ciphertext therefore reveals nothing about the plaintext, and no amount of computation can determine the plaintext without the key.

Practical limitations: securely distributing a key as long as the message is difficult. One-time pads have been used in high-security diplomatic communications but are impractical for general use.

Computational security: all other ciphers (AES, RSA, etc.) rely on computational hardness — breaking them requires more computation than is currently feasible, but they are not mathematically proven unbreakable (unlike the Vernam cipher).

Common Exam Mistakes

1. Claiming lossy compression can be reversed

Lossy compression permanently discards data. The original file cannot be recovered from a lossy-compressed version. "Decompressing" a JPEG gives the approximation stored in the file, not the original photograph.

2. Stating the Caesar cipher is secure because the alphabet is large

What matters is the number of keys (25), which is tiny, not the size of the alphabet. An attacker can test all 25 possible shifts in seconds. Security depends on key space size, not alphabet size.

3. Confusing the Vernam cipher with general XOR encryption

XOR with a short repeated key is not a Vernam cipher and is easily broken. The Vernam cipher requires a key as long as the message, used only once. Short or repeated keys destroy the perfect security guarantee.
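A short demonstration shows why a repeated key is fatal. The messages and the 3-byte key below are made up for illustration, and `repeating_key_xor` is an illustrative name, not a standard function.

```python
from itertools import cycle

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a short key that is repeated to cover the message."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

p1 = b"ATTACK AT DAWN"
p2 = b"RETREAT AT TEN"
key = b"K3Y"   # only 3 bytes, reused across and within both messages

c1 = repeating_key_xor(p1, key)
c2 = repeating_key_xor(p2, key)

# Weakness 1: XORing the two ciphertexts cancels the key completely,
# leaving plaintext XOR plaintext for the attacker to analyse.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)

# Weakness 2: guessing just 3 bytes of plaintext reveals the whole key.
recovered_key = xor_bytes(c1[:3], p1[:3])
print(recovered_key)                         # b'K3Y'
print(repeating_key_xor(c2, recovered_key))  # b'RETREAT AT TEN'
```

Neither attack works against a correctly used one-time pad, because no key byte is ever used twice.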

4. Overstating the security of AES/RSA

AES and RSA are computationally secure — no practical attack is known with current technology. They are not mathematically proven unbreakable. Only the Vernam cipher (with correct use) is provably perfectly secure.