Signal Processing Foundations — Week 2: Sampling & Aliasing

The Core Idea

Discrete Signals & Sampling

A continuous-time signal has a value at every instant of time. A digital computer can only store a finite list of numbers. Sampling bridges these two worlds by taking periodic "snapshots" — and the rate at which we snap determines everything about what we can recover.

After this section you will be able to

Explain in plain English what a discrete signal is and how it differs from a continuous signal.
Apply the sampling equation $x[n] = x(nT_s)$ to compute sample values by hand.
Interpret the sr parameter in NumPy and librosa code as $f_s$ in the sampling equation.

Imagine trying to describe a song to someone over text. You cannot send the whole continuous soundwave — you have to send a list of numbers. But which numbers? How many? And how do you make sure the list is enough to reconstruct the original? That is exactly the problem sampling solves.

🎯

Why this matters: Every dataset you will ever load in an ML project is already a sampled signal. The sr=22050 you pass to librosa.load() is not a magic constant — it is $f_s$, the sampling frequency, and choosing it incorrectly degrades model performance or silently corrupts your features.

🎬

Analogy Bridge

Think of a flip-book animation. Each page is one sample. The continuous movie is the analog signal; the stack of pages is the digital signal. More pages per second = smoother motion = higher $f_s$. The question the Nyquist theorem (Topic 2) answers is: how many pages do you actually need?

Analog (continuous signal: value defined at every real time $t$) → Sampling (clock grid: ADC fires every $T_s$ seconds) → Digital (discrete sequence: list of numbers $x[0], x[1], x[2], \ldots$) → Python (NumPy array: sr is $f_s$, in samples per second)

Symbol	Name	Meaning	Example
$x(t)$	Analog Signal	Continuous; has a value at every real time $t$	microphone voltage
$x[n]$	Discrete Signal	A list of numbers indexed by integer $n$	Python NumPy array
$f_s$	Sample Rate	Snapshots taken per second (Hz)	44100 Hz (CD audio)
$T_s$	Sampling Period	Time gap between consecutive samples	1/44100 ≈ 22.7 µs

Sample rate (Hz)	Typical use
44,100	CD audio, the industry standard since 1980
22,050	librosa default sr: half the CD rate, still covers speech
16,000	ASR models (Whisper, Wav2Vec): speech processing standard
8,000	Telephone voice: minimum for intelligible speech

Signal taxonomy pipeline: The Analog-to-Digital Converter (ADC) bridges the continuous physical world and the discrete digital domain at a fixed rate $f_s$.

Problem

📐 The Sampling Equation

The fundamental relationship that connects the continuous analog signal to its discrete digital representation. Each sample $x[n]$ is simply the value of the analog signal frozen at clock tick $n$:

$$x[n] = x(n \cdot T_s) \qquad T_s = \frac{1}{f_s}$$

$x[n]$ Discrete Sample The $n$-th number stored in the array

$n$ Sample Index Integer counter: 0, 1, 2, 3, …

$x(t)$ Analog Signal Continuous-time measurement (e.g., voltage)

$T_s$ Sampling Period Seconds between consecutive snapshots

$f_s$ Sample Rate Snapshots taken per second (Hz)

📝 Worked Example — Sampling a 5 Hz Sine Wave

Background. The equation $x[n] = x(nT_s)$ replaces continuous time $t$ with the discrete grid $\{0, T_s, 2T_s, \ldots\}$. We evaluate the analog function at each of these clock ticks.

Problem: A sensor records $x(t) = 3\cos(2\pi \cdot 5 \cdot t)$ V at $f_s = 20$ Hz. Compute $T_s$ and the first four samples $x[0], x[1], x[2], x[3]$.

Find the sampling period.
$T_s = 1/f_s = 1/20 = 0.05\text{ s}$

Ts = 0.05 s = 50 ms

Substitute into the sampling equation.
$x[n] = 3\cos(2\pi \cdot 5 \cdot n \cdot 0.05) = 3\cos(0.5\pi n)$

Evaluate each sample.
$x[0] = 3\cos(0) = 3.000$
$x[1] = 3\cos(0.5\pi) = 0.000$
$x[2] = 3\cos(\pi) = -3.000$
$x[3] = 3\cos(1.5\pi) = 0.000$

x = [3.000, 0.000, −3.000, 0.000, …]

Physical interpretation. Sampling at $f_s = 4f_{signal} = 20$ Hz captures exactly 4 points per cycle (peak, zero, trough, zero). The sequence $[3, 0, -3, 0, 3, \ldots]$ fully captures the wave's shape.

Quick Check

If $f_s = 40$ Hz instead, how many samples appear per cycle of the 5 Hz signal?

8 samples per cycle — at f_s=40Hz, T_s=0.025s. One cycle = 1/5 = 0.2 s. So 0.2/0.025 = 8 samples.
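
The same count can be checked in a couple of lines. This is a minimal sketch; `samples_per_cycle` is a hypothetical helper name introduced only for illustration.

```python
def samples_per_cycle(fs, f_signal):
    """Return how many samples land in one period of a sinusoid.

    One period lasts 1/f_signal seconds and samples arrive every 1/fs
    seconds, so the count is the ratio fs / f_signal.
    """
    return fs / f_signal


for fs in (20, 40):
    count = samples_per_cycle(fs, 5)
    print(f"fs = {fs} Hz -> {count:.1f} samples per cycle of a 5 Hz tone")
```

The ratio does not have to be an integer; a fractional result simply means the sample positions drift from one cycle to the next.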

🐍 What `sr` Really Is in Python

When you write y, sr = librosa.load(file), the variable sr is nothing but $f_s$ — the number of samples per second stored in the audio file. Every array index maps to a real physical time via $t = n / f_s$.

Code	Meaning
`sr = 22050`	$f_s = 22{,}050$ Hz → $T_s \approx 45\,\mu\text{s}$
`y[0]`	$x[0]$ — amplitude at $t = 0\,\text{s}$
`y[44100]`	$x[44100]$ — amplitude at $t = 2\,\text{s}$
`len(y)/sr`	Total duration in seconds

📝 Worked Example — Duration and Index-to-Time

Background. Each index $n$ represents a time $t = n / f_s$. This lets you convert between array indices (what the code sees) and physical time (what the signal means).

Problem: An audio array y has 88,200 samples loaded at sr=22050. (a) What is the clip duration? (b) What physical time does index y[11025] correspond to?

Duration.
$\text{duration} = N/f_s = 88{,}200 / 22{,}050 = 4.0\text{ s}$

Duration = 4.0 seconds

Index to time.
$t = n/f_s = 11{,}025 / 22{,}050 = 0.5\text{ s}$

y[11025] = amplitude at t = 0.5 s (halfway through)
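
The two conversions in this example can be wrapped in small helpers. A minimal sketch follows; the function names are hypothetical and are not part of NumPy or librosa.

```python
def duration_seconds(num_samples, sr):
    """Clip duration in seconds: N / f_s."""
    return num_samples / sr


def index_to_time(n, sr):
    """Physical time (seconds) of sample index n: t = n / f_s."""
    return n / sr


def time_to_index(t, sr):
    """Nearest sample index for time t (seconds): n = t * f_s, rounded."""
    return int(round(t * sr))


sr = 22050
print(duration_seconds(88200, sr))  # clip duration in seconds
print(index_to_time(11025, sr))     # time of y[11025] in seconds
print(time_to_index(0.5, sr))       # index closest to t = 0.5 s
```

Rounding in `time_to_index` matters because an arbitrary time rarely falls exactly on a sample; the helper returns the closest one.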

Quick Check

Which index n corresponds to time $t = 3.0$ s with sr=22050?

n = t × sr = 3.0 × 22050 = 66150

💡

Key Insight

Parentheses vs. square brackets are not cosmetic. In signal processing, $x(t)$ with parentheses always means "continuous time" and $x[n]$ with square brackets always means "discrete index." Mixing them up in an exam answer is an automatic mistake — they represent fundamentally different mathematical objects.

⚠️

Common Mistake

Myth: "Higher sample rate always means better quality — more is always more."

Reality: Once $f_s$ exceeds twice the signal's highest frequency, extra samples add no new information, while storage and compute grow in proportion to $f_s$ (doubling the rate doubles the array). Resampling speech from 44,100 Hz to 16,000 Hz keeps everything below 8,000 Hz, the band where most speech energy lies, which is why speech models accept the trade. The Nyquist theorem (Topic 2) gives the exact minimum.
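
To make the storage side of this trade-off concrete, here is a rough sketch. It assumes mono audio stored as 32-bit floats (4 bytes per sample); `megabytes_per_minute` is a hypothetical helper, and real files with other bit depths or channel counts will differ.

```python
BYTES_PER_SAMPLE = 4  # assumption: mono float32 samples


def megabytes_per_minute(sr, bytes_per_sample=BYTES_PER_SAMPLE):
    """Uncompressed size of one minute of audio at sample rate sr."""
    return sr * 60 * bytes_per_sample / 1e6


for sr in (8000, 16000, 22050, 44100):
    print(f"{sr:>6} Hz -> {megabytes_per_minute(sr):6.2f} MB per minute")
```

Storage scales linearly with $f_s$, so halving the rate halves the array size.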

Solution

Pause & Predict

Before you move the slider: if you sample a 5 Hz sine wave at only 6 samples per second, how many dots do you expect to see per cycle? Will the shape still look like a sine wave?

Hint: one cycle = 1/5 = 0.2 s. At 6 Hz, T_s = 1/6 ≈ 0.167 s. Count dots in 0.2 s.

Try It: Sampling Rate Visualizer

Drag the slider to change $f_s$. Watch how the sample density changes relative to the 5 Hz sine wave. Below 10 Hz, you can see the wave shape starting to distort.

Sample Rate $f_s$: 20 Hz

Analog $x(t)=3\cos(2\pi\cdot5\cdot t)$ Samples $x[n]$ Sampling grid

Live Calculation — Sampling Parameters

x[n] = 3·cos(2π × 5/fs × n) = 3·cos(2π × 5/20 × n) = 3·cos(1.5708 × n)

n = 0: 3 × cos(0.0000) = 3 × 1.0000 = 3.0000

n = 1 (T/4): 3 × cos(1.5708) = 3 × 0.0000 = 0.0000

n = 2 (T/2): 3 × cos(3.1416) = 3 × (−1.0000) = −3.0000

Implementation

Python · NumPy — Sampling a Cosine Wave

import numpy as np

# ── Sampling parameters ──────────────────────────────────────
fs = 20         # Hz  — sampling frequency (samples per second)
Ts = 1 / fs      # s   — sampling period = 0.050 s
f_signal = 5   # Hz  — frequency of the cosine wave

# ── Sample indices ───────────────────────────────────────────
n = np.arange(0, 8)          # integer indices: 0, 1, 2, 3, 4, 5, 6, 7

# ── Evaluate x[n] = x(n · Ts) = 3·cos(2π·f_signal·n·Ts) ───
x = 3 * np.cos(2 * np.pi * f_signal * n * Ts)

print(f"Ts = {Ts*1000:.1f} ms  |  {fs/f_signal:.0f} samples per cycle")
print(f"x  = {np.round(x, 3)}")

Output

Ts = 50.0 ms  |  4 samples per cycle
x  = [ 3.  0. -3. -0.  3.  0. -3. -0.]

Key Takeaway

Every digital signal is a list of numbers tied to a sample rate — and understanding $f_s$ is the first step to understanding any audio, sensor, or ML dataset.

🎙️

Real-World Application

Automatic Speech Recognition — Why `sr=16000` Is the Industry Standard

Models like OpenAI's Whisper and Meta's Wav2Vec 2.0 expect audio at 16,000 Hz. This is not arbitrary: most speech energy lies below 8,000 Hz, and $f_s = 16{,}000$ Hz places the Nyquist frequency exactly at 8,000 Hz. Feeding these models audio at any other rate without resampling distorts the temporal features and degrades recognition accuracy. The fix is to request the rate explicitly, for example librosa.load(file, sr=16000).

Checkpoint Quiz Sampling Fundamentals

Q1 A signal is sampled at $f_s = 8{,}000$ Hz. What is the sampling period $T_s$ in milliseconds?

$T_s = 1/f_s = 1/8000 = 0.000125\text{ s} = \mathbf{0.125\text{ ms}}$

Q2 An array y has 32,000 samples at sr=16000. What is the duration?

Duration $= N/f_s = 32000/16000 = \mathbf{2.0\text{ s}}$

Q3 You call librosa.load(file, sr=22050). What does y[0] represent?

$x[0]$ — the amplitude of the audio signal at time $t = 0/22050 = 0\text{ s}$ (the very first sample).

The Fundamental Law

Nyquist-Shannon Theorem & Aliasing

There is a precise mathematical threshold below which sampling destroys information — and above which it preserves everything. The Nyquist-Shannon theorem defines that threshold. Violating it produces aliasing: a silent corruption where high-frequency content masquerades as low-frequency noise.

After this section you will be able to

State the Nyquist-Shannon sampling theorem and identify the minimum safe sample rate for a given signal.
Calculate the alias frequency produced when a signal is sampled below the Nyquist rate.
Explain why aliasing causes silent data corruption in sensor pipelines and how an anti-aliasing filter prevents it.

What if you filmed a spinning helicopter rotor at the wrong frame rate? Sometimes the blades appear to rotate backwards — or even stand still — when they are actually spinning at full speed. That impossible illusion is aliasing, and it does the exact same thing to audio and sensor data when the sample rate is too low.

🎯

Why this matters: Wrong sample rates are one of the most common silent bugs in real ML systems. An ECG sensor that aliases a 60 Hz power-line artifact into the cardiac frequency band will make a classifier see false heartbeat patterns. A speech model that receives audio sampled at the wrong rate produces nonsense — no exception is raised; the output is just wrong.

🎡

Analogy Bridge

The stroboscope effect. If you illuminate a spinning wheel with a strobe light that flashes exactly at the wheel's rotation speed, the wheel looks frozen. Flash twice as fast, and you can see it moving. The Nyquist rate $f_s \geq 2f_{max}$ is the "flash twice as fast" rule — the minimum needed to correctly perceive motion.

Proper Sampling ($f_s \geq 2f_{max}$)

Each sample carries unique information. The original signal can be perfectly reconstructed.

✓ No information lost

Undersampling ($f_s < 2f_{max}$)

High-frequency components "wrap around" and appear as lower frequencies — aliasing. Reconstruction is impossible.

✗ Permanent information loss

Value	Label	Note
2×	minimum	Nyquist rate: $f_s \geq 2f_{max}$ to prevent aliasing
20 kHz	human hearing	Upper limit; why CD audio uses $f_s = 44{,}100$ Hz
60 Hz	power-line noise	Common aliasing source in ECG/EEG sensors

Original Signal ($f=13$ Hz) → Undersampled ($f_s=10\,\text{Hz} < 2f$) → Aliased Result ($f_{alias}=3\,\text{Hz}$)

Aliasing step-strip: sampling a 13 Hz signal at only 10 Hz (below the Nyquist rate of 26 Hz) causes the signal to appear as a spurious 3 Hz wave — indistinguishable from a real 3 Hz component.

Problem

📐 The Nyquist-Shannon Sampling Theorem

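A band-limited signal with highest frequency $f_{max}$ can be perfectly reconstructed from its samples when the sampling rate satisfies the condition below. Equality is a boundary case: a sinusoid exactly at $f_s/2$ can be sampled at its zero crossings and vanish, so practical designs choose $f_s$ strictly greater than $2f_{max}$: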

$$f_s \geq 2 f_{max}$$

The threshold $f_N = f_s / 2$ is called the Nyquist frequency — the highest frequency that can be represented at a given sample rate:

$$f_N = \frac{f_s}{2}$$

$f_s$ Sample Rate Snapshots per second chosen by engineer

$f_{max}$ Signal Bandwidth Highest frequency present in the signal

$2$ Nyquist Factor Minimum 2 samples needed per cycle

$f_N$ Nyquist Frequency Highest frequency representable at $f_s$

📝 Worked Example — Finding the Minimum Safe Sample Rate

Background. The Nyquist theorem states that you need at least 2 samples per cycle of the highest frequency component. Below this rate, there are not enough samples to distinguish that frequency from a lower one.

Problem: A biomedical ECG signal contains components up to 150 Hz. What is the minimum sample rate to avoid aliasing? What is the Nyquist frequency if we use $f_s = 500$ Hz?

Apply the Nyquist criterion.
$f_s^{min} = 2 \times f_{max} = 2 \times 150 = 300\text{ Hz}$

Minimum safe rate: 300 Hz

In practice, add margin. Medical devices use $f_s = 500$ Hz — well above the theoretical minimum — to prevent distortion at the edge of the band.

Nyquist frequency at 500 Hz.
$f_N = f_s / 2 = 500 / 2 = 250\text{ Hz}$

Nyquist frequency: 250 Hz (safely above 150 Hz)

Interpretation. The 100 Hz headroom ($250 - 150 = 100$ Hz) gives the anti-aliasing filter room to roll off before reaching the Nyquist limit.
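
The three numbers in this example follow from two one-line formulas. A minimal sketch, with hypothetical helper names:

```python
def min_sample_rate(f_max):
    """Theoretical minimum sample rate for content up to f_max (Hz)."""
    return 2 * f_max


def nyquist_frequency(fs):
    """Highest frequency representable at sample rate fs (Hz)."""
    return fs / 2


f_max = 150  # Hz, highest ECG component in the example
fs = 500     # Hz, sample rate chosen in the example

headroom = nyquist_frequency(fs) - f_max  # room for the filter roll-off
print(f"minimum fs = {min_sample_rate(f_max)} Hz")
print(f"f_N = {nyquist_frequency(fs)} Hz, headroom = {headroom} Hz")
```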

Quick Check

CD audio has $f_s = 44{,}100$ Hz. What is its Nyquist frequency? Does it comfortably cover human hearing (up to 20,000 Hz)?

f_N = 44100/2 = 22050 Hz — yes, 22050 > 20000, so human hearing is fully covered with 2050 Hz of margin.

🔁 Aliasing — When Frequencies Impersonate Each Other

When $f_s < 2f_{signal}$, the sampled sequence $x[n]$ is identical to what you would get from sampling a lower-frequency signal. The aliased frequency is:

$$f_{alias} = \left| f_{signal} - k \cdot f_s \right|$$

where $k$ is the integer that places $f_{alias}$ in $[0,\, f_s/2]$. For the simplest case:

$$f_{alias} = \left| f_{signal} - f_s \right| \quad \text{(when } f_s/2 < f_{signal} < 3f_s/2\text{, so that } k = 1\text{)}$$

$f_{alias}$ Alias Frequency Where the energy appears after undersampling

$f_{signal}$ True Frequency The actual frequency of the input signal

$k$ Wrap Count Integer that folds $f_{signal}$ into $[0, f_s/2]$

$|\cdot|$ Absolute Value Frequency is always non-negative
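
One way to sketch the folding rule without choosing $k$ by hand is to work modulo $f_s$ and reflect the upper half of the band. The function below is an illustrative helper, not a library routine; it gives the same result as $|f_{signal} - k \cdot f_s|$ with $k$ chosen to land in $[0, f_s/2]$.

```python
def alias_frequency(f_signal, fs):
    """Fold f_signal into the representable band [0, fs/2]."""
    f_mod = f_signal % fs          # remove whole multiples of fs
    return min(f_mod, fs - f_mod)  # reflect the upper half of the band


for f, fs in [(13, 10), (7, 10), (5, 20)]:
    print(f"{f} Hz sampled at {fs} Hz appears at {alias_frequency(f, fs)} Hz")
```

A frequency already below $f_s/2$ is returned unchanged, so the same function doubles as a "no aliasing" check.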

📝 Worked Example — Computing Alias Frequency

Background. When a frequency $f$ is sampled below Nyquist, there is no way to distinguish it from a lower frequency — the sample sequences are mathematically identical. The alias frequency is where the energy appears to be.

Problem: A 13 Hz tone is sampled at $f_s = 10$ Hz (below the Nyquist rate of 26 Hz). At what frequency does it appear in the spectrum?

Check that aliasing occurs.
Nyquist rate = $2 \times 13 = 26$ Hz. We are using 10 Hz < 26 Hz. Aliasing confirmed.

Compute alias.
$f_{alias} = |13 - 10| = 3\text{ Hz}$

The 13 Hz tone appears as a 3 Hz signal

Interpretation. If this were ECG data, a 13 Hz muscle artifact sampled at 10 Hz would appear as a 3 Hz wave — right inside the cardiac frequency band. A downstream classifier would treat it as a real heart signal.
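
The claim can also be checked empirically by looking at the spectrum of the sampled sequence. The sketch below samples the 13 Hz tone at 10 Hz for 10 seconds (a duration chosen here so that the FFT bins fall on multiples of 0.1 Hz) and reports where the spectral peak lands.

```python
import numpy as np

fs = 10        # Hz, sample rate (below the 26 Hz Nyquist rate)
f_signal = 13  # Hz, true tone frequency
duration = 10  # s, gives 100 samples and 0.1 Hz bin spacing

n = np.arange(fs * duration)
x = np.cos(2 * np.pi * f_signal * n / fs)  # x[n] = x(n * Ts)

spectrum = np.abs(np.fft.rfft(x))          # magnitude spectrum, 0 .. fs/2
freqs = np.fft.rfftfreq(len(x), d=1 / fs)  # frequency of each bin in Hz
peak_hz = freqs[np.argmax(spectrum)]

print(f"Spectral peak at {peak_hz:.1f} Hz")
```

Nothing in the sampled data points back to 13 Hz; the spectrum only covers 0 to $f_s/2 = 5$ Hz.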

Quick Check

A 7 Hz signal is sampled at $f_s = 10$ Hz. Is aliasing present? If yes, compute $f_{alias}$.

Nyquist rate = 2 × 7 = 14 Hz. Since 10 Hz < 14 Hz, aliasing is present. f_alias = |7 − 10| = 3 Hz. The 7 Hz signal appears as a 3 Hz alias.

⚠️

Common Mistake

Myth: "I can just filter out aliased frequencies after sampling."

Reality: Once aliasing has occurred, the aliased frequency is indistinguishable from a genuine signal at that frequency — you cannot unmix them after the fact. Aliasing must be prevented before sampling with an anti-aliasing low-pass filter that removes content above $f_N = f_s/2$.

Solution

Pause & Predict

The widget below shows a 13 Hz signal. Before you drag the slider: at what sample rate do you expect the reconstructed wave to start "looking wrong" — and what frequency will the alias appear at if you set $f_s = 10$ Hz?

Hint: the Nyquist rate is $2 \times 13 = 26$ Hz. Use the alias formula $f_{alias} = |f_{signal} - f_s|$.

Try It: Aliasing Explorer

A 13 Hz signal (teal) is sampled at $f_s$ Hz (red dots). Drag below the Nyquist rate of 26 Hz to see aliasing — the reconstructed wave (orange) diverges from the original.

Sample Rate $f_s$: 40 Hz

Original 13 Hz signal Samples $x[n]$ Reconstructed (aliased when undersampled)

Live Calculation — Nyquist Check

Signal: f = 13 Hz | Nyquist rate: 2 × 13 = 26 Hz
fs = 40 Hz ≥ 26 Hz → NO ALIASING
Nyquist freq: fN = 40 / 2 = 20.0 Hz

Status: 13 Hz is below fN = 20.0 Hz ✓ Safe

Implementation

Python · NumPy — Nyquist Check and Alias Frequency

import numpy as np

f_signal = 13    # Hz — frequency of the signal to sample
fs       = 10    # Hz — our (too low) sample rate

nyquist_rate = 2 * f_signal    # = 26 Hz — minimum safe fs
f_nyquist    = fs / 2           # = 5 Hz — highest representable freq

if fs < nyquist_rate:
    # Alias: find the lowest positive freq indistinguishable from f_signal
    k        = round(f_signal / fs)  # closest integer multiple of fs
    f_alias  = abs(f_signal - k * fs)  # = |13 - 1×10| = 3 Hz
    print(f"⚠ ALIASING: fs={fs} Hz < Nyquist rate {nyquist_rate} Hz")
    print(f"  {f_signal} Hz appears as {f_alias} Hz in spectrum")
else:
    print(f"✓ Safe: fs={fs} Hz >= Nyquist rate {nyquist_rate} Hz")

Output

⚠ ALIASING: fs=10 Hz < Nyquist rate 26 Hz
  13 Hz appears as 3 Hz in spectrum

Key Takeaway

The Nyquist-Shannon theorem gives a hard mathematical lower bound on sample rate: $f_s \geq 2f_{max}$ — violate it and high-frequency content permanently masquerades as low-frequency noise.

🫀

Real-World Application

Aliasing in Medical Wearables — The Hidden Diagnostic Trap

Consumer smartwatch PPG sensors often sample at 25–50 Hz to conserve battery. At a 25 Hz sample rate, a 30 Hz artifact superimposed on the cardiac signal aliases to $|30 - 25| = 5$ Hz, just above the cardiac frequency band (0.5–4 Hz), where it can be mistaken for cardiac content. An aliased component like this can lead a classifier to misread a normal resting rhythm as pathological. Medical-grade devices typically use $f_s \geq 250$ Hz with a strict anti-aliasing filter below 125 Hz to prevent this.

Checkpoint Quiz Nyquist-Shannon & Aliasing

Q1 A signal contains frequencies up to 3,500 Hz. What is the minimum sample rate to prevent aliasing?

$f_s^{min} = 2 \times f_{max} = 2 \times 3500 = \mathbf{7{,}000\text{ Hz}}$

Q2 A 9 Hz tone is sampled at $f_s = 8$ Hz. What alias frequency appears in the spectrum?

$f_{alias} = |9 - 8| = \mathbf{1\text{ Hz}}$. The 9 Hz signal looks identical to a 1 Hz signal at this sample rate.

Q3 At $f_s = 44{,}100$ Hz, what is the Nyquist frequency, and what happens to a 25,000 Hz overtone?

$f_N = 44100/2 = 22050$ Hz. The 25,000 Hz overtone is above $f_N$, so it aliases to $|25000 - 44100| = 19100$ Hz — it appears as a false 19.1 kHz component. Anti-aliasing filters on real ADCs remove content above 22 kHz before sampling.

Putting It to Work

Choosing `sr` in librosa & Resampling for ML

The theory of sampling and Nyquist is only useful when you can apply it to real data. This section shows you exactly how to choose the correct sample rate when loading audio with librosa, how to detect and fix mismatched rates, and how to build a resampling step into a production ML pipeline.

After this section you will be able to

Use librosa.load() with an explicit sr parameter to load audio at a target sample rate.
Resample audio between two rates using librosa.resample() without introducing aliasing.
Design a pre-processing stage that normalises all files to a consistent sample rate before feature extraction.

You have a dataset of 10,000 audio files from five different microphones, recorded at 8,000, 16,000, 22,050, 44,100, and 48,000 Hz. If you feed the raw arrays directly to a speech model, every file whose rate differs from the one the model expects will be "heard" at the wrong speed: the clip appears to have the wrong duration and the wrong pitch. The fix is three lines of Python and one design decision: what rate does your model need?

🎯

Why this matters: Real datasets are never clean. Audio from the wild has mixed sample rates, mixed bit depths, and mixed channel counts. Building a robust preprocessing pipeline that enforces a consistent $f_s$ before any feature extraction is a non-negotiable step in every production speech, music, or environmental audio ML system.

🌍

Analogy Bridge

Think of sample rate like a language. A model trained on English cannot understand French — even though both are human languages. A model trained on 16 kHz audio "speaks" 16 kHz; feeding it 44.1 kHz audio is like speaking French to it. Resampling is the translation step that converts every file to the language the model understands.

Production ML audio pipeline: always load at native rate, check and resample explicitly, then extract features — never let librosa silently resample at load time.

Problem

📐 librosa.load() — The Key Parameters

The librosa.load() function reads an audio file and returns a NumPy array y plus the sample rate integer sr. The sr parameter controls resampling at load time:

Call	Result
`librosa.load(f)`	Resamples to 22,050 Hz (default)
`librosa.load(f, sr=16000)`	Resamples to exactly 16,000 Hz
`librosa.load(f, sr=None)`	Preserves original rate — no resampling
`librosa.resample(y, orig_sr=r1, target_sr=r2)`	Converts array already in memory

📝 Worked Example — Resampling Duration and Array Length

Background. When you resample from $f_{s1}$ to $f_{s2}$, the duration stays the same but the number of samples changes proportionally:

$N_2 = N_1 \times (f_{s2} / f_{s1})$

$N_2$ New Length Samples after resampling

$N_1$ Original Length Samples before resampling

$f_{s2}$ Target Rate Desired Hz for model input

$f_{s1}$ Original Rate Source Hz from the file

Problem: A 3-second audio clip is loaded at 44,100 Hz. (a) How many samples does it have? (b) After resampling to 16,000 Hz, how many samples remain?

Original sample count.
$N_1 = \text{duration} \times f_{s1} = 3 \times 44{,}100 = 132{,}300$

N₁ = 132,300 samples at 44,100 Hz

Resampled count.
$N_2 = N_1 \times (f_{s2}/f_{s1}) = 132{,}300 \times (16{,}000/44{,}100) = 48{,}000$

N₂ = 48,000 samples at 16,000 Hz

Duration preserved.
$48{,}000 / 16{,}000 = 3.0\text{ s}$ — identical to the original. Resampling changes the array length but never the physical duration.
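
The length relationship is easy to encode. A minimal sketch, assuming the output length is rounded to the nearest integer (`resampled_length` is a hypothetical helper; an actual resampler may round differently and land one sample off):

```python
def resampled_length(n_orig, sr_orig, sr_target):
    """Expected sample count after resampling: N2 = N1 * (fs2 / fs1)."""
    return round(n_orig * sr_target / sr_orig)


n1 = 3 * 44100                              # 3-second clip at 44,100 Hz
n2 = resampled_length(n1, 44100, 16000)     # same clip at 16,000 Hz

print(f"N1 = {n1}, N2 = {n2}")
print(f"duration before = {n1 / 44100} s, after = {n2 / 16000} s")
```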

Quick Check

A 2-second clip at 22,050 Hz is resampled to 8,000 Hz. What is the new array length?

N₁ = 2 × 22050 = 44100. N₂ = 44100 × (8000/22050) ≈ 16000. New length = 16,000 samples.

🔧 Building a Robust Preprocessing Pipeline

A production-grade preprocessing function must (1) handle any input rate, (2) enforce the target rate the model needs, (3) normalise amplitude, and (4) be deterministic. The pattern:

Always pass sr=None first to discover the native rate without silent resampling.
Resample explicitly with librosa.resample() — this makes the step visible in code review.
Assert after resampling, for example assert abs(len(y) - TARGET_SR * duration) <= 1, to catch edge cases; the one-sample tolerance allows for rounding of the output length.
Log the rate of every file in a manifest so you can audit the dataset later.
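
The checking and logging steps of this pattern can be sketched without any audio library, because they only need the sample counts and rates. The helper name, the dictionary keys, and the filename `clip_001.wav` below are all hypothetical choices made for illustration.

```python
def manifest_row(filename, sr_native, n_native, n_out, target_sr):
    """Build one audit record and check the resampled length.

    n_native is the sample count at the native rate; n_out is the
    sample count after resampling to target_sr.
    """
    duration = n_native / sr_native
    expected = duration * target_sr
    # Allow one sample of slack because resamplers round the output length.
    assert abs(n_out - expected) <= 1, f"{filename}: unexpected length {n_out}"
    return {
        "file": filename,
        "sr_native": sr_native,
        "sr_target": target_sr,
        "duration_s": round(duration, 3),
        "n_samples": n_out,
    }


row = manifest_row("clip_001.wav", 44100, 88200, 32000, 16000)
print(row)
```

Collecting one such row per file gives a manifest that can be written to CSV or JSON and audited later.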

📝 Worked Example — Rate Ratio and Nyquist Check

Background. Downsampling (reducing $f_s$) is only safe if the new Nyquist frequency $f_{s,\text{new}}/2$ still covers all signal content. Upsampling (increasing $f_s$) is always safe — it adds zero new information but costs extra memory.

Problem: You want to downsample a speech file from 44,100 Hz to 8,000 Hz. (a) Is this safe if speech tops out at 4,000 Hz? (b) What is the resampling ratio?

Check Nyquist at target rate.
$f_{N,\text{new}} = 8{,}000 / 2 = 4{,}000\text{ Hz}$
Speech max = 4,000 Hz. Since $f_{N,\text{new}} \geq f_{max,\text{speech}}$, this is exactly at the limit — safe, but with zero margin.

Resampling ratio.
$r = f_{s,\text{new}} / f_{s,\text{old}} = 8{,}000 / 44{,}100 \approx 0.181$

1 original sample becomes 0.181 resampled samples — array shrinks ~5.5×

Practical note. librosa's resample automatically applies an anti-aliasing filter before downsampling — this is why you must never downsample manually by slicing every 5th sample (y[::5]), which skips the filter and introduces aliasing.
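
The safety check and the ratio from this example can be combined into one small report. This is a sketch; `downsample_report` is a hypothetical helper, and `f_max` is whatever bandwidth you assume for your content.

```python
def downsample_report(sr_orig, sr_target, f_max):
    """Summarise whether a target rate still covers content up to f_max."""
    f_nyquist_new = sr_target / 2
    return {
        "ratio": sr_target / sr_orig,         # samples kept per original sample
        "nyquist_new_hz": f_nyquist_new,      # highest representable frequency
        "margin_hz": f_nyquist_new - f_max,   # zero means exactly at the limit
        "safe": f_nyquist_new >= f_max,
    }


report = downsample_report(44100, 8000, f_max=4000)
print(report)
```

A margin of zero passes the check but leaves no room for the anti-aliasing filter's transition band, which is why a larger target rate is usually preferred.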

Quick Check

Why is y_down = y[::5] a bad way to downsample from 22,050 Hz to ~4,410 Hz?

Slicing skips the anti-aliasing filter. Content above the new Nyquist (2205 Hz) folds back into the signal as aliasing. librosa.resample() applies a low-pass filter first, then resamples — the correct approach.
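
The difference can be demonstrated with NumPy alone. The sketch below is not how librosa resamples; it uses a simple windowed-sinc low-pass filter (201 taps and a 2 kHz cutoff, both chosen here for illustration) on a synthetic signal containing a 1 kHz tone plus a 9 kHz tone. After keeping every 5th sample, the 9 kHz tone folds to 180 Hz unless it is filtered out first.

```python
import numpy as np

fs, q = 22050, 5            # original rate and decimation factor
fs_new = fs // q            # 4410 Hz, so the new Nyquist limit is 2205 Hz
t = np.arange(fs) / fs      # one second of audio
y = np.cos(2 * np.pi * 1000 * t) + 0.5 * np.cos(2 * np.pi * 9000 * t)


def tone_amplitude(x, f, rate):
    """Amplitude of the component at frequency f (Hz), by projection."""
    n = np.arange(len(x))
    return 2 * np.abs(np.sum(x * np.exp(-2j * np.pi * f * n / rate))) / len(x)


# Windowed-sinc low-pass FIR with its cutoff below the new Nyquist limit.
taps, cutoff = 201, 2000
k = np.arange(taps) - (taps - 1) / 2
h = np.sinc(2 * cutoff / fs * k) * np.hamming(taps)
h /= h.sum()                # unity gain at DC

naive = y[::q]                                  # no filter: 9 kHz folds to 180 Hz
filtered = np.convolve(y, h, mode="same")[::q]  # filter first, then decimate

print(f"180 Hz alias, naive slicing : {tone_amplitude(naive, 180, fs_new):.3f}")
print(f"180 Hz alias, filter + slice: {tone_amplitude(filtered, 180, fs_new):.3f}")
print(f"1 kHz tone after filtering  : {tone_amplitude(filtered, 1000, fs_new):.3f}")
```

The naive version carries the full 9 kHz amplitude into the 180 Hz bin, while the filtered version suppresses it and leaves the 1 kHz tone essentially intact.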

⚠️

Common Mistake

Myth: "I called librosa.load(file) — the sample rate is whatever the file is."

Reality: librosa's default is sr=22050. If your file is 44,100 Hz, librosa silently downsamples it to 22,050 Hz before returning it. If your file is 8,000 Hz, it silently upsamples to 22,050 Hz. Use sr=None to get the file's actual rate, or pass an explicit sr= value to enforce a specific rate.

Solution

Pause & Predict

The widget below shows a mixed-rate dataset. Before you move the slider: if you set the target rate to 8,000 Hz, which files can be losslessly downsampled and which will lose information? The answer depends on whether each file's content exceeds the new Nyquist frequency of 4,000 Hz.

Hint: telephone-band speech sits below ~4,000 Hz (safe at 8k). Music may have components up to 20,000 Hz (not safe at 8k).

Try It: Resampling Rate Calculator

Set the original and target sample rates. The panel shows whether downsampling is Nyquist-safe, the output array length for a 2-second clip, and the resampling ratio.

Original Rate $f_{s,\text{orig}}$: 44100 Hz

Target Rate $f_{s,\text{target}}$: 16000 Hz

Original waveform Resampled waveform Nyquist limit

Live Calculation — Resampling Parameters

Original: fs_orig = 44100 Hz → N_orig = 88200 samples (2 s)
Target: fs_target = 16000 Hz → N_target = 32000 samples (2 s)
Ratio: r = 16000 / 44100 = 0.3628
Nyquist (target): fN = 16000 / 2 = 8000 Hz

Type: Downsampling (target < original) ✓ Safe for speech

Implementation

Python · librosa — Complete Resampling Pipeline

import pathlib

import librosa
import numpy as np

TARGET_SR = 16000  # Hz, required by the downstream speech model

def load_and_normalise(filepath):
    """Load one file, resample it to TARGET_SR, and peak-normalise it."""
    # Step 1: load at native rate; never silently resample on load
    y_native, sr_native = librosa.load(filepath, sr=None, mono=True)

    # Step 2: resample to target if needed
    if sr_native != TARGET_SR:
        y = librosa.resample(y_native, orig_sr=sr_native, target_sr=TARGET_SR)
    else:
        y = y_native

    # Step 3: peak-normalise to [-1, 1]; silent clips (peak == 0) are left as-is
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak

    print(f"  native={sr_native} Hz → resampled={TARGET_SR} Hz | {len(y)} samples")
    return y, TARGET_SR

# Process a whole directory in sorted order so runs are reproducible
audio_dir = pathlib.Path("audio_clips/")
dataset = [load_and_normalise(f) for f in sorted(audio_dir.glob("*.wav"))]

Output

  native=44100 Hz → resampled=16000 Hz | 32000 samples
  native=8000 Hz → resampled=16000 Hz | 32000 samples
  native=22050 Hz → resampled=16000 Hz | 32000 samples
  native=16000 Hz → resampled=16000 Hz | 32000 samples

Key Takeaway

Always load with sr=None to see the native rate, then resample explicitly — never silently trust librosa's default 22,050 Hz when your model expects something else.

🗣️

Real-World Application

Whisper and Wav2Vec 2.0 — Why 16 kHz Is Non-Negotiable

Both OpenAI Whisper and Meta Wav2Vec 2.0 are trained on 16,000 Hz audio. At 16 kHz, the Nyquist frequency is 8,000 Hz, which covers the band where most speech energy lies. Passing a raw array sampled at a different rate does not necessarily raise an error; the model interprets the wrong number of samples per window as audio at the wrong pitch and tempo. In benchmark evaluations, this silent mismatch can increase word error rate from 3% to over 30% with no warning. The fix is always the same: resample to 16,000 Hz before inference, for example with librosa.load(file, sr=16000).

Checkpoint Quiz ML Pipeline Application

Q1 What does librosa.load(file) (default) do to a 44,100 Hz file?

It resamples it down to 22,050 Hz — the library's default sr. To preserve the native rate, use sr=None.

Q2 A 5-second clip at 22,050 Hz is resampled to 8,000 Hz. What is the new array length?

$N_1 = 5 \times 22050 = 110{,}250$. $N_2 = 110{,}250 \times (8000/22050) \approx \mathbf{40{,}000}$ samples.

Q3 Why is y_down = y[::3] a bad way to downsample by 3×?

It skips the anti-aliasing filter. Frequencies above the new Nyquist limit fold back as aliasing. Use librosa.resample(), which applies a proper low-pass filter before reducing the sample count.

Practice

Exercises

Rigorous problem sets covering mathematical derivations and functional coding tasks, followed by advanced synthesis applications.

1 Theory · Sampling Fundamentals Easy

Sampling Period and Discrete Sequence Values

A humidity sensor records a signal $x(t) = 8\cos(2\pi \cdot 4 \cdot t)$ volts at a sampling rate of $f_s = 32$ Hz.

(a) Compute the sampling period $T_s$ in milliseconds.
(b) Write the discrete equation $x[n]$ explicitly in terms of $n$.
(c) Compute $x[0],\ x[2],\ x[4],$ and $x[8]$ to four decimal places.

Use $T_s = 1/f_s$, then substitute into $x[n] = 8\cos(2\pi \cdot 4 \cdot n \cdot T_s)$. Simplify the argument before evaluating.

2 Code · Sampling Fundamentals Easy

Generating and Verifying a Discrete Signal in NumPy

Use NumPy to sample $x(t) = 6\sin(2\pi \cdot 3 \cdot t)$ at $f_s = 30$ Hz for exactly 2 seconds.

(a) Create n = np.arange(0, 2*30) and compute x_n.
(b) Verify len(x_n) == 60 and print the first 5 values rounded to 3 decimal places.
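(c) Find the index of the first peak using np.argmax(x_n[:10]) and compare it with $f_s / (4 \cdot f_{signal}) = 2.5$. Because 2.5 is not an integer, the analog peak falls between samples 2 and 3, so expect one of those two indices.

The first peak of a sine occurs at $t = T/4 = 1/(4f)$. Convert to index: $n_{peak} = t_{peak} \times f_s$, which need not be an integer.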

3 Theory · Nyquist-Shannon & Aliasing Medium

Nyquist Rate and Alias Frequency Calculation

Three signals are given:

(a) Signal A: $f_{max} = 2{,}500$ Hz. What is the minimum safe $f_s$?
(b) Signal B: $f = 11$ Hz sampled at $f_s = 8$ Hz. Compute $f_{alias}$. Is this above or below the Nyquist frequency?
(c) Signal C: $f = 25$ Hz sampled at $f_s = 40$ Hz. Is aliasing present? Justify using the Nyquist criterion.

For aliasing check: compute $f_N = f_s/2$ and compare to the signal frequency. Alias formula: $f_{alias} = |f - k \cdot f_s|$ where $k=\text{round}(f/f_s)$.

4 Code · Nyquist-Shannon & Aliasing Medium

Visualising Aliasing in NumPy

Write a Python function alias_check(f_signal, fs) that:

(a) Returns the tuple (is_aliased: bool, f_alias: float, f_nyquist: float).
(b) When aliased, f_alias = abs(f_signal - round(f_signal/fs)*fs).
(c) Test with (f=13, fs=10) and (f=5, fs=20) and print the results.
(d) Generate 200 samples of the 13 Hz signal at 10 Hz and overlay it with the alias frequency to confirm they match numerically.

Compare np.round(x_f, 6) == np.round(x_alias, 6) for the first 10 samples to verify the sequences are numerically identical.

5 Theory · ML Pipeline Application Medium

Resampling Parameters and Nyquist Safety

A speech dataset is recorded at 44,100 Hz. Your ASR model requires 16,000 Hz input.

(a) Compute $N_{orig}$ for a 3.5-second clip at 44,100 Hz.
(b) Compute $N_{target}$ after resampling to 16,000 Hz.
(c) What is the Nyquist frequency at 16,000 Hz? Is it safe for speech (which tops out at 4,000 Hz)?
(d) A colleague suggests downsampling to 7,000 Hz instead. Is this safe for speech? Explain using the Nyquist theorem.

$N_{target} = N_{orig} \times (f_{s,\text{target}} / f_{s,\text{orig}})$. For part (d): compute $f_N$ at 7,000 Hz and compare to the maximum speech frequency.

6 Code · ML Pipeline Application Medium

Building a Robust Audio Preprocessing Function

Implement a function preprocess(y, sr_native, target_sr) using librosa.resample() that:

(a) If sr_native != target_sr, resamples using librosa. Otherwise passes through unchanged.
(b) Peak-normalises the output to $[-1, 1]$.
(c) Returns (y_out, target_sr).
(d) Test with a synthetic 44,100 Hz cosine of 3 seconds resampled to 16,000 Hz. Verify len(y_out) == 48000 and max(abs(y_out)) == 1.0.

Use y_r = librosa.resample(y, orig_sr=sr_native, target_sr=target_sr). Peak-norm: y_r / np.max(np.abs(y_r)). Check the output with np.allclose().
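The peak-normalisation step in the hint divides by the largest absolute sample, which fails for an all-zero clip. The sketch below shows one way to guard against that in plain NumPy, leaving the resampling step aside. The helper name `peak_normalise` and the `eps` threshold are assumptions for illustration.

```python
import numpy as np

def peak_normalise(y, eps=1e-12):
    """Scale y so its largest absolute value is 1 (hypothetical helper).

    Silent (all-zero) or empty input is returned unchanged to avoid
    dividing by zero.
    """
    y = np.asarray(y, dtype=float)
    peak = np.max(np.abs(y)) if y.size else 0.0
    if peak < eps:
        return y
    return y / peak

# Illustrative input: a short 440 Hz cosine with amplitude 0.25 at 8,000 Hz.
t = np.arange(1000) / 8000.0
y = 0.25 * np.cos(2 * np.pi * 440.0 * t)
y_norm = peak_normalise(y)
print(np.max(np.abs(y_norm)))
```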

7 Synthesis · Theory: Multi-Rate Sensor Design Hard

Choosing Sample Rates Across a Sensor Network

A wearable device collects three physiological signals simultaneously:

ECG: frequency content up to 150 Hz
PPG (pulse oximetry): frequency content up to 10 Hz
Skin temperature: frequency content up to 0.5 Hz

(a) Compute the minimum safe sample rate for each sensor.
(b) The device uses a shared ADC at 300 Hz. Which signals are safe? Which are at risk of aliasing?
(c) If a 200 Hz muscle artifact contaminates the ECG at $f_s = 300$ Hz, compute its alias frequency.
(d) Recommend a single ADC rate that safely captures all three signals without excessive data volume.

Apply Nyquist to each signal independently. For (c): $f_N = 150$ Hz at 300 Hz sampling. A 200 Hz artifact is above $f_N$. Alias = $|200 - 300| = 100$ Hz.
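Applying Nyquist to each signal independently can be sketched as below. The sensors and the shared rate are hypothetical and deliberately different from the exercise. Note the strict inequality: a signal whose highest frequency equals $f_s/2$ is treated as at risk.

```python
def min_safe_rate(f_max):
    """Nyquist rate: sampling must be strictly faster than this value."""
    return 2 * f_max

def is_safe(f_max, fs):
    """True when the highest signal frequency is strictly below fs / 2."""
    return f_max < fs / 2

# Hypothetical sensors (highest frequency in Hz) and a shared ADC rate.
sensors = {"accelerometer": 50.0, "respiration": 2.0}
fs_shared = 100.0

report = {name: (min_safe_rate(f_max), is_safe(f_max, fs_shared))
          for name, f_max in sensors.items()}
for name, (rate, safe) in report.items():
    print(f"{name}: needs fs > {rate} Hz, safe at {fs_shared} Hz: {safe}")
```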

8 Synthesis · Code: End-to-End Audio Normalisation Pipeline Hard

Mixed-Rate Dataset Normalisation

Build a complete pipeline that processes a list of synthetic audio clips with different sample rates:

(a) Generate four synthetic clips using NumPy: 3 s at 8,000 Hz, 2 s at 22,050 Hz, 4 s at 44,100 Hz, and 1 s at 48,000 Hz (all cosines at 440 Hz).
(b) For each clip, use alias_check(440, sr) from Ex 4 to verify no aliasing is present.
(c) Resample all clips to 16,000 Hz using preprocess() from Ex 6.
(d) Print a report table with columns for original rate, original length, target rate, target length, and duration after resampling (which should equal the original duration for every clip).

Target length = int(duration * target_sr). Duration check: abs(len(y_out)/target_sr - original_duration) < 0.01. Use string formatting to align columns in the report.
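One possible shape for the report and the duration check from the hint, shown on a single hypothetical clip rather than the four in the exercise. The function name `format_report`, the column names, and the column widths are assumptions chosen for this sketch. The output length is computed directly instead of by resampling.

```python
def format_report(rows):
    """Return aligned report lines for (sr, n_orig, target_sr, n_out) rows."""
    header = (f"{'orig_sr':>8} {'n_orig':>8} {'target_sr':>10} "
              f"{'n_out':>8} {'dur_s':>7} {'ok':>4}")
    lines = [header]
    for sr, n_orig, target_sr, n_out in rows:
        dur_in = n_orig / sr
        dur_out = n_out / target_sr
        ok = abs(dur_out - dur_in) < 0.01   # duration check from the hint
        lines.append(f"{sr:>8d} {n_orig:>8d} {target_sr:>10d} "
                     f"{n_out:>8d} {dur_out:>7.3f} {str(ok):>4}")
    return lines

# Hypothetical clip: 1.5 s at 32,000 Hz, target rate 16,000 Hz.
rows = [(32_000, 48_000, 16_000, int(1.5 * 16_000))]
report_lines = format_report(rows)
print("\n".join(report_lines))
```

Fixed-width format specifiers such as `:>8d` right-align each field, so the columns line up whatever the magnitude of the numbers.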

Sampling & Aliasing
From Analog to Digital

Discrete Signals & Sampling

📐 The Sampling Equation

🐍 What `sr` Really Is in Python

Try It: Sampling Rate Visualizer

Automatic Speech Recognition — Why `sr=16000` Is the Industry Standard

Nyquist-Shannon Theorem & Aliasing

📐 The Nyquist-Shannon Sampling Theorem

🔁 Aliasing — When Frequencies Impersonate Each Other

Try It: Aliasing Explorer

Aliasing in Medical Wearables — The Hidden Diagnostic Trap

Choosing `sr` in librosa & Resampling for ML

📐 librosa.load() — The Key Parameters

🔧 Building a Robust Preprocessing Pipeline

Try It: Resampling Rate Calculator

Whisper and Wav2Vec 2.0 — Why 16 kHz Is Non-Negotiable

Sampling & Aliasing Laboratory

Week 2 Recap

The Sampling Equation

Nyquist-Shannon Theorem

Alias Frequency Formula

librosa Pattern

Further Reading

Circles, Sines, and Signals

librosa Documentation

SciPy Signal Processing

NumPy — rfftfreq

Exercises

Sampling Period and Discrete Sequence Values

Generating and Verifying a Discrete Signal in NumPy

Nyquist Rate and Alias Frequency Calculation

Visualising Aliasing in NumPy

Resampling Parameters and Nyquist Safety

Building a Robust Audio Preprocessing Function

Choosing Sample Rates Across a Sensor Network

Mixed-Rate Dataset Normalisation
