Signal Processing Foundations — Week 2

Sampling & Aliasing
From Analog to Digital

Every audio file, sensor reading, and medical scan begins with one decision — how many times per second to measure. Learn the law that governs this choice, what happens when you break it, and how to apply it in real ML pipelines.

Topics: Discrete Signals · Nyquist-Shannon · Aliasing · sample_rate · librosa.load() · Resampling

Discrete Signals & Sampling

A continuous-time signal has a value at every instant of time. A digital computer can only store a finite list of numbers. Sampling bridges these two worlds by taking periodic "snapshots" — and the rate at which we snap determines everything about what we can recover.

After this section you will be able to
  • Explain in plain English what a discrete signal is and how it differs from a continuous signal.
  • Apply the sampling equation $x[n] = x(nT_s)$ to compute sample values by hand.
  • Interpret the sr value used by librosa (and attached to the NumPy arrays it returns) as $f_s$ in the sampling equation.

Imagine trying to describe a song to someone over text. You cannot send the whole continuous soundwave — you have to send a list of numbers. But which numbers? How many? And how do you make sure the list is enough to reconstruct the original? That is exactly the problem sampling solves.

🎯
Why this matters: Every dataset you will ever load in an ML project is already a sampled signal. The sr=22050 you pass to librosa.load() is not a magic constant — it is $f_s$, the sampling frequency, and choosing it incorrectly degrades model performance or silently corrupts your features.
🎬
Analogy Bridge

Think of a flip-book animation. Each page is one sample. The continuous movie is the analog signal; the stack of pages is the digital signal. More pages per second = smoother motion = higher $f_s$. The question the Nyquist theorem (Topic 2) answers is: how many pages do you actually need?

Figure: from analog to array. Analog: continuous signal $x(t)$, with a value at every real time $t$ → Sampling: clock grid, the ADC fires every $T_s = 1/f_s$ seconds → Digital: discrete sequence $x[0], x[1], x[2], \ldots$ indexed by integer $n$ → Python: NumPy array from y, sr = librosa.load(file), where sr (e.g. 22050) is $f_s$ in samples per second.
$x(t)$ · Analog Signal: continuous, with a value at every real time $t$ (e.g. microphone voltage)
$x[n]$ · Discrete Signal: a list of numbers indexed by integer $n$ (e.g. a Python NumPy array)
$f_s$ · Sample Rate: snapshots taken per second, in Hz (e.g. 44,100 Hz for CD audio)
$T_s$ · Sampling Period: time gap between consecutive samples (e.g. 1/44,100 s ≈ 22.7 µs)
44,100 Hz: CD audio, the industry standard since 1980
22,050 Hz: librosa's default sr; half the CD rate, still enough for speech
16,000 Hz: ASR models (Whisper, Wav2Vec); the speech-processing standard
8,000 Hz: narrowband telephone voice; enough for intelligible speech
Figure: analog signal $x(t)$ → ADC sampler (one snapshot every $T_s$ seconds, rate $f_s$ Hz) → discrete array $x[n]$.

Signal taxonomy pipeline: The Analog-to-Digital Converter (ADC) bridges the continuous physical world and the discrete digital domain at a fixed rate $f_s$.

Problem

📐 The Sampling Equation

The fundamental relationship that connects the continuous analog signal to its discrete digital representation. Each sample $x[n]$ is simply the value of the analog signal frozen at clock tick $n$:

$$x[n] = x(n \cdot T_s) \qquad T_s = \frac{1}{f_s}$$

$x[n]$ · Discrete Sample: the $n$-th number stored in the array
$n$ · Sample Index: integer counter 0, 1, 2, 3, …
$x(t)$ · Analog Signal: continuous-time measurement (e.g. voltage)
$T_s$ · Sampling Period: seconds between consecutive snapshots
$f_s$ · Sample Rate: snapshots taken per second (Hz)
📝 Worked Example — Sampling a 5 Hz Sine Wave

Background. The equation $x[n] = x(nT_s)$ replaces continuous time $t$ with the discrete grid $\{0, T_s, 2T_s, \ldots\}$. We evaluate the analog function at each of these clock ticks.

Problem: A sensor records $x(t) = 3\cos(2\pi \cdot 5 \cdot t)$ V at $f_s = 20$ Hz. Compute $T_s$ and the first four samples $x[0], x[1], x[2], x[3]$.

1
Find the sampling period.
$T_s = 1/f_s = 1/20 = 0.05\text{ s}$
Ts = 0.05 s = 50 ms
2
Substitute into the sampling equation.
$x[n] = 3\cos(2\pi \cdot 5 \cdot n \cdot 0.05) = 3\cos(0.5\pi n)$
3
Evaluate each sample.
$x[0] = 3\cos(0) = 3.000$
$x[1] = 3\cos(0.5\pi) = 0.000$
$x[2] = 3\cos(\pi) = -3.000$
$x[3] = 3\cos(1.5\pi) = 0.000$
x = [3.000, 0.000, −3.000, 0.000, …]
4
Physical interpretation. Sampling at $f_s = 4f_{signal} = 20$ Hz captures exactly 4 points per cycle (peak, zero, trough, zero). The sequence $[3, 0, -3, 0, 3, \ldots]$ fully captures the wave's shape.
Quick Check

If $f_s = 40$ Hz instead, how many samples appear per cycle of the 5 Hz signal?

8 samples per cycle — at f_s=40Hz, T_s=0.025s. One cycle = 1/5 = 0.2 s. So 0.2/0.025 = 8 samples.

🐍 What sr Really Is in Python

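When you write y, sr = librosa.load(file), the variable sr is nothing but $f_s$: the number of samples per second in the returned array y. It equals the file's own rate only if you pass sr=None; otherwise librosa resamples, and sr is the rate you requested (22,050 Hz by default). Every array index maps to a real physical time via $t = n / f_s$.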

Code · Meaning
sr = 22050 · $f_s = 22{,}050$ Hz, so $T_s \approx 45\,\mu\text{s}$
y[0] · $x[0]$, the amplitude at $t = 0\,\text{s}$
y[44100] · $x[44100]$, the amplitude at $t = 44{,}100/22{,}050 = 2\,\text{s}$
len(y)/sr · total duration in seconds
📝 Worked Example — Duration and Index-to-Time

Background. Each index $n$ represents a time $t = n / f_s$. This lets you convert between array indices (what the code sees) and physical time (what the signal means).

Problem: An audio array y has 88,200 samples loaded at sr=22050. (a) What is the clip duration? (b) What physical time does index y[11025] correspond to?

1
Duration.
$\text{duration} = N/f_s = 88{,}200 / 22{,}050 = 4.0\text{ s}$
Duration = 4.0 seconds
2
Index to time.
$t = n/f_s = 11{,}025 / 22{,}050 = 0.5\text{ s}$
y[11025] = amplitude at t = 0.5 s (halfway through)
Quick Check

Which index n corresponds to time $t = 3.0$ s with sr=22050?

n = t × sr = 3.0 × 22050 = 66150
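Both conversions in this example, $t = n/f_s$ and $n = t \cdot f_s$, are one-liners in plain Python. The sketch below assumes librosa's default rate of 22,050 Hz; the helper names index_to_time, time_to_index, and duration are hypothetical, chosen for illustration (librosa itself provides librosa.samples_to_time and librosa.time_to_samples for the same job).

```python
SR = 22050  # Hz: librosa's default sample rate, i.e. f_s (assumed here)


def index_to_time(n: int, sr: int = SR) -> float:
    """Physical time (s) of sample index n: t = n / f_s."""
    return n / sr


def time_to_index(t: float, sr: int = SR) -> int:
    """Nearest sample index for time t (s): n = t * f_s, rounded to an integer."""
    return round(t * sr)


def duration(num_samples: int, sr: int = SR) -> float:
    """Clip length in seconds: N / f_s (the same as len(y) / sr)."""
    return num_samples / sr


print(duration(88_200))       # 4.0
print(index_to_time(11_025))  # 0.5
print(time_to_index(3.0))     # 66150
```

Rounding in time_to_index is a design choice: a time that is not an exact multiple of $T_s$ falls between two samples, so the nearest index is the practical answer.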
💡
Key Insight

Parentheses vs. square brackets are not cosmetic. In signal processing, $x(t)$ with parentheses always means "continuous time" and $x[n]$ with square brackets always means "discrete index." Mixing them up in an exam answer is an automatic mistake — they represent fundamentally different mathematical objects.

⚠️
Common Mistake

Myth: "Higher sample rate always means better quality — more is always more."

Reality: Once you are above twice the signal's highest frequency, extra samples add no new information, while storage grows in direct proportion to $f_s$. Resampling speech from 44,100 Hz to 16,000 Hz keeps essentially all the information speech models rely on, because most speech energy lies below 8,000 Hz. The Nyquist theorem (Topic 2) gives the exact minimum.
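To see how storage scales with the sample rate, here is the raw size of one minute of audio at the rates quoted earlier. A minimal sketch assuming uncompressed 16-bit mono PCM (2 bytes per sample); compression or extra channels change the constants, but not the proportionality to $f_s$.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM, mono, uncompressed


def bytes_per_minute(fs: int) -> int:
    """Raw storage for one minute of audio: f_s samples/s * 60 s * bytes per sample."""
    return fs * 60 * BYTES_PER_SAMPLE


for fs in (44_100, 22_050, 16_000, 8_000):
    print(f"{fs:>6} Hz -> {bytes_per_minute(fs) / 1e6:.2f} MB per minute")
```

Halving $f_s$ halves the bytes; going from 44,100 Hz to 16,000 Hz shrinks storage by a factor of about 2.76.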

Solution
Pause & Predict

Before you move the slider: if you sample a 5 Hz sine wave at only 6 samples per second, how many dots do you expect to see per cycle? Will the shape still look like a sine wave?

Hint: one cycle = 1/5 = 0.2 s. At 6 Hz, T_s = 1/6 ≈ 0.167 s. Count dots in 0.2 s.

Try It: Sampling Rate Visualizer

Drag the slider to change $f_s$. Watch how the sample density changes relative to the 5 Hz wave. Below 10 Hz (twice the signal frequency), you can see the wave shape starting to distort.
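Before dragging the slider, you can predict what it will show by tabulating samples per cycle ($f_s/f$) and comparing each $f_s$ with $2f = 10$ Hz for the 5 Hz wave. A minimal sketch; the four rates are arbitrary example settings rather than the widget's own, and describe is a hypothetical helper. It treats exactly $2f$ as borderline, since a tone sitting exactly at $f_s/2$ is not reliably captured.

```python
F_SIGNAL = 5  # Hz: the cosine used in the visualizer


def describe(fs: float, f: float = F_SIGNAL) -> tuple[float, bool]:
    """Return (samples per cycle, whether fs is strictly above the rate 2f)."""
    samples_per_cycle = fs / f
    return samples_per_cycle, fs > 2 * f


for fs in (6, 10, 20, 40):
    spc, ok = describe(fs)
    status = "above 2f" if ok else "at or below 2f: expect distortion"
    print(f"fs = {fs:>2} Hz -> {spc:.1f} samples/cycle ({status})")
```

At 6 Hz you get only 1.2 dots per cycle, which answers the Pause & Predict question: the dots can no longer trace a 5 Hz shape.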

Figure (interactive): analog signal $x(t)=3\cos(2\pi\cdot5\cdot t)$, samples $x[n]$, and the sampling grid.
Live Calculation — Sampling Parameters
x[n] = 3·cos(2π × 5/fs × n) = 3·cos(2π × 5/20 × n) = 3·cos(1.5708 × n)
n = 0: 3 × cos(0.0000) = 3 × 1.0000 = 3.0000
n = 1 (T/4): 3 × cos(1.5708) = 3 × 0.0000 = 0.0000
n = 2 (T/2): 3 × cos(3.1416) = 3 × (−1.0000) = −3.0000
Implementation
Python · NumPy — Sampling a Cosine Wave
import numpy as np

# ── Sampling parameters ──────────────────────────────────────
fs = 20              # Hz: sampling frequency (samples per second)
Ts = 1 / fs          # s: sampling period = 0.050 s
f_signal = 5         # Hz: frequency of the cosine wave

# ── Sample indices ───────────────────────────────────────────
n = np.arange(0, 8)  # integer indices: 0, 1, 2, 3, 4, 5, 6, 7

# ── Evaluate x[n] = x(n · Ts) = 3·cos(2π·f_signal·n·Ts) ───
x = 3 * np.cos(2 * np.pi * f_signal * n * Ts)

print(f"Ts = {Ts*1000:.1f} ms | {fs/f_signal:.0f} samples per cycle")
print(f"x = {np.round(x, 3)}")
Output
Ts = 50.0 ms | 4 samples per cycle
x = [ 3.  0. -3. -0.  3.  0. -3. -0.]
Key Takeaway

Every digital signal is a list of numbers tied to a sample rate — and understanding $f_s$ is the first step to understanding any audio, sensor, or ML dataset.

🎙️
Real-World Application

Automatic Speech Recognition — Why sr=16000 Is the Industry Standard

Models like OpenAI's Whisper and Meta's Wav2Vec 2.0 expect audio at exactly 16,000 Hz. This is not arbitrary: most of the acoustic information speech recognition depends on lies below 8,000 Hz, the Nyquist frequency of a 16 kHz stream, so $f_s = 16{,}000$ Hz captures it while keeping arrays small. Loading audio at any other rate and feeding it directly to these models distorts the temporal features and can severely degrade recognition accuracy. The usual fix is an explicit librosa.load(file, sr=16000).

Checkpoint Quiz Sampling Fundamentals

Q1 A signal is sampled at $f_s = 8{,}000$ Hz. What is the sampling period $T_s$ in milliseconds?

$T_s = 1/f_s = 1/8000 = 0.000125\text{ s} = \mathbf{0.125\text{ ms}}$

Q2 An array y has 32,000 samples at sr=16000. What is the duration?

Duration $= N/f_s = 32000/16000 = \mathbf{2.0\text{ s}}$

Q3 You call librosa.load(file, sr=22050). What does y[0] represent?

$x[0]$ — the amplitude of the audio signal at time $t = 0/22050 = 0\text{ s}$ (the very first sample).

Nyquist-Shannon Theorem & Aliasing

There is a precise mathematical threshold below which sampling destroys information — and above which it preserves everything. The Nyquist-Shannon theorem defines that threshold. Violating it produces aliasing: a silent corruption where high-frequency content masquerades as low-frequency noise.

After this section you will be able to
  • State the Nyquist-Shannon sampling theorem and identify the minimum safe sample rate for a given signal.
  • Calculate the alias frequency produced when a signal is sampled below the Nyquist rate.
  • Explain why aliasing causes silent data corruption in sensor pipelines and how an anti-aliasing filter prevents it.

What if you filmed a spinning helicopter rotor at the wrong frame rate? Sometimes the blades appear to rotate backwards — or even stand still — when they are actually spinning at full speed. That impossible illusion is aliasing, and it does the exact same thing to audio and sensor data when the sample rate is too low.

🎯
Why this matters: Wrong sample rates are one of the most common silent bugs in real ML systems. An ECG sensor that aliases a 60 Hz power-line artifact into the cardiac frequency band will make a classifier see false heartbeat patterns. A speech model that receives audio sampled at the wrong rate produces nonsense — no exception is raised; the output is just wrong.
🎡
Analogy Bridge

The stroboscope effect. If you illuminate a spinning wheel with a strobe light that flashes exactly once per revolution, the wheel looks frozen. Flash more than twice per revolution, and each flash catches the wheel less than half a turn further on, so its true motion and direction become visible. The Nyquist rate $f_s \geq 2f_{max}$ is this "about twice per cycle" rule: the minimum needed to perceive the motion correctly.

Proper Sampling ($f_s \geq 2f_{max}$)

Each sample carries unique information. The original signal can be perfectly reconstructed.

✓ No information lost

VS
Undersampling ($f_s < 2f_{max}$)

High-frequency components "wrap around" and appear as lower frequencies — aliasing. Reconstruction is impossible.

✗ Permanent information loss

Nyquist rate (the minimum): $f_s \geq 2f_{max}$ to prevent aliasing
20 kHz, human hearing: the upper limit, and the reason CD audio uses $f_s = 44{,}100$ Hz
60 Hz, power-line noise: a common aliasing source in ECG/EEG sensors
Figure: (1) Original signal, continuous at $f = 13$ Hz → (2) Undersampled at $f_s = 10\,\text{Hz} < 2f = 26$ Hz ⚠ → (3) Aliased result, $f_{alias} = |13 - 10| = 3\,\text{Hz}$

Aliasing step-strip: sampling a 13 Hz signal at only 10 Hz (below the Nyquist rate of 26 Hz) causes the signal to appear as a spurious 3 Hz wave — indistinguishable from a real 3 Hz component.

Problem

📐 The Nyquist-Shannon Sampling Theorem

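A band-limited signal with highest frequency $f_{max}$ can be perfectly reconstructed from its samples if the sampling rate satisfies:

$$f_s \geq 2 f_{max}$$

Strictly, the inequality must be strict ($f_s > 2f_{max}$): a sinusoid at exactly $f_s/2$ can be sampled at its zero crossings and vanish, so real systems always leave margin above $2f_{max}$.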

The threshold $f_N = f_s / 2$ is called the Nyquist frequency — the highest frequency that can be represented at a given sample rate:

$$f_N = \frac{f_s}{2}$$

$f_s$ · Sample Rate: snapshots per second, chosen by the engineer
$f_{max}$ · Signal Bandwidth: highest frequency present in the signal
$2$ · Nyquist Factor: at least 2 samples per cycle of $f_{max}$
$f_N$ · Nyquist Frequency: highest frequency representable at $f_s$
📝 Worked Example — Finding the Minimum Safe Sample Rate

Background. The Nyquist theorem states that you need at least 2 samples per cycle of the highest frequency component. Below this rate, there are not enough samples to distinguish that frequency from a lower one.

Problem: A biomedical ECG signal contains components up to 150 Hz. What is the minimum sample rate to avoid aliasing? What is the Nyquist frequency if we use $f_s = 500$ Hz?

1
Apply the Nyquist criterion.
$f_s^{min} = 2 \times f_{max} = 2 \times 150 = 300\text{ Hz}$
Minimum safe rate: 300 Hz
2
In practice, add margin. Medical devices use $f_s = 500$ Hz — well above the theoretical minimum — to prevent distortion at the edge of the band.
3
Nyquist frequency at 500 Hz.
$f_N = f_s / 2 = 500 / 2 = 250\text{ Hz}$
Nyquist frequency: 250 Hz (safely above 150 Hz)
4
Interpretation. The 100 Hz headroom ($250 - 150 = 100$ Hz) gives the anti-aliasing filter room to roll off before reaching the Nyquist limit.
Quick Check

CD audio has $f_s = 44{,}100$ Hz. What is its Nyquist frequency? Does it comfortably cover human hearing (up to 20,000 Hz)?

f_N = 44100/2 = 22050 Hz — yes, 22050 > 20000, so human hearing is fully covered with 2050 Hz of margin.
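The ECG and CD-audio checks above reduce to two lines of arithmetic: $f_s^{min} = 2f_{max}$ and headroom $= f_N - f_{max}$. A minimal sketch; nyquist_report is a hypothetical helper name, and "safe" here means only that $f_N$ lies strictly above $f_{max}$, not that a particular filter design is adequate.

```python
def nyquist_report(fs: float, f_max: float) -> dict:
    """Summarise the Nyquist check for a signal whose highest frequency is f_max."""
    f_n = fs / 2                  # Nyquist frequency f_N = f_s / 2
    return {
        "min_fs": 2 * f_max,      # theoretical minimum sample rate
        "nyquist_freq": f_n,
        "headroom": f_n - f_max,  # room for the anti-aliasing filter to roll off
        "safe": f_n > f_max,
    }


ecg = nyquist_report(fs=500, f_max=150)
cd = nyquist_report(fs=44_100, f_max=20_000)
print(ecg)                                 # {'min_fs': 300, 'nyquist_freq': 250.0, 'headroom': 100.0, 'safe': True}
print(cd["nyquist_freq"], cd["headroom"])  # 22050.0 2050.0
```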

🔁 Aliasing — When Frequencies Impersonate Each Other

When $f_s < 2f_{signal}$, the sampled sequence $x[n]$ is identical to what you would get from sampling a lower-frequency signal. The aliased frequency is:

$$f_{alias} = \left| f_{signal} - k \cdot f_s \right|$$

where $k$ is the integer that places $f_{alias}$ in $[0,\, f_s/2]$. For the simplest case:

$$f_{alias} = \left| f_{signal} - f_s \right| \quad \text{(when } f_s/2 < f_{signal} < 3f_s/2\text{)}$$

$f_{alias}$ · Alias Frequency: where the energy appears after undersampling
$f_{signal}$ · True Frequency: the actual frequency of the input signal
$k$ · Wrap Count: integer that folds $f_{signal}$ into $[0, f_s/2]$
$|\cdot|$ · Absolute Value: frequency is always non-negative
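The rule "choose $k$ so the result lands in $[0, f_s/2]$" amounts to taking $k$ as the integer nearest to $f_{signal}/f_s$. A minimal sketch of that folding step, checked against examples from this topic; alias_frequency is a hypothetical helper name. The last call shows a case the simple $k = 1$ form does not cover.

```python
def alias_frequency(f_signal: float, fs: float) -> float:
    """Apparent frequency of a tone at f_signal after sampling at fs.

    Folds f_signal into [0, fs/2] via f_alias = |f_signal - k*fs|,
    with k the integer multiple of fs nearest to f_signal.
    """
    k = round(f_signal / fs)  # ties (f exactly at an odd multiple of fs/2) give fs/2 either way
    return abs(f_signal - k * fs)


print(alias_frequency(13, 10))          # 3
print(alias_frequency(7, 10))           # 3
print(alias_frequency(25_000, 44_100))  # 19100
print(alias_frequency(5, 20))           # 5  (below f_N, so no folding)
print(alias_frequency(23, 10))          # 3  (k = 2: a second wrap)
```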
📝 Worked Example — Computing Alias Frequency

Background. When a frequency $f$ is sampled below Nyquist, there is no way to distinguish it from a lower frequency — the sample sequences are mathematically identical. The alias frequency is where the energy appears to be.

Problem: A 13 Hz tone is sampled at $f_s = 10$ Hz (below the Nyquist rate of 26 Hz). At what frequency does it appear in the spectrum?

1
Check that aliasing occurs.
Nyquist rate = $2 \times 13 = 26$ Hz. We are using 10 Hz < 26 Hz. Aliasing confirmed.
2
Compute alias.
$f_{alias} = |13 - 10| = 3\text{ Hz}$
The 13 Hz tone appears as a 3 Hz signal
3
Interpretation. If this were ECG data, a 13 Hz muscle artifact sampled at 10 Hz would appear as a 3 Hz wave — right inside the cardiac frequency band. A downstream classifier would treat it as a real heart signal.
Quick Check

A 7 Hz signal is sampled at $f_s = 10$ Hz. Is aliasing present? If yes, compute $f_{alias}$.

Nyquist rate = 2 × 7 = 14 Hz. Since 10 Hz < 14 Hz, aliasing is present. f_alias = |7 − 10| = 3 Hz. The 7 Hz signal appears as a 3 Hz alias.
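The claim that the sample sequences are mathematically identical is easy to check numerically: sample a 13 Hz, a 7 Hz, and a 3 Hz cosine at $f_s = 10$ Hz and compare. A minimal NumPy sketch; cosines are used on purpose, because a sine folded with a negative sign (as in the 7 Hz case) comes back inverted rather than identical.

```python
import numpy as np

fs = 10                            # Hz: sample rate from the worked example
n = np.arange(20)                  # 20 sample indices (2 seconds)
t = n / fs                         # sample times t = n / f_s

x_13 = np.cos(2 * np.pi * 13 * t)  # true 13 Hz tone, sampled
x_7 = np.cos(2 * np.pi * 7 * t)    # 7 Hz tone from the Quick Check, sampled
x_3 = np.cos(2 * np.pi * 3 * t)    # 3 Hz alias, sampled

print(np.allclose(x_13, x_3))  # True: the samples cannot tell 13 Hz from 3 Hz
print(np.allclose(x_7, x_3))   # True: 7 Hz also lands on 3 Hz at fs = 10 Hz
```

No downstream processing can separate these sequences, which is why aliasing has to be prevented before sampling.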
⚠️
Common Mistake

Myth: "I can just filter out aliased frequencies after sampling."

Reality: Once aliasing has occurred, the aliased frequency is indistinguishable from a genuine signal at that frequency — you cannot unmix them after the fact. Aliasing must be prevented before sampling with an anti-aliasing low-pass filter that removes content above $f_N = f_s/2$.

Solution
Pause & Predict

The widget below shows a 13 Hz signal. Before you drag the slider: at what sample rate do you expect the reconstructed wave to start "looking wrong" — and what frequency will the alias appear at if you set $f_s = 10$ Hz?

Hint: the Nyquist rate is $2 \times 13 = 26$ Hz. Use the alias formula $f_{alias} = |f_{signal} - f_s|$.

Try It: Aliasing Explorer

A 13 Hz signal (teal) is sampled at $f_s$ Hz (red dots). Drag below the Nyquist rate of 26 Hz to see aliasing — the reconstructed wave (orange) diverges from the original.

Figure (interactive): original 13 Hz signal, samples $x[n]$, and the reconstruction (aliased when undersampled).
Live Calculation — Nyquist Check
Signal: f = 13 Hz | Nyquist rate: 2 × 13 = 26 Hz
fs = 40 Hz ≥ 26 Hz → NO ALIASING
Nyquist freq: fN = 40 / 2 = 20.0 Hz
Status: 13 Hz is below fN = 20.0 Hz ✓ Safe
Implementation
Python · NumPy — Nyquist Check and Alias Frequency
import numpy as np

f_signal = 13                # Hz: frequency of the signal to sample
fs = 10                      # Hz: our (too low) sample rate
nyquist_rate = 2 * f_signal  # = 26 Hz: minimum safe fs
f_nyquist = fs / 2           # = 5 Hz: highest representable frequency

if fs < nyquist_rate:
    # Alias: find the lowest positive freq indistinguishable from f_signal
    k = round(f_signal / fs)          # closest integer multiple of fs
    f_alias = abs(f_signal - k * fs)  # = |13 - 1×10| = 3 Hz
    print(f"⚠ ALIASING: fs={fs} Hz < Nyquist rate {nyquist_rate} Hz")
    print(f"  {f_signal} Hz appears as {f_alias} Hz in spectrum")
else:
    print(f"✓ Safe: fs={fs} Hz >= Nyquist rate {nyquist_rate} Hz")
Output
⚠ ALIASING: fs=10 Hz < Nyquist rate 26 Hz
  13 Hz appears as 3 Hz in spectrum
Key Takeaway

The Nyquist-Shannon theorem gives a hard mathematical lower bound on sample rate: $f_s \geq 2f_{max}$ — violate it and high-frequency content permanently masquerades as low-frequency noise.

🫀
Real-World Application

Aliasing in Medical Wearables — The Hidden Diagnostic Trap

Consumer smartwatch PPG sensors often sample at 25–50 Hz to conserve battery. A 30 Hz interference component superimposed on the cardiac signal aliases to $|30 - 25| = 5$ Hz at a 25 Hz sample rate: just above the fundamental cardiac band (roughly 0.5–4 Hz) and within the range of its harmonics, where it can be mistaken for genuine pulse content and lead a model to misclassify a normal resting rhythm as pathological. Medical-grade devices use $f_s \geq 250$ Hz with a strict anti-aliasing filter below the 125 Hz Nyquist frequency to prevent this.

Checkpoint Quiz Nyquist-Shannon & Aliasing

Q1 A signal contains frequencies up to 3,500 Hz. What is the minimum sample rate to prevent aliasing?

$f_s^{min} = 2 \times f_{max} = 2 \times 3500 = \mathbf{7{,}000\text{ Hz}}$

Q2 A 9 Hz tone is sampled at $f_s = 8$ Hz. What alias frequency appears in the spectrum?

$f_{alias} = |9 - 8| = \mathbf{1\text{ Hz}}$. The 9 Hz signal looks identical to a 1 Hz signal at this sample rate.

Q3 At $f_s = 44{,}100$ Hz, what is the Nyquist frequency, and what happens to a 25,000 Hz overtone?

$f_N = 44100/2 = 22050$ Hz. The 25,000 Hz overtone is above $f_N$, so it aliases to $|25000 - 44100| = 19100$ Hz — it appears as a false 19.1 kHz component. Anti-aliasing filters on real ADCs remove content above 22 kHz before sampling.

Choosing sr in librosa & Resampling for ML

The theory of sampling and Nyquist is only useful when you can apply it to real data. This section shows you exactly how to choose the correct sample rate when loading audio with librosa, how to detect and fix mismatched rates, and how to build a resampling step into a production ML pipeline.

After this section you will be able to
  • Use librosa.load() with an explicit sr parameter to load audio at a target sample rate.
  • Resample audio between two rates using librosa.resample() without introducing aliasing.
  • Design a pre-processing stage that normalises all files to a consistent sample rate before feature extraction.

You have a dataset of 10,000 audio files from five different microphones, recorded at 8,000, 16,000, 22,050, 44,100, and 48,000 Hz. If you feed them directly to a speech model trained at a single rate, every file recorded at a different rate will be "heard" at the wrong speed: the model treats each clip as if it had the wrong duration and the wrong pitch. The fix is three lines of Python and one design decision: what rate does your model need?

🎯
Why this matters: Real datasets are never clean. Audio from the wild has mixed sample rates, mixed bit depths, and mixed channel counts. Building a robust preprocessing pipeline that enforces a consistent $f_s$ before any feature extraction is a non-negotiable step in every production speech, music, or environmental audio ML system.
🌍
Analogy Bridge

Think of sample rate like a language. A model trained on English cannot understand French — even though both are human languages. A model trained on 16 kHz audio "speaks" 16 kHz; feeding it 44.1 kHz audio is like speaking French to it. Resampling is the translation step that converts every file to the language the model understands.

① Load: librosa.load(file, sr=None) to keep the native rate
② Check rate: assert sr_native == target, or flag the file for resampling
③ Resample: librosa.resample(y, orig_sr=sr_native, target_sr=target_sr)
④ Extract: librosa.feature.mfcc(), librosa.feature.melspectrogram(), …
⑤ Model input: consistent shape, ready for training

Production ML audio pipeline: always load at native rate, check and resample explicitly, then extract features — never let librosa silently resample at load time.

Problem

📐 librosa.load() — The Key Parameters

The librosa.load() function reads an audio file and returns a NumPy array y plus the sample rate integer sr. The sr parameter controls resampling at load time:

Call · Result
librosa.load(f) · resamples to 22,050 Hz (the default)
librosa.load(f, sr=16000) · resamples to exactly 16,000 Hz
librosa.load(f, sr=None) · preserves the original rate (no resampling)
librosa.resample(y, orig_sr=r1, target_sr=r2) · converts an array already in memory
📝 Worked Example — Resampling Duration and Array Length

Background. When you resample from $f_{s1}$ to $f_{s2}$, the duration stays the same but the number of samples changes proportionally:

$N_2 = N_1 \times (f_{s2} / f_{s1})$
$N_2$ · New Length: samples after resampling
$N_1$ · Original Length: samples before resampling
$f_{s2}$ · Target Rate: desired Hz for model input
$f_{s1}$ · Original Rate: source Hz from the file

Problem: A 3-second audio clip is loaded at 44,100 Hz. (a) How many samples does it have? (b) After resampling to 16,000 Hz, how many samples remain?

1
Original sample count.
$N_1 = \text{duration} \times f_{s1} = 3 \times 44{,}100 = 132{,}300$
N₁ = 132,300 samples at 44,100 Hz
2
Resampled count.
$N_2 = N_1 \times (f_{s2}/f_{s1}) = 132{,}300 \times (16{,}000/44{,}100) = 48{,}000$
N₂ = 48,000 samples at 16,000 Hz
3
Duration preserved.
$48{,}000 / 16{,}000 = 3.0\text{ s}$ — identical to the original. Resampling changes the array length but never the physical duration.
Quick Check

A 2-second clip at 22,050 Hz is resampled to 8,000 Hz. What is the new array length?

N₁ = 2 × 22050 = 44100. N₂ = 44100 × (8000/22050) ≈ 16000. New length = 16,000 samples.
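The length bookkeeping in this example is a single formula, $N_2 = N_1 \times (f_{s2}/f_{s1})$. A minimal sketch; resampled_length is a hypothetical helper that rounds to the nearest integer, whereas a real resampler may round differently and return one sample more or fewer.

```python
def resampled_length(n_orig: int, sr_orig: int, sr_target: int) -> int:
    """N2 = N1 * (f_s2 / f_s1), rounded to a whole number of samples."""
    return round(n_orig * sr_target / sr_orig)


n1 = 3 * 44_100                            # 3-second clip at 44,100 Hz
n2 = resampled_length(n1, 44_100, 16_000)  # after resampling to 16,000 Hz
print(n1, n2, n2 / 16_000)                 # 132300 48000 3.0

# Quick Check: 2-second clip, 22,050 Hz -> 8,000 Hz
print(resampled_length(2 * 22_050, 22_050, 8_000))  # 16000
```

Dividing the new length by the new rate recovers the original duration, which is the invariant worth asserting in a pipeline.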

🔧 Building a Robust Preprocessing Pipeline

A production-grade preprocessing function must (1) handle any input rate, (2) enforce the target rate the model needs, (3) normalise amplitude, and (4) be deterministic. The pattern:

  • Always pass sr=None first to discover the native rate without silent resampling.
  • Resample explicitly with librosa.resample() — this makes the step visible in code review.
  • Assert after resampling that len(y) matches round(TARGET_SR * duration), allowing a tolerance of one sample for rounding, to catch edge cases.
  • Log the rate of every file in a manifest so you can audit the dataset later.
📝 Worked Example — Rate Ratio and Nyquist Check

Background. Downsampling (reducing $f_s$) is only safe if the new Nyquist frequency $f_{s,\text{new}}/2$ still covers all signal content. Upsampling (increasing $f_s$) is always safe — it adds zero new information but costs extra memory.

Problem: You want to downsample a speech file from 44,100 Hz to 8,000 Hz. (a) Is this safe if speech tops out at 4,000 Hz? (b) What is the resampling ratio?

1
Check Nyquist at target rate.
$f_{N,\text{new}} = 8{,}000 / 2 = 4{,}000\text{ Hz}$
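Speech max = 4,000 Hz and $f_{N,\text{new}} = 4{,}000$ Hz, so this sits exactly at the limit: acceptable in theory, but with zero margin. In practice the resampler's low-pass filter needs a transition band, so content near 4,000 Hz will be attenuated (telephone-quality speech).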
2
Resampling ratio.
$r = f_{s,\text{new}} / f_{s,\text{old}} = 8{,}000 / 44{,}100 \approx 0.181$
Each original sample maps to about 0.181 output samples, so the array shrinks by about 5.5× (44,100 / 8,000 ≈ 5.51).
3
Practical note. With its default settings, librosa.resample() band-limits the signal (an anti-aliasing low-pass) as part of downsampling. This is why you must never downsample manually by slicing every 5th sample (y[::5]): slicing skips the filter and introduces aliasing, and here it would also give 8,820 Hz rather than 8,000 Hz.
Quick Check

Why is y_down = y[::5] a bad way to downsample from 22,050 Hz to ~4,410 Hz?

Slicing skips the anti-aliasing filter. Content above the new Nyquist (2205 Hz) folds back into the signal as aliasing. librosa.resample() applies a low-pass filter first, then resamples — the correct approach.
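The failure mode of y[::5] can be reproduced with NumPy alone: put a 3,000 Hz tone into a 22,050 Hz signal, keep every 5th sample (new rate 4,410 Hz, new Nyquist 2,205 Hz), and locate the spectral peak. This sketch uses a synthetic tone rather than a real recording and only demonstrates the problem; it does not implement an anti-aliasing filter (that is the job librosa.resample() does for you).

```python
import numpy as np

sr = 22_050                        # Hz: original sample rate
t = np.arange(sr) / sr             # 1 second of sample times
y = np.cos(2 * np.pi * 3_000 * t)  # 3,000 Hz tone, well below the original f_N of 11,025 Hz

y_down = y[::5]                    # naive decimation: keep every 5th sample, no low-pass filter
sr_down = sr // 5                  # 4,410 Hz, so the new f_N is 2,205 Hz

spectrum = np.abs(np.fft.rfft(y_down))
k_peak = int(np.argmax(spectrum))         # FFT bin of the strongest component
peak_hz = k_peak * sr_down / len(y_down)  # bin spacing = sr_down / N = 1 Hz here

print(len(y_down), sr_down)  # 4410 4410
print(peak_hz)               # 1410.0, i.e. |3000 - 4410| Hz: the tone has aliased
```

The 3,000 Hz tone sits above the new Nyquist frequency, so it reappears at 1,410 Hz, exactly where the alias formula predicts.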
⚠️
Common Mistake

Myth: "I called librosa.load(file) — the sample rate is whatever the file is."

Reality: librosa's default is sr=22050. If your file is 44,100 Hz, librosa silently downsamples it to 22,050 Hz before returning it. If your file is 8,000 Hz, it silently upsamples to 22,050 Hz. Use sr=None to get the file's actual rate, or pass an explicit sr= value to enforce a specific rate.

Solution
Pause & Predict

The widget below shows a mixed-rate dataset. Before you move the slider: if you set the target rate to 8,000 Hz, which files can be losslessly downsampled and which will lose information? The answer depends on whether each file's content exceeds the new Nyquist frequency of 4,000 Hz.

Hint: speech tops out ~4,000 Hz (safe at 8k). Music may have components up to 20,000 Hz (not safe at 8k).

Try It: Resampling Rate Calculator

Set the original and target sample rates. The panel shows whether downsampling is Nyquist-safe, the output array length for a 2-second clip, and the resampling ratio.

Figure (interactive): original waveform, resampled waveform, and the Nyquist limit.
Live Calculation — Resampling Parameters
Original: fs_orig = 44100 Hz → N_orig = 88200 samples (2 s)
Target: fs_target = 16000 Hz → N_target = 32000 samples (2 s)
Ratio: r = 16000 / 44100 = 0.3628
Nyquist (target): fN = 16000 / 2 = 8000 Hz
Type: Downsampling (target < original) ✓ Safe for speech
Implementation
Python · librosa — Complete Resampling Pipeline
import pathlib

import librosa
import numpy as np

TARGET_SR = 16000  # Hz: required by downstream speech model


def load_and_normalise(filepath):
    # Step 1: load at native rate; never silently resample on load
    y_native, sr_native = librosa.load(filepath, sr=None, mono=True)

    # Step 2: resample to target if needed
    if sr_native != TARGET_SR:
        y = librosa.resample(y_native, orig_sr=sr_native, target_sr=TARGET_SR)
    else:
        y = y_native

    # Step 3: peak-normalise to [-1, 1]
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak

    print(f"  native={sr_native} Hz → resampled={TARGET_SR} Hz | {len(y)} samples")
    return y, TARGET_SR


# Process a whole directory
audio_dir = pathlib.Path("audio_clips/")
dataset = [load_and_normalise(f) for f in audio_dir.glob("*.wav")]
Output
  native=44100 Hz → resampled=16000 Hz | 32000 samples
  native=8000 Hz → resampled=16000 Hz | 32000 samples
  native=22050 Hz → resampled=16000 Hz | 32000 samples
  native=16000 Hz → resampled=16000 Hz | 32000 samples
Key Takeaway

Always load with sr=None to see the native rate, then resample explicitly — never silently trust librosa's default 22,050 Hz when your model expects something else.

🗣️
Real-World Application

Whisper and Wav2Vec 2.0 — Why 16 kHz Is Non-Negotiable

Both OpenAI Whisper and Meta Wav2Vec 2.0 are trained exclusively on 16,000 Hz audio. At 16 kHz the Nyquist frequency is 8,000 Hz, enough to capture the cues that distinguish speech phonemes. Passing audio at a different rate does not necessarily raise an error; the model silently interprets each window of samples as audio at the wrong pitch and tempo. This silent mismatch can push word error rate from around 3% to over 30% with no warning. The fix is always the same: a single librosa.load(file, sr=16000).

Checkpoint Quiz ML Pipeline Application

Q1 What does librosa.load(file) (default) do to a 44,100 Hz file?

It resamples it down to 22,050 Hz — the library's default sr. To preserve the native rate, use sr=None.

Q2 A 5-second clip at 22,050 Hz is resampled to 8,000 Hz. What is the new array length?

$N_1 = 5 \times 22050 = 110{,}250$. $N_2 = 110{,}250 \times (8000/22050) \approx \mathbf{40{,}000}$ samples.

Q3 Why is y_down = y[::3] a bad way to downsample by 3×?

It skips the anti-aliasing filter. Frequencies above the new Nyquist limit fold back as aliasing. Use librosa.resample(), which applies a proper low-pass filter before reducing the sample count.

Sampling & Aliasing Laboratory

Control the signal frequency and sample rate. The left panel shows the continuous wave plus sampled dots and a reconstruction. The right panel shows a live Nyquist analysis. Try to find the exact threshold where aliasing appears.

Nyquist Analysis
Nyquist rate = 2 × 7 = 14 Hz; fs = 20 Hz ≥ 14 Hz → SAFE
fN = fs/2: 10.0 Hz (f < fN ✓)
Ts: 1/20 = 0.0500 s (50.0 ms)
Samples/cycle: 20/7 = 2.86 per period

Figure (interactive): Time Domain and Sampling Grid & Alias View panels, showing the original signal $x(t)$, samples $x[n]$, the aliased reconstruction (when undersampled), and the Nyquist frequency line.

Week 2 Recap

The four things you must remember from this week — enough to solve any exam question on sampling and aliasing.

📐

The Sampling Equation

$x[n] = x(nT_s)$ with $T_s = 1/f_s$. Every sample is the analog signal evaluated at a clock tick. The integer index $n$ maps to real time $t = n/f_s$.

⚖️

Nyquist-Shannon Theorem

$f_s \geq 2f_{max}$ for perfect reconstruction. The Nyquist frequency is $f_N = f_s/2$. Any signal component above $f_N$ aliases — irreversibly.

🔁

Alias Frequency Formula

When undersampled, a tone at $f$ appears at $f_{alias} = |f - k \cdot f_s|$ where $k$ is chosen so the result falls in $[0, f_s/2]$.

🐍

librosa Pattern

Load with sr=None, resample explicitly with librosa.resample(). Never silently trust the default 22,050 Hz when your model needs a specific rate.

Coming up — Week 3: LTI Systems & 1D Convolution: We now understand how to convert continuous signals into discrete arrays safely. But what do we do with them? Next week, we build our first algorithmic processors — LTI Systems — and learn convolution: the sliding-window mechanic that powers everything from reverb effects to deep learning.

Further Reading

Curated resources to build intuition and reinforce the mathematics — mix of interactive tools, video, and documentation.

Exercises

Rigorous problem sets covering mathematical derivations and functional coding tasks, followed by advanced synthesis applications.

1 Theory · Sampling Fundamentals Easy

Sampling Period and Discrete Sequence Values

A humidity sensor records a signal $x(t) = 8\cos(2\pi \cdot 4 \cdot t)$ volts at a sampling rate of $f_s = 32$ Hz.

(a) Compute the sampling period $T_s$ in milliseconds.
(b) Write the discrete equation $x[n]$ explicitly in terms of $n$.
(c) Compute $x[0],\ x[2],\ x[4],$ and $x[8]$ to four decimal places.

Use $T_s = 1/f_s$, then substitute into $x[n] = 8\cos(2\pi \cdot 4 \cdot n \cdot T_s)$. Simplify the argument before evaluating.
2 Code · Sampling Fundamentals Easy

Generating and Verifying a Discrete Signal in NumPy

Use NumPy to sample $x(t) = 6\sin(2\pi \cdot 3 \cdot t)$ at $f_s = 36$ Hz for exactly 2 seconds.

(a) Create n = np.arange(0, 2*36) and compute x_n.
(b) Verify len(x_n) == 72 and print the first 5 values rounded to 3 decimal places.
(c) Find the index of the first peak using np.argmax(x_n) and verify it equals $f_s / (4 \cdot f_{signal})$.

The first peak of a sine occurs at $t = T/4 = 1/(4f)$. Convert to index: $n_{peak} = t_{peak} \times f_s$.
3 Theory · Nyquist-Shannon & Aliasing Medium

Nyquist Rate and Alias Frequency Calculation

Three signals are given:

(a) Signal A: $f_{max} = 2{,}500$ Hz. What is the minimum safe $f_s$?
(b) Signal B: $f = 11$ Hz sampled at $f_s = 8$ Hz. Compute $f_{alias}$. Is this above or below the Nyquist frequency?
(c) Signal C: $f = 25$ Hz sampled at $f_s = 40$ Hz. Is aliasing present? Justify using the Nyquist criterion.

For the aliasing check, compute $f_N = f_s/2$ and compare it with the signal frequency. Alias formula: $f_{alias} = |f - k \cdot f_s|$ where $k=\text{round}(f/f_s)$.
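Why does rounding pick the right $k$? A short derivation, using only the symbols defined above:

$$
k = \operatorname{round}\!\left(\frac{f}{f_s}\right) \;\Rightarrow\; \left|\frac{f}{f_s} - k\right| \le \frac{1}{2} \;\Rightarrow\; |f - k f_s| \le \frac{f_s}{2} = f_N
$$

$$
\cos\!\left(\frac{2\pi f n}{f_s}\right) = \cos\!\left(\frac{2\pi (f - k f_s) n}{f_s} + 2\pi k n\right) = \cos\!\left(\frac{2\pi (f - k f_s) n}{f_s}\right)
$$

The first line shows that the rounded $k$ always places $f_{alias}$ in $[0, f_N]$. The second line shows why the two tones cannot be told apart from their samples. For integer $k$ and $n$, the extra $2\pi k n$ is a whole number of turns. Because cosine is even, the sign of $f - k f_s$ does not matter. For example, a 30 Hz tone sampled at 25 Hz gives $k = \operatorname{round}(1.2) = 1$ and $f_{alias} = 5$ Hz.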
4 Code · Nyquist-Shannon & Aliasing Medium

Visualising Aliasing in NumPy

Write a Python function alias_check(f_signal, fs) that:

(a) Returns the tuple (is_aliased: bool, f_alias: float, f_nyquist: float).
(b) When aliased, f_alias = abs(f_signal - round(f_signal/fs)*fs).
(c) Test with (f=13, fs=10) and (f=5, fs=20) and print the results.
(d) Generate 200 samples of the 13 Hz signal at 10 Hz. Overlay them with a sinusoid at f_alias sampled at the same instants, and confirm that the two sequences match numerically.

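Check np.allclose(x_f, x_alias) over all 200 samples to verify that the two sequences agree to floating-point precision. Comparing np.round(x_f, 6) == np.round(x_alias, 6) also works, but it returns an array, so wrap it in np.all().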
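Below is one possible sketch of parts (a) to (d) to compare against your own. It makes two choices the exercise leaves open. It uses np.allclose rather than comparing rounded arrays. It also flags a tone exactly at $f_N$ as aliased, because the Nyquist criterion requires $f_s > 2f$.

```python
import numpy as np

def alias_check(f_signal, fs):
    """Return (is_aliased, f_alias, f_nyquist) for a pure tone at f_signal Hz sampled at fs Hz."""
    f_nyquist = fs / 2
    # Nyquist needs fs > 2 * f_signal, so a tone exactly at f_nyquist is flagged too
    is_aliased = f_signal >= f_nyquist
    # Folding rule; for an un-aliased tone round(f_signal / fs) is 0, so f_alias == f_signal
    f_alias = float(abs(f_signal - round(f_signal / fs) * fs))
    return is_aliased, f_alias, f_nyquist

for f, fs in [(13, 10), (5, 20)]:
    print(f, fs, alias_check(f, fs))

# Part (d): 200 samples of the 13 Hz tone at 10 Hz versus a tone at f_alias.
# Sines match here because 13 - 1*10 = +3 is positive (no sign flip).
fs = 10
is_aliased, f_alias, _ = alias_check(13, fs)
n = np.arange(200)
x_f = np.sin(2 * np.pi * 13 * n / fs)
x_alias = np.sin(2 * np.pi * f_alias * n / fs)
print(np.allclose(x_f, x_alias))
```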
5 Theory · ML Pipeline Application Medium

Resampling Parameters and Nyquist Safety

A speech dataset is recorded at 44,100 Hz. Your ASR model requires 16,000 Hz input.

(a) Compute $N_{orig}$ for a 3.5-second clip at 44,100 Hz.
(b) Compute $N_{target}$ after resampling to 16,000 Hz.
(c) What is the Nyquist frequency at 16,000 Hz? Is it safe for speech, assuming the speech content you need tops out at 4,000 Hz?
(d) A colleague suggests downsampling to 7,000 Hz instead. Is this safe for speech? Explain using the Nyquist theorem.

$N_{target} = N_{orig} \times (f_{s,\text{target}} / f_{s,\text{orig}})$. For part (d): compute $f_N$ at 7,000 Hz and compare to the maximum speech frequency.
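A quick numerical check of the arithmetic in parts (a) to (d) follows. Using integer floor division for $N_{target}$ is a choice made here. It is exact for these numbers, but a real resampler may round the length differently when the ratio does not divide evenly (librosa, for instance, rounds up).

```python
duration = 3.5           # seconds
fs_orig = 44_100         # Hz
fs_target = 16_000       # Hz
f_speech_max = 4_000     # Hz, the exercise's assumed upper limit for speech

N_orig = round(duration * fs_orig)          # samples before resampling
N_target = N_orig * fs_target // fs_orig    # integer arithmetic avoids float rounding
print(N_orig, N_target)                     # prints: 154350 56000

for fs in (fs_target, 7_000):
    f_N = fs / 2
    verdict = "safe" if f_N > f_speech_max else "aliasing risk"
    print(f"fs = {fs} Hz -> f_N = {f_N:.0f} Hz: {verdict}")
```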
6 Code · ML Pipeline Application Medium

Building a Robust Audio Preprocessing Function

Implement a function preprocess(y, sr_native, target_sr) using librosa.resample() that:

(a) If sr_native != target_sr, resamples with librosa; otherwise, passes the signal through unchanged.
(b) Peak-normalises the output to $[-1, 1]$.
(c) Returns (y_out, target_sr).
(d) Test with a synthetic 44,100 Hz cosine of 3 seconds resampled to 16,000 Hz. Verify len(y_out) == 48000 and max(abs(y_out)) == 1.0.

Use y_r = librosa.resample(y, orig_sr=sr_native, target_sr=target_sr). Peak-norm: y_r / np.max(np.abs(y_r)). Check the output with np.allclose().
7 Synthesis · Theory: Multi-Rate Sensor Design Hard

Choosing Sample Rates Across a Sensor Network

A wearable device collects three physiological signals simultaneously:

  • ECG: frequency content up to 150 Hz
  • PPG (pulse oximetry): frequency content up to 10 Hz
  • Skin temperature: frequency content up to 0.5 Hz

(a) Compute the minimum safe sample rate for each sensor.
(b) The device uses a shared ADC at 300 Hz. Which signals are safe? Which are at risk of aliasing?
(c) If a 200 Hz muscle artifact contaminates the ECG at $f_s = 300$ Hz, compute its alias frequency.
(d) Recommend a single ADC rate that safely captures all three signals without excessive data volume.

Apply Nyquist to each signal independently. For (c): $f_N = 150$ Hz at 300 Hz sampling. A 200 Hz artifact is above $f_N$. Alias = $|200 - 300| = 100$ Hz.
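The small script below applies the same check to each sensor. It treats ECG, whose content reaches exactly $f_N = 150$ Hz, as "at risk". That is a judgment made here: the criterion asks for $f_s > 2f_{max}$, and real anti-aliasing filters need some margin. Part (d) is left to you.

```python
# Exercise 7 numbers: highest frequency of interest per sensor, in Hz
sensors = {"ECG": 150.0, "PPG": 10.0, "Skin temperature": 0.5}
fs_shared = 300.0                 # shared ADC rate (Hz)
f_N = fs_shared / 2               # Nyquist frequency of the shared ADC

for name, f_max in sensors.items():
    nyquist_rate = 2 * f_max      # minimum rate from the Nyquist criterion
    # Safe only if f_max is strictly below f_N; equality leaves no margin
    status = "safe" if f_max < f_N else "at risk"
    print(f"{name:17s} min fs = {nyquist_rate:6.1f} Hz, at {fs_shared:.0f} Hz: {status}")

# Part (c): a 200 Hz muscle artifact folds back into the ECG band
f_artifact = 200.0
k = round(f_artifact / fs_shared)
f_alias = abs(f_artifact - k * fs_shared)
print(f"200 Hz artifact appears at {f_alias:.0f} Hz")
```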
8 Synthesis · Code: End-to-End Audio Normalisation Pipeline Hard

Mixed-Rate Dataset Normalisation

Build a complete pipeline that processes a list of synthetic audio clips with different sample rates:

(a) Generate four synthetic clips using NumPy: 3 s at 8,000 Hz, 2 s at 22,050 Hz, 4 s at 44,100 Hz, and 1 s at 48,000 Hz (all cosines at 440 Hz).
(b) For each clip, use alias_check(440, sr) from Ex 4 to verify no aliasing is present.
(c) Resample all clips to 16,000 Hz using preprocess() from Ex 6.
(d) Print a report table with the original rate, original length, target rate, target length, and duration. The duration after resampling should equal the original duration.

Target length = int(duration * target_sr). Duration check: abs(len(y_out)/target_sr - original_duration) < 0.01. Use string formatting to align columns in the report.