Every audio file, sensor reading, and medical scan begins with one decision — how many times per second to measure. Learn the law that governs this choice, what happens when you break it, and how to apply it in real ML pipelines.
The Core Idea
A continuous-time signal has a value at every instant of time. A digital computer can only store a finite list of numbers. Sampling bridges these two worlds by taking periodic "snapshots" — and the rate at which we snap determines everything about what we can recover.
The sr parameter in NumPy and librosa code is $f_s$ in the sampling equation.
Imagine trying to describe a song to someone over text. You cannot send the whole continuous soundwave; you have to send a list of numbers. But which numbers? How many? And how do you make sure the list is enough to reconstruct the original? That is exactly the problem sampling solves.
The sr=22050 you pass to librosa.load() is not a magic constant: it is $f_s$, the sampling frequency, and choosing it incorrectly degrades model performance or silently corrupts your features.
Think of a flip-book animation. Each page is one sample. The continuous movie is the analog signal; the stack of pages is the digital signal. More pages per second = smoother motion = higher $f_s$. The question the Nyquist theorem (Topic 2) answers is: how many pages do you actually need?
sr is $f_s$ (samples / second)
Signal taxonomy pipeline: The Analog-to-Digital Converter (ADC) bridges the continuous physical world and the discrete digital domain at a fixed rate $f_s$.
The sampling equation is the fundamental relationship that connects the continuous analog signal to its discrete digital representation. Each sample $x[n]$ is simply the value of the analog signal frozen at clock tick $n$:
$$x[n] = x(n \cdot T_s) \qquad T_s = \frac{1}{f_s}$$
Background. The equation $x[n] = x(nT_s)$ replaces continuous time $t$ with the discrete grid $\{0, T_s, 2T_s, \ldots\}$. We evaluate the analog function at each of these clock ticks.
Problem: A sensor records $x(t) = 3\cos(2\pi \cdot 5 \cdot t)$ V at $f_s = 20$ Hz. Compute $T_s$ and the first four samples $x[0], x[1], x[2], x[3]$.
If $f_s = 40$ Hz instead, how many samples appear per cycle of the 5 Hz signal?
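The problem above can be checked numerically. The following is a minimal NumPy sketch under the stated signal parameters; the helper name `sample_cosine` is hypothetical and not part of any library.

```python
import numpy as np

def sample_cosine(amplitude, f_signal, fs, num_samples):
    """Evaluate x[n] = A*cos(2*pi*f*n*Ts) for n = 0 .. num_samples-1."""
    Ts = 1.0 / fs                     # sampling period in seconds
    n = np.arange(num_samples)        # integer sample indices
    return amplitude * np.cos(2 * np.pi * f_signal * n * Ts)

fs = 20.0                             # sampling rate from the problem (Hz)
Ts = 1.0 / fs                         # 0.05 s between samples
x = sample_cosine(3.0, 5.0, fs, 4)    # x[0] .. x[3]
print("Ts =", Ts, "s; samples ~", np.round(x, 6))

# Follow-up question: samples per cycle of a 5 Hz tone at fs = 40 Hz
samples_per_cycle_at_40 = 40.0 / 5.0
print("samples per cycle at 40 Hz:", samples_per_cycle_at_40)
```

At 20 Hz the 5 Hz cosine is sampled four times per cycle, so the values cycle through 3, 0, -3, 0 (up to floating-point rounding); at 40 Hz each cycle receives eight samples.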
What sr Really Is in Python
When you write y, sr = librosa.load(file), the variable sr is nothing but $f_s$: the number of samples per second in the returned array y (22,050 by default, which is not necessarily the rate stored in the file). Every array index maps to a real physical time via $t = n / f_s$.
| Code | Meaning |
|---|---|
| sr = 22050 | $f_s = 22{,}050$ Hz → $T_s \approx 45\,\mu\text{s}$ |
| y[0] | $x[0]$, the amplitude at $t = 0\,\text{s}$ |
| y[44100] | $x[44100]$, the amplitude at $t = 2\,\text{s}$ |
| len(y)/sr | Total duration in seconds |
Background. Each index $n$ represents a time $t = n / f_s$. This lets you convert between array indices (what the code sees) and physical time (what the signal means).
Problem: An audio array y has 88,200 samples loaded at sr=22050. (a) What is the clip duration? (b) What physical time does index y[11025] correspond to?
Which index n corresponds to time $t = 3.0$ s with sr=22050?
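The index-to-time conversions in this problem are one-liners. A minimal sketch follows; the helper names `index_to_time` and `time_to_index` are hypothetical, and rounding to the nearest index is an assumption for times that fall between samples.

```python
def index_to_time(n, sr):
    """Physical time in seconds of sample index n at sample rate sr."""
    return n / sr

def time_to_index(t, sr):
    """Nearest sample index for time t (seconds) at sample rate sr."""
    return int(round(t * sr))

sr = 22050
duration = 88200 / sr                  # (a) clip duration in seconds
t_idx = index_to_time(11025, sr)       # (b) time of y[11025]
n_3s = time_to_index(3.0, sr)          # follow-up: index at t = 3.0 s

print(duration, t_idx, n_3s)
```

The same two helpers work for any rate, which is why it pays to carry sr alongside every array rather than hard-coding it.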
Parentheses vs. square brackets are not cosmetic. In signal processing, $x(t)$ with parentheses always means "continuous time" and $x[n]$ with square brackets always means "discrete index." Mixing them up in an exam answer is an automatic mistake — they represent fundamentally different mathematical objects.
Myth: "Higher sample rate always means better quality — more is always more."
Reality: Above twice the signal's highest frequency, extra samples add no information while storage grows in proportion to the sample rate. Resampling from 44,100 Hz to 16,000 Hz loses essentially no speech information, because nearly all speech energy lies below 8,000 Hz. The Nyquist theorem (Topic 2) gives the exact minimum.
Before you move the slider: if you sample a 5 Hz sine wave at only 6 samples per second, how many dots do you expect to see per cycle? Will the shape still look like a sine wave?
Hint: one cycle = 1/5 = 0.2 s. At 6 Hz, T_s = 1/6 ≈ 0.167 s. Count dots in 0.2 s.
Every digital signal is a list of numbers tied to a sample rate — and understanding $f_s$ is the first step to understanding any audio, sensor, or ML dataset.
Why sr=16000 Is the Industry Standard
Models like OpenAI's Whisper and Meta's Wav2Vec 2.0 require audio at exactly 16,000 Hz. This is not arbitrary: human speech energy is concentrated below 8,000 Hz, so $f_s = 16{,}000$ Hz satisfies the Nyquist criterion with a comfortable margin. Loading audio at any other rate and feeding it directly to these models corrupts the temporal features and destroys recognition accuracy. The fix is always an explicit librosa.load(file, sr=16000).
Q1 A signal is sampled at $f_s = 8{,}000$ Hz. What is the sampling period $T_s$ in milliseconds?
Q2 An array y has 32,000 samples at sr=16000. What is the duration?
Q3 You call librosa.load(file, sr=22050). What does y[0] represent?
The Fundamental Law
There is a precise mathematical threshold below which sampling destroys information — and above which it preserves everything. The Nyquist-Shannon theorem defines that threshold. Violating it produces aliasing: a silent corruption where high-frequency content masquerades as low-frequency noise.
What if you filmed a spinning helicopter rotor at the wrong frame rate? Sometimes the blades appear to rotate backwards — or even stand still — when they are actually spinning at full speed. That impossible illusion is aliasing, and it does the exact same thing to audio and sensor data when the sample rate is too low.
The stroboscope effect. If you illuminate a spinning wheel with a strobe light that flashes exactly at the wheel's rotation speed, the wheel looks frozen. Flash twice as fast, and you can see it moving. The Nyquist rate $f_s \geq 2f_{max}$ is the "flash twice as fast" rule — the minimum needed to correctly perceive motion.
When $f_s \geq 2f_{max}$: each sample carries unique information. The original signal can be perfectly reconstructed.
✓ No information lost
When $f_s < 2f_{max}$: high-frequency components "wrap around" and appear as lower frequencies (aliasing). Reconstruction is impossible.
✗ Permanent information loss
Aliasing step-strip: sampling a 13 Hz signal at only 10 Hz (below the Nyquist rate of 26 Hz) causes the signal to appear as a spurious 3 Hz wave — indistinguishable from a real 3 Hz component.
A band-limited signal with highest frequency $f_{max}$ can be perfectly reconstructed from its samples when the sampling rate satisfies the condition below (the inequality must be strict if the signal contains a sinusoid exactly at $f_{max}$):
$$f_s \geq 2 f_{max}$$
The threshold $f_N = f_s / 2$ is called the Nyquist frequency — the highest frequency that can be represented at a given sample rate:
$$f_N = \frac{f_s}{2}$$
Background. The Nyquist theorem states that you need at least 2 samples per cycle of the highest frequency component. Below this rate, there are not enough samples to distinguish that frequency from a lower one.
Problem: A biomedical ECG signal contains components up to 150 Hz. What is the minimum sample rate to avoid aliasing? What is the Nyquist frequency if we use $f_s = 500$ Hz?
CD audio has $f_s = 44{,}100$ Hz. What is its Nyquist frequency? Does it comfortably cover human hearing (up to 20,000 Hz)?
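Both questions reduce to a factor of two in one direction or the other. A short sketch, with hypothetical helper names `min_sample_rate` and `nyquist_frequency`:

```python
def min_sample_rate(f_max):
    """Nyquist rate: the lower bound on fs for content up to f_max (Hz)."""
    return 2.0 * f_max

def nyquist_frequency(fs):
    """Highest frequency representable at sample rate fs (Hz)."""
    return fs / 2.0

ecg_min_fs = min_sample_rate(150.0)        # ECG content up to 150 Hz
ecg_nyquist = nyquist_frequency(500.0)     # Nyquist frequency at fs = 500 Hz
cd_nyquist = nyquist_frequency(44100.0)    # CD audio
cd_covers_hearing = cd_nyquist >= 20000.0  # does it reach 20 kHz?

print(ecg_min_fs, ecg_nyquist, cd_nyquist, cd_covers_hearing)
```

The ECG needs at least 300 Hz, and sampling it at 500 Hz leaves headroom up to 250 Hz. CD audio's Nyquist frequency of 22,050 Hz sits about 2 kHz above the nominal limit of human hearing.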
When $f_s < 2f_{signal}$, the sampled sequence $x[n]$ is identical to what you would get from sampling a lower-frequency signal. The aliased frequency is:
$$f_{alias} = \left| f_{signal} - k \cdot f_s \right|$$
where $k$ is the integer that places $f_{alias}$ in $[0,\, f_s/2]$. For the simplest case:
$$f_{alias} = \left| f_{signal} - f_s \right| \quad \text{(when } f_s/2 < f_{signal} \leq 3f_s/2\text{)}$$
Background. When a frequency $f$ is sampled below Nyquist, there is no way to distinguish it from a lower frequency — the sample sequences are mathematically identical. The alias frequency is where the energy appears to be.
Problem: A 13 Hz tone is sampled at $f_s = 10$ Hz (below the Nyquist rate of 26 Hz). At what frequency does it appear in the spectrum?
A 7 Hz signal is sampled at $f_s = 10$ Hz. Is aliasing present? If yes, compute $f_{alias}$.
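The folding rule can be sketched in a few lines, and the claim that the sample sequences are identical can be verified directly. The function name `alias_frequency` is hypothetical; the modulo-then-fold approach is one way to pick the integer $k$ implicitly.

```python
import numpy as np

def alias_frequency(f_signal, fs):
    """Fold f_signal into the baseband [0, fs/2]."""
    f = f_signal % fs                 # bring the frequency into [0, fs)
    return fs - f if f > fs / 2 else f

fs = 10.0
alias_13 = alias_frequency(13.0, fs)  # the worked problem
alias_7 = alias_frequency(7.0, fs)    # the follow-up question

# Sample a 13 Hz cosine and a 3 Hz cosine on the same 10 Hz grid
n = np.arange(50)
x_13 = np.cos(2 * np.pi * 13.0 * n / fs)
x_3 = np.cos(2 * np.pi * 3.0 * n / fs)
sequences_match = np.allclose(x_13, x_3)

print(alias_13, alias_7, sequences_match)
```

Both tones land at 3 Hz. For cosines the two sample sequences agree to floating-point precision; for sines the alias can come out with inverted sign, which is still the same frequency.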
Myth: "I can just filter out aliased frequencies after sampling."
Reality: Once aliasing has occurred, the aliased frequency is indistinguishable from a genuine signal at that frequency — you cannot unmix them after the fact. Aliasing must be prevented before sampling with an anti-aliasing low-pass filter that removes content above $f_N = f_s/2$.
The widget below shows a 13 Hz signal. Before you drag the slider: at what sample rate do you expect the reconstructed wave to start "looking wrong" — and what frequency will the alias appear at if you set $f_s = 10$ Hz?
Hint: the Nyquist rate is $2 \times 13 = 26$ Hz. Use the alias formula $f_{alias} = |f_{signal} - f_s|$.
The Nyquist-Shannon theorem gives a hard mathematical lower bound on sample rate: $f_s \geq 2f_{max}$ — violate it and high-frequency content permanently masquerades as low-frequency noise.
Consumer smartwatch PPG sensors often sample at 25–50 Hz to conserve battery. A 30 Hz artifact superimposed on the cardiac signal aliases to $|30 - 25| = 5$ Hz at a 25 Hz sample rate, just above the cardiac frequency band (0.5–4 Hz) and far below its true frequency. An aliased component like this can mislead a model into misclassifying normal resting heart rhythms as pathological. Medical-grade devices typically use $f_s \geq 250$ Hz with a strict anti-aliasing filter below 125 Hz to prevent this.
Q1 A signal contains frequencies up to 3,500 Hz. What is the minimum sample rate to prevent aliasing?
Q2 A 9 Hz tone is sampled at $f_s = 8$ Hz. What alias frequency appears in the spectrum?
Q3 At $f_s = 44{,}100$ Hz, what is the Nyquist frequency, and what happens to a 25,000 Hz overtone?
Putting It to Work
sr in librosa & Resampling for ML
The theory of sampling and Nyquist is only useful when you can apply it to real data. This section shows you exactly how to choose the correct sample rate when loading audio with librosa, how to detect and fix mismatched rates, and how to build a resampling step into a production ML pipeline.
Use librosa.load() with an explicit sr parameter to load audio at a target sample rate.
Resample with librosa.resample() without introducing aliasing.
You have a dataset of 10,000 audio files from five different microphones, recorded at 8,000, 16,000, 22,050, 44,100, and 48,000 Hz. If you feed them directly to a speech model, every file whose rate differs from the one the model expects will be "heard" at the wrong speed: the model treats each clip as if it had the wrong duration and the wrong pitch. The fix is three lines of Python and one design decision: what rate does your model need?
Think of sample rate like a language. A model trained on English cannot understand French — even though both are human languages. A model trained on 16 kHz audio "speaks" 16 kHz; feeding it 44.1 kHz audio is like speaking French to it. Resampling is the translation step that converts every file to the language the model understands.
Production ML audio pipeline: always load at native rate, check and resample explicitly, then extract features — never let librosa silently resample at load time.
The librosa.load() function reads an audio file and returns a NumPy array y plus the sample rate integer sr. The sr parameter controls resampling at load time:
| Call | Result |
|---|---|
| librosa.load(f) | Resamples to 22,050 Hz (default) |
| librosa.load(f, sr=16000) | Resamples to exactly 16,000 Hz |
| librosa.load(f, sr=None) | Preserves original rate, no resampling |
| librosa.resample(y, orig_sr=r1, target_sr=r2) | Converts array already in memory |
Background. When you resample from $f_{s1}$ to $f_{s2}$, the duration stays the same but the number of samples changes proportionally (rounded to a whole number of samples):
$$N_2 = N_1 \cdot \frac{f_{s2}}{f_{s1}}$$
Problem: A 3-second audio clip is loaded at 44,100 Hz. (a) How many samples does it have? (b) After resampling to 16,000 Hz, how many samples remain?
A 2-second clip at 22,050 Hz is resampled to 8,000 Hz. What is the new array length?
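The sample-count arithmetic for this problem can be sketched as below. The helper `resampled_length` is hypothetical, and rounding fractional lengths up is an assumption; a given resampling library may round differently.

```python
def resampled_length(n_samples, orig_sr, target_sr):
    """Expected length after resampling, using integer ceiling division."""
    return (n_samples * target_sr + orig_sr - 1) // orig_sr

n_orig = 3 * 44100                                   # (a) 3 s at 44,100 Hz
n_16k = resampled_length(n_orig, 44100, 16000)       # (b) after resampling

n_follow = 2 * 22050                                 # 2 s at 22,050 Hz
n_8k = resampled_length(n_follow, 22050, 8000)       # follow-up question

print(n_orig, n_16k, n_8k)
```

In each case the duration is unchanged: 48,000 samples at 16,000 Hz is still 3 seconds, and 16,000 samples at 8,000 Hz is still 2 seconds.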
A production-grade preprocessing function must (1) handle any input rate, (2) enforce the target rate the model needs, (3) normalise amplitude, and (4) be deterministic. The pattern:
Load with sr=None first to discover the native rate without silent resampling.
Resample explicitly with librosa.resample(); this makes the step visible in code review.
Check the output length, for example with assert len(y) == TARGET_SR * duration, to catch edge cases.
Background. Downsampling (reducing $f_s$) is only safe if the new Nyquist frequency $f_{s,\text{new}}/2$ still covers all signal content. Upsampling (increasing $f_s$) is always safe: it adds no new information but costs extra memory.
Problem: You want to downsample a speech file from 44,100 Hz to 8,000 Hz. (a) Is this safe if speech tops out at 4,000 Hz? (b) What is the resampling ratio?
Avoid naive slicing (y[::5]), which skips the filter and introduces aliasing.
Why is y_down = y[::5] a bad way to downsample from 22,050 Hz to ~4,410 Hz?
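The effect of slicing without a filter can be seen with a synthetic tone. This is a minimal NumPy sketch; the 3,000 Hz test tone and the one-second duration are assumptions chosen for illustration, not values from the lesson.

```python
import numpy as np

sr = 22050
f_tone = 3000.0                          # above the new Nyquist of 2,205 Hz
n = np.arange(sr)                        # one second of samples
y = np.cos(2 * np.pi * f_tone * n / sr)

step = 5
y_down = y[::step]                       # naive decimation: no low-pass filter
sr_down = sr // step                     # 4,410 Hz

# Locate the strongest frequency in the sliced signal
spectrum = np.abs(np.fft.rfft(y_down))
freqs = np.fft.rfftfreq(len(y_down), d=1.0 / sr_down)
peak_hz = freqs[np.argmax(spectrum)]

expected_alias = abs(f_tone - sr_down)   # |3000 - 4410| = 1410 Hz
print(sr_down, peak_hz, expected_alias)
```

The 3,000 Hz tone does not disappear after slicing; it reappears at 1,410 Hz, where it cannot be told apart from genuine content. A proper resampler low-pass filters below the new Nyquist frequency before discarding samples.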
Myth: "I called librosa.load(file) — the sample rate is whatever the file is."
Reality: librosa's default is sr=22050. If your file is 44,100 Hz, librosa silently downsamples it to 22,050 Hz before returning it. If your file is 8,000 Hz, it silently upsamples to 22,050 Hz. Use sr=None to get the file's actual rate, or pass an explicit sr= value to enforce a specific rate.
The widget below shows a mixed-rate dataset. Before you move the slider: if you set the target rate to 8,000 Hz, which files can be losslessly downsampled and which will lose information? The answer depends on whether each file's content exceeds the new Nyquist frequency of 4,000 Hz.
Hint: speech tops out ~4,000 Hz (safe at 8k). Music may have components up to 20,000 Hz (not safe at 8k).
Always load with sr=None to see the native rate, then resample explicitly — never silently trust librosa's default 22,050 Hz when your model expects something else.
Both OpenAI Whisper and Meta Wav2Vec 2.0 are trained exclusively on 16,000 Hz audio. At 16 kHz, the Nyquist frequency is 8,000 Hz, which is sufficient for speech. Passing a file at a different rate does not raise an error; the model silently interprets the wrong number of samples per window as audio at the wrong pitch and tempo. In benchmark evaluations, this silent mismatch can raise word error rate from 3% to over 30% with no warning. The fix is always the same: a single librosa.load(file, sr=16000).
Q1 What does librosa.load(file) (default) do to a 44,100 Hz file?
To keep the file's native rate, pass sr=None.
Q2 A 5-second clip at 22,050 Hz is resampled to 8,000 Hz. What is the new array length?
Q3 Why is y_down = y[::3] a bad way to downsample by 3×?
Use librosa.resample(), which applies a proper low-pass filter before reducing the sample count.
Interactive Lab
Control the signal frequency and sample rate. The left panel shows the continuous wave plus sampled dots and a reconstruction. The right panel shows a live Nyquist analysis. Try to find the exact threshold where aliasing appears.
Key Ideas
The four things you must remember from this week — enough to solve any exam question on sampling and aliasing.
$x[n] = x(nT_s)$ with $T_s = 1/f_s$. Every sample is the analog signal evaluated at a clock tick. The integer index $n$ maps to real time $t = n/f_s$.
$f_s \geq 2f_{max}$ for perfect reconstruction. The Nyquist frequency is $f_N = f_s/2$. Any signal component above $f_N$ aliases — irreversibly.
When undersampled, a tone at $f$ appears at $f_{alias} = |f - k \cdot f_s|$ where $k$ is chosen so the result falls in $[0, f_s/2]$.
Load with sr=None, resample explicitly with librosa.resample(). Never silently trust the default 22,050 Hz when your model needs a specific rate.
Go Deeper
Curated resources to build intuition and reinforce the mathematics — mix of interactive tools, video, and documentation.
Jack Schaedler's visual, interactive introduction to sampling and sine waves — build deep geometric intuition without equations first.
→ jackschaedler.github.io
Official Docs
Reference for librosa.load(), librosa.resample(), and the full feature extraction API used in this course.
The scipy.signal module — anti-aliasing filters, decimation, and resampling functions used in production DSP pipelines.
Understand how np.fft.rfftfreq(N, d=1/fs) relates FFT bin indices to physical frequencies — essential for the DFT weeks ahead.
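As a small preview of that bin-to-frequency mapping, the sketch below uses an 8-sample signal at 8 Hz. Both values are assumptions picked so that each FFT bin lands on a whole number of hertz.

```python
import numpy as np

fs = 8.0                                  # sample rate in Hz
N = 8                                     # number of samples
freqs = np.fft.rfftfreq(N, d=1 / fs)      # physical frequency of each rFFT bin

# A 2 Hz cosine should put its energy in the bin labelled 2 Hz
n = np.arange(N)
x = np.cos(2 * np.pi * 2.0 * n / fs)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(x))))

print(freqs, peak_bin, freqs[peak_bin])
```

The last entry of freqs equals $f_s/2$, the Nyquist frequency: the real FFT has no bin above it.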
Practice
Rigorous problem sets covering mathematical derivations and functional coding tasks, followed by advanced synthesis applications.
A humidity sensor records a signal $x(t) = 8\cos(2\pi \cdot 4 \cdot t)$ volts at a sampling rate of $f_s = 32$ Hz.
(a) Compute the sampling period $T_s$ in milliseconds.
(b) Write the discrete equation $x[n]$ explicitly in terms of $n$.
(c) Compute $x[0],\ x[2],\ x[4],$ and $x[8]$ to four decimal places.
Use NumPy to sample $x(t) = 6\sin(2\pi \cdot 3 \cdot t)$ at $f_s = 30$ Hz for exactly 2 seconds.
(a) Create n = np.arange(0, 2*30) and compute x_n.
(b) Verify len(x_n) == 60 and print the first 5 values rounded to 3 decimal places.
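(c) Find the index of the first peak using np.argmax(x_n) and compare it with $f_s / (4 \cdot f_{signal}) = 2.5$. Because 2.5 is not an integer, the true peak falls between samples 2 and 3, so np.argmax returns one of those two neighbours.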
Three signals are given:
(a) Signal A: $f_{max} = 2{,}500$ Hz. What is the minimum safe $f_s$?
(b) Signal B: $f = 11$ Hz sampled at $f_s = 8$ Hz. Compute $f_{alias}$. Is this above or below the Nyquist frequency?
(c) Signal C: $f = 25$ Hz sampled at $f_s = 40$ Hz. Is aliasing present? Justify using the Nyquist criterion.
Write a Python function alias_check(f_signal, fs) that:
(a) Returns the tuple (is_aliased: bool, f_alias: float, f_nyquist: float).
(b) When aliased, f_alias = abs(f_signal - round(f_signal/fs)*fs).
(c) Test with (f=13, fs=10) and (f=5, fs=20) and print the results.
(d) Generate 200 samples of the 13 Hz signal at 10 Hz and overlay it with the alias frequency to confirm they match numerically.
Hint: compare np.round(x_f, 6) == np.round(x_alias, 6) for the first 10 samples to verify the sequences are numerically identical.
A speech dataset is recorded at 44,100 Hz. Your ASR model requires 16,000 Hz input.
(a) Compute $N_{orig}$ for a 3.5-second clip at 44,100 Hz.
(b) Compute $N_{target}$ after resampling to 16,000 Hz.
(c) What is the Nyquist frequency at 16,000 Hz? Is it safe for speech (which tops out at 4,000 Hz)?
(d) A colleague suggests downsampling to 7,000 Hz instead. Is this safe for speech? Explain using the Nyquist theorem.
Implement a function preprocess(y, sr_native, target_sr) using librosa.resample() that:
(a) If sr_native != target_sr, resamples using librosa. Otherwise passes through unchanged.
(b) Peak-normalises the output to $[-1, 1]$.
(c) Returns (y_out, target_sr).
(d) Test with a synthetic 44,100 Hz cosine of 3 seconds resampled to 16,000 Hz. Verify len(y_out) == 48000 and max(abs(y_out)) == 1.0.
Hint: y_r = librosa.resample(y, orig_sr=sr_native, target_sr=target_sr). Peak-norm: y_r / np.max(np.abs(y_r)). Check the output with np.allclose().
A wearable device collects three physiological signals simultaneously:
(a) Compute the minimum safe sample rate for each sensor.
(b) The device uses a shared ADC at 300 Hz. Which signals are safe? Which are at risk of aliasing?
(c) If a 200 Hz muscle artifact contaminates the ECG at $f_s = 300$ Hz, compute its alias frequency.
(d) Recommend a single ADC rate that safely captures all three signals without excessive data volume.
Build a complete pipeline that processes a list of synthetic audio clips with different sample rates:
(a) Generate four synthetic clips using NumPy: 3 s at 8,000 Hz, 2 s at 22,050 Hz, 4 s at 44,100 Hz, and 1 s at 48,000 Hz (all cosines at 440 Hz).
(b) For each clip, use alias_check(440, sr) from Ex 4 to verify no aliasing is present.
(c) Resample all clips to 16,000 Hz using preprocess() from Ex 6.
(d) Print a report table: original rate, original length, target rate, target length, duration (should all be equal to original).
Hint: the expected length is int(duration * target_sr). Duration check: abs(len(y_out)/target_sr - original_duration) < 0.01. Use string formatting to align columns in the report.