Signal Processing Foundations — Week 6

Spectral Analysis & Denoising

Real-world signals are always contaminated by noise. The frequency domain reveals structure that the time domain hides, and FFT-based thresholding lets us remove the noise selectively.

Additive Noise Model · SNR · Window Functions · Welch PSD · Spectral Gating · Spectral Subtraction

Noise in the Frequency Domain

Random noise looks chaotic in the time domain, but its frequency-domain signature is remarkably structured — and that structure is the key to removing it without touching the signal.

After this section you will be able to
  • Calculate SNR in dB given signal and noise power, and reverse the formula to find required noise levels.
  • Identify white, tonal, and pink noise from their spectral shapes and state the removal strategy for each.
  • Apply the additive noise model to express a noisy signal in both time and frequency domains.

You record a voice memo on your phone and play it back — there's your voice, and underneath it a faint, constant hiss. You can hear that the noise and the voice are separate things, but staring at the waveform tells you nothing about how to separate them. So where do you even begin?

🎯
Why this matters: The frequency domain makes noise visible. White noise spreads evenly across every bin; tonal hum concentrates in a single spike; your voice occupies a specific band. Once you can see the separation, you can act on it — this is the entire foundation of audio denoising, medical signal processing, and telecommunications.
🔗
Think of it this way

The frequency domain is like separating an orchestra into individual instruments. In the time domain, all players sound like one loud blur. In the frequency domain, each instrument occupies its own row in the score — and a bad musician (noise) who plays in the wrong rows becomes immediately visible and easy to mute.

  • 0 dB (SNR threshold): noise power equals signal power; the signal is barely detectable.
  • 20 dB (SNR): the signal is 100× more powerful than the noise; it clearly dominates, although the noise floor is still audible.
  • −13 dB (rectangular sidelobe): peak sidelobe level of a rectangular window, the reason windowing matters.
  • 4 (noise types): white, tonal, pink (1/f), and Brownian (1/f²), each with a different spectral fingerprint.
[Figure: Spectral Signatures of the Four Common Noise Types. Each panel spans 0 Hz to fs/2. ① White noise: flat PSD, equal energy in every bin. ② Tonal noise (60 Hz hum): a single sharp spike, so all energy sits at one frequency and is easy to zero out by bin index. ③ Pink noise (1/f): energy decreases with frequency at −10 dB/decade. Brownian/red noise (∝ 1/f²) has an even steeper roll-off. Each type demands a different removal strategy.]

Four noise types and their frequency-domain signatures. Identifying the type is the first step of any denoising pipeline.

Problem

The Additive Noise Model

In almost every real-world scenario noise adds to the signal — it does not multiply or distort it. This is the additive noise model, and it is what makes frequency-domain denoising possible: because the FFT is a linear operation, it preserves the additive structure.

📐 Additive Noise Model

Time domain: measured signal = clean signal + noise

$$x[n] = s[n] + w[n]$$

Because the FFT is linear ($\mathcal{F}\{a+b\} = \mathcal{F}\{a\} + \mathcal{F}\{b\}$), the same structure holds in the frequency domain:

$$X[k] = S[k] + W[k]$$

At bins where $|W[k]|$ is small, $X[k] \approx S[k]$ — we can recover the signal by selectively keeping the large-magnitude bins and zeroing the rest.
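The linearity claim can be checked numerically. The sketch below is illustrative only: the 8-cycle sine, the noise level of 0.2, and the fixed seed are assumptions chosen for the demonstration, not values from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
N = 256
n = np.arange(N)
s = np.sin(2 * np.pi * 8 * n / N)        # clean signal: exactly 8 cycles in the frame
w = 0.2 * rng.standard_normal(N)         # additive white Gaussian noise
x = s + w                                # time domain: x[n] = s[n] + w[n]

S, W, X = np.fft.rfft(s), np.fft.rfft(w), np.fft.rfft(x)

linear = np.allclose(X, S + W)           # frequency domain: X[k] = S[k] + W[k]
peak_bin = int(np.argmax(np.abs(X)))     # the signal bin still dominates the noisy spectrum
print(linear, peak_bin)
```

Because the sum holds bin by bin, a bin where $|W[k]|$ is small carries an almost unmodified copy of $S[k]$.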

📏 Signal-to-Noise Ratio (SNR)

SNR quantifies how much stronger the signal is than the noise, expressed on a logarithmic (decibel) scale:

$$\text{SNR} = 10\log_{10}\!\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right) \quad [\text{dB}]$$

where $P = \frac{1}{N}\sum_{n=0}^{N-1}x[n]^2$ is mean power. At 0 dB the noise equals the signal; at 20 dB the signal is 100× more powerful.

Formula terms:
  • SNR (output): Signal-to-Noise Ratio in decibels.
  • 10 (power factor): use 10 (not 20) for power ratios; 20 for amplitude ratios.
  • $P_s$ (signal power): mean squared amplitude of the clean signal.
  • $P_w$ (noise power): mean squared amplitude of the noise.
📝 Worked Example — Calculating SNR for a Noisy Sine Wave

Background. The SNR formula maps a power ratio onto a log scale. The factor of 10 (not 20) is used because power is proportional to amplitude squared — each doubling of amplitude is a 6 dB increase, while a doubling of power is only 3 dB.

Problem: A 100 Hz sine wave has amplitude $A = 1.0$ V. White Gaussian noise with RMS $= 0.1$ V is added. What is the SNR in dB?

  1. Find signal power. For a sine wave $A\sin(\theta)$, mean power $= A^2/2 = 1.0^2/2 = 0.5$ W.
  2. Find noise power. $P_{\text{noise}} = \text{RMS}^2 = 0.1^2 = 0.01$ W.
  3. Compute power ratio. $P_s/P_w = 0.5/0.01 = 50$.
  4. Convert to dB. $\text{SNR} = 10\log_{10}(50) = 10 \times 1.699 \approx 17.0$ dB.

Result: SNR ≈ 17.0 dB. The signal is 50× more powerful than the noise and is clearly audible above the noise floor.
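The same arithmetic, together with the reversed formula from the learning objectives, can be sketched in a few lines. The function names `snr_db` and `noise_power_for_snr` are hypothetical helpers introduced here for illustration.

```python
import math

def snr_db(p_signal, p_noise):
    """SNR in dB from two mean powers."""
    return 10.0 * math.log10(p_signal / p_noise)

def noise_power_for_snr(p_signal, target_snr_db):
    """Reverse the formula: noise power that yields the requested SNR."""
    return p_signal / (10.0 ** (target_snr_db / 10.0))

p_signal = 1.0 ** 2 / 2          # sine of amplitude 1.0 V, so mean power A^2/2
p_noise = 0.1 ** 2               # noise RMS 0.1 V, so power RMS^2

example_snr = snr_db(p_signal, p_noise)
needed_noise = noise_power_for_snr(p_signal, 20.0)   # noise power allowed for 20 dB SNR
print(f"SNR = {example_snr:.2f} dB, noise power for 20 dB = {needed_noise:.4f}")
```

Solving for the noise power is just the dB definition rearranged: $P_{\text{noise}} = P_{\text{signal}} / 10^{\text{SNR}/10}$.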
Quick Check

If the noise RMS doubles from 0.1 to 0.2 V (signal unchanged), by how many dB does the SNR drop?

Doubling noise RMS quadruples noise power → SNR drops by 10·log₁₀(4) ≈ 6.0 dB.
New SNR = 17.0 − 6.0 = 11.0 dB.

Noise Types & Their Spectral Signatures

Each noise type has a distinct spectral "fingerprint." Knowing the shape tells you which removal strategy to apply.

| Type | PSD shape | Best removal |
|---|---|---|
| White | Flat ($\propto 1$) | Threshold all bins uniformly |
| Tonal | Single spike | Notch filter or zero specific bin |
| Pink (1/f) | Slope −10 dB/decade | Frequency-shaped threshold |
| Brownian (1/f²) | Slope −20 dB/decade | High-pass + adaptive filter |
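The slopes in the table can be reproduced by shaping white noise in the frequency domain and fitting a line to the estimated PSD. This is a rough sketch under stated assumptions: the spectral-shaping generator, the helper names, the seed, and the choice to skip the lowest few bins in the fit are all illustrative, not part of the lesson.

```python
import numpy as np
from scipy.signal import welch

def colored_noise(n_samples, exponent, rng):
    """Noise whose PSD falls as 1/f**exponent (0 = white, 1 = pink, 2 = Brownian)."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    f = np.fft.rfftfreq(n_samples)
    shaping = np.ones_like(f)
    # Amplitude scaled by f^(-exponent/2), so power scales as f^(-exponent)
    shaping[1:] = f[1:] ** (-exponent / 2.0)
    shaping[0] = 0.0                              # remove the DC term
    return np.fft.irfft(spectrum * shaping, n=n_samples)

def psd_slope_db_per_decade(x, nperseg=4096):
    """Least-squares slope of the Welch PSD on log-log axes."""
    f, pxx = welch(x, fs=1.0, nperseg=nperseg)
    keep = slice(8, -1)                           # skip the lowest bins and the Nyquist bin
    slope, _ = np.polyfit(np.log10(f[keep]), 10 * np.log10(pxx[keep]), 1)
    return slope

rng = np.random.default_rng(1)
slopes = {
    name: psd_slope_db_per_decade(colored_noise(2 ** 17, exponent, rng))
    for name, exponent in [("white", 0), ("pink", 1), ("brownian", 2)]
}
for name, slope in slopes.items():
    print(f"{name:9s} slope = {slope:6.1f} dB/decade")
```

The fitted slopes should land near 0, −10, and −20 dB/decade respectively, which is the quickest way to tell the three broadband types apart in a measured spectrum.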
⚠️
Common Mistake

Confusing RMS amplitude and power. $P = \text{RMS}^2$, not $\text{RMS}$. A common error is writing $\text{SNR} = 10\log_{10}(\text{RMS}_s/\text{RMS}_w)$. The correct formula for amplitude ratios is $20\log_{10}(\text{RMS}_s/\text{RMS}_w)$, which gives the same value as the power version because the square inside the logarithm comes out as a factor of 2: $10\log_{10}(r^2) = 20\log_{10}(r)$.

Solution
Pause & Predict

Before exploring the widget: if you switch from White to Tonal noise at the same noise level, do you expect the spectrum to look simpler or more complex? Where will the noise energy appear?

Hint: think about how many bins carry the noise energy in each case.

Try It: Noise Type Spectrum Visualizer

Select a noise type and adjust the level to see how different noise types appear in the frequency spectrum alongside a 100 Hz signal tone.

Implementation
Python · NumPy / SciPy — Generating Noise & Measuring SNR
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 1000                      # sample rate (Hz)
N = 4096                       # number of samples
t = np.arange(N) / fs

# Clean 2-tone signal
signal = (1.0 * np.sin(2*np.pi*100*t) +
          0.6 * np.sin(2*np.pi*250*t))

# White Gaussian noise (RMS ≈ 0.3)
noise = np.random.normal(0, 0.3, N)
x_noisy = signal + noise

# SNR calculation (for zero-mean data, variance equals mean power)
snr_db = 10 * np.log10(np.var(signal) / np.var(noise))
print(f"SNR = {snr_db:.1f} dB")   # → SNR ≈ 8.8 dB (0.68 / 0.09; varies slightly per run)

# Frequency-domain view
X = rfft(x_noisy)
freqs = rfftfreq(N, 1/fs)
# import matplotlib.pyplot as plt
# plt.semilogy(freqs, np.abs(X)); plt.xlabel("Hz"); plt.ylabel("|X[k]|")
Output
SNR ≈ 8.8 dB
[semilogy plot: two tall spikes at 100 Hz and 250 Hz rising above a flat noise floor ≈ constant across all frequencies]
Key Takeaway

Because the FFT is linear, signal and noise add bin by bin in the frequency domain: bins dominated by signal energy can be kept, while bins where noise power exceeds signal power can be suppressed by thresholding.

👂
Real-World Application

Hearing Aids & Speech Signal Separation

Modern hearing aids (Phonak, ReSound) must amplify speech (300–3400 Hz) without amplifying background noise. A miniature DSP chip performs an FFT every 8 ms: frequency bands with low SNR are attenuated; bands with high SNR (clear speech energy) are amplified. The chip classifies ambient noise type — white, babble, traffic — and selects the optimal denoising strategy per-band. This SNR-aware, per-band processing is spectral subtraction-based noise reduction, the foundational technique of this week.

Checkpoint: Noise in the Frequency Domain

Q1 A signal has $P_{\text{signal}} = 2.0$ W and $P_{\text{noise}} = 0.02$ W. What is the SNR in dB?

SNR = 10·log₁₀(2.0/0.02) = 10·log₁₀(100) = 20 dB.

Q2 Which noise type would you use a notch filter to remove, and why?

Tonal noise (e.g., 60 Hz hum). All its energy is concentrated in a single spectral bin, so zeroing that bin removes the interference without affecting any other frequency.

Q3 If $x[n] = s[n] + w[n]$, write the frequency-domain equivalent and state what it means for denoising.

$X[k] = S[k] + W[k]$. At bins where noise $|W[k]|$ is small, $X[k] \approx S[k]$ — denoising is possible by keeping only the large-magnitude bins and zeroing the small ones.

Window Functions & Spectral Leakage

The DFT silently assumes your $N$-sample frame repeats forever. When a tone's frequency doesn't align with an exact bin, this assumption creates a sharp edge — and sharp edges spread energy across every frequency.

After this section you will be able to
  • Explain why spectral leakage occurs when a signal is not integer-periodic in the DFT window.
  • Compute a Hann window coefficient $w[n]$ for any $n$ and $N$ by hand.
  • Choose the appropriate window for a given signal by comparing sidelobe level vs main-lobe width trade-offs.

Take a clip of roughly one second, trimmed at an arbitrary point, from music that plays a pure 440 Hz note. Run an FFT. You expect one spike at 440 Hz, but the spectrum shows energy smeared across 430 Hz, 435 Hz, 445 Hz, 450 Hz, and beyond. No other frequencies are present in the music, so where did all that extra energy come from?

🎯
Why this matters: Professional audio analyzers, weather radars, medical monitors, and communications systems routinely apply a window function before computing the FFT. Without it, strong components "bleed" into neighbouring bins, masking weaker signals nearby: a −13 dB sidelobe from a strong sine wave can completely bury a signal 5× weaker in amplitude (about −14 dB) a bin or two away.
🔗
Think of it this way

A window function is like a dimmer switch for a recording booth. The rectangular window is a light switch, abruptly on and then abruptly off, which creates a loud "click" at both ends of the recording. The Hann window slowly fades the sound in and out, so there are no clicks. Just as click-free audio has no high-frequency artifacts, a smooth window produces far weaker spectral leakage sidelobes.

  • Rectangular (Rect): implicit window, always "on". Maximum frequency resolution but worst leakage. Peak sidelobe: −13 dB.
  • Hann: raised cosine that fades to zero at both endpoints. General-purpose choice. Peak sidelobe: −31 dB.
  • Hamming (Hamm): raised cosine on a small pedestal, so it does not reach zero. Widely used in speech processing. Peak sidelobe: −41 dB.
  • Blackman (BK): three-term cosine sum. Best sidelobe suppression of the four, suited to high-dynamic-range signals. Peak sidelobe: −58 dB.
Problem

The Leakage Problem

The DFT implicitly multiplies the input signal by a rectangular window, creating a hard cut at both ends. Multiplying by a rectangular window in the time domain is equivalent to convolving the signal's spectrum with the window's spectrum (a sinc-like function), which spreads each tone's energy into nearby bins.

[Figure: Before (rectangular window): energy leaks from the true frequency into neighbouring bins. After (Hann window): a clean peak at the true frequency. Horizontal axis: frequency.]

📐 DFT Coefficient Formula

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$$

The sum can always be evaluated, but it yields a single clean peak only when the signal completes an integer number of cycles in $N$ samples. If $f_0/\Delta f$ is not an integer (with $\Delta f = f_s/N$), energy spreads across all bins: this is spectral leakage.
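The formula can be evaluated directly to see the on-bin and off-bin cases side by side. A minimal sketch: the frame length of 32 and the tone positions (bin 4 and bin 4.5) are assumptions picked for illustration, and `dft` is a hypothetical helper, deliberately written as the slow $O(N^2)$ sum rather than an FFT.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)                     # column of bin indices
    return np.exp(-2j * np.pi * k * n / N) @ x

N = 32
n = np.arange(N)
on_bin = np.cos(2 * np.pi * 4.0 * n / N)     # exactly 4 cycles in the frame
off_bin = np.cos(2 * np.pi * 4.5 * n / N)    # 4.5 cycles: halfway between bins 4 and 5

X_on, X_off = dft(on_bin), dft(off_bin)
bins_on = int(np.sum(np.abs(X_on) > 1e-6))   # bins holding any energy
bins_off = int(np.sum(np.abs(X_off) > 1e-6))
print(bins_on, bins_off)
```

The on-bin cosine occupies only bins 4 and 28 (its positive and negative frequency), while the off-bin cosine puts energy into every one of the 32 bins.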

📝 Worked Example — Off-Grid Frequency Causes Leakage

Background. The DFT basis functions $e^{j2\pi kn/N}$ are mutually orthogonal for integer $k$. A signal at a non-integer bin index $k+\delta$ has non-zero inner product with every basis vector, distributing energy across all $N$ bins.

Problem: $N=8$, $f_s=8$ Hz, pure tone $x[n]=\cos(2\pi\times 2.5\times n/8)$. The ideal bin is $k_{\text{true}}=2.5$ — not an integer. How much leakage reaches bin $k=1$?

  1. Confirm the non-integer bin. $\Delta f = f_s/N = 8/8 = 1$ Hz, so $k_{\text{true}} = 2.5/1 = 2.5$, which falls between bins 2 and 3.
  2. Write the DFT at $k=1$. $X[1]=\sum_{n=0}^{7}\cos\!\left(\tfrac{2\pi\times 2.5\,n}{8}\right)e^{-j2\pi n/8}$. This sum does not cancel because the offset $2.5 - 1 = 1.5$ bins is not an integer.
  3. Evaluate the first three terms (real part). $n{=}0$: $1.000$, $n{=}1$: $\approx -0.271$, $n{=}2$: $0.000$. Summing all 8 complex terms gives $X[1]\approx 1.00 + 0.65j$, so $|X[1]|\approx 1.19$: significant leakage.
  4. Compare after a Hann window. After multiplying $x[n]$ by $w[n]=0.5\,(1-\cos(2\pi n/(N-1)))$: $|X[1]|\approx 0.49$.

Result: Rectangular: $|X[1]| \approx 1.19$ (strong leakage). Hann: $|X[1]| \approx 0.49$, about 2.4× smaller (≈ −8 dB). With only $N=8$ samples, bin 1 sits just 1.5 bins from the tone and is still inside the Hann main lobe, so the reduction here is modest; the full sidelobe benefit appears at bins farther from the tone and with longer frames.
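The leakage into bin 1 can be checked numerically rather than by hand. The sketch below assumes NumPy's `np.hanning`, which uses the same symmetric $(N-1)$ denominator as the formula in this example; the variable names are illustrative.

```python
import numpy as np

N = 8
n = np.arange(N)
x = np.cos(2 * np.pi * 2.5 * n / N)       # tone at bin 2.5 (fs = 8 Hz, so bin width is 1 Hz)

X_rect = np.fft.fft(x)                     # rectangular window: no taper applied
X_hann = np.fft.fft(x * np.hanning(N))     # symmetric Hann taper applied first

leak_rect = abs(X_rect[1])                 # leakage magnitude at bin k = 1
leak_hann = abs(X_hann[1])
ratio_db = 20 * np.log10(leak_hann / leak_rect)
print(f"rect {leak_rect:.3f}  hann {leak_hann:.3f}  change {ratio_db:.1f} dB")
```

Repeating the experiment with a larger `N` and a bin several positions away from the tone shows a much larger gap between the two windows, because the comparison then falls in the sidelobe region rather than the main lobe.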
Quick Check

Why does multiplying by the Hann window reduce leakage? (Think about what happens at the block edges.)

The Hann window equals zero at n=0 and n=N−1. This removes the hard discontinuity at the block boundaries that causes leakage — the periodic extension of the windowed signal is now smooth, so its DFT has low sidelobes.

Standard Window Functions

📐 Window Formulas ($n = 0, 1, \ldots, N{-}1$)

| Window | Formula $w[n]$ | Peak Sidelobe |
|---|---|---|
| Rectangular | $1$ | −13 dB |
| Hann | $0.5\!\left(1-\cos\dfrac{2\pi n}{N{-}1}\right)$ | −31 dB |
| Hamming | $0.54-0.46\cos\dfrac{2\pi n}{N{-}1}$ | −41 dB |
| Blackman | $0.42-0.5\cos\dfrac{2\pi n}{N{-}1}+0.08\cos\dfrac{4\pi n}{N{-}1}$ | −58 dB |
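The sidelobe column can be estimated from the windows themselves by zero-padding each one and looking past the first null of its spectrum. This is a sketch under assumptions: the window length of 64, the padding factor, and the helper name `peak_sidelobe_db` are illustrative choices, and measured values differ slightly from the rounded figures in the table.

```python
import numpy as np

def peak_sidelobe_db(window, pad_factor=64):
    """Highest sidelobe relative to the main-lobe peak, in dB."""
    spectrum = np.abs(np.fft.rfft(window, n=len(window) * pad_factor))
    spectrum /= spectrum[0]                          # main-lobe peak sits at DC
    rising = np.nonzero(np.diff(spectrum) > 0)[0]    # indices where the curve turns upward
    first_null = rising[0]                           # end of the main lobe
    return 20 * np.log10(spectrum[first_null:].max())

N = 64
levels = {
    "Rectangular": peak_sidelobe_db(np.ones(N)),
    "Hann": peak_sidelobe_db(np.hanning(N)),
    "Hamming": peak_sidelobe_db(np.hamming(N)),
    "Blackman": peak_sidelobe_db(np.blackman(N)),
}
for name, level in levels.items():
    print(f"{name:12s} {level:6.1f} dB")
```

Zero-padding only interpolates the spectrum so that the sidelobe peaks are sampled finely; it does not change the window's actual leakage behaviour.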
📝 Worked Example — Hann Window Coefficient at n = 4, N = 16

Background. The Hann window equals zero at both endpoints ($n=0$ and $n=N-1$), making the windowed signal's periodic extension continuous at frame boundaries and eliminating the step discontinuity that causes leakage.

Problem: Compute $w[4]$ for a Hann window of length $N=16$. Verify boundary and centre values.

  1. Plug into the formula. $w[4]=0.5\!\left(1-\cos\dfrac{2\pi\times4}{15}\right)=0.5\!\left(1-\cos\dfrac{8\pi}{15}\right)$. Argument: $8\pi/15\approx1.676$ rad $=96.0°$.
  2. Evaluate the cosine. $\cos(96.0°)\approx-0.1045$, so $w[4]=0.5\times(1-(-0.1045))=0.5\times1.1045\approx0.552$.
  3. Verify boundaries and centre. $w[0]=0.5(1-\cos0)=0$ ✓, $w[15]=0$ ✓, and the two centre samples $w[7]=w[8]\approx0.989$, because the peak value 1.0 falls between them at $n=7.5$ ✓.

Result: w[4] ≈ 0.552. Sample 4 is scaled to about 55% of its original amplitude as the window tapers toward zero at both ends.
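The hand calculation is easy to confirm in code. In this sketch `hann_coeff` is a hypothetical helper that implements the symmetric formula from the table; the comparison against `np.hanning` relies on NumPy using the same $(N-1)$ denominator.

```python
import numpy as np

def hann_coeff(n, N):
    """Symmetric Hann window value w[n] for a window of length N."""
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (N - 1)))

w4 = hann_coeff(4, 16)                               # the worked example
full_window = hann_coeff(np.arange(16), 16)          # all 16 coefficients at once
matches_numpy = np.allclose(full_window, np.hanning(16))
print(round(float(w4), 3), matches_numpy)
```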
Quick Check

What is the trade-off of choosing Blackman over Rectangular for spectral analysis?

Blackman has much lower sidelobes (−58 dB vs −13 dB) — better for detecting weak signals near strong ones. But its main lobe is 6 bins wide vs 2 bins, so it cannot resolve two frequencies that are close together as well as Rectangular can.
💡
Key Insight

Resolution vs leakage is always a trade-off. Suppressing leakage widens the spectral main lobe, blurring nearby components. Hann is the general-purpose compromise: −31 dB sidelobes and a 4-bin main lobe. Only switch to Blackman (6 bins) when you need to detect a weak tone near a strong one.

Solution
Pause & Predict

If you move the tone further from the nearest DFT bin (increase the frequency offset), do you expect the leakage to get better or worse? Move the slider to find out.

Hint: think about what the DFT's inner product formula produces when $\delta \to 0.5$.

Try It: Window Leakage Explorer

Slide the frequency offset to place the tone between DFT bins, then switch window type. Teal bars = signal energy · Red bars = leakage · Yellow line = true frequency.



Implementation
Python · NumPy — Comparing Window Leakage
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 1000
N = 512
t = np.arange(N) / fs

# Signal at 98.6 Hz: about halfway between bins 50 and 51 (Δf = fs/N ≈ 1.95 Hz/bin).
# (97.7 Hz would land almost exactly on bin 50 and show very little leakage.)
x = np.sin(2 * np.pi * 98.6 * t)

windows = {
    'Rectangular': np.ones(N),
    'Hann':        np.hanning(N),
    'Hamming':     np.hamming(N),
    'Blackman':    np.blackman(N),
}

freqs = rfftfreq(N, 1/fs)
for name, win in windows.items():
    X = rfft(x * win)
    # plot 20*log10(|X|/N) in dB, zoomed to 80-120 Hz
Output
Rectangular: wide smear, sidelobes visible ±10 Hz from true frequency
Hann: narrower peak, sidelobes −31 dB below main lobe
Hamming: similar width to Hann, sidelobes slightly lower
Blackman: widest main lobe, sidelobes suppressed to −58 dB
Key Takeaway

Always window before FFT in spectral analysis — the Hann window is the default choice, reducing peak sidelobes from −13 dB (rectangular) to −31 dB with minimal resolution cost.

📡
Real-World Application

Radar Clutter Rejection with Blackman Windows

Modern weather radars (e.g., NEXRAD WSR-88D) apply a Blackman window to each radar pulse before computing the Doppler FFT. Without windowing, strong ground-clutter returns from buildings and terrain leak across the entire spectrum, masking weak precipitation signals. The Blackman window's −58 dB sidelobe level keeps clutter leakage below the thermal noise floor, enabling detection of rain and wind shear at ranges beyond 400 km — directly saving lives through severe weather early warning.

Checkpoint: Window Functions & Spectral Leakage

Q1 Compute the Hann window coefficient $w[2]$ for $N=8$.

$w[2]=0.5(1-\cos(2\pi\cdot2/7))=0.5(1-\cos(4\pi/7))$
$4\pi/7\approx1.795$ rad $\approx102.9°$
$\cos(102.9°)\approx-0.2225$
$w[2]=0.5\times1.2225\approx\mathbf{0.611}$

Q2 A pure 60 Hz tone is recorded at $f_s=240$ Hz with $N=12$ samples. Is 60 Hz on an exact DFT bin? (Check: $k_{\text{true}}=60/(f_s/N)$)

$\Delta f = 240/12 = 20$ Hz/bin. $k_{\text{true}} = 60/20 = 3$ — an integer. Yes, exactly on bin 3. No leakage occurs with the rectangular window.

Q3 You need to detect a faint −30 dB signal sitting 2 bins away from a strong tone. Which window should you use, and why?

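The rectangular window is ruled out: its −13 dB sidelobes sit far above the −30 dB target signal. Hamming is the safest choice here, because its −41 dB sidelobes fall below the weak signal and its 4-bin main lobe (±2 bins) just clears a neighbour 2 bins away. Blackman's −58 dB sidelobes are lower still, but its 6-bin main lobe (±3 bins) would swallow a signal only 2 bins from the strong tone, so use Blackman only if you can lengthen the frame until the separation exceeds 3 bins.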

Spectral Analysis: Reading the Frequency Domain

Before we can remove noise, we must learn to read the spectrum precisely — magnitude, power, frequency resolution, and how windowed averaging sharpens the estimate.

After this section you will be able to
  • Compute frequency resolution $\Delta f$ from sample rate and number of samples, and determine the minimum recording duration to resolve two tones.
  • Distinguish magnitude spectrum, power spectrum, PSD, and dB scale — and convert between them.
  • Explain why Welch's method reduces variance compared to the naive periodogram and the trade-off with resolution.

Two machines in a factory vibrate at 99.8 Hz and 100.2 Hz. You bolt an accelerometer to the casing and record two seconds of data. Can the FFT tell them apart? The answer depends entirely on how long you recorded — and that question has a precise, calculable answer.

🎯
Why this matters: Spectral analysis is the foundation of predictive maintenance, acoustic quality control, and medical monitoring. The choices you make — sample rate, window length, nperseg — directly determine whether a critical defect frequency is detectable or buried.
🔗
Think of it this way

Frequency resolution is like pixel density on a screen. More pixels (longer recording = more samples = smaller $\Delta f$) let you see finer detail. A 100-pixel-wide photo cannot distinguish two objects 50 cm apart in a scene that spans 100 metres — just as a short recording cannot distinguish two close frequencies. You need more "pixels" (duration) to zoom in.

  • 1/T (Δf formula): recording longer (larger T) always improves frequency resolution.
  • 50% (Welch overlap): typical overlap for Welch's method; balances variance reduction and bias.
  • 1/K (variance drop): Welch averages K frames, so variance falls by roughly a factor of K compared with the naive periodogram.
  • −3 dB (half-power point): standard reference level for filter bandwidth and spectral peak width.
Problem

Magnitude, Power, and dB

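| Quantity | Formula | Units |
|---|---|---|
| Magnitude spectrum | $\lvert X[k]\rvert$ | Amplitude |
| Power spectrum | $\lvert X[k]\rvert^2/N$ | Power |
| PSD (Welch) | $\lim_{N\to\infty}\frac{1}{N}E[\lvert X[k]\rvert^2]$ | Power/Hz |
| Log scale | $20\log_{10}\lvert X[k]\rvert$ | dB |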

📐 Frequency Resolution

$$\Delta f = \frac{f_s}{N} = \frac{1}{T_{\text{duration}}}$$

To resolve two tones $f_1$ and $f_2$ you need $|f_1-f_2|\geq\Delta f$, which means recording for at least $T=1/|f_1-f_2|$ seconds. Longer recording = finer frequency resolution.

Formula terms:
  • $\Delta f$ (bin width): frequency gap between adjacent DFT bins (Hz).
  • $f_s$ (sample rate): samples per second (Hz).
  • $N$ (frame length): number of samples in the DFT window.
📝 Worked Example — Computing Δf and Required Duration

Background. The FFT divides $[0, f_s/2]$ into $N/2$ equal bins of width $\Delta f = f_s/N$. Two tones must be separated by at least one bin. Because $N = f_s \times T$, the bin width equals $1/T$ — recording twice as long halves the bin width.

Problem: $f_s = 8000$ Hz, $T = 0.5$ s. (a) Find $\Delta f$. (b) Can 100 Hz and 102 Hz be resolved? (c) Duration for $\Delta f = 0.25$ Hz?

  1. Samples. $N = 8000 \times 0.5 = 4000$.
  2. Resolution. $\Delta f = 8000/4000 = 2.0$ Hz/bin.
  3. Resolvable? Separation $= 102-100 = 2.0$ Hz $= \Delta f$, which is exactly at the limit. In practice you need $\geq 2\Delta f$ for clean separation.
  4. Required duration. $T = 1/0.25 = 4.0$ s ($N = 32{,}000$ samples).

Result: Δf = 2.0 Hz. The 2 Hz separation sits right at the limit, so the two tones cannot be reliably distinguished with T = 0.5 s. Reaching Δf = 0.25 Hz requires a 4.0 s recording.
Quick Check

At $f_s = 44100$ Hz with $N = 2048$ samples, what is $\Delta f$?

Δf = 44100/2048 ≈ 21.5 Hz. Two tones need to be at least ~21.5 Hz apart to appear as distinct peaks.
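Both directions of the resolution formula fit in two one-line helpers. The function names below are hypothetical; the numbers reuse the worked example, the quick check, and the two-machine scenario from the section opening.

```python
def bin_width(fs, n_samples):
    """Frequency resolution in Hz: sample rate divided by frame length."""
    return fs / n_samples

def min_duration(f1, f2):
    """Shortest recording (seconds) whose bin width equals the tone spacing."""
    return 1.0 / abs(f1 - f2)

df_example = bin_width(8000, 8000 * 0.5)        # worked example: 0.5 s at 8 kHz
df_audio = bin_width(44100, 2048)               # quick check
t_machines = min_duration(99.8, 100.2)          # the two vibrating machines
print(df_example, round(df_audio, 1), round(t_machines, 2))
```

For the factory example the minimum is 2.5 s, so a two-second recording falls short even before allowing the practical margin of $2\Delta f$.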
⚠️

Periodogram is inconsistent: The naive $|\text{FFT}|^2/N$ estimator has variance that does not decrease as $N$ grows — doubling the signal length just gives more jagged detail, not a smoother spectrum. Fix: use Welch's averaged method.
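A small experiment makes the inconsistency visible. The sketch assumes unit-variance white Gaussian noise and a fixed seed; `periodogram_spread` is a hypothetical helper that reports the bin-to-bin standard deviation divided by the mean level.

```python
import numpy as np

def periodogram_spread(n_samples, rng):
    """Relative scatter (std / mean) of the raw periodogram of white noise."""
    x = rng.standard_normal(n_samples)
    p = np.abs(np.fft.rfft(x)) ** 2 / n_samples
    p = p[1:-1]                       # drop DC and Nyquist, which behave differently
    return p.std() / p.mean()

rng = np.random.default_rng(42)
spread_short = periodogram_spread(1024, rng)
spread_long = periodogram_spread(16384, rng)     # 16x more data
print(round(float(spread_short), 2), round(float(spread_long), 2))
```

Both values stay close to 1: each bin fluctuates by about 100% of its mean no matter how long the record is, so extra samples buy finer bins but not a smoother estimate.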

Welch's Method — Noise-Robust PSD

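Welch (1967) improves the periodogram by dividing the signal into overlapping blocks, windowing each, computing each periodogram, and averaging. Variance drops roughly as $1/K$, where $K$ is the number of blocks (exactly $1/K$ only if the blocks were independent; overlapping blocks are partly correlated).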

  1. Segment into $K$ overlapping blocks (typically 50% overlap).
  2. Window each block with a Hann window.
  3. Periodogram each windowed block: $|X_k[f]|^2/(N_{\text{seg}}\cdot U)$, where $U = \frac{1}{N_{\text{seg}}}\sum_n w[n]^2$ compensates for the power removed by the window.
  4. Average all $K$ periodograms — variance drops $K\times$.
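The four steps above can be sketched by hand and compared with SciPy's implementation. Several details are assumptions made so that the two results line up: the periodic Hann from `get_window`, the extra division by `fs` (density scaling), the one-sided doubling, and `detrend=False` in the SciPy call. The test signal and the name `welch_manual` are illustrative.

```python
import numpy as np
from scipy.signal import get_window, welch

def welch_manual(x, fs, nperseg, noverlap):
    step = nperseg - noverlap
    win = get_window("hann", nperseg)                 # periodic Hann window
    n_frames = (len(x) - nperseg) // step + 1
    acc = np.zeros(nperseg // 2 + 1)
    for i in range(n_frames):
        seg = x[i * step : i * step + nperseg]        # step 1: overlapping segment
        X = np.fft.rfft(seg * win)                    # step 2: window, then FFT
        # step 3: periodogram; sum(win**2) equals N_seg * U from the text
        acc += np.abs(X) ** 2 / (fs * np.sum(win ** 2))
    pxx = acc / n_frames                              # step 4: average over frames
    pxx[1:-1] *= 2                                    # one-sided spectrum (nperseg even)
    return np.fft.rfftfreq(nperseg, 1 / fs), pxx, n_frames

rng = np.random.default_rng(0)
fs = 1000
t = np.arange(4096) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.3 * rng.standard_normal(t.size)

f_manual, p_manual, K = welch_manual(x, fs, nperseg=1024, noverlap=512)
f_scipy, p_scipy = welch(x, fs=fs, nperseg=1024, noverlap=512, detrend=False)
agree = np.allclose(p_manual, p_scipy)
print(K, agree)
```

In practice `scipy.signal.welch` is the better choice; the loop is only meant to show where each of the four steps appears.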
💡

Resolution vs Variance: For 4096 samples with nperseg=512 at 50% overlap, $K = (4096-512)/256 + 1 = 15$ frames, so variance drops by roughly 15× but $\Delta f = f_s/512$ (coarser). Choose nperseg based on the narrowest feature you need to resolve.
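The frame count and bin width for any segment length follow from two short formulas. The helper names are hypothetical, and the count assumes no zero-padding of a final partial segment, which matches SciPy's default behaviour for `welch`.

```python
def welch_frame_count(n_samples, nperseg, overlap_fraction=0.5):
    """Number of full segments that fit, stepping by the non-overlapped part."""
    noverlap = int(nperseg * overlap_fraction)
    step = nperseg - noverlap
    return (n_samples - nperseg) // step + 1

def welch_bin_width(fs, nperseg):
    """Frequency resolution of each segment's FFT, in Hz."""
    return fs / nperseg

for nperseg in (256, 512, 1024):
    K = welch_frame_count(4096, nperseg)
    print(f"nperseg={nperseg:5d}  K={K:3d}  bin width={welch_bin_width(1000, nperseg):.3f} Hz")
```

Halving `nperseg` roughly doubles the number of frames, and therefore the smoothing, while doubling the bin width.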

Python · scipy.signal.welch

scipy.signal — Welch PSD
from scipy.signal import welch

# x_noisy and fs come from the earlier noise-generation example
f, Pxx = welch(x_noisy, fs=fs,
               nperseg=1024,    # window length
               noverlap=512)    # 50% overlap
# plt.semilogy(f, Pxx)
# plt.xlabel('Frequency (Hz)')
# plt.ylabel('PSD (V²/Hz)')
Solution
Pause & Predict

If you increase nperseg from 256 to 1024, do you expect the Welch PSD to become smoother or rougher? And what will happen to the frequency resolution?

Hint: larger nperseg means fewer segments K — what does that do to variance? And $\Delta f = f_s/\text{nperseg}$?

Try It: Welch vs Naive PSD Smoothing

Adjust the segment size to see the variance–resolution trade-off. More segments = smoother PSD but coarser frequency resolution.

Implementation
Python · NumPy — Naive Periodogram vs Welch PSD
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import welch

fs = 1000
N = 4096
t = np.arange(N) / fs

# Same two-tone test signal as the earlier SNR example, so the snippet runs on its own
x_noisy = (1.0 * np.sin(2*np.pi*100*t) + 0.6 * np.sin(2*np.pi*250*t)
           + np.random.normal(0, 0.3, N))

# Naive periodogram
X = rfft(x_noisy)
psd_naive = np.abs(X)**2 / N     # high variance

# Welch PSD: roughly 7× lower variance (K = 7 frames at nperseg=1024, 50% overlap)
f, Pxx = welch(x_noisy, fs=fs, nperseg=1024, noverlap=512)
Output
Naive  Δf = 0.244 Hz | variance very high, jagged trace
Welch  Δf = 0.977 Hz | variance roughly 7× lower, smooth trace
Signal peak at 100 Hz clearly visible above noise floor in both cases
Key Takeaway

Welch's method trades frequency resolution for variance reduction — choosing nperseg is the key design decision: smaller segments give smoother estimates but blur closely-spaced frequency components.

🏭
Real-World Application

Predictive Maintenance via Bearing Wear Detection

As industrial motor bearings wear, they generate vibration at a Bearing Defect Frequency calculable from bearing geometry. Maintenance engineers run Welch PSD estimation every hour on accelerometer data. When the PSD shows a rising peak at the predicted defect frequency above a threshold, an alert triggers bearing replacement — preventing the $50,000–$200,000 cost of an unplanned shutdown.

Checkpoint: Spectral Analysis

Q1 $f_s = 8000$ Hz, $T = 2$ s. Compute (a) $N$, (b) $\Delta f$, (c) minimum duration to achieve $\Delta f = 0.1$ Hz.

(a) $N = 8000 \times 2 = 16{,}000$ samples.
(b) $\Delta f = 8000/16000 = \mathbf{0.5\ Hz/bin}$.
(c) $T = 1/0.1 = \mathbf{10\ s}$ (N = 80,000 samples).

Q2 Why does doubling N in the naive periodogram NOT halve the variance?

The naive $|FFT|^2/N$ estimator is inconsistent — its variance does not decrease as N grows. More samples give finer frequency resolution (smaller $\Delta f$) but just produce a more jagged, detailed spectrum, not a smoother one. Variance reduction requires averaging multiple independent estimates (Welch).

Q3 With $N=8192$ and $\text{nperseg}=512$ at 50% overlap, approximately how many frames K does Welch produce, and by how much does variance drop?

Step size = nperseg × 0.5 = 256. K = (8192 − 512)/256 + 1 = 31 frames. Variance drops by a factor of roughly 30 (slightly less than K, because overlapping frames are correlated).

Spectral Gating: Threshold-Based Filtering

With the noise floor visible in the spectrum, we can remove it by zeroing or shrinking coefficients that fall below a threshold — the spectral equivalent of a noise gate.

After this section you will be able to
  • Apply the hard threshold operator to a spectrum by hand, identifying which bins survive for a given $\lambda$.
  • Compute a soft-threshold magnitude shrinkage step-by-step for one spectral bin.
  • Compare hard vs soft thresholding: which produces musical noise and why.

Look at the noisy spectrum: a flat carpet of low-level components at every frequency, with three or four tall peaks rising above it. The carpet is the noise; the peaks are your signal. What if you could lower a gate and pass only what rises above it?

🎯
Why this matters: FFT spectral gating underlies the spectral noise-reduction tools in products such as iZotope RX, Adobe Audition, and Logic Pro. The same principle appears in 5G channel estimation, sonar target detection, and RF signal intelligence. The hard vs soft threshold choice determines whether the output has "musical noise" artifacts or a slight amplitude bias.
🔗
Think of it this way

Hard thresholding is like a bouncer at a club: anyone shorter than a height limit is refused entry, everyone else walks through unchanged. Soft thresholding is like a salary cut: everyone's pay is reduced by the same fixed amount, and those whose pay falls to zero or below simply leave. Soft shrinkage is continuous; hard gating is binary.

[Figure: Before threshold: noisy spectrum $|X[k]|$ with signal peaks above the threshold λ and a noise floor below it. After hard threshold: cleaned spectrum $|\hat{X}[k]|$ in which the noise bins are zeroed and 3 signal bins survive.]

Hard threshold: bins below λ are zeroed; signal peaks above λ pass through unchanged.

Problem

Hard Thresholding

The simplest approach: set every frequency coefficient below threshold $\lambda$ to zero — an all-or-nothing binary decision.

✂️ Hard Threshold Operator

$$\hat{X}[k] = \begin{cases} X[k] & \text{if } |X[k]| \geq \lambda \\ 0 & \text{if } |X[k]| < \lambda \end{cases}$$

Coefficients above threshold pass through unchanged; those below are completely zeroed.

📝 Worked Example — Applying Hard Threshold to a Spectrum

Background. Hard threshold is applied bin-by-bin to FFT magnitudes. Bins above $\lambda$ pass through; bins below are zeroed. The IFFT then reconstructs a time-domain signal using only the surviving bins.

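Problem: Magnitudes $|X| = [2.1,\ 14.8,\ 0.9,\ 23.4,\ 1.2,\ 17.6,\ 0.7,\ 2.5]$. Take the noise floor to be the median magnitude and the threshold to be $\lambda = \alpha \times \text{median}$ with multiplier $\alpha = 3.0$. Which bins survive?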

  1. Compute the median noise floor. Sorted: $[0.7,\ 0.9,\ 1.2,\ 2.1,\ 2.5,\ 14.8,\ 17.6,\ 23.4]$. Median $= (2.1 + 2.5)/2 = 2.3$.
  2. Set the threshold. $\lambda = \alpha \times \text{median} = 3.0 \times 2.3 = 6.9$.
  3. Apply the hard threshold.
     $k=0$: $2.1 < 6.9$ → 0;   $k=1$: $14.8 \geq 6.9$ → keep
     $k=2$: $0.9 < 6.9$ → 0;   $k=3$: $23.4 \geq 6.9$ → keep
     $k=4$: $1.2 < 6.9$ → 0;   $k=5$: $17.6 \geq 6.9$ → keep
     $k=6$: $0.7 < 6.9$ → 0;   $k=7$: $2.5 < 6.9$ → 0

Result: 3 of 8 bins survive (k = 1, 3, 5). Signal peaks preserved; all flat noise-floor bins zeroed.
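The same gate can be applied to the example magnitudes in a few lines. This is a minimal sketch working on magnitudes only; `survivors_for` is a hypothetical helper added so that other values of the multiplier can be tried.

```python
import numpy as np

mags = np.array([2.1, 14.8, 0.9, 23.4, 1.2, 17.6, 0.7, 2.5])

def survivors_for(alpha):
    """Indices of bins that pass a hard threshold of alpha times the median."""
    lam = alpha * np.median(mags)
    return np.flatnonzero(mags >= lam).tolist()

lam = 3.0 * np.median(mags)                     # threshold for alpha = 3.0
gated = np.where(mags >= lam, mags, 0.0)        # hard threshold: keep or zero
survivors = survivors_for(3.0)
print(round(float(lam), 2), survivors, gated.tolist())
```

Raising the multiplier far enough eventually removes genuine peaks as well, which is the over-thresholding failure mode.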
Quick Check

If $\alpha$ is raised from 3.0 to 6.0, how many bins survive? ($\lambda = 6.0 \times 2.3 = 13.8$)

$\lambda = 13.8$. Surviving bins: k=1 (14.8≥13.8), k=3 (23.4≥13.8), k=5 (17.6≥13.8) → still 3 bins. But if α=10: λ=23 → only k=3 (23.4≥23) survives — over-thresholding removes legitimate signal bins.

Soft Thresholding (Shrinkage)

Instead of a binary cut, soft thresholding shrinks every coefficient's magnitude toward zero — large coefficients survive but are reduced by exactly $\lambda$. This avoids the discontinuity that creates "musical noise".

🎚 Soft Threshold (Shrinkage) Operator

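$$\hat{X}[k] = \text{sign}(X[k])\cdot\max\!\left(|X[k]|-\lambda,\;0\right), \qquad \text{sign}(X[k]) = \frac{X[k]}{|X[k]|} = e^{j\angle X[k]}$$

Also written $\mathcal{S}_\lambda(X[k])$. For a complex coefficient, $\text{sign}(X[k])$ is the unit phasor, so the magnitude is shrunk by $\lambda$ while the phase is preserved (take $\hat{X}[k]=0$ when $X[k]=0$). The operator is continuous everywhere, with no abrupt transitions.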

📝 Worked Example — Soft Threshold on One Bin

Background. Soft thresholding applies the shrinkage operator $\mathcal{S}_\lambda$ to each bin's magnitude while keeping the original phase. The result is that large bins shrink by exactly $\lambda$, while small bins become zero — but the transition is continuous.

Problem: $X[3] = 23.4\,e^{j0.8}$ (magnitude 23.4, phase 0.8 rad), $\lambda = 6.9$. Apply soft threshold.

  1. Extract magnitude and phase. $|X[3]| = 23.4$, $\angle X[3] = 0.8$ rad.
  2. Shrink the magnitude. $\hat{m} = \max(23.4 - 6.9,\; 0) = \max(16.5,\; 0) = 16.5$.
  3. Reconstruct with the original phase. $\hat{X}[3] = 16.5\,e^{j0.8}$.

Result: Soft: $\hat{X}[3] = 16.5\,e^{j0.8}$ (magnitude reduced from 23.4 → 16.5). Hard would give $23.4\,e^{j0.8}$ (unchanged).
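A single complex bin is enough to compare the two operators. The sketch below uses hypothetical per-bin helpers and the numbers from this example and from the checkpoint.

```python
import numpy as np

def soft_threshold_bin(X, lam):
    """Shrink the magnitude by lam (floored at zero) and keep the phase."""
    shrunk = np.maximum(np.abs(X) - lam, 0.0)
    return shrunk * np.exp(1j * np.angle(X))

def hard_threshold_bin(X, lam):
    """Pass the bin unchanged if its magnitude reaches lam, otherwise zero it."""
    return np.where(np.abs(X) >= lam, X, 0.0)

X3 = 23.4 * np.exp(1j * 0.8)            # magnitude 23.4, phase 0.8 rad
soft = soft_threshold_bin(X3, 6.9)
hard = hard_threshold_bin(X3, 6.9)
print(abs(soft), np.angle(soft), abs(hard))
```

Only the magnitude differs between the two outputs; both keep the 0.8 rad phase, which is what lets the inverse FFT place the component correctly in time.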
⚠️
Common Mistake — Musical Noise

Hard thresholding leaves isolated high-magnitude noise peaks scattered across the spectrum. On audio, these random surviving noise bins sound like faint, erratic metallic tones, hence "musical noise". Soft thresholding greatly reduces this by shrinking continuously rather than making a binary pass/fail decision.

Hard vs Soft — At a Glance

| Property | Hard Threshold | Soft Threshold (Shrinkage) |
|---|---|---|
| Operation | Zero bins below $\lambda$ | Shrink all magnitudes by $\lambda$ |
| Strong signal bins | Preserved exactly | Reduced by $\lambda$ (slight bias) |
| Continuity at $\lambda$ | Discontinuous | Continuous everywhere |
| Artifact | "Musical noise" ringing | Slight amplitude reduction |
| Best for | Tonal noise, sharp peaks | White / broadband noise |
Solution
Pause & Predict

At very high threshold (say $\lambda = 90\%$ of max magnitude), do you expect the denoised output to sound better or worse than no denoising? Try it.

Hint: a threshold that is too high zeros out signal bins too — what does IFFT of zeros look like?

Try It: Threshold Gate Explorer

Toggle between hard and soft, then drag the threshold to see how each method affects the spectrum. Teal = surviving bins · Gray = zeroed bins · Red dashed = threshold line.

Implementation
Python · NumPy — Hard and Soft Threshold Functions
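import numpy as np
from scipy.fft import rfft, irfft

def hard_threshold(signal, threshold):
    """Zero every FFT bin whose magnitude is below `threshold`."""
    X = rfft(signal)
    mag = np.abs(X)
    X_clean = np.where(mag >= threshold, X, 0)
    return irfft(X_clean, n=len(signal))

def soft_threshold(signal, threshold):
    """Shrink every bin magnitude by `threshold`, keeping the original phase."""
    X = rfft(signal)
    mag = np.abs(X)
    phase = np.angle(X)
    mag_s = np.maximum(mag - threshold, 0)
    X_clean = mag_s * np.exp(1j * phase)
    return irfft(X_clean, n=len(signal))

# Same two-tone test signal as the earlier SNR example, so the snippet runs on its own
fs = 1000
N = 4096
t = np.arange(N) / fs
x_noisy = (1.0 * np.sin(2*np.pi*100*t) + 0.6 * np.sin(2*np.pi*250*t)
           + np.random.normal(0, 0.3, N))

# Adaptive noise floor: 3× median of all bin magnitudes
X = rfft(x_noisy)
threshold = 3.0 * np.median(np.abs(X))
x_hard = hard_threshold(x_noisy, threshold)
x_soft = soft_threshold(x_noisy, threshold)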
Output
threshold = 3.0 × median(|X[k]|) ≈ adaptive value based on noise floor
hard: signal peaks fully preserved, noise zeroed
soft: signal peaks reduced by λ, noise zeroed (smoother spectral transition)
Key Takeaway

Hard thresholding is faster and preserves peak amplitudes exactly, but creates "musical noise" artifacts; soft thresholding is continuous and avoids those artifacts, at the cost of reducing every surviving bin's magnitude by $\lambda$.

🎙
Real-World Application

Spectral Repair in Digital Audio Workstations

iZotope RX, Audition, and Logic Pro all implement FFT spectral gating as their core noise reduction engine. The plugin analyzes each frame's spectrum and suppresses components below the estimated noise floor. This cleanly removes 50/60 Hz power-line hum, HVAC rumble, and microphone clicks without affecting primary speech or musical content. Professional podcast editors, film sound designers, and music producers rely on this technique daily — it is one of the most widely deployed DSP algorithms in consumer software.

Checkpoint: Spectral Gating

Q1 Given $|X[5]| = 8.0$ and $\lambda = 6.9$, what is the output after (a) hard threshold and (b) soft threshold?

(a) Hard: $8.0 \geq 6.9$ → $\hat{X}[5] = 8.0$ (unchanged).
(b) Soft: $\hat{m} = \max(8.0-6.9, 0) = 1.1$ → $\hat{X}[5] = 1.1 \cdot e^{j\angle X[5]}$.

Q2 Why does the hard threshold operator create "musical noise" on audio signals?

Random noise bins that happen to be slightly above the threshold survive unchanged and are scattered sparsely across the spectrum. When reconstructed via IFFT, these isolated surviving bins produce faint, erratic tones that the ear perceives as ringing or metallic artifacts: "musical noise".

Q3 With 8 bins of magnitudes $[2, 15, 1, 24, 1, 18, 0.5, 3]$, sorted median = 2.5 and $\alpha=3$, compute $\lambda$ and list surviving bins for hard threshold.

$\lambda = 3 \times 2.5 = 7.5$. Bins where magnitude ≥ 7.5: k=1 (15), k=3 (24), k=5 (18). 3 bins survive.

The Complete FFT Denoising Pipeline

Five steps. Window the signal, FFT, estimate noise floor, threshold, IFFT. That is all it takes to transform 7 dB SNR into 19 dB — running in milliseconds on embedded hardware.

After this section you will be able to
  • Implement the complete 5-step FFT denoising pipeline end-to-end in Python with window power compensation.
  • Compute the Donoho universal threshold $\lambda^* = \hat{\sigma}\sqrt{2\log N}$ given $\hat{\sigma}$ and $N$.
  • Identify when FFT denoising fails (broadband signal) and explain the correct alternative.

You now have every piece: you understand why noise spreads across the spectrum, why windowing is necessary before the FFT, how to estimate the noise floor, and how to threshold it away. What does a production-ready implementation actually look like — and what step do most tutorials silently skip?

🎯
Why this matters: The pipeline runs inside every modern ECG monitor, hearing aid, speech enhancement chip, and audio workstation. Understanding every step end-to-end — including the window power compensation that most tutorials omit — is what separates a DSP practitioner who writes correct code from one who produces subtly wrong results.
🔗
Think of it this way

The FFT denoising pipeline is like developing a photograph in a darkroom. You apply a filter (window), expose the film (FFT), stop down the lens to block dim light (threshold), then reverse the exposure (IFFT). But if you forgot the lens has a tinting effect (window power), the final image comes out 2.7× too dark. The compensation step is the one most beginners skip.

Pipeline Overview

Raw Signal: x[n] + noise
Apply Window: Hann × x[n]
FFT: X[k] = FFT(x̃)
Estimate Floor: median |X[k]|
Threshold: zero / shrink
IFFT: x̂[n] = IFFT(X̂)
Compensate: ÷ window power
Problem

Why Divide by Window Power?

Multiplying by the Hann window before the FFT reduces signal amplitude. The IFFT brings back a signal smaller than the original by a factor of $\bar{w}^2=\frac{1}{N}\sum w^2[n]\approx0.375$ for the Hann window. Dividing by this factor restores the correct amplitude scale.

Without this step, the denoised output is about 2.7× too quiet — a common mistake in tutorial code.

📐 Donoho Universal Threshold

When no clean reference segment is available, Donoho & Johnstone (1994) showed that this threshold is asymptotically near-minimax for Gaussian noise (its risk stays within a logarithmic factor of the ideal):

$$\lambda^* = \hat{\sigma}\sqrt{2\log N}$$

where $\hat{\sigma}$ is the noise standard deviation estimated from the FFT coefficients (e.g., from the lower 20% of sorted magnitudes) and $\log$ denotes the natural logarithm. This threshold adapts automatically to the noise level, with no manual tuning.

📝 Worked Example — Computing the Donoho Universal Threshold

Background. The universal threshold comes from extreme-value theory: with high probability, all $N$ i.i.d. Gaussian noise coefficients with standard deviation $\hat{\sigma}$ lie below $\hat{\sigma}\sqrt{2\log N}$, so this threshold zeroes nearly all pure-noise bins while keeping most signal bins.

Problem: $N = 2048$, $\hat{\sigma} = 0.25$. Compute $\lambda^*$.

Step 1. Compute $\ln N$.
$\ln(2048) = \ln(2^{11}) = 11\ln 2 = 11\times0.6931 = 7.624$
Step 2. Compute $\sqrt{2\ln N}$.
$\sqrt{2\times7.624} = \sqrt{15.248} = 3.905$
Step 3. Apply the formula.
$\lambda^* = 0.25\times3.905 = 0.976$
Result: λ* ≈ 0.976, so any bin with |X[k]| < 0.976 will be zeroed. A larger noise estimate σ̂ automatically raises the gate.
Quick Check

If $N$ doubles from 2048 to 4096, does $\lambda^*$ increase, decrease, or stay approximately the same?

$\sqrt{2\ln(4096)} = \sqrt{2\times8.317} = \sqrt{16.635} \approx 4.079$. So λ* increases slightly (from 0.976 to 0.25×4.079 ≈ 1.02). The threshold grows very slowly with N (logarithmically) — doubling N only adds ~4.5% to λ*.
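The worked example and the quick check can be reproduced in a few lines. This is a minimal sketch: the function name `universal_threshold` is hypothetical, and $\hat{\sigma}$ is passed in directly rather than estimated from data.

```python
import math

def universal_threshold(sigma_hat, n):
    """Donoho universal threshold: sigma_hat * sqrt(2 * ln(n))."""
    return sigma_hat * math.sqrt(2.0 * math.log(n))

lam_2048 = universal_threshold(0.25, 2048)
lam_4096 = universal_threshold(0.25, 4096)

print(f"N = 2048: lambda* = {lam_2048:.3f}")
print(f"N = 4096: lambda* = {lam_4096:.3f}")
print(f"ratio: {lam_4096 / lam_2048:.3f}")
```

Because $N$ enters only through $\sqrt{\ln N}$, the ratio stays close to 1 even when the block length doubles, while doubling `sigma_hat` doubles the threshold.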

⚠️ When FFT Denoising Fails

FFT denoising assumes the clean signal's energy concentrates in a few bins (tonal signal). For broadband clean signals — speech with energy across hundreds of Hz — every bin contains both signal and noise. A single threshold cannot tell them apart.

For speech denoising: use STFT-frame denoising (Week 9) or spectral subtraction instead, which operates frame-by-frame on a short-time spectrum.

💡
Key Insight — Window Power Correction

The Hann window has a mean square value of $\approx 0.375$. Every amplitude in the IFFT output is 0.375× too small. Dividing by 0.375 restores correct scale. This is always needed when windowing before FFT — the factor changes by window type (Rectangular: 1.0, Hann: 0.375, Hamming: 0.397).
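The mean-square values quoted for each window can be checked numerically. A minimal sketch, assuming NumPy's symmetric `np.hanning` and `np.hamming` definitions and a block length of N = 4096 chosen here only for illustration:

```python
import numpy as np

N = 4096  # illustrative block length (an assumption, not fixed by the text)
windows = {
    "rectangular": np.ones(N),
    "hann": np.hanning(N),
    "hamming": np.hamming(N),
}

# Mean-square value of each window: (1/N) * sum(w[n]^2)
mean_square = {name: float(np.mean(w**2)) for name, w in windows.items()}

for name, ms in mean_square.items():
    print(f"{name:12s} mean square = {ms:.3f}")
```

For long blocks the values settle at 1.0, 0.375 and 0.397, matching the factors listed above.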

Solution
Pause & Predict

If you set the threshold multiplier $\alpha$ very low (say 1.0×), do you expect the SNR to improve or get worse compared to $\alpha = 3.5$? What about $\alpha = 10$?

Hint: too low → noise survives; too high → signal bins are zeroed. There is an optimal α.

Try It: Pipeline SNR Meter

Adjust noise level and threshold multiplier to watch SNR improve through the pipeline in real time.

[Interactive demo: sliders for noise level (shown at 40%) and threshold multiplier α (shown at 3.0×); panels "Input: Noisy Signal" and "Output: Denoised Signal"; readout of SNR in, SNR out, and gain in dB.]
Implementation
Python · NumPy / SciPy — Complete FFT Denoising Function
import numpy as np
from scipy.fft import rfft, irfft
from scipy.signal.windows import hann  # the scipy.signal.hann alias is gone in recent SciPy

def fft_denoise(signal, alpha=3.0, mode='hard'):
    N = len(signal)
    window = hann(N)

    windowed = signal * window               # Step 1: apply Hann window
    X = rfft(windowed)                       # Step 2: FFT
    mag = np.abs(X)
    threshold = alpha * np.median(mag)       # Step 3: noise floor

    if mode == 'hard':                       # Step 4: threshold
        X_clean = np.where(mag >= threshold, X, 0)
    else:
        phase = np.angle(X)
        mag_s = np.maximum(mag - threshold, 0)
        X_clean = mag_s * np.exp(1j * phase)

    x_rec = irfft(X_clean, n=N)              # Step 5: IFFT
    win_power = np.mean(window**2) + 1e-10   # Compensate window power (≈0.375)
    return x_rec / win_power

# Demo
fs = 1000; N = 4096; t = np.arange(N) / fs
clean = np.sin(2*np.pi*100*t) + 0.6*np.sin(2*np.pi*250*t)
noisy = clean + np.random.normal(0, 0.35, N)
denoised = fft_denoise(noisy, alpha=3.5, mode='hard')

# SNR in dB of an estimate h measured against the reference s
snr = lambda s, h: 10*np.log10(np.var(s)/(np.var(s-h)+1e-12))
snr_in, snr_out = snr(clean, noisy), snr(clean, denoised)
print(f"Input SNR: {snr_in:.1f} dB")
print(f"Output SNR: {snr_out:.1f} dB")
print(f"SNR improvement: {snr_out - snr_in:+.1f} dB")
Output
Input SNR: 7.2 dB
Output SNR: 19.4 dB
SNR improvement: +12.2 dB
Key Takeaway

The complete denoising pipeline is: Window → FFT → median noise floor → threshold → IFFT → divide by window power — skipping the last step makes the output 2.7× too quiet and is the #1 tutorial mistake.

🫀
Real-World Application

Noise Removal in ECG Signals

A 12-lead hospital ECG measures cardiac electrical signals with amplitudes of only 1–5 mV, contaminated by 50/60 Hz power-line interference, EMG noise from muscle tremor, and baseline wander from respiration. The ECG's onboard DSP applies spectral notch filters and FFT-based subtraction to remove each noise type before the trace reaches the physician. The quality of this denoising directly affects the accuracy of diagnosing arrhythmia and myocardial infarction — a false negative caused by missed QRS complex detection can be life-threatening.

Checkpoint: Complete FFT Denoising Pipeline

Q1 What are the 5 steps of the FFT denoising pipeline, in order?

1. Apply Hann window. 2. Compute FFT. 3. Estimate noise floor (e.g., median of |X[k]|). 4. Apply hard or soft threshold. 5. Compute IFFT and divide by window power.

Q2 Compute the Donoho threshold for $N = 1024$, $\hat{\sigma} = 0.5$.

$\ln(1024) = 10\ln 2 \approx 6.931$. $\sqrt{2\times6.931}=\sqrt{13.863}\approx3.723$. $\lambda^* = 0.5\times3.723 \approx \mathbf{1.86}$.

Q3 Why does FFT denoising fail for broadband speech, and what technique is used instead?

Speech energy is spread across hundreds of Hz — every bin contains both signal and noise, so no single threshold can separate them. Solution: STFT-frame denoising (Week 9) or spectral subtraction — operate on short overlapping frames and estimate per-frame noise floor from silence segments.

Live FFT Denoising Explorer

Adjust noise level and threshold in real time to observe how spectral gating transforms the noisy signal — and watch the SNR change instantly.

[Interactive demo with two sliders (shown at 40% and 30%) and four panels.]
Clean Signal: 3 tones at 15, 40 & 80 Hz.
Noise Level: White Gaussian noise amplitude.
Threshold: % of max magnitude. Teal = kept · Gray = zeroed. Dashed red line = threshold.
Panels: ① Clean Signal (Time Domain) · ② Noisy Signal (Time Domain) · ③ FFT Spectrum (Frequency Domain) · ④ Denoised Signal (After IFFT)
Readout: SNR before, SNR after, and improvement, all in dB.

Week 6 Summary

The five ideas from Spectral Analysis & Denoising that must survive the exam.

Additive Noise Model

$x[n]=s[n]+w[n]$ and FFT linearity gives $X[k]=S[k]+W[k]$. Bins where $|S[k]|$ dominates $|W[k]|$ survive thresholding, while noise-only bins are zeroed. Denoising is possible only because of this linear separability.

📊

SNR & Frequency Resolution

$\text{SNR}=10\log_{10}(P_s/P_w)$ dB. Frequency resolution $\Delta f = f_s/N = 1/T$ — the longer you record, the finer the detail you can resolve in the frequency domain.

🪟

Window Functions & Leakage

Always window before FFT. The Hann window tapers the signal to zero at both block edges, reducing peak sidelobes from −13 dB (rectangular) to −31 dB. Use Blackman (−58 dB) for high-dynamic-range analysis.

🌊

Welch PSD Estimation

Average $K$ overlapping windowed frames. Variance drops as $1/K$ (smoother spectrum), at the cost of coarser $\Delta f = f_s/\text{nperseg}$. The naive periodogram is inconsistent — variance stays high even with more samples.

🔄

Full Denoising Pipeline

Window → FFT → median noise floor → threshold → IFFT → divide by window power (≈0.375 for Hann). Skipping the last step makes the output 2.7× too quiet. Donoho threshold $\lambda^*=\hat{\sigma}\sqrt{2\log N}$ requires no manual tuning.

Coming up — Week 7

Digital Filters: FIR & IIR

You have been thresholding the FFT to remove noise from a complete signal. Next week we design filters that operate sample-by-sample in real time — with no need to buffer the entire signal first. FIR windowed-sinc filters, IIR Butterworth designs, pole-zero stability, and scipy.signal in one complete session.

FIR Windowed-Sinc IIR Butterworth Pole-Zero Stability scipy.signal


12 Exercises: Spectral Analysis & Denoising

Work through these problems to consolidate your understanding of noise characterisation, spectral analysis, FFT-based denoising, and Welch PSD estimation.

1 Theory · Noise in the Frequency Domain Easy

SNR Calculation

A clean 440 Hz sine wave has RMS amplitude 1.0. White Gaussian noise with RMS 0.15 is added.
(a) Calculate the SNR in dB.
(b) If noise RMS doubles to 0.30, how many dB does SNR drop?
(c) What noise RMS gives SNR = 20 dB?

SNR = 10·log₁₀(P_signal/P_noise). (a) For a sine with RMS = 1.0, P_signal = 1.0² = 1.0 and P_noise = 0.15² = 0.0225, so SNR = 10·log₁₀(1/0.0225) ≈ 16.5 dB. (b) Doubling noise RMS quadruples P_noise → SNR drops 10·log₁₀(4) = 6.02 dB. (c) 20 dB → P_noise = 0.01 → RMS = 0.1.
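The three parts can be checked with a short script. A minimal sketch; the helper names `snr_db` and `noise_rms_for_snr` are hypothetical and work from RMS amplitudes, so power is taken as RMS squared.

```python
import math

def snr_db(signal_rms, noise_rms):
    """SNR in dB from RMS amplitudes (power is RMS squared)."""
    return 10.0 * math.log10(signal_rms**2 / noise_rms**2)

def noise_rms_for_snr(signal_rms, target_db):
    """Invert the SNR formula to get the noise RMS for a target SNR."""
    return signal_rms / 10.0 ** (target_db / 20.0)

snr_a = snr_db(1.0, 0.15)
snr_b = snr_db(1.0, 0.30)
rms_c = noise_rms_for_snr(1.0, 20.0)

print(f"(a) SNR = {snr_a:.2f} dB")
print(f"(b) drop = {snr_a - snr_b:.2f} dB")
print(f"(c) noise RMS = {rms_c:.3f}")
```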
2 Code · Noise in the Frequency Domain Medium

Identify Noise Types from Their Spectrum

Generate three signals ($N=4096$, $f_s=1000$ Hz):
(a) White noise: np.random.normal(0, 1, N)
(b) 60 Hz tonal noise: np.sin(2*np.pi*60*t) + small white noise
(c) Pink noise via FFT shaping: divide white noise spectrum by $\sqrt{k}$ for bin $k>0$, then IFFT.
Plot all three power spectra on a log-log scale. Describe the slope of each.

White: slope ≈ 0 (flat). Tonal: flat noise floor with a single spike at 60 Hz. Pink: slope ≈ −10 dB/decade. Pink code: X=rfft(white); freqs=rfftfreq(N,1/fs); X[1:]/=np.sqrt(freqs[1:]); pink=irfft(X,n=N).
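One way to put numbers on the slopes is a least-squares line fit in log-log coordinates. The sketch below is one possible approach, not a required solution: it uses `np.fft` instead of `scipy.fft`, fixes a random seed for repeatability, zeroes the DC bin of the pink spectrum, and introduces a hypothetical helper `loglog_slope`.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for repeatability
N, fs = 4096, 1000
freqs = np.fft.rfftfreq(N, d=1 / fs)

white = rng.standard_normal(N)

# Shape the white spectrum by 1/sqrt(f) so that power falls as 1/f
X = np.fft.rfft(white)
X[1:] /= np.sqrt(freqs[1:])
X[0] = 0.0                              # drop DC, where 1/f is undefined
pink = np.fft.irfft(X, n=N)

def loglog_slope(x):
    """Least-squares slope of log10(power) vs log10(frequency), DC excluded."""
    P = np.abs(np.fft.rfft(x)) ** 2
    return np.polyfit(np.log10(freqs[1:]), np.log10(P[1:]), 1)[0]

slope_white = loglog_slope(white)
slope_pink = loglog_slope(pink)

# A slope of s in log10(power) per decade corresponds to 10*s dB/decade
print(f"white: {10 * slope_white:+.1f} dB/decade")
print(f"pink:  {10 * slope_pink:+.1f} dB/decade")
```

A single realization is noisy, so expect the fitted slopes to land near 0 and near −10 dB/decade rather than exactly on them.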
3 Theory · Window Functions & Spectral Leakage Easy

Frequency Resolution & Bin Check

Signal recorded at $f_s = 8000$ Hz for 2 seconds.
(a) How many samples $N$?
(b) What is $\Delta f$ (Hz/bin)?
(c) Can you resolve tones at 200 Hz and 200.4 Hz? Justify.
(d) How long must you record for $\Delta f = 0.1$ Hz?

N = 8000×2 = 16,000. Δf = 8000/16000 = 0.5 Hz/bin. (c) Separation 0.4 Hz < Δf = 0.5 Hz → cannot resolve. (d) T = 1/0.1 = 10 s → N = 80,000.
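The same arithmetic as a short script, assuming the rule of thumb used in this exercise that two tones are resolvable only if their spacing is at least one bin width $\Delta f$:

```python
fs = 8000.0      # sampling rate in Hz
duration = 2.0   # recording length in seconds

N = int(fs * duration)          # number of samples
delta_f = fs / N                # bin spacing in Hz, equal to 1 / duration

separation = 200.4 - 200.0      # spacing between the two tones in Hz
resolvable = separation >= delta_f

# Recording length needed for a target resolution of 0.1 Hz
target_delta_f = 0.1
T_needed = 1.0 / target_delta_f
N_needed = int(round(fs * T_needed))

print(N, delta_f, resolvable, T_needed, N_needed)
```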
4 Code · Window Functions & Spectral Leakage Medium

Compare Window Leakage in Python

Generate a 97.7 Hz tone at $f_s=1000$ Hz, $N=512$. Compute the FFT with Rectangular, Hann, Hamming, and Blackman windows.
(a) Plot the magnitude spectrum in dB for each, zoomed to 80–120 Hz.
(b) For each window, report the sidelobe level at 80 Hz (about 9 bins from the peak, since $\Delta f = 1000/512 \approx 1.95$ Hz).
(c) Which window would you choose for detecting a −25 dB signal 10 Hz from a strong tone?

Use np.hanning(N), np.hamming(N), np.blackman(N). Plot 20*log10(|rfft(x*win)|/N). Rectangular sidelobes ≈ −13 dB; Blackman ≈ −58 dB. For −25 dB target: Hamming (−41 dB) or Blackman.
5 Theory · Spectral Analysis Medium

Hann Window Coefficient Calculation

Compute the following Hann window coefficients by hand ($N = 20$, formula: $w[n] = 0.5(1-\cos(2\pi n/(N-1)))$):
(a) $w[0]$
(b) $w[9]$ (approximately centre)
(c) $w[19]$
(d) Verify that the mean square $\bar{w}^2 = \frac{1}{N}\sum w[n]^2 \approx 0.375$. (Use: for Hann, $\bar{w}^2 = 3/8$.)

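w[0] = 0.5(1 − cos(0)) = 0. w[9] = 0.5(1 − cos(2π×9/19)) = 0.5(1 − cos(0.9474π)); cos(170.5°) ≈ −0.986 → w[9] ≈ 0.993. w[19] = 0.5(1 − cos(2π)) = 0 (symmetric). (d) Averaged over one full period, w² has mean exactly 3/8; with this symmetric definition the N = 20 samples give (1/N)Σw[n]² = (3/8)(N−1)/N ≈ 0.356, which approaches 0.375 as N grows.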
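A numerical check of the hand calculation, as a minimal sketch using the formula stated in the exercise (`np.hanning` implements the same symmetric definition). For a short symmetric window the computed mean square sits slightly below the asymptotic 3/8, because the zero-valued end sample is counted in the average.

```python
import numpy as np

N = 20
n = np.arange(N)
w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (N - 1)))   # symmetric Hann, as in the exercise

same_as_numpy = np.allclose(w, np.hanning(N))          # np.hanning uses the same formula
mean_square = float(np.mean(w**2))

print(f"matches np.hanning: {same_as_numpy}")
print(f"w[0]  = {w[0]:.4f}")
print(f"w[9]  = {w[9]:.4f}")
print(f"w[19] = {w[19]:.4f}")
print(f"mean square = {mean_square:.5f}")
```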
6 Code · Spectral Analysis Medium

Welch vs Naive FFT PSD Comparison

For a noisy 100 Hz tone (SNR = 10 dB, $f_s = 1000$ Hz, $N = 8192$):
(a) Compute PSD using naive $|\text{FFT}|^2/N$.
(b) Compute using scipy.signal.welch with nperseg=1024, 50% overlap.
(c) Plot both. Which is smoother and why?
(d) Why does Welch have lower frequency resolution than the naive method?

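Naive: Δf = 1000/8192 ≈ 0.122 Hz, very jagged. Welch: (8192 − 1024)/512 + 1 = 15 frames, so the variance falls by roughly an order of magnitude (somewhat less than 15×, because 50%-overlapped frames are correlated), giving a smooth estimate with Δf = 1000/1024 ≈ 0.977 Hz (coarser). Welch trades resolution for variance reduction.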
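To see the variance reduction without any plotting, the sketch below hand-rolls the averaging step in NumPy rather than calling `scipy.signal.welch`. It is a simplified illustration under stated assumptions: the input is pure white noise (true PSD flat) instead of the noisy tone from the exercise, so that the relative spread across bins is a fair measure of jaggedness, and the seed is fixed for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, N, nperseg = 1000, 8192, 1024
hop = nperseg // 2                        # 50% overlap
x = rng.standard_normal(N)                # white noise: the true PSD is flat

# Naive periodogram from one long FFT
P_naive = np.abs(np.fft.rfft(x)) ** 2 / N

# Welch-style average over overlapping Hann-windowed frames
win = np.hanning(nperseg)
starts = range(0, N - nperseg + 1, hop)
frames = np.array([x[s:s + nperseg] * win for s in starts])
P_frames = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / np.sum(win**2)
P_welch = P_frames.mean(axis=0)

K = len(frames)
# Relative spread (std / mean) across interior bins measures jaggedness
spread_naive = P_naive[1:-1].std() / P_naive[1:-1].mean()
spread_welch = P_welch[1:-1].std() / P_welch[1:-1].mean()

print(f"frames K = {K}")
print(f"naive: df = {fs / N:.3f} Hz, spread = {spread_naive:.2f}")
print(f"welch: df = {fs / nperseg:.3f} Hz, spread = {spread_welch:.2f}")
```

The naive estimate has a relative spread near 1 regardless of record length, while the averaged estimate is several times smoother at the price of wider bins.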
7 Theory · Spectral Gating Medium

Hard vs Soft Threshold — By Hand

FFT magnitudes: $|X| = [1.5,\ 12.0,\ 0.8,\ 19.5,\ 2.1,\ 14.0,\ 0.6,\ 3.0]$.
Sorted values: $[0.6,\ 0.8,\ 1.5,\ 2.1,\ 3.0,\ 12.0,\ 14.0,\ 19.5]$.
Median = $(2.1+3.0)/2 = 2.55$, $\alpha = 3.0$, $\lambda = 7.65$.
(a) Apply hard threshold: list the surviving bins.
(b) Apply soft threshold to bin $k=1$ ($|X[1]|=12.0$).
(c) For soft threshold, compute $\hat{m}$ for all bins.

(a) Surviving (magnitude ≥ 7.65): k=1 (12.0), k=3 (19.5), k=5 (14.0) → 3 bins. (b) Soft: max(12.0−7.65, 0) = 4.35. (c) All soft magnitudes: 0, 4.35, 0, 11.85, 0, 6.35, 0, 0.
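The hand calculation can be confirmed in NumPy. A minimal sketch operating on the magnitudes only (phases are untouched by either operator, so they are omitted here):

```python
import numpy as np

mag = np.array([1.5, 12.0, 0.8, 19.5, 2.1, 14.0, 0.6, 3.0])
alpha = 3.0

lam = alpha * np.median(mag)                 # threshold from the median noise floor
surviving = np.flatnonzero(mag >= lam)       # bin indices kept by the hard threshold
hard_mag = np.where(mag >= lam, mag, 0.0)    # hard: keep or zero
soft_mag = np.maximum(mag - lam, 0.0)        # soft: shrink by lam, floor at 0

print("lambda   :", round(float(lam), 2))
print("survivors:", surviving.tolist())
print("hard     :", hard_mag.tolist())
print("soft     :", np.round(soft_mag, 2).tolist())
```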
8 Code · Spectral Gating Medium

Implement Hard Thresholding & Sweep Alpha

Implement hard_threshold_fft(signal, alpha) where alpha multiplies the median noise floor.
(a) Generate: 200 Hz + 500 Hz tones buried in white noise at SNR ≈ 5 dB.
(b) Try alpha = 2, 3, 5. Plot magnitude spectrum before and after for each.
(c) Which alpha best preserves the two signal tones while removing noise?

threshold = alpha × np.median(np.abs(rfft(signal))). Too small alpha (2) leaves noise; too large (5) may zero signal tones. Correct (3–4) keeps only the two large signal peaks. Use np.where(mag >= threshold, X, 0).
9 Theory · Complete FFT Denoising Pipeline Medium

Donoho Universal Threshold Calculation

Compute the Donoho universal threshold $\lambda^* = \hat{\sigma}\sqrt{2\log N}$ for:
(a) $N = 1024$, $\hat{\sigma} = 0.4$
(b) $N = 4096$, $\hat{\sigma} = 0.25$
(c) If $\hat{\sigma}$ doubles, by what factor does $\lambda^*$ change?
(d) If $N$ doubles from 1024 to 2048, by what factor does $\lambda^*$ change? (Use $\ln 2 \approx 0.693$.)

(a) ln(1024)=6.931, √(2×6.931)=√13.863≈3.723. λ*=0.4×3.723≈1.489. (b) ln(4096)=8.317, √16.635≈4.079. λ*=0.25×4.079≈1.020. (c) Factor 2 (λ* scales linearly with σ̂). (d) √(2ln2048)/√(2ln1024)=√(2×7.624)/√(2×6.931)≈3.905/3.723≈1.049, a ~4.9% increase (equivalently √(11/10)).
10 Code · Complete FFT Denoising Pipeline Hard

Full Pipeline End-to-End

Build the complete denoising pipeline:
(a) Generate a 3-tone clean signal (100 + 300 + 500 Hz, 1 s, $f_s=2000$ Hz).
(b) Add Gaussian noise to achieve SNR ≈ 10 dB.
(c) Apply Hann window → FFT → noise floor (median, $\alpha=3.5$) → hard threshold → IFFT → divide by window power.
(d) Measure SNR before and after. Plot original, noisy, and denoised.
(e) What happens if you forget to divide by window power?

P_signal ≈ 3×0.5=1.5. For SNR=10 dB: P_noise=P_signal/10=0.15 → noise std≈0.39. After IFFT: irfft(X_clean,n=N)/np.mean(hann(N)**2). Forgetting window power: output is 1/0.375≈2.67× too quiet.
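Parts (a) and (b) are where the noise level is easiest to get wrong, so here is a minimal sketch of just those two steps. It assumes unit-amplitude tones (consistent with P_signal ≈ 1.5 in the hint) and a fixed random seed; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, duration = 2000, 1.0
N = int(fs * duration)
t = np.arange(N) / fs

# Three unit-amplitude tones: each contributes power 0.5
clean = sum(np.sin(2 * np.pi * f * t) for f in (100, 300, 500))

target_snr_db = 10.0
p_signal = float(np.mean(clean**2))
noise_std = float(np.sqrt(p_signal / 10 ** (target_snr_db / 10)))
noisy = clean + rng.normal(0.0, noise_std, N)

# Measured SNR uses the actual noise realization, so it only approximates the target
measured_snr_db = 10 * np.log10(p_signal / np.mean((noisy - clean) ** 2))

print(f"P_signal = {p_signal:.3f}, noise std = {noise_std:.3f}")
print(f"measured SNR = {measured_snr_db:.2f} dB")
```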
11 Synthesis · Theory: Pipeline Design Challenge Hard

60 Hz Hum Removal — Notch via Spectral Zeroing

Remove power-line interference from a synthetic ECG:
(a) Generate ECG: harmonics at 1, 3, 5 Hz (amplitudes 1.0, 0.6, 0.3), 10 s at $f_s=500$ Hz.
(b) Add 60 Hz sine at amplitude 0.5.
(c) Use FFT to identify the spike; compute $\Delta f$ and determine the bin range to zero (±2 Hz of 60 Hz).
(d) Reconstruct with IFFT. Compute SNR improvement.
(e) Why is ±2 Hz sufficient here but might fail for shorter recordings?

N = 10×500 = 5000, Δf = 0.1 Hz/bin. Zero bins where (freqs≥58) & (freqs≤62). For shorter recordings, Δf is coarser — ±2 Hz may overlap with legitimate spectral content. Always check Δf before choosing notch width.
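A minimal sketch of the spectral-zeroing notch, using `np.fft` and the signal parameters from the exercise. Note the idealized setup: with a 10 s record every component (1, 3, 5 and 60 Hz) falls exactly on a bin, so the hum has essentially no leakage and the reconstruction is nearly perfect. Real recordings will not be this clean.

```python
import numpy as np

fs, duration = 500, 10.0
N = int(fs * duration)
t = np.arange(N) / fs

# Synthetic "ECG": three low-frequency harmonics
ecg = (1.0 * np.sin(2 * np.pi * 1 * t)
       + 0.6 * np.sin(2 * np.pi * 3 * t)
       + 0.3 * np.sin(2 * np.pi * 5 * t))
hum = 0.5 * np.sin(2 * np.pi * 60 * t)
noisy = ecg + hum

X = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(N, d=1 / fs)
delta_f = fs / N                                  # Hz per bin

notch = (freqs >= 58.0) & (freqs <= 62.0)         # bins within ±2 Hz of 60 Hz
X[notch] = 0.0
cleaned = np.fft.irfft(X, n=N)

def snr_db(ref, est):
    """SNR of est against ref; a tiny floor avoids taking the log of zero."""
    return 10 * np.log10(np.mean(ref**2) / (np.mean((ref - est) ** 2) + 1e-20))

snr_before = snr_db(ecg, noisy)
snr_after = snr_db(ecg, cleaned)
print(f"delta_f = {delta_f} Hz, bins zeroed = {int(notch.sum())}")
print(f"SNR before: {snr_before:.1f} dB")
print(f"SNR after:  {snr_after:.1f} dB")
```

Shortening `duration` makes `delta_f` coarser, so the same ±2 Hz band covers fewer bins while off-bin leakage from the hum spreads wider, which is the failure mode part (e) asks about.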
12 Synthesis · Code: Audio Denoising Pipeline Hard

Hard vs Soft Threshold — RMSE Sweep

Implement both hard and soft thresholding for a 3-tone signal at SNR = 5 dB:
(a) Sweep $\lambda$ from 0 to $3\times$ median noise floor in 50 steps.
(b) For each $\lambda$, compute RMSE(denoised, clean).
(c) Plot RMSE vs $\lambda$ for both on the same axes.
(d) Identify optimal $\lambda^*$ for each. Which achieves lower minimum RMSE?
(e) At what $\lambda$ does hard thresholding start producing "musical noise" artifacts? (Look for the RMSE plateau or uptick.)

RMSE = np.sqrt(np.mean((clean-denoised)**2)). At λ=0 both equal the noisy signal. As λ increases RMSE first falls (noise removed) then rises (signal bins zeroed). Soft usually achieves lower minimum RMSE because its continuous shrinkage avoids the isolated spike artifact.