Real-world signals are always contaminated by noise. The frequency domain reveals structure that the time domain hides, and FFT-based thresholding lets us surgically remove the noise.
Foundations
Random noise looks chaotic in the time domain, but its frequency-domain signature is remarkably structured — and that structure is the key to removing it without touching the signal.
You record a voice memo on your phone and play it back — there's your voice, and underneath it a faint, constant hiss. You can hear that the noise and the voice are separate things, but staring at the waveform tells you nothing about how to separate them. So where do you even begin?
The frequency domain is like separating an orchestra into individual instruments. In the time domain, all players sound like one loud blur. In the frequency domain, each instrument occupies its own row in the score — and a bad musician (noise) who plays in the wrong rows becomes immediately visible and easy to mute.
Four noise types and their frequency-domain signatures. Identifying the type is the first step of any denoising pipeline.
In almost every real-world scenario noise adds to the signal — it does not multiply or distort it. This is the additive noise model, and it is what makes frequency-domain denoising possible: because the FFT is a linear operation, it preserves the additive structure.
Time domain: measured signal = clean signal + noise
$$x[n] = s[n] + w[n]$$
Because the FFT is linear ($\mathcal{F}\{a+b\} = \mathcal{F}\{a\} + \mathcal{F}\{b\}$), the same structure holds in the frequency domain:
$$X[k] = S[k] + W[k]$$
At bins where $|W[k]|$ is small, $X[k] \approx S[k]$ — we can recover the signal by selectively keeping the large-magnitude bins and zeroing the rest.
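The linearity argument above can be checked numerically. The sketch below is illustrative only: the sample rate, tone frequency, and noise level are assumed values, not taken from the course material.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
fs, N = 1000, 1000               # assumed sample rate (Hz) and frame length
t = np.arange(N) / fs

s = np.sin(2 * np.pi * 100 * t)          # clean 100 Hz tone, s[n]
w = 0.3 * rng.standard_normal(N)         # additive white noise, w[n]
x = s + w                                # measured signal, x[n]

S, W, X = np.fft.rfft(s), np.fft.rfft(w), np.fft.rfft(x)

# Linearity: the spectrum of the sum is the sum of the spectra.
linear_ok = np.allclose(X, S + W)

# The tone occupies one bin, so in that bin the signal dwarfs the noise.
k_tone = int(np.argmax(np.abs(S)))
ratio_at_tone = np.abs(S[k_tone]) / np.abs(W[k_tone])
print(linear_ok, k_tone, ratio_at_tone)
```

With a 1 Hz bin width the tone sits in bin 100, which is why keeping only the large-magnitude bins recovers it.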
SNR quantifies how much stronger the signal is than the noise, expressed on a logarithmic (decibel) scale:
$$\text{SNR} = 10\log_{10}\!\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right) \quad [\text{dB}]$$
where $P = \frac{1}{N}\sum_{n=0}^{N-1}x[n]^2$ is mean power. At 0 dB the noise equals the signal; at 20 dB the signal is 100× more powerful.
Background. The SNR formula maps a power ratio onto a log scale. The factor of 10 (not 20) is used because power is proportional to amplitude squared — each doubling of amplitude is a 6 dB increase, while a doubling of power is only 3 dB.
Problem: A 100 Hz sine wave has amplitude $A = 1.0$ V. White Gaussian noise with RMS $= 0.1$ V is added. What is the SNR in dB?
If the noise RMS doubles from 0.1 to 0.2 V (signal unchanged), by how many dB does the SNR drop?
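The worked problem above can be checked with a few lines of Python. This is a minimal sketch; the helper name `snr_db` is hypothetical, and it relies only on the fact that a sine of amplitude $A$ has mean power $A^2/2$.

```python
import numpy as np

def snr_db(p_signal, p_noise):
    """SNR in dB from mean powers (10*log10 because these are powers)."""
    return 10 * np.log10(p_signal / p_noise)

# A sine of amplitude A has RMS A/sqrt(2), so mean power A**2 / 2.
A = 1.0
p_signal = A**2 / 2

snr_a = snr_db(p_signal, 0.1**2)   # noise RMS 0.1 V -> power 0.01
snr_b = snr_db(p_signal, 0.2**2)   # noise RMS 0.2 V -> power 0.04
drop = snr_a - snr_b

print(f"{snr_a:.2f} dB, {snr_b:.2f} dB, drop {drop:.2f} dB")
```

The SNR comes out near 17 dB, and doubling the noise RMS costs about 6 dB, as expected for a doubling of amplitude.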
Each noise type has a distinct spectral "fingerprint." Knowing the shape tells you which removal strategy to apply.
| Type | PSD shape | Best removal |
|---|---|---|
| White | Flat ($\propto 1$) | Threshold all bins uniformly |
| Tonal | Single spike | Notch filter or zero specific bin |
| Pink (1/f) | Slope −10 dB/decade | Frequency-shaped threshold |
| Brownian (1/f²) | Slope −20 dB/decade | High-pass + adaptive filter |
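The slopes in the table can be estimated from data. The sketch below generates white noise, shapes it into pink noise in the frequency domain, and fits a straight line to each periodogram on log-log axes. The frame length, sample rate, and the helper `slope_db_per_decade` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, fs = 2**16, 1000.0                       # assumed frame length and rate
freqs = np.fft.rfftfreq(N, 1 / fs)

white = rng.standard_normal(N)
X = np.fft.rfft(white)

# Pink noise: scale amplitude by 1/sqrt(f) so power falls as 1/f.
Xp = X.copy()
Xp[1:] /= np.sqrt(freqs[1:])
Xp[0] = 0.0                                 # remove DC (f = 0 cannot be scaled)
pink = np.fft.irfft(Xp, n=N)

def slope_db_per_decade(x):
    """Least-squares slope of the periodogram on log-log axes."""
    P = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    f, P = freqs[1:-1], P[1:-1]             # skip DC and Nyquist bins
    slope, _ = np.polyfit(np.log10(f), 10 * np.log10(P), 1)
    return slope

white_slope = slope_db_per_decade(white)
pink_slope = slope_db_per_decade(pink)
print(f"white: {white_slope:+.1f} dB/dec, pink: {pink_slope:+.1f} dB/dec")
```

The white-noise fit should sit near 0 dB/decade and the pink-noise fit near −10 dB/decade; dividing by $f$ instead of $\sqrt{f}$ would give the Brownian −20 dB/decade slope.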
Confusing RMS amplitude and power. $P = \text{RMS}^2$, not $\text{RMS}$. A common error is writing $\text{SNR} = 10\log_{10}(\text{RMS}_s/\text{RMS}_w)$ — the correct formula for amplitude ratios is $20\log_{10}(\text{RMS}_s/\text{RMS}_w)$, which equals the power version because the square inside the ratio becomes a factor of 2.
Before exploring the widget: if you switch from White to Tonal noise at the same noise level, do you expect the spectrum to look simpler or more complex? Where will the noise energy appear?
Hint: think about how many bins carry the noise energy in each case.
Because the FFT is linear, noise and signal add independently in the frequency domain: bins dominated by signal energy are kept, and bins where noise power exceeds signal power are removed by thresholding.
Modern hearing aids (Phonak, ReSound) must amplify speech (300–3400 Hz) without amplifying background noise. A miniature DSP chip performs an FFT every 8 ms: frequency bands with low SNR are attenuated; bands with high SNR (clear speech energy) are amplified. The chip classifies ambient noise type — white, babble, traffic — and selects the optimal denoising strategy per-band. This SNR-aware, per-band processing is spectral subtraction-based noise reduction, the foundational technique of this week.
Q1 A signal has $P_{\text{signal}} = 2.0$ W and $P_{\text{noise}} = 0.02$ W. What is the SNR in dB?
Q2 Which noise type would you use a notch filter to remove, and why?
Q3 If $x[n] = s[n] + w[n]$, write the frequency-domain equivalent and state what it means for denoising.
Frequency Resolution
The DFT silently assumes your $N$-sample frame repeats forever. When a tone's frequency doesn't align with an exact bin, this assumption creates a sharp edge — and sharp edges spread energy across every frequency.
Take a 1-second clip of music that plays a pure 440 Hz note. Run an FFT. You expect one spike at 440 Hz — but the spectrum shows energy smeared across 430 Hz, 435 Hz, 445 Hz, 450 Hz, and beyond. No other frequencies are present in the music, so where did all that extra energy come from?
A window function is like a dimmer switch for a recording booth. The rectangular window is a light switch (abruptly on, then abruptly off), which creates a loud "click" at both ends of the recording. The Hann window slowly fades the sound in and out, so there are no clicks. Just as click-free audio has far weaker high-frequency artifacts, a smooth window produces far weaker spectral leakage sidelobes.
The DFT implicitly multiplies the input signal by a rectangular window — creating a hard cut at both ends. In the frequency domain, multiplication by a rectangular window is convolution with its spectrum (a sinc function), which spreads each tone's energy into nearby bins.
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}$$
A pure tone lands in a single bin only when it completes an integer number of cycles in $N$ samples. If $f_0/\Delta f$ is not an integer, energy spreads across all bins: this is spectral leakage.
Background. The DFT basis functions $e^{j2\pi kn/N}$ are mutually orthogonal for integer $k$. A signal at a non-integer bin index $k+\delta$ has non-zero inner product with every basis vector, distributing energy across all $N$ bins.
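A small experiment makes the orthogonality argument concrete. The sketch below counts how many bins carry energy for a tone on an exact bin and for one half a bin off. The frame length, bin indices, and the helper `bins_with_energy` are assumed for illustration.

```python
import numpy as np

N = 64                                   # assumed frame length
n = np.arange(N)

def bins_with_energy(k_true, tol=1e-9):
    """Count rfft bins whose magnitude exceeds tol * N for a tone at bin k_true."""
    x = np.cos(2 * np.pi * k_true * n / N)
    mag = np.abs(np.fft.rfft(x))
    return int(np.sum(mag > tol * N))

on_bin = bins_with_energy(8.0)    # integer number of cycles in the frame
off_bin = bins_with_energy(8.5)   # half a bin off: worst-case leakage
print(on_bin, off_bin)
```

The on-bin tone occupies a single bin, while the off-bin tone puts energy into every one of the $N/2+1$ bins returned by `rfft`.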
Problem: $N=8$, $f_s=8$ Hz, pure tone $x[n]=\cos(2\pi\times 2.5\times n/8)$. The ideal bin is $k_{\text{true}}=2.5$ — not an integer. How much leakage reaches bin $k=1$?
Why does multiplying by the Hann window reduce leakage? (Think about what happens at the block edges.)
| Window | Formula $w[n]$ | Peak Sidelobe |
|---|---|---|
| Rectangular | $1$ | −13 dB |
| Hann | $0.5\!\left(1-\cos\dfrac{2\pi n}{N{-}1}\right)$ | −31 dB |
| Hamming | $0.54-0.46\cos\dfrac{2\pi n}{N{-}1}$ | −41 dB |
| Blackman | $0.42-0.5\cos\dfrac{2\pi n}{N{-}1}+0.08\cos\dfrac{4\pi n}{N{-}1}$ | −58 dB |
Background. The Hann window equals zero at both endpoints ($n=0$ and $n=N-1$), making the windowed signal's periodic extension continuous at frame boundaries and eliminating the step discontinuity that causes leakage.
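The Hann formula from the table is easy to evaluate directly. This is a minimal sketch with a hypothetical helper `hann` and an assumed length of 11, chosen so that the centre sample falls exactly on $n=5$.

```python
import numpy as np

def hann(N):
    """Symmetric Hann window, w[n] = 0.5 * (1 - cos(2*pi*n / (N - 1)))."""
    n = np.arange(N)
    return 0.5 * (1 - np.cos(2 * np.pi * n / (N - 1)))

w = hann(11)
print(f"w[0]={w[0]:.4f}  w[5]={w[5]:.4f}  w[10]={w[10]:.4f}")

# np.hanning uses the same symmetric definition, so the two should agree.
matches_numpy = np.allclose(w, np.hanning(11))

# Mean-square value: approaches 3/8 = 0.375 as N grows.
mean_square = np.mean(hann(4096) ** 2)
print(matches_numpy, round(mean_square, 4))
```

The endpoints are zero and the centre is one, which is the taper that removes the step at the frame boundary.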
Problem: Compute $w[4]$ for a Hann window of length $N=16$. Verify boundary and centre values.
What is the trade-off of choosing Blackman over Rectangular for spectral analysis?
Resolution vs leakage is always a trade-off. Suppressing leakage widens the spectral main lobe, blurring nearby components. Hann is the general-purpose compromise: −31 dB sidelobes and a 4-bin main lobe. Only switch to Blackman (6 bins) when you need to detect a weak tone near a strong one.
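To see the leakage side of the trade-off in numbers, the sketch below measures how much energy an off-bin tone leaks 20 bins away from its peak, with and without a Hann window. The frame length, tone position, and offset are assumed values.

```python
import numpy as np

N = 512
n = np.arange(N)
k0 = 100.5                                   # assumed off-bin tone (worst case)
x = np.cos(2 * np.pi * k0 * n / N)

def leakage_db(window, offset=20):
    """Level `offset` bins above the peak, in dB relative to the peak."""
    mag = np.abs(np.fft.rfft(x * window))
    peak = int(np.argmax(mag))
    return 20 * np.log10(mag[peak + offset] / mag[peak])

rect_db = leakage_db(np.ones(N))
hann_db = leakage_db(np.hanning(N))
print(f"rectangular: {rect_db:.1f} dB, Hann: {hann_db:.1f} dB")
```

Far from the peak the Hann sidelobes fall off much faster than the rectangular ones, so the gap at 20 bins is considerably larger than the difference in peak sidelobe levels alone.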
If you move the tone further from the nearest DFT bin (increase the frequency offset), do you expect the leakage to get better or worse? Move the slider to find out.
Hint: think about what the DFT's inner product formula produces when $\delta \to 0.5$.
Always window before the FFT in spectral analysis. The Hann window is the default choice: it reduces peak sidelobes from −13 dB (rectangular) to −31 dB, at the cost of a main lobe twice as wide (4 bins instead of 2).
Modern weather radars (e.g., NEXRAD WSR-88D) apply a Blackman window to each radar pulse before computing the Doppler FFT. Without windowing, strong ground-clutter returns from buildings and terrain leak across the entire spectrum, masking weak precipitation signals. The Blackman window's −58 dB sidelobe level keeps clutter leakage below the thermal noise floor, enabling detection of rain and wind shear at ranges beyond 400 km — directly saving lives through severe weather early warning.
Q1 Compute the Hann window coefficient $w[2]$ for $N=8$.
Q2 A pure 60 Hz tone is recorded at $f_s=240$ Hz with $N=12$ samples. Is 60 Hz on an exact DFT bin? (Check: $k_{\text{true}}=60/(f_s/N)$)
Q3 You need to detect a faint −30 dB signal sitting 2 bins away from a strong tone. Which window should you use, and why?
Analysis Tools
Before we can remove noise, we must learn to read the spectrum precisely — magnitude, power, frequency resolution, and how windowed averaging sharpens the estimate.
Two machines in a factory vibrate at 99.8 Hz and 100.2 Hz. You bolt an accelerometer to the casing and record two seconds of data. Can the FFT tell them apart? The answer depends entirely on how long you recorded — and that question has a precise, calculable answer.
Frequency resolution is like pixel density on a screen. More pixels (longer recording = more samples = smaller $\Delta f$) let you see finer detail. A 100-pixel-wide photo cannot distinguish two objects 50 cm apart in a scene that spans 100 metres — just as a short recording cannot distinguish two close frequencies. You need more "pixels" (duration) to zoom in.
| Quantity | Formula | Units |
|---|---|---|
| Magnitude spectrum | $|X[k]|$ | Amplitude |
| Power spectrum | $|X[k]|^2/N$ | Power |
| PSD (Welch) | $\lim_{N\to\infty}\frac{1}{N}E[|X[k]|^2]$ | Power/Hz |
| Log scale | $20\log_{10}|X[k]|$ | dB |
$$\Delta f = \frac{f_s}{N} = \frac{1}{T_{\text{duration}}}$$
To resolve two tones $f_1$ and $f_2$ you need $|f_1-f_2|\geq\Delta f$, which means recording for at least $T=1/|f_1-f_2|$ seconds. Longer recording = finer frequency resolution.
Background. The FFT divides $[0, f_s/2]$ into $N/2$ equal bins of width $\Delta f = f_s/N$. Two tones must be separated by at least one bin. Because $N = f_s \times T$, the bin width equals $1/T$ — recording twice as long halves the bin width.
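The resolution rule can be applied to the two factory machines from the opening scenario. In this sketch the sample rate of 1000 Hz is an assumption (the scenario does not give one), and the helper names are hypothetical.

```python
def freq_resolution(fs, n_samples):
    """Bin width in Hz: delta_f = fs / N = 1 / T."""
    return fs / n_samples

def min_duration(f1, f2):
    """Shortest recording (s) that puts f1 and f2 at least one bin apart."""
    return 1.0 / abs(f1 - f2)

fs, T = 1000, 2.0                      # assumed sample rate; 2 s recording
N = int(fs * T)                        # 2000 samples
df = freq_resolution(fs, N)            # 0.5 Hz per bin

f1, f2 = 99.8, 100.2
can_resolve = abs(f1 - f2) >= df       # 0.4 Hz spacing vs 0.5 Hz bins
T_needed = min_duration(f1, f2)        # about 2.5 s

print(df, can_resolve, round(T_needed, 3))
```

Two seconds is not quite enough: the tones are 0.4 Hz apart but the bins are 0.5 Hz wide, so the recording must last about 2.5 s. The sample rate does not change this conclusion, because $\Delta f$ depends only on the duration.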
Problem: $f_s = 8000$ Hz, $T = 0.5$ s. (a) Find $\Delta f$. (b) Can 100 Hz and 102 Hz be resolved? (c) Duration for $\Delta f = 0.25$ Hz?
At $f_s = 44100$ Hz with $N = 2048$ samples, what is $\Delta f$?
Periodogram is inconsistent: The naive $|\text{FFT}|^2/N$ estimator has variance that does not decrease as $N$ grows — doubling the signal length just gives more jagged detail, not a smoother spectrum. Fix: use Welch's averaged method.
Welch (1967) improves the periodogram by dividing the signal into overlapping blocks, windowing each, computing each periodogram, and averaging. Variance drops as $1/K$ where $K$ is the number of blocks.
Resolution vs Variance: For 4096 samples with nperseg=512 at 50% overlap, $K = 15$ frames, so variance drops roughly 15× but $\Delta f = f_s/512$ (8× coarser than the full-length FFT). Choose nperseg based on the narrowest feature you need to resolve.
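Welch's averaging can be sketched with NumPy alone. The function `welch_psd` below is a hypothetical, simplified stand-in for scipy.signal.welch (two-sided density scaling, no detrending), and the white-noise test signal and sample rate are assumed. It also reports how many segments were actually averaged.

```python
import numpy as np

def welch_psd(x, fs, nperseg=512, overlap=0.5):
    """Average Hann-windowed periodograms of overlapping segments."""
    step = int(nperseg * (1 - overlap))
    win = np.hanning(nperseg)
    scale = fs * np.sum(win ** 2)             # density scaling (power/Hz)
    starts = range(0, len(x) - nperseg + 1, step)
    segs = [np.abs(np.fft.rfft(x[i:i + nperseg] * win)) ** 2 / scale
            for i in starts]
    return np.mean(segs, axis=0), len(segs)

rng = np.random.default_rng(2)
fs, N = 1000.0, 4096                          # assumed rate and length
x = rng.standard_normal(N)                    # white noise: true PSD is flat

psd_welch, K = welch_psd(x, fs)
psd_naive = np.abs(np.fft.rfft(x)) ** 2 / (fs * N)   # single periodogram

# Relative scatter around the mean: lower means a smoother estimate.
scatter_naive = np.std(psd_naive[1:-1]) / np.mean(psd_naive[1:-1])
scatter_welch = np.std(psd_welch[1:-1]) / np.mean(psd_welch[1:-1])
print(K, round(scatter_naive, 2), round(scatter_welch, 2))
```

The naive periodogram scatters by roughly 100% of its mean regardless of length, while the averaged estimate is several times smoother, at the price of only 257 frequency bins instead of 2049.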
If you increase nperseg from 256 to 1024, do you expect the Welch PSD to become smoother or rougher? And what will happen to the frequency resolution?
Hint: larger nperseg means fewer segments K — what does that do to variance? And $\Delta f = f_s/\text{nperseg}$?
Welch's method trades frequency resolution for variance reduction — choosing nperseg is the key design decision: smaller segments give smoother estimates but blur closely-spaced frequency components.
As industrial motor bearings wear, they generate vibration at a Bearing Defect Frequency calculable from bearing geometry. Maintenance engineers run Welch PSD estimation every hour on accelerometer data. When the PSD shows a rising peak at the predicted defect frequency above a threshold, an alert triggers bearing replacement — preventing the $50,000–$200,000 cost of an unplanned shutdown.
Q1 $f_s = 8000$ Hz, $T = 2$ s. Compute (a) $N$, (b) $\Delta f$, (c) minimum duration to achieve $\Delta f = 0.1$ Hz.
Q2 Why does doubling N in the naive periodogram NOT halve the variance?
Q3 With $N=8192$ and $\text{nperseg}=512$ at 50% overlap, approximately how many frames K does Welch produce, and by how much does variance drop?
Denoising Technique
With the noise floor visible in the spectrum, we can remove it by zeroing or shrinking coefficients that fall below a threshold — the spectral equivalent of a noise gate.
Look at the noisy spectrum: a flat carpet of low-level components at every frequency, with three or four tall peaks rising above it. The carpet is the noise; the peaks are your signal. What if you could lower a gate and pass only what rises above it?
Hard thresholding is like a bouncer at a club: anyone shorter than a height limit is refused entry, everyone else walks through unchanged. Soft thresholding is like a salary cut: everyone's pay is reduced by the same fixed amount, and those whose pay falls to zero or below simply leave. Soft shrinkage is continuous; hard gating is binary.
Hard threshold: bins below λ are zeroed; signal peaks above λ pass through unchanged.
The simplest approach: set every frequency coefficient below threshold $\lambda$ to zero — an all-or-nothing binary decision.
$$\hat{X}[k] = \begin{cases} X[k] & \text{if } |X[k]| \geq \lambda \\ 0 & \text{if } |X[k]| < \lambda \end{cases}$$
Coefficients above threshold pass through unchanged; those below are completely zeroed.
Background. Hard threshold is applied bin-by-bin to FFT magnitudes. Bins above $\lambda$ pass through; bins below are zeroed. The IFFT then reconstructs a time-domain signal using only the surviving bins.
Problem: Magnitudes $|X| = [2.1,\ 14.8,\ 0.9,\ 23.4,\ 1.2,\ 17.6,\ 0.7,\ 2.5]$. Noise floor = median, $\alpha = 3.0$. Which bins survive?
If $\alpha$ is raised from 3.0 to 6.0, how many bins survive? ($\lambda = 6.0 \times 2.3 = 13.8$)
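The worked problem above can be reproduced in code. This is a minimal sketch: the function name `hard_threshold` and its return values are illustrative choices, and the noise floor is taken as the median magnitude as in the problem statement.

```python
import numpy as np

def hard_threshold(X, alpha=3.0):
    """Zero every bin whose magnitude is below alpha * median(|X|)."""
    mag = np.abs(X)
    lam = alpha * np.median(mag)
    keep = mag >= lam
    return np.where(keep, X, 0), lam, keep

mags = np.array([2.1, 14.8, 0.9, 23.4, 1.2, 17.6, 0.7, 2.5])

X_hat, lam, keep = hard_threshold(mags, alpha=3.0)
survivors = np.flatnonzero(keep)
print(lam, survivors)

# Doubling alpha raises the threshold but the same three peaks survive.
_, lam6, keep6 = hard_threshold(mags, alpha=6.0)
print(lam6, np.flatnonzero(keep6))
```

The same function accepts complex FFT coefficients, since only the magnitude is compared against the threshold and surviving bins are passed through untouched.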
Instead of a binary cut, soft thresholding shrinks every coefficient's magnitude toward zero — large coefficients survive but are reduced by exactly $\lambda$. This avoids the discontinuity that creates "musical noise".
$$\hat{X}[k] = e^{j\angle X[k]}\cdot\max\!\left(|X[k]|-\lambda,\;0\right)$$
Also written $\mathcal{S}_\lambda(X[k])$. The magnitude is shrunk by $\lambda$, preserving the complex phase. Continuous everywhere — no abrupt transitions.
Background. Soft thresholding applies the shrinkage operator $\mathcal{S}_\lambda$ to each bin's magnitude while keeping the original phase. The result is that large bins shrink by exactly $\lambda$, while small bins become zero — but the transition is continuous.
Problem: $X[3] = 23.4\,e^{j0.8}$ (magnitude 23.4, phase 0.8 rad), $\lambda = 6.9$. Apply soft threshold.
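For complex FFT coefficients, the shrinkage acts on the magnitude and reuses the original phase. The sketch below applies it to the bin from the problem above; the function name `soft_threshold` is a hypothetical choice.

```python
import numpy as np

def soft_threshold(X, lam):
    """Shrink each magnitude by lam (floored at 0) and keep the phase."""
    mag = np.abs(X)
    shrunk = np.maximum(mag - lam, 0.0)
    return shrunk * np.exp(1j * np.angle(X))

X3 = 23.4 * np.exp(1j * 0.8)
Y3 = soft_threshold(X3, lam=6.9)
print(abs(Y3), np.angle(Y3))          # about 16.5 and 0.8 rad

# Bins below the threshold collapse to zero.
small = soft_threshold(np.array([2.1 + 0j, -0.9 + 0j]), lam=6.9)
print(small)
```

The output magnitude is $23.4 - 6.9 = 16.5$ with the phase unchanged, which is the slight amplitude bias that soft thresholding trades for continuity.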
Hard thresholding leaves isolated high-magnitude noise peaks scattered across the spectrum. On audio, these random surviving noise bins sound like faint, erratic metallic tones — hence "musical noise". Soft thresholding avoids this by shrinking continuously rather than making a binary pass/fail decision.
| Property | Hard Threshold | Soft Threshold (Shrinkage) |
|---|---|---|
| Operation | Zero bins below $\lambda$ | Shrink all magnitudes by $\lambda$ |
| Strong signal bins | Preserved exactly | Reduced by $\lambda$ (slight bias) |
| Continuity at $\lambda$ | Discontinuous | Continuous everywhere |
| Artifact | "Musical noise" ringing | Slight amplitude reduction |
| Best for | Tonal noise, sharp peaks | White / broadband noise |
At very high threshold (say $\lambda = 90\%$ of max magnitude), do you expect the denoised output to sound better or worse than no denoising? Try it.
Hint: a threshold that is too high zeros out signal bins too — what does IFFT of zeros look like?
Hard thresholding is faster and preserves peak amplitudes exactly, but creates "musical noise" artifacts; soft thresholding is continuous, which suppresses those artifacts, at the cost of slightly reducing every surviving bin's magnitude.
iZotope RX, Audition, and Logic Pro all implement FFT spectral gating as their core noise reduction engine. The plugin analyzes each frame's spectrum and suppresses components below the estimated noise floor. This cleanly removes 50/60 Hz power-line hum, HVAC rumble, and microphone clicks without affecting primary speech or musical content. Professional podcast editors, film sound designers, and music producers rely on this technique daily — it is one of the most widely deployed DSP algorithms in consumer software.
Q1 Given $X[5] = 8.0$ and $\lambda = 6.9$, what is the output after (a) hard threshold and (b) soft threshold?
Q2 Why does the hard threshold operator create "musical noise" on audio signals?
Q3 With 8 bins of magnitudes $[2, 15, 1, 24, 1, 18, 0.5, 3]$, median = 2.5 and $\alpha=3$, compute $\lambda$ and list surviving bins for hard threshold.
End-to-End Implementation
Five steps. Window the signal, FFT, estimate noise floor, threshold, IFFT. That is all it takes to transform 7 dB SNR into 19 dB — running in milliseconds on embedded hardware.
You now have every piece: you understand why noise spreads across the spectrum, why windowing is necessary before the FFT, how to estimate the noise floor, and how to threshold it away. What does a production-ready implementation actually look like — and what step do most tutorials silently skip?
The FFT denoising pipeline is like developing a photograph in a darkroom. You apply a filter (window), expose the film (FFT), stop down the lens to block dim light (threshold), then reverse the exposure (IFFT). But if you forgot the lens has a tinting effect (window power), the final image comes out 2.7× too dark. The compensation step is the one most beginners skip.
Multiplying by the Hann window before the FFT reduces signal power. The IFFT brings back a signal whose mean power is smaller than the original by a factor of $\bar{w}^2=\frac{1}{N}\sum w^2[n]\approx0.375$ for the Hann window. Dividing the output power by this factor (equivalently, dividing the samples by $\sqrt{\bar{w}^2}\approx0.612$) restores the correct scale.
Without this step, the denoised output carries about 2.7× too little power (roughly 4.3 dB too quiet), a common mistake in tutorial code.
When no clean reference segment is available, Donoho & Johnstone (1994) showed that the following threshold is near-minimax for Gaussian noise:
$$\lambda^* = \hat{\sigma}\sqrt{2\log N}$$
where $\log$ is the natural logarithm and $\hat{\sigma}$ is estimated from the FFT coefficients (e.g., lower 20% of sorted magnitudes). This threshold adapts automatically to the noise level, with no manual tuning.
Background. The universal threshold comes from extreme-value theory: with high probability, all $N$ standard normal noise coefficients lie below $\sqrt{2\log N}\,\hat{\sigma}$, so this threshold zeroes nearly all pure noise while keeping most signal bins.
Problem: $N = 2048$, $\hat{\sigma} = 0.25$. Compute $\lambda^*$.
If $N$ doubles from 2048 to 4096, does $\lambda^*$ increase, decrease, or stay approximately the same?
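Both questions above can be answered numerically. This is a minimal sketch with a hypothetical helper `universal_threshold`; it assumes the logarithm in the formula is the natural one.

```python
import numpy as np

def universal_threshold(sigma_hat, N):
    """lambda* = sigma_hat * sqrt(2 * ln N), using the natural logarithm."""
    return sigma_hat * np.sqrt(2 * np.log(N))

lam_2048 = universal_threshold(0.25, 2048)
lam_4096 = universal_threshold(0.25, 4096)
ratio = lam_4096 / lam_2048

print(f"{lam_2048:.3f}  {lam_4096:.3f}  ratio {ratio:.3f}")
```

Doubling $N$ raises the threshold by only about 4%, because $N$ enters through $\sqrt{\log N}$; the noise estimate $\hat{\sigma}$ has a far stronger, linear effect.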
FFT denoising assumes the clean signal's energy concentrates in a few bins (tonal signal). For broadband clean signals — speech with energy across hundreds of Hz — every bin contains both signal and noise. A single threshold cannot tell them apart.
For speech denoising: use STFT-frame denoising (Week 9) or spectral subtraction instead, which operates frame-by-frame on a short-time spectrum.
The Hann window has a mean square value of $\approx 0.375$, so the mean power of the IFFT output is 0.375× too small. Dividing the power by 0.375 (or the samples by $\sqrt{0.375}\approx 0.612$) restores correct scale. This is always needed when windowing before FFT, and the factor changes by window type (Rectangular: 1.0, Hann: 0.375, Hamming: 0.397).
If you set the threshold multiplier $\alpha$ very low (say 1.0×), do you expect the SNR to improve or get worse compared to $\alpha = 3.5$? What about $\alpha = 10$?
Hint: too low → noise survives; too high → signal bins are zeroed. There is an optimal α.
The complete denoising pipeline is: Window → FFT → median noise floor → threshold → IFFT → divide by window power — skipping the last step makes the output 2.7× too quiet and is the #1 tutorial mistake.
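The pipeline can be sketched end to end in a few lines. Everything specific here is an assumption for illustration: the function name `fft_denoise`, the three-tone test signal, the noise level, and in particular the choice to compensate by dividing the samples by the square root of the window's mean square, which is equivalent to dividing the power by about 0.375.

```python
import numpy as np

def fft_denoise(x, alpha=3.5):
    """Window -> FFT -> median floor -> hard threshold -> IFFT -> rescale."""
    N = len(x)
    win = np.hanning(N)
    X = np.fft.rfft(x * win)
    mag = np.abs(X)
    lam = alpha * np.median(mag)              # threshold from the noise floor
    X_hat = np.where(mag >= lam, X, 0)
    y = np.fft.irfft(X_hat, n=N)
    # Assumption: rescale by the window's RMS so mean power is restored.
    return y / np.sqrt(np.mean(win ** 2)), y

rng = np.random.default_rng(3)
fs, N = 2000, 2000                            # assumed 1 s at 2 kHz
t = np.arange(N) / fs
clean = sum(np.sin(2 * np.pi * f * t) for f in (100, 300, 500))
noisy = clean + 0.4 * rng.standard_normal(N)

denoised, uncompensated = fft_denoise(noisy)
power_gain = np.mean(denoised ** 2) / np.mean(uncompensated ** 2)
top3 = sorted(np.argsort(np.abs(np.fft.rfft(denoised)))[-3:].tolist())
print(f"power restored by a factor of {power_gain:.2f}, peaks at bins {top3}")
```

The compensation restores the factor of roughly 2.7 in power, and the three strongest surviving bins are the three tones. Note that a single-frame output still carries the Hann taper's envelope in time; only the overall power is corrected here.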
A 12-lead hospital ECG measures cardiac electrical signals with amplitudes of only 1–5 mV, contaminated by 50/60 Hz power-line interference, EMG noise from muscle tremor, and baseline wander from respiration. The ECG's onboard DSP applies spectral notch filters and FFT-based subtraction to remove each noise type before the trace reaches the physician. The quality of this denoising directly affects the accuracy of diagnosing arrhythmia and myocardial infarction — a false negative caused by missed QRS complex detection can be life-threatening.
Q1 What are the 5 steps of the FFT denoising pipeline, in order?
Q2 Compute the Donoho threshold for $N = 1024$, $\hat{\sigma} = 0.5$.
Q3 Why does FFT denoising fail for broadband speech, and what technique is used instead?
Interactive Demo
Adjust noise level and threshold in real time to observe how spectral gating transforms the noisy signal — and watch the SNR change instantly.
Recap
The five ideas from Spectral Analysis & Denoising that must survive the exam.
$x[n]=s[n]+w[n]$ and FFT linearity gives $X[k]=S[k]+W[k]$. Bins where $|W[k]|$ is small can be recovered by thresholding — denoising is possible only because of this linear separability.
$\text{SNR}=10\log_{10}(P_s/P_w)$ dB. Frequency resolution $\Delta f = f_s/N = 1/T$ — the longer you record, the finer the detail you can resolve in the frequency domain.
Always window before FFT. The Hann window tapers the signal to zero at both block edges, reducing peak sidelobes from −13 dB (rectangular) to −31 dB. Blackman (−58 dB) for high-dynamic-range analysis.
Average $K$ overlapping windowed frames. Variance drops as $1/K$ (smoother spectrum), at the cost of coarser $\Delta f = f_s/\text{nperseg}$. The naive periodogram is inconsistent — variance stays high even with more samples.
Window → FFT → median noise floor → threshold → IFFT → divide by window power (≈0.375 for Hann). Skipping the last step makes the output 2.7× too quiet. Donoho threshold $\lambda^*=\hat{\sigma}\sqrt{2\log N}$ requires no manual tuning.
Coming up — Week 7
You have been thresholding the FFT to remove noise from a complete signal. Next week we design filters that operate sample-by-sample in real time — with no need to buffer the entire signal first. FIR windowed-sinc filters, IIR Butterworth designs, pole-zero stability, and scipy.signal in one complete session.
Resources
Deepen your understanding with these curated references — from Python-first hands-on derivations to rigorous academic treatments.
Focused lecture on FFT-based spectral denoising: noise floor estimation, hard thresholding in Python, and SNR measurement. The primary video reference for this week.
▶ Watch on YouTube →
Documentation: Official SciPy documentation for Welch's PSD estimator. Covers all parameters: nperseg, noverlap, window, scaling, and detrend, with worked examples.
🔗 docs.scipy.org →
Textbook · Free: Python-first DSP textbook. Chapter 4 covers noise colour taxonomy and PSD estimation; Chapter 8 builds intuition for spectral manipulation with NumPy.
🐙 github.com/allendowney/thinkdsp →
Engineer-friendly treatment of spectral leakage (Ch. 5) and noise in the frequency domain (Ch. 11). Essential for understanding why windowing and Welch averaging are necessary in practice.
📚 Reference Book · Richard G. Lyons
Test Your Understanding
Work through these problems to consolidate your understanding of noise characterisation, spectral analysis, FFT-based denoising, and Welch PSD estimation.
A clean 440 Hz sine wave has RMS amplitude 1.0. White Gaussian noise with RMS 0.15 is added.
(a) Calculate the SNR in dB.
(b) If noise RMS doubles to 0.30, how many dB does SNR drop?
(c) What noise RMS gives SNR = 20 dB?
Generate three signals ($N=4096$, $f_s=1000$ Hz):
(a) White noise: np.random.normal(0, 1, N)
(b) 60 Hz tonal noise: np.sin(2π·60·t) + small white noise
(c) Pink noise via FFT shaping: divide white noise spectrum by $\sqrt{k}$ for bin $k>0$, then IFFT.
Plot all three power spectra on a log-log scale. Describe the slope of each.
Hint: X=rfft(white); freqs=rfftfreq(N,1/fs); X[1:]/=np.sqrt(freqs[1:]); pink=irfft(X,n=N).
Signal recorded at $f_s = 8000$ Hz for 2 seconds.
(a) How many samples $N$?
(b) What is $\Delta f$ (Hz/bin)?
(c) Can you resolve tones at 200 Hz and 200.4 Hz? Justify.
(d) How long must you record for $\Delta f = 0.1$ Hz?
Generate a 97.7 Hz tone at $f_s=1000$ Hz, $N=512$. Compute the FFT with Rectangular, Hann, Hamming, and Blackman windows.
(a) Plot the magnitude spectrum in dB for each, zoomed to 80–120 Hz.
(b) For each window, report the sidelobe level at 80 Hz (about 9 bins from the peak, since $\Delta f = 1000/512 \approx 1.95$ Hz).
(c) Which window would you choose for detecting a −25 dB signal 10 Hz from a strong tone?
Hint: use np.hanning(N), np.hamming(N), np.blackman(N). Plot 20*log10(|rfft(x*win)|/N). Rectangular sidelobes ≈ −13 dB; Blackman ≈ −58 dB. For −25 dB target: Hamming (−41 dB) or Blackman.
Compute the following Hann window coefficients by hand ($N = 20$, formula: $w[n] = 0.5(1-\cos(2\pi n/(N-1)))$):
(a) $w[0]$
(b) $w[9]$ (approximately centre)
(c) $w[19]$
(d) Verify that the mean square $\bar{w}^2 = \frac{1}{N}\sum w[n]^2 \approx 0.375$. (Use: for Hann, $\bar{w}^2 = 3/8$.)
For a noisy 100 Hz tone (SNR = 10 dB, $f_s = 1000$ Hz, $N = 8192$):
(a) Compute PSD using naive $|\text{FFT}|^2/N$.
(b) Compute using scipy.signal.welch with nperseg=1024, 50% overlap.
(c) Plot both. Which is smoother and why?
(d) Why does Welch have lower frequency resolution than the naive method?
FFT magnitudes: $|X| = [1.5,\ 12.0,\ 0.8,\ 19.5,\ 2.1,\ 14.0,\ 0.6,\ 3.0]$.
Sorted values: $[0.6,\ 0.8,\ 1.5,\ 2.1,\ 3.0,\ 12.0,\ 14.0,\ 19.5]$.
Median = $(2.1+3.0)/2 = 2.55$, $\alpha = 3.0$, $\lambda = 7.65$.
(a) Apply hard threshold: list the surviving bins.
(b) Apply soft threshold to bin $k=1$ ($|X[1]|=12.0$).
(c) For soft threshold, compute the shrunk magnitude $\hat{m}[k] = \max(|X[k]|-\lambda,\,0)$ for all bins.
Implement hard_threshold_fft(signal, alpha) where alpha multiplies the median noise floor.
(a) Generate: 200 Hz + 500 Hz tones buried in white noise at SNR ≈ 5 dB.
(b) Try alpha = 2, 3, 5. Plot magnitude spectrum before and after for each.
(c) Which alpha best preserves the two signal tones while removing noise?
Compute the Donoho universal threshold $\lambda^* = \hat{\sigma}\sqrt{2\log N}$ for:
(a) $N = 1024$, $\hat{\sigma} = 0.4$
(b) $N = 4096$, $\hat{\sigma} = 0.25$
(c) If $\hat{\sigma}$ doubles, by what factor does $\lambda^*$ change?
(d) If $N$ doubles from 1024 to 2048, by what factor does $\lambda^*$ change? (Use $\ln 2 \approx 0.693$.)
Build the complete denoising pipeline:
(a) Generate a 3-tone clean signal (100 + 300 + 500 Hz, 1 s, $f_s=2000$ Hz).
(b) Add Gaussian noise to achieve SNR ≈ 10 dB.
(c) Apply Hann window → FFT → noise floor (median, $\alpha=3.5$) → hard threshold → IFFT → divide by window power.
(d) Measure SNR before and after. Plot original, noisy, and denoised.
(e) What happens if you forget to divide by window power?
Remove power-line interference from a synthetic ECG:
(a) Generate ECG: harmonics at 1, 3, 5 Hz (amplitudes 1.0, 0.6, 0.3), 10 s at $f_s=500$ Hz.
(b) Add 60 Hz sine at amplitude 0.5.
(c) Use FFT to identify the spike; compute $\Delta f$ and determine the bin range to zero (±2 Hz of 60 Hz).
(d) Reconstruct with IFFT. Compute SNR improvement.
(e) Why is ±2 Hz sufficient here but might fail for shorter recordings?
Implement both hard and soft thresholding for a 3-tone signal at SNR = 5 dB:
(a) Sweep $\lambda$ from 0 to $3\times$ median noise floor in 50 steps.
(b) For each $\lambda$, compute RMSE(denoised, clean).
(c) Plot RMSE vs $\lambda$ for both on the same axes.
(d) Identify optimal $\lambda^*$ for each. Which achieves lower minimum RMSE?
(e) At what $\lambda$ does hard thresholding start producing "musical noise" artifacts? (Look for the RMSE plateau or uptick.)