The DFT costs O(N²) — too slow for real-time audio, radar, and 5G. Cooley-Tukey's divide-and-conquer trick slashes this to O(N log N), making the modern digital world possible.
Topic 1 — Motivation
The DFT is mathematically correct — but its brute-force O(N²) cost makes it unusable for real-time processing. This section counts the operations and shows why that number grows catastrophically.
Imagine you need to build a real-time audio equalizer. Your app must analyse a 4096-sample audio frame sixty times per second — before the next frame arrives. You implement the DFT formula directly. The CPU runs for a few seconds… and then crashes. The math is right. So what went wrong?
The naive DFT is like matching every student in a class with every student individually to check friendship — that's N×N handshakes. Double the class size and the handshakes quadruple. The FFT finds a shortcut: split the class in half, check each half separately, then combine — cutting work from N² to N log N.
For a length-N discrete signal $x[n]$, the DFT computes N frequency bins:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}, \quad k = 0,1,\ldots,N{-}1$$Each $X[k]$ requires N complex multiplications. Computing all N outputs costs $N \times N = N^2$ complex multiplications total — the O(N²) bottleneck.
For N = 16, how many complex multiplications does the naive DFT require?
Doubling N quadruples the work. This is not a constant slowdown — it is an exponentially growing tax:
| N | DFT muls (N²) | FFT muls (~N log₂N / 2) | Speedup |
|---|---|---|---|
| 8 | 64 | 12 | 5× |
| 64 | 4,096 | 192 | 21× |
| 1,024 | 1,048,576 | 5,120 | 205× |
| 65,536 | 4.3 billion | 1,048,576 | 4,096× |
At N = 65,536 (a typical audio FFT frame), the naive DFT requires 4.3 billion complex multiplications — more than a modern CPU core can execute in several seconds. The FFT does it in milliseconds.
If the codec doubles its frame size to N = 8192, how many times larger does the naive DFT cost become?
DFT operation pipeline: each of N output bins requires N multiplications → N × N = O(N²) total.
"Bigger N just means more data to process — it can't be that much slower."
Linear thinking is wrong here. If N doubles from 1024 to 2048, work does not double — it quadruples (4× more operations). At N = 65,536, the DFT is 4,096 times slower than FFT. This is why the FFT is not an optimisation — it is a categorical change in what is possible.
Before exploring the widget: if N doubles from 64 to 128, does the DFT operation count increase by 2× or 4×? What about the FFT (N log₂N)?
Hint: Write out 64² and 128² side-by-side, then compare 64×6 vs 128×7.
Modern FMCW radar processes 256-point chirp signals at 10 ms intervals. Naive DFT at N = 256 costs 65,536 muls per sweep. With 100+ sweeps per frame at 100 frames/sec: over 655 million muls/sec — far beyond an embedded DSP at 200 MHz. The FFT reduces this to 256 × 8 = 2,048 muls per sweep, well within embedded range. The FFT is why your car's collision-avoidance radar works in real time.
Q1 A naive DFT of length N = 512 requires how many complex multiplications?
Q2 If you double N from 512 to 1024, by what factor does the naive DFT cost increase?
Q3 Which line in the DFT formula causes the O(N²) cost — the summation index n, the output index k, or both?
Topic 2 — Mechanics
By splitting one N-point DFT into two N/2-point DFTs and recombining with twiddle factors, the algorithm recursively reuses computations — converting O(N²) into O(N log N).
In 1965, James Cooley and John Tukey published a paper that Carl Friedrich Gauss had essentially written 160 years earlier (without publishing it). The insight is deceptively simple: the even-indexed and odd-indexed samples of any signal transform completely independently — and their sub-transforms can be recombined in O(N) time. One observation; a million-fold speedup.
The FFT is like merge-sort for frequency analysis. Just as merge-sort splits an array in half, sorts each half, then merges — the FFT splits the signal into even/odd halves, transforms each half, then "butterflies" them together. Both algorithms run in O(N log N) for the same reason.
Rewrite the DFT sum by separating even-indexed ($n = 2r$) and odd-indexed ($n = 2r+1$) samples:
$$X[k] = \underbrace{\sum_{r=0}^{N/2-1} x[2r]\,e^{-j\frac{2\pi}{N/2}kr}}_{X_e[k]} + \; e^{-j\frac{2\pi}{N}k}\underbrace{\sum_{r=0}^{N/2-1} x[2r+1]\,e^{-j\frac{2\pi}{N/2}kr}}_{X_o[k]}$$This is the butterfly equation. Each stage costs O(N) to combine two N/2-point DFTs — and there are log₂N stages, giving O(N log N) total.
What is the twiddle factor $W_8^2$? Express as a+jb (exact values only).
The key trick that enables the upper half of X[k] to be computed for free:
$$W_N^{k + N/2} = e^{-j\frac{2\pi}{N}(k+N/2)} = e^{-j\frac{2\pi k}{N}} \cdot e^{-j\pi} = -W_N^k$$This means the complete butterfly only needs one twiddle multiply per pair, not two. Combined with the even/odd split, each stage costs exactly N/2 multiplications — and log₂N stages gives (N/2)log₂N total muls.
For N = 8, k = 3: compute $W_8^3$ and $W_8^7$ (show that they are negatives of each other).
Radix-2 DIT butterfly for N=4. Stage 1 computes two 2-point DFTs (even/odd split). Stage 2 combines with twiddle factors. Solid purple = add; dashed red = subtract.
Why "butterfly"? Draw the stage-2 connections on paper: two nodes on the left fan out to two nodes on the right, and the crossing wires form a shape that looks like butterfly wings. Each butterfly operation takes two complex inputs and produces two complex outputs using one twiddle multiply — the most efficient possible merge.
For N = 8, how many butterfly stages are needed? How many twiddle multiplications per stage? What is the total FFT multiplication count?
Hint: stages = log₂(8), twiddle muls per stage = N/2, total = (N/2) × stages.
OFDM (Orthogonal Frequency Division Multiplexing) — the modulation scheme in every 4G/5G base station — is literally a bank of inverse FFTs. A 5G base station may compute 4096-point FFTs at millions of frames per second across hundreds of subcarriers simultaneously. The radix-2 butterfly structure enables this at silicon-efficient O(N log N) cost. Without Cooley-Tukey, high-speed wireless communications as we know them would be impossible.
Q1 For N = 8, how many butterfly stages does the radix-2 FFT have, and how many complex multiplications per stage?
Q2 What is the exact value of $W_4^1$ (twiddle factor for N=4, k=1)?
Q3 Why does $W_N^{k+N/2} = -W_N^k$? Express the proof in one sentence.
Topic 3 — Application
scipy.fftTheory meets practice: compute, interpret, and display a frequency spectrum from a real audio or synthesized signal using scipy.fft. Master zero-padding, the fftfreq axis, and the magnitude spectrum.
scipy.fft.fft and scipy.fft.fftfreq to compute and label the frequency axis of a signal's spectrum.You now know that the FFT is fast. But how do you actually use it? What does the output array mean? Why does the second half of the spectrum look like a mirror of the first? And why does zero-padding seem to "add detail" to a spectrum that wasn't there before?
scipy.fft.fft is the Swiss-army knife of signal analysis — used in audio equalizers, spectrum analysers, vibration monitoring, and neural signal processing. Reading its output correctly is a foundational skill for any DSP practitioner.
The FFT output is like a piano keyboard laid flat. The first bin (k=0) is DC (0 Hz). Each subsequent bin is one step higher in frequency. The second half mirrors the first because a real note played on a physical keyboard sounds the same played forwards or backwards — real signals have symmetric (conjugate) spectra.
After calling fft(x), bin index k does not directly equal frequency. You need fftfreq to convert:
The frequency resolution is $\Delta f = f_s/N$ Hz per bin. To detect a 1 Hz difference, you need at least 1 second of signal ($N = f_s$).
If you double N to 1024 (same $f_s = 8000$ Hz), what is the new frequency resolution? What bin does 440 Hz land in?
Appending N zeros to a length-N signal before calling fft() doubles the number of output bins without changing the spectral content:
Zero-padding interpolates the spectrum — it adds points between existing bins, making peaks look sharper. It does not add new spectral information (that would require a longer signal).
Use scipy.fft.fft(x, n=2*N) to zero-pad to length 2N in one call. Typical practice: pad to the next power of 2 for maximum FFT efficiency.
A signal has N = 256 samples at $f_s = 44100$ Hz. What is the frequency resolution? What happens to this resolution if you zero-pad to N = 512?
Zero-padding doubles the bin density (interpolation) without adding new frequency information. The spectral shape is preserved; peaks appear smoother.
"Zero-padding gives better frequency resolution."
It gives better display resolution (interpolation), not better spectral resolution (ability to distinguish two nearby tones). To actually resolve two frequencies separated by $\Delta f$ Hz, you need a signal of length at least $1/\Delta f$ seconds — no amount of zero-padding can substitute for more data.
A signal has $f_s = 1000$ Hz and N = 100 samples, containing tones at 50 Hz and 200 Hz. Before running the widget: which bins do you expect to see peaks at? How many useful bins total?
Hint: bin k ≈ f / (fs/N). Useful bins = N/2 = 50.
fft(x) → complex spectrum X[k] → fftfreq(N, 1/fs) for the Hz axis → np.abs(X[:N//2]) for the one-sided magnitude spectrum — where peaks in the magnitude correspond directly to sinusoidal frequencies in the original signal.Shazam applies the FFT to overlapping 3-second audio windows from a song, extracts peaks in the magnitude spectrum across multiple frequency bands, and stores them as a "constellation map" fingerprint. When you hum or record a clip, the same FFT + peak-detection pipeline runs in real time. The fingerprint is then matched against a database of 100+ million songs in milliseconds. The entire pipeline — recording, FFT, peak detection, hash lookup — runs on a mobile CPU in under two seconds, made possible by the O(N log N) FFT.
Q1 For $f_s = 16000$ Hz and $N = 1024$, what is the frequency resolution (Hz per bin) and at which bin index does a 2000 Hz tone appear?
Q2 Why do we keep only the first N/2 output bins of the FFT for a real-valued signal?
Q3 You zero-pad from N=256 to N=1024. What changes and what does not change?
Interactive Lab
An integrated dashboard combining all three topics: compare DFT vs FFT complexity, trace a butterfly computation, and analyse a multi-tone signal spectrum — all live.
Week 5 Summary
Five ideas from this week that every signal processing practitioner must carry.
The naive DFT requires N² complex multiplications. At N = 4096, that is 16.7 million muls per frame — real-time audio processing is impossible. Doubling N quadruples cost.
Split the signal into even- and odd-indexed halves. Compute two N/2-point DFTs independently. Recombine with twiddle factors in O(N). Apply recursively log₂N times → O(N log N).
Each butterfly takes inputs (a, b) and a twiddle W, and produces (a + Wb, a − Wb) — two outputs for the price of one complex multiplication. Twiddle symmetry W^(k+N/2) = −W^k means the upper half is free.
The canonical pipeline: fft(x) → complex spectrum → fftfreq(N, 1/fs) → Hz axis → abs(X[:N//2]) → one-sided magnitude. Frequency resolution = f_s / N Hz per bin.
Zero-padding increases bin density (smoother display) but does not improve the ability to resolve closely-spaced tones. Only a longer signal (more time data) gives true spectral resolution.
Now that you have the FFT, learn to use it for practical signal analysis: window functions to reduce spectral leakage, Welch's method for averaged power spectra, and FFT-based denoising pipelines for audio and ECG signals.
Further Reading
Curated resources for building a richer understanding of the FFT — from intuition to implementation.
A crystal-clear visual walkthrough of the Cooley-Tukey FFT: recursion tree, butterfly diagrams, and the exact operation count derivation. The best single introduction to the FFT available online.
Watch on YouTube → DocsComplete documentation for fft, fftfreq, rfft (real-input optimisation), fftshift, and the planning-based next_fast_len function. Essential reference when writing production FFT code.
An interactive visual primer covering sine waves, phasors, and the DFT from first principles. Particularly helpful for building geometric intuition about why the DFT uses rotating complex exponentials.
Explore interactively → DocsThe numpy.fft module: np.fft.fft, np.fft.rfft, normalisation conventions, and comparison with scipy.fft. Understanding both APIs avoids silent normalisation bugs when switching between them.
Practice
8 exercises (3 topics × 2 + 2 synthesis). Expand the hint if you are stuck — only after attempting the problem first.
A radar system processes chirp signals of length N = 128. (a) How many complex multiplications does the naive DFT require per transform? (b) How many does the radix-2 FFT require? (c) By what factor is the FFT faster?
Implement the naive_dft function using NumPy matrix multiplication (the N×N twiddle matrix approach). Use time.perf_counter to measure the average execution time over 10 repeats for N ∈ {32, 64, 128, 256}. Plot DFT time vs N² on a log-log axis and verify the slope is approximately 2.
W = np.exp(-2j*np.pi*k*n/N) where k = n.reshape((N,1)). Then X = W @ x. For the log-log plot use plt.loglog.Given x = [1, 2, 3, 4], compute X[0], X[1], X[2], X[3] using the radix-2 DIT butterfly exactly as derived in the slides: (1) split even/odd, (2) compute two 2-point DFTs, (3) apply the butterfly recombination with the appropriate $W_4^k$ twiddle factors. Verify by computing the DFT directly from the sum formula.
Implement an iterative (non-recursive) radix-2 Cooley-Tukey FFT using bit-reversal permutation and in-place butterfly updates. Your function signature: fft_iterative(x). Test it on random complex arrays of length 8, 16, 64, and 256, verifying np.allclose(fft_iterative(x), scipy.fft.fft(x)) for each.
A physiological sensor records a signal at $f_s = 250$ Hz with N = 500 samples. (a) What is the frequency resolution (Hz per bin)? (b) What bin index corresponds to 50 Hz (power-line interference)? (c) After zero-padding to 2000 samples, what is the new bin width? (d) Does zero-padding help resolve two tones at 49.8 Hz and 50.2 Hz? Explain why or why not.
Synthesise a signal at $f_s = 8000$ Hz containing three tones: 330 Hz (E4), 440 Hz (A4), and 550 Hz (C#5), each for 0.5 seconds. Use scipy.fft.fft to compute the one-sided magnitude spectrum. Find the three dominant frequency bins using scipy.signal.find_peaks with a minimum distance of 50 bins and print the detected frequencies. Plot the magnitude spectrum from 0–1000 Hz.
Let $C(N)$ be the number of complex multiplications for an N-point radix-2 FFT. (a) Write the recurrence relation: splitting into two N/2-point FFTs plus N/2 butterfly muls. (b) Solve the recurrence to show $C(N) = (N/2)\log_2 N$. (c) For N = 1024, compute both $N^2$ and $(N/2)\log_2 N$ and express the speedup ratio as an integer. (d) Explain why the FFT speedup grows with N and what this means architecturally for choosing hardware.
Build a complete spectrum analyser function spectrum_analyser(fs, duration=1.0, n_fft=2048) that: (1) synthesises a "mystery" signal combining a 261 Hz tone (C4), a 329 Hz tone (E4), and a 392 Hz tone (G4) — forming a C major chord; (2) applies a Hann window (scipy.signal.windows.hann) to reduce spectral leakage; (3) computes the FFT and one-sided magnitude spectrum; (4) detects peaks and prints the identified chord tones; (5) plots the magnitude spectrum from 100–600 Hz with peak markers. Compare results with and without the window function — which approach gives sharper peaks?