ICA and NMF: Statistical Approaches to Music Source Separation
Explore Independent Component Analysis and Non-Negative Matrix Factorization - the classical statistical models that pioneered blind source separation in music.

The Cocktail Party Problem
Imagine being at a noisy cocktail party where multiple conversations happen simultaneously. Your brain can focus on a single speaker while filtering out others - a remarkable feat of biological signal processing. This is the challenge that Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF) aim to solve computationally.
The term "blind" means we have no prior information about the sources or how they were mixed. We only observe the mixture and must infer the original components using statistical properties.
Independent Component Analysis (ICA)
Mathematical Foundation
ICA assumes a linear mixing model where the observed signals are linear combinations of independent sources:

x = As

where x is the vector of observed mixtures, s is the vector of source signals, and A is the unknown mixing matrix.

The goal is to find an unmixing matrix W such that:

ŝ = Wx ≈ s

Note that the sources can only be recovered up to permutation and scaling. ICA rests on three key assumptions:

- Statistical independence: the source signals must be mutually independent
- Non-Gaussianity: at most one source can be Gaussian distributed
- Linear mixing: sources combine linearly, without delays or echoes
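
As a minimal sketch of this mixing model, with made-up sine and square-wave signals standing in for real instruments:

```python
import numpy as np

# Two non-Gaussian sources: a sine wave and a square wave (hypothetical stand-ins)
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 440 * t)           # "harmonic" source
s2 = np.sign(np.sin(2 * np.pi * 97 * t))   # "percussive-ish" square wave
S = np.vstack([s1, s2])                    # sources: (2, n_samples)

# Unknown mixing matrix A (2 mics, 2 sources); in practice we never observe it
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
X = A @ S                                  # observed mixtures x = As: (2, n_samples)

# ICA's job: find W with W @ X ≈ S, up to permutation and scaling
```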
The FastICA Algorithm
```python
import numpy as np

class FastICA:
    def __init__(self, n_components=None, max_iter=200, tol=1e-8):
        self.n_components = n_components
        self.max_iter = max_iter
        self.tol = tol
        self.W = None

    def _center(self, X):
        """Center the data by removing the per-feature mean."""
        self.mean_ = np.mean(X, axis=1, keepdims=True)
        return X - self.mean_

    def _whiten(self, X):
        """Whiten data so components are uncorrelated with unit variance."""
        cov = np.cov(X)
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        # Guard against near-zero eigenvalues before inverting
        D = np.diag(1.0 / np.sqrt(eigenvalues + 1e-10))
        self.whitening_ = D @ eigenvectors.T
        return self.whitening_ @ X

    def _g_tanh(self, wx):
        """tanh non-linearity and the mean of its derivative."""
        g = np.tanh(wx)
        g_prime_mean = np.mean(1 - g**2)
        return g, g_prime_mean

    def fit_transform(self, X):
        """Apply FastICA; X has shape (n_samples, n_features)."""
        n_samples, n_features = X.shape
        X = X.T  # Work with (features x samples)

        # Preprocessing: centering and whitening
        X = self._center(X)
        X = self._whiten(X)

        n_comp = self.n_components or X.shape[0]
        W = np.random.randn(n_comp, X.shape[0])

        # Deflationary FastICA: extract one component at a time
        for c in range(n_comp):
            w = W[c, :].copy()
            w /= np.linalg.norm(w)

            for _ in range(self.max_iter):
                # Fixed-point update: w <- E[x g(w.x)] - E[g'(w.x)] w
                wx = w @ X
                g, g_prime_mean = self._g_tanh(wx)
                w_new = (X @ g) / n_samples - g_prime_mean * w

                # Decorrelate from previously extracted components (deflation)
                w_new -= W[:c].T @ (W[:c] @ w_new)

                # Normalize
                w_new /= np.linalg.norm(w_new)

                # Converged when w_new is (anti-)parallel to the previous w
                converged = np.abs(np.abs(w_new @ w) - 1) < self.tol
                w = w_new
                if converged:
                    break

            W[c, :] = w

        self.W = W
        return (W @ X).T  # Estimated sources, shape (n_samples, n_comp)
```
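
A quick usage sketch, reusing the synthetic mixtures from the mixing-model example above (the recovered sources come back in arbitrary order and scale, so we match them by correlation):

```python
# X from the mixing example, shape (2, n_samples) -> transpose to (samples, features)
ica = FastICA(n_components=2, max_iter=200)
S_est = ica.fit_transform(X.T)   # (n_samples, 2): estimated sources

# Compare each estimate against the true sources via absolute correlation
for i in range(2):
    corrs = [abs(np.corrcoef(S_est[:, i], s)[0, 1]) for s in (s1, s2)]
    print(f"component {i}: best match corr = {max(corrs):.3f}")
```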
Non-Negative Matrix Factorization (NMF)
Unlike ICA, which operates on time-domain waveforms, NMF works directly on magnitude spectrograms, decomposing them into non-negative basis functions and activations:

V ≈ WH

where V is the spectrogram (frequency × time), W is the basis matrix (frequency × components), and H is the activation matrix (components × time).

- W columns: spectral templates (e.g., note timbres)
- H rows: temporal activations (when each note plays)
- Non-negativity: models the additive nature of audio mixing
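
To make the shapes concrete, here is a minimal sketch with hypothetical dimensions (1025 frequency bins, 400 frames, 10 components); each component contributes a rank-1 "layer" that sums into the full reconstruction:

```python
import numpy as np

n_freq, n_time, k = 1025, 400, 10      # hypothetical spectrogram dimensions
W = np.random.rand(n_freq, k)          # spectral templates (columns)
H = np.random.rand(k, n_time)          # temporal activations (rows)

V_approx = W @ H                       # (n_freq, n_time): full reconstruction

layer_0 = np.outer(W[:, 0], H[0, :])   # contribution of component 0 alone
assert np.allclose(sum(np.outer(W[:, i], H[i, :]) for i in range(k)), V_approx)
```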
NMF Update Rules
The multiplicative update rules preserve non-negativity (each update multiplies by a ratio of non-negative terms) while iteratively reducing the reconstruction error.
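
For the Euclidean case (β = 2), these are the classic Lee-Seung updates, where ⊙ and the fraction bar denote element-wise multiplication and division:

```latex
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}, \qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```

The implementation below supports three β-divergences (Euclidean, KL, Itakura-Saito):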
```python
import numpy as np

class NMF:
    def __init__(self, n_components=10, max_iter=200):
        self.n_components = n_components
        self.max_iter = max_iter

    def fit_transform(self, V, beta=2):
        """
        Apply NMF using multiplicative update rules.
        beta: divergence parameter (2 = Euclidean, 1 = KL, 0 = Itakura-Saito)
        """
        n_freq, n_time = V.shape
        eps = 1e-10  # small constant to avoid division by zero

        # Random non-negative initialization
        W = np.random.rand(n_freq, self.n_components) + eps
        H = np.random.rand(self.n_components, n_time) + eps

        for iteration in range(self.max_iter):
            # Update H (activations)
            V_approx = W @ H + eps
            if beta == 2:    # Euclidean distance
                H *= (W.T @ V) / (W.T @ V_approx + eps)
            elif beta == 1:  # KL divergence
                H *= (W.T @ (V / V_approx)) / (W.T @ np.ones_like(V) + eps)
            elif beta == 0:  # IS divergence
                H *= (W.T @ (V * V_approx**-2)) / (W.T @ V_approx**-1 + eps)

            # Update W (basis)
            V_approx = W @ H + eps
            if beta == 2:
                W *= (V @ H.T) / (V_approx @ H.T + eps)
            elif beta == 1:
                W *= ((V / V_approx) @ H.T) / (np.ones_like(V) @ H.T + eps)
            elif beta == 0:
                W *= ((V * V_approx**-2) @ H.T) / (V_approx**-1 @ H.T + eps)

            # Normalize W columns and rescale H so the product WH is unchanged
            scale = W.sum(axis=0, keepdims=True)
            W /= scale
            H *= scale.T

            # Report reconstruction error periodically
            # (squared error shown for simplicity, regardless of beta)
            if iteration % 10 == 0:
                error = np.sum((V - W @ H)**2)
                print(f"Iteration {iteration}, Error: {error:.4f}")

        self.W = W
        self.H = H
        return W, H

    def separate_sources(self, V):
        """Reconstruct one magnitude spectrogram per component."""
        W, H = self.fit_transform(V)
        sources = []

        for i in range(self.n_components):
            # Rank-1 reconstruction of component i
            source_spectrogram = W[:, i:i+1] @ H[i:i+1, :]
            sources.append(source_spectrogram)

        return sources
```
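
A usage sketch on real audio, assuming librosa is available (any STFT implementation would do; "mix.wav" is a hypothetical input file). The soft Wiener-style mask is a standard way to carry the separated magnitudes back to the time domain while reusing the mixture's phase:

```python
import numpy as np
import librosa

y, sr = librosa.load("mix.wav", sr=None, mono=True)   # hypothetical input file
S = librosa.stft(y, n_fft=2048, hop_length=512)       # complex STFT
V = np.abs(S)                                         # magnitude spectrogram

nmf = NMF(n_components=8, max_iter=200)
W, H = nmf.fit_transform(V, beta=1)                   # KL often suits audio magnitudes

# Wiener-style soft mask for component 0, applied to the complex STFT
V_approx = W @ H + 1e-10
mask = (W[:, 0:1] @ H[0:1, :]) / V_approx
source_0 = librosa.istft(mask * S, hop_length=512)    # time-domain estimate
```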
Practical Music Separation Example
Let's sketch a browser-based pipeline for separating drums from harmonic content (helper methods such as the FFT, windowing, and component-feature measures are assumed to be implemented elsewhere):
```javascript
// Music separation using NMF in the browser.
// Assumed helpers (not shown): fft, hannWindow, to2DArray, entropy,
// sparsity, measureHarmonicity, smoothness, reconstructFromComponents,
// reconstructAudio, and a matrix wrapper exposing columns/getColumn/getRow.
class MusicSeparator {
  constructor(fftSize = 2048, hopSize = 512) {
    this.fftSize = fftSize;
    this.hopSize = hopSize;
    this.audioContext = new AudioContext();
  }

  async separateAudio(audioBuffer) {
    // Use the first channel (mono)
    const channelData = audioBuffer.getChannelData(0);

    // Compute STFT
    const spectrogram = this.computeSTFT(channelData);

    // Apply NMF
    const { drums, harmonic } = await this.applyNMF(spectrogram);

    // Reconstruct audio, reusing the mixture's phase
    const drumsAudio = this.reconstructAudio(drums, spectrogram.phase);
    const harmonicAudio = this.reconstructAudio(harmonic, spectrogram.phase);

    return { drumsAudio, harmonicAudio };
  }

  computeSTFT(signal) {
    const frames = [];
    const phases = [];
    const window = this.hannWindow(this.fftSize);

    for (let i = 0; i <= signal.length - this.fftSize; i += this.hopSize) {
      const frame = signal.slice(i, i + this.fftSize);
      const windowed = frame.map((s, j) => s * window[j]);

      // FFT via a DSP library (the Web Audio API has no offline FFT)
      const fft = this.fft(windowed);
      frames.push(fft.magnitude);
      phases.push(fft.phase);
    }

    return {
      magnitude: frames,
      phase: phases
    };
  }

  async applyNMF(spectrogram) {
    // Convert to a 2D matrix (frequency x time)
    const V = this.to2DArray(spectrogram.magnitude);

    // NMF with components shared between drums (percussive) and harmonic
    const nmf = new NMF({
      nComponents: 20, // roughly 10 for drums, 10 for harmonic
      maxIter: 100
    });

    const { W, H } = await nmf.fitTransform(V);

    // Cluster components by their spectral/temporal characteristics
    const drumsIndices = this.identifyPercussiveComponents(W, H);
    const harmonicIndices = this.identifyHarmonicComponents(W, H);

    // Reconstruct the separated spectrograms
    const drums = this.reconstructFromComponents(W, H, drumsIndices);
    const harmonic = this.reconstructFromComponents(W, H, harmonicIndices);

    return { drums, harmonic };
  }

  identifyPercussiveComponents(W, H) {
    // Percussive components have:
    // - Broadband frequency content (high entropy in W)
    // - Short, impulsive activations (spiky H)
    const indices = [];

    for (let i = 0; i < W.columns; i++) {
      const frequencyEntropy = this.entropy(W.getColumn(i));
      const temporalSparsity = this.sparsity(H.getRow(i));

      if (frequencyEntropy > 0.7 && temporalSparsity > 0.8) {
        indices.push(i);
      }
    }

    return indices;
  }

  identifyHarmonicComponents(W, H) {
    // Harmonic components have:
    // - Peaked frequency content (harmonic structure in W)
    // - Sustained activations (smooth H)
    const indices = [];

    for (let i = 0; i < W.columns; i++) {
      const harmonicity = this.measureHarmonicity(W.getColumn(i));
      const temporalSmoothness = this.smoothness(H.getRow(i));

      if (harmonicity > 0.6 && temporalSmoothness > 0.7) {
        indices.push(i);
      }
    }

    return indices;
  }
}
```
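
The JavaScript above leaves the component-feature measures abstract. Here is one plausible Python version of the two percussive heuristics, using normalized spectral entropy and the Hoyer L1/L2 sparsity measure (the 0.7/0.8 thresholds are illustrative, not tuned):

```python
import numpy as np

def spectral_entropy(w_col):
    """Normalized entropy of a spectral template; near 1 = broadband (percussive)."""
    p = w_col / (w_col.sum() + 1e-10)
    return -np.sum(p * np.log(p + 1e-10)) / np.log(len(p))

def temporal_sparsity(h_row):
    """Hoyer sparsity in [0, 1] from the L1/L2 ratio; near 1 = spiky (impulsive)."""
    n = len(h_row)
    l1 = np.abs(h_row).sum()
    l2 = np.sqrt(np.sum(h_row**2)) + 1e-10
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# A component i would be flagged percussive when both measures are high, e.g.
# spectral_entropy(W[:, i]) > 0.7 and temporal_sparsity(H[i]) > 0.8
```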
Comparison: ICA vs NMF
ICA strengths:
- Works well for instantaneous mixtures
- Strong theoretical foundation
- Good for speech separation
- Preserves phase information

NMF strengths:
- Natural fit for spectrograms
- Parts-based representation
- Handles overlapping harmonics
- Interpretable components
Both ICA and NMF struggle with:
- Complex, real-world music with reverb and effects
- Overlapping frequency content between instruments
- Time-varying mixing conditions
- Nonlinear or convolutive mixing, which violates their linear-mixing assumptions
These limitations motivated the shift to deep learning approaches like U-Net and Demucs.
Conclusion
ICA and NMF represent the classical foundation of music source separation. While modern deep learning methods have surpassed them in performance, understanding these statistical approaches provides crucial insights into the mathematical structure of music and the fundamental challenges of source separation.