ICA and NMF: Statistical Approaches to Music Source Separation
Explore Independent Component Analysis and Non-Negative Matrix Factorization - the classical statistical models that pioneered blind source separation in music.

The Cocktail Party Problem
Imagine being at a noisy cocktail party where multiple conversations happen simultaneously. Your brain can focus on a single speaker while filtering out others - a remarkable feat of biological signal processing. This is the challenge that Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF) aim to solve computationally.
The term "blind" means we have no prior information about the sources or how they were mixed. We only observe the mixture and must infer the original components using statistical properties.
Independent Component Analysis (ICA)
Mathematical Foundation
ICA assumes a linear mixing model where the observed signals are linear combinations of independent sources:

x = As

where x is the vector of observed mixtures, s is the vector of source signals, and A is the unknown mixing matrix.

The goal is to find an unmixing matrix W such that:

ŝ = Wx ≈ s

Note that the sources can only be recovered up to permutation and scaling. ICA rests on three key assumptions:

- Statistical independence: the source signals must be mutually independent
- Non-Gaussianity: at most one source can be Gaussian distributed
- Linear mixing: sources combine linearly, without delays or echoes
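
As a minimal sketch of this mixing model, with made-up sine and square-wave signals standing in for real instruments:

```python
import numpy as np

# Two non-Gaussian sources: a sine wave and a square wave (hypothetical stand-ins)
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 440 * t)           # "harmonic" source
s2 = np.sign(np.sin(2 * np.pi * 97 * t))   # "percussive-ish" square wave
S = np.vstack([s1, s2])                    # sources: (2, n_samples)

# Unknown mixing matrix A (2 mics, 2 sources); in practice we never observe it
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
X = A @ S                                  # observed mixtures x = As: (2, n_samples)

# ICA's job: find W with W @ X ≈ S, up to permutation and scaling
```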
The FastICA Algorithm
```python
import numpy as np

class FastICA:
    def __init__(self, n_components=None, max_iter=200, tol=1e-8):
        self.n_components = n_components
        self.max_iter = max_iter
        self.tol = tol
        self.W = None

    def _center(self, X):
        """Center the data by removing the per-feature mean."""
        self.mean_ = np.mean(X, axis=1, keepdims=True)
        return X - self.mean_

    def _whiten(self, X):
        """Whiten data so components are uncorrelated with unit variance."""
        cov = np.cov(X)
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        # Guard against near-zero eigenvalues before inverting
        D = np.diag(1.0 / np.sqrt(eigenvalues + 1e-10))
        self.whitening_ = D @ eigenvectors.T
        return self.whitening_ @ X

    def _g_tanh(self, wx):
        """tanh non-linearity and the mean of its derivative."""
        g = np.tanh(wx)
        g_prime_mean = np.mean(1 - g**2)
        return g, g_prime_mean

    def fit_transform(self, X):
        """Apply FastICA; X has shape (n_samples, n_features)."""
        n_samples, n_features = X.shape
        X = X.T  # Work with (features x samples)

        # Preprocessing: centering and whitening
        X = self._center(X)
        X = self._whiten(X)

        n_comp = self.n_components or X.shape[0]
        W = np.random.randn(n_comp, X.shape[0])

        # Deflationary FastICA: extract one component at a time
        for c in range(n_comp):
            w = W[c, :].copy()
            w /= np.linalg.norm(w)

            for _ in range(self.max_iter):
                # Fixed-point update: w <- E[x g(w.x)] - E[g'(w.x)] w
                wx = w @ X
                g, g_prime_mean = self._g_tanh(wx)
                w_new = (X @ g) / n_samples - g_prime_mean * w

                # Decorrelate from previously extracted components (deflation)
                w_new -= W[:c].T @ (W[:c] @ w_new)

                # Normalize
                w_new /= np.linalg.norm(w_new)

                # Converged when w_new is (anti-)parallel to the previous w
                converged = np.abs(np.abs(w_new @ w) - 1) < self.tol
                w = w_new
                if converged:
                    break

            W[c, :] = w

        self.W = W
        return (W @ X).T  # Estimated sources, shape (n_samples, n_comp)
```
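
A quick usage sketch, reusing the synthetic mixtures from the mixing-model example above (the recovered sources come back in arbitrary order and scale, so we match them by correlation):

```python
# X from the mixing example, shape (2, n_samples) -> transpose to (samples, features)
ica = FastICA(n_components=2, max_iter=200)
S_est = ica.fit_transform(X.T)   # (n_samples, 2): estimated sources

# Compare each estimate against the true sources via absolute correlation
for i in range(2):
    corrs = [abs(np.corrcoef(S_est[:, i], s)[0, 1]) for s in (s1, s2)]
    print(f"component {i}: best match corr = {max(corrs):.3f}")
```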
Non-Negative Matrix Factorization (NMF)
Unlike ICA, which operates on time-domain waveforms, NMF works directly on magnitude spectrograms, decomposing them into non-negative basis functions and activations:

V ≈ WH

where V is the spectrogram (frequency × time), W is the basis matrix (frequency × components), and H is the activation matrix (components × time).

- W columns: spectral templates (e.g., note timbres)
- H rows: temporal activations (when each note plays)
- Non-negativity: models the additive nature of audio mixing
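
To make the shapes concrete, here is a minimal sketch with hypothetical dimensions (1025 frequency bins, 400 frames, 10 components); each component contributes a rank-1 "layer" that sums into the full reconstruction:

```python
import numpy as np

n_freq, n_time, k = 1025, 400, 10      # hypothetical spectrogram dimensions
W = np.random.rand(n_freq, k)          # spectral templates (columns)
H = np.random.rand(k, n_time)          # temporal activations (rows)

V_approx = W @ H                       # (n_freq, n_time): full reconstruction

layer_0 = np.outer(W[:, 0], H[0, :])   # contribution of component 0 alone
assert np.allclose(sum(np.outer(W[:, i], H[i, :]) for i in range(k)), V_approx)
```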
NMF Update Rules
The multiplicative update rules preserve non-negativity (each update multiplies by a ratio of non-negative terms) while iteratively reducing the reconstruction error.
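
For the Euclidean case (β = 2), these are the classic Lee-Seung updates, where ⊙ and the fraction bar denote element-wise multiplication and division:

```latex
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}, \qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```

The implementation below supports three β-divergences (Euclidean, KL, Itakura-Saito):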
```python
import numpy as np

class NMF:
    def __init__(self, n_components=10, max_iter=200):
        self.n_components = n_components
        self.max_iter = max_iter

    def fit_transform(self, V, beta=2):
        """
        Apply NMF using multiplicative update rules.
        beta: divergence parameter (2 = Euclidean, 1 = KL, 0 = Itakura-Saito)
        """
        n_freq, n_time = V.shape
        eps = 1e-10  # small constant to avoid division by zero

        # Random non-negative initialization
        W = np.random.rand(n_freq, self.n_components) + eps
        H = np.random.rand(self.n_components, n_time) + eps

        for iteration in range(self.max_iter):
            # Update H (activations)
            V_approx = W @ H + eps
            if beta == 2:    # Euclidean distance
                H *= (W.T @ V) / (W.T @ V_approx + eps)
            elif beta == 1:  # KL divergence
                H *= (W.T @ (V / V_approx)) / (W.T @ np.ones_like(V) + eps)
            elif beta == 0:  # IS divergence
                H *= (W.T @ (V * V_approx**-2)) / (W.T @ V_approx**-1 + eps)

            # Update W (basis)
            V_approx = W @ H + eps
            if beta == 2:
                W *= (V @ H.T) / (V_approx @ H.T + eps)
            elif beta == 1:
                W *= ((V / V_approx) @ H.T) / (np.ones_like(V) @ H.T + eps)
            elif beta == 0:
                W *= ((V * V_approx**-2) @ H.T) / (V_approx**-1 @ H.T + eps)

            # Normalize W columns and rescale H so the product WH is unchanged
            scale = W.sum(axis=0, keepdims=True)
            W /= scale
            H *= scale.T

            # Report reconstruction error periodically
            # (squared error shown for simplicity, regardless of beta)
            if iteration % 10 == 0:
                error = np.sum((V - W @ H)**2)
                print(f"Iteration {iteration}, Error: {error:.4f}")

        self.W = W
        self.H = H
        return W, H

    def separate_sources(self, V):
        """Reconstruct one magnitude spectrogram per component."""
        W, H = self.fit_transform(V)
        sources = []

        for i in range(self.n_components):
            # Rank-1 reconstruction of component i
            source_spectrogram = W[:, i:i+1] @ H[i:i+1, :]
            sources.append(source_spectrogram)

        return sources
```
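
A usage sketch on real audio, assuming librosa is available (any STFT implementation would do; "mix.wav" is a hypothetical input file). The soft Wiener-style mask is a standard way to carry the separated magnitudes back to the time domain while reusing the mixture's phase:

```python
import numpy as np
import librosa

y, sr = librosa.load("mix.wav", sr=None, mono=True)   # hypothetical input file
S = librosa.stft(y, n_fft=2048, hop_length=512)       # complex STFT
V = np.abs(S)                                         # magnitude spectrogram

nmf = NMF(n_components=8, max_iter=200)
W, H = nmf.fit_transform(V, beta=1)                   # KL often suits audio magnitudes

# Wiener-style soft mask for component 0, applied to the complex STFT
V_approx = W @ H + 1e-10
mask = (W[:, 0:1] @ H[0:1, :]) / V_approx
source_0 = librosa.istft(mask * S, hop_length=512)    # time-domain estimate
```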
Practical Music Separation Example
Let's sketch a browser-based pipeline for separating drums from harmonic content (helper methods such as the FFT, windowing, and component-feature measures are assumed to be implemented elsewhere):
```javascript
// Music separation using NMF in the browser.
// Assumed helpers (not shown): fft, hannWindow, to2DArray, entropy,
// sparsity, measureHarmonicity, smoothness, reconstructFromComponents,
// reconstructAudio, and a matrix wrapper exposing columns/getColumn/getRow.
class MusicSeparator {
  constructor(fftSize = 2048, hopSize = 512) {
    this.fftSize = fftSize;
    this.hopSize = hopSize;
    this.audioContext = new AudioContext();
  }

  async separateAudio(audioBuffer) {
    // Use the first channel (mono)
    const channelData = audioBuffer.getChannelData(0);

    // Compute STFT
    const spectrogram = this.computeSTFT(channelData);

    // Apply NMF
    const { drums, harmonic } = await this.applyNMF(spectrogram);

    // Reconstruct audio, reusing the mixture's phase
    const drumsAudio = this.reconstructAudio(drums, spectrogram.phase);
    const harmonicAudio = this.reconstructAudio(harmonic, spectrogram.phase);

    return { drumsAudio, harmonicAudio };
  }

  computeSTFT(signal) {
    const frames = [];
    const phases = [];
    const window = this.hannWindow(this.fftSize);

    for (let i = 0; i <= signal.length - this.fftSize; i += this.hopSize) {
      const frame = signal.slice(i, i + this.fftSize);
      const windowed = frame.map((s, j) => s * window[j]);

      // FFT via a DSP library (the Web Audio API has no offline FFT)
      const fft = this.fft(windowed);
      frames.push(fft.magnitude);
      phases.push(fft.phase);
    }

    return {
      magnitude: frames,
      phase: phases
    };
  }

  async applyNMF(spectrogram) {
    // Convert to a 2D matrix (frequency x time)
    const V = this.to2DArray(spectrogram.magnitude);

    // NMF with components shared between drums (percussive) and harmonic
    const nmf = new NMF({
      nComponents: 20, // roughly 10 for drums, 10 for harmonic
      maxIter: 100
    });

    const { W, H } = await nmf.fitTransform(V);

    // Cluster components by their spectral/temporal characteristics
    const drumsIndices = this.identifyPercussiveComponents(W, H);
    const harmonicIndices = this.identifyHarmonicComponents(W, H);

    // Reconstruct the separated spectrograms
    const drums = this.reconstructFromComponents(W, H, drumsIndices);
    const harmonic = this.reconstructFromComponents(W, H, harmonicIndices);

    return { drums, harmonic };
  }

  identifyPercussiveComponents(W, H) {
    // Percussive components have:
    // - Broadband frequency content (high entropy in W)
    // - Short, impulsive activations (spiky H)
    const indices = [];

    for (let i = 0; i < W.columns; i++) {
      const frequencyEntropy = this.entropy(W.getColumn(i));
      const temporalSparsity = this.sparsity(H.getRow(i));

      if (frequencyEntropy > 0.7 && temporalSparsity > 0.8) {
        indices.push(i);
      }
    }

    return indices;
  }

  identifyHarmonicComponents(W, H) {
    // Harmonic components have:
    // - Peaked frequency content (harmonic structure in W)
    // - Sustained activations (smooth H)
    const indices = [];

    for (let i = 0; i < W.columns; i++) {
      const harmonicity = this.measureHarmonicity(W.getColumn(i));
      const temporalSmoothness = this.smoothness(H.getRow(i));

      if (harmonicity > 0.6 && temporalSmoothness > 0.7) {
        indices.push(i);
      }
    }

    return indices;
  }
}
```
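
The JavaScript above leaves the component-feature measures abstract. Here is one plausible Python version of the two percussive heuristics, using normalized spectral entropy and the Hoyer L1/L2 sparsity measure (the 0.7/0.8 thresholds are illustrative, not tuned):

```python
import numpy as np

def spectral_entropy(w_col):
    """Normalized entropy of a spectral template; near 1 = broadband (percussive)."""
    p = w_col / (w_col.sum() + 1e-10)
    return -np.sum(p * np.log(p + 1e-10)) / np.log(len(p))

def temporal_sparsity(h_row):
    """Hoyer sparsity in [0, 1] from the L1/L2 ratio; near 1 = spiky (impulsive)."""
    n = len(h_row)
    l1 = np.abs(h_row).sum()
    l2 = np.sqrt(np.sum(h_row**2)) + 1e-10
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

# A component i would be flagged percussive when both measures are high, e.g.
# spectral_entropy(W[:, i]) > 0.7 and temporal_sparsity(H[i]) > 0.8
```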
Comparison: ICA vs NMF
ICA strengths:
- Works well for instantaneous mixtures
- Strong theoretical foundation
- Good for speech separation
- Preserves phase information

NMF strengths:
- Natural fit for spectrograms
- Parts-based representation
- Handles overlapping harmonics
- Interpretable components
Both ICA and NMF struggle with:
- Complex, real-world music with reverb and effects
- Overlapping frequency content between instruments
- Time-varying mixing conditions
- Nonlinear or convolutive mixing, which violates their linear-mixing assumptions
These limitations motivated the shift to deep learning approaches like U-Net and Demucs.
Conclusion
ICA and NMF represent the classical foundation of music source separation. While modern deep learning methods have surpassed them in performance, understanding these statistical approaches provides crucial insights into the mathematical structure of music and the fundamental challenges of source separation.