Matrix Factorization Methods for Music Analysis
Deep dive into Independent Component Analysis (ICA) and Non-Negative Matrix Factorization (NMF) - powerful techniques for blind source separation, feature extraction, and understanding the latent structure of music.
The Power of Matrix Decomposition in Music
Matrix factorization techniques transform complex musical signals into interpretable components. By decomposing spectrograms or feature matrices into products of simpler matrices, we can uncover hidden structure, separate sources, and extract meaningful patterns from music.
Given a data matrix (e.g., a spectrogram), find factor matrices that approximate it:
Where represents basis components (e.g., instrument timbres) and represents their activations over time.
Independent Component Analysis (ICA)
ICA assumes that observed signals are linear mixtures of statistically independent source signals. It's particularly powerful for separating sources that were mixed linearly, like multiple instruments recorded simultaneously.
For source signals and observed mixtures :
ICA finds the unmixing matrix such that:
The key assumption: sources are statistically independent and non-Gaussian.
Mutual Information:
ICA minimizes mutual information between components
Negentropy:
Measures departure from Gaussian distribution
The most popular ICA algorithm uses fixed-point iteration:
1. Preprocessing:
2. Iteration:
where is a non-quadratic function (e.g., )
1import numpy as np
2from sklearn.decomposition import FastICA
3import librosa
4import soundfile as sf
5
6def perform_ica_separation(audio_mixture, n_components=2, max_iter=200):
7 """
8 Separate audio sources using Independent Component Analysis
9
10 Parameters:
11 - audio_mixture: Stereo audio signal (2 channels)
12 - n_components: Number of sources to extract
13 - max_iter: Maximum iterations for FastICA
14 """
15 # Ensure we have a 2D array (channels × samples)
16 if audio_mixture.ndim == 1:
17 audio_mixture = np.vstack([audio_mixture, audio_mixture])
18
19 # Initialize FastICA
20 ica = FastICA(
21 n_components=n_components,
22 algorithm='parallel', # or 'deflation'
23 whiten=True,
24 fun='logcosh', # Non-linearity: 'logcosh', 'exp', 'cube'
25 max_iter=max_iter,
26 random_state=42
27 )
28
29 # Fit and transform the data
30 sources = ica.fit_transform(audio_mixture.T).T
31
32 # Get mixing matrix for analysis
33 mixing_matrix = ica.mixing_
34
35 return sources, mixing_matrix
36
37# Example: Separate a mixed recording
38def separate_stereo_recording(filepath, output_dir='separated/'):
39 # Load stereo audio
40 audio, sr = librosa.load(filepath, sr=None, mono=False)
41
42 # Perform ICA separation
43 sources, mixing = perform_ica_separation(audio, n_components=2)
44
45 # Save separated sources
46 for i, source in enumerate(sources):
47 output_path = f"{output_dir}source_{i+1}.wav"
48 sf.write(output_path, source, sr)
49 print(f"Saved source {i+1} to {output_path}")
50
51 # Analyze mixing matrix
52 print(f"\nMixing matrix:\n{mixing}")
53 print(f"Condition number: {np.linalg.cond(mixing):.2f}")
54
55 return sources, mixing
56
57# Advanced: Time-frequency domain ICA
58def tf_ica_separation(audio, sr, n_fft=2048, hop_length=512):
59 """Apply ICA in time-frequency domain for better separation"""
60 # Convert to spectrogram
61 D = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
62 magnitude, phase = np.abs(D), np.angle(D)
63
64 # Apply ICA to magnitude spectrogram
65 ica = FastICA(n_components=2, whiten=True)
66 separated_mag = ica.fit_transform(magnitude.T).T
67
68 # Reconstruct with original phase
69 separated_complex = separated_mag * np.exp(1j * phase[:separated_mag.shape[0], :])
70
71 # Convert back to time domain
72 separated_audio = []
73 for i in range(separated_mag.shape[0]):
74 audio_i = librosa.istft(separated_complex[i], hop_length=hop_length)
75 separated_audio.append(audio_i)
76
77 return np.array(separated_audio)
Non-Negative Matrix Factorization (NMF)
NMF decomposes non-negative data (like magnitude spectrograms) into non-negative factors. This constraint leads to parts-based representations that are often more interpretable than other factorizations.
Given non-negative matrix , find:
where and
Minimize divergence:
Assumes Gaussian noise model
Better for sparse, count-like data
Lee and Seung's elegant update rules guarantee non-negativity:
For Euclidean distance:
For KL divergence:
1import numpy as np
2from sklearn.decomposition import NMF
3import librosa
4import matplotlib.pyplot as plt
5
6class MusicNMF:
7 def __init__(self, n_components=8, init='random', max_iter=200):
8 """
9 Initialize NMF for music analysis
10
11 Parameters:
12 - n_components: Number of basis components
13 - init: Initialization method ('random', 'nndsvd', 'nndsvda')
14 - max_iter: Maximum iterations
15 """
16 self.n_components = n_components
17 self.nmf = NMF(
18 n_components=n_components,
19 init=init,
20 solver='mu', # Multiplicative update
21 beta_loss='frobenius', # or 'kullback-leibler'
22 max_iter=max_iter,
23 random_state=42
24 )
25
26 def decompose_spectrogram(self, audio, sr=22050, n_fft=2048, hop_length=512):
27 """Decompose audio spectrogram using NMF"""
28 # Compute magnitude spectrogram
29 D = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
30 magnitude = np.abs(D)
31 phase = np.angle(D)
32
33 # Apply NMF
34 W = self.nmf.fit_transform(magnitude.T).T # Basis components
35 H = self.nmf.components_ # Activations
36
37 # Reconstruction
38 magnitude_reconstructed = W @ H
39
40 return {
41 'W': W, # Frequency × Components
42 'H': H, # Components × Time
43 'magnitude': magnitude,
44 'phase': phase,
45 'reconstruction': magnitude_reconstructed,
46 'reconstruction_error': np.mean((magnitude - magnitude_reconstructed) ** 2)
47 }
48
49 def extract_components(self, audio, sr=22050):
50 """Extract individual audio components"""
51 result = self.decompose_spectrogram(audio, sr)
52 W, H, phase = result['W'], result['H'], result['phase']
53
54 components = []
55 for i in range(self.n_components):
56 # Reconstruct each component
57 component_mag = np.outer(W[:, i], H[i, :])
58 component_complex = component_mag * np.exp(1j * phase)
59 component_audio = librosa.istft(component_complex)
60 components.append(component_audio)
61
62 return np.array(components), result
63
64 def plot_decomposition(self, result, sr=22050, hop_length=512):
65 """Visualize NMF decomposition"""
66 fig, axes = plt.subplots(3, 2, figsize=(12, 10))
67
68 # Original spectrogram
69 librosa.display.specshow(
70 librosa.amplitude_to_db(result['magnitude'], ref=np.max),
71 sr=sr, hop_length=hop_length, x_axis='time', y_axis='hz',
72 ax=axes[0, 0]
73 )
74 axes[0, 0].set_title('Original Spectrogram')
75
76 # Reconstructed spectrogram
77 librosa.display.specshow(
78 librosa.amplitude_to_db(result['reconstruction'], ref=np.max),
79 sr=sr, hop_length=hop_length, x_axis='time', y_axis='hz',
80 ax=axes[0, 1]
81 )
82 axes[0, 1].set_title('NMF Reconstruction')
83
84 # Basis components (W)
85 axes[1, 0].imshow(result['W'], aspect='auto', origin='lower')
86 axes[1, 0].set_title('Basis Components (W)')
87 axes[1, 0].set_xlabel('Component')
88 axes[1, 0].set_ylabel('Frequency Bin')
89
90 # Activations (H)
91 axes[1, 1].imshow(result['H'], aspect='auto', origin='lower')
92 axes[1, 1].set_title('Activations (H)')
93 axes[1, 1].set_xlabel('Time Frame')
94 axes[1, 1].set_ylabel('Component')
95
96 # Component contributions
97 contributions = np.sum(result['H'], axis=1)
98 axes[2, 0].bar(range(len(contributions)), contributions)
99 axes[2, 0].set_title('Component Contributions')
100 axes[2, 0].set_xlabel('Component')
101 axes[2, 0].set_ylabel('Total Activation')
102
103 # Reconstruction error over time
104 error = np.mean((result['magnitude'] - result['reconstruction']) ** 2, axis=0)
105 time_frames = np.arange(len(error)) * hop_length / sr
106 axes[2, 1].plot(time_frames, error)
107 axes[2, 1].set_title('Reconstruction Error')
108 axes[2, 1].set_xlabel('Time (s)')
109 axes[2, 1].set_ylabel('MSE')
110
111 plt.tight_layout()
112 return fig
113
114# Example: Drum pattern extraction
115def extract_drum_patterns(audio_file, n_patterns=4):
116 """Extract repeating drum patterns using NMF"""
117 audio, sr = librosa.load(audio_file, sr=22050)
118
119 # Use NMF with sparsity constraints for drums
120 nmf = MusicNMF(n_components=n_patterns)
121 components, result = nmf.extract_components(audio, sr)
122
123 # Find most percussive components (high frequency energy)
124 percussiveness = []
125 for comp in components:
126 # Compute spectral centroid as measure of "brightness"
127 centroid = librosa.feature.spectral_centroid(y=comp, sr=sr)
128 percussiveness.append(np.mean(centroid))
129
130 # Sort by percussiveness
131 drum_indices = np.argsort(percussiveness)[-2:] # Top 2 most percussive
132
133 return components[drum_indices], result
134
135# Supervised NMF with template matching
136def template_based_nmf(audio, templates, sr=22050):
137 """
138 Use pre-defined templates for targeted source separation
139
140 Parameters:
141 - audio: Input audio signal
142 - templates: Dictionary of instrument templates
143 """
144 # Compute spectrogram
145 D = librosa.stft(audio)
146 magnitude = np.abs(D)
147
148 # Initialize W with templates
149 W_init = np.column_stack(list(templates.values()))
150
151 # Partially fixed NMF
152 nmf = NMF(
153 n_components=len(templates),
154 init='custom',
155 solver='mu',
156 max_iter=300
157 )
158
159 # Set initial W
160 nmf.fit_transform(magnitude.T, W=W_init.T, H=None)
161
162 return nmf.components_, nmf.transform(magnitude.T).T
Comparing ICA and NMF
- • Excellent for linear mixtures
- • No non-negativity constraint
- • Strong theoretical foundation
- • Works well for instantaneous mixtures
- • Can handle complex-valued signals
- • Naturally parts-based representation
- • Interpretable components
- • Works well with spectrograms
- • Can incorporate sparsity constraints
- • Suitable for polyphonic music
Advanced Techniques and Variants
Extends NMF to complex spectrograms, preserving phase information:
Allows direct audio reconstruction without phase estimation
Models temporal patterns with 2D convolution:
Captures time-varying spectral patterns like vibrato
Adds sparsity constraints for cleaner separation:
Encourages part-based representations with fewer active components
Practical Applications in Music
1import numpy as np
2import librosa
3from scipy.ndimage import median_filter
4
5def hpss_with_nmf(audio, sr=22050, n_components=2, kernel_size=(1, 31)):
6 """
7 Harmonic-Percussive Source Separation enhanced with NMF
8
9 Combines median filtering with NMF for improved separation
10 """
11 # Compute STFT
12 D = librosa.stft(audio)
13 magnitude, phase = np.abs(D), np.angle(D)
14
15 # Step 1: Initial median filtering separation
16 harmonic = median_filter(magnitude, size=(kernel_size[0], kernel_size[1]))
17 percussive = median_filter(magnitude, size=(kernel_size[1], kernel_size[0]))
18
19 # Step 2: Apply NMF to refine separation
20 from sklearn.decomposition import NMF
21
22 # NMF on harmonic component (pitched content)
23 nmf_h = NMF(n_components=n_components, init='nndsvd', max_iter=200)
24 W_h = nmf_h.fit_transform(harmonic.T).T
25 H_h = nmf_h.components_
26 harmonic_refined = W_h @ H_h
27
28 # NMF on percussive component (transient content)
29 nmf_p = NMF(n_components=n_components, init='nndsvd', max_iter=200)
30 W_p = nmf_p.fit_transform(percussive.T).T
31 H_p = nmf_p.components_
32 percussive_refined = W_p @ H_p
33
34 # Step 3: Soft masking
35 mask_h = harmonic_refined / (harmonic_refined + percussive_refined + 1e-10)
36 mask_p = percussive_refined / (harmonic_refined + percussive_refined + 1e-10)
37
38 # Apply masks
39 harmonic_final = magnitude * mask_h
40 percussive_final = magnitude * mask_p
41
42 # Reconstruct audio
43 harmonic_audio = librosa.istft(harmonic_final * np.exp(1j * phase))
44 percussive_audio = librosa.istft(percussive_final * np.exp(1j * phase))
45
46 return harmonic_audio, percussive_audio, {
47 'harmonic_basis': W_h,
48 'percussive_basis': W_p,
49 'harmonic_activation': H_h,
50 'percussive_activation': H_p
51 }
52
53# Application: Remix generation
54def create_remix_stems(audio_file, n_stems=4):
55 """
56 Create remixable stems using NMF decomposition
57 """
58 audio, sr = librosa.load(audio_file, sr=22050)
59
60 # Separate harmonic and percussive first
61 harmonic, percussive, nmf_data = hpss_with_nmf(audio, sr, n_components=n_stems//2)
62
63 # Further decompose harmonic into bass and melody
64 bass_cutoff = 200 # Hz
65 bass_bin = int(bass_cutoff * 2048 / sr)
66
67 # Extract bass from harmonic
68 D_harmonic = librosa.stft(harmonic)
69 D_bass = D_harmonic.copy()
70 D_bass[bass_bin:, :] = 0
71 bass = librosa.istft(D_bass)
72
73 # Extract melody (harmonic minus bass)
74 melody = harmonic - bass
75
76 # Create stems dictionary
77 stems = {
78 'bass': bass,
79 'melody': melody,
80 'drums': percussive,
81 'original': audio
82 }
83
84 return stems, nmf_data
85
86# Example usage
87stems, data = create_remix_stems('song.wav')
88for name, audio in stems.items():
89 librosa.output.write_wav(f'stem_{name}.wav', audio, 22050)
Evaluation Metrics
Signal-to-Distortion Ratio (SDR)
Signal-to-Interference Ratio (SIR)
Signal-to-Artifacts Ratio (SAR)
Key Takeaways
- ICA excels at unmixing: When sources are truly independent and linearly mixed, ICA provides theoretically optimal separation.
- NMF offers interpretability: Non-negativity constraints lead to parts-based representations that align with musical structure.
- Preprocessing matters: Both methods benefit from appropriate preprocessing (whitening for ICA, scaling for NMF).
- Hybrid approaches win: Combining multiple techniques often yields better results than any single method.