JewelMusic - AI-Powered Music Distribution Platform

The Power of Matrix Decomposition in Music

Matrix factorization techniques transform complex musical signals into interpretable components. By decomposing spectrograms or feature matrices into products of simpler matrices, we can uncover hidden structure, separate sources, and extract meaningful patterns from music.

The Fundamental Idea

Given a data matrix $V$ (e.g., a spectrogram), find factor matrices that approximate it:

V \approx W \cdot H

Where $W$ represents basis components (e.g., instrument timbres) and $H$ represents their activations over time.

Independent Component Analysis (ICA)

ICA assumes that observed signals are linear mixtures of statistically independent source signals. It's particularly powerful for separating sources that were mixed linearly, like multiple instruments recorded simultaneously.

The ICA Model

For source signals $s_1, s_2, ..., s_n$ and observed mixtures $x_1, x_2, ..., x_m$ :

X = A \cdot S

ICA finds the unmixing matrix $W$ such that:

S = W \cdot X = W \cdot A \cdot S

The key assumption: sources are statistically independent and non-Gaussian.

Independence Measures

Mutual Information:

I(y_1, y_2) = \sum_i \sum_j p(y_1^i, y_2^j) \log \frac{p(y_1^i, y_2^j)}{p(y_1^i)p(y_2^j)}

ICA minimizes mutual information between components

Non-Gaussianity

Negentropy:

J(y) = H(y_{gauss}) - H(y)

Measures departure from Gaussian distribution

FastICA Algorithm

The most popular ICA algorithm uses fixed-point iteration:

1. Preprocessing:

X_{centered} = X - E[X]

X_{whitened} = ED^{-1/2}E^T X_{centered}

2. Iteration:

w^+ = E\{xg(w^Tx)\} - E\{g'(w^Tx)\}w

w = w^+/||w^+||

where $g$ is a non-quadratic function (e.g., $tanh$ )

ICA for Music Source Separation

1import numpy as np
2from sklearn.decomposition import FastICA
3import librosa
4import soundfile as sf
5
6def perform_ica_separation(audio_mixture, n_components=2, max_iter=200):
7    """
8    Separate audio sources using Independent Component Analysis
9    
10    Parameters:
11    - audio_mixture: Stereo audio signal (2 channels)
12    - n_components: Number of sources to extract
13    - max_iter: Maximum iterations for FastICA
14    """
15    # Ensure we have a 2D array (channels × samples)
16    if audio_mixture.ndim == 1:
17        audio_mixture = np.vstack([audio_mixture, audio_mixture])
18    
19    # Initialize FastICA
20    ica = FastICA(
21        n_components=n_components,
22        algorithm='parallel',  # or 'deflation'
23        whiten=True,
24        fun='logcosh',  # Non-linearity: 'logcosh', 'exp', 'cube'
25        max_iter=max_iter,
26        random_state=42
27    )
28    
29    # Fit and transform the data
30    sources = ica.fit_transform(audio_mixture.T).T
31    
32    # Get mixing matrix for analysis
33    mixing_matrix = ica.mixing_
34    
35    return sources, mixing_matrix
36
37# Example: Separate a mixed recording
38def separate_stereo_recording(filepath, output_dir='separated/'):
39    # Load stereo audio
40    audio, sr = librosa.load(filepath, sr=None, mono=False)
41    
42    # Perform ICA separation
43    sources, mixing = perform_ica_separation(audio, n_components=2)
44    
45    # Save separated sources
46    for i, source in enumerate(sources):
47        output_path = f"{output_dir}source_{i+1}.wav"
48        sf.write(output_path, source, sr)
49        print(f"Saved source {i+1} to {output_path}")
50    
51    # Analyze mixing matrix
52    print(f"\nMixing matrix:\n{mixing}")
53    print(f"Condition number: {np.linalg.cond(mixing):.2f}")
54    
55    return sources, mixing
56
57# Advanced: Time-frequency domain ICA
58def tf_ica_separation(audio, sr, n_fft=2048, hop_length=512):
59    """Apply ICA in time-frequency domain for better separation"""
60    # Convert to spectrogram
61    D = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
62    magnitude, phase = np.abs(D), np.angle(D)
63    
64    # Apply ICA to magnitude spectrogram
65    ica = FastICA(n_components=2, whiten=True)
66    separated_mag = ica.fit_transform(magnitude.T).T
67    
68    # Reconstruct with original phase
69    separated_complex = separated_mag * np.exp(1j * phase[:separated_mag.shape[0], :])
70    
71    # Convert back to time domain
72    separated_audio = []
73    for i in range(separated_mag.shape[0]):
74        audio_i = librosa.istft(separated_complex[i], hop_length=hop_length)
75        separated_audio.append(audio_i)
76    
77    return np.array(separated_audio)

Non-Negative Matrix Factorization (NMF)

NMF decomposes non-negative data (like magnitude spectrograms) into non-negative factors. This constraint leads to parts-based representations that are often more interpretable than other factorizations.

The NMF Optimization Problem

Given non-negative matrix $V \in \mathbb{R}^{m \times n}_+$ , find:

V \approx WH

where $W \in \mathbb{R}^{m \times r}_+$ and $H \in \mathbb{R}^{r \times n}_+$

Minimize divergence:

D(V||WH) = \sum_{i,j} d(V_{ij}, (WH)_{ij})

Euclidean Distance

D_{EU}(V||WH) = ||V - WH||_F^2

Assumes Gaussian noise model

KL Divergence

D_{KL}(V||WH) = \sum_{ij} V_{ij} \log\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij}

Better for sparse, count-like data

Multiplicative Update Rules

Lee and Seung's elegant update rules guarantee non-negativity:

For Euclidean distance:

H \leftarrow H \odot \frac{W^T V}{W^T W H}

W \leftarrow W \odot \frac{V H^T}{W H H^T}

For KL divergence:

H \leftarrow H \odot \frac{W^T (V \oslash WH)}{W^T \mathbf{1}}

W \leftarrow W \odot \frac{(V \oslash WH) H^T}{\mathbf{1} H^T}

NMF for Music Decomposition

1import numpy as np
2from sklearn.decomposition import NMF
3import librosa
4import matplotlib.pyplot as plt
5
6class MusicNMF:
7    def __init__(self, n_components=8, init='random', max_iter=200):
8        """
9        Initialize NMF for music analysis
10        
11        Parameters:
12        - n_components: Number of basis components
13        - init: Initialization method ('random', 'nndsvd', 'nndsvda')
14        - max_iter: Maximum iterations
15        """
16        self.n_components = n_components
17        self.nmf = NMF(
18            n_components=n_components,
19            init=init,
20            solver='mu',  # Multiplicative update
21            beta_loss='frobenius',  # or 'kullback-leibler'
22            max_iter=max_iter,
23            random_state=42
24        )
25    
26    def decompose_spectrogram(self, audio, sr=22050, n_fft=2048, hop_length=512):
27        """Decompose audio spectrogram using NMF"""
28        # Compute magnitude spectrogram
29        D = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
30        magnitude = np.abs(D)
31        phase = np.angle(D)
32        
33        # Apply NMF
34        W = self.nmf.fit_transform(magnitude.T).T  # Basis components
35        H = self.nmf.components_  # Activations
36        
37        # Reconstruction
38        magnitude_reconstructed = W @ H
39        
40        return {
41            'W': W,  # Frequency × Components
42            'H': H,  # Components × Time
43            'magnitude': magnitude,
44            'phase': phase,
45            'reconstruction': magnitude_reconstructed,
46            'reconstruction_error': np.mean((magnitude - magnitude_reconstructed) ** 2)
47        }
48    
49    def extract_components(self, audio, sr=22050):
50        """Extract individual audio components"""
51        result = self.decompose_spectrogram(audio, sr)
52        W, H, phase = result['W'], result['H'], result['phase']
53        
54        components = []
55        for i in range(self.n_components):
56            # Reconstruct each component
57            component_mag = np.outer(W[:, i], H[i, :])
58            component_complex = component_mag * np.exp(1j * phase)
59            component_audio = librosa.istft(component_complex)
60            components.append(component_audio)
61        
62        return np.array(components), result
63    
64    def plot_decomposition(self, result, sr=22050, hop_length=512):
65        """Visualize NMF decomposition"""
66        fig, axes = plt.subplots(3, 2, figsize=(12, 10))
67        
68        # Original spectrogram
69        librosa.display.specshow(
70            librosa.amplitude_to_db(result['magnitude'], ref=np.max),
71            sr=sr, hop_length=hop_length, x_axis='time', y_axis='hz',
72            ax=axes[0, 0]
73        )
74        axes[0, 0].set_title('Original Spectrogram')
75        
76        # Reconstructed spectrogram
77        librosa.display.specshow(
78            librosa.amplitude_to_db(result['reconstruction'], ref=np.max),
79            sr=sr, hop_length=hop_length, x_axis='time', y_axis='hz',
80            ax=axes[0, 1]
81        )
82        axes[0, 1].set_title('NMF Reconstruction')
83        
84        # Basis components (W)
85        axes[1, 0].imshow(result['W'], aspect='auto', origin='lower')
86        axes[1, 0].set_title('Basis Components (W)')
87        axes[1, 0].set_xlabel('Component')
88        axes[1, 0].set_ylabel('Frequency Bin')
89        
90        # Activations (H)
91        axes[1, 1].imshow(result['H'], aspect='auto', origin='lower')
92        axes[1, 1].set_title('Activations (H)')
93        axes[1, 1].set_xlabel('Time Frame')
94        axes[1, 1].set_ylabel('Component')
95        
96        # Component contributions
97        contributions = np.sum(result['H'], axis=1)
98        axes[2, 0].bar(range(len(contributions)), contributions)
99        axes[2, 0].set_title('Component Contributions')
100        axes[2, 0].set_xlabel('Component')
101        axes[2, 0].set_ylabel('Total Activation')
102        
103        # Reconstruction error over time
104        error = np.mean((result['magnitude'] - result['reconstruction']) ** 2, axis=0)
105        time_frames = np.arange(len(error)) * hop_length / sr
106        axes[2, 1].plot(time_frames, error)
107        axes[2, 1].set_title('Reconstruction Error')
108        axes[2, 1].set_xlabel('Time (s)')
109        axes[2, 1].set_ylabel('MSE')
110        
111        plt.tight_layout()
112        return fig
113
114# Example: Drum pattern extraction
115def extract_drum_patterns(audio_file, n_patterns=4):
116    """Extract repeating drum patterns using NMF"""
117    audio, sr = librosa.load(audio_file, sr=22050)
118    
119    # Use NMF with sparsity constraints for drums
120    nmf = MusicNMF(n_components=n_patterns)
121    components, result = nmf.extract_components(audio, sr)
122    
123    # Find most percussive components (high frequency energy)
124    percussiveness = []
125    for comp in components:
126        # Compute spectral centroid as measure of "brightness"
127        centroid = librosa.feature.spectral_centroid(y=comp, sr=sr)
128        percussiveness.append(np.mean(centroid))
129    
130    # Sort by percussiveness
131    drum_indices = np.argsort(percussiveness)[-2:]  # Top 2 most percussive
132    
133    return components[drum_indices], result
134
135# Supervised NMF with template matching
136def template_based_nmf(audio, templates, sr=22050):
137    """
138    Use pre-defined templates for targeted source separation
139    
140    Parameters:
141    - audio: Input audio signal
142    - templates: Dictionary of instrument templates
143    """
144    # Compute spectrogram
145    D = librosa.stft(audio)
146    magnitude = np.abs(D)
147    
148    # Initialize W with templates
149    W_init = np.column_stack(list(templates.values()))
150    
151    # Partially fixed NMF
152    nmf = NMF(
153        n_components=len(templates),
154        init='custom',
155        solver='mu',
156        max_iter=300
157    )
158    
159    # Set initial W
160    nmf.fit_transform(magnitude.T, W=W_init.T, H=None)
161    
162    return nmf.components_, nmf.transform(magnitude.T).T

Comparing ICA and NMF

ICA Strengths

• Excellent for linear mixtures
• No non-negativity constraint
• Strong theoretical foundation
• Works well for instantaneous mixtures
• Can handle complex-valued signals

NMF Strengths

• Naturally parts-based representation
• Interpretable components
• Works well with spectrograms
• Can incorporate sparsity constraints
• Suitable for polyphonic music

Advanced Techniques and Variants

Complex NMF

Extends NMF to complex spectrograms, preserving phase information:

S \approx \sum_k W_k \odot H_k \odot e^{j\Phi_k}

Allows direct audio reconstruction without phase estimation

Convolutive NMF

Models temporal patterns with 2D convolution:

V \approx \sum_{t=0}^{T-1} W^{(t)} \cdot \stackrel{\rightarrow}{H^{(t)}}

Captures time-varying spectral patterns like vibrato

Sparse NMF

Adds sparsity constraints for cleaner separation:

\min_{W,H \geq 0} ||V - WH||_F^2 + \lambda_W ||W||_1 + \lambda_H ||H||_1

Encourages part-based representations with fewer active components

Practical Applications in Music

Real-World Application: Harmonic-Percussive Separation

1import numpy as np
2import librosa
3from scipy.ndimage import median_filter
4
5def hpss_with_nmf(audio, sr=22050, n_components=2, kernel_size=(1, 31)):
6    """
7    Harmonic-Percussive Source Separation enhanced with NMF
8    
9    Combines median filtering with NMF for improved separation
10    """
11    # Compute STFT
12    D = librosa.stft(audio)
13    magnitude, phase = np.abs(D), np.angle(D)
14    
15    # Step 1: Initial median filtering separation
16    harmonic = median_filter(magnitude, size=(kernel_size[0], kernel_size[1]))
17    percussive = median_filter(magnitude, size=(kernel_size[1], kernel_size[0]))
18    
19    # Step 2: Apply NMF to refine separation
20    from sklearn.decomposition import NMF
21    
22    # NMF on harmonic component (pitched content)
23    nmf_h = NMF(n_components=n_components, init='nndsvd', max_iter=200)
24    W_h = nmf_h.fit_transform(harmonic.T).T
25    H_h = nmf_h.components_
26    harmonic_refined = W_h @ H_h
27    
28    # NMF on percussive component (transient content)
29    nmf_p = NMF(n_components=n_components, init='nndsvd', max_iter=200)
30    W_p = nmf_p.fit_transform(percussive.T).T
31    H_p = nmf_p.components_
32    percussive_refined = W_p @ H_p
33    
34    # Step 3: Soft masking
35    mask_h = harmonic_refined / (harmonic_refined + percussive_refined + 1e-10)
36    mask_p = percussive_refined / (harmonic_refined + percussive_refined + 1e-10)
37    
38    # Apply masks
39    harmonic_final = magnitude * mask_h
40    percussive_final = magnitude * mask_p
41    
42    # Reconstruct audio
43    harmonic_audio = librosa.istft(harmonic_final * np.exp(1j * phase))
44    percussive_audio = librosa.istft(percussive_final * np.exp(1j * phase))
45    
46    return harmonic_audio, percussive_audio, {
47        'harmonic_basis': W_h,
48        'percussive_basis': W_p,
49        'harmonic_activation': H_h,
50        'percussive_activation': H_p
51    }
52
53# Application: Remix generation
54def create_remix_stems(audio_file, n_stems=4):
55    """
56    Create remixable stems using NMF decomposition
57    """
58    audio, sr = librosa.load(audio_file, sr=22050)
59    
60    # Separate harmonic and percussive first
61    harmonic, percussive, nmf_data = hpss_with_nmf(audio, sr, n_components=n_stems//2)
62    
63    # Further decompose harmonic into bass and melody
64    bass_cutoff = 200  # Hz
65    bass_bin = int(bass_cutoff * 2048 / sr)
66    
67    # Extract bass from harmonic
68    D_harmonic = librosa.stft(harmonic)
69    D_bass = D_harmonic.copy()
70    D_bass[bass_bin:, :] = 0
71    bass = librosa.istft(D_bass)
72    
73    # Extract melody (harmonic minus bass)
74    melody = harmonic - bass
75    
76    # Create stems dictionary
77    stems = {
78        'bass': bass,
79        'melody': melody,
80        'drums': percussive,
81        'original': audio
82    }
83    
84    return stems, nmf_data
85
86# Example usage
87stems, data = create_remix_stems('song.wav')
88for name, audio in stems.items():
89    librosa.output.write_wav(f'stem_{name}.wav', audio, 22050)

Evaluation Metrics

Source Separation Quality Metrics

Signal-to-Distortion Ratio (SDR)

SDR = 10 \log_{10} \frac{||s_{target}||^2}{||e_{interf} + e_{noise} + e_{artif}||^2}

Signal-to-Interference Ratio (SIR)

SIR = 10 \log_{10} \frac{||s_{target}||^2}{||e_{interf}||^2}

Signal-to-Artifacts Ratio (SAR)

SAR = 10 \log_{10} \frac{||s_{target} + e_{interf} + e_{noise}||^2}{||e_{artif}||^2}

Key Takeaways

ICA excels at unmixing: When sources are truly independent and linearly mixed, ICA provides theoretically optimal separation.
NMF offers interpretability: Non-negativity constraints lead to parts-based representations that align with musical structure.
Preprocessing matters: Both methods benefit from appropriate preprocessing (whitening for ICA, scaling for NMF).
Hybrid approaches win: Combining multiple techniques often yields better results than any single method.

References & Resources

Independent Component Analysis: Algorithms and Applications

Hyvärinen & Oja - Comprehensive ICA tutorial

Learning the parts of objects by NMF

Lee & Seung - Seminal NMF paper

Non-negative Matrix Factorization for Polyphonic Music Transcription

Smaragdis & Brown - NMF in music applications

Continue Reading

Statistical Modeling in MIR: HMMs and GMMs

Explore Hidden Markov Models and Gaussian Mixture Models for temporal pattern recognition and clustering in music.