Music Instrument Source Separation

Current architecture (link to GitHub):

A 1-D, 6-layer U-Net operating directly in the time domain, trained on 70 songs from the DSD100 dataset. A separate model is trained for each instrument: bass, drums, and vocals.
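For reference, here is a minimal PyTorch sketch of what a 6-layer, time-domain 1-D U-Net for a single instrument could look like. The channel counts, kernel size, and stride are placeholders chosen so the tensor shapes line up, not the values used in the actual repository.

```python
import torch
import torch.nn as nn

class UNet1D(nn.Module):
    """Six strided Conv1d encoder blocks, six ConvTranspose1d decoder blocks,
    with skip connections. Input length must be divisible by 2**depth."""

    def __init__(self, depth=6, base_channels=16, kernel_size=8, stride=2):
        super().__init__()
        pad = (kernel_size - stride) // 2        # lengths halve/double exactly
        channels = [base_channels * 2 ** i for i in range(depth)]   # 16 .. 512
        self.encoders, self.decoders = nn.ModuleList(), nn.ModuleList()
        in_ch = 1
        for out_ch in channels:
            self.encoders.append(nn.Sequential(
                nn.Conv1d(in_ch, out_ch, kernel_size, stride, pad), nn.ReLU()))
            in_ch = out_ch
        for out_ch in list(reversed(channels[:-1])) + [1]:          # 256 .. 16, 1
            self.decoders.append(nn.Sequential(
                nn.ConvTranspose1d(in_ch, out_ch, kernel_size, stride, pad),
                nn.ReLU() if out_ch > 1 else nn.Tanh()))
            in_ch = out_ch * 2                   # skip connection doubles channels

    def forward(self, mix):
        # mix: (batch, 1, samples) mono waveform; returns one estimated stem
        x, skips = mix, []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
        skips.pop()                              # deepest feature map is the bottleneck
        for dec in self.decoders:
            x = dec(x)
            if skips:
                x = torch.cat([x, skips.pop()], dim=1)
        return x

model = UNet1D()
stem = model(torch.randn(1, 1, 16000 * 4))       # 4 seconds of audio at 16 kHz
```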

Good news, everyone!

A new dataset (MedleyDB) has arrived. This will enable fine-tuning and make room for exploring a slightly more complex model: adding a VQ-VAE.
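To make the VQ-VAE idea concrete, below is a sketch of a vector-quantization bottleneck that could sit between the U-Net encoder and decoder. The codebook size, code dimension, and commitment weight are assumptions for illustration, not settings from this project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """VQ-VAE bottleneck: snap each latent vector to its nearest codebook entry
    and pass gradients straight through. Codebook size/dim are placeholders."""

    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):
        # z: (batch, code_dim, time) latents from the encoder
        flat = z.permute(0, 2, 1).reshape(-1, z.size(1))        # (B*T, D)
        indices = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        quantized = self.codebook(indices).view(z.size(0), z.size(2), z.size(1))
        quantized = quantized.permute(0, 2, 1)                  # back to (B, D, T)
        # codebook + commitment losses (standard VQ-VAE objective)
        loss = F.mse_loss(quantized, z.detach()) \
             + self.beta * F.mse_loss(quantized.detach(), z)
        # straight-through estimator: copy gradients from quantized to z
        quantized = z + (quantized - z).detach()
        return quantized, loss, indices

vq = VectorQuantizer()
z_q, vq_loss, codes = vq(torch.randn(2, 64, 250))
```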

Demo track: "Don't Go" - Sanulrim

Original mix: 48000 samples per second

Bass: 16000 samples per second

Drum: 16000 samples per second

Vocal: 16000 samples per second
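The separated stems come out at 16 kHz while the original mix is at 48 kHz, so the mix presumably gets downsampled before it goes through the models. A quick sketch of that step; the file name and the use of torchaudio are assumptions, only the 48 kHz → 16 kHz rates come from the listing above.

```python
import torchaudio

# Load the 48 kHz mix and resample it to the 16 kHz rate the models work at.
mix, sr = torchaudio.load("dont_go_mix.wav")        # hypothetical file; sr == 48000
resample = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
mix_16k = resample(mix.mean(dim=0, keepdim=True))   # downmix to mono, then resample
```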

A long way to go! 

There are still areas of underperformance: for instance, the model struggles to separate vocals accurately when the vocalist speaks rather than sings.

I want better, cleaner separation.
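One way to make "better and cleaner" measurable is to track SI-SDR against the reference stems in DSD100/MedleyDB. A minimal sketch of that metric, not something currently in the repository:

```python
import torch

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB; both arguments are 1-D waveforms
    at the same sample rate. Higher is cleaner."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # project the estimate onto the reference to remove scale differences
    scale = torch.dot(estimate, reference) / (torch.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * torch.log10((target.pow(2).sum() + eps) / (noise.pow(2).sum() + eps))
```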

What else is coming?

GitHub