Datasets and solutions for new Music Information Retrieval challenges

About KaraMIR

KaraMIR is a community project started as a joint effort of:

We believe that there is room to improve the music discovery experience. Applications such as Shazam and SoundHound showed how users can identify music quickly, and became very popular. But the demands of musicians and listeners keep growing. Examples of common user requests are:

  • Could our smartphones identify music from a live performance? Will it work for classical music pieces?
  • Can we automatically detect metadata such as sung language, genre, number of singers, etc.?
  • Can we get useful real-time information while music plays in a media player, such as the sounding tones, harmonies, or tempo?

Many of these requests are not fulfilled by corporations, because the market is too small to attract big companies. Nor are they studied by academics, since certain tasks have not yet been formulated and the datasets that could foster such research do not exist.

While we all wait for our music apps to include the features we want, we have started a new effort that can help both corporations and academics discover new challenges. We think that:

  • Creating new datasets with new ground truths is key to propelling new research.
  • Partnerships between academics and corporations are key to helping both sectors develop and think "out of the box".
  • Forming new and more specific tasks in the Music Information Retrieval field is a good way of moving forward, and should not be neglected alongside improving and benchmarking the common tasks.

We call our project KaraMIR to celebrate the first dataset we created with Recisio, the creators of the Karafun application (KaraMIR: data from a Karaoke company for MIR research). The KaraMIR project is not limited to karaoke songs, but karaoke songs are a good example of our project vision. They are a small subset of cover songs that has not yet been fully explored. Can karaoke songs be effectively identified and searched? Are there enough datasets to foster such research? What do we gain from forming a partnership such as the one with Recisio, and what useful information do we get from analyzing karaoke songs? Visit our Kara1k dataset to find out more.

KaraMIR is a community project. You can find our source code on GitHub, and we are always open to collaboration. Do not hesitate to share your ideas with us!

    KaraMIR team