KaraMIR


Datasets and solutions for new Music Information Retrieval challenges

KaraMIR: Project description

At a glance
  • KaraMIR is a community project that aims to create data, know-how, and new partnerships for developing new-generation music discovery applications. Read about our vision.
  • We partnered with Recisio to provide datasets and analysis of karaoke songs from Karafun. Kara1k, the 1,000-song dataset, is now available.
Maintainers
  • Yann Bayle, LaBRI, CNRS, Univ. Bordeaux
  • Ladislav Maršík, Charles University, Prague
  • Martin Rusek, IT4Innovations, Ostrava

What's new

Kara1k dataset is now available 11th August 2017 by lmarsik

Kara1k

We introduce Kara1k - a dataset composed of 1,000 analyzed song titles, each in 2 versions, thanks to a partnership with Recisio (Karafun mobile application).
KaraMIR community project started 9th August 2017 by lmarsik


Kara1k


A karaoke dataset for cover song identification and singing voice analysis

Kara1k is a freely available dataset of audio features from karaoke songs provided by Recisio (Karafun mobile application).

Kara1k is mainly intended for cover song identification and singing voice analysis. The dataset is divided into 1,000 cover songs from the Karafun application and the corresponding 1,000 songs by the original authors.

It contains:

  • A wide variety of features, extracted with Essentia [1], harmony-analyser [2], Marsyas [3], Vamp plugins [4][5] and YAAFE [6].
  • Metadata such as the title, genre, original author, year, International Standard Recording Code and language of the lyrics. We also include non-standard metadata such as explicit language annotation, the combined gender of the lead singer and backing vocalists, and annotation of duets.
  • Pure singing voice features alongside backing track features and mixes. The karaoke tracks from Recisio are recorded in studio quality by professional musicians, and include the lead singer track used for guidance in the Karafun mobile application. Therefore, we can provide 3 tracks for cover songs (lead voice, backing track, mix) and the original song. The cover and the original are both professional mixes (as opposed to amateur user-recorded audio), which allows for an interesting comparison.

Metadata and ground truth

Table 1 shows all supported metadata / ground truths, which are available as XLS and CSV files or as an SQL database dump. For each annotation, we specify which part of the Kara1k dataset it describes (Cover/Origin/Both).

Table 1: Available metadata / ground truths in Kara1k
Download all as XLS, CSV, or SQL
Please reach out to us with requests for new metadata / ground truths
| Name | Type | Cover/Origin | Description |
|---|---|---|---|
| Title | String | Both | The official name of the song |
| Artist | String | Both | The original artist |
| Genre | Comma-separated list | Cover | Comma-separated list of all genres specified by Recisio |
| Gender | Enum | Cover | Gender of the singers and vocalists sounding together; one of {mixed, male, males, female, females} |
| Language | Comma-separated list | Cover | Comma-separated list of all languages sung in the song |
| Backing vocals | Boolean | Cover | The song contains backing vocals |
| Duet | Boolean | Cover | The song contains duets |
| Year | Numeric | Origin | Year in which the original recording was published |
| Explicit | Boolean | Cover | The song lyrics contain explicit language |
| ISRC | String | Origin | ISRC of the original recording |
| YouTube link | URL | Origin | YouTube URL |
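
As an illustration of how the CSV export might be consumed, the following is a minimal sketch in Python. It assumes that the CSV header uses the names from Table 1 and that booleans are written as "true"/"false"; neither is confirmed here, and the two sample rows are made up for the example rather than taken from Kara1k.

```python
import csv
import io

# Hypothetical two-row sample. The header names mirror Table 1, but the real
# CSV export may use different names, delimiters, or boolean encodings.
SAMPLE_CSV = """Title,Artist,Genre,Gender,Language,Backing vocals,Duet,Explicit
Example Song A,Example Artist A,"Pop, Rock",female,English,true,false,false
Example Song B,Example Artist B,Country,mixed,"English, French",true,true,false
"""


def parse_bool(text):
    """Interpret common truthy spellings (the actual encoding is an assumption)."""
    return text.strip().lower() in {"1", "true", "yes", "y"}


def split_list(text):
    """Split a comma-separated cell such as Genre or Language into items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def load_metadata(handle):
    """Read metadata rows and convert list and boolean columns to Python types."""
    rows = []
    for row in csv.DictReader(handle):
        row["Genre"] = split_list(row["Genre"])
        row["Language"] = split_list(row["Language"])
        for key in ("Backing vocals", "Duet", "Explicit"):
            row[key] = parse_bool(row[key])
        rows.append(row)
    return rows


songs = load_metadata(io.StringIO(SAMPLE_CSV))
duets = [song["Title"] for song in songs if song["Duet"]]
print(duets)
```

To read a downloaded file instead of the inline sample, the same function can be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.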

Features

Below is the full list of features (Table 2) with download links. Features are provided as TXT, JSON or ARFF files, depending on their type and extractor. Some features have prerequisites: they are high-level features that depend on other features (e.g. on chroma vectors) and not solely on the WAV file. Such prerequisites are listed in the Prerequisites column.
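
For anyone re-extracting the features, the prerequisites can be treated as a dependency graph. The sketch below is an illustration only, not part of the Kara1k tooling: it transcribes the Prerequisites column of Table 2 into a dictionary and uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which every prerequisite is computed before the features that need it. The frame and timestamp variants of a feature are collapsed into one name for brevity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Prerequisites transcribed from Table 2. Features computed directly from the
# audio map to an empty list. Frame/timestamp variants share one entry here.
PREREQUISITES = {
    "Chroma vectors": [],
    "Chordino Labels": [],
    "Chordino Tones": [],
    "Key": [],
    "Chord vectors": ["Chordino Tones"],
    "Key vectors": ["Key"],
    "TPS distance": ["Chordino Labels", "Chordino Tones", "Key"],
    "CC distance": ["Chroma vectors", "Chordino Labels"],
    "Average CC distance": ["Chroma vectors", "Chordino Labels"],
    "Chroma CD": ["Chroma vectors"],
}


def extraction_order(prerequisites):
    """Return feature names ordered so that every prerequisite comes first."""
    return list(TopologicalSorter(prerequisites).static_order())


order = extraction_order(PREREQUISITES)
print(order)
```

Several valid orders exist; the only guarantee is that each feature appears after all of its prerequisites.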

We can distinguish 3 types based on the feature structure:

  • Frame: a time series with a fixed frame rate. Each line of the TXT file represents 1 frame and contains a value or array. One feature per file.
  • Timestamp: each line of the TXT file has the form timestamp: value or array. One feature per file.
  • Single: multiple logically grouped features are included in one file. Essentia uses a JSON file of the form { "name1": value, array or JSON, "name2": value, array or JSON, ... }. Marsyas uses an ARFF file with headers of the form @attribute name type, followed by @data and comma-separated values.
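
The three structures above could be read along the following lines. This is a minimal sketch under stated assumptions: numeric values are taken to be plain decimal numbers separated by spaces, commas or brackets, timestamps are taken to be decimal seconds, and ARFF attribute names are assumed to contain no spaces. The function names and the sample lines are hypothetical. Features with non-numeric values, such as chord or key labels, would need a different value parser, and Essentia JSON files can be read directly with `json.load`.

```python
import re

# Matches plain decimal numbers, optionally signed and with an exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_values(text):
    """Extract all numbers from a 'value or array' string as floats."""
    return [float(token) for token in NUMBER.findall(text)]


def parse_frame_lines(lines, frame_rate):
    """Frame type: one frame per line; time is inferred as index / frame_rate."""
    non_empty = (line for line in lines if line.strip())
    return [(index / frame_rate, parse_values(line))
            for index, line in enumerate(non_empty)]


def parse_timestamp_lines(lines):
    """Timestamp type: each line is 'timestamp: value or array'."""
    events = []
    for line in lines:
        if not line.strip():
            continue
        stamp, _, payload = line.partition(":")
        events.append((float(stamp), parse_values(payload)))
    return events


def parse_arff(lines):
    """Single type (Marsyas): '@attribute name type' headers, then '@data' rows."""
    names, rows, in_data = [], [], False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        if in_data:
            rows.append(dict(zip(names, line.split(","))))  # values kept as strings
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return rows


# Made-up sample lines, for illustration only.
frames = parse_frame_lines(["0.1 0.2", "0.3 0.4"], frame_rate=10.0)
events = parse_timestamp_lines(["0.5: 1 0 0", "1.25: 0 1 0"])
table = parse_arff(["@relation demo", "@attribute mean_zcr real",
                    "@attribute label string", "@data", "0.12,music"])
print(frames, events, table)
```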

Table 2: Available features in Kara1k
You can also download all features for cover songs / origin songs, or particular features from the table below.
Please reach out to us with requests for new features or different parameters
| Feature extractor | Feature | Type | Description | Prerequisites | Parameters | Download |
|---|---|---|---|---|---|---|
| Essentia [1] | 47 low-level features | Single | Mean, median, var, dmean, dmean2, dvar, dvar2, max and min of: barkbands, dissonance, erbbands, hfc, melbands, pitch salience, silence rates, spectral, gfcc and mfcc features | none | version: 2.1-beta2; extractor: music 1.0 | cover/origin |
| | 13 rhythm features | Single, Timestamp | beats (count, bpm, loudness, 6 histograms), danceability, onset rate, beat position timestamps | none | version: 2.1-beta2; extractor: music 1.0 | |
| | 24 harmony features | Single | chord (changes rate, number rate, strength, histogram, key based on chords - label, min/maj), key (label, min/maj, strength), tuning (strength, equal tempered deviation, frequency, nontempered energy ratio), hpcp (entropy, mean, median, var, dmean, dmean2, dvar, dvar2, min and max), thpcp | none | version: 2.1-beta2; extractor: music 1.0 | |
| | 9 extracted metadata | Single | sample rate, bitrate, equal_loudness, length, lossless, replay gain, codec, downmix, md5_encoded | none | version: 2.1-beta2; extractor: music 1.0 | |
| harmony-analyser [2] | Chord vectors frame | Frame | 12-dimensional chord vectors, for each frame | Chordino Tones | version: 1.2-beta | cover/origin |
| | Chord vectors | Timestamp | 12-dimensional chord vectors, with timestamps | Chordino Tones | version: 1.2-beta | cover/origin |
| | Key vectors frame | Frame | 12-dimensional key vectors, for each frame | Key | version: 1.2-beta; frameRate: 10 Hz | cover/origin |
| | Key vectors | Timestamp | 12-dimensional key vectors, with timestamps | Key | version: 1.2-beta; frameRate: 10 Hz | cover/origin |
| | TPS distance frame | Frame | 1-dimensional series of TPS distances, for each frame | Chordino Labels, Chordino Tones, Key | version: 1.2-beta; frameRate: 10 Hz | cover/origin |
| | TPS distance | Timestamp | 1-dimensional series of TPS distances, with timestamps | Chordino Labels, Chordino Tones, Key | version: 1.2-beta; none | cover/origin |
| | CC distance | Frame | 1-dimensional series of CC distances, for each frame | Chroma Vectors, Chordino Labels | version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7; frameRate: 10 Hz | cover/origin |
| | CC distance | Timestamp | 1-dimensional series of CC distances, with timestamps | Chroma Vectors, Chordino Labels | version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7 | cover/origin |
| | Average CC distance | Single | Mean of CC distances | Chroma Vectors, Chordino Labels | version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7 | cover/origin |
| | Chroma CD | Frame | 1-dimensional series of chroma vector differences, for each frame | Chroma Vectors | audibleThreshold: 0.07; frameRate: 10 Hz | cover/origin |
| Marsyas [3] | 68 scalar features | Single | Means and standard deviations of: ZeroCrossing, Spectral Centroid, Rolloff, Flux, MFCC0-12, each calculated in 2 settings (2 * 2 * 17 features) | none | version: 0.5; window: Hamming; hopSize: 512; winSize: 1024; accSize: 898; singleVector: yes | cover/origin |
| Vamp plugins [4][5] | Chroma vectors | Frame | 12-dimensional chroma vectors, for each frame | none | version: 1.1; block size: 16384; useNNLS: 1; rollon: 1; tuningMode: 0; whitening: 1; s: 0.7; chromanormalize: 0; frameRate: 10.75 Hz | cover/origin |
| | Chordino Labels | Timestamp | Chord labels, with timestamps | none | version: 1.1; block size: 16384; useNNLS: 1; tuningMode: 0; whitening: 1; s: 0.7; boostn: 0.1; usehartesyntax: 0 | cover/origin |
| | Chordino Tones | Timestamp | Chord tones, with timestamps | none | version: 1.1; block size: 16384; useNNLS: 1; tuningMode: 0; whitening: 1; s: 0.7; boostn: 0.1; usehartesyntax: 0 | cover/origin |
| | Key | Timestamp | Key labels, with timestamps | none | version: 1.7.1; tuning: 440; length: 10 | cover/origin |
| YAAFE [6] | MFCC | Frame | 13-dimensional MFCC, for each frame | none | version: 0.65; sampleRate: 22050; blockSize: 2048; stepSize: 1024; normalize: -1; resample: yes; frameRate: 21.52 Hz | cover/origin |
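
The Frame features in Table 2 use different frame rates (10 Hz, 10.75 Hz and 21.52 Hz), so comparing them frame by frame requires bringing them onto a common time grid first. The following is one possible sketch, assuming that frame i starts at time i / frameRate. It uses simple nearest-frame selection, which is a choice made for this illustration and not a method prescribed by the dataset; the function name is hypothetical.

```python
def resample_nearest(frames, source_rate, target_rate):
    """Resample a list of frames to target_rate by picking the nearest source frame.

    frames is a list with one entry per frame (a value or a list of values).
    Frame i is assumed to start at time i / source_rate.
    """
    if not frames:
        return []
    # Number of target frames that fit into the duration of the source series.
    target_count = int(len(frames) * target_rate / source_rate)
    resampled = []
    for i in range(target_count):
        # Index of the source frame closest in time to target frame i.
        j = min(len(frames) - 1, round(i * source_rate / target_rate))
        resampled.append(frames[j])
    return resampled


# Made-up series: 8 frames at 20 Hz reduced to 10 Hz.
series = [0, 1, 2, 3, 4, 5, 6, 7]
print(resample_nearest(series, source_rate=20.0, target_rate=10.0))
```

For smoother results, averaging or interpolating between neighbouring frames may be preferable to nearest-frame selection, at the cost of slightly more code.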

References

[1] D. Bogdanov, N. Wack, E. Gómez, S. Gulati, and P. Herrera, “Essentia: An Audio Analysis Library for Music Information Retrieval,” in Proc. Int. Soc. Music Inform. Retrieval Conf., Curitiba, Brazil, 2013, pp. 493–498.
[2] L. Maršík, “harmony-analyser.org - Java Library and Tools for Chordal Analysis,” in Proceedings of 2016 Joint WOCMAT-IRCAM Forum Conference, Taoyuan City, Taiwan, 2016, pp. 38–43.
[3] G. Tzanetakis and P. Cook, “Marsyas: A framework for audio analysis,” Organised Sound, vol. 4, no. 3, pp. 169–175, 2000.
[4] M. Mauch and S. Dixon, “Approximate Note Transcription for the Improved Identification of Difficult Chords,” in Proc. Int. Soc. Music Inform. Retrieval Conf., Utrecht, Netherlands, 2010, pp. 135–140.
[5] K. Noland and M. Sandler, “Signal processing parameters for tonality estimation,” in Audio Engineering Society Convention 122, Vienna, Austria, 2007.
[6] B. Mathieu, S. Essid, T. Fillon, J. Prado, and G. Richard, “YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software,” in Proc. Int. Soc. Music Inform. Retrieval Conf., Utrecht, Netherlands, 2010, pp. 441–446.