KaraMIR

Kara1k

A karaoke dataset for cover song identification and singing voice analysis

Kara1k is a freely available dataset of audio features from karaoke songs provided by Recisio (Karafun mobile application).

Kara1k is intended mainly for cover song identification and singing voice analysis. The dataset consists of 1,000 cover songs from the Karafun application and the corresponding 1,000 songs by the original artists.

It contains:

A wide variety of features, extracted with Essentia [1], harmony-analyser [2], Marsyas [3], Vamp plugins [4][5] and YAAFE [6].
Metadata such as the title, genre, original artist, year, International Standard Recording Code (ISRC) and language of the lyrics. We also include non-standard metadata such as an explicit-language annotation, the combined gender of the singer and backing vocalists, and an annotation of duets.
Pure singing voice features alongside backing track features and mixes. The karaoke tracks from Recisio are recorded in studio quality by professional musicians, including the lead singer track used for guidance in the Karafun mobile application. We can therefore provide 3 tracks for each cover song (lead voice, backing track, mix) as well as the original song. The cover and the original are both professional mixes (as opposed to amateur user-recorded audio), which allows for an interesting comparison.

Metadata and ground truth

Table 1 lists all supported metadata / ground truths, which are available as XLS and CSV files or as an SQL database dump. For each annotation, we specify which part of the Kara1k dataset it describes (Cover/Origin/Both).

Table 1: Available metadata / ground truths in Kara1k

Download all as XLS, CSV, or SQL

Please reach out to us with requests for new metadata / ground truths.

Name	Type	Cover/Origin	Description
Title	String	Both	The official name of the song
Artist	String	Both	The original artist
Genre	Comma-separated list	Cover	Comma-separated list of all genres specified by Recisio
Gender	Enum	Cover	Combined gender of the lead singers and backing vocalists, one of {mixed, male, males, female, females}
Language	Comma-separated list	Cover	Comma-separated list of all languages sung in the song
Backing vocals	Boolean	Cover	The song contains backing vocals
Duet	Boolean	Cover	The song contains duets
Year	Numeric	Origin	Year in which the original recording was published
Explicit	Boolean	Cover	The song lyrics contain explicit language
ISRC	String	Origin	ISRC of the original recording
YouTube link	URL	Origin	YouTube URL of the original recording
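
As an illustration of how the types in Table 1 could be handled, here is a minimal Python sketch that converts raw CSV rows into typed values. It is not part of the dataset tooling. It assumes that the CSV column headers match the names in Table 1 and that booleans are stored as "true"/"false" (or "1"/"0"); the actual files may differ. The sample row is made up for the example and does not come from Kara1k.

```python
import csv
import io

# Allowed values of the Gender field, as listed in Table 1.
GENDER_VALUES = {"mixed", "male", "males", "female", "females"}
LIST_FIELDS = ("Genre", "Language")               # comma-separated lists
BOOL_FIELDS = ("Backing vocals", "Duet", "Explicit")


def parse_metadata_row(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    parsed = dict(row)
    for name in LIST_FIELDS:
        parsed[name] = [item.strip() for item in row[name].split(",") if item.strip()]
    for name in BOOL_FIELDS:
        # Assumption: booleans are encoded as "true"/"false" or "1"/"0".
        parsed[name] = row[name].strip().lower() in {"true", "1", "yes"}
    if row["Gender"] not in GENDER_VALUES:
        raise ValueError(f"unexpected Gender value: {row['Gender']!r}")
    parsed["Year"] = int(row["Year"]) if row["Year"].strip() else None
    return parsed


def load_metadata(text):
    """Parse CSV text with a header line into a list of typed rows."""
    return [parse_metadata_row(r) for r in csv.DictReader(io.StringIO(text))]


# Hypothetical sample, only to show the expected shape of the input.
SAMPLE = (
    "Title,Artist,Genre,Gender,Language,Backing vocals,Duet,Year,Explicit\n"
    'Example Song,Example Artist,"Pop, Rock",female,English,true,false,1999,false\n'
)

rows = load_metadata(SAMPLE)
print(rows[0]["Genre"], rows[0]["Backing vocals"], rows[0]["Year"])
```

To read a downloaded file instead of the sample, pass its contents to `load_metadata`, after adjusting the column names if the real headers differ.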

Features

Below is the full list of features (Table 2) with download links. Features are provided as TXT, JSON or ARFF files, depending on their type and extractor. Some features have prerequisites: they are high-level features that depend on other features (e.g. chroma vectors) and not solely on the WAV file. If so, the prerequisites are listed in the Prerequisites column.

We can distinguish 3 types based on the feature structure:

Frame: a time series with a fixed frame rate. Each line of the TXT file represents 1 frame and contains
```
value or array
```
One feature per file.
Timestamp: each line of the TXT file has the form
```
timestamp: value or array
```
One feature per file.
Single: multiple logically grouped features are included in one file. Essentia uses a JSON file
```
{ "name1": value, array or JSON, "name2": value, array or JSON, ... }
```
Marsyas uses an ARFF file with headers
```
@attribute name type
```
followed by
```
@data
```
and comma-separated values.
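
The three layouts described above can be read with a few lines of Python. The following is a rough sketch, not an official loader. It assumes that numeric arrays are written as numbers separated by spaces or commas (optionally in brackets), which is a guess about the exact TXT layout, and all function names are hypothetical. Essentia JSON files can be loaded directly with `json.load` from the standard library.

```python
import re

# Matches integers, decimals and scientific notation, with optional sign.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_values(text):
    """Return all numbers found in text as floats (brackets, commas ignored)."""
    return [float(token) for token in _NUMBER.findall(text)]


def parse_frame_line(line):
    """Frame type: one line holds the value or array for one frame."""
    return parse_values(line)


def parse_timestamp_line(line):
    """Timestamp type: 'timestamp: value or array'.

    The value is returned as a raw string, because some features
    (e.g. chord or key labels) are not numeric. Use parse_values on it
    when the feature is numeric.
    """
    stamp, _, rest = line.partition(":")
    return float(stamp), rest.strip()


def parse_arff(text):
    """Single type (Marsyas): return attribute names and raw data rows."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


print(parse_frame_line("[0.1, 0.2, 0.3]"))
print(parse_timestamp_line("1.5: 0 1 0"))
print(parse_arff("@attribute a real\n@attribute b real\n@data\n1.0,2.0\n"))
```

The ARFF reader is deliberately simple: it does not handle quoted attribute names or sparse rows, which a full ARFF parser would need to support.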

Table 2: Available features in Kara1k

Download all as ZIP (1.3GB) or separately as Essentia, harmony-analyser, Marsyas, Vamp plugins, YAAFE.

You can also download all features for cover songs / origin songs, or particular features in the table below.

Please reach out to us with requests for new features or different parameters.

Feature extractor	Feature	Type	Description	Prerequisites	Parameters	Download
Essentia [1]	47 low-level features	Single	Mean, median, var, dmean, dmean2, dvar, dvar2, max and min of: barkbands, dissonance, erbbands, hfc, melbands, pitch salience, silence rates, spectral, gfcc and mfcc features	none	version: 2.1-beta2; extractor: music 1.0	cover/origin
	13 rhythm features	Single, Timestamp	beats (count, bpm, loudness, 6 histograms), danceability, onset rate, beat position timestamps	none	version: 2.1-beta2; extractor: music 1.0
	24 harmony features	Single	chord (changes rate, number rate, strength, histogram, key based on chords - label, min/maj), key (label, min/maj, strength), tuning (strength, equal tempered deviation, frequency, nontempered energy ratio), hpcp (entropy, mean, median, var, dmean, dmean2, dvar, dvar2, min and max), thpcp	none	version: 2.1-beta2; extractor: music 1.0
	9 extracted metadata	Single	sample rate, bitrate, equal_loudness, length, lossless, replay gain, codec, downmix, md5_encoded	none	version: 2.1-beta2; extractor: music 1.0
harmony-analyser [2]	Chord vectors frame	Frame	12-dimensional chord vectors, for each frame	Chordino Tones	version: 1.2-beta	cover/origin
	Chord vectors	Timestamp	12-dimensional chord vectors, with timestamps	Chordino Tones	version: 1.2-beta	cover/origin
	Key vectors frame	Frame	12-dimensional key vectors, for each frame	Key	version: 1.2-beta; frameRate: 10 Hz	cover/origin
	Key vectors	Timestamp	12-dimensional key vectors, with timestamps	Key	version: 1.2-beta; frameRate: 10 Hz	cover/origin
	TPS distance frame	Frame	1-dimensional series of TPS distances, for each frame	Chordino Labels, Chordino Tones, Key	version: 1.2-beta; frameRate: 10 Hz	cover/origin
	TPS distance	Timestamp	1-dimensional series of TPS distances, with timestamps	Chordino Labels, Chordino Tones, Key	version: 1.2-beta; none	cover/origin
	CC distance	Frame	1-dimensional series of CC distances, for each frame	Chroma Vectors, Chordino Labels	version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7; frameRate: 10 Hz	cover/origin
	CC distance	Timestamp	1-dimensional series of CC distances, with timestamps	Chroma Vectors, Chordino Labels	version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7	cover/origin
	Average CC distance	Single	Mean of CC distances	Chroma Vectors, Chordino Labels	version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7	cover/origin
	Chroma CD	Frame	1-dimensional series of chroma vector differences, for each frame	Chroma Vectors	audibleThreshold: 0.07; frameRate: 10 Hz	cover/origin
Marsyas [3]	68 scalar features	Single	Means and standard deviations of: ZeroCrossing, Spectral Centroid, Rolloff, Flux, MFCC0-12, each calculated in 2 settings (2 * 2 * 17 features)	none	version: 0.5; window: Hamming; hopSize: 512; winSize: 1024; accSize: 898; singleVector: yes	cover/origin
Vamp plugins [4][5]	Chroma vectors	Frame	12-dimensional chroma vectors, for each frame	none	version: 1.1; block size: 16384; useNNLS: 1; rollon: 1; tuningMode: 0; whitening: 1; s: 0.7; chromanormalize: 0; frameRate: 10.75 Hz	cover/origin
	Chordino Labels	Timestamp	Chord labels, with timestamps	none	version: 1.1; block size: 16384; useNNLS: 1; tuningMode: 0; whitening: 1; s: 0.7; boostn: 0.1; usehartesyntax: 0	cover/origin
	Chordino Tones	Timestamp	Chord tones, with timestamps	none	version: 1.1; block size: 16384; useNNLS: 1; tuningMode: 0; whitening: 1; s: 0.7; boostn: 0.1; usehartesyntax: 0	cover/origin
	Key	Timestamp	Key labels, with timestamps	none	version: 1.7.1; tuning: 440; length: 10	cover/origin
YAAFE [6]	MFCC	Frame	13-dimensional MFCC, for each frame	none	version: 0.65; sampleRate: 22050; blockSize: 2048; stepSize: 1024; normalize: -1; resample: yes; frameRate: 21.52 Hz	cover/origin
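
The Prerequisites column of Table 2 defines a small dependency graph. The sketch below is an illustration, not the way harmony-analyser itself schedules its work. It transcribes that column into a Python dictionary and uses `graphlib.TopologicalSorter` (Python 3.9+) to list the files needed for a given feature, prerequisites first. The dictionary keys follow the feature names in the table, with the frame and timestamp variants merged; `extraction_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Feature -> prerequisites, transcribed from the Prerequisites column of Table 2.
PREREQUISITES = {
    "Chroma vectors": [],
    "Chordino Labels": [],
    "Chordino Tones": [],
    "Key": [],
    "Chord vectors": ["Chordino Tones"],
    "Key vectors": ["Key"],
    "TPS distance": ["Chordino Labels", "Chordino Tones", "Key"],
    "CC distance": ["Chroma vectors", "Chordino Labels"],
    "Average CC distance": ["Chroma vectors", "Chordino Labels"],
    "Chroma CD": ["Chroma vectors"],
}


def extraction_order(target):
    """Return the target and everything it depends on, prerequisites first."""
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = PREREQUISITES[name]
            stack.extend(PREREQUISITES[name])
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(needed).static_order())


print(extraction_order("TPS distance"))
```

For "TPS distance" the result ends with the feature itself, preceded by its three prerequisites in an unspecified order.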

References

[1] D. Bogdanov, N. Wack, E. Gómez, S. Gulati, and P. Herrera, “Essentia: An Audio Analysis Library for Music Information Retrieval,” in Proc. Int. Soc. Music Inform. Retrieval Conf., Curitiba, Brazil, 2013, pp. 493–498.

[2] L. Maršík, “harmony-analyser.org - Java Library and Tools for Chordal Analysis,” in Proceedings of 2016 Joint WOCMAT-IRCAM Forum Conference, Taoyuan City, Taiwan, 2016, pp. 38–43.

[3] G. Tzanetakis and P. Cook, “Marsyas: A framework for audio analysis,” Organised Sound, vol. 4, no. 3, pp. 169–175, 2000.

[4] M. Mauch and S. Dixon, “Approximate Note Transcription for the Improved Identification of Difficult Chords,” in Proc. Int. Soc. Music Inform. Retrieval Conf., Utrecht, Netherlands, 2010, pp. 135–140.

[5] K. Noland and M. Sandler, “Signal processing parameters for tonality estimation,” in Audio Engineering Society Convention 122, Vienna, Austria, 2007.

[6] B. Mathieu, S. Essid, T. Fillon, J. Prado, and G. Richard, “YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software,” in Proc. Int. Soc. Music Inform. Retrieval Conf., Utrecht, Netherlands, 2010, pp. 441–446.
