Kara1k is a freely available dataset of audio features extracted from karaoke songs provided by Recisio (Karafun mobile application).
Kara1k is mainly intended for cover song identification and singing voice analysis. The dataset consists of 1,000 cover songs from the Karafun application and the corresponding 1,000 songs by the original artists.
It contains:
Table 1 lists all supported metadata and ground truths, which are available as XLS and CSV files or as an SQL database dump. For each annotation, we specify which part of the Kara1k dataset it describes (Cover/Origin/Both).
Name | Type | Cover/Origin | Description |
---|---|---|---|
Title | String | Both | The official name of the song |
Artist | String | Both | The original artist |
Genre | Comma-separated list | Cover | Comma-separated list of all genres specified by Recisio |
Gender | Enum | Cover | Gender of the singers and vocalists singing together; one of {mixed, male, males, female, females} |
Language | Comma-separated list | Cover | Comma-separated list of all languages sung in the song |
Backing vocals | Boolean | Cover | The song contains backing vocals |
Duet | Boolean | Cover | The song contains duets |
Year | Numeric | Origin | Year in which the original recording was published |
Explicit | Boolean | Cover | The song lyrics contain explicit language |
ISRC | String | Origin | ISRC of the original recording |
Youtube link | URL | Origin | YouTube URL |
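As a rough illustration of how the metadata in Table 1 could be consumed, the following Python sketch parses one CSV row into typed values. The sample row, the exact column headers, the delimiter and the boolean encoding (`1`/`0`) are all assumptions made for this example; the actual Kara1k CSV files may differ.

```python
import csv
import io

# Hypothetical CSV excerpt: header names follow Table 1, but the real file's
# column names, delimiter and boolean encoding are assumptions.
SAMPLE = """Title,Artist,Genre,Gender,Language,Backing vocals,Duet,Explicit
Example Song,Example Artist,"pop, rock",female,"english, french",1,0,0
"""

LIST_FIELDS = ("Genre", "Language")
BOOL_FIELDS = ("Backing vocals", "Duet", "Explicit")
GENDERS = {"mixed", "male", "males", "female", "females"}


def parse_row(row):
    """Convert one raw CSV row (dict of strings) into typed values."""
    out = dict(row)
    for name in LIST_FIELDS:
        # Comma-separated list -> list of trimmed, non-empty strings.
        out[name] = [item.strip() for item in row[name].split(",") if item.strip()]
    for name in BOOL_FIELDS:
        # Assumed boolean encoding; adjust to whatever the real files use.
        out[name] = row[name].strip().lower() in {"1", "true", "yes"}
    if out["Gender"] not in GENDERS:
        raise ValueError(f"unexpected Gender value: {out['Gender']!r}")
    return out


def load_metadata(text):
    """Parse CSV text into a list of typed rows."""
    return [parse_row(row) for row in csv.DictReader(io.StringIO(text))]


songs = load_metadata(SAMPLE)
print(songs[0]["Genre"], songs[0]["Backing vocals"])
```

Validating the Gender field against the five values listed in Table 1 is a cheap way to detect a column mismatch early.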
Below is the full list of features (Table 2) with download links. Features are stored as TXT, JSON or ARFF files, depending on their type and extractor. Some high-level features have prerequisites because they depend on other features (e.g. chroma vectors) and not solely on the WAV file. If so, the prerequisites are listed in the Prerequisites column.
We can distinguish 3 types based on the feature structure:
- A single value or array. One feature per file.
- A "timestamp: value or array" entry on each line of a TXT file. One feature per file.
- A set of named features, stored either in a JSON file of the form { "name1": value, array or JSON, "name2": value, array or JSON, ... } or, for Marsyas, in an ARFF file with "@attribute name type" headers followed by "@data" and comma-separated values. Multiple logically grouped features are included in one file.
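The three file structures described above could be read with a few lines of Python. This is a minimal sketch under stated assumptions: the separator used inside an array (whitespace, commas or brackets), the sample lines and the attribute names `feature_a` and `feature_b` are hypothetical, and the ARFF reader ignores quoting and sparse rows that a full ARFF parser would handle.

```python
import json
import re


def to_number(token):
    """Return token as a float when possible, otherwise as stripped text."""
    try:
        return float(token)
    except ValueError:
        return token.strip()


def parse_timestamp_line(line):
    """Parse one 'timestamp: value or array' line into (seconds, value)."""
    stamp, _, payload = line.partition(":")
    # Assumed array syntax: numbers separated by whitespace, commas or brackets.
    tokens = [t for t in re.split(r"[\s,\[\]]+", payload.strip()) if t]
    try:
        numbers = [float(t) for t in tokens]
    except ValueError:
        # Non-numeric payload (for example a chord label): keep it as text.
        return float(stamp), payload.strip()
    return float(stamp), numbers[0] if len(numbers) == 1 else numbers


def parse_arff(text):
    """Read attribute names and data rows from a minimal ARFF document."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        lower = line.lower()
        if in_data:
            rows.append([to_number(cell) for cell in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return [dict(zip(names, row)) for row in rows]


# Hypothetical samples; real Kara1k files may use other names and separators.
ARFF_SAMPLE = """@relation example
@attribute feature_a real
@attribute feature_b real
@data
0.1,0.2
"""
JSON_SAMPLE = '{"name1": 1.0, "name2": [1, 2], "name3": {"mean": 0.5}}'

grouped = json.loads(JSON_SAMPLE)
print(parse_timestamp_line("0.5: 0.1 0.2 0.3"))
print(parse_timestamp_line("1.25: Am"))
print(parse_arff(ARFF_SAMPLE))
print(grouped["name3"]["mean"])
```

Falling back to text when a payload is not numeric keeps the same reader usable for label-valued features such as chord or key labels.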
Feature extractor | Feature | Type | Description | Prerequisites | Parameters | Definition | Download |
---|---|---|---|---|---|---|---|
Essentia [1] | 47 low-level features | Single | Mean, median, var, dmean, dmean2, dvar, dvar2, max and min of: barkbands, dissonance, erbbands, hfc, melbands, pitch salience, silence rates, spectral, gfcc and mfcc features | none | version: 2.1-beta2; extractor: music 1.0 | cover/origin | |
Essentia [1] | 13 rhythm features | Single, Timestamp | beats (count, bpm, loudness, 6 histograms), danceability, onset rate, beat position timestamps | none | version: 2.1-beta2; extractor: music 1.0 | | |
Essentia [1] | 24 harmony features | Single | chord (changes rate, number rate, strength, histogram, key based on chords - label, min/maj), key (label, min/maj, strength), tuning (strength, equal tempered deviation, frequency, nontempered energy ratio), hpcp (entropy, mean, median, var, dmean, dmean2, dvar, dvar2, min and max), thpcp | none | version: 2.1-beta2; extractor: music 1.0 | | |
Essentia [1] | 9 extracted metadata | Single | sample rate, bitrate, equal_loudness, length, lossless, replay gain, codec, downmix, md5_encoded | none | version: 2.1-beta2; extractor: music 1.0 | | |
harmony-analyser [2] | Chord vectors frame | Frame | 12-dimensional chord vectors, for each frame | Chordino Tones | version: 1.2-beta | cover/origin | |
harmony-analyser [2] | Chord vectors | Timestamp | 12-dimensional chord vectors, with timestamps | Chordino Tones | version: 1.2-beta | cover/origin | |
harmony-analyser [2] | Key vectors frame | Frame | 12-dimensional key vectors, for each frame | Key | version: 1.2-beta; frameRate: 10 Hz | cover/origin | |
harmony-analyser [2] | Key vectors | Timestamp | 12-dimensional key vectors, with timestamps | Key | version: 1.2-beta; frameRate: 10 Hz | cover/origin | |
harmony-analyser [2] | TPS distance frame | Frame | 1-dimensional series of TPS distances, for each frame | Chordino Labels, Chordino Tones, Key | version: 1.2-beta; frameRate: 10 Hz | cover/origin | |
harmony-analyser [2] | TPS distance | Timestamp | 1-dimensional series of TPS distances, with timestamps | Chordino Labels, Chordino Tones, Key | version: 1.2-beta; none | cover/origin | |
harmony-analyser [2] | CC distance | Frame | 1-dimensional series of CC distances, for each frame | Chroma Vectors, Chordino Labels | version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7; frameRate: 10 Hz | cover/origin | |
harmony-analyser [2] | CC distance | Timestamp | 1-dimensional series of CC distances, with timestamps | Chroma Vectors, Chordino Labels | version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7 | cover/origin | |
harmony-analyser [2] | Average CC distance | Single | Mean of CC distances | Chroma Vectors, Chordino Labels | version: 1.2-beta; audibleThreshold: 0.07; maximumNumberOfChordTones: 4; maximalComplexity: 7 | cover/origin | |
harmony-analyser [2] | Chroma CD | Frame | 1-dimensional series of chroma vector differences, for each frame | Chroma Vectors | audibleThreshold: 0.07; frameRate: 10 Hz | cover/origin | |
Marsyas [3] | 68 scalar features | Single | Means and standard deviations of: ZeroCrossing, Spectral Centroid, Rolloff, Flux, MFCC0-12, each calculated in 2 settings (2 * 2 * 17 features) | none | version: 0.5; window: Hamming; hopSize: 512; winSize: 1024; accSize: 898; singleVector: yes | cover/origin | |
Vamp plugins [4][5] | Chroma vectors | Frame | 12-dimensional chroma vectors, for each frame | none | version: 1.1; block size: 16384; useNNLS: 1; rollon: 1; tuningMode: 0; whitening: 1; s: 0.7; chromanormalize: 0; frameRate: 10.75 Hz | cover/origin | |
Vamp plugins [4][5] | Chordino Labels | Timestamp | Chord labels, with timestamps | none | version: 1.1; block size: 16384; useNNLS: 1; tuningMode: 0; whitening: 1; s: 0.7; boostn: 0.1; usehartesyntax: 0 | cover/origin | |
Vamp plugins [4][5] | Chordino Tones | Timestamp | Chord tones, with timestamps | none | version: 1.1; block size: 16384; useNNLS: 1; tuningMode: 0; whitening: 1; s: 0.7; boostn: 0.1; usehartesyntax: 0 | cover/origin | |
Vamp plugins [4][5] | Key | Timestamp | Key labels, with timestamps | none | version: 1.7.1; tuning: 440; length: 10 | cover/origin | |
YAAFE [6] | MFCC | Frame | 13-dimensional MFCC, for each frame | none | version: 0.65; sampleRate: 22050; blockSize: 2048; stepSize: 1024; normalize: -1; resample: yes; frameRate: 21.52 Hz | cover/origin | |
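Frame-type features in Table 2 are sampled at different frame rates (for example 10 Hz for the harmony-analyser series and 21.52 Hz for the YAAFE MFCC), so combining them requires mapping frame indices to time. The Python sketch below shows one way to do this. It assumes that frame i starts at i / frameRate seconds with the first frame at time zero, which Table 2 does not state; the helper names are hypothetical. It also checks that the YAAFE sampleRate divided by stepSize gives roughly the listed frame rate.

```python
def frame_times(n_frames, frame_rate_hz):
    """Start time in seconds of each frame, assuming frame 0 starts at t = 0."""
    return [i / frame_rate_hz for i in range(n_frames)]


def nearest_frame(t_seconds, frame_rate_hz, n_frames):
    """Index of the frame closest to t_seconds, clamped to the valid range."""
    idx = round(t_seconds * frame_rate_hz)
    return min(max(idx, 0), n_frames - 1)


def align(n_source, source_rate_hz, n_target, target_rate_hz):
    """For each source frame, the index of the nearest target frame."""
    return [
        nearest_frame(t, target_rate_hz, n_target)
        for t in frame_times(n_source, source_rate_hz)
    ]


# Frame rates taken from Table 2; the frame counts are made up for the example.
HARMONY_RATE_HZ = 10.0
MFCC_RATE_HZ = 21.52

# Sanity check on the YAAFE parameters: sampleRate / stepSize.
derived_mfcc_rate = 22050 / 1024

mapping = align(5, HARMONY_RATE_HZ, 100, MFCC_RATE_HZ)
print(mapping)
print(round(derived_mfcc_rate, 2))
```

Nearest-frame lookup is the simplest alignment choice; interpolation between neighbouring frames would be an alternative when smoother values are needed.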