Saturday, March 10, 2012

Million Song Dataset | scaling MIR research

Looking for some Big Data to test out a project? This one looks like fun.

Million Song Dataset | scaling MIR research: "The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.

Its purposes are:

To encourage research on algorithms that scale to commercial sizes
To provide a reference dataset for evaluating research
As a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's)
To help new researchers get started in the MIR field

The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.

The Million Song Dataset is also a cluster of complementary datasets contributed by the community:

SecondHandSongs dataset -> cover songs
musiXmatch dataset -> lyrics
Last.fm dataset -> song-level tags and similarity
Taste Profile subset -> user data

The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. It was supported in part by the NSF."