

August 1, 2021

The datasets for the #66daysofdata challenge

The core of the #66daysofdata with KNIME project draws on three Spotify datasets freely available on Kaggle (you need to sign in to download them). Because the Kaggle descriptions don't provide much information about the individual columns, here is a brief overview.

The tracks.csv dataset contains about 600k tracks from the period 1900-2021 and is described by 20 columns:

id: track unique ID
name: track name
duration_ms: duration of the song in milliseconds
explicit: describes the content type of a track. Explicit content is represented by 1 and non-explicit content by 0
artists: artist name
id_artists: artist unique ID (collection)
release_date: date when the track was released
danceability: describes how suitable a track is for dancing. Values range from 0.0 (least danceable) to 1.0 (most danceable)
energy: represents a perceptual measure of intensity and activity. Values range from 0.0 (least energetic) to 1.0 (most energetic)
key: the estimated overall key of the track, e.g., 0 = C, 1 = C♯/D♭, 2 = D, etc.
loudness: the overall loudness of a track in decibels (dB)
mode: indicates the modality (major or minor) of a track. Major is represented by 1 and minor by 0
speechiness: detects the presence of spoken words in a track. Values range from 0.0 (least speechy) to 1.0 (most speechy)
acousticness: a measure of whether the track is acoustic. Values range from 0.0 (least acoustic) to 1.0 (most acoustic)
instrumentalness: predicts whether a track contains no vocals. Values range from 0.0 (least instrumental) to 1.0 (most instrumental)
liveness: detects the presence of an audience in the recording. Values range from 0.0 (least live) to 1.0 (most live)
valence: describes the musical positiveness/negativeness conveyed by a track. Values range from 0.0 (least positive) to 1.0 (most positive)
tempo: the overall estimated tempo of a track in beats per minute (BPM)
time_signature: the estimated time signature, i.e., how many beats are in each bar (measure), which tells how the music is to be counted
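Some of these columns are numeric codes that are easier to read once decoded. The following is a minimal Python sketch (outside of KNIME) that turns `key` and `mode` into a label such as "C major" and `duration_ms` into minutes and seconds. The helper names `key_signature` and `format_duration` are hypothetical, and treating any key outside 0-11 as "unknown" is an assumption for illustration.

```python
# Pitch-class names following the key encoding above: 0 = C, 1 = C♯/D♭, ..., 11 = B
PITCH_CLASSES = ["C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
                 "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B"]


def key_signature(key: int, mode: int) -> str:
    """Combine the `key` and `mode` columns into a readable label (hypothetical helper)."""
    if not 0 <= key <= 11:
        # Assumption: any value outside 0-11 means no key could be estimated.
        return "unknown"
    modality = "major" if mode == 1 else "minor"  # mode: 1 = major, 0 = minor
    return f"{PITCH_CLASSES[key]} {modality}"


def format_duration(duration_ms: int) -> str:
    """Convert the `duration_ms` column to an m:ss string, rounded to the nearest second."""
    total_seconds = (duration_ms + 500) // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


print(key_signature(0, 1))       # C major
print(key_signature(9, 0))       # A minor
print(format_duration(213_000))  # 3:33
```

In a KNIME workflow the same decoding could be done with a lookup table and a joiner; the Python version is only meant to make the encoding explicit.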

The artist-uris.csv dataset contains data on roughly 81k artists and is described by 2 columns (header names are not provided):

[id_artists]: artist unique ID
[artists]: artist name
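Because artist-uris.csv has no header row, the column names have to be supplied when reading it. The sketch below, using only Python's standard `csv` and `ast` modules, assigns the names `id_artists` and `artists` and then resolves a track's `id_artists` collection to artist names. The sample rows are made up, the helper names are hypothetical, and the assumption that the collection is stored as a Python-style list literal (for example `['id1', 'id2']`) should be checked against your copy of tracks.csv.

```python
import ast
import csv
import io

# Hypothetical sample rows in the headerless layout described above.
ARTIST_URIS_SAMPLE = """\
art001,Example Artist A
art002,Example Artist B
"""


def load_artist_names(text: str) -> dict:
    """Map id_artists -> artists, supplying the missing header names ourselves."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=["id_artists", "artists"])
    return {row["id_artists"]: row["artists"] for row in reader}


def artist_names_for_track(id_artists_field: str, names: dict) -> list:
    """Resolve a track's id_artists collection to artist names."""
    # Assumption: the collection is serialized as a Python-style list literal.
    ids = ast.literal_eval(id_artists_field)
    # IDs missing from the lookup are reported as "<unknown>" instead of raising.
    return [names.get(artist_id, "<unknown>") for artist_id in ids]


names = load_artist_names(ARTIST_URIS_SAMPLE)
print(artist_names_for_track("['art001', 'art002']", names))
```

Returning a placeholder for unmatched IDs keeps the lookup from failing on artists that appear in tracks.csv but not in artist-uris.csv.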

The artist.csv dataset is very similar to the tracks.csv dataset but also includes a popularity metric for the artists.

popularity: the popularity of an artist. Values range from 0 (least popular) to 100 (most popular)
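Since many of the columns above come with documented value ranges, a quick sanity check can flag unexpected values before any analysis. Here is a minimal Python sketch, assuming each row is a dictionary of column name to value (as `csv.DictReader` produces); `out_of_range` is a hypothetical helper name, and it only checks the columns that are actually present in a row.

```python
# Documented inclusive ranges for the bounded columns described above.
UNIT_INTERVAL_COLUMNS = ["danceability", "energy", "speechiness", "acousticness",
                         "instrumentalness", "liveness", "valence"]
EXPECTED_RANGES = {column: (0.0, 1.0) for column in UNIT_INTERVAL_COLUMNS}
EXPECTED_RANGES.update({"explicit": (0, 1), "mode": (0, 1), "popularity": (0, 100)})


def out_of_range(row: dict) -> list:
    """Return the names of columns in `row` whose values fall outside the documented range."""
    problems = []
    for column, (low, high) in EXPECTED_RANGES.items():
        # float() accepts both numbers and the strings read from a CSV file.
        if column in row and not low <= float(row[column]) <= high:
            problems.append(column)
    return problems


print(out_of_range({"danceability": "0.52", "energy": "1.3", "popularity": "42"}))
# ['energy']
```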

P.S. What is the #66DaysOfData Challenge?

The idea is to spend around 5-10 minutes on a specific data science project each day for 66 days and share your progress on your favorite social media platform with #66daysofdata. Ken Jee is the original instigator of #66daysofdata. Why 66 days? Because that's the average time it takes us to get practiced at doing something, in this case, data science with KNIME. Find the full roadmap here.
