Having fun with Spotipy!


In this post we will explore some of the Spotipy possibilities to retrieve and analyze your streaming data.

Library imports and authentication

Let’s start by importing the libraries that we are going to use:

# libraries import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dateutil.parser import parse as parse_date

# spotify libraries
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy import util

Next we need to authenticate our user and credentials. To do this, we first create an application in the Spotify Dashboard and gather the credentials. Then it’s easy to initialize and authorize the client:

token = util.prompt_for_user_token(username = user,
                                   scope = 'user-top-read',
                                   client_id= client_id,
                                   client_secret=client_secret,
                                   redirect_uri= 'http://localhost/')

spotify = spotipy.Spotify(auth=token)
spotify.trace = True
spotify.trace_out = True

Tip: In case you are struggling to get the user, client_id and client_secret, I recommend this video and this tutorial to help you configure your app.
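
If you would rather not hard-code the credentials in the notebook, one option is to read them from environment variables. The sketch below assumes the variable names SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI, which are the names Spotipy itself looks for; load_credentials is a hypothetical helper of mine, not part of the library.

```python
import os


def load_credentials(env=os.environ):
    """Collect the Spotify app credentials from environment variables.

    Hypothetical helper: the variable names are an assumption based on the
    ones Spotipy reads when no explicit arguments are passed.
    """
    names = {
        'client_id': 'SPOTIPY_CLIENT_ID',
        'client_secret': 'SPOTIPY_CLIENT_SECRET',
        'redirect_uri': 'SPOTIPY_REDIRECT_URI',
    }
    # Fail early, and name every variable that is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned dictionary has the same keys as the keyword arguments used above, so it could be unpacked into the call, for example util.prompt_for_user_token(username=user, scope='user-top-read', **load_credentials()).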

Top tracks

The spotify object has several methods that return JSON for artists, songs and song features. Let’s start by retrieving my top tracks. One way to do it is to define a for loop that parses the data into a tidy dataframe.

top_tracks = spotify.current_user_top_tracks(limit=50, time_range="long_term")  # method to get top tracks; the API returns at most 50 items per request
cnt = 1
fields = ['rank_position', 'album_type', 'album_name', 'album_id',
          'artist_name', 'artist_id', 'track_duration', 'track_id',
          'track_name', 'track_popularity', 'track_number', 'track_type']

tracks = {}

for i in fields:
    tracks[i] = []

for i in top_tracks['items']:
    
    #tracks['rank_position'].append(cnt)
    tracks['album_type'].append(i['album']['album_type'])
    tracks['album_id'].append(i['album']['id'])
    tracks['album_name'].append(i['album']['name'])
    tracks['artist_name'].append(i['artists'][0]['name'])
    tracks['artist_id'].append(i['artists'][0]['id'])
    tracks['track_duration'].append(i['duration_ms'])
    tracks['track_id'].append(i['id'])
    tracks['track_name'].append(i['name'])
    tracks['track_popularity'].append(i['popularity'])
    tracks['track_number'].append(i['track_number'])
    tracks['track_type'].append(i['type'])
    cnt += 1
    
df = pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in tracks.items() ]))
df
rank_position album_type album_name album_id artist_name artist_id track_duration track_id track_name track_popularity track_number track_type
0 NaN ALBUM Principios Basicos De Astronomia 6wMqAOwrW6E8FkSGBXKGVe Los Planetas 0N1TIXCk9Q9JbEPXQDclEL 142680 0oQhYCbyUqieIVsl1zt1q3 Pesadilla En El Parque De Atracciones 20 11 track
2 NaN ALBUM The Messenger 3ZFz6QOM8bF9yjNm5NTjWr Johnny Marr 2bA2YuQk2ID3PWNXUhQrWS 232120 6r3eOjeAlA1SRiLMr9vNco The Crack Up 17 10 track
4 NaN ALBUM Oshin 1hSONHeTOofdeh2uoFBLgv DIIV 4OrizGCKhOrW6iDDJHN9xd 127800 49cnatGE4zvbt5gP5DISLy (Druun) 36 1 track
5 NaN ALBUM Oshin 1hSONHeTOofdeh2uoFBLgv DIIV 4OrizGCKhOrW6iDDJHN9xd 165840 76MJsF1rbbhrv2tDBfeRR5 Follow 38 9 track
6 NaN ALBUM Sugar Tax 1J8e1dLKVmZbsyxpGa9lGg Orchestral Manoeuvres In The Dark 7wJ9NwdRWtN92NunmXuwBk 249173 4NsNi4w10Tkpv6uikyXbJ6 Pandora's Box 53 2 track
7 NaN ALBUM Shapeshifting 3DyIAjq1iOl07Z1IV39Py6 Young Galaxy 5xfJLyvC5UElVSiMuLt1ss 32373 6UvfReXhIyTime6acIlHzc NTH 0 1 track
8 NaN ALBUM Antics 58fDEyJ5XSau8FRA3y8Bps Interpol 3WaJSfKnzc65VDgmj2zU8B 215826 6B182GP3TvEfmgUoIMVUSJ Evil 63 2 track
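
Since the top tracks endpoint returns at most 50 items per request, a longer list has to be fetched page by page with the offset parameter. Here is a minimal sketch of that idea, which also fills in rank_position along the way. Both flatten_track and collect_rows are hypothetical helper names, and fetch_page stands for any function that takes (limit, offset) and returns a response with an 'items' list.

```python
def flatten_track(item, rank):
    """Turn one track object from the API response into a flat row."""
    return {
        'rank_position': rank,
        'album_type': item['album']['album_type'],
        'album_name': item['album']['name'],
        'album_id': item['album']['id'],
        'artist_name': item['artists'][0]['name'],  # first listed artist only
        'artist_id': item['artists'][0]['id'],
        'track_duration': item['duration_ms'],
        'track_id': item['id'],
        'track_name': item['name'],
        'track_popularity': item['popularity'],
        'track_number': item['track_number'],
        'track_type': item['type'],
    }


def collect_rows(fetch_page, page_size=50, max_items=200):
    """Request pages until one comes back short or max_items is reached.

    fetch_page(limit, offset) must return a dict with an 'items' list.
    """
    rows = []
    while len(rows) < max_items:
        limit = min(page_size, max_items - len(rows))
        items = fetch_page(limit, len(rows))['items']
        for item in items:
            rows.append(flatten_track(item, rank=len(rows) + 1))
        if len(items) < limit:
            break  # a short (or empty) page means there is nothing left
    return rows
```

With the client from above, fetch_page could be lambda limit, offset: spotify.current_user_top_tracks(limit=limit, offset=offset, time_range='long_term'), and pd.DataFrame(collect_rows(fetch_page)) would give the same columns as the loop version.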

Now that we have tidy data, let’s analyze the number of songs by artist:

artists = pd.DataFrame(df.groupby('artist_name')['track_name'].count().sort_values(ascending = False).reset_index())
sns.catplot(x = 'track_name', y = 'artist_name', data = artists, kind = 'bar', height = 7, aspect = 1 );
plt.title("# of Top songs by artist")
plt.xlabel('# of songs')
plt.ylabel('Artist Name')
plt.xticks(np.arange(0, 10, step=1));

What’s the average length of the songs I listen to?

# Average and median length of songs
avg_duration_sec = round(np.mean(df['track_duration'] / 1000), 2)
avg_duration_min = round(avg_duration_sec / 60, 2)
median_duration_sec = round(np.median(df['track_duration'] / 1000), 2)
median_duration_min = round(median_duration_sec / 60, 2)

print('The average length of top tracks is ' + str(avg_duration_sec) + ' seconds -- ' + str(avg_duration_min) + ' minutes') 
print('The median length of top tracks is ' + str(median_duration_sec) + ' seconds -- ' + str(median_duration_min) + ' minutes') 
The average length of top tracks is 232.29 seconds -- 3.87 minutes
The median length of top tracks is 236.25 seconds -- 3.94 minutes
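
Decimal minutes are a little awkward to read, since 3.87 minutes is not 3 minutes and 87 seconds. A small helper can show the same value as minutes and seconds instead. The name format_duration is hypothetical, and the sketch simply drops the fractional second.

```python
def format_duration(duration_ms):
    """Format a duration in milliseconds as m:ss, dropping the fractional second."""
    minutes, seconds = divmod(int(duration_ms) // 1000, 60)
    return '{}:{:02d}'.format(minutes, seconds)


print(format_duration(232290))  # the 232.29 second average -> 3:52
```

It works directly on the track_duration column, for example df['track_duration'].map(format_duration).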

What’s the average popularity of the songs?

# Average and median popularity of songs
avg_popularity = round(np.mean(df['track_popularity']), 2)
median_popularity = round(np.median(df['track_popularity']), 2)


print('The average popularity of top tracks is ' + str(avg_popularity))
print('The median popularity of top tracks is ' + str(median_popularity))
The average popularity of top tracks is 28.78
The median popularity of top tracks is 32.0

Audio Features

Audio features are characteristics that Spotify assigns to each song. Some of them are straightforward (tempo, loudness) and others are less intuitive (instrumentalness). You can read more in the API reference.

# Get audio features from top tracks and analysis
df = df.loc[df['track_id'] != '4BSP1PK4sLlticRfYl1M79',:] # drop track that's not a song
features = spotify.audio_features(tracks = df['track_id'])

# Build df from dictionary and merge
def get_features(result):
    danceability = []
    key = []
    loudness = []
    mode = []
    speechiness = []
    acousticness = []
    instrumentalness = []
    liveness = []
    valence = []
    tempo = []
    track_id = []
    t = 0
    for i in result:
        danceability.append(i['danceability'])
        key.append(i['key'])
        loudness.append(i['loudness'])
        mode.append(i['mode'])
        speechiness.append(i['speechiness'])
        instrumentalness.append(i['instrumentalness'])
        liveness.append(i['liveness'])
        valence.append(i['valence'])
        tempo.append(i['tempo'])
        track_id.append(i['id'])
    return pd.DataFrame({'danceability' : danceability, 'key' : key, 'loudness' : loudness, 'speechiness' : speechiness,
                          'instrumentalness' : instrumentalness, 'liveness' : liveness, 'valence' : valence, 
                          'tempo' : tempo, 'track_id' : track_id})

# build the features dataframe (this call has to sit outside the function body)
features_df = get_features(features)
# merge dataframes
songs_and_features = pd.merge(df, features_df, left_on = 'track_id', right_on = 'track_id')
songs_and_features
rank_position album_type album_name album_id artist_name artist_id track_duration track_id track_name track_popularity track_number track_type danceability key loudness speechiness instrumentalness liveness valence tempo
0 NaN ALBUM Principios Basicos De Astronomia 6wMqAOwrW6E8FkSGBXKGVe Los Planetas 0N1TIXCk9Q9JbEPXQDclEL 142680 0oQhYCbyUqieIVsl1zt1q3 Pesadilla En El Parque De Atracciones 20 11 track 0.363 2 -6.365 0.0814 0.022700 0.3970 0.832 132.532
1 NaN ALBUM The Messenger 3ZFz6QOM8bF9yjNm5NTjWr Johnny Marr 2bA2YuQk2ID3PWNXUhQrWS 232120 6r3eOjeAlA1SRiLMr9vNco The Crack Up 17 10 track 0.539 5 -6.303 0.0291 0.001290 0.3270 0.329 101.007
2 NaN SINGLE EP II 5JsmZQAcvUYjjDIdehCVub Yumi Zouma 4tPyCwWrsvZ8OKYl7QRavL 288080 17SSL9kvysDq9D6YuBMEoP Catastrophe 0 3 track 0.696 7 -9.911 0.0317 0.190000 0.1080 0.567 137.986
3 NaN ALBUM Oshin 1hSONHeTOofdeh2uoFBLgv DIIV 4OrizGCKhOrW6iDDJHN9xd 127800 49cnatGE4zvbt5gP5DISLy (Druun) 36 1 track 0.468 9 -8.553 0.0305 0.628000 0.0977 0.641 134.964
4 NaN ALBUM Oshin 1hSONHeTOofdeh2uoFBLgv DIIV 4OrizGCKhOrW6iDDJHN9xd 165840 76MJsF1rbbhrv2tDBfeRR5 Follow 38 9 track 0.364 7 -5.273 0.0454 0.878000 0.3260 0.624 142.908
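
One practical note: as far as I know, the audio features endpoint accepts at most 100 track IDs per call and can return an empty (None) entry for a track it has no features for, which would break the loop in get_features. The sketch below sends the IDs in batches and skips the empty entries. Both chunked and collect_features are hypothetical helpers, and the 100-ID limit is an assumption worth checking against the API reference.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of ids with at most `size` elements each."""
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def collect_features(fetch_features, track_ids, size=100):
    """Call fetch_features once per batch and drop empty (None) results."""
    features = []
    for batch in chunked(track_ids, size):
        features.extend(f for f in fetch_features(batch) if f is not None)
    return features
```

With the client from above this would be called as collect_features(spotify.audio_features, df['track_id']), and the result can be passed to get_features as before.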

Let’s go ahead and plot danceability of the artists with more than 1 song in the list.

Turns out I’m not that bad… I was expecting a much darker outcome.
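
The aggregation behind a plot like this could be sketched as follows: group the rows by artist, keep the artists with more than one song, and average the feature. This is my own minimal version in plain Python, not necessarily what produced the original chart; mean_by_artist is a hypothetical name, and the rows are assumed to be dictionaries with artist_name and danceability keys.

```python
from collections import defaultdict
from statistics import mean


def mean_by_artist(rows, feature='danceability', min_songs=2):
    """Average `feature` per artist, keeping artists with at least min_songs rows."""
    values = defaultdict(list)
    for row in rows:
        values[row['artist_name']].append(row[feature])
    return {
        artist: mean(vals)
        for artist, vals in values.items()
        if len(vals) >= min_songs
    }
```

The rows can come from songs_and_features.to_dict('records'), and the resulting dictionary is easy to turn back into a dataframe for a seaborn bar plot like the one used for the song counts.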

Top Artists

With the .current_user_top_artists() method you can retrieve your top artists and some information about them. Applying the same approach as before, we get the following df:

artist_name genres popularity artist_image followers
14 The Beatles [beatlesque, british invasion, classic rock, m... 90 https://i.scdn.co/image/1047bf172446f2a815a99a... 16626426
6 Frank Ocean [alternative r&b, hip hop, lgbtq+ hip hop, neo... 87 https://i.scdn.co/image/0cc22250c0b18183e5c62f... 5930758
10 Daft Punk [electro, filter house] 83 https://i.scdn.co/image/8e189c820ba32ffd332393... 6266549
18 U2 [irish rock, permanent wave, rock] 83 https://i.scdn.co/image/40d6c5c14355cfc127b70d... 6949831
4 Cigarettes After Sex [ambient pop, dream pop, el paso indie, shoegaze] 76 https://i.scdn.co/image/d34e8cb22455b5d6f49fde... 1570803
15 Pixies [alternative rock, art rock, boston rock, mode... 72 https://i.scdn.co/image/f3ed963d1578a0aff0be16... 1567607
2 The National [indie rock, modern rock] 71 https://i.scdn.co/image/ffaca820e2ea78e2c2cd01... 1193825
0 Wilco [alternative country, alternative rock, chicag... 66 https://i.scdn.co/image/cdf5e817af10c070b8995f... 533115
7 Tycho [chillwave, downtempo, electronica, indietroni... 66 https://i.scdn.co/image/80ba51f8c16442d839a0ed... 563780
5 Patti Smith [art pop, art punk, art rock, dance rock, folk... 64 https://i.scdn.co/image/44012cfc4fedcbe92ac5e1... 621566
3 Alexandre Tharaud [classical performance, classical piano] 62 https://i.scdn.co/image/c5e94f9df82bd5fdf9a978... 18637
12 Bobby Womack [classic soul, funk, motown, quiet storm, soul... 61 https://i.scdn.co/image/2a844501416646506eef4a... 445212
17 The Jesus and Mary Chain [alternative rock, art rock, dance rock, new w... 58 https://i.scdn.co/image/7134f3041e86bb30a1aa8f... 286478
16 John Cale [alternative rock, anti-folk, art pop, art roc... 55 https://i.scdn.co/image/ab6772690000dd221d8c27... 111959
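
The parsing code for this table is not shown in the post, so here is one way it could look. The sketch assumes the usual shape of an artist object in the response (name, genres, popularity, an images list and a followers object with a total); flatten_artist is a hypothetical helper name.

```python
def flatten_artist(item):
    """Turn one artist object from the API response into a flat row."""
    images = item.get('images') or []  # some artists have no image at all
    return {
        'artist_name': item['name'],
        'genres': item['genres'],
        'popularity': item['popularity'],
        'artist_image': images[0]['url'] if images else None,
        'followers': item['followers']['total'],
    }
```

Applied to the response, that would be pd.DataFrame([flatten_artist(a) for a in top_artists['items']]), sorted by popularity in descending order to match the table above.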

Get top tracks from an artist

I want to finish this post with a way of getting the top tracks of any artist you like. You only need the Spotify URI that identifies the artist.

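# DIIV example
diiv_uri = 'spotify:artist:4OrizGCKhOrW6iDDJHN9xd'

# app-only credentials are enough here because no user data is requested
credentials = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
spotify = spotipy.Spotify(client_credentials_manager=credentials)
results = spotify.artist_top_tracks(diiv_uri)

for track in results['tracks'][:10]:
    print('track    : ' + track['name'])
    print('audio    : ' + str(track['preview_url']))  # preview_url can be None
    print('cover art: ' + track['album']['images'][0]['url'])
    print()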
track    : Under the Sun
audio    : https://p.scdn.co/mp3-preview/168b3cbe7b1e7224ca297b757b6c9b35fb176629?cid=ff47cd5371a049cb87c5c6bc407f4901
cover art: https://i.scdn.co/image/ab67616d0000b2735172f44c5f8743c09fb5bbc8

track    : Doused
audio    : https://p.scdn.co/mp3-preview/204c017dbfb537a03ea1ce0146b419d6cd40fa10?cid=ff47cd5371a049cb87c5c6bc407f4901
cover art: https://i.scdn.co/image/ab67616d0000b2737bc6a0c2b8d9393a2dd80cb7

track    : Blankenship
audio    : https://p.scdn.co/mp3-preview/93ae971865b312f14d5b46b85fc75074d0daa1f3?cid=ff47cd5371a049cb87c5c6bc407f4901
cover art: https://i.scdn.co/image/ab67616d0000b2733bdae3719124f5f749feb9c5
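
A URI like the one above has three parts separated by colons: the literal spotify, the kind of object, and its ID. A tiny, hypothetical helper makes that structure explicit and catches a mistyped value before it reaches the API.

```python
def parse_spotify_uri(uri):
    """Split 'spotify:<type>:<id>' into (type, id); raise ValueError otherwise."""
    parts = uri.split(':')
    if len(parts) != 3 or parts[0] != 'spotify' or not all(parts[1:]):
        raise ValueError('Not a Spotify URI: ' + repr(uri))
    return parts[1], parts[2]


kind, artist_id = parse_spotify_uri('spotify:artist:4OrizGCKhOrW6iDDJHN9xd')
print(kind, artist_id)  # artist 4OrizGCKhOrW6iDDJHN9xd
```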

Tip: If you want to improve your analysis, it’s better to download your personal data from Spotify’s privacy settings. It’s an easy process, but it takes a few days to complete. Once it’s done, they will send you an email with instructions.

Extras

  • Repo of the project.
  • 2 great songs that keep coming to my mind in these odd, strange days that we are living.