Having fun with Spotipy!


In this post we will explore a few of the things Spotipy lets you do to retrieve and analyze your streaming data.

Library imports and authentication

Let’s start by importing the libraries that we are going to use:

# data analysis and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dateutil.parser import parse as parse_date

# spotify libraries
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
from spotipy import util

Next we need to authenticate our user and credentials. To do this, we first create an application in the Spotify Developer Dashboard and gather its credentials. Then it’s easy to initialize and authorize the client:

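# Replace these placeholders with your own Spotify username and app credentials
user = 'your_spotify_username'
client_id = 'your_client_id'
client_secret = 'your_client_secret'

# Asks you to authorize the 'user-top-read' scope in the browser and returns an access token;
# redirect_uri must match the one registered for your app in the dashboard
token = util.prompt_for_user_token(username=user,
                                   scope='user-top-read',
                                   client_id=client_id,
                                   client_secret=client_secret,
                                   redirect_uri='http://localhost/')

spotify = spotipy.Spotify(auth=token)
spotify.trace = True
spotify.trace_out = True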

Tip: In case you are struggling to get the user, client_id and client_secret, I recommend this video and this tutorial to help you configure your app.
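Rather than hard-coding client_id and client_secret in the notebook, you can keep them in environment variables. spotipy’s helpers fall back to SPOTIPY_CLIENT_ID, SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI when those arguments are omitted. The sketch below uses a hypothetical helper, load_spotify_credentials (not part of spotipy), that only checks all three are set, so a missing one fails with a clear message instead of a confusing OAuth error.

```python
import os

# Environment variables that spotipy also falls back to when credentials are not passed explicitly
REQUIRED_VARS = ('SPOTIPY_CLIENT_ID', 'SPOTIPY_CLIENT_SECRET', 'SPOTIPY_REDIRECT_URI')


def load_spotify_credentials(env=None):
    """Read the app credentials from the environment and fail early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with a plain dict standing in for os.environ (placeholder values)
fake_env = {
    'SPOTIPY_CLIENT_ID': 'your_client_id',
    'SPOTIPY_CLIENT_SECRET': 'your_client_secret',
    'SPOTIPY_REDIRECT_URI': 'http://localhost/',
}
creds = load_spotify_credentials(fake_env)
print(creds['SPOTIPY_REDIRECT_URI'])  # http://localhost/
```

With the real environment you would call load_spotify_credentials() and pass creds['SPOTIPY_CLIENT_ID'] and creds['SPOTIPY_CLIENT_SECRET'] as client_id and client_secret.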

Top tracks

The spotify object has several methods that return JSON for artists, tracks and audio features. Let’s start by retrieving my top tracks. One way to do it is to loop over the returned items and parse them into a tidy DataFrame.

# The API returns at most 50 items per request, so limit=50 is the maximum
top_tracks = spotify.current_user_top_tracks(limit=50, time_range='long_term')

fields = ['rank_position', 'album_type', 'album_name', 'album_id',
          'artist_name', 'artist_id', 'track_duration', 'track_id',
          'track_name', 'track_popularity', 'track_number', 'track_type']

# one empty list per column
tracks = {field: [] for field in fields}

# items come back in ranked order, so the loop counter is the rank position
for rank, item in enumerate(top_tracks['items'], start=1):
    tracks['rank_position'].append(rank)
    tracks['album_type'].append(item['album']['album_type'])
    tracks['album_id'].append(item['album']['id'])
    tracks['album_name'].append(item['album']['name'])
    tracks['artist_name'].append(item['artists'][0]['name'])  # first credited artist only
    tracks['artist_id'].append(item['artists'][0]['id'])
    tracks['track_duration'].append(item['duration_ms'])  # milliseconds
    tracks['track_id'].append(item['id'])
    tracks['track_name'].append(item['name'])
    tracks['track_popularity'].append(item['popularity'])
    tracks['track_number'].append(item['track_number'])
    tracks['track_type'].append(item['type'])

df = pd.DataFrame(tracks)
df

	rank_position	album_type	album_name	album_id	artist_name	artist_id	track_duration	track_id	track_name	track_popularity	track_number	track_type
0	NaN	ALBUM	Principios Basicos De Astronomia	6wMqAOwrW6E8FkSGBXKGVe	Los Planetas	0N1TIXCk9Q9JbEPXQDclEL	142680	0oQhYCbyUqieIVsl1zt1q3	Pesadilla En El Parque De Atracciones	20	11	track
2	NaN	ALBUM	The Messenger	3ZFz6QOM8bF9yjNm5NTjWr	Johnny Marr	2bA2YuQk2ID3PWNXUhQrWS	232120	6r3eOjeAlA1SRiLMr9vNco	The Crack Up	17	10	track
4	NaN	ALBUM	Oshin	1hSONHeTOofdeh2uoFBLgv	DIIV	4OrizGCKhOrW6iDDJHN9xd	127800	49cnatGE4zvbt5gP5DISLy	(Druun)	36	1	track
5	NaN	ALBUM	Oshin	1hSONHeTOofdeh2uoFBLgv	DIIV	4OrizGCKhOrW6iDDJHN9xd	165840	76MJsF1rbbhrv2tDBfeRR5	Follow	38	9	track
6	NaN	ALBUM	Sugar Tax	1J8e1dLKVmZbsyxpGa9lGg	Orchestral Manoeuvres In The Dark	7wJ9NwdRWtN92NunmXuwBk	249173	4NsNi4w10Tkpv6uikyXbJ6	Pandora's Box	53	2	track
7	NaN	ALBUM	Shapeshifting	3DyIAjq1iOl07Z1IV39Py6	Young Galaxy	5xfJLyvC5UElVSiMuLt1ss	32373	6UvfReXhIyTime6acIlHzc	NTH	0	1	track
8	NaN	ALBUM	Antics	58fDEyJ5XSau8FRA3y8Bps	Interpol	3WaJSfKnzc65VDgmj2zU8B	215826	6B182GP3TvEfmgUoIMVUSJ	Evil	63	2	track
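Keeping one list per column works, but it’s easy to let the lists drift out of sync. An alternative sketch, using hypothetical helper names (parse_track, top_tracks_to_df), flattens each item into one dict per row and lets pandas build the columns. The sample is a hand-written, trimmed response containing only the keys the parser reads, with values taken from the first row of the table above.

```python
import pandas as pd


def parse_track(item, rank):
    """Flatten one item from current_user_top_tracks()['items'] into a row dict."""
    album = item['album']
    artist = item['artists'][0]  # keep only the first credited artist
    return {
        'rank_position': rank,
        'album_type': album['album_type'],
        'album_name': album['name'],
        'album_id': album['id'],
        'artist_name': artist['name'],
        'artist_id': artist['id'],
        'track_duration': item['duration_ms'],
        'track_id': item['id'],
        'track_name': item['name'],
        'track_popularity': item['popularity'],
        'track_number': item['track_number'],
        'track_type': item['type'],
    }


def top_tracks_to_df(response):
    """Build a DataFrame with one row per top track, ranked from 1."""
    rows = [parse_track(item, rank) for rank, item in enumerate(response['items'], start=1)]
    return pd.DataFrame(rows)


# Trimmed, hand-written response with only the keys used above
sample = {'items': [{
    'album': {'album_type': 'ALBUM', 'name': 'Principios Basicos De Astronomia',
              'id': '6wMqAOwrW6E8FkSGBXKGVe'},
    'artists': [{'name': 'Los Planetas', 'id': '0N1TIXCk9Q9JbEPXQDclEL'}],
    'duration_ms': 142680,
    'id': '0oQhYCbyUqieIVsl1zt1q3',
    'name': 'Pesadilla En El Parque De Atracciones',
    'popularity': 20,
    'track_number': 11,
    'type': 'track',
}]}
print(top_tracks_to_df(sample))
```

With the live client this becomes df = top_tracks_to_df(spotify.current_user_top_tracks(limit=50, time_range='long_term')).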

Now that we have tidy data, let’s analyze the number of songs by artist:

# number of top tracks per artist, most frequent first
artists = (df.groupby('artist_name')['track_name'].count()
             .sort_values(ascending=False)
             .reset_index(name='n_songs'))
sns.catplot(x='n_songs', y='artist_name', data=artists, kind='bar', height=7, aspect=1);
plt.title("# of Top songs by artist")
plt.xlabel('# of songs')
plt.ylabel('Artist Name')
plt.xticks(np.arange(0, 10, step=1));

What’s the average length of the songs I listen to?

# Average and median length of songs
avg_duration_sec = round(np.mean(df['track_duration'] / 1000), 2)
avg_duration_min = round(avg_duration_sec / 60, 2)
median_duration_sec = round(np.median(df['track_duration'] / 1000), 2)
median_duration_min = round(median_duration_sec / 60, 2)

print('The average length of top tracks is ' + str(avg_duration_sec) + ' seconds -- ' + str(avg_duration_min) + ' minutes') 
print('The median length of top tracks is ' + str(median_duration_sec) + ' seconds -- ' + str(median_duration_min) + ' minutes') 

The average length of top tracks is 232.29 seconds -- 3.87 minutes
The median length of top tracks is 236.25 seconds -- 3.94 minutes
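The minute values above are decimal minutes, so 3.87 minutes is not 3:87 but roughly 3 minutes 52 seconds. If you prefer a clock-style display, a small helper like the hypothetical format_duration below converts milliseconds (the unit of duration_ms) to m:ss, rounding to the nearest second. The sample values are four of the track_duration entries from the first table.

```python
import statistics


def format_duration(ms):
    """Convert milliseconds to an 'm:ss' string, rounding to the nearest second."""
    total_seconds = round(ms / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


durations_ms = [142680, 232120, 127800, 165840]  # track_duration values from the first table
print(format_duration(232_290))                          # 3:52 (the 232.29 s average above)
print(format_duration(statistics.mean(durations_ms)))    # 2:47
print(format_duration(statistics.median(durations_ms)))  # 2:34
```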

What’s the average popularity of the songs?

# Average and median popularity of songs
avg_popularity = round(np.mean(df['track_popularity']), 2)
median_popularity = round(np.median(df['track_popularity']), 2)


print('The average popularity of top tracks is ' + str(avg_popularity))
print('The median popularity of top tracks is ' + str(median_popularity))

The average popularity of top tracks is 28.78
The median popularity of top tracks is 32.0

Audio Features

Audio features are characteristics that Spotify assigns to each track. Some of them are straightforward (tempo, loudness) and others less so (instrumentalness). You can read more in the API reference.

# Get audio features for the top tracks
df = df.loc[df['track_id'] != '4BSP1PK4sLlticRfYl1M79', :]  # drop track that's not a song
features = spotify.audio_features(tracks=df['track_id'].tolist())  # up to 100 IDs per call

# Build a DataFrame from the list of feature dictionaries
def get_features(result):
    columns = ['danceability', 'key', 'loudness', 'speechiness',
               'instrumentalness', 'liveness', 'valence', 'tempo']
    rows = []
    for item in result:
        if item is None:  # entries can be None when Spotify has no features for an ID
            continue
        row = {col: item[col] for col in columns}
        row['track_id'] = item['id']
        rows.append(row)
    return pd.DataFrame(rows, columns=columns + ['track_id'])

features_df = get_features(features)

# merge track metadata with audio features on the shared track_id column
songs_and_features = pd.merge(df, features_df, on='track_id')
songs_and_features

	rank_position	album_type	album_name	album_id	artist_name	artist_id	track_duration	track_id	track_name	track_popularity	track_number	track_type	danceability	key	loudness	speechiness	instrumentalness	liveness	valence	tempo
0	NaN	ALBUM	Principios Basicos De Astronomia	6wMqAOwrW6E8FkSGBXKGVe	Los Planetas	0N1TIXCk9Q9JbEPXQDclEL	142680	0oQhYCbyUqieIVsl1zt1q3	Pesadilla En El Parque De Atracciones	20	11	track	0.363	2	-6.365	0.0814	0.022700	0.3970	0.832	132.532
1	NaN	ALBUM	The Messenger	3ZFz6QOM8bF9yjNm5NTjWr	Johnny Marr	2bA2YuQk2ID3PWNXUhQrWS	232120	6r3eOjeAlA1SRiLMr9vNco	The Crack Up	17	10	track	0.539	5	-6.303	0.0291	0.001290	0.3270	0.329	101.007
2	NaN	SINGLE	EP II	5JsmZQAcvUYjjDIdehCVub	Yumi Zouma	4tPyCwWrsvZ8OKYl7QRavL	288080	17SSL9kvysDq9D6YuBMEoP	Catastrophe	0	3	track	0.696	7	-9.911	0.0317	0.190000	0.1080	0.567	137.986
3	NaN	ALBUM	Oshin	1hSONHeTOofdeh2uoFBLgv	DIIV	4OrizGCKhOrW6iDDJHN9xd	127800	49cnatGE4zvbt5gP5DISLy	(Druun)	36	1	track	0.468	9	-8.553	0.0305	0.628000	0.0977	0.641	134.964
4	NaN	ALBUM	Oshin	1hSONHeTOofdeh2uoFBLgv	DIIV	4OrizGCKhOrW6iDDJHN9xd	165840	76MJsF1rbbhrv2tDBfeRR5	Follow	38	9	track	0.364	7	-5.273	0.0454	0.878000	0.3260	0.624	142.908
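The key column is an integer pitch class: per Spotify’s API reference, 0 = C, 1 = C♯/D♭, 2 = D, and so on up to 11 = B, with -1 when no key was detected. The related mode feature (1 = major, 0 = minor) was not kept in the merged table. A small lookup makes the column readable; key_name is a hypothetical helper, and its mode argument is optional because mode isn’t in the table.

```python
# Pitch-class names for Spotify's integer `key` feature (0 = C, ..., 11 = B)
PITCH_CLASSES = ['C', 'C#/Db', 'D', 'D#/Eb', 'E', 'F',
                 'F#/Gb', 'G', 'G#/Ab', 'A', 'A#/Bb', 'B']


def key_name(key, mode=None):
    """Readable key name; -1 means no key detected, mode 1 = major, 0 = minor."""
    if key is None or key < 0:
        return 'unknown'
    name = PITCH_CLASSES[key]
    if mode == 1:
        return name + ' major'
    if mode == 0:
        return name + ' minor'
    return name


# key values from the merged table above
for k in [2, 5, 7, 9, 7]:
    print(k, key_name(k))  # D, F, G, A, G
```

On the DataFrame itself, songs_and_features['key'].map(key_name) adds the same labels as a column.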

Let’s go ahead and plot the danceability of the artists with more than one song in the list.
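The figure itself isn’t reproduced here. As a sketch, assuming a horizontal bar chart of mean danceability per artist (the chart type and the helper names danceability_by_artist and plot_danceability are assumptions), the filtering and plotting could look like this. It expects a DataFrame shaped like songs_and_features; danceability runs from 0.0 (least danceable) to 1.0 (most danceable). The sample rows reuse the artists and danceability values from the merged table.

```python
import pandas as pd


def danceability_by_artist(songs, min_songs=2):
    """Mean danceability for artists with at least `min_songs` tracks, highest first."""
    counts = songs['artist_name'].value_counts()
    repeat_artists = counts[counts >= min_songs].index
    subset = songs[songs['artist_name'].isin(repeat_artists)]
    return subset.groupby('artist_name')['danceability'].mean().sort_values(ascending=False)


def plot_danceability(songs, min_songs=2):
    """Horizontal bar chart of mean danceability per artist."""
    import matplotlib.pyplot as plt  # imported here so the data step runs without matplotlib
    means = danceability_by_artist(songs, min_songs)
    ax = means.plot(kind='barh')
    ax.set_xlabel('Mean danceability (0 to 1)')
    ax.set_ylabel('Artist Name')
    ax.set_title('Danceability of artists with more than one top song')
    plt.tight_layout()
    plt.show()


sample = pd.DataFrame({
    'artist_name': ['Los Planetas', 'Johnny Marr', 'Yumi Zouma', 'DIIV', 'DIIV'],
    'danceability': [0.363, 0.539, 0.696, 0.468, 0.364],
})
print(danceability_by_artist(sample))  # only DIIV has two songs here
```

With the real data you would call plot_danceability(songs_and_features).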

Turns out I’m not that bad… I was expecting a much darker outcome.

Top Artists

With the method .current_user_top_artists() you can retrieve your top artists and some information about them. Applying the same approach as before and sorting by popularity, we get the following DataFrame:

	artist_name	genres	popularity	artist_image	followers
14	The Beatles	[beatlesque, british invasion, classic rock, m...	90	https://i.scdn.co/image/1047bf172446f2a815a99a...	16626426
6	Frank Ocean	[alternative r&b, hip hop, lgbtq+ hip hop, neo...	87	https://i.scdn.co/image/0cc22250c0b18183e5c62f...	5930758
10	Daft Punk	[electro, filter house]	83	https://i.scdn.co/image/8e189c820ba32ffd332393...	6266549
18	U2	[irish rock, permanent wave, rock]	83	https://i.scdn.co/image/40d6c5c14355cfc127b70d...	6949831
4	Cigarettes After Sex	[ambient pop, dream pop, el paso indie, shoegaze]	76	https://i.scdn.co/image/d34e8cb22455b5d6f49fde...	1570803
15	Pixies	[alternative rock, art rock, boston rock, mode...	72	https://i.scdn.co/image/f3ed963d1578a0aff0be16...	1567607
2	The National	[indie rock, modern rock]	71	https://i.scdn.co/image/ffaca820e2ea78e2c2cd01...	1193825
0	Wilco	[alternative country, alternative rock, chicag...	66	https://i.scdn.co/image/cdf5e817af10c070b8995f...	533115
7	Tycho	[chillwave, downtempo, electronica, indietroni...	66	https://i.scdn.co/image/80ba51f8c16442d839a0ed...	563780
5	Patti Smith	[art pop, art punk, art rock, dance rock, folk...	64	https://i.scdn.co/image/44012cfc4fedcbe92ac5e1...	621566
3	Alexandre Tharaud	[classical performance, classical piano]	62	https://i.scdn.co/image/c5e94f9df82bd5fdf9a978...	18637
12	Bobby Womack	[classic soul, funk, motown, quiet storm, soul...	61	https://i.scdn.co/image/2a844501416646506eef4a...	445212
17	The Jesus and Mary Chain	[alternative rock, art rock, dance rock, new w...	58	https://i.scdn.co/image/7134f3041e86bb30a1aa8f...	286478
16	John Cale	[alternative rock, anti-folk, art pop, art roc...	55	https://i.scdn.co/image/ab6772690000dd221d8c27...	111959
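The parsing code for top artists isn’t shown in the post. A minimal sketch, assuming the standard artist object fields (name, genres, popularity, images, followers.total), might look like this; the helper names are hypothetical. The sample uses values from the table above with images left empty, since the image URLs in the table are truncated. Sorting by popularity keeps each row’s original positional index, which is why the index in the table is out of order.

```python
import pandas as pd


def parse_artist(item):
    """Flatten one artist object from current_user_top_artists()['items']."""
    images = item.get('images') or []
    return {
        'artist_name': item['name'],
        'genres': item['genres'],
        'popularity': item['popularity'],
        'artist_image': images[0]['url'] if images else None,  # first image, if any
        'followers': item['followers']['total'],
    }


def top_artists_to_df(response):
    """One row per artist, most popular first (original index kept)."""
    df = pd.DataFrame([parse_artist(item) for item in response['items']])
    return df.sort_values('popularity', ascending=False)


sample = {'items': [
    {'name': 'Daft Punk', 'genres': ['electro', 'filter house'], 'popularity': 83,
     'images': [], 'followers': {'total': 6266549}},
    {'name': 'The Beatles', 'genres': ['british invasion', 'classic rock'], 'popularity': 90,
     'images': [], 'followers': {'total': 16626426}},
    {'name': 'Frank Ocean', 'genres': ['alternative r&b', 'hip hop'], 'popularity': 87,
     'images': [], 'followers': {'total': 5930758}},
]}
print(top_artists_to_df(sample)[['artist_name', 'popularity', 'followers']])
```

With the live client this would be top_artists_to_df(spotify.current_user_top_artists(limit=20, time_range='long_term')); the time_range used for the table above is an assumption.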

Get top tracks from an artist

I’d like to finish this post with a way to get the top tracks of any artist you want. You only need the Spotify URI that identifies the artist.

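# DIIV example
diiv_uri = 'spotify:artist:4OrizGCKhOrW6iDDJHN9xd'

# Client-credentials flow: public catalog data, no user login needed
credentials = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
spotify = spotipy.Spotify(client_credentials_manager=credentials)
results = spotify.artist_top_tracks(diiv_uri)

for track in results['tracks'][:10]:
    print('track    : ' + track['name'])
    # preview_url can be None, which would break the string concatenation
    print('audio    : ' + (track['preview_url'] or 'no preview available'))
    print('cover art: ' + track['album']['images'][0]['url'])
    print()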

track    : Under the Sun
audio    : https://p.scdn.co/mp3-preview/168b3cbe7b1e7224ca297b757b6c9b35fb176629?cid=ff47cd5371a049cb87c5c6bc407f4901
cover art: https://i.scdn.co/image/ab67616d0000b2735172f44c5f8743c09fb5bbc8

track    : Doused
audio    : https://p.scdn.co/mp3-preview/204c017dbfb537a03ea1ce0146b419d6cd40fa10?cid=ff47cd5371a049cb87c5c6bc407f4901
cover art: https://i.scdn.co/image/ab67616d0000b2737bc6a0c2b8d9393a2dd80cb7

track    : Blankenship
audio    : https://p.scdn.co/mp3-preview/93ae971865b312f14d5b46b85fc75074d0daa1f3?cid=ff47cd5371a049cb87c5c6bc407f4901
cover art: https://i.scdn.co/image/ab67616d0000b2733bdae3719124f5f749feb9c5
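The URI has the form spotify:artist:&lt;id&gt;, where the last part is the same bare ID stored in the artist_id column earlier (DIIV’s 4OrizGCKhOrW6iDDJHN9xd appears in both places). Recent spotipy versions accept a URI, an open.spotify.com link or a bare ID for this call, but to match an artist against your own DataFrame you need the bare ID. A sketch of a hypothetical helper that extracts it:

```python
from urllib.parse import urlparse


def artist_id_from_ref(ref):
    """Return the bare artist ID from a Spotify URI, an open.spotify.com URL, or an ID."""
    if ref.startswith('spotify:'):
        parts = ref.split(':')  # e.g. ['spotify', 'artist', '<id>']
        if len(parts) == 3 and parts[1] == 'artist':
            return parts[2]
        raise ValueError('not an artist URI: ' + ref)
    parsed = urlparse(ref)
    if parsed.netloc == 'open.spotify.com':
        segments = [s for s in parsed.path.split('/') if s]
        # the ID follows the 'artist' path segment; the query string (?si=...) is ignored
        if 'artist' in segments:
            i = segments.index('artist')
            if i + 1 < len(segments):
                return segments[i + 1]
        raise ValueError('not an artist URL: ' + ref)
    return ref  # assume it is already a bare ID


print(artist_id_from_ref('spotify:artist:4OrizGCKhOrW6iDDJHN9xd'))  # 4OrizGCKhOrW6iDDJHN9xd
```

For example, df[df['artist_id'] == artist_id_from_ref('spotify:artist:4OrizGCKhOrW6iDDJHN9xd')] selects DIIV’s rows from the top-tracks DataFrame.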

Tip: If you want to improve your analysis, it’s better to download your personal data from Spotify’s privacy settings. It’s an easy process, but it takes a few days to complete. Once it’s done, Spotify will email you with instructions for accessing the data.
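Once the export arrives, a first aggregation could look like the sketch below. It assumes the export contains JSON files that are each a list of records with endTime, artistName, trackName and msPlayed keys (the layout of the streaming history files at the time of writing; check your own export, since the format may differ). The helper names are hypothetical and the sample records are invented for illustration.

```python
import json
from collections import defaultdict


def load_history(paths):
    """Concatenate the records from one or more exported JSON files (each a JSON list)."""
    records = []
    for path in paths:
        with open(path, encoding='utf-8') as f:
            records.extend(json.load(f))
    return records


def minutes_by_artist(records):
    """Total listening time per artist in minutes, most listened first.

    Assumes each record has 'artistName' and 'msPlayed' keys.
    """
    totals = defaultdict(int)
    for record in records:
        totals[record['artistName']] += record['msPlayed']
    return sorted(((artist, ms / 60_000) for artist, ms in totals.items()),
                  key=lambda pair: pair[1], reverse=True)


# Invented records shaped like the assumed export format
sample = [
    {'endTime': '2020-04-01 21:10', 'artistName': 'DIIV', 'trackName': 'Doused', 'msPlayed': 240_000},
    {'endTime': '2020-04-02 08:30', 'artistName': 'Interpol', 'trackName': 'Evil', 'msPlayed': 210_000},
    {'endTime': '2020-04-02 09:05', 'artistName': 'DIIV', 'trackName': 'Blankenship', 'msPlayed': 300_000},
]
print(minutes_by_artist(sample))  # [('DIIV', 9.0), ('Interpol', 3.5)]
```

Summing msPlayed rather than counting rows avoids over-weighting tracks that were skipped after a few seconds.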

Extras

Repo of the project.
Two great songs that keep coming to mind in these odd, strange days we are living through.


Leandro Elesgaray