A Novel, Interactive, Personalized Approach to Music Recommendation Systems

Jenny Wu, Tianru Wang, Jiading Zhu

Behind The Song Recommendation System

What a bummer to wake up on a Monday morning just to find out that your daily mix is a total disaster. It can be a double whammy if you have a long commute to work or school. Most of the time, we simply accept what is in our “Recommended for you” list without questioning it. But have you ever wondered how music platforms like Spotify manage to recommend songs to us? Their algorithms, content-based filtering and collaborative filtering, compare and filter playlists from listeners who listened to the same songs as you, then recommend some unheard songs that those listeners enjoy. Below, we examine some limitations of these algorithms.

  1. Echo chamber effect. It takes away your autonomy to expose yourself to new music you may not yet know you are interested in.
  2. Lack of personalization. How original is it to recommend you new songs based on other people’s listening history?
  3. Inability to surface new, unpopular songs. A collaborative filtering approach cannot include musical features in the recommendation process, which makes newly released, less popular songs unlikely to appear in your recommended list.
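For context, the collaborative filtering idea described above fits in a few lines of Python. This is only a toy sketch: the listener histories are invented, and real systems weight overlaps rather than treating any shared song equally.

```python
# Toy user-based collaborative filtering: recommend songs that listeners
# with overlapping histories have played but the target user has not.
listening_history = {
    "you":   {"Song A", "Song B", "Song C"},
    "user1": {"Song A", "Song B", "Song D"},
    "user2": {"Song C", "Song E"},
    "user3": {"Song F"},  # no overlap with "you", so ignored
}

def recommend(target, history):
    target_songs = history[target]
    candidates = set()
    for user, songs in history.items():
        if user != target and songs & target_songs:   # shared listening history
            candidates |= songs - target_songs        # their songs you haven't heard
    return candidates

print(sorted(recommend("you", listening_history)))  # ['Song D', 'Song E']
```

Note that "Song F" never gets recommended: without overlap, collaborative filtering has no path to it, which is exactly the unpopular-song limitation above.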

In this blog post, we are going to introduce an innovative machine learning approach that addresses those problems and refocuses on songs’ musical features and listeners’ personalities. Below, we will explain how our new recommendation system streamlines the song recommendation process.

Getting Data Ready

In order to figure out how a person’s personality relates to his or her taste in music, we first scraped our own data from Spotify using Python with the help of Spotipy. We formatted the data and cross-referenced it with Billboard chart data to make sure that all the song information was correct. Our second dataset came from Kaggle: a survey of personality, hobbies, demographics, and music taste among young people aged 15–30. Shown below are two data dictionaries that explain the critical variables in the two datasets.

Data Dictionary for the Spotify Dataset
Data Dictionary for the Young People Survey Dataset

As with any data analysis process, we started by cleaning our data: we removed the NAs and dealt with the outliers. To help the general public better understand our data, we designed interactive data visualizations with the help of Plotly, R, and Python, so that readers can play around with the visualizations and explore the musical features and time periods they are most interested in.
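As a rough illustration of the cleaning step, here is how NA removal and outlier handling can look in Python. The 1.5-IQR (Tukey fence) cutoff and the sample values are our illustrative choices, not figures from the actual pipeline.

```python
import statistics

def clean(values):
    """Drop missing entries, then drop points beyond 1.5 IQRs from the quartiles."""
    present = [v for v in values if v is not None]    # remove NAs
    q1, _, q3 = statistics.quantiles(present, n=4)    # lower and upper quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr           # Tukey fences
    return [v for v in present if lo <= v <= hi]      # keep only inliers

valence = [0.42, None, 0.55, 0.48, 0.51, 0.47, 0.53, 0.49, 0.50, 9.9, 0.46]
print(clean(valence))  # the NA is removed and the 9.9 outlier is dropped
```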

Shown below is a portion of the R code we used to create the dynamic data visualizations using the Plotly and Quantmod packages.

library(dplyr)
library(plotly)

data <- group_by(data, year) %>%
  summarize(Positivity = mean(valence) * 100,
            Acousticness = mean(acousticness) * 100,
            Danceability = mean(danceability) * 100,
            Energy = mean(energy) * 100,
            Liveness = mean(liveness) * 100,
            Speechiness = mean(speechiness) * 100)
fig <- plot_ly(data, x = ~year)
fig <- fig %>% add_lines(y = ~Positivity, name = "Positivity")
fig <- fig %>% add_lines(y = ~Acousticness, name = "Acousticness")
fig <- fig %>% add_lines(y = ~Danceability, name = "Danceability")
fig <- fig %>% add_lines(y = ~Energy, name = "Energy")
fig <- fig %>% add_lines(y = ~Liveness, name = "Liveness")
fig <- fig %>% add_lines(y = ~Speechiness, name = "Speechiness")
fig <- fig %>%
  layout(title = "Year vs. Musical Features",
         xaxis = list(title = "Year",
                      rangeslider = list(type = "date")))

Using the interactive graph below, you can select the time range you are most curious about. On the left, you can select the musical feature you are most interested in to see how prominent it was over time.

From the graph above, we can tell how the musical features of the Billboard top 100 songs changed from the 1970s to the 2020s. We can see that the musical features of top hits stayed roughly the same after 2000. The main shifts happened around 1980 when acousticness dropped and around 1995 when speechiness and energy rose.

This interactive chart shows how people’s age affects their music taste. You can hover over any data point to see the average score people gave to the survey questions.

The graph above shows the popularity of different genres of music among people of different ages. We can see that as people grow older, their liking for pop, rock, hip-hop, and dance music drops, while their liking for country music grows.

Correlation Matrix

We went ahead and computed a correlation matrix of the independent variables to check for multicollinearity. Theatre and art exhibitions are slightly positively correlated, but this interaction turned out to be insignificant when we included it in our regression model.

Correlation Matrix For People’s Demographic and Behavioral Information

Methodology, Model Selection, Machine Learning

Linear Regression

To start off, we tried linear regression models to see how each demographic or behavioral question relates to a person’s liking for the six music genres.

lm_pop <- lm(Pop ~ Foreign.languages + Art.exhibitions + Dancing +
               Musical.instruments + Age + Entertainment.spending +
               Celebrities + Adrenaline.sports + Theatre,
             data = training_set)

Although we achieved an adjusted R-squared of 0.67 and relatively low RMSE values, the model’s predictions reached an accuracy below 60%, and we were not comfortable drawing conclusions from the linear model. We therefore reframed the problem from predicting the extent to which a person likes a music genre to predicting whether the person likes it at all. Ratings of 4 or 5 were recoded as 1, indicating that the person liked the genre, while ratings of 3 or below were recoded as 0, indicating that he or she did not.
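The recoding itself is a one-liner; in Python it looks like this (the sample ratings are invented):

```python
# Recode 1-5 genre ratings into like (1) / dislike (0) labels:
# ratings of 4 or 5 count as a "like", 3 or below as a "dislike".
def binarize(rating):
    return 1 if rating >= 4 else 0

pop_ratings = [5, 3, 4, 1, 2]
labels = [binarize(r) for r in pop_ratings]
print(labels)  # [1, 0, 1, 0, 0]
```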

A classification problem can usually be tackled using logistic regression, decision trees, random forests, or support vector machines.

Logistic Regression

We began our analysis of the classification problem with logistic regression, constructing models from the same independent variables as our linear model. To examine the predictive power of our models, we split the dataset into two groups, with 80% of the data in the training set and 20% in the testing set. After extensive testing with several different training and testing combinations, we found that the logistic regression model is only acceptable for the pop genre, achieving an AUC of 0.7.

ROC Curve for Pop Genre’s Logistic Regression
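Both the 80/20 split and the AUC behind the ROC curve can be reproduced with the Python standard library. This sketch of ours computes AUC via the rank (Mann–Whitney) formulation rather than by tracing the ROC curve, and the labels and scores are invented for illustration:

```python
import random

def train_test_split(rows, test_frac=0.2, seed=42):
    """Shuffle and split rows into an 80% training and a 20% testing set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def auc(labels, scores):
    """AUC = probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, test = train_test_split(list(range(100)))
print(len(train), len(test))                     # 80 20
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))   # 0.75
```

An AUC of 0.7 therefore means the pop model ranks a random genre-liker above a random non-liker about 70% of the time.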

Decision Trees

We plotted decision trees for all the other genres but recognized that, since our dataset of demographic and behavioral information is not extensive, decision trees are likely to overfit and grow particularly deep. To mitigate both the error due to bias and the error due to variance, we chose to proceed with random forest models.

Random Forests

We chose random forests over single decision trees because random forests reduce variance by training each tree on a different sample of the data and on a different subset of the features. Our dataset of demographic and behavioral information contains 150 different questions about people’s preferences, but not all of these features are useful for determining which genres of music a person favors.

library(randomForest)

rf_hiphop <- randomForest(Hiphop ~ Foreign.languages +
                            Art.exhibitions + Dancing + Musical.instruments +
                            Age + Entertainment.spending + Celebrities +
                            Adrenaline.sports + Theatre,
                          data = training_set2, ntree = 500, mtry = 3,
                          importance = TRUE, proximity = TRUE)

Each tree in the forest only uses a certain number of those features. Since a random forest is a collection of decision trees, using enough trees lets the useful features eventually stand out, which reduces both the error due to bias and the error due to variance.

Sample Trees from the Random Forests
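The feature subsampling that mtry = 3 controls can be illustrated directly in Python. This is a simplification: randomForest actually redraws the 3 candidate features at every split, not once per tree, but the intuition is the same.

```python
import random

# The nine survey questions used as predictors in our R models.
questions = ["Foreign.languages", "Art.exhibitions", "Dancing",
             "Musical.instruments", "Age", "Entertainment.spending",
             "Celebrities", "Adrenaline.sports", "Theatre"]

# Each draw sees only 3 of the 9 questions, yet across 500 trees
# every question gets many chances to prove its usefulness.
rng = random.Random(0)
forest_features = [rng.sample(questions, 3) for _ in range(500)]

print(forest_features[0])                            # one tree's 3-feature subset
print(len(set().union(*map(set, forest_features))))  # 9: all questions covered
```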

The results from random forests models are promising, with an average of 76% sensitivity, 67% specificity, and 72% accuracy. The confusion matrices are shown below:

Confusion Matrix For Random Forest Models of Genre Dance, Rock, Hip-Hop, and Trance

Support Vector Machines

SVMs use kernel tricks to transform data and find optimal boundaries between possible outputs. A non-linear kernel enables an SVM to find a curved separation line between data points. We constructed an SVM for the “Country” genre and achieved an accuracy of 87%, with sensitivity and specificity both over 80%.

library(e1071)

svm_Country <- svm(formula = Country ~ Foreign.languages +
                     Art.exhibitions + Dancing + Musical.instruments +
                     Celebrities + Theatre + Adrenaline.sports +
                     Entertainment.spending + Age,
                   type = "C-classification", data = training_set)
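The kernel trick boils down to computing similarities as if the data lived in a transformed space, without ever visiting that space. The radial basis function kernel that e1071 uses by default amounts to the following; the gamma of 1/9 mirrors e1071's 1/(number of features) default for our 9 predictors, and the sample points are invented:

```python
import math

def rbf_kernel(x, z, gamma=1/9):
    """Gaussian (RBF) kernel: similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1, 0], [1, 0]))   # 1.0 -> identical points
print(rbf_kernel([1, 0], [4, 4]))   # near 0 -> distant points
```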

K-Fold Cross Validation

To validate our models, we performed repeated k-fold cross-validation. We did not choose stratified k-fold cross-validation because our dataset is small and the training set does not adequately represent the entire population; k-fold cross-validation with repetition therefore seemed the better approach.
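A minimal sketch of the repeated k-fold index generation looks like this; the 5 folds and 3 repeats are illustrative choices on our part, not the exact settings used:

```python
import random

def repeated_kfold(n, k=5, repeats=3, seed=1):
    """Yield (train_idx, test_idx) pairs: k folds, reshuffled every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                 # fresh shuffle for each repetition
        for fold in range(k):
            test = idx[fold::k]          # every k-th index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test

splits = list(repeated_kfold(100))
print(len(splits))                           # 15 splits: 5 folds x 3 repeats
print(len(splits[0][0]), len(splits[0][1]))  # 80 20
```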

Since cross-validation is not strictly necessary as a guard against overfitting for the random forest algorithm, we applied k-fold cross-validation to our logistic regression and SVM models. Neither the sensitivity nor the specificity of the models was greatly affected, and their accuracy remained roughly the same.

In conclusion, we designed a logistic regression model to predict whether a person likes pop music, random forest models to determine whether a person likes dance, rock, hip-hop, and trance music, and SVM to assess whether a person likes country music.

After we finished designing machine learning models to predict a person’s favorite music genres, the idea of building a song recommendation system came to mind, as we had found that songs recommended by popular streaming services such as Spotify are somewhat repetitive and mostly based on what others have listened to.

So what is new about our recommendation system?

We intend to build a machine learning algorithm that takes your personality and your taste for different musical features into account. We designed the system in Python. The algorithm takes a user’s favorite genres from the output of our R models, then recommends five songs from those genres based entirely on song popularity (weeks on the Billboard chart, peak position, etc.) and asks the user to rate them.

Based on the user’s ratings, the algorithm analyzes which musical features matter most to the user and gives its recommendations. This machine-learning algorithm also gets better with time, since it records whenever the user “liked” a song and takes that song’s musical features into consideration.
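The core of this step can be sketched as rating-weighted feature profiling: build a taste profile from the user's rated songs, then rank candidates by their distance to that profile. The feature values and song names below are made up for illustration; in the real system they would come from the Spotify audio features.

```python
# Build a taste profile as the rating-weighted mean of musical features,
# then recommend the candidate song closest to that profile.
FEATURES = ["valence", "acousticness", "energy"]

def taste_profile(rated_songs):
    """rated_songs: list of (rating, feature_dict) pairs, ratings 1-5."""
    total = sum(r for r, _ in rated_songs)
    return {f: sum(r * feats[f] for r, feats in rated_songs) / total
            for f in FEATURES}

def recommend(profile, candidates):
    """Return the candidate whose features are closest to the profile."""
    def dist(feats):
        return sum((feats[f] - profile[f]) ** 2 for f in FEATURES)
    return min(candidates, key=lambda c: dist(c[1]))[0]

rated = [(5, {"valence": 0.9, "acousticness": 0.1, "energy": 0.8}),
         (1, {"valence": 0.2, "acousticness": 0.9, "energy": 0.3})]
pool = [("Mellow Tune", {"valence": 0.3, "acousticness": 0.8, "energy": 0.2}),
        ("Upbeat Hit",  {"valence": 0.8, "acousticness": 0.2, "energy": 0.7})]
print(recommend(taste_profile(rated), pool))  # Upbeat Hit
```

Because the profile is weighted by ratings, the highly rated energetic song pulls the profile toward high valence and energy, so the upbeat candidate wins.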

Shown below is a portion of our code to enable user input of their ratings for recommended songs.

print("How do you feel about the following songs?")
print("Dislike - Like : 1 - 5, Never heard of: 0")
for i in range(0, 5):
    print(df.at[i, 'name'])
    songs_appeared.append(df.at[i, 'name'])
    temp_input = int(input())
    # re-prompt while the rating is above the upper bound (temp) or
    # below the lower bound (temp_l) without being the 0 "never heard" code
    while (temp_input > temp) or ((temp_input < temp_l) and (temp_input != 0)):
        temp_input = int(input("Please enter a rating from 0 to 5: "))
    ratings.append(temp_input)

Shown below is a short demonstration of our recommendation system implemented using Python.


We came up with a novel strategy for a long-standing problem. Our machine-learning algorithm improves on the current content-based filtering and collaborative filtering systems by taking a person’s demographic and behavioral data into consideration. The algorithm avoids over-specialization and can handle fresh items by analyzing a song’s musical features.

Our approach enables the machine to learn from the user’s personality and taste in music, providing new experiences for our users and opportunities to listen to music they have never tried before. Combining our R models and our Python algorithm, we have designed a way to recommend music based on a user’s demographic and behavioral information. The main drawbacks of our algorithm lie in the following two points:

  1. The unpredictability of a person’s taste in music. People may like a wide range of different songs, and those songs may have little in common in their musical features. This would negatively impact the model’s performance in the early stages.
  2. The algorithm’s inability to learn from its past recommendations. The model only takes into consideration the songs that people like or dislike, not people’s reactions to its past recommendations, which may cause it to occasionally recommend songs that users don’t like.

This algorithm has the potential to improve itself since it can learn from a person’s taste in music. As time goes on, the algorithm would accumulate more information on which songs the user liked and which the user did not, making its recommendations of new music more accurate. This machine learning model is a great starting point for a new song recommendation algorithm. We are also confident that this novel approach can be easily incorporated into Spotify’s already sophisticated song recommendation framework and provide useful insights for future studies.
