This was a course project demonstrating machine learning tools in R on a complex dataset: Spotify song data. It combined a supervised method, Support Vector Machines (SVM), with two unsupervised methods, K-Means Clustering and Principal Components Analysis (PCA). All data was cleaned, analyzed, and visualized in R.
SVM

For SVM to be effective, the classes (here, genres) should be separable along at least some feature dimensions. These plots show examples of genre separation along song attributes.
Although the SVM model was more accurate than a naïve model, it was not accurate enough for genre prediction in general. Music genres are fluid classifications, and many genres blend together, which makes the decision boundaries unclear.
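To make the comparison against a naïve model concrete, here is a minimal sketch rather than the project's actual code. It makes several assumptions: the built-in `iris` data stands in for the Spotify data (with `Species` playing the role of genre), the SVM comes from the `e1071` package with a radial kernel, and the "naïve model" is taken to mean always predicting the most common class. None of these choices are confirmed by the project itself.

```r
# Sketch only: iris stands in for the Spotify data, Species for genre.
library(e1071)  # assumed SVM package; the write-up does not name one

set.seed(42)
data(iris)

# Hold out 30% of the rows for testing
n <- nrow(iris)
test_idx <- sample(n, size = round(0.3 * n))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit an SVM classifier (the radial kernel is an assumption)
fit <- svm(Species ~ ., data = train, kernel = "radial")
pred <- predict(fit, newdata = test)
svm_acc <- mean(pred == test$Species)

# Naive baseline (assumed definition): always predict the
# most common class seen in the training data
majority <- names(which.max(table(train$Species)))
naive_acc <- mean(test$Species == majority)

cat("SVM accuracy:  ", round(svm_acc, 3), "\n")
cat("Naive accuracy:", round(naive_acc, 3), "\n")
```

On real genre labels the gap between the two numbers would likely be much smaller than on a clean toy dataset, which is the point made above about blended genres.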
Clustering and PCA
K-means clustering was performed to find clusters occurring “naturally” in the data; the resulting clusters can be seen below.
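A k-means run of this kind can be sketched in base R as follows. This is an illustration under assumptions, not the project's code: `iris` again stands in for the song attributes, the features are standardized first (a common step that the write-up does not confirm), and `nstart = 25` is an arbitrary choice. The four clusters match the number described later in the post.

```r
# Sketch only: iris measurements stand in for Spotify song attributes.
set.seed(42)

# Standardize features so no single attribute dominates the distances
features <- scale(iris[, 1:4])

# Four clusters, as in the post; several random starts for stability
km <- kmeans(features, centers = 4, nstart = 25)

# Cluster sizes
print(km$size)

# Cross-tabulate clusters against known labels (genre, in the real data)
# to see how well the clusters line up with them
print(table(cluster = km$cluster, label = iris$Species))
```

A cross-tabulation like the last line is one simple way to check whether clusters correspond to labeled categories.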

Overall, the clusters did not map cleanly onto individual genres. To help characterize each cluster, PCA was performed. Variable contributions to Principal Component 1 (PC1) and Principal Component 2 (PC2) are shown in Figure 3, alongside the dataset plotted in PC space.
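The PCA step can be sketched with base R's `prcomp`. As before, `iris` is a stand-in for the song attributes, and the definition of "contribution" used here (squared loadings expressed as percentages) is one common convention that is assumed, not taken from the project.

```r
# Sketch only: iris measurements stand in for Spotify song attributes.
features <- iris[, 1:4]

# PCA on centered, unit-variance features
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Share of total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings of each attribute on PC1 and PC2
loadings <- pca$rotation[, 1:2]

# Assumed definition of contribution: squared loading as a percentage.
# Each column sums to 100 because loading vectors have unit length.
contrib <- 100 * loadings^2
print(round(contrib, 1))

# Coordinates of every observation in PC space (for a scatter plot)
scores <- pca$x[, 1:2]
head(scores)
```

Plotting `scores` colored by cluster assignment gives the kind of PC-space view described above.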

Example songs for each cluster are tagged in the graphic. Cluster 1 (red) had high acousticness, speechiness, and valence (aka positivity), with low instrumentalness compared to other clusters. These are upbeat, acoustic songs with an emphasis on vocal delivery. An example of this cluster is Still Strugglin’ by Raekwon and Notorious BIG (https://www.youtube.com/watch?v=TWFwG1gp_6w).
Cluster 2 (green) had high instrumentalness and duration, with low acousticness and speechiness relative to the other clusters. These are longer, instrumental songs with minimal vocals. An example of this cluster is Walkway by Vacant (https://youtu.be/9cRVtld4c9A).
Cluster 3 (blue) had high loudness and liveness, with low danceability and acousticness. These songs tended to be faster (up-tempo), noisy, and electronic. An example of this cluster is Call of the Baphomet by PRXJEK (https://youtu.be/607l6ASRlAU).
Cluster 4 (purple) included both high- and low-energy songs, with high acousticness and danceability, and low loudness. These songs are more of a mixed bag and represent quieter, acoustic instrumentation. An example of this cluster is Jump Off by Lotus (https://youtu.be/zV1ZkVJKTwo).
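Cluster descriptions like the ones above come from comparing each cluster's average attributes against the others. A minimal sketch of that comparison, again using `iris` as a stand-in and assuming standardized features (so that values above 0 mean "higher than the overall average"):

```r
# Sketch only: iris measurements stand in for Spotify song attributes.
set.seed(42)
features <- scale(iris[, 1:4])
km <- kmeans(features, centers = 4, nstart = 25)

# Mean of each standardized attribute within each cluster
profile <- aggregate(as.data.frame(features),
                     by = list(cluster = km$cluster),
                     FUN = mean)
print(round(profile, 2))

# The attribute on which each cluster scores highest and lowest
top_attr <- apply(profile[, -1], 1, function(r) names(r)[which.max(r)])
low_attr <- apply(profile[, -1], 1, function(r) names(r)[which.min(r)])
print(data.frame(cluster = profile$cluster, highest = top_attr, lowest = low_attr))
```

With the real data, the columns would be attributes such as acousticness or loudness, and the table would read directly as "Cluster 1 is high on X and low on Y".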