This article is one of a two-part series documenting what I learned during my Machine Learning Thesis at Spotify. Be sure to also check out the second article on how I implemented Feature Importance in this research.
In 2021, I spent 8 months building a predictive model to measure user satisfaction as part of my thesis at Spotify.
My goal was to understand what made users happy with their music experience. To do so, I built a LightGBM classifier whose output was a binary response:
y = 1 → the user is likely satisfied
y = 0 → not so much
Predicting human satisfaction is a challenge because humans are, by definition, dissatisfied. Even a machine is hardly fit to decipher the mysteries of the human psyche. So, naturally, my model was as confused as one can be.
From Human Predictor to Fortune Teller
My accuracy score was around 0.5, which is the worst possible result you can get on a binary classifier. It means the algorithm has a 50% chance of predicting yes or no, and that's as random as a human guess.
So I spent 2 months trying and combining different techniques to improve my model's predictions. In the end, I was finally able to raise my ROC score from 0.5 to 0.73, which was a big success!
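For reference, the ROC score (ROC AUC) can be computed with scikit-learn; the toy labels and scores below are purely illustrative. An uninformative scorer sits at 0.5, while a scorer that ranks every satisfied user above every unsatisfied one reaches 1.0:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical ground-truth satisfaction labels
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# A constant score carries no information: AUC = 0.5, a coin flip
y_score_random = np.full(8, 0.5)

# Scores that rank all positives above all negatives: AUC = 1.0
y_score_better = np.array([0.9, 0.2, 0.7, 0.8, 0.3, 0.4, 0.6, 0.1])

print(roc_auc_score(y_true, y_score_random))  # 0.5
print(roc_auc_score(y_true, y_score_better))  # 1.0
```

Note that ROC AUC is computed from predicted probabilities (ranking quality), not from hard 0/1 predictions, which is why it is often more informative than plain accuracy on imbalanced problems.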
In this post, I'll share with you the techniques I used to significantly improve the accuracy of my model. This article might come in handy whenever you're dealing with models that just won't cooperate.
Due to the confidentiality of this research, I can't share sensitive information, but I'll do my very best to keep it from sounding confusing.
Before diving into the techniques I used, I just want to make sure you get the basics right first. Some of these techniques rely on encoding your variables and preparing your data accordingly in order for them to work. Some of the code snippets I've included also reference…