Monday, October 18, 2010

Pandora

I recently went back to using Pandora, the internet radio station that selects songs based on your musical preferences. I tried it a few years ago and got a few interesting suggestions, but it was pretty hit and miss.

When I came back after a gap of two years or so, it had gotten way better. I think it's to do with the way they predict which songs you'd like. Originally, they had a bunch of guys who assigned attributes to the songs (acoustic guitar, major chord instrumentation, 'wispy male vocals' (really) etc.) and these were prominently featured ('this song came up because you seem to like [attribute X]'). Coding up vague song descriptions struck me at the time as an incredibly labour-intensive way of doing it, but they seemed to be giving it a go, so good for them. The Pandora About Page still describes this process.

While this is just a hunch, what I suspect they actually do now is use the much richer data set that their users provide when they select which songs they like and dislike, and which songs they skip. This is outsourced to thousands of people, not fifty, and reveals actual likes and dislikes (not just a presumed love of all 'wispy male vocals', however defined).
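To make the hunch concrete, here's a minimal sketch of how you might recommend from user feedback alone, with no song attributes at all. To be clear, this is my guess at the general idea, not Pandora's actual method: the listeners, the ratings, 'Metal Band X' and the function names are all invented for illustration.

```python
from math import sqrt

# Hypothetical feedback: +1 = thumbs up, -1 = thumbs down or skip.
# Listener names and ratings are invented for illustration.
feedback = {
    "ann":  {"Jack's Mannequin": 1, "Owl City": 1, "Metal Band X": -1},
    "bob":  {"Jack's Mannequin": 1, "Owl City": 1},
    "cara": {"Jack's Mannequin": -1, "Metal Band X": 1},
    "dave": {"Owl City": 1, "Metal Band X": -1},
}

def artist_vector(artist):
    """Return {listener: rating} for everyone who rated this artist."""
    return {u: r[artist] for u, r in feedback.items() if artist in r}

def cosine_similarity(a, b):
    """Cosine similarity of two artists' rating vectors (missing = 0)."""
    va, vb = artist_vector(a), artist_vector(b)
    dot = sum(va[u] * vb[u] for u in va if u in vb)
    norm_a = sqrt(sum(x * x for x in va.values()))
    norm_b = sqrt(sum(x * x for x in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(seed):
    """Pick the other artist whose listeners agree most with the seed's."""
    artists = {a for r in feedback.values() for a in r}
    others = sorted(artists - {seed})
    return max(others, key=lambda a: cosine_similarity(seed, a))

print(most_similar("Jack's Mannequin"))
```

Nothing in there knows what either band sounds like. Two artists end up close together purely because the same listeners gave them the same thumbs.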

I suspect this is the case, because:

a) Songs that come up now overlap less on musical type than before, and more on taste - Owl City came up on my station for Jack's Mannequin the other day. Musically, they're quite different. Demographically, they're both near the centre of the bullseye of Stuff White People Like.

b) The 'Why did this song come up?' tab has now gone, as it has to when the answer to every question is 'because that's what happened when we inverted the giant matrix and extracted the principal components'.

and

c) It's gotten a lot better, as it does when you start using really huge datasets with really good information extraction mechanisms. The Jack's Mannequin station is almost a pitch-perfect playlist for SM's music tastes.
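For the curious, the 'giant matrix' quip in (b) can be sketched on a very small matrix. This is only an illustration of pulling out a first principal component, under my own assumptions: the ratings are invented, 'Metal Band X' is hypothetical, and I'm using plain power iteration on the covariance matrix, which is one simple way of doing it and not necessarily what anyone does in production.

```python
# Rows = hypothetical listeners, columns = artists. Invented ratings:
# +1 thumbs up, -1 thumbs down, 0 no feedback.
artists = ["Jack's Mannequin", "Owl City", "Metal Band X"]
ratings = [
    [ 1,  1, -1],
    [ 1,  1,  0],
    [-1,  0,  1],
    [ 0,  1, -1],
]

def covariance(rows):
    """Artist-by-artist covariance of the column-centred ratings."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
            for i in range(m)]

def first_component(cov, iterations=200):
    """Power iteration: repeatedly apply cov and renormalise to unit length."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v)))
             for i in range(len(v))]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

component = first_component(covariance(ratings))
loadings = dict(zip(artists, component))
for artist, weight in loadings.items():
    print(f"{artist}: {weight:+.2f}")
```

In this toy data, Jack's Mannequin and Owl City load on the component with the same sign and Metal Band X with the opposite one. That's a 'taste' axis the data found on its own, and it comes with no human-readable label, which is exactly why a 'Why did this song come up?' tab would be hard to fill in.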

And I suspect this was the plan all along - the whole Music Genome thing is mainly a seeding mechanism (and a fallback for new songs, which don't yet have any likes or dislikes), and the real information was always in user choices. At a minimum, if these guys aren't using this information, they're crazy.

Put it this way - the algorithm is now good enough that I listened long enough for the frequent ads to bug me, and actually paid for the upgrade (perhaps the first 'freemium' product where I bought the extra part). Proof that good matrix decomposition can really add value!
