Your Favorite Thing: Algorithmically Perfect Playlist

I’ve previously written about analysing my iTunes library and about generating Smart Playlists in iTunes. This post takes things a bit further by generating a “perfect playlist” outside of iTunes… it is exclusively for nerds.

How can you put together a perfect playlist? What are your favourite songs? How can you tell what they are? Well, we could look at how many times you’ve played each song in your iTunes library (assuming this is mainly how you consume your music)… but this can be misleading. Songs that have been in there since the start (in my case, a decade ago) have had longer to accumulate plays than songs that were added last year. This problem was covered nicely in a post by Mr Science Show.

He suggests that the rate at which a track accumulates plays, and hence your all-time greatest playlist, can be modelled using

\frac{dp}{dt}=\frac{A}{Bt+N_0} + Ce^{-Dt}

where p is the number of plays a track has received and N_0 is the number of tracks in the library at time zero, t_0. A and B are constants, with B describing the collection growing linearly over time, so that Bt + N_0 is the size of the library at time t. The second component, with constants C and D, is an additional correction for the fact that recently added songs are likely to garner more plays; as they age, they relax back into the general soup of the library. I used something similar to make my perfect playlist.
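To make the model concrete, here is a minimal Ruby sketch. Ruby is also the language of the parsing script used in Step 1. The sketch evaluates dp/dt and integrates it numerically with the trapezoid rule to give the plays a track accumulates between two times. The constants are placeholder values chosen for illustration, not values fitted to any library, and PlayModel is a hypothetical name. With C = 0 the first term integrates to the closed form (A/B) ln((Bt_1 + N_0)/(Bt_0 + N_0)), which gives a handy check on the numerical result.

```ruby
# Minimal sketch of the play-rate model dp/dt = A/(B*t + N0) + C*exp(-D*t).
# The constants used below are illustrative placeholders, not fitted values.
PlayModel = Struct.new(:a, :b, :c, :d, :n0) do
  # Instantaneous play rate for one track at time t
  def rate(t)
    a / (b * t + n0) + c * Math.exp(-d * t)
  end

  # Plays accumulated between t0 and t1, integrated with the trapezoid rule
  def plays_between(t0, t1, steps = 10_000)
    h = (t1 - t0).to_f / steps
    sum = 0.5 * (rate(t0) + rate(t1))
    (1...steps).each { |i| sum += rate(t0 + i * h) }
    sum * h
  end

  # Closed form of the first term alone (matches plays_between when c == 0)
  def first_term_closed_form(t0, t1)
    (a / b) * Math.log((b * t1 + n0) / (b * t0 + n0))
  end
end

model = PlayModel.new(20.0, 5.0, 0.0, 0.1, 1000.0)
numeric = model.plays_between(0.0, 365.0)
exact = model.first_term_closed_form(0.0, 365.0)
puts format("numeric %.4f vs closed form %.4f", numeric, exact)
```

Setting C to a positive value adds the recency bonus, which here contributes roughly C/D extra plays over a long enough interval.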

Calculating something like this is well beyond the scope of iTunes and so we need to do something more heavy duty. The steps below show how this can be achieved. Of course, I used IgorPro to do almost all of this. I tried to read in the iTunes Music Library.xml directly in Igor using the udStFiLrXML package, but couldn’t get it to work. So there’s a bit of ruby followed by an all-Igor workflow. You can scroll to the bottom to find out a) whether this was worth it and b) about other stuff I discovered along the way.

All the code to do this is available here. I’ll try to put quantixed code on GitHub from now on.

Once the data is in Igor, the strategy is to calculate the expected number of plays a track would have received if iTunes were simply set to random, and compare this with the actual number of plays. The ratio of these two numbers helps us to discriminate our favourite tracks. To work out the expected plays, we calculate the number of tracks in the library over time; the inverse of this gives the probability that a given track will be played at that moment in the lifetime of the library. We know the total number of plays and the lifetime of the library, so if we assume that the play rate is constant over time (a fair assumption), we can calculate the expected number of plays for each track. As noted above, there is a snag with this method: tracks added in the last few months have a very low number of expected plays, yet are likely to have been played quite a lot. To compensate for this, I used the modelling method suggested by Mr Science Show, but only for recent songs. Hopefully that all makes sense, so now for a step-by-step guide.
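The expected-plays calculation just described can be sketched in Ruby as follows. This is an illustration of the idea, not the Igor code itself. It assumes each track's Date Added has already been converted to a whole day number counted from the first day of the library, that the play rate is constant, and that deleted tracks can be ignored. The method name expected_plays is hypothetical.

```ruby
# Sketch: expected plays per track if the library were played at random.
# added_days:  day each track was added (0 = first day of the library)
# total_plays: sum of Play Count over all tracks
# lifetime:    number of days the library has existed
def expected_plays(added_days, total_plays, lifetime)
  # Count how many tracks were added on each day...
  added_per_day = Array.new(lifetime, 0)
  added_days.each { |d| added_per_day[d] += 1 }

  # ...then accumulate to get the library size N(day)
  size = 0
  library_size = added_per_day.map { |n| size += n }

  # Constant play rate: the same number of plays every day
  plays_per_day = total_plays.to_f / lifetime

  # The chance that a given track is picked on a day is 1/N(day), so the
  # expected plays landing on that track that day is plays_per_day / N(day)
  per_track_per_day = library_size.map { |n| n.zero? ? 0.0 : plays_per_day / n }

  # Suffix sums: expected plays from day d to the end of the library's life
  remaining = Array.new(lifetime + 1, 0.0)
  (lifetime - 1).downto(0) { |d| remaining[d] = remaining[d + 1] + per_track_per_day[d] }

  added_days.map { |d| remaining[d] }
end

# Two tracks present from day 0, a third added on day 2 of a 4-day library.
# plays_per_day = 3, N = [2, 2, 3, 3], per-track rate = [1.5, 1.5, 1.0, 1.0]
expected = expected_plays([0, 0, 2], 12, 4)
p expected # => [5.0, 5.0, 2.0]
```

Each day's plays are shared out among the tracks present that day. As a result, the expected plays add up to the total number of plays, which is a useful sanity check. The late-added track also shows the snag mentioned above: it gets far fewer expected plays than the others.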

Step 1: Extract data from iTunes xml file to tsv

After trying and failing to write my own script to parse the xml file, I stumbled across this on the web.

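#!/usr/bin/ruby
# Parse the track dictionaries from an iTunes Music Library.xml file.
# Usage: ruby parsenoko.rb "iTunes Music Library.xml"

require 'rubygems'
require 'nokogiri'

list = []
doc = File.open(ARGV[0], 'r') { |f| Nokogiri::XML(f) }

# /plist/dict/dict/dict selects each track's <dict> inside the Tracks dict
doc.xpath('/plist/dict/dict/dict').each do |node|
  hash = {}
  last_key = nil

  # Children alternate: <key>Name</key> followed by its value element
  node.children.each do |child|
    next if child.blank? # skip whitespace-only text nodes

    if child.name == 'key'
      last_key = child.text
    elsif %w[true false].include?(child.name)
      # <true/> and <false/> have no text, so record the element name instead
      hash[last_key] = child.name
    else
      hash[last_key] = child.text
    end
  end

  list << hash
end

# Print the whole list (an array of hashes) in Ruby's inspect format
p list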

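This script was saved as parsenoko.rb. After using cd to move to a directory containing the script and a copy of the xml file, it can be run from the command line with

find . -name "*.xml" -exec ruby parsenoko.rb {} > playlist.csv \;

The redirect applies to the whole find command, so the output for every .xml file found (including any in subdirectories) ends up in playlist.csv. Keep only the one library file in that directory. Despite the .csv extension, the file holds the list of track dictionaries as printed by Ruby’s p.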

Step 2: A little bit of cleaning

The file starts with [ and ends with ]. Each dictionary item (dict) has been printed enclosed by {}. It’s easiest to remove these before importing to IgorPro. For my library the maximum number of keys per track is 38, so I added a first line of column names, one per key (ColumnA<tab>ColumnB<tab>…<tab>ColumnAL), to make sure all keys were imported correctly.
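As an alternative to cleaning the printed output by hand, the parsed tracks could be written straight to a tab-separated file. The Ruby sketch below is one way to do that. It assumes that Step 4 only needs each cell to hold a key and its value separated by =>. It mimics the quoting of p by using String#inspect, and it builds the ColumnA to ColumnAL header from the largest number of keys: 38 keys give A to Z followed by AA to AL. The names column_letters and tracks_to_tsv are hypothetical.

```ruby
# Sketch: write track hashes as tab-separated text with a ColumnA... header row.

# Spreadsheet-style column letters: 0 -> "A", 25 -> "Z", 26 -> "AA", 37 -> "AL"
def column_letters(index)
  letters = +""
  n = index + 1
  while n > 0
    n, rem = (n - 1).divmod(26)
    letters.prepend((65 + rem).chr)
  end
  letters
end

def tracks_to_tsv(tracks)
  width = tracks.map(&:size).max || 0
  header = (0...width).map { |i| "Column#{column_letters(i)}" }.join("\t")
  rows = tracks.map do |track|
    # String#inspect quotes each key and value and escapes any tab as \t,
    # so a cell never contains a literal tab character
    track.map { |key, value| "#{key.inspect}=>#{value.inspect}" }.join("\t")
  end
  ([header] + rows).join("\n") + "\n"
end

tracks = [{ "Track ID" => "1", "Name" => "Example" }, { "Track ID" => "2" }]
print tracks_to_tsv(tracks)
```

In practice the list built by parsenoko.rb would be passed to tracks_to_tsv in place of the final p list, and the result redirected to a .tsv file.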

Step 3: Import into IgorPro

Import the tsv. This is preferable to csv because many tracks have commas in the track title, album title or artist name. Everything comes in as text and we will sort everything out in the next step.

LoadWave /N=Column/O/K=2/J/V={"\t"," $",0,0}

Step 4: Get Igor to sort the key values into waves corresponding to each key

This is the major cleaning step. For each cell we read the key and its value. The two are separated by =>, so this is used to split them and sort the values into a wave for each key. Numeric values are converted to numeric waves.

This is done by executing

iTunes()
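For readers without Igor, the same reorganisation looks like this in Ruby. It is a hypothetical analogue of iTunes(), not a translation of it. It assumes each cell reads "Key"=>"value" (with or without spaces around =>) and strips the quotes. It leaves a blank (nil) where a track lacks a key, and it treats a column as numeric only when every value present is an integer. Values that themselves contain quote marks lose them in this simple version.

```ruby
# Sketch: sort "Key"=>"value" cells into one array ("wave") per key.
# rows: one array of cells per track, e.g. ["\"Name\"=>\"Gift\"", "\"Play Count\"=>\"3\""]
def columns_by_key(rows)
  # Every key gets an array with one slot per track; missing values stay nil
  columns = Hash.new { |hash, key| hash[key] = Array.new(rows.size) }
  rows.each_with_index do |cells, i|
    cells.each do |cell|
      key, value = cell.split("=>", 2)
      next if value.nil? # skip empty or malformed cells
      columns[key.delete('"').strip][i] = value.delete('"').strip
    end
  end
  # Treat a column as numeric only if every value present is an integer
  columns.transform_values do |values|
    present = values.compact
    if !present.empty? && present.all? { |v| v.match?(/\A-?\d+\z/) }
      values.map { |v| v && v.to_i }
    else
      values
    end
  end
end

rows = [["\"Name\"=>\"Gift\"", "\"Play Count\"=>\"3\""], ["\"Name\" => \"Other\""]]
p columns_by_key(rows) # => {"Name"=>["Gift", "Other"], "Play Count"=>[3, nil]}
```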

Step 5: Convert timestamps to date values

iTunes stores dates as UTC timestamps in the format 2014-10-02T20:24:10Z. It does this for Date Added, Date Modified, Last Played etc. To do anything meaningful with these, we need to convert them to date values. IgorPro’s date system counts the time in seconds from midnight on 1st Jan 1904. Present-day dates are around 3.5 × 10^9 seconds, which is too large to store to the nearest second in single precision, so double precision (FP64) waves are required. We can parse the string containing each timestamp and convert it using

DateRead()
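A Ruby sketch of the same conversion can be handy for checking values outside Igor. It assumes timestamps always take the trailing-Z (UTC) form shown above and makes no local time-zone adjustment. The name igor_seconds is hypothetical. The round trip through a 32-bit float at the end illustrates why double precision is needed.

```ruby
require 'time'

# Igor Pro counts date/time values in seconds from 1904-01-01 00:00:00
IGOR_EPOCH = Time.utc(1904, 1, 1)

# Convert an iTunes timestamp such as "2014-10-02T20:24:10Z" to seconds since
# 1904-01-01. No time-zone adjustment is made: the result stays in UTC.
def igor_seconds(timestamp)
  (Time.iso8601(timestamp) - IGOR_EPOCH).to_i
end

p igor_seconds("1970-01-01T00:00:00Z") # => 2082844800

secs = igor_seconds("2014-10-02T20:24:10Z")
# Round-trip through a 32-bit float: at this magnitude the spacing between
# representable values is 256 s, so the result can be off by up to 128 s
single = [secs].pack("f").unpack1("f")
puts "FP64 keeps #{secs}; FP32 gives #{single.to_i} (error #{(single - secs).abs} s)"
```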

Step 6: Discover your favourite tracks!

We do all of this by running

Predictor()

The way this works is described above. Note that you can run whatever algorithm you like at this point to generate a list of tracks.
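The core of the ranking might look like this in Ruby. Each track is scored by its actual plays divided by its expected plays, and the top listlen tracks are kept. This is an illustrative sketch, not Predictor() itself. It skips tracks with no expected plays and leaves out the recent-track correction described earlier. Track and favourites are hypothetical names.

```ruby
# Sketch: rank tracks by how much more they were played than chance predicts.
Track = Struct.new(:name, :play_count, :expected)

# Score = actual plays / expected plays; tracks with no expected plays are skipped.
# max_by(n) returns the n highest-scoring tracks, best first.
def favourites(tracks, listlen)
  tracks
    .select { |t| t.expected.positive? }
    .max_by(listlen) { |t| t.play_count.fdiv(t.expected) }
end

tracks = [
  Track.new("Old favourite", 120, 100.0),
  Track.new("New discovery", 30, 5.0),
  Track.new("Filler", 2, 10.0)
]
p favourites(tracks, 2).map(&:name) # => ["New discovery", "Old favourite"]
```

Ranking by the ratio rather than the raw Play Count is what lets a recently added but heavily played track outrank an old one with many more plays.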

Step 7: Make a playlist to feed back to iTunes

M3U is a simple playlist format that iTunes can import. Its plain-text layout can easily be printed to a Notebook in Igor and then saved as a file for importing back into iTunes.

To do this we run

WritePlaylist(listlen)

Where the Variable listlen is the length of the playlist. In this example, listlen=50 would give the Top 50 favourite tracks.
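Because an M3U file is plain text, the same output can be produced with a few lines of Ruby. This sketch writes the extended M3U variant: an #EXTM3U header, then for each track an #EXTINF line with the duration and title followed by the file path. It assumes each track is a hash with the iTunes keys Name, Artist, Total Time (in milliseconds) and Location (a file:// URL). It also assumes plain paths are acceptable to iTunes on import, so it decodes the URL into one. The names location_to_path and m3u are hypothetical.

```ruby
# Sketch: write an extended M3U playlist from iTunes track hashes.

# Turn a file:// URL into a plain path, decoding %XX escapes as UTF-8 bytes
def location_to_path(location)
  path = location.sub(%r{\Afile://(localhost)?}, "")
  path.b.gsub(/%(\h\h)/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
end

def m3u(tracks)
  lines = ["#EXTM3U"]
  tracks.each do |t|
    seconds = t["Total Time"].to_i / 1000 # iTunes stores durations in ms
    lines << "#EXTINF:#{seconds},#{t['Artist']} - #{t['Name']}"
    lines << location_to_path(t["Location"])
  end
  lines.join("\n") + "\n"
end

example = [{ "Name" => "Track", "Artist" => "Artist", "Total Time" => "180000",
             "Location" => "file:///Users/me/Music/My%20Track.mp3" }]
print m3u(example)
```

Saving the result with something like File.write("Top50.m3u", m3u(top_tracks)) gives a file that can be imported back into iTunes.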

So what did I find out?

My top 50 songs determined by this method were quite different to the Smart Playlist of Most Played tracks in iTunes. The tracks at the top of the iTunes Most Played list have disappeared from the new list; these are the ones that have been in the library for a long time and that, I suppose, I don’t listen to much any more. The new algorithmically designed playlist has a bunch of fresher tracks that were added in the last few years and that I have listened to quite a lot. Looking through it, I can see music that I should explore in more detail. In short, it’s a superior playlist, and one that will keep changing and should not go stale.

Other useful stuff

There are quite a few parsing tools on the web that vary in their usefulness. Some that deserve a mention are:

  • The xml file should be readable as a plist by Cocoa, which is native to OS X
  • Visualisation of what proportion of an iTunes library is by a given artist – bdunagan’s blog
  • itunes-parser on github by phiggins
  • Really nice XSLT to move the xml file to html – moveable-type
  • Comprehensive but difficult to follow method in ruby.

The post title comes from “Your Favorite Thing” by Sugar from their LP “File Under: Easy Listening”
