Tag Archives: iTunes

Bateman Writes: 1994

BBC 6Music recently went back in time to 1994. This made me wonder what albums released that year were my favourites. As previously described on this blog, I have this information readily available. So I quickly crunched the numbers. I focused on full-length albums and, using play density (sum of all plays divided by number of album tracks) as a metric, I plotted out the Top 20.
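For anyone wanting to try the same metric, play density can be sketched in a few lines of Ruby. This is only an illustration: the `Track` struct, the function names and the sample numbers are made up, and the real analysis was done in Igor.

```ruby
# Play density = sum of all plays for an album / number of tracks on that album.
# Track, play_density and top_albums are hypothetical names; the data is invented.
Track = Struct.new(:album, :plays)

def play_density(tracks)
  tracks.group_by(&:album).transform_values do |album_tracks|
    album_tracks.sum(&:plays).to_f / album_tracks.size
  end
end

# Albums ranked from highest to lowest play density, limited to the top n.
def top_albums(tracks, n = 20)
  play_density(tracks).sort_by { |_album, density| -density }.first(n)
end

tracks = [
  Track.new("Album A", 10), Track.new("Album A", 20),
  Track.new("Album B", 3), Track.new("Album B", 5), Track.new("Album B", 4)
]
top_albums(tracks, 2).each { |album, density| puts "#{album}\t#{density}" }
```

Dividing by the number of tracks stops long albums from winning simply because they have more tracks to accumulate plays.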




There you have it. Scorn’s epic Evanescence has the highest play density of any album released in 1994 in my iTunes library. By some distance. If you haven’t heard it, this is an amazing record that broke new ground and spawned numerous musical genres. I think that record, One Last Laugh In A Place of Dying… and Ro Sham Bo would all be high on my all-time favourite list. A good year for music then as far as I’m concerned.

Other observations: I was amazed that Definitely Maybe was up there, since I am not a big fan of Oasis. Likewise for Dummy by Portishead. Note that Oxford’s Angels and Superdeformed[…] are bootleg records.

Bubbling under: this was the top 20, but there were some great records bubbling under in the 20s and 30s. Here are the best 5.

  • Heatmiser – Cop and Speeder
  • Circle – Meronia
  • Credit to the Nation – Take Dis
  • Kyuss – Welcome to Sky Valley
  • Drive Like Jehu – Yank Crime

I heard tracks from some of these bands on 6Music, but many were missing. Maybe there is something for you to investigate.

Part of a series obsessively looking at music in an obsessive manner.

Your Favorite Thing: Algorithmically Perfect Playlist

I’ve previously written about analysing my iTunes library and about generating Smart Playlists in iTunes. This post takes things a bit further by generating a “perfect playlist” outside of iTunes… it is exclusively for nerds.

How can you put together a perfect playlist? What are your favourite songs? How can you tell what they are? Well, we could look at how many times you’ve played each song in your iTunes library (assuming this is mainly how you consume your music)… but this can be misleading. Songs that have been in there since the start (in my case, a decade ago) have had longer to accumulate plays than songs that were added last year. This problem was covered nicely in a post by Mr Science Show.

He suggests that your all-time greatest playlist can be modelled using

\frac{dp}{dt}=\frac{A}{Bt+N_0} + Ce^{-Dt}

Where N_0 is the number of tracks in the library at t_0, time zero. A and B are constants, with the collection assumed to grow linearly over time (i.e. the library holds Bt+N_0 tracks at time t). The second component is an additional correction for the fact that songs added more recently are likely to have garnered more plays, and as they age, they relax back into the general soup of the library. I used something similar to make my perfect playlist.
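For intuition, the rate can be integrated to give the cumulative plays expected after time t. This is a sketch under assumptions added here for illustration: t is measured from t_0, p(0)=0, and A, B, C and D are all positive constants.

```latex
p(t) = \int_0^t \left( \frac{A}{Bs+N_0} + Ce^{-Ds} \right) ds
     = \frac{A}{B}\ln\left(1+\frac{Bt}{N_0}\right) + \frac{C}{D}\left(1-e^{-Dt}\right)
```

The first term keeps growing, but only logarithmically, because each track gets a smaller share of plays as the library grows. The second term saturates at C/D, which is the total bonus a track picks up while it is new.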

Calculating something like this is well beyond the scope of iTunes and so we need to do something more heavy duty. The steps below show how this can be achieved. Of course, I used IgorPro to do almost all of this. I tried to read in the iTunes Music Library.xml directly in Igor using the udStFiLrXML package, but couldn’t get it to work. So there’s a bit of Ruby followed by an all-Igor workflow. You can scroll to the bottom to find out a) whether this was worth it and b) what else I discovered along the way.

All the code to do this is available here. I’ll try to put quantixed code on github from now on.

Once the data is in Igor, the strategy is to calculate the expected number of plays a track should have received if iTunes was simply set to random. We can then compare this number to the actual number of plays. The ratio of these numbers helps us to discriminate our favourite tracks. To work out the expected plays, we calculate the number of tracks in the library over time and the inverse of this gives us the probability that a given track, at that moment in the lifetime of the library, will be played. We know the total number of plays and the lifetime of the library, so if we assume that play rate is constant over time (fair assumption), this means we can calculate the expected number of plays for each track. As noted above, there is a slight snag with this methodology, because tracks added in the last few months will have a very low number of expected plays, yet are likely to have been played quite a lot. To compensate for this I used the modelling method suggested by Mr Science Show, but only for recent songs. Hopefully that all makes sense, so now for a step-by-step guide.
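The expected-plays idea can be sketched in Ruby as follows. This is not the Igor code used for the post; `expected_plays`, the time unit (days) and the sample numbers are assumptions for illustration, and the correction for recently added tracks is left out.

```ruby
# Expected plays per track if every play picked a track uniformly at random
# from whatever was in the library at that moment, at a constant play rate.
# added:       time each track was added (days, any common origin)
# now:         time of the analysis (same units)
# total_plays: sum of all play counts in the library
def expected_plays(added, now, total_plays)
  order = added.each_index.sort_by { |i| added[i] }
  times = order.map { |i| added[i] }
  rate = total_plays.to_f / (now - times.first) # plays per day

  expected = Array.new(added.size, 0.0)
  running = 0.0
  # Walk backwards in time: the interval after the (k+1)-th addition is
  # shared equally between the k+1 tracks present during it.
  (times.size - 1).downto(0) do |k|
    interval_end = k == times.size - 1 ? now : times[k + 1]
    running += rate * (interval_end - times[k]) / (k + 1)
    expected[order[k]] = running
  end
  expected
end

added = [0.0, 0.0, 50.0, 90.0] # hypothetical days since the library started
plays = [30, 5, 12, 3]         # hypothetical play counts
expected = expected_plays(added, 100.0, plays.sum)
ratios = plays.zip(expected).map { |actual, expect| actual / expect }
puts ratios.inspect
```

A useful sanity check is that the expected plays sum to the total number of plays. The last track in this example shows the snag with new additions: it has a tiny expected count, so even a few plays give it a high ratio.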

Step 1: Extract data from iTunes xml file to tsv

After trying and failing to write my own script to parse the xml file, I stumbled across this on the web.


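require 'rubygems'
require 'nokogiri'

list = []
doc = Nokogiri::XML(File.open(ARGV[0], 'r'))

# Each track is a <dict> nested inside the Tracks <dict>
doc.xpath('/plist/dict/dict/dict').each do |node|
  hash = {}
  last_key = nil

  node.children.each do |child|
    # skip whitespace-only text nodes
    next if child.blank?

    if child.name == 'key'
      # remember the key; the next element holds its value
      last_key = child.text
    else
      hash[last_key] = child.text
    end
  end

  list << hash
end

# print the array of hashes in Ruby's inspect format
p list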

This script was saved as parsenoko.rb and could be executed from the command line

find . -name "*.xml" -exec ruby parsenoko.rb {} > playlist.csv \;

after cd-ing to the directory containing the script and a copy of the xml file.

Step 2: A little bit of cleaning

The file starts with [ and ends with ]. Each dictionary item (dict) has been printed enclosed by {}. It’s easiest to remove these before importing to IgorPro. For my library the maximum number of keys is 38. I added a line with (ColumnA<tab>ColumnB<tab>…<tab>ColumnAL), to make sure all keys were imported correctly.
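An alternative to cleaning by hand would be to have Ruby write the tab-separated lines itself. The sketch below assumes one line per track, with each cell holding a key=>value pair. `tracks_to_tsv` and the sample data are hypothetical, and this is not the script I actually used.

```ruby
# Turn a list of per-track hashes into one line per track, with
# key=>value cells separated by tabs. The layout is an assumption.
def tracks_to_tsv(list)
  list.map do |track|
    track.map do |key, value|
      # replace tabs and line breaks in values so they cannot split a cell
      "#{key}=>#{value.to_s.gsub(/[\t\r\n]/, ' ')}"
    end.join("\t")
  end.join("\n")
end

list = [
  { "Name" => "Song, With Comma", "Play Count" => "12" },
  { "Name" => "Other Song" }
]
puts tracks_to_tsv(list)
```

Written this way there are no brackets or braces to strip, and commas in titles are harmless because the delimiter is a tab.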

Step 3: Import into IgorPro

Import the tsv. This is preferable to csv because many tracks have commas in the track title, album title or artist name. Everything comes in as text and we will sort everything out in the next step.

LoadWave /N=Column/O/K=2/J/V={"\t"," $",0,0}

Step 4: Get Igor to sort the key values into waves corresponding to each key

This is the major cleaning step. What we’ll do is read each key and its value. The two are separated by => and so this is used to parse and re-sort the values. Numeric values are converted to numeric waves.

This is done by executing the relevant Igor procedure from the code linked above.
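As a rough illustration of the idea (in Ruby rather than Igor, and not the procedure used for this post), the sorting can be sketched like this. `cells_to_columns` is a hypothetical name, and each row is assumed to be an array of key=>value cells in any order.

```ruby
# Sort key=>value cells into one column per key. Values that look numeric
# become numbers; keys missing from a row are left as nil in that row.
def cells_to_columns(rows)
  columns = Hash.new { |hash, key| hash[key] = Array.new(rows.size) }
  rows.each_with_index do |cells, row|
    cells.each do |cell|
      key, value = cell.split("=>", 2)
      next if value.nil? # not a key=>value cell
      numeric = value.match?(/\A-?\d+(\.\d+)?\z/)
      columns[key][row] = numeric ? value.to_f : value
    end
  end
  columns
end

rows = [
  ["Name=>Track One", "Play Count=>12"],
  ["Play Count=>3", "Name=>Track Two", "Year=>1994"]
]
columns = cells_to_columns(rows)
puts columns["Play Count"].inspect
```

Each column stays aligned with the track rows, which is what the later steps rely on. One caveat of this simple test is that a track whose title is just a number would also be converted.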


Step 5: Convert timestamps to date values

iTunes stores dates as UTC timestamps in the format 2014-10-02T20:24:10Z. It does this for Date Added, Date Modified, Last Played etc. To do anything meaningful with these, we need to convert them to date values. IgorPro uses the time in seconds from midnight on 1st Jan 1904 as its date system, which requires double-precision (FP64) waves. We can parse the string containing the timestamp and convert it using the corresponding Igor procedure (in the code linked above).
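The same conversion can be sketched in Ruby, purely as an illustration of the arithmetic. `IGOR_EPOCH` and `igor_seconds` are hypothetical names, and the Igor procedure may handle the parsing differently.

```ruby
require 'time'

# Midnight on 1 Jan 1904 (UTC), the origin of the date system described above.
IGOR_EPOCH = Time.utc(1904, 1, 1)

# Seconds from the 1904 epoch to an ISO 8601 UTC timestamp such as
# 2014-10-02T20:24:10Z. Returns a Float.
def igor_seconds(timestamp)
  Time.iso8601(timestamp) - IGOR_EPOCH
end

puts igor_seconds("2014-10-02T20:24:10Z")
```

Present-day dates come out at around 3.5 billion seconds. That is far more than single precision can hold to the nearest second, which is why double-precision waves are needed.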


Step 6: Discover your favourite tracks!

We do all of this by running the corresponding Igor procedure from the code linked above.


The way this works is described above. Note that you can run whatever algorithm you like at this point to generate a list of tracks.

Step 7: Make a playlist to feed back to iTunes

The format for playlists is the M3U file. This has a simple layout which can easily be printed to a Notebook in Igor and then saved as a file for importing back into iTunes.

To do this we run the corresponding Igor procedure from the code linked above, where the Variable listlen is the length of the playlist. For example, listlen=50 would give the Top 50 favourite tracks.
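For reference, an extended M3U file is just a header line followed by an info line and a file location for each track. A minimal Ruby sketch of building one is shown below. `m3u_text` and the hash keys are hypothetical, the sample tracks are invented, and this is not the Igor Notebook code.

```ruby
# Build the text of an extended M3U playlist from a ranked list of tracks.
# Each track is a hash with :location, :artist, :name and :seconds
# (hypothetical keys); listlen is the number of tracks to keep.
def m3u_text(tracks, listlen)
  lines = ["#EXTM3U"]
  tracks.first(listlen).each do |t|
    lines << "#EXTINF:#{t[:seconds]},#{t[:artist]} - #{t[:name]}"
    lines << t[:location]
  end
  lines.join("\n") + "\n"
end

ranked = [
  { location: "/Music/a.mp3", artist: "Artist A", name: "Song A", seconds: 215 },
  { location: "/Music/b.mp3", artist: "Artist B", name: "Song B", seconds: 187 }
]
puts m3u_text(ranked, 1)
```

Saving the returned text to a file with an .m3u extension (for example with File.write) gives something that can be imported as a playlist.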

So what did I find out?

My top 50 songs determined by this method were quite different to the iTunes Smart Playlist of Most Played tracks. The tracks at the top of the Most Played list have disappeared from the new list; these are the ones that have been in the library for a long time and that, I suppose, I don’t listen to much any more. The new algorithmically designed playlist has a bunch of fresher tracks that were added in the last few years and that I have listened to quite a lot. Looking through it, I can see music that I should explore in more detail. In short, it’s a superior playlist and one that will always change and should not go stale.

Other useful stuff

There are quite a few parsing tools on the web that vary in their usefulness. Some that deserve a mention are:

  • The xml file should be readable as a plist by Cocoa, which is native to OS X
  • Visualisation of what proportion of an iTunes library is by a given artist – bdunagan’s blog
  • itunes-parser on github by phiggins
  • Really nice XSLT to move the xml file to html – moveable-type
  • Comprehensive but difficult to follow method in ruby.

The post title comes from “Your Favorite Thing” by Sugar from their LP “File Under: Easy Listening”

Science songs

I thought I’d compile a list of songs related to biomedical science. These were all found in my iTunes library. I’ve missed off multiple entries for the same kind of thing, as indicated.


  • Grand Mal – Elliott Smith from XO Sessions
  • She’s Lost Control – Joy Division from Unknown Pleasures (Epilepsy)
  • Aneurysm – Nirvana from Hormoaning EP
  • Serotonin – Mansun from Six
  • Serotonin Smile – Ooberman from Shorley Wall EP
  • Brain Damage – Pink Floyd from Dark Side of The Moon
  • Paranoid Schizophrenic – The Bats from How Pop Can You Get?
  • Headacher – Bear Quartet from Penny Century
  • Headache – Frank Black from Teenager of the Year
  • Manic Depression – Jimi Hendrix Experience and lots of other songs about depression
  • Paranoid – Black Sabbath from Paranoid (thanks to Joaquin for the suggestion!)


  • Cancer (interlude) – Mansun from Six
  • Hepatic Tissue Fermentation – Carcass or pretty much any song in this genre of Death Metal
  • Whiplash – Metallica from Kill ‘Em All
  • Another Invented Disease – Manic Street Preachers from Generation Terrorists
  • Broken Nose – Family from Bandstand
  • Bones – Radiohead from The Bends
  • Ana’s Song – Silverchair from Neon Ballroom (Anorexia Nervosa)
  • 4st 7lb – Manic Street Preachers from The Holy Bible (Anorexia Nervosa)
  • November Spawned A Monster – Morrissey from Bona Drag (disability)
  • Castles Made of Sand – Jimi Hendrix Experience from Axis: Bold As Love (disability)
  • Cardiac Arrest – Madness from 7
  • Blue Veins – The Raconteurs from Broken Boy Soldiers
  • Vein Melter – Herbie Hancock from Headhunters
  • Scoliosis – Pond from Rock Collection (curvature of the spine)
  • Taste the Blood – Mazzy Star… lots of songs with blood in the title.


  • Biotech is Godzilla – Sepultura from Chaos A.D.
  • Luminol – Ryan Adams from Rock N Roll
  • Feel Good Hit Of The Summer – Queens of The Stone Age from Rated R (prescription drugs of abuse)
  • Stars That Play with Laughing Sam’s Dice – Jimi Hendrix Experience (and hundreds of other songs about recreational drugs)
  • Tramazi Parti – Black Grape from It’s Great When You’re Straight…
  • Z is for Zofirax – Wingtip Sloat from If Only For The Hatchery
  • Goldfish and Paracetamol – Catatonia from International Velvet
  • L Dopa – Big Black from Songs About Fucking

Genetics and molecular biology

  • Genetic Reconstruction – Death from Spiritual Healing
  • Genetic – Sonic Youth from 100%
  • Hair and DNA – Hot Snakes from Audit in Progress
  • DNA – Circle from Meronia
  • Biological – Air from Talkie Walkie
  • Gene by Gene – Blur from Think Tank
  • My Selfish Gene – Catatonia from International Velvet
  • Sheer Heart Attack – Queen (“it was the DNA that made me this way”)
  • Mutantes – Os Mutantes
  • The Missing Link – Napalm Death from Mentally Murdered E.P.
  • Son of Mr. Green Genes – Frank Zappa from Hot Rats

Cell Biology

  • Sweet Oddysee Of A Cancer Cell T’ Th’ Center Of Yer Heart – Mercury Rev from Yerself Is Steam
  • Dead Embryonic Cells – Sepultura from Arise
  • Cells – They Might Be Giants from Here Comes Science (songs for kids about science)
  • White Blood Cells LP by The White Stripes
  • Anything by The Membranes
  • Soma – Smashing Pumpkins from Siamese Dream
  • Golgi Apparatus – Phish from Junta
  • Cell-scape LP by Melt Banana

Album covers with science images

Godflesh – Selfless. Scanning EM image of some cells growing on a microchip?



Circle – Meronia. Photograph of an ampoule?

Do you know any other science songs or album covers? Leave a comment!

My Favorite Things

I realised recently that I’ve maintained a consistent iTunes library for ~10 years. For most of that time I’ve been listening exclusively to iTunes, rather than to music in other formats. So the library is a useful source of information about my tastes in music. It should be possible to look at who are my favourite artists, what bands need more investigation, or just to generate some interesting statistics based on my favourite music.

Play count is the central statistic here as it tells me how often I’ve listened to a certain track. It’s the equivalent of a +1/upvote/fave/like or maybe even a citation. Play count increases by one if you listen to a track all the way to the end. So if a track starts and you don’t want to hear it and you skip on to the next song, there’s no +1. There’s a caveat here in that the length of time a track has been in the library influences the play count to a certain extent – but that’s for another post*. The second indicator for liking a track or artist is the fact that it’s in the library. This may sound obvious, but what I mean is that artists with lots of tracks in the library are more likely to be favourite artists compared to a band with just one or two tracks in there. A caveat here is that some artists do not have long careers for a variety of reasons, which can limit the number of tracks actually available to load into the library. Check the methods at the foot of the post if you want to do the same.

What’s the most popular year? Firstly, I looked at the most popular year in the library. This question was the focus of an earlier post that found that 1971 was the best year in music. The play distribution per year can be plotted together with a summary of how many tracks and how many plays in total from each year are in the library. There’s a bias towards 90s music, which probably reflects my age, but could also be caused by my habit of collecting CD singles, which peaked as a format in this decade. The average number of plays is actually pretty constant for all years (median of ~4), although the mean is perhaps slightly higher for late-2000s music.

Favourite styles of music: I also looked at Genre. Which styles of music are my favourite? I plotted the total number of tracks versus the total number of plays for each Genre in the library. Size of the marker reflects the median number of plays per track for that genre. Most Genres obey a rule where total plays is a function of total tracks, but there are exceptions. Crossover, Hip-hop/Rap and Power-pop are highlighted as those with an above average number of plays. I’m not lacking in Power-pop with a few thousand tracks, but I should probably get my hands on more Crossover or Hip-Hop/Rap.


Using citation statistics to find my favourite artists: Next, I looked at who my favourite artists are. It could be argued that I should know who my favourite artists are! But tastes can change over a 10-year period and I was interested in an unbiased view of my favourite artists rather than who I think they are. A plot of Total Tracks vs Mean plays per track is reasonably informative. The artists with the highest plays per track are those with only one track in the library, e.g. Harvey Danger with Flagpole Sitta. So this statistic is pretty unreliable. Equally, I’ve got lots of tracks by Manic Street Preachers but evidently I don’t play them that often. I realised that the problem of identifying favourite artists based on these two pieces of information (plays and number of tracks) is pretty similar to assessing scientists using citation metrics (citations and number of papers). Hirsch proposed the h-index to meld these two bits of information into a single metric. It’s easily computed and I already had an Igor procedure to calculate it en masse, so I ran it on the library information.

Before doing this, I consolidated multiple versions of the same track into one. I knew that I had several versions of the same track, especially as I have multiple versions of some albums (e.g. Pet Sounds = 3 copies = mono + stereo + a capella). The top offending track was “Baby’s Coming Back” by Jellyfish, with 11 copies! Anyway, these were consolidated before running the h-index calculation.

The top artist was Elliott Smith with an h-index of 32. This means he has 32 tracks that have been listened to at least 32 times each. I was amazed that Muse had the second highest h-index (I don’t consider myself a huge fan of their music) until I remembered a period where their albums were on an iPod Nano used during exercise. Amusingly (and narcissistically) my own music – the artist names are redacted – scored quite highly with two out of three bands in the top 100, which are shown here. These artists with high h-indices are the most consistently played in the library and probably constitute my favourite artists, but is the ranking correct?

The procedure also calculates the g-index for every artist. The g-index is similar to the h-index but takes into account very highly played tracks (very highly cited papers) over the h threshold. For example, The Smiths h=26. This could be 26 tracks that have been listened to exactly 26 times or they could have been listened to 90 times each. The h-index cannot reveal this, but the g-index gets to this by assessing average plays for the ranked tracks. The Smiths g=35. To find the artists that are most-played-of-the-consistently-most-played, I subtracted h from g and plotted the Top 50. This ranked list I think most closely represents my favourite artists, according to my listening habits over the last ten years.
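To make the two metrics concrete, here is a minimal Ruby sketch using the standard definitions applied to play counts. It is not my Igor procedure. The function names and play counts are invented, and capping g at the number of tracks is a choice made for this sketch.

```ruby
# h-index: the largest h such that h tracks have at least h plays each.
def h_index(plays)
  sorted = plays.sort.reverse
  sorted.each_with_index.count { |count, i| count >= i + 1 }
end

# g-index: the largest g such that the top g tracks together have at least
# g*g plays, i.e. their average is at least g. Capped at the number of tracks.
def g_index(plays)
  total = 0
  g = 0
  plays.sort.reverse.each_with_index do |count, i|
    total += count
    g = i + 1 if total >= (i + 1)**2
  end
  g
end

flat = [5, 5, 5, 5, 5, 1, 1, 0]  # hypothetical artist, evenly played
spiky = [20, 8, 5, 5, 5, 1, 1, 0] # hypothetical artist with one big favourite
[flat, spiky].each do |plays|
  h = h_index(plays)
  g = g_index(plays)
  puts "h=#{h} g=#{g} g-h=#{g - h}"
end
```

Both invented artists have h=5, but only the one with a heavily played favourite gets a g above h. This is the difference that the g minus h ranking picks up.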


Track length: Finally, I looked at the track length. I have a range of track lengths in the library, from “You Suffer” by Napalm Death (iTunes has this at 4 s, but Wikipedia says it is 1.36 s), through to epic tracks like “Blue Room” by The Orb. Most tracks are in the 3-4 min range. Plays per track indicates that this track length is optimal with most of the highly played tracks being within this window. The super-long tracks are rarely listened to, probably because of their length. Short tracks also have higher than average plays, probably because they are less likely to be skipped, due to their length.

These were the first things that sprang to mind for iTunes analysis. As I said at the top, there’s lots of information in the library to dig through, but I think this is enough for one post. And not a pie-chart in sight!

Methods: the library is in xml format and can be read/parsed this way. More easily, you can just select the whole library and copy-paste it into TextEdit and then load this into a data analysis package. In this case, IgorPro (as always). Make sure that the interesting fields are shown in the full library view (Music>Songs). To do everything in this post you need artist, track, album, genre, length, year and play count. At the time of writing, I had 21326 tracks in the library. For the “H-index” analysis, I consolidated multiple versions of the same track, giving 18684 tracks. This is possible by concatenating artist and the first ten characters of the track title (separated by a unique character) and adding the play counts for these concatenated versions. The artist could then be deconvolved (using the unique character) and used for the H-calculation. It’s not very elegant, but seemed to work well. The H-index and G-index calculations were automated (previously sort-of-described here), as was most of the plot generation. The inspiration for the colour coding is from the 2013 Feltron Report.
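The consolidation trick described in the methods can be sketched in Ruby like this. It only illustrates the idea: `consolidate`, the hash keys and the choice of separator are assumptions, and the play counts are invented.

```ruby
# Separator placed between artist and title prefix. It is assumed never to
# occur in an artist or track name.
SEP = "\u001F"

# Merge versions of the same track: the key is the artist plus the first ten
# characters of the title, and play counts are summed for each key.
def consolidate(tracks)
  totals = Hash.new(0)
  tracks.each do |t|
    totals[t[:artist] + SEP + t[:name][0, 10]] += t[:plays]
  end
  totals.map do |key, plays|
    artist, name_prefix = key.split(SEP, 2) # recover the artist from the key
    { artist: artist, name_prefix: name_prefix, plays: plays }
  end
end

tracks = [
  { artist: "Jellyfish", name: "Baby's Coming Back", plays: 7 },
  { artist: "Jellyfish", name: "Baby's Coming Back (Live)", plays: 4 },
  { artist: "Jellyfish", name: "Another Song", plays: 9 }
]
consolidate(tracks).each { |t| puts t.inspect }
```

The ten-character prefix is a blunt instrument. It merges live and alternate versions, but it would also merge two different songs by the same artist whose titles start the same way.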

* there’s an interesting post here about modelling the ideal playlist. I worked through the ideas in that post but found that it doesn’t scale well to large libraries, especially if they’ve been going for a long time, i.e. mine.

The post title is taken from John Coltrane’s cover version of My Favorite Things from the album of the same name. Excuse the US English spelling.

Tips from the Blog I

What is the best music to listen to while writing a manuscript or grant proposal? OK, I know that some people prefer silence and certainly most people hate radio chatter while trying to concentrate. However, if you like listening to music, setting an iPod on shuffle is no good since a track by Napalm Death can jump from the speakers and affect your concentration. Here is a strategy for a randomised music stream of the right mood and with no repetition, using iTunes.

For this you need:
A reasonably large and varied iTunes library that is properly tagged*.

1. Set up the first smart playlist to select all songs in your library that you like to listen to while writing. I do this by selecting genres that I find conducive to writing.
Conditions are:
-Match any of the following rules
-Genre contains jazz
-add as many genres as you like, e.g. shoegaze, space rock, dream pop etc.
-Don’t limit and do check live updating
I call this list Writing

2. Set up a second smart playlist that makes a randomised novel list from the first playlist
Conditions are:
-Match all of the following rules
-Playlist is Writing   //or whatever you called the 1st playlist
-Last played is not in the last 14 days    //this means once the track is played it disappears, i.e. refreshes constantly
-Limit to 50 items selected by random
-Check Live updating
I call this list Writing List

That’s it! Now play from Writing List while you write. The same strategy works for other moods, e.g. for making figures I like to listen to different music and so I have another pair for that.
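For the curious, the logic of the two playlists can be written out as a short Ruby sketch. This only models the rules described above and is not how iTunes implements them. The function names, hash keys and sample library are made up.

```ruby
DAY = 24 * 60 * 60 # seconds in a day

# First playlist: match ANY rule, i.e. genre contains one of the chosen words.
def writing_tracks(library, genres)
  library.select do |t|
    genres.any? { |g| t[:genre].downcase.include?(g.downcase) }
  end
end

# Second playlist: match ALL rules, i.e. in the first playlist and not played
# in the last 14 days, then limited to `limit` items chosen at random.
def writing_list(library, genres, now, limit: 50, rng: Random.new)
  fresh = writing_tracks(library, genres).reject do |t|
    t[:last_played] && now - t[:last_played] < 14 * DAY
  end
  fresh.sample(limit, random: rng)
end

now = Time.utc(2015, 1, 31)
library = [
  { name: "A", genre: "Jazz", last_played: now - 20 * DAY },
  { name: "B", genre: "Dream Pop", last_played: now - 2 * DAY },
  { name: "C", genre: "Grindcore", last_played: nil },
  { name: "D", genre: "Space Rock", last_played: nil }
]
picks = writing_list(library, ["jazz", "dream pop", "space rock"], now)
puts picks.map { |t| t[:name] }.sort.inspect
```

Because a track drops out as soon as it has been played, re-running the selection always gives unheard tracks, which is what live updating does for you in iTunes.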

After a while, the tracks that you’ve skipped (for whatever reason) clog up the playlist. Just select all and delete from the smart playlist. This refreshes the list and you can go again with a fresh set.

* If your library has only a few tracks, or has plenty of tracks but they are all of a similar genre, this tip is not for you.