Caution: this post is for nerds only.
I watched this Numberphile video last night and was fascinated by the point pattern that was created in it. I thought I would quickly program my own version to recreate it and then look at the patterns made with more points.
I didn’t realise until afterwards that there is actually a web version of the program used in the video here. It is a bit limited, though, so my code was still worthwhile.
A fractal triangular pattern can be created by:
- Setting three points
- Picking a randomly placed seed point
- Rolling a die and moving halfway towards the point it selects
- Repeating the last step
If the first three points are randomly placed the pattern is skewed, so I added the ability to generate an equilateral triangle. Here is the result.
And here are the results for a triangle through to a decagon.
All of these are generated with one million points using alpha=0.25. The triangle, pentagon and hexagon make nice patterns, but the square and polygons with more than six vertices make pretty uninteresting patterns.
Watching the creation of the point pattern from a triangular set is quite fun. This is 30000 points with a frame every 10 points.
Here is the code.
Some other notes: this version runs in IgorPro. In my version, the seed is set at the centre of the image rather than at a random location. I picked the points at random rather than rolling a six-sided die.
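For anyone without IgorPro, here is a minimal Python sketch of the same chaos game. It is not the original Igor code; it assumes the vertices sit on a regular polygon, the seed point starts at the centre, and a random vertex choice stands in for the die roll. The function names are illustrative.

```python
import math
import random


def regular_polygon(n, radius=1.0):
    """Vertices of a regular n-gon centred on the origin; n=3 gives an equilateral triangle."""
    # Start at the top (pi/2) so the triangle points upwards.
    return [
        (radius * math.cos(math.pi / 2 + 2 * math.pi * k / n),
         radius * math.sin(math.pi / 2 + 2 * math.pi * k / n))
        for k in range(n)
    ]


def chaos_game(vertices, n_points, fraction=0.5, seed=None):
    """Jump `fraction` of the way towards a randomly chosen vertex, n_points times."""
    rng = random.Random(seed)
    # Seed point at the centroid of the vertices (the centre of the image).
    x = sum(vx for vx, _ in vertices) / len(vertices)
    y = sum(vy for _, vy in vertices) / len(vertices)
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(vertices)  # random vertex in place of a die roll
        x += fraction * (vx - x)
        y += fraction * (vy - y)
        points.append((x, y))
    return points
```

To reproduce the figures, something like `pts = chaos_game(regular_polygon(3), 1_000_000)` followed by a scatter plot with alpha=0.25 (e.g. matplotlib's `plt.scatter(*zip(*pts), s=0.1, alpha=0.25)`) should be close; swapping 3 for 4 to 10 gives the square through to the decagon.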
The post title is taken from the title track from Bolt Thrower’s “Realm of Chaos”.
BBC 6Music recently went back in time to 1994. This made me wonder what albums released that year were my favourites. As previously described on this blog, I have this information readily available. So I quickly crunched the numbers. I focused on full-length albums and, using play density (sum of all plays divided by number of album tracks) as a metric, I plotted out the Top 20.
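Play density as defined above is simple to compute. This is a sketch that assumes the library has already been exported into a mapping of album name to per-track play counts (unplayed tracks included as 0); the names are illustrative, not part of iTunes.

```python
def play_density(track_plays):
    """Play density: sum of all plays divided by the number of album tracks."""
    if not track_plays:
        raise ValueError("album has no tracks")
    return sum(track_plays) / len(track_plays)


def top_albums(albums, n=20):
    """Rank albums (name -> list of per-track play counts) by play density, highest first."""
    ranked = sorted(albums.items(), key=lambda item: play_density(item[1]), reverse=True)
    return [(name, play_density(plays)) for name, plays in ranked[:n]]
```

Dividing by track count stops long albums winning simply by having more tracks, but it would flatter singles and EPs, which is one reason to restrict the ranking to full-length albums.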
There you have it. Scorn’s epic Evanescence has the highest play density of any album released in 1994 in my iTunes library. By some distance. If you haven’t heard it, this is an amazing record that broke new ground and spawned numerous musical genres. I think that record, One Last Laugh In A Place of Dying… and Ro Sham Bo would all be high on my all-time favourite list. A good year for music then as far as I’m concerned.
Other observations: I was amazed that Definitely Maybe was up there, since I am not a big fan of Oasis. Likewise for Dummy by Portishead. Note that Oxford’s Angels and Superdeformed[…] are bootleg records.
Bubbling under: this was the top 20, but there were some great records bubbling under in the 20s and 30s. Here are the best 5.
- Heatmiser – Cop and Speeder
- Circle – Meronia
- Credit to the Nation – Take Dis
- Kyuss – Welcome to Sky Valley
- Drive Like Jehu – Yank Crime
I heard tracks from some of these bands on 6Music, but many were missing. Maybe there is something for you to investigate.
Part of a series obsessively looking at music in an obsessive manner.
I use a Garmin 800 GPS device to log my cycling activity, including my commutes. Since I have now built up nearly 4 years of cycling the same route, I had a good dataset to look at how accurate the device is.
I wrote some code to import all of the rides tagged with commute in rubiTrack 4 Pro (technical details are below). These tracks needed categorising so that they could be compared. Then I plotted them out as a gizmo in Igor Pro and compared them to a reference data set which I obtained via GPS Visualiser.
The reference dataset, shown in black, gives the “true” elevation at those particular latitude and longitude coordinates. Plotted on to that are the commute tracks, coloured red-white-blue according to longitude. You can see that the device recorded a range of elevations: apart from a few outliers, they are mostly accurate but offset. This is strange because I have the elevation of the start and end points saved in the device, and I thought it adjusted the altitude it was measuring to these known elevations when recording the track. Obviously not.
To look at the error in the device I plotted out the difference in the measured altitude at a given location versus the true elevation. For each route (to and from work) a histogram of elevation differences is shown to the right. The average difference is 8 m for the commute in and 4 m for the commute back. This is quite a lot considering that all of this is only ~100 m above sea level. The standard deviation is 43 m for the commute in and 26 m for the way back.
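The error statistics and histograms above boil down to a few lines. This Python sketch assumes the measured and reference elevations are already paired point by point (the post paired them by lat/lon using GPS Visualiser); the function names and the 10 m bin width are assumptions.

```python
import math
import statistics
from collections import Counter


def elevation_errors(measured, reference):
    """Per-point difference (device minus reference) in metres."""
    if len(measured) != len(reference):
        raise ValueError("measured and reference elevations must be paired point by point")
    return [m - r for m, r in zip(measured, reference)]


def summarise(diffs):
    """Mean and sample standard deviation of the differences."""
    return statistics.mean(diffs), statistics.stdev(diffs)


def histogram(diffs, bin_width=10.0):
    """Count differences into bins of bin_width metres, keyed by each bin's lower edge."""
    counts = Counter(math.floor(d / bin_width) * bin_width for d in diffs)
    return dict(sorted(counts.items()))
```

Running `summarise` separately on the commute-in and commute-back differences gives the mean offset and spread for each direction.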
This post at VeloViewer comparing GPS data on Strava from pro-cyclists riding Stage 15 of the 2015 Giro d’Italia sprang to mind. Some GPS devices performed OK, whereas others (including Garmin) did less well. The idea in that post is that rain affects the recording of some units. This could be true, and although I live in a rainy country, I doubt it can account for the inaccuracies recorded here. Bear in mind that that stage had some big changes in altitude, whereas my recordings have very little. On the other hand, there are very few tracks in that post, whereas there is lots of data here.
It’s interesting that the data is worse going into work than coming back. I set off quite early in the morning, and it is colder first thing, which might mean the unit doesn’t behave as well for the commute to work. Both to and from work, the tracks vary most in lat/lon at the start, which suggests that the unit is slow to get an exact location (something every Garmin user can attest to), although I always wait until it has a fix before setting off. The final two plots show what the beginning of the return from work looks like for location accuracy (travelling east to west) compared to a midway section of the same commute (right). This might mean that the inaccuracy at the start determines how inaccurate the track is. As I mentioned, the elevation is set for start and end points. Perhaps if the lat/lon is too far from the endpoint, it fails to collect the correct elevation.
I’m disappointed with the accuracy of the device. However, I have no idea whether other GPS units (including phones) would outperform the Garmin Edge 800, or even whether later Garmin models are better. This is a good but limited dataset. A similar analysis would be possible on a huge dataset (e.g. all Strava data), which would reveal the best and worst GPS devices and/or the best conditions for recording the most accurate data.
I described how to get GPX tracks from rubiTrack 4 Pro into Igor and how to crunch them in a previous post. I modified the code to get elevation data out from the cycling tracks and generally made the code slightly more robust. This left me with 1,200 tracks. My commutes are varied. I frequently go from A to C via B and from C to A via D, which is a loop (this is what is shown here). But I also go A to C via D and C to A via B, and I also often extend the commute to include 30 km of Warwickshire countryside. The tracks could be categorised by testing whether they began at A or C (this rejected some partial routes) and then testing whether they passed through B or D. These could then be plotted and checked visually for any routes which went off course; there were none. The key here is to pick the right B and D points. To calculate the differences in elevation, the simplest thing was to get GPS Visualiser to tell me what the elevation should be for all the points I had. I was surprised that the API could do half a million points without complaining. This was sufficient to do the rest. Note that the comparisons needed to be done as lat/lon versus elevation, because differences in speed, timing or trackpoint number lead to inherent differences in lat/lon (and elevation) between tracks at a given time or trackpoint. Note also that, due to the small scale, I didn’t bother converting lat/lon into flat-earth kilometres.
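The categorisation step (start at A or C, then pass through B or D) can be sketched in Python as below. This is an illustration, not the Igor code: the tolerance, the choice to reject tracks passing both or neither waypoint, and the function names are all assumptions. Like the post, it treats lat/lon degrees as flat; 0.001° is on the order of 100 m or less at UK latitudes.

```python
import math


def near(point, target, tol_deg):
    """True if a (lat, lon) point is within tol_deg of target, treating degrees as flat."""
    return math.hypot(point[0] - target[0], point[1] - target[1]) <= tol_deg


def classify_commute(track, a, b, c, d, tol_deg=0.001):
    """Label a track of (lat, lon) points as e.g. 'A-B-C', or None if it cannot be categorised."""
    if not track:
        return None
    if near(track[0], a, tol_deg):
        origin, dest = "A", "C"
    elif near(track[0], c, tol_deg):
        origin, dest = "C", "A"
    else:
        return None  # did not start at A or C: a partial route, rejected
    via_b = any(near(p, b, tol_deg) for p in track)
    via_d = any(near(p, d, tol_deg) for p in track)
    if via_b == via_d:
        return None  # passed both waypoints or neither: treated as ambiguous
    return f"{origin}-{'B' if via_b else 'D'}-{dest}"
```

As the post says, the choice of B and D matters: they need to sit on stretches that only one variant of the route uses.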
The post title comes from “Elevation” by Television, which can be found on the classic “Marquee Moon” LP.
Towards the end of 2015, I started distance running. I thought it’d be fun to look at the frequency of my runs over the course of 2016.
Most of my runs were recorded with a GPS watch. I log my cycling data using Rubitrack, so I just added my running data to this. This software is great, but to do any serious number crunching, other software is needed. Yes, I know that if I used Strava I could do lots of things with my data… but I don’t. I also know that there are tools for R to do this, but I wrote something in Igor instead. The GitHub repo is here. There’s a technical description below, as well as some random thoughts on running (and cycling).
The animation shows the tracks I recorded as 2016 rolled by. The routes won’t mean much to you, but I can recognise most of them. You can see how I built up the distance to run a marathon and then how the runs became less frequent through late summer to October. I logged 975 km with probably another 50 km or so not logged.
Pulling the data out of rubiTrack 4 Pro is actually quite difficult since there is no automated export. An AppleScript did the job of going through all the run activities and exporting them as GPX. There is an API provided by Garmin to take the data straight from the FIT files recorded by the watch, but everything is saved and tagged in rubiTrack, so GPX is a good starting point. GPX is an XML format which can be read into Igor using the XMLutils XOP written by andyfaff. Previously, I’ve used nokogiri for reading XML, but this XOP keeps everything within Igor. This worked OK, but I had some trouble with namespaces which I didn’t resolve properly, and what is in the code is a slight hack. I wrote some code which imported all the files and then processed the time frame I wanted to look at. It basically looks at a.m. and p.m. for each day in the time frame. Igor deals with date/time nicely, so this was quite easy. Two lookups per day were needed because I often went for two runs per day (run commuting). I set the lat/lon at the start of each track as 0,0. I used the new alpha tools in IP7 to fade the tracks so that they decay away over time: they disappear in 1/8 reductions of opacity over a four-day period. Igor writes out to MOV, which worked really nicely, but WordPress can’t host movies, so I added a line to write out TIFFs of each frame of the animation and assembled a nice GIF using FIJI.
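For anyone doing this outside Igor, the same steps (namespace-aware GPX reading, zeroing each track at 0,0, a.m./p.m. slots and a linear fade) can be sketched with Python's standard library. This is not the repo's code. It assumes GPX 1.1 files with a timestamp on the first trackpoint, splits a.m./p.m. at noon, and reads the "1/8 reduction" as one step per half-day slot, which gives the four-day fade.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumes GPX 1.1 files; GPX 1.0 uses "http://www.topografix.com/GPX/1/0" instead.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_gpx_track(source):
    """Return (start_time, points) with each (lat, lon) point offset so the track starts at 0,0."""
    root = ET.parse(source).getroot()
    trkpts = root.findall(".//gpx:trkpt", GPX_NS)
    if not trkpts:
        raise ValueError("no track points found")
    coords = [(float(p.get("lat")), float(p.get("lon"))) for p in trkpts]
    time_text = trkpts[0].find("gpx:time", GPX_NS).text
    start = datetime.fromisoformat(time_text.replace("Z", "+00:00"))
    lat0, lon0 = coords[0]
    return start, [(lat - lat0, lon - lon0) for lat, lon in coords]


def half_day_slot(t, first_day):
    """Slot index counting a.m. and p.m. of each day separately from first_day (a date)."""
    return 2 * (t.date() - first_day).days + (1 if t.hour >= 12 else 0)


def opacity(track_slot, current_slot, steps=8):
    """Linear fade: lose 1/steps of opacity per slot, so 8 half-day slots span four days."""
    age = current_slot - track_slot
    if age < 0:
        return 0.0  # run has not happened yet
    return max(0.0, 1.0 - age / steps)
```

Passing the namespace mapping to `findall`/`find` is what avoids the namespace trouble: GPX elements live in the Topografix namespace, so an unqualified `.//trkpt` search finds nothing.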
Getting started with running
Getting into running was almost accidental. I am a committed cyclist and had always been of the opinion that, since running doesn’t improve aerobic cycling performance (only cycling does that), any activity other than cycling is a waste of time. However, I realised that finding time for cycling was getting more difficult, and also my goal is to keep fit, not to actually be a pro-cyclist, so running had to be worth a try. Roughly speaking, running is about three times as time-efficient as cycling: one hour of running approximates to three hours of cycling. I thought I would just try it. Over the winter. No more than that. Of course, I soon got the running bug and ran through most of 2016, taking part in a few running events (marathon, half marathons, 10K). Here are four quick notes on my experience.
- The key to keeping running is staying healthy and uninjured. That means building up the distance and frequency of running very slowly. In fact, the limitation to running is the body’s ability to actually do the distance. In cycling this is different: as long as you fuel adequately and you’re reasonably fit, you could cycle all day if you wanted. This is not true of running, and so building up to longer distances is essential and the ramp-up shouldn’t be rushed. Injuries will cost you weeks on a training schedule.
- There are lots of things “people don’t tell you” about running. Blisters and the like everyone knows about, but losing a toenail during a 20 km run? Encountering runner’s GI problems? There are lots of surprises as you start out. Joining a club or reading running forums probably helps (I didn’t bother!). In case you are wondering, the respective answers are getting decent shoes fitted and, well, there is no cure.
- Going from cycling to running meant going from very little upper-body mass to gaining extra muscle, which means gaining weight. This is something of a shock to a cyclist and seems counterintuitive, since more activity should really equate to weight loss. I maintained cycling through the year, but was not expecting a gain of ~3 kilos.
- As with any sport, having something to aim for is essential. Training for training’s sake can become pointless, so line up something to shoot for. Sign up for an event, or at least have a goal (distance, average speed) in mind that you want to achieve.
So there you have it. I’ll probably continue to mix running with cycling in 2017. I’ll probably extend the repo to do more with cycling data if I have the time.
The post title is taken from “Colours Running Out” by TOY from their eponymous LP.
Top Trumps is a card game for children. The mind can wander when playing such games with kids… typically, I start thinking: what is the best strategy for this game? But also, as the game drags on: what is the quickest way to lose?
Since Top Trumps is based on numerical values with simple outcomes, it seemed straightforward to analyse the cards and to simulate different scenarios to look at these questions.
Many Top Trumps variants exist, but the pack I’ll focus on is Marvel Universe “Who’s Your Hero?” made by Winning Moves (cat. No.: 3399736). Note though that the approach can probably be adapted to handle any other Top Trumps set.
There are 30 cards featuring Marvel characters. Each card has six categories:
- Top Trumps Rating.
What is the best card and which one is the worst?
In order to determine this, I pulled in all the data and compared each value to every other card’s value, and repeated this per category (code is here, the data are here). The scaling is different between categories, but that’s OK, because the game only uses within-field comparisons. This technique allowed me to add up, for a given card, how many cards have a lower value in a certain field, i.e. how many cards that card would beat. These victories could then be summed across all six fields to determine the “winningest card”.
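The pairwise tally can be sketched in Python as follows. This assumes a higher value wins in every category and that the deck is stored as card name to a list of six category values; it is an illustration, not the linked code. With 30 cards and six categories, the maximum possible tally for a card is 6 × 29 = 174.

```python
def card_wins(deck):
    """deck: card name -> list of category values (higher wins).
    Returns card name -> number of head-to-head category contests that card wins."""
    wins = {}
    for name, values in deck.items():
        wins[name] = sum(
            1
            for other, other_values in deck.items()
            if other != name
            for mine, theirs in zip(values, other_values)
            if mine > theirs
        )
    return wins


def rank_cards(deck):
    """Cards sorted from 'winningest' to least winning."""
    wins = card_wins(deck)
    return sorted(wins, key=wins.get, reverse=True)
```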
The cumulative victories can be used to rank the cards and a category plot illustrates how “winningness” is distributed throughout the deck.
The best card in the deck is Iron Man. What is interesting is that Spider-Man has the designation Top Trump (see card), but he’s actually second in terms of wins over all other cards. Head-to-head, Spider-Man beats Iron Man in Skill and Mystique. They draw on Top Trumps Rating. But Iron Man beats Spider-Man on the three remaining fields. So if Iron Man comes up in your hand, you are most likely to defeat your opponent.
At the other end of the “winningest card” plot, the worst card is Wasp, followed by Ant Man and Bucky Barnes. There needs to be a terrible card in every Top Trumps deck, and Wasp is it. She has pitiful scores in most fields and can collectively win only 9 out of (6 * 29) = 174 contests. If this card comes up, you are pretty much screwed.
What about draws? It’s true that a draw doesn’t mean losing, and the active player gets another turn, so a draw does have some value. To make sure I wasn’t overlooking this with my system of counting victories, I recalculated the values using a Football League points system (3 points for a win, 1 point for a draw and 0 for a loss). The result is essentially the same, with only minor changes in the ranking.
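The points-based variant only changes how each head-to-head contest is scored. Again a sketch, assuming higher values win and the same name-to-values layout for the deck:

```python
def league_points(deck, win=3, draw=1):
    """Score every category contest against every other card: 3 for a win, 1 for a draw, 0 for a loss."""
    points = {}
    for name, values in deck.items():
        total = 0
        for other, other_values in deck.items():
            if other == name:
                continue
            for mine, theirs in zip(values, other_values):
                if mine > theirs:
                    total += win
                elif mine == theirs:
                    total += draw
        points[name] = total
    return points
```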
I went with the first evaluation system in order to simulate the games.
I wrote a first version of the code that would print out what was happening so I could check that the simulation ran OK. Once that was done, it was possible to call the function that runs the game many (1 x 10^6) times and record who won (Player 1 or Player 2) and how many rounds each game lasted.
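A simulation along these lines can be sketched in Python. The rules below are assumptions about a standard Top Trumps game rather than a transcription of the original code: cards are dealt alternately, the active player picks a category for their top card, the higher value wins both cards (plus any pot from draws) to the bottom of their pile and becomes active, and on a draw the cards go into a pot while the active player goes again. "Best" and "worst" use each card's per-category win counts; an integer strategy means always picking that category index.

```python
import random
from collections import deque


def category_beats(deck):
    """For each card, how many other cards it beats in each category (higher value wins)."""
    return {
        name: [sum(1 for other, ov in deck.items() if other != name and v > ov[i])
               for i, v in enumerate(values)]
        for name, values in deck.items()
    }


def choose(strategy, card, beats, rng):
    """Pick a category index for `card`: 'best', 'worst', 'random', or a fixed int index."""
    scores = beats[card]
    if strategy == "best":
        return max(range(len(scores)), key=scores.__getitem__)
    if strategy == "worst":
        return min(range(len(scores)), key=scores.__getitem__)
    if strategy == "random":
        return rng.randrange(len(scores))
    return strategy  # fixed category, e.g. 0


def play_game(deck, beats, strategies, rng, max_rounds=10_000):
    """Play one game; return (winner, rounds) with winner 0, 1, or None (unfinished/tied)."""
    names = list(deck)
    rng.shuffle(names)
    hands = [deque(names[0::2]), deque(names[1::2])]  # deal alternately
    active, pot, rounds = 0, [], 0  # Player 1 (index 0) goes first
    while hands[0] and hands[1] and rounds < max_rounds:
        rounds += 1
        cards = (hands[0].popleft(), hands[1].popleft())
        cat = choose(strategies[active], cards[active], beats, rng)
        pot.extend(cards)
        a, b = deck[cards[0]][cat], deck[cards[1]][cat]
        if a == b:
            continue  # draw: cards stay in the pot and the active player goes again
        winner = 0 if a > b else 1
        hands[winner].extend(pot)  # winner takes the pot to the bottom of their pile
        pot = []
        active = winner
    if bool(hands[0]) == bool(hands[1]):
        return None, rounds
    return (0 if hands[0] else 1), rounds


def simulate(deck, strategies, n_games=1000, seed=0):
    """Return ([wins for Player 1, wins for Player 2], longest game in rounds)."""
    rng = random.Random(seed)
    beats = category_beats(deck)
    wins, longest = [0, 0], 0
    for _ in range(n_games):
        winner, rounds = play_game(deck, beats, strategies, rng)
        if winner is not None:
            wins[winner] += 1
        longest = max(longest, rounds)
    return wins, longest
```

For example, `simulate(deck, ("best", "best"), n_games=10**6)` corresponds to both players picking their best category; the `max_rounds` cap is an arbitrary guard against endless games.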
A typical printout of a game (first 9 rounds) is shown here. So now I could test out different strategies: What is the best way to win and what is the best way to lose?
Strategy 1: pick your best category
If you knew which category was most likely to win, could you pick that one and just win every game? Well, not quite. If both players take this strategy, then the player who goes first has a slight advantage and wins 57.8% of the time. The games can go on and on: the longest is over 500 rounds. I timed a few rounds and it worked out at around 15 s per round, so the longest game would take just over 2 hours.
Strategy 2: pick one category and stick with it
This one requires very little brainpower and suits the disengaged adult: just keep picking the same category. In this scenario, Player 1 just picks strength every time while Player 2 picks their best category. This is a great way to lose. Just 0.02% of games are won using this strategy.
Strategy 3: pick categories at random
The next scenario was to just pick random categories. I set up Player 1 to do this and play against Player 2 picking their best category. Player 1 wins just 0.2% of games. The games are over fairly quickly, with the longest of the 1 x 10^6 games stretching to 200 rounds.
If both players take this strategy, it results in much longer games (almost 2000 rounds for the longest). The player-goes-first advantage disappears and the wins are split 49.9 to 50.1%.
Strategy 4: pick your worst category
How does all of this compare with selecting the worst category? To look at this, I made Player 2 take this strategy, while Player 1 picked the best category. The result was definitive: it is simply not possible for Player 2 to win. Player 1 wins 100% of all 1 x 10^6 games. The games are over in less than 60 rounds, with most being wrapped up in less than 35 rounds. Of course, this would require almost as much knowledge of the deck as the winning strategy, but if you are determined to lose, then it is the best strategy.
The hand you’re dealt
Head-to-head, the best strategy is to pick your best category (no surprise there), but whether you win or lose depends on the cards you are dealt. I looked at which player was dealt the worst card, Wasp, and at the outcome. Player 1 won 58% of games, and in 54% of those wins Player 2 had started with Wasp. Being dealt this card is a disadvantage, but it is not the kiss of death. This analysis could be extended to look at the outcome if the n worst cards end up in your hand. I’d predict that this would influence the outcome further than just having Wasp.
So there you have it: every last drop of fun squeezed out of a children’s game by computational analysis. At quantixed, we aim to please.
The post title is taken from “Weak Superhero” by Rocket From The Crypt off their debut LP “Paint As A Fragrance” on Headhunter Records
2016 was the 400th anniversary of William Shakespeare’s death. Stratford-upon-Avon Rotary Club held the Shakespeare Marathon on the same weekend. Runners had the option of a half or full marathon. There were apparently 3,500 runners, only 700 of whom were doing the full marathon. The chip results were uploaded last night and can be found here. Similar to my post on the Coventry Half Marathon, I thought I’d quickly analyse the data.
The best time was 2:34:51 by Adam Holland of Notfast. The fastest female runner was Josie Hinton of London Heathside in 3:14:39.
Congrats to everyone who ran and thanks to the organisers and all the supporters out on the course.
The post title is taken from “Pledging My Time” a track from Blonde on Blonde by Bob Dylan
Well, the 2015/2016 season was one to forget for Crewe Alexandra. Relegation to League Two (English football’s 4th tier) was confirmed on 9th April with a 3-0 defeat to local rivals Port Vale. Painful.
Maybe Repeat Failure is a bit strong. Under Dario Gradi, the Railwaymen eventually broke into League One/Championship (the 2nd Tier) where they punched above their weight for 8 seasons. The stats for all league finishes can be downloaded and plotted out to get a sense of Crewe’s fortunes over a century-and-a-bit.
The data are normalised because the number of teams in each league has varied over the years from 16 to 24. There were several years where The Alex finished bottom but there was nowhere to go. You can see the trends that have seen the team promoted and then relegated. It looked inevitable that the team would go down this season.
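The post doesn't spell out the normalisation, so this is one plausible version, assumed for illustration: scale each finishing position to 0 (top) through 1 (bottom) within its league, then offset by tier so every season sits on one continuous scale. Note that bottom of one tier and top of the next both map to the same whole number.

```python
def normalised_position(position, n_teams):
    """Scale a finishing position to 0 (top) .. 1 (bottom) so 16- and 24-team leagues compare."""
    if n_teams < 2 or not 1 <= position <= n_teams:
        raise ValueError("position must lie between 1 and n_teams")
    return (position - 1) / (n_teams - 1)


def pyramid_position(tier, position, n_teams):
    """Place a season on one continuous scale: tier 1 spans 0-1, tier 2 spans 1-2, and so on."""
    return (tier - 1) + normalised_position(position, n_teams)
```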
Now, the reasons why the Alex have done so badly this season are complex; however, there is a theme to Crewe’s performances over all of this time: letting in too many goals. To a non-supporter this might seem utterly obvious – of course you lose a lot if you let in too many goals. But Crewe are incredibly leaky and their goal difference historically is absolutely horrendous. The Alex are currently in 64th place on the all-time table, between West Ham and Portsmouth, with 4242 points – not bad – however our goal difference is -952. That’s minus 952 goals. Only Hartlepool have a worse goal difference (-1042). That’s out of 144 teams. At Gresty Road they’ve scored 3384 and let in 2526. On the road they netted 2135 but let in 3945.
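As a quick check, the home and away figures quoted above add up to the all-time goal difference:

```latex
\begin{aligned}
\text{Scored}          &= 3384 + 2135 = 5519 \\
\text{Conceded}        &= 2526 + 3945 = 6471 \\
\text{Goal difference} &= 5519 - 6471 = -952
\end{aligned}
```

Split by venue, that is +858 at Gresty Road (3384 - 2526) against -1810 on the road (2135 - 3945).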
See you in League Two for 2016/2017.
The post title is taken from “Repeat Failure” by The Delgados from their Peloton LP.
Fans of probability love random processes. And lotteries are a great example of random number generation.
The UK National Lottery ran in one format from 19/11/1994 until 7/10/2015. I was talking to somebody who had played the same set of numbers in all of these lottery draws and I wondered what the net gain or loss has been for them over this period.
The basic format is that people buy a line of numbers (6 numbers, from 1-49) and try to match the six numbers (from 49 balls numbered 1-49) drawn from a machine. The aim is to match all six balls and win the jackpot. The odds of this are fantastically small (1 in ~14 million), but if they are the only person matching these numbers they can take away £3-5 million. There are prizes for matching three numbers (1 in ~56 chance), four numbers (1 in ~1,032), five numbers (1 in ~55,491) or five numbers plus a seventh “bonus ball” (1 in ~2,330,636). Typical prizes are £10, £100, £1,500, or £50,000, respectively.
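The quoted odds follow from counting combinations. This Python sketch counts how many of the possible lines fall into each prize tier; the only subtlety is that a "five plus bonus" line is a five-match line whose sixth number happens to be the bonus ball.

```python
from math import comb


def prize_line_counts(n_balls=49, n_drawn=6):
    """Number of possible lines that land in each prize tier (one bonus ball drawn extra)."""
    others = n_balls - n_drawn                     # numbers not among the six main balls
    five_any = comb(n_drawn, 5) * comb(others, 1)  # exactly five main numbers matched
    five_bonus = comb(n_drawn, 5)                  # ...with the sixth number being the bonus ball
    return {
        "three": comb(n_drawn, 3) * comb(others, 3),
        "four": comb(n_drawn, 4) * comb(others, 2),
        "five": five_any - five_bonus,
        "five+bonus": five_bonus,
        "jackpot": 1,
    }


def one_in(n_balls=49, n_drawn=6):
    """Odds expressed as '1 in x' for each tier."""
    total = comb(n_balls, n_drawn)
    return {tier: total / count for tier, count in prize_line_counts(n_balls, n_drawn).items()}
```

This gives 1 in 13,983,816 for the jackpot, 1 in 2,330,636 for five plus bonus, about 1 in 55,491 for five, about 1 in 1,032 for four and about 1 in 56.7 for three. `math.comb` needs Python 3.8 or later.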
The data for all draws are available here. I pulled all draws regardless of which machine or set of balls was used. This is what the data look like.
The rows are the seven balls (colour coded 1-49) that came out of the machine over 2065 draws.
I wrote a quick bit of code which generated all possible combinations of lottery numbers and compared all of these combinations to the real-life draws. The 1 in 14 million that I referred to earlier is actually 49C6 = 49! / (6! × 43!) = 13,983,816 possible lines.
This gives us the following.
Crunching these combinations against the real-life draw outcomes tells us what would have happened if every possible ticket had been bought for all draws. If we assume a £1 stake for each draw and ~14 million people each buying a unique line, each person has staked £2065 over the draws we are considering.
- The unluckiest line is 6, 7, 10, 21, 26, 36. This would’ve only won 12 lots of three balls, i.e. £120 – a net loss of £1945
- The luckiest line is 3, 6, 13, 23, 27, 49. These numbers won 41 x three ball, 2 x four ball, 1 x jackpot, 1 x 5 balls + bonus.
- Out of all possible combinations, 13728621 of them are in the red by anything from £5 to £1945. This is 98.2% of combinations.
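Scoring one line against every draw, as in the list above, might look like this in Python. It uses the typical payouts from the post; the £4 million jackpot is an assumed mid-range value (the post quotes £3-5 million), and each draw is assumed to be stored as the six main balls plus the bonus ball. Looping over all 13,983,816 lines this way in pure Python would be slow; a vectorised approach would be needed for the full crunch.

```python
# Typical payouts from the post (pounds). The jackpot is an assumed mid-range value.
TYPICAL_PRIZES = {3: 10, 4: 100, 5: 1_500, "5+bonus": 50_000, 6: 4_000_000}


def prize_tier(line, draw):
    """draw is (six main balls, bonus ball); return 0-6 matches or '5+bonus'."""
    main, bonus = draw
    matched = len(set(line) & set(main))
    if matched == 5 and bonus in line:
        return "5+bonus"
    return matched


def net_result(line, draws, stake=1, prizes=TYPICAL_PRIZES):
    """Winnings minus total stake for playing the same line in every draw."""
    winnings = sum(prizes.get(prize_tier(line, draw), 0) for draw in draws)
    return winnings - stake * len(draws)
```

For instance, a line with 12 three-ball wins and nothing else over 2065 draws comes out at £120 - £2065 = -£1945, matching the unluckiest line above.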
Pretty terrible odds all-in-all. Note that I used the typical payout values for this calculation. If all possible tickets had been purchased the payouts would be much higher. So this calculation tells us what an individual could expect if they played the same numbers for every draw.
Note that the unluckiest line and the luckiest line have an equal probability of success in the 2066th draw. There is nothing intrinsically unlucky or lucky about these numbers!
I played the lottery a few times when it started, with my own chosen set of numbers. I matched 3 balls twice and 4 balls once. I’ve not played since 1998 or so. Using another function in my code, I could check what would’ve happened if I’d kept playing all those intervening years. Fortunately, I would have been looking at a net loss, with 43 x three balls and 2 x four balls (£630 at typical prize values, against a £2065 stake). Since I actually had a ticket for some of those wins and hardly any for the 2020 losing draws, I feel OK about that. Discovering that my line had actually matched the jackpot would’ve been weird, so I’m glad that wasn’t the case.
There’s lots of fun to be had with this dataset, and a quick Google search suggests that there are plenty of sites on the web doing just that.
Here’s a quick plot for fun. The frequency of balls drawn in the dataset:
- The ball drawn the least is 13
- The one drawn the most is 38
- Expected number of appearances is 295 (14455/49).
- 14455 is 7 balls from 2065 draws
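Counting ball frequencies and the expected value under a fair draw takes only a `Counter`. This sketch assumes draws are stored as (six main balls, bonus ball), with the bonus ball counted, as in the 7-balls-per-draw figure above.

```python
from collections import Counter


def ball_frequencies(draws):
    """Count how often each ball appears, including the bonus ball, across all draws."""
    counts = Counter()
    for main, bonus in draws:
        counts.update(main)
        counts[bonus] += 1
    return counts


def expected_appearances(n_draws, balls_per_draw=7, n_balls=49):
    """Expected count per ball if every ball is equally likely: 7 x 2065 / 49 = 295 here."""
    return balls_per_draw * n_draws / n_balls
```

The least and most drawn balls are then `min(range(1, 50), key=counts.__getitem__)` and `counts.most_common(1)`.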
In October 2015, the Lottery changed to 59 balls (1-59), so the dataset used here is effectively complete unless they revert to the old format.
The title of this post comes from “Wrote for Luck” by The Happy Mondays from their 1988 LP Bummed. The Manic Street Preachers recorded a great cover version, which was on the B-side of the Roses in the Hospital single.
Every Song Ever: Twenty Ways to Listen in an Age of Musical Plenty
Ben Ratliff (Farrar, Straus and Giroux)
A non-science book review for today’s post. This is a great read on “how to listen to music”. There have been hundreds of books published along these lines; the innovation here, however, is that we now live in an age of musical plenty. Every song ever recorded is available at our fingertips to listen to when, where and how we want. This means that the author can draw on Thelonious Monk, Sunn O))), Shostakovich and Mariah Carey, and you can seek them out and discover whatever it is that they have in common.
I got hooked in Chapter 2 (discussing slowness in music). I was reading and thinking: he should mention Sleep’s Dopesmoker, but what are the chances? I turned the page and there it was. Then I knew that we were literally on the same page and that I would enjoy whatever it was he had to say. Isn’t confirmation bias a wonderful thing (outside of science)?
A lot of writing about music is terrible, but I love it when it is done well. As it is here. I especially like reading “under the bonnet” analysis of songs. Ian MacDonald’s Revolution In The Head (or Twilight of the Gods by Wilfred Mellers as an extreme example) springs to mind. This close analysis means you can go back and find new treasures in old songs. And this is the essence of the book.
I must admit that I have thought about trying to write similar analyses of songs on quantixed. Aside from the fact that I don’t have time, I was worried it might make me seem like Patrick Bateman discussing the merits of Huey Lewis & The News in American Psycho. It’s something that’s difficult to do well and Ratliff’s analyses here are light touch and spot-on.
The short section on blast beats which mentioned D.R.I. made me smile too, although there’s a factual error here. Ratliff talks about how singer-drummer-brother combo Kurt and Eric Brecht lock in on Draft Me when they played CBGB’s in 1984. Drummer Eric had left the band by that point and been replaced by Felix Griffin, and it is him, not Eric, duelling with vocalist Kurt, both on the LP Dealing With It and at the CBGB’s gig, which was later released as an LP and video. Again, it’s a band that I have a soft spot for, and it was great to see them picked out.
There were a couple of quotes that I found amusing, being a CD collector and something of a completist. Here’s one:
A friend described to me the experience of acquiring a complete CD collection of Mozart, after having had a piece-by-piece relationship with his music for most of his life. It was 175 CDs, or something like that. “I realized,” he said, “that now that I had it all, I never needed to listen to it again.”
Along the same lines, I thought this quote was pretty chilling.
We can pretty much wave bye-bye to the completist-music-collector impulse: it had a limited run in the human brain, probably 1930 to 2010. (It still exists in a fitful way, but it doesn’t have a consensual frame: there is no style for it.) It is not only a way of buying, owning, and arranging music-related objects and experiences in one’s life, but also a distinct way of listening.
As somebody who is not a fan of streaming and still values physically owning music, I know I am out of step with the rest of the world. However, I think this quote is at odds with what the whole book is trying to achieve. The guy listening to music on his phone speaker on the bus, described in the intro, can’t hear and appreciate much of what is described in the book. To hear that squeak of John Bonham’s kick drum pedal on Since I’ve Been Loving You from Led Zeppelin III, you need to be listening in the old-fashioned way, rather than in the noisy and busy way most music is consumed nowadays.
It’s a great read. You can get it here.
My Blank Pages is a track by Velvet Crush. This is an occasional series of book reviews.
The end of the month sees the Coventry Half Marathon. I looked at what constitutes a good time over this course, based on 2015 results. I thought I’d post this here in case anyone is interested.
The breakdown of runners by category for the 2015 event. Male Senior (MSEN) category has the most runners, constituting a wide age grouping. There were 3565 runners in total, 5 in an undetermined category and 9 DNFs. These 14 were not included in the analysis.
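Grouping chip times by category with the undetermined entries and DNFs dropped can be sketched like this in Python. It assumes results come as (category, chip time) pairs, with an empty string marking an undetermined category and "DNF" in place of a time; both conventions are assumptions about the results file.

```python
import statistics


def to_seconds(hms):
    """Convert 'h:mm:ss' (or 'hh:mm:ss') to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(seconds):
    """Format seconds as 'h:mm:ss'."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def summarise_by_category(results, unknown_categories=("",)):
    """results: (category, chip time or 'DNF') pairs -> {category: (finishers, fastest, median)}."""
    by_cat = {}
    for category, chip in results:
        if category in unknown_categories or chip == "DNF":
            continue  # excluded from the analysis, as in the post
        by_cat.setdefault(category, []).append(to_seconds(chip))
    return {cat: (len(t), to_hms(min(t)), to_hms(statistics.median(t)))
            for cat, t in by_cat.items()}
```

The per-category lists of seconds are also what a violin plot of the distribution would be drawn from.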
The best time last year was 1:10:21!
Good luck to everyone running this (or any other event) this year.
Edit: The 2016 Coventry Half Marathon happened today. I’m updating this post with the new data.
The width of violins has no special significance compared to 2015. Fastest time this year was 1:08:40 in the MSEN category.
There were more runners this year than last (4212 finishers), across all categories. Also this year there was a wheelchair category, which is not included here as there were only four competitors. FWIW, I placed somewhere in the first violin, in the lower whisker :-).
Congrats to everyone who ran and thanks to all the supporters out on the course.
The post title is taken from “Pledging My Time” a track from Blonde on Blonde by Bob Dylan