Articles

  • Python: dealing with ZipImportError in setuptools

    4 May 2011, 17:36

    I spent most of today wrestling with a build problem in one of our internal Python modules. We use zc.buildout to automatically pull in the dependencies and build all the components for each module, but one of them started mysteriously failing with an error like this:

    While:
    Installing scripts.
    Getting distribution for 'lfm.data'.

    An internal error occurred due to a bug in either zc.buildout or in a
    recipe being used:
    Traceback (most recent call last):
    ...
    ZipImportError: bad local file header in /blah/blah/lfm.data-0.11.dev_r186049-py2.6.egg


    This was particularly weird, as I could unzip that file fine with no problems.

    Eventually I got some good advice from Alexis on the #distutils Freenode channel -- try using distribute, which is the successor to setuptools, as discussed here.

    So I got a new version of bootstrap.py from here, ran it with the --distribute flag, then ran bin/buildout again -- and it all worked.

    Why it failed in the first place is still a total mystery though :-)
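
    One way to investigate an error like this outside buildout is to compare what Python's zipfile and zipimport modules make of the egg, since zipimport uses its own archive reader. The following is only a sketch and not what was actually done here: check_egg is a made-up helper name, and there is no guarantee it would reproduce this particular failure.

```python
import zipfile
import zipimport


def check_egg(path):
    """Return (first_bad_member, zipimport_error) for the archive at path.

    Both values are None when neither module reports a problem.
    check_egg is a hypothetical helper, not part of setuptools or zc.buildout.
    """
    # zipfile: verify the CRC of every member and list the regular files.
    with zipfile.ZipFile(path) as archive:
        first_bad_member = archive.testzip()
        names = [n for n in archive.namelist() if not n.endswith("/")]

    # zipimport: read every member through the importer's own zip reader.
    try:
        importer = zipimport.zipimporter(path)
        for name in names:
            importer.get_data(name)
    except zipimport.ZipImportError as exc:
        return first_bad_member, str(exc)
    return first_bad_member, None
```

    If the second value comes back as an error message while the first is None, the archive is readable by zipfile but not by zipimport, which would match the symptom above.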
  • Dead Voices On Air

    20 Feb 2011, 19:51

    I've just been signed up for a gig next weekend (26 Feb) at ARtCH, a quirky little gallery/space/venue of the sort that East London does so well.

    Topping the bill is Mark Spybey aka Dead Voices on Air, formerly of Download and sometime collaborator with all manner of avant-garde experimentalists. Expect home-made noise-toys, improvisation and spooky atmospherics.

    Also in attendance will be Simon Fisher Turner, Two Dead Voices providing 'audiovisual synthesis', and entertainment until past bedtime from myself and Tango Mango (Freq/Kosmische).

    See you there...
  • Technical background on Valentine's Day data-mining post

    14 Feb 2011, 12:10

    If you're interested, here are some more technical details on how we gathered the data for the Valentine's Day blog post.

    Our tagging system assigns a relevance score to each tag for each artist that tag is applied to. This is based broadly on the number of listeners who've assigned that tag to that particular artist, but with a number of statistical corrections designed to reduce noise caused by errors, helping the genuinely informative tags float to the top.

    This score is reflected in the size of the tags on each artist's tag page. So for Barry White you can see the highest scoring tags are , and -- and are surprisingly small, and isn't there at all. But tags can be applied to individual songs as well as artists.

    So for our first experiment, on tags and songs, we started with the Romantic and Love Songs tags and extracted all tracks which were tagged with both of those. We gave each track a score equal to the higher of its two scores for Romantic and Love Songs, and threw away all the tracks which scored less than 0.5 (scores range between 0 and 1). This gave us the following 30 tracks:

    Lionel Richie: Stuck On You
    Bee Gees: Follow the Wind
    Mohit Chauhan: Tune Jo Na Kaha
    A1: One In Love
    Lionel Richie: Truly
    Luis Fonsi: Tell Her Tonight
    Lionel Richie: Endless Love
    Michael McDonald: All In Love Is Fair
    Rascal Flatts: Forever
    Michael Bolton: When A Man Loves A Woman
    Howard Hewett: How Do I Know I Love You
    Rod Stewart: You Go To My Head
    Michael Bolton: To Love Somebody
    Michael Bolton: How Am I Supposed To Live Without You
    Marc Cohn: True Companion
    Virgin: Opowiem Ci
    Jeffrey Osborne: On The Wings Of Love
    Atif Aslam: Kuch Is Tarah
    Peaches & Herb: Reunited
    Jon Secada: She's All I Ever Had
    Lionel Richie: My Love
    Paulla: Od Dziś
    Rita Coolidge: We're All Alone
    Rod Stewart: A Nightingale Sang In Berkeley Square
    Ronan Keating: If Tomorrow Never Comes
    Bobby Vinton: Sealed With a Kiss
    Glenn Medeiros: Nothing's Gonna Change My Love For You
    Dan Hill: Sometimes When We Touch
    Bryan Adams: Everything I Do (I Do It for You)
    Barry Manilow: Can't Smile Without You
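
    The selection step described above can be sketched roughly as follows. This is an illustration only: the select_tracks name and the dictionary layout are assumptions, not the real tagging system's interface.

```python
def select_tracks(tag_scores, tags=("Romantic", "Love Songs"), threshold=0.5):
    """Keep tracks carrying every tag in `tags`, scored by their best tag score.

    tag_scores maps a track name to a dict of {tag: relevance score in [0, 1]}.
    The function name and data layout are hypothetical.
    """
    selected = {}
    for track, scores in tag_scores.items():
        # The track must be tagged with both tags to be considered at all.
        if all(tag in scores for tag in tags):
            # Score the track by the higher of its two tag scores.
            best = max(scores[tag] for tag in tags)
            # Discard tracks scoring less than the threshold.
            if best >= threshold:
                selected[track] = best
    return selected
```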

    Next, we counted the number of listeners who had scrobbled any of these tracks, via Last.fm radio or their own music choices, for each day in 2010. To adjust for natural variation in usage, we divided the count for each day by the total number of listeners we had on that day, in order to give the fractional values in the graph. The number crunching was performed on our Hadoop cluster since a year of scrobbling data amounts to hundreds of gigabytes.
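
    Leaving the Hadoop side out, the per-day normalisation amounts to something like the sketch below. The daily_fraction name and the in-memory input format are assumptions made for the example; the real job ran over the full scrobble logs.

```python
def daily_fraction(scrobbles_by_day, tracks):
    """Fraction of each day's listeners who scrobbled any of `tracks`.

    scrobbles_by_day maps a day to a dict of {listener: set of tracks played}.
    The name and input layout are hypothetical, chosen for illustration.
    """
    wanted = set(tracks)
    fractions = {}
    for day, listeners in scrobbles_by_day.items():
        total = len(listeners)
        if total == 0:
            continue  # no listeners that day, so no fraction to report
        # Count listeners who played at least one of the selected tracks.
        hits = sum(1 for played in listeners.values() if wanted & set(played))
        # Divide by that day's total listeners to adjust for usage variation.
        fractions[day] = hits / total
    return fractions
```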

    For the second experiment, we ranked all the artists scrobbled on Valentine's Day 2010 by the number of listeners each artist had that day, and took the top 1000. Then, to provide a 'baseline' for comparison, we repeated this for February 21st 2010, since it's the same day of the week as the 14th -- this accounts for people having different listening habits on different days of the week. We looked for every artist who did make the top 1000 on the 14th but didn't on the 21st, and repeated this whole process for every year from 2005 to 2009. Each artist got 1 point for every year they made the Valentine's Day top 1000 but not the baseline top 1000.

    As it turned out, Barry White was the only artist who pulled off this feat for three years, half of the six years in the experiment. All of the others in our Valentine's Day top 10 only managed this twice, and no other artist managed it more than once. So to break the ties between the other nine artists, we re-ranked them by their average chart position (number of listeners) on those Valentine's Days when they did make the top 1000.
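
    The points and tie-break logic from the last two paragraphs can be sketched like this. It reflects one reading of the description (average position over the years an artist made the Valentine's Day chart), and the function names and input layout are assumptions.

```python
def top_n(listener_counts, n=1000):
    """Artists ranked by listener count, highest first, cut to the top n."""
    ranked = sorted(listener_counts, key=lambda a: (-listener_counts[a], a))
    return ranked[:n]


def valentine_ranking(counts_by_year, n=1000):
    """Rank artists by points, breaking ties by average Valentine's chart position.

    counts_by_year maps a year to (valentine_counts, baseline_counts), where
    each is a dict of {artist: number of listeners that day}. Hypothetical layout.
    """
    points = {}
    positions = {}
    for valentine, baseline in counts_by_year.values():
        valentine_top = top_n(valentine, n)
        baseline_top = set(top_n(baseline, n))
        for position, artist in enumerate(valentine_top, start=1):
            positions.setdefault(artist, []).append(position)
            # One point per year in the Valentine's chart but not the baseline.
            if artist not in baseline_top:
                points[artist] = points.get(artist, 0) + 1

    def average_position(artist):
        return sum(positions[artist]) / len(positions[artist])

    # Most points first; a lower average chart position wins ties.
    return sorted(points, key=lambda a: (-points[a], average_position(a)))
```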

    If anyone thinks it's suspicious that Barry White -- surely the classic example of the smooth-voiced love machine -- came out on top, let me assure you, it was a pleasant surprise to us. It's a nice demonstration of how tags and scrobbles provide different (and complementary) ways of mining the data. Tags tell us what people say; scrobbling logs tell us what they actually do.
  • Big Bang lineup announced!

    4 Feb 2011, 18:45

    I've posted the event page for the first ever Big Bang party, at Electrowerkz in London on the first of April.

    Big Bang #1

    One floor of banging techno and trance, and another of filthy dubbed-up beats, breaks and bass. What's not to like? The lineup includes Broken Note and Matta, two of the dirtiest dubstep acts I've ever had the pleasure of seeing live -- that's a compliment by the way -- and I'll be doing a set of noisy, atmospheric, bass-heavy electronics.

    Tickets are a mere tenner from here.

    See the new Big Bang FM group for playlists and updates (coming soon).

    In other news: I joined Last.fm this week in the Music Information Retrieval team. We're the geeky spods who build the ENORMOUS SCARY MACHINES behind the personalized radio and recommendations systems, and the toys in the Playground.

    If this is the kind of thing you can do too, see the job spec.

    Last.fm are an excellent bunch -- send them love and cupcakes. That's all.