Last.fm Countries

 

    Over the past several weeks, Xhochy's nationality statistics tool has started to lose accuracy and truncate libraries, and it appears to have stopped updating. As a result, I decided to make a replacement tool.

    Please read at least the Overview and Known Issues below first. You'll find the link at the bottom of this post.

    Overview

    Once you enter your username, it will begin fetching the pages of your library, along with any already-known nationalities. After all pages of the library are fetched, the tool will then start to look up each artist in your library that was not already known. You can watch the results come in as the lookups are done.

    Keep in mind that for the first few weeks, this will take several minutes, as the tool does not yet know about many artists. Over time, the database will grow, and the results will be generated faster.

    As for unknown artists, the status of "Unknown" is cached for 5 days. If you try to look up an unknown artist more than 5 days after the last attempt, a fresh query will be done.

    Once your entire library's artists' nationalities have been determined (or determined to be unknown), pie charts will appear at the top of the page, and links will appear allowing you to toggle between the artist list and country list.

    Your library will be cached for 7 days, allowing you to generate a fresh chart once a week. You may opt to regenerate with your already-cached playcounts, though, if you want to refresh an unknown artist (and the 5 days for the artist have passed).
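
    As a rough illustration of the two cache lifetimes described above (5 days for an "Unknown" result, 7 days for a library), here is a minimal Python sketch. It is not the tool's actual code; the names `UNKNOWN_TTL`, `LIBRARY_TTL`, `is_fresh`, and `needs_lookup` are invented for the example.

```python
from datetime import datetime, timedelta

# Lifetimes taken from the description above; the names are hypothetical.
UNKNOWN_TTL = timedelta(days=5)   # how long an "Unknown" result is trusted
LIBRARY_TTL = timedelta(days=7)   # how long a fetched library is reused


def is_fresh(cached_at, ttl, now=None):
    """Return True while a cache entry stored at `cached_at` is within `ttl`."""
    now = now or datetime.now()
    return now - cached_at < ttl


def needs_lookup(nationality, cached_at, now=None):
    """Decide whether an artist should be queried again.

    Known nationalities are kept; only "Unknown" results expire.
    """
    if nationality is None:          # never looked up before
        return True
    if nationality != "Unknown":     # a real country was already found
        return False
    return not is_fresh(cached_at, UNKNOWN_TTL, now)
```

    The same `is_fresh` check with `LIBRARY_TTL` would decide whether a cached library can be reused or has to be fetched again.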

    Known Issues
    -Some artists may appear twice. This is due to a bug in the Last.fm API, where it presents the same artist on different pages. I will try to come up with a workaround soon. UPDATE: Last.fm knows about this bug, and as soon as they fix it, the results from Last.fm Countries will be correct too. For now, you're going to have duplicates (that come at the expense of other artists), and there's nothing I can do.

    -This tool does not work properly in Internet Explorer 6 or earlier. Once you enter your username, the browser will freeze until all the results are ready. While it should eventually complete, this is obviously not the intended behavior. Since IE6 is old, and programming to make it work properly would be a nightmare, I'm not going to do it. Even Google's given up on IE6, and if it's not good enough for them, it's not good enough for me :-)

    -As mentioned above, your first lookup will probably take several minutes. Subsequent lookups will be much faster.

    -I expect high demand, especially at first when every artist needs to be looked up. Last.fm limits API usage to 5 queries per second from a given IP address, averaged over a 5-minute period, which works out to 1500 queries per 5 minutes. To leave a safety margin of 100, the tool suspends queries once 1400 have been made within a 5-minute period. If you're in the middle of loading your data, let it keep running--it will retry every 5 seconds until the block has been cleared.

    -Sometimes, a request for a library page or artist might get lost, and the loading of the page will not progress. For now, just refresh and try again--the lookup will continue where it left off.
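
    The query limit from the Known Issues above could be enforced with a sliding-window counter along these lines. This is a Python sketch, not the tool's real code: `QueryBudget` and its methods are invented for the example, and only the figures (5 queries per second, 5-minute window, margin of 100, 5-second retry) come from the post.

```python
import time
from collections import deque

WINDOW_SECONDS = 5 * 60                  # usage is averaged over 5 minutes
MAX_QUERIES = 5 * WINDOW_SECONDS - 100   # 1500 allowed, minus a margin of 100
RETRY_SECONDS = 5                        # wait between retries while blocked


class QueryBudget:
    """Sliding-window counter of API queries (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sent = deque()             # timestamps of recent queries

    def try_acquire(self):
        """Record a query and return True, or return False if suspended."""
        now = self._clock()
        # Forget queries that have left the 5-minute window.
        while self._sent and now - self._sent[0] >= WINDOW_SECONDS:
            self._sent.popleft()
        if len(self._sent) >= MAX_QUERIES:
            return False
        self._sent.append(now)
        return True

    def acquire(self, sleep=time.sleep):
        """Block, retrying every RETRY_SECONDS, until a query is allowed."""
        while not self.try_acquire():
            sleep(RETRY_SECONDS)
```

    Calling `acquire()` before each API request would give the "retry every 5 seconds until the block has been cleared" behaviour.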

    Future Features
    -Allow a user to sign up for an account on the site, which would let you override the automatically detected nationalities for particular artists (e.g., in the case where there is more than one artist with the same name). These changes would apply only to your charts, and would not affect other people.

    -Allow suggestions for unknown artists.

    -Use the aforementioned suggestions and overrides, along with the already-detected nationalities, to generate a model that could determine nationality based on associated tags, without actually having any countries in the tags.

    For example, analysis might indicate:
    --That "catalan" tends to coincide with "spain" more often than any other country, so the model would know that "catalan" should apply more weight to "spain" than any other country.

    --That "tokyo" coincides with "japan" more than any other country, and thus "tokyo" should be a strong indicator for "japan".

    --That "trallpunk" (a style of punk rock generally only found in Sweden) tends to frequently coincide with "sweden", and thus "trallpunk" should indicate a higher probability of the country being Sweden.

    --That "seen live" does not have a statistically significant correlation with any country (after normalization), and should be ignored.

    -I am also considering an iPhone/Android app. Let me know your thoughts.

    Now that you've read that, here's the link!
    http://www.quinnsoft.com/lastfm/lastfm_countries.php

    Please report any bugs you find.


    • Bloopy said:
    • Forum Moderator
    • 24 Apr 2010, 14:10
    Nice work! I think I prefer it as a bar chart though, and with percentages it would be even better. For the pie chart, perhaps it could have the percentages in a list below so it fits in the About Me profile section, like Scaicha did.

    I also really like the ability to sort artists by country in xhochy's script. It makes it easy to see who your top artist is for a particular country, and it groups all the unknown artists together.

    Truncating the library didn't really bother me. It seems like overkill to check all those artists with so few plays, but I guess there's no harm if your script can handle it! It does look a bit odd if your chart has both 'unknown' and 'other' on it. Maybe you could combine the 2 into 1 slice of pie called 'other'.

    A couple of issues:

    A few artists always get a blank space in the country column, as if there's an error, such as Jason "EVIL" Covelli and a badly tagged artist with '+' in their name. I would blame it on the punctuation, but not all artists with " or + in their name have the same problem ("Weird Al" Yankovic is at least showing up as unknown).

    A few artists are showing up as unknown despite being tagged, eg:
    - artists Immundus and Avoid The Man Of The Frostbites.
    - artist Коллежский Асессор.

    • snyde1 said:
    • Subscriber
    • 24 Apr 2010, 15:33
    I have a few comments, but am mostly here to echo Paul. The pie charts don't work for me either. Loading the entire library can take a loooooong time for some people. (Okay, not the library so much, but the tags for all of those artists.)

    You might want to trigger your graph plot when you hit the first artist with zero plays - when you pull in the library, you get all the tagged artists.

    It might be nice to have the link to the artist Last.fm page from the artist name. (This could also help troubleshoot problems like the ones indicated by Paul above.)

    Are you planning to hardcode overrides for obviously wrong tagging? For example, I don't get this one:
    The Moody Blues: Japan (UK)
    There are also artists that are tagged by language (e.g. German) rather than by country (for German, Austrian artists often get misclassified). However, your script seems to be doing a good job of getting it right. There are some wrong ones, e.g.:
    La Menor Idea: France (Canada)
    April March: France (US)

    Hmmm ... this should be fixed
    ALL BLACKS HAKA: Unknown (New Zealand)

    In all, it works well.

    Improve your view of Last.fm - add some User Scripts.
    Did I hear that right? Mondegreens - for the misheard word. Like Odds? Can't get better than Even Odds!

    Speak your truth quietly and clearly; and listen to others, even to the dull and the ignorant; they too have their story.
  • Most of the wrong countries were due to a bug in the script. Right now, I have a script going through all the artists in the database (about 10,000 so far) and recalculating the country (I'll update this post when it's done). As for the Unknowns, the Norwegian and Ukrainian ones were due to a misspelling on my part. As for ALL BLACKS HAKA, it's not tagged, so the result of Unknown is correct.

    For the situations where artists were being tagged by their language rather than their country, I have adjusted the weight of the language tags to be 1/6 of that of a country tag (down from 1/2). This should allow, for instance, Austrian artists to appear correctly.
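
    To illustrate the effect of the weighting change, here is a small Python sketch of a tag vote in which a language tag counts 1/6 as much as a country tag. The tag tables and the `guess_country` function are invented for the example; the real script's rules are not shown in this thread.

```python
from collections import defaultdict
from fractions import Fraction

COUNTRY_WEIGHT = Fraction(1)
LANGUAGE_WEIGHT = Fraction(1, 6)   # was 1/2 before the change

# Tiny illustrative tables; a real list would be much longer.
COUNTRY_TAGS = {"austria": "Austria", "austrian": "Austria",
                "germany": "Germany"}
LANGUAGE_TAGS = {"german": "Germany", "deutsch": "Germany"}


def guess_country(tag_counts, language_weight=LANGUAGE_WEIGHT):
    """Pick the country with the highest weighted tag score.

    `tag_counts` maps a lower-case tag to how often it was applied.
    Returns "Unknown" when no tag matches either table.
    """
    scores = defaultdict(Fraction)
    for tag, count in tag_counts.items():
        if tag in COUNTRY_TAGS:
            scores[COUNTRY_TAGS[tag]] += COUNTRY_WEIGHT * count
        elif tag in LANGUAGE_TAGS:
            scores[LANGUAGE_TAGS[tag]] += language_weight * count
    if not scores:
        return "Unknown"
    return max(scores, key=scores.get)
```

    With made-up counts of `{"austrian": 20, "german": 60}`, a language weight of 1/2 scores Germany at 30 against Austria at 20, while 1/6 scores Germany at 10, so Austria wins.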

    I have also fixed the bug with +'s and quotation marks. If you find any other characters causing trouble, let me know.

    I have also added links to artists, but, right now, slashes break them. I'll fix that later.

    Snyde, I had no problem loading your pie charts. Wait up to 15 seconds after "Loading Image..." appears, and they should show up. If not, refresh and try again (the artists won't need to be looked up again).

    As for bar graphs/other layouts, I'll add those soon. I have a project I need to work on this weekend, but once I'm done with that, I'll work on those.


    • snyde1 said:
    • Subscriber
    • 24 Apr 2010, 21:36
    darkshadow88 said:
    As for ALL BLACKS HAKA, it's not tagged, so the result of Unknown is correct.

    What I had meant as "fix that" was to get it tagged. I can't think of anything more Kiwi than that B-)

    darkshadow88 said:
    For the situations where artists were being tagged by their language rather than their country, I have adjusted the weight of the language tags to be 1/6 of that of a country tag (down from 1/2). This should allow, for instance, Austrian artists to appear correctly.

    Cool!

    darkshadow88 said:
    I have also added links to artists, but, right now, slashes break them. I'll fix that later.

    Snyde, I had no problem loading your pie charts. Wait up to 15 seconds after "Loading Image..." appears, and they should show up. If not, refresh and try again (the artists won't need to be looked up again).

    Well, yes but (1) I checked my username in your script earlier today and those artists are now cached and (2) I'm not a user with a lot of artists. Try one of the members of the 10000+ artist groups. I was thinking more of checking and waiting for the process to finish for the first time.

    While it may sound like I'm whinging here, I'd like to say "Great job!" Vive La Fête is even recognised as Belgian!!

    • brtkrbzhnv said:
    • User
    • 25 Apr 2010, 2:57
    Bloopy said:
    It does look a bit odd if your chart has both 'unknown' and 'other' on it. Maybe you could combine the 2 into 1 slice of pie called 'other'.

    I completely disagree with this, as those are two completely different things. The size of "other" is a measure of diversity, whereas that of "unknown" is a measure of the unreliability of the figures.

    I agree that getting all artists seems kind of unnecessary/wasteful, and the API-request limit might become a problem when this service becomes popular. E.g. I have about 5500 artists in my library, and you could skip more than 3000 of those without missing more than 5% of my library:
    A script I wrote said:
    114166 (92%) plays of 1670 artists with >= 10 plays
    115237 (93%) plays of 1789 artists with >= 9 plays
    116037 (94%) plays of 1889 artists with >= 8 plays
    116828 (94%) plays of 2002 artists with >= 7 plays
    117836 (95%) plays of 2170 artists with >= 6 plays
    118691 (96%) plays of 2341 artists with >= 5 plays
    119599 (97%) plays of 2568 artists with >= 4 plays
    120493 (97%) plays of 2866 artists with >= 3 plays
    121959 (98%) plays of 3599 artists with >= 2 plays

    You could even do this dynamically with something like while(accumulated_playcount * 100 < user.playcount * 95) … or whatever to accommodate differently shaped libraries without sacrificing much accuracy.
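
    Spelled out, the dynamic cutoff suggested above might look like the following Python sketch. It assumes the library is already sorted by playcount in descending order; `artists_to_check` and its arguments are hypothetical names.

```python
def artists_to_check(library, coverage_percent=95):
    """Return the top artists that together cover `coverage_percent` of plays.

    `library` is a list of (artist, playcount) pairs sorted by playcount,
    highest first. Integer arithmetic mirrors the loop condition above.
    """
    total = sum(plays for _, plays in library)
    selected = []
    accumulated = 0
    for artist, plays in library:
        if accumulated * 100 >= total * coverage_percent:
            break                      # enough of the library is covered
        selected.append(artist)
        accumulated += plays
    return selected
```

    Because the cutoff is expressed as a share of total plays rather than a fixed minimum playcount, it adapts to libraries with very long or very short tails.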

  • The API request limit will hopefully sort itself out in time, as any artists that the script figures out the nationality of will be stored (and thus will not need an API call again). That said, the number of unknowns with lower playcounts is probably going to be high (as there are often many mistagged artists down there), and those would indeed end up getting retried every 5 days.

    I really have three options here:
    -Suppress retrying unknown artists below a certain threshold.
    -Suppress looking up artists (even for the first time) below a certain threshold, but use the known nationality if it's available.
    -Suppress all artists below a certain threshold, even if data is available.

    My thought is to consider all artists with known nationalities, suppress new lookups at 99%, and suppress retries at 95%. I'd like to hear your opinions, though, before I make a change.
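
    One possible reading of that proposal, sketched in Python: known nationalities are always used, first-time lookups stop once 99% of plays are covered by higher-ranked artists, and retries of unknowns stop at 95%. The function name, the data shapes, and the choice to measure coverage before each artist are assumptions made for the example, not a description of the actual script.

```python
NEW_LOOKUP_CUTOFF = 99   # percent of plays covered before new lookups stop
RETRY_CUTOFF = 95        # percent of plays covered before retries stop


def plan_lookups(library, cached):
    """Classify each artist as 'use', 'lookup', 'retry' or 'skip'.

    `library` is a list of (artist, playcount) pairs, highest playcount first.
    `cached` maps artist -> country, with "Unknown" standing for a failed
    lookup that is due for a retry; artists absent from it were never
    looked up.
    """
    total = sum(plays for _, plays in library)
    covered = 0            # plays accounted for by higher-ranked artists
    plan = {}
    for artist, plays in library:
        percent_before = covered * 100 / total if total else 100
        country = cached.get(artist)
        if country not in (None, "Unknown"):
            plan[artist] = "use"       # known data is always used
        elif country is None:
            plan[artist] = "lookup" if percent_before < NEW_LOOKUP_CUTOFF else "skip"
        else:
            plan[artist] = "retry" if percent_before < RETRY_CUTOFF else "skip"
        covered += plays
    return plan
```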


    • Bloopy said:
    • Forum Moderator
    • 25 Apr 2010, 6:49
    brtkrbzhnv said:
    I completely disagree with this, as those are two completely different things. The size of "other" is a measure of diversity, whereas that of "unknown" is a measure of the unreliability of the figures.
    'Other' is a group of artists who aren't categorized into any of my top countries. 'Unknown' is a group of artists who aren't categorized into any of my top countries. Unknown is useful in the background for finding the artists I need to tag, but I don't need to see it as a separate category in the chart. Some unknowns are collaborations between artists of multiple countries, and some artists keep their nationality secret, so I'll never be able to tag those ones... they can count towards the diversity.

  • Bloopy said:
    'Other' is a group of artists who aren't categorized into any of my top countries. 'Unknown' is a group of artists who aren't categorized into any of my top countries.


    That said, many of the artists in 'Unknown' should be in your top countries (probably). To put them under 'Other' would give a false impression of increased diversity.

    Unless there's significant demand for me to change the behavior, 'Unknown' will not be assimilated into 'Other'. One option that could be considered would be to eliminate the Unknowns from the pie charts entirely, and have the percentages based only on 'known' artists.


    • brtkrbzhnv said:
    • User
    • 25 Apr 2010, 20:51
    1) Is the library caching not working if I get the "loading page 107" stuff every time I look at my stats? Or is that just the way the caching is designed?
    2) Maybe it's just me, but my browser (Firefox) becomes unresponsive when I look at my stats and needs to be restarted afterwards. A minimum_playcount cutoff option would be a nice workaround.
    3) I think your ruleset may be missing "trinidad" → "Trinidad and Tobago"; e.g. Lord Kitchener is treated as "Unknown".

    By the way, I didn't know there were foreigners who listen to trallpunk, but I think tags like it should be treated similarly to language tags, as they're basically the same from some kind of Bayesian perspective. You just need to be careful with the weighting or whatever, so that people like Finnish artist Laura Vanamo (the only good thing to come out of those Idol shows?) don't get miscategorized.

    Oh, and I think it would make sense to allow people to have unknowns ignored for the graphs. The "unknown" category both adds some information (reliability) and makes other information more difficult to get (the actual percentages of the countries), so it's a matter of (reasonable) personal preference whether it should be included.

  • Right now, the loading of the library pages will still happen, despite being cached (it's actually pulling them from my server rather than from last.fm, once cached). I intend to change this behavior in the near future so that it will only need to make one request for a cached library.

    I'm not sure about Firefox becoming unresponsive. I loaded your library and didn't notice any ill effects (though it does become a bit unresponsive while your library is loading). Maybe it's a memory issue--I'll investigate further.

    Indeed, "trinidad" was missing for Trinidad and Tobago. I have added it, and when the unknowns refresh, it should pick up on it.

    Of course, tags like "trallpunk" can be treated like language tags (and yeah, I'm one of probably very few Americans that listen to trallpunk). The problem is not in how to assign weight to that tag, but rather in listing all of the myriad possible tags like it. A Bayesian approach would avoid the need to hard-code a list of tag->country mappings once sufficient training data is available.
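
    As a sketch of the statistic such a model could be built on (not a full Bayesian classifier), the Python below counts how often each tag co-occurs with each known country and normalizes by how common the country is overall. The function name and the input format are invented for the example.

```python
from collections import Counter, defaultdict


def tag_country_lift(training):
    """Estimate how strongly each tag points at each country.

    `training` is a list of (country, tags) pairs for artists whose
    nationality is already known. The result maps tag -> {country: lift},
    where lift = P(country | tag) / P(country). A lift near 1 means the tag
    carries no information about that country.
    """
    country_totals = Counter()
    tag_totals = Counter()
    joint = defaultdict(Counter)
    for country, tags in training:
        country_totals[country] += 1
        for tag in set(tags):            # count each tag once per artist
            tag_totals[tag] += 1
            joint[tag][country] += 1

    n_artists = len(training)
    lift = {}
    for tag, by_country in joint.items():
        lift[tag] = {
            country: (count / tag_totals[tag]) / (country_totals[country] / n_artists)
            for country, count in by_country.items()
        }
    return lift
```

    On a toy training set where "trallpunk" appears only on Swedish artists and "seen live" appears evenly across countries, "trallpunk" gets a lift above 1 for Sweden while "seen live" stays at 1 everywhere, which matches the idea of ignoring tags with no country signal after normalization.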

    Adding custom options is not feasible at this moment, as any settings would apply to your generated graphs, overwriting the previous ones. This means that somebody could come along and look at your library with different options, which would change your graphs. I could hack in a solution by just generating twice as many graphs and storing them with different filenames, but I'm not inclined to do so.

    The real solution to this will come with user accounts, which will let you set your personal overrides for artists, and probably offer a settings page to let you customize your graphs (including the option of whether to include the unknowns). This would let you present your library the way you want it. If another user (or an unregistered user) tried to look at your library, then, that person would see it as you intended. In order to ensure that only you can register settings for your last.fm username, there will be a simple verification procedure that will basically consist of you posting a randomly generated confirmation code to your shoutbox, which the script will then verify.
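
    The verification step could be sketched in Python as follows. Fetching the shoutbox is left out, and the function names and the (author, text) data shape are assumptions for the example. The sketch also checks that the shout was written by the account being claimed, since otherwise anyone could post the code on someone else's shoutbox; that check is an added assumption, not something stated in the post.

```python
import secrets


def new_confirmation_code(n_bytes=8):
    """Return a random, hard-to-guess hex code for the user to post."""
    return secrets.token_hex(n_bytes)


def is_verified(username, expected_code, shouts):
    """Return True if `username` posted the code on their own shoutbox.

    `shouts` is an iterable of (author, text) pairs already fetched from
    the shoutbox of the profile being claimed.
    """
    return any(
        author == username and expected_code in text
        for author, text in shouts
    )
```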

    I won't be able to do any work on it for about 2 weeks, though, as I have final projects and exams. Once I'm done with those, though, a user account system is one of the first things I intend to work on.


    Edited by darkshadow88 on 26 Apr 2010, 2:57
    • Bloopy said:
    • Forum Moderator
    • 26 Apr 2010, 2:53
    You might like to post about your script in the Last.fm nations group.

    Your future feature idea about overriding the nationality of an artist sounds great.

    Also, although I wouldn't use it, a few people in that group requested the option to recognize smaller regions within countries, for example:
    - separated from
    - & separated from
    - separated from /
    - and separated from

  • This tool looks great so far. Works fine for me, and the problem of language tags overriding actual nationality seems to have disappeared for me.

    Can't wait for your planned overriding nationality feature. Would make this less of a pain.

    Choro Club Feat. Senoo - Japanese, not Brazilian.

    • viv5552 said:
    • User
    • 1 May 2010, 20:34
    Awesome tool ! ! ! ! !



    .......but.......

    Bands tagged by origin of their lead singer:
    Lacrimosa and Snakeskin are from Switzerland, not Germany;
    Emigrate is from USA, not Germany;
    Leandra is from Germany, not Belarus.

    Bands with same names:
    Citadel - I listened to Russian band, not from USA;
    Devia - Belarusian, not from Finland.

    "Unknown" bands:
    MIND:|:SHREDDER - Ukraine;
    Songs from a tomb - Ukraine.

    • viv5552 said:
    • User
    • 1 May 2010, 20:37
    Also, in the pie chart that I used in my previous message, the parts are hard to see.



  • Works nearly perfect. Imho, it would be better to eliminate the "Unknown" from the pie charts entirely, and have the percentages based only on known artists.



    Some artists/bands in my list, that are "Unknown" :

    -Tango Orchestra of Buenos Aires - Argentina
    -Francisco Canaro Y Su Orquesta Típica - Uruguay
    -Rue du Soleil - Switzerland
    -Adolfo Carabelli - Argentina
    -Tanghetto & Others - Argentina
    -Mike Patton, Ikue Mori, John Zorn - United States
    -Ensemble Sreteniye - Ukraine
    -Azam Ali-Vas - Iran
    -Soledad y Guarani - Argentina
    -Single Cell Orchestra - United States
    -Ned Rothenberg - United States

    • Music_TG said:
    • User
    • 19 jui. 2010, 16h08m






    • pchalk said:
    • User
    • 23 Sep 2010, 15:57
    jeanbclemance said:
    Works nearly perfect. Imho, it would be better to eliminate the "Unknown" from the pie charts entirely, and have the percentages based only on known artists.



    i agree completely. for me i listen to a lot of artists that do collaborations. so u have many where 1 song may be made from two different artists from two different countries. idk if there is a future option to add a "multiple" stat. other than that many of mine are unknown. too many to list. how would i go about helping? tagging the artists?

    mine btw


    • [Deleted user] said:
    • User
    • 4 Dec 2010, 18:49
    Mine got pretty cool ;D except for a few unknown artists



    • zaakjes_ said:
    • User
    • 17 Dec 2010, 20:12

    • [Deleted user] said:
    • User
    • 18 Dec 2010, 2:00

  • When I try to load my profile in your stats page it hangs at page 37, and I never get any pie charts. Could you take a look at that?

    I'll start paying subscription when Last.fm offers this site in Dutch. ;)