Revamping the similar artists formula.


25 Dec 2011, 19:45

As you may have noticed over the last several years, the similar artists for bands/artists have gotten worse and more unreasonable, especially if you keep reading further down each artist's list of top similar artists. The one that has been baffling me for a while is The Offspring, a punk rock band that's not extreme or (though they do have influences and sounds at times). In fact, as of 12/25/2011, these are their top 50 similar artists in order:

Sum 41
Rise Against
Bad Religion
Billy Talent
Green Day

Papa Roach
System of a Down
Red Hot Chili Peppers
Rage Against the Machine

Bloodhound Gang
Limp Bizkit
3 Doors Down
Farben Lehre
Dropkick Murphys

Iron Maiden
Drowning Pool
Pidżama Porno
Bullet for My Valentine
Rob Zombie

Bowling for Soup
Good Charlotte
Guano Apes
Flogging Molly
Tenacious D
Foo Fighters
Three Days Grace

You guys can puke now. The current formula is based on what else fans of The Offspring listen to most often. But seriously, we know this: if one person listens to two bands, those two bands aren't necessarily similar... especially The Offspring to Metallica. The Offspring to Nickelback. The Offspring to AC/DC (seriously...). The Offspring to System of a Down... The Offspring to Disturbed and Godsmack (seriously...?). The Offspring to Slipknot, KoRn, and Limp Bizkit (when has The Offspring ever been nu metal?). And worse yet, The Offspring to Iron Maiden, Rammstein, Bullet For My Valentine, DragonForce, Rob Zombie... are we supposed to believe that The Offspring is a heavy metal band?

Anyone should know that this method isn't the best way to find similar artists. It's not. Simple as that. A topic on the forums discussed this, and some people suggested that similar bands/artists be determined by tag clouds. That could work, but a few believe there could be problems, especially when taking the tag cloud as a whole: a ton of hard rock and alternative bands would end up similar to the one artist (The Offspring in this case).

Actually, there is another way I've been looking at, and it was a waste of several hours.

What if we took the current list and re-sorted it based on just the top 5 tags, taking into account the order of the tags? Of course there are questionable tags like , , and, for some reason, very frequently and (it appears that The Offspring has a huge following in Poland), but for now, let's leave them in (the infamous tag has already been banned from the tag cloud).
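A minimal sketch of this re-sort, assuming each artist's top tags are available as a Python list ordered from most to least weighted. The function names and the bands/tags in the usage example are hypothetical; only The Offspring's tag order comes from this post. "Shared" here means the number of tags that appear in both artists' top 5.

```python
def shared_tags(seed_tags, other_tags, k):
    """How many tags two artists have in common among their top-k tags."""
    return len(set(seed_tags[:k]) & set(other_tags[:k]))


def resort_by_shared_tags(seed_tags, old_order, tags_by_artist, k=5):
    """Re-sort artists (given in the old similarity order) by the number of
    top-k tags they share with the seed artist, most shared first.
    sorted() is stable even with reverse=True, so artists with the same
    count keep their old relative order."""
    return sorted(
        old_order,
        key=lambda artist: shared_tags(seed_tags, tags_by_artist.get(artist, []), k),
        reverse=True,
    )


# The Offspring's top 5 tags, in the order given in the post.
offspring = ["punk rock", "punk", "rock", "alternative rock", "alternative"]

# Made-up bands and tags, only for illustration.
tags_by_artist = {
    "Metal Band A": ["heavy metal", "metal", "hard rock", "rock", "classic rock"],
    "Pop Punk Band B": ["pop punk", "punk rock", "punk", "rock", "alternative"],
    "Punk Band C": ["punk rock", "punk", "rock", "alternative rock", "alternative"],
}
old_order = ["Metal Band A", "Pop Punk Band B", "Punk Band C"]

print(resort_by_shared_tags(offspring, old_order, tags_by_artist))
# ['Punk Band C', 'Pop Punk Band B', 'Metal Band A']
```

Relying on the stable sort means the old ranking automatically acts as the tiebreaker inside each "shared N tags" group.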

This is what I did: I grabbed the top 5 tags for every single band and re-sorted the list by how many of The Offspring's top 5 tags each band/artist shared. Those that shared all 5 tags (in this case, punk rock, punk, rock, alternative rock, alternative) came out on top, followed by those that shared 4, then 3, and so on down to 0. Here's what the top 20 looked like now:

Billy Talent
Alien Ant Farm
Unwritten Law

SHARED ONLY 4 TAGS (missing tag in parentheses)
Sum 41 (had , missing alternative rock)
Bad Religion (had , missing alternative rock)
Green Day (had pop punk, missing alternative rock)
blink-182 (had pop punk, missing alternative rock)
Bloodhound Gang (had , missing punk rock)
Pidżama Porno (had , missing alternative rock)
Bowling For Soup (had pop punk, missing alternative rock)
Good Charlotte (had pop punk, missing alternative rock)
AFI (had emo, missing alternative rock)
SR-71 (had pop punk, missing alternative rock)
Social Distortion (had rockabilly, missing alternative rock)
No Use For a Name (had pop punk, missing alternative rock)
+44 (had pop punk, missing alternative rock)
Strachy Na Lachy (had polish, missing punk)

As you can see now, all of those metal and hard rock bands like Metallica, AC/DC, Nickelback, System of a Down, Disturbed and Godsmack... GONE from the top 20. Actually, they didn't even break the top 30. What was more disturbing, though, was the high number of bands that didn't share a single tag with The Offspring. These are some of the bands you could hear when listening to The Offspring's similar artists radio. Grab a bucket, you're about to puke (Justin Bieber is not there, but still...):

Bullet For My Valentine
In Flames
The Prodigy
Children of Bodom
Sonata Arctica
Hollywood Undead
All That Remains
Blind Guardian
Five Finger Death Punch
Lacuna Coil
As I Lay Dying
Machine Head
Judas Priest
Sonic Syndicate
A Day to Remember
Within Temptation
Fatboy Slim
Alley Life

Take a look, guys... enjoy listening to all that metalcore, symphonic metal, power metal, heavy metal, electronica, nu metal, thrash metal, melodic death metal, and hip-hop stuff. According to the site right now, they are really similar artists to The Offspring (actually, all of them except for DF and BFMV come up as medium to low similarity to The Offspring). But seriously, a band that doesn't share even one of another band/artist's top 5 tags has no reason being on that band's list. And there's literally a common thread among the bands/artists listed above. Most of them are pure metal, to the point that they don't even have a tag (I suspect Priest would get one if classic rock were barred, though). Others, like The Prodigy or Eminem, are from the electronica and hip-hop genres... and really, I can't see a similarity with those bands to begin with.

So there... we managed to filter out the bands that aren't similar... but it's not over yet. Let's now re-sort the list again, but this time only compare the top 4 tags for all artists/bands. The tiebreakers for this sort will be:

1) Number of tags shared when the top 5 tags are compared
2) Original ranking under the old formula (higher goes first)
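One way to express this second pass is a single sort with a composite key, sketched below under the same assumptions as before (tag lists ordered most-weighted first; function names and example bands/tags are hypothetical). Counts are negated so that more shared tags sort first, and the old list position breaks any remaining tie.

```python
def shared_tags(seed_tags, other_tags, k):
    """How many tags two artists have in common among their top-k tags."""
    return len(set(seed_tags[:k]) & set(other_tags[:k]))


def resort_top4(seed_tags, old_order, tags_by_artist):
    """Sort by tags shared among the top 4; break ties by tags shared among
    the top 5, then by position under the old formula (earlier = better)."""
    def sort_key(item):
        old_index, artist = item
        tags = tags_by_artist.get(artist, [])
        # Negate the counts so that more shared tags sort first.
        return (-shared_tags(seed_tags, tags, 4),
                -shared_tags(seed_tags, tags, 5),
                old_index)

    return [artist for _, artist in sorted(enumerate(old_order), key=sort_key)]


offspring = ["punk rock", "punk", "rock", "alternative rock", "alternative"]

# Made-up bands and tags, only for illustration.
tags_by_artist = {
    "Band P": ["pop punk", "punk rock", "punk", "rock", "alternative"],
    "Band Q": ["rock", "alternative rock", "alternative", "punk rock", "punk"],
    "Band R": ["punk rock", "punk", "rock", "alternative rock", "hardcore"],
}
old_order = ["Band P", "Band Q", "Band R"]

print(resort_top4(offspring, old_order, tags_by_artist))
# ['Band R', 'Band Q', 'Band P']
```

Band P and Band Q both share 3 of the top 4 tags; Band Q wins the tie because it shares all 5 of the top 5, even though Band P ranked higher under the old formula.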

After the sort, the top 20 looks even more different:

Billy Talent
Alien Ant Farm

Unwritten Law
Sum 41
Bad Religion
Green Day

Bloodhound Gang
Bowling For Soup
Good Charlotte
Social Distortion

No Use For a Name
The Distillers
Autopilot Off

Well, for the most part, the ordering has changed here. Some other bands have now moved into the top 20, since the focus is now on the top 4 tags.

Last but not least, I re-sorted once more, this time comparing only the top 3 tags for all artists/bands. The two tiebreakers from before still apply, plus a new one that takes precedence over them: the number of tags shared when the top 4 tags are compared. In other words, artists are ranked by tags shared among the top 3, then among the top 4, then among the top 5, and finally by their original ranking. And here is the final product:

#1 Killradio
#2 Billy Talent
#3 Unwritten Law
#4 Sum 41
#5 Green Day
#6 AFI
#7 Social Distortion
#8 Ramones
#9 Pezz
#10 Bad Religion
#11 blink-182
#12 Bowling For Soup
#13 Good Charlotte
#14 No Use For a Name
#15 +44
#16 Sugarcult
#17 The Distillers
#18 Autopilot Off
#19 Transplants
#20 Pennywise
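If each pass is a stable sort of the previous pass's output (an assumption about how the passes were run), then sorting by the top 5, then the top 4, then the top 3 gives the same result as one sort whose key is "shared in top 3, then top 4, then top 5, then old position". The sketch below checks that equivalence; function names and bands/tags are hypothetical.

```python
def shared_tags(seed_tags, other_tags, k):
    """How many tags two artists have in common among their top-k tags."""
    return len(set(seed_tags[:k]) & set(other_tags[:k]))


def successive_sorts(seed_tags, old_order, tags_by_artist, passes=(5, 4, 3)):
    """One stable re-sort per pass; the last pass has the most say."""
    order = list(old_order)
    for k in passes:
        order.sort(
            key=lambda artist: shared_tags(seed_tags, tags_by_artist.get(artist, []), k),
            reverse=True,
        )
    return order


def single_sort(seed_tags, old_order, tags_by_artist, ks=(3, 4, 5)):
    """The same ranking as one sort with a composite key:
    shared in top 3, then top 4, then top 5, then old position."""
    def sort_key(item):
        old_index, artist = item
        tags = tags_by_artist.get(artist, [])
        return tuple(-shared_tags(seed_tags, tags, k) for k in ks) + (old_index,)

    return [artist for _, artist in sorted(enumerate(old_order), key=sort_key)]


offspring = ["punk rock", "punk", "rock", "alternative rock", "alternative"]

# Made-up bands and tags, only for illustration.
tags_by_artist = {
    "Band P": ["pop punk", "punk rock", "punk", "rock", "alternative"],
    "Band Q": ["rock", "alternative rock", "alternative", "punk rock", "punk"],
    "Band R": ["punk rock", "punk", "rock", "alternative rock", "hardcore"],
    "Band S": ["hard rock", "rock", "punk rock", "metal", "alternative"],
}
old_order = ["Band Q", "Band S", "Band P", "Band R"]

assert successive_sorts(offspring, old_order, tags_by_artist) == single_sort(
    offspring, old_order, tags_by_artist)
print(single_sort(offspring, old_order, tags_by_artist))
# ['Band R', 'Band P', 'Band S', 'Band Q']
```

The single-key version is easier to extend, since adding or reordering criteria only means changing the `ks` tuple.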

Now where did Hoobastank and Alien Ant Farm go? Even though they shared all 5 tags with The Offspring, both bands had those tags in a different order (rock, alternative rock, alternative, punk rock, punk), while The Offspring's order was (punk rock, punk, rock, alternative rock, alternative). As a result, these bands shared only 1 tag among the top 3 and ended up below those that had at least 2 of The Offspring's top 3 tags (punk rock, punk, rock).
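The arithmetic behind this drop can be checked directly with the two tag orders stated above. The helper name is hypothetical; it lists which of the seed artist's top-k tags also appear in the other artist's top k.

```python
def shared_in_top_k(seed_tags, other_tags, k):
    """Tags from the seed's top k that also appear in the other artist's top k."""
    return [tag for tag in seed_tags[:k] if tag in other_tags[:k]]


# Tag orders exactly as listed in the post.
offspring = ["punk rock", "punk", "rock", "alternative rock", "alternative"]
hoobastank = ["rock", "alternative rock", "alternative", "punk rock", "punk"]

for k in (5, 4, 3):
    shared = shared_in_top_k(offspring, hoobastank, k)
    print(k, len(shared), shared)
# 5 5 ['punk rock', 'punk', 'rock', 'alternative rock', 'alternative']
# 4 3 ['punk rock', 'rock', 'alternative rock']
# 3 1 ['rock']
```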

As for some of those disputed bands, Nickelback ended up coming in at #69. That's much lower on the list, but still quite high, since they did share 1 tag among the top 3 (rock). Also, remember that we are re-ranking the top 250 bands/artists, and that we are already dealing with a good 45 bands that have nothing in common with The Offspring. I do not know whether the site takes the top 250 bands and cuts off there, or works out some weird formula, but perhaps it should apply the current formula to get a larger batch of artists, apply the above formula, and then just keep the top 250 artists... honestly, 250 might be too many; 200, perhaps? You could see a real shake-up on many similar artists lists.
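The "bigger batch first, cut afterwards" idea can be sketched as re-ranking a larger candidate pool and only then truncating. The pool size, `keep` value, function name, and bands/tags below are all hypothetical; the ranking key follows the top 3 / top 4 / top 5 / old position order described earlier.

```python
def shared_tags(seed_tags, other_tags, k):
    """How many tags two artists have in common among their top-k tags."""
    return len(set(seed_tags[:k]) & set(other_tags[:k]))


def rerank_and_truncate(seed_tags, candidate_pool, tags_by_artist, keep=250, ks=(3, 4, 5)):
    """Re-rank a candidate pool (given in old-formula order) by tags shared in
    the top 3, then top 4, then top 5, then old position; keep the first `keep`."""
    def sort_key(item):
        old_index, artist = item
        tags = tags_by_artist.get(artist, [])
        return tuple(-shared_tags(seed_tags, tags, k) for k in ks) + (old_index,)

    ranked = [artist for _, artist in sorted(enumerate(candidate_pool), key=sort_key)]
    return ranked[:keep]


offspring = ["punk rock", "punk", "rock", "alternative rock", "alternative"]

# Made-up bands and tags, only for illustration.
tags_by_artist = {
    "Metal Band A": ["heavy metal", "metal", "hard rock", "rock", "classic rock"],
    "Metal Band B": ["thrash metal", "metal", "heavy metal", "speed metal", "hard rock"],
    "Punk Band C": ["punk rock", "punk", "hardcore punk", "rock", "ska punk"],
}
pool = ["Metal Band A", "Metal Band B", "Punk Band C"]  # old-formula order

print(pool[:2])  # cutting the old list first: ['Metal Band A', 'Metal Band B']
print(rerank_and_truncate(offspring, pool, tags_by_artist, keep=2))
# ['Punk Band C', 'Metal Band A']
```

With a cut-first approach, Punk Band C would never have been considered; re-ranking the bigger pool lets it in.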

The lists for many mainstream bands like Metallica, Radiohead, and Linkin Park could see a big shake-up too. There are still a few other unresolved issues, like: will it actually pick up only truly similar artists? Red Hot Chili Peppers, for example, has rock, alternative, and alternative rock as its top tags. Didn't some of the bands listed above have those 3 tags as well? Are they really similar? But perhaps it could cut bands that aren't alternative to begin with, like AC/DC, Black Sabbath, etc.

Another issue is potential abuse of tags. I'm talking about a bunch of wackos attempting to give something like Justin Bieber the now infamous tag. (Honestly, though, 1 tag isn't really going to be enough, and even identifying a brutal death metal band that a Justin Bieber fan listens to is going to be really difficult... especially if it doesn't share a pop tag or something.) Moderators can assist with issues like that.

Also, the removal of tags could have an effect on similar artists. If we remove tags like 70s, 80s, swedish, female vocalists, and all those others, can we guarantee that the resulting top tags will make things better rather than worse? What about tags for the same thing? (Alternative is not the same as alternative rock, but what about hip-hop/hip hop/hiphop? Nu metal/nu-metal? Alt-rock/alternative rock?) Do we merge these tags into one? Or do these duplicates exist because there's no other major tag to classify them as? What about vague tags like "classic rock"? Abused tags like "alternative" and "indie"? And bogus tags like "", "", ""? With cases like this, it's all trial and error, but it's really something that can be done on the go, with the help of user feedback.
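Banning and merging tags would happen before taking each artist's top 5, so lower tags move up when a banned one is dropped. Below is a sketch of such a cleaning step; the ban list and alias table are placeholders (deciding their contents is exactly the open question above), and `clean_tags` is a hypothetical name.

```python
# Hypothetical ban list and alias table: which tags to ban or merge is exactly
# the open question in the post, so these entries are placeholders.
BANNED_TAGS = {"70s", "80s", "swedish", "female vocalists"}
TAG_ALIASES = {
    "hip hop": "hip-hop",
    "hiphop": "hip-hop",
    "nu-metal": "nu metal",
    "alt-rock": "alternative rock",
    # "alternative" is deliberately NOT merged into "alternative rock".
}


def clean_tags(tags):
    """Normalize an artist's tag list (most-weighted first) before taking the
    top k: lower-case, merge spellings, drop banned tags and duplicates."""
    cleaned = []
    for raw in tags:
        tag = raw.strip().lower()
        tag = TAG_ALIASES.get(tag, tag)
        if tag in BANNED_TAGS or tag in cleaned:
            continue  # banned, or an alias of a tag we already kept
        cleaned.append(tag)
    return cleaned


print(clean_tags(["Hip Hop", "hiphop", "80s", "rap", "Alt-Rock"]))
# ['hip-hop', 'rap', 'alternative rock']
```

Keeping the first occurrence of a merged tag preserves the original weighting order, which matters because the re-sort compares tag positions.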

There is one more thing I thought of regarding my suggestion. Perhaps we could start by sorting on the top tag for each artist, then on the top 2 tags, then the top 3 tags, and so on up to the top 50 tags or so (and if an artist doesn't have that many tags, TOO BAD :P). But that's something that would take a while to look into.
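One reading of this idea, extending the precedence used in the final sort above (fewer tags compared means higher precedence), is a key that compares shared tags at every depth from 1 up to some maximum. The sketch below assumes that reading; `max_k=50`, the function name, and the bands/tags are hypothetical.

```python
def shared_tags(seed_tags, other_tags, k):
    """How many tags two artists have in common among their top-k tags."""
    return len(set(seed_tags[:k]) & set(other_tags[:k]))


def rank_by_all_depths(seed_tags, old_order, tags_by_artist, max_k=50):
    """Rank by tags shared in the top 1, then top 2, ..., up to top max_k,
    with old-formula position as the last tiebreaker. Artists with fewer
    than k tags simply can't share more than they have."""
    def sort_key(item):
        old_index, artist = item
        tags = tags_by_artist.get(artist, [])
        depths = tuple(-shared_tags(seed_tags, tags, k) for k in range(1, max_k + 1))
        return depths + (old_index,)

    return [artist for _, artist in sorted(enumerate(old_order), key=sort_key)]


offspring = ["punk rock", "punk", "rock", "alternative rock", "alternative"]

# Made-up bands and tags, only for illustration.
tags_by_artist = {
    "Band X": ["punk rock", "ska", "reggae"],
    "Band Y": ["punk", "punk rock", "rock", "alternative rock", "alternative"],
}
print(rank_by_all_depths(offspring, ["Band Y", "Band X"], tags_by_artist))
# ['Band X', 'Band Y']
```

Note the trade-off this ordering creates: Band Y shares all 5 top tags but still loses to Band X, because the top-1 comparison is decided first. Weighting the depths differently would be a separate design choice.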

Who knows, perhaps someone might create a widget or tool based on this suggestion. But it is clearly an option to use instead of the current formula, which guarantees a ton of similar artists that have nothing to do with each other.


  • maplejet

    DAnEmpire, I'm currently running a program on Soilwork and FFDP, which I will discuss in my next journal entry. Apparently, FFDP only shares 2 tags with Soilwork (I have banned some tags that refer to countries, decades, and nonsense like awesome and overrated). On Soilwork's similar artists list, based on the new formula and the program I wrote, they would come in at #129 among the 250 bands listed... much farther down the list. Here's how the top 20 came out: #1 Cipher System #2 Disarmonia Mundi #3 In Flames #4 Dark Tranquillity #5 Nightrage #6 Mors Principium Est #7 At The Gates #8 The Duskfall #9 Gardenian #10 MyGrain #11 Dark Age #12 Norther #13 Insomnium #14 Diablo #15 Naildown #16 Arch Enemy #17 Kalmah #18 Omnium Gatherum #19 Children Of Bodom #20 Degradead

    2 Jan 2012, 03:51