Rolling Garfield Ninja Turtles
Thursday, October 26th, 2006
So, a friend pointed out today that when you search for “Rolling Stones” on Google, you may not get the results you were expecting:
Apparently, to Google, Garfield.com is the canonical source for information on the Rolling Stones. Also, Garfield.com and NinjaTurtles.com are apparently the same site:
Hmm, I wonder what the top result is for “Ninja Turtles”. Let’s see:
Yup, that Garfield cat sure is popular. For completeness’ sake, I’ll point out now that searches for both “Teenage Mutant Ninja Turtles” and even “Ninja Stones” also show Garfield.com as the top hit. If you find more, let me know. In the meantime, I’m really curious about why this is happening, so let’s try to figure it out.
So, why does Google think Garfield.com and NinjaTurtles.com are the same site? Well, might they be hosted on the same machine?
    $ host www.ninjaturtles.com
    www.ninjaturtles.com has address 216.204.128.19
    $ host www.garfield.com
    www.garfield.com is an alias for ucomics.com.
    ucomics.com has address 198.247.208.81
    ucomics.com has address 198.247.208.121
    ucomics.com has address 198.247.208.122
    ucomics.com has address 198.247.208.123
    ucomics.com has address 63.208.55.161
    ucomics.com has address 63.208.55.201
    ucomics.com has address 63.208.55.202
    ucomics.com has address 63.208.55.203
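Eyeballing two lists of addresses works, but the same check is easy to script. The sketch below is illustrative, not the method used here: it parses `host`-style output (the strings are the 2006 lookups quoted above, so live DNS will differ) and reports any address the two names share. The names `addresses` and `shared_addresses` are hypothetical helpers.

```python
import re

# `host` output as quoted above (October 2006); today's answers will differ.
NINJA_TURTLES = "www.ninjaturtles.com has address 216.204.128.19"
GARFIELD = """\
www.garfield.com is an alias for ucomics.com.
ucomics.com has address 198.247.208.81
ucomics.com has address 198.247.208.121
ucomics.com has address 198.247.208.122
ucomics.com has address 198.247.208.123
ucomics.com has address 63.208.55.161
ucomics.com has address 63.208.55.201
ucomics.com has address 63.208.55.202
ucomics.com has address 63.208.55.203
"""

# Captures the dotted-quad IPv4 address that follows "has address".
ADDRESS = re.compile(r"has address (\d{1,3}(?:\.\d{1,3}){3})")


def addresses(host_output: str) -> set[str]:
    """Collect every IPv4 address listed in `host`-style output."""
    return set(ADDRESS.findall(host_output))


def shared_addresses(a: str, b: str) -> set[str]:
    """Addresses that appear in both lookups (empty set if none)."""
    return addresses(a) & addresses(b)


print(sorted(addresses(GARFIELD)))
print(shared_addresses(NINJA_TURTLES, GARFIELD))  # set()
```

An empty intersection only says the public addresses differ; it can't rule out the two sites sharing something further back, but it does rule out the simplest explanation.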
Nope, the two names resolve to completely different addresses, and neither has anything to do with the Rolling Stones. Okay, just for fun, let’s look at the robots.txt file on Garfield.com:
    # robots.txt for http://garftest.uclick.com/
    User-agent: *
    Dissallow: /
That’s weird. First of all, they misspelled “disallow” as “dissallow”. A quick check of other sites with a misspelled disallow in their robots.txt files shows that Google ignores the misspelled directive, so the file should have no effect on indexing.
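To see why a typo makes the whole file a no-op, here is a small sketch using Python’s standard-library `urllib.robotparser` as a stand-in. It is not Googlebot’s parser, but it follows the same rule observed above: lines whose key it doesn’t recognize, such as `Dissallow`, are skipped. The helper `may_crawl` and the default agent string are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The Garfield.com robots.txt quoted above, misspelling included.
MISSPELLED = """\
# robots.txt for http://garftest.uclick.com/
User-agent: *
Dissallow: /
"""

# The same file with the directive spelled correctly, for comparison.
CORRECTED = MISSPELLED.replace("Dissallow", "Disallow")


def may_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; unrecognized keys like
    # "Dissallow" are silently ignored, leaving no rules at all.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(may_crawl(MISSPELLED, "http://www.garfield.com/"))  # True
print(may_crawl(CORRECTED, "http://www.garfield.com/"))   # False
```

With the typo, the `User-agent: *` group ends up with no rules, so everything is allowed; spelled correctly, the same file would block the entire site.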
Maybe Google is using a hash of the domain name to identify identical sites? Hashing various permutations of the two domain names, with and without punctuation, under 16 common algorithms (MD5, RIPEMD-160, Tiger, SHA-1, SHA-256, Adler32, CRC32, etc.), I didn’t find any collisions. That doesn’t rule out a collision under some other (say, homegrown) hashing algorithm, but I’ll move on for now.
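A collision hunt like this is only a few lines of code. The sketch below is an approximation, not the exact set of 16 algorithms or permutations described above: it uses ten hashes Python ships with (RIPEMD-160 and Tiger are left out because `hashlib` may not provide them), and `variants` is a hypothetical choice of spellings (with and without “www”, the TLD, dots, and in upper and lower case).

```python
import hashlib
import zlib
from itertools import product

# Cryptographic hashes from hashlib; the default argument pins each name.
HASHES = {
    name: (lambda data, n=name: hashlib.new(n, data).hexdigest())
    for name in ("md5", "sha1", "sha224", "sha256", "sha384",
                 "sha512", "blake2b", "blake2s")
}
# Checksums from zlib, formatted as 8 hex digits.
HASHES["crc32"] = lambda data: format(zlib.crc32(data), "08x")
HASHES["adler32"] = lambda data: format(zlib.adler32(data), "08x")


def variants(domain: str) -> set[str]:
    """Hypothetical spellings of a domain to hash."""
    bare = domain.removeprefix("www.")
    forms = {bare, "www." + bare, bare.split(".")[0]}
    forms |= {f.replace(".", "") for f in forms}  # without punctuation
    forms |= {f.upper() for f in forms}           # upper-case copies
    return forms


def collisions(a: str, b: str) -> list[tuple[str, str, str]]:
    """Every (algorithm, spelling of a, spelling of b) with equal digests."""
    found = []
    for (algo, fn), x, y in product(HASHES.items(), variants(a), variants(b)):
        if fn(x.encode()) == fn(y.encode()):
            found.append((algo, x, y))
    return found


print(collisions("www.garfield.com", "www.ninjaturtles.com"))
```

For these spellings it should print an empty list, in line with the result above; passing the same domain twice is a quick sanity check that the comparison fires at all.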
Next, scanning the results returned for Garfield on various search engines, I don’t see any evidence of Googlebombing the terms “Garfield”, “Rolling Stones”, or “Teenage Mutant Ninja Turtles”.
So, I give up. Maybe Garfield’s prominence in Google results is an homage to Eugene Garfield, a citation analysis pioneer cited in Larry and Sergey’s PageRank paper from 1998.
Okay, that’s a stretch, but I’m stumped. Anyone else have any ideas?