UI Patterns: Wet Lazy Susan

June 3rd, 2007

I ran across a couple of variations on the “wet floor” effect today which both introduce a rotating set of images that can be moused over for more information. For lack of a better term, I’ll call this the “Wet Lazy Susan” effect. Read the rest of this entry »

Microsoft Web

March 31st, 2007

Here’s a snippet from Microsoft’s current corporate home page:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en" dir="ltr">
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
<title>Microsoft Corporation</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="SearchTitle" content="Microsoft.com">
<meta name="SearchDescription" content="Microsoft.com Homepage">

Download this code: /code/microsoft.txt

Can you spot the problem?

Update: James Booker is the first in with an answer: there are two Content-Type meta tags above, both of which specify different character sets.

Specifying the content-type in meta tags is a bit of a hack, as the browser has to seek through the first section of the document looking for a content-type declaration, then try reinterpreting the page with the character set the page specifies. Specifying a character set of "utf-16" doesn't make any sense in this scenario, as the browser is going to try the sniffing by interpreting the HTML as ASCII. If the page were actually UTF-16, this wouldn't work, as the representation for the string "Content-type" in UTF-16 isn't identical to its representation in UTF-8, as we can see in a Python shell:
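>>> "Content-Type".encode("utf-8")
b'Content-Type'
>>> "Content-Type".encode("utf-16")
b'\xff\xfeC\x00o\x00n\x00t\x00e\x00n\x00t\x00-\x00T\x00y\x00p\x00e\x00'

The UTF-16 encoding starts with a byte order mark and puts a NUL byte after every ASCII character, so a parser scanning the raw bytes for the ASCII string "Content-Type" would never find the declaration.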

Thankfully, the HTML spec foresaw this problem:

The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element.

So, there are actually three problems with the above HTML. There are two content-type declarations in meta tags, one of them is bogus, and the correct one isn't as early as is possible in the head element. These problems, thankfully, are mitigated by the presence of an HTTP header that specifies the correct character set, and by the incredible amount of effort browser vendors have put into making their code accepting of mistakes such as these.
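For comparison, a cleaned-up version of that head section would declare the character set exactly once, right after the opening head tag, something like:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Microsoft Corporation</title>
<meta name="SearchTitle" content="Microsoft.com">
<meta name="SearchDescription" content="Microsoft.com Homepage">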

robots.txt Adventure

March 12th, 2007

Introduction.txt

Last October I got bored and set my spider loose on the robots.txt files of the world. Having had a good deal of positive feedback on my HTTP Headers survey, I decided to poke around in robots.txt files and see what sorts of interesting things I could find.

Since then, I’ve taken 6 weeks of vacation and gotten to be very busy at work, so I’m just now getting around to analyzing all the data I gathered. These are some of the results of that analysis. Read the rest of this entry »

New Zealand Trip 2006

February 9th, 2007

Introduction

In mid-2006, I decided that I wanted to use up some of my long-accrued vacation, and started planning a 5 week trip to New Zealand. I’d wanted to go to New Zealand for a couple of years, after having seen photos of the country from my friend Josh’s ’round the world trip.

Typically, for me, planning a trip involves booking transportation, getting a rough idea of what I want to do, and figuring things out when I get there, a tendency which annoys some of those close to me to no end. I didn’t want New Zealand to be any different, so I booked airfare and investigated my in-country transportation options pretty carefully. I finally settled on renting a campervan, so that I’d be able to set my own pace for exploring the country. That settled, I booked the only activity I really had to book in advance, a hut ticket for the Milford Track, which is reputed to be the “finest walk in the world.” With that out of the way, I geared up for the trip, buying lots of photography and hiking gear, and prepared myself for my first vacation outside of North America. Read the rest of this entry »

A Peek Inside

January 27th, 2007

The Shomstoppers present: Jolly Boots of Doom.

Undergraduate Computer Science Reading List

December 28th, 2006

My friend and coworker Buzz Andersen posited a while ago that there exists a programming canon, a body of literature with which software developers and computer scientists are expected to be familiar.

Having interviewed a lot of people (and I mean a lot) and been interviewed myself a few times, Buzz’s point resonated with me. I’ve noticed that many recent university graduates have not been exposed to material that is considered fundamental to computer science and programming by many in the industry. A lot of schools have sacrificed the teaching of this material in favor of more IT development oriented courses, and really done their students a disservice in the process.

Since I’d rather see more kick-ass programmers and computer scientists than fewer, here’s my list of books that I think should be on the bookshelves of all computer science undergraduates. These books are, of course, my own favorites, and they exclude a lot of the books I’d include in a list of graduate-level books. Feel free to chime in in the comments if you think I’ve missed something important. Read the rest of this entry »

Rolling Garfield Ninja Turtles

October 26th, 2006

So, a friend pointed out today that when you search for “Rolling Stones” on Google, you may not get the results you were expecting:

Search for Rolling Stones on Google

Apparently, to Google, Garfield.com is the canonical source for information on the Rolling Stones. Also, Garfield.com and NinjaTurtles.com are apparently the same site:

TMNT and Garfield, they both like fake Italian food.

Hmm, I wonder what the top result is for “Ninja Turtles”. Let’s see:

Teenage Mutant Ninja Garfield

Yup, that Garfield cat sure is popular. For completeness’ sake, I’ll point out now that searches for both “Teenage Mutant Ninja Turtles” and even “Ninja Stones” also show Garfield.com as the top hit. If you find more, let me know. In the meantime, I’m really curious as to why this is happening, so let’s try to figure that out.

So, why does Google think Garfield.com and NinjaTurtles.com are the same site? Well, might they be hosted on the same machine?

$ host www.ninjaturtles.com
www.ninjaturtles.com has address 216.204.128.19
$ host www.garfield.com
www.garfield.com is an alias for ucomics.com.
ucomics.com has address 198.247.208.81
ucomics.com has address 198.247.208.121
ucomics.com has address 198.247.208.122
ucomics.com has address 198.247.208.123
ucomics.com has address 63.208.55.161
ucomics.com has address 63.208.55.201
ucomics.com has address 63.208.55.202
ucomics.com has address 63.208.55.203

Nope. Neither is anything about the Rolling Stones. Okay, just for fun, let’s look at the robots.txt file on Garfield.com:

# robots.txt for http://garftest.uclick.com/

User-agent: *
Dissallow: /

That’s weird. First of all, they misspelled disallow as “dissallow”. A quick check of other sites with disallow misspelled in their robots.txt files shows that Google ignores the misspelled directive. So, the file should have no effect on indexing.
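For reference, a correctly spelled file that actually blocks all well-behaved crawlers is just:

User-agent: *
Disallow: /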

Maybe Google is using a hash of the domain name to identify identical sites? I tried 16 common hashing algorithms (MD5, RIPEMD-160, Tiger, SHA-1, SHA-256, Adler32, CRC32, etc.) on various permutations of the domain names, with and without punctuation, and didn’t find any collisions. That doesn’t rule out hash collisions using different (say, homegrown) hashing algorithms, but I’ll move on for now.
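A sketch of that kind of check, using Python’s hashlib and zlib (standard algorithms only, with a few illustrative domain permutations; homegrown hashes obviously can’t be enumerated this way):

import hashlib
import zlib
from collections import defaultdict

domains = ["garfield.com", "ninjaturtles.com", "rollingstones.com"]

def variants(domain):
    """A few obvious permutations: with/without www, punctuation, and case."""
    bare = domain.replace(".", "")
    for base in (domain, "www." + domain, bare):
        yield base
        yield base.upper()

def digests(text):
    data = text.encode("ascii")
    out = {}
    for algo in ("md5", "sha1", "sha256", "ripemd160"):
        try:
            out[algo] = hashlib.new(algo, data).hexdigest()
        except ValueError:  # algorithm not available in this build
            pass
    out["crc32"] = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    out["adler32"] = format(zlib.adler32(data) & 0xFFFFFFFF, "08x")
    return out

owners = defaultdict(set)  # (algorithm, digest) -> domains that produce it
for domain in domains:
    for v in variants(domain):
        for algo, digest in digests(v).items():
            owners[(algo, digest)].add(domain)

collisions = {key: sites for key, sites in owners.items() if len(sites) > 1}
print(collisions if collisions else "No collisions found.")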

Next, scanning the results returned for Garfield on various search engines, I don’t see any evidence of Googlebombing the terms “Garfield”, “Rolling Stones”, or “Teenage Mutant Ninja Turtles”.

So, I give up. Maybe Garfield’s prominence in Google results is an homage to Eugene Garfield, a citation analysis pioneer cited in Larry and Sergey’s PageRank paper from 1998.

Okay, that’s a stretch, but I’m stumped. Anyone else have any ideas?

Fun With HTTP Headers

August 7th, 2005

Introduction

Like any good web developer, I have a tendency to poke around at people’s web sites to see if I can figure out how they’re implemented. After poking at enough sites, I started noticing that people were putting some weird and interesting stuff in their HTTP headers. So, a couple of weeks ago, I decided to actually go out and see what I could find by scrounging around in HTTP headers in the wild. A header safari, if you will. These are the results of my hunt.

Headers?

HTTP is the protocol used to transmit data on what we know as “the web”. At the beginning of every server response on the web, there’s a bit of text like:

HTTP/1.1 200 OK
Content-Type: text/html
Connection: close

The top line specifies the protocol version of HTTP and a response code (200 in this case) used to indicate the outcome of a request. Following that are a bunch of lines that should consist of a field name (like “Connection”), followed by a colon, and then followed by a value (like “close” or “keep-alive”). These lines of text are the HTTP response headers. Immediately after the headers is a blank line, followed by the content of the message, such as the text of a web page or the data of an image file.

Technical Mumbo Jumbo

Want to examine the headers of a site for yourself? Try curl:

curl -i http://www.nextthing.org/

In the output of the above command, the first few lines are the headers, then there are a couple of line breaks, and then the body. If you just want to see the headers, and not the body, use the -I option instead of -i. Be forewarned, however, that some servers return different headers in this case, as curl will be requesting the data using a HEAD request rather than a GET request.

What I did to gather all of these headers was very similar. First, I downloaded an RDF dump of the Open Directory Project’s directory, and pulled out every URL from that file. Then, I stuck all of the domain names of these URLs in a big database. A simple multithreaded Python script downloaded the index pages of these URLs using PycURL and stuck the headers and page contents in a database. When that was done, I had a database with 2,686,155 page responses and 23,699,737 response headers. The actual downloading of all of this took about a week.
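The fetching step was conceptually something like the sketch below: PycURL doing the transfers, with a thread pool providing the concurrency. The URLs here are placeholders, and the ODP parsing, database inserts, and real error handling are left out.

import pycurl
from io import BytesIO
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Fetch one page, returning (header_lines, body_bytes)."""
    headers = []
    body = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.FOLLOWLOCATION, True)
    c.setopt(pycurl.CONNECTTIMEOUT, 10)
    c.setopt(pycurl.TIMEOUT, 30)
    c.setopt(pycurl.HEADERFUNCTION,
             lambda line: headers.append(line.decode("iso-8859-1").rstrip()))
    c.setopt(pycurl.WRITEDATA, body)
    try:
        c.perform()
    except pycurl.error:
        return [], b""  # dead site, DNS failure, timeout, etc.
    finally:
        c.close()
    return headers, body.getvalue()

urls = ["http://www.example.com/", "http://www.example.org/"]  # really: every ODP URL
with ThreadPoolExecutor(max_workers=50) as pool:
    for url, (headers, body) in zip(urls, pool.map(fetch, urls)):
        print(url, headers[0] if headers else "(no response)")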

This is, of course, not anywhere near a comprehensive survey of the web. Netcraft received responses from 70,392,567 sites in its August 2005 web survey, so I hit around 3.8% of them. Not bad, but I’m sure there’s a lot of interesting stuff I’m missing.

Obligatory Mention of Long Tail

First of all, yes, HTTP headers form something like a long tail:

Graph of log(frequency) over rank of headers

In particular, hapax legomena (one-offs) make up over half of the headers found. I expected this. Unfortunately for me, however, a lot of the really interesting stuff is over on that long flat section of the long tail. Which means I spent a lot of time poring over one-offs looking for interesting stuff. Weee.

It’s a good thing I’m easily amused.

Off with Her Headers

I found 891 instances of:

X-Pad-For-Netscrape-Bug: 0123456789

Which brought back memories of the days when Netscape was reviled by developers the world ’round, and had not yet achieved its ultimate (albeit posthumous) glory with Firefox. It’s nice to know comments by frustrated engineers have such a long half-life on the Internet. There are a few variants on this header:

X-Pad: avoid browser bug
XX-Pad: Padding
aheader: WOULDN'T YOU LIKE TO KNOW!
X-BrowserAlignment: problem

Similarly, people are still blocking Microsoft’s Dumb Tags:

X-MS-Smart-Tags: We have nothing to do with them.
X-Meta-MSSmartTagsPreventParsing: TRUE

Speaking of Microsoft, apparently the IIS team felt the need to advertise the domain of the site the user was accessing in every page request:

Server: Microsoft-IIS/5.0
jvc.org: jvc.org

How completely and utterly unnecessary…

They’re not the only ones, though. WebObjects powered sites spit out:

HTTP/1.1 200 Apple

Go team!

This cute header is courtesy of Caudium, a webserver written partially in Pike:

X-Got-Fish: Yes

The webmaster of www.kfki.hu should be commended for being on the bleeding edge, both using Caudium and including lots of Dublin Core metadata in the headers. Although, 32 headers seems a bit much, which is why I’m not going to show them all:

DC.Subject: physics
DC.Type: organizational homepage
SCHEMA.DCTERMS: http://purl.org/dc/terms/
X-Got-Fish: Yes

Contrary to popular belief, there are people out there using Smalltalk on the web. Two of them. One Smalltalk software company running a web server written in Smalltalk, and another:

Server: Swazoo 0.9 (Columbus)
X-WIKI-ENGINE: SmallWiki 1.0
CACHE-CONTROL: no-cache
X-WIKI-COPYRIGHT: Software Composition Group, University of Berne, 2003

running a Smalltalk user’s group web site with a wiki written in Smalltalk on a web server written in, you guessed it: Smalltalk. Cool.

And, of course, it wouldn’t be the Internet without an appearance by a BOFH:

X-BOFH: http://www.xxxxx.de/bofh/xxxxxx.html

The actual URL it points to has been obscured to protect the guilty, and a local mirror provided in its stead.

Missed Cneonctions

This header:

Cneonction: close

and its variant:

nnCoection: close

were two of the headers which first spurred my interest in HTTP headers.

imdb.com, amazon.com, gamespy.com, and google.com have all at various times used these or similar misspellings of Connection, and I’m not by any means the first to have noticed. My first thought was that this was just a typo. After more consideration, however, I now believe this is something done by a hackish hardware load balancer trying to “remove” the Connection: close header when proxying for an internal server. That way, the connection can be held open and images can be transmitted through the same TCP connection, while the backend web server doesn’t need to be modified at all. It just closes the connection and moves on to the next request. Ex-coworker and Mudd alumnus jra has a similar analysis.

Another data point which would back this up is the Oracle9iAS Web Cache rewriting:

Connection: close

as

yyyyyyyyyy: close
Connection: Keep-Alive

Headers with “X-Cnection: close” appear to be the result of a similar trick.
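A quick check in a Python shell is consistent with that theory: the scrambled names are anagrams of “Connection”, and the padded ones are exactly the same length, which is what you’d expect from a rewrite that isn’t allowed to shift any byte offsets:

>>> sorted("Cneonction") == sorted("Connection") == sorted("nnCoection")
True
>>> len("yyyyyyyyyy") == len("X-Cnection") == len("Connection")
True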

One ISP/web host is kind enough to include their web address and phone number in every request to any of their hosted servers:

Phone: (888) 817-8323
Web: www.wgn.net

This is just super-awesome. I once spent a good hour trying to find a technical contact for a certain monstrous job site to tell them their servers had been compromised and were displaying the following message to visitors:

You are being sniffed by Carnivore.
Your nation is secure.

…………..OCR IS WATCHING YOU…………..

Content-type: text/html

The message, funnily enough, was being relayed by modifying the HTTP headers.

C is for Cookie

Cookie2 was defined in RFC 2965, way back in October of 2000. As far as I know, Opera is the only browser in widespread use to support it. It’s sad, really, as the original cookie spec that Netscape came up with is kind of lame. Specifically, Netscape’s spec defines the expiration as an absolute date, which is vulnerable to clock skew on the user’s system making the cookie expire early. The Cookie2 spec, on the other hand, uses a max-age attribute, specifying the lifetime of the cookie in seconds:

Set-Cookie2: Meyer_Sound_777=68.126.233.177.1122451925660461; path=/; max-age=1209600; domain=.meyersound.com; version=1

There are also Comment and CommentURL fields which explain what the cookie is for, but I have yet to find a header which uses them. *sigh* On the other hand, I did find 518 Set-Cookie2 headers, which, while minuscule compared to the 764,976 Set-Cookie headers I received, is more than I expected. It looks like software written by Sun is responsible for most of these.

A bunch of servers spit out:

shmget() failed: No space left on device

Doh! Time to cycle some log files.

Pingback discovery headers like this show up a lot (2370 times):

X-Pingback: http://www.nextthing.org/wordpress/xmlrpc.php

I don’t even know what to say to this, found at ebrain.ecnext.com:

HTTP/1.1 200 OK
Date: Sun, 24 Jul 2005 18:39:54 GMT
Server: Apache


Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1

At least they’re not alone, as www.charlottesweb.hungerford.org will keep them company:

Turn off Pictures Popup Toolbar in IE 6.0: 

And www.station.lu:

XHTML: <!DOCTYPE html PUBLIC"-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The list goes on…

The Coral Content Distribution Network has been getting some buzz lately, so I was interested to see some

X-Coral-Control: redirect-home

headers show up. This header is used to tell Coral that if Coral can’t handle the load of requests for cached copies of your page, it should redirect these requests back to your site.

Why anyone would think to themselves, “Gee, if a massively scalable caching service running on hundreds of geographically distributed computers can’t handle the load of people wanting to look at my site, I’ll just have them bounce people back at me”, I don’t know. Masochism perhaps?

Speaking of P2P technologies, I was interested to run across a KaZaA server:

HTTP/1.0 404 Not Found
X-Kazaa-Username: anonymous_user
X-Kazaa-Network: KaZaA
X-Kazaa-IP: xx.xx.xx.xx:1348
X-Kazaa-SupernodeIP: xx.xx.xx.x:3699

It looked like it was running on someone’s DVR. Anyone have any pointers as to what software does that?

Along the same lines, haha:

X-Kaza-Username: hrosen
X-Kaza-Network: RIAA
X-Kaza-IP: 146.82.174.12:80
X-Kaza-SupernodeIP: 68.163.90.12:80
X-Disclaimer: All Your Base Are Belong To Us
X-Pizza-Phone: 961.1.351904

They’re not even the only ones using “X-Disclaimer”, a bunch of other sites do too:

X-Disclaimer: The local sysadmins have *nothing* to do with the content of this server.

It looks like Tux Games is trying to extend the venerable RFC 1097 to the web:

X-Subliminal: You want to buy as many games as you can afford

Personally, I would’ve gone for: “X-Superliminal: Hey you, buy some games!”.

I’m sure these kind folks would be first adopters:

X-Cotton: The Fabric of Our Lives

This person wants to make their opinion known, so here it is:

Veto: Usage of server response for statistics and advertising is disagreed!

To which I say: Take off every ‘zig’!! You know what you doing.

Robot Rock

I’d never really paid much attention to the Robots header:

ROBOTS: index,follow,cache

as it’s mostly used to disable indexing of a page and is intended to be used in a meta tag in the HTML itself, not in the HTTP headers.

However, it seems Google has added a new NOARCHIVE attribute, so let’s see who’s using it in their headers rather than in the meta tags like Google specifies.
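For reference, the form Google actually documents is a meta tag inside the page itself, along the lines of:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

or, for Google’s crawler specifically:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">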

It looks like the Singapore-based “Ministry of Pets” doesn’t want to be cached, and neither do the Civil Engineering department at São Paulo Polytechnic University, the real-time 3D software company MultiGen Paradigm, the Swiss handicraft company Schweizer Heimatwerk, a Swiss kitesailing site, the Ragin’ Cajun Cafe in Hermosa Beach, CA, the London-based BouncingFish web consultancy, and the French financial paper La Tribune. That’s it.

BouncingFish even goes so far as to use an additional GOOGLEBOT header:

GOOGLEBOT: NOCACHE

How many of these sites are not being cached by Google? Zero. Which goes to show that one shouldn’t expect mixing and matching of specs to just work.

Along the same vein, I don’t think the first two headers below will work as expected:

X-Meta-ROBOTS: ALL
X-Meta-Revisit-After: 1 days
Robots: INDEX, FOLLOW

Except, possibly, in spiders using Perl’s HTML::HeadParser module. And, of course, we’ve already seen that the third header probably won’t work, either.

While I’m on the subject of Google… all Blogspot sites spit out:

test: %{HOSTNAME}e

So Blogger folks, whatcha doin’?

It’s Funny, Laugh

The fine folks at www.basement.com.au want to make it clear that:

Mickey-Mouse: Does_Not_Live_Here

Some people have a lot of fun with headers, as seen here:

Limerick: There was a young fellow named Fisher
Limerick: Who was fishing for fish in a fissure,
Limerick: When a cod, with a grin,
Limerick: Pulled the fisherman in
Limerick: Now they're fishing the fissure for Fisher.

This is the only ascii art I found:

<!--
Content-type: text/html
*************************************************************************
*                     Welcome to schMOOze University                    *
*                                                                       *
*   ==> To connect to an existing player type:  CONNECT NAME PASSWORD   *
*   ==> To connect as a guest type:             CONNECT GUEST           *
*                                                                       *
*************************************************************************
*            all text is copyrighted by the various authors             *
*    TIME FLIES LIKE AN ARROW              FRUIT FLIES LIKE A BANANA    *
*                                  ***                                  *
*                          *                 *                          *
*                    *                             *                    *
*                *                                     *                *
*             *                                           *             *
*           *                         (__)                  *           *
*          *                          (OO)                   *          *
*          *              ____________ /                     *          *
*          *            /|            /                      *          *
*          *          /  | |------ | |                       *          *
*          *        *    | |^^     | |                       *          *
*          *             ^ ^       ^ ^                       *          *
************                                                 ************

Nobody is connected.
-->
<HTML>
<HEAD>
  <TITLE>Welcome to schMOOze!</TITLE>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></HEAD>
<meta http-equiv="refresh" content="0;URL=http://schmooze.hunter.cuny.edu/">
</BODY></HTML>

and it had me puzzled, until I realized it’s a telnet server, and the above is a really clever hack to redirect browsers towards HTML-land.

This made me laugh:

X-Powered-By: Intravenous Caffeine Drips
X-kluged-by: Nick, Mic, Ash, Andy
X-munged-by: The powers that be
X-Sanity-Provided-By: Ashleigh

Apparently the site has an alter-ego, as well.

www.wrestlingdb.com has some interesting headers. A few requests get:

X-Stone-Cold-Steve-Austin: And that's the bottom line, cause Stone Cold says so.
X-Mick-Foley: Have a nice day!
X-Ric-Flair: To be the man, WHOOO!, you've got to beat the man.
X-Rock: If you smell what The Rock is cooking.
X-Booker-T: Can you dig it, SUCKAAAA?
X-Kurt-Angle: It's true, it's DAMN true.
X-Hurricane: Stand Back! There's a Hurricane Coming Through!
X-Kane: FREAKS RULE!

which is about as entertaining as watching a real wrestling match.

Totally Ellet

Just so everyone knows, Frostburg students are so totally leet, they don’t even need to spell it correctly:

Owned And Operted By FSU Computer Club: 31137

Speaking of which, apparently some guy named morris would like his visitors to know that he 0wnzor$ them:

X-You-Are-Owned-By: morris

Not sure where that box you rooted and are browsing the web from is located? Never fear, mobileparty.net will tell you:

X-Detected-Country: US

And, for those who were wondering, the Texarkana Police are the world’s finest, at least in the HTTP headers department:

TEXPOLICE: LAW_ENFORCEMENTS_FINEST

These nederlanders are representin’ for the westside:

X-Side: : WESTSIDE-FOR-LIFE

Western Europe, that is. Jaaa.

Speaking of furriners, anyone care to translate:

X-Sarrazin-Says: Ciccio, lascia perdere, e' un blowfish a 448 bit.

Similarly:

X-beliebig: Dieser Header dient der allgemeinen Verwirrung =:)
X-Gleitschirmfliegen: macht Spaaaasss!

Going back to my discussion on standards, localizing headers that are used to actually do stuff is a bad idea:

Ultima Modificação: Thu, 28 Jul 2005 15:12:07 GMT

ObRef

The Democrats called, they want you to know they found their sense of humor:

X-Dubya: You teach a child to read and he or her will be able to pass a literacy test.

Make sure to hit it a few times for optimum goodness:

X-Dubya: We're in for a long struggle, and I think Texans understand that. And so do Americans.
X-Dubya: Africa is a nation that suffers from incredible disease.
X-Dubya: We're making the right decisions to bring the solution to an end.
X-Dubya: Families is where our nation finds hope, where wings take dream.

In the politics vein:

X-Powered-By: MonkeyMag 0.02.01, (c) Niel Bornstein and Kendall Clark
X-Shout-Out: No Power Without Accountability
X-Mos-Defology: Speech is my hammer//Bang the world into shape//Now let it fall
X-American-Leftist-Salute: Doing Woody's Work!
X-Billy-Braggage: Sun, Sea, Socialism!

Yes! Someone just made my day. I love Al Bundy quotes:

X-Bundy: Here we have 3 of the seven dwarfs, puffy, crabby and horny.
X-Bundy: You know I never danced unless it was gonna get some sex for me.
X-Bundy: I blame it on TV myself.
X-Bundy: To know me is to love me.

I was disappointed in the lack of mention of mules, donkeys, or garden gnomes, but at least llamas, mice, and loons are well represented:

X-Llamas-Version: 2.0

From: www.teevee.org

X-Favourite-Animal: Mouse

From: www.kingssing.de

X-Loons-Version: 1.5.1

From: www.eod.com

Speaking of strange characters, apparently the Wicked Witch of the West and Spongebob Squarepants cohabitate at www.harbor-club.com:

X-Wicked-Witch: West
X-Spongebob: Squarepants!

Who knew?!

As if we needed further proof that the soft underbelly of the Internet is full of cults, slowly corrupting the moral fabric of society, I present:

X-SAVIOUR: BOB_DOBBS

From the looks of things, Living Slack Master Bob Dobbs is giving Jesus a run for his money among Oregonian carpenters and their web designers. They join such luminaries as R. Crumb, Devo, and Bruce Campbell.

And if you thought that was an obscure meme, try this on for size:

X-Lerfjhax: Extra yummy

When I first saw an X-Han header, I thought for sure the contents would be “Shot first!”, but instead I found something more obscure:

X-Han: 'I look forward to a tournament  of  truly  epic  proportions.'

While we’re on pop culture allusions:

X-Powered-By: Twine
X-Towelie: You wanna get high?

And it would be difficult to be more obscure than this:

X-Sven: look out for the fruits of life

Finally, old school Mac-diehards should appreciate:

X-Blam: Frog blast the vent core!

Connection: close

Back when I was interviewing for an internship at Tellme Networks, they had a comment buried on their homepage that said:

  <!-- (c) Copyright 2000 Tellme Networks. -->
  <!--
   If you're looking at our HTML source, you're exactly the person
   who should send us your resume. We recently redesigned our site;
   Tell us all about how you would make it better and better yet,
   if you have an illustrious career of web-hacking, drop us an email
   at jobs@tellme.com.
  -->

I thought this was just way awesome. However, if I was disappointed when they removed that comment, I’m even more disappointed to report that I have yet to find a single HTTP header offering me a cool job. What’s wrong with you people?! I’m supposed to be able to find anything on the Internet!

I was, at least, thanked for my efforts, and I found the answer to life, the universe, and everything!

X-Thank-You: for bothering to look at my HTTP headers
X-Answer: 42

You’re welcome! And thank you all, for making the Internet so interesting!

A Few Upgrades

July 16th, 2005

After having to delete a few hundred spam comments from my weblog this week, I finally admitted that my blacklist wasn’t stopping any spam. Apparently domains have gotten too cheap for blacklists to be a viable deterrent anymore.

So, today I upgraded the blog to WordPress 1.5.1.3, and taking a cue from Jeremy Zawodny I’ve included an extra field people need to fill out to post comments:
New required field for commenting.

The idea here is that most comment spam is automated by programs (call them bots or spiders) which look for installations of common blogging software. These spiders look for a “signature” they recognize, like the fields in a comment posting form, that lets them identify which blogging software is being used on the site. From there, they can guess where they need to send automated requests to post comments, and what the values of the various form fields need to be. By simply adding another field that needs to be filled out with a specific value, most comment spam can be stopped. Or, that’s the idea.

If everyone were to start doing this, writing spam bots would suddenly become much harder, as they’d essentially need to start passing arbitrary Turing tests. That, or spammers would need to use human labor to customize their bots for each site they wanted to spam. The hope is that that would get very expensive very quickly.

In a sense, this is one step backwards from Captchas, which are programs designed to automatically generate these simple tests. However, the point of vulnerability for captchas is that they too are programs, with their own signatures that bots can pick up on. If enough people use the same captcha in the same way, then at some point it makes sense economically for a spammer to specifically target that captcha, either by writing a program to solve the captcha, or through clever social engineering efforts.

So, if a spammer starts customizing their bots for my site, the logical next step is to either change my one-off customization by asking a different question, or to write a custom captcha to increase the marginal cost of customizing a bot to spam me. Personally, I’m leaning towards automatically generating visual puzzles that would be difficult for a person to solve, let alone a spambot, and seeing how they like that. 🙂

In the meantime, the code I added to do this was very simple, and I’d encourage people to try customizing their own site with a different test.

In wp-comments-post.php, I added:
$comment_turing = trim($_POST['comment_turing']);
if (stristr($comment_turing, 'andrew') === FALSE) {
    die( __('Sorry, you must enter Andrew\'s first name to post a comment.') );
}

Download this code: /code/17a.txt

and in my comments template I added:
<p>
  What is Andrew's first name?
  <input type="text" name="comment_turing" id="comment_turing" tabindex="4" />
  <small>(required)</small>
</p>

Download this code: /code/17b.txt
and changed the next couple of "tabindex" attributes to be higher so tabbing between fields works correctly.

Enjoy!

Update:
Peter took this idea and ran with it, harnessing the power of his commenters to compute pi. Check it out.

Tuesday Highlights

March 16th, 2005

Some highlights from the first day of the Etech conference in San Diego:

  • I got stuck in an elevator with Doc Searls, Steve Gillmor, and 7 other people after breakfast.
  • The Applied Minds presentation was awesome. Danny Hillis started by showing some videos of robots and moved on to a video of a live projection map table and a Myst-like deforming map surface. The projection map table was basically a large touch screen table which allowed people to manipulate a globe-like map by touch, zooming in and out and overlaying different satellite and topographical data sets. There’s a video (local mirror) and press blurb available from Northrop Grumman. Apparently this table uses ArcGlobe and runs about $300,000 (ouch). The deforming map surface was a table which deformed to match the elevation of the topographical map being projected onto it. I cannot express how cool this looks. I want one.
  • Sam Ruby’s “Just Use HTTP” talk did a good job of explaining some of the complexities in making RESTful web services. Slides are up.
  • Werner Vogels’ talk on Interplanetary Scale services was good. He touched on a few things I’ve been thinking about for a while now relating to epidemics and scalability.
  • Later, I walked around a bit downtown:
    A fountain in downtown San Diego.
  • And went to the vendor reception:
    Vendor reception at Etech.