The Best Spidering Project Ever

April 25th, 2009

I keep losing the link to the oddly-named Drunk Men Work Here project. This is, frankly, the best (which in my quirky view is some combination of unique, creative, awesome, and data-intensive) analysis of web spiders I’ve ever seen.

For anyone interested in how to infer the internals of spiders, this is required reading.

Office Hours

October 30th, 2008

Tonight I held my first office hours in Palo Alto at Coupa Café. Office hours are an idea I got from my good friend Buzz Andersen, who held them on a semi-regular basis at The Alembic in San Francisco, and which I endeavored to attend whenever possible.

The basic idea of office hours is that I (or whoever holds them) will be at a certain location at a certain time, and anyone who wants to talk to me, ask questions, etc., can come by and do so. It seems a bit egocentric, but really I feel that it’s the opposite. I’m interested in what other people have to say, and feel that modern society has disconnected us from talking to one another on a regular basis in the ways that were previously available. The sharing of ideas that was the foundation of social clubs, pubs, libraries, coffee shop discussions, and even secret societies of, say, the 17th through 19th centuries has been eclipsed by water cooler discussions and team meetings. Instead of a roundtable of ideas shared in the open among a group of like-minded people, we have substituted proprietary trade secrets and patent applications, action items and Gantt charts, vapid launch parties and lavish offsites.

I feel that office hours are a good start in getting back to the foundations of the Enlightenment and, with much less importance, our industry. They are a brief pause where we can exchange ideas, talk to each other frankly, and come together in a spirit of pushing society and ourselves further. Many may consider this to be utopian garbage, but even on a utilitarian basis I’ve found office hours to be a welcome break from the hyper-competitive nature of our industry, and an invaluable way to exchange approaches to problems, solutions and expertise, and even industry gossip (*gasp*).

At my first office hours, the topics ranged from iPhone memory management questions to open geodata initiatives to the future plans of our respective lives and businesses. It was a lively exchange of ideas, and one I’m glad to have had. For those of you in the Bay Area interested in coming to my next office hours, I encourage you to follow me on Twitter. For people interested in holding their own office hours, I say, “just do it”. You may just be pleasantly surprised by the results.

Update: Tom Robinson of 280 North suggests using an #officehours tag in Twitter to announce office hours. I think this is a great idea, and will be using this for my future office hours announcements.

Adding Default Descriptions to Trac Tickets

May 2nd, 2008

I’ve been using Trac for the new projects I’m working on, and am liking it so far. I have, however, done some customization to make it fit my needs a bit better. One thing that took a while was figuring out how to change the default ticket description.

I like having some structure to my bug descriptions, including steps to reproduce, regression information, etc. Thankfully, Trac’s template system allows you to replace output with an override template (typically found in templates/site.html) that is applied before content is sent across the wire. It was a bit time-consuming to figure out how to do what I wanted to do, but I eventually ended up with a templates/site.html file that looks something like this:

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:py="http://genshi.edgewall.org/"
          py:strip="">
      <!-- Custom match templates go here -->
      <span py:if="req.environ['PATH_INFO']=='/newticket' and not req.args.has_key('preview')">
        <textarea py:match="textarea[@id='field-description' and @class='wikitext']"
                  py:attrs="select('@*')">
    Your custom description goes here.
        </textarea>
      </span>
    </html>

I hope this is useful for someone. :)

On Regret Minimization, Leaving Apple, and Playing Pinball

April 30th, 2008

It’s been a month now since I left Apple to start a company. I’d been working on Time Machine (an interesting and challenging project) as a software engineer, with a fantastic group of people, and was in a situation that was really pretty great. So, why did I decide to leave and go do my own thing?

Happy Leopard Day!

October 26th, 2007

In January, I started working as a software engineer on Leopard. I was lucky enough to get to work on two of the top ten features for this release of OS X:

Time Machine
Parental Controls

For the past few months I worked seven days a week pretty much non-stop on making the back end of Time Machine faster and more reliable. It was a lot of work, but overall a very rewarding experience, and I’m happy that I got to work on such a kick-ass release of OS X. Enjoy!

Now, I’m off to a beer bash for some much needed relaxation. :-)

UI Patterns: Wet Lazy Susan

June 3rd, 2007

I ran across a couple of variations on the “wet floor” effect today which both introduce a rotating set of images that can be moused over for more information. For lack of a better term, I’ll call this the “Wet Lazy Susan” effect.

Microsoft Web

March 31st, 2007

Here’s a snippet from Microsoft’s current corporate home page:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html lang="en" dir="ltr">
    <head>
    <META http-equiv="Content-Type" content="text/html; charset=utf-16">
    <title>Microsoft Corporation</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="SearchTitle" content="Microsoft.com">
    <meta name="SearchDescription" content="Microsoft.com Homepage">

Can you spot the problem?

Update: James Booker is the first in with an answer: there are two Content-Type meta tags above, both of which specify different character sets.

Specifying the content-type in meta tags is a bit of a hack, as the browser has to seek through the first section of the document looking for a content-type declaration, then try reinterpreting the page with the character set the page specifies. Specifying a character set of “utf-16” doesn’t make any sense in this scenario, as the browser does its sniffing by interpreting the HTML as ASCII. If the page were actually UTF-16, this wouldn’t work, as the representation of the string “Content-type” in UTF-16 isn’t identical to its representation in UTF-8, as we can see in a Python shell:

    >>> "Content-type".encode("utf-16")
    '\xfe\xff\x00C\x00o\x00n\x00t\x00e\x00n\x00t\x00-\x00t\x00y\x00p\x00e'
    >>> "Content-type".encode("utf-8")
    'Content-type'
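
The same point can be shown as a small script. This is only a sketch, written for Python 3 (where encoded text is a `bytes` object); it is not how any particular browser implements its sniffing. The variable names are made up for illustration, and the byte order is pinned with `utf-16-be` so the result doesn't depend on the machine it runs on.

```python
# Sketch (Python 3): an ASCII-oriented scan for the meta declaration
# succeeds on UTF-8 bytes but fails on UTF-16 bytes.

# The ASCII byte sequence a naive sniffer might look for (assumption:
# real browsers use more elaborate heuristics than a substring search).
needle = b"Content-Type"

meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-16">'

as_utf8 = meta.encode("utf-8")
as_utf16 = meta.encode("utf-16-be")  # explicit byte order, no BOM

# In UTF-8 the ASCII characters are stored as single ASCII bytes.
found_in_utf8 = needle in as_utf8

# In UTF-16 every ASCII character is paired with a zero byte, so the
# contiguous ASCII byte sequence never appears.
found_in_utf16 = needle in as_utf16

print("found in UTF-8 bytes: ", found_in_utf8)
print("found in UTF-16 bytes:", found_in_utf16)
```

A page that really was UTF-16 would therefore hide its own meta declaration from anything reading it as ASCII.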

Thankfully, the HTML spec foresaw this problem:

The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element.

So, there are actually three problems with the above HTML. There are two content-type declarations in meta tags, one of them is bogus, and the correct one isn’t as early as is possible in the head element. These problems, thankfully, are mitigated by the presence of an HTTP header that specifies the correct character set, and by the incredible amount of effort browser vendors have put into making their code accepting of mistakes such as these.
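
For anyone who wants to check their own pages for the first of these problems, here is a rough sketch using Python 3's standard `html.parser` module. The class and function names (`MetaCharsetCollector`, `find_meta_charsets`) are hypothetical, and the embedded sample is a trimmed copy of the snippet above with closing tags added so it stands alone. It only looks at `http-equiv` Content-Type meta tags and makes no attempt to mimic a browser.

```python
from html.parser import HTMLParser


class MetaCharsetCollector(HTMLParser):
    """Collect charset values from http-equiv Content-Type meta tags."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "content-type":
            return
        # The content attribute looks like "text/html; charset=utf-8".
        for part in attr_map.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.strip().lower() == "charset" and value:
                self.charsets.append(value.strip().lower())


def find_meta_charsets(html_text):
    """Return declared charsets in document order (hypothetical helper)."""
    collector = MetaCharsetCollector()
    collector.feed(html_text)
    return collector.charsets


# Trimmed version of the snippet quoted in this post.
SNIPPET = """<html lang="en" dir="ltr">
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
<title>Microsoft Corporation</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
</html>"""

charsets = find_meta_charsets(SNIPPET)
conflicting = len(set(charsets)) > 1
print("declared charsets:", charsets)
print("conflicting declarations:", conflicting)
```

More than one distinct value in the list means the page is contradicting itself, as the snippet above does.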

robots.txt Adventure

March 12th, 2007

Introduction.txt

Last October I got bored and set my spider loose on the robots.txt files of the world. Having had a good deal of positive feedback on my HTTP Headers survey, I had decided to poke around in robots.txt files and see what sorts of interesting things I could find.

Since then, I’ve taken 6 weeks of vacation and gotten to be very busy at work, so I’m just now getting around to analyzing all the data I gathered. These are some of the results of that analysis.

New Zealand Trip 2006

February 9th, 2007

Introduction

In mid-2006, I decided that I wanted to use up some of my long-accrued vacation, and started planning a 5-week trip to New Zealand. I’d wanted to go to New Zealand for a couple of years, after having seen photos of the country from my friend Josh’s ’round the world trip.

Typically, for me, planning a trip involves booking transportation, getting a rough idea of what I want to do, and figuring things out when I get there, a tendency which annoys some of those close to me to no end. I didn’t want New Zealand to be any different, so I booked airfare and investigated my in-country transportation options pretty carefully. I finally settled on renting a campervan, so that I’d be able to set my own pace for exploring the country. That settled, I booked the only activity I really had to book in advance, a hut ticket for the Milford Track, which is reputed to be the “finest walk in the world.” With that done, I geared up for the trip, buying lots of photography and hiking gear, and prepared myself for my first vacation outside of North America.

A Peek Inside

January 27th, 2007

The Shomstoppers present: Jolly Boots of Doom.