Results tagged “theinternet” from Appleseed Blog

I recently finished a project for a client that involved installing CAPTCHAs on their various web forms. You've seen these before - they're the little widgets that challenge you to retype the intentionally garbled numbers and letters they display in order to prove that you're an actual human, and not a node of some spammer's botnet.

In researching the current best practices for adding a CAPTCHA to an existing site, I found ReCAPTCHA, a project of Carnegie Mellon University. You've probably come across examples of this particular flavor of CAPTCHA before. The image always consists of two unrelated words or word fragments, usually resembling smudgily typed copy with some additional bot-foiling visual artifacts thrown on for good measure. Something like this:

[Image: captcha.png, an example ReCAPTCHA]

It happens that the two words have a very good reason for looking like they do: they have been scanned out of old printed books and periodicals, part of various CMU-affiliated efforts to digitize old media. As ReCAPTCHA's own About page explains, even the best OCR software is only so-so at recognizing text, and frequently can't recognize words that would be obvious to any human reader of the language at hand.

Now, here's the cool part: by plugging itself into ReCAPTCHA, a computer working in one of these massive scanning projects can submit a word it's unsure about to the global community of people who happen to be filling in web-based forms at that very moment. It will quickly get a response that 98 percent of all the people who saw it thought the word was "doggy" (or whatnot), and that will be enough agreement for the machine's purposes.

The reason there are two words per ReCAPTCHA instance is that one of the words is undergoing this kind of trial, while the other one already has - in other words, the ReCAPTCHA system already knows what word it is. This is how the widget still functions as a CAPTCHA - the entity filling it in must still get the already-known word right if it wants to prove that it's a human. Meanwhile, bots are foiled not just for the usual reasons, but because all the words on display have already proven to be confusing to computers trying to read them!
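
For the programmers in the audience, here is a rough Python sketch of how I imagine that two-word check working. To be clear, this is my own illustration and not ReCAPTCHA's actual code: the class name, the method names, the minimum vote count, and the use of 98 percent as a hard cutoff are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical tuning values. The 98 percent figure is borrowed from the
# example above; the minimum vote count is invented for this sketch.
AGREEMENT_THRESHOLD = 0.98
MIN_VOTES = 5


class TwoWordChallenge:
    """One challenge: a word the system already knows, plus a mystery word."""

    def __init__(self, known_word):
        self.known_word = known_word.lower()
        self.votes = Counter()  # guesses at the mystery word, by count

    def submit(self, known_guess, unknown_guess):
        """Return True if the submitter passes as human.

        Only the known word is graded. A guess at the mystery word is
        recorded as a vote only when the known word was typed correctly.
        """
        if known_guess.strip().lower() != self.known_word:
            return False
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def consensus(self):
        """Return the agreed-upon mystery word, or None if there isn't one yet."""
        total = sum(self.votes.values())
        if total < MIN_VOTES:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= AGREEMENT_THRESHOLD else None
```

In this sketch, a wrong answer on the known word fails the test and its guess at the mystery word is thrown away, while every correct answer both passes and casts a vote.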

I think this is incredibly cool. That slimy spammers have made technologies like CAPTCHAs a necessity of the modern web is quite unfortunate, but the way that ReCAPTCHA has found a way to put a positive, culture-preserving spin on it is ingenious and laudable.

Farewell, Hyperarchive

My friend Noah, a sysadmin at MIT, reports that on October 1 he switched off the info-mac hyperarchive (hyperarchive.lcs.mit.edu), one of the oldest websites on the internet. It was a web-accessible version of the info-mac archive, an online repository of Mac freeware and shareware, which before then was mainly browsable via FTP. I have fond memories of spending evenings trolling through the hyperarchive's directory structure, looking for neat stuff to fill my Mac LC's 40 MB hard drive, circa 1994.

Several years ago, when I was writing the Nutshell book, I discussed the possibility of being the hyperarchive's volunteer maintainer. Nothing came of it, though, and the server was allowed to coast into electronic senescence. I see from that Wikipedia article that there exists an info-mac website that claims lineage from the original archive and mailing list, but it's now just one more computer-news website in a vast sea. It does sport a mirror of the info-mac archive, where it's quickly apparent how little traffic it has gotten since the turn of the decade; viewing some categories by date shows you software from the 1990s on the first page.

Though the hyperarchive's role was supplanted by better-organized websites years ago (hello, versiontracker), I won't forget its important role in the early history of Macintosh software, the web, and myself as a computer dood. Goodbye, old friend!

Here is a Making Light thread with interesting commentary, built on an essay by Jon Stokes on "IT consumerization and the future of work". Lots of good discussion on how technologies sold directly (or, in Googlish cases, made freely available) to consumers nowadays easily outpace the technology and work-paradigms typically found in offices and enforced by IT departments, often by a factor of years.

For me, it brings to mind the continually increasing feasibility of distributed work environments, where a professional team doesn't work together nine-to-five in a physical office, but instead works wherever they happen to be, applying their own resources as they see fit, and using the internet to collaborate. I find this sort of discussion serendipitous, as I'm just starting to move Appleseed towards this sort of configuration, slowly and carefully bringing colleagues on board as fellow consultants. It's a trend I've been noticing among several other newer, information-based businesses like mine.

The goal in our case is to allow Appleseed to serve more customers, without diminishing our level of personal attention to each. But I want to do it without sacrificing the independence I've enjoyed - and which has directly translated into higher-quality work for Appleseed's customers - since I launched the business, nor would I ask the same sacrifice from anyone I work with.

It's not something that's going to work for every kind of business. While a distributed office must maintain a minimal set of IT-style standards in order to keep all its team members synchronized, there's still a need for everyone to serve as their own personal system administrators. This is a lot easier to pull off if the company happens to be in the software consulting biz.

The results so far have proved quite promising. I feel quite confident about Appleseed's continued growth as both a great source of software expertise and a great place to work. (Sorry, though; no "careers" link on our website header just yet. :) )
