
CAPTCHA is a way websites attempt to keep spambots at bay, testing for a human rather than an automaton by presenting a distorted word for the viewer to decipher. reCAPTCHA presents two words, but does a bit more behind the scenes. One word is randomly chosen from a list of confirmed words, and the other is randomly chosen from words that an optical character recognition (OCR) program failed to read in printed text. When a human deciphers both words, they are helping to proofread text.
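The two-word scheme can be sketched roughly as follows. This is only an illustration of the idea described above, not reCAPTCHA's actual code: the word lists, the scan identifiers and the function names make_challenge and check_reply are all made up for the example, and a real test would also distort the images and hide which word is which.

```python
import random

# Hypothetical data: words whose reading is already confirmed, and
# identifiers for scanned words that the OCR program failed to read.
CONFIRMED_WORDS = ["Moving", "gadgets", "archive"]
UNREAD_SCANS = ["scan-001", "scan-002"]


def make_challenge(rng):
    """Pair one confirmed word with one scan the OCR could not read."""
    return {
        "control": rng.choice(CONFIRMED_WORDS),
        "unknown_scan": rng.choice(UNREAD_SCANS),
    }


def check_reply(challenge, control_reply, unknown_reply, proposals):
    """Pass the user on the confirmed word alone; record the other reading."""
    passed = control_reply.strip().lower() == challenge["control"].lower()
    if passed:
        # Only users who pass contribute a proposed reading of the scan.
        scan = challenge["unknown_scan"]
        proposals.setdefault(scan, []).append(unknown_reply.strip())
    return passed


# Example use: one user answers a challenge.
proposals = {}
challenge = make_challenge(random.Random())
check_reply(challenge, challenge["control"], "gadgets", proposals)
```

The point of the design is that the site can only grade the confirmed word, so that word alone decides whether the user passes, while the reply for the unread scan is simply stored as one proposed reading.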

The team at Carnegie Mellon University that developed CAPTCHA realised that they could help with the digitising of book archives, using the man-hours of punters trying to buy concert tickets, comment on weblogs or vote for TIME's most influential person of the year. While a computer does the work of helping humans get to where they want to go, the humans help the computer transcribe millions of words. The CMU team have also considered implementing an image-based reCAPTCHA that would help tag photos for coherent retrieval from image databases. Not long after its introduction, reCAPTCHA was bought by Google, and the OCR words presented were pulled from Google's scanned archives of the New York Times. Within a year, over 20 years of the NYT archives were corrected using reCAPTCHA.

With every measure to defend against spambots there is a concerted effort to get around it; several ways are documented on E2 and on Wikipedia. There is also no getting around human capriciousness.

As you only need to get the confirmed word right (or, depending on the setting, almost right), it is possible to teach reCAPTCHA the wrong reading of the other word in the digitised document. For example, if the test shows 'Moving' and 'gadgets' and you enter 'Moving' and 'gaga', and you pass, you've taught it that a possible interpretation of the second word is 'gaga'. Since the test compares multiple 'correct' replies (and some of the corrections are checked by Google employees), this only works if many more people attempt the same thing. A group most commonly calling itself Captain Charlie has been presenting people with words to use each month, usually an adjective, a noun and a verb, that are nonsense or quite clearly recently coined words, brands or slang, such as 'Fnord Shatner Bling'. The phrase can be found posted in cafes and in city and campus libraries, or on Twitter accounts. You can test their success against the uploaded NYT archives.
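Why one wrong reply is not enough can be shown with a small tally. This is a sketch under assumptions: the function name accepted_reading and the thresholds min_votes and min_share are invented for illustration, and the real rules reCAPTCHA uses to compare replies are not given here.

```python
from collections import Counter


def accepted_reading(replies, min_votes=3, min_share=0.6):
    """Return the agreed reading of an unread word, or None if undecided.

    min_votes and min_share are hypothetical thresholds, not real settings.
    """
    if not replies:
        return None
    counts = Counter(reply.strip().lower() for reply in replies)
    word, votes = counts.most_common(1)[0]
    # Accept only a reading that enough users, and a clear majority, gave.
    if votes >= min_votes and votes / len(replies) >= min_share:
        return word
    return None


# A lone prankster is outvoted by honest replies.
print(accepted_reading(["gadgets", "gadgets", "gadgets", "gadgets", "gaga"]))  # gadgets
# A coordinated group can push its own word through.
print(accepted_reading(["gaga", "gaga", "gaga", "gaga", "gadgets"]))  # gaga
```

Under these assumed thresholds, the wrong reading only wins once the people entering it outnumber everyone else, which is why a monthly phrase shared among many participants is needed.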

The test is also available in audio. The text version of reCAPTCHA is available in eight languages, but the audio version in only three. Notoriously, only about 60% of attempts at the audio test produce the correct reply. Ironically, the easiest way for a blind person to get through the test is to ask for a sighted person's help.
