Human user verification helping to digitize old books
About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day.
Only the words of an old scanned book that Optical Character Recognition (OCR) cannot read correctly are used as CAPTCHAs. The system then aggregates the responses of many users to determine what the OCR couldn't read.
But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle?
Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
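The procedure described above can be sketched roughly as follows. This is only an illustration, not the actual system's implementation: the function names `check_response` and `resolve`, the case-insensitive comparison, and the thresholds `min_votes` and `min_share` are all assumptions made for the example.

```python
from collections import Counter


def normalize(text):
    # Assumption: answers are compared ignoring case and surrounding spaces.
    return text.strip().lower()


def check_response(control_answer, control_guess, unknown_guess, votes):
    """Count the user's reading of the unknown word only if they
    solved the control word (the one whose answer is already known)."""
    passed = normalize(control_guess) == normalize(control_answer)
    if passed:
        votes[normalize(unknown_guess)] += 1
    return passed


def resolve(votes, min_votes=3, min_share=0.75):
    """Return the agreed reading of the unknown word, or None if there
    are too few votes or too little agreement so far.
    Both thresholds are hypothetical values chosen for this sketch."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


# Usage: three users see the same unknown word next to a known control word.
votes = Counter()
check_response("morning", "Morning", "upon", votes)   # control solved, vote counted
check_response("morning", "mourning", "open", votes)  # control failed, vote ignored
check_response("morning", "morning", "upon", votes)   # control solved, vote counted
print(resolve(votes))  # still None: only two trusted votes so far
```

Requiring several agreeing votes before accepting a reading mirrors the last step in the text, where the new image is shown to a number of other people to raise confidence in the answer.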