Google Books Goofed

Anything worth doing is worth doing right. —Hunter S. Thompson

I’ve recently been researching Old English measures, and Google’s ability to run a plain-text search across everything it has scanned into Google Books (the Google Library Project) has been a phenomenal aid in ferreting out lost knowledge. And this knowledge shouldn’t be so obscure: until today, every Wikipedia entry on the pound sterling said its weight had always been different from that of the Tower pound.

Awesome that Google let me discover that!

However, when I wanted to read the rest of the story so that I could accurately update Wikipedia, Google failed me completely. The resolution at which they scanned A View of the Silver Coin and Coinage of England is so pitifully low that you cannot decipher many of the composed fractions. (The book was published in 1762, so copyright hasn’t been an issue for many years.) A few, such as ½ and ¼, are somewhat decipherable (though not in all instances, and usually only because I already knew the value from another source, as on page 4). Others are not quite decipherable (several of those on page 13; is that 5 5⁄8 grains or 5 3⁄9 grains?). But once you get to the fractions on pages 14 and 15, there’s no chance of deciphering them without access to the original document. And heaven help you if the composed fractions (which are by definition substantially smaller than the surrounding characters) are in a footnote, whose text is already smaller than the standard text.

There are, of course, other problems resulting from low resolution. Where both the resolution and contrast are low, entire pages become difficult to read because many parts of typeset characters have an identity crisis that flits them between zero and one pixels. But the very next page is dark and bold and looks like it came off your LaserJet.

Google has publicly stated that one of its aims with Google Books is to “organize the world’s information”. Yet by scanning such a volume of books (Google has reportedly spent more than US$200 million on the effort), some of which the libraries may discard in the future, Google may be forever shredding valuable information by allowing (however tacitly) librarians to think these old volumes can be thrown out or otherwise hidden without harm because they’re now in the cloud. In the case of this book, I can’t even go to the physical library it was borrowed from, because the digitized copy is missing all indications of which library that was. Yes, there appear to be copies nearby at UC Berkeley and Stanford, but what if I needed to see margin notes that were in this particular book and absent in both of those copies?

I’m mad at Google for putting such a poor work product on the Web, and for squandering valuable time and money on low-resolution scans. Should they decide to correct the issues, they would have to borrow all the books again and retrain the staff or hire new ones (the staff who did the first set of scans were clearly not historians, or even of such a mindset), among other things. Given the controversy, would the libraries even offer access a second time?

This is doubly frustrating when you read how and why Google started. The boys should know better.

Author: Peter Sheerin

Peter Sheerin is best known for the decade he spent as the Technical Editor of CADENCE magazine, where he was the acknowledged expert in Computer-Aided Design hardware and software. He has a long-standing passion for improving usability of software, hardware, and everyday objects that is always interwoven in his articles. Peter is available for freelance technical writing and product reviews, and is exploring career opportunities in interaction design. His pet personal project is exploring the best ways to harmonize visual, tactile, and audible symbols for improving the effectiveness of alerting systems.
