Wired News: The Great Library of Amazonia:
"Getting to this point represents a significant technological feat. Most of the material in the archive comes from scanned pages of actual books. This may be surprising, given that most books today are written on PCs, e-mailed to publishers, typeset on computers, and printed on digital presses. But many publishers still do not have push-button access to the digital files of the books they put out. Insofar as the files exist, they are often scattered around the desktops of editors, designers, and contract printers. For books more than a few years old, complete digital files may be lost. John Wiley & Sons contributed 5,000 titles to the Amazon project -- all of them in physical form.
Fortunately, mass scanning has grown increasingly feasible, with the cost dropping to as low as $1 each. Amazon sent some of the books to scanning centers in low-wage countries like India and the Philippines; others were run in the United States using specialty machines to ensure accurate color and to handle oversize volumes. Some books can be chopped out of their bindings and fed into scanners, others have to be babied by a human, who turns pages one by one. Remarkably, Amazon was already doing so much data processing in its regular business that the huge task of reading the images of the books and converting them into a plain-text database was handled by idle computers at one of the company's backup centers.
The copyrights to these titles are spread among countless owners. How was it possible to create a publicly accessible database from material whose ownership is so tangled? Amazon's solution is audacious: The company simply denies it has built an electronic library at all. 'This is not an ebook project!' Manber says. And in a sense he is right. The archive is intentionally crippled. A search brings back not text, but pictures -- pictures of pages. You can find the page that responds to your query, read it on your screen, and browse a few pages backward and forward. But you cannot download, copy, or read the book from beginning to end. There is no way to link directly to any page of a book. If you want to read an extensive excerpt, you must turn to the physical volume -- which, of course, you can conveniently purchase from Amazon. Users will be asked to give their credit card number before looking at pages in the archive, and they won't be able to view more than a few thousand pages per month, or more than 20 percent of any single book.
Manber has built a powerful, even mind-boggling tool, then added powerful constraints. 'The point is to help users find a book,' says Manber, 'not to make a new source of information.'"
Amazon will unveil a new search engine of sorts that lets you search for text in almost 120,000 books. More to come. Wow!