Google Print is the topic that may single-handedly keep the copyright-related blog world in business for the next few years.
Last week, Google added the full text of 10,000 public domain books to the Google Print database. The NY Times reports: Google Adds Library Texts to Search Database: "The additions, from the university libraries at Michigan, Harvard and Stanford and from the New York Public Library, represent the first large group of material to be made available electronically from those libraries, which along with Oxford University contracted with Google last year to let the company scan and make searchable the contents of much or all of their collections."
On Google's corporate blog, Adam Mathes writes: Preserving public domain books: "The world's libraries are a tremendous source of knowledge, much of which has never been available online. One of our goals for Google Print is to change that, and today we've taken an exciting step toward meeting it: making available a number of public domain books that were never subject to copyright or whose copyright has expired."
The following day, Amazon.com announced a program that will sell online access to "any page, section or chapter of a book." These commercial programs will turn the full-text databases used for searching into a way to offer access to full-text works and a way to compensate rightsholders, much like iTunes. Again, from the NY Times: Want 'War and Peace' Online? How About 20 Pages at a Time?: "The idea is to do for books what Apple has done for music, allowing readers to buy and download parts of individual books for their own use through their computers rather than trek to a store or receive them by mail. Consumers could purchase a single recipe from a cookbook, for example, or a chapter on rebuilding a car engine from a repair manual."
This week, the debate even spilled over into my favorite football-related column, Gregg Easterbrook's Tuesday Morning Quarterback, where Easterbrook writes:
Copyright law gives authors and performers the exclusive right to make or authorize copies of their works; the exclusive right to make or authorize copies is, at heart, what a copyright represents.…[Google] says it will not scan books whose authors send a letter of objection. But if you want to use a copyrighted work, the legal onus is on you to get permission, not on the copyright holder to lodge a protest. Google's position is like saying that if you do not want your house broken in to, it is your responsibility to send a notification to thieves. In this analogy, Google is the thief -- just like in the real world! Remember when Google maintained it would never be the next Microsoft? It's not; Microsoft obeys the law. Remember when Google was going to be a corporate good-guy? Google is fast becoming the next Enron; maybe this is the kind of thing that happens when your founders decide they need an entire Boeing 767 to themselves. Contrast Google's corporate kleptomania to Amazon's decision to offer online books only if authors grant permission. As we enter the digital age, it becomes ever-more important society resists the idea that unaccountable corporations have an unlimited right to seize whatever exists in electronic form. And Google, now that you have declared it is fine to copy intellectual property without permission, surely you won't object if anyone steals your proprietary software and corporate data?
To understand the legal implications of the Google Print case, we have to look at what Google is actually doing: scanning books into an electronic database for the purpose of indexing.
In Kelly v. Arriba Soft, the 9th Circuit ruled that creating thumbnails of images in a search engine is fair use.
The search engine at issue in this case is unconventional in that it displays the results of a user’s query as “thumbnail” images. When a user wants to search the internet for information on a certain topic, he or she types a search term into a search engine, which then produces a list of web sites that contain information relating to the search term. Normally, the list of results is in text format. The Arriba search engine, however, produces its list of results as small pictures.
To provide this service, Arriba developed a computer program that “crawls” the web looking for images to index. This crawler downloads full-sized copies of the images onto Arriba’s server. The program then uses these copies to generate smaller, lower-resolution thumbnails of the images. Once the thumbnails are created, the program deletes the full-sized originals from the server. Although a user could copy these thumbnails to his computer or disk, he cannot increase the resolution of the thumbnail; any enlargement would result in a loss of clarity of the image.
The Google Print service provides essentially the same service as the Arriba Soft image search engine, except that it searches print books instead of digital images.
We must determine if Arriba’s use of the images merely superseded the object of the originals or instead added a further purpose or different character… Although Arriba made exact replications of Kelly’s images, the thumbnails were much smaller, lower-resolution images that served an entirely different function than Kelly’s original images.
The court ruled that creating a search engine index is a transformative use that does not supersede the purpose of the original work. The character of a copy used in a search engine index is different from the character of a copy used for reading. The search engine use helps to find the book. The intrinsic purposes of the two uses are different.
The court found that creating a complete copy is necessary to create a service that adds value to the images:
It was necessary for Arriba to copy the entire image to allow users to recognize the image and decide whether to pursue more information about the image or the originating web site. If Arriba only copied part of the image, it would be more difficult to identify it, thereby reducing the usefulness of the visual search engine.
Google's book scans are used only for the purpose of creating a full-text index for searching and not for offering text to users. Google is not distributing copies of copyrighted books without permission. For books submitted to the index by publishers, Google provides access to a couple of pages (with permission of the copyright owner). For books scanned in under the partnership with university libraries, Google provides access to ~30 word excerpts that contain the user's search term. Google's Screenshots page explains this well.
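To make the distinction between the full-text index and the short excerpt concrete, here is a minimal sketch in Python of how a roughly 30-word excerpt around a search term might be produced. This is only an illustration under stated assumptions: the function name `excerpt`, the single-word matching, and the centering of the window are all hypothetical choices of mine, not a description of how Google Print actually works.

```python
import re
from typing import Optional


def excerpt(text: str, term: str, width: int = 30) -> Optional[str]:
    """Return about `width` words of `text` around the first word equal to `term`.

    Hypothetical illustration only: handles single-word terms and
    returns None when the term does not appear.
    """
    words = text.split()
    needle = term.lower()
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "whale," still matches "whale".
        if re.sub(r"^\W+|\W+$", "", word).lower() == needle:
            # Start about half a window before the match, clamped at 0.
            start = max(0, i - width // 2)
            return " ".join(words[start:start + width])
    return None
```

The point of the sketch is that the whole text has to be available to the search service, even though the reader only ever sees the short window around the match.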
In UMG Recordings v. MP3.com, the court found that a digital locker service, which created medium-shifted full copies of recorded music, was an infringing use. The defendant's service not only created but distributed complete copies. Like the Arriba Soft thumbnail images, these copies were at a lower resolution/fidelity than the original works. Unlike the Arriba Soft thumbnails, these copies were used to supplant the original use of the works-- for listening.
The key difference between Google and Arriba Soft is that Arriba searches images already on the web in digital form. Google is digitizing the books made available only in print, possibly superseding the market for electronic versions of those same books. Images placed on the web may be thought to be made available with an implied consent to be indexed.
Google Print does not provide access to the complete work and its full copies are used to add value by creating an index, rather than to merely replace the traditional use.
If Google, like Amazon, were providing access to a complete copyrighted work, Google would clearly need permission.
The authors' and publishers' complaint is based on the fact that Google is copying the entire book without permission in order to create this index. That raises the question that makes this case important: does copyright law regulate the act of copying or the act of distribution? If making a copy of a complete work in order to create a searchable index is infringement, then Google's entire business is threatened. In indexing the web, Google creates complete copies of web pages, unless the web publisher explicitly opts out using the robots.txt protocol. In addition, Google not only creates, but also distributes medium-shifted cache copies of .PDF and .DOC files.
If copyright law is concerned with regulating the act of copying, then Google may be in trouble, but then so might culture. As a matter of public policy, copyright law might better serve its purpose by regulating distribution rather than copying per se. If it is impossible to search the entire web, we lose this wonderful resource, and prohibiting intermediate copying will harm public access to information. The fact that Google would have the ability to disseminate infringing copies might not mean that it should be prohibited from using those copies.
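For readers unfamiliar with the robots.txt opt-out mentioned above, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, and the helper name `may_index` are made up for illustration; the sketch shows only how a well-behaved crawler could check a publisher's opt-out before copying a page, not how Google's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is asked to stay out of /private/,
# and every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_index(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in `robots_txt` let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_index(ROBOTS_TXT, "Googlebot", "http://example.com/private/page.html"))
    print(may_index(ROBOTS_TXT, "Googlebot", "http://example.com/public/page.html"))
```

Note that the default here is the reverse of the permission model the publishers argue for: everything may be copied and indexed unless the site owner says otherwise.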
The NY Public Library will hold a live panel discussion, The Battle Over Books: Authors & Publishers Take on the Google Print Library Project, with Allan Adler (Association of American Publishers), Chris Anderson (Wired Magazine), David Drummond (Google), Paul LeClerc & David Ferriero (The New York Public Library), Lawrence Lessig (Stanford Law School), and Nick Taylor (The Authors Guild). I will liveblog this, if possible.
Pat Schroeder and Bob Barr wrote an op-ed piece in the Washington Times stressing the rights of authors: Reining in Google: "Not only is Google trying to rewrite copyright law, it is also crushing creativity."
In Forbes, Nick Schulz defends Google: Don't Fear Google: "The way the current copyright law works, I can take a book out from any library, read it and write a review of it for publication on the Web site I edit or in the pages of Forbes.com or anywhere else. This “fair use” of material involves no copyright violation. Readers benefit from learning a bit about the book, authors and publishers benefit from increased exposure."
Copyright treatise author Raymond Nimmer thinks that the Google project is very different from the Arriba Soft case and that Google's use is not fair use, based mainly on the fact that it is a commercial enterprise: Google Lawsuit Begins; Fair Use: "On the one hand, this large company desires to make a massive number of copies of other persons’ property for its own profit. On the other hand, the authors and publishers that own the property rights have been given exclusive rights to copy or distribute copies of their works as part of a statutory scheme that intends to provide authors with incentive to create new works."
Another treatise author, William Patry, prefers to apply a market substitute test for fair use: Google Revisited: "So in the Google project, why should we care if there are server copies? The purposes for the copies in connection with the Print Library project is to give people access to knowledge about the existence of the book as well as a tiny amount of text. That is of great help to researchers and hopefully to authors and publishers of the books too. It in no way harms copyright owners unless the project becomes something else, namely a full-text service which then is a market substitute."
I tend to think that this is the core analysis of fair use: if the use is a market substitute for the original work, it is probably not a fair use.
Jason Schultz was quoted in a segment on NPR's California Report: Google Lawsuits over Images, Books.
In Salon.com, Farhad Manjoo has an excellent piece that summarizes the implications of these cases: Throwing Google at the book: "A year later, Google's grand plan to digitize the world's books still seems as fantastical as it did when it was first proposed. Earlier this year, the company started scanning books at libraries, and on Nov. 3 launched an elegant beta version of its book search engine -- but the project faces an uncertain future."
On a tangentially related note, Eric Goldman discusses a different search engine indexing case: Newborn v. Yahoo: "In this case, a web publisher sued Google and Yahoo for contributory copyright and contributory trademark infringement based (apparently) on their indexing the publisher's press releases. I say "apparently" because the plaintiff was unable to articulate a legal complaint or a statement of facts that the judge could understand. Because of the defects in the complaint, the judge granted a motion to dismiss with prejudice, ending the case before it started."
More links and commentary follow in the extended entry.