Tuesday, February 07, 2006

Book search projects

As many readers of these blogs know, the company that owns the blogspot space is involved in a controversy over its plans to offer total book searching: to store most published books on its servers, include them in search results, and offer royalties to publishers and authors when books are sold. The company did not plan to offer entire books or large segments of books for free browsing beyond what is normally called "fair use." The plan always offered an opt-out feature for any publisher or author. Amazon.com has a partially similar capability with its "search inside the book" feature.

This sounds like a win-win for authors, who presumably would see increased sales, newcomers especially. Why, then, has there been an outcry, and even a lawsuit by the Authors Guild seeking injunctive relief?

Some of the reason may be practical. Some publishers and writers (myself included) offer their books for free online browsing. This may be all right for books with long narratives (especially fiction or biography) that cannot easily be read without eventually purchasing a hard copy for the beach (or perhaps an e-book for an e-book reader, a technology from the late 1990s, including Softlock, that does not seem to be taking off). Sales of books with a reference aspect, however, such as cookbooks or "how-to" books, might be harmed by such easy online availability.

But the objections seem to be more fundamental. If you go into a Kinko's store, you probably won't be allowed to copy an entire book, or even any pictures, because of copyright concerns, even for your own use. (In economic terms it would probably be more practical to buy a copy of the book anyway, unless it was out of print.) There has always been a legal expectation that authors and publishers may withhold consent for large amounts of copying, even for personal use. (There are sometimes exceptions for teachers, librarians, or other school use.) Established authors, especially those who make a living through writing and who normally expect advances for future work, simply do not want to yield any turf on this. Too bad. Okay, maybe they really think that people will waste time, toner, or inkjet ink printing out books on home or office printers.

Truth to tell, there may even be a more subtle concern with some authors. Most non-fiction issue-oriented books, particularly those oriented toward current events, run the risk of embarrassing or exposing certain parties. That may in fact be part of the author's intention, and it may not seem out of line when the visitor has to go to the trouble and expense of purchasing a whole hard-copy book. When a "book search" makes it possible to find a particular party mentioned in a book with essentially no effort, and essentially forever, the author may face a greater practical risk of legal complaints. All of this is very unsettled, as search engines have created new practical perils for "controversial" people that were not significant before. Some publishers, when they display portions of book text online for browsing, exclude robots from harvesting the book content. Many publishers give authors the option to opt in or opt out of submitting their books to complete search programs, which may or may not include the ability to browse the entire book online.

That means that this book cataloging (essentially an electronic "Library of Congress") may have to start out modestly, with historical archives and works in the public domain. Much of this material (Shakespeare, for instance) is already easy to search online. But it would be good, from a Freedom of Information Act point of view, to get more government material and national archives into the search engines. Some of that material affected me earlier in life.

A couple of other points: companies that want to offer internal book searching, or publishers (including cooperative publishers) that allow free public browsing of online copies, ought to have considered becoming plaintiffs challenging COPA (see below). Furthermore, some search engines offer visitors cached copies of documents, often converted from other formats (particularly PDF) to HTML. Theoretically, these might be viewed as infringements if made without permission, although in practice they draw few objections. However, when a document needs to be removed or changed for some particular reason, it often takes a while before the cached copy is removed or updated, particularly if the item is deleted so that the link becomes orphaned.
