DATABASE PROTECTION AT THE CROSSROADS: RECENT DEVELOPMENTS AND THEIR IMPACT ON SCIENCE AND TECHNOLOGY

By J. H. Reichman and Paul F. Uhlir

ABSTRACT

This article explores the potentially adverse impact that the emerging legal infrastructure could have on scientific, technical, and educational users of factual data and information (as well as on other sectors of the information economy) unless suitable adjustments are made. It begins by explaining how efforts to accommodate the networked environment to the publishers' fears of market failure will impose a daunting array of legal and contractual restraints on the ability of scientists and engineers to access factual data and information in the near future. It then goes on to examine the most recent efforts to devise a sui generis intellectual property right in noncopyrightable collections of data that would suitably balance public and private interests. It also emphasizes the need to reconcile legal protection of databases with fundamental constitutional mandates concerning free speech and the progress of science. The article concludes with a warning that overly protective initiatives could compromise the research-based institutions that currently ensure the technological predominance of U.S. industry in the global marketplace.

TABLE OF CONTENTS

I. COMMODIFICATION OF DATA IN THE NETWORKED ENVIRONMENT: THE BIGGER PICTURE

II. POTENTIAL IMPACT OF THE DATABASE PROTECTION LAWS ON SCIENCE AND TECHNOLOGY

A. The User-Friendly Rules of Copyright Law

B. Unbalanced Rules of the Sui Generis Model

1. Basic Substantive Principles

2. The Resulting Legal and Practical Constraints

C. Long-term Implications of the Sui Generis Model

1. Reversing the Transparency Movement

2. Transaction Costs Unlimited

3. Endless Monopolies and Diminished Access to Government Data

4. Gaming the Cooperative Ethos

III. RECENT DEVELOPMENTS: THE QUEST FOR AN APPROPRIATE UNFAIR COMPETITION APPROACH

A. The Administration's Position

B. A Negotiated Discussion Draft in the Senate

1. Clarifying the Demands on Scientific and Technical Users

2. Compromise Proposals

C. Uncertain Future of the Database Protection Law

IV. PRESERVING THE CONSTITUTIONAL BALANCE OF INTERESTS IN THE NETWORKED ENVIRONMENT

A. The Competitive Ethos Under Attack

B. The Constitutional Dilemma

C. Erring on the Side of Caution

INTRODUCTION

The convergence of digital computing and telecommunications technologies has greatly expanded the already bright economic prospects for information goods of all kinds, but it has also unsettled the legal architecture on which the free market economies have previously been grounded.1 Information products behave differently from the tangible, physical products of the Industrial Revolution;2 and the legal paradigms that we have applied to balance incentives to create against both public good uses of information and the discipline of free competition are stretched past the breaking point.3 We are thus challenged to rethink how best to structure competition for information goods in the emerging, worldwide information economy.4

The technological convergence that creates promising new markets for information goods also opens new opportunities for scientific and educational uses of data and information. However, a powerful movement to commodify data and information previously treated as a public good (that is, as an inexhaustible, indivisible, and ubiquitous component of the public domain5) could limit the ability of the scientific, technical, and educational communities to capitalize on such opportunities. The momentum generated by that movement would eventually have confronted these communities with serious challenges even in the absence of a new intellectual property right in collections of data. The adoption of a strong property right in noncopyrightable collections of data by the European Union6 (in a haphazard manner, with little serious economic or empirical investigation7) thus precipitated a crisis that was already well under way.

This article explores the potentially adverse impact that the emerging legal infrastructure could have on scientific, technical and educational users of factual data and information (as well as on other sectors of the information economy) unless suitable adjustments are made. Parts I and II explain how efforts to accommodate the networked environment to the publishers' and database makers' fears of market failure will impose a daunting array of legal and contractual restraints on the ability of scientists and engineers to access factual data and information in the near future. Part III examines the most recent efforts to devise a sui generis intellectual property right in noncopyrightable collections of data that would suitably balance public and private interests. Part IV emphasizes the need to reconcile legal protection of databases with fundamental constitutional mandates concerning free speech and the progress of science. It ends with a warning that overly protective initiatives could compromise the research-based institutions that currently ensure the technological predominance of U.S. industry in the global marketplace.

I. COMMODIFICATION OF DATA IN THE NETWORKED ENVIRONMENT: THE BIGGER PICTURE

Digital telecommunications networks enable publishers to control the uses of information goods directly by contract, without relying on state action to avoid market failure,8 for the first time since the advent of the Gutenberg printing press. In effect, online delivery has "restored the power of the two-party deal" with regard to information goods and diminished the dependence of publishers on artificial legal fences that copyright laws and other related rights supplied in the print environment.9

Efforts to accommodate the pre-existing legal landscape to the new technologies are proceeding along several different fronts. For example, because the new technologies empower publishers to fence off information goods by means of encryption devices and other technical protection measures,10 Congress has been persuaded to pass new laws making it a civil or criminal offense to disarm or tamper with these devices.11 Would-be users must increasingly gain access to information goods via an electronic gateway where they are obliged to identify themselves and acknowledge the rights of the gatekeeper to the information goods, as, for example, expressed in copyright management information systems.12 The new laws that defend the owner's encryption devices also forbid users from tampering with their intellectual property identity tags.13

At the same time, the National Conference of Commissioners on Uniform State Laws has proposed a model contract law to govern computerized information transactions that all state legislatures would eventually adopt. These proposals, until recently embodied in a draft Article 2B of the Uniform Commercial Code ("U.C.C."),14 would validate the publishers' standard form, non-negotiable contracts to which would-be users must assent in order to cross the electronic threshold and gain access to information delivered online.15 As matters stand, and despite mounting criticism from intellectual property scholars, the proposed model law would validate even "click on" or shrinkwrap licenses that ignored or attempted to override public interest exceptions that favored users or competitors, including the technical and scientific communities, under the pre-existing legal infrastructure.16 For example, such contracts could override the right to make non-infringing uses of copyrighted works or the right to reverse-engineer subpatentable innovation,17 and they could require payment for uses that courts had previously deemed fair uses under the federal copyright law.18

A third line of attack is to devise new intellectual property rights that, among other things, would serve to reduce potential tensions between state contract laws and the federal intellectual property system. As we shall see, copyright law expressly permits many uses of copyrighted works that publishers would like to restrict by means of online licenses.19 The validity of such contracts may be questioned on the grounds that they disrupt the federal intellectual property system (preemption arguments) or overstep the constitutional guarantees of free thought and expression.20

However, if Congress enacted a hybrid ("sui generis") intellectual property right to protect the contents of databases, like that adopted by the European Union, it would give legislative approval to forms of protection that were previously unknown or questionable under traditional intellectual property law.21 The creation of new intellectual property rights in collections of data would thus make it harder to sustain arguments that publishers who subject online delivery of databases to technical protection measures and to contracts of adhesion that limit previously legal uses had violated fundamental public policies derived from the copyright laws. In other words, the database protection laws seem to permit acts (and foster policies) that overtly contradict or override the limits previously established by copyright and other traditional legal models.22

Taken together, the ability of publishers to combine technical protection measures with tailor-made contract laws and hybrid intellectual property rights is supposed to stimulate investment in online commerce and to foster overall economic development.23 Critics fear, however, that the cumulative effect of these separate but well-coordinated legal initiatives will be to balkanize the information economy and to unduly restrict the use of unbundled information as raw materials of science and technology or as inputs into the production of value-adding or second generation information goods.24

II. POTENTIAL IMPACT OF THE DATABASE PROTECTION LAWS ON SCIENCE AND TECHNOLOGY

Let us suppose that a scientist or engineer lawfully obtained a printed copy of a chemical handbook or of a scientific article, with appended data, that was published in a peer-reviewed journal. These works currently attract copyright protection, and we shall assume that they meet the eligibility criteria of that body of law.25

A. The User-Friendly Rules of Copyright Law

The rules of copyright law constitute a balanced regime of public and private interests. In retrospect, we are struck by the friendly treatment this body of law gives to users and competitors alike, notwithstanding the powerful bundle of exclusive rights it vests in authors and artists in order to stimulate the production and dissemination of creative works.26

For example, any scientist or engineer who lawfully obtained the book or article mentioned above could immediately re-use all the data and all the ideas disclosed in them because copyright law does not protect ideas or data,27 nor does it protect against use of expression as such, but only against certain specified uses.28 Indeed, another scientist or engineer could independently rewrite his or her own version of the same article and disseminate it because copyright law allows independent creation, and all the unprotected data are spread out before the second comer's eyes.29

A second scientist or engineer who needed to duplicate even the first author's creative selection and arrangement of data (if any) for non-profit research purposes could normally fall back upon the "fair use" provisions of current law.30 A later researcher could also produce a follow-on article or book that borrowed the originator's unprotected factual information and data, but not his or her stylistic expression.31 To be sure, the norms of world copyright law (but not necessarily U.S. law) favor attribution in such a case, as do the ethics of science.32 But plagiarism is not the same as copyright infringement; the reuse of facts and data is clearly permitted in copyright law; and another author's popularized version of a prior researcher's factual findings remains perfectly legal.33

Most important, later scientists could combine the published data and factual information with other data and information into a multiple or complex interdisciplinary database without permission or additional payment to the originators.34 This follows in part because only ineligible matter is at issue and in part because copyright law does not prohibit use as such, but only certain uses, such as reproduction or adaptation of protected expression, and it is also buttressed by the doctrine of fair use.35

Even if scientists, engineers, or educators made classroom use of the protected expression for nonprofit purposes, these uses might well be fair or privileged uses under U.S. copyright law36 and would possibly become subject to compulsory licensing under E.U. copyright laws.37 Finally, having once purchased the book or the article, a scientist or engineer could sell it, lend it, or give it to others (first sale doctrine),38 borrow it from a library,39 use it as often as he or she liked for virtually any purpose, and make photocopies of it for scientific purposes under the fair use doctrine of U.S. law40 or the private use doctrine of E.U. law.41

B. Unbalanced Rules of the Sui Generis Model

Now, let us suppose that the contents of the same chemical handbook or of the aforementioned scientific article were disseminated online and surrounded by technical fences as previously described. Suppose further that the contents of the book or article were protected by laws implementing the E.U.'s sui generis exclusive property right in noncopyrightable collections of data or by the U.S. version of that right, as set out in H.R. 2652, the Collections of Information Antipiracy Act (March 1998).42 The House of Representatives adopted H.R. 2652, and the Subcommittee on Courts and Intellectual Property then attached it to the Digital Millennium Copyright Act, which became H.R. 2281, as sent to the Senate.43 The database portion was dropped prior to Congressional enactment of that bill, however, and it was reintroduced with some modifications as H.R. 354 in January 1999.44

1. Basic Substantive Principles

The sui generis provisions of the E.U. Directive45 protect the contents of any noncopyrightable database that is the product of substantial investment against extraction or reutilization of the whole or of any substantial part (evaluated quantitatively or qualitatively).46 Hence, this law could protect the noncopyrightable data appended to the hypothetical article in question or collected in the handbook, which the publishers might eventually disseminate online, with or without an accompanying print version.47

Such protection lasts as long as new investments are made in updates or maintenance; hence perpetual protection of dynamic databases becomes a likely result, despite a nominal fifteen-year term.48 There are no exceptions for "reutilization" by scientific and educational bodies, and there are no mandatory exceptions for "extraction" for scientific and educational purposes (although states may adopt this exception for noncommercial purposes).49 The member states implementing the Directive must permit extraction or use of an insubstantial part of a protected database.50 However, the risks of invoking even this exception are high, because a would-be user has no way of knowing in advance whether a court will later find that the amount used was in fact qualitatively or quantitatively insubstantial.51

U.S. bill H.R. 2652, later H.R. 2281 as part of the Digital Millennium Copyright Act, used different language to accomplish essentially the same result. It protected against use or extraction in commerce of all or a substantial part of a protected collection of information that is the product of substantial investment if such use or extraction would "cause harm to the actual or potential market" for a product or service that incorporated the collection.52 For this purpose, the term "collection of information" was very broadly defined,53 and during face-to-face negotiation in the Senate (in which the authors of this article participated directly),54 the publishers claimed that a single lost sale would fit within this standard of harm to the market. Any substantial new investment in updates or maintenance would prolong protection beyond fifteen years, with no limit to the number of renewals.55 The bill initially recognized no exceptions for science and education as such, but, at the last minute, a provision tacked onto H.R. 2281 held scientists and educators liable only for harm to "actual markets," and not for harm to "potential markets" for their nonprofit uses of protected information.56

The situation was further complicated in January, 1999, when Chairman Howard Coble of the House Committee on the Judiciary's subcommittee dealing with intellectual property rights introduced a new version of the database protection bill, H.R. 354,57 which modified the previous bill in at least two important respects. First, a serious effort was made to limit the term of protection to fifteen years, with little or no possibility of extension even in dynamic databases that are continuously updated.58 Second, a new provision established an exemption for "additional, reasonable uses" by educational, scientific and research organizations,59 which was loosely based on the "fair use" provisions of copyright law. However, this ambiguous provision would limit the proposed exception to "an individual act of use or extraction of information done for" specified purposes,60 which apparently placed the burden of proof on the otherwise infringing researcher.61

Because one can read the new exception for scientific and educational uses set out in H.R. 354 broadly or narrowly, depending on how one interprets its latent ambiguities, it is instructive to assess the likely impact of the proposed legislation on science and technology as it stood at the end of 1998. We can then factor the proposed amendments into the analysis and compare them to certain promising proposals that emerged from face-to-face negotiations between stakeholders, held in the Senate under Senator Hatch's auspices in late summer of 1998.62 Accordingly, if Congress had adopted H.R. 2281 at the end of 1998, and that law had subsequently been applied to online delivery of the data contained in the book or article that were previously discussed in connection with the workings of copyright law,63 the following results would have been likely to occur.

2. The Resulting Legal and Practical Constraints

In principle, a second scientist or engineer could not make any uses of the information or data that were not permitted by the form-contract site licenses that regulated access to the online database from which they were extracted.64 The site license could charge one price for accessing or consulting the database, a second price for downloading it, and a third price for using it or reusing it in other contexts.65

Even though the second scientist or engineer normally would have paid to access the data and information (and they are not copyrightable by definition), he or she could not use them in ways not permitted by the terms and conditions of that site license, which, in turn, would now be supported by a duly enacted federal intellectual property law.66 Absent some constitutional override, the second comer could not, therefore, independently generate a similar article or study based on the same material without permission, even though the relevant data were now revealed to the public.67 Because the data no longer entered the public domain,68 he or she would need to obtain a new grant or substitute funding to repeat the collection process, in which case scarce funds would have been used to duplicate the creation of knowledge already in existence. This, of course, contradicts the norms of science, which favor building on previous discoveries and the sharing of research results.69

In many instances, the data will be based on one-time events that later scientists and engineers could not physically regenerate, in order to fall within the permitted acts of independent creation under the database protection laws.70 Even when regeneration remained feasible, the cost in relation to the niche market of likely users would normally be so high that few second comers would willingly regenerate the data.71 Hence, sole-source providers are likely to remain a dominant feature of the database landscape, real competition will continue to be the exception, and the strong property rights given database proprietors would potentiate existing barriers to entry.72

Later scientists and engineers could not combine data legitimately accessed from one commercial database with data extracted from other databases to make a complex new database for addressing hard problems without obtaining additional licenses and permissions. This remains, perhaps, the single most critical problem for scientific and technical research.73 Despite reassurances to the contrary from leaders of the international publishing community to leaders of the scientific community at a recent meeting in Paris,74 lawyers representing publishers at face-to-face negotiations held in the Senate late in 1998 continued to insist that this customary and traditional scientific practice would, in principle, violate their redistribution rights.75 Another critical factor is that there would never be a sale that exhausted the publisher's rights, only a license, which the proposed model laws of computerized information transactions would make perpetual.76

No one could combine "substantial" amounts of data or information into a more efficient follow-on product without a license; the licensor would labor under no duty to grant such a license; and the sole-source provider would not want any competition from follow-on products.77 This also suggests, however, that the price would not be set so high as to encourage independent creation of the same data, when otherwise feasible. If so, and potential producers of follow-on products tended to invest in other activities, it would further discourage competition and innovation.78

So long as natural and artificial barriers to entry remained high, scientists and engineers would have to pay artificially high prices to access commercial databases in the absence of competition. The enactment of strong exclusive property rights (complemented by strengthened contractual rights if the proposed model law were also adopted79) thus seems likely to reinforce the pervasive sole-source character of the marketplace and exert further upward pressure on prices.80

Meanwhile, scientists and engineers who paid to access protected databases could not routinely lessen overall transaction costs by lending, borrowing, or transferring the data they extracted to others working on a common problem. This follows because there would never be a sale or transfer under some equivalent of the "first sale" doctrine of the copyright (and patent) laws,81 only a license that would logically restrict further transfers without any time limit.82 Scientists and engineers who continued to share data once acquired without obtaining permission and without paying additional fees for such heretofore traditional or customary uses would "harm the market" that the database proprietor presumably secured by dint of the proposed legislation.83

The data would not enter the public domain for at least fifteen years, and possibly never, if the private party were to continue to invest in maintenance or updates of a dynamic database.84 Even data that nominally entered the public domain at expiry of the fifteen-year term could remain unavailable in practice if would-be users lacked means to identify and isolate those data within the larger mix of protected and unprotected data comprising a dynamic collection.85 If such data were rendered technically identifiable, nothing would prevent the proprietor from using electronic fencing devices and standard form contracts to further preclude extraction even after the intellectual property right had expired.86

Moreover, unless proper precautions are taken, there is considerable risk that data generated or funded by the U.S. government would become privatized in ways that unduly restricted access on onerous terms and conditions.87 If this were allowed to happen, taxpayer-financed data would be sold back to science and education at monopoly prices, with the likelihood that additional state subsidies would be needed to defray the costs. In the European Union, where governments intend to commercialize publicly funded data, insufficient thought has been given to this problem in general and to the impact on science and technology in particular.88

A common thread uniting all the foregoing observations is the lack of any limits on the power of providers who benefit from legal protection of databases to impose any licensing terms or conditions they wish on access to, and use of, their products. In principle, the database provider could override by contract even the few exceptions and limitations contained in the bill, including the public's right to use insubstantial parts of a database.89

The net result, as Professors Reichman and Samuelson pointed out in an earlier article, is that, under the U.S. database proposals, as under the E.U. Directive,

the most borderline and suspect of all the objects of protection ever to enter the universe of intellectual property discourse -- raw data, scientific or otherwise -- paradoxically obtains the strongest scope of protection available from any intellectual property regime except, perhaps, for the classical patent paradigm itself.90

When the provisions added to the latest bill, H.R. 354,91 are factored into the analysis, the end result is only slightly improved, at least in appearance if not in practice.

The first significant change mentioned above, which would more clearly inject protected data into the public domain after fifteen years,92 is of course a move in the right direction. However, the drafters still ignore the difficulties of identifying and accessing data whose term of protection had technically expired, an issue that was widely discussed last year.93 The bill also ignores the power of database providers to override formal access to data that nominally entered the public domain by combining adhesion contracts with electronic fencing devices.94

The second major change is a good faith effort to address some of the concerns of the scientific and educational communities by means of new, "fair-use-like" provisions.95 However, these provisions are both ambiguous and too narrowly drawn.96 By placing the burden of proof on scientists and engineers, whose "individual acts" of "reasonable" use remain subject to scrutiny case by case,97 they would continue to exert the chilling effect on research98 that seems inherent in any "fair use" approach to a database law that does not otherwise provide the many other safeguards familiar from copyright law. Hence, as we explain below, a different kind of approach, one not strictly linked to the "fair use" concept, will be needed to ensure that a sui generis database regime does not harm customary and traditional scientific activities.99

Even if a satisfactory legal formula to avoid harm to science and education were found, that formula would remain largely ineffective if database providers could simply override it by contract or, in the alternative, if the publishers could just charge more for access if they knew that the state would require them to charge less for extractions and reuse by scientific and educational bodies.100 In short, unless the bill expressly and adequately immunizes traditional scientific and technical pursuits, the only limit on the database providers in most instances is what a monopoly market will bear.

C. Long-term Implications of the Sui Generis Model

We believe that the long-term implications of the proposed regime are potentially very damaging for science and technology. All science operates on databases. The near-complete digitization of data collection, manipulation, and dissemination over the past thirty years has ushered in what many regard as the transparency revolution.101 Every aspect of the natural world, from the nano-scale to the macro-scale, all human activities, and indeed every life form, can now be observed and captured as an electronic database.

According to Nobel laureate Joshua Lederberg,

[d]ata are the building blocks of knowledge and the seeds of discovery. They challenge us to develop new concepts, theories, and models to make sense of the patterns we see in them. They provide the quantitative basis for testing and confirming theories and for translating new discoveries into useful applications for the benefit of society. They also are the foundation of sensible public policy in our democracy. The assembled record of scientific data and resulting information is both a history of events in the natural world and a record of human accomplishment.102

1. Reversing the Transparency Movement

Science builds on science. In all areas of research, the collection of data sets is not an end in itself, but rather a means to an end, the first step in the creation of new information, knowledge, and understanding. As part of that process, the original databases are continually refined and recombined to create new databases and new insights. Typically, each level of processing adds value to an original (raw) data set by summarizing the original product, synthesizing a new product, or providing an interpretation of the original data.103

The processing of data leads to a paradox that is not readily apparent. The original unprocessed, or minimally processed, data are usually the most difficult to understand or to use by anyone other than the expert primary user. With every successive level of processing, the data tend to become more understandable and frequently are better documented for the nonexpert user. As the data become more highly processed, documented, and formatted for easier use, they also are more likely to attract copyright protection.104

Yet, it is the raw, noncopyrightable data that are typically of greatest use and value to researchers, who can manipulate and experiment with the original measurements in pursuit of their own research goals. If strong intellectual property protection of noncopyrightable data sets, which previously had the least commercial marketability, weakened the still nascent impetus toward transparency, it could disproportionately affect the availability of data most commonly used in basic research and higher education.

2. Transaction Costs Unlimited

The success of the U.S. basic research and educational system is predicated on relatively unfettered access to and use of factual information; on a robust public domain for data; and on easy re-use, recompilation, and value-adding applications of data. Practically all databases developed in the pursuit of basic research and education are motivated by non-economic incentives such as the desire to create knowledge, the thrill of discovery, and the enhancement of professional status.105 The new database laws, however, place an overriding emphasis on protecting original investments and on augmenting purportedly necessary economic incentives to create new databases. At the same time, they undervalue the adverse effects on scientific and technical progress, as well as the aggregate economic and social costs inherent in restricting and discouraging the downstream applications and transformative uses of noncopyrightable databases in general.106

The lack of any restraints on licensing, especially on sole-source data providers, adds to the dangers inherent in the creation of a strong exclusive property right in collections of data.107 In particular, the ability of data providers to override by contract even the limited exceptions that the new law may grant to public-interest users, including scientists, engineers, and educators, is of great concern. Without a concomitant duty to deal fairly and reasonably with public-interest users, these combined powers could lead to high prices for data and to the imposition of harsh and oppressive terms concerning both access and subsequent uses of data that would especially disadvantage academic researchers.108

Moreover, scientists and engineers will have to defray increased transactional and administrative costs engendered by the need to enforce the different legal restrictions on newly obtained data and to institute new administrative guidelines regulating institutional acquisitions and uses of such data, as well as by associated legal fees. Because universities and government agencies are inherently conservative, risk-averse institutions, they will err on the side of caution and place additional limits on what researchers and educators can do in acquiring and using data in order to avoid the possibility of costly litigation.109

The proposed database law would severely discourage the re-use, recompilation, and other value-adding uses of data. Any time someone uses data in a "collection of information" protected by the proposed law, that user becomes exposed to claims that he or she will have harmed the database originator's actual or potential markets.110 As a practical matter, this means that once public-domain data are collected and used for one purpose, such as to prepare a compilation of poisons and antidotes, there will be a strong disincentive to use the same data for other purposes lest those uses violate the "harm to other markets" principle. By the same token, database recompilers or value adders incur the risk of lawsuits for infringement every time their new database resembles some pre-existing database, whether those data were used or not.111

One of the most serious problems of all is the risk of inhibiting the creation and exploitation of multiple-source data products, which have become the scientific method of choice for addressing hard new problems. Because research is increasingly conducted by teams, often operating from different institutions, the pertinent data "are drawn from multiple sources, recombined and merged with new data to produce data sets that may lead to new and unanticipated findings."112 As Joshua Lederberg testified at a hearing on H.R. 354, the "recent advent of digital technologies for collecting, processing, storing, and transmitting data has led to an exponential increase in the size and number of databases created and used. A hallmark trait of modern research is to obtain and use dozens or even hundreds of databases, extracting and merging portions of each to create new databases and new sources for knowledge and innovation."113

In this regard, the Administration itself predicted that, under the current proposals, scientists and engineers would face rising transaction costs when attempting to create complex databases from multiple public and private sources. Also predicted are higher costs due to the burdens of administering national data centers and of carrying out related, large-scale management activities that currently benefit from the policy of open and unrestricted access to scientific and technical data.114

3. Endless Monopolies and Diminished Access to Government Data

Because many data providers are sole-source and an exclusive property right would greatly strengthen the legal and economic protection of these mini-monopolies, the proposed legislation seems likely to raise the costs of data acquisitions to researchers and educators generally, not to mention other consumers. Those costs would either be passed on to the government and the taxpayer through increased research contract and grant requests, or they would simply diminish the resources available to research and education. If the costs and restrictions on all downstream or transformative data users-whether in the public or private sector-similarly increased (as feared by database proprietors opposed to strong protection), it would discourage socially and economically beneficial forms of exploiting factual data that have up to now been available from the public domain.115

The fifteen year term (which is potentially much longer because of loopholes favoring constantly updated, dynamic databases) is particularly likely to hamper the progress of science and technology, a prospect that troubles the Administration, too.116 Such long delays in unfettered access to and use of data will undermine the value of many data sets for most fields of research, including research pertaining to the formulation of government policy; and in other cases it will effectively remove them from comparative analysis with other, openly available, concurrent data sets.117

A fifteen year period appears completely arbitrary and has not been seriously compared with other, potentially shorter, periods of protection.118 The proposed legislation thus defeats a primary, constitutionally mandated purpose of intellectual property laws, which is to establish a public domain that "promotes science and the useful arts,"119 from which researchers, educators, and other downstream users can build on previous contributions to further knowledge.

The proponents of the legislation say that nothing prevents a user or competitor from independently creating an equivalent database.120 But many databases cannot be recreated from scratch. Data that are time-sensitive, unique, very old, or prohibitively expensive fit this description. In research, this includes virtually all observational data sets of transient natural phenomena, as well as data from very costly or labor-intensive experiments. Furthermore, a basic underlying principle in research and education is that the creation of new knowledge should build on the base of existing data and information, and that scientists and engineers should not have to duplicate previous factual compilations or discoveries in socially and economically inefficient ways.121 Protection of investments in factual databases is not the only interest that the law should seek to protect in this area.

Public interest users in the United States are likewise concerned about the applicability of the E.U. Database Directive to government data and the potential restrictions on access to and use of European public-sector data. Moreover, even though the proposed U.S. legislation does expressly exempt government data from its scope of protection, there are concerns that, as drafted, this exemption could be circumvented in several ways.122

This can occur, for example, if the contractors or grantees are not expressly required either to provide their data back to the government for public dissemination, or to make the data publicly available themselves under appropriate terms and conditions.123 Absent such universal vigilance by the government, a lot of data produced as a direct result of public funding could end up under proprietary control of researchers or their institutions. Because most of the noncopyrightable databases generated with government funding in the United States are actually created by non-government employees, whether in academia or industry, the failure of government agencies to enforce this exemption could have a far-reaching impact on the full and open availability of publicly funded data. Indeed, there is some risk that government agencies could increasingly view database protection as an income-generating opportunity, like their European counterparts.124 As more university research is funded by private sources, more data will likely be removed from the public domain in the form of income-producing products.

Still other legislation, if combined with increased database protection, could further limit the principle of full and open access to government data. For example, the Commercial Space Act of 1998 encourages NASA to purchase space and earth science data collection and dissemination services from the private sector and to treat data as commercial commodities under federal procurement regulations.125 If such policies are coupled with strong protectionistic measures, such as those contemplated by H.R. 354, we could eventually witness the passing of substantial amounts of data from the public domain of entire federal agencies. It also remains unclear whether, if the government concludes an arrangement with a private sector party to disseminate public data or information, there will be adequate safeguards that either promote competition or require low-cost access for public-interest users.126

4. Gaming the Cooperative Ethos

Finally, a high-protectionist regime tends to undermine scientific and technical cooperation over time and to exert a progressive chilling effect on data-intensive research. As scientists, engineers, and their employing institutions become more accustomed to a new legal regime that encourages the commercial exploitation of their own research data sets, the cooperative culture that has become the hallmark of so many fields of science will be threatened.127 Universities have already indicated that they intend to commercially exploit databases, and they have obtained an exemption for state universities from the government data exception in the proposed legislation.128 If scientific institutions in one segment of the research community try to commercially exploit their colleagues in other institutions or countries, still others will be tempted either to emulate such behavior or to cut off cooperation. Either way, science and technology would suffer.

Even if scientific data exchanges in established cooperative research programs were allowed to continue among a select group of principal investigators and an approved class of associated researchers, it would become increasingly difficult for other researchers outside the officially sanctioned group to obtain full and open access to the program data. This result, of course, would discourage interdisciplinary research and applications, contrary to the interests of technological innovation and the advancement of knowledge.

If simple exchanges of data and access to single databases became legally threatening or prohibitively expensive, imagine the potential transactional burdens that ill-conceived laws could impose on data compilers or users who needed to integrate data from multiple, or even hundreds, of different sources. This brings us to what may well be the most profound, and insidious, impact of the proposed legal regime on science and technology: the lost opportunity costs that will be repeated thousands of times each day across the basic and applied research communities. If scientists and engineers must choose between spending a lot of administrative time and a larger percentage of their valuable research grants on acquiring data and doing other, less data-intensive work, they will increasingly opt for the second route, despite the astounding yields that have so far been harvested from data-intensive research under existing conditions.

For all these reasons, an overly protective database regime would seriously impede the use, reuse, and transformation of the factual data that are the lifeblood of science and technology. In a worst case scenario, this law would first disrupt the system of cheap access to upstream data for purposes of basic research, it would then lead to ever higher prices for the acquisition of data used in applied research, and finally, it would strangle the ability of value-adding researchers and industries to improve, transform, or develop follow-on databases and related information products.129 These outcomes would, in turn, greatly reduce the downstream applications of scientific breakthroughs subject to exclusive property rights. In sum, putting a strong property right too far upstream too soon130 could have a disastrous effect on the long-term competitiveness of the U.S. economy and would undermine a key comparative advantage this country enjoys in the high-tech sectors of the global marketplace.

III. RECENT DEVELOPMENTS: THE QUEST FOR AN APPROPRIATE UNFAIR COMPETITION APPROACH

Concern about these issues mobilized the scientific, educational and library communities to express their views both to Congress and to the World Intellectual Property Organization ("WIPO"). For example, the Presidents of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine sent several letters to the Administration and to leading members of Congress responsible for this legislation.131 The International Council for Science ("ICSU") likewise intervened at relevant meetings of WIPO, and documents submitted by ICSU have played a prominent role in successful efforts to block rapid or premature efforts to launch an international treaty regulating databases modeled on the unbalanced E.U. Directive.132 ICSU has also begun direct consultation with publishers' representatives, with a view to working out some common understanding applicable to database protection issues affecting the scientific community.

Initial efforts in 1997 and early 1998 to slow the legislative process in the U.S. House of Representatives were not successful. The concerns of the scientific community were largely ignored by the House Committee on the Judiciary's Subcommittee on Courts and Intellectual Property, which first pushed H.R. 2652 (the "Collections of Information Antipiracy Act") through the House in the spring of 1998 and then had that bill attached to the Digital Millennium Copyright Act, H.R. 2281, in July 1998.133

A. The Administration's Position

More recent developments, however, have been favorable to the interests of the scientific and educational communities. To begin with, an interagency review initiated within the Administration in the spring of 1998 produced a series of important position papers that supported the theses that the scientific community had already put forward. On August 4, 1998, the General Counsel of the U.S. Department of Commerce, Andrew Pincus, wrote Senator Patrick Leahy, Ranking Minority Member of the Senate Committee on the Judiciary, to advise that any new database legislation must avoid capture by private parties of government data and that "any effects [it may have] on non-commercial research should be de minimis."134 This letter was sent as a consensus position of all departments and agencies of the Administration.

Consistent with these views, the Administration expressed concerns about possible

increase[d] transaction costs in data use, particularly where larger collections integrate data sets originating from different parties or where different parties have added value to a collection through separate contributions.... This is especially important for large-scale data management activities, where public investment has leveraged contributions from the private and non-profit sectors.135

The letter went on to express further concerns "that the ... exception for noncommercial research and educational uses does not ensure that legitimate non-commercial research and educational activities are not disrupted by the prohibition against commercial misappropriation" and that sole-source providers might unduly burden "access and use" by this sector.136 The Administration reiterated most of these same concerns in testimony before the House Subcommittee on Courts and Intellectual Property, at hearings concerning H.R. 354 on March 18, 1999.137

The Administration's initial letter of August 4, 1998, also referenced the Department of Justice's "serious constitutional concerns that the First Amendment restricts Congress's ability to enact legislation" of this kind, and that other constitutional obstacles would have to be overcome.138 These constitutional impediments were elaborated in a 26-page memorandum by the Legal Counsel of the U.S. Department of Justice, dated July 28, 1998, which detailed a serious indictment of the sui generis model then pending before Congress.139

On September 28, 1998, moreover, the Chair of the Federal Trade Commission ("FTC"), Robert Pitofsky, wrote the Chair of the House Committee on Commerce, Tom Bliley, to express additional concerns about the pending database legislation.140 In particular, the FTC found that "certain provisions within the proposed legislation raise concerns about possible unintended, deleterious effects on competition and innovation," and that "the potential for anti-competitive use of a 'collection of information' is substantially increased when there is only a single source for the data."141

Finally, it should be noted that the United States Patent and Trademark Office ("USPTO") held a conference on April 28, 1998, to reexamine database protection and access issues. In July 1998, it issued a Report that, while endorsing a modified form of sui generis database protection, expressed support for some of the concerns that the scientific and educational communities had been voicing.142

B. A Negotiated Discussion Draft in the Senate

In late July of 1998, the Chair of the Senate Committee on the Judiciary, Senator Orrin Hatch, invited representatives of some of the major stakeholder organizations and companies to participate in strenuous negotiations, which lasted from the beginning of August through early October. These negotiations were conducted under the leadership of Senator Hatch's Counsel for Intellectual Property, Edward Damich, and with the participation of the counterpart staffer in Senator Leahy's office, Marla Grossman.

1. Clarifying the Demands on Scientific and Technical Users

The U.S. Academies took the unusual step of participating directly in these negotiations.143 They submitted a series of alternative proposals aimed at providing a balanced piece of legislation that would protect publishers against free-riding conduct and preserve the incentive to invest (through a true unfair competition approach) without creating a strong exclusive property right in collections of data and factual information.144

Although the direct negotiations produced no major breakthroughs or compromise solutions, they did succeed in clarifying the different positions. It seems fair to say that, when exposed to direct interrogation, the publishers' detailed demands more than justified the scientific and educational communities' initial concerns.145 Indeed, in response to one hypothetical situation after another, the publishers' representatives made it clear that the exclusive property right they championed for the digital network system would in fact engender the kind of legal and contractual demands on scientific, technical, and commercial users of protected databases that critics of the proposed legislation had been fearing and that are described in this article.

Perhaps because the publishers' actual demands amply confirmed the concerns that the Administration's own position papers had expressed, the final phases of the negotiations, as mediated by the Senate staffers, produced far-reaching modifications to the database component of H.R. 2281. On January 19, 1999, Senator Orrin Hatch placed in the Congressional Record a statement acknowledging that "considerable progress" had been made during the aforementioned negotiations and that, "in the end we were close to a workable compromise."146 Senator Hatch also put in the Record "a discussion draft that is identical to the last of the discussion drafts ... [he had] offered last year."147

2. Compromise Proposals

The changes incorporated in the last Discussion Draft substantially reflected the Academies' own position. Although there is no way of knowing the degree of assent to all the various provisions it contained, it is worth reviewing the package of compromise proposals embodied in the last Hatch Database Discussion Draft of October 5, 1998.148

First, the strong property right approach was nudged closer to a true "misappropriation" (unfair competition) approach. This was accomplished by conditioning liability on acts that "cause substantial harm to the actual or neighboring market" of database proprietors,149 and by inviting courts, in the draft legislative history, to determine "substantial harm" in light of "whether the harm is such as to significantly diminish the incentive to invest in gathering, organizing, or maintaining the database."150

Second, a full exception that would immunize customary scientific activities was adopted by the Senate staffers, in place of the limited and unacceptable "fair use" approach that the Administration had eventually recommended.151 A "fair use" approach, modeled on copyright law, would fail because other basic copyright immunities and exceptions, especially the idea-expression dichotomy, would not carry over into the database protection environment. On the contrary, because a database law protects collections of facts and data that are ineligible under copyright laws (and because scientists perceive no valid distinction between "data" and a "collection of data" in a dynamic electronic database), basic research methods that were previously permissible would become infringing acts under such a law. The burden would then fall on scientists and engineers to show that a vague fair use exception should excuse some of these infringing acts from whatever test of harm was adopted.

In contrast, the Academies successfully argued that customary and traditional scientific activities should remain untouched and unhampered by any new database protection law, exactly as the government's initial position paper had maintained.152 To this end, section 1304 of the final version of the Hatch Database Discussion Draft stated the following proposition:

Nothing in this chapter shall prohibit or otherwise restrict the extraction or use of a database protected under this chapter for the following purposes:

1) for illustration, explanation, or example, comment or criticism, internal verification, or scientific or statistical analysis of the portion used or extracted; and

2) in the case of nonprofit scientific, educational, or research activities by nonprofit organizations, for similar customary or traditional purposes.153

Only if scientists, engineers or educators working at nonprofit organizations caused substantial harm to the database-maker by using unreasonable and non-customary amounts of the collection for a given purpose, or if they in fact produced a market substitute for the original, or otherwise sought to avoid paying for the use of research tools devised as such, would liability kick in.154 On this approach, the burden fell on publishers to show that scientists had crossed the line of permitted, traditional, or customary uses, which were otherwise immunized. The guiding principle that science, technology, and education should be left no worse off after enactment than they were before, as proposed by ICSU,155 would thus have been implemented.

Third, additional immunities and exceptions favoring certain instructional and library uses of databases were also defined,156 although more thought needs to be given to educational users generally in this context.157 Fourth, efforts were also made to reduce the likelihood that private interests might permanently capture government-generated data,158 although more remains to be done on this score as well.159

Fifth, a clearly-worded duration clause ending protection after fifteen years reduced (but did not altogether eliminate) the risk of perpetual protection.160 A rudimentary database deposit scheme was also proposed, which increased the likelihood of data eventually entering the public domain, albeit in a cumbersome and, perhaps, costly fashion.161 If that route were taken, more incentives would be needed to ensure that deposits were actually made. The Administration, however, favors developing other, simpler incentives to ensure the availability of public domain data, and these are worth exploring.162

Sixth, the need for some regulation of licensing terms and conditions was expressly recognized. A series of provisions required periodic studies of the misuse doctrine as applied to licensing agreements, or to the use of technological measures that might frustrate the "permitted acts" clause of the bill. Particular grounds for the study included sole-source provider contracts that imposed unreasonable terms or conditions; tying or other practices traditionally recognized as abusive; and practices shown to have "prevented access to valuable information for research, competition, or innovation purposes."163 The draft legislative history then clarified that courts were free to apply these same criteria to claims of misuse arising after the time of enactment and need not "refrain from applying the doctrine of misuse until the study is completed."164 There was some further possibility that criteria for evaluating the misuse of licensing agreements might have ultimately been codified in the operative clauses of the Act itself.

Finally, the draft legislative history to accompany these measures also clarified the definition of databases in ways that tended to exclude ordinary literary works, and it denied protection "to any ideas, facts, procedure, process, system, method of operation, concept, principle, or discovery, as distinct from the collection that is the product of investment protected by this Act."165 Needless to say, we think the proposed database law should expressly codify these provisions. Indeed, the fact that the bill's proponents oppose inclusion of such a basic limitation in H.R. 2281 only serves to reinforce our concerns about the true nature and extent of their intended exploitation of the legislation's most restrictive provisions.

C. Uncertain Future of the Database Protection Law

The foregoing discussion reveals the extent to which the Hatch Database Discussion Draft evolved away from the strong exclusive property right approach, adopted in the E.U. Directive, toward a more balanced unfair competition approach that protected publishers against piracy while consciously avoiding harm to science, education, and other public-good uses of data. Of course, not all the issues of concern to science were addressed in a fully satisfactory manner; but given the need for compromise and consensus, the ability of the staff to produce a relatively balanced bill from such unpromising material as the House bill deserves commendation.

Perhaps the biggest unaddressed issue was that of value-adding uses. The Discussion Draft did not resolve the tensions between a dominant group of database publishers, who seek to control value-adding uses of protected collections, and a dissident group of publishers and allies, who believe value-adding uses should remain as unfettered as possible.166 On this point, the Academies proposed a scheme favoring easy use of data for commercial value-adding purposes in exchange for the payment of reasonable royalties under an automatic licensing scheme;167 but neither side would accept this approach. Nevertheless, under the misappropriation approach to "substantial harm," as elaborated in the Draft Legislative History,168 courts could work out the criteria for balancing incentives to invest against incentives to compete for the short run, and these case-by-case solutions could be legislatively evaluated later on.

In the end, the Hatch Discussion Draft was not adopted mainly because time ran out in which to remove the last remaining wrinkles that prevented an agreed compromise.169 As a result, the database component of H.R. 2281, Title V, was stripped from the Digital Millennium Copyright Act, which was enacted at the end of the legislative year. Work on database protection has begun all over again under the aegis of a new Congress, which convened in January 1999.170

During the fall of 1998, some members of the coalition that had opposed H.R. 2281 drafted still another bill that sought to implement unfair competition principles more aggressively than was contemplated in the final Hatch Discussion Draft.171 There exists some support for this so-called "minimalist" unfair competition approach, which would protect databases only against wholesale duplication for an indefinite period of time.172 However, this solution could easily degenerate into a de facto exclusive property right conferring perpetual protection by the back door.

Whatever happens next, the final version of the Hatch Discussion Draft constituted a milestone along the route towards a more balanced model of database protection, and its lessons should inform the next round of legislative deliberations. There is unofficial and anecdotal evidence that the Japanese government may also embark upon a true unfair competition approach, which, if true, would afford a unique opportunity for the United States and Japan to present a united front to the rest of the world. In that event, other countries would probably move in the direction of a more balanced unfair competition regime, which might leave the E.U. alone to continue its experiment with a strong property right, or to modify its Directive so as to obtain a more balanced system of protection with fewer social costs.

Unfortunately, the scientific community will experience serious challenges to the policy of easy access to, and unrestricted uses of, data, regardless of the approach to database protection that ultimately emerges from Congress and the legislation of other countries. As pointed out at the beginning of this article, publishers can already control the dissemination of data by combining technical protection measures with adhesion contracts in the online environment, even without the adoption of specific database legislation.173 While the presence of an intellectual property right would strengthen the publishers' position and put the scientific and technical communities under grave legal disadvantages,174 the absence of an exclusive property right would not free them from the need to rethink their whole approach to maintaining the unrestricted flow of scientific and technical data in an emerging information economy.

We trust that two new studies by the National Research Council, which are examining these issues, will shed further light on the options for science when they are published in mid-1999. Meanwhile, it seems clear that the scientific and technical communities will have to consider ways of reconciling a greater degree of commercialization for databases generated within the academic community with the need to maintain privileged access to the same databases for scientific and other public interest objectives. Universities and research institutions that generate data will thus have to develop rules for disciplining grants and the uses of data obtained from grants. Separate channels for the nonprofit distribution of scientific and technical data may have to be created, with particular rules for participating organizations. Ultimately, it may also prove desirable to develop an extended licensing authority for certain classes of scientific data, in order to administer these resources with low transaction costs and uniform rules for commercial and non-commercial users.175

In general, efforts must be made to preserve the sharing ethos with respect to publicly-generated scientific data, to encourage those who invest in the production of privately-generated data to provide price discrimination in favor of the scientific and educational communities, and to develop differentiated products for the non-profit sector.176 As the Academies recently explained, Congress should strive to reconcile legitimate measures to repress parasitical copying of protected databases with the equally legitimate needs of the scientific, technical, and educational communities. These communities require:

  • access to data on fair and reasonable conditions;
  • the ability to use the data accessed for research or educational purposes; and
  • freedom from contractual or technical interference with these pursuits.177

These objectives will, in turn, require close collaboration with governments. The goal is to ensure that data generated at the taxpayers' expense remain available at least for scientific and educational purposes, and that efforts to stimulate greater investment in the development of new databases do not end by creating barriers to entry or otherwise discouraging follow-on innovation and public good uses of the building blocks of knowledge.

IV. PRESERVING THE CONSTITUTIONAL BALANCE OF INTERESTS IN THE NETWORKED ENVIRONMENT

Because everything on the Internet is potentially a "database" or a "collection of information" in our increasingly information-based economy, the law that protects collected information will determine the level of competition and prices in that economy. The EU Directive, and to almost the same extent its counterpart proposal pending in the United States,178 opt for a very high level of protection. These regimes buttress mini-monopolies of data and information that could threaten the advance of scientific and technical research, hinder the creation of legitimate new commercial information products, and hurt downstream consumer interests.

A. The Competitive Ethos Under Attack

The fallacy behind most proposals for strong forms of database protection is that they ignore the dual nature of data and information as such. On one level, data function as a raw material of the information economy, a basic ingredient of the public domain, from which scientists and entrepreneurs both draw to fashion their respective products. On a second level, data and information are bundled into downstream products that attract intellectual property rights and related contractual agreements. The mistake is to presume that strong intellectual property rights that were empirically well-suited to downstream applications, mainly derived from the patent and copyright models, are equally well-suited to upstream regulation of the data as inputs into the process of innovation.

The opposite is true. If we balkanize the public domain and make the transaction costs of recreating it by contracts prohibitively expensive and complex, a dysfunctional legal system will impede the cumulative and sequential development of technical paradigms by depriving routine innovators of access to the building blocks of knowledge.179

The truth is that traditionally we have left small grain-size innovation to weaker forms of entitlement, that is, to liability principles rooted in unfair competition law, rather than strong property rights, and this has been a basic premise on which the competitive economy of the industrial revolution was constructed.180 These lessons are still germane to the information economy: lessons sounding in reverse engineering and the reuse of ideas, rather than in legally supported monopolies on products of routine innovation and investment.

Until convincing evidence to the contrary accrues, we should address the risk of market failure in the information economy by erring on the side of underprotection rather than overprotection. This follows because there is no real or potential shortage of investment in this milieu once the causes of market failure are controlled; and it is sound public policy, because we do not wish needlessly to encourage the monopolization of the sources of factual data, to deter value-adding innovators, or to retard the progress of science.181

B. The Constitutional Dilemma

The inclination to place strong intellectual property rights in upstream collections of information is contrary to our entire intellectual property tradition and to our basic constitutional heritage. For some forty years, the late Professor Melville Nimmer, a leading authority on both copyright and First Amendment law, taught that copyright protection would violate First Amendment guarantees of free speech were it not for the judicial exclusion of ideas and facts from the reach of the exclusive property rights granted to authors and artists.182 In 1976, Congress codified that exclusion in Section 102(b) of the General Revision of Copyright Law,183 and in 1991, the Supreme Court, in Feist Publications, Inc. v. Rural Telephone Service Co.,184 reconfirmed the constitutional prohibition against an exclusive property right in either facts or ideas.

Proponents of H.R. 354 and its predecessors openly concede that the "harm to actual or potential markets" test was drawn from Section 107(4) of the 1976 Copyright Law, which codified its fair use provisions.185 This is a constitutionally dubious admission because the very purpose of Section 107(4) is to confirm that protection of the author's market interests in both primary and secondary markets constitutes the true goal of the copyright law's exclusive rights, exactly as Judge Frank declared in his famous opinion in Arnstein v. Porter.186

When transplanted to the database milieu, however, the protection of mere investment in databases that do not rise to the level of creative works of authorship against "harm to actual or potential markets"187 indirectly creates an exclusive property right in noncopyrightable collections of data, which governs both primary and secondary markets. Once collected, no one can make further use of the facts and data contained in the collection without the compiler's permission, even though Section 102(b) of the Copyright Law states that facts and ideas are not fit subjects of an exclusive property right.

True, H.R. 354 does allow "independent creation" of databases in Section 1403(c), and Section 1403(a) exempts nonprofit educational, scientific, and research uses from liability for causing harm to "potential" markets and for certain other reasonable uses.188 But such defenses in the proposed database regime are no more curative of these constitutional flaws than they would be in the copyright regime, for the reason that no one can constitutionally oblige all persons not to use facts or ideas that have been made available to the public. Facts and ideas that the copyright law must leave to unrestricted public use cannot constitutionally be withdrawn from public use under the First Amendment by a database law that protects against extraction and use on both primary and derivative markets.189

In this connection, one should recall that the copyright law, unlike the patent law, does not protect against use as such of even the protected expression, as the Supreme Court established in Baker v. Selden.190 The protection of noncopyrightable data and facts against use on both primary and secondary markets thus impermissibly disrupts the balance established in the federal copyright and patent laws, which implement the constitutional Enabling Clause.191 Notwithstanding the public's right to use facts and ideas under the First Amendment and notwithstanding the constraints limiting Congressional action under the constitutional Enabling Clause, H.R. 354 institutes copyright-like protection for the use of noncopyrightable matter, creates a de facto derivative work right in noncopyrightable compilations, and prohibits transformative uses, however pro-competitive in nature, that harm this reserved or derivative market on the "potential harm" test.

No invocation of unfair competition law can disguise the fact that a "harm to actual or potential markets" test that does not focus on unfair or improper conduct expresses the language of exclusive property rights, which is exactly the function that Section 107(4) performs in the Copyright Law.192 The fact that it is often physically or economically impracticable to regenerate scientific and research data from scratch only enhances the potential restraints on free speech under H.R. 354 as it stands, by risking the withdrawal of facts and data as such from the public domain.193 The federal appellate courts have consistently declared that avoiding the costs of regenerating known facts and ideas constitutes a basic economic premise underlying the constraints on intellectual property protection deriving from both the First Amendment and the Enabling Clause.194

The broad definitions of both "collection of information"195 and "information"196 in H.R. 354 aggravate these constitutional infirmities by drawing "works of authorship" into the realm of a competing and overlapping intellectual property right, and also by casting legal doubts upon the future ability of third parties to make untrammeled use of public domain matter. Whenever someone uses data, including historical data, that have been made available to the public in a "collection of information" protected by the proposed law, that user would be exposed to claims of having harmed the database originator's actual or potential markets if that producer had also used the same or similar data. This broad risk of liability cannot fail to have a chilling effect on the use of known facts and noncopyrightable databases in both the commercial and noncommercial spheres; and it is of little consolation to researchers and educators that they must fear only the harm they might cause to actual markets, rather than to actual and potential markets alike.197

C. Erring on the Side of Caution

In contrast, a true unfair competition approach would attach liability only when the third party harmed the database maker's actual or potential market by improper, unfair, or dishonest means.198 Such an approach would not inhibit competitors who "harm" the market by honest and innovative means, and it would not impede true transformative uses that promote competition and the public interest in science and education.199

The "actual or potential markets" test is thus so broad that it would hinder fair competition simply because every successful competitor harms a prior entrant's market by definition and because would-be competitors would never know in advance when the use or extraction of protected data may turn out to cause harm to some unknown potential market. In this and other respects, the "harm to markets" test actually cloaks a reserved market formula, in the manner of the exclusive rights to reproduce and to prepare derivative works granted by the Copyright Law.200 Use of this formula in the database context invites other industries to apply for similar protection against harm to their actual or potential markets; and the cumulative anti-competitive effects of recognizing such special-interest protectionist pleas could seriously undermine the ability of the United States to compete in an integrated global marketplace.

United States intellectual property law and policy have traditionally mandated the unfettered use of noncopyrightable facts and of subpatentable ideas, and have favored unbridled competition with respect to the products of mere investment.201 We shall undoubtedly experience suboptimal investment in the production of databases if Congress fails to protect publishers against certain forms of piratical conduct that threaten to deprive them of the fruits of their investment.202 But if we combat this risk of market failure by enforcing strong monopolies in collections of data, we may end up balkanizing the information economy by recreating the medieval economic quandary in which products could not flow across countries or continents because too many feudal monopolists demanded payments every few miles down the road.203

If we discourage follow-on innovation and public good uses of the very databases whose development statutory legal protection is supposed to stimulate, the end result may be bad for the database industry as a whole204 and devastating for our whole scientific and technical innovation system, which depends on the relatively unrestricted flow of factual data. Instead, we need a regime that loosely preserves a balanced relationship between public and private interests, which courts can develop gradually in response to the empirical conditions of the evolving information economy.205

In the Information Age, as in the Industrial Revolution, we should continue to believe that competition is the lifeblood of commerce, and we should accordingly structure all legal entitlements so as to produce a high degree of competition and maximum dissemination of data and information. If we err on the side of caution and underprotect the building blocks of knowledge, we can always adjust the level of protection upwards later on, in the face of compelling empirical evidence of real economic harm.

But the opposite is not true. Acquired rights and legislatively enacted monopolies cannot easily be eradicated. The wrong decisions today could lessen the vitality of our research enterprise, weaken the national system of innovation, and compromise our future technological superiority, which all depend on maintaining an appropriate balance between upstream and downstream uses of data and factual information.