When scientists perform experiments or make observations, they record information in the form of computer printouts, handwritten lists of data, and photographs. This kind of research data is commonly thought to be copyrightable. For example, technical and professional journals routinely attach copyright notices to articles reporting data. The provisions of the Code of Federal Regulations presuppose that 'technical data' first produced pursuant to a contract with a federal agency are copyrightable. [FN1] Similarly, federal law provides copyright protection for handbooks of standardized scientific and technical research data prepared by the Secretary of Commerce. [FN2] Because many of the institutions responsible for sponsoring research and for publishing research results assume that data are copyrightable, scientists may also assume that copyright provides sufficient protection. If this common belief that research data are copyrightable is inaccurate, the ramifications would profoundly affect the scientific community.

Scientists are concerned with protecting research data from competitors, especially prior to publication. Researchers may want to preserve claims to priority in discoveries, and retain the first opportunity *448 to theorize about the data. [FN3] They may also want to control and profit from commercial exploitation of their discoveries, or use the raw data to substantiate the quantity and quality of their work when applying for research grants. Thus, copyright protection is attractive to scientists because it secures protection from the time the work is recorded, and it gives the author exclusive rights to make copies and produce derivative works. [FN4] Copyright in their research data would give scientists the right to reproduce and distribute their material publicly, as well as the right to make derivative works, such as articles summarizing their results. Because of this apparent protection, scientists may depend on copyright in the intangible research data. [FN5]

Similarly, employers of scientists may have expectations that the 'work-for-hire' provision of copyright law will give them control over employees' data. [FN6] If this reliance on copyright is misplaced, research data collected in both private and academic research programs may be unprotected. Therefore, it is important to examine the exact basis in statutes and case law for the claim that research data are protectable as property.

The focus for this paper is on 'raw' scientific research data, defined here to include any unedited recording of information concerning measurable properties of physical objects resulting from experimentation or controlled observation. [FN7] This does not include charts, graphs, tables, and models which express data in summary form. [FN8]

*449 The two-fold thesis of this Comment is that raw data are not copyrightable and that federal copyright law preempts the states from providing a property or quasi-property interest in raw data. Section I sets forth the basic argument for copyrightability of raw data and then counters with the problems that would result from allowing raw data to be copyrighted. In particular, Section I discusses the potential inclusion of raw data in the statutory subject matter categories, the requirement of originality as applied to factual works, the merger of idea and expression in raw data, the issue of dissemination of information versus control, and fair use of raw data. In Section II, potential forms of state protection of raw data are considered. As will become apparent, the developers of copyright case law, as well as the federal statute, did not have scientific research data in mind.


The basic requirement for copyrightability under the Copyright Act of 1976 is that protection only extends to 'original works of authorship fixed in any tangible medium of expression.' [FN9] Thus, the argument for the copyrightability of scientific research data is simple. Recorded data are 'fixed' in a 'tangible medium of expression' when they are written down or photographed. Furthermore, the fixed raw data constitute a 'writing' within the scope of copyright [FN10] because the fixation is either a 'literary work' [FN11] under section 102(a) or a 'compilation' under section 103(a). The required originality of authorship is supplied by the creativity and labor of scientists in devising experiments and collecting the data the experiments generate. In short, the argument is that raw data are writings that result from originality and thus satisfy the requirements for copyrightability. One could argue further that the constitutional objective of promoting the 'Progress of Science . . . by securing for limited Times to Authors . . . the exclusive Right to their . . . Writings' [FN12] favors granting copyright protection to raw data. No underlying 'idea, procedure, process, system, method of operation, concept, *450 principle, or discovery' [FN13] can be protected by copyright--protection can be claimed only for a particular 'expression' that the author has produced. Even if raw data are copyrightable, other scientists would still be free to duplicate the data by conducting their own experiments. Thus, no 'principle or discovery' would be protected by giving copyright protection to raw data.

A. Problems with Potential Subject Matter Categories

Unfortunately, there are various problems with the argument that raw data are copyrightable. The first problem is determining in which category of copyrightable subject matter, listed in section 102(a) of the Copyright Act, raw data belong. [FN14] The categories in section 102(a) do not neatly encompass the arrays of numbers which constitute most scientific data. For example, one category that might be applicable is 'literary works.' [FN15] However, despite the fact that the congressional reports include compilations of data within this category, [FN16] scientific research data are less self-evidently 'literary works' than most other factual works. Neither Congress nor the courts have addressed the particular issue of how to categorize scientific research data.

The following discussion examines three categories of copyrightable subject matter. The first category is photographs. The second and third are two broad categories of fact-gathering works--collections of facts, and non-fictional narratives. [FN17] However, there are significant problems *451 with each potential category. Both collections of facts and non-fictional narratives raise the issue of whether to protect only a researcher's original contribution or, because of the vast effort expended, all the facts the researcher uncovers.

1. Photographs

Scientifically significant photographs such as those of astronomical events or Wilson cloud-chamber events appear to fall within the meaning of the statute. [FN18] Photographs of nature are copyrightable. [FN19] However, the case law on photographs presumes they are the result of artistic creativity. None of these cases involve photographs produced strictly for their informational content. [FN20] Courts in photograph cases emphasize the artistic element of selecting and arranging the objects to be photographed. [FN21] Such creative decisions make a photograph an original work of authorship. It can be argued that the reason the requirement of creativity is greater here than in other areas of copyright protection is that without some creativity a photograph is not an 'original work' of an 'author' but merely a mechanical reproduction of material in the public domain. [FN22]

Some of the concerns expressed in early cases about granting copyright to photographs are relevant in the scientific setting because scientific photographs are media for recording data, rather than works of art. In one early photograph case, upholding the copyright on a posed photograph, the Supreme Court left open the issue of whether an 'ordinary production of a photograph' (i.e., one involving no arrangement of the subject) is copyrightable. [FN23] Such photography 'is merely mechanical, with no place for novelty, invention, or originality. It is simply the *452 manual operation, by the use of these instruments and preparations, of transferring to the plate the visible representation of some existing object, the accuracy of this representation being its highest merit.' [FN24]

Blindly snapping a camera lens is precisely what occurs in scientific experiments. Scientists do select the area of study and prepare experiments, but they exercise no further control over what is revealed by their experiments. The scientists must, in the sense important to the issue at hand, blindly snap the shutter once the topic of the experiment is prepared for the experiment to be valid. If scientists were more manipulative, their experimental results would reflect only their theories and would not be 'objective.' [FN25] Thus, research photographs do not fit within the Supreme Court's characterization of a photograph as 'the personal reaction of an individual upon nature.' [FN26]

With regard to scientific photographs, the issue is whether the originality requirement of copyright can be satisfied by the preparation and design of experiments. [FN27] Research photographs are the product of 'creative intellectual or aesthetic labor.' [FN28] But it is only creativity of expression, rather than labor, which is relevant to copyright law. Research photographs, like research data in general, do not contain creativity of expression. [FN29]

Photograph cases under the Copyright Act offer no guidance on whether the resulting data embodied in the picture are copyrightable. Because research photographs are devices for recording data and are not intended to be works of art, [FN30] it may make more sense to treat scientific research photographs, along with computer printouts and handwritten arrays of data, as 'fact-gathering' works.

*453 2. Compilations and Maps

Although the general rule is that facts are not protected by copyright, certain types of fact-gathering works, such as compilations and maps, are protectable. [FN31] The 1976 Act defines compilations as works 'formed by the collection and assembly of preexisting materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.' [FN32] Maps are pertinent to research data issues because courts sometimes require actual field observations to support claims for copyright protection. [FN33] However, while maps are technically within the category of 'pictorial, graphic, and sculptural' works, [FN34] they can be analyzed as compilations because they involve selection, synthesis, and judgment like other fact-gathering works. [FN35] Accordingly, no separate discussion is required.

There are two distinct lines of cases that ascribe different degrees of protection to compilations of facts. The first line of cases [FN36] follows the literal words of the 1976 Act, which states that: '[t]he copyright in a compilation . . . extends only to the material contributed by the author of such work, as distinguished from the preexisting material employed in the work, and does not imply any exclusive right in the preexisting material.' [FN37] Thus, under this line of cases, it is the selection and arrangement of material by the researcher which, if more than trivial, [FN38] is the *454 protectable element in a collection of preexisting material. [FN39] As a result, the facts gathered can be used by any person who does not also take the researcher's intellectual efforts of selecting those particular facts and arranging them in that particular order. [FN40]

The second line of compilation cases rewards the labor of the researcher by giving him a property interest in the material gathered. [FN41] This material is protected from substantial copying, although anyone is free to duplicate the research provided she does not copy the prior results. In other words, in this line of cases protection for labor expended extends to the content of fact-gathering works (the facts themselves) rather than merely to the author's expression (selection and arrangement).

However, the 'rationale behind protecting such compilations . . . is not clear' [FN42] since copyright in compilations is supposed to protect only original material contributed by an author. [FN43] One court explained the rationale for such extensive protection as follows:

The compiler's contribution to knowledge normally is the collection of the information, not its arrangement. If his protection is limited solely to the form of expression, the economic incentives underlying *455 the copyright laws are largely swept away. Recognizing this, the courts have long afforded protection under the copyright laws against appropriation of the fruits of the compiler's industry. [FN44]

However, that court recognized that such 'protection does not fit nicely into the conceptual framework of copyright law and has for that reason been criticized.' [FN45] Indeed, granting protection on the basis of the labor expended in making a compilation has been criticized by Professors Nimmer and Gorman for expanding copyright protection by importing ideas into the field of copyright which are unrelated to the protection of expression. [FN46]

Recently, courts have rejected the 'sweat of the brow' rationale in both a classic factual compilation case and a classic map case. [FN47] In the former, the Court of Appeals for the Second Circuit stated: '[t]o grant copyright protection based merely on the 'sweat of the author's brow' would risk putting large areas of factual research material off limits and threaten the public's unrestrained access to information.' [FN48] In the latter, the court explicitly rejected the sweat of the brow requirement, stating that authorship for maps exists in 'selection, design, and synthesis.' [FN49]

Moreover, scientific research data do not fit either the 'selection and arrangement' or the sweat of the brow rationale well. Consider first the selection and arrangement rationale. Raw data consist of very large collections of informational items. Attempting to characterize raw data as a 'compilation' is problematic, however, because the order of raw data is dictated by the laws of nature, not by the creativity of the scientist. [FN50] A scientist does not synthesize data; he only records them. [FN51] The arrangement of data is predetermined as intractably as is the *456 chronological ordering of historical events. The only scientifically significant order is one which illustrates the law of nature or other phenomenon discovered by the scientist. There is no room for variation in the presentation and thus none for individual choice. [FN52]

The higher the degree of selectivity and judgment in creating a compilation, the stronger the claim is to copyright protection. However, collection of raw data involves little creativity. From the point of view of copyright law, a scientist's only contribution to the data collected is the exercise of creativity and effort in determining which phenomena to study and which variables to correlate. [FN53] This effort amounts to no more than choosing a subject within the public domain to copy and determining a framework for copying. As Judge Wyzanski put the point: '[t]o constitute a copyrightable compilation, a compendium must ordinarily result from the labor of assembling, connecting, and categorizing disparate facts which in nature occurred in isolation. A compilation, in short, is a synthesis.' [FN54]

Yet it is not always clear whether scientific experimental and observational data can be characterized as an analysis of a series of events rather than an analysis of a single, larger event. Often research data consist only of a picture or record of a single occurrence. Such data do not fit easily into the subject matter category of fact compilations. As Judge Wyzanski stated: '[i]t is rare indeed that an analysis of any one actual occurrence should be regarded as a compilation.' [FN55]

There is no copyrightable element of transformation or synthesis of the public domain material recorded during a scientific experiment or observation even if there is creativity in designing the experiment. Although 'practically anything novel can be copyrighted,' some novelty in expression is needed. [FN56] As a result, research data are not copyrightable as fact compilations under the selection and arrangement criteria.

*457 At first glance the sweat of the brow rationale seems to provide a better basis for copyright protection for scientific data. The standard of originality is not especially high. For example, under the sweat of the brow rationale, telephone books have been given protection for merely placing names submitted to the telephone company in conventional alphabetical order. [FN57] As Judge Learned Hand stated: '[t]he man who goes through the streets of a town and puts down the names of each of the inhabitants, with their occupations and their street numbers, acquires material of which he is the author.' [FN58] In addition, computer data-bases are copyrightable [FN59] although it makes little sense to speak of an 'order' in these automated compilations.

One can argue that scientists should be given copyright protection for data resulting from their experiments, which are often carried out at great time and expense, because compilations 'have value because the compiler has collected data which otherwise would not be available.' [FN60] Even where the order of data is dictated by nature, scientists frequently reveal information not otherwise available to the public. In addition, giving protection would force other scientists to repeat experiments, thereby checking their predecessors' results. One can argue that the practice of checking previous results by replication would lead to more reliable theories, and should be encouraged.

Unfortunately, under current practice repeating experiments to test their validity 'is a myth, a theoretical construct dreamed up by the philosophers and sociologists of science.' [FN61] Grants are not normally awarded for checking other scientists' results. Scientists repeat experiments only if the earlier results are controversial; otherwise, scientists accept the findings and build upon them. [FN62] Thus, one could argue that an incentive to duplicate research is needed.

However, the reasons enunciated for protecting works of diligence do not carry over to the domain of scientific research. The 'sweat of the brow' rationale is invoked only when it is necessary to provide the incentive of a property interest for a socially valuable but otherwise *458 unprotectable work resulting from mere diligence and tedious labor. Although scientists must pay great attention to exacting detail, scientific experiments involve great creativity in their design and in their interaction with theory. Thus, scientific experiments involve more imagination and inventiveness than do works of mere diligence. Scientists attempt to find new phenomena and develop new theories for understanding nature. In this sense, the research process is more of an 'art' than is fact-gathering. Conducting experiments is not an end in itself but only one component of a creative enterprise; a separate incentive for the production of research data is not required. Especially since the legitimacy of the 'sweat of the brow' rationale in copyright law has been questioned, [FN63] this rationale should not be expanded into a creative field such as scientific research.

Furthermore, scientific research differs from other fact-gathering activities in that the resulting data cannot be used piecemeal, whereas facts in directories or parts of a map can be so used. It is the whole of the data that is important to scientists. An individual datum revealed by an experiment is usually of little use apart from the pattern disclosed. A useful directory can either be limited to selected highlights of a subject or can be comprehensive. [FN64] On the other hand, enough scientific data on a particular subject must be produced to support claims concerning alleged discoveries, although not all instances of the alleged phenomena need be produced. In other words, scientific data must be comprehensive, but not exhaustive. Thus, the protection copyright gives against use of even a portion of a copyrighted work is not necessary.

In short, a collection of scientific data does not easily fit the definition of a 'compilation' within the meaning of the 1976 Act. [FN65] The 'sweat of the brow' approach has been used in cases involving fact-gathering activity which are significantly different from scientific research. Scientists simply do not string together facts in the manner contemplated by courts in providing protection for compilations.

*459 3. Non-Fictional Narrative Works

The third branch of fact-gathering cases involves narrative works of fact: news, biography, and history. [FN66] There are two schools of thought about whether the research supporting such works can be copyrighted. The less popular school of thought protects the fruit of a researcher's labor by requiring that subsequent authors do independent research from the original sources and that they not make 'substantial and unfair use' of the first researcher's work. [FN67] The justification given is that the 'substantial investment of time, money, and labor' expended in researching for a work should be protected from appropriators. [FN68]

For example, the district court in Miller v. Universal City Studios held that '[t]he law is clear that research can be copyrightable.' [FN69] That court viewed 'the labor and expense of the research involved in . . . obtaining . . . those uncopyrightable facts to be intellectually distinct from those facts and more similar to the expression of the facts than to the facts themselves.' [FN70] On appeal, however, the holding that research is copyrightable was reversed. The appellate decision typifies the second school of thought, which is the majority view regarding the copyrightability of research.

The valuable distinction in copyright law between facts and the expression of facts cannot be maintained if research is held to be copyrightable. There is no rational basis for distinguishing between facts and the research involved in obtaining facts. To hold that research is copyrightable is no more or no less than to hold that the facts discovered as a result of research are entitled to copyright protection. . . . [T]he law is clear that facts are not entitled to such protection. [FN71]

*460 Similarly, in Harper & Row v. Nation Enterprises [FN72] Justice Brennan, in his dissent, discussed whether the use of factual material from a manuscript describing particular historical events (but not the direct copying of the manuscript) infringed the author's copyright in the manuscript. [FN73] Addressing the issue of whether facts are copyrightable, Brennan noted that '[w]ere an author able to prevent subsequent authors from using . . . facts contained in his or her work, the creative process would wither and scholars would be forced into unproductive replication of the research of their predecessors.' [FN74] Brennan went on to find that, because '[a]part from the quotations, virtually all of the material in the allegedly infringing article indirectly recounted the plaintiff-author's factual narrative, . . . [n]o copyright can be claimed in this information qua information.' [FN75]

Following Justice Brennan's reasoning, allowing copyright in scientific data would amount to allowing copyright in 'information qua information,' which would stifle creative research and promote wasteful duplication of scientific effort. Thus, this school of thought rejects copyright protection for research, and instead emphasizes the factual nature of research and the waste involved in duplicating the effort of the first researcher. [FN76]

*461 Scientific writings pose the same problems as nonfictional narratives with regard to the copyrightability of the underlying research. First, like research for other nonfictional narrative works, scientific research represents the expenditure of significant time and labor. Second, protection of raw data would force subsequent researchers to duplicate the effort of a prior researcher. Currently, when a scientist publishes an article containing theories and summaries of data, the article as a whole is copyrightable. Other scientists can criticize the results by either conducting new experiments in order to produce data indicating a position contrary to that advocated in the article, or by re-analyzing the published data to reveal an error in the original analysis. If the raw data contained in the article were copyrightable, the latter option would be unavailable.

The conflict between the two schools of thought on the copyrightability of research should be resolved in favor of the majority view precluding copyright protection for raw data, at least where scientific research is involved. Granting protection would force subsequent researchers to repeat experiments. While such duplication would serve the useful purpose of checking earlier results, [FN77] it would frequently result in a wasted effort. Furthermore, scientific data collection involves less of a selection process than historical or biographical works, as scientific data are collected comprehensively rather than selectively from the available data. [FN78] Thus, the rationale for protecting the research effort itself is weaker for scientific data than for other nonfictional works.

To summarize, the problem of placing raw data in a category of copyrightable subject matter is formidable. Raw data are too integral a part of a process involving too much imagination to justify invoking the 'sweat of the brow' rationale, yet are too rigidly dictated by nature to justify the 'selection and arrangement' rationale. Therefore, raw data fall outside the recognized categories of copyrightable subject matter.

B. Authorship & Originality

Copyright protects only an author's original contribution. [FN79] In many ways, this requirement is extremely lax. One court has said: '[a]ll *462 that is needed to satisfy both the Constitution and the statute is that the 'author' contributed something more than a 'merely trivial' variation, something that can be recognized as 'his own.'' [FN80] Originality in this context 'means little more than a prohibition of actual copying.' [FN81] However, if scientists merely record public domain material, they contribute nothing to the expression; if nothing has been added by an individual researcher, it makes no sense to speak of an 'author' or 'originality.'

The Copyright Office Regulations promulgated under the 1909 Copyright Act (but still applicable under the 1976 Act) deny copyright to '[w]orks consisting entirely of information that is common property containing no original authorship, such as, for example: standard calendars, heights and weight charts, tape measures and rulers, schedules of sporting events, and lists of tables taken from public documents or other common sources.' [FN82] Like the works listed in this regulation, raw scientific research data do not contain sufficient expression and originality to be protected by copyright law.

1. Originality of Expression

A compiler of disconnected facts makes an original contribution by the selection and arrangement process or by the labor expended in collecting the material. [FN83] Such a compiler is an 'author' of a writing, i.e., one 'to whom anything owes its origin; originator; maker; one who completes a work of science or literature.' [FN84]

An axiom of copyright law is that the Act protects only the expressions of ideas, not the ideas themselves. [FN85] Thus, '[t]here is no copyright of facts,' [FN86] as no one may claim original expression in facts. [FN87] When any expression is so 'straightforward and simple' as virtually to 'spring *463 directly' from uncopyrightable material, there is no 'original creative authorship.' [FN88]

The court in Alfred Bell & Co. v. Catalda Fine Arts [FN89] proposed a test for determining when a work based on public domain material will support a copyright. The court stated that 'a 'copy of something in the public domain' will support a copyright if it is a 'distinguishable variation.'' [FN90] Scientific research data fail this distinguishable variation test. The facts expressed by the raw data are stated in the simplest language possible: scientific notation. The barest description of an event is, in the eyes of copyright law, a writing without an author.

Although facts are sometimes discovered by an author, they are not themselves works of authorship. As Professor Nimmer remarked, '[o]ne who discovers an otherwise unknown fact may well have performed a socially useful function, but the discovery as such does not render him an 'author' in either the constitutional or statutory sense.' [FN91] Authorship requires originality of expression, not merely the discovery of a fact.

The Fifth Circuit has also expressed the opinion that facts are akin to discoveries, not original works:

Obviously, a fact does not originate with the author of a book describing the fact. Neither does it originate with one who 'discovers' the fact. 'The discoverer merely finds and records. He may not claim that the facts are 'original' with him although there may be originality and hence authorship in the manner of reporting, i.e., the 'expression,' of the facts.' [FN92]

A 'discovery' has been judicially defined as the 'disclosure of an hitherto unknown fact, principle, or theory.' [FN93] Such discoveries are the substance of the work of scientists, but discoveries, along with ideas, *464 procedures, and processes, are not copyrightable 'regardless of the form in which they are described, explained, illustrated, or embodied' in an original work of authorship. [FN94] Accordingly, scientific data do not meet the requirement that copyright subject matter be an expression owing its 'origin' to an author.

2. Facts as Expressions of Theory

Post-empiricist philosophers believe that a scientist mixes theory with observation to generate facts [FN95] in such a manner that all data are 'expressions' of an underlying theory. [FN96] In other words, there are no 'bare facts' since every fact is 'an event as we see it' [FN97] reflecting the theory of a particular scientist. The copyright argument is thus that a scientist's world view influences what she deems 'facts' [FN98] and that therefore, her observation and record of data creates a tangible 'expression' of her ideas and theories.

However, even if raw data are theory-laden, the data present the facts from one point of view in a very simple manner. Scientists select questions to answer and experiments to perform, thereby preselecting the type of data that will result and providing in advance a framework with which to interpret the data. This preselection does not mean that the recorded data have any additional 'expression' in them. Every fact requires some conceptualization, but raw data include no expression apart from this bare conceptualization. [FN99] Through creativity and labor, scientists carefully phrase their questions, but nature supplies the answers.

*465 To produce a copyrightable expression describing a single event, the account of the event must either have individuality of expression or reflect the author's 'peculiar skill and judgment.' [FN100] The bare reporting of events that occurs in scientific experimentation lacks such skill and judgment. Designing an experiment preselects certain data for expression, but this is comparable only to choosing which public domain fact to copy--nothing separately copyrightable is involved. There might be more complicated ways to express simple facts: for example, Einstein could copyright books written in a standard language explaining the significance of the formula 'E = mc²' but the formula itself is not copyrightable. [FN101] Likewise, the simplest expression in scientific language of any idea or fact is not copyrightable, although a complicated expression would be.

Each scientific datum gives a simple (usually mathematical) description of one fact. In addition to each individual datum describing one fact, the data collectively 'merge' into a single fact which the observer can understand as an automatic expression of a natural law or pattern. [FN102] The expression embodied in the data is inseparable from the underlying fact. In particular, one court has treated such expression as uncopyrightable because '[c]opyright protection will not be given to a form of expression necessarily dictated by the underlying subject matter.' [FN103] Similarly, in a historical research case a court held that 'if the expression arrangement and selection of the facts must necessarily, by the nature of the facts, be formulated in given ways then they are not copyrightable.' [FN104]

*466 One important corollary to the idea/expression distinction [FN105] is that an expression will receive copyright protection only if it is possible to create alternative expression of the same idea involving substantial variation. [FN106] An expression of uncopyrightable subject matter is itself uncopyrightable if only a limited number of ways to express the given subject matter are available. [FN107] If any one expression were given copyright protection, a virtual monopoly over an idea or fact would result. [FN108]

Therefore, because the primary concern with the free flow of ideas prevails over the property interest, the law denies copyright protection.

When the 'idea' and its 'expression' are thus inseparable, copying the 'expression' will not be barred, since protecting the 'expression' in such circumstances would confer a monopoly of the 'idea' upon the copyright owner free of the conditions and limitations imposed by patent law. [FN109]

Thus, where idea and expression merge, neither is protected by copyright law.

Scientific research data are the paradigm of the merger of idea (or fact) and expression--the information expressed is the data. Idea and expression coincide when 'the expression provides nothing new or additional over the idea,' [FN110] and this is precisely what occurs with scientific *467 data and facts. Conjectures concerning new discoveries or theories are interpretations of the data, not something already 'in' the data themselves. If the idea is taken to be the experimental procedure, then the idea is so detailed that only one pattern of data could result. In this case, the variation in expressions represented by differing sets of data would be trivial, and thus the expression would collapse into the idea. In simplest terms, the data as a whole express a law or pattern of nature which the underlying reality produces. Thus, the data collectively express one idea in addition to each datum reporting an individual fact. [FN111] Therefore, protecting the data collectively proves just as difficult as protecting the data individually, because in neither case is there protectable expression distinct from the underlying unprotectable facts.

Such data are forms of expression which cannot be varied without altering the facts expressed. [FN112] There is no room for a plurality of expressions, and hence no room for the creativity protected by the copyright laws. Thus, based on the statutory and case law, copyright protection probably does not extend to raw scientific data. However, because the law is not entirely clear on this issue, the policy considerations which influence copyright doctrine should be examined.

C. Policy Considerations: Dissemination of Information Versus Control

Copyright law has two basic goals: (1) to encourage the dissemination of information; and (2) to provide an incentive to authors by granting them property rights in their works for a limited period of time. [FN113] Accordingly, the decision to grant or deny copyright protection to research data follows from analyzing the tension between the free flow of information and the great control which results from providing a property interest in original works of authorship. Scientists need control of their data for research and theorizing, and copyright protection could provide that control. On the other hand, society has an interest in the dissemination of research data.

*468 Copyright law attempts to resolve this tension by granting protection only to 'expressions' and not to the underlying 'ideas.' [FN114] Society's interest in the free dissemination of ideas is promoted, while at the same time, creative endeavors are encouraged through the protection of their expression. Thus, '[t]he public interest in the free flow of information is assured by the law's refusal to recognize a valid copyright in facts.' [FN115]

One may argue, however, that copyright protection of raw data would actually aid, rather than hinder, the free flow of information. In other words, the two primary goals of copyright law may be complementary rather than conflicting. For example, the Code of Federal Regulations provides that: '[i]n order to enhance the transfer or dissemination of information produced at Government expense, contractors may be permitted to establish copyright in the data first produced in the performance of work under a contract containing [a specific clause].' [FN116] This provision evidences Congress' concern that private publishers of handbooks of standardized reference data need some incentive to publish data produced at government expense. Copyright protection makes feasible the participation of private publishers in the program by protecting a publisher's investment. [FN117]

*469 A similar economic consideration may apply to the initial publication of research articles by professional journals. [FN118] Incentives to undertake expensive and time-consuming research may be increased by allowing scientists to copyright raw data. Furthermore, requiring scientists to replicate experiments and observations in previously explored areas will produce a checking procedure for the original data. Arguably, then, providing copyright protection would contribute to the advancement of knowledge.

However, allowing copyright of research data would more likely impede, rather than encourage, the dissemination of data. First, the replication argument has been discredited. [FN119] Second, because 'expression' and 'fact' merge in research data, [FN120] granting copyright protection to the data will have the effect of giving a scientist virtual monopolistic control over facts which would otherwise be part of the public domain. Other scientists would, of course, have the right to reinvestigate a field of study by conducting their own experiments, but subsequent researchers would not be able to copy data without the first researcher's permission. [FN121] If the experiments are too costly and time-consuming to replicate, as is likely with many research topics today, the actual results of any scientific experiment would be monopolized to the detriment of both the scientific and general community.

Furthermore, because little replicative research is done, copyrighting raw data would result in a significant detrimental impact on surveys of scientific subjects. No scientifically useful comprehensive summary of data would be available because any such summary would be a 'derivative work' [FN122] to which the original copyright owner has exclusive rights. [FN123]

*470 At best, copyrighting of raw data would lead to repeated, needless experiments to duplicate the copyrighted data. [FN124] Scientists would not be able to extend the research of their predecessors where old research is necessary to establish support for further findings. [FN125] This kind of restrictive control of data would be a great hindrance to the progress of knowledge [FN126]--especially in an enterprise such as scientific research where each scientist relies so extensively upon the contributions of other scientists.

Furthermore, the policy of granting protection as an economic incentive [FN127] is not as central in scientific research as in the realm of directories and other fact-gathering works, because there are incentives and rewards for scientific research other than the commercial exploitation of the collected data. [FN128]

In short, the decision whether to grant copyright protection to research data is informed by conflicting motivations. In general, when there is a conflict between public interest and the potential copyright owner's interest, the former will prevail since the primary objective of copyright protection is to serve the public interest. [FN129] Thus, when 'expression' and 'fact' merge, [FN130] as in research data, considerations related to the free flow of information should prevail over the other policy *471 considerations embodied in copyright law.

As the Supreme Court said in Baker v. Selden:

The very object of publishing a book on science or the useful arts is to communicate to the world the useful knowledge which it contains. But this object would be frustrated if the knowledge could not be used without incurring the guilt of piracy of the book. [FN131]

D. Fair Use

Before leaving the topic of copyright, it should be noted that if scientific research data were given copyright protection, other scientists who copy might not be able to avail themselves of the 'fair use' defense against claims of infringement. [FN132] The 1976 Act limits the exclusivity of rights given to a copyright owner for purposes of, inter alia, scholarship and research, by permitting 'fair use' by others of protected material in certain circumstances. [FN133]

Factors to be considered in determining whether use of a work qualifies as a fair use include:

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for, or value of, the copyrighted work. [FN134]

Problems might arise, however, where scientists attempt to use an earlier researcher's data as corroboration of a finding or as the basis for further research. A statistically significant amount of the prior data would have to be utilized to corroborate such a claim. This would probably amount to substantial copying. Since the end result of the second researcher's efforts would probably be a substantially similar collection of data, one would be forced to conclude that there had been infringement. [FN135] Similarly, a useful summary of all the data would be a *472 protected derivative work, and hence under the control of the original copyright owner. [FN136]

Literal application of the factors enumerated in the Act for determining whether a particular use is 'fair' discloses even more problems. The first factor, the purpose of the use, would favor a claim for fair use since research is listed as a permitted purpose. Yet subsequent scientists are likely to be engaged in the same enterprise, and, in this sense, the scientists are competitors. Although the purpose of the copying is not directly 'commercial,' it is nonetheless not a totally disinterested, selfless desire to advance knowledge.

With regard to the second factor, the nature of the copyrighted work, one concern will be whether the original copyrighted work is unpublished or not. Under the 1976 Act, publication is no longer the crucial triggering event for copyright protection, but under section 106(3) of the new Act the copyright owner has 'the right to control the first public distribution of an authorized copy . . . of his work.' [FN137] Hence, the fair use defense is not likely to be available if the copyrighted data are unpublished, since use of unpublished data would supplant the copyright owner's valuable right of first publication. [FN138]

The third factor, the 'amount and substantiality of the portion' of the copyrighted work utilized, will be a major stumbling block to a fair-use defense since any scientifically useful copying will take a significant portion of the data. [FN139]

The last factor, the economic effect of the use upon the value and potential market of the copyrighted work, is the most important of the four. [FN140] To negate a fair-use defense, 'one need only show that if the challenged use "should become widespread, it would adversely affect the potential market for the copyrighted work."' [FN141] In the case of unpublished data, the effect of unauthorized use of data upon the discovering *473 scientist's career may be tremendous since there is no market for duplicative data. In addition, claims to priority and substantiation of work for research grants may be lost.

In short, the fair use defense would probably not be available to scientists who copy another's original research data. Thus, the damaging repercussions of allowing copyright protection for research data would not be abated by the existence of the fair use doctrine.

E. Conclusion

The conclusion from this discussion is that all relevant considerations lead to the same result--there is no justification for granting copyright in research data. Both courts and commentators have suggested that scientific literature in general should receive only limited protection. [FN142] Clearly, even that limited protection should not extend to the underlying facts and raw research data.

'[W]hen an idea is such that any use of that idea necessarily involves certain forms of expression, one may not copyright those forms of expression, because to do so would be in effect to copyright the underlying idea.' [FN143] Scientific research data fall squarely within this prohibition.


Although the policy of free dissemination of information argues against, and ultimately prohibits, copyright protection of raw data, the other policy considerations discussed suggest that at least some measure of protection should be provided for research data. State causes of action might be considered an appropriate alternative for providing such protection. In particular, the states might consider providing a mechanism for protection of unpublished data. Potential claims include conversion, misappropriation, unfair competition, and causes of action based on contract and quasi-contract. [FN144] This Section considers only those claims which, like copyright, would provide a property interest in the abstract data.

The cause of action most suited to protecting data is a form of unfair competition called 'misappropriation'--the taking of the fruit of *474 another's time and effort for competitive advantage. The United States Supreme Court first articulated the doctrine of misappropriation in International News Service v. Associated Press. [FN145] In that case, the Court held actionable as misappropriation the conduct of a news-gathering organization in systematically copying and selling to clients fresh foreign news that another service had gathered abroad. The Court further held that the news service which had acquired the news items by its organized expenditure of labor, skill, and money had thereby acquired a quasi-property interest in the news, which would be valid only while the news remained 'hot.' [FN146] The Court observed that no one may claim a monopoly on the gathering or distribution of news which is only the report of information in the public domain, [FN147] but held that even if the competitor acknowledged the source of information, when the competitor endeavors 'to reap where he has not sown,' [FN148] the 'transaction speaks for itself, and a court of equity ought not to hesitate long in characterizing it as unfair competition in business.' [FN149]

The International News Service approach opens the possibility for the judicial recognition of quasi-property rights in scientific research data. The basis of the Court's recognition of a quasi-property right, however, does not appear to be significantly different from the 'sweat of the brow' rationale for copyright in compilation cases. [FN150] Because this rationale has been rejected, [FN151] it should not be used as a basis for providing protection of raw data under the guise of quasi-property rights. In addition, raw scientific data may be sufficiently different from news so as not to justify the recognition of quasi-property rights in that data. Yesterday's news is old hat, so that the quasi-property right expires quickly. It is not nearly so simple to determine whether raw data is 'hot' or 'cold'.

Even if the 'sweat of the brow' doctrine were revived, and a suitable test developed for applying quasi-property rights to raw data, the *475 states would not automatically be able to provide such protection to researchers. One must first determine whether federal copyright law has preempted misappropriation doctrine, at least as it might apply to scientific research data.

A. Doctrine of Preemption

Section 301 of the 1976 Act provides that federal copyright law preempts all state law rights and causes of action that are 'equivalent' to those protected by the federal Copyright Act. [FN152] Section 301(b) further provides that federal law does not preempt state causes of action with respect to:

(1) subject matter that does not come within the subject matter of copyright as specified by sections 102 and 103, including works of authorship not fixed in any tangible medium of expression; or

. . .

(3) activities violating legal or equitable rights that are not equivalent to any of the exclusive rights within the general scope of copyright as specified by section 106. [FN153]

Stated affirmatively, the Copyright Act preempts a state action if and only if two conditions are satisfied: (1) the subject matter of the work is within the scope of the Copyright Act; and (2) the protected rights are equivalent to the exclusive rights specified by section 106 of the Act.

The intention behind section 301 is to preempt and abolish any rights under state law that are equivalent to copyright and that extend to works subject to copyright protection. [FN154] Whether misappropriation in particular has been preempted by federal law is not clear. [FN155] Misappropriation was included in the original 1976 copyright bill as an example of a state action not preempted by the new Act. [FN156] The House amended the bill by deleting the reference to misappropriation [FN157] but *476 offered no explanation for the deletion. [FN158] The Act's legislative history thus leaves open to dispute whether, and to what extent, section 301 is intended to preempt misappropriation doctrine. [FN159] While the Senate adjudged misappropriation to be 'nothing more than copyright protection under another name,' [FN160] the House determined that misappropriation 'is not necessarily synonymous with copyright infringement.' [FN161] In short, the legislative history is not at all conclusive regarding preemption of state misappropriation law.

Because of the absence of a clear expression of legislative intent on preemption, the following discussion examines the literal wording of section 301 and the two criteria set forth therein to determine whether the Act preempts misappropriation claims against scientists who appropriate for their own use the research data of their colleagues. This Comment concludes that federal copyright law preempts most claims based on the doctrine of misappropriation, thereby leaving scientists without intellectual property protection for their raw research data.

B. Equivalent Rights

Section 106 secures to the owner of a copyright the exclusive rights to reproduce the copyrighted work, to prepare derivative works based upon the copyrighted work, to distribute copies of the copyrighted work, and to perform or display the copyrighted work publicly. [FN162] Under the usual test a state right is not equivalent to a federal right and is not preempted if it requires proof of some element instead of, or in addition to, those acts enumerated in section 106. [FN163]

*477 This test is surely too permissive, countenancing far more state protection than the dissemination policy behind copyright law would allow. For example, the equivalency test would seem to allow state claims for copying of misappropriated literary works simply because misappropriation claims, unlike copyright claims, all require proof of the plaintiff's considerable time and expense and the defendant's intent to reap a competitive advantage. [FN164] Professor Gorman has argued that '[b]ecause it is possible to frame almost any state tort so as to evince a protective policy different from copyright, the proffered analysis would too often interfere with the dissemination policies of the Copyright Act and particularly of section 301.' [FN165]

Professor Abrams has also suggested that the current approach to determining equivalency is weak, and has proposed that the appropriate way to determine if rights are equivalent is to determine what right is being asserted, rather than to compare its elements of proof.

Whether the antecedent conditions for asserting a right under state law are identical to copyright infringement or diametrically opposed to it is simply irrelevant. The question to ask is whether the right being asserted is one of the exclusive rights listed in § 106. Thus proving that the claimant has invested great time, money, and skill, or that the claimant will suffer harm, is no more germane than proving that the claimant has blue eyes. [FN166]

It is clear that the rights sought by scientists through the misappropriation doctrine are exactly the same as those enumerated in the Copyright Act--protection against unauthorized reproduction or distribution of their raw factual material. [FN167] Even claims governing the use of unpublished research data would ordinarily involve rights equivalent to the section 106 rights against unauthorized reproduction because a scientist would actually have to publish the misappropriated and previously unpublished data in some form in order to support any conclusion based on the data. [FN168]

*478 At best, section 301 might exempt from preemption some claims based on sustained and systematic misappropriation of research data. The legislative history of the Copyright Act directs that:

state law should have the flexibility to afford a remedy (under traditional principles of equity) against a consistent pattern of unauthorized appropriation by a competitor of the facts (i.e., not the literary expression) constituting 'hot' news, whether in the mold of International News Service v. Associated Press, 248 U.S. 215 (1918), or in the newer form of data updates from scientific, business, or financial data bases. [FN169]

Absent such a pattern of unauthorized appropriation, however, a state action for the misappropriation of scientific research data will reduce to the equivalent of an action against copying or reproduction. [FN170]

Even if the use of research data were not equivalent to any section 106 right, permitting states to protect rights in the use of scientific data clearly goes too far. Because 'expression' merges with 'fact' in data, [FN171] protecting the use of such data would allow scientists to control material in the public domain which would directly conflict with the fundamental copyright policy of encouraging the free flow of information. This cannot be justified even by a policy providing incentives to undertake scientific research. Allowing scientists to control the use of their research data would inhibit the progress of science by providing individual scientists with too much control over raw facts--even after such facts had been introduced into the public domain.

Thus, with the possible exception of extraordinary cases involving consistent, unauthorized appropriation, [FN172] the rights protected under a state action for misappropriation of scientific data are equivalent to those protected by the 1976 Copyright Act and therefore cannot avoid preemption under the equivalent rights test.

C. Subject Matter

The 'subject matter' test of section 301(b) as applied to research data is more problematic. Section 301(b) permits state protection for subject matter falling outside sections 102 and 103; section 102(b) provides that copyright protection does not 'extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or *479 embodied.' Thus, under a literal reading, section 301(b) would permit broad state protection of ideas, processes, etc. [FN173]

Moreover, section 301(a) specifies that only 'works of authorship that are fixed in a tangible medium of expression and come within the subject matter of copyright as specified by sections 102 and 103' are governed exclusively by the Act. Since facts and data are not 'works of authorship,' [FN174] they would not be governed by the Copyright Act, and arguably would be open to state protection.

The difficulty with this literal reading of section 301 derives from the general goal of the Copyright Act. The House Committee Report contains the following passage.

As long as a work fits within one of the general subject matter categories of sections 102 and 103, the bill prevents the States from protecting it even if it fails to achieve Federal statutory copyright because it is too minimal or lacking in originality to qualify, or because it has fallen into the public domain. [FN175]

Research data and other bald expressions of facts fixed in a tangible medium of expression seem to fall easily into this preempted category because they are 'too minimal' in their expression. In addition, the Supreme Court held in Goldstein v. California [FN176] that only those areas left 'unattended' by federal law--areas in which Congress had 'drawn no balance'--are not preempted by federal copyright legislation. [FN177] The inclusion of section 102(b) in the 1976 Act strongly suggests that Congress has drawn a balance in the area of facts and intended that the *480 free flow of facts not be restrained in any manner. In other words, the area of facts has been covered by congressional action and Congress has deliberately left it unprotected. [FN178] As the Supreme Court stated in Goldstein, 'a conflict would develop if a State attempted to protect that which Congress intended to free from restraint or that which Congress had protected.' [FN179]

With respect to section 301(b), the consequence of the House Committee Report and Goldstein is that fact-expressions fall 'within the subject matter of copyright as specified in sections 102 and 103,' and thus state protection is preempted. In the words of Professor Gorman:

[w]hen Congress declares in section 102(b) that copyright in such literary work does not 'extend to any idea' described, explained or embodied therein it is not declaring such an idea outside the subject matter of copyright so much as it is affirmatively declaring--as clearly as it can, and for the clearest reasons--that ideas are free to be copied, adapted and disseminated, and that no court is to construe the federal copyright monopoly as inhibiting that freedom. The implication for state law is equally clear: neither can the states. . . . Far from leaving facts, ideas and the like 'unattended,' to borrow a term from the Goldstein case, Congress has very much attended to them in section 102(b), and has declared them to be free as the air. [FN180]

Thus, facts have been deliberately excluded from copyright protection and therefore the states cannot protect them. [FN181]

Permitting state protection of simple factual expressions would create 'vague borderline areas between State and Federal protection' *481 contrary to the intent of section 301. [FN182] The creation of these 'vague borderline areas' would also be contrary to the general congressional intent to provide a 'single Federal system' of statutory copyright protection which 'would greatly improve the operation of the copyright law and would be much more effective in carrying out the basic constitutional aims of uniformity and the promotion of writing and scholarship.' [FN183]

Thus, it appears doubtful that states could provide a property interest protection for scientific research data, because such protection fails both prongs of the preemption test: first, the rights of importance to scientists in protecting raw data are equivalent to those provided by section 106 of the Act; second, raw scientific data fall within the subject matter considered by the Act. Therefore, there is no room for non-federal property interest in scientific research data.


A scientist conducting an experiment and gathering data is not an author of an original work in a sense relevant to copyright. It may seem anomalous that no protection is available under the copyright laws despite the ingenuity and labor expended in creating and carrying out scientific experiments, but from the point of view of copyright law, the scientific researcher is simply gathering the work of another author: nature. The 1976 Copyright Act was not designed to protect scientific data. Furthermore, the cases and secondary authorities on fact-gathering works do not indicate that the Act should be adapted to protect such data. In addition, because scientific data fit squarely into the category of expressions Congress intended to leave unprotected by copyright, the states may not extend protection to such data under the doctrine of misappropriation.

In sum, this Comment has argued that a property interest would not be the proper vehicle for protecting intangible scientific research data. However, this does not mean scientists are necessarily without recourse in asserting rights over their research results. The House Committee Report gives 'invasion of personal rights' as an example of a cause of action not equivalent to copyright. [FN184] Conversion, trespass, *482 misrepresentation, and breaches of contract and of trust are other examples included in the Report. [FN185] Other potential actions not mentioned in the Report include unfair competition and false designation of the origin of a work under the Lanham Act. [FN186] A District Court has even held that an action concerning trade secrets is not preempted when the material was not copyrighted. [FN187] Perhaps these causes of action concerning tangible and intangible property will help secure the rights of scientists in their raw data and protect the integrity of the research process without unduly restricting the free flow of scientific information.

