Section 1201 of the Digital Millennium Copyright Act (“DMCA”) prohibits circumventing technological protection measures on copyrighted works in digital form. As an outlet for fair use, Congress created a triennial review process by which people can petition for an exemption from liability under 1201 for a specific use and class of work. This requires a showing that, among other things, the proposed use is likely to be lawful and that the digital lock makes the lawful use impractical.
Text and data mining (“TDM”) is shorthand for a range of computational techniques that can be applied to datasets to draw inferences about the underlying works. For example, TDM could be used to determine how frequently a particular gender is represented across popular novels of the 18th century, or to analyze the use of color in a particular director’s films. Relying on Authors Guild v. HathiTrust and Authors Guild v. Google, the Clinic argued that such research projects are likely to be fair use. Although they involve copying entire works, the purpose is transformative because the goal is to gather information about the works rather than to make use of what is contained in them.
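To make the idea concrete, here is a minimal sketch of the kind of analysis TDM involves, written in Python. It is only an illustration: the word lists, function names, and sample sentences are hypothetical, and an actual research project would define its categories and build its corpus with far more care.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def pronoun_counts(text):
    """Count feminine and masculine pronouns in a single text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        if word in FEMININE:
            counts["feminine"] += 1
        elif word in MASCULINE:
            counts["masculine"] += 1
    return counts


def corpus_counts(texts):
    """Add up the pronoun counts across every text in a corpus."""
    total = Counter()
    for text in texts:
        total.update(pronoun_counts(text))
    return total


# Stand-in corpus; a real study would load the full text of many works.
sample = ["She gave him her letter.", "He read it to himself."]
print(corpus_counts(sample))
```

Note that the program reads every word of every text, but what it produces is a set of counts about the texts rather than the texts themselves.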
In addition, the technological protection measures, combined with the threat of liability under 1201, are the reason such research goes undone. Researchers could theoretically scan thousands of physical books and turn them into text files, but the costs associated with such work are often prohibitive for humanities research. Additionally, existing collections of works that are available for data mining research are either incomplete or lack important research tools. This leaves circumvention as the only practical option for many researchers.
Despite opposition from the content industry, the Office largely recommended adoption of the proposed exemption. The recommended exemption carries some limitations, however, including restrictions on who may perform such research and on the data security precautions researchers must take.