Friday, April 17, 2026
Executive Summary
At the closing panel of the Berkeley Technology Law Journal’s 29th Annual Symposium on the 1976 Copyright Act, Professors Daniel Gervais, Matthew Sag, Rebecca Tushnet, and Jennifer Urban examined four unanticipated strains on the Act—the licensing architecture’s adequacy for generative AI, the fair use status of non-consumptive copying, the legal standing of non-commercial fan creators, and the stalled orphan works reform—concluding that voluntary licensing, conflict preemption doctrine, and a targeted Section 107 amendment each offer partial solutions, but that no single mechanism can resolve the scale, territorial, and political-economy failures that AI training has simultaneously exposed. Professors Peter Menell, Pamela Samuelson, and Molly Van Houweling followed with closing remarks.
Instructor(s)
Daniel Gervais, Professor of Law (Emeritus), Vanderbilt Law School
Matthew Sag, Professor of Law, Emory University School of Law
Rebecca Tushnet, Professor of Law, Harvard Law School
Jennifer Urban, Professor of Law, UC Berkeley School of Law
Peter Menell, Professor and Faculty Co-Director, Berkeley Center for Law and Technology, UC Berkeley School of Law
Pamela Samuelson, Professor and Faculty Co-Director, Berkeley Center for Law and Technology, UC Berkeley School of Law
Molly Van Houweling, Professor and Faculty Co-Director, Berkeley Center for Law and Technology, UC Berkeley School of Law
Keywords
non-consumptive use • fair use • AI training • machine learning • copyright • Section 107 • orphan works • copyright reform • diligent search • limitations on remedies • compulsory license • mechanical license • Section 115 • Music Modernization Act • blanket license • American Geophysical Union v. Texaco • licensing market • fourth factor • fan fiction • transformative use • non-commercial • Section 103(a) • derivative works • HathiTrust • Google Books • data mining • non-expressive use copyright • Section 1201 DMCA • anti-circumvention • exemption • remix video • voluntary collective licensing • AI cross-border territorial copyright • “is AI training on copyrighted works fair use under Section 107” • “what is non-consumptive use in copyright law and does it apply to generative AI” • Ross Intelligence Third Circuit text data mining fair use • extended collective licensing
Legal Analysis
The Copyright Act’s Licensing Architecture Under Stress: From Mechanical Licenses to AI Training
The Copyright Act of 1976, Pub. L. No. 94-553, 90 Stat. 2541, did not create a self-executing market for copyrighted works; it created exclusive rights whose economic value depends entirely on functioning exchange mechanisms that the statute largely presupposed rather than established. Professor Daniel Gervais framed this infrastructure as “the ghost architecture of the statute,” meaning the real-world licensing machinery “embedded within it, built alongside it by antitrust enforcement, constructed around it by the courts and extended by subsequent congressional interventions,” and argued that this architecture, not the rights provisions themselves, “is what has made the statute breathe and sometimes cough for 50 years.” The mechanical license in Section 115 is, as Gervais identified, the oldest compulsory license in American copyright law, and its origins reveal the anti-monopoly function that compulsory licenses have historically served. Congress introduced it in the Copyright Act of 1909, Pub. L. No. 60-349, 35 Stat. 1075, after the Supreme Court held in White-Smith Music Publishing Co. v. Apollo Co., 209 U.S. 1 (1908), that piano rolls were not copies of musical compositions. Its purpose was not to benefit record companies at the expense of composers but to prevent the Aeolian Company, which had quietly secured exclusive licensing arrangements with virtually every major music publisher, from cornering the market for piano roll reproductions.
The digital streaming era exposed Section 115’s structural inadequacy through the “address unknown” loophole that interactive services exploited in the mid-2000s by filing bulk notices while failing to remit royalties to publishers they claimed they could not locate. The consolidated class action suits against Spotify, which settled for approximately $43 million, and Spotify’s separate settlement with the National Music Publishers Association documented a systemic breakdown. The Music Modernization Act of 2018, Pub. L. No. 115-264, 132 Stat. 3676, attempted to address it by creating the Mechanical Licensing Collective as the mandatory administrator of a blanket license for digital phonorecord delivery. Gervais identified a parallel constitutive dynamic in the voluntary text-reproduction market. Before American Geophysical Union v. Texaco Inc., 60 F.3d 913 (2d Cir. 1994), institutional photocopying operated in legal limbo, with corporate legal departments lacking incentive to license from the Copyright Clearance Center because no court had held systematic copying to be infringement. Texaco narrowed fair use by finding that CCC licensing was “readily and reasonably available” and that Texaco’s copying therefore caused cognizable market harm under the fourth factor of Section 107. Afterward, voluntary licensing was incentivized precisely because fair use was narrowed, a circularity Gervais characterized as “not necessarily a logical flaw in practice” but rather “the mechanism by which the statute working through the courts constitutes voluntary markets.”
Against this fifty-year record, Gervais assessed generative AI as “the most severe stress test of the licensing architectures since at least cable television,” combining a scale of reproduction that “exceeds anything the existing compulsory license framework was designed to accommodate,” simultaneous impact across every sector of the copyright market, and territorial fragmentation that means any national solution will “interact uneasily with divergent approaches in the EU, the UK and elsewhere, creating the risk of arbitrage and regulatory mismatch that no national licensing scheme can fully eliminate.” He endorsed voluntary and collective licensing, not compulsory licensing, as the mechanism for cross-border solutions, specifically because contracts cross borders in ways that statutory licenses cannot. He predicted that contracts between AI developers and rights holders can “manage the legal risk” of unresolved fair use questions while courts work toward clarity that he assessed as “probably five years out.”
Non-Consumptive Use, Generative AI, and the Case for a Targeted Section 107 Amendment
The foundational copyright question posed by search engines, plagiarism-detection software, and large language models is whether intermediate copying that delivers no original expression to any human reader should be regulated by copyright at all. Professor Matthew Sag, who coined the term “non-expressive use” in a 2009 law review article, characterized the question as “the unanticipated question” that the printing-press metaphor underlying copyright law cannot answer: “what if there are no readers?” He identified a tension between two competing intuitions—the historical position that “a copy is a copy is a copy” and the structural observation that copyright’s substantial similarity doctrine, public performance rights, and other features are all keyed to the communication of original expression to a new audience, so that “hidden intermediate copies” that never reach readers have always occupied ambiguous legal terrain. Sag assessed the recent court decisions on AI training—citing Bartz v. Anthropic and Kadrey v. Meta as finding model training “highly transformative and ultimately fair use,” and noting the Ross Intelligence, Inc. v. Thomson Reuters Enterprise Centre GmbH case currently before the Third Circuit as the notable pending exception—and concluded that “by and large, courts have held that technical acts of copying are fair use when no original expression is communicated to a new audience,” even if courts frame this as transformative use without adopting his theoretical vocabulary. He proposed a modest but targeted amendment to Section 107 that would recognize that “copying works to extract unprotected information or enable non-expressive computational functions is highly transformative,” while deliberately stopping short of declaring such copying categorically fair—a distinction preserving the court’s authority to evaluate all four factors and to find infringement where models genuinely memorize and output substantial portions of their training data. 
On the licensing question, Sag was skeptical that voluntary collective licensing can serve as a general solution: an ASCAP-style collective for large language models cannot function because “there is no Taylor Swift book for LLM training,” meaning that there is no usage data to attribute relative value across contributors, and the resulting equal-division arithmetic would distribute negligible sums to “billions” of contributors—blog posters, Stack Overflow commenters, social media users—whose individual transaction costs would exceed their share. He distinguished access licensing, which Reddit’s $60 million annual agreements with Google and OpenAI exemplify and which he characterized as “healthy and important,” from training licensing as a general solution, cautioning that “the fallacy I see over and over and over again is, oh, well, there was some licensing, therefore everything can be licensed.” Sag predicted that the international dimension reinforces rather than undercuts the case for fair use: jurisdictions representing 52 percent of world GDP permit text and data mining for machine learning, making AI training broadly lawful across the United States’ peer economies, and the portability of AI training operations means that “a lot of this activity is incredibly portable—you can just take your AI training to other jurisdictions,” a competitive reality that any licensing or tax-and-redistribution system must confront.
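The equal-division arithmetic Sag describes can be made concrete with a short sketch. The pool size, contributor count, and per-claim cost below are hypothetical placeholders chosen only for illustration; the panel gave no such figures beyond “billions” of contributors, and the function and field names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EqualSplit:
    """Outcome of dividing a licensing pool equally among contributors."""
    share_per_contributor: float
    net_per_contributor: float
    worth_claiming: bool


def equal_split(pool_usd: float, contributors: int, claim_cost_usd: float) -> EqualSplit:
    """Divide a pool equally and compare each share with the cost of claiming it."""
    if contributors <= 0:
        raise ValueError("contributors must be positive")
    share = pool_usd / contributors
    net = share - claim_cost_usd
    return EqualSplit(share, net, net > 0)


# Hypothetical inputs for illustration only; none of these figures come from the panel.
result = equal_split(
    pool_usd=1_000_000_000,       # assumed total licensing pool
    contributors=2_000_000_000,   # assumed number of contributors ("billions")
    claim_cost_usd=1.00,          # assumed cost for one person to claim a payment
)
print(result)
```

Under these assumed inputs each share is smaller than the cost of claiming it, which is the structural point of the argument: without usage data to weight contributions, an equal split over a very large population leaves little worth collecting. Different assumed inputs would change the numbers but not the shape of the comparison.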
Fan Creators, Non-Commercial Expression, and the Orphan Works Impasse: What the 1976 Act Left Unresolved
The 1976 Act’s elimination of formalities, extension of duration, and automatic federal protection upon creation were intended as pro-author reforms, but they generated two structural pathologies that its drafters did not anticipate: a vast population of works whose owners cannot be identified or located, and an equally vast population of creative works made by individuals whose relationship to the copyright system is fundamentally non-commercial and who neither want monetary compensation nor have pathways to participation in licensing frameworks. Professor Rebecca Tushnet described the legal position of non-commercial fan creators—whose Archive of Our Own hosts over 17 million works and has been designated a Library of Congress American Heritage site—as one of de facto toleration rather than formal recognition, arguing that the “tolerated infringement” characterization is “at best, an argument that formal law sweeps way too broadly under any justification you want to give for copyright rights.” She emphasized that statutory damages are the structural mechanism that makes this toleration precarious: if damage from a non-commercial, non-reproductive fan work is cognizable under copyright law and subject to up to $150,000 in statutory damages, then “that damage ought to be bad, not just an annoyance, something that you tolerate,” an observation she identified as vindicating the longstanding position, referenced in the panel discussion, that statutory damages have been harmful to the broader copyright scheme. 
On the question of whether non-commercial fan creators can participate in AI licensing frameworks, Tushnet was categorical: offering payment to fan authors creates “a category error, like offering money after I’ve hosted you at dinner—that is a mistake about the nature of the transaction,” and voluntary blanket licenses present a structural vulnerability absent from fair use because they “always have that out”—what she termed “tuggable blanket licensing”—whereby permission can be revoked if a particular output becomes controversial, a power that is “most likely to be tugged in fair use situations.” Professor Urban situated the orphan works problem within the same structural framework: the 1976 Act’s features that increased the orphan works population—reduced formalities, removal of the publication requirement, extended duration—were intended as author-protective reforms, but the practical consequence is “more and more work accreting over time without a lot of information carried together with them,” creating a chill that is “not good for the public and not good for further creativity.” She documented a consensus that collapsed rather than succeeded: the Copyright Office produced comprehensive reports in 2006 and 2015 recommending a solution centered on reasonably diligent search and limitations on remedies; the 2008 House bill provided injunctive relief limitations and damage caps, but libraries and other institutional users withdrew support after the HathiTrust and Google Books litigation suggested fair use might be adequate, fearing—as Gervais had noted approvingly—that the existence of a statutory orphan works scheme could itself become evidence of a cognizable licensing market that courts would use to narrow fair use. 
Urban predicted that AI training compounds the orphan works problem at a scale that resists every existing solution: massive training corpora are too large for diligent per-work searches, there is no known mechanism for removing a work from a trained model, and licensing settlements structured around registration records will systematically exclude both orphan works and unregistered works from compensation, as her example of an author whose publisher failed to register the work illustrates. The Register’s 2015 observation—that extended collective licensing schemes incorporating orphan works distribute money to no one and serve no copyright system purpose in the absence of a traceable owner—remains the most concise statement of why no mechanism currently on offer resolves the orphan works problem at AI training scale.
Generated by AI based on the Interview/Transcript below.
Key Takeaways
- The licensing architecture, not the rights provisions, determines who gets paid. Gervais argued that the “ghost architecture” of compulsory licenses, court-supervised blanket licenses, and voluntary collective management, rather than Section 106’s exclusive rights, “is the statute’s living organism, the part that determines in any given decade, who gets paid for what on what terms,” and that AI training represents “the most severe stress test of the licensing architectures since at least cable television.”
- Non-consumptive copying should be presumed highly transformative. Sag proposed a targeted Section 107 amendment recognizing that “copying works to extract unprotected information or enable non-expressive computational functions is highly transformative,” arguing that courts have effectively reached this result already in cases like HathiTrust and Bartz v. Anthropic, but that a statutory fix is available if courts err in the Ross Intelligence line of cases.
- Voluntary collective licensing cannot solve the AI training compensation problem at scale. Sag demonstrated that the absence of usage attribution data in LLM training (“there is no Taylor Swift book for LLM training”) means that dividing revenues equally across billions of contributors would yield per-person sums below transaction costs, making the ASCAP-for-LLMs analogy structurally inapt. He characterized any such system as “a tax system” whose proponents “should admit that’s what you’re doing.”
- Orphan works reform has stalled for political and structural reasons that AI magnifies. Urban documented that a decade of consensus-building collapsed when institutional users preferred fair use to the risk that a statutory orphan works scheme would narrow fair use doctrine, and cautioned that AI training—with its massive scale, inability to remove works from trained models, and registration-based settlement structures—places orphan works in “a space that is a little bit harder to think about” than any prior reform framework addressed.
- Non-commercial fan works need formal legal recognition, not mere toleration. Tushnet argued that the “tolerated infringement” framing is “at best, an argument that formal law sweeps way too broadly,” that statutory damages are structurally inappropriate where harm is concededly negligible, and that non-commercial fan creativity represents “part of the background of a thriving modern creative ecosystem” and “the seed corn for the next generation of creators.”
- AI training licenses for access are healthy but do not prove a general licensing market. Sag cautioned that deals like Reddit’s $60 million annual agreements with Google and OpenAI are licenses for “access”—cooperative API access to improve system performance—not licenses for training data as such, and that courts should not “jump from there to say, oh, therefore, there’s a market effect, there’s no fair use for this.”
- Compulsory licenses cannot cross borders; voluntary contracts can. Gervais argued that territorial copyright law makes compulsory licensing structurally unsuited for AI training, which is inherently borderless, while “contracts are the way across borders”—parties can select governing law, agree to jurisdiction, and include clauses managing the uncertainty of unresolved fair use questions “up to a point.”
- Section 1201 exemptions have proven that small organized groups can convert informal practice into formal policy. Tushnet reported that the Organization for Transformative Works has obtained repeated renewals of Copyright Office exemptions permitting non-commercial remixers to circumvent DVD, Blu-ray, and streaming anti-circumvention measures, demonstrating that “a small group of people can effectively” participate in regulatory processes “if they are in fact representing a group of people with shared concerns.”
- AI’s portability undermines national licensing and tax solutions. Sag warned that “a lot of this activity is incredibly portable—you can just take your AI training to other jurisdictions,” and that jurisdictions representing 52 percent of world GDP already permit text and data mining for machine learning, making unilateral national licensing mandates or tax-and-redistribution schemes difficult to sustain competitively.
Interview/Transcript
This interview/transcript is based on a conversation held on April 17, 2026 at the 29th Annual BTLJ-BCLT Spring Symposium: Origins, Evolution, and Possible Futures of the 1976 Copyright Act, hosted by the Berkeley Center for Law & Technology, UC Berkeley School of Law. The panel on ‘Unanticipated Consequences of New Technologies and Practices’ featured Daniel Gervais, Professor of Law (Emeritus), Vanderbilt Law School; Matthew Sag, Professor of Law, Emory University School of Law; Rebecca Tushnet, Professor of Law, Harvard Law School; and Jennifer Urban, Professor of Law, UC Berkeley School of Law. The event ended with closing remarks from Peter Menell, Pamela Samuelson, and Molly Van Houweling, each a Professor and Faculty Co-Director, Berkeley Center for Law and Technology, UC Berkeley School of Law.
Jennifer Urban 00:15
Welcome back everyone to our final panel of the first installment of the two-part symposium on the 50th anniversary of the 1976 Copyright Act. I am delighted to be here and very grateful to my BCLT colleagues and colleagues at the Kernochan Center and all the speakers for putting together a program that has been absolutely terrific thus far, and thank you to the student editors who are going to be editing the papers, and especially to all of our BTLJ and I-House staff in the room who have been making this work. I’m very pleased to introduce this final panel on unanticipated effects of the ’76 Act as conditions have changed. Register Perlmutter made the case that the drafters’ intended result, a law that was flexible enough to accommodate changes in society and changes in technology or practice, has largely been met. I tend to agree with her analysis in many ways, for many aspects of the law. Nonetheless, the ’76 Act either did not anticipate or has come under some unexpected strain, in some cases, in the years since, and we’ve heard some of that woven in throughout the more historical part of the conference earlier today and yesterday. At the same time, the ’76 Act itself may present the seeds for solutions. Today, we’re going to talk about four instances of this phenomenon. One is not so much about an unexpected strain. Well, I guess it is technologically. Daniel Gervais unfortunately couldn’t be here in person, but we are delighted that we will have him here via Zoom, and he is going to talk about the Copyright Act as a statute of exchange that has undergirded licensing architectures currently strained by AI. And I’ll leave him to tell us whether he thinks that is a good path forward for AI. To my right is Matthew Sag; he will be talking about non-consumptive uses. And to his right is Rebecca Tushnet, who’s going to be talking about the explosion and expansion of the creator community and creator technologies. 
I will finish up by talking about everyone’s favorite subject, I’m sure, because it’s one of mine, which is orphan works. So with that, if we can, I will turn it over to Daniel.
Daniel Gervais 02:53
Thank you, Jennifer, and I regret very much being unable to travel and not being with you today. Regret it doubly, in fact, because it’s my kind of last day as a full time academic. I’m moving to emeritus status this summer, so I’ll only be there part time. But the good news is, I’m spending the day, at least virtually, with, you know, the best possible group of people. Let me share my slides.
Jennifer Urban 03:28
You’d like to share his slides? Yeah, they’re up. Okay, great,
Daniel Gervais 03:36
Yeah. Except I’m not seeing my own. Okay? Let’s see. Okay, now it’s working. Very good. All right. So, orphan works is, in fact, everybody’s favorite topic. But I’m sure the second topic on everyone’s list is licensing. So knowing that that’s not the case, I asked notebook to prepare the most dramatic-looking slides on this topic, so I hope it compensates for the fact that the topic itself may not be exactly as exciting as the slides. So copyright rights are usually economically inert without exchange. For example, a public performance right in a musical composition means very little if the songwriter cannot practically license it to broadcasters, venues and streaming services; a reproduction right in a journal article is commercially sterile if the transaction costs of individual licensing exceed the value of any single license. The 1976 Act created rights whose value could only be realized through functioning exchange mechanisms, and those mechanisms the statute could not conjure on its own. So what I want to explore today is what I think of as the kind of ghost architecture of the statute: its real-world infrastructure, by which I mean the licensing machinery embedded within it, built alongside it by antitrust enforcement, constructed around it by the courts and extended by subsequent congressional interventions. This architecture, not the rights provisions themselves, is what has made the statute breathe and sometimes cough for 50 years. Why does the US rely on such a conspicuously heterogeneous mix of compulsory licenses, court-supervised blanket licenses, designated CMOs and voluntary collective management organizations? I don’t think this is accidental. Each layer reflects a different congressional judgment about when markets will produce exchange without intervention and when they will not. Tracing that judgment across 50 years of legislative and judicial history is the subject of my talk. 
The legislative history of the Act shows that the drafters expected the market to serve as the primary exchange mechanism, but Congress also understood that certain uses would produce market failures if left entirely to negotiation: transaction costs dwarfing the value of individual licenses, users impossible to identify or monitor, and the practical impossibility of obtaining advance licenses for millions of daily transactions. In those cases, Congress reached for the compulsory license, a device already present in the 1909 Act. This is a point the architecture’s critics often miss: compulsory licenses are not concessions to users at the expense of right holders. They are mechanisms for ensuring that market activity occurs in contexts where it may otherwise not occur at all. The cable television license in Section 111, the mechanical license in Section 115, the jukebox license in Section 116: each one reflects Congress’s judgment that without regulated access, a particular technology would either be frozen out of the market or would operate in legally uncompensated darkness. The 1976 Act also inherited, without creating, the performing rights organization architecture of ASCAP, BMI and SESAC, quietly relying on that architecture for the public performance right in musical works without needing to create a compulsory license. The mechanical license in Section 115 is, I believe, the oldest compulsory license in American copyright law, and its origins reveal something fundamental about the relationship between copyright, technology and market power. In White-Smith Music Publishing, a 1908 Supreme Court case, the Court held that player piano rolls were not copies of musical compositions under the pre-1909 copyright law, essentially because they were perceptible only to a machine. 
Congress set about correcting this, but by the time the 1909 Act was drafted, the Aeolian Company had quietly secured exclusive licensing arrangements with virtually every major music publisher, positioning itself to monopolize the market for piano roll reproductions. Congress created the compulsory mechanical license not to benefit record companies at the expense of composers, or the other way around, but rather to prevent a single company from cornering the market. The lesson, I think, is this: compulsory licenses can be anti-monopoly instruments, just as they can be seen sometimes as subsidies. The streaming era exposed Section 115’s structural inadequacy. Interactive services in the mid-2000s exploited the address unknown loophole, for example, filing bulk notices with the Copyright Office and failing to remit royalties to publishers they claimed they could not locate. In Ferrick v. Spotify, consolidated class action suits settled for approximately $43 million. Spotify settled separately with the National Music Publishers Association. These settlements demonstrated not merely individual compliance failures, but a systemic breakdown. The Music Modernization Act of 2018 tried to address the failure. Congress created the Mechanical Licensing Collective as the mandatory administrator of the blanket license for digital phonorecord deliveries. This greatly reduced the address unknown loophole and established a musical works database, resolving matching problems of unidentified right holders whose royalties had accumulated in unallocated pools. That being said, if you walk around Nashville, you can talk to anyone about the black box, and they know what you’re talking about. The structural lesson is worth stating plainly: when voluntary intermediaries fail to deliver complete market coverage, Congress can mandate collective administration. 
That pattern has repeated itself through the 50-year history of the 1976 Act, and it will be relevant when I turn to the AI context in a minute. The Digital Performance Right in Sound Recordings Act of 1995 is perhaps the most architecturally self-conscious piece of legislation in American copyright history. Congress created a new exclusive right, the digital performance right in sound recordings, and simultaneously constrained it with a statutory license, embedding the interactive/non-interactive distinction as the boundary between the regulated market and the negotiated market. Every dollar of difference between the statutory rate and the negotiated rate gives parties strong incentives to contest which side of the line they’re on, and that contestation has generated persistent boundary litigation for three decades. SoundExchange, the government-designated mandatory collector for the Section 114 royalty, represents a hybrid model distinct from everything that preceded it: neither a voluntary membership organization like the PROs, nor a traditional intermediary like Harry Fox, but a creature of statute with mandatory authority over all eligible rights holders without requiring opt-in. The compulsory license tradition addresses one half of the licensing architecture. The other half is the voluntary market for reproduction rights in text and images, and its development illustrates a different but equally important dynamic: the way judicial calibration of fair use can constitute, rather than merely describe, a licensing market. Before the 1976 Act took effect, photocopying existed in what one might call legal limbo. In Williams & Wilkins, the National Institutes of Health’s systematic copying of journal articles was held to be fair use, largely because the copyright owners could not demonstrate economic harm with sufficient concreteness. 
The Copyright Clearance Center was established in 1978 to fill that gap, but its early commercial fortunes were very modest. Without a clear judicial determination that institutional photocopying was infringement, corporate legal departments had limited incentive to pay for licenses. American Geophysical Union v. Texaco, decided by the Second Circuit in 1994, changed that landscape. The court held that Texaco researchers’ systematic copying of journal articles was not fair use, turning critically on the fourth factor of Section 107. The court found that CCC licensing was readily and reasonably available, and that Texaco’s copying therefore caused cognizable market harm. The existence of a functioning licensing market made it possible to find harm. Fair use was narrowed precisely because reasonable voluntary licensing existed; voluntary licensing was incentivized precisely because fair use was narrowed. This circularity is not necessarily a logical flaw in practice. It is the mechanism by which the statute, working through the courts, constitutes voluntary markets. Post-Texaco decisions in the Sixth and Eleventh Circuits seem to confirm this pattern. The PRO model, which is nominally voluntary but effectively mandatory for any serious music user or right owner, was shaped by antitrust consent decrees rather than copyright legislation. It represents the third variant of this architecture, one in which blanket licensing is upheld by the Supreme Court precisely because the regulatory constraints preserve enough residual competition to satisfy antitrust law. The record of 50 years is genuinely substantial. The PRO system, SoundExchange, CCC and the MLC have collectively enabled transactions on a scale that individual negotiation could never have achieved, distributing billions of dollars in royalties that would otherwise have gone unallocated. But the architecture’s deepest structural tension is one that recurs in every sector. 
Legislative updating operates on a political timescale that is systematically slower than technological change. It took the better part of two decades from the onset of the streaming and music downloading era to the MMA’s passage; a decade of market distortion occurred in the interval. That gap between the speed of technology and the speed of law is a problem the AI context now presents in its most acute form. The use of copyrighted works in training large language models and other generative AI systems is, in structural terms, the most severe stress test of the licensing architectures since at least cable television. The scale of reproduction involved exceeds anything the existing compulsory license framework was designed to accommodate. The works involved span every sector of the copyright market simultaneously: text, images, sound recordings, audiovisual works, software. And the territorial fragmentation of copyright law means that any American solution will interact uneasily with divergent approaches in the EU, the UK and elsewhere, creating the risk of arbitrage and regulatory mismatch that no national licensing scheme can fully eliminate. The voluntary licensing market is beginning to respond. Direct deals between AI developers and major rights holders, for example, the deals struck between OpenAI and the AP, the Financial Times, and News Corp, address the high-value end of the market where both sides have the leverage to negotiate. CCC has extended its annual corporate license to cover a number of AI uses. CMOs in the UK, Australia, Canada and elsewhere have AI-specific licenses. These developments are not trivial. They demonstrate that voluntary markets can begin to fill a space before the legal framework is settled. The voluntary licensing market may thus be more capacious than its critics sometimes acknowledge. An author who retains her rights need not negotiate individually with every AI developer. 
She could register her work through a CMO and authorize it to license collectively. So the history of collective management in the US is, at its core, a history of coverage expanding incrementally as the administrative and technological capacity to represent diffuse rights holders improved. The one legislative instrument the American experience counsels heavily against is a levy. The digital audio recording technology royalty, introduced by the Audio Home Recording Act of 1992, remains the, let’s call it, cautionary exhibit. Congress imposed the statutory royalty on manufacturers and importers of digital audio recording devices and blank media, a scheme facially coherent in design, but built for technology that passed through the market like a comet. The 50th anniversary of the statute is also an appropriate moment to ask how a system built on territoriality can govern a technology that is inherently borderless. The answer lies in the capacity of voluntary licensing to cross the borders that statutory law cannot. Just as CMOs have long managed global royalties through networks of reciprocal agreements, voluntary collective licensing for AI training could build a functional cross-border framework, turning the inescapable and indeed desirable baseline of territoriality from an insurmountable barrier into a manageable administrative reality. At 50, the Copyright Act is simultaneously a monument to legislative ambition and a reminder of the limits of any static text in a world of relentless technological change. The licensing architecture built around its foundations is not a footnote to the statute. It is the statute’s living organism, the part that determines, in any given decade, who gets paid for what, on what terms, and through what institutional intermediaries. Two observations as a conclusion. The first is historical. The architecture has proven more adaptive than critics feared and less adequate than its designers hoped.
It has managed successive technological disruptions, from photocopying to cable retransmissions to digital downloads to streaming, with a combination of legislative intervention, judicial doctrine, and market development that has, on the whole, kept the system mostly functional. Each adaptation has been slower and more costly than it might have been. None has been perfect, but the architecture has held. The second observation is prospective. The AI context presents the architecture with its most demanding test, combining scale, speed, and territorial complexity in ways that none of the previous disruptions did individually. Whether the architecture proves adequate to that test will determine not only the economic fortunes of the creative industries and the AI sector for the next generation, but also the broader question of whether a copyright system remains capable of serving its constitutional purpose, as we all know, to promote the progress of science and useful arts. 50 years is a long time for any statute to remain operative in a field that is this technologically dynamic. Whether the flexibility built into the foundations of the statute is sufficient for the next 50 years is the question this anniversary asks us to take seriously. Thank you very much.
Jennifer Urban 20:22
Thank you very much. Matthew Sag will talk about non-expressive uses next.
Matthew Sag 20:31
Thank you everyone for sticking with us to the final panel. Thank you Berkeley and Columbia for organizing this amazing event.
Jennifer Urban 20:39
I think you’re gonna need to get closer to the mic.
Matthew Sag 20:43
Yeah, I will do that. I’m sorry. I’m used to my voice traveling.
Matthew Sag 20:45
All right. So non-expressive use. Here is the crux of the problem. Copyright is built on the metaphor of the printing press. Copyright was originally a response to the printing press, and the printing press still governs how we think about copyright law today. So when we think about copyright law, we think about copyright providing a system of incentives for authors whose work would otherwise be freely copied. In that system, it really makes sense that we treat copying as this point of exchange, as this sort of toll gate, because it usually represents an exchange of value where the reader pays money and gets to read or enjoy the work. But the question, the unanticipated question, is, well, what if? What if there are no readers? We have seen, since the 1990s, a series of what I call copy-reliant technologies, and what I’m thinking of here are search engines, plagiarism detection software and, of course, now machine learning and generative AI. These technologies have a common structure in that they necessarily copy works but usually don’t deliver prior original expression to any human reader. This is what makes them copy-reliant. They literally rely on copying for their very existence or machinery. And this seems like a fundamentally new problem. Sorry, I’m out of sync with my slides. The question that really couldn’t have been anticipated in the 1970s (this is what Gemini thinks the 1970s looks like, by the way) is: should copying without communication of original expression be regulated by copyright at all? Let me explain why I think this is a hard issue. Fundamentally, the question is whether hidden intermediate copies should be permissible if no one ever reads them. This raises a tension between two intuitions. One intuition is that just the technical act of copying, the literal fact of copying, is what copyright historically has regulated, and that has always been seen as an act of infringement. A copy is a copy is a copy.
The other perspective is that when you look at the Copyright Act, and you look at how we judge substantial similarity, how we have rights of performance, which are keyed on public consumption, etc., and many other features of the Act, you see that copyright revolves around the protection of original expression, and doesn’t really care about situations where the original expression is not communicated in any meaningful way to a new audience. And I think one of the most fascinating lines of cases here is if you dig into all of the Hollywood cases where someone brings an action saying, “They ripped off my script,” and the court says, “Yeah, this doesn’t look anything like your script.” They say, “Aha, but I need discovery to show that their intermediate drafts started with ripping off my script, even though they changed it by the end.” And the courts say, “No,” right? Like, we don’t really care about those intermediate copies that aren’t seen by any new audience. So in 2009 I wrote a law review article on this topic, coining the expression “non-expressive use,” and I basically pitched my idea for how we should think about copyright and why I think this is fair use. But I want to acknowledge it is a hard issue. It does expose some competing tensions. When I started writing about this issue, the examples were not abundant. I mostly had in mind software reverse engineering and then plagiarism detection and, obviously, the Google Books and emerging HathiTrust litigation. Now, of course, we live in the world of generative AI, and I think it’s productive to ask: what, if anything, is different? Generative AI is just another form of text data mining. Generative AI is just another way of processing preexisting texts to extract information about those texts and information about the relationship between those texts.
Now it’s true, sometimes works are so heavily encoded in the models that they are a copy, as Cooper has demonstrated in several papers, but his work also demonstrates that that is very much a minority occurrence. We really don’t have a good understanding of how common representations in the models are that would actually meet the threshold of copyright law. What we do know is that Harry Potter and Fahrenheit 451, etc., probably are copied in the Llama models. But by and large, these models are not in any meaningful sense copies of most of the data. Even so, I think the fact that models are capable of producing works in the same functional form as their inputs matters: they produce things that look like expression. You can train a model on photos, and it will produce things that look like photos. That is something that hits a little bit differently. That is something that I think changes the political economy. It changes the politics, it changes the social implications. But I’m not sure it actually changes the copyright law. I think that ultimately the test has to be: are the outputs substantially similar to the inputs? And if the answer to that question is no, then if you still have concerns about generative AI and machine learning, we need to be thinking about regimes outside copyright to address those concerns. So let’s think about what courts have done. I’m going to be incredibly summary here, because I’m assuming a reasonable amount of knowledge in the room, but if you have more questions, I’m happy to refer you to, like, a half a dozen law review articles on the topic. I think, by and large, courts have held that technical acts of copying are fair use when no original expression is communicated to a new audience. That is not the language that the courts use. That is my assessment, my factual description of the cases, right? The courts don’t talk about non-expressive use. The courts haven’t read my law review articles.
Apparently, they just talk about transformative use, which is fair enough. Recent cases, Bartz v. Anthropic and Kadrey v. Meta, both find that model training is highly transformative and ultimately fair use, admittedly Kadrey with a lot less enthusiasm than Bartz. The notable exception here is, of course, the Ross Intelligence case currently under review by the Third Circuit. And I think we’re all interested to see what the Third Circuit does with that case. I expect if that case goes the other way, it will be on narrow grounds that are very much related to the fourth fair use factor, but I will have an answer to that soon. Where is this heading? I think courts have actually done a pretty reasonable job with the non-expressive use cases, you know, starting with HathiTrust, Google Books, and the recent generative AI cases. But I think we don’t have to rely on the courts. If you think about the Netcom decision, Netcom is an analogous issue. The court did a great job recognizing just the insanity of holding providers of internet infrastructure liable when they were just a passive pass-through. And so Netcom articulated the principle of volitional conduct, and that worked extremely well, but Congress also stepped in and gave us the 512 safe harbors, very much modeled on Netcom, but also in some ways a little bit more predictable than the line between volitional and not volitional. There is no reason why a functional Congress could not step in and legislate to provide additional clarity in this area. And so to that end, I actually have a proposal, which I’ll be writing up for the GW symposium, going in the Journal of the Copyright Society, on exactly that, on how we could and should rewrite Section 107 of the Copyright Act. I think that it would take a fairly simple and strategic amendment to simply recognize that copying works to extract unprotected information or enable non-expressive computational functions is highly transformative.
And you’ll note that I don’t say “is fair use,” because I think ultimately there are four fair use factors, and there should be room for the courts to evaluate the whole picture. But I think the case that non-expressive use is transformative use is overwhelming. That does not mean that on specific facts, a defendant couldn’t lose under factor four. And that does not mean, when you have a model that really does memorize a whole bunch of works and outputs them in some significant way, that that’s non-expressive use. I think that’s an expressive use and is outside the concept. But I think if the courts go wrong here, there are actually fairly clean options for statutory reform that could intervene and fix the problem. And so, given that I have a couple of minutes left, I just want to talk about licensing, right? A lot of people seem to think that there is a licensing solution available for whatever they perceive the issue to be with AI training. And so I’ve thought about, well, could we have an ASCAP for LLMs? Right? ASCAP is amazing. ASCAP is incredibly efficient. They distribute a huge amount of money to a very wide variety of people, but they don’t pay everyone. They don’t write a check for less than $100, they don’t do a direct deposit for less than $1, and they have data on what gets played. ASCAP works because all of the copyright owners whose works have negligible value don’t get paid. The problem is that, with current technology, we have no way of tracing which individual works are important for the system. There is no Taylor Swift book for LLM training. That means that we would have to divide the revenues equally, and that would be dividing those revenues among a lot of people. We’re not just talking the authors of songs, the writers of books, the photographers.
We’re also talking everyone who ever posted on social media, everyone who made a comment on Stack Overflow, everyone who contributed all of the digital ephemera, right, your blog posts, et cetera, et cetera that go into the training data. So that’s not millions, that’s not hundreds of millions, that’s billions, right? And once you take a very large sum of money and divide it by billions, then the costs of sending people a check for their individual contribution are exceeded by the transaction costs. So the LLM could still send checks to some large content owners, but I think those people are able to do individual deals with the AI companies based on putting gates around their content. So really, all that we would be doing is we would be creating a tax system, and if you want to tax large language models and redistribute that money in order to promote worthy social causes, I actually think that’s a fantastic idea, but you should admit that’s what you’re doing.
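The arithmetic behind this point can be sketched with purely hypothetical numbers. The pool size, contributor count, and per-payment cost below are illustrative assumptions, not figures from the talk or from any real licensing scheme, and the function names are invented for this sketch.

```python
# Hypothetical illustration of an equal-split royalty pool.
# Every figure here is an assumption chosen for illustration only.


def per_contributor_share(pool_usd: float, contributors: int) -> float:
    """Equal split of a royalty pool across all contributors."""
    return pool_usd / contributors


def is_payout_worthwhile(share_usd: float, cost_per_payment_usd: float) -> bool:
    """A payment is worth sending only if it exceeds the cost of sending it."""
    return share_usd > cost_per_payment_usd


pool_usd = 1_000_000_000      # assumed: a $1 billion annual pool
contributors = 2_000_000_000  # assumed: 2 billion contributors
cost_per_payment_usd = 1.00   # assumed: cost to identify and pay one person

share = per_contributor_share(pool_usd, contributors)
worthwhile = is_payout_worthwhile(share, cost_per_payment_usd)

print(f"share per contributor: ${share:.2f}")
print("worth paying out:", worthwhile)
```

Under these assumed figures each contributor’s share is fifty cents, below the assumed one-dollar cost of paying it, which is the sense in which the transaction costs exceed the individual contribution.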
Jennifer Urban 35:35
Thank you. Rebecca Tushnet will talk about fan creators.
Rebecca Tushnet 35:35
And now for something completely different. So when I started my career writing about fan fiction, that is, fans writing, for example, the further adventures of Kirk and Spock from Star Trek, or Mulder and Scully from The X-Files, people in the legal community were often surprised that I cared. Wasn’t this a bunch of infringing derivative works? Now, when I talk about fan fiction, people in the legal community are often surprised that I care, because non-commercial fan works seem obviously transformative and fair, or at least obviously not going to come under legal threat. Chloé Zhao directs movies for Marvel while giving interviews about her fan fiction. The actress who plays Dr. Javadi on The Pitt says that she’s a regular girl and gives as a key example that Javadi is on AO3, which she expects you to know means the Archive of Our Own. My students have never known a world in which fan fiction was hard to find. I’m more pleased to be in the latter situation, but it does make me feel a bit old. And given that non-commercial fan works were not on the radar of the drafters of the Copyright Act, even if some of them almost certainly had interacted with fan cultures, in particular the science fiction fandom scene, my placement on this panel makes a lot of sense, despite being kind of at the other end of concern from the previous panelists. So I’m going to give you a bit about my relationship with fan works. I’m a founder and presently co-chair of the legal committee of the Organization for Transformative Works, or OTW. Our mission is to support and defend non-commercial fan works, explicitly framed as transformative, both in the legal copyright sense and in the broader sense of being different in exciting ways. So one of the ideas behind creating an organization was that we would try and show up in the rooms where it happens to give fans a voice in policy and legal discussions as creators, the way the EFF has done for general internet freedom.
So today, the OTW’s Archive of Our Own hosts over 17 million fan works. We’re a Library of Congress American heritage site. The OTW also has a wiki, Fanlore, dedicated to fan-related topics. It has a peer-reviewed open access journal, Transformative Works and Cultures, and, of course, a legal advocacy project to help protect and defend fan works from legal challenge and commercial exploitation. The OTW routinely now participates in amicus briefs and policy comments to courts, legislators, and regulators regarding copyright, trademark, and right of publicity issues, including internationally. One of our most long-standing projects has been seeking and obtaining exemptions from Section 1201’s prohibition on circumvention for non-commercial remix video makers, known as vidders. These days, these are often called fan edits. Our exemption currently allows non-commercial remixers to rip clips of video from DVD, Blu-ray, and streaming video in order to make their own transformative works. In the 1201 exemption process, the Copyright Office perceives its job to be narrowing your requested exemption as much as possible. I’m sure the Copyright Office folks will not entirely agree with that, but that’s certainly what it feels like. We showed, nonetheless, to their satisfaction, that non-commercial remix videos are regularly fair use, and that 1201 had hampered fans’ ability to make those fair uses. So we’ve obtained renewal of those exemptions several times, and I’m very proud of the work that we’ve done, sort of converting fan works into sort of intelligible copyright policy. So I wanted to talk about some lessons that I take from my experience in fandom. First, there’s no substitute in the modern state for organizations that can speak the language of regulation. Citizens must organize or they will be ignored.
But the good news is, a small group of people can effectively do that if they are, in fact, representing a group of people with shared concerns. Very few of the more radical anti-copyright, anti-capitalist people who think that the OTW is a liberal, derogatory organization are in this room, but I think that we’ve had a productive effect on the overall conversation. That includes them, right? So it includes people not showing up in this room. Second, possibly more relevant to this group, it is not good for everyday practices to get fundamentally out of sync with formal law. If the everyday practices are acceptable and even good, the formal law ought to recognize that, and in this context, we can use fair use to do so. There are those who say that fan works are tolerated infringement. In fact, some of those people are probably in this room. And my reaction to that is, this is, at best, an argument that formal law sweeps way too broadly under any justification you want to give for copyright rights. It’s true, the main tolerators are big conglomerates, simply because, as we heard yesterday, they’re the source of most of the widely disseminated for-profit copyrighted works that we have. But there’s a reason that even individual authors who say they oppose fan works haven’t actually sued over non-commercial fan works, and the reason is they don’t do the kind of harm that we think about when we think about copyright harm. In addition, the tolerated infringement argument is a profound indictment of statutory damages. Specifically, if the exclusivity of a copyrighted work is both infringed by a non-commercial, non-reproductive work and subject potentially to up to $150,000 in damages, that damage ought to be bad, not just an annoyance, something that you tolerate. Pam Samuelson has always had the right of it, and we heard yesterday various forms of agreement with her position that statutory damages have been harmful to the rest of the copyright scheme.
Third, and more broadly, non-commercial fan works are good. They offer a distinct field for creative endeavors separate from the copyright-enabled commercial system. They’re both artisanal, like fine art, and widely distributed, making them an important alternative form of expression. Among other things, non-commercial works are fundamentally different, in the aggregate, from commercial works. So think of a bunch of things that there is no commercial market for, or a very limited commercial market for: poetry, 100-word drabbles, short stories, 2,000-word stories, million-word stories. Lots of things don’t have a good market for them, but people want to make them anyway, and the structure of fandom lets them share those things and find their audiences, even though those audiences might not be enough to support a commercial publishing structure. This is part of what makes fan works worth preserving and protecting. They are part of the background of a thriving modern creative ecosystem. Non-commerciality, among other things, does complicate questions about things like blanket licensing, which the other panelists spoke of. So in general, fan authors don’t want your money, they don’t want to participate in the commercial ecosystem. They kind of just want to be left alone. In addition, and relatedly, fan cultures have a long connection to queer writing. Fan fiction is, in fact, inherently about difference. It is about the fact that a story could be different, different things could happen, different people could experience the events. The people experiencing the events could be different in some way. They could be gender-swapped, they could be queer, they could be trans. This encourages repetition with difference, right? So, you know, I’ve already seen Kirk and Spock. I still want to see them again. This allows people to open themselves up to various possibilities, not just in fiction, but in the rest of their lives.
If you want to cry about the power of creativity, I would encourage people to read the stories that we collected for our submission to the NTIA inquiry into the legal framework for remixes. We had incredibly powerful testimony about the power of making stories and other creative works within a community that is excited to hear everyone speak: “I’m excited to hear your version of how Shane and Ilya got together.” That has literally saved lives. Beyond its transformative effects on people, though, non-commercial fandom is a huge boon to creativity generally. So I like the metaphor of Professors Andrew Torrance and Eric von Hippel, who identify innovation wetlands. These are largely non-commercial spaces in which individuals innovate that can be easily destroyed by laws aimed at large commercial entities, unless those individuals are specifically considered in the process of legal reform. So their description, I’m going to quote, because I think it’s very good. The practice of innovation by individuals prominently involves factors important to human flourishing, such as exercise of competence, meaningful engagement, and self-expression. In addition, the innovations individuals create often diffuse to peers who gain value from them. Individual innovation requires that individuals have rights to make, use, and share their new creations, collaborating with others to improve them, which is what remix authors do. Given the small scale and limited resources of most individuals, anything that raises their innovation cost can therefore have a major deterrent effect. So things that I have personally witnessed in fandom include the adoption of curated folksonomy tagging as a system to explain what works are about. This has moved to pro publishing. I’ve seen new story types and tropes.
Somewhat more niche is “five things that never happened,” a way of exploring different scenarios for characters that together illustrate something about the fan author’s view of the characters, or what’s important to them. You may have heard about the fan-invented Omegaverse tropes (I was there) about humans with certain animalistic characteristics. Whether you think that this is deathless literature or not doesn’t really matter. It’s very clear that it came from the non-commercial world, and other things are going to come too. If you forget about non-commercial works in your creativity policy, you enable the destruction of vital diversity and the seed corn for the next generation of creators, which is also, not for nothing, why it’s not surprising that today’s directors, screenwriters, and actors are often coming out of fan cultures, which is where they got their first exposure to creativity. So finally, I just want to end with a coda on another view of internationalism. At the time the OTW was founded, nearly 20 years ago, the US was the only place we could count on for a strong and flexible fair use defense. This has somewhat changed, including by adoption of fair use in several other jurisdictions, Canada’s non-commercial user-generated content exception, and most recently, actually, greater European flexibility on pastiche. Who would have thought? But fair use’s impact is still really notable, in part because we’re there with our 17 million works, which you can get to from pretty much anywhere, at least if you’re outside of China and Russia, or if you have a VPN in there. American hegemony meant that we didn’t even need a term like the Brussels effect for the effect of American fair use and safe harbor laws. It was the water that we swam in. It really did seem like the internet was another American territory. That’s changing more every day. We are probably going to miss it when it’s gone. Certainly the people in this room probably are.
So I’ll leave it there, and I look forward to your questions.
Jennifer Urban 47:24
Thank you. I’m going to stay here, so I’m waiting for the clicker. Wonderful. Thank you so much. So I’m not sure how we ended up closing out the day and the symposium with this, but we did. So please bear with me. I’m going to talk about orphan works. I think this room is probably very familiar with the problem, but I’m going to very briefly start with the definition of an orphan work, just to put us all on the same page, and then talk about what happened. There was a tremendous amount of consensus, certainly for a lot of policy issues, and for copyright in particular, to address the issue, from circa 2004 up to 2015, but the efforts were not, ultimately, I think, a rousing success. And one question is, why not? And another question is, is there a place to go from here? I think this is mostly a political story, but I will say, going all the way back to Jane Ginsburg’s observation yesterday that the ’76 Act really centered the author, that where you sit on this issue and what about it troubles you probably connects to your sense of who is an author, what authors might like, and how authors should be treated under the Act. So, just so that we’re all on the same page, an orphan work is where the owner of a copyrighted work cannot be identified and located by someone who wishes to make use of the work in a manner that requires permission of the copyright owner. “Cannot be identified” and “requires permission of the owner” are both very important. Essentially, these are works that are unclearable. And if you can make a fair use, for example, then in theory, this should not be a concern. Or, of course, if the work is in the public domain, then the work has acceded to the public domain and belongs to everybody. So it needs to be a use that would otherwise need a license.
The fear is that a good-faith user, who maybe has tried to find the owner, or maybe has not and assumed there wasn’t one, makes a use that is salutary or maybe not, but makes a use, and then a copyright owner later appears. And that means that liability fears, as the story goes, prevent use, even in cases where there’s likely no owner. So for example, a defunct corporation, that’s something that comes up pretty frequently, or there might be somebody with no interest in exploiting the work. And because of the remedies available under the Copyright Act, including statutory damages, which are going to be limited by the circumstances, but you can come to a path where they might show up, or injunctive relief, for example, there is a chill, and that is not good for the public and that is not good for further creativity or use. At the same time, there are concerns that if we try to create special provisions for orphan works, what we might be dealing with are owners that didn’t want to be found. They didn’t care to exploit the work, but it wasn’t as though they wanted the work to be used, for example. And there are concerns that this could just send us right back into a world where a copyright owner would have to take steps in order to solidify their rights, and not requiring that was an extremely important undergirding principle of the ’76 Act. So the ’76 Act has a number of features that are widely thought to have increased the number of orphan works: reducing formalities, no more notice, you often don’t have to register, removing the publication requirement. And I would agree with David Nimmer, yeah. Sorry, it wasn’t David Nimmer. Who was it? Anyway, I apologize. But extending the duration is the main issue here, so that the terms are very long. Works hang around a long time, and so you have more and more works sort of accreting over time, without a lot of information carried together with them.
I will return to the Section 107 fair use standard, because I think that that does read on to this problem quite substantially as well. And that may be one of the places where there is a seed of an answer within the ’76 Act, even if some of its other features may have resulted in unintended effects. So those changes in the law went along with a number of practical, technological, and practice changes. There were orphan works all along, there’s no question, but they probably exploded in number for a number of reasons. Distributed copying and editing technologies, which we’ve heard about already today. I think I met Rebecca years ago at a conference in which she was talking about vidders. So the VCR was a major technology for people to make fan fiction by using two VCRs to make a different story. And of course, it’s just exploded from there. There have always been ephemeral works, things that were never intended to be exploited in the way that we would think of with copyright: brochures, letters, you know, the kinds of things you find in special collections in libraries and archives. In the 20th century, they also really exploded. And now, of course, we have all of the internet. Of course, also new creative tools, which have been mentioned as well over the course of the conference. We have an explosion of creativity by those who are not necessarily in a profession or in a sort of chain of habit for registering works, for example. And mass digitization and LLMs, which pose interesting and significant issues of their own. So taking all of that together, as I mentioned, there has been a lot of agreement on this issue, historically. This is a quote from Maria Pallante, who was then the Register of Copyrights, when she came and gave a lecture here in 2012: that where there is a true orphan work, where there’s no copyright owner and therefore no beneficiary, it doesn’t behoove the copyright system to deny the use of the work.
You know, nobody can use it, nobody can make a beneficial use. And we hope that the copyright system, as Daniel was saying just now, will support the finding of a copyright owner and a licensee and an exchange, an agreement, and you can’t do that if there’s no one to find. So there was widespread agreement on the characteristics of the issue and on aspects of a solution, for example, verifying that a work is an orphan by doing a reasonably diligent search, although, as I’ll say in a minute, that is complicated by practicalities, and then variance in detail and substance across jurisdictions. But the solution space has about three sets. The first is the limitations on remedies. So in the US, proposals included both injunctive relief limitations in certain circumstances, especially when a significant amount of original expression was added, and it’s not like you can just pull the putative orphan back out again after the work is done, and limitations on remedies for damages. So that’s sort of one set, limitations on remedies, and that has been the feature of the US proposals. The EU directive has statutory exceptions. If you do all of the things required and you think you have an orphan work, then there is an exception to the making-available and reproduction rights. And then there is compensation to the later-appearing copyright holder in various ways; that is folded into US proposals, for example. But there are also centralized licensing authorities in a few countries and extended collective licensing systems in a few countries that would apply to orphan works as well. In order for those solutions to apply, what are the conditions for relief for someone who wants to use an orphan work? And here, I apologize for this slide. It got very long on me.
Things get considerably more complicated here, but I would say the reasonably diligent search, so that you have a reasonable sense that what you have is an orphan, that there isn’t an owner, that we are in the world of the problem that we would all like to solve, is required in every example that I’ve come across. After that, it does start to get a little bit mixed up. Maybe you need to identify your use as an orphan use. US proposals have had that, so you put a little notice on it saying, I’m using this as an orphan work. Or you register the use, potentially with a waiting period before use. I’ll say a little bit about the EUIPO database in a moment as an example of that kind of system. And then we get into the issue of extracting the putative orphan later. Some solution sets include taking down or stopping the use upon the appearance of a copyright owner, and the 2008 US bill did that for nonprofit institutions, which would be able to choose that as a safe harbor. Most systems require some kind of compensation to the later-appearing copyright holder, intended to be limited enough to warm the chilling effect that existed when you didn’t know whether somebody would appear and hold up the use or demand compensation. And then various other things. One thing the US proposals have never done is put categorical limitations on the types of users who could take advantage of the solution, the types of uses that would be covered, or the types of eligible works. And this is even given the fact that certain kinds of works, like photographs, have historically been particularly hard to clear, and that is harder for the original makers of those works. However, the EU directive has all of those conditions. Whoops. So why so complicated a map?
And this is just a very high-level, very simplified version of what I think is the case when it comes to why it is so complicated. When it comes to users, you don’t have a perfect continuum, but you have something of a continuum of potential users whose interests diverge in some ways, or at least what they’re concerned about, what’s going to hold up their use, looks a little bit different. If you think of archives and library collections doing large-scale digitization, they’re going to be very, very sensitive to the cost of doing the reasonably diligent search, because they have so many works and many of them have limited resources. But taking down when they get a notice is not such a big deal for them, and licensing fees just may be prohibitive. At the other end, when we were discussing the 2008 bill in Congress, I was working with documentary filmmakers, and a more extensive search may be more feasible for them. They’re probably not using as many works, although some documentary films use a lot. But takedown or removal is not possible. Somebody shows up 10 years later and says, you need to take this out of your film, or I’m going to take you to court and hold you up for a ton of money, or I’m going to get an injunction and you can’t distribute it anymore. That’s not feasible for them. And so where you are willing to compromise depends, to some degree, of course, on where you sit. Similarly, among the copyright holders who participated, there are those who are worried they’d be hard to find, like photographers, since a photograph carries less information about who took it, and who usually don’t need to make use of orphan works themselves, whereas filmmakers can be easier to find, especially large ones, and are more likely to need to use orphan works.
So there were a lot of efforts to address the orphan works issue, and as I mentioned earlier, there was a lot of positive energy around this, and a lot of really thoughtful work. Jule Sigall spearheaded the original report in the Copyright Office, which was comprehensive. Register Pallante spearheaded a second report in 2015, and there were certain characteristics around which these efforts coalesced. The first one defined what an orphan work was; that seemed pretty straightforward. The solution space coalesced on the reasonably diligent search and limitations on remedies. And there were the strong objections I mentioned by some creators. And then there was a bit of a hiatus, and then the 2012 EU directive, which focused on certain actors, so cultural institutions; certain kinds of works, not photographs, for example; and certain kinds of uses, which would be the kinds of uses you would imagine libraries and so forth making. And then came the 2015 Copyright Office report, which separately covered mass digitization and orphan works. For orphan works specifically, the Register declined to recommend that we follow any of these examples, for example, from Europe, and recommended instead that we continue to cover all works, that we have a general solution, and that we focus it on a reasonably diligent search and a limitation on remedies. So this is the EUIPO database that I mentioned, where you register your search and register your use. The question is, what has been the effectiveness of these efforts? There’s been no new legislation in the US. I’ll skip to the bottom there. There’s been limited effectiveness of the others. On the administrative and centralized licensing models, which I didn’t go into much for time, the Register said in her report in 2015 that, you know, these just didn’t work out. Fewer than 1,000 licenses total, and some of these systems started around 1999. That’s a very high-cost system, and it did not seem to be working.
The EU directive had a follow-up report in 2021. It found very limited use of that registry system, and thus, presumably, very little uptake on using orphan works by these cultural institutions. There were originally 18,649 registered in the database, which, when you think of the landscape of orphan works, is quite small. But it turns out that 70% of those were registered by one entity, which was the British Library, and when Brexit happened, the number dropped immediately to 6,903. So there was very limited overall use by most eligible organizations, and a lot of complaints about strict search requirements, et cetera. It is not a huge success, although there has been some uptake. So why is it so complicated to get this right? It’s really the same story as why people bring different red lines to the debates about it. And then there was a third thing that happened between 2006 and 2015, which was that the fair use case law developed a little bit further. And remember, an orphan work is only a problem if your use requires permission. So this is a question of risk management, essentially. If you think you have a potential fair use, do you want to stand on fair use? Or would you like a statutory protection, a limitation on remedies? Would you feel safer with that? In the interim, the Google Books case settled. The HathiTrust case was not done; it was moving. And some large institutional stakeholders, libraries, not all of them, took the position that, given all of the other things introduced in the law that they would have to do, that they may not be able to do, that would increase costs, et cetera, it might be more reasonable for them to rely on fair use.
And they were very concerned about the dynamic that Daniel mentioned at the top of the panel, but in a negative way, where he mentioned it positively: that the existence of this could be seen by courts as a reason to limit the fair use doctrine. The Register suggested a savings clause; that was not enough to allay their concerns, and ultimately, the proposal didn’t go anywhere. So where are we now? It’s hard to say. I think there are a couple of options that don’t require legislation. One wonderful thing that wasn’t the case in 2006, because the Copyright Office hadn’t gotten the money, is that the Copyright Office has made substantial strides in digitizing its registration records. That is certainly helpful: the fewer things you can’t find, the fewer orphan works. At the same time, there are a lot of important records that are still not digitized, in sort of a sour spot, is how I think of it: from 1945 to 1978, they’re not done yet. Obviously, there are limitations to registration records. They don’t always give you everything you need to know to find an owner. And then, of course, the later-appearing copyright owner who registers later is the specter that looms over all of this. And that brings us to risk aversion. When Register Pallante was here in 2012, she said in her talk, boy, I did not realize how risk averse you folks are. And I think, through the process of the orphan works proceeding and others, you begin to learn about the gatekeepers that small creators have, or that libraries have, and the limitations in their budgets. You know, other people making decisions about risk that perhaps are not fully economically rational, but that indeed have a practical effect.
Same thing with fair use: if the case law makes you confident in more fair uses, that could reduce uncertainty. Certainly, courts have occasionally considered market unavailability, as was mentioned yesterday, though not that frequently. But that inspires perhaps even more risk aversion, and again brings in the gatekeepers, the insurance companies, the distributors, who may insist that a work be cleared, and the definition of an orphan work is that it cannot be cleared. So the last thing, and I’ll close on this, is: what do we do with LLMs and AI? They don’t work on my continuum. They have massive scale, so they’re going to be sensitive to search costs. And I don’t know how you take something out of training data. Maybe Cooper can explain it to me, but I’m not sure that’s going to work. And so they end up in this space that is a little bit harder to think about, and I’m interested in everybody else’s thoughts too about where this might go. Thank you. So we are going to talk amongst ourselves a bit, as Molly put it yesterday, and ask for questions from the audience as well. I will kick us off with Rebecca, and this relates to my thinking about orphan works. Although non-commercial and commercial creators alike have protested the use of their works in AI training, only commercial creators have had paths to control or compensation. And I guess I’d like to hear a little bit more about why you think that is. I, for example, have had a client who came and asked whether she would be able to gain from the settlement, but she hadn’t registered her work. Her publisher was supposed to register the work, the publisher didn’t register the work, and so she could not. So that’s an example to start with. And the orphan works come in because, of course, there are going to be massive numbers of orphan works.
And if somebody comes forward and the settlements are structured with registration in mind, they won’t be able to recover. So could that change?
Rebecca Tushnet 1:11:17
So, you know, I don’t really see it changing, for reasons that, I think, have been expressed ably. The transaction costs are not worth it, not to mention the fact that most non-commercial creators, as I said, sort of want to stay out of the commercial economy. Am I not loud enough? Oh, I’m sorry. Okay, all right. So I think part of the issue is that this is one of these category errors: if you offer me money for the fan work, that’s like offering money after I’ve hosted you at dinner. That is a mistake about the nature of the transaction. It is fine to have restaurants. Restaurants are not immoral, of course, but that’s not the relationship that I wanted to have with you. And so I think it’s incommensurable in a really classic way. And it’s not as if we can reward people even with attribution, which people do tend to care about, given the nature of these things.
Jennifer Urban 1:12:39
Thank you. To Matt Sag: how does the international dimension of AI training affect how we think about this under US law? And I think this connects to Daniel’s last point about extending licensing and extending solutions across borders.
Matthew Sag 1:13:00
Yeah, so the international scene for AI training is, of course, quite complicated. Peter Yu and I have a recent article where we survey the global scene, and the upshot of our conclusion is that different jurisdictions have approached this issue in very different ways, but if you just step back a little bit, you see that each jurisdiction is trying to do essentially the same thing. They’re trying to make a pathway for legal text data mining while having some protections for copyright owners. And what you see is just a difference in regulatory style. So the European Union is far more prescriptive. There are many things about the EU DSM Directive, which allows text data mining, that actually give people in the EU admirable clarity. Some of them go further than American fair use law would allow, but they do make some very hard-edged distinctions, not exactly commercial versus non-commercial, because it’s not enough to be non-commercial in Europe; you need to be attached to a proper university or library, et cetera. But the thing that I always come back to is, what is the policy space for addressing this issue? People who think that we can put the AI genie back in the bottle, one, I think are misguided. But two, even if that’s what you wanted to do, a lot of this activity is incredibly portable. You can just take your AI training to other jurisdictions. And I think that fact of international competition needs to be recognized. I think it’s very difficult to see how a licensing system or a tax-and-redistribution system could work on an international basis. You know, I don’t think we have the political competence to even do it here on a national basis, but I wouldn’t be surprised to see them do it in Europe. I think it’s something the European political culture is more amenable to. But yeah, I mean, you know, look around the world and count up the jurisdictions where text and data mining for machine learning is lawful.
It’s still only a handful of jurisdictions, but when you add up the GDP, it’s 52% of the world’s GDP. So the fact that we currently allow this in the US is not really an outlier. You know, other countries do it differently, but most of our peers do something like this.
Jennifer Urban 1:16:06
Thank you. Daniel, I don’t mean to ask you to sort of respond to that, but you did end your presentation on an idea that we might need cross border solutions that would be grounded in licensing. And I wondered if you would be able to expand on that a bit for us.
Daniel Gervais 1:16:26
Of course. Well, it’s a basic rule of copyright under the Berne Convention, and therefore the TRIPS Agreement, that copyright is territorial. And so when you have something that’s protected by copyright, you have 182 different national copyrights, in fact, and each country has a different bundle of rights with different names, exceptions, enforcement rules, damages rules, injunction rules, all that. I can’t resist mentioning that everyone except the Fifth Circuit seems to get that. So you can trade those individual copyrights, and that’s what people do. So when you sign a contract, you cross borders, and, as I like to tell my students, you can’t say in the contract, this United States copyright exception shall be interpreted according to Brazilian law, right? You can’t do that. But what you can say in the contract is, this contract shall be interpreted according to the law of X, and the parties agree to the jurisdiction of Y. The reason we do that is precisely because contracts cross borders. In the music space, for example, we’ve had cross-border licensing for several decades, and it works well. In text, we’re seeing it increasingly, but it’s still not at the scale that it’s at for music. In other areas, it’s not as developed, and I’m talking on the collective scale, but even on the individual level. So I think contracts are the way across borders. That’s one reason I’m opposed to compulsory licensing in this space. I think compulsory licenses have a whole bunch of problems with them, but one of them is that they don’t cross borders. And I think voluntary licensing is the way to go, individual and collective, so they work together. Last thing I’ll say, though, is that one advantage of licensing is it’s a contract. And the advantage of a contract is you negotiate what’s in the contract, and so it’s entirely okay.
And I’ve actually seen collective licenses that do this, and I’ve seen individual licenses that do this, basically saying something like, the parties don’t agree on the current scope of fair use, right? Of course, that’s not exactly contractual language, but that’s the idea. And obviously, at this point, it’s very hard to know exactly what the scope of fair use for AI training is. I think we’re probably five years out before we have some reasonable clarity from the courts, but a contract can also manage that up to a point. And so I think there will be some licenses that will happen because you can manage the legal risk. And I think some of the existing licenses, in fact, may be attributable to that. So that’s why I think licensing has a role to play here. But obviously it depends on your perspective of how much of this should be copyright infringement and how much should not be.
Jennifer Urban 1:19:41
Thank you. Rebecca, can I ask you to pick up on that from the perspective of the kind of creators with whom you work? I guess there are at least two aspects to the question. One is, for creators who might almost be insulted by payment, because that isn’t what they’re looking for, but who may be looking for attribution or recognition, whether Daniel’s thought is something that would interest them. Then, of course, there’s the question of whether they could actually be part of that negotiation. And then, relatedly, if they’re currently relying on fair use, whether that could change, and change their views.
Rebecca Tushnet 1:20:23
Yeah, so, I mean, first of all, the description of, you know, waiting until it becomes more coherent in the courts makes me think, facetiously, why don’t we just propose, like, prediction markets? Let’s bring Kalshi in on this, and we’ll just have something that will settle out at a particular time, since, apparently, that’s what we’re doing for everything now. Now, I don’t think that would actually be a good idea. And again, from the non-commercial creator side, the whole point is that it’s just a bunch of people united by sort of overlapping, content-specific interests. It’s just not the kind of thing that is amenable to negotiation, nor would my organization be good representatives in that field, right? Because most people just don’t want to be paid. The other thing that I would say about the experience of collective licensing from the non-commercial creative side is that buried in these contracts you always see something like, and at any time, if we decide we don’t like a particular output, we can get rid of it, right? What I’ve called tuggable blanket licensing. So, you know, go ahead and create, and then if it becomes controversial, we’ll take away your permission. And I think that exposes one of the key weaknesses of voluntary licensing schemes: they always have that out, in a way that fair use doesn’t. And in fact, of course, the blanket is most likely to be tugged in fair use situations. So I do think that fair use, or the zero-price compulsory license, if you want to put it that way, will continue to be the most effective solution and the one most justified by copyright theory.
Jennifer Urban 1:22:30
Thank you. And I didn’t talk about the success or failure of the extended collective licensing models that incorporate orphan works, for example, but that’s perhaps an even more extreme case than creators who are not really part of this party and are not interested in being part of this party. These owners may not exist at all. They’re not findable. And in the 2015 report, Register Pallante pointed out that it’s just money that goes somewhere, and it is not supporting any goals of the copyright system if there’s nobody to pay. The whole point is to find somebody to pay, or to impose their preferences on the system. So I believe, Matt, you have a question for Rebecca, and then we’re going to go to the audience.
Matthew Sag 1:23:18
Yeah. So I have a question that is kind of in the weeds, about Section 103(a) of the Copyright Act, which says protection for a work employing preexisting material in which copyright subsists does not extend to any part of the work in which such material has been used unlawfully. And I know Pam Samuelson and Jessica Silbey have written on this, but thinking about fan communities and how this affects them, is this a point of friction or sort of disquiet in those communities?
Rebecca Tushnet 1:23:54
Not at all. Well, okay, I should say a couple of things. The things that people do worry about are, first of all, if somebody, and this happens, is printing my work out and selling it on Etsy, or has put their name on it and put it on Kindle Direct, can I assert a claim against them? And our answer is always yes: you made it, it was fair use, therefore 103 is irrelevant. They also worry about whether they could be held responsible, because they do understand that, at least for some fan works, the non-commerciality may be a crucial thing that makes them fair use, right? So just writing the next Star Trek book and selling it commercially could infringe. And so they’re worried: if somebody does that without their permission, are they in trouble? And we generally say no, right? You didn’t authorize that, so you can’t be held liable for it. But, yes, go ahead and file the takedown anyway. Goldsmith could sort of complicate that, but could also make it easier, in that, since Goldsmith demands a use-by-use inquiry, we say, okay, the fan author’s use was fair use. There’s a valid copyright. There may be infringing ways you could use that work. That’s what Goldsmith explicitly contemplates, but that doesn’t change the fair use status of the original creation.
Jennifer Urban 1:25:29
Thank you. I think we have a few minutes for questions from the audience, and I see a few hands. So let’s get started.
Audience 1:25:42
Hi, Mitch Stoltz, Electronic Frontier Foundation. This question is mainly for Matt, and it’s about the licenses that large copyright owners have made with some of the AI companies. Do you see those narrowing the scope of fair use? And if so, what do you think are the implications of that?
Matthew Sag 1:26:12
I don’t think that those licenses should narrow the scope of fair use, although I do know the editor of The Atlantic said on a podcast that one reason he entered into one such license was to prove the existence of a licensing market, to help out the plaintiffs in some of these litigations. When you look at those licenses, as I have done, a few things stand out. One is that most of them, as far as I can tell, are not just for AI training. Most of them are also licenses for retrieval-augmented generation, and people don’t make this distinction enough. The economics and the copyright implications of sending an AI agent out onto the web to clip little pieces of information and then assemble those into a report are quite different from the AI training cases, and it makes sense that people would be licensing that activity. Mostly what people are licensing is access, and you can see this most cleanly with Reddit. Reddit doesn’t own copyright in the things inside Reddit, but Reddit charges $60 million a year to Google and OpenAI for access, and not just any old access, but cooperative firehose access that makes the system go around. So I think we’re going to continue to see licenses for access, and I think that’s healthy, and that’s important, and that’s why we desperately need to update the robots.txt protocol so that it is a little bit more consistent with the kind of preferences people want to signal. But those licenses don’t prove that licensing is a general solution, right? And that’s the sort of fallacy that I see over and over and over again: oh, well, there was some licensing, therefore everything can be licensed. And, no, that’s not true. It’s very easy to go to Reddit and do a big deal, right? Pam points this out in her UCLA article. The problem is that it’s really only by aggregating lots and lots of content that you get a value above zero in order to do a license deal in the first place.
And so, yeah, I think we’re going to see more of those licensing deals, and I think that they’re a great thing, and they’re part of a healthy ecosystem, and we should think about how we can facilitate them. But I hope courts don’t jump from there to say, oh, therefore, there’s a market effect. There’s no fair use for this.
Audience 1:29:02
So, Matt, since you’ve got till October, I just want to suggest that instead of amending fair use to essentially enact a presumption that training is highly transformative, you move away from fair use, and you certainly avoid the word transformative, because I think by using the phrase highly transformative, you’re going to attract additional political opposition and religious opposition and emotional opposition that you don’t need, unless your goal is to have this not pass.
Matthew Sag 1:29:47
That is a completely fair point. I should say, my larger project is this: I think that, 50 years on, Section 107 is actually really good, but it could be better. It could be a lot better in ways that are completely consistent with current law. We could draft it better. It would be easier for judges. And the sort of fixing of fair use if the courts say the wrong thing on AI training is really just an afterthought in that project. I am not proposing something I expect to happen. I understand the politics.
Jennifer Urban 1:30:22
So we are going to need to end there, but I think we have a theme, which is that the ’76 Act is really good, and it could be better. Thank you all very much for your wonderful participation, and I’m going to invite Peter and Pam and Molly to the stage to close while we retire. Thank you very much.
Molly Van Houweling 1:31:16
Thank you, Jennifer. Our grand finale for this grand event is going to be lunch, and the key features of lunch are going to be conversation, amongst yourselves and with the speakers and moderators, and cake. Many of you might recall that at our anniversary party, or birthday party, for the Statute of Anne several years ago, we featured a giant birthday cake, and so we are going to revive that tradition with a celebratory cake today. Now, your schedule calls for us to come back in a half an hour or so for closing remarks, to interrupt lunch, and we’ve decided that we don’t want to interrupt your conversations, nor your cake, so we’re going to do some closing remarks now and send you to enjoy the party. So first, we want to thank our MVPs of the entire event, BCLT, Richard Fisk and Justin Do. We also want to thank, for today and for the future, the student editors of the Berkeley Technology Law Journal. They have been staffing this event as mic passers and timekeepers, and they, of course, will also be editing the forthcoming symposium issue, capturing many of the insights of our speakers. Let’s also thank the AV staff. They have been doing all the on-the-spot AV, and will also be preparing recordings that will be available after the event. Getting to the substance, I want to thank our keynote speaker, Shira Perlmutter, who is still here today. Thank you, Shira, for staying with us and participating in the conversation. Judge McKeown had to run to a judge event, but we also appreciate her time with us yesterday.
I also want to thank our speakers and moderators, including my colleagues Jennifer Urban and Erik Stallman, and Shyamkrishna Balganesh, who moderated yesterday as well, and the speakers. We had high expectations for this event, and for me, they have been exceeded by the breadth and depth of the conversations, by the mix of history and contemporary controversies and insights about the future, and by a nice mix of really rich description and also some moments of cutting critique. And I think we’ve also gotten, especially today, some inspiration for both scholarly analysis and advocacy. Finally, I want to thank our collaborators at Columbia. This has been, as you know, a joint collaborative event, and we are thanking them both for the collaboration that led to today and for the upcoming second part of this two-part event, which will be at Columbia on October 23 and 24, and I hope that we will see many of you there as well. And so, to Jane and Shyam and Pippa and Kaitlyn, thank you for this, and thank you in advance for the time that we look forward to spending with you at Columbia. And I think my colleagues Pam and Peter have some closing thoughts as well, so I’ll turn it over to whoever wants to go next.
Pamela Samuelson 1:34:40
So thank you so much, everyone, for being here. Having an audience means that we get more excited to present. And I’m very grateful for all the insights. I think Shira did an amazing job giving us a kind of an overview. And I think each of the panels has gone deep or wide since then, and that, I think, has made the event really feel like it’s holistic, that we really have had an amazing event. And to my students who are here, I think that one of the things that you see is how passionate so many of us who work in the copyright field, and have done that for many years, are about what we think the law can do to promote culture, knowledge, and the social good. And also, you know, you probably didn’t know a lot of the history. I think some of us in this room, we lived through this stuff, and so it’s very familiar to us. But even so, I came away from this conference with a number of insights about a number of things that I didn’t know anything about. So I’m going to actually call out Marketa, because I had no idea about all those state laws. That was like, ah, I can’t even believe it. So thank you for all your insights as the speakers, and also for your participation. We’re really, really grateful to have a rich audience for our program. We spent a really long time planning this thing, and so to see it come off and to be able to enjoy it with you has just been quite a joy for us.
Peter Menell 1:36:35
Well, we founded BCLT about 32 years ago. We had a mission statement, and it was, I think, beautifully captured in this event: that we wanted to have a place where people from all sides of important technology-related issues could come together. And I just feel that that was our experience. There’s passion and there’s a range of perspectives. We are sort of an older guard by age, not necessarily by ideas, but still, we recognize that the future is going to be created, as I think Judge McKeown said, in ways that we won’t fully be able to participate in. But I know from looking at the audience that we have a tremendous pool of talent here. They were below the radar, but there are people here who are working in the generative AI world, who are going to be very important as these debates go forward. So to have an event that really captured such a breadth of perspective, I think, is the real hallmark of this institution, and I hope we’re going to be able to keep doing that. So thank you all for coming.