Professor Jonah Gelbach on Why Scholars Want to Free PACER

Berkeley Law Voices Carry podcast

In this episode, host Gwyneth Shaw talks with Professor Jonah Gelbach, whose research spans a variety of topics, including civil procedure, evidence, statutory interpretation and legislation, and law and economics. He has a Ph.D. in economics and was a professor with a specialization in econometrics before earning his J.D. and joining the legal academy. He’s also worked as an expert witness in employment and securities cases. 

For more than a decade, he’s been demonstrating how aggregating federal court data can help researchers tease out critical trends. He’s also pushed for the federal judiciary to drop the paywall that non-parties face on the Public Access to Court Electronic Records database (the online repository of more than 1 billion federal court records, commonly referred to as PACER) so that more scholars, journalists, and members of the public can analyze what’s going on inside our courthouses. 

Click below to read some of Professor Gelbach’s work and to learn about groups advocating for easier access to federal court data. 

Beyond Transsubstantivity

Free PACER

Estimation Evidence

Legal Tech, Civil Procedure, and the Future of Adversarialism

Locking the Doors to Discovery? Assessing the Effects of Twombly and Iqbal on Access to Discovery

Free Law Project

About:

“Berkeley Law Voices Carry” is a podcast hosted by Gwyneth Shaw about how the school’s faculty, students, and staff are making an impact — in California, across the country, and around the world — through pathbreaking scholarship, hands-on legal training, and advocacy. 

Production by Yellow Armadillo Studios.


Episode Transcript

[MUSIC PLAYING]

 

GWYNETH SHAW: Hi, listeners. I’m Gwyneth Shaw and this is “Berkeley Law Voices Carry,” a podcast about how our faculty, students, and staff are making an impact through pathbreaking scholarship, hands-on legal training and advocacy. My guest for this episode is Professor Jonah Gelbach, whose research spans a variety of topics including civil procedure, evidence, statutory interpretation and legislation, and law and economics. He has a Ph.D. in economics as well as a J.D., and one of his fields in economics is econometrics. And he’s worked as an expert witness in employment and securities cases. 

 

For more than a decade, he’s been demonstrating how aggregating federal court data can help researchers tease out critical trends. He’s also pushed for the federal judiciary to drop the paywall on the Public Access to Court Electronic Records database for non-parties to access information. The online repository of more than 1 billion federal court records, commonly referred to as PACER, has been the subject of legislative and litigation activity over the past several years. 

 

So what can researchers learn about the judiciary from PACER, as well as databases from state courts across the country? Gelbach, who also happens to be my husband and so is joining me in our home studio, is here to discuss his work and advocacy. Welcome, Jonah, and thanks for being here. 

 

JONAH GELBACH: Good to be with you here. 

 

GWYNETH SHAW: First of all, what is PACER? And why is it paywalled at all, if it’s a federal database?

 

JONAH GELBACH: Well, PACER is often thought of as its own database. But actually, it’s kind of just a top layer above what is really a collection, or a network, of databases that the individual federal courts use in order to operate their day-to-day activities. There’s something called the CM/ECF system. And I won’t go through the entire acronym, but that system is essentially stitched together for purposes of providing access to the electronic records of the courts through this front end that’s called PACER. It’s essentially a website more than anything else.

 

GWYNETH SHAW: What are some of the things you’ve been able to discover by working with these records? And can you talk a little bit about how you’ve been able to get access to those records as well? 

 

JONAH GELBACH: Sure. Well, let me start with the access question. I was very fortunate when I was a law student. I had been an economics professor before I went to law school, and so I had a lot of experience doing large-scale econometric and empirical quantitative research. And I took Civil Procedure from a wonderful scholar and professor named Bill Eskridge at Yale. Civil procedure really isn’t Bill’s first field. He teaches it a lot, but he doesn’t do a whole lot of scholarship in the area. And I kept coming up to him after class and asking, you know, how often do these motions happen? Or what happens when they’re filed? What fraction of them are granted? And things like that. And Bill kept looking at me like, “How am I supposed to know the answer to that?” And it’s not Bill’s fault. Most people don’t know the answers to those questions, because the data aren’t publicly available. 

 

So I started thinking about ways to do research on litigation in the courts from a scholarly and really, in some respects, economics perspective. And I kept running into this hurdle of not being able to get access to representative data. There was a data set here, a data set there, but nothing on the order of, well, how often does this happen generally? And it was very frustrating, because I knew the data existed: the courts operate their own database systems, as I mentioned earlier. So I did some looking into this. And it turned out that you really couldn’t get access to the kind of data I was interested in. The federal courts are not subject to the Freedom of Information Act. They will provide access to most of their data, but only on a per-page basis for most of it. We’ll talk about some of the exceptions to that later. That’s the PACER paywall that you mentioned earlier. 

 

And I had this idea I really wanted to study, and I was looking for access to the data and just couldn’t find it. And finally, I went to Professor Eskridge and said, “Professor Eskridge, I can’t get access to these data.” And he said, “Oh, well, you know, why don’t we ask someone on the Supreme Court?” Because of course, he had friends on the Supreme Court. And I said, “Well, it turns out that’s probably not quite enough. There’s a, you know, bureaucratic procedure behind it, and so on.” And he said, “Well, does anybody else have access to the data?” And I said, “Well, yes. The large electronic data providers in the legal information world have downloaded it. They just pay the money to download at least some of the data. LexisNexis is an example of that, and Bloomberg Law, and Westlaw.” And he said, “Well, I am a West author. And I’m just in the process of redoing my contract with them. And I’ll just tell them that I want them to give you the data. And I’m sure that it’ll happen.”

 

And to my great surprise, that was basically what happened. It was more, you know, cooperative than just that. But after a fairly lengthy period of time, we were able to get access to an enormous amount of data, essentially 10 years’ worth of filings in the federal courts. And so I have a lot of data about those cases. Not every single thing you’d like to have, but a lot.

 

GWYNETH SHAW: And one of the most interesting things is that because you have access to that data, you’ve been able to do things that basically no other law researcher, or any researcher, has been able to do. Is that right? 

 

JONAH GELBACH: That’s basically right. And it’s kind of amazing, because the data in question are essentially the amalgamation of information that, as I mentioned before, lives in these individual court databases. So there are 94 federal district courts, and each one of them, I believe, operates its own CM/ECF system. And there are also bankruptcy courts and courts of appeals that run the same software and have their own databases. I’ve not done any real work with that data. But because of the way I got these data, I am able to do some nationwide research on the dockets of the cases that were filed in this 10-year period that I have data for. And even the Federal Judicial Center, which is essentially the research arm of the judiciary, doesn’t have access to do this kind of research. In principle they do, because they have access to something called a replicator database that essentially provides all of the data I’ve talked about in one place. But they’re not, as I understand it, anyway, allowed to do any queries of that database unless they’re specifically asked to do so by one of the judicial conferences and committees. 

 

I don’t have any such limitation. I can just go and do research on these questions. But I don’t have access to the underlying documents. I have the docket, which lists the various events in a case. For example, it’ll say that a complaint was filed on a certain date in a certain case, and it’ll list the name of the filer, or perhaps the attorney who filed it, depending on the precise details. I don’t have the complaint itself. I would have to go to PACER to download that, or get it from some other source that had already downloaded it and was willing to provide it. So it still makes research expensive for me, because I only have access to the list of document events itself. But that turns out to be quite a lot to have access to. 

 

GWYNETH SHAW: And one of the things that working with this dataset convinced you of is that this information should be available to pretty much everybody, not behind a paywall. So first of all, how much is the paywall? How much is it per page? And what are some of the ways that you think the government should open this up to researchers and just the general public?

 

JONAH GELBACH: Great questions, Gwyneth. So per page, they charge 10 cents. It used to be less, now it’s 10 cents, and it hasn’t gone up in quite a long time, actually. But it’s 10 cents a page. And then there’s a cap per document of $3. So if you get a 50-page document, it still costs $3, not $5, for example. However, there are certain queries that have uncapped fees. So if you were, for example, to do a query asking for the list of every docket number in all the cases that were filed in a year, that would be a very, very large set of, quote unquote, pages that would be returned, and they would charge you a lot of money for that. It would be in the hundreds, maybe thousands, of dollars just to get that list of docket numbers. There are probably other ways you could get that list, and we could maybe talk about that later, although maybe it isn’t all that interesting. But that gives you a sense of just how expensive it could be. And there actually is the potential for these surprise fees. If you don’t realize that a particular search is uncapped, they won’t do the search until they have your, you know, agreement to pay the bill. And so you might well get a bill, and you hear stories about PACER, or whoever’s operating their payment systems, actually trying to enforce these sorts of things. So it could get fairly ugly for people who are not aware. Having said that, it is possible to get a fee waiver. 

 

It’s complicated at times, and not everyone gets them, even for scholarly purposes, which is sort of frustrating and surprising. You have to apply to each district. I think they have changed that in the last few years, and there’s some sort of multi-district application procedure. But even then, there’s the risk that individual districts will decide to say no, or all of them will. And it’s just a very strange way for, of all the branches, the courts to manage and limit access to their own data. There’s some history behind it, which I’d be happy to talk about later. But as a matter of policy choice at this point, it’s very odd. You would think that the courts would want people to know how litigation proceeds, to be able to assess whether the judicial process is being conducted in the ways that one expects, to be able to do the kinds of research that scholars want to do. It might be uncomfortable sometimes if it turns out that particular courts are not functioning as well as they should, or particular judges perhaps are not doing their jobs in the right way in various dimensions. But it’s really kind of a shocking degree of opacity for a branch that, after all, is tasked with telling the other branches whether they’ve behaved transparently enough.

 

GWYNETH SHAW: You mentioned the Freedom of Information Act. And of course, that’s a whole other conversation about how well that law is functioning. But even so, why doesn’t this apply to the judicial branch?

 

JONAH GELBACH: Well, the Freedom of Information Act was enacted by Congress, and I’m not a specialist in that area of law, which is essentially administrative law, so I actually don’t know the year that that happened. It’s essentially part of the Administrative Procedure Act, which itself dates to the 1940s. The FOIA part of the APA just simply exempts the judiciary entirely. And so that covers not only the kinds of data we’re talking about, but all sorts of administrative discussions and decisions that the Chief Justice, for example, engages in when deciding who will be on which committees of the courts, or how they even determined that the price for PACER fees will be 10 cents a page, all sorts of things like that. There’s just no public access to the decision-making procedures. There’s a small exception to this, although it’s actually a very important one, which has to do with something called the Rules Enabling Act. The Rules Enabling Act governs the process that produces the Federal Rules of Civil Procedure, the Federal Rules of Evidence, and various other rules, including the bankruptcy rules and the criminal rules, that tell the district courts and the appellate courts how they should operate in cases on a day-to-day basis. Those rules are created by an advisory committee-driven process. And ultimately, the Supreme Court is given a set of proposed rule changes, additions, amendments, etc., and the court votes them up or down, and usually up. The process by which those rules are initially considered and amended and discussed now has a lot of openness by statute, as a result of changes that were made, I think, in the 1990s. But that’s the exception, not the rule, for the federal courts. 

 

GWYNETH SHAW: So I asked before, you know, why is this information valuable to researchers like yourself? And you’ve got a couple of papers that are forthcoming in the next year or so that use some of the federal data and some data from California courts. What can you tell me about that, and how it shows how useful this access can be? 

 

JONAH GELBACH: Great question. So let me give you the example of a paper of mine that’s co-authored with Nora Engstrom and David Engstrom of Stanford Law, and Austin Peters, who’s a Stanford grad who’s now clerking for a district court judge, and Aaron Schaffer-Neitz, who is a Stanford J.D. student right now. So the five of us wrote this paper called “Secrecy by Stipulation.” That paper is, I’m delighted to say, now forthcoming in the Duke Law Journal. We’ve just essentially finished the main draft of this paper. 

 

And the basic focus of this paper is on a phenomenon called stipulated protective orders. So a protective order is an order that a district court, a trial court in the federal system, would issue to say that discovery information cannot be shared or cannot be used in one way or another. That’s a departure from the default rule, which is that once you receive information in the discovery process, even from an adversary, in principle, you can hand it to the media, you can create a billboard, you can, you know, say anything you want publicly about this information. You can imagine there are trade secrets, there are embarrassing facts, there are all sorts of things about a party’s information that it might not want to be known publicly. In principle, any information that’s relevant in one way or another to the case in front of the court has to be disclosed in what’s called adversary discovery. But parties can fight that. They can move to obtain a protective order from the trial court, or they can dig in their heels and threaten to do that. And so one of the things that sometimes happens is, the parties will go jointly to the court and say, “We’d like a protective order that will protect discovery, even if it’s only the discovery of one side. We want it to protect the discovered information so that, for example, it can’t be released to the media.” And if they get what they want, there will be a court order that says anybody who comes into possession of these data as part of this litigation may not disclose these data outside of the scope of the litigation use. And you can think about, you know, there are good reasons why that might be appropriate. 
In some cases, there’s really, you know, personal, embarrassing information about a litigant where there’s really no public interest, or no substantial one. That makes total sense. Or there’s a trade secret, you know, the secret formula for Coca-Cola or something like that, where it would be really, really costly to one of the litigants for this information to become public, and there’s no particular good reason for that to happen. 

 

But what about a situation like an automaker whose cars have defects, and they’re only willing to provide information that shows that there’s a defect if the other side agrees not to disclose it? Well, there are lots of other people out there driving cars on the road today who might be the victims of preventable accidents. They might find out about that, and there might be a recall, if that information became public, but not if it doesn’t. With information like that, there’s not particularly good cause for the court to approve withholding the right to disclose it from the receiver of such discovery. And so it’s really inappropriate in such a case. As it turns out, it’s actually against the rules, the written rules, for information like that to be withheld as a result of a protective order. And so we have this system where there’s an incentive for parties who want to get such information to agree not to disclose it to the public, because that might be the only way they can get it without a fight in the litigation. And it’s expensive to fight. And you know, an automaker like GM, for example, has a lot of resources to fight. And so you might be looking at an enormous pre-trial set of wranglings to get access to such data. And it’s just easier for the plaintiff’s lawyers to agree sometimes to say, “Yeah, we won’t tell anyone. And not only will we not tell anyone, we’ll support the request to the court to tell us we can’t tell anyone.” 

 

That’s a potentially dangerous situation. And it’s not what the rules are designed for, or even allow. So our paper uses data from federal trial courts, the data I’ve talked about, to show that, first of all, these stipulated protective orders are happening in a lot more cases than people had thought. Originally, there was a thought that it was maybe only 1 or 2% of cases. But we find not only that the number is at least twice, maybe several times, as large as that, but that it’s actually been increasing over time, at least in the 10 years of data, or really, for these purposes, the eight years of data that we can use to assess this. On top of that, there’s the fraction of the stipulated protective order requests, where both sides go to the judge and say, “Please order us not to disclose this information,” that are granted. The fraction of those requests that are granted by district courts is something like 96%. That’s not a number that makes you think that the courts are investigating the details of these proposed protective orders and determining carefully that there’s good cause, which is the written standard for granting them. So you could never have known the extent of these results without access to data of the type that we have. Further, we were able to download a sample of some of these stipulated protective order requests and the court opinions granting or, in rare cases, denying them, so that we can evaluate whether, well, maybe it’s just the case that these are all really high-quality requests. And maybe it’s 96% because the parties have just done a really good job of explaining to the courts why it is that this is necessary. That doesn’t look like what’s happening. We talk about this in detail in the paper, and I’m not gonna go into further detail here. But I’ll just say that we really don’t think that’s what’s happening. 

 

This is a project that could not have been done with anything like the convincingness and the scale with which we did it without access to the data that I was able to bring to this project to share. 

 

GWYNETH SHAW: As you mentioned before, you came to be a law professor after having been an economics professor. And so a lot of your work uses kind of the economist’s toolbox. What value does that have for legal scholarship? What do you bring to the table, along with some of your colleagues, especially at Berkeley Law, which has a pretty sizable law and econ group and a long history of scholars who specialized in law and economics? What are those tools able to help you do that someone who doesn’t have them can’t do? 

 

JONAH GELBACH: Well, I’m really glad you asked that question. To be clear, there are ways in which anyone who thinks carefully about human behavior could do the kinds of things that I do in terms of the use of the economics toolbox for studying these questions. So it’s not like a secret ritual where we, like, wear hoods and gather around a cauldron in the middle of the night or something like that, you know, chanting “economics, economics.” But what we do as economists is think carefully about the incentives that human beings have when they make decisions about all sorts of behavior. So you know, some people think economics is about the stock market, or the economy, quote, unquote, or the unemployment rate. And, you know, it is about those things. But in many respects, at least for microeconomists, and I’m a microeconomist, meaning we look at individual decisions and individual behavior, really, what we’re getting from economics is a methodological approach of asking, how do people behave when their incentives change? 

 

And when you put it that way, it’s sort of common sense, right? I mean, a lot of economics is common sense. For example, give somebody the incentive to go to court and get a protective order during discovery, and if the thing they want to protect is really valuable to them, they’re probably going to do it. If you make it harder for them to get that, they might hesitate to ask for a protective order, particularly when they don’t really have any particularly good reason to get that protective order. 

 

So at some level, you know, economics is like a disciplined and careful way of thinking through the incentives that people have to make different kinds of choices. And when you apply that in the legal system, you can often get really good insights into what sorts of behaviors different sorts of legal rules and policy reforms are likely to cause. 

 

So let me give you another example. I’ve already talked about protective orders. Let me give the example of another paper of mine that you sort of briefly mentioned earlier, and I didn’t get to, which is a paper about California’s Stand Together Against Non-Disclosure Act, or the STAND Act, and another law that followed it. So the STAND Act targets non-disclosure agreements that hide the details of sexual assault, sexual harassment, and sex discrimination cases, meaning what’s happened in the workplace, the facts that generate these cases. It makes it non-enforceable to have a secret settlement that hides the facts of these. And this is a law that was enacted in the kind of immediate wake of the Me Too movement and the many disclosures relating to Harvey Weinstein and lots of other wrongdoers. And it’s an interesting law to think about from an economic standpoint, because a plaintiff who has truly been wronged in a sex discrimination or sex harassment case has something to sell, or to offer, in litigation. You know, so there’s a threat, right, which is, “I’ll go to court and I’ll disclose these facts, and it’ll be embarrassing for you. And it’ll show that you broke the law, and you will be found liable under various statutes, and you will have to pay me damages,” right? Part of the threat is the susceptibility to having to pay damages. But part of the threat is also, like, the really embarrassing facts. And particularly for a very large corporation or employer, you know, which may have problems in some of its workplaces and not in others, they may care quite a lot about the reputational aspect of that, considerably more even than the financial damages to which they’re exposed. 

 

So the kind of thing that that kind of defendant is going to be especially interested in, very frequently, is essentially this: I said selling secrecy earlier, and from their perspective, they’re buying reputational cleanliness, in a sense. Part of what’s going on in such a settlement is the incentive for the defendant to avoid this negative publicity. The incentive for the plaintiff, on the other side, is to avoid being attacked on the stand or described as having done something to deserve the mistreatment, and what have you. And so there were many people who thought these kinds of laws are great, because they’re going to prevent companies from hiding behind non-disclosure agreements. And that will show the world that, you know, this company needs to be monitored, or you shouldn’t work for this company, or the authorities should look into whether there are criminal laws being broken, and so on. And all that could be right. At the same time, some critics of these laws suggested that they were going to scare off plaintiffs with meritorious claims, because some people were going to be afraid, as I said earlier, that they’d be dragged through the mud on the stand. And if those people have to worry that once they file their lawsuit, they would be attacked, or even just worry about public shame, then they might not bring those cases in the first place. 

 

All that is economic reasoning, right? It’s all response to different types of incentives. Now, again, it’s common sense too, so you could say, what do we need economists for, for this? Well, where economists are especially helpful is when there are multiple kinds of incentives kicking around in different directions. And we have the ability to think carefully about which stages of litigation are likely to be affected by this kind of law and which will be less affected. So one of the things you might expect, and this comes out of the way we would sort of carefully model this kind of question, is that we might see a lot more settlements happening before a claim is ever filed in court, because it turns out the STAND Act doesn’t apply to those sorts of cases. So there’s an incentive now to settle with a would-be plaintiff before he or she ever files suit. And that has certain implications for what should happen to the number of cases that are filed. In other words, we might expect to see fewer of them filed, and so on. And so that’s the kind of thing where the economics toolkit can be really helpful in helping us understand whether particular views of whether a law is good or bad for different reasons are being borne out by data. Without thinking carefully about what the incentives should cause in terms of different behaviors, it might be difficult to figure out which patterns of data would support one versus the other theory. So economists, and particularly empirical ones, and I’m also an empirical economist, tend to be pretty good at thinking through those details. And let me just say one other thing about that paper. I don’t have the entire title in front of me, but it has to do with secret settlements in California under the STAND Act. That paper is actually forthcoming in the University of Chicago Law Review. Same co-authors, except instead of Aaron, our fifth author is Garrett Wen. 

 

GWYNETH SHAW: You’ve been involved for many years in efforts to improve access to PACER, at least for researchers, and ideally, at some point, to maybe even drop the paywall entirely for everyone. Can you talk a little bit about that, and some of the people you’ve been working with, both in the advocacy realm and in the political realm?

 

JONAH GELBACH: Sure. So, you know, I have been involved in various attempts on this front. There are a lot of different players involved in this. At one level, you know, you have the Judicial Conference, which is, well, it is what it sounds like: it’s the body of different judges who function to run the judicial system. The Judicial Conference sets policy, I think through the Administrative Office of the United States Courts, for, you know, what the paywall will be, what the circumstances are under which fee waivers are available, and which documents will be available to whom for free. There are a lot of details to all of this, but in principle, the Judicial Conference could make changes at its own behest. It probably is required by statute to charge at least something under some circumstances, but it could radically change the way things operate. 

 

So they’re one player. You also have Congress. Congress could enact a statute tomorrow that says the judiciary simply cannot charge fees for these data. Now, I think that would be a good idea in some respects, and a really terrible one in others. It would be a good idea if it were done in such a way as to hold the judiciary financially harmless. So the judiciary makes a lot of money, actually, on PACER fees. It brings in something like $150 million a year in revenues from the PACER paywall, and they use that money partly to operate the CM/ECF system, the actual system from which the data come. And that sort of was the original design idea. But they make a, quote, unquote, profit in doing that, because it doesn’t cost anything like $150 million to run the CM/ECF system. And for that reason, they have all this extra money sloshing around, and they do good things with it. You know, they improve the technology in the courtrooms, which presumably means that you have better jury decisions and better administration of individual cases than you otherwise would have. For those things, I don’t think the judiciary should lose access to that amount of money. 

 

But we have this crazy system where, you know, the judiciary, understandably, doesn’t want to lose this revenue stream. It’s important for some of the things that the judiciary does. But it’s a trivial, teeny tiny amount of money in the great scheme of things. So $150 million sounds like a lot. But it only sounds like a lot if you’re not thinking in relation to the entire economy, to the entire U.S. society, or, for that matter, in relation to congressional appropriations generally. So let me give you an example. I noticed this during one of our class meetings this semester: there’s a house on the market in Los Angeles for something like $150 million. Now, that’s the cost of a stock, which is a house, and I’m talking about a flow, which is annual revenues, but it gives you a sense of the proportion here. Somebody who could buy that house could essentially pay for an entire year’s worth of free data from the judiciary. That seems kind of crazy. Here’s another fact: Americans spend something like 25 times as much annually on wild bird seed as the federal courts take in through the PACER revenue system. 

 

Now, of course, these two things have nothing to do with each other. But it gives you a sense of scale. Congress could increase the federal judiciary’s budget by $150 million without affecting anything at all. It would have no impact on interest rates, on what it costs the Treasury to borrow money for financing the deficit, or any of that. And it would radically transform the public’s access to data and information about how the federal courts are operating. And so that’s what I think should happen. I think Congress should get involved in this very simple way. And I think it would solve a lot of things. There is legislation, which I’ll talk about in a little bit, though it doesn’t do quite exactly what I just described. So you have Congress, you have the courts, you also have lots of advocacy groups out there. One of the leading ones is a nonprofit called the Free Law Project, which is actually, coincidentally, based in Berkeley. The Free Law Project was co-founded by Mike Lissner and Brian Carver. Mike Lissner is now the executive director of the Free Law Project. And it actually grew out of a master’s degree project that Mike had. And Brian was at the time a professor at Berkeley in the information school and was his advisor for this project. And this was, I don’t remember the exact number of years ago, but on the order of 10 or 15 years ago, and they decided, you know, we’re gonna make a go of this project. Brian subsequently left Berkeley and he’s now copyright counsel at Google, but he’s still involved as a board member at Free Law. And for those who are familiar with the CourtListener website, which most people are if you pay attention to federal or, for that matter, state lawsuits at all, because when you see links on Twitter or on the web to documents, they’re often the CourtListener link, CourtListener has to be the sort of largest collection of court records that are freely available to the public. 
Certainly, it’s the largest one I can think of. But it’s not representative because it doesn’t have full access to PACER or many state databases. So Free Law Project, in addition to providing a lot of law for free, a lot of legal information for free, has been involved, I think, in advocacy in lots of ways. I think there are lots of organizations like Free Law Project out there. So another would be Public.Resource.Org, which puts up a lot of legal information on the web for free, and also advocates and is constantly involved in litigation to try to open up access to legal resources. You have legal startups. Legal startups can’t do their, like, AI thing unless they have access to the data that they would use to train their AI systems. You have third-party litigation financing organizations, which want to figure out, you know, is this case that somebody has come to us and asked us to help finance a good risk, given the judge who’s going to hear it in the place that they’re going to litigate? 

 

You know, given the other things we know about that judge in those types of cases. Those folks would love to have access to information, that sort of just general information about the way legal cases have proceeded. And then, of course, you have lots of researchers. There are lots of folks at universities and law schools around the country and the world that would benefit if we had public access to these data. And you have journalistic organizations. So there’s a ton of players, there’s lots of people out there who want to improve access to court data. And there’s really only these two channels, though. There’s, you know, convince the judiciary to change their ways. Good luck with that. And then there’s get Congress to change the law. That might happen. But so far, it hasn’t. 

 

GWYNETH SHAW: And you’re teaching a course about this topic this semester with some of the people you just mentioned, as guest speakers. Um, who else is on your list of speakers? And what do you want students to take away from your course?

 

JONAH GELBACH: Great. So what I want students to take away from this course is that there’s a really strong case for, or at least there are big interests implicated in, the argument for having more open access to judicial data. There are also good arguments for limitations in particular cases. There are witnesses in cases who should be protected from being exposed and publicly known. The thing is, the courts already seal information about that kind of person, and the people who would know to look for the identity of such a witness already can access it at 10 cents a page, if it’s accessible at all. So what we’re talking about, I really don’t think has much to do with that. So I want students to know that there are arguments on both sides. But I want them to be able to engage and think critically about those arguments and what they mean. So that’s what I want them to take away from this. I also want them to take away the idea that, you know, we in law schools and legal education are used to thinking about the law as just fully accessible to us. We have access to Westlaw, we have access to various forms of data, because our law schools pay for that access. And so if we want to know what happened in a case, we just go to Westlaw. Those searches are actually really expensive outside of law schools, for many attorneys and many organizations. Maybe they shouldn’t be, right? Maybe they should be expensive because you’re paying Westlaw for their search algorithm, but not expensive because you’re paying them for access to data that could be gotten for free if you knew what you were looking for anyway, right? So these are the kinds of practical and big things that I want them to take away together. 

 

Now, the other parts of your question. Let me take a step back and say, the genesis of this class is I needed to pick up a seminar for the semester. And I was thinking some about things I wanted to talk about. And as you know, the issue of judicial data access has been close to my heart for a long time. I actually wrote a paper, which I don’t remember whether we’re gonna talk about later, called “Free PACER,” about the topic, and I’ve been thinking kind of on and off about maybe, you know, trying to integrate this into my teaching some. It happens that Mike Lissner and Brian Carver and I were having tacos and beers one night last fall. And I was telling them about, you know, what I was teaching. And we were talking about judicial data issues generally, what was going on in Congress, and some of the work they were doing, and so on. And as we were talking, I can’t remember which one of us suggested this, but one of us got the idea that, well, why don’t I do a seminar on judicial data and sort of all the different issues related to it? And I jokingly said, yeah, we could call it Public Access to Court Electronic Records. And they both laughed, because of course everybody hates PACER and everybody loves PACER, because they want the data, but they’re mad because they can’t have it for free. I actually tried to change the title at one point, and Brian instructed me that that was simply not allowed. And so that’s how the course came about. And one of the great things as a result of this course, and I’m delighted that Berkeley Law approved this course topic, is that I have all these guest speakers. And among them are Brian and Mike, who have sort of helped me develop this course and think about what topics to include. And they’ve been present at most of the course meetings. We have another collaborator, Rebecca Fordon, who’s a law professor and law librarian at Ohio State, and has also been involved in PACER access. 
And so one of the other great things is we’ve had the benefit of somebody with real law librarian chops who has been able to talk about, well, what can you do with existing data? And what are, you know, the legal tech tools on the horizon? And how might they intersect with access to judicial data? 

 

And so there’s been this great community of folks involved in this. You asked about other folks who have been guest speakers. It’s been an incredible roster. I’ve been so thrilled with the set of people who’ve agreed to present and discuss their work and their interests within our course. There are about 20 students in the course, by the way. It’s a seminar that meets once a week and had quite a substantial enrollment. So we’ve had Carl Malamud, who’s the founder of Public.Resource.Org. He’s talked some about copyright law and access to judicial records. We went all the way back to the founding, actually. And we got to talk about, you know, the early Supreme Court reporters, and that was really cool. Professor David Pozen of Columbia Law School came to discuss his work on transparency. He’s actually, in some respects, sort of a transparency skeptic. So I wanted to make sure we engage with arguments about that. And that was a great conversation. He hadn’t actually thought too much about the courts and judicial data. He’s really more focused on administrative agencies and the executive branch when it comes to transparency, and some with the legislative branch. But that was a great conversation. 

We’re incredibly fortunate to have Judge Robert Dow, who is a district court judge in the Northern District of Illinois, who I know a little bit because he’s been at a lot of conferences I’ve gone to. He’s been a chair of various advisory-committee-related committees, Judicial Conference committees, and he’s kind of a mainstay among those of us who talk about rules and policy. He’s also now the counselor to the Chief Justice, meaning Chief Justice Roberts. And that role allows him to advise the Chief Justice on all sorts of administrative aspects of how the federal courts are run, the rules committee process of it committees. And we’re hopeful, a bunch of scholars, not this course as such, but hopeful that we might be able to have the benefit of Judge Dow’s help in maybe opening up some access to data as well. And he was just very generous in coming to speak with our class. So that was really great. 

 

Professor Judith Resnik from Yale Law School came and talked about some of her work on courts, and transparency in courts as institutions and even buildings. That was really fun. I mentioned Rebecca Fordon from Ohio State, who has sort of helped create the course topic areas, and Mike and Brian as well; they’ve been speakers. Just this week, we had Professor Zach Clopton from Northwestern and Professor Aziz Huq from the University of Chicago, who talked about this really interesting paper about judicial data as a public asset, a public good, and it really takes on a serious constitutional analysis of Congress’s powers and what it could do in terms of regulating the judiciary and requiring the judiciary to disclose data and provide public access. Really thoughtful, interesting paper. Later this semester, we’re going to have Pam Samuelson, my colleague at Berkeley, who’s going to talk about AI and copyright law and intersections of those issues with judicial data. We’re hoping to have Chris Bogart, who is CEO of Burford Capital, a major third-party litigation financing company, and then also my co-authors, Nora and David Engstrom from Stanford Law, I think are going to drop in and maybe talk about some of our joint work that I mentioned earlier. So it’s really been just a tremendous roster of speakers. 

 

GWYNETH SHAW: So what do you think will happen in the short term? And across the long horizon? Do you think a free PACER is possible sometime in your scholarly lifetime?

 

JONAH GELBACH: I sure hope so. That would be great. You know, I have this incredible data, but it’s getting old. It would be great to be able to refresh that, and in ways that are partially possible, I think, given some of what’s out there, but not comprehensively. And I think it would be great, obviously, for society more generally for there to be general access. I think there are kind of middle grounds that are plausibly achievable. Like, if the judiciary could be convinced to do it, they could just do it. And I don’t think it would require congressional action, and I don’t think it would really affect the PACER paywall revenue. So, bad for, you know, the people who wouldn’t have access to the kind of middle-ground thing I’m talking about, but really great for everyone else. 

 

So let me give you an example of that. You know, the judiciary, in principle, could take the data that they have and put it into a side database, and just not put in the most recent year’s worth of data or something like that. So everything until one year ago on any given day maybe becomes publicly available, or available only to, you know, some set of approved folks, which could be researchers. It probably wouldn’t, sadly enough, include journalists, but it could in principle. And why would that work? That would work because for research purposes, you generally don’t need to know everything that’s happening right up to the moment. There are exceptions to that. But for the kind of research that, you know, court watchers and legal scholars and legal empiricists do, if you gave us data that was only out of date by a year, that would be incredibly good compared to most of what we work with. So that would be, like, a radical improvement in the access that researchers, at least those who would be part of such a pilot program, would have. 

 

Why would it not affect judicial revenues through the PACER paywall? Well, most of those revenues, or at least a very large share, come from a relatively small number of purchasers. They’re platforms like Lexis or Westlaw or Bloomberg that buy in bulk, and so they pay a lot. There’s also, you know, companies that are involved in debt collection and bankruptcy management. PACER includes bankruptcy data. And because it includes bankruptcy data, if you are engaged in a large-scale way with creditor risk, one of the things you want to know is what claims are out there against your creditors. And in order to know the answer to that, you have to be able to download files that have been filed as part of bankruptcy litigation that’s ongoing. And so there’s actually quite a large market for bankruptcy data. Those folks have essentially figured it out. There’s a big enough market that there are companies that just pay for and download everything. And then they can charge enough to private parties to access the data for that to be a working business model. It’s not for the district court data, whether civil or criminal. There’s just no buyer who’s willing to pay enough for that to make sense. But one of the consequences is that there’s really no value in year-old data for these large purchasers, or there’s very limited value anyway. 

 

And so if you made it all free, but only after a year, they would still find it in their interests to buy the data in real time. And so the judiciary would still get its revenues, just as it does now. But then there’d be this, like, slightly out-of-date data that would be available for research purposes. That’d be a massive improvement on the current system. Now, you could say, well, but now, aren’t you, you know, sandbagging the rest of the public just for researcher access? And to an extent, that would be a fair criticism. And it’s one of the reasons why I do favor full public access to these data. But I will point out that there are lots of other places with federal data that have a similar structure. The Census Bureau runs lots of surveys and studies where there’s a public-use version of the data that’s available to everyone. And then there’s a restricted-use version that’s available only to people who can pass a rigorous set of screening standards in order to make sure that they won’t release or even inadvertently disclose private data, you know, like the names of people in surveys and things like that. The model that the judiciary has is not one that exists anywhere else in the government. And there are middle-ground models that do exist all over the place, where there’s sensitivity for one reason or another to full disclosure. And I think, you know, that would be a middle ground. 

 

As I said earlier, the thing I really favor is Congress writes a check to the judiciary to make up for the lost fee revenue, and then the data are just available to everyone pretty much in real time. As for how to actually provide the data to everyone in real time, I don’t think the judiciary is obviously the right organization to do the provision. Like, they should just create a feed of data. And then any provider that wants to would be able to provide their own user experience and front end to it. There’s lots of nonprofits that would do a great job doing this: Free Law for sure, Internet Archive, maybe Public Resource. So do I think that this could happen? Yes, I think it would be very straightforward. Do I think that extreme version I just described will happen? Probably not. Although there is legislation pending, called the Open Courts Act. And the most recent version I’ve seen, which is the Open Courts Act of 2021, gets a good bit of the way towards what I described, and in some respects, all of the way. It’s a little bit vague and a little bit hard to know exactly what would happen if that became law. 

 

The judiciary has been very opposed to the Open Courts Act, I think, partly out of a concern that they won’t get full revenue replacement, which is understandable. But you know, Congress could turn around and pass that tomorrow. So will it ever happen? Hope so. 

 

GWYNETH SHAW: Great. Well, we’ll have to leave it there with that hopeful perspective. Thanks so much, Jonah, for joining me. If you want to know more about Professor Gelbach and his work, check the show notes for links to his papers and projects and links to a couple of the other organizations that he mentioned, particularly the Free Law Project, so folks who are interested in this can explore the topic a little bit further. 

 

And thanks to you, listeners, for tuning in. Be sure to subscribe to “Voices Carry” wherever you get your podcasts. Until next time, I’m Gwyneth Shaw.

 

[MUSIC PLAYS]