Why We Are Downloading All Free Opinions and Orders from PACER
Today we are launching a new project to download all of the free opinions and orders that are available on PACER. Because we do not want to unduly burden PACER, we are proceeding slowly: we are giving the download several weeks or months to complete, and we will slow it down if any PACER administrators contact us with concerns.
In this project, we expect to download millions of PDFs, all of which we will add both to the RECAP Archive that we host and to the Internet Archive, which will serve as a publicly available backup. [1] In the RECAP Archive, we will parse the contents of the PDFs immediately as we download them. Once that is complete, we will extract the content of scanned documents, as we have done for the rest of the collection.
This project will create an ongoing expense for Free Law Project (hosting this many files costs real money), and so we want to explain two major reasons why we believe it is important. The first reason is that these documents have monumental value, and until now they have not been easily available to the public. They are a critical part of America's legal system, and yet there is no easy and free way to access or analyze them except through expensive third-party vendors whose tools are out of reach for many people. This inhibits researchers, journalists, and the public, and it conflicts with the spirit of PACER itself, which was formed at Congress's request precisely to provide "Public Access to Court Electronic Records."
Our work with Georgia State University is a good example of the kind of research that this project will enable when it is complete. In that collaboration, Free Law Project will provide opinions and orders to GSU researchers so that they can study how different courts have interpreted employee classification laws. Their findings will give courts, government agencies, employers, and employees legal guidance about where courts draw the line between independent contractor and employee status. As the "gig economy" continues heating up, this is an important area of law, but the first step in doing this research is to have the raw data in bulk. This new initiative will create that kind of bulk data, and we predict that it will enable numerous other studies.
The second reason we are undertaking this initiative is that it should provide a mechanism for getting a fairly representative sample of cases from PACER. Our current collection of PACER content only has information about a case when a RECAP user has downloaded that case. This tilts the RECAP Archive towards the cases our users download, leaving out a vast swath of important content. This initiative will broaden our collection so that we should have basic information about many more cases in the federal district courts. Any case with an opinion or order will be in the RECAP Archive.
This initiative will be an important one as we further expand our collection of PACER content and tools, and we hope that you will support our work as we move into this new area. Downloading, extracting, and hosting all these files will not be easy, but these documents are a critical element of the American legal system. Making them easily available will measurably improve our access to and understanding of the law.
Footnotes
1. For years Internet Archive has been a great partner in the RECAP initiative, and we are thrilled to continue partnering with them for our biggest collection of PACER data yet.