By Andrew Cohen
Like all merger and acquisition lawyers, Wei Chen spent much of her formative years on due diligence patrol — mining thousands of contract pages to find the few clauses needed for legal analysis.
“The process hasn’t evolved since I started practicing over 20 years ago: it’s time-consuming, mind-numbing, and prone to errors,” Chen laments. “Two years ago, with the success of artificial intelligence (AI), I kept asking myself, ‘If my iPhone can find cat pictures from my photo albums, why can’t AI find the most favored nation or exclusivity clauses in my piles of contracts?’”
Senior vice president and associate general counsel at Salesforce, Chen soon realized that the secret to AI success lies in the volume and quality of training data. “For AI to perform better on contract review, lawyers were needed to tag contracts and make them publicly available for free,” she says.
But such a task requires highly trained legal minds, and most contracts are confidential and proprietary. So how could she find contracts that could be open sourced, and skilled lawyers willing to tag them for free?
Chen found her answer after conferring with Adam Sterling ’13, executive director of the Berkeley Center for Law and Business (BCLB), where she serves on the advisory board. Sterling connected her with a dozen eager Berkeley Law students, leading to an experimental pilot program that launched the Atticus Project — a nonprofit that harnesses AI’s power to accelerate accurate and efficient contract review — in summer 2020.
“The Atticus Project is defining the future of technology in the legal profession and I was so excited for the opportunity to allow Berkeley Law students to be a part of that,” Sterling says.
Working under the supervision of experienced attorneys, the students released a beta dataset of 200 commercial contracts in October 2020. Sterling subsequently introduced Chen to two UC Berkeley AI researchers, which opened the floodgates.
“With their help, we were able to open source a training dataset in March 2021 with over 13,000 clauses across 510 commercial contracts, corresponding to 40 types of clauses,” Chen says. “We hope this dataset will be a catalyst for legal AI innovations and move the industry forward for all.”
The Atticus Project team tags the contracts’ key clauses, exports them into a CSV file (similar to a spreadsheet) that computers can read, and saves the full contracts as TXT files. The team wrote a user’s manual and datasheet explaining what the dataset contains, how to use it, and what limitations and biases it may have, and posted the package on its website for free public download.
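For readers curious about what working with such a file might look like, here is a minimal Python sketch. It assumes a simplified layout with one tagged clause per row; the column names (`filename`, `clause_type`, `clause_text`) and the sample rows are invented for illustration and are not the published dataset's actual format.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; the column names are assumptions,
# not the layout of the published dataset.
SAMPLE_CSV = """filename,clause_type,clause_text
contract_a.txt,Exclusivity,"Supplier shall sell the Products only to Buyer."
contract_a.txt,Most Favored Nation,"Buyer shall receive pricing no less favorable than any other customer."
contract_b.txt,Exclusivity,"Licensee is the sole distributor in the Territory."
"""


def group_clauses_by_type(csv_text):
    """Map each clause type to the (filename, clause text) pairs tagged with it."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["clause_type"]].append((row["filename"], row["clause_text"]))
    return dict(grouped)


if __name__ == "__main__":
    # Print how many tagged clauses were found for each clause type.
    for clause_type, hits in group_clauses_by_type(SAMPLE_CSV).items():
        print(f"{clause_type}: {len(hits)} tagged clause(s)")
```

Grouping by clause type mirrors the question a reviewing lawyer typically asks ("where are the exclusivity clauses?"), and the stored filename points back to the full contract text.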
A broader reach
Recently, The Atticus Project partnered with BCLB and the LexLab at UC Hastings Law to launch two programs — one for experienced attorneys and the other for law students.
The former gives lawyers the chance to design a project to solve data-related legal challenges, create training programs, teach at participating law schools, and mentor law students. The latter offers law students and non-legal professionals the chance to learn vital legal concepts and find them in legal documents under experienced attorney supervision.
“Unless lawyers get their hands dirty to create training datasets, legal AI will never get to an accuracy level that we need,” Chen says. “There’s already a huge divide between the AI performance in areas such as law and medicine where skilled labor is scarce versus areas where training datasets are abundant or cheap to obtain, like image and voice recognition or Wikipedia. Unless the legal industry catches up quickly, legal will continue to live in the Stone Age while other areas take off.”
The Atticus Project has trained over 60 law students and 15 high school students on contract review skills and AI knowledge, including six from Berkeley Law this past summer. Experienced attorneys from approximately 10 leading law firms and several public company in-house legal departments served as volunteer lecturers and reviewers, discussing complex legal topics that students encounter during their contract review.
With the law students’ help, The Atticus Project developed a comprehensive reservoir of high-quality training materials. Students start their learning journeys by taking a self-paced online course on Berkeley Canvas that contains over 50 training modules with presentations and videos by practicing attorneys and AI researchers.
That learning is then reinforced by over 20 self-graded quizzes and a 200-plus page handbook created by the attorneys. The students are then tasked with finding the key concepts in contracts and tagging them using an online tool. The process includes individual student review, group peer review of the work product, and attorney supervisor oversight and feedback.
Students also have access to office hours for questions, career talks, and on-campus interview (OCI) preparation, all of which foster meaningful conversations with the supervising attorneys beyond contract review.
“I had previously only heard ‘artificial intelligence’ and ‘machine learning’ as buzz-wordy terms used to describe a dystopian future of work,” says Chris Gronseth ’22, who worked with the Atticus Project in summer 2020. “I was excited to get hands-on experience in learning more about what AI really entails and how it might be a positive and useful tool in legal work. The fast-paced, startup dynamic at Atticus provides a lot of room for creativity and initiative.”
Chen’s team is busy creating a new training dataset of legal contracts that she plans to open source in early 2022. She foresees AI extending into more areas of legal research and consumer protection, such as scanning a credit card agreement for predatory clauses or scouring a privacy policy to assess compliance with the law.
“AI can enhance and supplement an expert’s knowledge by leaps and bounds,” Chen says. “Knowledge that requires years of training and apprenticeship can be obtained within a few short months. In the very near future, a lawyer not knowing how to leverage AI would be the same as a brain surgeon not aided by CT-scan or a farmer not having access to weather reports.”