This article offers key insights, previously undisclosed to the legal community, on how to improve the burdensome document review process through the use of machine learning, also known as predictive coding or technology-assisted review (TAR). Document review has become particularly challenging because the volume of electronically stored information has grown exponentially over the past decade, making review increasingly time-intensive and expensive. Machine learning can ease that burden for counsel and clients: it uses computer algorithms to identify potentially relevant documents during discovery, with the goal of reducing attorneys' manual review of irrelevant and nonresponsive documents.
Understanding the technical aspects of machine learning is essential for efficient document review in modern litigation. The carefully constructed experiments presented in this article shed light on how best to design predictive models. The authors performed nearly 34,000 experiments to determine the best overall combinations of algorithms and backend settings for predictive modeling effectiveness, using six data sets from real cases across a variety of industries. The results demonstrate that the current use of machine learning in legal matters is inefficient. Adjustments to basic settings have the potential to greatly improve the performance of the algorithms and save thousands of hours of attorney time currently spent needlessly reviewing irrelevant documents.