Predictive coding: It's not just for e-discovery any more

(This blog post was written by Marilyn Bier, chief executive officer for ARMA International.)

By now, everyone in the world of information management is aware of the staggering increases in electronically stored information (ESI). Pundits and consultants often use colorful analogies to emphasize these remarkable numbers. Extreme predictions are routinely made and routinely replaced by predictions that are even more extreme. Obscure terms like “exabytes” and “yottabytes” are commonly summoned.

There’s a good reason the pundits and consultants are going through all this trouble to get our attention: poorly managed ESI poses very serious business and legal risks to an organization.

Maximizing value, minimizing risks

Solid information governance is always the best approach to maximizing the value of records and information as business assets and minimizing business risks. ARMA International defines information governance as “a strategic framework composed of standards, processes, roles and metrics that hold organizations and individuals accountable to create, organize, secure, maintain, use and dispose of information in ways that align with and contribute to the organization’s goals.”

Poor information governance will likely result in a data management disaster whereby records are lost, retained too long, disposed of improperly, made vulnerable to breaches, and subjected to other undesirable ends that weigh heavily on an organization’s bottom line and its reputation.

The explosion of ESI and the rigors of e-discovery have spawned many tools that promise to help organizations conquer the chaos of “too much information.” One evolving technology, called predictive coding, has proved useful for e-discovery and is gaining traction as a tool for managing information throughout its lifecycle.

Using predictive coding for e-discovery

The process of predictive coding is not new, but the technologies around it have been evolving rapidly to better address e-discovery, where the hours and dollars required to manually review thousands of documents in potentially thousands of locations can overwhelm some organizations. Using algorithms, predictive coding helps an organization get a better idea of what its data contains, and therefore how relevant that data is to a particular e-discovery action.

Also of note, the courts are showing some level of comfort with the practice of predictive coding for e-discovery. In the 2012 Da Silva Moore v. Publicis Groupe et al. action, for example, the defendant proposed using predictive coding technology to cull more than 3 million documents. After the plaintiff objected to the methodology, U.S. Magistrate Judge Andrew J. Peck (Southern District of New York) ruled that while predictive coding is not a magical solution in all cases, “this judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”

Last year, in Global Aerospace Inc. v. Landow Aviation LP, a Virginia circuit court allowed a defendant’s request to use predictive coding in a document review that would otherwise require 10 man-years of billable time.

Using predictive coding for information governance

Information professionals Leigh Isaacs and Doug Smith are among many who champion the use of predictive coding beyond the e-discovery arena.

Isaacs, director of records and information governance at Orrick, Herrington & Sutcliffe LLP in Washington, describes predictive coding in a recent Information Management article as “an evolving technology that combines people, technology and workflows to find key documents and identify and review large data sets.” It’s a machine-learning technology that teaches the computer program to predict how to classify documents, based on human guidance, and “the computer program then applies what it has learned to the universe of information.”

In the article, Isaacs explains how predictive coding increases information identification accuracy by pairing subject matter experts with predictive coding technologies. This pairing provides a solid foundation for defensible disposition and prevents content from being retained too long. The technologies can also help a company cull its data to identify valuable intellectual property; locate vital records and contracts that may have been misfiled; identify sensitive information for the purposes of protection and compliance; and much more.

Smith, business manager at Wiley Rein LLP in Ashburn, Va., says in another Information Management article that predictive coding offers an alternative to the manual, subjective process of coding and quality review, which is laden with inefficiencies and inaccuracies.

Predictive coding processes operate either through sampling or observing, both of which use human decisions as the calibrating mechanism, he explains.

Sampling is done by computer software that randomly selects a subset of electronic records and presents it to a human coder for review. The software monitors the coder’s decisions, notes the characteristics of the records that are coded — such as date, recipients and keywords — and then uses these recorded decisions to predict the value of the remaining documents.
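The sampling approach can be illustrated with a short sketch. This is not how any particular predictive coding product works; it is a minimal illustration that assumes records are plain text, that keywords are the only characteristic tracked (real tools would also consider date, recipients and other metadata), and that a simple naive Bayes word count stands in for the vendor's algorithm. The names `train`, `predict`, `sample_and_predict` and `human_coder` are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Split a record into lowercase keywords (a deliberately crude assumption)."""
    return text.lower().split()


def train(labeled):
    """Count how often each word appears in relevant vs. not-relevant records."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, relevant in labeled:
        counts[relevant].update(tokenize(text))
        totals[relevant] += 1
    return counts, totals


def predict(model, text):
    """Naive Bayes with add-one smoothing; True means 'predicted relevant'."""
    counts, totals = model
    vocab = set(counts[True]) | set(counts[False])
    scores = {}
    for label in (True, False):
        # Smoothed prior, so a label with no samples never yields log(0).
        score = math.log((totals[label] + 1) / (sum(totals.values()) + 2))
        n_words = sum(counts[label].values())
        for word in tokenize(text):
            score += math.log((counts[label][word] + 1) / (n_words + len(vocab) + 1))
        scores[label] = score
    return scores[True] > scores[False]


def sample_and_predict(records, human_coder, sample_size, seed=0):
    """Randomly sample records for human review, then predict the rest."""
    rng = random.Random(seed)
    sample_ids = set(rng.sample(range(len(records)), sample_size))
    # The human coder's decisions on the sample are the calibrating mechanism.
    labeled = [(records[i], human_coder(records[i])) for i in sample_ids]
    model = train(labeled)
    return {
        i: predict(model, records[i])
        for i in range(len(records))
        if i not in sample_ids
    }
```

In this sketch, `human_coder` is any function that takes a record and returns True or False; in practice that decision would come from a subject matter expert reviewing the sampled record.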

In the observing process, the coding software monitors the decisions of human coders as they review records, and then predicts how a record will be coded before presenting it for coding. Next, it compares the predicted coding to the actual coding. Eventually, the software’s predictive coding process reaches the accuracy level that’s deemed acceptable based on pre-set policies.
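The observing loop described above can be sketched in the same spirit. This is an illustrative assumption, not a description of a real product: the class name `ObservingCoder`, the simple word-vote scoring, and the `target_accuracy` and `window` parameters (standing in for the "pre-set policies") are all hypothetical.

```python
from collections import Counter, deque


class ObservingCoder:
    """Predicts each record's coding, then learns from the human's actual decision."""

    def __init__(self, target_accuracy=0.9, window=20):
        self.relevant_words = Counter()  # words seen in records coded relevant
        self.other_words = Counter()     # words seen in records coded not relevant
        self.recent_hits = deque(maxlen=window)  # rolling record of correct guesses
        self.target_accuracy = target_accuracy
        self.window = window

    def predict(self, text):
        """Score a record by how its words were coded in the past."""
        score = 0
        for word in text.lower().split():
            score += self.relevant_words[word] - self.other_words[word]
        return score > 0

    def observe(self, text, human_decision):
        """Predict first, compare with the human decision, then update the counts."""
        predicted = self.predict(text)
        self.recent_hits.append(predicted == human_decision)
        target = self.relevant_words if human_decision else self.other_words
        target.update(text.lower().split())
        return predicted

    def accuracy(self):
        """Share of correct predictions over the most recent window of records."""
        if not self.recent_hits:
            return 0.0
        return sum(self.recent_hits) / len(self.recent_hits)

    def is_calibrated(self):
        """True once a full window of predictions meets the accuracy policy."""
        return (
            len(self.recent_hits) == self.window
            and self.accuracy() >= self.target_accuracy
        )
```

A reviewer would call `observe` once per record as it is coded. Once `is_calibrated` returns True, the organization could, under its own policies, begin relying on the predicted coding while continuing to spot-check it.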

If an organization lacks proper information governance and hasn’t conformed with the Generally Accepted Recordkeeping Principles®, its ESI will not be in a legally defensible condition. Predictive coding can help remediate the problem by creating a classification schema that identifies and categorizes the information that’s housed in unstructured or less-formal systems, Smith explains.

ARMA International joins Isaacs and Smith in encouraging legal, IT and information management professionals to work together to consider predictive coding as another solution in the information governance toolkit.

Marilyn Bier is chief executive officer of ARMA International, an authority on governing and managing information as critical business assets. As a not-for-profit professional association founded in 1955, it provides its 10,000+ global members and countless external customers the education, publications, and resources they need to be able to create, organize, secure, maintain, use, and dispose of information in ways that align with and contribute to their organization’s goals.
