Predictive coding technology continues to gain respect in the legal community, as it's increasingly viewed as a valuable way to reduce the cost and scope of electronic discovery processes. There are challenges to using the technology, however, according to information governance expert Jeffrey Ritter. It's important to remember that predictive coding technology is only as good as the humans operating it, Ritter said, and companies often fail to ask vendors the correct data governance-related questions when acquiring predictive coding software.
In part one of this two-part podcast, Ritter sits down with SearchCompliance Editor Ben Cole to discuss predictive coding and how companies can best take advantage of the technology to optimize e-discovery processes.
There's been enormous buzz in e-discovery about predictive coding technology. Can you explain, in plain English, what the technology is all about?
Jeffrey Ritter: Sure. Discovery means finding information that helps tell the story, and each side in litigation wants to tell the story with the information that is most favorable. That means looking through vast quantities of electronically stored information to find the data records and the communications that are both responsive and relevant to the litigation.
In effect, imagine a very large haystack where every piece of straw represents a different type of data, record or communication. Predictive coding is a technology that assists the process of going through the haystack: It takes vocabulary, words or phrases that the litigation team has selected and searches for patterns within each record that align with the defined search target. You can use it inclusively, to find information that matches the words you provided, or in an exclusionary way, to set aside records that contain none of the relevant search terms.
To develop this search, though, you have to build a search set, or a set of terms to be used. You can do that in one of two ways: The lawyers or litigation analysts can take a sample set of documents and extract from it an inventory of words and phrases to look for. Or you can develop a more specific understanding of the vocabulary related to the litigation -- the acronyms, the shorthand, the misspellings -- and begin to be more inclusive in finding the words.
The key thing to emphasize is that predictive coding is predictive: It is guessing as to whether a record is relevant or responsive. It does not eliminate the need to make other decisions about actually producing that record, and it's only as good as the information it has been given to search for. In that regard, it is essentially technology-assisted guesswork. That guesswork is effective if you do it right at [the point of] reducing the size of the haystack, which means that the downstream expenses of processing, review and production are incrementally, but significantly, reduced. It's working: We're increasingly seeing companies and law firms use predictive coding, and seek and obtain judicial support for using it, as a way of making the overall expense of e-discovery more rational.
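The term-driven process Ritter describes, building a set of search terms and using it to shrink the haystack, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the term list, the sample records, and the `count_hits` and `reduce_haystack` helpers are invented for illustration. Commercial tools rely on far more sophisticated semantic analytics, so this shows only the shape of the idea.

```python
# Hypothetical term set chosen by a litigation team, including
# shorthand and a common misspelling (all invented for illustration).
SEARCH_TERMS = ["merger", "acquisition", "m&a", "aquisition", "due diligence"]


def count_hits(text, terms=SEARCH_TERMS):
    """Return how many of the search terms appear in the record."""
    lowered = text.lower()
    return sum(1 for term in terms if term in lowered)


def reduce_haystack(records, min_hits=1):
    """Split records into a likely-relevant pile and a set-aside pile.

    The split is only a prediction; people still decide what is produced.
    """
    likely, set_aside = [], []
    for record in records:
        if count_hits(record) >= min_hits:
            likely.append(record)
        else:
            set_aside.append(record)
    return likely, set_aside


records = [
    "Re: due diligence checklist for the merger",
    "Lunch on Friday?",
    "Notes on the aquisition timeline",
]
likely, set_aside = reduce_haystack(records)
print(len(likely), "kept for review;", len(set_aside), "set aside")
```

Including the misspelling in the term list is what lets the third record survive the cut, which mirrors Ritter's point that the result is only as good as the vocabulary supplied.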
Sounds exciting. What are some of the challenges law firms and corporations face when selecting predictive coding technology?
Ritter: The selection of any technology to assist e-discovery is often done in crisis -- the big lawsuit that puts the company's very survival at risk. It's the lawsuit that suddenly is international in scope and has a level of complexity that no one contemplated, or a class action, possibly against a class of corporate defendants. These crises are where companies, or their law firms, often find that "we need to get this under control."
Over the last five to eight years, we've been seeing more technology-assisted tools enter the marketplace. There are five areas where I think everyone basically fails to ask the right questions when acquiring the technology. The first is customer variables: Every installation of any technology often confronts gaps or shortcomings in the infrastructure or the resources that the licensee -- whether it's a company or a law firm -- faces. We go out and get a technology and don't have everything in place to enable that technology to work correctly. It's like buying a car on special and then realizing that you need a garage for it and four tires. You still need to get the tires, you still need the garage, you still need the driver education. Those are variables that drive up the actual total cost of ownership.
Second is training. If it's a Ferrari, you need to know how to drive it. We often find companies and law firms underestimating the learning curve their team members must navigate in order to use predictive coding effectively. Third is that most of these tools, as a result of their sophistication, are best used by people who are skilled in the analysis of vast quantities of information -- not legal analysis, but finding semantic patterns. These types of information analysts often work within and in full support of predictive coding technologies. It's one of the key areas, again, where, when calculating total cost of ownership, companies fail to anticipate the expense and fail to ask questions about what they actually need to be successful in their investment.
Fourth is information security. Predictive coding technology is great, but only if the way it works, its results and its operations are properly placed within a security envelope that the company is comfortable with. Several years ago, a friend of mine who is an information security analyst confided in me that he had conducted security audits on three different e-discovery service vendors and discovered that none of them was secure. He actually found hostile malware in each of their systems that was duplicating and exporting ESI [electronically stored information] whenever they connected to the Internet. It's critical that we understand that, in using the coding technology, particularly across the cloud, the security is there. We're overlooking that.
Finally, because predictive coding is just that, a predictive technology based on semantic analytics, a great deal of science has developed around the effectiveness of the various strategic tools and algorithms that are used. Many law firms and law departments that are acquiring predictive coding in crisis fail to actually ask the question: How good are the algorithms inside the software at doing what we tell it to do in finding information?
Customer variables, training, the use of information analysts, information security, search effectiveness -- those five key areas are where my experience indicates companies don't ask the right questions of the vendors to assure the success of their investment and keep the total cost of ownership under control.
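One common way to put a number on the fifth question, how good the algorithms are at finding information, is to compare a tool's predictions against a human-reviewed sample using precision and recall. The Python sketch below is illustrative only: the sample labels are invented, and `precision_recall` is a hypothetical helper, not part of any vendor's product.

```python
def precision_recall(predicted, actual):
    """Compare tool predictions with human review decisions.

    predicted, actual: equal-length lists of booleans (True = relevant).
    Precision: share of flagged records that reviewers agreed were relevant.
    Recall: share of truly relevant records that the tool flagged.
    """
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    flagged = sum(1 for p in predicted if p)
    relevant = sum(1 for a in actual if a)
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


# Invented sample of eight records: the tool's call vs. the reviewer's call.
tool_says_relevant = [True, True, True, False, False, True, False, False]
human_says_relevant = [True, False, True, True, False, True, False, False]

precision, recall = precision_recall(tool_says_relevant, human_says_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means reviewers wade through irrelevant records downstream; low recall means relevant records were left in the haystack. Asking a vendor how its tool performs on both measures is one concrete form the search-effectiveness question can take.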