Choosing predictive coding vendors: The information to find out first

In part two of this SearchCompliance podcast, our expert covers questions to ask predictive coding vendors to ensure data management needs are met.

Predictive coding has become a respected tool in the legal community because of its ability to reduce the cost and scope of electronic discovery. It's important to remember, however, that not all predictive coding vendors are created equal, according to information governance expert Jeffrey Ritter. Before choosing a predictive coding technology and starting the implementation process, companies first need answers to numerous questions about data management, security and even corporate culture.

In part two of our podcast, Ritter and SearchCompliance editor Ben Cole continue their discussion about building relationships with predictive coding vendors, including the security and information governance-related questions to ask before choosing a technology.

During past conversations, you've mentioned there were seven key questions companies should ask when choosing predictive coding technology. What are these questions, and how can answering them help overcome the challenges of working with relatively new technologies, such as predictive coding?

Jeffrey Ritter: You know, Ben, when we were getting ready for today's podcast, we hoped we would have seven questions, but I wasn't quite able to limit it to seven. What I'm going to do is give you a list of questions as [they] address each of these five key areas, and we'll talk over each one of them briefly.

These are the type of questions I would like to see included in any request for information, or RFP [request for proposal], that the company or law firm issues. Number one: Identify the five major customer variables that can add cost and complexity to the installation and use of your technology. This is something vendors won't often talk about, but they have the experience to know what are the things the companies as licensees need to do to make sure predictive coding technology will be effective.

The second area to focus on is training. The second question is: Describe the training provided to my team and its cost. Actually learning how to drive the Ferrari, and having enough people in your group who know how to drive the Ferrari, can often be more expensive than anticipated. It's therefore important to know whether that training is included with the product, [if] there [is] a supplemental cost, and who needs to be trained to be effective.

That's really the third question, which is to describe the metrics that are used to measure the effectiveness of the training. Anybody can sit in the classroom and anyone can sit in their pink bunny slippers and watch a podcast, a webcast or a recorded lecture, but the question is: Are they learning? It's really important to ask if the predictive coding vendors [have a way of demonstrating the training is effective at teaching others] how to use their tool.

The third area that I mentioned a moment ago was information analysts. As experience indicates, just about any predictive coding technology is used best by people who really get semantics. There are professionals who are tremendously impressive, but they work for a fee. They usually work in support of lawyers and paralegals who may have basic discovery skills but don't understand semantic analytics.

There are three questions there -- these are questions four, five and six: Describe the skills and credentials of your information analysts. Ask the vendor who they have working there that helps the customer use their predictive coding technology. Next, how are they compensated? It's really important to know whether the information analyst's support is there as part of the license fee, or are they paid on an [hourly] basis. If so, when and how is that paid for? And then finally, how do the information analysts support my team's use of your technology? Is it just a help desk function? Is it active involvement in the case? Do they actually help evaluate the algorithmic results of different search queries to be more effective in the predictive function of the technology?

The fourth area I focused on was security, and in particular the security around the tool. Many of the e-discovery predictive coding technologies are now being operated across the cloud, and therefore it's vitally important to know where the security connects to and how it's maintained. Here are two questions about information security: Describe the security controls used with your technology. For any company, the quality of the description will be variable. If it's a financial institution that is regulated by federal authorities, they're going to have a much more detailed inventory of requirements than, perhaps, a mom-and-pop shop or automotive dealer on your corner lot. It's important that the RFP process for predictive coding anticipate and address the all-in cost, including security controls.

Then, more importantly, what is your liability [and] the vendor's liability if the security of your systems -- including stored data -- [is] breached? This is very interesting, because while many people are talking about security, very few are willing to put their name behind it. You should be very attentive to the consequences if the security controls are breached, whether there's a liability disclaimer or a company that stands behind their work to secure your information.

Those are four areas. We have one more, which is search effectiveness. There are three questions that I have here. To refresh, in this area, what we're looking to find out is how good is the predictive coding technology at predicting the relevance and responsiveness of ESI that is being targeted in a very large haystack. First, ask the vendor to describe the testing methods employed to validate the effectiveness of their algorithms. This is a fairly detailed discussion, but any vendor that has a quality product should be able to describe to you what they have done to test and validate the way they find responsive and relevant ESI. It's a detailed discussion. It may require some support from your IT group to be able to test and qualify the vendor's response, but it's a very important question.

As I mentioned a few moments ago, there is sophisticated science that's been developed. In fact, the National Institute of Standards and Technology has actually built an entire program, known by its acronym of TREC, in which vendors can participate by using their tools against a standard database -- a big haystack -- in order to measure the effectiveness of their tools at finding records. What's nice about the NIST TREC program is that it's a stable data set, so it's very possible to get highly accurate and comparable information on the performance of the predictive coding technologies.

Ask the vendor to confirm whether or not they participate in the NIST TREC program. If they do participate, ask them to share the results of their participation. What have been the outcomes of using their technology against the data set? If they have not participated in the TREC program, that can be a big yellow or red flag. If not, ask them, "Why not?" Why have they not made the commitment to participate in a nationally recognized, quality validation of the effectiveness of search algorithms?

Finally, in the area of search effectiveness, the last question is: Will the vendor use a customer-provided test data set to demonstrate the effectiveness of their predictive coding technology? If a company is really serious about acquiring predictive coding technology that works, while it can rely on the NIST TREC program, nothing's better than running a test with your own data and [with] a data set in which you have [a] pretty good understanding of what's actually there. For example, a data set that was used in a prior litigation that was thoroughly analyzed using linear methods without predictive coding.
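Editor's illustration (not part of the podcast): one common way to score such a test is to compare the documents the tool flags as responsive against the documents already known to be responsive from the prior review, and to report precision and recall. The sketch below assumes documents are identified by simple ID strings; the function name and the sample IDs are hypothetical, and a real evaluation would likely involve sampling and more measures than these two.

```python
def precision_recall(flagged_ids, known_responsive_ids):
    """Score a tool's output against a data set with known answers.

    precision: share of flagged documents that are truly responsive.
    recall:    share of truly responsive documents that the tool found.
    """
    flagged = set(flagged_ids)
    known = set(known_responsive_ids)
    true_positives = len(flagged & known)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Hypothetical example: four documents were found responsive in a prior,
# fully reviewed matter; the tool under test flagged three documents.
known_responsive = {"doc1", "doc2", "doc3", "doc4"}
tool_flagged = {"doc1", "doc2", "doc5"}

precision, recall = precision_recall(tool_flagged, known_responsive)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the same known data set can be run through each candidate tool, the resulting figures are directly comparable across vendors.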

There are really 10, 11, 12 questions, not just seven, that I've come up with that I think change the landscape of how you select predictive coding technology.

These seem like tough questions. Can the predictive coding vendors actually provide good answers?

Ritter: In a word, the good vendors can, yes. The reason I developed these questions is because you can quickly know the quality of the predictive coding vendor by watching whether or not the blood stays in their face or drains out of their face when you start having this conversation. By talking to the vendor, even before giving the RFP to them and indicating these are the areas you have as a concern, you may find that one or more potential marketing claims they have on effective predictive coding fall to the wayside, and they may not come back for the actual RFP. That tells you you've saved everybody's time and money.

Let us know what you think about the story; email Ben Cole, site editor. For more regulatory compliance news and updates throughout the week, follow us on Twitter @ITCompliance.
