Several years ago, data classification for managing storage tiers generated a lot of noise, with startups like Index Engines, StoredIQ, Njini and Scentric springing up in the space. EMC even jumped in with its Infoscape software. But even though data classification has since spilled over into data compliance, electronic discovery and storage security, success has eluded data classification products. Scentric has gone out of business and some of the other vendors have retooled their offerings.
"Adoption is nowhere near where the hype is," said analyst Greg Schulz with StorageIO Group. "One of the biggest challenges for data classification is it's talked about now in so many contexts -- compliance, regulatory, electronic discovery -- as opposed to just tiering and management. Sometimes the term is used to cover too many things."
The key underlying technologies for data classification are the same regardless of how it's used. Data classification tools identify, index, search and move data according to an organization's policies. "It's really a supporting technology for a number of different solutions," said analyst Jeff Boles, Taneja Group. "It's a matter of picking the right solution for what you're trying to accomplish."
Methods for classifying data vary depending on whether it's structured data in databases, unstructured file data or email. Most classification products are geared toward file and email classification, but many organizations that classify data end up writing their own scripts to do it, even though creating scripts is usually labor-intensive and complicated. "Data classification requires a lot of coordination and scripting between teams ... a lot of scripting," said Rob Stevenson, managing director for research firm TheInfoPro.
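To give a sense of what a homegrown classification script might look like, here is a minimal Python sketch. It is not drawn from any vendor's product or any shop Stevenson surveyed; the `FileRecord` type, the keyword list, the one-year threshold and the tier names are all hypothetical stand-ins for whatever policy an organization would actually define.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical metadata a script might gather for one file."""
    path: str
    days_since_access: int
    text: str


# Assumed policy values, chosen only for illustration.
SENSITIVE_KEYWORDS = ("confidential", "ssn")
ARCHIVE_AFTER_DAYS = 365


def classify(record: FileRecord) -> str:
    """Return a tier label for a file based on content and age."""
    lowered = record.text.lower()
    # Content rule first: sensitive keywords override the age rule.
    if any(keyword in lowered for keyword in SENSITIVE_KEYWORDS):
        return "restricted"
    # Age rule: files untouched for over a year go to the archive tier.
    if record.days_since_access > ARCHIVE_AFTER_DAYS:
        return "archive"
    return "primary"
```

Even a toy like this hints at the coordination problem: someone has to agree on the keywords, the thresholds and what happens to a file once it is labeled.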
Still, Stevenson said, interviews with storage decision makers in enterprise and medium-sized shops reveal that the overwhelming majority who use data classification do it themselves. "Among Fortune 1000 companies, they're almost all doing it homegrown," he said. "The midmarket sees it as a feature of the product they buy. They're not buying separate products for data classification. The problem is, the disconnect between applications and what storage vendors are talking about is tough to cross."
Even storage vendors aren't sure how to classify their data classification products. Last year, EMC put Infoscape under its RSA Security banner, claiming that customers found Infoscape more useful for controlling access to sensitive data than for tiering and archiving data. Njini, one of the early data classification vendors, has moved in the same direction.
Boles calls this category of data classification information control. "You detect a normal usage pattern or find a key word popping up in a file name that suggests this person shouldn't be accessing that," he said. "It takes some type of proactive control. This is bringing data classification and storage management tools into the security domain, which is interesting to watch. You can interact with unstructured file-based data based on who owns the data."
Classification is even more commonly used for regulatory and data compliance purposes, in products such as email archiving applications. Autonomy Zantaz, Clearwell Systems, MessageGate, Microsoft, Mimosa Systems and Orchestria already have products in this category, and other classification vendors are planning to enter this market. For instance, Index Engines gives backup administrators the ability to do a keyword search across dozens of backup tapes to identify which tapes should be restored -- a feature that can come in handy during legal discovery.
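The general idea behind a keyword search across tapes can be illustrated with a simple inverted index that maps each word to the tapes containing it. The Python sketch below is only an illustration of that technique, not a description of how Index Engines or any other product works; the tape labels, the in-memory document text and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(tapes: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of tape labels where it appears.

    `tapes` maps a hypothetical tape label to the text of documents on it.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for label, documents in tapes.items():
        for document in documents:
            for word in re.findall(r"[a-z0-9]+", document.lower()):
                index[word].add(label)
    return index


def tapes_to_restore(index: Dict[str, Set[str]], keyword: str) -> List[str]:
    """Return the sorted labels of tapes whose contents mention the keyword."""
    return sorted(index.get(keyword.lower(), set()))
```

With an index like this built once, an administrator could look up a term and restore only the matching tapes rather than all of them.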
"Most classification products are gravitating toward litigation support," Schulz said. "Search technology, discovery and classification are being embedded into more and more products."
Boles thinks classification may also help enable business processes down the road, perhaps moving into the areas of document or content management. But for now, data classification is a technology feature looking for a home ... or several homes.