Compliance professionals don't need to be told that data classification matters. Very often, the compliance team is responsible for determining exactly which data must be kept protected and private, including personally identifiable information (PII).
But the waters become murky when it comes to the tools available to support a compliance-driven data classification program. How each type of tool is used for regulatory compliance depends on circumstances that can be complex. The situation is compounded because data classification as a product category can also refer to related tools that don't necessarily fit the bill from a compliance standpoint, including tools that storage engineers use to optimize storage system performance based on the type, value or "freshness" of data.
As a result, governance, risk and compliance (GRC) professionals need to take a compliance-eye view of data classification tools, focusing on the ones that are most useful to them for discovering data governed by specific regulatory or contractual requirements.
Use automation tools to your advantage
When it comes to supporting data classification, server automation tools are commonly used to discover and identify regulated data within the technology ecosystem using defined, high-level classification categories.
Organizations often employ a data classification toolkit to canvass the technical environment -- including files and folders, database repositories and collaboration environments -- to find specific data such as health records, credit card information, Social Security numbers and other PII. These tools work by searching for patterns within the data. For example, a tool might look for 16-digit numbers that conform to the Luhn formula and compile a list of where they occur.
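To make the pattern-matching idea concrete, here is a minimal Python sketch of the 16-digit Luhn check described above. It is an illustration only, not how any particular product works. The names `CANDIDATE`, `luhn_valid` and `find_card_candidates` are hypothetical, and the sketch assumes the digits appear either unbroken or in groups of four separated by a space or hyphen.

```python
import re

# Assumption for illustration: exactly 16 digits, optionally grouped in fours
# with a single space or hyphen. Real card numbers can have other lengths.
CANDIDATE = re.compile(r"(?<!\d)\d{4}(?:[ -]?\d{4}){3}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list[tuple[int, str]]:
    """Return (line_number, matched_text) for each Luhn-valid 16-digit match."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in CANDIDATE.finditer(line):
            digits = re.sub(r"[ -]", "", match.group())
            if luhn_valid(digits):
                hits.append((line_no, match.group()))
    return hits
```

Called on the contents of a file, `find_card_candidates` returns the line numbers where candidate numbers appear, which is the kind of occurrence list a classification tool would build. A number that passes the check is only a candidate; it still has to be confirmed as real cardholder data.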
The compliance team usually completes this process either during development of the initial data classification policy documentation or as an ongoing process during policy enforcement.
But why do this before the policy is formalized? Many organizations like to understand the impact of policy-setting ahead of time, perhaps to set strategic dates that allow time for remediation, to document specific exceptions, or to know how aggressively to implement the policy and in which areas of the organization. When the tools are used to support enforcement, the focus is on finding areas that are out of compliance with the policy so that remediation can be planned and implemented.
A few different types of tools can be used for this kind of data classification. The first is the category of storage optimization tools referred to earlier. The purpose of these tools is to analyze structured and unstructured information so that storage professionals can determine where the data should optimally "live." Some of these tools are able to unpack data and examine the contents to look for patterns. Compliance professionals might be able to leverage this information and, with assistance from technical resources, find specific data covered by classification categories. These tools also support the creation of metadata that serves as an index of where this type of data lives within your organization. This is a useful feature for tracking data migration over time as well.
The second category of tools stems from the information security space. Specifically, data leak prevention (DLP) tools find and flag specific types of data that might also be included within a regulatory scope. When applied to a file system, these tools can sort through large volumes of data and flag specific data items. If the organization already uses them, compliance teams can potentially harness them as well. If the organization has not already deployed tools of this type, open source alternatives such as OpenDLP or MyDLP can be used in a pinch.
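The file system sweep described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a representation of how OpenDLP, MyDLP or any commercial DLP product works. The `PATTERNS` map and the `scan_tree` function are hypothetical names, and the two regular expressions are deliberately simple stand-ins for real detection rules.

```python
import re
from pathlib import Path

# Hypothetical category-to-pattern map; production DLP rules are far richer.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_tree(root: str) -> dict[str, list[str]]:
    """Map each file under root (relative path) to the categories it matched."""
    flagged = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are dropped, so binary formats are mostly missed.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skipped, so anything inside goes unflagged
        categories = sorted(name for name, rx in PATTERNS.items() if rx.search(text))
        if categories:
            flagged[path.relative_to(root).as_posix()] = categories
    return flagged
```

The result is a simple inventory keyed by file path, which a compliance team could review against its classification categories. Note that the sketch silently skips files it cannot read, so its output understates what is actually present.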
Beware false positives, technical constraints
Whichever type of data classification toolkit you elect to use to support your strategy, be aware of a few constraints that may limit its use. Not every technology supports every type of use, and there could be environments that cannot be inventoried, whether for procedural or technical reasons.
For example, cloud environments could present technical constraints, such as when a SaaS vendor won't allow OS-level access to underlying systems. The cloud presents procedural obstacles as well, such as when an IaaS provider bills based on CPU usage or data access throughput, creating an economic disincentive to scan.
It's also useful to point out that an inventory exercise of this type can fail in a few different ways. False negatives occur when the system doesn't find something it should, such as when a credit card number is stored in an encrypted file the system can't open. False positives occur when the tool flags data items as the type you want to find, but they are not actually what you are looking for, such as a 16-digit order number that happens to satisfy the Luhn formula.
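One way to gauge how often a scan fails in each direction is to compare its output against a small sample that someone has verified by hand. The Python sketch below assumes you have two sets of locations (file paths, for instance): those the tool flagged and those confirmed to hold regulated data. The function name `score_scan` and its return keys are hypothetical.

```python
def score_scan(flagged: set[str], actual: set[str]) -> dict[str, float]:
    """Compare tool output with a hand-verified sample of locations."""
    true_pos = flagged & actual   # flagged and really regulated data
    false_pos = flagged - actual  # flagged, but not what you are looking for
    false_neg = actual - flagged  # regulated data the tool missed
    return {
        "false_positives": len(false_pos),
        "false_negatives": len(false_neg),
        # Share of flagged items that were correct.
        "precision": len(true_pos) / len(flagged) if flagged else 0.0,
        # Share of real items that the tool found.
        "recall": len(true_pos) / len(actual) if actual else 0.0,
    }
```

A low precision figure on the sample suggests the false positive volume will be hard to manage at scale, while a low recall figure points to blind spots such as encrypted or inaccessible files.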
When a tool is employed across a large data set such as a storage area network, the number of false positive results can be significant. Therefore, it's helpful to start with a workable data classification boundary and expand the process outward once you've ascertained that the results are useful and meaningful. You may also choose to focus on a subset of data classification areas that are considered problematic before expanding usage to the entire environment.
About the author:
Ed Moyle is director of emerging business and technology at ISACA. He previously worked as a senior security strategist at Savvis Communications and a senior manager at Computer Task Group. Before that, he served as a vice president and information security officer at Merrill Lynch Investment Managers.
This was first published in September 2013