Big data takes many forms. The umbrella term covers vital business information stemming from applications, security logs and business productivity documents, to name a few. Then, of course, there is the unmanaged or unstructured data not stored in a database, which is headache-inducing if the information must be located for a compliance audit or lawsuit.
When it comes to aging data, 50% of respondents to a 2013 Gartner Inc. survey cited understanding exactly what specific data is used for, and what information should be discarded, as the issue their organizations struggle with most. From an information governance perspective, information classification challenges stem primarily from not knowing what data to keep, said Gartner Research Director Garth Landers.
"This is increasingly hard given that these big data analytics opportunities may not be readily evident," Landers said during a presentation at the 2014 Gartner Security and Risk Management Summit. "For those of us in IT, there is a lack of understanding, a fear of what to do with that data."
Classification vital to data discovery
This makes data classification -- and deciding what to do with that newly defined information -- vital when the time comes to identify and recover data needed for litigation. After all, if you don't know exactly what the data is, how will you know how to properly store and recover it?
Having a rock-solid structured data archiving process is a great place to start. It is now common for organizations to generate and manage petabytes or even exabytes of data. As a result, organizations are moving away from storing data indefinitely and toward identifying any information that can be deleted, Landers said.
"If you store everything, it's more likely to come up in pending litigations or regulatory compliance audits," Landers said. "If data has no business value, get rid of it."
Data classification is vital to this process -- starting with simply defining what a "record" is as it pertains to your organization, said John Isaza, Esq., who leads the Information Governance & Records Management practice at law firm Rimon PC.
"Once you have that definition of a record, then you can dispose of information that is not a record, except of course if there is a legal hold in place," Isaza said.
Companies need to strike a delicate balance, however: Keeping too much information can create legal complications, but the organization also must be prepared in the event that relevant information is called upon as part of an e-discovery request.
"To prevent over retention, you have to first and foremost have a records retention schedule that dictates what you absolutely need to keep as a matter of best business practices and to remain fully compliant with regulations," Isaza said.
If the organization has adequate resources, Isaza suggests having a legal hold steering committee in place to oversee anticipated or pending litigation. The committee would be responsible for identifying legally relevant data and preventing the "destruction of not just records, but of all data in your systems that could be relevant to the subject matter of that lawsuit or investigation," Isaza said.
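The disposal logic Isaza describes (define what a record is, follow a retention schedule, and let a legal hold override both) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record classes, retention periods, matter IDs and the `can_dispose` helper are all hypothetical and are not drawn from any regulation or from Isaza's practice.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional

# Hypothetical retention schedule: record class -> years to keep.
# Real periods depend on the regulations that apply to the organization.
RETENTION_YEARS = {"contract": 7, "invoice": 5}


@dataclass(frozen=True)
class Item:
    name: str
    record_class: Optional[str]  # None means the item is not a "record"
    created: date
    matters: FrozenSet[str] = frozenset()  # legal matters the item may relate to


def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years, mapping Feb. 29 to Feb. 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def can_dispose(item: Item, today: date, active_holds: FrozenSet[str]) -> bool:
    # A legal hold overrides everything, including non-record status.
    if item.matters & active_holds:
        return False
    # Information that is not a record can be disposed of.
    if item.record_class is None:
        return True
    # Records with no entry in the schedule are kept for manual review.
    years = RETENTION_YEARS.get(item.record_class)
    if years is None:
        return False
    # Records are disposable only once the retention period has elapsed.
    return today >= add_years(item.created, years)
```

The hold check comes first on purpose: it mirrors the point that a hold covers "not just records, but all data" that could be relevant to the matter.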
(Big) data mapping
If implemented correctly, these information classification and big data e-discovery processes can help reduce litigation costs and risk. But where should organizations start? One proactive solution is to develop your organization's "data map," said Marshall Hoel, manager of Berkeley Research Group LLC.
"The data map is a comprehensive listing of the systems employed by a business, their functions, how they store data, how often they are backed up and which other systems deal with similar data and/or how they interact," Hoel said.
This information will be invaluable if specific data must be located quickly to fulfill a legal obligation. Data maps can help the IT team communicate with legal counsel about how a record is classified, where it is located and which retention policies apply to it.
"It's also important to know who the key players are that are involved in the day-to-day management of the various systems," Hoel said. "It will be critical to interact well with these individuals when the time comes to act in the face of impending litigation."
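As a rough illustration, the data map Hoel describes could be kept as a simple structured inventory that IT and legal counsel can both query. The sketch below is a minimal Python version; the field names, the `SystemEntry` type and the `systems_holding` helper are assumptions made for this example, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SystemEntry:
    """One row of a hypothetical data map."""
    name: str                   # system employed by the business
    function: str               # what the system is used for
    storage: str                # how and where it stores data
    backup_frequency: str       # how often it is backed up
    data_categories: List[str]  # kinds of data the system holds
    related_systems: List[str] = field(default_factory=list)  # systems with similar data
    owner: str = "unassigned"   # key contact for day-to-day management


def systems_holding(data_map: List[SystemEntry], category: str) -> List[SystemEntry]:
    """Return every system in the map that holds the given data category."""
    return [entry for entry in data_map if category in entry.data_categories]


# Example entries; all names and values are made up for illustration.
data_map = [
    SystemEntry("mail-archive", "email retention", "object storage", "daily",
                ["email"], related_systems=["crm"], owner="messaging team"),
    SystemEntry("crm", "customer records", "relational database", "hourly",
                ["customer", "email"], related_systems=["mail-archive"],
                owner="sales operations"),
    SystemEntry("hr-portal", "personnel files", "relational database", "weekly",
                ["employee"]),
]

# Which systems, and which owners, matter if email must be preserved?
for entry in systems_holding(data_map, "email"):
    print(entry.name, "-", entry.owner)
```

Recording an owner for each system reflects Hoel's point that the people who manage the systems day to day need to be identifiable before litigation arrives.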
Thorough data classification and mapping techniques have become necessary as organizations adapt to the demands of big data, as well as its associated storage costs. In the past, businesses that dealt with large amounts of structured and unstructured data would do so using relational database management systems tailored to the various applications. Under these systems, the data being analyzed was stored in a logical, easily identified location so information management policies could be readily applied, Hoel said.
Big data has dramatically changed these processes, he added.
"When confronted with the problem of legal preservation and discovery, businesses are forced to re-think their strategies regarding structured and unstructured data, and traditional collection techniques," Hoel said.