
Six steps to 'dark data' information governance

ARMA International President Fred A. Pulzello discusses the information governance steps to manage dark data before it hinders company-wide GRC.

The term "dark data" may have clandestine connotations, but it actually refers to something more mundane: the information we know we have but don't know what to do with.

Gartner Inc. defines dark data as "information assets that organizations collect, process, and store in the course of their regular business activity, but generally fail to use for other purposes."

Dark data is a hot topic in information governance circles because of the relentless growth in electronically stored information (ESI) and in the number of places to store it, such as desktops, shared servers, flash drives, smartphones, tablets and the cloud. The quantity of ESI and its scattered nature make it nearly inevitable that dark data will accumulate in an organization.

One key characteristic of dark data is that it's left unanalyzed -- or "dark." If organizations allow it to collect in their systems, it can become problematic during litigation and regulatory investigations, and complicate efforts to secure personally identifiable information. Further, storing this excess data is unnecessarily expensive. It consumes server space and is a drag on system response time.

Smart organizations will take steps to manage their dark data. The word "manage" is appropriate in this context, because the first impulse may be to just dispose of it all. In truth, some dark data can be useful if properly managed. For example, organizations can mine their log files or customer call records to uncover patterns about customer behavior and preferences.

The following six steps can help organizations properly identify and manage their dark data:

1. Define and identify your dark data. First, determine the source of the data. Is it generated by employees or by a system? Employee-generated data often consists of such things as working drafts and "just-in-case" copies, which makes it a less likely candidate for repurposing than system-generated data. If the data is system-generated, consider gathering it into a form that's conducive to analysis. But, fair warning: There is no guarantee that the time and effort to harvest such data will pay off in the end.

2. Perform a cost-benefit analysis of the data's potential usefulness. To justify the further retention of the data, there must be a specific purpose for keeping it. The defined purpose should have a short retention time frame -- such as six to nine months -- to avoid the temptation to keep the data indefinitely "just in case." There must be accountability for any proposed purpose. Assign the responsibility of follow-up to an individual or group.

3. Determine what to keep and what to delete or destroy. At this stage, an organization should make the business case for keeping any data that can be useful and for deleting any data that has no apparent value.

4. Map data to the retention and disposition schedule to justify deletion. It may be necessary to create new categories on the retention schedule to cover data that no existing policy statement addresses. For example, the organization might have a policy on deleting drafts of documents, but that policy statement or retention schedule might not address system audit logs.

5. Execute your defensible disposition plan. Make sure the data slated for disposition is actually deleted or destroyed. Keep documentation that cites the "why" and the "when" of deletion.

6. Determine annually whether the repurposed data is bringing value. If the repurposed data is not useful to the organization, then it is once again dark data and should be deleted in a proper, defensible manner.
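
One way to make steps 2 through 6 concrete is to track each data set in a small inventory and review it on a schedule. The Python sketch below is illustrative only: the `DataAsset` fields, the `triage` function and its four outcome labels, and the `disposition_record` helper are all assumptions made for this example, not part of any ARMA or Gartner guidance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataAsset:
    """One entry in a hypothetical dark data inventory."""
    name: str
    source: str                               # "employee" or "system" (step 1), recorded for context
    purpose: Optional[str] = None             # specific reason for keeping it (step 2)
    owner: Optional[str] = None               # person or group accountable (step 2)
    review_by: Optional[date] = None          # end of the short retention window (step 2)
    retention_category: Optional[str] = None  # matching retention schedule entry (step 4)
    still_useful: bool = True                 # outcome of the periodic review (step 6)


def triage(asset: DataAsset, today: date) -> str:
    """Return "keep", "renew", "add to schedule" or "dispose" for one asset."""
    # A business case needs a purpose, an accountable owner and a review date.
    has_case = bool(asset.purpose) and bool(asset.owner) and asset.review_by is not None
    expired = has_case and today > asset.review_by

    if has_case and not expired:
        return "keep"
    if has_case and asset.still_useful:
        return "renew"  # window has passed, but the review found value
    # Everything else is a disposition candidate. Deletion is only
    # defensible once the data maps to the retention schedule (step 4).
    if asset.retention_category is None:
        return "add to schedule"
    return "dispose"


def disposition_record(asset: DataAsset, reason: str, when: date) -> dict:
    """Capture the "why" and the "when" of a deletion (step 5)."""
    return {
        "asset": asset.name,
        "category": asset.retention_category,
        "reason": reason,
        "deleted_on": when.isoformat(),
    }


# Example: employee drafts with no stated purpose, already on the schedule.
today = date(2024, 1, 15)
drafts = DataAsset("working drafts", "employee", retention_category="drafts")
print(triage(drafts, today))  # dispose
```

The ordering of the checks is a design choice for this sketch: data with no business case is never deleted until it has a retention category, so every disposal can point to a schedule entry.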

Dark data might seem like a buzzword, but the reality is that it's here to stay. Electronically stored information volumes will continue to explode. Big data will continue to get bigger. Technologies will continue to evolve faster than our ability to keep pace. An organization's best response is to implement an information governance program that properly manages all of its data throughout its life cycle, recognizes the probability of dark data and implements measures to repurpose or properly delete it.

About the author:
Fred A. Pulzello, IGP, CRM, president of ARMA International and managing director at BNY Mellon, is certified as an Information Governance Professional and is a Certified Records Manager. He has more than 20 years of experience in information management, including as executive director and global corporate information manager for Morgan Stanley Inc. and as assistant vice president of the corporate records management group for Merrill Lynch & Co. He earned a master's degree in business administration from Seton Hall University.

