Human- vs. machine-generated data: A cautionary tale
Date: Feb 17, 2014
If your organization is considering tactics to gather and leverage compliance data, listen to Evan Davison's cautionary tale of improper assessments of human- and machine-generated data.
Davison, a security architect at Barling Bay LLC, spoke at the 2013 ISSA International Conference in Nashville, Tenn., in a session titled "Data mining for continuous monitoring and compliance reporting." Before his session, Davison sat down with SearchCompliance to discuss the pros and cons of analyzing both machine-generated data from big data tools and human-created data entered into spreadsheets and other systems. Watch this video interview and read the transcript below to get the full story.
Have you uncovered any trends by gathering compliance data? What stands out as a cautionary tale or an opportunity to do things differently?
Evan Davison: I think the biggest piece that we've uncovered is when we use automated controls. That's part of the point of this, of using big data for compliance: You can automate the assessment process for some controls. As you do that, you'll often become very cautious because you start to see how things may not be as compliant as you thought they were.
Before, we would assess controls or regulations as snapshots in time. An assessor or auditor would come in and take things as they were at that point in time. As compliance professionals, we all know that there's usually that spike in security right before an audit and that downfall right after. Automating controls allows us to see that, actually quantify it, and see what is really going on. Also, as we automate some of these processes, we see that the way we thought we were auditing it, or the way we audited it manually, may not be efficient or may not be accurate at all.
As we move into this concept of continuous monitoring, which is what these tools are really driving at, it's not just about tracking individual technical requirements. If we try to bring in outside data sources, analyze data sources and human-generated data, we can start to see how things aren't matching up compared to that snapshot in time. It really changes over the scope of a year, six months, or even from week to week.
You just spoke a little bit about human-generated data. Can you talk about the difference between human- and machine-generated data?
Davison: It's pretty clear what human-generated data is: It's data that we generate when we sit down at an Excel spreadsheet, when you type notes, when you create a Word document -- things that may involve some level of creativity or something that a machine otherwise would not be capable of doing. Right now, most big data and security monitoring tools are focused on machine-generated data such as logs. We're looking through logs and other things to identify trends and abnormalities [and] to try to find security events that may be notable or specific to a requirement that we're looking for.
What we started to find was that machines obviously generate a lot of data. That's the allure of big data: that, with these tools, we can see things that our machines are doing that we couldn't see before. But what it's done is made us blind to the large volumes of human-generated data that are lost. We still haven't found a way to capture the human-generated data that is created day to day. It brings us to a concept that big data introduced: keeping all the data and picking out what we think is important to us. As long as we have the space, or as long as we have the processing power, we can take everything in and just look at what we want to when we need it.
For some reason, we haven't done that with human-generated data. An employee walks out of the building, and all that knowledge transfer goes with them; their documents go with them. Most organizations will track [employee] training progress or other types of industry requirements through an Excel spreadsheet. What if we took that Excel spreadsheet and trended it over time, without changing the process of how the data was generated, where it was stored, or anything else? We just used a big data tool to pull it in and look at it, just like we did with machine-generated data?
Now we have a greater scope of what we can capture instead of using just technical controls. We can now see policy and process, we can now see training, we can now see policy being changed. We are actually generating security incident reporting. All those types of things can start to be captured based on your organization's policy and process.