It is axiomatic that if something cannot be measured, it cannot be managed. It follows that risk measurement is an intrinsic component of risk management. Risk management is very much in the news these days, applied to finance, insurance and war. There are a variety of techniques used in each field, with mixed success in each and virtually no correspondence among disciplines. A common consideration of risk in business is the possibility of -- or rather, the uncertainty about -- events that might interrupt an organization's functional and technical operations, i.e., continuity risk.
Unfortunately, the most common approaches to measuring continuity risk are vague, subjective and difficult to use for guiding management in budgeting for controls and countermeasures. Almost all are based on the simplistic formula:
Risk = Impact x Probability
There are several problems with this method of measuring risk. First, it does not measure risk at all, but rather exposure, which is the expectation of loss over time, usually expressed on a yearly basis as the annual loss expectancy (ALE). If members of management have a reasonable expectation of, say, $10 million in annual losses due to business disruptions, they have an outer boundary for investment to mitigate or eliminate their effects through controls or insurance. No one would spend $20 million to reduce a $10 million exposure. The proper investment is (much) less than the potential impact, perhaps nothing at all (i.e., acceptance of the exposure).
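To make the idea of exposure concrete, here is a minimal sketch of an ALE calculation. The scenario names, dollar figures and frequencies are invented for illustration, and the helper names (`annual_loss_expectancy`, `worth_funding`) are hypothetical, not taken from any standard.

```python
# Hypothetical disruption scenarios:
# (name, impact per event in USD, expected events per year).
# All figures are invented for illustration.
scenarios = [
    ("data center power failure", 2_000_000, 0.5),
    ("regional flood", 40_000_000, 0.02),
    ("key supplier outage", 500_000, 2.0),
]


def annual_loss_expectancy(impact, frequency_per_year):
    """Exposure for one scenario: impact weighted by expected yearly frequency."""
    return impact * frequency_per_year


def worth_funding(control_cost, exposure_removed):
    """A control is only worth considering if it costs less than the exposure it removes."""
    return control_cost < exposure_removed


total_ale = sum(
    annual_loss_expectancy(impact, freq) for _, impact, freq in scenarios
)
print(f"Total annual loss expectancy: ${total_ale:,.0f}")

# The example from the text: $20 million spent against a $10 million exposure.
print(worth_funding(20_000_000, 10_000_000))
```

Note that this arithmetic only sets a ceiling on spending. It says nothing about events whose impact or frequency cannot be estimated in the first place.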
As Nassim Nicholas Taleb explains in The Black Swan, risk is not about predictable losses but instead about the impact of highly improbable events, the so-called unknown unknowns. Thus, Risk = Impact x Probability is meaningful only for those disruptions for which likelihood and effects are known, or at least are predictable. As Taleb demonstrates, it is specifically the rare, unforeseeable incidents that cause the most damage.
In other words, we will forever be uncertain about the probability of a significant disruption, a catastrophe. Other researchers such as Rory Knight and Deborah Pretty of Oxford Metrica have shown that the impact on shareholder value is magnified by management ineptitude, especially if an event results in deaths. Thus, the impact is not predictable either. The time-honored formula collapses into itself.
It is not simply time that honors the formula. The information security risk management standard ISO 27005, which includes business continuity risk, shows risk as a function of likelihood and impact.
BS 25999, the generally accepted global standard for business continuity management, explains that risk is "an average effect by summing the combined effect of each possible consequence weighted by the associated likelihood of each consequence," although to be fair the standard does go on to say that "probability distributions are needed to quantify perceptions about the range of possible consequences." It recommends instead standard deviations, which (as Taleb rants on about) lead us back to known, rather than unpredictable, effects. NFPA 1600, the U.S. standard on disaster/emergency management and business continuity programs, defines risk as -- no surprise -- "a combination of probability and severity."
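The "average effect" described in BS 25999 is a likelihood-weighted sum. The sketch below computes that average and a standard deviation for one hypothetical disruption, assuming the consequences and their likelihoods are known, which is exactly the assumption in question. All numbers and function names are illustrative.

```python
import math

# Hypothetical consequences of one disruption: (loss in USD, likelihood).
# The likelihoods are assumed to be known and to sum to 1.
consequences = [
    (0, 0.90),
    (100_000, 0.09),
    (50_000_000, 0.01),
]


def average_effect(outcomes):
    """Sum of each consequence weighted by its likelihood (an expected value)."""
    return sum(loss * p for loss, p in outcomes)


def standard_deviation(outcomes):
    """Likelihood-weighted spread of the consequences around the average effect."""
    mean = average_effect(outcomes)
    return math.sqrt(sum(p * (loss - mean) ** 2 for loss, p in outcomes))


mean = average_effect(consequences)
spread = standard_deviation(consequences)
worst = max(loss for loss, _ in consequences)

print(f"Average effect:     ${mean:,.0f}")
print(f"Standard deviation: ${spread:,.0f}")
print(f"Worst listed case:  ${worst:,.0f}")
```

In this example, the average comes out far below the worst listed consequence, and both summary figures depend entirely on likelihoods that someone had to assume.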
So where does that leave us?
For one thing, it leaves us without a magic formula and it seems there will never be any worthy algorithms for calculating risk. But that does not mean that risk cannot be measured. It is important for risk measurement to be accurate, but it is not necessary for it to be precise to the nth decimal place. If we cannot have a solid, quantified value for continuity risk we can still get it right in a relative or "fuzzy" manner. Here are some basic principles:
- Measure the effect on critical resources, not the threats to them: Once again, poor definition leads to poor thinking. NFPA 1600 provides a list of "hazards"; ISO 27005 has its list of "threats" and "vulnerabilities." Both standards mean events like fires, floods, earthquakes, power failures or corrosion. But no one can list all the possible causes of continuity breaches. That would be betting against God, and he always wins.
The real risk to an organization is the impact on critical resources. At a high level, these resources include working premises, human resources, data, equipment, information systems, voice and data networks, raw materials, etc. The non-probabilistic approach is to determine the effects of disruption of these resources without a priori consideration of the likelihood or extent of such disruption.
- Categorize the impacts: The simplistic formula asks us to posit probability, without stating the specific impact we would refer to. Thus, we must assume the worst case, i.e., total destruction. That is indeed one category of impact, but so are:
- Inaccessibility (the resource exists, but we cannot get to it) -- Consider offices on the 50th floor when the elevator does not work.
- Unavailability (the resource exists but is rendered inoperable) -- Consider hacks that stop Internet websites.
- Unusability (the resource exists, but it is malfunctioning) -- Consider a Voice over Internet Protocol telephone system if Internet connectivity is lost.
- Incapacity (the resource exists and functions as expected but not at a sufficient level) -- This usually occurs at a gradual pace, but consider a computer virus that slows a network to a crawl.
Each of these categories might be adjusted somewhat to fit the circumstances of a particular resource. It is not clear how unusability, for instance, might apply to people.
- Scale the categories: Each of the impact categories might occur at different levels. For example, the total destruction (i.e., death) of critical personnel is one of those unpredictable events. However, the range could be expressed as the death of all critical people or the death of a single individual. Particularly for large populations, the risk of losing everyone is not credible, while the loss of just one approaches a certainty in any given time period. Even total loss may be credible if one is concerned about nuclear attack, a classically unpredictable event. The same might be said about the loss of some versus all data, raw materials or workplaces. The scale might be expressed differently for certain resources, but the concept remains the same for all.
- Determine the credibility of each level of risk: Note that in scaling the impact categories, the test is credibility, not likelihood. Those levels not considered credible should be eliminated, leaving only the risks that might occur to respective resources. At this point, management can begin to determine the investments it wishes to make to mitigate the risks. Note that the investment may be differentiated based on management's perception of relative credibility of each level, the "fuzziness" in risk measurement. Note also that some outlay is required for all credible risks, even if the risk is accepted. In that case, the piper must be paid when the tune is called.
- Consider frequency of occurrence: Aha! Here, probability seems to be creeping in through the back door. While this is to an extent true, consideration of frequency comes in only at the end, not the beginning, of the measurement. Moreover, it might be expected that some risks, while credible, would have less impact on an individual basis but might occur more often. For example, some equipment somewhere is going to malfunction on a regular basis. Recognizing this, management institutes preventive maintenance. It is the high-impact, low-frequency events (i.e., catastrophes) that seemingly absorb most of the business continuity budget, until the measurement of the totality of risk is considered.
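The principles above can be sketched as a simple record-keeping exercise. This is one possible way to organize the work, not a prescribed method: the `Assessment` structure, the "rare"/"recurring" labels and the sample entries are all assumptions made for illustration. Credibility is recorded as a management judgment (true or false), not as a probability, and frequency is looked at only after the non-credible levels have been dropped.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    """Impact categories named in the text."""
    DESTRUCTION = "destruction"
    INACCESSIBILITY = "inaccessibility"
    UNAVAILABILITY = "unavailability"
    UNUSABILITY = "unusability"
    INCAPACITY = "incapacity"


@dataclass
class Assessment:
    resource: str            # the critical resource, not the threat to it
    impact: Impact           # category of impact
    level: str               # point on the scale, e.g. "one person" or "all staff"
    credible: bool           # management's judgment, not a likelihood
    frequency: str = "rare"  # considered last: "rare" or "recurring"


# Hypothetical assessments, invented for illustration.
assessments = [
    Assessment("critical personnel", Impact.DESTRUCTION, "one person", True, "recurring"),
    Assessment("critical personnel", Impact.DESTRUCTION, "all staff", False),
    Assessment("working premises", Impact.INACCESSIBILITY, "one floor", True),
    Assessment("data network", Impact.INCAPACITY, "whole network", True, "recurring"),
]


def credible_risks(items):
    """Eliminate the levels that management does not consider credible."""
    return [a for a in items if a.credible]


def by_frequency(items):
    """Group the remaining risks by frequency, the final step of the process."""
    groups = {}
    for a in items:
        groups.setdefault(a.frequency, []).append(a)
    return groups


remaining = credible_risks(assessments)
groups = by_frequency(remaining)
for frequency, items in groups.items():
    for a in items:
        print(f"{frequency}: {a.resource} / {a.impact.value} / {a.level}")
```

The output is a list for discussion, not a score. Deciding how much to invest against each remaining entry is still a management decision.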
In the end, risk measurement is a process, not a formula. Moreover, it is an unending process, because the profile of risk changes at an unpredictable pace. That is why risk management is a continuous process as well.
Steven J. Ross, MBCP, CISSP, CISA, is founder and principal at Risk Masters Inc. Write to him at email@example.com.