Yogi Berra made a name for himself with his own brand of pithy and paradoxical statements, and the statement above is an excellent example.
You can’t schedule disasters, but you can plan to mitigate their effects. Whether it’s fires and floods, critical accidents by personnel, server crashes, viruses, hacking attacks, or even stock market crashes, you want to have a plan in place for resiliency and recovery. Although often described as “just common sense,” it’s really about taking responsibility and taking business continuity seriously.
What is Business Continuity?
Business continuity (BC) is defined as the capability of the organization to continue delivery of products or services at acceptable predefined levels following a disruptive incident. (Source: ISO 22301:2012)
The Business Continuity Management Guide available on KnowledgeLeader further defines business continuity as the development of strategies, plans, and actions which provide protection or alternative modes of operation for those activities or business processes which, if they were to be interrupted, might otherwise bring about a seriously damaging or potentially fatal loss to the enterprise.
This guide provides an overview of business continuity management along with insight on regulatory and industry frameworks for business continuity management, business continuity methodology and business impact analysis (BIA).
The use of international standards, program development, and supporting policies ensures that proven continuity methods and concepts are in place and in use, supporting an organization’s survival and the continuity of its business activities. There are numerous regulatory and industry frameworks for business continuity management, as shown below:
“The tendency of an event to occur varies inversely with one's preparation for it.”
- David Searles
It is important to clearly document an organization’s business continuity methodology and to make sure that it has full executive support, known tactical ownership, and a complete governance structure. Take a look at the following steps and the structure chart below to consider how well your organization is prepared.
- Determine what systems and business processes need to be recovered and why
- Derive recovery options (price vs. risk) and identify gaps
- Implement and document solutions
- Train personnel; test and maintain the plans
Of course, spending for business continuity planning is finite; therefore, plans need to focus on recovery mechanisms for the high-risk elements of the organization. Enter the business impact analysis (BIA). The business impact analysis is the single most important component, as it provides the guidance, metrics and purpose for the ongoing development of the business continuity plan.
Supporting technologies, critical business processes and existing recovery plans all factor into the business impact analysis, which produces the information needed to develop risk-based recovery solutions. This will include:
- Recovery time objectives by process
- Financial impact of an outage
- Customer impact of an outage
- Prioritization of recovery steps
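As a rough illustration of how the BIA outputs listed above could feed a recovery priority list, here is a minimal Python sketch. The record fields, the 1-to-5 customer impact scale, the sort rule (shortest recovery time objective first, then largest financial impact, then highest customer impact) and all figures are assumptions made for the example, not part of any standard or framework.

```python
from dataclasses import dataclass


@dataclass
class ProcessImpact:
    """One BIA record for a business process (hypothetical structure)."""
    name: str
    rto_hours: float                 # recovery time objective for the process
    financial_impact_per_day: float  # estimated loss per day of outage
    customer_impact: int             # assumed scale: 1 (low) to 5 (high)


def prioritize(processes):
    """Order processes for recovery.

    Assumed rule: shortest RTO first, ties broken by larger financial
    impact, then by higher customer impact.
    """
    return sorted(
        processes,
        key=lambda p: (p.rto_hours, -p.financial_impact_per_day, -p.customer_impact),
    )


# Illustrative, made-up figures.
bia = [
    ProcessImpact("Payroll", 72, 20_000, 2),
    ProcessImpact("Order entry", 4, 150_000, 5),
    ProcessImpact("Customer support", 24, 40_000, 4),
    ProcessImpact("Invoicing", 24, 60_000, 3),
]

recovery_order = [p.name for p in prioritize(bia)]
print(recovery_order)
```

In practice the ranking rule is a management decision; the point of the sketch is only that the BIA metrics give that decision a consistent basis.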
Once the impact of potential outages has been determined, recovery solutions must be evaluated and selected. Numerous options may exist, ranging from engaging an external recovery (or “hotsite”) provider, to using the resources of an existing owned data center, to modifying current equipment to ensure the necessary redundancy.
The ultimate solution should be based on the potential risk of an outage, the level of risk that management is willing to accept, and the cost constraints faced by the organization.
A good strategy development approach includes logical options for each business function/location, together with the pros/cons of each and their implementation implications. Management can then weigh the cost of the recovery strategy (both implementation and maintenance costs) against the potential cost of the business interruption. Note the Risk Gap identified in the graphic below:
A good recovery option analysis can identify gaps and deficiencies so that they can be reduced or corrected.
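The trade-off between strategy cost and interruption cost can be made concrete with a small calculation. The Python sketch below is one simple way to do it, under stated assumptions: the option names echo the examples given earlier, but every cost, recovery time, the outage cost per hour, the expected number of outages and the recovery time objective are invented for illustration. Treating an option whose recovery time exceeds the recovery time objective as "leaving a gap" is also an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    """A candidate recovery strategy (all figures hypothetical)."""
    name: str
    implementation_cost: float  # one-time cost
    annual_maintenance: float   # recurring cost per year
    recovery_hours: float       # expected time to restore service


def total_cost(option, outage_cost_per_hour, expected_outages, years):
    """Strategy cost plus expected interruption loss over the planning horizon."""
    strategy_cost = option.implementation_cost + option.annual_maintenance * years
    expected_loss = outage_cost_per_hour * option.recovery_hours * expected_outages
    return strategy_cost + expected_loss


# Made-up inputs for illustration only.
OUTAGE_COST_PER_HOUR = 25_000
EXPECTED_OUTAGES = 1   # assumed number of outages over the horizon
YEARS = 5              # planning horizon
RTO_HOURS = 24         # recovery time objective agreed with management

options = [
    RecoveryOption("External hotsite", 50_000, 120_000, 8),
    RecoveryOption("Second owned data center", 400_000, 60_000, 4),
    RecoveryOption("Added redundancy only", 80_000, 10_000, 48),
]

costs = {
    o.name: total_cost(o, OUTAGE_COST_PER_HOUR, EXPECTED_OUTAGES, YEARS)
    for o in options
}
best = min(options, key=lambda o: costs[o.name])
gaps = [o.name for o in options if o.recovery_hours > RTO_HOURS]

print(costs)
print("Lowest total cost:", best.name)
print("Options that miss the RTO:", gaps)
```

A real analysis would also account for the level of risk management is willing to accept, which a single cost figure does not capture.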
Strategies for data recovery are also an important part of business continuity management. Recovery time objective (RTO) drives selection of alternative strategies that enable data restoration anywhere from point of failure (e.g., synchronous mirroring) to multiple days (e.g., traditional “tape” backup).
The implementation cost of a data recovery strategy increases as data loss exposure is reduced. As data loss exposure is minimized, recovery time may also be reduced. See the chart below for cost versus data recovery time:
Business Continuity Plan Development & Documentation
Once an organization understands its strategic business continuity direction, the need to organize and document those solutions into a “living” plan becomes critical. This helps to ensure that the plan can be maintained as the organization changes through time. The end result should be business continuity documentation that has been properly distributed and that contains the necessary detail to quickly recover from an interruption. Also, during this phase, emergency procedures, recovery teams, and incident plans will need to be assembled and documented.
Some additional thoughts based on questions we’ve received:
IT’s primary job in BCM is to support the recovery time objectives of the various business units and processes across the enterprise. As this relates to application teams specifically, if a particular business unit states that it needs to be back up and running in 1 day (and executive management signs off on this), then the application team’s responsibility is to develop a strategy that allows the application supporting that business process to be recovered within the agreed timeframe (e.g., 1 day). It’s important to keep in mind that BCM/BCP has to be decoupled from IT DR. BCM focuses on the people and the processes, and IT DR focuses on the infrastructure. The application team’s BCP should focus on what the people on that team need to do in order to continue supporting the application and bring it back online.
As for the percentage of revenue typically spent on disaster recovery and business continuity, reported figures indicate that companies spend, on average, 2 to 4% of their annual IT budget on IT DR planning. (Note: This doesn’t mean that the infrastructure or strategies they put in place, like mirroring or replication, only cost 2 to 4% of the budget. We are referring only to the planning.) We don’t have good metrics or statistics on BCM spend.
In terms of man hours required for a business impact analysis, a from-scratch BIA could easily take 6-8 hours per business process. To estimate the total effort, multiply that figure by the number of business processes and locations in question.
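The arithmetic is simple enough to sketch in a few lines of Python. The function name is hypothetical, and the example assumes that each business process is analyzed separately at each location; if some processes are shared across sites, the real figure would be lower.

```python
def estimate_bia_hours(num_processes, num_locations, hours_low=6, hours_high=8):
    """Return a (low, high) range of total hours for a from-scratch BIA.

    Assumption: every process is analyzed separately at every location,
    at 6-8 hours per process.
    """
    units = num_processes * num_locations
    return units * hours_low, units * hours_high


# Example: 25 business processes across 3 locations (made-up numbers).
low, high = estimate_bia_hours(num_processes=25, num_locations=3)
print(f"Estimated BIA effort: {low}-{high} hours")
```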
When considering a reasonable distance for a secondary data center (DC), backup facilities should be far enough away that a localized or regional event affecting the primary site wouldn’t also affect the backup DC. For example, if you are in Miami, Florida, you probably don’t want your backup DC to be in Orlando, because a hurricane could easily come through and knock out both locations. Further, you don’t want the primary and backup DCs drawing from the same power grid, if possible. If the company has multiple geographic locations (let’s say one in Florida and another in Texas), it might be best to use those sites as backups to one another, because the geographic dispersion is already there. The downside is that if the backup DC isn’t managed by a vendor or is too far away, recovery could be delayed because personnel may spend too much time physically getting to it.
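For a first-pass screen of candidate sites, the straight-line distance between two data centers can be computed with the haversine formula. The Python sketch below does that. The 500 km minimum separation is a purely hypothetical threshold, Houston is used only as a stand-in for "a site in Texas," and the coordinates are approximate city centers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(primary, backup, min_km):
    """True if the backup site is at least min_km from the primary site."""
    return distance_km(*primary, *backup) >= min_km


# Approximate (latitude, longitude) of city centers.
MIAMI = (25.76, -80.19)
ORLANDO = (28.54, -81.38)
HOUSTON = (29.76, -95.37)   # stand-in for a Texas site (assumption)

MIN_SEPARATION_KM = 500     # hypothetical threshold, not a standard

for name, site in [("Orlando", ORLANDO), ("Houston", HOUSTON)]:
    d = distance_km(*MIAMI, *site)
    verdict = "ok" if far_enough(MIAMI, site, MIN_SEPARATION_KM) else "too close"
    print(f"Miami -> {name}: {d:.0f} km ({verdict})")
```

Distance alone is only a screening check. It says nothing about shared power grids, common storm tracks, or how long it takes staff to reach the backup site, all of which still need to be assessed separately.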
In closing, it’s important to remember that business continuity is a living process that constantly needs to be analyzed and updated. The heart of business continuity management is a cycle of analysis, design, implementation and validation, and the work of risk management is ongoing.
For further reading and resources, you may be interested in the following:
Business Continuity Management topic page
Disaster Recovery topic page
“When written in Chinese, the word 'crisis' is composed of two characters. One represents danger and the other represents opportunity.”
- John Fitzgerald Kennedy