
An Enterprise Guide to an Effective Disaster Recovery Plan

March 5, 2019

Author: Lightedge

The level of risk from any particular threat varies widely across organizations, based on business type, location, and other factors. However, the danger of the “average” business facing some business disruption in any given year is far from insignificant.

The U.S. Bureau of Labor reported that 93 percent of companies that experience a data disaster go out of business within 5 years, and that 95 percent of all organizations experienced a data outage in the past year.

With these staggering numbers, it’s no longer a question of whether or not you should put a disaster recovery (DR) plan into place; it’s a question of how. Though costs vary significantly by company size and industry, the average cost of a data center outage has been estimated at $5,000 to nearly $8,000 per minute. Over a prolonged outage, the losses can be catastrophic.
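To make the per-minute figures concrete, here is a minimal sketch that multiplies an outage duration by the low and high estimates quoted above. The function name and the 90-minute outage are illustrative assumptions, not figures from any report, and the high figure is treated as a rough upper bound.

```python
# Per-minute cost range quoted in the text above (US dollars).
LOW_COST_PER_MINUTE = 5_000
HIGH_COST_PER_MINUTE = 8_000  # the text says "nearly $8,000", so this is a rough ceiling


def outage_cost_range(minutes: int) -> tuple[int, int]:
    """Return the (low, high) estimated cost of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * LOW_COST_PER_MINUTE, minutes * HIGH_COST_PER_MINUTE


if __name__ == "__main__":
    low, high = outage_cost_range(90)  # hypothetical 90-minute outage
    print(f"Estimated cost: ${low:,} to ${high:,}")
    # prints: Estimated cost: $450,000 to $720,000
```

Your own per-minute figure will differ, so treat the constants as placeholders to replace with numbers from your business impact analysis.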

Surprisingly though, 20 percent of organizations with 100 to 5,000 employees do not maintain any disaster recovery (DR) solution, according to Infrascale’s “Disaster Recovery as a Service Attitudes and Adoption Report,” leaving those companies vulnerable to extended downtime and loss of business.

That’s exactly why we created this guide. This resource outlines everything you need to know about creating an effective disaster recovery plan. Inside you’ll discover:

Why creating a disaster recovery plan is critical for every organization
The difference between backups and disaster recovery
How the cloud has optimized disaster recovery
The step-by-step process for creating your own plan
How to test that plan to ensure it’s continually being optimized
And much more…

Now let’s begin.

Backups Are Not Disaster Recovery

We often read the phrase “Backup and Disaster Recovery” as if the two things were inextricably linked. In a way, they are, but backup is not disaster recovery.

While you can’t have disaster recovery without having backups, you can back up your data without having a disaster recovery plan in place. Although relying on backups alone might seem cheaper and easier, it’s a dangerous idea and one that could be very costly in the long run.

Backup vs. Disaster Recovery

The term “backup” seems simple enough to most of us. It’s the process of storing copies of your data in case some failure, machine or human, causes your primary data to disappear or become corrupted. Data recovery happens all the time. Whether an employee deletes a file in error, or something happens that requires you to reload your data, the backup serves as the information that you restore.

But in order to restore data, you need to have a place and an environment where the data can reside. That’s where disaster recovery comes into play: it applies when you have lost your IT environment. A disaster can be something as large as a hurricane that wipes out your entire data center or as small as temporarily losing power or connectivity to your servers or primary site.

A disaster recovery plan enables you to restore functionality and access to your data and systems via a secondary environment and then transfer it all back to your primary environment after the disaster has ended.

The fundamental difference between backup and disaster recovery is pretty straightforward:

Backup is the process of saving your data in a secure location (onsite or offsite) to restore a working environment when you need it.
Disaster recovery is a larger process that replicates your entire computing environment (data, systems, networks, and applications) as part of your business continuity plan and restores it all after the crisis has passed.

The goal of both processes is to ensure that you don’t lose valuable data and that you have the ability to restore it should something impact it negatively. Thanks to current technology and the use of virtual machines and cloud computing environments, data backup can be done concurrently with the replication that is necessary for a good disaster recovery environment.

3 Ways the Cloud Has Improved Disaster Recovery

Before virtualization via the cloud, a backup and disaster recovery environment would have had to be a complete physical copy of the one you were running in your primary site, whether that was just one server or an entire data center.

The data that had been backed up would need to be manually loaded from tapes or other media. Today, cloud-based disaster recovery offers several advantages over that approach:

1. Reduced Recovery Time

In the past, disaster recovery meant physically storing data on backup tapes offsite, then transporting those tapes from the storage facility and reloading them onsite. The advent of virtualization and cloud computing made it possible to transfer and store backups over the network instead.

An entire server, including software, applications, and auxiliary data, can be launched on a virtual host in a matter of minutes, as opposed to the days or even weeks that it takes traditional backup systems to recover. Without having to reload each independent server component, businesses can reduce their recovery time objectives (RTO), or at least meet the ones they’ve set in place.

2. DR Is Cost Effective

With traditional disaster recovery, capital expenses such as tape storage, transportation costs to a secure facility, and space to store backups were necessary. The time it takes to transport and restore a system represents higher resource consumption and downtime as well. With cloud disaster recovery, the total cost of ownership is lower, and its higher performance translates to increased savings when a disaster does strike.

3. Increased Flexibility and Options

In today’s marketplace, scalability is essential and having the flexibility to increase or reduce capacity quickly has real value. The cloud allows businesses to scale both up and down.

This type of flexibility is also useful when deciding whether you want to back up to and restore from the cloud, or whether you want to back up from and restore to the cloud. Some application bundles may have much more stringent recovery point objectives (RPOs) than others, so the ability to pick and choose what level of service each application bundle may need is also of great value.

Creating a Highly Effective Disaster Recovery Plan

Now that we have a clear understanding of what a disaster recovery plan is, let’s take a closer look at the step-by-step process for creating your own highly effective DR plan.

An important thing to remember is that a business continuity or disaster recovery plan is a living, breathing document. This is not a “one and done” effort or expense. As your systems, software, and personnel change, your disaster recovery documentation needs to be updated and maintained accordingly.

Creating a plan is only the first step in the process. The five primary steps of the life cycle are:

Development
Implementation
Testing
Evaluation
Maintenance

Developing the plan requires an in-depth analysis of your business processes, data needs, and technology infrastructure. Implementing that plan happens in stages. Some stages, such as backing up data, have to occur regularly. Other aspects of the plan won’t go into effect until or unless a disaster occurs.

Testing to ensure that data and systems can actually be restored during an emergency needs to happen at least annually, if not more often. As changes take place, such as updating systems or adding personnel or new software, you should evaluate your plan’s effectiveness and make sure the plan reflects the changes. Quarterly assessments are recommended, at a minimum.

Maintenance is key to ensuring that all necessary information is up to date. If your plan doesn’t reflect changes that have occurred to your organization or your infrastructure, it will doom your recovery and continuity efforts.

Make Planning a Business Priority

When you’re planning your yearly budget, the fires already burning always get higher priority than those that haven’t started yet, and arguably might not happen. As a result, disaster recovery is often prioritized lower than other immediate needs in operations, marketing, or infrastructure.

But the cost of deferring spending on creating and implementing a comprehensive disaster recovery plan can be disastrous financially and may even be at the expense of your business, as the previous statistics attest to.

How long could your business survive without any access to your data, systems, and ability to do business with your customers?
What happens if you lose that data entirely and have to rebuild from scratch—could your company remain in operation if you don’t have a plan?

Your disaster recovery plan is just like any other insurance coverage your company invests in. You may not need it tomorrow, next week, or even next month. But when you do need it and don’t have it, the results are catastrophic.

Elements of a Good Disaster Recovery Plan

A good DR plan document will be detailed, kept up to date with current information, and accessible by anyone who needs to refer to it in the case of a disaster. The elements will vary according to your company’s structure, vertical, and what services you have supported by partners like your MSP or cloud provider.

The following is a short list of essential elements that you need to include.

Communication and Roles

Who does what and how to get hold of people are the two most essential needs in the immediate aftermath of a disastrous event. Contact information for all employees and providers essential to DR needs to be kept up to date and readily accessible. Also, each team’s and team member’s role in an emergency must be clearly outlined.

Schematics

Diagrams of the equipment, infrastructure, and data flow are an essential part of any necessary restoration or rebuilding.

Systems and Asset Inventory

The Systems and Asset Inventory covers physical assets, like servers and laptops, as well as agreements with providers and vendors. If you’re outsourcing your primary IT and data to an MSP, you will have a shorter list of actual assets, but will need to know exactly what your agreement provides.

Application Dependencies and Prioritization

Detailing which applications interact with others is essential to the plan. You should list the application you need to restart first, identify the apps that are mission-critical and those you can delay restarting, as well as the level of priority to recover each.

Once determined, these should be outlined in both your internal and external Service Level Agreements (SLAs). You will also need to have a step-by-step roadmap for your administrators to follow, so that systems like point-of-sale payment or customer-facing applications are restored quickly, while those that can be delayed slightly are moved lower down on the list.
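One way to turn a dependency list into a restart roadmap is a topological sort, which orders applications so that each one starts only after everything it depends on. The sketch below uses Python’s standard `graphlib` module (Python 3.9 or later); the application names and their dependencies are hypothetical placeholders, not a recommended architecture.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each application lists what must be running first.
DEPENDENCIES = {
    "database": set(),
    "auth-service": {"database"},
    "point-of-sale": {"database", "auth-service"},
    "reporting": {"database"},
}


def restart_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every application starts after its dependencies.

    Raises graphlib.CycleError if two applications depend on each other.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, app in enumerate(restart_order(DEPENDENCIES), start=1):
        print(position, app)
```

A topological sort only guarantees a valid order, not a unique one, so business priority (for example, point-of-sale before reporting) still has to be layered on top.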

RTO and RPO

Recovery Time Objective (RTO) is the “deadline” in a disaster recovery situation. It’s determined by evaluating how quickly your system needs to be back online when something goes wrong. Your backup/replication strategy and schedule will determine how recent the data you are restoring will be.

You want to make sure the latest backup is not older than the Recovery Point Objective (RPO) you have set. An RPO is the maximum targeted period in which data might be lost from an IT service due to a major incident. Think about the potential re-work required when you determine yours. It will be different for every business and sometimes different for individual applications.
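The check described above, that the latest backup is not older than the RPO, can be sketched in a few lines. The function name, the timestamps, and the four-hour RPO are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def backup_meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the most recent backup is no older than the RPO."""
    return now - last_backup <= rpo


if __name__ == "__main__":
    now = datetime(2019, 3, 5, 12, 0)
    rpo = timedelta(hours=4)  # hypothetical target

    # Backup taken 2.5 hours ago: within the RPO.
    print(backup_meets_rpo(datetime(2019, 3, 5, 9, 30), now, rpo))  # True
    # Backup taken 6 hours ago: older than the RPO.
    print(backup_meets_rpo(datetime(2019, 3, 5, 6, 0), now, rpo))   # False
```

In practice a check like this would run on a schedule and raise an alert when it returns False, with a separate RPO per application where needed.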

Regulatory Compliance

After disaster recovery events, most industries have regulatory obligations regarding reporting, documentation, and future protection against further instances. Whether HIPAA, Sarbanes-Oxley, or PCI DSS, if your business is subject to regulations which require reporting after an outage or breach, this is a must-include item.

7 Steps to a Successful Disaster Recovery Plan

The purpose of disaster recovery planning is preparing your business to withstand a disaster and to be able to recover quickly with the least possible damage. Planning for the unexpected, whether it’s a technical failure, violent weather, cyberterrorism, or human error, helps ensure that business remains up and running, even amid the most extreme challenges.

A DR plan specifically addresses the processes your company will use to recover access to the software, data, hardware, etc., needed to resume your standard, business-critical functions. Your DR plan should provide for redundant data center infrastructure, like servers, software, network connections, and storage to support your applications and enable your operations to function effectively.

Here are 7 steps to lead you through your disaster recovery planning process:

Step 1: Business Impact Analysis, RPO, and RTO

Conduct a Business Impact Analysis (BIA) to identify your most critical systems and processes, as well as the effect of their malfunction. A BIA will determine the functions or activities in your organization considered essential and those which are non-critical.

Critical functions include any business activity that’s mandated by law, fulfills a financial obligation, maintains cash flow, safeguards an irreplaceable asset, or plays a key role in maintaining market share.

Once you have identified which processes are essential, you will assign the following metrics to calculate your company’s level of tolerance for loss and the target time you set for recovery after a disaster has struck.

Recovery Point Objective

The first, your Recovery Point Objective (RPO), is focused on data and your company’s loss tolerance in relation to your data. RPO is determined by looking at the time between data backups and the amount of data that could be lost in between backups.
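Because the data at risk is whatever changed since the last backup, the longest gap between consecutive backups is a reasonable first estimate of worst-case loss. The sketch below computes that gap from a list of backup timestamps; the function name and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_loss_window(backup_times: list[datetime]) -> timedelta:
    """Return the longest gap between consecutive backups."""
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backup timestamps")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # Hypothetical backup history for one day.
    history = [
        datetime(2019, 3, 5, 0, 0),
        datetime(2019, 3, 5, 6, 0),
        datetime(2019, 3, 5, 12, 0),
        datetime(2019, 3, 5, 22, 0),
    ]
    print(worst_case_loss_window(history))  # 10:00:00
```

If the longest gap exceeds the RPO you have set, either the backup schedule or the objective needs to change.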

Recovery Time Objective

The second, your Recovery Time Objective (RTO), is the target time you set for the recovery of your IT and business activities after you’ve experienced a disaster. The goal of the RTO is to calculate how quickly you need to recover, which then dictates the type of preparations you need to implement and the budget you should allocate toward business continuity.

If, for example, you find that your RTO is five hours, meaning your business can survive with systems down for this amount of time, then you will need to ensure a high level of preparation and a larger budget to make sure that you will be able to recover your critical systems quickly.

On the other hand, if the RTO is two days, then you can probably budget less and invest in less advanced solutions.

You must define your acceptable recovery time. How quickly you must restore your data and critical systems to resume operations is a serious decision. Understanding how long you can wait to access and apply your data will yield clarity about which solution—data center, cloud, onsite, or Disaster Recovery as a Service (DRaaS)—is best for your company. (More on that in Step 5.)

Step 2: Risk Assessment

With this business impact analysis in place, you can establish and set priorities as part of your disaster recovery plan by conducting a risk assessment. The risk assessment is a vital step in the DR planning process: it identifies potential hazards, pinpoints high-value assets such as customer information and other sensitive data, and shows how both align with critical business functions.

As you develop your DR plan, and as part of the risk assessment, you must be able to answer the following questions:

What types of hazards or disasters (man-made or natural) could occur to disrupt the business?
How could each of these disasters impact the IT functions the business relies on to operate?

The greater the potential impact, the greater the resources that should be allocated to restore a system or process. While you may never be able to plan for all contingencies, it’s imperative to have solutions for the most critical functions that are at risk in a disaster.
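A common way to compare hazards is to score each one as likelihood multiplied by impact and rank the results. This is a general risk-matrix convention rather than a method prescribed by this guide, and the 1-to-5 scales and sample hazards below are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        """Simple risk score: likelihood times impact."""
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings; real values come from your own assessment.
    hazards = [
        Risk("Power outage", likelihood=4, impact=3),
        Risk("Hurricane", likelihood=1, impact=5),
        Risk("Ransomware", likelihood=3, impact=5),
    ]
    for risk in rank_risks(hazards):
        print(f"{risk.name}: {risk.score}")
```

The ranking gives a starting point for deciding where recovery resources go first; it does not replace judgment about hazards that are rare but business-ending.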

Step 3: Establish Priorities

To establish priorities, assemble an appropriate team for your impact analysis, keeping in mind that everyone thinks their area of responsibility is the most important. Gather leaders from IT and various divisions to make the hard decisions about the real operational priorities.

Your disaster recovery plan will only be as good as your answers to the following:

What applications and infrastructure must be restored immediately if disaster strikes?
What is essential for productivity?

One strategy is to divide your applications into levels or tiers.

Tier 1 should include the mission-critical applications you need immediately.
Tier 2 covers applications you need within 12 to 24 hours.
Tier 3 includes applications that can wait to be restored for a few days.
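The tiers above can be assigned mechanically once each application has a tolerable downtime. In this sketch the cutoffs are assumptions: anything that must be back in under 12 hours is treated as Tier 1, 12 to 24 hours as Tier 2, and everything else as Tier 3. The application names and hours are hypothetical.

```python
def assign_tier(max_downtime_hours: float) -> int:
    """Map an application's tolerable downtime (in hours) to a recovery tier."""
    if max_downtime_hours < 0:
        raise ValueError("max_downtime_hours must be non-negative")
    if max_downtime_hours < 12:   # assumed cutoff for "needed immediately"
        return 1
    if max_downtime_hours <= 24:  # needed within 12 to 24 hours
        return 2
    return 3                      # can wait a few days


if __name__ == "__main__":
    # Hypothetical applications and tolerable downtime in hours.
    applications = {"point-of-sale": 1, "email": 18, "archive-search": 72}
    for name, hours in applications.items():
        print(f"{name}: Tier {assign_tier(hours)}")
```

Adjust the cutoffs to match the recovery targets your leadership team agrees on.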

In addition to data and information systems, your risk assessment should focus on communications infrastructure, communications strategy (both internal and external), secure access and authorization to critical systems, and re-establishing a suitable work environment.

Avoid this mistake: Do not fail to consider the needs of the people who will be carrying out your disaster recovery plan—usually under severe stress. Establish an emergency chain of command and communication strategy, so everyone is in the loop. Also, make sure food and sustenance are readily available, and provide lodging when necessary.

Step 4: Ensure Adequate Resources

Managing disaster recovery on your own requires significant investment in capital, time, and expertise. Even resource-rich companies have to decide how much internal effort to focus on disaster recovery planning vs. growing the business.

Many companies choose an experienced partner to help disaster-proof their systems. A vendor can bring expertise and a programmatic approach to ensure your disaster recovery solutions meet the needs of your business and your IT capabilities.

Disaster recovery experts advise that backup data be kept offsite in a secure location, preferably a data center that is unlikely to be affected by the same disaster. Modern technology also offers the option to secure your organization’s data and critical applications in a hosted cloud environment. Either option allows applications and data to be delivered on demand.

Step 5: Choose the Right Data Center

You’ll want to confirm that your data center provider complies with industry-recognized standards and certifications to ensure your data is secure.

Types of data center compliance can include SSAE 18, HIPAA, and PCI DSS. Because you are relying on the facility to keep your equipment safe from disasters, these standards verify that the colocation provider has the proper physical and administrative safeguards. That way, if a disaster does occur, your equipment and data are far more likely to remain unaffected.

Ask the following questions:

Is your data center facility remote?
Does your facility have adequate redundancy?
How do you secure your facility?
What certifications or audits have you undergone to prove your compliance?

When it comes to protecting your data and operations as a whole, there are numerous options. These include:

Public Cloud

Online file backup services are a popular choice for consumers. They offer shared resources and the ability to pay only for services and resources needed, with no investment in server or networking hardware required. While this might sound attractive, the public cloud comes with inherent security risks. If you operate in any industry that must adhere to compliance standards, the public cloud may not be your best choice.

Colocation

With colocation, SMBs purchase their hardware but install it in a physically separate, secure, specialized location that offers protection from both natural and human-caused disasters, and which also provides redundant power and connectivity options. Colocation comes with plenty of business benefits that fall outside of the scope of this post.

Hybrid Cloud

A hybrid approach enables businesses to leverage multiple platforms and services to fit their unique business continuity and disaster recovery needs, such as a combination of colocated servers and equipment, public and private clouds, and managed hosting services.

DRaaS

Disaster Recovery as a Service (DRaaS) is perhaps the simplest approach from the customer’s perspective. A managed hosting provider supplies continuous and fully automated replication of data and applications from a primary site to a target site, often in a different geographic region.

Today’s DRaaS solutions enable businesses of all sizes to cost-effectively and efficiently protect critical systems and data in the event of a disaster. The need for complex and time-consuming manual DR processes has been replaced with fully orchestrated, automated failover and failback of systems and applications.

Additionally, DRaaS solutions give companies the ability to non-disruptively test and verify their DR plan, which is crucial. And, very importantly, DRaaS allows businesses to achieve extremely low RPOs/RTOs, thereby speeding the recovery time of critical applications and ensuring valuable data stays protected. The end result: costly downtime and data loss are mitigated, and the business and financial impact of a disaster is minimized.

Step 6: Think Beyond Data

If you want to keep your business up and running even in the event of a disaster, you’re going to have to back up more than just data. Be sure to have safeguards for operating systems and applications (and their licenses) or any other essential cogs in your daily business operations.

Additionally, don’t forget to have backup contingencies for your laptops and mobile devices. Suppose your business must set up shop outside your office to keep things going. You’ll need the resources to get the job done at whatever location you end up. Fortunately, in today’s computing age, cloud-based technology enables the remote worker, allowing you to access information on the fly. That is, assuming the data itself remained unharmed should such a scenario arise.

Step 7: Test and Update Accordingly

Take a long, honest look at your organization’s disaster recovery initiatives. Does your organization follow best practices and schedule regular drills to test your disaster preparedness? Or, can your business improve its disaster recovery testing?

Without consistent testing and optimization, disaster recovery remains a technological hypothesis. It likely does not account for the contingencies of a real emergency. For companies that never test a disaster recovery plan or only test it once every few years, unproven recommendations could undermine the entire disaster recovery process.

For that reason, we’ve dedicated an entire section of this guide to testing your DR plan.

Testing Your Disaster Recovery Plan

Disaster recovery plan testing means carefully analyzing all of the redundancies and recovery systems currently in place. While monitoring and maintaining systems play a significant role in disaster recovery, neither constitutes accurate testing.

A real test involves running a simulation of a real-world scenario to ensure the business can continue in an emergency. It means identifying individual components of the disaster recovery plan, determining optimal outcomes, and then creating an environment to run the test and measure results.

Testing frequency varies among organizations and depends on many factors. Consider data sensitivity, your company’s need for a fast recovery, and the number of technological changes the company makes per year to create the right testing schedule. More testing usually facilitates a better, faster disaster response.

How Often Should Your Business Test Your Disaster Recovery Program?

According to the ninth annual joint Forrester and Disaster Recovery Journal “State of Enterprise Risk Management” survey, 40 percent of respondents report conducting annual simulations to test their disaster recovery response plans.

The report says that 27 percent of the respondents test more than once per year, 21 percent test every two years and 11 percent reported never having tested their DR plans at all. Without regular testing and verification of DR plans, organizations have no idea as to whether or not they actually will be able to recover from a disaster or extended outage.

Just as disturbing is the estimate from the DR Preparedness Council which indicates that three out of four companies worldwide fail at DR preparedness, in large part, due to a lack of testing. Additionally, of the companies worldwide that test their DR plans, a massive 65 percent do not pass their tests.

So how often should you be testing your disaster recovery plan?

Chances are the answer is going to be “more often than you are testing now.” There truly isn’t one testing frequency to suit every organization’s needs, but best practices indicate the more DR testing companies conduct, the better prepared they ultimately end up.

Of course, there’s a point at which this may become unwieldy. If you’re spending more time on DR than your systems are worth, then you’ve got a problem, but for the most part, businesses can never get enough disaster recovery testing.

Common Shortcomings in Disaster Recovery Plans

It’s important to understand where to begin to look when testing. As organizations test their plans and optimize their systems, some of the most common shortcomings include:

Underestimating true recovery time

Ideally, a failover will take over the moment a system glitches, and it will create seamless business continuity. However, disasters bring unexpected outcomes. Consider the true recovery period of weather events, cyberattacks, and system glitches, and test with the goal of reducing recovery times in any situation.

Failing to involve all recovery team personnel in testing

If only the IT department conducts disaster recovery tests, it alone will understand the recovery process. In the event of a disaster, leaders, legal advisors, IT personnel, and other employees may play a role in recovery. Include them in testing procedures.

Failing to go beyond regulatory compliance

Certain industries have mandates that require disaster recovery planning (e.g., healthcare organizations must meet the HIPAA Contingency Plan standard in the Administrative Safeguards section of the Security Rule). While regulatory requirements are essential, prudent DR plans go beyond compliance to address all security threats.

Best Practices for Disaster Recovery Testing

Testing may involve penetration testing, employee phishing scam simulations, and backup system access testing. For each test, consider the following tips and best practices:

Always use a script and keep thorough DR documentation

A step-by-step plan of action creates accountability and allows you to act swiftly. It also enables the recovery team to respond according to pre-approved policies, regardless of stress or confusion. Use a detailed script for every test to create and account for true-to-life response activities.

This document will evolve as you identify better solutions and correct ambiguities. Furthermore, good documentation prevents issues associated with “tribal knowledge”, meaning solid documentation about systems and processes prevents confusion when there’s internal turnover.
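A scripted test is easier to evaluate when each step is timed and the total is compared against the RTO. The sketch below is one minimal way to do that; the step names are hypothetical and the step bodies are empty placeholders standing in for real recovery actions.

```python
import time
from typing import Callable


def run_dr_test(steps: list[tuple[str, Callable[[], None]]], rto_seconds: float) -> bool:
    """Run each scripted step in order, log its duration, and
    return True if the total elapsed time is within the RTO."""
    total = 0.0
    for name, action in steps:
        start = time.perf_counter()
        action()
        elapsed = time.perf_counter() - start
        total += elapsed
        print(f"{name}: {elapsed:.2f}s")
    print(f"Total: {total:.2f}s (RTO {rto_seconds:.2f}s)")
    return total <= rto_seconds


if __name__ == "__main__":
    # Placeholder steps; replace each lambda with the real recovery action.
    script = [
        ("Notify recovery team", lambda: None),
        ("Fail over to secondary site", lambda: None),
        ("Verify application access", lambda: None),
    ]
    passed = run_dr_test(script, rto_seconds=5 * 60 * 60)  # hypothetical five-hour RTO
    print("Within RTO" if passed else "RTO missed")
```

Keeping the per-step timings from every test run also shows which steps are getting slower as systems change.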

Go beyond basic data and application transference

For example, an off-site storage facility may retain business applications and datasets but require validation before user access, or a backup system may need to receive pings from a different IP address before it grants users access. A seemingly small, back-end oversight like this could stall the entire recovery.

Simulate common disaster scenarios for true-to-life testing

Work with security professionals to test systems in a simulated environment. Conduct specific penetration tests from internal and external endpoints and consider all potential vulnerabilities from a holistic viewpoint.

Partner with experts

Most businesses cannot afford to manage a department dedicated to disaster recovery activities. To improve the quality of testing environments and reduce the stress of internal disaster recovery planning, collaborate with a third-party organization focused on disaster recovery, backup, testing, and maintenance.

Evaluating a Strategic Partnership with a Provider

Having a good partner in times of crisis, one who is not trying to handle any aspects of business other than backup and disaster recovery and who offers reliable expertise in this area, can be a life raft amidst the flood of emergencies that spring up following a disaster.

You’re likely already using other forms of cloud computing in your business and adding disaster preparedness to the equation is just the next step in your business evolution. Once you’ve performed a usage assessment, partner with a managed IT services provider to store your most critical systems off premises, in the cloud.

An experienced provider can offer assistance with this while providing ongoing monitoring to ensure data is being regularly backed up and always secure. With the help of an experienced managed services provider, you’ll be in a good position to ride out any situation that might come your way.

By choosing a robust suite of management services, you can save yourself a lot of time and stress in the event of an emergency, be it due to inclement weather or a cyberattack. Lightedge’s services, for example, provide comprehensive data backup and management. In a data-based economy, preparing for the worst goes beyond getting the lights back on after a natural disaster or failed equipment; the value of a managed service provider (MSP) extends beyond the peace of mind you gain in knowing your data is protected.

Benefits of Partnering with an MSP

Advantage #1: Flexible, Scalable Solutions

Every business has different needs, from protecting sensitive healthcare data to safeguarding financial and credit card data, and all businesses strive for flexible solutions. If you’re trying to maximize your investment in legacy machines, you can still take advantage of data management and data recovery without needing to upgrade equipment.

Similarly, you can build a customized portfolio of services tailored to your business’s needs, from keeping legacy equipment at your facility to colocating at an MSP’s facility, to adding virtualized environments. Managed hosting can eliminate the need to invest in server equipment and infrastructure as well as the management and maintenance of those servers.

Advantage #2: Building Compliance into Managed Services

Private and sensitive data management shouldn’t be left to chance, so industries dealing with sensitive customer data should pay close attention to their MSP’s compliance protocols and take a proactive approach.

In many cases, MSPs that specialize in security and compliance can help you prevent breaches from happening, mitigating both a data emergency and a PR nightmare. Investigate your MSP’s record with third-party auditing, certifications, references, and ask questions. For instance, find out how they can help you prepare for an audit and how they can adapt to your future needs.

Advantage #3: Onsite Maintenance

The perfect MSP will be a seamless extension of your in-house IT department, managing your security long after your people have left the office. No matter the scale of your MSP needs, managed services allow you to focus on your business, while they focus on maintaining uptime for your infrastructure.

For instance, data center providers offer onsite maintenance through a team of engineers that monitor, maintain, fix, and update everything associated with your network, from the hardware to the software that runs your critical programs to the facilities in which the hardware resides.

Advantage #4: Location Flexibility

Mother Nature is always unpredictable, and weather experts agree climate change is not slowing down. A number of industries have already made changes to address threats, and data-dependent businesses should take note. Companies vulnerable to natural disasters such as hurricanes, flooding, or earthquakes must separate and back up their data to a secondary site for maximum protection and redundancy.

If you suffer damage to your physical business, your data and operations will remain unaffected. In the event of a natural disaster, your data is safe on your provider’s servers, allowing you to focus on more immediate concerns like communicating with your stakeholders.

Advantage #5: As-a-Service Model Takes Out the Guesswork

While anyone can buy the equipment and plan for the disaster, an MSP can implement Disaster Recovery as a Service (DRaaS), a streamlined solution for any business that has to keep going through any storm, with near-zero recovery point objectives (RPOs) and recovery time objectives (RTOs).

Next Steps

Disaster recovery plans are living documents. Without testing, modification, and maintenance, they cease to provide relevant information. Prioritize testing to develop a plan that provides value in the wake of a natural disaster or a cyberattack. If you’d like to get a second opinion on your disaster recovery efforts, contact us to discuss your strategy. We’re happy to help.

Disaster recovery plans are not the kind of thing you want to make on the knifepoint of necessity, or worse, after it’s too late. Instead, you have the opportunity to find the solutions your business needs without blowing your budget, especially if you take advantage of Lightedge’s free consultation.

As data management needs evolve, your MSP should be able to evolve along with your business. Get ahead of the game with a management service that keeps you on the crest of changes with high-quality, compliant data management and a comprehensive disaster recovery plan.
