    The importance of disaster recovery testing

    October 27, 2020

    There’s no way you’ll be successful in recovery without testing. Full stop.

    You might think you have a solid plan in place, but there are so many issues that can arise throughout a recovery, from communications to logistics to an array of technical glitches. And if you don’t test your plan ahead of time, it won’t be long before you learn the true meaning of Murphy’s law.

    Whether you’re already a testing evangelist or need to brush up on the importance of disaster recovery (DR) testing, here’s an overview of why testing matters, what can happen without testing, and guidelines for how to approach testing to ensure your systems are ready for recovery.

    Why disaster recovery testing is important

    Discovering your DR plan is ineffective in the midst of a disaster is everyone’s worst nightmare. That’s why testing your DR plan ahead of time and on a regular basis is critical. Rigorous testing is the only way to uncover and fix issues in your plan.

    And trust me, a lot can go wrong when a disaster strikes.

    Communications is a common issue. You’re in the middle of a crisis and have a problem, but don’t know who to call. Or worse, you have the name and number of the person you’re supposed to call, but when you dial the number, you find out that person no longer works for the company.

    Technical glitches run the gamut, but one that I’ve seen derail DR plans again and again is missing data. An absolute requirement for a successful recovery is to have all of your required data in place. Given the complexity and interdependencies among today’s systems, missing even one critical piece of data can leave you unable to bring up your production applications in your recovery environment.
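
    One way to catch missing data during a test, rather than during a disaster, is to compare the recovery environment against a written list of what each application needs. The sketch below is only an illustration: the function name `missing_data`, the idea of a manifest of relative file paths, and the recovery root directory are all assumptions, not part of any particular DR product.

```python
from pathlib import Path


def missing_data(required: list[str], recovery_root: Path) -> list[str]:
    """Return the required relative paths that do not exist under recovery_root.

    `required` is a hypothetical manifest: one relative path per file or
    directory an application needs before it can start in recovery.
    """
    return [rel for rel in required if not (recovery_root / rel).exists()]
```

    An empty result means everything on the manifest is present. Anything returned is a gap to fix in the plan, not something to copy over from production in the middle of the test.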

    DR testing matters because the recovery process is based not only on your recovery procedures but on coordination, collaboration and sequencing. You must maintain those across your storage, network, applications, databases and other platforms.

    You may assume you can get that done, but you need confirmation that a given input will produce the desired outcome. There’s only one way to discover whether that’s true and what pitfalls you might face: testing.

    How often should you test your disaster recovery plan?

    There’s no universal answer here, but there’s a spectrum of testing frequency based on how much downtime your business can afford. If you need your business functions up and running in a week (which would be surprisingly long), testing once per year could be fine. If your systems can never be down, you owe it to yourself to test much more frequently.

    Many companies have a two-day or one-day recovery requirement. If your requirement is 48 hours, you should be testing twice per year. If you have a one-day requirement, you should be in your recovery environment testing once per quarter. Executing tests at that frequency will help you to confirm if anything in your environment has changed.

    Another factor to consider is major changes to your environment, or to internal or external requirements. If you typically test in March and September, but your company is making a change to your processing capabilities at the end of June because of corporate requirements, you should consider a one-off test to make sure those changes are reflected in your DR plan.

    Guidelines for successful disaster recovery testing

    There are a few principles to keep in mind about DR testing, especially in the wake of a major disruption like the COVID-19 pandemic. Take note of these guidelines:

    1. If you have a lapse in testing, don’t make it permanent.

    Testing lapses happen for any number of reasons. Some companies have been taxed by changes during the pandemic and have had to defer tests for a couple of months as they figure out problems they’ve never faced before. That’s fine as long as you get back on schedule with your testing programme.

    But don’t take a temporary shift in priorities as an excuse to forgo testing. Never let a temporary lapse turn into a permanent one.

    2. Make post-COVID adjustments to your recovery centre.

    Organisations have been able to execute DR plans remotely for a while now, so whether your team is working in the office or from home, they should be able to participate in a recovery event.

    You should, however, make sure your team has viable access to the recovery centre if working situations have changed. If you’ve always worked from the office and haven’t set up the configuration from home, that’s an adjustment you’ll need to make.

    3. Don’t fool yourself, but make documented exceptions when testing.

    There’s no reason to fool yourself by going easy on your DR test. If you’re running a test and find out you’re missing a file, your instant reaction might be to go back to production, get that file, and continue with your test. But to ensure you can successfully recover, bringing something forward from production should be an automatic fail because that’s not possible in a true disaster.

    However, you can set out certain documented exceptions to the full plan that you simply can’t execute as part of a test. For example, after a disaster, you’ll want to establish a link to your bank in your recovery system to process payroll. But before a test, you can make the decision to leave that connection to the bank out of the testing process. You don’t want the bank to process payroll during the test, but you do want to take the payroll system as far as you can in creating that file and analysing whether it would match what you would send to the bank.

    Don’t cheat on the test, but make informed, proactive decisions on steps to exclude.

    Testing your disaster recovery plan should be a regular habit

    During a disaster you will face many obstacles and challenges. But with consistent, successful testing, your DR plan doesn’t have to be one of them.

    No one wants to discover an issue when they no longer have the ability to fix it. Testing gives you that chance. You can’t be successful in recovery without it.
