How to Conduct a Disaster Recovery Test

Being prepared for disaster recovery is important… but not core to daily activities. So in a typical day, if there are twenty items on your to-do list, disaster recovery is most likely number twenty-one. But testing your disaster recovery plan can make the difference between sailing through a disruption and sinking the ship.

When to test

There is no single correct interval for conducting a disaster recovery test, since every business is unique. What’s important is that your plan is sufficiently up to date to allow you to recover information systems and business functions in the event of an emergency. At a minimum, you should test once a year; you may need to test as often as once a month, depending on how quickly your organization, personnel, technologies or facilities change. Some organizations have never tested their DR plan, while others have tested routinely yet their plans have still failed when needed. The key thing to remember is that a DR plan is a living, breathing document that is only fit for purpose when it reflects the current state of the business and its production environment. Don’t let “configuration drift” catch you off guard: there must be a strong link between DR and change management.
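
One way to make the link between DR and change management concrete is to compare what the plan documents against what production actually runs. The following is a minimal Python sketch, not a definitive tool. The function name find_drift, the system names and the version strings are all hypothetical, and it assumes both inventories can be exported as simple name-to-description mappings.

```python
def find_drift(plan_inventory, production_inventory):
    """Compare what the DR plan documents against what production runs.

    Both arguments are assumed to be dicts mapping a system name to a
    short description (for example, a product and version).
    """
    drift = {}
    for name in sorted(set(plan_inventory) | set(production_inventory)):
        documented = plan_inventory.get(name)
        actual = production_inventory.get(name)
        if documented is None:
            drift[name] = "in production but missing from the DR plan"
        elif actual is None:
            drift[name] = "in the DR plan but no longer in production"
        elif documented != actual:
            drift[name] = f"plan says {documented!r}, production has {actual!r}"
    return drift


# Hypothetical example data, for illustration only.
plan = {
    "db-server": "PostgreSQL 13",
    "web-server": "nginx 1.18",
    "file-share": "SMB v2",
}
production = {
    "db-server": "PostgreSQL 15",
    "web-server": "nginx 1.18",
    "queue": "RabbitMQ 3.12",
}

for system, issue in find_drift(plan, production).items():
    print(f"{system}: {issue}")
```

Any entry this reports is a prompt to update the plan, or to retest, before the next scheduled test comes around.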

During the test

Testing your DR plan means validating your recovery procedures, not simply demonstrating or rehearsing them. It’s also important that the person who creates the DR procedures doesn’t execute them. After all, they may have written mental shorthand into the procedure or devised it to be easy to carry out themselves, either of which can introduce sources of failure. And there’s no guarantee that, at time of disaster, that individual will still be with the company, or that they won’t be on vacation or sick leave. Task redundancy should ensure that at least two people can perform any one activity, to avoid creating a single point of failure.
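
The two staffing rules above (the author should not be the one running the procedure, and at least two people should be able to perform each task) can be checked mechanically. This is a rough Python sketch under stated assumptions: the staffing_gaps function, the procedure names and the people are hypothetical, and it assumes you keep a record of who wrote each procedure and who is trained to carry it out.

```python
def staffing_gaps(procedures, on_leave=frozenset()):
    """Return procedures that break either staffing rule.

    procedures maps a procedure name to a dict with an "author" and a
    list of "trained" people. on_leave is the set of people assumed to
    be unavailable (vacation, sick leave, or no longer employed).
    """
    gaps = {}
    for name, info in procedures.items():
        available = [p for p in info["trained"] if p not in on_leave]
        non_authors = [p for p in available if p != info["author"]]
        problems = []
        if len(available) < 2:
            problems.append("fewer than two available people can perform this task")
        if not non_authors:
            problems.append("no one other than the author can run the test")
        if problems:
            gaps[name] = problems
    return gaps


# Hypothetical example data, for illustration only.
procedures = {
    "restore-database": {"author": "alice", "trained": ["alice", "bob", "carol"]},
    "failover-network": {"author": "dave", "trained": ["dave"]},
    "rebuild-mail": {"author": "erin", "trained": ["erin", "bob"]},
}

for name, problems in staffing_gaps(procedures, on_leave={"bob"}).items():
    for problem in problems:
        print(f"{name}: {problem}")
```

Running the check with different on_leave sets is a cheap way to see which absences would turn a covered task into a single point of failure.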


After the test

Like all products made by fallible human beings, DR plans can have flaws that may not be revealed until a certain set of circumstances aligns. A successful test validates that your DR plan can succeed; a failed test proves that the plan can let you down when it is not current. If a test identifies a defect under relatively ideal, planned conditions, you can be sure this glitch would be amplified at time of disruption, so any kinks need to be ironed out before the plan can be invoked in earnest. Defects or shortcomings should be noted and categorized, their root cause determined, and the issue resolved within an agreed timeframe. The same test should then be performed again with the resolutions in place to determine whether they are effective in eliminating the flaws. This may require several iterations, all of which should be documented.
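
The cycle described here (note the defect, categorize it, find the root cause, resolve it within an agreed timeframe, then retest and document each iteration) maps naturally onto a small record. The Python sketch below is one possible shape for such a record, not a prescribed format. The Defect class, its field names, the 30-day window and the example defect are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Defect:
    description: str
    category: str
    found_on: date
    root_cause: str = ""
    # Each retest is documented as a (date, passed) tuple.
    retests: list = field(default_factory=list)

    def due_date(self, days_allowed):
        """Date by which the defect should be resolved."""
        return self.found_on + timedelta(days=days_allowed)

    def is_closed(self):
        """Closed only with a recorded root cause and a passing latest retest."""
        return bool(self.root_cause) and bool(self.retests) and self.retests[-1][1]

    def is_overdue(self, today, days_allowed):
        """True if the defect is still open past the agreed timeframe."""
        return not self.is_closed() and today > self.due_date(days_allowed)


# Hypothetical example, for illustration only.
defect = Defect(
    description="Restore script points at a decommissioned backup server",
    category="configuration",
    found_on=date(2024, 3, 1),
)
defect.root_cause = "Server retired without updating the DR plan"
defect.retests.append((date(2024, 3, 10), False))  # first fix did not work
defect.retests.append((date(2024, 3, 20), True))   # second iteration passed

print(defect.is_closed())
print(defect.is_overdue(today=date(2024, 4, 15), days_allowed=30))
```

Keeping every retest in the list, including the failed ones, preserves the documented history of iterations rather than only the final result.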

Reducing cost, time and effort

For many, the cost of disaster recovery (and personnel downtime during testing) is a major inhibitor. However, standardization, automation, and virtualization can all drive down cost by mitigating the problem of idle assets, simplifying the test process, reducing the hours required to complete tasks, improving reliability and driving efficiencies. Good documentation not only results in better management but can also help you identify savings opportunities in day-to-day IT operations. For example, why run three separate servers when you could run three virtual servers on one physical piece of tin?

Disaster recovery is a lifecycle, not a one-and-done exercise. Establish an ongoing plan maintenance schedule, including activities such as risk assessments, business impact analyses, plan reviews, change control, contact list updates, and any training and awareness required.
