Disaster Recovery - Minimize the Risks

By Vicky McKim, AFBCI, MBCP, CRMP, Risk Management & Business Resilience Director, Aureon Consulting

In every disaster recovery effort, a myriad of things can go wrong and delay the planned recovery sequence. By way of example, here is a list of the things that have delayed my IT recovery efforts in the past:

• Backups were not delivered on time.
• Wrong drives were installed by the recovery service vendor, followed by a two-hour hunt for a screwdriver to fix the problem.
• Operating systems loaded did not match what the backup required or what the agreed test document specified.
• Recovery data center power failed during recovery and the transfer switch to the generator failed.
• Small tree disrupted the satellite link (until we cut down the tree).
• Most knowledgeable person had left shortly before the event.
• Recovery service did not have room for us by the time we made a declaration.

Cloud has improved some of these items, but there are still no guarantees. Yes, we have the ability to transfer data operations nearly seamlessly to any continent we choose, but there are still things that can go wrong. Data replication is not always a silver bullet, largely because of how it is set up. Let me run through a few scenarios that will get you thinking like I do about disaster recovery using the cloud.

What if there is malware not only in your primary production environment, but also embedded in your replicated instances? What if you do not catch it in time and it is in every instance and backup you own? What then? How do you protect yourself when the attackers now control the data? What if phishers gain access to the cloud and no one realizes it, and they are skimming data at such a small rate that your alarms do not detect anything amiss? Suppose they have hidden themselves well and at the last moment, when they have everything they want, they shut down your cloud. You have no data! What does cloud recovery look like now? This is just a sample of why you need to have multiple strategies for system and data recovery.

"To turn disaster into an opportunity and win market share, companies need to have backups to the backups for software, data, people, power and other resources"

More recently, we had an office shutdown from a severe power outage caused by a storm. As we transferred back to commercial power from the generator, which we were having trouble getting fuel for, something went wrong. There was a surge in the commercial power that blew the power panel in the building. This meant no power for us from the commercial source or the generator. We were down four hours until they could get a replacement panel installed. This can happen to anyone, including a data center hosting your virtual servers.

Wisdom says you need to have backups to the backups for software, data, people, power and other resources. Planning should go beyond the common phrase, “It’s replicated, we have got it covered!” There is still a good chance that replication is not sufficient for recovery in a severe event. Cloud providers often replicate to a different disk in the same rack, usually on the same array. Make sure your replication is geo-diverse. Make sure you also have a backup of your cloud data downloaded to a local server, just in case the internet is not available. Relying on a single recovery path is why many businesses end up paying ransomware demands: they planned to recover from backups that were rendered inoperable along with their production equipment.
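For readers who keep an inventory of where their backups live, the two checks in the paragraph above (a geo-diverse copy and a local copy that works without the internet) can be sketched as a short script. This is only an illustration under stated assumptions: the `BackupCopy` record, the `coverage_gaps` function, the region labels, and the "at least two copies" rule are all hypothetical and are not taken from any cloud provider's tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """Hypothetical inventory record for one copy of the data."""
    name: str
    region: str    # where the copy physically lives (label is an assumption)
    offline: bool  # True if the copy can be restored without internet access


def coverage_gaps(primary_region: str, copies: list[BackupCopy]) -> list[str]:
    """Return the recovery gaps found in a backup inventory (empty list = none)."""
    gaps = []
    # Geo-diversity: at least one copy must sit outside the primary region.
    if not any(c.region != primary_region for c in copies):
        gaps.append("no copy outside the primary region")
    # Local fallback: at least one copy must be usable if the internet is down.
    if not any(c.offline for c in copies):
        gaps.append("no local copy usable without internet access")
    # "Backups to the backups": a single copy is a single point of failure.
    if len(copies) < 2:
        gaps.append("fewer than two independent copies")
    return gaps


if __name__ == "__main__":
    # A lone replica in the same region as production fails every check.
    inventory = [BackupCopy("cloud-replica", region="us-east", offline=False)]
    for gap in coverage_gaps("us-east", inventory):
        print("GAP:", gap)
```

A check like this does not prove that a restore will work; it only flags plans that depend on a single location or a single network path.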

Plan as though plan A will fail, expect to use plan B, and know that it too may fail. Assume your most knowledgeable people will be out of the country on vacation and that Murphy S. Law will join your team. If you follow that rule of thumb, you may have a chance of surviving and of turning disaster into an opportunity to win market share, because you will recover faster than your competitors.