Disaster recovery (DR) is one of the hottest topics in cloud computing these days. After recent large-scale disasters, both natural (in the Northeast) and man-made (in Texas), disaster recovery is becoming more than just a proverbial checkmark in the box. IT organizations are tasked with providing a true disaster preparedness strategy and an executable plan to recover core systems in the event of a disaster. To that end, there are three areas where organizations should focus their efforts:
- Data Backup (Backup & Archive of Data)
- Business Continuity
- Disaster Recovery
Data Backup and Archive
Even today, backup and archive often involve a tape service. (A 2011 survey of attendees at Spiceworks’ Backup Central Live shows noted that 82 percent of organizations still used tape as their final destination for backups.) Organizations move their data from disk (some use multiple tiers of disk) to tape, and tape vaulting services keep a vaulted copy in case of a disaster. Tapes are rotated and reused until either a standard replacement schedule is reached or the media reaches end of life.
I was a big fan of The Simpsons in college and we never missed an episode. Nightly, I would gather with my friends to enjoy the latest episode using a VCR and a removable tape. Eventually, the quality of the recording would be so bad that you could hear Homer saying “D’oh” but you couldn’t see why.
Backup tape has a similar usable lifetime. Let’s hope that when it’s time to recover those key files for an audit or disaster, the tapes work. Many analysts have estimated the percentage of tape restores that fail. What’s more, while tape backup is considered inexpensive, it can be anything but when you factor in a technician’s time and hourly rate, travel to an offsite storage location, or the cost of a service that takes tapes offsite. When it’s time to recover your critical data from tape, do you want heads or tails?
There are significant cost savings in running storage in the cloud versus onsite. Forrester analyzed the costs of 100TB onsite versus in the cloud and found a whopping 74% savings over traditional storage costs.
Of course, there are many factors to consider: How many years until a storage device is fully depreciated? What API is available? Do you have enough WAN capacity? How many copies are required? Do compliance requirements dictate that you know exactly where your data resides? How many years of data must be retained?
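To make a few of those questions concrete, here is a minimal Python sketch of how depreciation period, number of copies and WAN capacity might feed into an onsite-versus-cloud comparison. Every name and number in it (the purchase price, operating cost, price per TB and link speed) is a made-up placeholder for illustration, not a figure from the Forrester analysis or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnsiteStorage:
    """Hypothetical onsite array; all figures are illustrative assumptions."""
    hardware_cost: float          # purchase price of the array
    depreciation_years: int       # years until the device is fully depreciated
    annual_operating_cost: float  # power, floor space, maintenance, admin time

    def annual_cost(self) -> float:
        # Straight-line depreciation plus yearly running costs.
        return self.hardware_cost / self.depreciation_years + self.annual_operating_cost


@dataclass(frozen=True)
class CloudStorage:
    """Hypothetical cloud storage tier; the price is an assumption."""
    price_per_tb_month: float
    capacity_tb: float
    copies: int                   # how many copies of the data are required

    def annual_cost(self) -> float:
        return self.price_per_tb_month * self.capacity_tb * self.copies * 12


def savings_percent(onsite_annual: float, cloud_annual: float) -> float:
    """Percentage saved by the cloud option relative to the onsite option."""
    return (onsite_annual - cloud_annual) / onsite_annual * 100


def transfer_days(capacity_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to push capacity_tb over a WAN link (decimal units: 1 TB = 8e12 bits)."""
    bits = capacity_tb * 8e12
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 86_400


# Illustrative inputs only.
onsite = OnsiteStorage(hardware_cost=240_000, depreciation_years=4, annual_operating_cost=40_000)
cloud = CloudStorage(price_per_tb_month=25.0, capacity_tb=100, copies=2)

print(f"Onsite per year: {onsite.annual_cost():,.0f}")
print(f"Cloud per year:  {cloud.annual_cost():,.0f}")
print(f"Savings:         {savings_percent(onsite.annual_cost(), cloud.annual_cost()):.1f}%")
print(f"Initial seed of 100 TB over 1 Gbps: {transfer_days(100, 1000):.1f} days")
```

Swapping in your own depreciation schedule, copy count and circuit size shows quickly whether the savings hold for your environment and whether the initial data seed is practical over your existing WAN.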
At the end of the day, common backup software will write directly to the cloud, and for systems that do not, the cost of a gateway appliance is easily absorbed by the cost savings and accessibility benefits. What’s more, you can apply the same retention and de-duplication policies from your current backup model to the cloud.
Business Continuity
For the purpose of this discussion, business continuity means a tighter (shorter) recovery time objective (RTO) and recovery point objective (RPO) than traditional disaster recovery. For the IT portion of BC plans, most organizations keep an active system on “standby” along with an entire replicated data set for the specific applications that cannot experience downtime. In many cases, there are two primary data centers with two similar storage area networks (SANs), and software native to the manufacturer replicates the data between them. This can be done at the block level, at the file level, or using snapshots. In many cases, failover and traffic redirection can be automated using tools like VMware’s SRM.
Cloud services are a great option to enable the secondary system. Most storage vendors have partner cloud environments that provide native targets for replication. Whether you’re using EMC, NetApp, IBM, 3PAR, HP, Compellent or any number of other storage vendors, there is a target service available. You can set up replication in the same manner as you do today with multiple endpoints, but instead of you owning the target assets (like SAN, networking, cables, blades, virtualization hosts and the like), they’re provided by the cloud vendor. You can implement storage targets to support the most demanding workloads. Whether it’s SATA, SAS or SSD (connected via 1GbE/10GbE+, iSCSI block or NFS), cloud service providers are offering ranges of IOPS from 1,500 through 100,000+ (using host-based SSD) to mirror production workloads.
Administration can be simplified by creating automated resource management both onsite and in the cloud. You can instantly provision storage and scale resources up or down as needed, with real-time access to additional resources. Most cloud providers also let you use the same IP scheme you currently use on premises. One of our cloud partners can actually provision a single virtual machine with 1TB RAM, 32 cores and 100,000 IOPS. All of this is provided as an operational expense (OPEX) rather than a capital expense (CAPEX) and can be adjusted as needed during testing and actual business continuity (BC) scenarios.
Disaster Recovery
DR typically applies to enterprise systems that have an RTO/RPO of 1-2 hours or greater. Anything less than 1-2 hours generally falls into the category of business continuity. If a WAN circuit fails or your primary data center experiences a power “hiccup,” you likely will not want to invoke a DR plan. Older disaster recovery plans typically included a contract for hardware in a service provider data center, where you could take your tapes and recover your systems.
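The split between the two categories can be sketched as a simple rule. The Python below is only an illustration: the one-hour cutoff is an assumption picked from the low end of the 1-2 hour band, and classifying a system by the tighter of its two objectives is a simplification, not an industry standard.

```python
from datetime import timedelta

# Assumption: use the low end of the 1-2 hour band as the cutoff.
BC_THRESHOLD = timedelta(hours=1)


def recovery_tier(rto: timedelta, rpo: timedelta,
                  threshold: timedelta = BC_THRESHOLD) -> str:
    """Classify a system by the tighter of its two recovery objectives.

    Objectives shorter than the threshold are treated as business
    continuity; everything else is treated as disaster recovery.
    """
    tightest = min(rto, rpo)
    return "business continuity" if tightest < threshold else "disaster recovery"


# Hypothetical systems, for illustration only.
systems = {
    "order entry": (timedelta(minutes=15), timedelta(minutes=5)),
    "file archive": (timedelta(hours=24), timedelta(hours=4)),
}

for name, (rto, rpo) in systems.items():
    print(f"{name}: {recovery_tier(rto, rpo)}")
```

Running an inventory of applications through a rule like this is one way to decide which ones justify an always-on standby and which can live with a slower, cheaper recovery path.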
After major natural disasters such as Hurricanes Katrina, Irene and Sandy, these more traditional approaches to DR are simply no longer viable. During Katrina, many IT organizations that had major DR contracts were required to send their IT professionals to the East Coast or the mountainous West. The last thing people want to do during these types of disasters is leave their families and temporarily relocate across the country. In some cases, there isn’t even a method of transport to get the tapes and the people moved to recover the systems.
As such, a new cloud service called Disaster Recovery as a Service (DRaaS) has gained momentum. It works much like an insurance policy: you replicate data to a service provider using one of three methods.
- Application Based
- Host Based
- Array Based
Regardless of your hypervisor (Citrix, Microsoft, VMware, KVM) or your storage technology, you can now replicate data and have servers on “standby” in the event of a disaster. That could entail using a site recovery manager (SRM) or importing an Open Virtualization Format (OVF) package, for example, to give you a platform to restore to when disaster strikes.
With DRaaS offerings, organizations extend their WAN service (either public or private) to a service provider and pay for a scaled-down replica of their current systems. Once a disaster is declared, the service provider has a defined set of action items to expand server capacity and rebroadcast. There is typically a disaster declaration fee (like a deductible), and you’re free to run your systems on the secondary target for a period of 30 to 90 days. In the event of a catastrophic disaster, the DRaaS infrastructure can support production (by rolling the contract from DR to production).
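The insurance analogy can be put into rough numbers. The following Python sketch models a DRaaS contract as a monthly standby fee for the scaled-down replica, a per-event declaration fee, and a window of days you may run on the secondary target. The class, its field names and all of the dollar amounts are hypothetical; real contracts vary by provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraasContract:
    """Hypothetical DRaaS contract terms; every value is an assumption."""
    monthly_standby_fee: float  # cost of the scaled-down replica
    declaration_fee: float      # charged per declared disaster, like a deductible
    included_run_days: int      # days you may run on the secondary target

    def annual_cost(self, declarations: int = 0) -> float:
        """Yearly spend: twelve months of standby plus any declaration fees."""
        return 12 * self.monthly_standby_fee + declarations * self.declaration_fee

    def needs_production_contract(self, days_on_secondary: int) -> bool:
        """True when the outage outlasts the window and the contract must roll to production."""
        return days_on_secondary > self.included_run_days


# Illustrative terms only.
contract = DraasContract(monthly_standby_fee=2_000, declaration_fee=10_000, included_run_days=60)

print(f"Quiet year:        {contract.annual_cost():,.0f}")
print(f"One declaration:   {contract.annual_cost(declarations=1):,.0f}")
print(f"Roll to production after 90 days? {contract.needs_production_contract(90)}")
```

Comparing these yearly figures against the cost of owning and staffing a second site is the heart of the DRaaS business case.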
Most homeowners will tell you that for disaster preparedness they don’t buy another house; they buy an insurance policy. The cloud is that insurance policy for organizations, not only saving capital but also providing the business functionality needed in the event of a disaster. At CDW we have aligned ourselves with cloud partners who support the technology manufacturers we’ve been supporting for years. Regardless of your time to recover, method of replication or manufacturer of choice, CDW can help build a road map to meet your technology, financial and business needs.