High Availability versus Disaster Recovery

High Availability (HA) and Disaster Recovery (DR) are often confused. The two terms can seem interchangeable, but there are clear distinctions between them. When running a business-critical workload such as SAP and SAP HANA, a clear definition is important, especially in a virtualized SAP and SAP HANA environment. Keep in mind that the two are not counterproductive to each other: in many cases the business requires both to ensure business continuity.

High Availability (HA):

High Availability systems are designed to protect against single, predictable failures. The affected components could be power supplies, processors or memory, but also blue screens or kernel panics at the operating system level, as well as application failures. HA monitors the component, detects the failure, and restarts or fails over the resource(s). HA is automated and requires no human interaction. The failover is typically implemented by cluster systems such as VMware vSphere HA in a virtualized environment. HA is achieved by designing systems with no single point of failure (SPOF).
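The monitor, detect, restart or fail over sequence described above can be illustrated with a minimal sketch. The function name, the restart limit and the returned action strings are hypothetical and chosen only for illustration; this is not how VMware vSphere HA or any specific cluster product is implemented.

```python
def next_action(healthy: bool, restarts_so_far: int, max_restarts: int = 3) -> str:
    """Decide what an automated HA controller might do after a health check.

    Hypothetical policy: try a local restart first, and fail over to
    another node once the restart budget is used up.
    """
    if healthy:
        return "none"        # component is fine, keep monitoring
    if restarts_so_far < max_restarts:
        return "restart"     # failure detected, try to restart in place
    return "failover"        # restarts exhausted, move the resource


# Example: a failed resource that has already been restarted three times
print(next_action(healthy=False, restarts_so_far=3))
```

The point of the sketch is that every step is decided by the system itself, which matches the statement that HA needs no human interaction.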

Availability is expressed as a percentage of time and usually targets up to 99.999%, referred to as “five nines”, or as close to it as possible. Availability is calculated for unplanned downtime only; it does not include the time for planned maintenance such as patching or upgrading the system. HA is a strategy, so sustainable planning is indispensable; it is focused on technology implementation and design.

Availability %	Downtime per year	Downtime per month	Downtime per week
90% (one nine)	36.5 days	72 hours	16.8 hours
99% (two nines)	3.65 days	7.20 hours	1.68 hours
99.9% (three nines)	8.76 hours	43.2 minutes	10.08 minutes
99.99% (four nines)	52.56 minutes	4.32 minutes	1.01 minutes
99.999% (five nines)	5.26 minutes	25.9 seconds	6.05 seconds

Source: Wikipedia http://en.wikipedia.org/wiki/High_availability
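The downtime figures in the table follow directly from the availability percentage. As a minimal sketch, assuming a 365-day year, a 30-day month and a 7-day week (the function and constant names are illustrative):

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = 30 * 24
HOURS_PER_WEEK = 7 * 24


def downtime_seconds(availability_percent: float, period_hours: float) -> float:
    """Return the allowed unplanned downtime in seconds for the given period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * period_hours * 3600.0


# Print the allowed downtime per year, in minutes, for each availability class
for pct in (90.0, 99.0, 99.9, 99.99, 99.999):
    minutes_per_year = downtime_seconds(pct, HOURS_PER_YEAR) / 60
    print(f"{pct}% -> {minutes_per_year:.2f} minutes per year")
```

Other sources use an average month of about 30.44 days, which gives slightly different monthly values.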

Disaster Recovery (DR):

Disaster Recovery is not automatic. It is defined by the processes and the people needed to execute recovery procedures after a disaster occurs. The function of DR is to re-establish the service after a disaster. A disaster could be a power outage, a flood or a cyberattack. Disaster recovery can also be used for disaster avoidance, that is, migrating the workload before the disaster occurs. DR goes beyond HA and is not limited to the data center or system level; it also includes the use of geographic diversity, for example an alternative site and data center.

In the discussion of High Availability vs Disaster Recovery, the two can be combined. What matters is that DR is built on independent components, so that a system failure or human error cannot paralyze the disaster recovery implementation. Therefore, a stretched cluster or a similar technology implementation is not disaster recovery; at best it is disaster avoidance or business continuity.

Recovery Point Objective (RPO) and Recovery Time Objective (RTO):

In the discussion of disaster recovery, two points should be considered: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). Both objectives should be defined by the business. RPO defines the threshold, in data or time, for how much data the business can afford to lose since the last backup. The value can range from zero to a couple of hours or days, depending on the data and the service that the business is running. RTO defines the threshold, in time, for how quickly the application functionality needs to be back in business. The value can range from minutes to hours or maybe days, but zero minutes for RTO is unrealistic. Calculating RPO and RTO helps to estimate the cost of downtime and to define the budget for a continuity plan.
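As a minimal sketch of how these objectives can be checked against a concrete setup, the following example compares a backup interval and an estimated restore time with the RPO and RTO set by the business, and gives a rough cost estimate for an outage. All names and figures are hypothetical, and the sketch assumes that the worst-case data loss equals the interval between two backups.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    rpo_minutes: float  # maximum tolerated data loss, expressed in time
    rto_minutes: float  # maximum tolerated time until the service is back


def meets_objectives(objectives: RecoveryObjectives,
                     backup_interval_minutes: float,
                     estimated_restore_minutes: float) -> dict:
    """Check a backup and restore setup against the RPO and RTO."""
    # Assumption: a failure just before the next backup loses one full interval
    worst_case_data_loss = backup_interval_minutes
    return {
        "rpo_met": worst_case_data_loss <= objectives.rpo_minutes,
        "rto_met": estimated_restore_minutes <= objectives.rto_minutes,
    }


def downtime_cost(cost_per_hour: float, outage_minutes: float) -> float:
    """Rough cost of an outage, assuming a constant cost per hour."""
    return cost_per_hour * outage_minutes / 60


# Hypothetical example: RPO of 15 minutes, RTO of 4 hours, hourly backups
objectives = RecoveryObjectives(rpo_minutes=15, rto_minutes=240)
print(meets_objectives(objectives, backup_interval_minutes=60,
                       estimated_restore_minutes=120))
```

In this example the hourly backup does not satisfy the 15-minute RPO, so the business would either need more frequent backups or replication, or would have to accept a larger RPO.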

There are other terms in the market, such as Disaster Avoidance or Business Continuity, but all of them are more or less vendor-specific.

Tags: High Availability, Disaster Recovery

Lars Fiechtner