Jason Steuernagel of Momentum Technology Group provided us with this write-up regarding backups, disaster recovery and business continuity.
As an IT provider, one of our most basic, overriding goals for every client is to NEVER LOSE DATA. This simple goal is the reason we design servers with fault tolerance (minimal single points of failure), use IT security measures (firewalls, antivirus) and put backup systems in place that maintain extra copies of the data, ideally in an automated fashion and/or with processes that are not prone to breaking down.
A business can recover lost data and resume operation when a basic “event” occurs, like accidental deletion or even just losing a file. With today’s technology, restoring data is much faster than it was when tapes were involved. Advances in server technology also allow data to be snapshotted (Volume Shadow Copy) for very fast retrieval of accidentally lost or deleted files. These advances have made restoring data quick and reliable. They have also probably lulled a lot of businesses into thinking that they are all set when it comes to potential data loss or operational disruption.
What is a Disaster?
The most common type of event is a server crash. This can happen in several different ways: a non-redundant part of the server (like the motherboard) could fail, the operating system could become corrupted and fail to boot, or the server could lose more than one hard drive in a redundant set. Disaster recovery comes into play when something like this happens (or any other serious event, such as a fire or natural disaster).
Disaster recovery planning shouldn’t be done during a disaster. This seems extremely logical, but unfortunately that is when a lot of disaster recovery planning happens, and it is the wrong time. In this article it is estimated that 57% of small and medium-sized businesses don’t have a disaster recovery plan.
Based on my experience, I would argue that the number is actually much higher.
What is Disaster Recovery Planning?
At the most basic level, disaster recovery planning is the process of identifying risks, putting together ways to avoid or mitigate those risks, documenting plans, procedures and very specific courses of action that can be followed, testing and validating the plan, and continually updating it. Disaster recovery planning should be done before a disaster and should be updated and tested on an ongoing basis.
Summary / Final Notes:
Writing up this information has been on my mind for quite a while. Recovery will always be faster, more efficient and less costly if it is planned out ahead of time rather than on the fly during the disaster. That is the primary reason I wanted to bring awareness to this issue. Disaster recovery planning is a big undertaking. It takes significant time on both the IT side and the business/operational side, working hand in hand, to make sure all risks are found and accounted for and a sound plan is developed. There is also a probability factor involved: a business may never experience a disaster. Some businesses look at this and decide that they are not going to plan and will just deal with a disaster if it happens. That decision depends solely on the business and its risk tolerance.
Again, I wanted to bring awareness to this issue and let you know that, as a company, Momentum Technology Group is developing planning tools and planning templates and is testing disaster recovery technologies that can make recovery faster and more cost-effective.
Some formal definitions for disaster recovery and some sub-components of disaster recovery:
Disaster recovery (DR) involves a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions, as opposed to business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events. Disaster recovery is therefore a subset of business continuity. [Source]
Business Continuity: A business continuity plan is a plan to continue operations if adverse conditions occur, such as a storm, a fire or a crime. The plan includes moving operations (recovering operations) to another location if a disaster occurs at a worksite or datacenter. For example, if a fire destroys an office building or datacenter, then the people and the business or datacenter operations would relocate to a recovery site.
The plan could include recovering from different levels of disaster, ranging from short-term, localized disasters, to days-long, building-wide problems, to the permanent loss of a building. [Source]
Recovery Point Objective, or “RPO”, is defined by business continuity planning. It is the maximum tolerable period in which data might be lost from an IT service due to a major incident. The RPO gives systems designers a limit to work to. For instance, if the RPO is set to four hours, then in practice off-site mirrored backups must be continuously maintained; a daily off-site backup on tape will not suffice. Care must be taken to avoid two common mistakes around the use and definition of RPO. First, business continuity (BC) staff use business impact analysis to determine the RPO for each service; the RPO is not determined by the existing backup regime. Second, when off-site data requires any level of preparation, the period during which data is lost very often starts near the time the work to prepare the backups begins, rather than at the time the backups are moved off-site. [Source]
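The RPO reasoning above can be checked with simple arithmetic. The following is a minimal sketch, not a formal method: it assumes backups run at a fixed interval and that each backup reflects the data as of the moment its preparation starts. The function names and the sample figures are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, prep_time=timedelta(0)):
    """Upper bound on lost data under the assumptions stated above.

    If a disaster strikes just before the next backup finishes being
    prepared and moved off-site, everything written since the previous
    backup began preparation is lost: one full interval plus the
    preparation time.
    """
    return backup_interval + prep_time


def meets_rpo(backup_interval, rpo, prep_time=timedelta(0)):
    """Return True if the worst-case data loss fits within the RPO."""
    return worst_case_data_loss(backup_interval, prep_time) <= rpo


# Hypothetical example values, using the four-hour RPO from the text.
rpo = timedelta(hours=4)

# A daily off-site tape backup cannot satisfy a four-hour RPO.
daily_tape_ok = meets_rpo(timedelta(hours=24), rpo)

# An hourly off-site copy that takes 30 minutes to prepare can.
hourly_copy_ok = meets_rpo(timedelta(hours=1), rpo, prep_time=timedelta(minutes=30))

print("Daily tape meets RPO:", daily_tape_ok)
print("Hourly copy meets RPO:", hourly_copy_ok)
```

Counting the preparation time is what guards against the second mistake described above: the loss window is measured from when backup preparation begins, not from when the backup leaves the building.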
Recovery Time Objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity. It can include the time spent trying to fix the problem without a recovery, the recovery itself, testing, and communication to the users. Decision time for the users’ representative is not included. The business continuity timeline usually runs parallel with an incident management timeline and may start at the same, or different, points. [Source]
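To make the RTO definition concrete, here is a small sketch that adds up the phases the definition says can count toward recovery time and compares the total to a target. The phase names, durations and the eight-hour RTO are hypothetical values for illustration, not recommendations.

```python
from datetime import timedelta


def estimated_recovery_time(phases):
    """Sum the durations of all phases that count toward the RTO."""
    return sum(phases.values(), timedelta(0))


# Hypothetical estimates for one business process. Decision time is
# left out, in line with the definition above.
phases = {
    "attempted_fix": timedelta(hours=1),          # trying to fix without a recovery
    "recovery": timedelta(hours=3),               # the recovery itself
    "testing": timedelta(hours=1),                # validating the restored system
    "user_communication": timedelta(minutes=30),  # notifying users
}

rto = timedelta(hours=8)  # assumed target for this example

total = estimated_recovery_time(phases)
within_rto = total <= rto

print("Estimated recovery time:", total)
print("Within RTO:", within_rto)
```

Running the numbers this way before a disaster shows whether a plan can realistically meet its target, or whether one phase (often the recovery itself) needs faster technology or a simpler procedure.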
By Jason Steuernagel
Jason is the Principal of Momentum Technology Group and works in IT consulting, IT infrastructure design and management, cloud computing, virtualization, and disaster recovery planning and implementation. Click here to see Jason’s profile on LinkedIn.