Quantify the Business Impact and ROI of Data Protection and Backup

Data protection and backup often get less respect than other facets of IT, until something fails, breaks, is overwritten, or is corrupted, accidentally or on purpose. Read on to learn how to:

  • Broaden the conversations with cross-section stakeholders to address the business impact
  • Leverage operational metrics and translate technologies/issues into dollars
  • Understand data and downtime loss to determine the best strategy and solution choice
  • Define a strategy to ensure data protection and availability

Why wait until something fails, breaks, is overwritten or corrupted accidentally/on purpose? Help your teams learn to quantify the full business impact of downtime and data loss, so that an informed decision is made for your data protection.

Content Summary

Downtime & Data Loss are Business Problems, not IT Issues
Quantify Your Problems
To = how long is the outage
Td = how much data was lost (and likely to be recreated)
Hr = Human costs
Pr = Profitability or productivity
Cost of an IT Outage = (To + Td) x (Hr + Pr)
How to Move Forward
The Process Begins (and Ends) with the Business, Not the Technology

Downtime & Data Loss are Business Problems, not IT Issues

Data protection and backup are often among the least appreciated aspects of IT, both in terms of budget and the perception of creating “business value” for the broader organization. Arguably, data protection and backup aren’t always respected even when compared to other facets of IT — until something fails, breaks, is overwritten, or corrupted accidentally or on purpose.

Tip: A smarter strategy would be to think of backup the way people think of insurance.

Consider this analogy

Insurance for home or automobile is essentially “pay a little to avoid a huge expense.” Premiums and rates are determined by aligning the costs of repair/replacement with the likelihood of an incident. Minor but potentially more frequent calamities might be under one’s deductible (just absorb the loss, in other words), whereas major but infrequent catastrophes that would otherwise be severely impacting are instead mitigated by the protection of insurance. Most responsible people carry adequate insurance, but “adequacy of protection” is often subjective.

As an alternate perspective, consider that in a world where organizations are constantly seeking where to save costs, data loss and downtime are like leaky pipes and faucets. Imagine that:

  • Everyone sees that the faucet in the breakroom has a small leak, but it trickles into the sink so “it’s not a big deal.”
  • Most people have noticed but dismissed that one of the toilets never stops flushing, but “that’s just what it does.”
  • Not everyone sees it, but there is one corner of the building where the grass is always muddy, even if it hasn’t rained in weeks, but “no one goes over there, so it’s not a problem worth fixing.”

Those are petty examples. But what if you then learned that the single most disproportionate expense between your company and your top five competitors is an inordinately high water bill? What if, when you measured the aggregate costs of the faucet leak, the flushing valve, and the sprinkler head, you discovered that those three issues were entirely responsible for the disparity in water costs? What if, without those minor issues, your company would have the lowest water costs among your competition, thereby allowing you to invest those funds into other areas of the organization that would help you be more successful? If you knew that, wouldn’t you fix your “minor” water issues?

Water bills (or telephony costs or cleaning services) are “just” operational matters that don’t “really” matter to the rest of the organization, until you realize that leaky faucets are inordinately costing the organization money that could be better spent.

Warning: People who say “we don’t have money in the budget for new plumbing” without looking at the water bill are making an uninformed decision. If your team members are saying “we don’t have money in the budget for better data protection” without quantifying the business impact of downtime and data loss, they are making an equally uninformed decision. Said another way, leaky faucets aren’t just the Facilities Team’s problem, and downtime/data loss aren’t just IT’s problem.

Unfortunately, many IT professionals aren’t effective in quantifying their leaky faucets in business terms. They measure IT outages with technical jargon and acronyms like RTO and RPO (recovery time objective and recovery point objective, respectively). Those two terms aren’t incorrect, as they can be used to assess the scale of each outage, as well as objectively quantify and compare newer technologies and approaches. They are like measuring ounces per hour of water leaking. To anyone other than the plumber or facilities manager, OZ/hr is “interesting” or perhaps “alarming,” but doesn’t quantify the impact on the water bill or the organization at large.

Tip: Organizations need to convert technical jargon like OZ per hour or RTO/RPO into business impact.

Only after assessing the business impact is the leadership team equipped to decide whether to accept those costs as part of doing business or to fix the problem, and whether to fix it with a $15 part or a $300 service call.

Quantify Your Problems

Server outages, overwritten data, and even ransomware/malware losses are often addressed, resolved, and then dismissed based solely on their temporary, tactical disruption. IT professionals can quantify how long outages last and how frequently they occur, but they typically do not have operational data such as the human cost of staff sitting idle while their systems are down or encumbered. IT is also unlikely to know the expected production or profitability of other departments (an inside sales department that can’t sell for three hours because its systems are offline, for example).

To quantify each IT outage in business terms that matter, you need at least four simple variables:

To = how long is the outage

Downtime is measured as the time from the initial event until services are restored and users return to productivity.

Td = how much data was lost (and likely to be recreated)

Data loss is measured in time from when the last recoverable backup was made until the outage itself. If it took someone an hour to write a document, it’ll likely take almost as long to re-create it from scratch. So, lost data has costs.

Hr = Human costs

Idle human costs measure the hourly cost of employees and contractors who sit idle during an outage and who will likely have to re-create whatever data they produced that was lost during the IT event. If a customer service department isn’t completely idle during an outage, but merely encumbered, measure that as ½Hr or ¼Hr or whatever is appropriate.

Pr = Profitability or productivity

This measures the financial contributions to the organization that are hindered, such as the hourly revenue generated by an inside sales organization. The first two variables (To and Td) quantify the total duration of the IT event, while the latter two (Hr and Pr) quantify the hourly rate (in dollars) that an IT event costs. There is slightly more to the formula, but in its most basic form it looks like the following:

Cost of an IT Outage = (To + Td) x (Hr + Pr)

That is the cost of one outage (one leaky faucet, in other words). If IT were to then go through their system logs and count the number of outages and the length of each, you’d know how much of your inordinate water bill was unnecessarily wasted water: avoidable IT outages that are costing your organization money.
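As a rough illustration, the basic formula above can be applied to a list of outages and summed. The sketch below is only one way to lay it out: the class and field names, the `encumbered_fraction` parameter (standing in for the ½Hr or ¼Hr adjustment), and every dollar figure are hypothetical and are not taken from any real system log.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    outage_hours: float         # To: initial event until users are productive again
    data_loss_hours: float      # Td: last recoverable backup until the outage
    human_cost_per_hour: float  # Hr: hourly cost of the affected staff, in dollars
    profit_per_hour: float      # Pr: hourly revenue or productivity hindered, in dollars
    encumbered_fraction: float = 1.0  # assumption: 1.0 = fully idle, 0.5 = "½Hr"

    def cost(self) -> float:
        """Cost of an IT Outage = (To + Td) x (Hr + Pr)."""
        hours = self.outage_hours + self.data_loss_hours
        hourly_rate = (self.human_cost_per_hour * self.encumbered_fraction
                       + self.profit_per_hour)
        return hours * hourly_rate


# Hypothetical entries, as if reconstructed from system logs.
outages = [
    # Inside sales offline for 3 hours, with 1 hour of work to re-create.
    Outage(outage_hours=3.0, data_loss_hours=1.0,
           human_cost_per_hour=400.0, profit_per_hour=1500.0),
    # Support team merely encumbered (half idle), 4 hours of data lost.
    Outage(outage_hours=0.5, data_loss_hours=4.0,
           human_cost_per_hour=200.0, profit_per_hour=0.0,
           encumbered_fraction=0.5),
]

total = sum(o.cost() for o in outages)
for o in outages:
    print(f"${o.cost():,.2f}")
print(f"Total: ${total:,.2f}")
```

With these made-up inputs, the first outage costs (3 + 1) x (400 + 1500) = $7,600 and the second costs (0.5 + 4) x (100 + 0) = $450, for a total of $8,050. The real value of the exercise comes from replacing the placeholders with figures supplied by the business units themselves.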

Not All Costs Are Quantifiable

As tidy and empirical as this formula is, it alone doesn’t tell the whole story. Most IT issues also have less quantifiable impacts that might be even more dire for the organization than the direct financial costs, no matter how great, as seen below.

Impacts Resulting From Application Downtime or Lost Data. Source: ESG Real World SLAs and Availability Methods, December 2017

How to Move Forward

Like most leadership tasks, the first step is to assess the problem. As leaders, start by surrounding yourself with a broad range of insights and expertise. Most organizations that simply rely on IT’s self-assessment of their data protection capabilities are either under-protected (unaware of data sets, unaware of regulatory mandates, etc.), unprepared to recover (because IT doesn’t understand the business requirements), or both. Instead, start by talking more with a broader cross-section of operational stakeholders and technical professionals, including:

  • IT Operations, who focus on server and infrastructure uptime and delivery
  • Backup administrators, who focus on data protection/recovery, and outage mitigation/remediation
  • Application owners, who focus on the nuances of databases, email systems, and so on, and their uptime/recovery requirements
  • Business unit stakeholders, who focus on their teams’ functional requirements and IT dependencies
  • Legal/compliance teams, who focus on regulatory mandates related to data and its handling
  • Operations/HR, who focus on financial considerations and strategies across the organization
  • Business Continuity/Disaster Recovery (BC/DR) professionals, who focus on the complex correlations between the technical and less-technical communities and processes listed above. (Note: by this definition, do not assume that BC/DR is “just an IT thing.”)
  • Executive sponsors, meaning you or representatives of the senior leadership team who can help bridge these myriad sub-organizations and champion change across the organization

Warning: If you are missing one of these perspectives, then you’ll be missing either 1) a contributing factor to the problems related to assuring IT service delivery or 2) relevant information to determine the best strategy and tool choices to resolve your problems (or both).

The Process Begins (and Ends) with the Business, Not the Technology

Start by identifying the core teams, their processes, and their dependencies on IT services. After that, go through the “what ifs” for common and less common (but possible) types of IT outages. Working backward from each potential outage, you can quantify the potential costs of each type of event, referred to as a Business Impact Analysis (BIA), as well as the likelihood of each type of crisis, referred to as a Risk Analysis (RA).
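One way to combine the two analyses is to multiply each scenario's cost per event (from the BIA) by its expected frequency (from the RA) and rank the results. That multiplication is a common risk-assessment convention and an assumption here; the article itself does not prescribe it. The scenario names, costs, and frequencies in this sketch are all hypothetical.

```python
# Hypothetical scenarios: cost per event comes from the BIA,
# expected events per year comes from the RA.
scenarios = {
    "file server outage": {"cost_per_event": 7600.0, "events_per_year": 4.0},
    "ransomware incident": {"cost_per_event": 250000.0, "events_per_year": 0.25},
    "accidental overwrite": {"cost_per_event": 450.0, "events_per_year": 24.0},
}


def expected_annual_loss(scenario: dict) -> float:
    """Assumed convention: impact per event times expected events per year."""
    return scenario["cost_per_event"] * scenario["events_per_year"]


# Rank scenarios from the largest expected annual loss to the smallest.
ranked = sorted(
    scenarios,
    key=lambda name: expected_annual_loss(scenarios[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: ${expected_annual_loss(scenarios[name]):,.2f} per year")
```

In this made-up data, the rare ransomware incident still ranks first ($62,500 per year), ahead of the frequent file server outages ($30,400) and overwrites ($10,800). This mirrors the insurance analogy: infrequent catastrophes can outweigh the small losses that everyone notices.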

Your cross-functional team can use the same team-process-IT dependency relationships from the BIA and RA process to retrospectively assess the financial impact of the last two years of actual IT outages. This can serve as a validation of your BIA and RA models, much like planning one’s “Flex Spending” not just by anticipated future expenses but also by looking at the past few years’ actual expenses. The results of your work should provide the answers to 3 crucial questions:

What is the actual “cost of the problem” of downtime and data loss with the current IT strategy?

What are the organization’s most pressing areas of concern or vulnerability that should be assured to be protected against?

What is the organization’s commitment to do better?

Frankly speaking, it doesn’t matter what is currently “in the budget for IT” — because the financial impacts of downtime and data loss aren’t in the budget either. Using the earlier analogy, stating that “a plumber isn’t in the budget” at the beginning of the process isn’t a valid point of view since it’s more than likely that no one had the foresight to budget for water loss and water damage remediation either. Wouldn’t you rather add a better solution to the budget than simply accepting the costs of the problem?

Once you’ve calculated the real costs of water loss, you can decide whether to hire a plumber and how much to spend. Once you’ve calculated the real costs of data loss and downtime, you can decide how much to invest in better data protection and recovery. At that point, you have a three-sided conversation:

  • Technical professionals discussing RPO/RTO of the underlying IT systems and their recovery/availability capabilities, and the resulting commitments to IT delivery through Service Level Agreements (SLAs)
  • Operations professionals discussing BIA/RA and the negotiation of SLAs, based on operational dependencies on IT systems in alignment with the needs of the organization
  • Financial professionals discussing Total Cost of Ownership (TCO) and Return on Investment (ROI), looking at the costs of the problem versus the costs of the solution(s)
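The financial side of that conversation, the cost of the problem versus the cost of the solution, can be sketched with the conventional ROI definition (net benefit divided by cost). The article does not give an ROI formula, so the function below and all of its inputs are assumptions for illustration only.

```python
def data_protection_roi(current_annual_loss: float,
                        residual_annual_loss: float,
                        annual_solution_cost: float) -> float:
    """Conventional ROI: (avoided loss - solution cost) / solution cost.

    current_annual_loss:  yearly cost of downtime and data loss today
    residual_annual_loss: yearly cost still expected after the investment
    annual_solution_cost: yearly cost (TCO) of the proposed solution
    """
    if annual_solution_cost <= 0:
        raise ValueError("annual_solution_cost must be positive")
    avoided_loss = current_annual_loss - residual_annual_loss
    return (avoided_loss - annual_solution_cost) / annual_solution_cost


# Hypothetical figures: $100,000 of annual losses today, $20,000 expected
# to remain after the change, and a solution costing $40,000 per year.
roi = data_protection_roi(100000.0, 20000.0, 40000.0)
print(f"ROI: {roi:.0%}")
```

A negative result would suggest that the proposed solution costs more than the losses it avoids, which is itself useful input when deciding whether to accept the cost of the problem.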

Tip: All three “languages” are valid: Technical, Operational, and Financial.
Source: Veeam