SLA Expectations and Measurements Quiz: Are You in the Know?

Test your knowledge of Service Level Agreements (SLAs) and learn how to create well-defined criteria to accurately measure expectations and results. Avoid costly mistakes by understanding the fundamentals of SLA management.

Service Level Agreements (SLAs) are crucial for setting clear expectations and objectives in the world of IT services. Without a comprehensive understanding of your organization’s IT requirements and capabilities, agreeing to SLAs can lead to disaster. On the other hand, well-defined SLA expectations that align with business priorities can not only help IT teams steer clear of penalties and performance reviews due to contract breaches but can also contribute to increased worker productivity and enhanced customer satisfaction.

This quiz delves into the realm of SLAs and how they can significantly impact IT operations. It tests your ability to differentiate between ambitious, impractical uptime expectations and realistic ones. It also examines whether SLA contract wordings support consistent and objective measurement of SLA metrics. Furthermore, it challenges you to consider whether IT services should be treated as a single entity or categorized based on business priorities.

Taking this quiz will provide insights on how to craft a robust service-level agreement and assess how your organization measures IT performance in existing SLAs. There’s always room for improvement, and enhancing your understanding of SLAs is a vital step towards better IT service management.

Question 1

Which of these service levels for a continuously used application (24/7 year-round) permits about 4.5 hours of downtime per year?

A. 99.999%
B. 99.99%
C. 99.95%
D. 99.90%

Answer

C. 99.95%

Explanation

Organizations can set uptime expectations for applications and for IT resources, such as a data center or cloud service. Several online tools quickly translate uptime percentages into real minutes of downtime. Specify in the SLA what the uptime expectations apply to so that the IT organization can build appropriate levels of redundancy and failure tolerance into the system. Measure SLA compliance based on agreed-upon performance levels, not simply whether an application or site was wholly offline. For example, some applications are considered unavailable if they work too slowly and create a negative UX.
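The arithmetic behind those online translation tools is simple enough to sketch. The following is a minimal illustration, assuming a 365-day year of continuous (24/7) operation; the function name and the list of service levels are chosen for this example only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget, in hours, implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


if __name__ == "__main__":
    # Example service levels; substitute the ones under negotiation.
    for level in (99.999, 99.99, 99.95, 99.9):
        hours = allowed_downtime_hours(level)
        print(f"{level}% uptime allows {hours:.2f} hours "
              f"({hours * 60:.0f} minutes) of downtime per year")
```

Under these assumptions, 99.95% uptime works out to roughly 4.38 hours of downtime per year, while 99.9% allows about 8.76 hours.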

Continuous, highly available operation — as needed for an application that contains employees’ payroll information — is the most demanding in terms of uptime support and likely requires IT operations to set up on-call hours for immediate support, potentially across time zones. However, not all applications require continuous availability. For example, an application that aggregates data on customer volume at retail sites for planning purposes could experience significant downtime without a noticeable effect on customer experience or employee productivity.

Question 2

“This application shall not experience more than 60 minutes of downtime during business hours per month.” What concrete step will best help the IT organization meet this SLA expectation?

A. Run the application on physical rather than virtualized hardware.
B. Review application response times/end-user experience at the end of each week.
C. Plan application updates on nights and weekends.
D. Limit the number of users on the application.

Answer

C. Plan application updates on nights and weekends.

Explanation

Unplanned application downtime is unpredictable — caused by human error, a security breach, equipment failure or a runaway issue with the application code that was not caught in testing. IT operations teams can minimize unplanned downtime with approaches such as cluster-based virtualized hosting that eliminates a server as a single point of failure, preproduction load tests, and real-time and predictive monitoring. However, there’s no fail-safe way to ensure the application will always be available.

Planned downtime is under IT’s control. Most IT organizations plan upgrades and other updates on nights and weekends. This practice is the antithesis of a job perk, but it does maximize application availability for productivity workers, which increases satisfaction with IT. Patches to fix security vulnerabilities fall somewhere between planned and unplanned downtime. While the IT organization can communicate expected downtime to users in advance, it cannot always wait until a convenient time to deploy these critical fixes.
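When an SLA counts only downtime during business hours, a weekend update should contribute nothing to the monthly total, and an outage that straddles closing time should count only in part. The sketch below shows one way to compute that overlap. It assumes business hours of 9 a.m. to 5 p.m., Monday through Friday, in a single time zone; the SLA quoted above does not define them, so these values and the function name are placeholders.

```python
from datetime import datetime, time, timedelta

# Assumed business hours; the SLA should define these explicitly.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def business_minutes(start: datetime, end: datetime) -> float:
    """Minutes of the interval [start, end] that fall within business hours."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = datetime.combine(day, BUSINESS_START)
            window_end = datetime.combine(day, BUSINESS_END)
            overlap = min(end, window_end) - max(start, window_start)
            if overlap > timedelta(0):
                total += overlap.total_seconds() / 60
        day += timedelta(days=1)
    return total


if __name__ == "__main__":
    # A two-hour Saturday update versus a Tuesday outage that starts at 4:30 p.m.
    weekend_update = business_minutes(datetime(2024, 3, 2, 22, 0),
                                      datetime(2024, 3, 3, 0, 0))
    late_outage = business_minutes(datetime(2024, 3, 5, 16, 30),
                                   datetime(2024, 3, 5, 17, 45))
    print(weekend_update, late_outage)
```

Summing this value over a month's outage log gives a figure to compare against the 60-minute limit.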

Question 3

How can the IT organization improve the business application’s SLA above?

A. State whether the SLA applies to planned and unplanned downtime or only unplanned downtime (outages).
B. Change the month measurement to an objective time limit, such as 30 days.
C. Both of the above
D. Neither of the above

Answer

C. Both of the above

Explanation

The devil is in the details with any contract, and SLAs are no exception. Improve how you measure SLA metrics by making everything as specific and consistent as possible. Because downtime describes the application’s state, not its cause, state which kind of downtime counts against the SLA: only unplanned outages, or every period when the application is offline, including patches and major updates.

In addition to definitions for operations, SLAs should clearly define time periods. A month can last anywhere from 28 to 31 days. While formalized contracts can verge on pedantic, these specifics will make SLA expectations clear to all parties and enable metrics-based conversations if IT misses performance targets.
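To see why the length of the measurement window matters, consider the same 60-minute downtime allowance spread over windows of different lengths. This is a rough sketch that, for simplicity, measures against every calendar minute in the window; an SLA limited to business hours would need a count of business minutes instead. The function name is illustrative.

```python
def implied_availability(downtime_minutes: float, window_days: int) -> float:
    """Availability percentage implied by a downtime allowance over a window."""
    total_minutes = window_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    for days in (28, 30, 31):
        print(f"60 minutes over {days} days: "
              f"{implied_availability(60, days):.4f}% availability")
```

The shorter the month, the lower the availability that the same allowance implies, so a fixed 30-day window keeps the target constant from one period to the next.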

Question 4

How could an IT organization improve this SLA statement without upsetting users? “The time to resolution for issues will not exceed 24 hours.”

A. Set time to resolution metrics for critical, important and low-priority issues instead of all issues.
B. Extend the time to resolution to double the current limit.
C. Set a time to escalation metric instead of time to resolution.
D. None of the above

Answer

A. Set time to resolution metrics for critical, important and low-priority issues instead of all issues.

Explanation

Not all IT incidents should be treated in the same way. Issues that severely affect how workers do their jobs or how customers interact with the business should take priority. The more users a problem hits, the more likely it is to be critical. While extending the time to resolution surely would help the IT organization more easily meet its SLA, that extended time frame does not improve the SLA itself. Likewise, IT organizations might want to track how often issues are escalated, how long it takes before an escalation occurs or how quickly the IT support team communicates with the affected parties. But these response metrics in and of themselves won’t ensure that issues get resolved more quickly. By categorizing incident severity, the IT organization can direct its finite resources to the most important problems first and resolve them in the least amount of time possible.

Measure SLA compliance for time to resolution consistently by stating when the issue begins (such as when the ticket is created or when it is assigned to the technician) and when the issue stops (when operations return to normal, when operations reach a predetermined acceptable level or when the ticket is closed).
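A compliance check along these lines might look like the sketch below. It assumes the clock starts when the ticket is created and stops when the ticket is closed, which is only one of the options listed above. The severity names follow the quiz, but the target durations are hypothetical placeholders, as are the class and function names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical targets per severity level; real values come from the SLA.
RESOLUTION_TARGETS = {
    "critical": timedelta(hours=4),
    "important": timedelta(hours=24),
    "low": timedelta(hours=72),
}


@dataclass
class Ticket:
    severity: str
    created_at: datetime  # assumed start of the clock
    closed_at: datetime   # assumed end of the clock


def time_to_resolution(ticket: Ticket) -> timedelta:
    """Elapsed time between the agreed start and stop events."""
    return ticket.closed_at - ticket.created_at


def meets_sla(ticket: Ticket) -> bool:
    """True if the ticket was resolved within the target for its severity."""
    return time_to_resolution(ticket) <= RESOLUTION_TARGETS[ticket.severity]


if __name__ == "__main__":
    ticket = Ticket("critical",
                    datetime(2024, 3, 5, 9, 0),
                    datetime(2024, 3, 5, 12, 0))
    print(time_to_resolution(ticket), meets_sla(ticket))
```

Whichever start and stop events the SLA names, applying the same pair to every ticket is what makes the resulting numbers comparable.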

Question 5

Which of these statements is the best way to explain an issue’s severity level?

A. A business-critical issue costs the company money.
B. A business-critical issue affects at least 20% of customers and generates an error or timeout message. An example is a database glitch in the purchasing application.
C. A business-critical issue affects a large group of workers in high-priority job roles, such as VP of finance.
D. A business-critical issue requires a level III technician to resolve and leads to development work to prevent recurrence.

Answer

B. A business-critical issue affects at least 20% of customers and generates an error or timeout message. An example is a database glitch in the purchasing application.

Explanation

Define issues with as much specificity as possible, including measures of scope (percentage of customers affected) and an example scenario. Issue classification helps IT organizations prioritize monitoring resources and support staff. While subjective terms, such as “a large group” and “high-priority job roles,” are better than no description, they’re open to interpretation by members of the IT team and by business leaders. Similarly, it is difficult to classify an issue by stating what level of technician should address it. A critical failure affecting all customers could require a simple restart performed by a level I tech, and a low-priority UI problem might require the specialized expertise of a senior developer to resolve.
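A definition this specific can be checked mechanically, which is a reasonable test of whether it is objective. The sketch below encodes the business-critical definition from the correct answer: at least 20% of customers affected, with an error or timeout message. The function and parameter names are invented for illustration.

```python
CRITICAL_SHARE = 0.20  # threshold taken from the definition in the correct answer


def is_business_critical(customers_affected: int,
                         total_customers: int,
                         shows_error_or_timeout: bool) -> bool:
    """Apply the measurable definition: scope threshold plus visible failure."""
    if total_customers <= 0:
        raise ValueError("total_customers must be positive")
    share = customers_affected / total_customers
    return share >= CRITICAL_SHARE and shows_error_or_timeout


if __name__ == "__main__":
    # A purchasing-database glitch that returns errors to 250 of 1,000 customers.
    print(is_business_critical(250, 1000, True))
```

By contrast, a phrase such as “a large group” cannot be written as a condition without first agreeing on a number.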

Question 6

“Network latency shall not exceed 40 milliseconds.” How can an IT team ensure compliance with this SLA?

A. Network monitoring
B. Workload scaling close to the users
C. Share and enforce this SLA with network partners
D. All of the above

Answer

D. All of the above

Explanation

Monitoring is essential to measure SLA compliance on a day-to-day basis. In addition to detecting traffic spikes and problems, network monitoring provides fundamental data about traffic routes and congestion points, so the IT team can optimize network usage for lower latency.

Application scaling should also take into account traffic routes. Physical proximity between server and user reduces latency, so try to locate computing resources geographically close to where they are needed. For example, multinational banks use data centers in financial hubs in the U.S., Germany, Singapore and elsewhere to serve clients quickly.

Finally, address the network speeds and traffic that are outside of the IT team’s immediate control. Scrutinize network provider contracts, regularly review the current setup and discuss ways to improve it.

Question 7

What is a good way for the IT team to improve how it measures the network latency SLA above?

A. Set network latency expectations based on location, such as remote workers vs. employees in the central office.
B. Disable the enterprise’s firewalls on the network.
C. Increase the number of applications that rely on the network.
D. Change from on-premises applications to SaaS options.

Answer

A. Set network latency expectations based on location, such as remote workers vs. employees in the central office.

Explanation

Networks have a natural amount of latency related to how far information must travel from source to consumer. Therefore, organizations can expect lower latency for workers collocated with the corporate data center. For global operations, consider distributed data copies in cloud or data centers near remote offices.
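Location-specific expectations can be expressed as a separate limit per group of users, with compliance measured against each. In the sketch below, the 40-millisecond limit for the central office comes from the SLA quoted in the previous question; the 80-millisecond limit for remote workers, the location labels and the sample values are hypothetical.

```python
from typing import Dict, List

# 40 ms comes from the quoted SLA; the remote limit is a made-up placeholder.
LATENCY_LIMITS_MS: Dict[str, float] = {
    "central_office": 40.0,
    "remote": 80.0,
}


def compliance_rate(samples_ms: List[float], limit_ms: float) -> float:
    """Share of latency samples at or below the limit, from 0.0 to 1.0."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    within = sum(1 for sample in samples_ms if sample <= limit_ms)
    return within / len(samples_ms)


if __name__ == "__main__":
    # Illustrative measurements, as a monitoring tool might export them.
    measurements = {
        "central_office": [10.0, 35.0, 41.0, 39.0],
        "remote": [55.0, 62.0, 95.0, 70.0],
    }
    for location, samples in measurements.items():
        rate = compliance_rate(samples, LATENCY_LIMITS_MS[location])
        print(f"{location}: {rate:.0%} of samples within limit")
```

An SLA would also need to say what share of samples must fall within the limit, since a requirement that every single measurement comply is rarely realistic.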

While a migration from on-premises applications to SaaS offerings could affect network operations, these changes must stem from larger considerations than latency. Evaluate SaaS options’ support costs, the features and UI of the product, and data protection.

Disabling the firewall would likely reduce network latency. However, it would not improve the overall service level provided by IT operations and likely would lead to an angry meeting with the information security chief.

Question 8

“Application response time shall not exceed 5 milliseconds.” Which of the following steps is LEAST likely to help an IT organization reduce application response time?

A. Upgrade application storage from hard drives to solid-state drives.
B. Collocate application components on hosting resources to minimize network latency.
C. Refactor the application to remove wasteful or outdated processes.
D. Set power consumption limits on the servers hosting the application.

Answer

D. Set power consumption limits on the servers hosting the application.

Explanation

Power consumption limits can reduce data center facility costs and even improve application performance in the event that hotspots from overheating servers have caused equipment failures. However, the benefits for application performance are indirect at best, and indeed an energy-saving initiative could hurt application performance if it is executed overzealously or with an incomplete picture of normal IT operations and expectations.

Question 9

Which of the following SLAs for work in progress in the IT department can a business measure objectively?

A. IT’s backlog must be prioritized according to how damaging an issue is to business performance.
B. IT’s backlog must not affect application performance.
C. IT’s backlog must not exceed two business days.
D. IT’s backlog must not lead to user complaints about service delivery.

Answer

C. IT’s backlog must not exceed two business days.

Explanation

Because it specifies a time limit, the SLA for a backlog of two business days sets an objective measure of IT performance. IT organizations and business leaders should collaborate to set expectations around work prioritization — such as for valuable application features and security improvements — and user expectations. To improve backlog management, IT organizations can revamp the support desk’s ticket system to include levels of priority and set escalation rules for scenarios where business value is at risk.
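One way to measure this SLA is to compute the age of each open ticket in business days and flag any that exceed the limit. The sketch below reads “backlog must not exceed two business days” as “no open ticket is older than two business days,” which is an interpretation rather than something the SLA states. It treats Monday through Friday as business days and ignores holidays; the function names are illustrative.

```python
from datetime import date, timedelta
from typing import List


def business_days_between(start: date, end: date) -> int:
    """Count weekdays after start, up to and including end."""
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def backlog_breaches(opened_on: List[date], today: date,
                     limit_days: int = 2) -> List[date]:
    """Return the open dates of tickets older than the business-day limit."""
    return [opened for opened in opened_on
            if business_days_between(opened, today) > limit_days]


if __name__ == "__main__":
    open_tickets = [date(2024, 2, 29), date(2024, 3, 1), date(2024, 3, 4)]
    print(backlog_breaches(open_tickets, today=date(2024, 3, 5)))
```

In this example, only the ticket opened on Thursday, Feb. 29, is in breach on Tuesday, March 5, because the intervening weekend does not count toward its age.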

Not sure where IT’s backlog stands today? Work in progress is especially difficult to pin down for IT operations because there’s no “done” when it comes to application and IT support, according to Dominica DeGrandis, who wrote Making Work Visible: Exposing Time Theft to Optimize Work & Flow. Check out an excerpt of the book, and explore how backlogs affect IT performance.