IT Service Outages: Shorter or Fewer?

CTN Issue: May 2012 IEEE Transactions on Network and Service Management

Maintaining high availability is a sine qua non for today’s IT service providers. Customer demands are on the rise, and service outages in high-profile application areas such as credit card payment systems rapidly hit the news headlines. However, despite all the attention, the most popular concept of service availability is surprisingly crude, and often reduced to just a single figure (such as 99.98%).

This article highlights why IT service managers need to look beyond mere average service outage times and costs. Outage costs often exhibit high variance, so a risk analysis based on averages can be misleading. For instance, in the retail business, a single outage during a high-revenue hour might cost as much as a dozen outages during low-revenue hours. Using simulations on existing sets of revenue data, the paper argues that the single-figure approach to service availability is inadequate for businesses that want to properly manage the financial risks associated with IT service unavailability. Information on the average duration and the number of outages is also required. Should a person signing a Service Level Agreement on a certain availability level (e.g. 99.98%) prefer that the downtime be distributed over many short outages or over fewer but longer ones? The answer depends on the kind of company the person represents.
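
To make the variance argument concrete, the following is a minimal sketch in Python and not the simulation used in the paper. The hourly revenue profile, the function names, and the rule that an outage costs the revenue rate of the hour in which it starts are all illustrative assumptions. The sketch converts an availability figure into a yearly downtime budget, then compares spending that budget on one long outage versus a dozen short ones that each start at a random hour of the day.

```python
import random
import statistics

# Hypothetical revenue per hour of day for a retailer (currency units per hour).
# Illustrative numbers only; they are not taken from the paper's data sets.
HOURLY_REVENUE = [100] * 8 + [400] * 4 + [1200] * 4 + [400] * 4 + [100] * 4


def annual_downtime_hours(availability, hours_per_year=8760.0):
    """Yearly downtime budget implied by an availability level such as 0.9998."""
    return (1.0 - availability) * hours_per_year


def simulate_outage_costs(n_outages, outage_hours, trials, seed=0):
    """Return the total lost revenue of each simulated year.

    Assumption: every outage starts at a uniformly random hour of the day and
    is charged at that hour's revenue rate for its whole duration.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n_outages):
            start_hour = rng.randrange(24)
            total += HOURLY_REVENUE[start_hour] * outage_hours
        totals.append(total)
    return totals


budget = annual_downtime_hours(0.9998)  # roughly 1.75 hours per year
one_long = simulate_outage_costs(1, budget, trials=2000)
many_short = simulate_outage_costs(12, budget / 12, trials=2000)

if __name__ == "__main__":
    for label, costs in (("1 long outage", one_long), ("12 short outages", many_short)):
        print(label, "mean:", round(statistics.mean(costs), 1),
              "stdev:", round(statistics.pstdev(costs), 1))
```

Under these assumptions both scenarios have about the same average cost, but the single long outage has a much wider spread, which is exactly the information that a single availability figure hides.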

Simply put, companies that have a high fixed cost for a restart of their main business process should prefer longer but fewer outages. A good example is a physical process, such as a rolling mill, with high working temperatures and supply chains involving thousands of metric tons. Companies where outages in the main business process entail costs that increase with the outage duration should prefer more but shorter outages. A good example is an ATM service where a short glitch just makes end users retry, but hours of interrupted service on payday would make them switch to another bank. In the paper, these phenomena are mathematically modeled, and an optimal outage length is derived.
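
The trade-off described above can be sketched with a toy cost model. This is an assumed model for illustration and not necessarily the one derived in the paper: each outage incurs a hypothetical fixed restart cost `fixed_cost` plus a duration-dependent cost `rate * length ** exponent`, and a fixed downtime budget is split into `n_outages` equal outages. The function names and all parameter values are made up for the example.

```python
import math


def total_cost(n_outages, total_downtime, fixed_cost, rate, exponent=2.0):
    """Cost of splitting total_downtime into n_outages equal outages.

    Assumed model: each outage costs fixed_cost + rate * length ** exponent.
    An exponent above 1 means long outages hurt more than proportionally.
    """
    length = total_downtime / n_outages
    return n_outages * (fixed_cost + rate * length ** exponent)


def best_outage_count(total_downtime, fixed_cost, rate, exponent=2.0, max_outages=1000):
    """Brute-force search for the number of outages with the lowest total cost."""
    return min(
        range(1, max_outages + 1),
        key=lambda n: total_cost(n, total_downtime, fixed_cost, rate, exponent),
    )


DOWNTIME = 1.752  # hours per year, i.e. 99.98% availability

# Rolling-mill-like case: restarting is very expensive, so fewer outages win.
mill_count = best_outage_count(DOWNTIME, fixed_cost=10000.0, rate=100.0)

# ATM-like case: restarts are cheap but long outages drive customers away.
atm_count = best_outage_count(DOWNTIME, fixed_cost=1.0, rate=1000.0)

if __name__ == "__main__":
    print("rolling mill: preferred number of outages =", mill_count)
    print("ATM service:  preferred number of outages =", atm_count)
    print("ATM service:  preferred outage length (h) =", DOWNTIME / atm_count)
```

With a quadratic duration cost, minimizing n * fixed_cost + rate * T^2 / n over n gives a preferred outage length of sqrt(fixed_cost / rate), independent of the total downtime T. The brute-force search above lands close to that value whenever the optimum allows more than one outage.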

Title and author(s) of the original paper in IEEE Xplore:
Title: Optimal IT Service Availability: Shorter Outages, or Fewer?
Author: Ulrik Franke
This paper appears in: IEEE Transactions on Network and Service Management
Issue Date: March 2012

Statements and opinions given in a work published by the IEEE or the IEEE Communications Society are the expressions of the author(s). Responsibility for the content of published articles rests upon the author(s), not IEEE nor the IEEE Communications Society.