Examining Disk Storage Reliability Specs

July 09, 2008

Disk storage, whether on a corporate desktop or shared on a company network, is arguably the most important element of any computer system.

Computers are all about information: creating it, manipulating it, retrieving it and, above all, storing it. That makes storage, and more particularly hard disks, the most critical element of any computer system, a statement that holds true whether the drives are on the corporate desktop or function as shared storage on a company network, alone or in an array. If part of your job is deciding on minimum requirements for hard disks, that's a strong argument for paying close attention to hard disk specifications. Here's a look at some key reliability specs to consider.

Life Expectancy
Drives, like people, have a life expectancy. For drives, it's called the service life or design life (because that's how long the drive was designed to remain in service). The service life is typically three to five years, but can be as high as 10 years. Knowing the service life is important, because failure rates rise rapidly at the end of service life. Assuming the drive lasts that long, you'll want to replace it at that point - before it fails. Knowing the service life is also important for understanding the MTBF (mean time between failures) spec.

What Mean Time Between Failures Isn't
MTBF is probably the single most widely misunderstood drive spec, even among people who are knowledgeable about computer hardware. It doesn't tell you anything about how long a drive will last, which is what most people think it means. MTBFs for the current generation of hard disks are typically anywhere from 500,000 to 1.2 million hours for desktop drives, and as much as 1.6 million hours for enterprise drives. Even for a drive running around the clock, that works out to roughly 57 to 180 years. Drives obviously don't last that long.
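To see why the naive reading can't be right, here is a minimal Python sketch that converts the MTBF figures quoted above into years. It assumes the drive runs around the clock (8,760 hours per year); the function name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation

def mtbf_hours_to_years(mtbf_hours: float) -> float:
    """Naively treat MTBF as a lifespan and convert it to years of 24x7 use."""
    return mtbf_hours / HOURS_PER_YEAR

# MTBF figures quoted in the article, in hours.
for mtbf in (500_000, 1_200_000, 1_600_000):
    years = mtbf_hours_to_years(mtbf)
    print(f"{mtbf:>9,} hours is about {years:.0f} years")
```

The results run from several decades to well over a century, which is far longer than any drive's service life.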

How Likely It Is to Break
The right way to read MTBF is as a statistical statement. It tells you how likely the drive is to fail, or, more precisely, how often a drive will fail, on average. Before we look at the details, though, it's important to understand that not all MTBFs are on equal footing.


Any given MTBF is based on specific testing conditions, including obvious issues like temperature, which can affect the life of the drive. A less obvious but nonetheless critical issue is whether the drive was tested on the assumption that it would run 24 hours a day, 7 days per week (8,760 hours per year), or run just 40 to 50 hours per week (a total of 2,400 hours per year for Seagate Barracuda desktop drives, for example).

The specs for enterprise drives destined for networks are generally based on the first scenario, which also assumes just a few hundred motor starts and stops per year. The specs for drives aimed at the desktop are generally based on the second scenario, and assume thousands of motor starts and stops per year.

Translating MTBF
Depending on the scenario and on how many drives you have, a given MTBF will translate to a different length of time on the calendar (or clock). If a given model drive has a 1.2 million hour MTBF, for example, and you have 1.2 million drives, you can expect an average of one drive to fail every hour. If you have 120 drives - a more reasonable number - you would, on average, expect one to fail every 10,000 hours of operation. That works out to about one every 417 days for enterprise drives running 24 hours per day, or roughly one every 4.2 years for desktop drives running a total of 2,400 hours per year.
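The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function and parameter names are invented for the example, and it assumes every drive in the group is powered on for the same number of hours per year.

```python
def fleet_failure_interval(mtbf_hours: float,
                           drive_count: int,
                           powered_hours_per_year: float) -> tuple[float, float]:
    """Return (operating hours, calendar years) between expected failures."""
    # With N drives running, failures arrive N times as often as for one drive.
    hours_between_failures = mtbf_hours / drive_count
    # Translate operating hours into calendar time for the given duty cycle.
    years_between_failures = hours_between_failures / powered_hours_per_year
    return hours_between_failures, years_between_failures

# 120 drives with a 1.2 million hour MTBF, as in the example above.
hours, enterprise_years = fleet_failure_interval(1_200_000, 120, 8_760)
_, desktop_years = fleet_failure_interval(1_200_000, 120, 2_400)

print(f"One failure every {hours:,.0f} operating hours")
print(f"Enterprise (24x7): about every {enterprise_years * 365:.0f} days")
print(f"Desktop (2,400 h/yr): about every {desktop_years:.1f} years")
```

Changing the drive count or the hours per year shows how strongly the calendar interval depends on the usage scenario behind the spec.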

Another Way to Look at MTBF
MTBF in hours isn't as easy to understand on a gut level as a simple statement of failure rate. Some manufacturers are addressing that by adding the AFR (annualized failure rate) to their specs - the odds of a drive failing over the course of a single year.

From MTBF to AFR
If you can't find the AFR for a given drive, you can easily calculate it from the MTBF. First divide one failure by the MTBF in hours to get failures per hour. To convert that to failures per year, multiply the result by 8,760 hours for an enterprise drive or by 2,400 hours (or whatever number of hours the MTBF is based on) for a desktop drive. To turn the result into a percentage, multiply by 100. For a 1.2 million hour MTBF for an enterprise drive, for example, the AFR comes out to 0.73 percent, which means (in theory at least) that you have a 0.73 percent chance of the drive dying in any given year.
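Here is a rough Python sketch of that conversion. The first function follows the steps described above. The second is an addition not taken from the article: it assumes a constant failure rate (an exponential model), which some reliability calculations use instead, and is included only to show that the two approaches nearly agree at these failure rates. Both function names are hypothetical.

```python
import math

def afr_percent(mtbf_hours: float, powered_hours_per_year: float = 8_760) -> float:
    """AFR as described in the article: hours per year / MTBF, as a percentage."""
    failures_per_hour = 1 / mtbf_hours
    failures_per_year = failures_per_hour * powered_hours_per_year
    return failures_per_year * 100

def afr_percent_exponential(mtbf_hours: float,
                            powered_hours_per_year: float = 8_760) -> float:
    """Assumption, not from the article: AFR under a constant failure rate."""
    return (1 - math.exp(-powered_hours_per_year / mtbf_hours)) * 100

# Enterprise drive, 1.2 million hour MTBF, running 24x7.
print(f"Enterprise AFR: {afr_percent(1_200_000):.2f}%")
# Desktop drive with the same MTBF, rated at 2,400 hours per year.
print(f"Desktop AFR: {afr_percent(1_200_000, 2_400):.2f}%")
# The exponential variant differs only slightly when the AFR is this small.
print(f"Exponential model: {afr_percent_exponential(1_200_000):.3f}%")
```

For a desktop drive, pass whatever number of hours per year the manufacturer's MTBF is based on as the second argument.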

MTBF and Service Life
One important point about MTBF is that it holds true only for the service life of the drive. As already mentioned, once a drive reaches the end of its service life, the failure rate goes way up, and the MTBF is no longer meaningful. The MTBF only applies if you keep replacing individual drives at (or before) the end of their service lives - at which point the technology should be much improved, so you'll want to move on to a new drive in any case.

Finding the Service Life
The service life for a drive is typically missing from the spec sheet, but you can often find it in the drive's manual. In most cases, you can search for the manual on the manufacturer's Web site, where it's usually available as a PDF file. You can then search for "service life" in the manual itself.

How Long Is the Warranty?
If you can't find the service life for a particular model drive, and can't get the information from the manufacturer, you might want to simply treat the length of the drive's warranty as the service life. The cynical (some would say, conservative) view is that you should treat it as the service life in any case. After all, the length of the warranty is, by definition, how long the manufacturer is willing to bet the drive will last, regardless of how long it was designed to last.

Back of the Envelope, Please
One thing to keep in mind when comparing MTBFs between drives, even when they are based on the same scenarios, is that they are not solidly reliable numbers based on actual drive history. As a rule, they are based on some limited testing combined with actual results of similar, older models, with the numbers plugged into a mathematical model that calculates the MTBF. Given the same limited data, different mathematical models will spit out different results. It's best to think of the spec as a back-of-the-envelope calculation: a useful indicator, as long as you don't take it too seriously.

Small Differences Don't Matter
The nature of the MTBF spec means that you have to take it with a large grain of salt. A two-to-one difference - 600,000 hours versus 1,200,000 - is probably meaningful. A 10 or 20 percent difference - 800,000 hours versus 1,000,000 - may not be.

And in the Real World...
Keep in mind too that even if the MTBF spec were precisely correct, it would only apply in the conditions defined by the testing scenario. As with fuel efficiency claims for cars, your mileage will probably vary. This alone may be enough to explain why some real-world studies have found much higher failure rates than MTBF specs predict. That's an unpleasant reality, but it's important to know.

M. David Stone is the author of this article and a writer for eWeek.com. His original article can be found at Data Storage: Examining Disk Storage Reliability.
