From: Bill Davidsen <davidsen@tmr.com>
To: Mark Hahn <hahn@mcmaster.ca>
Cc: Al Boldi <a1426z@gawab.com>, linux-raid@vger.kernel.org
Subject: Re: PATA/SATA Disk Reliability paper
Date: Tue, 27 Feb 2007 14:21:33 -0500
Message-ID: <45E484BD.5010501@tmr.com>
In-Reply-To: <Pine.LNX.4.64.0702251119300.21918@coffee.psychology.mcmaster.ca>

Mark Hahn wrote:
>>>> In contrast, ever since these holes appeared, drive failures became
>>>> the norm.
>>>
>>> wow, great conspiracy theory!
>>
>> I think you misunderstand.  I just meant plain old-fashioned 
>> mis-engineering.
>
> I should have added a smilie.  but I find it dubious that the whole 
> industry would have made a major bungle if so many failures are due to 
> the hole...
>
>> But remember, the google report mentions a great number of drives
>> failing for no apparent reason, not even a smart warning, so failing
>> within the warranty period is just pure luck.
>
> are we reading the same report?  I look at it and see:
>
>         - lowest failures from medium-utilization drives, 30-35C.
>         - higher failures from young drives in general, but especially
>           if cold or used hard.
>         - higher failures from end-of-life drives, especially > 40C.
>         - scan errors, realloc counts, offline realloc and probation
>           counts are all significant in drives which fail.
>
> the paper seems unnecessarily gloomy about these results.  to me, they're
> quite exciting, and provide good reason to pay a lot of attention to these
> factors.  I hate to criticize such a valuable paper, but I think they've
> missed a lot by not considering the results in a fully factorial analysis
> as most medical/behavioral/social studies do.  for instance, they bemoan
> a 56% false negative rate from only SMART signals, and mention that if >40C
> is added, the FN rate falls to 36%.  also incorporating the low-young
> risk factor would help.  I would guess that a full-on model, especially
> if it incorporated utilization, age, performance could comfortable levels.

The big thing I notice is that drives with SMART errors are quite likely
to fail, but drives which fail aren't all that likely to have SMART
errors. So while I might proactively move a drive with errors out of
service or to non-critical service, seeing no errors doesn't mean the
drive won't fail.
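
The asymmetry above is the gap between precision and recall of SMART as a
failure predictor. A minimal sketch, assuming made-up counts for a fleet of
1000 drives (these are hypothetical, not data from the paper; they are only
chosen so that the miss rate matches the 56% false negative figure quoted
above):

```python
# Hypothetical counts for 1000 drives. NOT taken from the Google paper;
# picked so the miss rate equals the 56% false-negative rate quoted above.
failed_with_smart = 44       # failed, and showed SMART errors beforehand
failed_without_smart = 56    # failed with no SMART warning at all
healthy_with_smart = 16      # showed SMART errors but is still running
healthy_without_smart = 884  # no errors, no failure


def smart_stats(tp, fn, fp):
    """Return (precision, recall, false_negative_rate) for SMART warnings."""
    precision = tp / (tp + fp)   # P(drive fails | SMART errors seen)
    recall = tp / (tp + fn)      # P(SMART errors seen | drive fails)
    return precision, recall, 1.0 - recall


precision, recall, fn_rate = smart_stats(
    failed_with_smart, failed_without_smart, healthy_with_smart
)

print(f"P(fail | SMART errors) = {precision:.2f}")
print(f"P(SMART errors | fail) = {recall:.2f}")
print(f"false negative rate    = {fn_rate:.2f}")
```

With numbers like these a SMART warning is worth acting on, yet more than
half of the failures still arrive unannounced.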

I haven't looked at drive temp vs. ambient. I am collecting what data I
can, but I no longer have thousands of drives to monitor (for which I'm
grateful).

Interesting speculation: on drives with cyclic load, does spinning down
off-shift help or hinder? I have two boxes full of WD, Seagate and
Maxtor drives, all cheap commodity drives, which have about 6.8 years
power-on time, 11-14 power cycles, and 2200-2500 spin-up cycles, due to
spinning down nights and weekends. Does anyone have a large enough
collection of similarly used drives to contribute results?
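
For anyone comparing their own drives, the figures above work out to roughly
one spin-up per day of power-on time. A quick sketch of the arithmetic,
assuming the power-on counter keeps running while the drive is spun down
(which the 11-14 power cycles suggest, but which is an assumption here):

```python
# Figures from the paragraph above; DAYS_PER_YEAR is the only added constant.
DAYS_PER_YEAR = 365.25
power_on_years = 6.8
spin_ups_low, spin_ups_high = 2200, 2500

# Assumption: power-on time includes the hours spent spun down.
power_on_days = power_on_years * DAYS_PER_YEAR


def spin_ups_per_day(count):
    """Average spin-up cycles per day of power-on time."""
    return count / power_on_days


low = spin_ups_per_day(spin_ups_low)
high = spin_ups_per_day(spin_ups_high)
print(f"power-on days:    {power_on_days:.0f}")
print(f"spin-ups per day: {low:.2f} to {high:.2f}")
```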

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

