From: Bill Davidsen <davidsen@tmr.com>
To: linux-raid@vger.kernel.org
Subject: Re: raid failure question
Date: Mon, 01 Feb 2010 15:19:25 -0500
Message-ID: <4B67374D.1030007@tmr.com>
In-Reply-To: <20100111205332.GA24486@cthulhu.home.robinhill.me.uk>

Robin Hill wrote:
> On Mon Jan 11, 2010 at 11:00:40AM -0700, Tim Bock wrote:
>
>   
>> Hello,
>>
>> Excluding the obvious multi-disk or bus failures, can anyone describe
>> what type of disk failure a raid cannot detect/recover from?
>>
>> I have had two disk failures over the last three months, and in spite of
>> having a hot spare, manual intervention was required each time to make
>> the raid usable again.  I'm just not sure if I'm not setting something
>> up right, or if there is some other issue.
>>
>> Thanks for any comments or suggestions.
>>
>>     
> Any failure where the disk doesn't actually return an error (within a
> reasonable time).  For example, consumer grade disks often have very
> long retry times - this can mean the array is unusable for a long time
> until the disk eventually fails the read.
>
> If the disk actually returns an error then, AFAIK, the RAID array should
> always be able to recover from it.
>   

The problem is that the admin should be able to set a timeout after
which recovery takes place even if the drive hasn't returned a bad
status. A counter could also be kept, so that after some number of
these timeouts the drive is failed. There is no solution at present:
Neil says the timeout should be in the driver, and the driver writers
say that if it hurts md, the timeout should be in md. Everyone points
the finger at some other code and says "there."

This is not laziness or buck passing. Neil feels that md is not the
place, but putting it elsewhere causes other problems. Until someone
says "perfect is the enemy of good enough" and puts a timer where it
will solve the problem, this behavior will continue.
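
To make the idea concrete, here is a rough sketch of the policy I mean:
a per-request timeout that triggers recovery from the other members,
plus a counter that fails the drive after enough timeouts. This is not
md code and not anything Neil or the driver writers have proposed. The
names (MemberDisk, TimeoutPolicy, record_read) and the numbers used are
made up for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MemberDisk:
    """Hypothetical stand-in for one array member; not an md structure."""
    name: str
    slow_reads: int = 0   # how many requests have exceeded the timeout
    failed: bool = False  # set once the counter reaches the limit


class TimeoutPolicy:
    """Hypothetical admin-set policy: a timeout plus a strike counter."""

    def __init__(self, timeout_s: float, max_slow_reads: int) -> None:
        self.timeout_s = timeout_s
        self.max_slow_reads = max_slow_reads

    def record_read(self, disk: MemberDisk, elapsed_s: float) -> str:
        """Classify one read that took elapsed_s seconds.

        Returns "ok" (use the data), "reconstruct" (give up waiting and
        rebuild the data from the other members), or "fail" (treat the
        drive as failed so the hot spare can take over).
        """
        if disk.failed:
            return "fail"
        if elapsed_s <= self.timeout_s:
            return "ok"
        # The drive never returned a bad status; it was just too slow.
        disk.slow_reads += 1
        if disk.slow_reads >= self.max_slow_reads:
            disk.failed = True
            return "fail"
        return "reconstruct"


if __name__ == "__main__":
    # Assumed values: 7 second timeout, drive failed on the third timeout.
    policy = TimeoutPolicy(timeout_s=7.0, max_slow_reads=3)
    disk = MemberDisk("sdb")
    for latency in (0.02, 30.0, 0.01, 45.0, 60.0):
        print(latency, policy.record_read(disk, latency))
```

The counter is cumulative here for simplicity. Whether it should decay
over time, and which layer should own the timer, are exactly the open
questions above.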

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein

