Re: RAID timeout parameter accessibility request

From: Robert Hancock <hancockr@shaw.ca>
To: Jose de la Mancha <hidalgoj@free.fr>
Cc: linux-kernel@vger.kernel.org
Subject: Re: RAID timeout parameter accessibility request
Date: Sun, 30 Dec 2007 17:10:38 -0600
Message-ID: <4778256E.5000309@shaw.ca>
In-Reply-To: <fa.nf+P3+JC0dd6/v0bUA0T+jgXpts@ifi.uio.no>

Jose de la Mancha wrote:
> Hi everyone. I'm sorry but I'm not currently subscribed to this list (I've
> been sent here by the listmaster), so please CC me all your
> answers/comments. Thanks in advance.
> 
> SHORT QUESTION :
> In a Debian-controlled RAID array, is there a parameter that handles the
> timeout before a non-responding drive is dropped from the array ? Can this
> timeout become user-adjustable in a future build ?
> 
> EXPLANATIONS :
> As you might know, if you install and use a "desktop edition" hard drive in
> a RAID array, the drive may not work correctly. This is caused by the normal
> error recovery procedure that a desktop edition hard drive uses : when an
> error is found on a desktop edition hard drive, the drive will enter into a
> deep recovery cycle to attempt to repair the error, recover the data from
> the problematic area, and then reallocate a dedicated area to replace the
> problematic area. This process can take up to 120 seconds depending on the
> severity of the issue.
> 
> The problem is that most RAID controllers allow a very short amount of time
> (7-15 seconds) for a hard drive to recover from an error. If a hard drive
> takes too long to complete this process, the drive will be dropped from the
> RAID array !

This always seemed a strange use case to me. If the drive is getting 
read errors, either it's dying and needs to be replaced, or it has a 
sporadic bad sector as a result of a power failure during write, etc. in 
which case the drive should be resynchronized. In either case the drive 
should be dropped from the array and require manual intervention. It 
doesn't seem logical to me to just read the data from another drive and 
carry on in our merry way without any warning.

> 
> Of course there are "RAID edition" hard drives with a feature called TLER
> (Time Limited Error Recovery) which stops the hard drive from entering into
> a deep recovery cycle. The hard drive will only spend 7 seconds to attempt
> to recover. This means that the hard drive will not be dropped from a RAID
> array. But these "special" hard drives are way too expensive IMHO just for a
> small firmware-based feature.
> 
> There would be an easy way to allow users to use "ordinary" hard drives in a
> Debian software-controlled RAID array. So here's my request : I suppose
> there is a parameter that handles the default timeout before a drive is
> dropped from the RAID array. I don't know if this parameter is hardcoded,
> but it would be nice if it was user-adjustable. This way, we could simply
> set up this parameter to 120 seconds or more (instead of 7-15) and we
> wouldn't have any more problems with using "desktop edition" hard drives in
> a RAID array.
> 
> What do you think ? Can it be done in a future build ?
> 
> I really hope that you'll be able to help, because I guess a lot of people
> can be concerned by this issue.
> 
> Many thanks in advance & Best regards.

I don't know the md internals very well, but I wouldn't imagine there's 
a timeout in its code. The timeout would be based on the block layer and 
driver timeouts for the constituent devices. For libata disks, the 
timeout is normally 30 seconds. After that expires, the disk will get a 
soft or hard reset and the command is typically retried by the block 
layer. If all retries fail, the upper layers will get a failure report, 
and I believe at that point the md layer decides to disable the device.
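
To make the per-device timeout above concrete, here is a rough sketch of 
how it could be inspected or raised from user space. It assumes the disk 
sits behind the SCSI layer (as libata disks do) and that the kernel 
exposes the command timeout, in seconds, at 
/sys/block/<disk>/device/timeout. The function names and the sysfs_root 
parameter are made up for illustration, and writing the value needs root. 
This only changes how long a command may run before the reset and retry 
handling starts; it is not an md setting.

```python
from pathlib import Path
from typing import Optional


def read_command_timeout(disk: str, sysfs_root: str = "/sys/block") -> Optional[int]:
    """Return the command timeout in seconds for `disk` (e.g. "sda").

    Returns None if the attribute is missing or unreadable, for example
    when the device is not handled by the SCSI layer.
    """
    path = Path(sysfs_root) / disk / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def set_command_timeout(disk: str, seconds: int, sysfs_root: str = "/sys/block") -> None:
    """Write a new command timeout in seconds (needs root on a real system)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    path = Path(sysfs_root) / disk / "device" / "timeout"
    path.write_text(f"{seconds}\n")


if __name__ == "__main__":
    # "sda" is only an example device name; substitute an array member.
    print("sda command timeout:", read_command_timeout("sda"))
```

For the desktop-drive case described in the original request, something 
like set_command_timeout("sda", 120) on each array member would be the 
experiment to try, with no guarantee that it is enough on its own.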

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
