From: Phil Turmel <philip@turmel.org>
To: Wilson Jonathan <piercing_male@hotmail.com>
Cc: "\"Großkreutz, Julian\"" <Julian.Grosskreutz@med.uni-jena.de>,
"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
"neilb@suse.de" <neilb@suse.de>
Subject: Re: mdadm expanded 8 disk raid 6 fails in new server, 5 original devices show no md superblock
Date: Wed, 15 Jan 2014 08:35:22 -0500
Message-ID: <52D68E9A.5030906@turmel.org>
In-Reply-To: <BLU0-SMTP2298EF7EE934F21F583ECCA98BE0@phx.gbl>
On 01/15/2014 07:50 AM, Wilson Jonathan wrote:
> On Tue, 2014-01-14 at 13:43 -0500, Phil Turmel wrote:
>> On 01/14/2014 12:47 PM, Wilson Jonathan wrote:
>>
>> [trim /]
>>
>>> I understand the issue of "timeout" on drives that might perform long
>>> error checking which then causes mdadm, via the device (block?) driver
>>> issuing a time out, to then kick the drive. In this instance you allow
>>> some time for a drive to try and fix things at the expense of a hung
>>> array for a longer period of time.
>>>
>>> I also understand that with scterc the drive gives up (in effect timing
>>> itself out) when it hits the 7 second mark, or thereabouts, and
>>> subsequently mdadm kicks the drive out. In this specific instance the
>>> idea is to kill a drive quickly so that the raid doesn't hang longer
>>> than a few seconds.
>>
>> No. The intent is to fail the read without failing the controller channel.
>
> Arrr, thanks for the clarification... I hadn't realised that instead of
> the drive returning a "Error, I can't get the data, I'm dead in the
> water" message it instead returned a "warning, I can't get the data, you
> deal with it and get back to me, I'm still working" kind of affair.

Let me emphasize one point here: while a drive is performing error
recovery, it *stops talking to the controller*. The drive isn't
replying with a warning as you suggest--it isn't replying *at all*.

Modern desktop drives try *very hard* to recover bad sectors, under the
assumption that they have the only copy of the data. Typically, they'll
work at it for two *minutes* or more.

The Linux kernel driver will give up after 30 seconds (by default) and
try to reset the drive. The drive firmware ignores the reset, possibly
multiple times, until it is done retrying the original read. When it
does finally reset, it is too late--it's been bumped from the array.
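
To make the timing concrete, here is a rough Python sketch of the race
described above. It is only a toy model, not anything the kernel or
mdadm actually runs: the Drive class, its field names, and the
read_bad_sector() function are made up for illustration, and it assumes
the sector is truly unreadable. The 30 second driver timeout, the
roughly two minute desktop retry time, and the 7 second scterc limit are
the figures mentioned in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# From the discussion: the kernel driver gives up after 30 seconds.
KERNEL_CMD_TIMEOUT_S = 30.0


@dataclass
class Drive:
    """Hypothetical model of a drive facing an unreadable sector."""
    retry_budget_s: float                # how long the firmware keeps retrying
    erc_limit_s: Optional[float] = None  # scterc cap on retries; None = no cap


def read_bad_sector(drive: Drive,
                    kernel_timeout_s: float = KERNEL_CMD_TIMEOUT_S) -> str:
    """Return what happens when the drive hits an unreadable sector."""
    # The drive says nothing at all while it retries; scterc, if set,
    # caps how long that silence can last.
    silent_s = drive.retry_budget_s
    if drive.erc_limit_s is not None:
        silent_s = min(silent_s, drive.erc_limit_s)

    if silent_s < kernel_timeout_s:
        # The drive reports the failed read before the driver loses patience.
        return "read error reported; drive stays in the array"
    # The driver times out and tries to reset a drive that is still retrying.
    return "timeout and ignored resets; drive kicked from the array"


if __name__ == "__main__":
    desktop = Drive(retry_budget_s=120.0)                  # no scterc
    with_erc = Drive(retry_budget_s=120.0, erc_limit_s=7.0)
    print("desktop :", read_bad_sector(desktop))
    print("scterc  :", read_bad_sector(with_erc))
```

The only thing that differs between the two cases is whether the drive
answers before the driver's timeout expires; the drive itself is equally
healthy in both.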

But the drive didn't really fail, leading to:

>> When you, the admin, get around to looking, the drive is idle but
>> apparently fine. (It gains a "pending" sector, which stays until the
>> drive is told to write over that spot.)
>>
>> HTH,
>
> It does, thanks for the information :-)

You are welcome.

Phil