From: Jason Keltz <jas@cse.yorku.ca>
To: Phil Turmel <philip@turmel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid5 drive fail during grow and no backup
Date: Sun, 09 Nov 2014 22:20:22 -0500
Message-ID: <54602EF6.9070909@cse.yorku.ca>
In-Reply-To: <545D8FBA.9090701@turmel.org>

On 07/11/2014 10:36 PM, Phil Turmel wrote:
> On 11/07/2014 11:06 AM, P. Gautschi wrote:
>> > This is a problem you haven't solved yet, I think. The raid array
>> > should have fixed this bad sector for you without kicking the drive out.
>> > The scenario is common with "green" drives and/or consumer-grade drives
>> > in general.
>> > ...
>> > Then you can set up your array to properly correct bad sectors, and
>> > set your system to look for bad sectors on a regular basis.
>>
>> What is the behavior of mdadm when a disk reports a read error?
>> - reconstruct the data, deliver it to the fs and otherwise ignore it?
>> - set the disk to fail?
>> - reconstruct the data, rewrite the failed data and continue with any
>> action?
>> - rewrite the failed data and reread it (bypassing the cache on the HD)?
>
> Option 3. Reconstruct and rewrite.
>
> However, if the device with the bad sector is trying to recover longer
> than the linux low level driver's timeout, bad things^TM happen.
> Specifically, the driver resets the SATA (or SCSI) connection and
> attempts to reconnect. During this brief time, it will not accept
> further I/O, so the write back of the reconstructed data fails. Then
> the device has experienced a *write* error, so MD fails the drive.
> This is the out-of-the-box behavior of consumer-grade drives in raid
> arrays.

Hi Phil,

Sorry to interject.

Since I'm in the midst of setting up a 22-disk RAID 10 with 2 TB WD
Black (desktop) drives, I want to be sure I understand the scenario
you describe. Should a drive enter deep error recovery, am I correct
that the worst that should happen is a hang for the users during the
recovery time and, if the driver does reset the SATA connection (as it
likely would), the potential removal of the disk from the array, but
not the destruction of the array? If I had a spare disk, it would be
used for the rebuild, and I could test the original disk and re-add it
to the pool at another time.

Any feedback would be helpful.

Thanks!

Jason.
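
For reference, the condition Phil describes (a drive still in error
recovery when the low-level driver's timeout expires) can be checked
with a few lines of Python. This is only a minimal sketch under stated
assumptions, not part of the original exchange: it assumes the driver
timeout is exposed in seconds at /sys/block/<dev>/device/timeout, and
that the drive's error recovery limit has been read separately (for
example from the output of `smartctl -l scterc`, which reports tenths
of a second). The function names read_driver_timeout and
timeout_mismatch are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_driver_timeout(dev: str, sysfs_root: str = "/sys/block") -> int:
    """Return the kernel command timeout, in seconds, for a disk such as 'sda'.

    Assumption: the disk is a SCSI/SATA device that exposes
    <sysfs_root>/<dev>/device/timeout.
    """
    return int(Path(sysfs_root, dev, "device", "timeout").read_text().strip())


def timeout_mismatch(erc_deciseconds: Optional[int], driver_timeout_s: int) -> bool:
    """Return True if the drive may still be recovering when the driver gives up.

    erc_deciseconds is the drive's error recovery limit in tenths of a
    second; None or 0 means the limit is disabled or unsupported, so the
    recovery time is treated as unbounded.
    """
    if not erc_deciseconds:
        # No bounded recovery time: the drive can outlast any driver timeout.
        return True
    return erc_deciseconds / 10.0 >= driver_timeout_s
```

With a 30-second driver timeout, timeout_mismatch(70, 30) is False (the
drive gives up after 7.0 seconds and reports the read error, so MD can
reconstruct and rewrite), while timeout_mismatch(None, 30) is True,
which is the case where the link reset and the failed write-back can
occur.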