From: Tim Small <tim@buttersideup.com>
To: Paul Clements <paul.clements@steeleye.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>, linux-raid@vger.kernel.org
Subject: Write and verify correct data to read-failed sectors before degrading array?
Date: Thu, 16 Sep 2004 11:50:55 +0100
Message-ID: <4149700F.6060509@buttersideup.com>
In-Reply-To: <41487D18.1050000@steeleye.com>

Paul Clements wrote:

> Neil Brown wrote:
>
>> On Friday September 10, paul.clements@steeleye.com wrote:
>>
>>> Neil,
>>>
>>> unless you've already done so, I believe there is a little fix 
>>> needed in the raid1 read reschedule code. As the code currently 
>>> works, a read that is retried will continue to fail and cause raid1 
>>> to go into an infinite retry loop:
>>
>>
>>
>> Thanks.  I must have noticed this when writing the raid10 module
>> because it gets it right.  Obviously I didn't "back-port" it to raid1.
>>
>> A few other fields need to be reset for safety.
>
>
> Well, it turns out that even that is not enough. Even with your patch, 
> we're still seeing ext3-fs errors, which means we're getting bogus 
> data on the read retry (the filesystem is re-created every test run, 
> so there's no chance of lingering filesystem corruption causing the 
> errors).
>
> Rather than getting down in the guts of the bio and trying to reset 
> all the fields that potentially could have been touched, I think it's 
> probably safer to simply discard the bio that had the failed I/O 
> attempted against it and clone a new bio, setting it up just as we did 
> for the original read attempt. This seems to work better and will also 
> protect us against any future changes in the bio code (or bio handling 
> in any driver sitting below raid1), which could break read retry 
> again. Patch attached.
>

Just thinking out loud here, but I wonder whether the following change 
to this code is possible, or worth making.  For a failed read where the 
block is then successfully read from another drive, attempt to write 
the correct data for this block back to the device that had the read 
failure (to see whether the drive firmware thinks this sector is still 
usable, and if not, perhaps it will reallocate the failed sector).  If 
this write succeeds and can be verified, then don't mark the sector bad 
(maybe just complain with a printk).

This would get around a lot of the mirror failures that I see in 
operation.  In the past, I've had mirrors go bad with individual failed 
sectors in different locations on both drives; the array is then 
unusable (and, in my experience, the database server is dead) unless 
you manually try to knit it back together with dd.



Thread overview: 9+ messages
2004-09-10 20:22 [BUG / PATCH] raid1: set BIO_UPTODATE after read error Paul Clements
2004-09-13  5:32 ` Neil Brown
2004-09-15 17:34   ` Paul Clements
2004-09-16 10:50     ` Tim Small [this message]
2004-09-17  0:39       ` Write and verify correct data to read-failed sectors before degrading array? Neil Brown
2004-09-17  1:41         ` Sebastian Sobolewski
2004-09-17  2:00           ` Neil Brown
2004-09-17  2:13             ` Sebastian Sobolewski
2004-09-22  0:06               ` [PATCH] " Sebastian Sobolewski