Re: [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.

From: Paul Clements <paul.clements@steeleye.com>
To: NeilBrown <neilb@suse.de>
Cc: Andrew Morton <akpm@osdl.org>, linux-raid@vger.kernel.org
Subject: Re: [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.
Date: Tue, 29 Nov 2005 11:38:51 -0500
Message-ID: <438C841B.7000405@steeleye.com>
In-Reply-To: <1051127234048.14901@suse.de>

Hi Neil,

Glad to see this patch is making its way to mainline. I have a couple of 
questions on the patch, though...

NeilBrown wrote:

> +	if (uptodate || conf->working_disks <= 1) {

Is it valid to mask a read error just because we have only 1 working disk?

> +				do {
> +					rdev = conf->mirrors[d].rdev;
> +					if (rdev &&
> +					    test_bit(In_sync, &rdev->flags) &&
> +					    sync_page_io(rdev->bdev,
> +							 sect + rdev->data_offset,
> +							 s<<9,
> +							 conf->tmppage, READ))
> +						success = 1;
> +					else {
> +						d++;
> +						if (d == conf->raid_disks)
> +							d = 0;
> +					}
> +				} while (!success && d != r1_bio->read_disk);
> +
> +				if (success) {
> +					/* write it back and re-read */
> +					while (d != r1_bio->read_disk) {

Here, it looks like if the retried read succeeds on the same disk that 
just gave the read error, the loop exits with d == r1_bio->read_disk and 
we will not do any re-writes? I assume that is intentional? I guess it's 
a judgment call whether the sector is really bad at that point.
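
To make the question concrete, here is a small user-space model of the two 
loops quoted above. It is only a sketch, not kernel code: 
count_rewrite_targets() and the readable[] array are hypothetical names, 
readable[i] stands in for "sync_page_io(..., READ) succeeds on disk i", and 
the In_sync check is left out, so the count is an upper bound.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the quoted search and write-back loops.
 * Returns how many disks the write-back loop would visit, or -1 if
 * no disk could supply the data.
 */
int count_rewrite_targets(const bool *readable, int raid_disks, int read_disk)
{
	int d = read_disk;
	bool success = false;
	int targets = 0;

	assert(raid_disks > 0 && read_disk >= 0 && read_disk < raid_disks);

	/* Same search order as the quoted do/while: start at read_disk. */
	do {
		if (readable[d]) {
			success = true;
		} else {
			d++;
			if (d == raid_disks)
				d = 0;
		}
	} while (!success && d != read_disk);

	if (!success)
		return -1;

	/* Walk back from the disk that answered towards read_disk. */
	while (d != read_disk) {
		if (d == 0)
			d = raid_disks;
		d--;
		targets++;
	}
	return targets;
}
```

With three disks and read_disk == 1, a retry that succeeds on disk 1 gives 
zero targets, while a retry that only succeeds on disk 0 gives two (disks 2 
and 1).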

> +						if (d==0)
> +							d = conf->raid_disks;
> +						d--;
> +						rdev = conf->mirrors[d].rdev;
> +						if (rdev &&
> +						    test_bit(In_sync, &rdev->flags)) {
> +							if (sync_page_io(rdev->bdev,
> +									 sect + rdev->data_offset,
> +									 s<<9, conf->tmppage, WRITE) == 0 ||
> +							    sync_page_io(rdev->bdev,
> +									 sect + rdev->data_offset,
> +									 s<<9, conf->tmppage, READ) == 0) {
> +								/* Well, this device is dead */
> +								md_error(mddev, rdev);

Here, if the sync_page_io(..., READ) read-back failed, conf->tmppage 
might now hold garbage. So don't we have to quit the re-write loop at 
this point? Otherwise, aren't we potentially writing bad data over the 
remaining disks? Granted, this particular case might never actually 
happen in the real world.
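
A user-space sketch of what I mean by quitting the loop follows. It is not 
a proposed patch: struct fake_disk, fake_write(), fake_read() and 
rewrite_until_failure() are made-up names, and the failed read deliberately 
scribbles on the buffer to model the worst case, which real hardware may or 
may not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 8	/* tiny stand-in for the real page size */

struct fake_disk {
	bool write_ok;
	bool read_ok;
	unsigned char sector[PAGE_BYTES];
};

/* Returns false on failure, like sync_page_io() returning 0. */
static bool fake_write(struct fake_disk *dk, const unsigned char *buf)
{
	if (!dk->write_ok)
		return false;
	memcpy(dk->sector, buf, PAGE_BYTES);
	return true;
}

/* Assumption: a failed read leaves garbage (0xEE) in the shared buffer. */
static bool fake_read(struct fake_disk *dk, unsigned char *buf)
{
	if (!dk->read_ok) {
		memset(buf, 0xEE, PAGE_BYTES);
		return false;
	}
	memcpy(buf, dk->sector, PAGE_BYTES);
	return true;
}

/*
 * Walk back from disk d to read_disk as in the quoted loop, but stop at
 * the first failed write or read-back so a possibly clobbered buffer is
 * never written to the remaining disks. Returns the number of disks
 * successfully rewritten and verified.
 */
int rewrite_until_failure(struct fake_disk *disks, int raid_disks,
			  int d, int read_disk, unsigned char *buf)
{
	int repaired = 0;

	assert(raid_disks > 0);

	while (d != read_disk) {
		if (d == 0)
			d = raid_disks;
		d--;
		if (!fake_write(&disks[d], buf) || !fake_read(&disks[d], buf))
			break;	/* the kernel code would call md_error() here */
		repaired++;
	}
	return repaired;
}
```

In this model, if disk 2 accepts the write but fails the read-back, disk 1 
is never written, so the clobbered buffer does not spread.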

Thanks,
Paul

Thread overview: 33+ messages
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
2005-11-27 23:39 ` [PATCH md 001 of 18] Improve read speed to raid10 arrays using 'far copies' NeilBrown
2005-11-27 23:39 ` [PATCH md 002 of 18] Fix locking problem in r5/r6 NeilBrown
2005-11-27 23:39 ` [PATCH md 003 of 18] Fix problem with raid6 intent bitmap NeilBrown
2005-11-27 23:39 ` [PATCH md 004 of 18] Set default_bitmap_offset properly in set_array_info NeilBrown
2005-11-27 23:40 ` [PATCH md 005 of 18] Fix --re-add for raid1 and raid6 NeilBrown
2005-11-27 23:40 ` [PATCH md 006 of 18] Improve raid1 "IO Barrier" concept NeilBrown
2005-11-27 23:40 ` [PATCH md 007 of 18] Improve raid10 " NeilBrown
2005-11-27 23:40 ` [PATCH md 008 of 18] Small cleanups for raid5 NeilBrown
2005-11-27 23:40 ` [PATCH md 010 of 18] Move bitmap_create to after md array has been initialised NeilBrown
2005-11-27 23:40 ` [PATCH md 011 of 18] Write intent bitmap support for raid10 NeilBrown
2005-11-27 23:40 ` [PATCH md 012 of 18] Fix raid6 resync check/repair code NeilBrown
2005-11-27 23:40 ` [PATCH md 013 of 18] Improve handing of read errors with raid6 NeilBrown
2005-11-30 22:33   ` Carlos Carvalho
2005-12-01  2:54     ` Neil Brown
2005-11-27 23:40 ` [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1 NeilBrown
2005-11-29 16:38   ` Paul Clements [this message]
2005-11-29 23:21     ` Neil Brown
2005-11-27 23:40 ` [PATCH md 015 of 18] Tidyup some issues with raid1 resync and prepare for catching read errors NeilBrown
2005-11-27 23:40 ` [PATCH md 016 of 18] Better handling for read error in raid1 during resync NeilBrown
2005-11-27 23:41 ` [PATCH md 017 of 18] Handle errors when read-only NeilBrown
2005-12-10  6:41   ` Yanggun
2005-12-10  6:59     ` raid1 mysteriously switching to read-only Neil Brown
2005-12-10  7:50       ` Yanggun
2005-12-10  8:02         ` Neil Brown
2005-12-10  8:10           ` Yanggun
2005-12-10 12:10             ` Neil Brown
2005-12-11 13:04               ` Yanggun
2005-12-11 14:14                 ` Patrik Jonsson
2005-12-11 14:29                   ` Yanggun
2005-12-11 17:13                     ` Ross Vandegrift
2005-12-11 23:28                       ` Yanggun
2005-11-27 23:41 ` [PATCH md 018 of 18] Fix up some rdev rcu locking in raid5/6 NeilBrown