Re: md looping on recovery of raid1 array

public inbox for linux-raid@vger.kernel.org
 help / color / mirror / Atom feed

* Re: md looping on recovery of raid1 array
@ 2008-12-22  5:49 Bin Guo
  0 siblings, 0 replies; 5+ messages in thread
From: Bin Guo @ 2008-12-22  5:49 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Ryan_MichaelS

I tried the patch with minor changes for 2.6.18 kernel with injected
read error, and it works as expected: with injected error, adding
spare disk would fail with no more repeated retries; unset the injected
error, spare disk can be added and sync'ed fine.

Thanks,
-- 
Bin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* md looping on recovery of raid1 array
@ 2008-12-15 21:01 Bin Guo
  2008-12-16  1:56 ` Neil Brown
  2008-12-18  5:34 ` Neil Brown
  0 siblings, 2 replies; 5+ messages in thread
From: Bin Guo @ 2008-12-15 21:01 UTC (permalink / raw)
  To: linux-raid; +Cc: Ryan_MichaelS

Hi,

  I had similar errors to the problem reported in

http://marc.info/?l=linux-raid&m=118385063014256&w=2

Using manually coded patch similar to scsi fault injection
tests, I can reproduce the problem:

  1. create degraded raid1 with only disk "sda1"
  2. inject permanent I/O error on a block on "sda1"
  3. try to add spare disk "sdb1" to the raid

Now raid code would loop to sync:

[  295.837203] sd 0:0:0:0: SCSI error: return code = 0x08000002
[  295.842869] sda: Current: sense key=0x3
[  295.846725]     ASC=0x11 ASCQ=0x4
[  295.850081] Info fld=0x1e240
[  295.852958] end_request: I/O error, dev sda, sector 123456
[  295.858454] raid1: sda: unrecoverable I/O read error for block 123136
[  295.864986] md: md0: sync done.
[  295.903715] RAID1 conf printout:
[  295.906939]  --- wd:1 rd:2
[  295.909649]  disk 0, wo:0, o:1, dev:sda1
[  295.913573]  disk 1, wo:1, o:1, dev:sdb1
[  295.920686] RAID1 conf printout:
[  295.923914]  --- wd:1 rd:2
[  295.926634]  disk 0, wo:0, o:1, dev:sda1
[  295.930570] RAID1 conf printout:
[  295.933815]  --- wd:1 rd:2
[  295.936518]  disk 0, wo:0, o:1, dev:sda1
[  295.940442]  disk 1, wo:1, o:1, dev:sdb1
[  295.944419] md: syncing RAID array md0
[  295.948199] md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
[  295.955262] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
[  295.965369] md: using 128k window, over a total of 71289063 blocks.

It seems to be caused by raid1.c:error() doing nothing in this fatal error
case:

       /*
         * If it is not operational, then we have already marked it as dead
         * else if it is the last working disks, ignore the error, let the
         * next level up know.
         * else mark the drive as failed
         */
        if (test_bit(In_sync, &rdev->flags)
            && conf->working_disks == 1)
                /*
                 * Don't fail the drive, act as though we were just a
                 * normal single drive
                 */
                return;

Where is the code in "next level up" handling this? I'm using ancient 2.6.18,
can someone test whether this is the case for newer kernel?

I tested by commenting out those lines, but ends up with a raid1 consisting
of "sdb1" instead of total failure.

-- 
Bin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: md looping on recovery of raid1 array
  2008-12-15 21:01 Bin Guo
@ 2008-12-16  1:56 ` Neil Brown
  2008-12-18  5:34 ` Neil Brown
  1 sibling, 0 replies; 5+ messages in thread
From: Neil Brown @ 2008-12-16  1:56 UTC (permalink / raw)
  To: Bin Guo; +Cc: linux-raid, Ryan_MichaelS

On Monday December 15, bguo@starentnetworks.com wrote:
> Hi,
> 
>   I had similar errors to the problem reported in
> 
> http://marc.info/?l=linux-raid&m=118385063014256&w=2
> 
> Using manually coded patch similar to scsi fault injection
> tests, I can reproduce the problem:
> 
>   1. create degraded raid1 with only disk "sda1"
>   2. inject permanent I/O error on a block on "sda1"
>   3. try to add spare disk "sdb1" to the raid
> 
> Now raid code would loop to sync:

Yes, I know about this.  I just haven't decided what to do about it
exactly.

Longer term I want to be able to support a bad-block log for each
device in a raid array.  Then we would simply record the bad block as
bad for each device and keep recovering the rest of the array.  And
whenever that block is read, we return EIO.

But we need a sensible response when there is a no bad-block log.
I suspect I need to flag the array as "recovery won't work" so that it
doesn't keep trying to recover.
raid1 one would set that flag in the code that you found, and
md_check_recovery would skip any recovery if it was set.
There would need to be some simple way to clear the flag too.  Maybe
any time a device is added to the array we clear the flag so we can
have another attempt at recovery....

NeilBrown


> 
> [  295.837203] sd 0:0:0:0: SCSI error: return code = 0x08000002
> [  295.842869] sda: Current: sense key=0x3
> [  295.846725]     ASC=0x11 ASCQ=0x4
> [  295.850081] Info fld=0x1e240
> [  295.852958] end_request: I/O error, dev sda, sector 123456
> [  295.858454] raid1: sda: unrecoverable I/O read error for block 123136
> [  295.864986] md: md0: sync done.
> [  295.903715] RAID1 conf printout:
> [  295.906939]  --- wd:1 rd:2
> [  295.909649]  disk 0, wo:0, o:1, dev:sda1
> [  295.913573]  disk 1, wo:1, o:1, dev:sdb1
> [  295.920686] RAID1 conf printout:
> [  295.923914]  --- wd:1 rd:2
> [  295.926634]  disk 0, wo:0, o:1, dev:sda1
> [  295.930570] RAID1 conf printout:
> [  295.933815]  --- wd:1 rd:2
> [  295.936518]  disk 0, wo:0, o:1, dev:sda1
> [  295.940442]  disk 1, wo:1, o:1, dev:sdb1
> [  295.944419] md: syncing RAID array md0
> [  295.948199] md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
> [  295.955262] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
> [  295.965369] md: using 128k window, over a total of 71289063 blocks.
> 
> It seems to be caused by raid1.c:error() doing nothing in this fatal error
> case:
> 
>        /*
>          * If it is not operational, then we have already marked it as dead
>          * else if it is the last working disks, ignore the error, let the
>          * next level up know.
>          * else mark the drive as failed
>          */
>         if (test_bit(In_sync, &rdev->flags)
>             && conf->working_disks == 1)
>                 /*
>                  * Don't fail the drive, act as though we were just a
>                  * normal single drive
>                  */
>                 return;
> 
> Where is the code in "next level up" handling this? I'm using ancient 2.6.18,
> can someone test whether this is the case for newer kernel?
> 
> I tested by commenting out those lines, but ends up with a raid1 consisting
> of "sdb1" instead of total failure.
> 
> -- 
> Bin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: md looping on recovery of raid1 array
  2008-12-15 21:01 Bin Guo
  2008-12-16  1:56 ` Neil Brown
@ 2008-12-18  5:34 ` Neil Brown
  1 sibling, 0 replies; 5+ messages in thread
From: Neil Brown @ 2008-12-18  5:34 UTC (permalink / raw)
  To: Bin Guo; +Cc: linux-raid, Ryan_MichaelS

On Monday December 15, bguo@starentnetworks.com wrote:
> Hi,
> 
>   I had similar errors to the problem reported in
> 
> http://marc.info/?l=linux-raid&m=118385063014256&w=2
> 
> Using manually coded patch similar to scsi fault injection
> tests, I can reproduce the problem:
> 
>   1. create degraded raid1 with only disk "sda1"
>   2. inject permanent I/O error on a block on "sda1"
>   3. try to add spare disk "sdb1" to the raid
> 
> Now raid code would loop to sync:

Below is the patch I expect to submit to fix this.

Thanks.

NeilBrown


From 6d9701cba935838b33a2b5df9c693c27c621c2d5 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 18 Dec 2008 12:19:14 +1100
Subject: [PATCH] md: don't retry recovery of raid1 that fails due to error on source drive.

If a raid1 has only one working drive and it has a sector which
gives an error on read, then an attempt to recover onto a spare will
fail, but as the single remaining drive is not removed from the
array, the recovery will be immediately re-attempted, resulting
in an infinite recovery loop.

So detect this situation and don't retry recovery once an error
on the lone remaining drive is detected.

Allow recovery to be retried once every time a spare is added
in case the problem wasn't actually a media error.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/md.c           |    5 ++++-
 drivers/md/raid1.c        |    8 ++++++--
 include/linux/raid/md_k.h |    3 +++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index bef91b8..03cd257 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1500,6 +1500,9 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 
 	list_add_rcu(&rdev->same_set, &mddev->disks);
 	bd_claim_by_disk(rdev->bdev, rdev->bdev->bd_holder, mddev->gendisk);
+
+	/* May as well allow recovery to be retried once */
+	mddev->recovery_disabled = 0;
 	return 0;
 
  fail:
@@ -6175,7 +6178,7 @@ static int remove_and_add_spares(mddev_t *mddev)
 			}
 		}
 
-	if (mddev->degraded && ! mddev->ro) {
+	if (mddev->degraded && ! mddev->ro && !mddev->recovery_disabled) {
 		list_for_each_entry(rdev, &mddev->disks, same_set) {
 			if (rdev->raid_disk >= 0 &&
 			    !test_bit(In_sync, &rdev->flags) &&
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index c165b1e..7b4f5f7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1016,12 +1016,16 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 	 * else mark the drive as failed
 	 */
 	if (test_bit(In_sync, &rdev->flags)
-	    && (conf->raid_disks - mddev->degraded) == 1)
+	    && (conf->raid_disks - mddev->degraded) == 1) {
 		/*
 		 * Don't fail the drive, act as though we were just a
-		 * normal single drive
+		 * normal single drive.
+		 * However don't try a recovery from this drive as
+		 * it is very likely to fail.
 		 */
+		mddev->recovery_disabled = 1;
 		return;
+	}
 	if (test_and_clear_bit(In_sync, &rdev->flags)) {
 		unsigned long flags;
 		spin_lock_irqsave(&conf->device_lock, flags);
diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h
index dac4217..9743e4d 100644
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -218,6 +218,9 @@ struct mddev_s
 #define	MD_RECOVERY_FROZEN	9
 
 	unsigned long			recovery;
+	int				recovery_disabled; /* if we detect that recovery
+							    * will always fail, set this
+							    * so we don't loop trying */
 
 	int				in_sync;	/* know to not need resync */
 	struct mutex			reconfig_mutex;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* md looping on recovery of raid1 array
@ 2007-07-07 23:21 Ryan_MichaelS
  0 siblings, 0 replies; 5+ messages in thread
From: Ryan_MichaelS @ 2007-07-07 23:21 UTC (permalink / raw)
  To: linux-raid

Md loops forever when attempting to recover a 2 disk raid 1 array:

Jul  7 16:32:35 soho user.info kernel: md: recovery of RAID array md0
Jul  7 16:32:35 soho user.info kernel: md: minimum _guaranteed_  speed:
1000 B/sec/disk.
Jul  7 16:32:35 soho user.info kernel: md: using maximum available idle
IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jul  7 16:32:35 soho user.info kernel: md: using 128k window, over a
total of 2096384 blocks.
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.alert kernel: raid1: dm-1: unrecoverable I/O
read error for block 0
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.alert kernel: raid1: dm-1: unrecoverable I/O
read error for block 128
Jul  7 16:32:35 soho user.info kernel: md: md0: recovery done.
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.alert kernel: raid1: dm-1: unrecoverable I/O
read error for block 256
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.alert kernel: raid1: dm-1: unrecoverable I/O
read error for block 384
Jul  7 16:32:35 soho user.err kernel: scsi 1:0:0:0: rejecting I/O to
dead device
Jul  7 16:32:35 soho user.warn kernel: md: super_written gets error=-5,
uptodate=0
Jul  7 16:32:35 soho user.warn kernel: RAID1 conf printout:
Jul  7 16:32:35 soho user.warn kernel:  --- wd:1 rd:2
Jul  7 16:32:35 soho user.warn kernel:  disk 0, wo:1, o:1, dev:dm-7 Jul
7 16:32:35 soho user.warn kernel:  disk 1, wo:0, o:1, dev:dm-1
Jul  7 16:32:35 soho user.warn kernel: RAID1 conf printout:
Jul  7 16:32:35 soho user.warn kernel:  --- wd:1 rd:2
Jul  7 16:32:35 soho user.warn kernel:  disk 1, wo:0, o:1, dev:dm-1
Jul  7 16:32:35 soho user.warn kernel: RAID1 conf printout:
Jul  7 16:32:35 soho user.warn kernel:  --- wd:1 rd:2
Jul  7 16:32:35 soho user.warn kernel:  disk 0, wo:1, o:1, dev:dm-7
Jul  7 16:32:35 soho user.warn kernel:  disk 1, wo:0, o:1, dev:dm-1
Jul  7 16:32:35 soho user.info kernel: md: recovery of RAID array md0
...

This occurs after hotswap removing both drives of the raid1 array and
reinserting them.  The kernel version is 2.6.19.  Is anyone familiar
with this scenario?  Can anyone shed any light on what's happening here?

Thanks.
- Michael

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-12-22  5:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-22  5:49 md looping on recovery of raid1 array Bin Guo
  -- strict thread matches above, loose matches on Subject: below --
2008-12-15 21:01 Bin Guo
2008-12-16  1:56 ` Neil Brown
2008-12-18  5:34 ` Neil Brown
2007-07-07 23:21 Ryan_MichaelS

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox