[PATCH] The md RAID10 resync thread could cause an md RAID10 array deadlock
From: K.Tanaka @ 2008-02-28 2:55 UTC
To: linux-raid; +Cc: linux-scsi
This message describes another md RAID10 issue, found by testing the
2.6.24 md RAID10 code with the new scsi fault injection framework.
Abstract:
When a scsi error results in a disk being disabled during RAID10 recovery,
the md RAID10 resync thread can stall.
In this case the raid array is already broken, so it may not matter much,
but a stall is still undesirable: once it occurs, even shutdown or reboot
fails because the resources stay busy.
The deadlock mechanism:
The r10bio_s structure has a "remaining" member that keeps track of the BIOs still
to be handled during recovery. The "remaining" counter is incremented when a BIO is
built in sync_request() and decremented when a BIO is finished in end_sync_write().
If building a BIO fails in sync_request() for some reason, "remaining" must be
decremented again if it has already been incremented. I found a case where this
decrement is forgotten. This deadlocks md_do_sync(): md_do_sync() waits for
md_done_sync() to be called from end_sync_write(), but end_sync_write() never calls
md_done_sync() because of the "remaining" counter mismatch.
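
To make the counting pattern easier to see, here is a minimal user-space sketch,
not the kernel code itself: the structure name, the *_sketch helpers and the
apply_fix flag are made up for illustration. One reference is taken per BIO the
recovery path intends to build, the completion path drops it, and a failed build
that does not drop its own reference leaves the counter stuck above zero, which is
exactly what keeps md_do_sync() waiting for md_done_sync() forever.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the r10bio_s "remaining" counter. */
struct r10bio_sketch {
	atomic_int remaining;	/* BIOs still outstanding for this resync unit */
};

/* Stand-in for end_sync_write(): finishing the last BIO is what would
 * let md_done_sync() run and wake up the waiting md_do_sync(). */
static void end_sync_write_sketch(struct r10bio_sketch *r10_bio)
{
	if (atomic_fetch_sub(&r10_bio->remaining, 1) == 1)
		printf("  last BIO finished -> md_done_sync() would be called\n");
}

/* Stand-in for the recovery path of sync_request(). */
static void sync_request_sketch(struct r10bio_sketch *r10_bio,
				bool build_fails, bool apply_fix)
{
	atomic_fetch_add(&r10_bio->remaining, 1);	/* about to build a BIO */

	if (build_fails) {
		/* No working copy to read from: the BIO is never issued.
		 * Without the fix the increment above is simply leaked. */
		if (apply_fix)
			atomic_fetch_sub(&r10_bio->remaining, 1);
		return;
	}

	/* Normal completion path. */
	end_sync_write_sketch(r10_bio);
}

int main(void)
{
	struct r10bio_sketch bio = { .remaining = 0 };

	/* Buggy error path: "remaining" never returns to 0, so the waiter
	 * (md_do_sync) would block forever. */
	sync_request_sketch(&bio, true, false);
	printf("without fix: remaining = %d (never reaches 0 -> stall)\n",
	       atomic_load(&bio.remaining));

	/* Fixed error path: the failed build undoes its own increment. */
	atomic_store(&bio.remaining, 0);
	sync_request_sketch(&bio, true, true);
	printf("with fix:    remaining = %d (waiter can make progress)\n",
	       atomic_load(&bio.remaining));
	return 0;
}

Running this prints a leaked count of 1 for the buggy path and 0 for the fixed one.
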
For example, the problem can be reproduced in the following case:
Personalities : [raid10]
md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
[>....................] recovery = 2.2% (45376/1959808) finish=0.7min speed=45376K/sec
In this case, sdf1 is recovering while sdb1 and sde1 have already been disabled.
An additional error that detaches sdd1 then causes the deadlock:
md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
[=>...................] recovery = 5.0% (99520/1959808) finish=5.9min speed=5237K/sec
2739 ? S< 0:17 [md0_raid10]
28608 ? D< 0:00 [md0_resync]
28629 pts/1 Ss 0:00 bash
28830 pts/1 R+ 0:00 ps ax
31819 ? D< 0:00 [kjournald]
The resync thread still appears in the process list, but it is actually deadlocked.
Patch:
With this patch, the remaining counter is decremented on the failure path when needed.
--- raid10.c.org	2008-01-30 01:09:04.000000000 +0900
+++ raid10.c	2008-02-26 16:27:22.000000000 +0900
@@ -1805,6 +1805,9 @@ static sector_t sync_request(mddev_t *md
 				if (j == conf->copies) {
 					/* Cannot recover, so abort the recovery */
 					put_buf(r10_bio);
+					if (rb2) {
+						atomic_dec(&rb2->remaining);
+					}
 					r10_bio = rb2;
 					if (!test_and_set_bit(MD_RECOVERY_ERR, &mddev->recovery))
 						printk(KERN_INFO "raid10: %s: insufficient working devices for recovery.\n",
This problem was also detected using the new scsi fault injection framework.
I have posted a new version of the framework to sourceforge, together with some
sample shell scripts that make it easier to use. If you are interested, please take a look.
--
---------------------------------------------------------
Kenichi TANAKA | Open Source Software Platform Development Division
| Computers Software Operations Unit, NEC Corporation
| k-tanaka@ce.jp.nec.com
Re: [PATCH] The md RAID10 resync thread could cause an md RAID10 array deadlock
From: Neil Brown @ 2008-03-03 0:12 UTC
To: K.Tanaka; +Cc: linux-raid, linux-scsi
On Thursday February 28, k-tanaka@ce.jp.nec.com wrote:
> This message describes another md RAID10 issue, found by testing the
> 2.6.24 md RAID10 code with the new scsi fault injection framework.
Thanks for this one too.
The patch looks good (except for some tiny formatting changes).
I'll forward it upstream shortly.
NeilBrown
>
> Abstract:
> When a scsi error results in a disk being disabled during RAID10 recovery,
> the md RAID10 resync thread can stall.
> In this case the raid array is already broken, so it may not matter much,
> but a stall is still undesirable: once it occurs, even shutdown or reboot
> fails because the resources stay busy.
>
> The deadlock mechanism:
> The r10bio_s structure has a "remaining" member that keeps track of the BIOs still
> to be handled during recovery. The "remaining" counter is incremented when a BIO is
> built in sync_request() and decremented when a BIO is finished in end_sync_write().
>
> If building a BIO fails in sync_request() for some reason, "remaining" must be
> decremented again if it has already been incremented. I found a case where this
> decrement is forgotten. This deadlocks md_do_sync(): md_do_sync() waits for
> md_done_sync() to be called from end_sync_write(), but end_sync_write() never calls
> md_done_sync() because of the "remaining" counter mismatch.
>
> For example, the problem can be reproduced in the following case:
>
> Personalities : [raid10]
> md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
> 3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
> [>....................] recovery = 2.2% (45376/1959808) finish=0.7min speed=45376K/sec
>
> In this case, sdf1 is recovering while sdb1 and sde1 have already been disabled.
> An additional error that detaches sdd1 then causes the deadlock:
>
> md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
> 3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
> [=>...................] recovery = 5.0% (99520/1959808) finish=5.9min speed=5237K/sec
>
> 2739 ? S< 0:17 [md0_raid10]
> 28608 ? D< 0:00 [md0_resync]
> 28629 pts/1 Ss 0:00 bash
> 28830 pts/1 R+ 0:00 ps ax
> 31819 ? D< 0:00 [kjournald]
>
> The resync thread still appears in the process list, but it is actually deadlocked.
>
> Patch:
> With this patch, the remaining counter is decremented on the failure path when needed.
>
> --- raid10.c.org	2008-01-30 01:09:04.000000000 +0900
> +++ raid10.c	2008-02-26 16:27:22.000000000 +0900
> @@ -1805,6 +1805,9 @@ static sector_t sync_request(mddev_t *md
>  				if (j == conf->copies) {
>  					/* Cannot recover, so abort the recovery */
>  					put_buf(r10_bio);
> +					if (rb2) {
> +						atomic_dec(&rb2->remaining);
> +					}
>  					r10_bio = rb2;
>  					if (!test_and_set_bit(MD_RECOVERY_ERR, &mddev->recovery))
>  						printk(KERN_INFO "raid10: %s: insufficient working devices for recovery.\n",
>
>
> This problem was also detected using the new scsi fault injection framework.
> I have posted a new version of the framework to sourceforge, together with some
> sample shell scripts that make it easier to use. If you are interested, please take a look.
>
>
> --
>
> ---------------------------------------------------------
> Kenichi TANAKA | Open Source Software Platform Development Division
> | Computers Software Operations Unit, NEC Corporation
> | k-tanaka@ce.jp.nec.com
>