Re: [PATCH] The md RAID10 resync thread could cause a md RAID10 array deadlock

From: Neil Brown <neilb@suse.de>
To: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
Cc: linux-raid@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [PATCH] The md RAID10 resync thread could cause a md RAID10 array deadlock
Date: Mon, 3 Mar 2008 11:12:18 +1100
Message-ID: <18379.16994.277457.632455@notabene.brown>
In-Reply-To: message from K.Tanaka on Thursday February 28

On Thursday February 28, k-tanaka@ce.jp.nec.com wrote:
> This message describes another md RAID10 issue, found by
> testing the 2.6.24 md RAID10 code with the new SCSI fault injection framework.

Thanks for this one too.

The patch looks good (except for some tiny formatting changes).
I'll forward it upstream shortly.

NeilBrown

> 
> Abstract:
> When a SCSI error disables a disk during RAID10 recovery,
> the md RAID10 resync thread can stall.
> In this case the RAID array is already broken, so the stall may not matter,
> but I think a stall is still undesirable. If it occurs, even shutdown or reboot
> will fail because the resource is busy.
> 
> The deadlock mechanism:
> The r10bio_s structure has a "remaining" member that tracks the BIOs still to be
> handled during recovery. The "remaining" counter is incremented when a BIO is built
> in sync_request() and decremented when a BIO finishes in end_sync_write().
> 
> If building a BIO fails for some reason in sync_request(), "remaining" should be
> decremented if it has already been incremented. I found a case where this decrement
> is missing. This causes an md_do_sync() deadlock: md_do_sync() waits for
> md_done_sync() to be called by end_sync_write(), but end_sync_write() never calls
> md_done_sync() because of the "remaining" counter mismatch.
> 
> For example, the problem can be reproduced in the following case:
> 
> Personalities : [raid10]
> md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
>       3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
>       [>....................]  recovery =  2.2% (45376/1959808) finish=0.7min speed=45376K/sec
> 
> In this case, sdf1 is recovering, and sdb1 and sde1 are disabled.
> An additional error that detaches sdd will cause a deadlock.
> 
> md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
>       3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
>       [=>...................]  recovery =  5.0% (99520/1959808) finish=5.9min speed=5237K/sec
> 
>  2739 ?        S<     0:17 [md0_raid10]
> 28608 ?        D<     0:00 [md0_resync]
> 28629 pts/1    Ss     0:00 bash
> 28830 pts/1    R+     0:00 ps ax
> 31819 ?        D<     0:00 [kjournald]
> 
> The resync thread appears to keep working, but it is actually deadlocked.
> 
> Patch:
> With this patch, the "remaining" counter is decremented when needed.
> 
> --- raid10.c.org	2008-01-30 01:09:04.000000000 +0900
> +++ raid10.c	2008-02-26 16:27:22.000000000 +0900
> @@ -1805,6 +1805,9 @@ static sector_t sync_request(mddev_t *md
>  				if (j == conf->copies) {
>  					/* Cannot recover, so abort the recovery */
>  					put_buf(r10_bio);
> +  				        if (rb2) {
> + 					    atomic_dec(&rb2->remaining);
> +                                        }
>  					r10_bio = rb2;
>  					if (!test_and_set_bit(MD_RECOVERY_ERR, &mddev->recovery))
>  						printk(KERN_INFO "raid10: %s: insufficient working devices for recovery.\n",
> 
> 
> This problem was also detected using the new SCSI fault injection framework.
> I have posted a new version to SourceForge, together with some sample shell scripts
> that use the framework, to make it easier to use. If you are interested, please take a look.
> 
> 
> -- 
> 
> ---------------------------------------------------------
> Kenichi TANAKA    | Open Source Software Platform Development Division
>                   | Computers Software Operations Unit, NEC Corporation
>                   | k-tanaka@ce.jp.nec.com
> 
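
For readers following the quoted analysis, the counter mismatch can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not the raid10.c code: the names model_r10bio,
model_end_sync_write() and model_recovery() are hypothetical, C11
atomics stand in for the kernel's atomic helpers, and a boolean flag
stands in for the call to md_done_sync().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for r10bio_s; only the counter is modeled. */
struct model_r10bio {
	atomic_int remaining;	/* BIOs still outstanding */
	bool sync_done;		/* stands in for md_done_sync() having run */
};

/* Models end_sync_write(): drop one count; the last drop signals completion. */
void model_end_sync_write(struct model_r10bio *rb)
{
	assert(atomic_load(&rb->remaining) > 0);
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	if (atomic_fetch_sub(&rb->remaining, 1) == 1)
		rb->sync_done = true;
}

/*
 * Models the failing path: the counter was already incremented on behalf
 * of a request that is then abandoned. Returns true if completion is
 * signalled, false if the waiter would be left hanging.
 */
bool model_recovery(bool apply_fix)
{
	struct model_r10bio rb2;

	atomic_init(&rb2.remaining, 0);
	rb2.sync_done = false;

	atomic_fetch_add(&rb2.remaining, 1);	/* one BIO really issued */
	atomic_fetch_add(&rb2.remaining, 1);	/* count taken for a second request */

	/* The second request cannot be built, so it is abandoned. */
	if (apply_fix)
		atomic_fetch_sub(&rb2.remaining, 1);	/* the decrement the patch adds */

	model_end_sync_write(&rb2);	/* the one real BIO completes */
	return rb2.sync_done;
}
```

In this model, model_recovery(false) leaves the counter at 1 after the
only real completion, so completion is never signalled;
model_recovery(true) lets the last completion bring it to 0.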
