Re: Problem with disk replacement

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: qindehua <qindehua@163.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Problem with disk replacement
Date: Thu, 25 Jul 2013 16:45:31 +1000	[thread overview]
Message-ID: <20130725164531.72f316c8@notabene.brown> (raw)
In-Reply-To: <b97941b.229e0.14010b931c5.Coremail.qindehua@163.com>

[-- Attachment #1: Type: text/plain, Size: 6721 bytes --]

On Wed, 24 Jul 2013 20:48:27 +0800 (CST) qindehua <qindehua@163.com> wrote:

> Hi Neil,
> I found another problem that the raid5 thread may run into an endless loop in a rare circumstance.
> 
> I have tested with or without the patch of "md/raid5: fix interaction of 'replace' and 'recovery'",
> the problem will not happen without the patch. So it is the patch that introduce this problem.
> 
> Here is the fast steps to reproduce the problem:
> 1. create 3-drives RAID5 with --assume-clean (I use 1GB disks in VirtualBox machine)
>    mdadm -C /dev/md1 -l 5 -n 3 --assume-clean /dev/sd[b-d]
> 
> 2. change speed_limit_min and speed_limit_max to small number:
>    echo 1 > /proc/sys/dev/raid/speed_limit_min
>    echo 10 > /proc/sys/dev/raid/speed_limit_max
> 
> 3. add three spare disks to the raid:
>    mdadm /dev/md1 -a /dev/sde -a /dev/sdf -a /dev/sdg
> 
> 4. replace three raid disks with this procedure:
>    	for disk in sdb sdc sdd
> 	do
> 		mdadm /dev/md1 --replace /dev/$disk
> 		echo "idle" >  /sys/block/md1/md/sync_action
> 		sleep 3
> 	done
> 
> After step 4 the md1_raid5 process then ran into endless loop with 99% CPU usage.
> The problem will not happen if there is no step 2. Also it will not happen if
> only two disks were replaced.

Thanks a lot for the  testing!

This was missing from the previous patch:

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 1c3b279..e6e24c3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3524,6 +3524,7 @@ static void handle_stripe(struct stripe_head *sh)
 		    test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
 			set_bit(STRIPE_SYNCING, &sh->state);
 			clear_bit(STRIPE_INSYNC, &sh->state);
+			clear_bit(STRIPE_REPLACED, &sh->state);
 		}
 		spin_unlock(&sh->stripe_lock);
 	}


and that omission caused the problem you saw.

However after looking at the code more closely I'm not sure that is the only
problem.  So I won't push out a revised patch until I am sure.
Maybe I don't actually need the new "STRIPE_REPLACED" flag after all.

Thanks,
NeilBrown


> 
> Regards,
> Qin Dehua
> 
> At 2013-07-23 11:21:24,NeilBrown <neilb@suse.de> wrote:
> >On Mon, 22 Jul 2013 18:23:57 +0800 (CST) qindehua <qindehua@163.com> wrote:
> >
> >> Hi Neil,
> >> I have tested with this patch, and it fixes the problem.
> >> I have also tried various combinations of failing and replacing disks,
> >> with echo idle to /sys/block/md1/md/sync_action, they all work fine.
> >> 
> >> Regards,
> >> Qin Dehua
> >
> >Thanks for confirming, and for all of your testing!
> >
> >I'll forward the patch to Linus shortly.
> >
> >NeilBrown
> >
> >
> >
> >> 
> >> At 2013-07-22 11:04:15,NeilBrown <neilb@suse.de> wrote:
> >> >Hi Qin,
> >> > thanks for the report.  I can easily reproduce the bug.
> >> >
> >> >I think this will fix it.  Could you please test and confirm that it fixes
> >> >the problem for you too.
> >> >
> >> >Thanks,
> >> >NeilBrown
> >> >
> >> >
> >> >From: NeilBrown <neilb@suse.de>
> >> >Date: Mon, 22 Jul 2013 12:57:21 +1000
> >> >Subject: [PATCH] md/raid5: fix interaction of 'replace' and 'recovery'.
> >> >
> >> >If a device in a RAID4/5/6 is being replaced while another is being
> >> >recovered, then the writes to the replacement device currently don't
> >> >happen, resulting in corruption when the replacement completes and the
> >> >new drive takes over.
> >> >
> >> >This is because the replacement writes are only triggered when
> >> >'s.replacing' is set and not when the similar 's.sync' is set (which
> >> >is the case during resync and recovery - it means all devices need to
> >> >be read).
> >> >
> >> >So schedule those writes when s.replacing is set as well.
> >> >
> >> >In this case we cannot use "STRIPE_INSYNC" to record that the
> >> >replacement has happened as that is needed for recording that any
> >> >parity calculation is complete.  So introduce STRIPE_REPLACED to
> >> >record if the replacement has happened.
> >> >
> >> >This bug was introduced in commit 9a3e1101b827a59ac9036a672f5fa8d5279d0fe2
> >> >(md/raid5:  detect and handle replacements during recovery.)
> >> >which introduced replacement for raid5.
> >> >That was in 3.3-rc3, so any stable kernel since then would benefit
> >> >from this fix.
> >> >
> >> >Cc: stable@vger.kernel.org (3.3+)
> >> >Reported-by: qindehua <13691222965@163.com>
> >> >Signed-off-by: NeilBrown <neilb@suse.de>
> >> >
> >> >diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> >index 2bf094a..f0aa7abd 100644
> >> >--- a/drivers/md/raid5.c
> >> >+++ b/drivers/md/raid5.c
> >> >@@ -3607,8 +3607,8 @@ static void handle_stripe(struct stripe_head *sh)
> >> > 			handle_parity_checks5(conf, sh, &s, disks);
> >> > 	}
> >> > 
> >> >-	if (s.replacing && s.locked == 0
> >> >-	    && !test_bit(STRIPE_INSYNC, &sh->state)) {
> >> >+	if ((s.replacing || s.syncing) && s.locked == 0
> >> >+	    && !test_bit(STRIPE_REPLACED, &sh->state)) {
> >> > 		/* Write out to replacement devices where possible */
> >> > 		for (i = 0; i < conf->raid_disks; i++)
> >> > 			if (test_bit(R5_UPTODATE, &sh->dev[i].flags) &&
> >> >@@ -3617,7 +3617,9 @@ static void handle_stripe(struct stripe_head *sh)
> >> > 				set_bit(R5_LOCKED, &sh->dev[i].flags);
> >> > 				s.locked++;
> >> > 			}
> >> >-		set_bit(STRIPE_INSYNC, &sh->state);
> >> >+		if (s.replacing)
> >> >+			set_bit(STRIPE_INSYNC, &sh->state);
> >> >+		set_bit(STRIPE_REPLACED, &sh->state);
> >> > 	}
> >> > 	if ((s.syncing || s.replacing) && s.locked == 0 &&
> >> > 	    test_bit(STRIPE_INSYNC, &sh->state)) {
> >> >diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> >> >index b0b663b..70c4932 100644
> >> >--- a/drivers/md/raid5.h
> >> >+++ b/drivers/md/raid5.h
> >> >@@ -306,6 +306,7 @@ enum {
> >> > 	STRIPE_SYNC_REQUESTED,
> >> > 	STRIPE_SYNCING,
> >> > 	STRIPE_INSYNC,
> >> >+	STRIPE_REPLACED,
> >> > 	STRIPE_PREREAD_ACTIVE,
> >> > 	STRIPE_DELAYED,
> >> > 	STRIPE_DEGRADED,
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

next prev parent reply	other threads:[~2013-07-25  6:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <45ae92bb.4ffb.13ff2092e1d.Coremail.13691222965@163.com>
2013-07-18 16:46 ` Problem with disk replacement qindehua
2013-07-22  3:04   ` NeilBrown
2013-07-22 10:23     ` qindehua
2013-07-23  3:21       ` NeilBrown
2013-07-24 12:48         ` qindehua
2013-07-25  6:45           ` NeilBrown [this message]
2013-07-25  7:32             ` NeilBrown
2013-08-16 10:26               ` qindehua
2013-07-18 16:57 Qin Dehua

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:1c3b279 dfblob:e6e24c3 )
 OR (
bs:"Re: Problem with disk replacement" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130725164531.72f316c8@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=qindehua@163.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).