linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: qindehua <13691222965@163.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Problem with disk replacement
Date: Mon, 22 Jul 2013 13:04:15 +1000	[thread overview]
Message-ID: <20130722130415.44b85e93@notabene.brown> (raw)
In-Reply-To: <7945ee0b.7d5e.13ff2acdc3f.Coremail.13691222965@163.com>

[-- Attachment #1: Type: text/plain, Size: 3822 bytes --]

On Fri, 19 Jul 2013 00:46:22 +0800 (CST) qindehua <13691222965@163.com> wrote:

> Hi Neil,
> I use kernel 3.9.6 and found data may lost while replacing a RAID disk using --replace.
> 
> Here is the steps of the test case:
> 1. Create 3-drives RAID5 and wait for resync done (I use 1GB disks in VirtualBox machine)
> 2. Calculate the md5sum on the whole raid: dd if=/dev/md1 bs=1M iflag=direct | md5sum
> 3. add two spare disks to the raid: mdadm /dev/md1 -a /dev/sdi -a /dev/sdj
> 4. mdadm --replace a disk then --fail another disk:
>    mdadm /dev/md1 --replace /dev/sdb; sleep 3; mdadm /dev/md1 -f /dev/sdc
> 5. wait for recovery done
> 6. recalculate the md5sum on the whole raid: dd if=/dev/md1 bs=1M iflag=direct | md5sum
> 
> The md5sum of step 6 is NOT identical with that of step 2, this means data is corrupted.
> I found in step 4, when another disk was made failed, the replacing process was stopped,
> and started recovering of the failed disk, and afterwards the replacing not continue.

Hi Qin,
 thanks for the report.  I can easily reproduce the bug.

I think this will fix it.  Could you please test and confirm that it fixes
the problem for you too.

Thanks,
NeilBrown


From: NeilBrown <neilb@suse.de>
Date: Mon, 22 Jul 2013 12:57:21 +1000
Subject: [PATCH] md/raid5: fix interaction of 'replace' and 'recovery'.

If a device in a RAID4/5/6 is being replaced while another is being
recovered, then the writes to the replacement device currently don't
happen, resulting in corruption when the replacement completes and the
new drive takes over.

This is because the replacement writes are only triggered when
's.replacing' is set and not when the similar 's.sync' is set (which
is the case during resync and recovery - it means all devices need to
be read).

So schedule those writes when s.replacing is set as well.

In this case we cannot use "STRIPE_INSYNC" to record that the
replacement has happened as that is needed for recording that any
parity calculation is complete.  So introduce STRIPE_REPLACED to
record if the replacement has happened.

This bug was introduced in commit 9a3e1101b827a59ac9036a672f5fa8d5279d0fe2
(md/raid5:  detect and handle replacements during recovery.)
which introduced replacement for raid5.
That was in 3.3-rc3, so any stable kernel since then would benefit
from this fix.

Cc: stable@vger.kernel.org (3.3+)
Reported-by: qindehua <13691222965@163.com>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2bf094a..f0aa7abd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3607,8 +3607,8 @@ static void handle_stripe(struct stripe_head *sh)
 			handle_parity_checks5(conf, sh, &s, disks);
 	}
 
-	if (s.replacing && s.locked == 0
-	    && !test_bit(STRIPE_INSYNC, &sh->state)) {
+	if ((s.replacing || s.syncing) && s.locked == 0
+	    && !test_bit(STRIPE_REPLACED, &sh->state)) {
 		/* Write out to replacement devices where possible */
 		for (i = 0; i < conf->raid_disks; i++)
 			if (test_bit(R5_UPTODATE, &sh->dev[i].flags) &&
@@ -3617,7 +3617,9 @@ static void handle_stripe(struct stripe_head *sh)
 				set_bit(R5_LOCKED, &sh->dev[i].flags);
 				s.locked++;
 			}
-		set_bit(STRIPE_INSYNC, &sh->state);
+		if (s.replacing)
+			set_bit(STRIPE_INSYNC, &sh->state);
+		set_bit(STRIPE_REPLACED, &sh->state);
 	}
 	if ((s.syncing || s.replacing) && s.locked == 0 &&
 	    test_bit(STRIPE_INSYNC, &sh->state)) {
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index b0b663b..70c4932 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -306,6 +306,7 @@ enum {
 	STRIPE_SYNC_REQUESTED,
 	STRIPE_SYNCING,
 	STRIPE_INSYNC,
+	STRIPE_REPLACED,
 	STRIPE_PREREAD_ACTIVE,
 	STRIPE_DELAYED,
 	STRIPE_DEGRADED,

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2013-07-22  3:04 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <45ae92bb.4ffb.13ff2092e1d.Coremail.13691222965@163.com>
2013-07-18 16:46 ` Problem with disk replacement qindehua
2013-07-22  3:04   ` NeilBrown [this message]
2013-07-22 10:23     ` qindehua
2013-07-23  3:21       ` NeilBrown
2013-07-24 12:48         ` qindehua
2013-07-25  6:45           ` NeilBrown
2013-07-25  7:32             ` NeilBrown
2013-08-16 10:26               ` qindehua
2013-07-18 16:57 Qin Dehua

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130722130415.44b85e93@notabene.brown \
    --to=neilb@suse.de \
    --cc=13691222965@163.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).