public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, qindehua <13691222965@163.com>,
	qindehua <qindehua@163.com>, NeilBrown <neilb@suse.de>
Subject: [ 31/39] md/raid5: fix interaction of replace and recovery.
Date: Fri,  2 Aug 2013 18:18:41 +0800	[thread overview]
Message-ID: <20130802101501.180201088@linuxfoundation.org> (raw)
In-Reply-To: <20130802101456.800497005@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: NeilBrown <neilb@suse.de>

commit f94c0b6658c7edea8bc19d13be321e3860a3fa54 upstream.

If a device in a RAID4/5/6 is being replaced while another is being
recovered, then the writes to the replacement device currently don't
happen, resulting in corruption when the replacement completes and the
new drive takes over.

This is because the replacement writes are only triggered when
's.replacing' is set and not when the similar 's.sync' is set (which
is the case during resync and recovery - it means all devices need to
be read).

So schedule those writes when s.replacing is set as well.

In this case we cannot use "STRIPE_INSYNC" to record that the
replacement has happened as that is needed for recording that any
parity calculation is complete.  So introduce STRIPE_REPLACED to
record if the replacement has happened.

For safety we should also check that STRIPE_COMPUTE_RUN is not set.
This has a similar effect to the "s.locked == 0" test.  The latter
ensure that now IO has been flagged but not started.  The former
checks if any parity calculation has been flagged by not started.
We must wait for both of these to complete before triggering the
'replace'.

Add a similar test to the subsequent check for "are we finished yet".
This possibly isn't needed (is subsumed in the STRIPE_INSYNC test),
but it makes it more obvious that the REPLACE will happen before we
think we are finished.

Finally if a NeedReplace device is not UPTODATE then that is an
error.  We really must trigger a warning.

This bug was introduced in commit 9a3e1101b827a59ac9036a672f5fa8d5279d0fe2
(md/raid5:  detect and handle replacements during recovery.)
which introduced replacement for raid5.
That was in 3.3-rc3, so any stable kernel since then would benefit
from this fix.

Reported-by: qindehua <13691222965@163.com>
Tested-by: qindehua <qindehua@163.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/raid5.c |   15 ++++++++++-----
 drivers/md/raid5.h |    1 +
 2 files changed, 11 insertions(+), 5 deletions(-)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3326,6 +3326,7 @@ static void handle_stripe(struct stripe_
 	if (test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) {
 		set_bit(STRIPE_SYNCING, &sh->state);
 		clear_bit(STRIPE_INSYNC, &sh->state);
+		clear_bit(STRIPE_REPLACED, &sh->state);
 	}
 	clear_bit(STRIPE_DELAYED, &sh->state);
 
@@ -3465,19 +3466,23 @@ static void handle_stripe(struct stripe_
 			handle_parity_checks5(conf, sh, &s, disks);
 	}
 
-	if (s.replacing && s.locked == 0
-	    && !test_bit(STRIPE_INSYNC, &sh->state)) {
+	if ((s.replacing || s.syncing) && s.locked == 0
+	    && !test_bit(STRIPE_COMPUTE_RUN, &sh->state)
+	    && !test_bit(STRIPE_REPLACED, &sh->state)) {
 		/* Write out to replacement devices where possible */
 		for (i = 0; i < conf->raid_disks; i++)
-			if (test_bit(R5_UPTODATE, &sh->dev[i].flags) &&
-			    test_bit(R5_NeedReplace, &sh->dev[i].flags)) {
+			if (test_bit(R5_NeedReplace, &sh->dev[i].flags)) {
+				WARN_ON(!test_bit(R5_UPTODATE, &sh->dev[i].flags));
 				set_bit(R5_WantReplace, &sh->dev[i].flags);
 				set_bit(R5_LOCKED, &sh->dev[i].flags);
 				s.locked++;
 			}
-		set_bit(STRIPE_INSYNC, &sh->state);
+		if (s.replacing)
+			set_bit(STRIPE_INSYNC, &sh->state);
+		set_bit(STRIPE_REPLACED, &sh->state);
 	}
 	if ((s.syncing || s.replacing) && s.locked == 0 &&
+	    !test_bit(STRIPE_COMPUTE_RUN, &sh->state) &&
 	    test_bit(STRIPE_INSYNC, &sh->state)) {
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 		clear_bit(STRIPE_SYNCING, &sh->state);
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -306,6 +306,7 @@ enum {
 	STRIPE_SYNC_REQUESTED,
 	STRIPE_SYNCING,
 	STRIPE_INSYNC,
+	STRIPE_REPLACED,
 	STRIPE_PREREAD_ACTIVE,
 	STRIPE_DELAYED,
 	STRIPE_DEGRADED,



  parent reply	other threads:[~2013-08-02 10:19 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-02 10:18 [ 00/39] 3.4.56-stable review Greg Kroah-Hartman
2013-08-02 10:18 ` [ 01/39] iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow Greg Kroah-Hartman
2013-08-02 10:18 ` [ 02/39] USB: storage: Add MicroVault Flash Drive to unusual_devs Greg Kroah-Hartman
2013-08-02 10:18 ` [ 03/39] ASoC: max98088 - fix element type of the register cache Greg Kroah-Hartman
2013-08-02 10:18 ` [ 04/39] ASoC: wm8962: Remove remaining direct register cache accesses Greg Kroah-Hartman
2013-08-02 10:18 ` [ 05/39] SCSI: sd: fix crash when UA received on DIF enabled device Greg Kroah-Hartman
2013-08-02 10:18 ` [ 06/39] SCSI: qla2xxx: Properly set the tagging for commands Greg Kroah-Hartman
2013-08-02 10:18 ` [ 07/39] tracing: Fix irqs-off tag display in syscall tracing Greg Kroah-Hartman
2013-08-02 10:18 ` [ 08/39] usb: host: xhci: Enable XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0 Greg Kroah-Hartman
2013-08-02 10:18 ` [ 09/39] xhci: fix null pointer dereference on ring_doorbell_for_active_rings Greg Kroah-Hartman
2013-08-02 10:18 ` [ 10/39] xhci: Avoid NULL pointer deref when host dies Greg Kroah-Hartman
2013-08-02 10:18 ` [ 11/39] usb: dwc3: fix wrong bit mask in dwc3_event_type Greg Kroah-Hartman
2013-08-02 10:18 ` [ 12/39] usb: dwc3: gadget: dont prevent gadget from being probed if we fail Greg Kroah-Hartman
2013-08-02 10:18 ` [ 13/39] USB: ti_usb_3410_5052: fix dynamic-id matching Greg Kroah-Hartman
2013-08-02 10:18 ` [ 14/39] USB: misc: Add Manhattan Hi-Speed USB DVI Converter to sisusbvga Greg Kroah-Hartman
2013-08-02 10:18 ` [ 15/39] usb: Clear both buffers when clearing a control transfer TT buffer Greg Kroah-Hartman
2013-08-02 10:18 ` [ 16/39] staging: comedi: COMEDI_CANCEL ioctl should wake up read/write Greg Kroah-Hartman
2013-08-02 10:18 ` [ 17/39] Btrfs: fix lock leak when resuming snapshot deletion Greg Kroah-Hartman
2013-08-02 10:18 ` [ 18/39] Btrfs: re-add root to dead root list if we stop dropping it Greg Kroah-Hartman
2013-08-02 10:18 ` [ 19/39] xen/blkback: Check device permissions before allowing OP_DISCARD Greg Kroah-Hartman
2013-08-02 10:18 ` [ 20/39] ata: Fix DVD not dectected at some platform with Wellsburg PCH Greg Kroah-Hartman
2013-08-02 10:18 ` [ 21/39] libata: make it clear that sata_inic162x is experimental Greg Kroah-Hartman
2013-08-02 10:18 ` [ 22/39] powerpc/modules: Module CRC relocation fix causes perf issues Greg Kroah-Hartman
2013-08-02 10:18 ` [ 23/39] ACPI / memhotplug: Fix a stale pointer in error path Greg Kroah-Hartman
2013-08-02 10:18 ` [ 24/39] dm verity: fix inability to use a few specific devices sizes Greg Kroah-Hartman
2013-08-02 10:18 ` [ 25/39] drm/radeon: fix endian issues with DP handling (v3) Greg Kroah-Hartman
2013-08-02 10:18 ` [ 26/39] drm/radeon: fix combios tables on older cards Greg Kroah-Hartman
2013-08-02 10:18 ` [ 27/39] drm/radeon: improve dac adjust heuristics for legacy pdac Greg Kroah-Hartman
2013-08-02 10:18 ` [ 28/39] drm/radeon/atom: initialize more atom interpretor elements to 0 Greg Kroah-Hartman
2013-08-02 10:18 ` [ 29/39] USB: serial: ftdi_sio: add more RT Systems ftdi devices Greg Kroah-Hartman
2013-08-02 10:18 ` [ 30/39] livelock avoidance in sget() Greg Kroah-Hartman
2013-08-02 10:18 ` Greg Kroah-Hartman [this message]
2013-08-02 10:18 ` [ 32/39] md/raid10: remove use-after-free bug Greg Kroah-Hartman
2013-08-02 10:18 ` [ 33/39] xen/evtchn: avoid a deadlock when unbinding an event channel Greg Kroah-Hartman
2013-08-02 10:18 ` [ 34/39] firewire: fix libdc1394/FlyCap2 iso event regression Greg Kroah-Hartman
2013-08-02 10:18 ` [ 35/39] [SCSI] zfcp: status read buffers on first adapter open with link down Greg Kroah-Hartman
2013-08-02 10:18 ` [ 36/39] s390: move dummy io_remap_pfn_range() to asm/pgtable.h Greg Kroah-Hartman
2013-08-02 10:18 ` [ 37/39] virtio: support unlocked queue poll Greg Kroah-Hartman
2013-08-02 10:18 ` [ 38/39] virtio_net: fix race in RX VQ processing Greg Kroah-Hartman
2013-08-02 10:18 ` [ 39/39] mm/memory-hotplug: fix lowmem count overflow when offline pages Greg Kroah-Hartman
2013-08-02 19:58 ` [ 00/39] 3.4.56-stable review Shuah Khan
2013-08-03  2:38 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130802101501.180201088@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=13691222965@163.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=qindehua@163.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox