From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alan@lxorguk.ukuu.org.uk, "George Spelvin" <linux@horizon.com>,
NeilBrown <neilb@suse.de>
Subject: [ 54/54] md/raid10: close race that lose writes lost when replacement completes.
Date: Fri, 30 Nov 2012 10:56:25 -0800 [thread overview]
Message-ID: <20121130185213.341757158@linuxfoundation.org> (raw)
In-Reply-To: <20121130185207.894301294@linuxfoundation.org>
3.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown <neilb@suse.de>
commit e7c0c3fa29280d62aa5e11101a674bb3064bd791 upstream.
When a replacement operation completes there is a small window
when the original device is marked 'faulty' and the replacement
still looks like a replacement. The faulty should be removed and
the replacement moved in place very quickly, bit it isn't instant.
So the code write out to the array must handle the possibility that
the only working device for some slot in the replacement - but it
doesn't. If the primary device is faulty it just gives up. This
can lead to corruption.
So make the code more robust: if either the primary or the
replacement is present and working, write to them. Only when
neither are present do we give up.
This bug has been present since replacement was introduced in
3.3, so it is suitable for any -stable kernel since then.
Reported-by: "George Spelvin" <linux@horizon.com>
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/md/raid10.c | 114 ++++++++++++++++++++++++++--------------------------
1 file changed, 59 insertions(+), 55 deletions(-)
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1287,18 +1287,21 @@ retry_write:
blocked_rdev = rrdev;
break;
}
+ if (rdev && (test_bit(Faulty, &rdev->flags)
+ || test_bit(Unmerged, &rdev->flags)))
+ rdev = NULL;
if (rrdev && (test_bit(Faulty, &rrdev->flags)
|| test_bit(Unmerged, &rrdev->flags)))
rrdev = NULL;
r10_bio->devs[i].bio = NULL;
r10_bio->devs[i].repl_bio = NULL;
- if (!rdev || test_bit(Faulty, &rdev->flags) ||
- test_bit(Unmerged, &rdev->flags)) {
+
+ if (!rdev && !rrdev) {
set_bit(R10BIO_Degraded, &r10_bio->state);
continue;
}
- if (test_bit(WriteErrorSeen, &rdev->flags)) {
+ if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
sector_t first_bad;
sector_t dev_sector = r10_bio->devs[i].addr;
int bad_sectors;
@@ -1340,8 +1343,10 @@ retry_write:
max_sectors = good_sectors;
}
}
- r10_bio->devs[i].bio = bio;
- atomic_inc(&rdev->nr_pending);
+ if (rdev) {
+ r10_bio->devs[i].bio = bio;
+ atomic_inc(&rdev->nr_pending);
+ }
if (rrdev) {
r10_bio->devs[i].repl_bio = bio;
atomic_inc(&rrdev->nr_pending);
@@ -1397,58 +1402,57 @@ retry_write:
for (i = 0; i < conf->copies; i++) {
struct bio *mbio;
int d = r10_bio->devs[i].devnum;
- if (!r10_bio->devs[i].bio)
- continue;
- mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
- max_sectors);
- r10_bio->devs[i].bio = mbio;
-
- mbio->bi_sector = (r10_bio->devs[i].addr+
- choose_data_offset(r10_bio,
- conf->mirrors[d].rdev));
- mbio->bi_bdev = conf->mirrors[d].rdev->bdev;
- mbio->bi_end_io = raid10_end_write_request;
- mbio->bi_rw = WRITE | do_sync | do_fua;
- mbio->bi_private = r10_bio;
-
- atomic_inc(&r10_bio->remaining);
- spin_lock_irqsave(&conf->device_lock, flags);
- bio_list_add(&conf->pending_bio_list, mbio);
- conf->pending_count++;
- spin_unlock_irqrestore(&conf->device_lock, flags);
- if (!mddev_check_plugged(mddev))
- md_wakeup_thread(mddev->thread);
-
- if (!r10_bio->devs[i].repl_bio)
- continue;
+ if (r10_bio->devs[i].bio) {
+ struct md_rdev *rdev = conf->mirrors[d].rdev;
+ mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
+ max_sectors);
+ r10_bio->devs[i].bio = mbio;
+
+ mbio->bi_sector = (r10_bio->devs[i].addr +
+ choose_data_offset(r10_bio, rdev));
+ mbio->bi_bdev = rdev->bdev;
+ mbio->bi_end_io = raid10_end_write_request;
+ mbio->bi_rw = WRITE | do_sync | do_fua;
+ mbio->bi_private = r10_bio;
+
+ atomic_inc(&r10_bio->remaining);
+ spin_lock_irqsave(&conf->device_lock, flags);
+ bio_list_add(&conf->pending_bio_list, mbio);
+ conf->pending_count++;
+ spin_unlock_irqrestore(&conf->device_lock, flags);
+ if (!mddev_check_plugged(mddev))
+ md_wakeup_thread(mddev->thread);
+ }
- mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
- max_sectors);
- r10_bio->devs[i].repl_bio = mbio;
-
- /* We are actively writing to the original device
- * so it cannot disappear, so the replacement cannot
- * become NULL here
- */
- mbio->bi_sector = (r10_bio->devs[i].addr +
- choose_data_offset(
- r10_bio,
- conf->mirrors[d].replacement));
- mbio->bi_bdev = conf->mirrors[d].replacement->bdev;
- mbio->bi_end_io = raid10_end_write_request;
- mbio->bi_rw = WRITE | do_sync | do_fua;
- mbio->bi_private = r10_bio;
-
- atomic_inc(&r10_bio->remaining);
- spin_lock_irqsave(&conf->device_lock, flags);
- bio_list_add(&conf->pending_bio_list, mbio);
- conf->pending_count++;
- spin_unlock_irqrestore(&conf->device_lock, flags);
- if (!mddev_check_plugged(mddev))
- md_wakeup_thread(mddev->thread);
+ if (r10_bio->devs[i].repl_bio) {
+ struct md_rdev *rdev = conf->mirrors[d].replacement;
+ if (rdev == NULL) {
+ /* Replacement just got moved to main 'rdev' */
+ smp_mb();
+ rdev = conf->mirrors[d].rdev;
+ }
+ mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
+ md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
+ max_sectors);
+ r10_bio->devs[i].repl_bio = mbio;
+
+ mbio->bi_sector = (r10_bio->devs[i].addr +
+ choose_data_offset(r10_bio, rdev));
+ mbio->bi_bdev = rdev->bdev;
+ mbio->bi_end_io = raid10_end_write_request;
+ mbio->bi_rw = WRITE | do_sync | do_fua;
+ mbio->bi_private = r10_bio;
+
+ atomic_inc(&r10_bio->remaining);
+ spin_lock_irqsave(&conf->device_lock, flags);
+ bio_list_add(&conf->pending_bio_list, mbio);
+ conf->pending_count++;
+ spin_unlock_irqrestore(&conf->device_lock, flags);
+ if (!mddev_check_plugged(mddev))
+ md_wakeup_thread(mddev->thread);
+ }
}
/* Don't remove the bias on 'remaining' (one_write_done) until
next prev parent reply other threads:[~2012-11-30 18:56 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-30 18:55 [ 00/54] 3.6.9-stable review Greg Kroah-Hartman
2012-11-30 18:55 ` [ 01/54] wireless: add back sysfs directory Greg Kroah-Hartman
2012-11-30 18:55 ` [ 02/54] x86-32: Fix invalid stack address while in softirq Greg Kroah-Hartman
2012-11-30 18:55 ` [ 03/54] x86, efi: Fix processor-specific memcpy() build error Greg Kroah-Hartman
2012-11-30 18:55 ` [ 04/54] x86, microcode, AMD: Add support for family 16h processors Greg Kroah-Hartman
2012-11-30 18:55 ` [ 05/54] x86-32: Export kernel_stack_pointer() for modules Greg Kroah-Hartman
2012-11-30 18:55 ` [ 06/54] iwlwifi: dont WARN when a non empty queue is disabled Greg Kroah-Hartman
2012-11-30 18:55 ` [ 07/54] iwlwifi: fix monitor mode FCS flag Greg Kroah-Hartman
2012-11-30 18:55 ` [ 08/54] rtlwifi: rtl8192cu: Add new USB ID Greg Kroah-Hartman
2012-11-30 18:55 ` [ 09/54] mwifiex: report error to MMC core if we cannot suspend Greg Kroah-Hartman
2012-11-30 18:55 ` [ 10/54] mwifiex: fix system hang issue in cmd timeout error case Greg Kroah-Hartman
2012-11-30 18:55 ` [ 11/54] SCSI: isci: copy fis 0x34 response into proper buffer Greg Kroah-Hartman
2012-11-30 18:55 ` [ 12/54] drm/radeon: add new SI pci id Greg Kroah-Hartman
2012-11-30 18:55 ` [ 13/54] ALSA: ua101, usx2y: fix broken MIDI output Greg Kroah-Hartman
2012-11-30 18:55 ` [ 14/54] ALSA: hda - Cirrus: Correctly clear line_out_pins when moving to speaker Greg Kroah-Hartman
2012-11-30 18:55 ` [ 15/54] PARISC: fix virtual aliasing issue in get_shared_area() Greg Kroah-Hartman
2012-11-30 18:55 ` [ 16/54] PARISC: fix user-triggerable panic on parisc Greg Kroah-Hartman
2012-11-30 18:55 ` [ 17/54] mtd: slram: invalid checking of absolute end address Greg Kroah-Hartman
2012-11-30 18:55 ` [ 18/54] mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions() Greg Kroah-Hartman
2012-11-30 18:55 ` [ 19/54] jffs2: Fix lock acquisition order bug in jffs2_write_begin Greg Kroah-Hartman
2012-11-30 18:55 ` [ 20/54] md: Reassigned the parameters if read_seqretry returned true in func md_is_badblock Greg Kroah-Hartman
2012-11-30 18:55 ` [ 21/54] md: Avoid write invalid address if read_seqretry returned true Greg Kroah-Hartman
2012-11-30 18:55 ` [ 22/54] md/raid10: decrement correct pending counter when writing to replacement Greg Kroah-Hartman
2012-11-30 18:55 ` [ 23/54] block: Dont access request after it might be freed Greg Kroah-Hartman
2012-11-30 18:55 ` [ 24/54] dm: fix deadlock with request based dm and queue request_fn recursion Greg Kroah-Hartman
2012-11-30 18:55 ` [ 25/54] futex: avoid wake_futex() for a PI futex_q Greg Kroah-Hartman
2012-11-30 18:55 ` [ 26/54] mac80211: deinitialize ibss-internals after emptiness check Greg Kroah-Hartman
2012-11-30 18:55 ` [ 27/54] radeon: add AGPMode 1 quirk for RV250 Greg Kroah-Hartman
2012-11-30 18:55 ` [ 28/54] can: peak_usb: fix hwtstamp assignment Greg Kroah-Hartman
2012-11-30 18:56 ` [ 29/54] can: bcm: initialize ifindex for timeouts without previous frame reception Greg Kroah-Hartman
2012-11-30 18:56 ` [ 30/54] jbd: Fix lock ordering bug in journal_unmap_buffer() Greg Kroah-Hartman
2012-11-30 18:56 ` [ 31/54] sparc64: not any error from do_sigaltstack() should fail rt_sigreturn() Greg Kroah-Hartman
2012-11-30 18:56 ` [ 32/54] PM / QoS: fix wrong error-checking condition Greg Kroah-Hartman
2012-11-30 18:56 ` [ 33/54] writeback: put unused inodes to LRU after writeback completion Greg Kroah-Hartman
2012-11-30 18:56 ` [ 34/54] ALSA: hda - Add new codec ALC283 ALC290 support Greg Kroah-Hartman
2012-11-30 18:56 ` [ 35/54] ALSA: hda - Fix missing beep on ASUS X43U notebook Greg Kroah-Hartman
2012-11-30 18:56 ` [ 36/54] ALSA: hda - Add support for Realtek ALC292 Greg Kroah-Hartman
2012-11-30 18:56 ` [ 37/54] bas_gigaset: fix pre_reset handling Greg Kroah-Hartman
2012-11-30 18:56 ` [ 38/54] pstore/ram: Fix printk format warning Greg Kroah-Hartman
2012-11-30 18:56 ` [ 39/54] HID: add quirk for Freescale i.MX28 ROM recovery Greg Kroah-Hartman
2012-11-30 18:56 ` [ 40/54] KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461) Greg Kroah-Hartman
2012-11-30 18:56 ` [ 41/54] ixgbe: add support for X540-AT1 Greg Kroah-Hartman
2012-11-30 18:56 ` [ 42/54] sata_svw: check DMA start bit before reset Greg Kroah-Hartman
2012-11-30 18:56 ` [ 43/54] get_dvb_firmware: fix download site for tda10046 firmware Greg Kroah-Hartman
2012-11-30 18:56 ` [ 44/54] NFC: pn533: Fix use after free Greg Kroah-Hartman
2012-11-30 18:56 ` [ 45/54] NFC: Fix pn533 target mode memory leak Greg Kroah-Hartman
2012-11-30 18:56 ` [ 46/54] NFC: pn533: Fix mem leak in pn533_in_dep_link_up Greg Kroah-Hartman
2012-11-30 18:56 ` [ 47/54] NFC: Fix nfc_llcp_local chained list insertion Greg Kroah-Hartman
2012-11-30 18:56 ` [ 48/54] mm: vmscan: check for fatal signals iff the process was throttled Greg Kroah-Hartman
2012-11-30 18:56 ` [ 49/54] watchdog: using u64 in get_sample_period() Greg Kroah-Hartman
2012-11-30 18:56 ` [ 50/54] MPI: Fix compilation on MIPS with GCC 4.4 and newer Greg Kroah-Hartman
2012-11-30 18:56 ` [ 51/54] ext4: remove erroneous ext4_superblock_csum_set() in update_backups() Greg Kroah-Hartman
2012-11-30 18:56 ` [ 52/54] powerpc/eeh: Lock module while handling EEH event Greg Kroah-Hartman
2012-11-30 18:56 ` [ 53/54] mmc: sdhci-s3c: fix the wrong number of max bus clocks Greg Kroah-Hartman
2012-11-30 18:56 ` Greg Kroah-Hartman [this message]
2012-12-01 14:58 ` [ 00/54] 3.6.9-stable review Satoru Takeuchi
2012-12-01 16:50 ` Greg Kroah-Hartman
2012-12-02 2:11 ` Shuah Khan
2012-12-02 17:01 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121130185213.341757158@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@horizon.com \
--cc=neilb@suse.de \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).