From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
NeilBrown <neilb@suse.de>
Subject: [110/111] md/raid1: delay reads that could overtake behind-writes.
Date: Wed, 11 Aug 2010 16:55:31 -0700 [thread overview]
Message-ID: <20100811235505.712326775@clark.site> (raw)
In-Reply-To: <20100811235623.GA24440@kroah.com>
2.6.32-stable review patch. If anyone has any objections, please let us know.
------------------
From: NeilBrown <neilb@suse.de>
commit e555190d82c0f58e825e3cbd9e6ebe2e7ac713bd upstream.
When a raid1 array is configured to support write-behind
on some devices, it normally only reads from other devices.
If all devices are write-behind (because the rest have failed)
it is possible for a read request to be serviced before a
behind-write request, which would appear as data corruption.
So when forced to read from a WriteMostly device, wait for any
write-behind to complete, and don't start any more behind-writes.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/bitmap.c | 4 +++-
drivers/md/bitmap.h | 3 +++
drivers/md/raid1.c | 25 ++++++++++++++++++-------
3 files changed, 24 insertions(+), 8 deletions(-)
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1317,7 +1317,8 @@ void bitmap_endwrite(struct bitmap *bitm
{
if (!bitmap) return;
if (behind) {
- atomic_dec(&bitmap->behind_writes);
+ if (atomic_dec_and_test(&bitmap->behind_writes))
+ wake_up(&bitmap->behind_wait);
PRINTK(KERN_DEBUG "dec write-behind count %d/%d\n",
atomic_read(&bitmap->behind_writes), bitmap->max_write_behind);
}
@@ -1629,6 +1630,7 @@ int bitmap_create(mddev_t *mddev)
atomic_set(&bitmap->pending_writes, 0);
init_waitqueue_head(&bitmap->write_wait);
init_waitqueue_head(&bitmap->overflow_wait);
+ init_waitqueue_head(&bitmap->behind_wait);
bitmap->mddev = mddev;
--- a/drivers/md/bitmap.h
+++ b/drivers/md/bitmap.h
@@ -254,6 +254,9 @@ struct bitmap {
wait_queue_head_t write_wait;
wait_queue_head_t overflow_wait;
+#ifndef __GENKSYMS__
+ wait_queue_head_t behind_wait;
+#endif
};
/* the bitmap API */
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -845,6 +845,15 @@ static int make_request(struct request_q
}
mirror = conf->mirrors + rdisk;
+ if (test_bit(WriteMostly, &mirror->rdev->flags) &&
+ bitmap) {
+ /* Reading from a write-mostly device must
+ * take care not to over-take any writes
+ * that are 'behind'
+ */
+ wait_event(bitmap->behind_wait,
+ atomic_read(&bitmap->behind_writes) == 0);
+ }
r1_bio->read_disk = rdisk;
read_bio = bio_clone(bio, GFP_NOIO);
@@ -922,9 +931,13 @@ static int make_request(struct request_q
set_bit(R1BIO_Degraded, &r1_bio->state);
}
- /* do behind I/O ? */
+ /* do behind I/O ?
+ * Not if there are too many, or cannot allocate memory,
+ * or a reader on WriteMostly is waiting for behind writes
+ * to flush */
if (bitmap &&
atomic_read(&bitmap->behind_writes) < bitmap->max_write_behind &&
+ !waitqueue_active(&bitmap->behind_wait) &&
(behind_pages = alloc_behind_pages(bio)) != NULL)
set_bit(R1BIO_BehindIO, &r1_bio->state);
@@ -2105,15 +2118,13 @@ static int stop(mddev_t *mddev)
{
conf_t *conf = mddev->private;
struct bitmap *bitmap = mddev->bitmap;
- int behind_wait = 0;
/* wait for behind writes to complete */
- while (bitmap && atomic_read(&bitmap->behind_writes) > 0) {
- behind_wait++;
- printk(KERN_INFO "raid1: behind writes in progress on device %s, waiting to stop (%d)\n", mdname(mddev), behind_wait);
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_timeout(HZ); /* wait a second */
+ if (bitmap && atomic_read(&bitmap->behind_writes) > 0) {
+ printk(KERN_INFO "raid1: behind writes in progress on device %s, waiting to stop.\n", mdname(mddev));
/* need to kick something here to make sure I/O goes? */
+ wait_event(bitmap->behind_wait,
+ atomic_read(&bitmap->behind_writes) == 0);
}
raise_barrier(conf);
next prev parent reply other threads:[~2010-08-12 0:00 UTC|newest]
Thread overview: 112+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-11 23:56 [000/111] 2.6.32.19 -stable review Greg KH
2010-08-11 23:53 ` [001/111] ata_piix: fix locking around SIDPR access Greg KH
2010-08-11 23:53 ` [002/111] powerpc: fix build with make 3.82 Greg KH
2010-08-11 23:53 ` [003/111] nvram: Fix write beyond end condition; prove to gcc copy is safe Greg KH
2010-08-11 23:53 ` [004/111] x86: Add memory modify constraints to xchg() and cmpxchg() Greg KH
2010-08-11 23:53 ` [005/111] x86, vmware: Preset lpj values when on VMware Greg KH
2010-08-11 23:53 ` [006/111] Staging: line6: needs to select SND_PCM Greg KH
2010-08-11 23:53 ` [007/111] Staging: panel: Prevent double-calling of parport_release - fix oops Greg KH
2010-08-11 23:53 ` [008/111] PCI: Do not run NVidia quirks related to MSI with MSI disabled Greg KH
2010-08-11 23:53 ` [009/111] PCI: disable MSI on VIA K8M800 Greg KH
2010-08-11 23:53 ` [010/111] solos-pci: Fix race condition in tasklet RX handling Greg KH
2010-08-11 23:53 ` [011/111] splice: fix misuse of SPLICE_F_NONBLOCK Greg KH
2010-08-11 23:53 ` [012/111] drivers/video/w100fb.c: ignore void return value / fix build failure Greg KH
2010-08-11 23:53 ` [013/111] ide-cd: Do not access completed requests in the irq handler Greg KH
2010-08-11 23:53 ` [014/111] md/raid10: fix deadlock with unaligned read during resync Greg KH
2010-08-11 23:53 ` [015/111] blkdev: cgroup whitelist permission fix Greg KH
2010-08-11 23:53 ` [016/111] eCryptfs: Handle ioctl calls with unlocked and compat functions Greg KH
2010-08-11 23:53 ` [017/111] ecryptfs: release reference to lower mount if interpose fails Greg KH
2010-08-11 23:53 ` [018/111] fs/ecryptfs/file.c: introduce missing free Greg KH
2010-08-11 23:54 ` [019/111] bio, fs: update RWA_MASK, READA and SWRITE to match the corresponding BIO_RW_* bits Greg KH
2010-08-11 23:54 ` [020/111] signalfd: fill in ssi_int for posix timers and message queues Greg KH
2010-08-11 23:54 ` [021/111] smsc911x: Add spinlocks around registers access Greg KH
2010-08-11 23:54 ` [022/111] ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID Greg KH
2010-08-11 23:54 ` [023/111] ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h> Greg KH
2010-08-11 23:54 ` [024/111] USB: resizing usbmon binary interface buffer causes protection faults Greg KH
2010-08-11 23:54 ` [025/111] USB delay init quirk for logitech Harmony 700-series devices Greg KH
2010-08-11 23:54 ` [026/111] USB: serial: enabling support for Segway RMP in ftdi_sio Greg KH
2010-08-11 23:54 ` [027/111] USB: option: Huawei ETS 1220 support added Greg KH
2010-08-11 23:54 ` [028/111] USB: option: add huawei k3765 k4505 devices to work properly Greg KH
2010-08-11 23:54 ` [029/111] USB: ftdi_sio: device id for Navitator Greg KH
2010-08-11 23:54 ` [030/111] USB: cp210x: Add four new device IDs Greg KH
2010-08-11 23:54 ` [031/111] USB: usbtest: avoid to free coherent buffer in atomic context Greg KH
2010-08-11 23:54 ` [032/111] USB: fix thread-unsafe anchor utiliy routines Greg KH
2010-08-11 23:54 ` [033/111] drm/edid: Fix the HDTV hack sync adjustment Greg KH
2010-08-11 23:54 ` [034/111] Bluetooth: Added support for controller shipped with iMac i5 Greg KH
2010-08-11 23:54 ` [035/111] jfs: dont allow os2 xattr namespace overlap with others Greg KH
2010-08-11 23:54 ` [036/111] arp_notify: allow drivers to explicitly request a notification event Greg KH
2010-08-11 23:54 ` [037/111] xen: netfront: explicitly generate arp_notify event after migration Greg KH
2010-08-11 23:54 ` [038/111] net: Fix NETDEV_NOTIFY_PEERS to not conflict with NETDEV_BONDING_DESLAVE Greg KH
2010-08-11 23:54 ` [039/111] irq: Add new IRQ flag IRQF_NO_SUSPEND Greg KH
2010-08-11 23:54 ` [040/111] xen: Do not suspend IPI IRQs Greg KH
2010-08-11 23:54 ` [041/111] ext4: fix freeze deadlock under IO Greg KH
2010-08-11 23:54 ` [042/111] drm/i915: Use RSEN instead of HTPLG for tfp410 monitor detection Greg KH
2010-08-11 23:54 ` [043/111] Btrfs: Avoid superfluous tree-log writeout Greg KH
2010-08-11 23:54 ` [044/111] Btrfs: Add btrfs_duplicate_item Greg KH
2010-08-11 23:54 ` [045/111] Btrfs: Rewrite btrfs_drop_extents Greg KH
2010-08-11 23:54 ` [046/111] Btrfs: Fix disk_i_size update corner case Greg KH
2010-08-11 23:54 ` [047/111] Btrfs: Avoid orphan inodes cleanup while replaying log Greg KH
2010-08-11 23:54 ` [048/111] Btrfs: Avoid orphan inodes cleanup during committing transaction Greg KH
2010-08-11 23:54 ` [049/111] Btrfs: Make fallocate(2) more ENOSPC friendly Greg KH
2010-08-11 23:54 ` [050/111] Btrfs: Make truncate(2) " Greg KH
2010-08-11 23:54 ` [051/111] Btrfs: Pass transaction handle to security and ACL initialization functions Greg KH
2010-08-11 23:54 ` [052/111] Btrfs: Add delayed iput Greg KH
2010-08-11 23:54 ` [053/111] Btrfs: Fix btrfs_drop_extent_cache for skip pinned case Greg KH
2010-08-11 23:54 ` [054/111] Btrfs: Fix per root used space accounting Greg KH
2010-08-11 23:54 ` [055/111] Btrfs: dont add extent 0 to the free space cache v2 Greg KH
2010-08-11 23:54 ` [056/111] Btrfs: fail mount on bad mount options Greg KH
2010-08-11 23:54 ` [057/111] Btrfs: deny sys_link across subvolumes Greg KH
2010-08-11 23:54 ` [058/111] Btrfs: Show discard option in /proc/mounts Greg KH
2010-08-11 23:54 ` [059/111] Btrfs: make metadata chunks smaller Greg KH
2010-08-11 23:54 ` [060/111] Btrfs: make sure fallocate properly starts a transaction Greg KH
2010-08-11 23:54 ` [061/111] btrfs: fix missing last-entry in readdir(3) Greg KH
2010-08-11 23:54 ` [062/111] Btrfs: align offsets for btrfs_ordered_update_i_size Greg KH
2010-08-11 23:54 ` [063/111] Btrfs, fix memory leaks in error paths Greg KH
2010-08-11 23:54 ` [064/111] Btrfs: Fix race in btrfs_mark_extent_written Greg KH
2010-08-11 23:54 ` [065/111] Btrfs: fix regression in orphan cleanup Greg KH
2010-08-11 23:54 ` [066/111] Btrfs: deal with NULL acl sent to btrfs_set_acl Greg KH
2010-08-11 23:54 ` [067/111] Btrfs: fix possible panic on unmount Greg KH
2010-08-11 23:54 ` [068/111] Btrfs: Use correct values when updating inode i_size on fallocate Greg KH
2010-08-11 23:54 ` [069/111] Btrfs: fix a memory leak in btrfs_init_acl Greg KH
2010-08-11 23:54 ` [070/111] Btrfs: run orphan cleanup on default fs root Greg KH
2010-08-11 23:54 ` [071/111] Btrfs: do not mark the chunk as readonly if in degraded mode Greg KH
2010-08-11 23:54 ` [072/111] Btrfs: check return value of open_bdev_exclusive properly Greg KH
2010-08-11 23:54 ` [073/111] Btrfs: check total number of devices when removing missing Greg KH
2010-08-11 23:54 ` [074/111] Btrfs: fix race between allocate and release extent buffer Greg KH
2010-08-11 23:54 ` [075/111] Btrfs: make error return negative in btrfs_sync_file() Greg KH
2010-08-11 23:54 ` [076/111] Btrfs: remove BUG_ON() due to mounting bad filesystem Greg KH
2010-08-11 23:54 ` [077/111] Btrfs: Fix oopsen when dropping empty tree Greg KH
2010-08-11 23:54 ` [078/111] Btrfs: do not try and lookup the file extent when finishing ordered io Greg KH
2010-08-11 23:55 ` [079/111] Btrfs: apply updated fallocate i_size fix Greg KH
2010-08-11 23:55 ` [080/111] Btrfs: btrfs_mark_extent_written uses the wrong slot Greg KH
2010-08-11 23:55 ` [081/111] Btrfs: kfree correct pointer during mount option parsing Greg KH
2010-08-11 23:55 ` [082/111] nohz: Introduce arch_needs_cpu Greg KH
2010-08-11 23:55 ` [083/111] nohz: Reuse ktime in sub-functions of tick_check_idle Greg KH
2010-08-11 23:55 ` [084/111] timekeeping: Fix clock_gettime vsyscall time warp Greg KH
2010-08-11 23:55 ` [085/111] sched: Fix granularity of task_u/stime() Greg KH
2010-08-11 23:55 ` [086/111] sched, cputime: Introduce thread_group_times() Greg KH
2010-08-11 23:55 ` [087/111] mutex: Dont spin when the owner CPU is offline or other weird cases Greg KH
2010-08-11 23:55 ` [088/111] [IA64] fix SBA IOMMU to handle allocation failure properly Greg KH
2010-08-11 23:55 ` [089/111] crypto: testmgr - Fix complain about lack test for internal used algorithm Greg KH
2010-08-11 23:55 ` [090/111] memory hotplug: fix a bug on /dev/mem for 64-bit kernels Greg KH
2010-08-11 23:55 ` [091/111] x86: Fix out of order of gsi Greg KH
2010-08-11 23:55 ` [092/111] HWPOISON: remove the anonymous entry Greg KH
2010-08-11 23:55 ` [093/111] HWPOISON: abort on failed unmap Greg KH
2010-08-11 23:55 ` [094/111] powerpc/eeh: Fix a bug when pci structure is null Greg KH
2010-08-11 23:55 ` [095/111] ACPI: Fix regression where _PPC is not read at boot even when ignore_ppc=0 Greg KH
2010-08-11 23:55 ` [096/111] ext4: Make sure the MOVE_EXT ioctl cant overwrite append-only files Greg KH
2010-08-11 23:55 ` [097/111] ext4: Fix optional-arg mount options Greg KH
2010-08-11 23:55 ` [098/111] reiserfs: properly honor read-only devices Greg KH
2010-08-11 23:55 ` [099/111] reiserfs: fix oops while creating privroot with selinux enabled Greg KH
2010-08-11 23:55 ` [100/111] dlm: always use GFP_NOFS Greg KH
2010-08-11 23:55 ` [101/111] dlm: fix ordering of bast and cast Greg KH
2010-08-11 23:55 ` [102/111] dlm: send reply before bast Greg KH
2010-08-11 23:55 ` [103/111] ocfs2: Find proper end cpos for a leaf refcount block Greg KH
2010-08-11 23:55 ` [104/111] ocfs2: Set MS_POSIXACL on remount Greg KH
2010-08-11 23:55 ` [105/111] [PATCH] Skip check for mandatory locks when unlocking Greg KH
2010-08-11 23:55 ` [106/111] loop: Update mtime when writing using aops Greg KH
2010-08-11 23:55 ` [107/111] [SCSI] aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree Greg KH
2010-08-11 23:55 ` [108/111] [SCSI] ibmvfc: Fix command completion handling Greg KH
2010-08-11 23:55 ` [109/111] [SCSI] ibmvfc: Reduce error recovery timeout Greg KH
2010-08-11 23:55 ` Greg KH [this message]
2010-08-11 23:55 ` [111/111] mm: fix corruption of hibernation caused by reusing swap during image saving Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100811235505.712326775@clark.site \
--to=gregkh@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.de \
--cc=stable-review@kernel.org \
--cc=stable@kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).