linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	NeilBrown <neilb@suse.de>
Subject: [110/111] md/raid1: delay reads that could overtake behind-writes.
Date: Wed, 11 Aug 2010 16:55:31 -0700	[thread overview]
Message-ID: <20100811235505.712326775@clark.site> (raw)
In-Reply-To: <20100811235623.GA24440@kroah.com>

2.6.32-stable review patch.  If anyone has any objections, please let us know.

------------------

From: NeilBrown <neilb@suse.de>

commit e555190d82c0f58e825e3cbd9e6ebe2e7ac713bd upstream.

When a raid1 array is configured to support write-behind
on some devices, it normally only reads from other devices.
If all devices are write-behind (because the rest have failed)
it is possible for a read request to be serviced before a
behind-write request, which would appear as data corruption.

So when forced to read from a WriteMostly device, wait for any
write-behind to complete, and don't start any more behind-writes.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 drivers/md/bitmap.c |    4 +++-
 drivers/md/bitmap.h |    3 +++
 drivers/md/raid1.c  |   25 ++++++++++++++++++-------
 3 files changed, 24 insertions(+), 8 deletions(-)

--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1317,7 +1317,8 @@ void bitmap_endwrite(struct bitmap *bitm
 {
 	if (!bitmap) return;
 	if (behind) {
-		atomic_dec(&bitmap->behind_writes);
+		if (atomic_dec_and_test(&bitmap->behind_writes))
+			wake_up(&bitmap->behind_wait);
 		PRINTK(KERN_DEBUG "dec write-behind count %d/%d\n",
 		  atomic_read(&bitmap->behind_writes), bitmap->max_write_behind);
 	}
@@ -1629,6 +1630,7 @@ int bitmap_create(mddev_t *mddev)
 	atomic_set(&bitmap->pending_writes, 0);
 	init_waitqueue_head(&bitmap->write_wait);
 	init_waitqueue_head(&bitmap->overflow_wait);
+	init_waitqueue_head(&bitmap->behind_wait);
 
 	bitmap->mddev = mddev;
 
--- a/drivers/md/bitmap.h
+++ b/drivers/md/bitmap.h
@@ -254,6 +254,9 @@ struct bitmap {
 	wait_queue_head_t write_wait;
 	wait_queue_head_t overflow_wait;
 
+#ifndef __GENKSYMS__
+	wait_queue_head_t behind_wait;
+#endif
 };
 
 /* the bitmap API */
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -845,6 +845,15 @@ static int make_request(struct request_q
 		}
 		mirror = conf->mirrors + rdisk;
 
+		if (test_bit(WriteMostly, &mirror->rdev->flags) &&
+		    bitmap) {
+			/* Reading from a write-mostly device must
+			 * take care not to over-take any writes
+			 * that are 'behind'
+			 */
+			wait_event(bitmap->behind_wait,
+				   atomic_read(&bitmap->behind_writes) == 0);
+		}
 		r1_bio->read_disk = rdisk;
 
 		read_bio = bio_clone(bio, GFP_NOIO);
@@ -922,9 +931,13 @@ static int make_request(struct request_q
 		set_bit(R1BIO_Degraded, &r1_bio->state);
 	}
 
-	/* do behind I/O ? */
+	/* do behind I/O ?
+	 * Not if there are too many, or cannot allocate memory,
+	 * or a reader on WriteMostly is waiting for behind writes
+	 * to flush */
 	if (bitmap &&
 	    atomic_read(&bitmap->behind_writes) < bitmap->max_write_behind &&
+	    !waitqueue_active(&bitmap->behind_wait) &&
 	    (behind_pages = alloc_behind_pages(bio)) != NULL)
 		set_bit(R1BIO_BehindIO, &r1_bio->state);
 
@@ -2105,15 +2118,13 @@ static int stop(mddev_t *mddev)
 {
 	conf_t *conf = mddev->private;
 	struct bitmap *bitmap = mddev->bitmap;
-	int behind_wait = 0;
 
 	/* wait for behind writes to complete */
-	while (bitmap && atomic_read(&bitmap->behind_writes) > 0) {
-		behind_wait++;
-		printk(KERN_INFO "raid1: behind writes in progress on device %s, waiting to stop (%d)\n", mdname(mddev), behind_wait);
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(HZ); /* wait a second */
+	if (bitmap && atomic_read(&bitmap->behind_writes) > 0) {
+		printk(KERN_INFO "raid1: behind writes in progress on device %s, waiting to stop.\n", mdname(mddev));
 		/* need to kick something here to make sure I/O goes? */
+		wait_event(bitmap->behind_wait,
+			   atomic_read(&bitmap->behind_writes) == 0);
 	}
 
 	raise_barrier(conf);



  parent reply	other threads:[~2010-08-12  0:00 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-11 23:56 [000/111] 2.6.32.19 -stable review Greg KH
2010-08-11 23:53 ` [001/111] ata_piix: fix locking around SIDPR access Greg KH
2010-08-11 23:53 ` [002/111] powerpc: fix build with make 3.82 Greg KH
2010-08-11 23:53 ` [003/111] nvram: Fix write beyond end condition; prove to gcc copy is safe Greg KH
2010-08-11 23:53 ` [004/111] x86: Add memory modify constraints to xchg() and cmpxchg() Greg KH
2010-08-11 23:53 ` [005/111] x86, vmware: Preset lpj values when on VMware Greg KH
2010-08-11 23:53 ` [006/111] Staging: line6: needs to select SND_PCM Greg KH
2010-08-11 23:53 ` [007/111] Staging: panel: Prevent double-calling of parport_release - fix oops Greg KH
2010-08-11 23:53 ` [008/111] PCI: Do not run NVidia quirks related to MSI with MSI disabled Greg KH
2010-08-11 23:53 ` [009/111] PCI: disable MSI on VIA K8M800 Greg KH
2010-08-11 23:53 ` [010/111] solos-pci: Fix race condition in tasklet RX handling Greg KH
2010-08-11 23:53 ` [011/111] splice: fix misuse of SPLICE_F_NONBLOCK Greg KH
2010-08-11 23:53 ` [012/111] drivers/video/w100fb.c: ignore void return value / fix build failure Greg KH
2010-08-11 23:53 ` [013/111] ide-cd: Do not access completed requests in the irq handler Greg KH
2010-08-11 23:53 ` [014/111] md/raid10: fix deadlock with unaligned read during resync Greg KH
2010-08-11 23:53 ` [015/111] blkdev: cgroup whitelist permission fix Greg KH
2010-08-11 23:53 ` [016/111] eCryptfs: Handle ioctl calls with unlocked and compat functions Greg KH
2010-08-11 23:53 ` [017/111] ecryptfs: release reference to lower mount if interpose fails Greg KH
2010-08-11 23:53 ` [018/111] fs/ecryptfs/file.c: introduce missing free Greg KH
2010-08-11 23:54 ` [019/111] bio, fs: update RWA_MASK, READA and SWRITE to match the corresponding BIO_RW_* bits Greg KH
2010-08-11 23:54 ` [020/111] signalfd: fill in ssi_int for posix timers and message queues Greg KH
2010-08-11 23:54 ` [021/111] smsc911x: Add spinlocks around registers access Greg KH
2010-08-11 23:54 ` [022/111] ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID Greg KH
2010-08-11 23:54 ` [023/111] ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h> Greg KH
2010-08-11 23:54 ` [024/111] USB: resizing usbmon binary interface buffer causes protection faults Greg KH
2010-08-11 23:54 ` [025/111] USB delay init quirk for logitech Harmony 700-series devices Greg KH
2010-08-11 23:54 ` [026/111] USB: serial: enabling support for Segway RMP in ftdi_sio Greg KH
2010-08-11 23:54 ` [027/111] USB: option: Huawei ETS 1220 support added Greg KH
2010-08-11 23:54 ` [028/111] USB: option: add huawei k3765 k4505 devices to work properly Greg KH
2010-08-11 23:54 ` [029/111] USB: ftdi_sio: device id for Navitator Greg KH
2010-08-11 23:54 ` [030/111] USB: cp210x: Add four new device IDs Greg KH
2010-08-11 23:54 ` [031/111] USB: usbtest: avoid to free coherent buffer in atomic context Greg KH
2010-08-11 23:54 ` [032/111] USB: fix thread-unsafe anchor utiliy routines Greg KH
2010-08-11 23:54 ` [033/111] drm/edid: Fix the HDTV hack sync adjustment Greg KH
2010-08-11 23:54 ` [034/111] Bluetooth: Added support for controller shipped with iMac i5 Greg KH
2010-08-11 23:54 ` [035/111] jfs: dont allow os2 xattr namespace overlap with others Greg KH
2010-08-11 23:54 ` [036/111] arp_notify: allow drivers to explicitly request a notification event Greg KH
2010-08-11 23:54 ` [037/111] xen: netfront: explicitly generate arp_notify event after migration Greg KH
2010-08-11 23:54 ` [038/111] net: Fix NETDEV_NOTIFY_PEERS to not conflict with NETDEV_BONDING_DESLAVE Greg KH
2010-08-11 23:54 ` [039/111] irq: Add new IRQ flag IRQF_NO_SUSPEND Greg KH
2010-08-11 23:54 ` [040/111] xen: Do not suspend IPI IRQs Greg KH
2010-08-11 23:54 ` [041/111] ext4: fix freeze deadlock under IO Greg KH
2010-08-11 23:54 ` [042/111] drm/i915: Use RSEN instead of HTPLG for tfp410 monitor detection Greg KH
2010-08-11 23:54 ` [043/111] Btrfs: Avoid superfluous tree-log writeout Greg KH
2010-08-11 23:54 ` [044/111] Btrfs: Add btrfs_duplicate_item Greg KH
2010-08-11 23:54 ` [045/111] Btrfs: Rewrite btrfs_drop_extents Greg KH
2010-08-11 23:54 ` [046/111] Btrfs: Fix disk_i_size update corner case Greg KH
2010-08-11 23:54 ` [047/111] Btrfs: Avoid orphan inodes cleanup while replaying log Greg KH
2010-08-11 23:54 ` [048/111] Btrfs: Avoid orphan inodes cleanup during committing transaction Greg KH
2010-08-11 23:54 ` [049/111] Btrfs: Make fallocate(2) more ENOSPC friendly Greg KH
2010-08-11 23:54 ` [050/111] Btrfs: Make truncate(2) " Greg KH
2010-08-11 23:54 ` [051/111] Btrfs: Pass transaction handle to security and ACL initialization functions Greg KH
2010-08-11 23:54 ` [052/111] Btrfs: Add delayed iput Greg KH
2010-08-11 23:54 ` [053/111] Btrfs: Fix btrfs_drop_extent_cache for skip pinned case Greg KH
2010-08-11 23:54 ` [054/111] Btrfs: Fix per root used space accounting Greg KH
2010-08-11 23:54 ` [055/111] Btrfs: dont add extent 0 to the free space cache v2 Greg KH
2010-08-11 23:54 ` [056/111] Btrfs: fail mount on bad mount options Greg KH
2010-08-11 23:54 ` [057/111] Btrfs: deny sys_link across subvolumes Greg KH
2010-08-11 23:54 ` [058/111] Btrfs: Show discard option in /proc/mounts Greg KH
2010-08-11 23:54 ` [059/111] Btrfs: make metadata chunks smaller Greg KH
2010-08-11 23:54 ` [060/111] Btrfs: make sure fallocate properly starts a transaction Greg KH
2010-08-11 23:54 ` [061/111] btrfs: fix missing last-entry in readdir(3) Greg KH
2010-08-11 23:54 ` [062/111] Btrfs: align offsets for btrfs_ordered_update_i_size Greg KH
2010-08-11 23:54 ` [063/111] Btrfs, fix memory leaks in error paths Greg KH
2010-08-11 23:54 ` [064/111] Btrfs: Fix race in btrfs_mark_extent_written Greg KH
2010-08-11 23:54 ` [065/111] Btrfs: fix regression in orphan cleanup Greg KH
2010-08-11 23:54 ` [066/111] Btrfs: deal with NULL acl sent to btrfs_set_acl Greg KH
2010-08-11 23:54 ` [067/111] Btrfs: fix possible panic on unmount Greg KH
2010-08-11 23:54 ` [068/111] Btrfs: Use correct values when updating inode i_size on fallocate Greg KH
2010-08-11 23:54 ` [069/111] Btrfs: fix a memory leak in btrfs_init_acl Greg KH
2010-08-11 23:54 ` [070/111] Btrfs: run orphan cleanup on default fs root Greg KH
2010-08-11 23:54 ` [071/111] Btrfs: do not mark the chunk as readonly if in degraded mode Greg KH
2010-08-11 23:54 ` [072/111] Btrfs: check return value of open_bdev_exclusive properly Greg KH
2010-08-11 23:54 ` [073/111] Btrfs: check total number of devices when removing missing Greg KH
2010-08-11 23:54 ` [074/111] Btrfs: fix race between allocate and release extent buffer Greg KH
2010-08-11 23:54 ` [075/111] Btrfs: make error return negative in btrfs_sync_file() Greg KH
2010-08-11 23:54 ` [076/111] Btrfs: remove BUG_ON() due to mounting bad filesystem Greg KH
2010-08-11 23:54 ` [077/111] Btrfs: Fix oopsen when dropping empty tree Greg KH
2010-08-11 23:54 ` [078/111] Btrfs: do not try and lookup the file extent when finishing ordered io Greg KH
2010-08-11 23:55 ` [079/111] Btrfs: apply updated fallocate i_size fix Greg KH
2010-08-11 23:55 ` [080/111] Btrfs: btrfs_mark_extent_written uses the wrong slot Greg KH
2010-08-11 23:55 ` [081/111] Btrfs: kfree correct pointer during mount option parsing Greg KH
2010-08-11 23:55 ` [082/111] nohz: Introduce arch_needs_cpu Greg KH
2010-08-11 23:55 ` [083/111] nohz: Reuse ktime in sub-functions of tick_check_idle Greg KH
2010-08-11 23:55 ` [084/111] timekeeping: Fix clock_gettime vsyscall time warp Greg KH
2010-08-11 23:55 ` [085/111] sched: Fix granularity of task_u/stime() Greg KH
2010-08-11 23:55 ` [086/111] sched, cputime: Introduce thread_group_times() Greg KH
2010-08-11 23:55 ` [087/111] mutex: Dont spin when the owner CPU is offline or other weird cases Greg KH
2010-08-11 23:55 ` [088/111] [IA64] fix SBA IOMMU to handle allocation failure properly Greg KH
2010-08-11 23:55 ` [089/111] crypto: testmgr - Fix complain about lack test for internal used algorithm Greg KH
2010-08-11 23:55 ` [090/111] memory hotplug: fix a bug on /dev/mem for 64-bit kernels Greg KH
2010-08-11 23:55 ` [091/111] x86: Fix out of order of gsi Greg KH
2010-08-11 23:55 ` [092/111] HWPOISON: remove the anonymous entry Greg KH
2010-08-11 23:55 ` [093/111] HWPOISON: abort on failed unmap Greg KH
2010-08-11 23:55 ` [094/111] powerpc/eeh: Fix a bug when pci structure is null Greg KH
2010-08-11 23:55 ` [095/111] ACPI: Fix regression where _PPC is not read at boot even when ignore_ppc=0 Greg KH
2010-08-11 23:55 ` [096/111] ext4: Make sure the MOVE_EXT ioctl cant overwrite append-only files Greg KH
2010-08-11 23:55 ` [097/111] ext4: Fix optional-arg mount options Greg KH
2010-08-11 23:55 ` [098/111] reiserfs: properly honor read-only devices Greg KH
2010-08-11 23:55 ` [099/111] reiserfs: fix oops while creating privroot with selinux enabled Greg KH
2010-08-11 23:55 ` [100/111] dlm: always use GFP_NOFS Greg KH
2010-08-11 23:55 ` [101/111] dlm: fix ordering of bast and cast Greg KH
2010-08-11 23:55 ` [102/111] dlm: send reply before bast Greg KH
2010-08-11 23:55 ` [103/111] ocfs2: Find proper end cpos for a leaf refcount block Greg KH
2010-08-11 23:55 ` [104/111] ocfs2: Set MS_POSIXACL on remount Greg KH
2010-08-11 23:55 ` [105/111] [PATCH] Skip check for mandatory locks when unlocking Greg KH
2010-08-11 23:55 ` [106/111] loop: Update mtime when writing using aops Greg KH
2010-08-11 23:55 ` [107/111] [SCSI] aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree Greg KH
2010-08-11 23:55 ` [108/111] [SCSI] ibmvfc: Fix command completion handling Greg KH
2010-08-11 23:55 ` [109/111] [SCSI] ibmvfc: Reduce error recovery timeout Greg KH
2010-08-11 23:55 ` Greg KH [this message]
2010-08-11 23:55 ` [111/111] mm: fix corruption of hibernation caused by reusing swap during image saving Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100811235505.712326775@clark.site \
    --to=gregkh@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=stable-review@kernel.org \
    --cc=stable@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).