public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
To: Stefan Bader <stefan.bader@canonical.com>,
	Greg KH <gregkh@suse.de>, Alasdair G Kergon <agk@redhat.com>
Cc: linux-kernel@vger.kernel.org, stable@kernel.org,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	stable-review@kernel.org, alan@lxorguk.ukuu.org.uk,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Subject: Re: [Stable-review] [93/93] dm mpath: fix stall when requeueing io
Date: Mon, 22 Feb 2010 19:16:34 +0900	[thread overview]
Message-ID: <4B825982.3060606@ct.jp.nec.com> (raw)
In-Reply-To: <4B815A3D.9040402@canonical.com>

Hi Alasdair, Greg,

Please replace this patch with the patch attached below.
This patch seems to have been broken somewhere, since the original
patch (*) for 2.6.33-rc6 cosmetically depends on another patch included
in 2.6.33-rc1.
  (*) http://marc.info/?l=dm-devel&m=126518144727377&w=2

Stefan, thank you for spotting this.

BTW, as for your spin lock question, the spin_lock_irq could be spin_lock.
Please see my answer below for details.

On 02/22/2010 01:07 AM +0900, Stefan Bader wrote:
>> @@ -1568,12 +1575,16 @@ static void dm_request_fn(struct request
>>  
>>  		blk_start_request(rq);
>>  		spin_unlock(q->queue_lock);
>> -		map_request(ti, rq, md);
>> +		if (map_request(ti, rq, md))
>> +			goto requeued;
>>  		spin_lock_irq(q->queue_lock);
>>  	}
> 
> That is the current state of dm_request_function:
> 
>                 clone = rq->special;
>                 atomic_inc(&md->pending[rq_data_dir(clone)]);
> 
>                 spin_unlock(q->queue_lock);
>                 if (map_request(ti, clone, md))
> 
> While looking over the code I also noticed that the spinlock is dropped with
> spin_unlock and then reacquired with spin_lock_irq. Isn't the irq version too
> much in that case? Or was the intention to have interrupts enabled when unlocking?

In the current device-mapper code, I would like to go with
spin_unlock/lock here.
However, there was a case to enable irq in map_requst() for request
allocation, and this spin_lock_irq was a work-around for the case.
Now, there is no such case in the device-mapper code, so spin_lock should
be enough here.  But I'm still using spin_lock_irq for safeness, since
there might be some more cases to enable irq during request submission
to underlying devices.
I'll remove the _irq in the future after lots of testings.


REVISED PATCH:
This patch fixes the problem that system may stall if target's ->map_rq
returns DM_MAPIO_REQUEUE in map_request().
E.g. stall happens on 1 CPU box when a dm-mpath device with queue_if_no_path
     bounces between all-paths-down and paths-up on I/O load.

When target's ->map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
the request and returns to dm_request_fn().  Then, dm_request_fn()
doesn't exit the I/O dispatching loop and continues processing
the requeued request again.
This map and requeue loop can be done with interrupt disabled,
so 1 CPU system can be stalled if this situation happens.

For example, commands below can stall my 1 CPU box within 1 minute or so:
  # dmsetup table mp
  mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
  # while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &
  # while true; do \
  > dmsetup message mp 0 "fail_path 8:144" \
  > dmsetup suspend --noflush mp \
  > dmsetup resume mp \
  > dmsetup message mp 0 "reinstate_path 8:144" \
  > done

To fix the problem above, this patch changes dm_request_fn() to exit
the I/O dispatching loop once if a request is requeued in map_request().

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/dm.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

Index: 2.6.32.8/drivers/md/dm.c
===================================================================
--- 2.6.32.8.orig/drivers/md/dm.c
+++ 2.6.32.8/drivers/md/dm.c
@@ -1487,10 +1487,15 @@ static int dm_prep_fn(struct request_que
 	return BLKPREP_OK;
 }
 
-static void map_request(struct dm_target *ti, struct request *rq,
-			struct mapped_device *md)
+/*
+ * Returns:
+ * 0  : the request has been processed (not requeued)
+ * !0 : the request has been requeued
+ */
+static int map_request(struct dm_target *ti, struct request *rq,
+		       struct mapped_device *md)
 {
-	int r;
+	int r, requeued = 0;
 	struct request *clone = rq->special;
 	struct dm_rq_target_io *tio = clone->end_io_data;
 
@@ -1516,6 +1521,7 @@ static void map_request(struct dm_target
 	case DM_MAPIO_REQUEUE:
 		/* The target wants to requeue the I/O */
 		dm_requeue_unmapped_request(clone);
+		requeued = 1;
 		break;
 	default:
 		if (r > 0) {
@@ -1527,6 +1533,8 @@ static void map_request(struct dm_target
 		dm_kill_unmapped_request(clone, r);
 		break;
 	}
+
+	return requeued;
 }
 
 /*
@@ -1568,12 +1576,17 @@ static void dm_request_fn(struct request
 
 		blk_start_request(rq);
 		spin_unlock(q->queue_lock);
-		map_request(ti, rq, md);
+		if (map_request(ti, rq, md))
+			goto requeued;
+
 		spin_lock_irq(q->queue_lock);
 	}
 
 	goto out;
 
+requeued:
+	spin_lock_irq(q->queue_lock);
+
 plug_and_out:
 	if (!elv_queue_empty(q))
 		/* Some requests still remain, retry later */

  reply	other threads:[~2010-02-22 10:26 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-19 16:57 [00/93] 2.6.32.9-stable review Greg KH
2010-02-19 16:28 ` [01/93] Fix potential crash with sys_move_pages Greg KH
2010-02-19 16:28 ` [02/93] futex_lock_pi() key refcnt fix Greg KH
2010-02-19 16:28 ` [03/93] futex: Handle user space corruption gracefully Greg KH
2010-02-19 16:28 ` [04/93] futex: Handle futex value " Greg KH
2010-02-19 16:28 ` [05/93] Fix race in tty_fasync() properly Greg KH
2010-02-19 16:28 ` [06/93] hwmon: (w83781d) Request I/O ports individually for probing Greg KH
2010-02-19 16:29 ` [07/93] hwmon: (lm78) " Greg KH
2010-02-19 16:29 ` [08/93] hwmon: (adt7462) Wrong ADT7462_VOLT_COUNT Greg KH
2010-02-19 16:29 ` [09/93] ALSA: ctxfi - fix PTP address initialization Greg KH
2010-02-19 16:29 ` [10/93] drm/i915: disable hotplug detect before Ironlake CRT detect Greg KH
2010-02-19 16:29 ` [11/93] drm/i915: enable self-refresh on 965 Greg KH
2010-02-19 16:29 ` [12/93] drm/i915: Disable SR when more than one pipe is enabled Greg KH
2010-02-19 16:29 ` [13/93] drm/i915: Fix DDC on some systems by clearing BIOS GMBUS setup Greg KH
2010-02-19 16:29 ` [14/93] drm/i915: Add HP nx9020/SamsungSX20S to ACPI LID quirk list Greg KH
2010-02-19 16:29 ` [15/93] drm/i915: Fix the incorrect DMI string for Samsung SX20S laptop Greg KH
2010-02-19 16:29 ` [16/93] drm/i915: Add MALATA PC-81005 to ACPI LID quirk list Greg KH
2010-02-19 16:29 ` [17/93] usb: r8a66597-hcd: Flush the D-cache for the pipe-in transfer buffers Greg KH
2010-02-19 16:29 ` [18/93] i2c-tiny-usb: Fix on big-endian systems Greg KH
2010-02-19 16:29 ` [19/93] drm/i915: handle FBC and self-refresh better Greg KH
2010-02-19 16:29 ` [20/93] drm/i915: Increase fb alignment to 64k Greg KH
2010-02-19 16:29 ` [21/93] drm/i915: Update write_domains on active list after flush Greg KH
2010-02-19 16:29 ` [22/93] regulator: Fix display of null constraints for regulators Greg KH
2010-02-19 16:29 ` [23/93] ALSA: hda-intel: Avoid divide by zero crash Greg KH
2010-02-19 16:29 ` [24/93] CPUFREQ: Fix use after free of struct powernow_k8_data Greg KH
2010-02-19 16:29 ` [25/93] freeze_bdev: dont deactivate successfully frozen MS_RDONLY sb Greg KH
2010-02-19 16:29 ` [26/93] cciss: Make cciss_seq_show handle holes in the h->drv[] array Greg KH
2010-02-19 16:29 ` [27/93] ioat: fix infinite timeout checking in ioat2_quiesce Greg KH
2010-02-19 16:29 ` [28/93] resource: add helpers for fetching rlimits Greg KH
2010-02-19 16:29 ` [29/93] fs/exec.c: restrict initial stack space expansion to rlimit Greg KH
2010-02-21  6:42   ` Michael Neuling
2010-02-23 15:34     ` [stable] " Greg KH
2010-02-23 20:42       ` Michael Neuling
2010-02-19 16:29 ` [30/93] cifs: fix length calculation for converted unicode readdir names Greg KH
2010-02-19 16:29 ` [31/93] NFS: Fix a reference leak in nfs_wb_cancel_page() Greg KH
2010-02-19 16:29 ` [32/93] NFS: Try to commit unstable writes in nfs_release_page() Greg KH
2010-02-19 16:29 ` [33/93] NFSv4: Dont allow posix locking against servers that dont support it Greg KH
2010-02-19 16:29 ` [34/93] NFSv4: Ensure that the NFSv4 locking can recover from stateid errors Greg KH
2010-02-19 16:29 ` [35/93] NFS: Fix an Oops when truncating a file Greg KH
2010-02-19 16:29 ` [36/93] NFS: Fix a umount race Greg KH
2010-02-19 16:29 ` [37/93] NFS: Fix a bug in nfs_fscache_release_page() Greg KH
2010-02-19 16:29 ` [38/93] NFS: Fix the mapping of the NFSERR_SERVERFAULT error Greg KH
2010-02-19 16:29 ` [39/93] md: fix degraded calculation when starting a reshape Greg KH
2010-02-19 16:29 ` [40/93] V4L/DVB: dvb-core: fix initialization of feeds list in demux filter Greg KH
2010-02-19 16:29 ` [41/93] Export the symbol of getboottime and mmonotonic_to_bootbased Greg KH
2010-02-19 16:29 ` [42/93] kvmclock: count total_sleep_time when updating guest clock Greg KH
2010-02-19 16:29 ` [43/93] KVM: PIT: control word is write-only Greg KH
2010-02-19 16:29 ` [44/93] tpm_infineon: fix suspend/resume handler for pnp_driver Greg KH
2010-02-19 16:29 ` [45/93] amd64_edac: Do not falsely trigger kerneloops Greg KH
2010-02-19 16:29 ` [46/93] netfilter: nf_conntrack: fix memory corruption with multiple namespaces Greg KH
2010-02-19 16:29 ` [47/93] netfilter: nf_conntrack: per netns nf_conntrack_cachep Greg KH
2010-02-19 16:29 ` [48/93] netfilter: nf_conntrack: restrict runtime expect hashsize modifications Greg KH
2010-02-19 16:29 ` [49/93] netfilter: xtables: compat out of scope fix Greg KH
2010-02-19 16:29 ` [50/93] netfilter: nf_conntrack: fix hash resizing with namespaces Greg KH
2010-02-19 16:29 ` [51/93] drm/i915: remove full registers dump debug Greg KH
2010-02-19 16:29 ` [52/93] drm/i915: add i915_lp_ring_sync helper Greg KH
2010-02-19 16:29 ` [53/93] drm/i915: Dont wait interruptible for possible plane buffer flush Greg KH
2010-02-19 16:29 ` [54/93] [S390] dasd: remove strings from s390dbf Greg KH
2010-02-19 16:29 ` [55/93] crypto: padlock-sha - Add import/export support Greg KH
2010-02-19 16:29 ` [56/93] wmi: Free the allocated acpi objects through wmi_get_event_data Greg KH
2010-02-19 16:29 ` [57/93] dell-wmi, hp-wmi, msi-wmi: check wmi_get_event_data() return value Greg KH
2010-02-19 16:29 ` [58/93] /dev/mem: introduce size_inside_page() Greg KH
2010-02-19 16:29 ` [59/93] devmem: check vmalloc address on kmem read/write Greg KH
2010-02-19 16:29 ` [60/93] devmem: fix kmem write bug on memory holes Greg KH
2010-02-19 16:29 ` [61/93] SCSI: mptfusion : mptscsih_abort return value should be SUCCESS instead of value 0 Greg KH
2010-02-19 16:29 ` [62/93] sh: Couple kernel and user write page perm bits for CONFIG_X2TLB Greg KH
2010-02-19 16:29 ` [63/93] ALSA: hda - use WARN_ON_ONCE() for zero-division detection Greg KH
2010-02-19 16:29 ` [64/93] dst: call cond_resched() in dst_gc_task() Greg KH
2010-02-19 16:29 ` [65/93] ALSA: hda - Improved MacBook (Pro) 5,1 / 5,2 support Greg KH
2010-02-19 16:29 ` [66/93] befs: fix leak Greg KH
2010-02-19 16:30 ` [67/93] rtc-fm3130: add missing braces Greg KH
2010-02-19 16:30 ` [68/93] [libata] Call flush_dcache_page after PIO data transfers in libata-sff.c Greg KH
2010-02-19 16:30 ` [69/93] ahci: add Acer G725 to broken suspend list Greg KH
2010-02-19 16:30 ` [70/93] pktgen: Fix freezing problem Greg KH
2010-02-19 16:30 ` [71/93] x86/amd-iommu: Fix IOMMU-API initialization for iommu=pt Greg KH
2010-02-19 16:30 ` [72/93] x86/amd-iommu: Fix deassignment of a device from the pt_domain Greg KH
2010-02-19 16:30 ` [73/93] x86: Re-get cfg_new in case reuse/move irq_desc Greg KH
2010-02-19 16:30 ` [74/93] Staging: fix rtl8187se compilation errors with mac80211 Greg KH
2010-02-19 16:30 ` [75/93] ALSA: usb-audio - Avoid Oops after disconnect Greg KH
2010-02-19 16:30 ` [76/93] serial: 8250: add serial transmitter fully empty test Greg KH
2010-02-19 16:30 ` [77/93] sysfs: sysfs_sd_setattr set iattrs unconditionally Greg KH
2010-02-19 16:30 ` [78/93] class: Free the class private data in class_release Greg KH
2010-02-19 16:30 ` [79/93] USB: usbfs: only copy the actual data received Greg KH
2010-02-19 16:30 ` [80/93] USB: usbfs: properly clean up the as structure on error paths Greg KH
2010-02-19 16:30 ` [81/93] rtl8187: Add new device ID Greg KH
2010-02-19 16:30 ` [82/93] ACPI: Add NULL pointer check in acpi_bus_start Greg KH
2010-02-19 16:30 ` [83/93] ACPI: fix High cpu temperature with 2.6.32 Greg KH
2010-02-19 16:30 ` [84/93] drm/radeon/kms: use udelay for short delays Greg KH
2010-02-19 16:30 ` [85/93] NFS: Too many GETATTR and ACCESS calls after direct I/O Greg KH
2010-02-19 16:30 ` [86/93] eCryptfs: Add getattr function Greg KH
2010-02-19 16:30 ` [87/93] b43: Fix throughput regression Greg KH
2010-02-19 16:30 ` [88/93] ath9k: Fix sequence numbers for PAE frames Greg KH
2010-02-19 16:30 ` [89/93] mac80211: Fix probe request filtering in IBSS mode Greg KH
2010-02-19 16:30 ` [90/93] iwlwifi: Fix to set correct ht configuration Greg KH
2010-02-19 16:30 ` [91/93] dm stripe: avoid divide by zero with invalid stripe count Greg KH
2010-02-19 16:30 ` [92/93] dm log: userspace fix overhead_size calcuations Greg KH
2010-02-19 16:30 ` [93/93] dm mpath: fix stall when requeueing io Greg KH
2010-02-21 16:07   ` [Stable-review] " Stefan Bader
2010-02-22 10:16     ` Kiyoshi Ueda [this message]
2010-02-23 18:12       ` Alasdair G Kergon
2010-02-24  1:12         ` Kiyoshi Ueda
2010-02-24 22:30           ` Mikulas Patocka
     [not found]       ` <20100223175331.GE560@agk-dp.fab.redhat.com>
     [not found]         ` <Pine.LNX.4.64.1002231429210.29863@hs20-bc2-1.build.redhat.com>
     [not found]           ` <4B84E05C.4070300@ct.jp.nec.com>
     [not found]             ` <Pine.LNX.4.64.1002241732300.2537@hs20-bc2-1.build.redhat.com>
     [not found]               ` <4B862103.1040802@ct.jp.nec.com>
2010-02-26 20:57                 ` How to unload a module? (Was: [dm-devel] rqdm: bad usage of dm_get/dm_put) Mikulas Patocka
2010-02-23 15:33     ` [Stable-review] [93/93] dm mpath: fix stall when requeueing io Greg KH
2010-02-23 17:00       ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B825982.3060606@ct.jp.nec.com \
    --to=k-ueda@ct.jp.nec.com \
    --cc=agk@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=gregkh@suse.de \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable-review@kernel.org \
    --cc=stable@kernel.org \
    --cc=stefan.bader@canonical.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox