stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mikulas Patocka <mpatocka@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: [PATCH 4.10 45/48] dm: flush queued bios when process blocks to avoid deadlock
Date: Thu, 16 Mar 2017 23:30:29 +0900	[thread overview]
Message-ID: <20170316142923.047201831@linuxfoundation.org> (raw)
In-Reply-To: <20170316142920.761502205@linuxfoundation.org>

4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <mpatocka@redhat.com>

commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueus and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current->bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


---
 drivers/md/dm.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -972,10 +972,61 @@ void dm_accept_partial_bio(struct bio *b
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
+/*
+ * Flush current->bio_list when the target map method blocks.
+ * This fixes deadlocks in snapshot and possibly in other targets.
+ */
+struct dm_offload {
+	struct blk_plug plug;
+	struct blk_plug_cb cb;
+};
+
+static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
+{
+	struct dm_offload *o = container_of(cb, struct dm_offload, cb);
+	struct bio_list list;
+	struct bio *bio;
+
+	INIT_LIST_HEAD(&o->cb.list);
+
+	if (unlikely(!current->bio_list))
+		return;
+
+	list = *current->bio_list;
+	bio_list_init(current->bio_list);
+
+	while ((bio = bio_list_pop(&list))) {
+		struct bio_set *bs = bio->bi_pool;
+		if (unlikely(!bs) || bs == fs_bio_set) {
+			bio_list_add(current->bio_list, bio);
+			continue;
+		}
+
+		spin_lock(&bs->rescue_lock);
+		bio_list_add(&bs->rescue_list, bio);
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		spin_unlock(&bs->rescue_lock);
+	}
+}
+
+static void dm_offload_start(struct dm_offload *o)
+{
+	blk_start_plug(&o->plug);
+	o->cb.callback = flush_current_bio_list;
+	list_add(&o->cb.list, &current->plug->cb_list);
+}
+
+static void dm_offload_end(struct dm_offload *o)
+{
+	list_del(&o->cb.list);
+	blk_finish_plug(&o->plug);
+}
+
 static void __map_bio(struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
+	struct dm_offload o;
 	struct bio *clone = &tio->clone;
 	struct dm_target *ti = tio->ti;
 
@@ -988,7 +1039,11 @@ static void __map_bio(struct dm_target_i
 	 */
 	atomic_inc(&tio->io->io_count);
 	sector = clone->bi_iter.bi_sector;
+
+	dm_offload_start(&o);
 	r = ti->type->map(ti, clone);
+	dm_offload_end(&o);
+
 	if (r == DM_MAPIO_REMAPPED) {
 		/* the bio has been remapped so dispatch it */
 

  parent reply	other threads:[~2017-03-16 14:36 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-16 14:29 [PATCH 4.10 00/48] 4.10.4-stable review Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 01/48] iio: 104-quad-8: Fix off-by-one error when addressing flag register Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 02/48] ARM: qcom_defconfig: Enable RPM/RPM-SMD clocks Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 03/48] USB: serial: digi_acceleport: fix OOB data sanity check Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 04/48] USB: serial: digi_acceleport: fix OOB-event processing Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 05/48] crypto: improve gcc optimization flags for serpent and wp512 Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 06/48] MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE change Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 07/48] MIPS: VDSO: avoid duplicate CAC_BASE definition Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 08/48] MIPS: ip27: Disable qlge driver in defconfig Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 09/48] MIPS: Update ip27_defconfig for SCSI_DH change Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 10/48] MIPS: ip22: Fix ip28 build for modern gcc Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 11/48] MIPS: Update lemote2f_defconfig for CPU_FREQ_STAT change Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 12/48] mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 13/48] MIPS: ralink: Cosmetic change to prom_init() Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 14/48] MIPS: ralink: Remove unused timer functions Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.10 15/48] MIPS: ralink: Remove unused rt*_wdt_reset functions Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 17/48] tracing: Add #undef to fix compile error Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 18/48] ucount: Remove the atomicity from ucount->count Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 19/48] efi/arm: Fix boot crash with CONFIG_CPUMASK_OFFSTACK=y Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 20/48] [media] dw2102: dont do DMA on stack Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 21/48] i2c: add missing of_node_put in i2c_mux_del_adapters Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 22/48] powerpc: Emulation support for load/store instructions on LE Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 23/48] powerpc/booke: Fix boot crash due to null hugepd Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 24/48] powerpc/xics: Work around limitations of OPAL XICS priority handling Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 25/48] PCI: Prevent VPD access for QLogic ISP2722 Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 26/48] usb: gadget: dummy_hcd: clear usb_gadget region before registration Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 27/48] usb: dwc3: gadget: make Set Endpoint Configuration macros safe Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 28/48] usb: dwc3-omap: Fix missing break in dwc3_omap_set_mailbox() Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 29/48] usb: ohci-at91: Do not drop unhandled USB suspend control requests Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 30/48] usb: gadget: function: f_fs: pass companion descriptor along Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 31/48] Revert "usb: gadget: uvc: Add missing call for additional setup data" Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 32/48] usb: host: xhci-dbg: HCIVERSION should be a binary number Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 33/48] usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 34/48] USB: serial: safe_serial: fix information leak in completion handler Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 35/48] USB: serial: omninet: fix reference leaks at open Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 36/48] USB: iowarrior: fix NULL-deref at probe Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 37/48] USB: iowarrior: fix NULL-deref in write Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 38/48] USB: serial: io_ti: fix NULL-deref in interrupt callback Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 39/48] USB: serial: io_ti: fix information leak in completion handler Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 41/48] KVM: s390: Fix guest migration for huge guests resulting in panic Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 42/48] KVM: arm/arm64: Let vcpu thread modify its own active state Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 43/48] drm/i915/gvt: Fix superfluous newline in GVT_DISPLAY_READY env var Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 44/48] [media] serial_ir: ensure were ready to receive interrupts Greg Kroah-Hartman
2017-03-16 14:30 ` Greg Kroah-Hartman [this message]
2017-03-16 14:30 ` [PATCH 4.10 46/48] [media] rc: raw decoder for keymap protocol is not loaded on register Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 47/48] ext4: dont BUG when truncating encrypted inodes on the orphan list Greg Kroah-Hartman
2017-03-16 14:30 ` [PATCH 4.10 48/48] IB/mlx5: Verify that Q counters are supported Greg Kroah-Hartman
2017-03-16 19:21 ` [PATCH 4.10 00/48] 4.10.4-stable review Shuah Khan
2017-03-17  1:42   ` Greg Kroah-Hartman
2017-03-16 22:37 ` Guenter Roeck
2017-03-17  1:42   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170316142923.047201831@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).