All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mikulas Patocka <mpatocka@redhat.com>,
	Mike Snitzer <snitzer@redhat.com>
Subject: [PATCH 4.4 34/35] dm: flush queued bios when process blocks to avoid deadlock
Date: Thu, 16 Mar 2017 23:29:53 +0900	[thread overview]
Message-ID: <20170316142909.076437163@linuxfoundation.org> (raw)
In-Reply-To: <20170316142906.685052998@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mikulas Patocka <mpatocka@redhat.com>

commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 upstream.

Commit df2cb6daa4 ("block: Avoid deadlocks with bio allocation by
stacking drivers") created a workqueue for every bio set and code
in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
by redirecting bios queued on current->bio_list to the workqueue if the
system is low on memory.  However other deadlocks (see below **) may
happen, without any low memory condition, because generic_make_request
is queuing bios to current->bio_list (rather than submitting them).

** the related dm-snapshot deadlock is detailed here:
https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html

Fix this deadlock by redirecting any bios on current->bio_list to the
bio_set's rescue workqueue on every schedule() call.  Consequently,
when the process blocks on a mutex, the bios queued on
current->bio_list are dispatched to independent workqueus and they can
complete without waiting for the mutex to be available.

The structure blk_plug contains an entry cb_list and this list can contain
arbitrary callback functions that are called when the process blocks.
To implement this fix DM (ab)uses the onstack plug's cb_list interface
to get its flush_current_bio_list() called at schedule() time.

This fixes the snapshot deadlock - if the map method blocks,
flush_current_bio_list() will be called and it redirects bios waiting
on current->bio_list to appropriate workqueues.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
Depends-on: df2cb6daa4 ("block: Avoid deadlocks with bio allocation by stacking drivers")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/md/dm.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1467,11 +1467,62 @@ void dm_accept_partial_bio(struct bio *b
 }
 EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
 
+/*
+ * Flush current->bio_list when the target map method blocks.
+ * This fixes deadlocks in snapshot and possibly in other targets.
+ */
+struct dm_offload {
+	struct blk_plug plug;
+	struct blk_plug_cb cb;
+};
+
+static void flush_current_bio_list(struct blk_plug_cb *cb, bool from_schedule)
+{
+	struct dm_offload *o = container_of(cb, struct dm_offload, cb);
+	struct bio_list list;
+	struct bio *bio;
+
+	INIT_LIST_HEAD(&o->cb.list);
+
+	if (unlikely(!current->bio_list))
+		return;
+
+	list = *current->bio_list;
+	bio_list_init(current->bio_list);
+
+	while ((bio = bio_list_pop(&list))) {
+		struct bio_set *bs = bio->bi_pool;
+		if (unlikely(!bs) || bs == fs_bio_set) {
+			bio_list_add(current->bio_list, bio);
+			continue;
+		}
+
+		spin_lock(&bs->rescue_lock);
+		bio_list_add(&bs->rescue_list, bio);
+		queue_work(bs->rescue_workqueue, &bs->rescue_work);
+		spin_unlock(&bs->rescue_lock);
+	}
+}
+
+static void dm_offload_start(struct dm_offload *o)
+{
+	blk_start_plug(&o->plug);
+	o->cb.callback = flush_current_bio_list;
+	list_add(&o->cb.list, &current->plug->cb_list);
+}
+
+static void dm_offload_end(struct dm_offload *o)
+{
+	list_del(&o->cb.list);
+	blk_finish_plug(&o->plug);
+}
+
 static void __map_bio(struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
 	struct mapped_device *md;
+	struct dm_offload o;
 	struct bio *clone = &tio->clone;
 	struct dm_target *ti = tio->ti;
 
@@ -1484,7 +1535,11 @@ static void __map_bio(struct dm_target_i
 	 */
 	atomic_inc(&tio->io->io_count);
 	sector = clone->bi_iter.bi_sector;
+
+	dm_offload_start(&o);
 	r = ti->type->map(ti, clone);
+	dm_offload_end(&o);
+
 	if (r == DM_MAPIO_REMAPPED) {
 		/* the bio has been remapped so dispatch it */
 

  parent reply	other threads:[~2017-03-16 15:04 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-16 14:29 [PATCH 4.4 00/35] 4.4.55-stable review Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 01/35] USB: serial: digi_acceleport: fix OOB data sanity check Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 02/35] USB: serial: digi_acceleport: fix OOB-event processing Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 03/35] crypto: improve gcc optimization flags for serpent and wp512 Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 04/35] MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE change Greg Kroah-Hartman
2017-03-19 16:05   ` Ben Hutchings
2017-03-20 10:03     ` Ralf Baechle
2017-03-20 10:03       ` Ralf Baechle
2017-03-20 10:30     ` Greg Kroah-Hartman
2017-03-20 10:30       ` Greg Kroah-Hartman
2017-03-20 10:42       ` Arnd Bergmann
2017-03-20 16:29         ` Greg Kroah-Hartman
2017-03-20 16:43           ` Ralf Baechle
2017-03-16 14:29 ` [PATCH 4.4 05/35] MIPS: ip27: Disable qlge driver in defconfig Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 06/35] MIPS: Update ip27_defconfig for SCSI_DH change Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 07/35] MIPS: ip22: Fix ip28 build for modern gcc Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 08/35] MIPS: Update lemote2f_defconfig for CPU_FREQ_STAT change Greg Kroah-Hartman
2017-03-19 16:06   ` Ben Hutchings
2017-03-20 10:15     ` Ralf Baechle
2017-03-20 10:15       ` Ralf Baechle
2017-03-20 16:29       ` Greg Kroah-Hartman
2017-03-20 16:29         ` Greg Kroah-Hartman
2017-03-20 16:50         ` Ralf Baechle
2017-03-20 17:34           ` Greg Kroah-Hartman
2017-03-20 10:31     ` Greg Kroah-Hartman
2017-03-20 10:31       ` Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 09/35] mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 10/35] MIPS: ralink: Cosmetic change to prom_init() Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 11/35] MIPS: ralink: Remove unused rt*_wdt_reset functions Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 12/35] cpmac: remove hopeless #warning Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 13/35] mm: memcontrol: avoid unused function warning Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 14/35] MIPS: DEC: Avoid la pseudo-instruction in delay slots Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 15/35] MIPS: Netlogic: Fix CP0_EBASE redefinition warnings Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 16/35] tracing: Add #undef to fix compile error Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 17/35] powerpc: Emulation support for load/store instructions on LE Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 18/35] usb: gadget: dummy_hcd: clear usb_gadget region before registration Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 19/35] usb: dwc3: gadget: make Set Endpoint Configuration macros safe Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 20/35] usb: gadget: function: f_fs: pass companion descriptor along Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 21/35] usb: host: xhci-dbg: HCIVERSION should be a binary number Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 22/35] usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 23/35] USB: serial: safe_serial: fix information leak in completion handler Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 24/35] USB: serial: omninet: fix reference leaks at open Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 25/35] USB: iowarrior: fix NULL-deref at probe Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 26/35] USB: iowarrior: fix NULL-deref in write Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 27/35] USB: serial: io_ti: fix NULL-deref in interrupt callback Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 28/35] USB: serial: io_ti: fix information leak in completion handler Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 30/35] mvsas: fix misleading indentation Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 31/35] KVM: s390: Fix guest migration for huge guests resulting in panic Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 32/35] s390/kdump: Use "LINUX" ELF note name instead of "CORE" Greg Kroah-Hartman
2017-03-16 14:29 ` [PATCH 4.4 33/35] nfit, libnvdimm: fix interleave set cookie calculation Greg Kroah-Hartman
2017-03-19 16:38   ` Ben Hutchings
2017-03-20 17:55     ` Dan Williams
2017-03-16 14:29 ` Greg Kroah-Hartman [this message]
2017-03-16 14:29 ` [PATCH 4.4 35/35] ext4: dont BUG when truncating encrypted inodes on the orphan list Greg Kroah-Hartman
2017-03-16 19:20 ` [PATCH 4.4 00/35] 4.4.55-stable review Shuah Khan
2017-03-16 22:36 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170316142909.076437163@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.