From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Anton Eidelman <anton@lightbitslabs.com>,
Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 11/36] nvme-multipath: fix deadlock between ana_work and scan_work
Date: Tue, 7 Jul 2020 17:17:03 +0200 [thread overview]
Message-ID: <20200707145749.677702241@linuxfoundation.org> (raw)
In-Reply-To: <20200707145749.130272978@linuxfoundation.org>
From: Anton Eidelman <anton@lightbitslabs.com>
[ Upstream commit 489dd102a2c7c94d783a35f9412eb085b8da1aa4 ]
When scan_work calls nvme_mpath_add_disk() this holds ana_lock
and invokes nvme_parse_ana_log(), which may issue IO
in device_add_disk() and hang waiting for an accessible path.
While nvme_mpath_set_live() only called when nvme_state_is_live(),
a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO.
In order to recover and complete the IO ana_work on the same ctrl
should be able to update the path state and remove NVME_NS_ANA_PENDING.
The deadlock occurs because scan_work keeps holding ana_lock,
so ana_work hangs [1].
Fix:
Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy
of the ANA group desc, and then calls nvme_update_ns_ana_state() without
holding ana_lock.
[1]:
kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: io_schedule+0x16/0x40
kernel: do_read_cache_page+0x438/0x830
kernel: read_cache_page+0x12/0x20
kernel: read_dev_sector+0x27/0xc0
kernel: read_lba+0xc1/0x220
kernel: efi_partition+0x1e6/0x708
kernel: check_partition+0x154/0x244
kernel: rescan_partitions+0xae/0x280
kernel: __blkdev_get+0x40f/0x560
kernel: blkdev_get+0x3d/0x140
kernel: __device_add_disk+0x388/0x480
kernel: device_add_disk+0x13/0x20
kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core]
kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x249/0x400
kernel: kthread+0x104/0x140
kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: schedule_preempt_disabled+0xe/0x10
kernel: __mutex_lock.isra.0+0x182/0x4f0
kernel: ? __switch_to_asm+0x34/0x70
kernel: ? select_task_rq_fair+0x1aa/0x5c0
kernel: ? kvm_sched_clock_read+0x11/0x20
kernel: ? sched_clock+0x9/0x10
kernel: __mutex_lock_slowpath+0x13/0x20
kernel: mutex_lock+0x2e/0x40
kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core]
kernel: nvme_ana_work+0x15/0x20 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x4d/0x400
kernel: kthread+0x104/0x140
kernel: ? process_one_work+0x380/0x380
kernel: ? kthread_park+0x80/0x80
kernel: ret_from_fork+0x35/0x40
Fixes: 0d0b660f214d ("nvme: add ANA support")
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/multipath.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6f584a9515f42..3ad6183c5e6b4 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -496,26 +496,34 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
}
DEVICE_ATTR_RO(ana_state);
-static int nvme_set_ns_ana_state(struct nvme_ctrl *ctrl,
+static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl,
struct nvme_ana_group_desc *desc, void *data)
{
- struct nvme_ns *ns = data;
+ struct nvme_ana_group_desc *dst = data;
- if (ns->ana_grpid == le32_to_cpu(desc->grpid)) {
- nvme_update_ns_ana_state(desc, ns);
- return -ENXIO; /* just break out of the loop */
- }
+ if (desc->grpid != dst->grpid)
+ return 0;
- return 0;
+ *dst = *desc;
+ return -ENXIO; /* just break out of the loop */
}
void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
{
if (nvme_ctrl_use_ana(ns->ctrl)) {
+ struct nvme_ana_group_desc desc = {
+ .grpid = id->anagrpid,
+ .state = 0,
+ };
+
mutex_lock(&ns->ctrl->ana_lock);
ns->ana_grpid = le32_to_cpu(id->anagrpid);
- nvme_parse_ana_log(ns->ctrl, ns, nvme_set_ns_ana_state);
+ nvme_parse_ana_log(ns->ctrl, &desc, nvme_lookup_ana_group_desc);
mutex_unlock(&ns->ctrl->ana_lock);
+ if (desc.state) {
+ /* found the group desc: update */
+ nvme_update_ns_ana_state(&desc, ns);
+ }
} else {
mutex_lock(&ns->head->lock);
ns->ana_state = NVME_ANA_OPTIMIZED;
--
2.25.1
next prev parent reply other threads:[~2020-07-07 15:18 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-07 15:16 [PATCH 4.19 00/36] 4.19.132-rc1 review Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 01/36] btrfs: fix a block group ref counter leak after failure to remove block group Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 02/36] mm: fix swap cache node allocation mask Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 03/36] EDAC/amd64: Read back the scrub rate PCI register on F15h Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 04/36] usbnet: smsc95xx: Fix use-after-free after removal Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 05/36] mm/slub.c: fix corrupted freechain in deactivate_slab() Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 06/36] mm/slub: fix stack overruns with SLUB_STATS Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 07/36] usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 08/36] s390/debug: avoid kernel warning on too large number of pages Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 09/36] nvme-multipath: set bdi capabilities once Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 10/36] nvme: fix possible deadlock when I/O is blocked Greg Kroah-Hartman
2020-07-07 18:16 ` Pavel Machek
2020-07-08 2:29 ` Sasha Levin
2020-07-07 15:17 ` Greg Kroah-Hartman [this message]
2020-07-07 15:17 ` [PATCH 4.19 12/36] kgdb: Avoid suspicious RCU usage warning Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 13/36] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock() Greg Kroah-Hartman
2020-07-07 21:25 ` Pavel Machek
2020-07-07 23:38 ` Herbert Xu
2020-07-07 15:17 ` [PATCH 4.19 14/36] drm/msm/dpu: fix error return code in dpu_encoder_init Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 15/36] cxgb4: use unaligned conversion for fetching timestamp Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 16/36] cxgb4: parse TC-U32 key values and masks natively Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 17/36] cxgb4: use correct type for all-mask IP address comparison Greg Kroah-Hartman
2020-07-07 21:33 ` Pavel Machek
2020-07-08 12:36 ` Rahul Lakkireddy
2020-07-07 15:17 ` [PATCH 4.19 18/36] cxgb4: fix SGE queue dump destination buffer context Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 19/36] hwmon: (max6697) Make sure the OVERT mask is set correctly Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 20/36] hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 21/36] drm: sun4i: hdmi: Remove extra HPD polling Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 22/36] virtio-blk: free vblk-vqs in error path of virtblk_probe() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 23/36] SMB3: Honor posix flag for multiuser mounts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 24/36] nvme: fix a crash in nvme_mpath_add_disk Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 25/36] i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 26/36] i2c: mlxcpld: check correct size of maximum RECV_LEN packet Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 27/36] nfsd: apply umask on fs without ACL support Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 28/36] Revert "ALSA: usb-audio: Improve frames size computation" Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 29/36] SMB3: Honor seal flag for multiuser mounts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 30/36] SMB3: Honor persistent/resilient handle flags " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 31/36] SMB3: Honor lease disabling " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 32/36] cifs: Fix the target file was deleted when rename failed Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 33/36] MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 34/36] irqchip/gic: Atomically update affinity Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 35/36] dm zoned: assign max_io_len correctly Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 36/36] efi: Make it possible to disable efivar_ssdt entirely Greg Kroah-Hartman
2020-07-08 5:56 ` [PATCH 4.19 00/36] 4.19.132-rc1 review Naresh Kamboju
[not found] ` <20200707145749.130272978-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-07-08 8:41 ` Jon Hunter
2020-07-08 8:41 ` Jon Hunter
2020-07-08 10:41 ` Chris Paterson
2020-07-08 15:15 ` Greg Kroah-Hartman
2020-07-08 15:04 ` Shuah Khan
2020-07-08 17:52 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200707145749.677702241@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=anton@lightbitslabs.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=sagi@grimberg.me \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.