From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Anton Eidelman <anton@lightbitslabs.com>,
Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 10/36] nvme: fix possible deadlock when I/O is blocked
Date: Tue, 7 Jul 2020 17:17:02 +0200 [thread overview]
Message-ID: <20200707145749.639245963@linuxfoundation.org> (raw)
In-Reply-To: <20200707145749.130272978@linuxfoundation.org>
From: Sagi Grimberg <sagi@grimberg.me>
[ Upstream commit 3b4b19721ec652ad2c4fe51dfbe5124212b5f581 ]
Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk
in nvme_validate_ns")
When adding a new namespace to the head disk (via nvme_mpath_set_live)
we will see partition scan which triggers I/O on the mpath device node.
This process will usually be triggered from the scan_work which holds
the scan_lock. If I/O blocks (if we got ana change currently have only
available paths but none are accessible) this can deadlock on the head
disk bd_mutex as both partition scan I/O takes it, and head disk revalidation
takes it to check for resize (also triggered from scan_work on a different
path). See trace [1].
The mpath disk revalidation was originally added to detect online disk
size change, but this is no longer needed since commit cb224c3af4df
("nvme: Convert to use set_capacity_revalidate_and_notify") which already
updates resize info without unnecessarily revalidating the disk (the
mpath disk doesn't even implement .revalidate_disk fop).
[1]:
--
kernel: INFO: task kworker/u65:9:494 blocked for more than 241 seconds.
kernel: Tainted: G OE 5.3.5-050305-generic #201910071830
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/u65:9 D 0 494 2 0x80004000
kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: schedule_preempt_disabled+0xe/0x10
kernel: __mutex_lock.isra.0+0x182/0x4f0
kernel: __mutex_lock_slowpath+0x13/0x20
kernel: mutex_lock+0x2e/0x40
kernel: revalidate_disk+0x63/0xa0
kernel: __nvme_revalidate_disk+0xfe/0x110 [nvme_core]
kernel: nvme_revalidate_disk+0xa4/0x160 [nvme_core]
kernel: ? evict+0x14c/0x1b0
kernel: revalidate_disk+0x2b/0xa0
kernel: nvme_validate_ns+0x49/0x940 [nvme_core]
kernel: ? blk_mq_free_request+0xd2/0x100
kernel: ? __nvme_submit_sync_cmd+0xbe/0x1e0 [nvme_core]
kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x249/0x400
kernel: kthread+0x104/0x140
kernel: ? process_one_work+0x380/0x380
kernel: ? kthread_park+0x80/0x80
kernel: ret_from_fork+0x1f/0x40
...
kernel: INFO: task kworker/u65:1:2630 blocked for more than 241 seconds.
kernel: Tainted: G OE 5.3.5-050305-generic #201910071830
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/u65:1 D 0 2630 2 0x80004000
kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: io_schedule+0x16/0x40
kernel: do_read_cache_page+0x438/0x830
kernel: ? __switch_to_asm+0x34/0x70
kernel: ? file_fdatawait_range+0x30/0x30
kernel: read_cache_page+0x12/0x20
kernel: read_dev_sector+0x27/0xc0
kernel: read_lba+0xc1/0x220
kernel: ? kmem_cache_alloc_trace+0x19c/0x230
kernel: efi_partition+0x1e6/0x708
kernel: ? vsnprintf+0x39e/0x4e0
kernel: ? snprintf+0x49/0x60
kernel: check_partition+0x154/0x244
kernel: rescan_partitions+0xae/0x280
kernel: __blkdev_get+0x40f/0x560
kernel: blkdev_get+0x3d/0x140
kernel: __device_add_disk+0x388/0x480
kernel: device_add_disk+0x13/0x20
kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
kernel: ? nvme_update_ns_ana_state+0x60/0x60 [nvme_core]
kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core]
kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
kernel: ? blk_mq_free_request+0xd2/0x100
kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x249/0x400
kernel: kthread+0x104/0x140
kernel: ? process_one_work+0x380/0x380
kernel: ? kthread_park+0x80/0x80
kernel: ret_from_fork+0x1f/0x40
--
Fixes: fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk
in nvme_validate_ns")
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0d60f2f8f3eec..5c9326777334f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1602,7 +1602,6 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
if (ns->head->disk) {
nvme_update_disk_info(ns->head->disk, ns, id);
blk_queue_stack_limits(ns->head->disk->queue, ns->queue);
- revalidate_disk(ns->head->disk);
}
#endif
}
--
2.25.1
next prev parent reply other threads:[~2020-07-07 15:19 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-07 15:16 [PATCH 4.19 00/36] 4.19.132-rc1 review Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 01/36] btrfs: fix a block group ref counter leak after failure to remove block group Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 02/36] mm: fix swap cache node allocation mask Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 03/36] EDAC/amd64: Read back the scrub rate PCI register on F15h Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 04/36] usbnet: smsc95xx: Fix use-after-free after removal Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 05/36] mm/slub.c: fix corrupted freechain in deactivate_slab() Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 06/36] mm/slub: fix stack overruns with SLUB_STATS Greg Kroah-Hartman
2020-07-07 15:16 ` [PATCH 4.19 07/36] usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 08/36] s390/debug: avoid kernel warning on too large number of pages Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 09/36] nvme-multipath: set bdi capabilities once Greg Kroah-Hartman
2020-07-07 15:17 ` Greg Kroah-Hartman [this message]
2020-07-07 18:16 ` [PATCH 4.19 10/36] nvme: fix possible deadlock when I/O is blocked Pavel Machek
2020-07-08 2:29 ` Sasha Levin
2020-07-07 15:17 ` [PATCH 4.19 11/36] nvme-multipath: fix deadlock between ana_work and scan_work Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 12/36] kgdb: Avoid suspicious RCU usage warning Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 13/36] crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock() Greg Kroah-Hartman
2020-07-07 21:25 ` Pavel Machek
2020-07-07 23:38 ` Herbert Xu
2020-07-07 15:17 ` [PATCH 4.19 14/36] drm/msm/dpu: fix error return code in dpu_encoder_init Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 15/36] cxgb4: use unaligned conversion for fetching timestamp Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 16/36] cxgb4: parse TC-U32 key values and masks natively Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 17/36] cxgb4: use correct type for all-mask IP address comparison Greg Kroah-Hartman
2020-07-07 21:33 ` Pavel Machek
2020-07-08 12:36 ` Rahul Lakkireddy
2020-07-07 15:17 ` [PATCH 4.19 18/36] cxgb4: fix SGE queue dump destination buffer context Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 19/36] hwmon: (max6697) Make sure the OVERT mask is set correctly Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 20/36] hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 21/36] drm: sun4i: hdmi: Remove extra HPD polling Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 22/36] virtio-blk: free vblk-vqs in error path of virtblk_probe() Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 23/36] SMB3: Honor posix flag for multiuser mounts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 24/36] nvme: fix a crash in nvme_mpath_add_disk Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 25/36] i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 26/36] i2c: mlxcpld: check correct size of maximum RECV_LEN packet Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 27/36] nfsd: apply umask on fs without ACL support Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 28/36] Revert "ALSA: usb-audio: Improve frames size computation" Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 29/36] SMB3: Honor seal flag for multiuser mounts Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 30/36] SMB3: Honor persistent/resilient handle flags " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 31/36] SMB3: Honor lease disabling " Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 32/36] cifs: Fix the target file was deleted when rename failed Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 33/36] MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 34/36] irqchip/gic: Atomically update affinity Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 35/36] dm zoned: assign max_io_len correctly Greg Kroah-Hartman
2020-07-07 15:17 ` [PATCH 4.19 36/36] efi: Make it possible to disable efivar_ssdt entirely Greg Kroah-Hartman
2020-07-08 5:56 ` [PATCH 4.19 00/36] 4.19.132-rc1 review Naresh Kamboju
[not found] ` <20200707145749.130272978-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-07-08 8:41 ` Jon Hunter
2020-07-08 8:41 ` Jon Hunter
2020-07-08 10:41 ` Chris Paterson
2020-07-08 15:15 ` Greg Kroah-Hartman
2020-07-08 15:04 ` Shuah Khan
2020-07-08 17:52 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200707145749.639245963@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=anton@lightbitslabs.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=sagi@grimberg.me \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.