From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Christoph Hellwig <hch@lst.de>,
Nilay Shroff <nilay@linux.ibm.com>,
Keith Busch <kbusch@kernel.org>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 21/39] nvme: make keep-alive synchronous operation
Date: Fri, 15 Nov 2024 07:38:31 +0100 [thread overview]
Message-ID: <20241115063723.375244706@linuxfoundation.org> (raw)
In-Reply-To: <20241115063722.599985562@linuxfoundation.org>
6.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nilay Shroff <nilay@linux.ibm.com>
[ Upstream commit d06923670b5a5f609603d4a9fee4dec02d38de9c ]
The nvme keep-alive operation, which executes at a periodic interval,
could potentially sneak in while shutting down a fabric controller.
This may lead to a race between the fabric controller admin queue
destroy code path (invoked while shutting down controller) and hw/hctx
queue dispatcher called from the nvme keep-alive async request queuing
operation. This race could lead to the kernel crash shown below:
Call Trace:
autoremove_wake_function+0x0/0xbc (unreliable)
__blk_mq_sched_dispatch_requests+0x114/0x24c
blk_mq_sched_dispatch_requests+0x44/0x84
blk_mq_run_hw_queue+0x140/0x220
nvme_keep_alive_work+0xc8/0x19c [nvme_core]
process_one_work+0x200/0x4e0
worker_thread+0x340/0x504
kthread+0x138/0x140
start_kernel_thread+0x14/0x18
While shutting down fabric controller, if nvme keep-alive request sneaks
in then it would be flushed off. The nvme_keep_alive_end_io function is
then invoked to handle the end of the keep-alive operation which
decrements the admin->q_usage_counter and assuming this is the last/only
request in the admin queue then the admin->q_usage_counter becomes zero.
If that happens then blk-mq destroy queue operation (blk_mq_destroy_
queue()) which could be potentially running simultaneously on another
cpu (as this is the controller shutdown code path) would forward
progress and deletes the admin queue. So, now from this point onward
we are not supposed to access the admin queue resources. However the
issue here's that the nvme keep-alive thread running hw/hctx queue
dispatch operation hasn't yet finished its work and so it could still
potentially access the admin queue resource while the admin queue had
been already deleted and that causes the above crash.
This fix helps avoid the observed crash by implementing keep-alive as a
synchronous operation so that we decrement admin->q_usage_counter only
after keep-alive command finished its execution and returns the command
status back up to its caller (blk_execute_rq()). This would ensure that
fabric shutdown code path doesn't destroy the fabric admin queue until
keep-alive request finished execution and also keep-alive thread is not
running hw/hctx queue dispatch operation.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/nvme/host/core.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dc25d91891327..92ffeb6605618 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1231,10 +1231,9 @@ static void nvme_queue_keep_alive_work(struct nvme_ctrl *ctrl)
nvme_keep_alive_work_period(ctrl));
}
-static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
- blk_status_t status)
+static void nvme_keep_alive_finish(struct request *rq,
+ blk_status_t status, struct nvme_ctrl *ctrl)
{
- struct nvme_ctrl *ctrl = rq->end_io_data;
unsigned long flags;
bool startka = false;
unsigned long rtt = jiffies - (rq->deadline - rq->timeout);
@@ -1252,13 +1251,11 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
delay = 0;
}
- blk_mq_free_request(rq);
-
if (status) {
dev_err(ctrl->device,
"failed nvme_keep_alive_end_io error=%d\n",
status);
- return RQ_END_IO_NONE;
+ return;
}
ctrl->ka_last_check_time = jiffies;
@@ -1270,7 +1267,6 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
spin_unlock_irqrestore(&ctrl->lock, flags);
if (startka)
queue_delayed_work(nvme_wq, &ctrl->ka_work, delay);
- return RQ_END_IO_NONE;
}
static void nvme_keep_alive_work(struct work_struct *work)
@@ -1279,6 +1275,7 @@ static void nvme_keep_alive_work(struct work_struct *work)
struct nvme_ctrl, ka_work);
bool comp_seen = ctrl->comp_seen;
struct request *rq;
+ blk_status_t status;
ctrl->ka_last_check_time = jiffies;
@@ -1301,9 +1298,9 @@ static void nvme_keep_alive_work(struct work_struct *work)
nvme_init_request(rq, &ctrl->ka_cmd);
rq->timeout = ctrl->kato * HZ;
- rq->end_io = nvme_keep_alive_end_io;
- rq->end_io_data = ctrl;
- blk_execute_rq_nowait(rq, false);
+ status = blk_execute_rq(rq, false);
+ nvme_keep_alive_finish(rq, status, ctrl);
+ blk_mq_free_request(rq);
}
static void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
--
2.43.0
next prev parent reply other threads:[~2024-11-15 6:53 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-15 6:38 [PATCH 6.1 00/39] 6.1.118-rc1 review Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 01/39] Revert "Bluetooth: fix use-after-free in accessing skb after sending it" Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 02/39] Revert "Bluetooth: hci_sync: Fix overwriting request callback" Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 03/39] Revert "Bluetooth: af_bluetooth: Fix deadlock" Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 04/39] Revert "Bluetooth: hci_core: Fix possible buffer overflow" Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 05/39] Revert "Bluetooth: hci_conn: Consolidate code for aborting connections" Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 06/39] 9p: Avoid creating multiple slab caches with the same name Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 07/39] irqchip/ocelot: Fix trigger register address Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 08/39] nvme: tcp: avoid race between queue_lock lock and destroy Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 09/39] block: Fix elevator_get_default() checking for NULL q->tag_set Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 10/39] HID: multitouch: Add support for B2402FVA track point Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 11/39] HID: multitouch: Add quirk for HONOR MagicBook Art 14 touchpad Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 12/39] nvme: disable CC.CRIME (NVME_CC_CRIME) Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 13/39] bpf: use kvzmalloc to allocate BPF verifier environment Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 14/39] crypto: api - Fix liveliness check in crypto_alg_tested Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 15/39] crypto: marvell/cesa - Disable hash algorithms Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 16/39] sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 17/39] drm/vmwgfx: Limit display layout ioctl array size to VMWGFX_NUM_DISPLAY_UNITS Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 18/39] kasan: Disable Software Tag-Based KASAN with GCC Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 19/39] nvme-multipath: defer partition scanning Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 20/39] powerpc/powernv: Free name on error in opal_event_init() Greg Kroah-Hartman
2024-11-15 6:38 ` Greg Kroah-Hartman [this message]
2024-11-15 6:38 ` [PATCH 6.1 22/39] vDPA/ifcvf: Fix pci_read_config_byte() return code handling Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 23/39] bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6 Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 24/39] fs: Fix uninitialized value issue in from_kuid and from_kgid Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 25/39] HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 26/39] HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 27/39] LoongArch: Use "Exception return address" to comment ERA Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 28/39] net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 29/39] md/raid10: improve code of mrdev in raid10_sync_request Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 30/39] io_uring: fix possible deadlock in io_register_iowq_max_workers() Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 31/39] uprobes: encapsulate preparation of uprobe args buffer Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 32/39] uprobe: avoid out-of-bounds memory access of fetching args Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 33/39] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 34/39] ext4: fix timer use-after-free on failed mount Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 35/39] Bluetooth: L2CAP: Fix uaf in l2cap_connect Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 36/39] mm: krealloc: Fix MTE false alarm in __do_krealloc Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 37/39] platform/x86: x86-android-tablets: Fix use after free on platform_device_register() errors Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 38/39] fs/ntfs3: Fix general protection fault in run_is_mapped_full Greg Kroah-Hartman
2024-11-15 6:38 ` [PATCH 6.1 39/39] 9p: fix slab cache name creation for real Greg Kroah-Hartman
2024-11-15 12:43 ` [PATCH 6.1 00/39] 6.1.118-rc1 review Peter Schneider
2024-11-15 18:11 ` Jon Hunter
2024-11-15 18:26 ` SeongJae Park
2024-11-15 19:14 ` Florian Fainelli
2024-11-15 21:26 ` Mark Brown
2024-11-16 0:04 ` Ron Economos
2024-11-16 12:24 ` Naresh Kamboju
2024-11-16 17:20 ` [PATCH 6.1] " Hardik Garg
2024-11-16 21:10 ` [PATCH 6.1 00/39] " Shuah Khan
2024-11-17 13:27 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241115063723.375244706@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=hch@lst.de \
--cc=kbusch@kernel.org \
--cc=nilay@linux.ibm.com \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox