From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B05F7CFA466 for ; Mon, 24 Nov 2025 08:07:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=4VM8nTkmJ1uVRUlewVtqOMdA8rPmxmlCO8gwfxrbXZM=; b=2oJVYX9wuabfKXvG5+i1LPsepO z21UF4+zLG3Yl7nbmdDy5PBn8cbX5IZkCmnsyL4nTGakJOBwt4bMbFJX6FWFl+T5cmgn9A7SjaGKz XgEBc1HjAeoP174pWps882tB+m7ewDyYjIygF7EprXUSlg4+7DxY1OJQ/xaJ6I6vcwDKyp9soYP9/ OOkKM2SWMw2fw4ibgFXigE4/APkeXUmqmPm2ApxX1FVuX59CrwnlTtGdJlb41b/ktP10J7Xegn1IA QI1TtLVx6CX/HgERC0fqRN8dc/Y6DMe3nyDBF1iKvXqCWcs34oqqfi+9yLTOxcFC4ziP9gDH0blaY X1OALNcg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1vNRbY-0000000BGRf-07YH; Mon, 24 Nov 2025 08:07:24 +0000 Received: from sea.source.kernel.org ([2600:3c0a:e001:78e:0:1991:8:25]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1vNRbT-0000000BGQ9-3LFc for linux-nvme@lists.infradead.org; Mon, 24 Nov 2025 08:07:22 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sea.source.kernel.org (Postfix) with ESMTP id 074EF442DB; Mon, 24 Nov 2025 08:07:18 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id AF10BC116D0; Mon, 24 Nov 2025 08:07:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763971637; bh=FVBf3vcLNPCvsTYwbrp09DU8HE90hdfMcM+dfe6Nxqc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=LHi+SV1mHKsdiRBrd45OECz+Ok/8MsfYzB/apVBS89qk39vBQU9BMnV5RiL+fp2jH 44CoQ4zexhBdZjd8X5qLWgzX0EapGI2uClZx5POiQfmp59BhxHoFXc6uXV0VSJzZ4P iBcEBkXQa8+NYPjQfCvOnuTPTGJS+5I9EoKwqDWXBZw+VZsdaf3IDDsFwj9OQv/JIT OjKR3sMlYMbwsM5kHPY/2MllAjk/6MOLXCKBPsyPLaJrLTpnbGE1C0YhoR0A/KENBn DANeXu+d2VVCHGrPkc/AoiA1S1e3erCFjyd+Xcm1WNisWlRULC3AoI9L1leN8b8kcY FISsNc8cl4IUw== From: Sasha Levin To: patches@lists.linux.dev, stable@vger.kernel.org Cc: Keith Busch , Casey Chen , Christoph Hellwig , Hannes Reinecke , Ming Lei , Chaitanya Kulkarni , Sasha Levin , sagi@grimberg.me, linux-nvme@lists.infradead.org Subject: [PATCH AUTOSEL 6.17-6.6] nvme: fix admin request_queue lifetime Date: Mon, 24 Nov 2025 03:06:32 -0500 Message-ID: <20251124080644.3871678-18-sashal@kernel.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20251124080644.3871678-1-sashal@kernel.org> References: <20251124080644.3871678-1-sashal@kernel.org> MIME-Version: 1.0 X-stable: review X-Patchwork-Hint: Ignore X-stable-base: Linux 6.17.8 Content-Transfer-Encoding: 8bit X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20251124_000721_087671_3609BCCB X-CRM114-Status: GOOD ( 22.68 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org From: Keith Busch [ Upstream commit 03b3bcd319b3ab5182bc9aaa0421351572c78ac0 ] The namespaces can access the controller's admin request_queue, and stale references on the namespaces may exist after tearing down the controller. Ensure the admin request_queue is active by moving the controller's 'put' to after all controller references have been released to ensure no one is can access the request_queue. This fixes a reported use-after-free bug: BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0 Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287 CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G E 6.13.2-ga1582f1a031e #15 Tainted: [E]=UNSIGNED_MODULE Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025 Call Trace: dump_stack_lvl+0x4f/0x60 print_report+0xc4/0x620 ? _raw_spin_lock_irqsave+0x70/0xb0 ? _raw_read_unlock_irqrestore+0x30/0x30 ? blk_queue_enter+0x41c/0x4a0 kasan_report+0xab/0xe0 ? blk_queue_enter+0x41c/0x4a0 blk_queue_enter+0x41c/0x4a0 ? __irq_work_queue_local+0x75/0x1d0 ? blk_queue_start_drain+0x70/0x70 ? irq_work_queue+0x18/0x20 ? vprintk_emit.part.0+0x1cc/0x350 ? wake_up_klogd_work_func+0x60/0x60 blk_mq_alloc_request+0x2b7/0x6b0 ? __blk_mq_alloc_requests+0x1060/0x1060 ? __switch_to+0x5b7/0x1060 nvme_submit_user_cmd+0xa9/0x330 nvme_user_cmd.isra.0+0x240/0x3f0 ? force_sigsegv+0xe0/0xe0 ? nvme_user_cmd64+0x400/0x400 ? vfs_fileattr_set+0x9b0/0x9b0 ? cgroup_update_frozen_flag+0x24/0x1c0 ? cgroup_leave_frozen+0x204/0x330 ? nvme_ioctl+0x7c/0x2c0 blkdev_ioctl+0x1a8/0x4d0 ? blkdev_common_ioctl+0x1930/0x1930 ? fdget+0x54/0x380 __x64_sys_ioctl+0x129/0x190 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f765f703b0b Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000 R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003 R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60 Reported-by: Casey Chen Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Ming Lei Reviewed-by: Chaitanya Kulkarni Signed-off-by: Keith Busch Signed-off-by: Sasha Levin --- LLM Generated explanations, may be completely bogus: ## Analysis ### 1. Commit Message and Bug Analysis The commit "nvme: fix admin request_queue lifetime" addresses a critical **Use-After-Free (UAF)** vulnerability in the NVMe subsystem. The commit message includes a KASAN stack trace showing a crash in `blk_queue_enter` triggered by `nvme_submit_user_cmd`. - **The Issue:** A race condition exists during NVMe controller teardown. The admin request queue (`ctrl->admin_q`) is destroyed and its reference dropped early in the teardown process (inside `nvme_remove_admin_tag_set`). However, references to the controller object itself (`nvme_ctrl`) can persist, for example, if userspace holds open file descriptors to namespaces. If a user issues an admin command via ioctl after the queue is freed but before the controller is fully released, the kernel attempts to access freed memory, resulting in a panic. - **Real-World Impact:** This was reported by Pure Storage (`Reported- by: Casey Chen`) and verified with a KASAN splat, confirming it affects production environments and is triggerable by userspace operations. ### 2. Deep Code Research & History - **Origin of Regression:** Detailed investigation reveals this issue stems from the v6.2 development cycle (specifically commit `2b3f056f72e5`, "blk-mq: move the call to blk_put_queue out of blk_mq_destroy_queue"). That change shifted the responsibility of dropping queue references to individual drivers. The NVMe driver placed this `blk_put_queue` call in `nvme_remove_admin_tag_set`, which proved to be too early in the lifecycle. - **The Fix Mechanism:** The patch moves the `blk_put_queue(ctrl->admin_q)` call from `nvme_remove_admin_tag_set()` to `nvme_free_ctrl()`. - `nvme_free_ctrl()` is the specific `release` callback for the controller's device structure. It is guaranteed to run **only** when the last reference to the controller is dropped. - By moving the cleanup here, the patch ensures the `admin_q` remains valid for exactly as long as the controller structure exists, perfectly aligning their lifetimes and eliminating the UAF window. - **Code Context:** The change is minimal (removing 1 line, adding 2 lines) and self-contained within `drivers/nvme/host/core.c`. ### 3. Security Assessment - **Severity: High.** Use-After-Free bugs are significant security vulnerabilities that can lead to data corruption, kernel panics, or potentially arbitrary code execution/privilege escalation. - **Trigger:** The bug is reachable from userspace via standard ioctl paths (`nvme_user_cmd`), making it a viable target for local attackers or a source of instability during hotplug/shutdown events. ### 4. Stable Kernel Rules Compliance - **Fixes a real bug:** Yes, documented UAF with stack trace. - **Obviously correct:** Yes, associating resource cleanup with the object destructor is the standard correct pattern for refcounted objects. - **Small and contained:** Only modifies 3 lines in one file. - **No new features:** Pure bug fix. - **Mainline Status:** Reviewed by multiple senior subsystem maintainers (Hellwig, Reinecke, Lei, Kulkarni). ### 5. Risk Assessment - **Regression Risk: Very Low.** The fix simply extends the lifetime of a memory object to match its owner. It does not change the logic of *how* the queue is used, only *when* it is freed. This is defensively safer than the current code. - **Backportability:** The patch applies cleanly to stable kernels 6.6.y, 6.1.y, and others where the regression exists (kernels >= v6.2). ### Conclusion This is a textbook candidate for stable backporting. It fixes a severe memory safety bug (UAF) that causes crashes in production systems. The fix is surgical, well-understood, and restores correct lifetime management logic. **YES** drivers/nvme/host/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 5714d49932822..28c598008124c 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4896,7 +4896,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl) */ nvme_stop_keep_alive(ctrl); blk_mq_destroy_queue(ctrl->admin_q); - blk_put_queue(ctrl->admin_q); if (ctrl->ops->flags & NVME_F_FABRICS) { blk_mq_destroy_queue(ctrl->fabrics_q); blk_put_queue(ctrl->fabrics_q); @@ -5040,6 +5039,8 @@ static void nvme_free_ctrl(struct device *dev) container_of(dev, struct nvme_ctrl, ctrl_device); struct nvme_subsystem *subsys = ctrl->subsys; + if (ctrl->admin_q) + blk_put_queue(ctrl->admin_q); if (!subsys || ctrl->instance != subsys->instance) ida_free(&nvme_instance_ida, ctrl->instance); nvme_free_cels(ctrl); -- 2.51.0