* [PATCH] nvme: fix admin request_queue lifetime
@ 2025-11-04 22:59 Keith Busch
  2025-11-04 23:08 ` Chaitanya Kulkarni
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Keith Busch @ 2025-11-04 22:59 UTC (permalink / raw)
  To: linux-nvme; +Cc: hch, ming.lei, chaitanyak, Keith Busch, Casey Chen

From: Keith Busch <kbusch@kernel.org>

The namespaces can access the controller's admin request_queue, and
stale references to the namespaces may exist. Keep the request_queue
alive by moving its final 'put' to after all controller references
have been released, so that no one can still be accessing the
request_queue. This fixes a reported use-after-free bug:

  BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
  Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
  CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
  Tainted: [E]=UNSIGNED_MODULE
  Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4f/0x60
   print_report+0xc4/0x620
   ? _raw_spin_lock_irqsave+0x70/0xb0
   ? _raw_read_unlock_irqrestore+0x30/0x30
   ? blk_queue_enter+0x41c/0x4a0
   kasan_report+0xab/0xe0
   ? blk_queue_enter+0x41c/0x4a0
   blk_queue_enter+0x41c/0x4a0
   ? __irq_work_queue_local+0x75/0x1d0
   ? blk_queue_start_drain+0x70/0x70
   ? irq_work_queue+0x18/0x20
   ? vprintk_emit.part.0+0x1cc/0x350
   ? wake_up_klogd_work_func+0x60/0x60
   blk_mq_alloc_request+0x2b7/0x6b0
   ? __blk_mq_alloc_requests+0x1060/0x1060
   ? __switch_to+0x5b7/0x1060
   nvme_submit_user_cmd+0xa9/0x330
   nvme_user_cmd.isra.0+0x240/0x3f0
   ? force_sigsegv+0xe0/0xe0
   ? nvme_user_cmd64+0x400/0x400
   ? vfs_fileattr_set+0x9b0/0x9b0
   ? cgroup_update_frozen_flag+0x24/0x1c0
   ? cgroup_leave_frozen+0x204/0x330
   ? nvme_ioctl+0x7c/0x2c0
   blkdev_ioctl+0x1a8/0x4d0
   ? blkdev_common_ioctl+0x1930/0x1930
   ? fdget+0x54/0x380
   __x64_sys_ioctl+0x129/0x190
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f765f703b0b
  Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
  RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
  R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
  R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
   </TASK>

Reported-by: Casey Chen <cachen@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fa4181d7de736..0b83d82f67e75 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4901,7 +4901,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
 	 */
 	nvme_stop_keep_alive(ctrl);
 	blk_mq_destroy_queue(ctrl->admin_q);
-	blk_put_queue(ctrl->admin_q);
 	if (ctrl->ops->flags & NVME_F_FABRICS) {
 		blk_mq_destroy_queue(ctrl->fabrics_q);
 		blk_put_queue(ctrl->fabrics_q);
@@ -5045,6 +5044,7 @@ static void nvme_free_ctrl(struct device *dev)
 		container_of(dev, struct nvme_ctrl, ctrl_device);
 	struct nvme_subsystem *subsys = ctrl->subsys;
 
+	blk_put_queue(ctrl->admin_q);
 	if (!subsys || ctrl->instance != subsys->instance)
 		ida_free(&nvme_instance_ida, ctrl->instance);
 	nvme_free_cels(ctrl);
-- 
2.47.3




* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 22:59 [PATCH] nvme: fix admin request_queue lifetime Keith Busch
@ 2025-11-04 23:08 ` Chaitanya Kulkarni
  2025-11-04 23:22   ` Casey Chen
  2025-11-05  1:20 ` Ming Lei
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-04 23:08 UTC (permalink / raw)
  To: Keith Busch, linux-nvme@lists.infradead.org
  Cc: hch@lst.de, ming.lei@redhat.com, Keith Busch, Casey Chen

On 11/4/25 14:59, Keith Busch wrote:
> From: Keith Busch<kbusch@kernel.org>
>
> The namespaces can access the controller's admin request_queue, and
> stale references to the namespaces may exist. Keep the request_queue
> alive by moving its final 'put' to after all controller references
> have been released, so that no one can still be accessing the
> request_queue. This fixes a reported use-after-free bug:
>
>    BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
>    Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
>    CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
>    Tainted: [E]=UNSIGNED_MODULE
>    Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
>    Call Trace:
>     <TASK>
>     dump_stack_lvl+0x4f/0x60
>     print_report+0xc4/0x620
>     ? _raw_spin_lock_irqsave+0x70/0xb0
>     ? _raw_read_unlock_irqrestore+0x30/0x30
>     ? blk_queue_enter+0x41c/0x4a0
>     kasan_report+0xab/0xe0
>     ? blk_queue_enter+0x41c/0x4a0
>     blk_queue_enter+0x41c/0x4a0
>     ? __irq_work_queue_local+0x75/0x1d0
>     ? blk_queue_start_drain+0x70/0x70
>     ? irq_work_queue+0x18/0x20
>     ? vprintk_emit.part.0+0x1cc/0x350
>     ? wake_up_klogd_work_func+0x60/0x60
>     blk_mq_alloc_request+0x2b7/0x6b0
>     ? __blk_mq_alloc_requests+0x1060/0x1060
>     ? __switch_to+0x5b7/0x1060
>     nvme_submit_user_cmd+0xa9/0x330
>     nvme_user_cmd.isra.0+0x240/0x3f0
>     ? force_sigsegv+0xe0/0xe0
>     ? nvme_user_cmd64+0x400/0x400
>     ? vfs_fileattr_set+0x9b0/0x9b0
>     ? cgroup_update_frozen_flag+0x24/0x1c0
>     ? cgroup_leave_frozen+0x204/0x330
>     ? nvme_ioctl+0x7c/0x2c0
>     blkdev_ioctl+0x1a8/0x4d0
>     ? blkdev_common_ioctl+0x1930/0x1930
>     ? fdget+0x54/0x380
>     __x64_sys_ioctl+0x129/0x190
>     do_syscall_64+0x5b/0x160
>     entry_SYSCALL_64_after_hwframe+0x4b/0x53
>    RIP: 0033:0x7f765f703b0b
>    Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
>    RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>    RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
>    RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
>    RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
>    R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
>    R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
>     </TASK>
>
> Reported-by: Casey Chen<cachen@purestorage.com>
> Signed-off-by: Keith Busch<kbusch@kernel.org>
> ---


Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck




* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 23:08 ` Chaitanya Kulkarni
@ 2025-11-04 23:22   ` Casey Chen
  0 siblings, 0 replies; 12+ messages in thread
From: Casey Chen @ 2025-11-04 23:22 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	ming.lei@redhat.com, Keith Busch

Looks good. Thanks

On Tue, Nov 4, 2025 at 3:08 PM Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
>
> On 11/4/25 14:59, Keith Busch wrote:
> > From: Keith Busch<kbusch@kernel.org>
> >
> > The namespaces can access the controller's admin request_queue, and
> > stale references to the namespaces may exist. Keep the request_queue
> > alive by moving its final 'put' to after all controller references
> > have been released, so that no one can still be accessing the
> > request_queue. This fixes a reported use-after-free bug:
> >
> >    BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
> >    Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
> >    CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
> >    Tainted: [E]=UNSIGNED_MODULE
> >    Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
> >    Call Trace:
> >     <TASK>
> >     dump_stack_lvl+0x4f/0x60
> >     print_report+0xc4/0x620
> >     ? _raw_spin_lock_irqsave+0x70/0xb0
> >     ? _raw_read_unlock_irqrestore+0x30/0x30
> >     ? blk_queue_enter+0x41c/0x4a0
> >     kasan_report+0xab/0xe0
> >     ? blk_queue_enter+0x41c/0x4a0
> >     blk_queue_enter+0x41c/0x4a0
> >     ? __irq_work_queue_local+0x75/0x1d0
> >     ? blk_queue_start_drain+0x70/0x70
> >     ? irq_work_queue+0x18/0x20
> >     ? vprintk_emit.part.0+0x1cc/0x350
> >     ? wake_up_klogd_work_func+0x60/0x60
> >     blk_mq_alloc_request+0x2b7/0x6b0
> >     ? __blk_mq_alloc_requests+0x1060/0x1060
> >     ? __switch_to+0x5b7/0x1060
> >     nvme_submit_user_cmd+0xa9/0x330
> >     nvme_user_cmd.isra.0+0x240/0x3f0
> >     ? force_sigsegv+0xe0/0xe0
> >     ? nvme_user_cmd64+0x400/0x400
> >     ? vfs_fileattr_set+0x9b0/0x9b0
> >     ? cgroup_update_frozen_flag+0x24/0x1c0
> >     ? cgroup_leave_frozen+0x204/0x330
> >     ? nvme_ioctl+0x7c/0x2c0
> >     blkdev_ioctl+0x1a8/0x4d0
> >     ? blkdev_common_ioctl+0x1930/0x1930
> >     ? fdget+0x54/0x380
> >     __x64_sys_ioctl+0x129/0x190
> >     do_syscall_64+0x5b/0x160
> >     entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >    RIP: 0033:0x7f765f703b0b
> >    Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
> >    RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> >    RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
> >    RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
> >    RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
> >    R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
> >    R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
> >     </TASK>
> >
> > Reported-by: Casey Chen<cachen@purestorage.com>
> > Signed-off-by: Keith Busch<kbusch@kernel.org>
> > ---
>
>
> Looks good.
>
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
>
> -ck
>
>



* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 22:59 [PATCH] nvme: fix admin request_queue lifetime Keith Busch
  2025-11-04 23:08 ` Chaitanya Kulkarni
@ 2025-11-05  1:20 ` Ming Lei
  2025-11-05  7:38 ` Hannes Reinecke
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2025-11-05  1:20 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, hch, chaitanyak, Keith Busch, Casey Chen

On Tue, Nov 04, 2025 at 02:59:39PM -0800, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The namespaces can access the controller's admin request_queue, and
> stale references to the namespaces may exist. Keep the request_queue
> alive by moving its final 'put' to after all controller references
> have been released, so that no one can still be accessing the
> request_queue. This fixes a reported use-after-free bug:
> 
>   BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
>   Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
>   CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
>   Tainted: [E]=UNSIGNED_MODULE
>   Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x4f/0x60
>    print_report+0xc4/0x620
>    ? _raw_spin_lock_irqsave+0x70/0xb0
>    ? _raw_read_unlock_irqrestore+0x30/0x30
>    ? blk_queue_enter+0x41c/0x4a0
>    kasan_report+0xab/0xe0
>    ? blk_queue_enter+0x41c/0x4a0
>    blk_queue_enter+0x41c/0x4a0
>    ? __irq_work_queue_local+0x75/0x1d0
>    ? blk_queue_start_drain+0x70/0x70
>    ? irq_work_queue+0x18/0x20
>    ? vprintk_emit.part.0+0x1cc/0x350
>    ? wake_up_klogd_work_func+0x60/0x60
>    blk_mq_alloc_request+0x2b7/0x6b0
>    ? __blk_mq_alloc_requests+0x1060/0x1060
>    ? __switch_to+0x5b7/0x1060
>    nvme_submit_user_cmd+0xa9/0x330
>    nvme_user_cmd.isra.0+0x240/0x3f0
>    ? force_sigsegv+0xe0/0xe0
>    ? nvme_user_cmd64+0x400/0x400
>    ? vfs_fileattr_set+0x9b0/0x9b0
>    ? cgroup_update_frozen_flag+0x24/0x1c0
>    ? cgroup_leave_frozen+0x204/0x330
>    ? nvme_ioctl+0x7c/0x2c0
>    blkdev_ioctl+0x1a8/0x4d0
>    ? blkdev_common_ioctl+0x1930/0x1930
>    ? fdget+0x54/0x380
>    __x64_sys_ioctl+0x129/0x190
>    do_syscall_64+0x5b/0x160
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   RIP: 0033:0x7f765f703b0b
>   Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
>   RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>   RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
>   RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
>   RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
>   R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
>   R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
>    </TASK>
> 
> Reported-by: Casey Chen <cachen@purestorage.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming




* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 22:59 [PATCH] nvme: fix admin request_queue lifetime Keith Busch
  2025-11-04 23:08 ` Chaitanya Kulkarni
  2025-11-05  1:20 ` Ming Lei
@ 2025-11-05  7:38 ` Hannes Reinecke
  2025-11-05 13:14 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Hannes Reinecke @ 2025-11-05  7:38 UTC (permalink / raw)
  To: Keith Busch, linux-nvme
  Cc: hch, ming.lei, chaitanyak, Keith Busch, Casey Chen

On 11/4/25 23:59, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> The namespaces can access the controller's admin request_queue, and
> stale references to the namespaces may exist. Keep the request_queue
> alive by moving its final 'put' to after all controller references
> have been released, so that no one can still be accessing the
> request_queue. This fixes a reported use-after-free bug:
> 
>    BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
>    Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
>    CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
>    Tainted: [E]=UNSIGNED_MODULE
>    Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
>    Call Trace:
>     <TASK>
>     dump_stack_lvl+0x4f/0x60
>     print_report+0xc4/0x620
>     ? _raw_spin_lock_irqsave+0x70/0xb0
>     ? _raw_read_unlock_irqrestore+0x30/0x30
>     ? blk_queue_enter+0x41c/0x4a0
>     kasan_report+0xab/0xe0
>     ? blk_queue_enter+0x41c/0x4a0
>     blk_queue_enter+0x41c/0x4a0
>     ? __irq_work_queue_local+0x75/0x1d0
>     ? blk_queue_start_drain+0x70/0x70
>     ? irq_work_queue+0x18/0x20
>     ? vprintk_emit.part.0+0x1cc/0x350
>     ? wake_up_klogd_work_func+0x60/0x60
>     blk_mq_alloc_request+0x2b7/0x6b0
>     ? __blk_mq_alloc_requests+0x1060/0x1060
>     ? __switch_to+0x5b7/0x1060
>     nvme_submit_user_cmd+0xa9/0x330
>     nvme_user_cmd.isra.0+0x240/0x3f0
>     ? force_sigsegv+0xe0/0xe0
>     ? nvme_user_cmd64+0x400/0x400
>     ? vfs_fileattr_set+0x9b0/0x9b0
>     ? cgroup_update_frozen_flag+0x24/0x1c0
>     ? cgroup_leave_frozen+0x204/0x330
>     ? nvme_ioctl+0x7c/0x2c0
>     blkdev_ioctl+0x1a8/0x4d0
>     ? blkdev_common_ioctl+0x1930/0x1930
>     ? fdget+0x54/0x380
>     __x64_sys_ioctl+0x129/0x190
>     do_syscall_64+0x5b/0x160
>     entry_SYSCALL_64_after_hwframe+0x4b/0x53
>    RIP: 0033:0x7f765f703b0b
>    Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
>    RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>    RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
>    RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
>    RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
>    R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
>    R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
>     </TASK>
> 
> Reported-by: Casey Chen <cachen@purestorage.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fa4181d7de736..0b83d82f67e75 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4901,7 +4901,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
>   	 */
>   	nvme_stop_keep_alive(ctrl);
>   	blk_mq_destroy_queue(ctrl->admin_q);
> -	blk_put_queue(ctrl->admin_q);
>   	if (ctrl->ops->flags & NVME_F_FABRICS) {
>   		blk_mq_destroy_queue(ctrl->fabrics_q);
>   		blk_put_queue(ctrl->fabrics_q);
> @@ -5045,6 +5044,7 @@ static void nvme_free_ctrl(struct device *dev)
>   		container_of(dev, struct nvme_ctrl, ctrl_device);
>   	struct nvme_subsystem *subsys = ctrl->subsys;
>   
> +	blk_put_queue(ctrl->admin_q);
>   	if (!subsys || ctrl->instance != subsys->instance)
>   		ida_free(&nvme_instance_ida, ctrl->instance);
>   	nvme_free_cels(ctrl);

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 22:59 [PATCH] nvme: fix admin request_queue lifetime Keith Busch
                   ` (2 preceding siblings ...)
  2025-11-05  7:38 ` Hannes Reinecke
@ 2025-11-05 13:14 ` Christoph Hellwig
  2025-11-05 20:21 ` Casey Chen
  2025-11-05 21:31 ` Ewan Milne
  5 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2025-11-05 13:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, hch, ming.lei, chaitanyak, Keith Busch, Casey Chen

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 22:59 [PATCH] nvme: fix admin request_queue lifetime Keith Busch
                   ` (3 preceding siblings ...)
  2025-11-05 13:14 ` Christoph Hellwig
@ 2025-11-05 20:21 ` Casey Chen
  2025-11-05 20:31   ` Keith Busch
  2025-11-05 21:31 ` Ewan Milne
  5 siblings, 1 reply; 12+ messages in thread
From: Casey Chen @ 2025-11-05 20:21 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme, hch, ming.lei, chaitanyak, Keith Busch

On Tue, Nov 4, 2025 at 3:00 PM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> The namespaces can access the controller's admin request_queue, and
> stale references to the namespaces may exist. Keep the request_queue
> alive by moving its final 'put' to after all controller references
> have been released, so that no one can still be accessing the
> request_queue. This fixes a reported use-after-free bug:
>
>   BUG: KASAN: slab-use-after-free in blk_queue_enter+0x41c/0x4a0
>   Read of size 8 at addr ffff88c0a53819f8 by task nvme/3287
>   CPU: 67 UID: 0 PID: 3287 Comm: nvme Tainted: G            E       6.13.2-ga1582f1a031e #15
>   Tainted: [E]=UNSIGNED_MODULE
>   Hardware name: Jabil /EGS 2S MB1, BIOS 1.00 06/18/2025
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x4f/0x60
>    print_report+0xc4/0x620
>    ? _raw_spin_lock_irqsave+0x70/0xb0
>    ? _raw_read_unlock_irqrestore+0x30/0x30
>    ? blk_queue_enter+0x41c/0x4a0
>    kasan_report+0xab/0xe0
>    ? blk_queue_enter+0x41c/0x4a0
>    blk_queue_enter+0x41c/0x4a0
>    ? __irq_work_queue_local+0x75/0x1d0
>    ? blk_queue_start_drain+0x70/0x70
>    ? irq_work_queue+0x18/0x20
>    ? vprintk_emit.part.0+0x1cc/0x350
>    ? wake_up_klogd_work_func+0x60/0x60
>    blk_mq_alloc_request+0x2b7/0x6b0
>    ? __blk_mq_alloc_requests+0x1060/0x1060
>    ? __switch_to+0x5b7/0x1060
>    nvme_submit_user_cmd+0xa9/0x330
>    nvme_user_cmd.isra.0+0x240/0x3f0
>    ? force_sigsegv+0xe0/0xe0
>    ? nvme_user_cmd64+0x400/0x400
>    ? vfs_fileattr_set+0x9b0/0x9b0
>    ? cgroup_update_frozen_flag+0x24/0x1c0
>    ? cgroup_leave_frozen+0x204/0x330
>    ? nvme_ioctl+0x7c/0x2c0
>    blkdev_ioctl+0x1a8/0x4d0
>    ? blkdev_common_ioctl+0x1930/0x1930
>    ? fdget+0x54/0x380
>    __x64_sys_ioctl+0x129/0x190
>    do_syscall_64+0x5b/0x160
>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>   RIP: 0033:0x7f765f703b0b
>   Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 52 0f 00 f7 d8 64 89 01 48
>   RSP: 002b:00007ffe2cefe808 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>   RAX: ffffffffffffffda RBX: 00007ffe2cefe860 RCX: 00007f765f703b0b
>   RDX: 00007ffe2cefe860 RSI: 00000000c0484e41 RDI: 0000000000000003
>   RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
>   R10: 00007f765f611d50 R11: 0000000000000202 R12: 0000000000000003
>   R13: 00000000c0484e41 R14: 0000000000000001 R15: 00007ffe2cefea60
>    </TASK>
>
> Reported-by: Casey Chen <cachen@purestorage.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fa4181d7de736..0b83d82f67e75 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4901,7 +4901,6 @@ void nvme_remove_admin_tag_set(struct nvme_ctrl *ctrl)
>          */
>         nvme_stop_keep_alive(ctrl);
>         blk_mq_destroy_queue(ctrl->admin_q);
> -       blk_put_queue(ctrl->admin_q);
>         if (ctrl->ops->flags & NVME_F_FABRICS) {
>                 blk_mq_destroy_queue(ctrl->fabrics_q);
>                 blk_put_queue(ctrl->fabrics_q);
> @@ -5045,6 +5044,7 @@ static void nvme_free_ctrl(struct device *dev)
>                 container_of(dev, struct nvme_ctrl, ctrl_device);
>         struct nvme_subsystem *subsys = ctrl->subsys;
>
> +       blk_put_queue(ctrl->admin_q);

Wait. Do we need to check that ctrl->admin_q is non-NULL before
putting it? If nvme_alloc_admin_tag_set() fails, blk_put_queue()
would be called on a NULL queue and panic the kernel.

Casey

>         if (!subsys || ctrl->instance != subsys->instance)
>                 ida_free(&nvme_instance_ida, ctrl->instance);
>         nvme_free_cels(ctrl);
> --
> 2.47.3
>



* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-05 20:21 ` Casey Chen
@ 2025-11-05 20:31   ` Keith Busch
  2025-11-06  0:10     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2025-11-05 20:31 UTC (permalink / raw)
  To: Casey Chen; +Cc: Keith Busch, linux-nvme, hch, ming.lei, chaitanyak

On Wed, Nov 05, 2025 at 12:21:14PM -0800, Casey Chen wrote:
> On Tue, Nov 4, 2025 at 3:00 PM Keith Busch <kbusch@meta.com> wrote:
> > @@ -5045,6 +5044,7 @@ static void nvme_free_ctrl(struct device *dev)
> >                 container_of(dev, struct nvme_ctrl, ctrl_device);
> >         struct nvme_subsystem *subsys = ctrl->subsys;
> >
> > +       blk_put_queue(ctrl->admin_q);
> 
> Wait. Do we need to check that ctrl->admin_q is non-NULL before
> putting it? If nvme_alloc_admin_tag_set() fails, blk_put_queue()
> would be called on a NULL queue and panic the kernel.

Oh, like if we call nvme_uninit_ctrl() prior to allocating the tagset?
Yes, I think you're right, unlikely as that is to occur. Thanks, I'll
fold in your suggestion.
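
Something like this is what I'd fold in (an untested sketch of the
nvme_free_ctrl() hunk):

	if (ctrl->admin_q)
		blk_put_queue(ctrl->admin_q);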



* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-04 22:59 [PATCH] nvme: fix admin request_queue lifetime Keith Busch
                   ` (4 preceding siblings ...)
  2025-11-05 20:21 ` Casey Chen
@ 2025-11-05 21:31 ` Ewan Milne
  2025-11-05 22:34   ` Keith Busch
  5 siblings, 1 reply; 12+ messages in thread
From: Ewan Milne @ 2025-11-05 21:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, hch, ming.lei, chaitanyak, Keith Busch, Casey Chen

On Tue, Nov 4, 2025 at 6:00 PM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> The namespaces can access the controller's admin request_queue, and
> stale references to the namespaces may exist. Keep the request_queue
> alive by moving its final 'put' to after all controller references
> have been released, so that no one can still be accessing the
> request_queue. This fixes a reported use-after-free bug:
>

OK, so I get that this fixes the use-after-free, and don't let my
comments hold up acceptance of the patch.  But can you explain why
this actually helps?  nvme_alloc_admin_tag_set() allocates the admin_q
as part of the admin tagset initialization, and doesn't this change
keep the admin_q alive past the point where the admin tagset is
deallocated?  So where do we detect that?

-Ewan




* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-05 21:31 ` Ewan Milne
@ 2025-11-05 22:34   ` Keith Busch
  2025-11-06 19:33     ` Ewan Milne
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2025-11-05 22:34 UTC (permalink / raw)
  To: Ewan Milne; +Cc: Keith Busch, linux-nvme, hch, ming.lei, chaitanyak, Casey Chen

On Wed, Nov 05, 2025 at 04:31:13PM -0500, Ewan Milne wrote:
> On Tue, Nov 4, 2025 at 6:00 PM Keith Busch <kbusch@meta.com> wrote:
> >
> > From: Keith Busch <kbusch@kernel.org>
> >
> > The namespaces can access the controller's admin request_queue, and
> > stale references to the namespaces may exist. Keep the request_queue
> > alive by moving its final 'put' to after all controller references
> > have been released, so that no one can still be accessing the
> > request_queue. This fixes a reported use-after-free bug:
> >
> 
> OK, so I get that this fixes the use-after-free, and don't let my
> comments hold up acceptance of the patch.  But can you explain why
> this actually helps?  nvme_alloc_admin_tag_set() allocates the admin_q
> as part of the admin tagset initialization, and doesn't this change
> keep the admin_q alive past the point where the admin tagset is
> deallocated?  So where do we detect that?

We still call blk_mq_destroy_queue() prior to calling
blk_mq_free_tag_set(). The queue has exited the tagset and is marked
dying; no one can "enter" the queue after that, so the tagset can be
safely freed even if people are still holding references on the dying
queue.
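
To make the ordering concrete, the teardown now looks roughly like
this (a sketch, eliding the fabrics queues):

	/* nvme_remove_admin_tag_set() */
	nvme_stop_keep_alive(ctrl);
	blk_mq_destroy_queue(ctrl->admin_q);	 /* drained and marked dying */
	blk_mq_free_tag_set(ctrl->admin_tagset); /* safe: nothing can enter the queue */

	/* ...later, once the last controller reference is dropped... */

	/* nvme_free_ctrl() */
	blk_put_queue(ctrl->admin_q);		 /* final queue reference released */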



* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-05 20:31   ` Keith Busch
@ 2025-11-06  0:10     ` Chaitanya Kulkarni
  0 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-06  0:10 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	ming.lei@redhat.com, Casey Chen

On 11/5/25 12:31, Keith Busch wrote:
> On Wed, Nov 05, 2025 at 12:21:14PM -0800, Casey Chen wrote:
>> On Tue, Nov 4, 2025 at 3:00 PM Keith Busch <kbusch@meta.com> wrote:
>>> @@ -5045,6 +5044,7 @@ static void nvme_free_ctrl(struct device *dev)
>>>                  container_of(dev, struct nvme_ctrl, ctrl_device);
>>>          struct nvme_subsystem *subsys = ctrl->subsys;
>>>
>>> +       blk_put_queue(ctrl->admin_q);
>> Wait. Do we need to check that ctrl->admin_q is non-NULL before
>> putting it? If nvme_alloc_admin_tag_set() fails, blk_put_queue()
>> would be called on a NULL queue and panic the kernel.
> Oh, like if we call nvme_uninit_ctrl() prior to allocating the tagset?
> Yes, I think you're right, unlikely as that is to occur. Thanks, I'll
> fold in your suggestion.

I had that check in the patch I posted earlier :)

"""""""

+    /*
+     * Release admin_q's final reference. All namespace references have
+     * been released at this point. The NULL check is needed to handle
+     * allocation failure in nvme_alloc_admin_tag_set().
+     */
+    if (ctrl->admin_q)
+        blk_put_queue(ctrl->admin_q);
+
        if (!subsys || ctrl->instance != subsys->instance)
            ida_free(&nvme_instance_ida, ctrl->instance);
        nvme_free_cels(ctrl);

-ck

""""""

-ck




* Re: [PATCH] nvme: fix admin request_queue lifetime
  2025-11-05 22:34   ` Keith Busch
@ 2025-11-06 19:33     ` Ewan Milne
  0 siblings, 0 replies; 12+ messages in thread
From: Ewan Milne @ 2025-11-06 19:33 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-nvme, hch, ming.lei, chaitanyak, Casey Chen

On Wed, Nov 5, 2025 at 5:35 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, Nov 05, 2025 at 04:31:13PM -0500, Ewan Milne wrote:
> > On Tue, Nov 4, 2025 at 6:00 PM Keith Busch <kbusch@meta.com> wrote:
> > >
> > > From: Keith Busch <kbusch@kernel.org>
> > >
> > > The namespaces can access the controller's admin request_queue, and
> > > stale references to the namespaces may exist. Keep the request_queue
> > > alive by moving its final 'put' to after all controller references
> > > have been released, so that no one can still be accessing the
> > > request_queue. This fixes a reported use-after-free bug:
> > >
> >
> > OK, so I get that this fixes the use-after-free, and don't let my
> > comments hold up acceptance of the patch.  But can you explain why
> > this actually helps?  nvme_alloc_admin_tag_set() allocates the admin_q
> > as part of the admin tagset initialization, and doesn't this change
> > keep the admin_q alive past the point where the admin tagset is
> > deallocated?  So where do we detect that?
>
> We still call blk_mq_destroy_queue() prior to calling
> blk_mq_free_tag_set(). The queue has exited the tagset and is marked
> dying; no one can "enter" the queue after that, so the tagset can be
> safely freed even if people are still holding references on the dying
> queue.
>

OK, thanks.  I think it wise to include Chaitanya's null pointer check also.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>



