Linux-NVME Archive on lore.kernel.org
* [PATCH] nvme: reject completions for requests that are not in flight
@ 2026-05-22 15:30 Chao Shi
  2026-05-25 20:27 ` Chao S
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Chao Shi @ 2026-05-22 15:30 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: linux-nvme, linux-kernel, Chao Shi, Sungwoo Kim, Dave Tian,
	Weidong Zhu

nvme_find_rq() resolves a device-supplied command id to a request with
blk_mq_tag_to_rq(), which returns whatever request last used that tag -
possibly one that is no longer in flight (freed, or never dispatched and
thus with a NULL rq->mq_hctx).  Commit e7006de6c238 ("nvme: code
command_id with a genctr for use-after-free validation") guards against
this, but its generation counter is only 4 bits wide and can be matched
by a malfunctioning or malicious device replaying command ids.  The
driver then completes a request that is not outstanding, dereferencing a
NULL rq->mq_hctx or double-completing a command:

  Oops: general protection fault ... KASAN: null-ptr-deref
  RIP: blk_mq_complete_request_remote+0xe5/0xa80 block/blk-mq.c:1319
   nvme_handle_cqe drivers/nvme/host/pci.c:1418 [inline]
   nvme_poll_cq drivers/nvme/host/pci.c:1449
   nvme_irq drivers/nvme/host/pci.c:1463

Require the request to be in flight before completing it.  The check uses
the request state, so it also covers controllers with
NVME_QUIRK_SKIP_CID_GEN.
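
To illustrate why a 4-bit generation counter only narrows the window,
here is a minimal userspace sketch of the command id layout: genctr in
the top four bits, tag in the low twelve.  The helper names
(cid_encode, cid_tag, cid_genctr, reuses_until_match) are hypothetical,
and the sketch assumes genctr is bumped once per reuse of a tag; it is
a model, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4-bit genctr above a 12-bit tag (illustrative model). */
#define CID_TAG_BITS 12u
#define CID_TAG_MASK 0x0fffu
#define CID_GEN_MASK 0xfu

/* Pack a generation counter and a tag into a 16-bit command id. */
static uint16_t cid_encode(uint8_t genctr, uint16_t tag)
{
	assert(tag <= CID_TAG_MASK);
	return (uint16_t)(((genctr & CID_GEN_MASK) << CID_TAG_BITS) | tag);
}

/* Extract the tag from a command id. */
static uint16_t cid_tag(uint16_t cid)
{
	return cid & CID_TAG_MASK;
}

/* Extract the 4-bit generation counter from a command id. */
static uint8_t cid_genctr(uint16_t cid)
{
	return (uint8_t)((cid >> CID_TAG_BITS) & CID_GEN_MASK);
}

/*
 * Count how many reuses of the same tag it takes until a stale command
 * id matches again, assuming genctr is incremented once per reuse.
 */
static unsigned int reuses_until_match(uint16_t stale_cid)
{
	uint8_t gen = cid_genctr(stale_cid);
	uint16_t tag = cid_tag(stale_cid);
	unsigned int n = 0;

	do {
		gen++;
		n++;
	} while (cid_encode(gen, tag) != stale_cid);

	return n;
}
```

Under these assumptions reuses_until_match() returns 16 for any command
id, so a command id replayed after sixteen reuses of its tag passes the
genctr comparison.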

Found by FuzzNvme (Syzkaller with the FEMU fuzzing framework).

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
 drivers/nvme/host/nvme.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9a5f28c5103c..3a525c1dc818 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -615,6 +615,17 @@ static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
 			tag);
 		return NULL;
 	}
+	/*
+	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
+	 * may no longer be in flight if the device reports a bogus command id.
+	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
+	 * command; the 4-bit genctr below only narrows the window.
+	 */
+	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
+		dev_err(nvme_req(rq)->ctrl->device,
+			"completion for request %#x not in flight\n", tag);
+		return NULL;
+	}
 	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
 		dev_err(nvme_req(rq)->ctrl->device,
 			"request %#x genctr mismatch (got %#x expected %#x)\n",
-- 
2.43.0




* Re: [PATCH] nvme: reject completions for requests that are not in flight
  2026-05-22 15:30 [PATCH] nvme: reject completions for requests that are not in flight Chao Shi
@ 2026-05-25 20:27 ` Chao S
  2026-05-27 14:19 ` Christoph Hellwig
  2026-05-27 15:00 ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Chao S @ 2026-05-25 20:27 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: linux-nvme, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu

Hi,

Since posting this I reproduced a more severe manifestation of the same
bug and confirmed the patch handles it; sharing as extra justification.

The commit message covers the freed / never-dispatched case (the NULL
rq->mq_hctx dereference).  When the stale command id instead maps to a
tag that has already been *reused*, the driver completes an unrelated,
still-in-flight request -- a use-after-free.  Under fuzzing (a device
that replays and reorders completions) this did not show up as a clean
NULL deref but as cross-subsystem memory corruption: general protection
faults in mtree_range_walk(), unlink_anon_vmas() and the slub freelist,
in unrelated tasks (modprobe, systemd-udevd, ...).  The trigger was a
stale completion delivered for a request that a concurrent controller
reset had just freed.

To confirm the fix addresses this, I rebuilt the kernel with the patch
and re-ran the same workload for ~10h.  The guard now rejects the
offending completion instead of acting on it:

  nvme nvme0: resetting controller
  nvme nvme0: completion for request 0x1c0 not in flight
  nvme nvme0: invalid id 448 completed on queue 2

and no use-after-free / corruption recurred over the run.
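
The order of the two checks can be sketched with a small userspace
model.  Everything in it is hypothetical (struct model_rq, the RQ_*
states loosely named after MQ_RQ_*, and model_find_rq); it only mirrors
the shape of the lookup: resolve the tag, require the in-flight state,
then compare the generation counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request states, loosely named after MQ_RQ_*. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

/* Hypothetical per-tag request slot. */
struct model_rq {
	enum rq_state state;
	uint8_t genctr;
};

/*
 * Resolve a device-supplied command id to a request slot.
 * Returns NULL if the tag is out of range, the slot is not in flight,
 * or the 4-bit generation counter does not match.
 */
static struct model_rq *model_find_rq(struct model_rq *rqs, size_t nr_tags,
				      uint16_t cid)
{
	uint16_t tag = cid & 0x0fffu;
	uint8_t genctr = (uint8_t)((cid >> 12) & 0xfu);
	struct model_rq *rq;

	assert(rqs != NULL);
	if (tag >= nr_tags)
		return NULL;

	rq = &rqs[tag];
	/* State check first: a slot that is not outstanding is never returned. */
	if (rq->state != RQ_IN_FLIGHT)
		return NULL;
	/* Generation check second: catches an in-flight slot from another cycle. */
	if ((rq->genctr & 0xfu) != genctr)
		return NULL;

	return rq;
}
```

In this model, command id 0x1c0 (448) aimed at an idle slot yields
NULL, which corresponds to the "not in flight" message followed by the
caller's "invalid id 448" message in the log above.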

The code is unchanged; I'm happy to fold this into the commit message
as a v2 if you'd prefer it spelled out there.

Thanks,
Chao




* Re: [PATCH] nvme: reject completions for requests that are not in flight
  2026-05-22 15:30 [PATCH] nvme: reject completions for requests that are not in flight Chao Shi
  2026-05-25 20:27 ` Chao S
@ 2026-05-27 14:19 ` Christoph Hellwig
  2026-05-27 15:02   ` Jens Axboe
  2026-05-27 15:00 ` Keith Busch
  2 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2026-05-27 14:19 UTC (permalink / raw)
  To: Chao Shi
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu

On Fri, May 22, 2026 at 11:30:34AM -0400, Chao Shi wrote:
> nvme_find_rq() resolves a device-supplied command id to a request with
> blk_mq_tag_to_rq(), which returns whatever request last used that tag -
> possibly one that is no longer in flight (freed, or never dispatched and
> thus with a NULL rq->mq_hctx).  Commit e7006de6c238 ("nvme: code
> command_id with a genctr for use-after-free validation") guards against
> this, but its generation counter is only 4 bits wide and can be matched
> by a malfunctioning or malicious device replaying command ids.  The
> driver then completes a request that is not outstanding, dereferencing a
> NULL rq->mq_hctx or double-completing a command:

I don't think an intentionally malicious device is part of the threat
model here.  This was added to protect against buggy devices.

> +	/*
> +	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
> +	 * may no longer be in flight if the device reports a bogus command id.
> +	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
> +	 * command; the 4-bit genctr below only narrows the window.
> +	 */
> +	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
> +		dev_err(nvme_req(rq)->ctrl->device,
> +			"completion for request %#x not in flight\n", tag);
> +		return NULL;
> +	}

Although this check looks cheap enough that it should not hurt to add
it.  So I think this should be ok, but maybe respin with your planned
commit message update.




* Re: [PATCH] nvme: reject completions for requests that are not in flight
  2026-05-22 15:30 [PATCH] nvme: reject completions for requests that are not in flight Chao Shi
  2026-05-25 20:27 ` Chao S
  2026-05-27 14:19 ` Christoph Hellwig
@ 2026-05-27 15:00 ` Keith Busch
  2 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2026-05-27 15:00 UTC (permalink / raw)
  To: Chao Shi
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu

On Fri, May 22, 2026 at 11:30:34AM -0400, Chao Shi wrote:
>   Oops: general protection fault ... KASAN: null-ptr-deref
>   RIP: blk_mq_complete_request_remote+0xe5/0xa80 block/blk-mq.c:1319
>    nvme_handle_cqe drivers/nvme/host/pci.c:1418 [inline]
>    nvme_poll_cq drivers/nvme/host/pci.c:1449
>    nvme_irq drivers/nvme/host/pci.c:1463

This scenario doesn't sound specific to nvme. Should blk-mq completion
check for the IN_FLIGHT state instead?



* Re: [PATCH] nvme: reject completions for requests that are not in flight
  2026-05-27 14:19 ` Christoph Hellwig
@ 2026-05-27 15:02   ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2026-05-27 15:02 UTC (permalink / raw)
  To: Christoph Hellwig, Chao Shi
  Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-kernel, Sungwoo Kim,
	Dave Tian, Weidong Zhu

On 5/27/26 8:19 AM, Christoph Hellwig wrote:
> On Fri, May 22, 2026 at 11:30:34AM -0400, Chao Shi wrote:
>> nvme_find_rq() resolves a device-supplied command id to a request with
>> blk_mq_tag_to_rq(), which returns whatever request last used that tag -
>> possibly one that is no longer in flight (freed, or never dispatched and
>> thus with a NULL rq->mq_hctx).  Commit e7006de6c238 ("nvme: code
>> command_id with a genctr for use-after-free validation") guards against
>> this, but its generation counter is only 4 bits wide and can be matched
>> by a malfunctioning or malicious device replaying command ids.  The
>> driver then completes a request that is not outstanding, dereferencing a
>> NULL rq->mq_hctx or double-completing a command:
> 
> I don't think an intentionally malicious device is part of the threat
> model here.  This was added to protect against buggy devices.

Malicious devices are explicitly NOT part of the Linux threat model. If
this is a real device, I'd say go talk to whoever made it and get the
firmware fixed. If this is a "hardening" effort to protect against the
threat of malicious devices, then I don't think we should bother.

>> +	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
>> +	 * may no longer be in flight if the device reports a bogus command id.
>> +	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
>> +	 * command; the 4-bit genctr below only narrows the window.
>> +	 */
>> +	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
>> +		dev_err(nvme_req(rq)->ctrl->device,
>> +			"completion for request %#x not in flight\n", tag);
>> +		return NULL;
>> +	}
> 
> Although this check looks cheap enough that it should not hurt to add
> it.  So I think this should be ok, but maybe respin with your planned
> commit message update.

Only for the right reasons, imho.

-- 
Jens Axboe

