* [PATCH] nvme: reject completions for requests that are not in flight
From: Chao Shi @ 2026-05-22 15:30 UTC
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, linux-kernel, Chao Shi, Sungwoo Kim, Dave Tian,
Weidong Zhu
nvme_find_rq() resolves a device-supplied command id to a request with
blk_mq_tag_to_rq(), which returns whatever request last used that tag -
possibly one that is no longer in flight (freed, or never dispatched and
thus with a NULL rq->mq_hctx). Commit e7006de6c238 ("nvme: code
command_id with a genctr for use-after-free validation") guards against
this, but its generation counter is only 4 bits wide and can be matched
by a malfunctioning or malicious device replaying command ids. The
driver then completes a request that is not outstanding, dereferencing a
NULL rq->mq_hctx or double-completing a command:
  Oops: general protection fault ... KASAN: null-ptr-deref
  RIP: blk_mq_complete_request_remote+0xe5/0xa80 block/blk-mq.c:1319
   nvme_handle_cqe drivers/nvme/host/pci.c:1418 [inline]
   nvme_poll_cq drivers/nvme/host/pci.c:1449
   nvme_irq drivers/nvme/host/pci.c:1463
Require the request to be in flight before completing it. The check uses
the request state, so it also covers controllers with
NVME_QUIRK_SKIP_CID_GEN.
Found by FuzzNvme (Syzkaller with the FEMU fuzzing framework).
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
drivers/nvme/host/nvme.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9a5f28c5103c..3a525c1dc818 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -615,6 +615,17 @@ static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
 			tag);
 		return NULL;
 	}
+	/*
+	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
+	 * may no longer be in flight if the device reports a bogus command id.
+	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
+	 * command; the 4-bit genctr below only narrows the window.
+	 */
+	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
+		dev_err(nvme_req(rq)->ctrl->device,
+			"completion for request %#x not in flight\n", tag);
+		return NULL;
+	}
 	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
 		dev_err(nvme_req(rq)->ctrl->device,
 			"request %#x genctr mismatch (got %#x expected %#x)\n",
--
2.43.0
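
As a rough illustration of why a 4-bit generation counter only narrows the window, here is a minimal user-space sketch. It assumes the 16-bit command id carries the generation counter in its top 4 bits and the blk-mq tag in the low 12 bits; the helper names (cid_encode, cid_genctr, cid_tag, stale_cid_passes_genctr) are made up for this example and are not the driver's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: [15:12] generation counter, [11:0] blk-mq tag. */
#define GENCTR_MASK 0xfu
#define TAG_MASK    0xfffu

/* Hypothetical helper: pack a generation counter and a tag into a command id. */
static inline uint16_t cid_encode(uint8_t genctr, uint16_t tag)
{
	return (uint16_t)(((genctr & GENCTR_MASK) << 12) | (tag & TAG_MASK));
}

/* Hypothetical helper: extract the 4-bit generation counter. */
static inline uint8_t cid_genctr(uint16_t cid)
{
	return (uint8_t)((cid >> 12) & GENCTR_MASK);
}

/* Hypothetical helper: extract the 12-bit tag. */
static inline uint16_t cid_tag(uint16_t cid)
{
	return (uint16_t)(cid & TAG_MASK);
}

/*
 * Would a replayed command id still pass a generation-only check after the
 * same tag has been reissued 'reuses' times?  The counter is bumped once
 * per reissue and wraps at 16, so every 16th reuse matches again.
 */
static bool stale_cid_passes_genctr(uint16_t stale_cid, unsigned int reuses)
{
	uint8_t current = (uint8_t)((cid_genctr(stale_cid) + reuses) & GENCTR_MASK);

	return current == cid_genctr(stale_cid);
}
```

Under that assumed layout, command id 448 decodes to tag 0x1c0 with generation 0, and a replayed id is caught by the generation compare for 15 consecutive reuses of the tag but matches again on the 16th.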
* Re: [PATCH] nvme: reject completions for requests that are not in flight
From: Chao S @ 2026-05-25 20:27 UTC
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
Hi,
Since posting this I reproduced a more severe manifestation of the same
bug and confirmed the patch handles it; sharing as extra justification.
The commit message covers the freed / never-dispatched case (the NULL
rq->mq_hctx dereference). When the stale command id instead maps to a
tag that has already been *reused*, the driver completes an unrelated,
still-in-flight request -- a use-after-free. Under fuzzing (a device
that replays and reorders completions) this did not show up as a clean
NULL deref but as cross-subsystem memory corruption: general protection
faults in mtree_range_walk(), unlink_anon_vmas() and the slub freelist,
in unrelated tasks (modprobe, systemd-udevd, ...). The trigger was a
stale completion delivered for a request that a concurrent controller
reset had just freed.
To confirm the fix addresses this, I rebuilt the kernel with the patch
and re-ran the same workload for ~10h. The guard now rejects the
offending completion instead of acting on it:
    nvme nvme0: resetting controller
    nvme nvme0: completion for request 0x1c0 not in flight
    nvme nvme0: invalid id 448 completed on queue 2
and no use-after-free / corruption recurred over the run.
The code is unchanged; I'm happy to fold this into the commit message
as a v2 if you'd prefer it spelled out there.
Thanks,
Chao
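
To make the ordering of the two checks concrete, here is a small user-space model of the lookup path. It is only a sketch: the types (struct request, struct tag_set), the NR_TAGS size and the function find_rq are simplified stand-ins invented for this example, and the command id layout (4-bit generation over a 12-bit tag) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the request states; not the kernel types. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct request {
	enum rq_state state;
	uint8_t genctr;	/* bumped each time the request is issued */
};

#define NR_TAGS 8	/* arbitrary size for the model */

struct tag_set {
	struct request *rqs[NR_TAGS];	/* last request that used each tag */
};

/*
 * Model of the lookup: resolve a device-supplied command id, then reject it
 * unless the request is in flight AND the generation matches.
 */
static struct request *find_rq(struct tag_set *tags, uint16_t command_id)
{
	uint8_t genctr = (uint8_t)((command_id >> 12) & 0xf);
	uint16_t tag = (uint16_t)(command_id & 0xfff);
	struct request *rq;

	if (tag >= NR_TAGS)
		return NULL;
	rq = tags->rqs[tag];
	if (!rq)
		return NULL;
	/* State check: the tag table may still point at a retired request. */
	if (rq->state != RQ_IN_FLIGHT)
		return NULL;
	/* Generation check: catches a stale id for a reissued tag. */
	if ((rq->genctr & 0xf) != genctr)
		return NULL;
	return rq;
}
```

In this model a completion for a request that has gone back to idle is rejected by the state check regardless of the generation bits. A stale id whose tag has since been handed to a request that is in flight again passes the state check, so in that case only the generation compare can reject it.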
* Re: [PATCH] nvme: reject completions for requests that are not in flight
From: Christoph Hellwig @ 2026-05-27 14:19 UTC
To: Chao Shi
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
On Fri, May 22, 2026 at 11:30:34AM -0400, Chao Shi wrote:
> nvme_find_rq() resolves a device-supplied command id to a request with
> blk_mq_tag_to_rq(), which returns whatever request last used that tag -
> possibly one that is no longer in flight (freed, or never dispatched and
> thus with a NULL rq->mq_hctx). Commit e7006de6c238 ("nvme: code
> command_id with a genctr for use-after-free validation") guards against
> this, but its generation counter is only 4 bits wide and can be matched
> by a malfunctioning or malicious device replaying command ids. The
> driver then completes a request that is not outstanding, dereferencing a
> NULL rq->mq_hctx or double-completing a command:
I don't think an intentionally malicious device is part of the threat
model here. This was added to protect against buggy devices.
> +	/*
> +	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
> +	 * may no longer be in flight if the device reports a bogus command id.
> +	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
> +	 * command; the 4-bit genctr below only narrows the window.
> +	 */
> +	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
> +		dev_err(nvme_req(rq)->ctrl->device,
> +			"completion for request %#x not in flight\n", tag);
> +		return NULL;
> +	}
Although this check looks cheap enough that it should not hurt to add
it. So I think this should be ok, but maybe respin with your planned
commit message update.
* Re: [PATCH] nvme: reject completions for requests that are not in flight
From: Keith Busch @ 2026-05-27 15:00 UTC
To: Chao Shi
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
On Fri, May 22, 2026 at 11:30:34AM -0400, Chao Shi wrote:
> Oops: general protection fault ... KASAN: null-ptr-deref
> RIP: blk_mq_complete_request_remote+0xe5/0xa80 block/blk-mq.c:1319
> nvme_handle_cqe drivers/nvme/host/pci.c:1418 [inline]
> nvme_poll_cq drivers/nvme/host/pci.c:1449
> nvme_irq drivers/nvme/host/pci.c:1463
This scenario doesn't sound specific to nvme. Should blk-mq completion
check for the IN_FLIGHT state instead?
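
One way to picture a generic guard of this kind is a completion helper that only proceeds when it can move the request from in-flight to complete. The sketch below is a user-space model using C11 atomics; complete_if_in_flight and the types around it are hypothetical names for illustration and do not correspond to an existing blk-mq interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-ins for the request states; not the kernel types. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

struct request {
	_Atomic int state;
};

/*
 * Hypothetical helper: claim the completion only if the request is
 * currently in flight.  The compare-and-swap lets exactly one caller win,
 * so a duplicate or stale completion sees 'false' and must not touch the
 * request any further.
 */
static bool complete_if_in_flight(struct request *rq)
{
	int expected = RQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      RQ_COMPLETE);
}
```

In this model the first completion of an in-flight request succeeds, while a second completion, or one for an idle request, is refused. Whether an atomic transition like this would be acceptable on the real completion fast path is a separate question that the sketch does not try to answer.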
* Re: [PATCH] nvme: reject completions for requests that are not in flight
From: Jens Axboe @ 2026-05-27 15:02 UTC
To: Christoph Hellwig, Chao Shi
Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-kernel, Sungwoo Kim,
Dave Tian, Weidong Zhu
On 5/27/26 8:19 AM, Christoph Hellwig wrote:
> On Fri, May 22, 2026 at 11:30:34AM -0400, Chao Shi wrote:
>> nvme_find_rq() resolves a device-supplied command id to a request with
>> blk_mq_tag_to_rq(), which returns whatever request last used that tag -
>> possibly one that is no longer in flight (freed, or never dispatched and
>> thus with a NULL rq->mq_hctx). Commit e7006de6c238 ("nvme: code
>> command_id with a genctr for use-after-free validation") guards against
>> this, but its generation counter is only 4 bits wide and can be matched
>> by a malfunctioning or malicious device replaying command ids. The
>> driver then completes a request that is not outstanding, dereferencing a
>> NULL rq->mq_hctx or double-completing a command:
>
> I don't think an intentionally malicious device is part of the threat
> model here. This was added to protect against buggy devices.
Malicious devices are explicitly NOT part of the Linux threat model. If
this is a real device, I'd say go talk to whoever made it and get the
firmware fixed. If this is a "hardening" effort to protect against the
threat of malicious devices, then I don't think we should bother.
>> +	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
>> +	 * may no longer be in flight if the device reports a bogus command id.
>> +	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
>> +	 * command; the 4-bit genctr below only narrows the window.
>> +	 */
>> +	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
>> +		dev_err(nvme_req(rq)->ctrl->device,
>> +			"completion for request %#x not in flight\n", tag);
>> +		return NULL;
>> +	}
>
> Although this check looks cheap enough that it should not hurt to add
> it. So I think this should be ok, but maybe respin with your planned
> commit message update.
Only for the right reasons, imho.
--
Jens Axboe