From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Shivam Kumar <kumar.shivam43666@gmail.com>,
Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
Sasha Levin <sashal@kernel.org>,
sagi@grimberg.me, kch@nvidia.com, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path
Date: Tue, 5 May 2026 05:51:18 -0400 [thread overview]
Message-ID: <20260505095149.512052-2-sashal@kernel.org> (raw)
In-Reply-To: <20260505095149.512052-1-sashal@kernel.org>
From: Shivam Kumar <kumar.shivam43666@gmail.com>
[ Upstream commit 4606467a75cfc16721937272ed29462a750b60c8 ]
In nvmet_tcp_try_recv_ddgst(), when a data digest mismatch is detected,
nvmet_req_uninit() is called unconditionally. However, if the command
arrived via the nvmet_tcp_handle_req_failure() path, nvmet_req_init()
had returned false and percpu_ref_tryget_live() was never executed. The
unconditional percpu_ref_put() inside nvmet_req_uninit() then causes a
refcount underflow, leading to a WARNING in
percpu_ref_switch_to_atomic_rcu, a use-after-free diagnostic, and
eventually a permanent workqueue deadlock.
Check cmd->flags & NVMET_TCP_F_INIT_FAILED before calling
nvmet_req_uninit(), matching the existing pattern in
nvmet_tcp_execute_request().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Phase Walkthrough
Phase 1 Record:
Subject subsystem/action is `nvmet-tcp`, action `check`, intent: avoid
calling `nvmet_req_uninit()` after failed request initialization in the
data-digest error path. Tags found: `Reviewed-by: Christoph Hellwig
<hch@lst.de>`, `Signed-off-by: Shivam Kumar`, `Signed-off-by: Keith
Busch`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable` tag in the committed version I verified. The body describes a
real refcount bug: `nvmet_req_init()` can fail before
`percpu_ref_tryget_live()`, while `nvmet_req_uninit()` always does
`percpu_ref_put()`. Hidden bug fix: yes, it prevents refcount underflow,
UAF diagnostics, and a workqueue deadlock.
Phase 2 Record:
One file changed: `drivers/nvme/target/tcp.c`, 2 insertions and 1
deletion, only in `nvmet_tcp_try_recv_ddgst()`. Before: a data digest
mismatch always called `nvmet_req_uninit(&cmd->req)`. After: it skips
that call when `cmd->flags & NVMET_TCP_F_INIT_FAILED`. Bug category:
reference-counting bug with severe follow-on failure. Fix quality:
small, surgical, matches the existing `nvmet_tcp_execute_request()`
handling of `NVMET_TCP_F_INIT_FAILED`; regression risk is very low
because it only suppresses an invalid put on a command whose init
failed.
Phase 3 Record:
`git blame` showed the current direct `nvmet_req_uninit()` line came
from `0700542a823b` (`nvmet-tcp: remove nvmet_tcp_finish_cmd`), but
older stable trees had the same underlying behavior through
`nvmet_tcp_finish_cmd()`. `git describe --contains` places the original
nvmet-tcp code at `v5.0-rc1` and the direct-inline refactor at
`v6.1-rc1`. The init-failure flag and execute-path guard were already
present in stable versions I checked. Recent history contains other
nvmet-tcp fixes, but `git apply --check` confirmed this candidate
applies to the current 7.0.y tree without prerequisites. Author has at
least one other nvmet target fix in the local history; review/commit
path is stronger signal here than author history.
Phase 4 Record:
`b4 dig -c 4606467a75cfc` found the original lore submission at `https:/
/patch.msgid.link/20260318225658.1760759-1-kumar.shivam43666@gmail.com`.
`b4 dig -a` showed only v1, no superseding revision. `b4 dig -w` showed
recipients included `gregkh`, `security@kernel.org`, Christoph Hellwig,
Sagi Grimberg, Keith Busch, and `linux-nvme`. The infradead archive copy
matched the diff exactly. Christoph replied “Looks good to me” and gave
`Reviewed-by`, while noting he would prefer someone more familiar with
TCP code also look. I found no NAK. A stable lore query via `WebFetch`
was blocked by Anubis; web search did not reveal separate stable-
specific discussion.
Phase 5 Record:
Modified function: `nvmet_tcp_try_recv_ddgst()`. Call chain verified
locally: socket callbacks queue `nvmet_tcp_io_work()`, which calls
`nvmet_tcp_try_recv()`, then `nvmet_tcp_try_recv_one()`, then
`nvmet_tcp_try_recv_ddgst()` when `rcv_state == NVMET_TCP_RECV_DDGST`.
The failing path is reachable by NVMe/TCP target receive processing when
data digest is enabled and a write command with failed
`nvmet_req_init()` still has inline data to drain. Callees verified:
`nvmet_req_uninit()` unconditionally calls
`percpu_ref_put(&req->sq->ref)`, and `nvmet_req_init()` only takes that
ref after earlier validation and parsing succeeds.
Phase 6 Record:
Stable-code checks show the vulnerable pattern exists in `v5.4`,
`v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and `v7.0`,
though `v5.x` uses `nvmet_tcp_finish_cmd()` so those trees need a small
backport adjustment. For `v6.1+`, the same flag, execute-path guard, and
unconditional digest-error uninit pattern are present. Current 7.0.y
accepts the candidate patch with `git apply --check`.
Phase 7 Record:
Subsystem is NVMe target over TCP: storage/network transport driver,
important for systems exporting NVMe-oF targets. Activity level is
active; recent local `block-next` history includes several nvmet-tcp
fixes around the same file.
Phase 8 Record:
Affected users are systems running `nvmet-tcp` targets with data digest
enabled. Trigger requires a command init failure path plus later data
digest mismatch while stale write data is being drained, so not
universal, but reachable through remote NVMe/TCP protocol traffic.
Failure mode is high to critical: verified commit text and code
mechanism support refcount underflow from unmatched `percpu_ref_put()`,
with reported consequences of warning, UAF diagnostic, and permanent
workqueue deadlock. Benefit is high for affected targets; risk is very
low due to a two-line guard on an error path.
Phase 9 Record:
Evidence for backporting: real refcounting bug, severe hang/deadlock
failure mode, small fix, reviewed by Christoph Hellwig, applies cleanly
to 7.0.y, and the vulnerable pattern exists across active stable lines.
Evidence against: no `Fixes:`/`Cc: stable`/`Reported-by`, no `Tested-
by`, local `master` did not contain the commit while `block-next` did,
and `v5.x` needs an adjusted backport around `nvmet_tcp_finish_cmd()`.
Stable checklist: obviously correct yes; fixes a real bug yes; important
issue yes; small and contained yes; no new feature/API yes; applies
cleanly to 7.0.y and likely minor/manual adjustment for older stable
trees.
## Problem And Stable Value
The commit prevents `nvmet_req_uninit()` from dropping a request
reference that was never acquired. The code verifies that
`nvmet_req_init()` only does `percpu_ref_tryget_live()` after
validation/parsing succeeds, while `nvmet_req_uninit()` always calls
`percpu_ref_put()`. On the `NVMET_TCP_F_INIT_FAILED` path, calling
uninit is therefore invalid.
This matters for stable because the failure is not cosmetic: the
described and mechanically verified outcome is refcount underflow, with
potential UAF diagnostics and permanent workqueue deadlock in an
NVMe/TCP target. The change is as small as this kind of fix gets and
mirrors an existing guard in the same driver.
## Risk / Benefit
Benefit is high for affected NVMe/TCP target deployments, especially
because a transport-level bad digest/error path should not be able to
wedge the target workqueue. Risk is very low: it changes only an error
path and only skips cleanup that is invalid when init failed. Buffer
cleanup and fatal error handling still run.
## Verification
- Phase 1: `git show 4606467a75cfc` verified commit message, tags,
author, committer, and exact diff.
- Phase 2: Diff analysis verified only `drivers/nvme/target/tcp.c`
changes, 2 insertions/1 deletion in `nvmet_tcp_try_recv_ddgst()`.
- Phase 3: `git blame` verified the direct uninit line came from
`0700542a823b`; `git show 0700542a823b` verified it was a helper
removal preserving the same uninit/free behavior.
- Phase 3: `git describe --contains` verified relevant code ancestry:
original nvmet-tcp code in `v5.0-rc1`, direct inline refactor in
`v6.1-rc1`.
- Phase 4: `b4 dig -c 4606467a75cfc`, `-a`, and `-w` verified lore
match, single v1 revision, and original recipients.
- Phase 4: WebFetch of infradead patch and reply verified patch contents
and Christoph Hellwig’s Reviewed-by plus caveat.
- Phase 5: `rg` and file reads verified the receive call chain and
`nvmet_req_init()`/`nvmet_req_uninit()` reference behavior.
- Phase 6: Stable tag checks verified vulnerable patterns in `v5.4`,
`v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and
`v7.0`.
- Phase 6: `git apply --check` verified the candidate patch applies to
the current 7.0.y tree.
- Unverified: I could not verify a separate stable-mailing-list
discussion because the lore stable query was blocked by Anubis.
**YES**
drivers/nvme/target/tcp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index acc71a26733f9..a456dd2fd8bd1 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1310,7 +1310,8 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
queue->idx, cmd->req.cmd->common.command_id,
queue->pdu.cmd.hdr.type, le32_to_cpu(cmd->recv_ddgst),
le32_to_cpu(cmd->exp_ddgst));
- nvmet_req_uninit(&cmd->req);
+ if (!(cmd->flags & NVMET_TCP_F_INIT_FAILED))
+ nvmet_req_uninit(&cmd->req);
nvmet_tcp_free_cmd_buffers(cmd);
nvmet_tcp_fatal_error(queue);
ret = -EPROTO;
--
2.53.0
next prev parent reply other threads:[~2026-05-05 9:51 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
2026-05-05 9:51 ` Sasha Levin [this message]
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure Sasha Levin
2026-05-05 9:51 ` [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260505095149.512052-2-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=hch@lst.de \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=kumar.shivam43666@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=patches@lists.linux.dev \
--cc=sagi@grimberg.me \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox