Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path
       [not found] <20260505095149.512052-1-sashal@kernel.org>
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Shivam Kumar, Christoph Hellwig, Keith Busch, Sasha Levin, sagi,
	kch, linux-nvme, linux-kernel

From: Shivam Kumar <kumar.shivam43666@gmail.com>

[ Upstream commit 4606467a75cfc16721937272ed29462a750b60c8 ]

In nvmet_tcp_try_recv_ddgst(), when a data digest mismatch is detected,
nvmet_req_uninit() is called unconditionally. However, if the command
arrived via the nvmet_tcp_handle_req_failure() path, nvmet_req_init()
had returned false and percpu_ref_tryget_live() was never executed. The
unconditional percpu_ref_put() inside nvmet_req_uninit() then causes a
refcount underflow, leading to a WARNING in
percpu_ref_switch_to_atomic_rcu, a use-after-free diagnostic, and
eventually a permanent workqueue deadlock.

Check cmd->flags & NVMET_TCP_F_INIT_FAILED before calling
nvmet_req_uninit(), matching the existing pattern in
nvmet_tcp_execute_request().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Phase 1 Record:
Subject subsystem/action is `nvmet-tcp`, action `check`, intent: avoid
calling `nvmet_req_uninit()` after failed request initialization in the
data-digest error path. Tags found: `Reviewed-by: Christoph Hellwig
<hch@lst.de>`, `Signed-off-by: Shivam Kumar`, `Signed-off-by: Keith
Busch`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable` tag in the committed version I verified. The body describes a
real refcount bug: `nvmet_req_init()` can fail before
`percpu_ref_tryget_live()`, while `nvmet_req_uninit()` always does
`percpu_ref_put()`. Hidden bug fix: yes, it prevents refcount underflow,
UAF diagnostics, and a workqueue deadlock.

Phase 2 Record:
One file changed: `drivers/nvme/target/tcp.c`, 2 insertions and 1
deletion, only in `nvmet_tcp_try_recv_ddgst()`. Before: a data digest
mismatch always called `nvmet_req_uninit(&cmd->req)`. After: it skips
that call when `cmd->flags & NVMET_TCP_F_INIT_FAILED`. Bug category:
reference-counting bug with severe follow-on failure. Fix quality:
small, surgical, matches the existing `nvmet_tcp_execute_request()`
handling of `NVMET_TCP_F_INIT_FAILED`; regression risk is very low
because it only suppresses an invalid put on a command whose init
failed.

Phase 3 Record:
`git blame` showed the current direct `nvmet_req_uninit()` line came
from `0700542a823b` (`nvmet-tcp: remove nvmet_tcp_finish_cmd`), but
older stable trees had the same underlying behavior through
`nvmet_tcp_finish_cmd()`. `git describe --contains` places the original
nvmet-tcp code at `v5.0-rc1` and the direct-inline refactor at
`v6.1-rc1`. The init-failure flag and execute-path guard were already
present in stable versions I checked. Recent history contains other
nvmet-tcp fixes, but `git apply --check` confirmed this candidate
applies to the current 7.0.y tree without prerequisites. Author has at
least one other nvmet target fix in the local history; review/commit
path is stronger signal here than author history.

Phase 4 Record:
`b4 dig -c 4606467a75cfc` found the original lore submission at `https:/
/patch.msgid.link/20260318225658.1760759-1-kumar.shivam43666@gmail.com`.
`b4 dig -a` showed only v1, no superseding revision. `b4 dig -w` showed
recipients included `gregkh`, `security@kernel.org`, Christoph Hellwig,
Sagi Grimberg, Keith Busch, and `linux-nvme`. The infradead archive copy
matched the diff exactly. Christoph replied “Looks good to me” and gave
`Reviewed-by`, while noting he would prefer someone more familiar with
TCP code also look. I found no NAK. A stable lore query via `WebFetch`
was blocked by Anubis; web search did not reveal separate stable-
specific discussion.

Phase 5 Record:
Modified function: `nvmet_tcp_try_recv_ddgst()`. Call chain verified
locally: socket callbacks queue `nvmet_tcp_io_work()`, which calls
`nvmet_tcp_try_recv()`, then `nvmet_tcp_try_recv_one()`, then
`nvmet_tcp_try_recv_ddgst()` when `rcv_state == NVMET_TCP_RECV_DDGST`.
The failing path is reachable by NVMe/TCP target receive processing when
data digest is enabled and a write command with failed
`nvmet_req_init()` still has inline data to drain. Callees verified:
`nvmet_req_uninit()` unconditionally calls
`percpu_ref_put(&req->sq->ref)`, and `nvmet_req_init()` only takes that
ref after earlier validation and parsing succeeds.

Phase 6 Record:
Stable-code checks show the vulnerable pattern exists in `v5.4`,
`v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and `v7.0`,
though `v5.x` uses `nvmet_tcp_finish_cmd()` so those trees need a small
backport adjustment. For `v6.1+`, the same flag, execute-path guard, and
unconditional digest-error uninit pattern are present. Current 7.0.y
accepts the candidate patch with `git apply --check`.

Phase 7 Record:
Subsystem is NVMe target over TCP: storage/network transport driver,
important for systems exporting NVMe-oF targets. Activity level is
active; recent local `block-next` history includes several nvmet-tcp
fixes around the same file.

Phase 8 Record:
Affected users are systems running `nvmet-tcp` targets with data digest
enabled. Trigger requires a command init failure path plus later data
digest mismatch while stale write data is being drained, so not
universal, but reachable through remote NVMe/TCP protocol traffic.
Failure mode is high to critical: verified commit text and code
mechanism support refcount underflow from unmatched `percpu_ref_put()`,
with reported consequences of warning, UAF diagnostic, and permanent
workqueue deadlock. Benefit is high for affected targets; risk is very
low due to a two-line guard on an error path.

Phase 9 Record:
Evidence for backporting: real refcounting bug, severe hang/deadlock
failure mode, small fix, reviewed by Christoph Hellwig, applies cleanly
to 7.0.y, and the vulnerable pattern exists across active stable lines.
Evidence against: no `Fixes:`/`Cc: stable`/`Reported-by`, no `Tested-
by`, local `master` did not contain the commit while `block-next` did,
and `v5.x` needs an adjusted backport around `nvmet_tcp_finish_cmd()`.
Stable checklist: obviously correct yes; fixes a real bug yes; important
issue yes; small and contained yes; no new feature/API yes; applies
cleanly to 7.0.y and likely minor/manual adjustment for older stable
trees.

## Problem And Stable Value
The commit prevents `nvmet_req_uninit()` from dropping a request
reference that was never acquired. The code verifies that
`nvmet_req_init()` only does `percpu_ref_tryget_live()` after
validation/parsing succeeds, while `nvmet_req_uninit()` always calls
`percpu_ref_put()`. On the `NVMET_TCP_F_INIT_FAILED` path, calling
uninit is therefore invalid.

This matters for stable because the failure is not cosmetic: the
described and mechanically verified outcome is refcount underflow, with
potential UAF diagnostics and permanent workqueue deadlock in an
NVMe/TCP target. The change is as small as this kind of fix gets and
mirrors an existing guard in the same driver.

## Risk / Benefit
Benefit is high for affected NVMe/TCP target deployments, especially
because a transport-level bad digest/error path should not be able to
wedge the target workqueue. Risk is very low: it changes only an error
path and only skips cleanup that is invalid when init failed. Buffer
cleanup and fatal error handling still run.

## Verification
- Phase 1: `git show 4606467a75cfc` verified commit message, tags,
  author, committer, and exact diff.
- Phase 2: Diff analysis verified only `drivers/nvme/target/tcp.c`
  changes, 2 insertions/1 deletion in `nvmet_tcp_try_recv_ddgst()`.
- Phase 3: `git blame` verified the direct uninit line came from
  `0700542a823b`; `git show 0700542a823b` verified it was a helper
  removal preserving the same uninit/free behavior.
- Phase 3: `git describe --contains` verified relevant code ancestry:
  original nvmet-tcp code in `v5.0-rc1`, direct inline refactor in
  `v6.1-rc1`.
- Phase 4: `b4 dig -c 4606467a75cfc`, `-a`, and `-w` verified lore
  match, single v1 revision, and original recipients.
- Phase 4: WebFetch of infradead patch and reply verified patch contents
  and Christoph Hellwig’s Reviewed-by plus caveat.
- Phase 5: `rg` and file reads verified the receive call chain and
  `nvmet_req_init()`/`nvmet_req_uninit()` reference behavior.
- Phase 6: Stable tag checks verified vulnerable patterns in `v5.4`,
  `v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and
  `v7.0`.
- Phase 6: `git apply --check` verified the candidate patch applies to
  the current 7.0.y tree.
- Unverified: I could not verify a separate stable-mailing-list
  discussion because the lore stable query was blocked by Anubis.

**YES**

 drivers/nvme/target/tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index acc71a26733f9..a456dd2fd8bd1 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1310,7 +1310,8 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 			queue->idx, cmd->req.cmd->common.command_id,
 			queue->pdu.cmd.hdr.type, le32_to_cpu(cmd->recv_ddgst),
 			le32_to_cpu(cmd->exp_ddgst));
-		nvmet_req_uninit(&cmd->req);
+		if (!(cmd->flags & NVMET_TCP_F_INIT_FAILED))
+			nvmet_req_uninit(&cmd->req);
 		nvmet_tcp_free_cmd_buffers(cmd);
 		nvmet_tcp_fatal_error(queue);
 		ret = -EPROTO;
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports
       [not found] <20260505095149.512052-1-sashal@kernel.org>
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Geliang Tang, Christoph Hellwig, Keith Busch, Sasha Levin,
	justin.tee, nareshgottumukkala83, paul.ely, sagi, linux-nvme,
	linux-kernel

From: Geliang Tang <tanggeliang@kylinos.cn>

[ Upstream commit 723277b15ed97185ce6f75abbf19f06e00f0a6f5 ]

The generic fabrics layer uses request_module("nvme-%s", opts->transport)
to auto-load transport modules. Currently, the nvme-tcp, nvme-rdma, and
nvme-fc modules lack MODULE_ALIAS entries for these names, which prevents
the kernel from automatically finding and loading them when requested.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Phase 1 Record: subsystem `nvme` / NVMe host fabrics; action verb `add
missing`; intent is to add module aliases for existing fabrics
transports. Tags: `Reviewed-by: Christoph Hellwig <hch@lst.de>`,
`Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>`, `Signed-off-by:
Keith Busch <kbusch@kernel.org>`. No `Fixes:`, `Reported-by:`, `Tested-
by:`, `Link:`, or `Cc: stable`. Body describes a real autoload bug:
`fabrics.c` calls `request_module("nvme-%s", opts->transport)`, but the
transport modules do not advertise aliases matching `nvme-tcp`, `nvme-
rdma`, or `nvme-fc`. This is a hidden bug fix despite using “add”.

Phase 2 Record: files changed are `drivers/nvme/host/fc.c` `+1`,
`drivers/nvme/host/rdma.c` `+1`, `drivers/nvme/host/tcp.c` `+1`; no
functions changed, only module metadata at EOF. Before: modules had
license/description metadata but no matching alias. After: each existing
module exports the alias that the existing autoload path requests. Bug
category is logic/module-autoload correctness. Fix quality is excellent:
three literal aliases, each matching the existing transport name and
module filename; regression risk is very low.

Phase 3 Record: `request_module("nvme-%s", opts->transport)` was
introduced by `d1f1071f81513` (`nvme-fabrics: request transport
module`), first in `v4.15`. RDMA host came in `7110230719602` (`v4.8`),
FC in `e399441de9115` (`v4.10`), TCP in `3f2304f8c6d6` (`v5.0`). No
`Fixes:` tag to follow. Recent file history shows normal NVMe churn, not
a prerequisite series. The author has prior NVMe host commits, and the
patch was reviewed/committed by listed NVMe maintainers. No dependency
beyond the existing files and `MODULE_ALIAS`.

Phase 4 Record: `b4 dig -c 723277b15ed97` found the original patch at `h
ttps://patch.msgid.link/e3076c80ee2e0c4a2cae0374afb617a3365946ea.1774944
415.git.tanggeliang@kylinos.cn`. `b4 dig -a` found only v1. `b4 dig -w`
showed NVMe maintainers and `linux-nvme` were included. Lore mirror
shows Christoph Hellwig replied “Looks good” with `Reviewed-by`, and
Keith Busch replied “applied to nvme-7.1”. No NAKs, no stable
nomination, and no separate bug report link found.

Phase 5 Record: modified “functions” are module metadata only. The
affected call path is verified as `nvmf_dev_write()` ->
`nvmf_create_ctrl()` -> `request_module("nvme-%s", opts->transport)` ->
`nvmf_lookup_transport()` -> transport `create_ctrl`. Transport
registration uses `.name = "fc"`, `.name = "rdma"`, `.name = "tcp"` and
module objects are `nvme-fc.o`, `nvme-rdma.o`, `nvme-tcp.o`, so the
alias strings exactly match the requested names. This path is reachable
from userspace via the `nvme-fabrics` misc device write path; I did not
verify whether unprivileged users can trigger it.

Phase 6 Record: stable refs checked. `stable/linux-4.19.y` has the
autoload request and FC/RDMA files, but no TCP file.
`stable/linux-5.4.y` through `stable/linux-7.0.y` have the autoload
request and all three transport files. None of those checked stable refs
had the `MODULE_ALIAS("nvme-*")` entries. The patch applies cleanly to
current `stable/linux-7.0.y`; `5.4` through `6.6` lack the newer
`MODULE_DESCRIPTION` context, so they need a trivial backport placing
aliases after `MODULE_LICENSE`. `4.14.y` lacks the verified autoload
request and has no TCP file, so benefit there is not established.

Phase 7 Record: subsystem is NVMe host fabrics under storage/block;
criticality is IMPORTANT because it affects NVMe-oF connectivity for
users of these transports. The subsystem is active, with recent NVMe
host changes verified in history.

Phase 8 Record: affected users are NVMe-oF users with FC/RDMA/TCP
transports built as modules and not already loaded. Trigger is creating
a fabrics controller for one of those transports while relying on kernel
module autoload. Failure mode is not a crash: `nvmf_lookup_transport()`
fails, logs `no handler found for transport %s`, and returns `-EINVAL`,
preventing the connection. Severity is MEDIUM user-visible functional
failure for storage connectivity. Benefit is high for affected
configurations; risk is very low because it only adds module alias
metadata for existing modules.

Phase 9 Record: evidence for backporting: real autoload bug, existing
request path, exact alias/name/module matches, important storage
subsystem, very small patch, maintainer review, no API or new feature.
Evidence against: no crash/security/data-corruption impact, no explicit
stable tag, no explicit Tested-by, and older stable trees need minor
context adjustment. Stable rules: obviously correct yes; tested tag no
but maintainer-reviewed; fixes a real user-visible bug yes; important
enough for affected NVMe-oF users yes; small and contained yes; no new
feature/API yes; can apply cleanly to `7.0.y` and trivially to older
applicable trees. Exception category: closest to module/device autoload
enablement for existing drivers, not a new driver.

## Verification
- Phase 1: inspected upstream commit `723277b15ed97`; verified subject,
  body, tags, and absence of `Fixes:`/`Reported-by:`/stable tags.
- Phase 2: inspected diff; verified exactly three added `MODULE_ALIAS`
  lines.
- Phase 3: used blame/log/tag checks; verified autoload request
  introduced in `v4.15`, RDMA in `v4.8`, FC in `v4.10`, TCP in `v5.0`.
- Phase 4: ran `b4 dig -c`, `-a`, `-w`; fetched lore mirror; verified
  v1-only patch, maintainer review, maintainer application, no NAKs.
- Phase 5: read `fabrics.c`, `Makefile`, `Kconfig`, and transport ops;
  verified call path, transport names, and module names.
- Phase 6: checked stable refs `4.14.y`, `4.19.y`, `5.4.y`, `5.10.y`,
  `5.15.y`, `6.1.y`, `6.6.y`, `6.12.y`, `6.19.y`, `7.0.y`; verified
  applicable code and missing aliases; `git apply --check` passed on
  current `7.0.y`. One intermediate shell formatting command failed due
  to a bad `sed` expression; I reran the stable metadata check correctly
  afterward.
- Phase 8: verified the failure path returns `-EINVAL` after
  `nvmf_lookup_transport()` fails; privilege requirements were not
  verified.

This should be backported to applicable stable trees. It fixes an
existing intended autoload mechanism for existing NVMe fabrics
transports with a tiny, low-risk metadata-only change.

**YES**

 drivers/nvme/host/fc.c   | 1 +
 drivers/nvme/host/rdma.c | 1 +
 drivers/nvme/host/tcp.c  | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e1bb4707183ca..e4f4528fe2a2d 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3968,3 +3968,4 @@ module_exit(nvme_fc_exit_module);
 
 MODULE_DESCRIPTION("NVMe host FC transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-fc");
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 57111139e84fa..1ec6e867aedb6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2432,3 +2432,4 @@ module_exit(nvme_rdma_cleanup_module);
 
 MODULE_DESCRIPTION("NVMe host RDMA transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-rdma");
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 243dab830dc84..02c95c32b07e3 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -3071,3 +3071,4 @@ module_exit(nvme_tcp_cleanup_module);
 
 MODULE_DESCRIPTION("NVMe host TCP transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-tcp");
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment
       [not found] <20260505095149.512052-1-sashal@kernel.org>
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Flavio Suligoi, Christoph Hellwig, Keith Busch, Sasha Levin, sagi,
	linux-nvme, linux-kernel

From: Flavio Suligoi <f.suligoi@asem.it>

[ Upstream commit e80e39f25567310c1c7392eed886890b5c6788ba ]

In the declaration of the structure "core_quirks[]", in the comment
referred to the devices "Kioxia CD6-V Series / HPE PE8030", the
parameter "default_ps_max_latency_us" is reported in a wrong way:

nvme_core.default_ps_max_latency=0

The correct form is, instead:

nvme_core.default_ps_max_latency_us=0

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `nvme-core`; action verb `fix`; claimed
intent is to correct the kernel module parameter name in a source
comment for the Kioxia CD6-V / HPE PE8030 NVMe quirk.

Step 1.2 Record: Tags present:
- `Reviewed-by: Christoph Hellwig <hch@lst.de>`
- `Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>`
- `Signed-off-by: Keith Busch <kbusch@kernel.org>`
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable@vger.kernel.org` tag was present in the commit inspected with
`git show`.

Step 1.3 Record: The commit body describes an incorrect parameter
spelling in a comment: `nvme_core.default_ps_max_latency=0`; the actual
module parameter is `nvme_core.default_ps_max_latency_us=0`. Symptom is
not a runtime kernel failure from the patch itself, but incorrect in-
source guidance for disabling APST. Version information was not stated.
Root cause is a missing `_us` suffix in the comment.

Step 1.4 Record: This is not a hidden runtime bug fix. It is an explicit
comment/documentation correction for a real module parameter.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `drivers/nvme/host/core.c`, 1
insertion and 1 deletion. No function body is modified; the changed
object is the `core_quirks[]` table comment. Scope classification:
single-file, comment-only surgical fix.

Step 2.2 Record: Before: the comment suggested booting with nonexistent
or incorrect `nvme_core.default_ps_max_latency=0`. After: it suggests
the verified existing parameter `nvme_core.default_ps_max_latency_us=0`.
This affects only source-code guidance, not execution.

Step 2.3 Record: Bug category is documentation/comment correctness. No
resource leak, race, refcount bug, memory safety bug, type bug, or
executable logic change is present.

Step 2.4 Record: Fix quality is obviously correct:
`module_param(default_ps_max_latency_us, ulong, 0644)` exists in
`drivers/nvme/host/core.c`. Regression risk is effectively zero because
only a comment changes.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the incorrect comment line was
introduced by `5a6254d55e2a9f` (`nvme-pci: add NO APST quirk for Kioxia
device`). `git tag --contains 5a6254d55e2a9f` shows it is present from
`v5.16` onward in mainline tags available locally, and stable branch
snapshots for 5.10 and 5.15 also contain the line.

Step 3.2 Record: No `Fixes:` tag exists, so there was no tagged
introducer to follow. I separately inspected `5a6254d55e2a9f` because
blame identified it as the source of the wrong comment.

Step 3.3 Record: Recent file history shows this patch is standalone.
`block-next` contains `e80e39f255673 nvme-core: fix parameter name in
comment`; nearby commits touch unrelated NVMe behavior.

Step 3.4 Record: `git log block-next --author='Flavio Suligoi' --
drivers/nvme/host` found only this NVMe host commit locally. The patch
was reviewed by Christoph Hellwig and committed by Keith Busch, both
verified from commit metadata and lore.

Step 3.5 Record: No dependencies found. The patch is a one-line comment
correction and `git apply --check` against the current stable checkout
succeeded.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c e80e39f255673` found the original thread:
`https://patch.msgid.link/20260408124522.2375297-1-f.suligoi@asem.it`.
`b4 dig -a` found only v1, so no later revision was missed. Lore mirror
showed Christoph Hellwig replied “Looks good” with `Reviewed-by`, and
Keith Busch replied “applied to nvme-7.1”. No NAKs or risk concerns were
found.

Step 4.2 Record: `b4 dig -w` showed recipients included Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, `linux-nvme`, and `linux-
kernel`, so the right subsystem maintainers/lists were included.

Step 4.3 Record: No `Reported-by` or bug-report `Link:` tag exists.
External search confirmed this is about correcting the APST parameter
spelling, not a separate crash report.

Step 4.4 Record: `b4 dig -a` showed this is a single-patch v1 series,
not a multi-patch dependency chain.

Step 4.5 Record: Direct `lore.kernel.org/stable` fetch was blocked by
Anubis, and web search did not find stable-specific discussion for this
exact patch. Stable-specific discussion remains unverified.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: No functions are modified. The changed text is inside
the static `core_quirks[]` data table comment.

Step 5.2 Record: Caller tracing is not applicable to the changed line
because it is non-executable. I verified `core_quirks[]` is used by the
quirk scan path in `drivers/nvme/host/core.c`, but the patch does not
alter table values or matching logic.

Step 5.3 Record: Callee tracing is not applicable because no executable
statement changed.

Step 5.4 Record: Runtime reachability is not applicable. The only
“reachability” is human/developer/user reading the source comment.

Step 5.5 Record: `rg` verified the actual parameter is declared as
`default_ps_max_latency_us`; the old comment text is the mismatching
pattern.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Step 6.1 Record: The incorrect comment exists in checked refs `v5.16`,
`v6.1`, `v6.6`, `v6.12`, `v6.19`, `v7.0`, and stable branch snapshots
`for-greg/5.10-201`, `for-greg/5.15-201`, `for-greg/6.1-201`, `for-
greg/6.6-201`, `for-greg/6.12-201`, `for-greg/6.19-200`, and `for-
greg/7.0-200`.

Step 6.2 Record: Expected backport difficulty is clean or trivial. `git
apply --check` and `git apply --check --3way` both succeeded on the
current checkout.

Step 6.3 Record: Local history did not show a different stable-side fix
for this exact comment. `git log stable/linux-7.0.y --grep='fix
parameter name in comment' -- drivers/nvme/host/core.c` returned no
match.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is NVMe host core under `drivers/nvme/host`.
Criticality level: important driver subsystem, but this patch’s actual
affected surface is source documentation only.

Step 7.2 Record: The NVMe host core file is actively maintained; recent
local history shows multiple NVMe core fixes and feature changes. This
specific patch was reviewed and applied through the NVMe/block path.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected population is users/developers reading this
source comment for the Kioxia CD6-V / HPE PE8030 APST workaround.
Runtime users are not directly affected by the patch.

Step 8.2 Record: Trigger condition is consulting the source comment and
using the wrong boot parameter. I verified the correct parameter exists;
I did not verify kernel behavior for an unknown wrong boot parameter in
this investigation. Unprivileged runtime triggering is not applicable.

Step 8.3 Record: Failure mode of the patch’s target issue is incorrect
guidance, not a kernel crash, data corruption, deadlock, or security
issue. Severity is LOW as a code issue, but the guidance relates to a
real NVMe APST workaround.

Step 8.4 Record: Benefit is modest but real: stable source carries the
correct module parameter spelling for affected hardware guidance. Risk
is near zero because the patch is comment-only. Risk-benefit ratio is
favorable under the stable exception for documentation/comment fixes.

## Phase 9: Final Synthesis
Step 9.1 Record:
Evidence for backporting:
- Corrects an objectively wrong parameter name.
- Correct parameter is verified in the code.
- Incorrect text exists across active stable branch snapshots.
- Patch is one-line, comment-only, and applies cleanly.
- Reviewed by Christoph Hellwig and applied by Keith Busch.
- Falls under the documentation/comment-fix exception with effectively
  no runtime regression risk.

Evidence against backporting:
- Does not fix executable kernel behavior.
- Does not address crash, corruption, deadlock, security, memory safety,
  or resource leak.
- No explicit stable nomination or stable-list discussion was verified.

Unresolved:
- Stable-list search was partly blocked by Anubis; no stable-specific
  discussion was found via web search.
- I did not verify the exact runtime handling of the wrong boot
  parameter, because the commit only changes a source comment.

Step 9.2 Record:
1. Obviously correct and tested? Yes for the text correction; the actual
   parameter declaration was verified.
2. Fixes a real bug that affects users? It fixes a real
   documentation/comment bug; user impact is indirect.
3. Important issue? Not important as a runtime kernel bug; LOW severity.
4. Small and contained? Yes, one comment line in one file.
5. No new features or APIs? Yes.
6. Can apply to stable trees? Yes on the checked-out stable tree; likely
   trivial elsewhere due the same comment text in checked stable refs.

Step 9.3 Record: Exception category applies: documentation/comment fix.
It corrects incorrect in-source documentation and has zero runtime
behavior change.

Step 9.4 Record: Despite not being a runtime fix, this is appropriate
for stable under the explicit documentation/comment-fix exception: it is
objectively correct, present in stable trees, useful for a real hardware
workaround, and essentially risk-free.

## Verification
- [Phase 1] `git show --format=fuller --stat --patch e80e39f255673`:
  verified subject, tags, author/committer, and one-line comment diff.
- [Phase 2] Diff inspection: verified only `drivers/nvme/host/core.c`
  changed, 1 insertion and 1 deletion, comment-only.
- [Phase 2] `rg default_ps_max_latency`: verified actual module
  parameter is `default_ps_max_latency_us`.
- [Phase 3] `git blame` around the quirk comment: verified wrong comment
  introduced by `5a6254d55e2a9f`.
- [Phase 3] `git show 5a6254d55e2a9f`: verified original Kioxia NO_APST
  quirk and wrong comment text.
- [Phase 3] `git tag --contains 5a6254d55e2a9f`: verified mainline
  availability from `v5.16` in local tags.
- [Phase 4] `b4 dig -c e80e39f255673`: found original patch submission.
- [Phase 4] `b4 dig -a`: verified single v1 patch.
- [Phase 4] `b4 dig -w`: verified NVMe maintainers and lists were
  included.
- [Phase 4] WebFetch of lore mirror: verified Reviewed-by from Christoph
  Hellwig and applied note from Keith Busch.
- [Phase 5] `rg core_quirks`: verified the changed line is in
  `core_quirks[]`; no executable behavior changed.
- [Phase 6] Ref checks with `git show <ref>:drivers/nvme/host/core.c |
  rg`: verified wrong comment exists in checked stable refs and
  corrected text exists in `block-next`.
- [Phase 6] `git apply --check` and `git apply --check --3way`: verified
  clean local application.
- [Phase 7] `git log block-next -- drivers/nvme/host/core.c`: verified
  active subsystem history and target commit in block-next.
- [Phase 8] Failure mode verified from diff: incorrect comment guidance
  only, no runtime code path changed.
- UNVERIFIED: stable mailing-list discussion, because direct lore stable
  search was blocked and web search found no exact stable discussion.
- UNVERIFIED: exact kernel warning/handling for the wrong boot
  parameter, because it is outside the changed code.

**YES**

 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 766e9cc4ffca5..ac5a7d9781f58 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3013,7 +3013,7 @@ static const struct nvme_core_quirk_entry core_quirks[] = {
 		 *
 		 * The device is left in a state where it is also not possible
 		 * to use "nvme set-feature" to disable APST, but booting with
-		 * nvme_core.default_ps_max_latency=0 works.
+		 * nvme_core.default_ps_max_latency_us=0 works.
 		 */
 		.vid = 0x1e0f,
 		.mn = "KCD6XVUL6T40",
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus )
       [not found] <20260505095149.512052-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Alan Cui, Keith Busch, Sasha Levin, sagi, linux-nvme,
	linux-kernel

From: Alan Cui <me@alancui.cc>

[ Upstream commit 7f991e3f9b8f044640bcb5fa8570350a68932843 ]

The firmware for Samsung 970 Evo Plus / PM981 / PM983 does not support SUBNQN.
Make quirks to suppress warnings.

# nvme id-ctrl /dev/nvme1n1
NVME Identify Controller:
vid       : 0x144d
ssvid     : 0x144d
sn        : ***
mn        : Samsung SSD 970 EVO Plus 500GB
fr        : 2B2QEXM7

mcdqpc    : 0
subnqn    :
ioccsz    : 0

Signed-off-by: Alan Cui <me@alancui.cc>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
1. Phase 1: Commit message forensics
  Record: subsystem `nvme`, action `add quirk`, intent is to add
  `NVME_QUIRK_IGNORE_DEV_SUBNQN` for Samsung PCI ID `144d:a808`. Tags
  present: `Signed-off-by: Alan Cui`, `Signed-off-by: Keith Busch`; no
  `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Link:`, or
  `Cc: stable`. Body says Samsung 970 EVO Plus / PM981 / PM983 firmware
  leaves `subnqn` empty; supplied `nvme id-ctrl` output confirms
  `vid=0x144d`, model `Samsung SSD 970 EVO Plus 500GB`, firmware
  `2B2QEXM7`, and empty `subnqn`. This is a hardware quirk, not a hidden
  memory/race/resource bug.

2. Phase 2: Diff analysis
  Record: one file, `drivers/nvme/host/pci.c`, 2 insertions. Modified
  object is the `nvme_id_table` PCI ID table. Before: Samsung
  `144d:a808` matched the generic NVMe PCI class entry and got no
  `IGNORE_DEV_SUBNQN` quirk. After: it matches a specific PCI ID entry
  and sets `NVME_QUIRK_IGNORE_DEV_SUBNQN`. In `nvme_init_subnqn()`, that
  quirk skips device-provided SUBNQN handling and suppresses the
  “missing or invalid SUBNQN field” warning while still generating the
  synthetic NQN. Fix quality is surgical and consistent with nearby
  quirks. Regression risk is low, with the main caveat that the quirk
  applies to all `144d:a808` devices.

3. Phase 3: Git history investigation
  Record: target commit is `7f991e3f9b8f0`. There is no `Fixes:` tag.
  `NVME_QUIRK_IGNORE_DEV_SUBNQN` was introduced by `6299358d198a0`,
  described as handling firmware that reports invalid/non-unique SUBNQN,
  first contained around `v5.0-rc2`. Existing `144d:a808` handling for a
  suspend quirk was introduced by `1fae37accfc587`, around `v5.6-rc3`,
  confirming the PCI ID is already known in NVMe PCI code. Recent
  history shows this commit is standalone, not part of a required
  series. Author history in this subsystem showed only this commit;
  Keith Busch committed it.

4. Phase 4: Mailing list and external research
  Record: `b4 dig -c 7f991e3f9b8f0` found the original submission at
  `https://patch.msgid.link/9600680.CDJkKcVGEf@alanarchdesktop`. `b4 dig
  -a` showed only v1. `b4 dig -w` showed recipients included `linux-
  nvme`, `linux-kernel`, and Keith Busch. The fetched mbox contained the
  same patch and no review replies or objections. Web research found an
  earlier 2021 linux-nvme patch proposing the same `144d:a808` quirk for
  Samsung 970 EVO Plus/SM981/PM981/PM983, plus Debian and Proxmox user
  reports of the same warning. No stable-specific discussion or
  rejection reason was found.

5. Phase 5: Code semantic analysis
  Record: changed data structure is `nvme_id_table`. PCI core uses that
  table through `nvme_driver.id_table`, then calls `nvme_probe()`.
  `nvme_probe()` calls `nvme_pci_alloc_dev()`, which initializes `quirks
  = id->driver_data`, then passes those quirks to `nvme_init_ctrl()`.
  Later identify flow calls `nvme_init_identify()`,
  `nvme_init_subsystem()`, and `nvme_init_subnqn()`. The affected path
  is normal PCI NVMe device probe at boot or hotplug, not a syscall-
  triggered path. Similar `IGNORE_DEV_SUBNQN` quirks already exist for
  Intel, ADATA, Samsung PM1725a, Lexar, Phison, and other devices.

6. Phase 6: Stable tree analysis
  Record: checked `v5.4`, `v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`,
  `v6.17`, `v6.18`, and `v6.19`. All have the generic NVMe PCI class
  match and the `NVME_QUIRK_IGNORE_DEV_SUBNQN` infrastructure; none had
  the specific `144d:a808` `IGNORE_DEV_SUBNQN` entry. The insertion
  context around Memblaze `0x1c5f:0x0540` and Samsung PM1725/PM1725a
  exists in all checked tags, so backport difficulty should be clean or
  trivial.

7. Phase 7: Subsystem and maintainer context
  Record: subsystem is NVMe PCI host driver under `drivers/nvme/host`,
  important storage hardware support. It affects users with Samsung
  `144d:a808` NVMe SSDs, not all users. The subsystem is actively
  maintained; recent history shows multiple NVMe fixes and quirk
  additions. Keith Busch, an NVMe maintainer, committed the patch.

8. Phase 8: Impact and risk assessment
  Record: affected users are Samsung 970 EVO Plus / PM981 / PM983 /
  related `144d:a808` NVMe users. Trigger is device probe, typically
  boot. Verified failure mode is a persistent kernel warning for
  missing/invalid SUBNQN; for the empty-SUBNQN case, code already falls
  back to a synthetic NQN, so I did not verify a crash, data corruption,
  or probe failure for this exact firmware. Severity is low-to-medium,
  but it is a real firmware compliance issue on real hardware. Benefit
  is modest but real: suppresses a misleading warning and applies the
  established firmware workaround. Risk is very low: two lines, device-
  specific, no API changes.

9. Phase 9: Final synthesis
  Evidence for backporting: hardware quirk for an existing driver; real
  user-visible firmware issue; exact device ID; tiny and contained;
  infrastructure exists across stable trees; maintainer accepted
  upstream; stable context appears present across checked LTS tags.
  Evidence against: the verified symptom for this exact commit is
  warning suppression rather than a crash/data-loss fix; no `Reported-
  by`, `Tested-by`, or review tags; broad PCI ID match could affect all
  `144d:a808` variants. Stable checklist: obviously correct yes; real
  bug yes, as firmware reports empty SUBNQN; important issue only weak
  under normal criteria, but it fits the stable exception for hardware
  quirks; small and contained yes; no new API or feature yes; expected
  to apply cleanly yes.

## Verification
- Phase 1: `git show --format=fuller --stat --patch 7f991e3f9b8f0`
  verified subject, body, tags, author/committer, and 2-line diff.
- Phase 2: Read `drivers/nvme/host/core.c`, `drivers/nvme/host/nvme.h`,
  and `drivers/nvme/host/pci.c`; verified quirk definition, PCI table
  use, and `nvme_init_subnqn()` behavior.
- Phase 3: `git blame` around the quirk table and `144d:a808` suspend
  handling; `git show` and `git describe --contains` for `6299358d198a0`
  and `1fae37accfc587`; `git log` on `drivers/nvme/host` for related
  commits.
- Phase 4: `b4 dig -c`, `-a`, `-w`, and mbox read verified original lore
  submission, single v1 revision, recipients, and lack of visible review
  objections. WebFetch verified the older 2021 same-quirk patch and user
  reports.
- Phase 5: `rg` and file reads traced `nvme_id_table` through
  `nvme_probe()`, `nvme_pci_alloc_dev()`, `nvme_init_identify()`,
  `nvme_init_subsystem()`, and `nvme_init_subnqn()`.
- Phase 6: `git grep` and a Python `git show
  <tag>:drivers/nvme/host/pci.c` check verified stable tags have the
  infrastructure/context and lack the new `144d:a808` quirk.
- Unverified: I did not test-build or boot the patch, and I did not
  verify a functional failure beyond the warning for this exact Samsung
  firmware.

This should be backported because it is a classic low-risk hardware
quirk for a real, reported firmware non-compliance on an existing NVMe
driver path, even though the confirmed symptom is warning noise rather
than a severe failure.

**YES**

 drivers/nvme/host/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b78ba239c8ea8..d59340982520a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -4104,6 +4104,8 @@ static const struct pci_device_id nvme_id_table[] = {
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x1c5f, 0x0540),	/* Memblaze Pblaze4 adapter */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
+	{ PCI_DEVICE(0x144d, 0xa808),	/* Samsung PM981/983 */
+		.driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, },
 	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq
       [not found] <20260505095149.512052-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Alistair Francis, Chris Leech, Hannes Reinecke, Keith Busch,
	Sasha Levin, sagi, kch, linux-nvme, linux-kernel

From: Alistair Francis <alistair.francis@wdc.com>

[ Upstream commit 5fc422951c962cc01e654950fc043ebd8fadd865 ]

Curently after the host sends a REPLACETLSPSK we free the TLS keys as
part of calling nvmet_auth_sq_free() on success. This means when the
host sends a follow up REPLACETLSPSK we return CONCAT_MISMATCH as the
check for !nvmet_queue_tls_keyid(req->sq) fails.

A previous attempt to fix this involed not calling nvmet_auth_sq_free()
on successful connections, but that results in memory leaks. Instead we
should not clear `tls_key` in nvmet_auth_sq_free(), as that was
incorrectly wiping the tls keys which are used for the session.

This patch ensures we correctly free the ephemeral session key on
connection, yet we don't free the TLS key unless closing the connection.

Reviewed-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics

Step 1.1 Record: subsystem `nvmet-tcp` / NVMe target TCP; action verb
“Don’t clear”; intent is to stop `nvmet_auth_sq_free()` from wiping
`sq->tls_key` during authentication cleanup.

Step 1.2 Record: tags in upstream commit `5fc422951c962`: `Reviewed-by:
Chris Leech <cleech@redhat.com>`, `Reviewed-by: Hannes Reinecke
<hare@suse.de>`, `Signed-off-by: Alistair Francis
<alistair.francis@wdc.com>`, `Signed-off-by: Keith Busch
<kbusch@kernel.org>`. No `Fixes:`, no `Reported-by:`, no `Cc: stable`.

Step 1.3 Record: the described bug is deterministic: after a host sends
`REPLACETLSPSK`, successful auth cleanup calls `nvmet_auth_sq_free()`,
which clears `sq->tls_key`; a later `REPLACETLSPSK` then fails the
`!nvmet_queue_tls_keyid(req->sq)` check and returns `CONCAT_MISMATCH`.
The commit also states the earlier approach of not calling
`nvmet_auth_sq_free()` leaked memory.

Step 1.4 Record: this is not a hidden cleanup; it is an explicit
correctness fix for broken TLS PSK replacement and for preserving the
key until connection close.

## Phase 2: Diff Analysis

Step 2.1 Record: one file changed, `drivers/nvme/target/auth.c`;
upstream diff is 3 deletions. Modified function: `nvmet_auth_sq_free()`.
Scope: single-file surgical fix.

Step 2.2 Record: before, `nvmet_auth_sq_free()` canceled auth timeout
work, cleared `sq->tls_key`, then freed DH-CHAP ephemeral buffers.
After, it still cancels work and frees `dhchap_c1`, `dhchap_c2`, and
`dhchap_skey`, but no longer clears the TCP TLS key.

Step 2.3 Record: bug category is logic/resource lifetime.
`nvmet_tcp_tls_key_lookup()` stores a key from `nvme_tls_key_lookup()`
into `queue->nvme_sq.tls_key`; `key_lookup()` increments the key
refcount. `nvmet_sq_put_tls_key()` later does `key_put()` only if
`sq->tls_key` is still set. Clearing the pointer in
`nvmet_auth_sq_free()` both breaks later `REPLACETLSPSK` validation and
prevents the normal close path from seeing the key pointer.

Step 2.4 Record: fix quality is high. The patch removes only incorrect
ownership cleanup from the per-auth cleanup helper. Regression risk is
low because the verified TCP queue release path calls
`nvmet_sq_put_tls_key()` before `nvmet_sq_destroy()`, and non-TCP
transports do not populate `sq->tls_key`.

## Phase 3: Git History Investigation

Step 3.1 Record: `git blame` on the changed area shows `sq->tls_key`
handling was introduced by `fa2e0f8bbc689` (“nvmet-tcp: support secure
channel concatenation”), first contained at `v6.15-rc1~166^2^2~13`;
`sq->tls_key = NULL` was later style-adjusted by `b1efcc470eb30`. The
older `nvmet_auth_sq_free()` DH-CHAP cleanup itself dates to the
original auth code around v6.0, but the TLS-specific bug begins with
`fa2e0f8bbc689`.

Step 3.2 Record: no `Fixes:` tag is present. I manually inspected the
likely introducing commit `fa2e0f8bbc689`, which added secure channel
concatenation, `sq->tls_key`, `nvmet_queue_tls_keyid()`,
`nvmet_sq_put_tls_key()`, and the clear in `nvmet_auth_sq_free()`.

Step 3.3 Record: recent related upstream history includes
`2e6eb6b277f59` (“Don’t free SQ on authentication success”),
`f920ebd03cd13` reverting that due to leaks, and this commit
`5fc422951c962`. This commit was submitted as patch 2/2 after the
revert. On the checked stable branches, `2e6eb6b277f59` is not an
ancestor, so this patch can apply standalone there.

Step 3.4 Record: Alistair Francis has multiple recent NVMe target
auth/TLS commits in this subsystem, including `ecf4d2d883515`,
`2e6eb6b277f59`, `f920ebd03cd13`, `5fc422951c962`, and `5d10069e1a169`.

Step 3.5 Record: dependencies are minimal. The patch depends on
`sq->tls_key` and `nvmet_auth_sq_free()` existing, which local stable
refs show in `6.15.y` and newer. If a target stable tree had already
taken `2e6eb6b277f59`, then the paired revert `f920ebd03cd13` would also
be needed; in the local stable refs checked, that prerequisite is not
needed.

## Phase 4: Mailing List And External Research

Step 4.1 Record: `b4 dig -c 5fc422951c962...` found the original
submission at `https://patch.msgid.link/20260417004809.2894745-2-
alistair.francis@wdc.com`. `b4 dig -a` found only a v1 two-patch series.
The committed version is the reviewed/applied version.

Step 4.2 Record: `b4 dig -w` showed the patch was sent to the expected
NVMe/block audience: Hannes Reinecke, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, Keith Busch, linux-nvme, linux-block, linux-kernel,
Yi Zhang, Maurizio Lombardi, and Shinichiro Kawasaki. Lore replies show
`Reviewed-by` from Hannes Reinecke and Chris Leech, and Keith Busch
replied that patches 1 and 2 were applied to `nvme-7.1`.

Step 4.3 Record: no direct bug-report link is in this commit. The paired
revert references a kmemleak report from Yi Zhang during blktests
(`nvme/041`, `nvme/042`, `nvme/043`, `nvme/044`, `nvme/045`, `nvme/051`,
`nvme/052`), confirming the earlier “don’t call cleanup” approach leaked
DH-CHAP allocations.

Step 4.4 Record: series context is important: patch 1 reverts the
earlier workaround that skipped `nvmet_auth_sq_free()` on success; patch
2, this commit, fixes the root cause by keeping cleanup but no longer
clearing `tls_key`.

Step 4.5 Record: stable-specific web search through
`lore.kernel.org/stable` was blocked by Anubis; the `yhbt.net` stable
path returned 404. I found an AUTOSEL posting for the earlier
`2e6eb6b277f59` workaround, but did not use that as selection evidence.

## Phase 5: Code Semantic Analysis

Step 5.1 Record: modified function is `nvmet_auth_sq_free()`.

Step 5.2 Record: callers found by `rg`: `nvmet_execute_auth_send()`,
`nvmet_execute_auth_receive()`, and `nvmet_sq_destroy()`. Auth
send/receive are assigned as handlers for `nvme_fabrics_type_auth_send`
and `nvme_fabrics_type_auth_receive` in `fabrics-cmd.c`.

Step 5.3 Record: key callees around the affected flow include
`nvmet_queue_tls_keyid()`, `nvmet_auth_insert_psk()`,
`nvmet_tcp_tls_key_lookup()`, `nvme_tls_key_lookup()`, `key_lookup()`,
`nvmet_sq_put_tls_key()`, and `key_put()`.

Step 5.4 Record: reachability is verified through the NVMe/TCP receive
path: `nvmet_tcp_done_recv_pdu()` initializes a request, assigns/uses
the fabrics auth execute handler, and runs `req->execute()`. This is
reachable to an NVMe-oF host using target auth/TLS, not a theoretical
internal path.

Step 5.5 Record: similar lifetime pattern found: the only proper TCP TLS
SQ key release helper is `nvmet_sq_put_tls_key()`, called from
`nvmet_tcp_release_queue_work()`. `nvmet_auth_sq_free()` is a DH-CHAP
ephemeral cleanup helper and should not own the TLS key.

## Phase 6: Cross-Referencing And Stable Tree Analysis

Step 6.1 Record: local stable refs show `sq->tls_key = NULL` in
`nvmet_auth_sq_free()` exists in `stable/linux-6.15.y`, `6.16.y`,
`6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`; it is absent from `6.14.y`,
`6.13.y`, `6.12.y`, `6.6.y`, and `6.1.y`.

Step 6.2 Record: `git apply --check` of the candidate patch succeeds on
local `stable/linux-6.15.y` through `stable/linux-7.0.y`, and fails on
`6.14.y` because the buggy TLS key clearing code is not there. Expected
backport difficulty: clean for affected local refs.

Step 6.3 Record: bounded related-history searches found no existing
equivalent fix in `stable/linux-7.0.y`. Upstream has the earlier
attempted fix `2e6eb6b277f59`, its revert `f920ebd03cd13`, and this
final fix.

## Phase 7: Subsystem And Maintainer Context

Step 7.1 Record: subsystem is NVMe target over TCP, with authentication
and TLS secure channel concatenation. Criticality: IMPORTANT, because it
affects networked storage authentication/TLS behavior, though not a
universal core-kernel path.

Step 7.2 Record: subsystem is active. Recent history in
`drivers/nvme/target` includes multiple auth/TLS fixes and refactors,
including secure concatenation support and follow-up fixes through the
6.15 to 7.1 development window.

## Phase 8: Impact And Risk Assessment

Step 8.1 Record: affected users are systems using NVMe target TCP with
`CONFIG_NVME_TARGET_AUTH` and `CONFIG_NVME_TARGET_TCP_TLS`, specifically
secure channel concatenation / TLS PSK replacement.

Step 8.2 Record: trigger is a host sending a follow-up `REPLACETLSPSK`
after successful authentication on a TLS-enabled admin queue. The
failure is deterministic from the verified code path. I did not verify
that an unprivileged local user can trigger it; this is a
remote/protocol operation by an NVMe host with access to the target.

Step 8.3 Record: failure mode is protocol/authentication failure with
`CONCAT_MISMATCH`, breaking TLS PSK replacement/key rotation.
Additionally, because `key_lookup()` increments the key refcount and the
close path only `key_put()`s if `sq->tls_key` remains set, clearing the
pointer early risks leaking the key reference. Severity: HIGH for
affected NVMe/TCP secure-channel users; not a crash/data-corruption fix.

Step 8.4 Record: benefit is high for affected users because it fixes a
broken authenticated TLS key replacement flow and restores the verified
close-time key release path. Risk is very low: 3 deleted lines, no new
API, no new feature, no locking changes, no cross-subsystem behavior
change.

## Phase 9: Final Synthesis

Step 9.1 Record: evidence for backporting: real deterministic protocol
failure; possible key-ref lifetime leak; tiny one-function patch;
reviewed by relevant NVMe/TLS people; applied by NVMe maintainer;
cleanly applies to affected stable refs; affects released stable
branches from 6.15.y onward. Evidence against: affects a relatively
specialized NVMe/TCP TLS configuration and needs no backport before
6.15.y. Unresolved: I could not access `lore.kernel.org/stable` directly
due Anubis, and I did not run runtime blktests.

Step 9.2 Stable rules checklist: obviously correct and reviewed: yes.
Fixes a real bug: yes. Important issue: yes for NVMe/TCP TLS key
replacement and key lifetime. Small and contained: yes, 3 deleted lines
in one function. No new features/APIs: yes. Applies to stable trees: yes
for local `6.15.y` through `7.0.y`; not applicable to older checked
branches.

Step 9.3 Exception category: not a device ID, quirk, DT, build, or
documentation exception.

Step 9.4 Decision: backport to stable trees that contain `fa2e0f8bbc689`
/ the `sq->tls_key` clearing code, namely local refs `6.15.y` and newer.
Do not apply to older trees where the code is absent. If a target branch
contains the earlier workaround `2e6eb6b277f59`, include/reconcile the
paired revert `f920ebd03cd13`; otherwise this commit stands alone.

## Verification

- Phase 1: Found upstream commit `5fc422951c962` with `git log
  origin/master --grep`.
- Phase 1: Parsed commit tags from `git show --format=fuller --stat
  --patch 5fc422951c962`.
- Phase 2: Verified the diff removes only the
  `CONFIG_NVME_TARGET_TCP_TLS` block clearing `sq->tls_key`.
- Phase 2: Verified `nvmet_queue_tls_keyid()` returns 0 when
  `sq->tls_key` is NULL.
- Phase 2: Verified `REPLACETLSPSK` returns `CONCAT_MISMATCH` when
  `nvmet_queue_tls_keyid(req->sq)` is false.
- Phase 2: Verified `nvmet_sq_put_tls_key()` calls `key_put()` and NULLs
  the key on TCP queue release.
- Phase 2: Verified `nvme_tls_key_lookup()` uses `key_lookup()`, and
  `key_lookup()` increments the key refcount.
- Phase 3: Ran `git blame` on `auth.c`; TLS key clearing originates from
  `fa2e0f8bbc689`/`b1efcc470eb30`.
- Phase 3: Ran `git describe --contains fa2e0f8bbc689`; first contained
  at `v6.15-rc1`.
- Phase 3: Inspected `fa2e0f8bbc689`, `b1efcc470eb30`, `2e6eb6b277f59`,
  `f920ebd03cd13`, and `ecf4d2d883515`.
- Phase 4: Ran `b4 dig -c`, `b4 dig -a`, and `b4 dig -w`; found v1 two-
  patch lore series and recipient list.
- Phase 4: Fetched lore mirror thread; verified Hannes Reinecke and
  Chris Leech reviewed, Keith Busch applied patches 1 and 2.
- Phase 4: Fetched Yi Zhang kmemleak report linked by the paired revert.
- Phase 5: Used `rg` and file reads to trace auth command handlers, TCP
  receive execution, TLS key lookup, and queue release.
- Phase 6: Checked local stable refs with scripted `git show`; buggy
  code exists in `6.15.y` through `7.0.y`, absent in older checked
  stable refs.
- Phase 6: Ran `git apply --check` against local stable worktrees; clean
  for `6.15.y` through `7.0.y`, not applicable to `6.14.y`.
- Phase 7: Reviewed recent `drivers/nvme/target` history showing active
  auth/TLS development.
- Phase 8: Verified trigger and failure path from
  `nvmet_auth_negotiate()` and auth send/receive call flow.
- Unverified: no runtime blktests were run.
- Unverified: direct `lore.kernel.org/stable` search was blocked by
  Anubis; the mirror stable search path returned 404.

**YES**

 drivers/nvme/target/auth.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c
index 2eadeb7e06f26..3a905124afdee 100644
--- a/drivers/nvme/target/auth.c
+++ b/drivers/nvme/target/auth.c
@@ -239,9 +239,6 @@ u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq)
 void nvmet_auth_sq_free(struct nvmet_sq *sq)
 {
 	cancel_delayed_work(&sq->auth_expired_work);
-#ifdef CONFIG_NVME_TARGET_TCP_TLS
-	sq->tls_key = NULL;
-#endif
 	kfree(sq->dhchap_c1);
 	sq->dhchap_c1 = NULL;
 	kfree(sq->dhchap_c2);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes
       [not found] <20260505095149.512052-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Chaitanya Kulkarni, Christoph Hellwig, Sagi Grimberg,
	Hannes Reinecke, Keith Busch, Sasha Levin, linux-nvme,
	linux-kernel

From: Chaitanya Kulkarni <kch@nvidia.com>

[ Upstream commit 26bb12b9caafa2e62d638104bf2732f610cdbb0b ]

When a controller reset is triggered via sysfs (by writing to
/sys/class/nvme/<nvmedev>/reset_controller), the reset work tears down
and re-establishes all queues. The socket release using fput() defers
the actual cleanup to task_work delayed_fput workqueue. This deferred
cleanup can race with the subsequent queue re-allocation during reset,
potentially leading to use-after-free or resource conflicts.

Replace fput() with __fput_sync() to ensure synchronous socket release,
guaranteeing that all socket resources are fully cleaned up before the
function returns. This prevents races during controller reset where
new queue setup may begin before the old socket is fully released.

* Call chain during reset:
  nvme_reset_ctrl_work()
    -> nvme_tcp_teardown_ctrl()
      -> nvme_tcp_teardown_io_queues()
        -> nvme_tcp_free_io_queues()
          -> nvme_tcp_free_queue()       <-- fput() -> __fput_sync()
      -> nvme_tcp_teardown_admin_queue()
        -> nvme_tcp_free_admin_queue()
          -> nvme_tcp_free_queue()       <-- fput() -> __fput_sync()
    -> nvme_tcp_setup_ctrl()             <-- race with deferred fput

memalloc_noreclaim_save() sets PF_MEMALLOC which is intended for tasks
performing memory reclaim work that need reserve access. While PF_MEMALLOC
prevents the task from entering direct reclaim (causing __need_reclaim() to
return false), it does not strip __GFP_IO from gfp flags. The allocator can
therefore still trigger writeback I/O when __GFP_IO remains set, which is
unsafe when the caller holds block layer locks.

Switch to memalloc_noio_save() which sets PF_MEMALLOC_NOIO. This causes
current_gfp_context() to strip __GFP_IO|__GFP_FS from every allocation in
the scope, making it safe to allocate memory while holding elevator_lock and
set->srcu.

* The issue can be reproduced using blktests:

  nvme_trtype=tcp ./check nvme/005
blktests (master) # nvme_trtype=tcp ./check nvme/005
nvme/005 (tr=tcp) (reset local loopback target)              [failed]
    runtime  0.725s  ...  0.798s
    something found in dmesg:
    [  108.473940] run blktests nvme/005 at 2025-11-22 16:12:20

    [...]
    ...
    (See '/root/blktests/results/nodev_tr_tcp/nvme/005.dmesg' for the entire message)
blktests (master) # cat /root/blktests/results/nodev_tr_tcp/nvme/005.dmesg
[  108.473940] run blktests nvme/005 at 2025-11-22 16:12:20
[  108.526983] loop0: detected capacity change from 0 to 2097152
[  108.555606] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  108.572531] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  108.613061] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  108.616832] nvme nvme0: creating 48 I/O queues.
[  108.630791] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[  108.661892] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  108.746639] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  108.748466] nvme nvme0: creating 48 I/O queues.
[  108.802984] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[  108.829983] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
[  108.854288] block nvme0n1: no available path - failing I/O
[  108.854344] block nvme0n1: no available path - failing I/O
[  108.854373] Buffer I/O error on dev nvme0n1, logical block 1, async page read

[  108.891693] ======================================================
[  108.895912] WARNING: possible circular locking dependency detected
[  108.900184] 6.17.0nvme+ #3 Tainted: G                 N
[  108.903913] ------------------------------------------------------
[  108.908171] nvme/2734 is trying to acquire lock:
[  108.911957] ffff88810210e610 (set->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x17/0x170
[  108.917587]
               but task is already holding lock:
[  108.921570] ffff88813abea198 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0xa8/0x1c0
[  108.927361]
               which lock already depends on the new lock.

[  108.933018]
               the existing dependency chain (in reverse order) is:
[  108.938223]
               -> #4 (&q->elevator_lock){+.+.}-{4:4}:
[  108.942988]        __mutex_lock+0xa2/0x1150
[  108.945873]        elevator_change+0xa8/0x1c0
[  108.948925]        elv_iosched_store+0xdf/0x140
[  108.952043]        kernfs_fop_write_iter+0x16a/0x220
[  108.955367]        vfs_write+0x378/0x520
[  108.957598]        ksys_write+0x67/0xe0
[  108.959721]        do_syscall_64+0x76/0xbb0
[  108.962052]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  108.965145]
               -> #3 (&q->q_usage_counter(io)){++++}-{0:0}:
[  108.968923]        blk_alloc_queue+0x30e/0x350
[  108.972117]        blk_mq_alloc_queue+0x61/0xd0
[  108.974677]        scsi_alloc_sdev+0x2a0/0x3e0
[  108.977092]        scsi_probe_and_add_lun+0x1bd/0x430
[  108.979921]        __scsi_add_device+0x109/0x120
[  108.982504]        ata_scsi_scan_host+0x97/0x1c0
[  108.984365]        async_run_entry_fn+0x2d/0x130
[  108.986109]        process_one_work+0x20e/0x630
[  108.987830]        worker_thread+0x184/0x330
[  108.989473]        kthread+0x10a/0x250
[  108.990852]        ret_from_fork+0x297/0x300
[  108.992491]        ret_from_fork_asm+0x1a/0x30
[  108.994159]
               -> #2 (fs_reclaim){+.+.}-{0:0}:
[  108.996320]        fs_reclaim_acquire+0x99/0xd0
[  108.998058]        kmem_cache_alloc_node_noprof+0x4e/0x3c0
[  109.000123]        __alloc_skb+0x15f/0x190
[  109.002195]        tcp_send_active_reset+0x3f/0x1e0
[  109.004038]        tcp_disconnect+0x50b/0x720
[  109.005695]        __tcp_close+0x2b8/0x4b0
[  109.007227]        tcp_close+0x20/0x80
[  109.008663]        inet_release+0x31/0x60
[  109.010175]        __sock_release+0x3a/0xc0
[  109.011778]        sock_close+0x14/0x20
[  109.013263]        __fput+0xee/0x2c0
[  109.014673]        delayed_fput+0x31/0x50
[  109.016183]        process_one_work+0x20e/0x630
[  109.017897]        worker_thread+0x184/0x330
[  109.019543]        kthread+0x10a/0x250
[  109.020929]        ret_from_fork+0x297/0x300
[  109.022565]        ret_from_fork_asm+0x1a/0x30
[  109.024194]
               -> #1 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
[  109.026634]        lock_sock_nested+0x2e/0x70
[  109.028251]        tcp_sendmsg+0x1a/0x40
[  109.029783]        sock_sendmsg+0xed/0x110
[  109.031321]        nvme_tcp_try_send_cmd_pdu+0x13e/0x260 [nvme_tcp]
[  109.034263]        nvme_tcp_try_send+0xb3/0x330 [nvme_tcp]
[  109.036375]        nvme_tcp_queue_rq+0x342/0x3d0 [nvme_tcp]
[  109.038528]        blk_mq_dispatch_rq_list+0x297/0x800
[  109.040448]        __blk_mq_sched_dispatch_requests+0x3db/0x5f0
[  109.042677]        blk_mq_sched_dispatch_requests+0x29/0x70
[  109.044787]        blk_mq_run_work_fn+0x76/0x1b0
[  109.046535]        process_one_work+0x20e/0x630
[  109.048245]        worker_thread+0x184/0x330
[  109.049890]        kthread+0x10a/0x250
[  109.051331]        ret_from_fork+0x297/0x300
[  109.053024]        ret_from_fork_asm+0x1a/0x30
[  109.054740]
               -> #0 (set->srcu){.+.+}-{0:0}:
[  109.056850]        __lock_acquire+0x1468/0x2210
[  109.058614]        lock_sync+0xa5/0x110
[  109.060048]        __synchronize_srcu+0x49/0x170
[  109.061802]        elevator_switch+0xc9/0x330
[  109.063950]        elevator_change+0x128/0x1c0
[  109.065675]        elevator_set_none+0x4c/0x90
[  109.067316]        blk_unregister_queue+0xa8/0x110
[  109.069165]        __del_gendisk+0x14e/0x3c0
[  109.070824]        del_gendisk+0x75/0xa0
[  109.072328]        nvme_ns_remove+0xf2/0x230 [nvme_core]
[  109.074365]        nvme_remove_namespaces+0xf2/0x150 [nvme_core]
[  109.076652]        nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[  109.078775]        nvme_delete_ctrl_sync+0x3b/0x50 [nvme_core]
[  109.081009]        nvme_sysfs_delete+0x34/0x40 [nvme_core]
[  109.083082]        kernfs_fop_write_iter+0x16a/0x220
[  109.085009]        vfs_write+0x378/0x520
[  109.086539]        ksys_write+0x67/0xe0
[  109.087982]        do_syscall_64+0x76/0xbb0
[  109.089577]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  109.091665]
               other info that might help us debug this:

[  109.095478] Chain exists of:
                 set->srcu --> &q->q_usage_counter(io) --> &q->elevator_lock

[  109.099544]  Possible unsafe locking scenario:

[  109.101708]        CPU0                    CPU1
[  109.103402]        ----                    ----
[  109.105103]   lock(&q->elevator_lock);
[  109.106530]                                lock(&q->q_usage_counter(io));
[  109.109022]                                lock(&q->elevator_lock);
[  109.111391]   sync(set->srcu);
[  109.112586]
                *** DEADLOCK ***

[  109.114772] 5 locks held by nvme/2734:
[  109.116189]  #0: ffff888101925410 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x67/0xe0
[  109.119143]  #1: ffff88817a914e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10f/0x220
[  109.123141]  #2: ffff8881046313f8 (kn->active#185){++++}-{0:0}, at: sysfs_remove_file_self+0x26/0x50
[  109.126543]  #3: ffff88810470e1d0 (&set->update_nr_hwq_lock){++++}-{4:4}, at: del_gendisk+0x6d/0xa0
[  109.129891]  #4: ffff88813abea198 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0xa8/0x1c0
[  109.133149]
               stack backtrace:
[  109.134817] CPU: 6 UID: 0 PID: 2734 Comm: nvme Tainted: G                 N  6.17.0nvme+ #3 PREEMPT(voluntary)
[  109.134819] Tainted: [N]=TEST
[  109.134820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  109.134821] Call Trace:
[  109.134823]  <TASK>
[  109.134824]  dump_stack_lvl+0x75/0xb0
[  109.134828]  print_circular_bug+0x26a/0x330
[  109.134831]  check_noncircular+0x12f/0x150
[  109.134834]  __lock_acquire+0x1468/0x2210
[  109.134837]  ? __synchronize_srcu+0x17/0x170
[  109.134838]  lock_sync+0xa5/0x110
[  109.134840]  ? __synchronize_srcu+0x17/0x170
[  109.134842]  __synchronize_srcu+0x49/0x170
[  109.134843]  ? mark_held_locks+0x49/0x80
[  109.134845]  ? _raw_spin_unlock_irqrestore+0x2d/0x60
[  109.134847]  ? kvm_clock_get_cycles+0x14/0x30
[  109.134853]  ? ktime_get_mono_fast_ns+0x36/0xb0
[  109.134858]  elevator_switch+0xc9/0x330
[  109.134860]  elevator_change+0x128/0x1c0
[  109.134862]  ? kernfs_put.part.0+0x86/0x290
[  109.134864]  elevator_set_none+0x4c/0x90
[  109.134866]  blk_unregister_queue+0xa8/0x110
[  109.134868]  __del_gendisk+0x14e/0x3c0
[  109.134870]  del_gendisk+0x75/0xa0
[  109.134872]  nvme_ns_remove+0xf2/0x230 [nvme_core]
[  109.134879]  nvme_remove_namespaces+0xf2/0x150 [nvme_core]
[  109.134887]  nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[  109.134893]  nvme_delete_ctrl_sync+0x3b/0x50 [nvme_core]
[  109.134899]  nvme_sysfs_delete+0x34/0x40 [nvme_core]
[  109.134905]  kernfs_fop_write_iter+0x16a/0x220
[  109.134908]  vfs_write+0x378/0x520
[  109.134911]  ksys_write+0x67/0xe0
[  109.134913]  do_syscall_64+0x76/0xbb0
[  109.134915]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  109.134916] RIP: 0033:0x7fd68a737317
[  109.134917] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  109.134919] RSP: 002b:00007ffded1546d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  109.134920] RAX: ffffffffffffffda RBX: 000000000054f7e0 RCX: 00007fd68a737317
[  109.134921] RDX: 0000000000000001 RSI: 00007fd68a855719 RDI: 0000000000000003
[  109.134921] RBP: 0000000000000003 R08: 0000000030407850 R09: 00007fd68a7cd4e0
[  109.134922] R10: 00007fd68a65b130 R11: 0000000000000246 R12: 00007fd68a855719
[  109.134923] R13: 00000000304074c0 R14: 00000000304074c0 R15: 0000000030408660
[  109.134926]  </TASK>
[  109.962756] Key type psk unregistered

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Backport Analysis

### Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem is `nvme-tcp` under
`drivers/nvme/host/tcp.c`; action is a fix for teardown circular
locking. Claimed intent is to make socket teardown synchronous during
reset and use a NOIO allocation scope during socket release.

Step 1.2 Record: candidate tags present are:
- `Reviewed-by: Christoph Hellwig <hch@lst.de>`
- `Reviewed-by: Sagi Grimberg <sagi@grimberg.me>`
- `Reviewed-by: Hannes Reinecke <hare@suse.de>`
- `Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>`
- `Signed-off-by: Keith Busch <kbusch@kernel.org>`

No `Fixes:`, `Reported-by:`, `Tested-by:`, `Cc: stable`, or `Link:` tag
is present in the supplied commit message. `b4 am` additionally found
review trailers from Hannes Reinecke, Daniel Wagner, and Nilay Shroff on
the v4 thread.

Step 1.3 Record: the commit describes a real reset-time race and lockdep
issue. The verified reproducer is `nvme_trtype=tcp ./check nvme/005`.
The supplied dmesg shows a “possible circular locking dependency
detected” involving `set->srcu`, `q->elevator_lock`, `fs_reclaim`, and
NVMe/TCP socket teardown through `delayed_fput`. The root cause is that
`fput()` defers `__fput()` and `memalloc_noreclaim_save()` does not
strip `__GFP_IO`.

Step 1.4 Record: this is not merely cleanup. It is a synchronization and
deadlock-prevention fix, with a claimed possible UAF/resource conflict
from deferred socket release. The UAF aspect is verified as
author/reviewer rationale, while the lockdep circular dependency is
directly backed by the reported trace.

### Phase 2: Diff Analysis
Step 2.1 Record: one file changed: `drivers/nvme/host/tcp.c`, 21
insertions and 7 deletions in the v4 patch. Functions modified:
`nvme_tcp_free_queue()` and the `err_sock` path in
`nvme_tcp_alloc_queue()`. Scope is a single-file surgical driver fix.

Step 2.2 Record:
- In `nvme_tcp_free_queue()`, before: drain page fragments, enter
  `memalloc_noreclaim_save()`, call deferred `fput()`, clear
  `queue->sock`, restore noreclaim. After: enter `memalloc_noio_save()`,
  call `__fput_sync()`, clear `queue->sock`, restore NOIO.
- In `nvme_tcp_alloc_queue()` error handling, before: failed queue setup
  used `fput()`. After: it uses `__fput_sync()`.

Step 2.3 Record: bug categories are race condition, lockdep/deadlock
prevention, and allocation-context correctness. The race mechanism is
deferred file/socket destruction via `fput()` while reset immediately
re-enters queue setup. The deadlock mechanism is socket close allocating
memory with I/O allowed while block teardown paths hold locks.

Step 2.4 Record: the fix is minimal and understandable. Risk is low to
medium: `__fput_sync()` is intentionally special-purpose and
`fs/file_table.c` warns not to blindly convert callers, but here the
code has a concrete need and subsystem reviewers accepted it.
`memalloc_noio_save()` is the right primitive for suppressing
`__GFP_IO|__GFP_FS`, verified in `include/linux/sched/mm.h`.

### Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows:
- `nvme_tcp_free_queue()` dates to `3f2304f8c6d6` (“nvme-tcp: add NVMe
  over TCP host driver”), described as first contained around
  `v5.0-rc1`.
- `memalloc_noreclaim_save()` in this area came from `83e1226b0ee2`
  (“nvme-tcp: fix possible circular locking when deleting a controller
  under memory pressure”), first contained around `v6.1-rc3`.
- `fput(queue->sock->file)` came from `e40d4eb84089` (“nvme-tcp:
  allocate socket file”), first contained around `v6.7-rc1`.

Step 3.2 Record: no `Fixes:` tag in the candidate. I inspected related
commits instead. `83e1226b0ee2` fixed an earlier lockdep circular
locking report by adding `memalloc_noreclaim_save()`, but this candidate
corrects that to `memalloc_noio_save()` for the newer lock chain.
`e40d4eb84089` introduced socket files for TLS upcalls and therefore the
`fput()` path.

Step 3.3 Record: recent file history shows active NVMe/TCP maintenance,
including queue teardown, TLS, request handling, and UAF/race fixes. No
required multi-patch series dependency was found for the exact current-
tree patch.

Step 3.4 Record: Chaitanya Kulkarni has multiple recent NVMe/block fixes
in `drivers/nvme/host`; the strongest quality signal is that Christoph
Hellwig, Sagi Grimberg, Hannes Reinecke, Daniel Wagner, and Nilay Shroff
reviewed/tested or reviewed the patch thread.

Step 3.5 Record: dependencies are existing core APIs: `__fput_sync()` is
exported in `fs/file_table.c`, and `memalloc_noio_save()` is available
in `include/linux/sched/mm.h`. For older stable branches, the exact
`fput(queue->sock->file)` part only exists where `e40d4eb84089` is
present.

### Phase 4: Mailing List And External Research
Step 4.1 Record: I could not use `b4 dig -c` because the exact applied
commit hash was not present in local `master`, `linus-next/master`,
`storage-next`, or `pending-7.0`; `b4 dig` only accepts a commitish. I
used the message-id with `b4 am -c`, which found the v4 patch at
`https://patch.msgid.link/20260413171628.6204-1-kch@nvidia.com`.

Step 4.2 Record: original recipients/reviewers verified from the raw
thread: To `kbusch`, `sagi`; Cc `hch`, `linux-nvme`. Review trailers
found: Christoph Hellwig, Sagi Grimberg, Hannes Reinecke, Daniel Wagner,
Nilay Shroff. Keith Busch replied “applied to nvme-7.1”.

Step 4.3 Record: no syzbot/bugzilla report. The bug report evidence is
the included blktests `nvme/005` failure and lockdep trace. Daniel
Wagner replied that he tested locally with blktests and it passed,
though he could not reproduce the original failure.

Step 4.4 Record: patch evolution was v2 to v3 to v4. v2 only converted
`fput()` to `__fput_sync()`. v3 added the `memalloc_noio_save()` change
after feedback from Nilay/Christoph/Hannes. v4 rebased/retested and
added review tags. No NAKs found.

Step 4.5 Record: direct lore stable search was blocked by Anubis; web
search did not find stable-specific objections or a known reason to
avoid stable.

### Phase 5: Code Semantic Analysis
Step 5.1 Record: key functions modified are `nvme_tcp_free_queue()` and
`nvme_tcp_alloc_queue()`.

Step 5.2 Record: callers verified:
- `nvme_tcp_free_admin_queue()` calls `nvme_tcp_free_queue(ctrl, 0)`.
- `nvme_tcp_free_io_queues()` calls `nvme_tcp_free_queue()` for I/O
  queues.
- `nvme_tcp_teardown_ctrl()` calls I/O teardown then admin teardown.
- `nvme_reset_ctrl_work()` calls `nvme_tcp_teardown_ctrl()` followed
  immediately by `nvme_tcp_setup_ctrl()`.
- `nvme_sysfs_reset()` calls `nvme_reset_ctrl_sync()`, which queues and
  flushes `ctrl->reset_work`.

Step 5.3 Record: important callees are `fput()`/`__fput_sync()`, socket
close through `__fput()`, `tcp_close()`, `tcp_disconnect()`, and
allocations in `tcp_send_active_reset()` as shown in the trace.
`current_gfp_context()` strips `__GFP_IO|__GFP_FS` only for
`PF_MEMALLOC_NOIO`, not plain `PF_MEMALLOC`.

Step 5.4 Record: reachability is real. The reset path is reachable from
writable sysfs `reset_controller` and from NVMe reset ioctl paths; both
are privileged/admin operations. The send-side lock chain is reachable
through normal NVMe/TCP block I/O via `nvme_tcp_queue_rq()` ->
`nvme_tcp_queue_request()` -> workqueue send.

Step 5.5 Record: related pattern found: prior `83e1226b0ee2` was also an
NVMe/TCP circular locking fix around socket teardown under memory
pressure. No prior `__fput_sync()` fix in `drivers/nvme/host/tcp.c`
history was found.

### Phase 6: Stable Tree Analysis
Step 6.1 Record:
- `stable/linux-6.12.y`, `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`
  contain `sock_alloc_file()`, `fput(queue->sock->file)`, and
  `memalloc_noreclaim_save()`, so they contain the exact bug pattern.
- `stable/linux-6.1.y` and `6.6.y` contain the
  `memalloc_noreclaim_save()` plus `sock_release(queue->sock)` teardown
  pattern, so the NOIO part is relevant but the exact `fput()` hunk does
  not apply.
- `stable/linux-5.10.y` and `5.15.y` in this repo did not show the
  specific `memalloc_noreclaim_save()` or `fput(queue->sock->file)`
  patterns.

Step 6.2 Record: `git apply --check` succeeds on the current `7.0.y`
checkout. Raw v4 patch does not apply cleanly to `6.12.y`, `6.6.y`, or
`5.10.y` test worktrees; `6.12.y` has the bug pattern but nearby context
differs, while `6.6.y`/`5.10.y` lack the `fput(queue->sock->file)` form.
Expected backport difficulty is clean for current 7.0, minor rework for
6.12+, and adapted/no partial backport for older branches.

Step 6.3 Record: related fix already in current history is
`83e1226b0ee2`; this candidate is a follow-up/correction rather than a
duplicate. I found no alternate `__fput_sync()` fix already in this file
history.

### Phase 7: Subsystem Context
Step 7.1 Record: subsystem is NVMe/TCP host driver, in storage/block.
Criticality is IMPORTANT: driver-specific, but it backs real block
devices and can affect I/O availability and teardown/reset reliability.

Step 7.2 Record: subsystem is active; recent `drivers/nvme/host/tcp.c`
history includes TLS, queue removal, congestion, stalls, UAF, and
failover fixes.

### Phase 8: Impact And Risk
Step 8.1 Record: affected users are systems using NVMe over TCP,
especially during controller reset/delete/reconnect or tests like
blktests `nvme/005`.

Step 8.2 Record: trigger is privileged/admin reset via sysfs or ioctl,
and teardown/delete paths. The I/O lock chain involves normal NVMe/TCP
request submission, but initiating reset/delete is not unprivileged in
the verified paths.

Step 8.3 Record: failure mode is at least HIGH and plausibly CRITICAL:
verified lockdep circular dependency with a possible deadlock scenario,
I/O failures in the reproducer trace, and a reviewed claim of possible
UAF/resource conflict from deferred socket cleanup.

Step 8.4 Record: benefit is high for affected NVMe/TCP stable users
because it prevents reset/teardown races and circular locking. Risk is
low-medium because the patch changes teardown/error paths only, is
small, and has strong review, though `__fput_sync()` is a sensitive
primitive.

### Phase 9: Final Synthesis
Step 9.1 Record:
Evidence for backporting: real reproduced lockdep issue, concrete
blktests reproducer, small single-file fix, no new API or feature,
strong subsystem review, exact bug pattern present in active stable
trees from 6.12+ and current 7.0.
Evidence against: exact patch does not apply to some older stable trees;
older branches need adaptation or may not contain the same bug pattern.
`__fput_sync()` has general cautionary documentation.
Unresolved: no applied upstream commit hash available locally, so `b4
dig -c` could not be performed; no direct stable-list discussion could
be fetched due lore Anubis.

Step 9.2 Record:
1. Obviously correct and tested: yes, based on direct code inspection,
   blktests discussion, and review/test replies.
2. Fixes a real bug: yes, verified lockdep circular dependency and reset
   teardown race.
3. Important issue: yes, possible deadlock/hang during NVMe/TCP
   reset/teardown.
4. Small and contained: yes, one file and two localized hunks.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current 7.0; needs minor/adapted
   backports for some older branches.

Step 9.3 Record: no exception category such as device ID, quirk, DT,
build, or docs applies.

Step 9.4 Record: this should be backported to stable trees that contain
the affected NVMe/TCP socket-file teardown path, with branch-specific
adjustment where necessary.

## Verification
- Phase 1: Parsed supplied tags and verified additional review trailers
  via `b4 am -c`.
- Phase 2: Verified diff scope from v4 patch: `drivers/nvme/host/tcp.c`,
  21 insertions, 7 deletions.
- Phase 3: Ran `git blame` on both changed areas; identified
  `83e1226b0ee2`, `e40d4eb84089`, and `3f2304f8c6d6`.
- Phase 3: Ran `git show` on `83e1226b0ee2` and `e40d4eb84089`;
  confirmed prior circular-locking context and socket-file introduction.
- Phase 4: Fetched v2, v3, and v4 threads from public mirrors; verified
  reviewer feedback and no NAKs.
- Phase 5: Read `nvme_reset_ctrl_work()`, queue teardown/setup
  functions, sysfs reset, and `nvme_reset_ctrl_sync()` to verify
  reachability.
- Phase 5: Read `fs/file_table.c` and `include/linux/sched/mm.h` to
  verify `fput()` deferral, `__fput_sync()`, and NOIO semantics.
- Phase 6: Checked stable branch code patterns for `5.10.y`, `5.15.y`,
  `6.1.y`, `6.6.y`, `6.12.y`, `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`.
- Phase 6: Ran `git apply --check`; current `7.0.y` applies, older
  tested branches need rework or lack exact context.
- Unverified: exact applied upstream commit hash was not available
  locally, so `b4 dig -c`, `b4 dig -a`, and `b4 dig -w` could not be
  run.

**YES**

 drivers/nvme/host/tcp.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 02c95c32b07e3..15d36d6a728e8 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1438,18 +1438,32 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
-	unsigned int noreclaim_flag;
+	unsigned int noio_flag;
 
 	if (!test_and_clear_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
 		return;
 
 	page_frag_cache_drain(&queue->pf_cache);
 
-	noreclaim_flag = memalloc_noreclaim_save();
-	/* ->sock will be released by fput() */
-	fput(queue->sock->file);
+	/**
+	 * Prevent memory reclaim from triggering block I/O during socket
+	 * teardown. The socket release path fput -> tcp_close ->
+	 * tcp_disconnect -> tcp_send_active_reset may allocate memory, and
+	 * allowing reclaim to issue I/O could deadlock if we're being called
+	 * from block device teardown (e.g., del_gendisk -> elevator cleanup)
+	 * which holds locks that the I/O completion path needs.
+	 */
+	noio_flag = memalloc_noio_save();
+
+	/**
+	 * Release the socket synchronously. During reset in
+	 * nvme_reset_ctrl_work(), queue teardown is immediately followed by
+	 * re-allocation. fput() defers socket cleanup to delayed_fput_work
+	 * in workqueue context, which can race with new queue setup.
+	 */
+	__fput_sync(queue->sock->file);
 	queue->sock = NULL;
-	memalloc_noreclaim_restore(noreclaim_flag);
+	memalloc_noio_restore(noio_flag);
 
 	kfree(queue->pdu);
 	mutex_destroy(&queue->send_mutex);
@@ -1901,8 +1915,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 err_rcv_pdu:
 	kfree(queue->pdu);
 err_sock:
-	/* ->sock will be released by fput() */
-	fput(queue->sock->file);
+	/* Use sync variant - see nvme_tcp_free_queue() for explanation */
+	__fput_sync(queue->sock->file);
 	queue->sock = NULL;
 err_destroy_mutex:
 	mutex_destroy(&queue->send_mutex);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled
       [not found] <20260505095149.512052-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  6 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: John Garry, Christoph Hellwig, Nilay Shroff, Chaitanya Kulkarni,
	Keith Busch, Sasha Levin, sagi, linux-nvme, linux-kernel

From: John Garry <john.g.garry@oracle.com>

[ Upstream commit 3f150f0f010f234f34a67897344f18e68fe803f7 ]

The delayed disk removal work is canceled when a NS (re)appears. However,
we do not put the module reference grabbed in nvme_mpath_remove_disk(), so
fix that.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Decision Summary
This should be backported to stable trees that contain NVMe multipath
delayed removal, especially `v6.16+` trees checked locally. It fixes a
real module reference leak: `nvme_mpath_remove_disk()` takes a
`THIS_MODULE` reference before scheduling delayed removal work, but
`nvme_init_ns_head()` can cancel that work when a namespace reappears.
If canceled, the work callback never runs and therefore never executes
its existing `module_put(THIS_MODULE)`. The patch balances that
reference only when `cancel_delayed_work()` actually canceled pending
work.

## Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `nvme-multipath`; action verb `put`/implicit
`fix`; intent is to release a module reference when delayed namespace
removal work is canceled.

Step 1.2 Record: tags present: `Reviewed-by: Christoph Hellwig
<hch@lst.de>`, `Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>`,
`Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>`, `Signed-off-by: John
Garry <john.g.garry@oracle.com>`, `Signed-off-by: Keith Busch
<kbusch@kernel.org>`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Link:`, or `Cc: stable` tag in the commit.

Step 1.3 Record: bug is a module reference leak. The commit body states
the delayed disk removal work is canceled when a namespace appears or
reappears, but the module reference acquired in
`nvme_mpath_remove_disk()` is not released. Symptom from related
blktests discussion: module refcount does not return to the original
value after reconnect. Root cause: cancellation bypasses
`nvme_remove_head_work()`, where the existing `module_put()` lives.

Step 1.4 Record: yes, this is a hidden resource/reference leak fix. The
subject does not say “fix leak”, but the body and code show a missing
`module_put()` on a cancellation path.

## Phase 2: Diff Analysis
Step 2.1 Record: one file changed, `drivers/nvme/host/core.c`, with 2
insertions and 1 deletion. Modified function: `nvme_init_ns_head()`.
Scope: single-file surgical fix.

Step 2.2 Record: before, `nvme_init_ns_head()` unconditionally called
`cancel_delayed_work(&head->remove_work)` after adding the namespace
head. After, it checks the return value and calls
`module_put(THIS_MODULE)` only if pending delayed work was canceled. The
affected path is namespace initialization/reappearance after delayed
multipath removal was scheduled.

Step 2.3 Record: bug category is reference counting/resource leak.
`nvme_mpath_remove_disk()` acquires a module reference with
`try_module_get(THIS_MODULE)` before scheduling `head->remove_work`;
`nvme_remove_head_work()` releases it with `module_put(THIS_MODULE)`
when the work runs. If `nvme_init_ns_head()` cancels the pending work,
the callback does not run, so the added `module_put()` balances the
acquired reference. Workqueue docs in `kernel/workqueue.c` verify
`cancel_delayed_work()` returns true if pending work was canceled and
false otherwise.

Step 2.4 Record: fix quality is high. It is minimal and correctly
conditional. Regression risk is very low: an extra `module_put()` is
only performed when the work was pending and canceled, matching the
exact reference lifetime. If the work was not pending or was already
running, no extra put is done.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the cancellation line in
`nvme_init_ns_head()` was introduced by `dd2c18548964` (`nvme: reset
delayed remove_work after reconnect`), first contained around
`v6.16-rc4`. The delayed removal work and callback `module_put()` came
from `62188639ec16` (`nvme-multipath: introduce delayed removal of the
multipath head node`), first contained around `v6.16-rc1`.

Step 3.2 Record: no `Fixes:` tag is present, so there was no tagged
commit to follow. Manual history points to the interaction between
`62188639ec16` and `dd2c18548964`.

Step 3.3 Record: recent related commits include `62188639ec16`
introducing delayed removal, `dd2c18548964` adding cancellation on
reconnect, and `0f5197ea9a73` fixing a different delayed-removal module
reference failure path. This commit is standalone; it only changes the
cancellation path in `core.c`.

Step 3.4 Record: John Garry has recent NVMe-related commits and reviewed
the related `0f5197ea9a73` fix. The patch was committed by Keith Busch
and reviewed by Christoph Hellwig, both key NVMe maintainers.

Step 3.5 Record: no prerequisite commit beyond the delayed-removal
feature and reconnect cancellation code. The current `core.c` already
includes `<linux/module.h>`, so the new `module_put(THIS_MODULE)`
requires no include change.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 3f150f0f010f2` found the original patch at `
https://patch.msgid.link/20260415155358.1517871-1-
john.g.garry@oracle.com`. `b4 dig -a` found only v1, so the committed
version is the submitted revision. Lore mirror shows three positive
reviews and Keith Busch applying it to `nvme-7.1`.

Step 4.2 Record: `b4 dig -w` shows John Garry, Christoph Hellwig, Keith
Busch, Nilay Shroff, and `linux-nvme` were included. Appropriate
maintainers/reviewers were on the thread.

Step 4.3 Record: no `Reported-by` or bugzilla/syzbot link. Related
blktests thread reported a concrete failure: `module refcount not as
original`; John Garry replied that he had posted the kernel fix for it,
referring to this fix.

Step 4.4 Record: related patch thread for `nvme-multipath: fix leak on
try_module_get failure` contains John Garry identifying this missing
`module_put()` on timer cancellation; Keith Busch agreed the check and
`module_put()` were correct and that `_sync` was unnecessary.

Step 4.5 Record: direct lore stable search was blocked by Anubis; web
search did not find stable-specific objections or reasons not to
backport.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: modified function is `nvme_init_ns_head()`.

Step 5.2 Record: `nvme_init_ns_head()` is called by `nvme_alloc_ns()`.
`nvme_alloc_ns()` is called by `nvme_scan_ns()` when a namespace is
discovered during controller scan. Scan work is queued from namespace
change AEN handling, reset/start paths, sysfs rescan, and ioctl rescan
paths.

Step 5.3 Record: key callees around the bug are `cancel_delayed_work()`,
`module_put()`, `nvme_mpath_remove_disk()`, `try_module_get()`,
`mod_delayed_work()`, and `nvme_remove_head_work()`.

Step 5.4 Record: reachable path is: namespace/path removal ->
`nvme_ns_remove()` -> last path -> `nvme_mpath_remove_disk()` ->
`try_module_get()` + delayed work; later namespace reappears -> scan
path -> `nvme_scan_ns()` -> `nvme_alloc_ns()` -> `nvme_init_ns_head()`
-> cancel delayed work. This is reachable through real NVMe multipath
path loss/reconnect and through the blktests NVMe loop scenario. I did
not verify an unprivileged trigger; the tested path writes
`delayed_removal_secs` via sysfs and manipulates NVMe connectivity.

Step 5.5 Record: similar reference-balancing pattern exists elsewhere in
the kernel, e.g. Bluetooth code checks `cancel_delayed_work()` and drops
a reference only when cancellation succeeds. No additional identical
NVMe missing-put sites were found in the searched files.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Step 6.1 Record: local stable branch checks show `stable/linux-6.16.y`,
`6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y` contain
`cancel_delayed_work(&head->remove_work)`,
`try_module_get(THIS_MODULE)`, `module_put(THIS_MODULE)`, and
`delayed_removal_secs`. `stable/linux-6.12.y` does not contain this
delayed-removal code, so this specific bug is not present there.

Step 6.2 Record: `git apply --check` for the patch succeeds on the
current `stable/linux-7.0.y` checkout. The exact hunk context exists in
local `6.16.y` through `7.0.y` branches, so backport difficulty should
be clean or trivial. I did not create separate worktrees for branch-
specific apply checks.

Step 6.3 Record: no alternative fix for this exact cancellation leak was
found in checked stable branches; the candidate commit itself is not an
ancestor of the checked stable branches.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is NVMe host multipath, under
`drivers/nvme/host`. Criticality: important for systems using NVMe
multipath and delayed removal, not universal core-kernel impact.

Step 7.2 Record: subsystem is actively developed; recent history
includes delayed removal introduction, reconnect cancellation, and
related leak fixes. This bug is in relatively new code introduced for
`v6.16`.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: affected users are `CONFIG_NVME_MULTIPATH` users who
enable nonzero `delayed_removal_secs` and experience path loss followed
by namespace/path return before the delayed removal work fires.

Step 8.2 Record: trigger is realistic for NVMe multipath transient path
failures or hot-remove/re-add scenarios. Related blktests exercises this
with NVMe loop. Unprivileged trigger was not verified.

Step 8.3 Record: failure mode is module reference leak, severity medium.
It can prevent `nvme_core` from being unloadable and causes persistent
refcount imbalance; the test discussion shows a concrete refcount
mismatch.

Step 8.4 Record: benefit is moderate for affected NVMe multipath users
because it fixes a real reference leak in a recovery path. Risk is very
low: 2 added lines, no API change, no behavior change except balancing a
reference when pending work is canceled.

## Phase 9: Final Synthesis
Step 9.1 Record: evidence for backporting: real reference leak; concrete
test failure in related blktests discussion; tiny one-function fix;
reviewed by three NVMe developers; maintainer agreed the logic; applies
cleanly to current `7.0.y` and relevant code exists in `6.16+` stable
branches. Evidence against: affects a specific configured NVMe multipath
feature, not all systems; severity is not crash/data
corruption/security. Unresolved: I did not verify exact branch-specific
application in separate worktrees, and direct lore stable search was
blocked.

Step 9.2 Record: stable rules checklist:
1. Obviously correct and tested: yes, by code inspection and related
   blktests discussion; no `Tested-by` tag.
2. Fixes real bug: yes, module reference leak.
3. Important issue: yes enough for stable as a driver lifecycle
   reference leak, though severity is medium rather than critical.
4. Small and contained: yes, 2 insertions/1 deletion in one function.
5. No new feature/API: yes.
6. Can apply to stable: yes for current `7.0.y`; exact context exists in
   checked `6.16+` branches. Not applicable to trees lacking delayed
   removal.

Step 9.3 Record: no exception category such as device ID, quirk, DT,
build, or documentation. This is a normal bug fix.

Step 9.4 Record: backport recommended to stable trees containing NVMe
delayed removal cancellation, notably checked `6.16.y` through `7.0.y`;
not applicable to older trees where the feature is absent.

## Verification
- [Phase 1] Parsed commit `3f150f0f010f2` with `git show --format=fuller
  --stat --patch`; confirmed subject, body, tags, and 2-line fix.
- [Phase 2] Inspected `drivers/nvme/host/core.c` and
  `drivers/nvme/host/multipath.c`; confirmed `try_module_get()` in
  `nvme_mpath_remove_disk()` and `module_put()` in
  `nvme_remove_head_work()`.
- [Phase 2] Checked `kernel/workqueue.c`; confirmed
  `cancel_delayed_work()` returns true if pending work was canceled and
  false otherwise.
- [Phase 3] Ran `git blame` on the changed and related lines; identified
  `dd2c18548964`, `62188639ec16`, and `0f5197ea9a73`.
- [Phase 3] Ran `git show` on related commits; confirmed delayed removal
  introduction, reconnect cancellation, and prior try-module failure
  fix.
- [Phase 4] Ran `b4 dig -c`, `-a`, and `-w`; found original patch URL,
  single v1 revision, and appropriate NVMe recipients.
- [Phase 4] Fetched lore mirror thread; confirmed reviews by Christoph
  Hellwig, Nilay Shroff, Chaitanya Kulkarni, and application by Keith
  Busch.
- [Phase 4] Fetched related try-module failure and blktests threads;
  confirmed this missing cancellation `module_put()` was discussed and
  tied to module refcount mismatch.
- [Phase 5] Used `rg` and file reads to trace callers from
  `nvme_init_ns_head()` through `nvme_alloc_ns()`, `nvme_scan_ns()`, and
  `nvme_scan_work()`.
- [Phase 6] Used `git merge-base --is-ancestor` and branch object
  inspection to confirm the relevant code exists in local `6.16.y`,
  `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`, but not `6.12.y`.
- [Phase 6] Ran `git apply --check` for the candidate diff on current
  `stable/linux-7.0.y`; it succeeded.
- UNVERIFIED: exact `git apply --check` on separate worktrees for every
  stable branch; unprivileged triggerability; direct stable-list search
  results due lore Anubis blocking.

**YES**

 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ac5a7d9781f58..2e8402513189d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4052,7 +4052,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 	mutex_unlock(&ctrl->subsys->lock);
 
 #ifdef CONFIG_NVME_MULTIPATH
-	cancel_delayed_work(&head->remove_work);
+	if (cancel_delayed_work(&head->remove_work))
+		module_put(THIS_MODULE);
 #endif
 	return 0;
 
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-05  9:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260505095149.512052-1-sashal@kernel.org>
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox