[PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks


* [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks
@ 2026-05-05  9:51 Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Takashi Iwai, Sasha Levin, perex, tiwai, linux-sound,
	linux-kernel

From: Takashi Iwai <tiwai@suse.de>

[ Upstream commit 077c593dacf7ee33511468e4f29417d795cf07a4 ]

At parsing the channel mapping for HDMI, the current code may spew
WARN_ON() unnecessarily for the case where only invalid (zero) channel
maps are given from the hardware.  Drop WARN_ON() and reorganize the
code a bit for avoiding the hdmi_slot over the array size.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221390
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260428061800.80527-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics

Step 1.1 Record: subsystem is `ALSA: hda`; action verb is `Avoid`;
claimed intent is to avoid unnecessary `WARN_ON()` during HDMI channel-
map slot checks.

Step 1.2 Record: tags present are `Closes:
https://bugzilla.kernel.org/show_bug.cgi?id=221390`, `Signed-off-by:
Takashi Iwai <tiwai@suse.de>`, `Link:
https://patch.msgid.link/20260428061800.80527-1-tiwai@suse.de`, and
another `Signed-off-by: Takashi Iwai <tiwai@suse.de>` in the supplied
commit text. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`,
`Acked-by:`, or `Cc: stable` tag was present.

Step 1.3 Record: the commit says HDMI channel-map parsing can emit
unnecessary `WARN_ON()` when hardware supplies only invalid zero channel
maps, and also says the code is reorganized to avoid `hdmi_slot` going
over the array size. The visible user symptom is kernel warning spam;
the linked Bugzilla content could not be fetched because Bugzilla
returned an Anubis bot-protection page.

Step 1.4 Record: this is a real bug fix, not just cleanup. The diff
removes a warning path reachable from bad hardware-provided mapping data
and adds an explicit bound exit before programming or storing an invalid
HDMI slot.

## Phase 2: Diff Analysis

Step 2.1 Record: one file changes: `sound/hda/core/hdmi_chmap.c`, with 7
insertions and 4 deletions. The only modified function is
`hdmi_std_setup_channel_mapping()`. Scope is a single-file surgical fix.

Step 2.2 Record: before the patch, the loop skipped empty speaker slots
using `WARN_ON(hdmi_slot >= 8)` as part of the while condition, but
after the warning fired the loop still proceeded to assign `hdmi_slot`
into `hdmi_channel_mapping`. After the patch, the outer loop stops when
`hdmi_slot` reaches 8, the inner skip loop jumps to `out` before
assignment when the slot limit is reached, and the remaining slots are
filled with the invalid ALSA channel marker as before.
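
The control-flow change described in Step 2.2 can be modelled outside the
kernel. The following is a minimal stand-alone sketch, not the kernel
function: `scan_slots_fixed()`, `HDMI_SLOTS`, and the flat `mapping[]`
array are hypothetical names introduced for illustration, and the early
`return` stands in for the patch's `goto out`.

```c
#include <assert.h>

#define HDMI_SLOTS 8	/* hypothetical name for the 8-slot HDMI limit */

/*
 * Stand-alone model of the patched slot scan (assumption: simplified from
 * the diff, not the real hdmi_std_setup_channel_mapping()).
 * speakers[] plays the role of ch_alloc->speakers[]; zero means "no speaker".
 * mapping[i] receives (ALSA channel << 4) | HDMI slot.
 * Returns how many channels were assigned a slot.
 */
static int scan_slots_fixed(const int speakers[HDMI_SLOTS], int channels,
			    int mapping[HDMI_SLOTS])
{
	int hdmi_slot = 0;
	int i;

	for (i = 0; i < channels && hdmi_slot < HDMI_SLOTS; i++) {
		while (!speakers[7 - hdmi_slot]) {
			/* skip zero slots; leave before using slot 8 */
			if (++hdmi_slot >= HDMI_SLOTS)
				return i;
		}
		/* only in-range slots reach the store */
		assert(hdmi_slot < HDMI_SLOTS);
		mapping[i] = (i << 4) | hdmi_slot++;
	}
	return i;
}
```

With an all-zero `speakers[]` table the sketch returns 0 without storing
anything, which corresponds to the case where the old loop fired
`WARN_ON()` and then still used the over-limit slot value.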

Step 2.3 Record: bug category is logic/bounds handling plus spurious
warning removal. The broken mechanism is that `hdmi_slot` can reach the
8-slot HDMI limit while looking for a nonzero speaker slot, causing a
`WARN_ON()` and allowing an invalid slot number to be used in the
mapping path. The fix prevents the over-limit assignment.

Step 2.4 Record: the fix is obviously local and low-risk. It changes
only the invalid/edge path where slot scanning reaches the HDMI 8-slot
limit. It does not add APIs, data structures, or features.

## Phase 3: Git History Investigation

Step 3.1 Record: `git blame` shows the function and mapping logic came
from `2f6e8a8518f3` (`ALSA: hda - Move chmap support helpers/ops to
core`, first contained in `v4.6-rc1`), and the current
`WARN_ON(hdmi_slot >= 8)` ordering came from `960a581e22d9` (`ALSA: hda:
fix some klockwork scan warnings`, first contained in `v4.8-rc1`).

Step 3.2 Record: no `Fixes:` tag is present, so there was no tagged
introducing commit to follow. Blame identifies `960a581e22d9` as the
source of the current warning expression.

Step 3.3 Record: recent history for this file is sparse. Relevant later
changes include `c2432466f583c` HDMI/DP chmap reporting, `6278056e42d9`
input value sanity checks, and `b2660d1ebde` moving the file into
`sound/hda/core`. No prerequisite for this patch was identified.

Step 3.4 Record: author Takashi Iwai is listed in `MAINTAINERS` as a
maintainer for `SOUND`, and recent git history shows many HDA commits
authored/merged by him. This is a strong subsystem-authority signal.

Step 3.5 Record: no dependency on newer structures or helper functions
was found. The affected code pattern exists in active stable trees;
older trees before `v6.17` use the path `sound/hda/hdmi_chmap.c` instead
of `sound/hda/core/hdmi_chmap.c`.

## Phase 4: Mailing List And External Research

Step 4.1 Record: an exact candidate commit SHA was not found by subject
on the searched refs, so `b4 dig -c <candidate>` could not be run for
the candidate. Using the message-id instead, `b4 am
20260428061800.80527-1-tiwai@suse.de` found the patch, confirmed a
single patch, DKIM-signed by `suse.de`, and reported that it applies
cleanly to the current tree. `b4 am -c` found no newer revision.

Step 4.2 Record: `b4 mbox` showed the original patch was sent by Takashi
Iwai to `linux-sound@vger.kernel.org` and Cc’d
`linux-kernel@vger.kernel.org`. The fetched thread contained one message
and no code-review replies or review trailers.

Step 4.3 Record: the patch has a Bugzilla `Closes:` link, but WebFetch
for Bugzilla was blocked by Anubis. I could not verify reporter severity
or duplicates from Bugzilla content.

Step 4.4 Record: `b4 am` reported one total patch, and no series context
or newer revision was found. This appears standalone.

Step 4.5 Record: web searches did not find stable-list discussion for
the exact title. No stable-specific objection was found.

## Phase 5: Code Semantic Analysis

Step 5.1 Record: modified function is
`hdmi_std_setup_channel_mapping()`.

Step 5.2 Record: `rg` shows `hdmi_std_setup_channel_mapping()` is called
only by `snd_hdac_setup_channel_mapping()`. That exported helper is
called from HDA HDMI setup in `sound/hda/codecs/hdmi/hdmi.c` and ASoC
HDMI codec setup in `sound/soc/codecs/hdac_hdmi.c`.

Step 5.3 Record: the affected function computes slot/channel assignments
and calls `chmap->ops.pin_set_slot_channel()`, which writes the
`AC_VERB_SET_HDMI_CHAN_SLOT` HDA verb through the codec operation.

Step 5.4 Record: the path is reachable during HDMI/DP audio setup: PCM
prepare, ELD/presence update, channel-map control update when prepared,
and ASoC DAPM pin power-up call into the infoframe setup path. This is a
real runtime hardware path, not dead code.

Step 5.5 Record: `rg` found no other `WARN_ON(hdmi_slot >= 8)` or
similar `hdmi_slot` warning pattern elsewhere in `sound/`.

## Phase 6: Cross-Referencing And Stable Tree Analysis

Step 6.1 Record: the exact `WARN_ON(hdmi_slot >= 8)` pattern exists in
active stable branches checked: `stable/linux-5.10.y`, `5.15.y`,
`6.1.y`, `6.6.y`, `6.12.y`, `6.18.y`, `6.19.y`, and `7.0.y`.

Step 6.2 Record: the backport should apply cleanly or need only minor
adjustment. `git apply --check` succeeds on the current `7.0.y`
checkout. For `5.10.y` through `6.12.y`, the code is effectively the
same but the file path is `sound/hda/hdmi_chmap.c`; `6.18.y+` uses
`sound/hda/core/hdmi_chmap.c`.

Step 6.3 Record: searches for the exact subject and bug number in stable
history found no existing matching stable fix.

## Phase 7: Subsystem And Maintainer Context

Step 7.1 Record: subsystem is ALSA HDA HDMI channel mapping under
`sound/`. Criticality is IMPORTANT: it affects users of HDA HDMI/DP
audio hardware, not all kernel users.

Step 7.2 Record: subsystem activity is active; recent `sound/hda`
history contains HDA HDMI fixes, quirks, cleanups, and device IDs. The
touched file itself has low churn.

## Phase 8: Impact And Risk Assessment

Step 8.1 Record: affected users are driver/hardware-specific: systems
using HDA HDMI/DP audio channel mapping, especially hardware that
reports invalid zero channel-map data.

Step 8.2 Record: trigger is HDMI channel-map setup with invalid/zero
mapping data from hardware, reached during audio infoframe/channel
setup. I did not verify whether an unprivileged user can force the exact
bad hardware data; the runtime setup path itself is reachable during
normal audio/display operation.

Step 8.3 Record: failure mode is warning splat/log spam and invalid slot
handling. Severity is MEDIUM normally, and potentially higher on systems
configured with `panic_on_warn`, but I did not verify `panic_on_warn` in
the bug report.

Step 8.4 Record: benefit is medium: avoids a user-reported warning and
prevents invalid HDMI slot use on affected hardware. Risk is very low:
11-line single-function change in an edge path with no API or behavior
expansion.

## Phase 9: Final Synthesis

Step 9.1 Record: evidence for backporting: user-reported Bugzilla-linked
issue, maintainer-authored ALSA fix, small and isolated diff, removes a
spurious runtime `WARN_ON()`, adds explicit bounds handling, code exists
across active stable trees, and current-tree apply check succeeds.
Evidence against: primary verified symptom is warning/log noise rather
than confirmed crash, data corruption, or security issue; Bugzilla
details could not be fetched; older stable trees need a path-adjusted
backport because of the `sound/hda/core` rename.

Step 9.2 Record: stable rules checklist:
1. Obviously correct and tested: yes for obvious correctness by
   inspection; external testing is not evidenced by tags.
2. Fixes a real bug: yes, commit has a Bugzilla `Closes:` and fixes a
   concrete warning/bounds condition.
3. Important issue: moderate importance; warnings from hardware-
   triggered paths are stable-worthy when the fix is tiny and safe.
4. Small and contained: yes, one function, one file, 7 insertions and 4
   deletions.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current path; minor path adjustment
   needed for older stable branches.

Step 9.3 Record: no automatic exception category applies. This is not a
device ID, quirk table entry, DT update, build fix, or documentation
fix.

Step 9.4 Record: decision is YES. The patch is small, maintainer-
authored, fixes a real user-reported WARN/bounds handling issue in
existing HDA HDMI code present across stable trees, and has very low
regression risk.

## Verification

- [Phase 1] Read the supplied commit message and the fetched mbox:
  confirmed subject, body, `Closes`, and `Signed-off-by`.
- [Phase 2] Read the b4-fetched patch: confirmed
  `sound/hda/core/hdmi_chmap.c`, 7 insertions, 4 deletions, one hunk in
  `hdmi_std_setup_channel_mapping()`.
- [Phase 3] Ran `git blame` on the changed region: confirmed
  `2f6e8a8518f3` and `960a581e22d9` history.
- [Phase 3] Ran `git describe --contains`: confirmed `2f6e8a8518f3`
  appears in `v4.6-rc1` ancestry and `960a581e22d9` in `v4.8-rc1`
  ancestry.
- [Phase 3] Checked file history and author history: no prerequisite
  commit identified; Takashi Iwai is active in HDA history.
- [Phase 4] `b4 am` by message-id: found the exact patch, one total
  patch, DKIM signed, applies cleanly to current tree.
- [Phase 4] `b4 am -c`: found no newer revision.
- [Phase 4] `b4 mbox`: confirmed To/Cc and that the thread has one
  message.
- [Phase 4] WebFetch for lore and Bugzilla: blocked by Anubis, so bug
  details beyond the `Closes` URL are unverified.
- [Phase 5] `rg` call tracing: confirmed caller chain through
  `snd_hdac_setup_channel_mapping()` into HDA HDMI and ASoC HDMI setup
  paths.
- [Phase 5] Read caller code: confirmed use during audio infoframe
  setup, ELD/presence handling, PCM prepare, DAPM power-up, and prepared
  chmap update paths.
- [Phase 6] `git grep` on active stable refs: confirmed the exact
  `WARN_ON(hdmi_slot >= 8)` pattern exists in checked `5.10.y` through
  `7.0.y`.
- [Phase 6] `git diff` across stable refs: confirmed older branches
  mostly differ by file rename, with `5.10.y/5.15.y` having one
  unrelated PCM runtime-state line difference.
- [Phase 6] `git apply --check`: candidate patch applies cleanly to the
  current `7.0.y` checkout.
- [Phase 7] `MAINTAINERS` search: confirmed Takashi Iwai is listed as a
  `SOUND` maintainer.
- [Phase 8] Severity/risk assessment is based on the verified diff,
  commit message, and runtime call paths.
- UNVERIFIED: exact Bugzilla reporter details, number of affected users,
  and whether the reporter hit a panic rather than warning spam.
- UNVERIFIED: exact upstream commit SHA, because the subject was not
  found on searched local refs; candidate-specific `b4 dig -c` could not
  be performed.

**YES**

 sound/hda/core/hdmi_chmap.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/sound/hda/core/hdmi_chmap.c b/sound/hda/core/hdmi_chmap.c
index 7b276047f85a7..c897fc443467c 100644
--- a/sound/hda/core/hdmi_chmap.c
+++ b/sound/hda/core/hdmi_chmap.c
@@ -353,13 +353,16 @@ static void hdmi_std_setup_channel_mapping(struct hdac_chmap *chmap,
 	if (hdmi_channel_mapping[ca][1] == 0) {
 		int hdmi_slot = 0;
 		/* fill actual channel mappings in ALSA channel (i) order */
-		for (i = 0; i < ch_alloc->channels; i++) {
-			while (!WARN_ON(hdmi_slot >= 8) &&
-			       !ch_alloc->speakers[7 - hdmi_slot])
-				hdmi_slot++; /* skip zero slots */
+		for (i = 0; i < ch_alloc->channels && hdmi_slot < 8; i++) {
+			while (!ch_alloc->speakers[7 - hdmi_slot]) {
+				/* skip zero slots */
+				if (++hdmi_slot >= 8)
+					goto out;
+			}

 			hdmi_channel_mapping[ca][i] = (i << 4) | hdmi_slot++;
 		}
+	out:
 		/* fill the rest of the slots with ALSA channel 0xf */
 		for (hdmi_slot = 0; hdmi_slot < 8; hdmi_slot++)
 			if (!ch_alloc->speakers[7 - hdmi_slot])
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic Sasha Levin
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Shivam Kumar, Christoph Hellwig, Keith Busch, Sasha Levin, sagi,
	kch, linux-nvme, linux-kernel

From: Shivam Kumar <kumar.shivam43666@gmail.com>

[ Upstream commit 4606467a75cfc16721937272ed29462a750b60c8 ]

In nvmet_tcp_try_recv_ddgst(), when a data digest mismatch is detected,
nvmet_req_uninit() is called unconditionally. However, if the command
arrived via the nvmet_tcp_handle_req_failure() path, nvmet_req_init()
had returned false and percpu_ref_tryget_live() was never executed. The
unconditional percpu_ref_put() inside nvmet_req_uninit() then causes a
refcount underflow, leading to a WARNING in
percpu_ref_switch_to_atomic_rcu, a use-after-free diagnostic, and
eventually a permanent workqueue deadlock.

Check cmd->flags & NVMET_TCP_F_INIT_FAILED before calling
nvmet_req_uninit(), matching the existing pattern in
nvmet_tcp_execute_request().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shivam Kumar <kumar.shivam43666@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Phase 1 Record:
Subject subsystem is `nvmet-tcp`, action verb is `check`, intent: avoid
calling `nvmet_req_uninit()` after failed request initialization in the
data-digest error path. Tags found: `Reviewed-by: Christoph Hellwig
<hch@lst.de>`, `Signed-off-by: Shivam Kumar`, `Signed-off-by: Keith
Busch`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable` tag in the committed version I verified. The body describes a
real refcount bug: `nvmet_req_init()` can fail before
`percpu_ref_tryget_live()`, while `nvmet_req_uninit()` always does
`percpu_ref_put()`. Hidden bug fix: yes, it prevents refcount underflow,
UAF diagnostics, and a workqueue deadlock.

Phase 2 Record:
One file changed: `drivers/nvme/target/tcp.c`, 2 insertions and 1
deletion, only in `nvmet_tcp_try_recv_ddgst()`. Before: a data digest
mismatch always called `nvmet_req_uninit(&cmd->req)`. After: it skips
that call when `cmd->flags & NVMET_TCP_F_INIT_FAILED`. Bug category:
reference-counting bug with severe follow-on failure. Fix quality:
small, surgical, matches the existing `nvmet_tcp_execute_request()`
handling of `NVMET_TCP_F_INIT_FAILED`; regression risk is very low
because it only suppresses an invalid put on a command whose init
failed.

Phase 3 Record:
`git blame` showed the current direct `nvmet_req_uninit()` line came
from `0700542a823b` (`nvmet-tcp: remove nvmet_tcp_finish_cmd`), but
older stable trees had the same underlying behavior through
`nvmet_tcp_finish_cmd()`. `git describe --contains` places the original
nvmet-tcp code at `v5.0-rc1` and the direct-inline refactor at
`v6.1-rc1`. The init-failure flag and execute-path guard were already
present in stable versions I checked. Recent history contains other
nvmet-tcp fixes, but `git apply --check` confirmed this candidate
applies to the current 7.0.y tree without prerequisites. Author has at
least one other nvmet target fix in the local history; review/commit
path is stronger signal here than author history.

Phase 4 Record:
`b4 dig -c 4606467a75cfc` found the original lore submission at
`https://patch.msgid.link/20260318225658.1760759-1-kumar.shivam43666@gmail.com`.
`b4 dig -a` showed only v1, no superseding revision. `b4 dig -w` showed
recipients included `gregkh`, `security@kernel.org`, Christoph Hellwig,
Sagi Grimberg, Keith Busch, and `linux-nvme`. The infradead archive copy
matched the diff exactly. Christoph replied “Looks good to me” and gave
`Reviewed-by`, while noting he would prefer someone more familiar with
TCP code also look. I found no NAK. A stable lore query via `WebFetch`
was blocked by Anubis; web search did not reveal separate stable-
specific discussion.

Phase 5 Record:
Modified function: `nvmet_tcp_try_recv_ddgst()`. Call chain verified
locally: socket callbacks queue `nvmet_tcp_io_work()`, which calls
`nvmet_tcp_try_recv()`, then `nvmet_tcp_try_recv_one()`, then
`nvmet_tcp_try_recv_ddgst()` when `rcv_state == NVMET_TCP_RECV_DDGST`.
The failing path is reachable by NVMe/TCP target receive processing when
data digest is enabled and a write command with failed
`nvmet_req_init()` still has inline data to drain. Callees verified:
`nvmet_req_uninit()` unconditionally calls
`percpu_ref_put(&req->sq->ref)`, and `nvmet_req_init()` only takes that
ref after earlier validation and parsing succeeds.

Phase 6 Record:
Stable-code checks show the vulnerable pattern exists in `v5.4`,
`v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and `v7.0`,
though `v5.x` uses `nvmet_tcp_finish_cmd()` so those trees need a small
backport adjustment. For `v6.1+`, the same flag, execute-path guard, and
unconditional digest-error uninit pattern are present. Current 7.0.y
accepts the candidate patch with `git apply --check`.

Phase 7 Record:
Subsystem is NVMe target over TCP: storage/network transport driver,
important for systems exporting NVMe-oF targets. Activity level is
active; recent local `block-next` history includes several nvmet-tcp
fixes around the same file.

Phase 8 Record:
Affected users are systems running `nvmet-tcp` targets with data digest
enabled. Trigger requires a command init failure path plus later data
digest mismatch while stale write data is being drained, so not
universal, but reachable through remote NVMe/TCP protocol traffic.
Failure severity is high to critical: verified commit text and code
mechanism support refcount underflow from unmatched `percpu_ref_put()`,
with reported consequences of warning, UAF diagnostic, and permanent
workqueue deadlock. Benefit is high for affected targets; risk is very
low due to a two-line guard on an error path.

Phase 9 Record:
Evidence for backporting: real refcounting bug, severe hang/deadlock
failure mode, small fix, reviewed by Christoph Hellwig, applies cleanly
to 7.0.y, and the vulnerable pattern exists across active stable lines.
Evidence against: no `Fixes:`/`Cc: stable`/`Reported-by`, no
`Tested-by`, local `master` did not contain the commit while `block-next`
did, and `v5.x` needs an adjusted backport around
`nvmet_tcp_finish_cmd()`.
Stable checklist: obviously correct yes; fixes a real bug yes; important
issue yes; small and contained yes; no new feature/API yes; applies
cleanly to 7.0.y and likely minor/manual adjustment for older stable
trees.

## Problem And Stable Value
The commit prevents `nvmet_req_uninit()` from dropping a request
reference that was never acquired. Code inspection confirms that
`nvmet_req_init()` only does `percpu_ref_tryget_live()` after
validation/parsing succeeds, while `nvmet_req_uninit()` always calls
`percpu_ref_put()`. On the `NVMET_TCP_F_INIT_FAILED` path, calling
uninit is therefore invalid.

This matters for stable because the failure is not cosmetic: the
described and mechanically verified outcome is refcount underflow, with
potential UAF diagnostics and permanent workqueue deadlock in an
NVMe/TCP target. The change is as small as this kind of fix gets and
mirrors an existing guard in the same driver.

## Risk / Benefit
Benefit is high for affected NVMe/TCP target deployments, especially
because a transport-level bad digest/error path should not be able to
wedge the target workqueue. Risk is very low: it changes only an error
path and only skips cleanup that is invalid when init failed. Buffer
cleanup and fatal error handling still run.

## Verification
- Phase 1: `git show 4606467a75cfc` verified commit message, tags,
  author, committer, and exact diff.
- Phase 2: Diff analysis verified only `drivers/nvme/target/tcp.c`
  changes, 2 insertions/1 deletion in `nvmet_tcp_try_recv_ddgst()`.
- Phase 3: `git blame` verified the direct uninit line came from
  `0700542a823b`; `git show 0700542a823b` verified it was a helper
  removal preserving the same uninit/free behavior.
- Phase 3: `git describe --contains` verified relevant code ancestry:
  original nvmet-tcp code in `v5.0-rc1`, direct inline refactor in
  `v6.1-rc1`.
- Phase 4: `b4 dig -c 4606467a75cfc`, `-a`, and `-w` verified lore
  match, single v1 revision, and original recipients.
- Phase 4: WebFetch of infradead patch and reply verified patch contents
  and Christoph Hellwig’s Reviewed-by plus caveat.
- Phase 5: `rg` and file reads verified the receive call chain and
  `nvmet_req_init()`/`nvmet_req_uninit()` reference behavior.
- Phase 6: Stable tag checks verified vulnerable patterns in `v5.4`,
  `v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.18`, `v6.19`, and
  `v7.0`.
- Phase 6: `git apply --check` verified the candidate patch applies to
  the current 7.0.y tree.
- Unverified: I could not verify a separate stable-mailing-list
  discussion because the lore stable query was blocked by Anubis.

**YES**

 drivers/nvme/target/tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index acc71a26733f9..a456dd2fd8bd1 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1310,7 +1310,8 @@ static int nvmet_tcp_try_recv_ddgst(struct nvmet_tcp_queue *queue)
 			queue->idx, cmd->req.cmd->common.command_id,
 			queue->pdu.cmd.hdr.type, le32_to_cpu(cmd->recv_ddgst),
 			le32_to_cpu(cmd->exp_ddgst));
-		nvmet_req_uninit(&cmd->req);
+		if (!(cmd->flags & NVMET_TCP_F_INIT_FAILED))
+			nvmet_req_uninit(&cmd->req);
 		nvmet_tcp_free_cmd_buffers(cmd);
 		nvmet_tcp_fatal_error(queue);
 		ret = -EPROTO;
-- 
2.53.0


* [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op Sasha Levin
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Lijo Lazar, Asad Kamal, Alex Deucher, Sasha Levin, kenneth.feng,
	christian.koenig, airlied, simona, amd-gfx, dri-devel,
	linux-kernel

From: Lijo Lazar <lijo.lazar@amd.com>

[ Upstream commit d6b99885b122528651d554a7bd907211a81579c2 ]

If only one level is enabled in clock table, there is no need to
follow the fine grained clock logic which expects a minimum of
two levels (min/max).

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7f19097af1496dd908a044ca95862f32d05f02df)
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `drm/amd/pm`; action verb `Update`; claimed
intent is to change AMD SMU clock-level sysfs emission when a DPM clock
table has only one enabled level.

Step 1.2 Record: tags present are `Signed-off-by: Lijo Lazar
<lijo.lazar@amd.com>`, `Reviewed-by: Asad Kamal <asad.kamal@amd.com>`,
`Signed-off-by: Alex Deucher <alexander.deucher@amd.com>`, and a
cherry-pick marker for `7f19097af149...`. No `Fixes:`, `Reported-by:`,
`Tested-by:`, `Link:`, or `Cc: stable@vger.kernel.org` tag was present
in the supplied commit message.

Step 1.3 Record: the body says fine-grained clock logic expects two
levels, min and max, so it should not be used when the table has only
one level. Symptom is incorrect sysfs clock-level emission for that one-
level case; no crash, stack trace, affected version, or reproducer is
described.

Step 1.4 Record: this is a hidden logic/correctness bug fix, not a
cleanup. The bug is that a one-level table can be treated as fine-
grained min/max output.

## Phase 2: Diff Analysis
Step 2.1 Record: one file changed,
`drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c`; one-line condition change in
`smu_cmn_print_dpm_clk_levels()`. Scope is a single-file surgical driver
fix.

Step 2.2 Record: before, any table marked `SMU_DPM_TABLE_FINE_GRAINED`
used the fine-grained path, which forces `count = 2` and emits min/max-
style output. After, `count == 1` tables use the discrete-table path and
emit exactly the real table entries.
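
The path selection described in Step 2.2 can be sketched as a small
decision function. The exact upstream condition is not reproduced here,
so the test below is an assumption consistent with the description;
`pick_emit_path()`, `levels_emitted()`, and the `emit_path` enum are
hypothetical names, not part of the AMD driver.

```c
#include <assert.h>
#include <stdbool.h>

enum emit_path { EMIT_DISCRETE, EMIT_FINE_GRAINED };

/*
 * Assumed post-patch selection: a fine-grained table with a single
 * level falls back to the discrete path.
 */
static enum emit_path pick_emit_path(bool is_fine_grained, unsigned int count)
{
	assert(count >= 1);	/* empty tables are outside this sketch */
	if (!is_fine_grained || count == 1)
		return EMIT_DISCRETE;
	return EMIT_FINE_GRAINED;
}

/* Lines emitted: the fine-grained path forces 2 (min/max), else count. */
static unsigned int levels_emitted(bool is_fine_grained, unsigned int count)
{
	if (pick_emit_path(is_fine_grained, count) == EMIT_FINE_GRAINED)
		return 2;
	return count;
}
```

Under this model a fine-grained table with `count == 1` reports one
level rather than a two-level min/max pair, while every other
combination behaves as it did before.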

Step 2.3 Record: bug category is logic/correctness in user-visible sysfs
output. The specific broken mechanism is the fine-grained display path
assuming two levels even when `dpm_table->count` is one.

Step 2.4 Record: fix quality is high: it preserves all existing behavior
except the verified `is_fine_grained && count == 1` case. Regression
risk is very low because non-fine-grained tables and fine-grained tables
with more than one level are unchanged.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows `smu_cmn_print_dpm_clk_levels()` and
the original `if (!is_fine_grained)` condition came from
`a08ea4bc7711f8` (`drm/amd/pm: Add a helper to show dpm table`). Later
fine-grained formatting changes came from `d81e52fc61fb9` (`drm/amd/pm:
fix issue of missing '*' on pp_dpm_xxx nodes`). `a08ea4bc7711f8` is
contained in `v7.0`, `v7.0.1`, `v7.0.2`, and `v7.0.3`, but not `v6.19`.

Step 3.2 Record: no `Fixes:` tag is present, so there was no Fixes
target to follow. Related history identifies `a08ea4bc7711f8` as the
likely introducing commit.

Step 3.3 Record: recent file history shows a series of AMD PM
helper/refactor commits, including `a08ea4bc7711f8` and `d81e52fc61fb9`.
The target patch is standalone on top of code present in `v7.0.y`.

Step 3.4 Record: Lijo Lazar authored the helper introduction and has
multiple recent commits under `drivers/gpu/drm/amd/pm`, so the author is
an active AMD PM contributor.

Step 3.5 Record: dependency is the helper itself. The affected function
is absent in `v6.19`, so older stable trees do not need this patch. For
`v7.0.y`, the context exists and the backport should be trivial.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 7f19097af149...`, `-a`, and `-w` all failed
because the local repository does not contain that commit object.
`WebFetch` to lore and git.kernel.org was blocked by Anubis. Web
searches by exact subject/hash/body did not find the original target
patch.

Step 4.2 Record: b4 recipient data could not be obtained. The only
verified review signal is the commit-message `Reviewed-by: Asad Kamal`.

Step 4.3 Record: no bug-report link or reporter tag exists. No syzbot,
Bugzilla, or user-report evidence was found.

Step 4.4 Record: related external context found the earlier AMD PM
`Remove print_clock_levels` series and a `Use one level table if dpm not
enabled` patch, but not this exact patch. These support the broader area
of clock table/sysfs work but are not direct evidence for this commit.

Step 4.5 Record: stable-list search via lore was blocked; web search
found no stable-specific discussion.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: modified function is `smu_cmn_print_dpm_clk_levels()`.

Step 5.2 Record: callers found in AMD SMU ppt files include
`smu_v14_0_2_ppt.c`, `smu_v13_0_7_ppt.c`, `smu_v13_0_6_ppt.c`,
`smu_v13_0_0_ppt.c`, `aldebaran_ppt.c`, `sienna_cichlid_ppt.c`,
`navi10_ppt.c`, and `arcturus_ppt.c`.

Step 5.3 Record: key callees/macros are `sysfs_emit_at()`,
`smu_cmn_freqs_match()`, `SMU_DPM_TABLE_MIN()`, and
`SMU_DPM_TABLE_MAX()`.

Step 5.4 Record: verified call chain is sysfs read path
`amdgpu_get_pp_dpm_clock()` / `amdgpu_get_pp_od_clk_voltage()` ->
`amdgpu_dpm_emit_clock_levels()` -> `smu_emit_ppclk_levels()` -> ASIC-
specific `emit_clk_levels()` -> `smu_cmn_print_dpm_clk_levels()`. The
buggy path is reachable through AMDGPU PM sysfs clock-level reads.

Step 5.5 Record: similar one-level DPM table setup exists in multiple
AMD SMU ppt files, and fine-grained flags are set in several SMU
generations. This verifies that `count == 1` and fine-grained clock-
table handling are both real local patterns.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: `git grep` confirmed `smu_cmn_print_dpm_clk_levels()`
is absent from `v6.19` and present in `v7.0`. Tags containing the helper
include `v7.0` through `v7.0.3`.

Step 6.2 Record: expected backport difficulty to `v7.0.y` is clean or
near-clean: the patch changes one existing condition in code verified
present in `v7.0`.

Step 6.3 Record: local `git log --grep` searches did not find this exact
target commit in current, `stable/linux-7.0.y`, `master`, or
`graphics-next`. Related fix `d81e52fc61fb9` is already in `v7.0`.

## Phase 7: Subsystem Context
Step 7.1 Record: subsystem is DRM AMDGPU power management. Criticality
is driver-specific but important for affected AMD GPU users because it
affects exported PM sysfs clock-level state.

Step 7.2 Record: `drivers/gpu/drm/amd/pm/swsmu` is actively developed;
recent history shows many AMD PM and SMU changes.

## Phase 8: Impact And Risk
Step 8.1 Record: affected users are AMDGPU users on SMU generations
using `smu_cmn_print_dpm_clk_levels()` with a fine-grained DPM table
whose count is one.

Step 8.2 Record: trigger is reading relevant AMDGPU PM sysfs files such
as `pp_dpm_*` / OD clock output. Whether unprivileged users can read
every affected node was not verified.

Step 8.3 Record: verified failure mode is incorrect sysfs clock-level
reporting: the old code can emit min/max-style two-level output for a
one-level table. I verified the paired write path parses numeric levels
and some fine-grained force-clock paths clamp to level 0/1 without
checking `count`, but I did not verify a concrete hardware failure from
writing the falsely advertised level.
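
The difference between the old and new condition can be sketched in a
small userspace model. This is not the driver code: `struct dpm_table`,
`emit_levels()`, the `fix_applied` switch, and the output format are
hypothetical simplifications, and only the `!is_fine_grained || count ==
1` condition is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a DPM clock table. */
struct dpm_table {
	unsigned int freq_mhz[8];
	unsigned int count;
	bool is_fine_grained;
};

/*
 * Writes one "index: freq" line per emitted level into buf and returns
 * the number of levels emitted. fix_applied == false models the old
 * condition, which sends a one-entry fine-grained table down the
 * min/max branch. The sketch assumes buf is large enough for the output.
 */
static int emit_levels(const struct dpm_table *t, bool fix_applied,
		       char *buf, size_t len)
{
	size_t off = 0;
	int emitted = 0;
	unsigned int i;

	assert(t->count >= 1 && t->count <= 8);

	if (!t->is_fine_grained || (fix_applied && t->count == 1)) {
		/* Discrete path: one line per real table entry. */
		for (i = 0; i < t->count; i++) {
			off += (size_t)snprintf(buf + off, len - off,
						"%u: %uMhz\n", i,
						t->freq_mhz[i]);
			emitted++;
		}
	} else {
		/* Fine-grained path: always reports a min and a max. */
		off += (size_t)snprintf(buf + off, len - off, "0: %uMhz\n",
					t->freq_mhz[0]);
		off += (size_t)snprintf(buf + off, len - off, "1: %uMhz\n",
					t->freq_mhz[t->count - 1]);
		emitted = 2;
	}
	return emitted;
}
```

In this model a fine-grained table with `count == 1` yields two levels
under the old condition and one level under the new one, while tables
with more entries take the same branch either way.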

Step 8.4 Record: benefit is moderate for affected AMD GPU users because
it fixes misleading PM sysfs state. Risk is very low because the patch
is one conditional change and leaves all other cases unchanged.

## Phase 9: Final Synthesis
Evidence for backporting: real user-visible logic bug, one-line fix,
reviewed by an AMD PM developer, authored by the contributor who
introduced the helper, code exists in `v7.0.y`, and regression risk is
very low.

Evidence against backporting: no verified crash, security issue, data
corruption, deadlock, reporter, reproducer, `Fixes:` tag, `Cc: stable`,
or accessible lore discussion. It is not relevant to stable trees older
than `v7.0`.

Stable rules checklist: obviously correct: yes by local code inspection.
Tested: no `Tested-by`, but reviewed. Real bug: yes, wrong sysfs clock-
level output. Important: borderline, but it affects a stable user-
visible PM interface and can advertise a non-real level.
Small/contained: yes, one line in one driver helper. No new
features/APIs: yes. Applies to stable: yes for `v7.0.y`; no need for
older stable trees lacking the helper.

Exception category: none.

## Verification
- Phase 1: parsed supplied subject/body/tags; confirmed no reporter,
  link, Fixes, Tested, or stable tag in the supplied message.
- Phase 2: inspected the supplied diff and current
  `smu_cmn_print_dpm_clk_levels()` implementation.
- Phase 3: ran `git blame` on the affected lines; identified
  `a08ea4bc7711f8` and `d81e52fc61fb9`.
- Phase 3: ran `git show a08ea4bc7711f8` and `git show d81e52fc61fb9`;
  confirmed helper introduction and related missing-star fix.
- Phase 3/6: ran tag/ancestor checks; confirmed helper is in `v7.0*`
  tags and not in `v6.19`.
- Phase 4: ran `b4 dig` variants; all failed because the commit object
  was not locally resolvable.
- Phase 4: attempted lore/git.kernel.org fetches; blocked by Anubis.
- Phase 4: searched web by exact subject/hash/body; did not find the
  target discussion.
- Phase 5: searched callers with `rg`; confirmed multiple AMD SMU ppt
  callers.
- Phase 5: read sysfs call chain in `amdgpu_pm.c`, `amdgpu_dpm.c`, and
  `amdgpu_smu.c`.
- Phase 5/8: inspected write-path parsing and force-clock logic;
  verified possible level 0/1 handling, but not a concrete hardware
  failure.
- Phase 7: ran recent `git log` for `drivers/gpu/drm/amd/pm/swsmu`;
  confirmed active subsystem churn.
- UNVERIFIED: exact upstream lore discussion, all review comments, exact
  commit date of `7f19097af149...`, and concrete hardware impact beyond
  incorrect sysfs output.

This should be backported to stable trees that contain the helper,
especially `v7.0.y`; it should not be applied to older stable trees
where the helper does not exist.

**YES**

 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 6fd50c2fd20e0..37de6022581ed 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -1310,7 +1310,7 @@ int smu_cmn_print_dpm_clk_levels(struct smu_context *smu,
 		level_index = 1;
 	}

-	if (!is_fine_grained) {
+	if (!is_fine_grained || count == 1) {
 		for (i = 0; i < count; i++) {
 			freq_match = !is_deep_sleep &&
 				     smu_cmn_freqs_match(
-- 
2.53.0

* [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer Sasha Levin
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Fredric Cover, Steve French, Sasha Levin, sfrench, linux-cifs,
	samba-technical, linux-kernel

From: Fredric Cover <fredric.cover.lkernel@gmail.com>

[ Upstream commit 8e13b1b4093e0cbcb3dc2906c13b1fdc95cdf0a0 ]

Currently, smb2_compound_op() allocates
struct smb2_compound_vars *vars using GFP_ATOMIC, although
smb2_compound_op() can sleep when it calls compound_send_recv()
before vars is freed.

Allocate vars using GFP_KERNEL.

Signed-off-by: Fredric Cover <fredric.cover.lkernel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `smb: client`; action verb `change`; intent
is to change `smb2_compound_op()` allocation from `GFP_ATOMIC` to
`GFP_KERNEL`.

Step 1.2 Record: Tags present: `Signed-off-by: Fredric Cover
<fredric.cover.lkernel@gmail.com>` and `Signed-off-by: Steve French
<stfrench@microsoft.com>`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Reviewed-by:`, `Acked-by:`, `Link:`, or `Cc: stable@vger.kernel.org`
tags were present.

Step 1.3 Record: The commit body says `smb2_compound_op()` allocates
`struct smb2_compound_vars *vars` with `GFP_ATOMIC`, but later calls
`compound_send_recv()` before freeing `vars`, and that path can sleep.
Symptom is not a crash report; the root cause is an over-restrictive
allocation mode in a sleepable path.

Step 1.4 Record: This is a hidden correctness/reliability fix, not a
feature. `GFP_ATOMIC` is allowed only for non-sleeping/atomic-style
allocations and uses atomic reserves; the documented default for
sleepable kernel allocations is `GFP_KERNEL`.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `fs/smb/client/smb2inode.c`, 1
insertion and 1 deletion. Modified function: `smb2_compound_op()`.
Scope: single-file, single-line surgical fix.

Step 2.2 Record: Before, `vars = kzalloc_obj(*vars, GFP_ATOMIC);`.
After, `vars = kzalloc_obj(*vars, GFP_KERNEL);`. The affected path is
allocation at the start of each `smb2_compound_op()` attempt, before
SMB2 compound request setup and before `compound_send_recv()`.

Step 2.3 Record: Bug category is allocation-context correctness /
resource reliability. The function can sleep through retry `msleep()`
and `compound_send_recv()`, which waits for SMB credits and responses.
The fix lets normal reclaim run and avoids consuming atomic reserves.

Step 2.4 Record: The fix is obviously correct if all callers are
sleepable; code inspection verified callers are VFS/SMB client
operations that already perform sleeping network waits. Regression risk
is very low, because the only behavior change is allowing the allocation
itself to sleep in a path that already sleeps.

## Phase 3: Git History Investigation
Step 3.1 Record: Upstream commit is
`8e13b1b4093e0cbcb3dc2906c13b1fdc95cdf0a0`. Blame on the current
`kzalloc_obj(... GFP_ATOMIC)` line points to treewide allocator
conversion `69050f8d6d075`; the original dynamic allocation using
`GFP_ATOMIC` was introduced by `a7d5c294628088` (`cifs: reduce stack use
in smb2_compound_op`), first appearing in `v5.8`.

Step 3.2 Record: No `Fixes:` tag, so no tagged introducing commit to
follow. Independent blame identified `a7d5c294628088` as the relevant
introduction of the heap allocation.

Step 3.3 Record: Recent history of `fs/smb/client/smb2inode.c` includes
several real fixes to this same function/path: off-by-8 bounds check,
buffer size fix, uninitialized variable fix, wrong index reference, and
refcount leak fixes. No prerequisite for this one-line allocation flag
change was found.

Step 3.4 Record: Author Fredric Cover has a small number of recent SMB
client commits. The patch was committed by Steve French, who is listed
as CIFS/SMB3 client maintainer in `MAINTAINERS`.

Step 3.5 Record: For `v7.0.y`, the patch applies cleanly as-is because
`kzalloc_obj()` exists. For older affected stable trees, the equivalent
backport is the same flag change on `kzalloc(sizeof(*vars),
GFP_ATOMIC)`.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 8e13b1b4093e0` found the original submission
at
`https://patch.msgid.link/20260429213453.26235-1-FredTheDude@proton.me`.
`b4 dig -a` found only v1; no later revisions.

Step 4.2 Record: `b4 dig -w` showed the patch was sent to Steve French,
Paulo Alcantara, `linux-cifs@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`. Those are appropriate subsystem
recipients.

Step 4.3 Record: No `Reported-by` or bug-report `Link` was present.
WebFetch of lore was blocked by Anubis, but `b4` saved the mbox locally;
the mbox contained the patch email and no review replies.

Step 4.4 Record: The patch is standalone, not part of a multi-patch
series.

Step 4.5 Record: Web search did not find stable-specific discussion for
this exact subject. No known objection or known reason to avoid stable
was found.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: Modified function: `smb2_compound_op()`.

Step 5.2 Record: Callers are all internal SMB2/SMB3 client filesystem
operations: `smb2_query_path_info()`, `smb2_mkdir()`,
`smb2_mkdir_setinfo()`, `smb2_rmdir()`, `smb2_set_path_attr()` via
rename/hardlink, `smb2_set_path_size()`, `smb2_set_file_info()`,
`smb2_create_reparse_inode()`, `smb2_query_reparse_point()`, and
`smb2_rename_pending_delete()`.

Step 5.3 Record: Key callees include request initializers, `msleep()` on
replay retries, `compound_send_recv()`, SMB2 request free helpers,
response buffer frees, and `kfree(vars)`.

Step 5.4 Record: Reachability is user-facing: VFS operations such as
`stat/getattr`, mkdir, rmdir, rename, setattr/truncate, link/reparse
handling flow through SMB client ops into `smb2_compound_op()`.
`compound_send_recv()` waits in `wait_for_compound_request()` and
`wait_for_response()`.

Step 5.5 Record: No duplicate same-line allocation bug was found
elsewhere in this file. Related history shows this function has received
multiple independent correctness fixes.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: The buggy heap allocation exists in
`stable/linux-5.10.y`, `5.15.y`, `6.1.y`, `6.6.y`, `6.12.y`, `6.18.y`,
`6.19.y`, and `7.0.y`. `stable/linux-5.4.y` has `smb2_compound_op()` but
did not show this `vars` heap allocation, so this specific fix is not
needed there.

Step 6.2 Record: Expected backport difficulty: clean for `7.0.y`;
trivial contextual backport for older affected trees by changing
`kzalloc(sizeof(*vars), GFP_ATOMIC)` to `GFP_KERNEL`.

Step 6.3 Record: No related stable fix for this exact allocation flag
issue was found. Current stable file histories still show the
`GFP_ATOMIC` allocation in affected branches.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is CIFS/SMB3 client filesystem code.
Criticality: IMPORTANT, because it affects mounted SMB2/SMB3 filesystems
and normal filesystem operations, not a niche driver.

Step 7.2 Record: The subsystem is active; recent file history contains
multiple 2025-2026 fixes in the same file and function. The maintainer
tree accepted this patch through Steve French.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected users are SMB2/SMB3 client users on kernels
where the heap allocation exists, mainly `v5.8+` and active stable trees
from `5.10.y` onward.

Step 8.2 Record: Trigger condition is any `smb2_compound_op()`
allocation during common SMB client metadata/namespace operations. The
user-visible bad outcome is avoidable `-ENOMEM` or unnecessary atomic
reserve pressure under memory pressure. Unprivileged users can reach
some paths through normal filesystem operations if they have access to
an SMB mount.

Step 8.3 Record: Failure severity is MEDIUM: not proven
crash/corruption/security, but it affects reliability of common
filesystem operations under memory pressure.

Step 8.4 Record: Benefit is medium because it improves allocation
reliability in common SMB client paths. Risk is very low because the
path already sleeps and the patch changes only the GFP flag.

## Phase 9: Final Synthesis
Step 9.1 Record: Evidence for backporting: one-line fix, no feature/API
change, maintainer-applied, correct allocation flag per kernel docs,
path already sleeps, affects common SMB client operations, affected code
exists in active stable trees. Evidence against: no reported crash, no
`Fixes:`/stable tag, and severity is reliability/ENOMEM rather than data
corruption or panic. Unresolved: direct lore WebFetch was blocked, but
`b4` mbox retrieval succeeded and showed no discussion replies.

Step 9.2 Record: Stable rules checklist: obviously correct: yes; tested:
no explicit `Tested-by`; fixes a real bug: yes, wrong allocation mode in
a sleepable common path; important issue: medium reliability issue, not
critical; small and contained: yes, one line; no new feature/API: yes;
can apply to stable: yes, with trivial adjustment before
`kzalloc_obj()`.

Step 9.3 Record: No automatic exception category applies; this is not a
device ID, hardware quirk, DT update, build fix, or documentation fix.

Step 9.4 Record: Decision: backport. The technical merit is enough
because the fix is minimal, clearly correct, affects common SMB client
operations, and has negligible regression risk.

## Verification
- Phase 1: `git show --format=fuller --stat --patch 8e13b1b4093e0`
  verified subject, body, tags, author, committer, and one-line diff.
- Phase 2: `git show` verified exactly `GFP_ATOMIC` to `GFP_KERNEL` in
  `smb2_compound_op()`.
- Phase 2/5: `ReadFile` verified `compound_send_recv()` is called before
  `kfree(vars)` and that retries can call `msleep()`.
- Phase 2/5: `ReadFile` verified `compound_send_recv()` calls
  `wait_for_compound_request()` and `wait_for_response()`, both wait
  paths.
- Phase 3: `git blame` verified the current line was last touched by
  `69050f8d6d075`; `git show a7d5c294628088` verified the original heap
  allocation with `GFP_ATOMIC`.
- Phase 3: `git tag --contains a7d5c294628088` verified first release
  family as `v5.8`.
- Phase 4: `b4 dig -c`, `-a`, and `-w` verified original patch URL,
  single v1 revision, and recipients.
- Phase 4: local mbox read verified the patch email content and absence
  of additional replies in the saved thread.
- Phase 5: `rg` and `ReadFile` verified callers and VFS reachability
  through SMB client ops.
- Phase 6: branch checks verified the buggy allocation in `5.10.y`,
  `5.15.y`, `6.1.y`, `6.6.y`, `6.12.y`, `6.18.y`, `6.19.y`, and `7.0.y`;
  `5.4.y` did not show this heap allocation.
- Phase 6: `git apply --check` verified the upstream patch applies
  cleanly to the current `7.0.y` checkout.
- Phase 7: `MAINTAINERS` verified Steve French as CIFS/SMB3 maintainer
  and `linux-cifs@vger.kernel.org` as the subsystem list.
- Phase 8: `Documentation/core-api/memory-allocation.rst` and
  `include/linux/gfp_types.h` verified `GFP_KERNEL` is for sleepable
  kernel allocations and `GFP_ATOMIC` is non-sleeping/atomic-reserve
  allocation.

**YES**

 fs/smb/client/smb2inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/smb/client/smb2inode.c b/fs/smb/client/smb2inode.c
index fe1c9d7765806..8f66de216d0b5 100644
--- a/fs/smb/client/smb2inode.c
+++ b/fs/smb/client/smb2inode.c
@@ -209,7 +209,7 @@ static int smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
 	num_rqst = 0;
 	server = cifs_pick_channel(ses);

-	vars = kzalloc_obj(*vars, GFP_ATOMIC);
+	vars = kzalloc_obj(*vars, GFP_KERNEL);
 	if (vars == NULL) {
 		rc = -ENOMEM;
 		goto out;
-- 
2.53.0

* [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (2 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: robbieko, Johannes Thumshirn, David Sterba, Sasha Levin, clm,
	linux-btrfs, linux-kernel

From: robbieko <robbieko@synology.com>

[ Upstream commit fe0cdfd7118d8b40a21bfac221bb4982c5e10e10 ]

In the 'punch a hole' case of btrfs_delete_raid_extent(),
btrfs_duplicate_item() can return -EAGAIN when the leaf needs to be
split and the path becomes invalid. The old code treats any error as
fatal and breaks out of the loop.

Additionally, btrfs_duplicate_item() may trigger setup_leaf_for_split()
which can reallocate the leaf node. The code continues using the old
leaf pointer, leading to use-after-free or stale data access.

Fix both issues by:

- Handling -EAGAIN specifically: release the path and retry the loop.
- Refreshing leaf = path->nodes[0] after successful duplication.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `btrfs`; action verb `handle` / `refresh`;
claimed intent is to handle `-EAGAIN` from `btrfs_duplicate_item()` and
refresh a stale `leaf` pointer in `btrfs_delete_raid_extent()`.

Step 1.2 Record: Tags present:
- `Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>`
- `Signed-off-by: robbieko <robbieko@synology.com>`
- `Reviewed-by: David Sterba <dsterba@suse.com>`
- `Signed-off-by: David Sterba <dsterba@suse.com>`
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable@vger.kernel.org` tag was present.

Step 1.3 Record: The commit describes two bugs in the “punch a hole”
case of `btrfs_delete_raid_extent()`: `btrfs_duplicate_item()` can
return `-EAGAIN`, which old code treats as fatal, and
`btrfs_duplicate_item()` can change the leaf through
`setup_leaf_for_split()`, leaving the caller’s cached `leaf` pointer
stale. Symptom/failure mode: failed deletion path via false fatal error,
plus possible use-after-free or stale metadata access. No affected
kernel version is stated in the commit body.

Step 1.4 Record: This is not hidden; it is explicitly a bug fix. It
fixes memory safety/stale pointer handling and retry handling for a
known nonfatal `-EAGAIN` path.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `fs/btrfs/raid-stripe-tree.c`, 10
insertions, no removals. One function modified:
`btrfs_delete_raid_extent()`. Scope classification: single-file surgical
filesystem metadata fix.

Step 2.2 Record:
- Before: after `btrfs_duplicate_item()`, any nonzero return, including
  `-EAGAIN`, broke out of the loop and returned an error.
- After: `-EAGAIN` releases the path and retries the outer search loop.
- Before: `leaf` was captured before `btrfs_duplicate_item()` and reused
  afterward.
- After: `leaf = path->nodes[0]` is refreshed before item access and
  stripe physical address updates.

Step 2.3 Record: Bug categories are memory safety and logic/correctness.
Verified mechanism: `btrfs_duplicate_item()` calls
`setup_leaf_for_split()`, which can release and re-search the path and
return `-EAGAIN`; after success, `btrfs_duplicate_item()` itself
refreshes its local `leaf`, showing the caller must not assume its
previous `leaf` remains valid.
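
The retry-and-refresh pattern can be illustrated with a userspace model.
Everything here is a hypothetical stand-in (`struct leaf`, `struct path`,
`search_slot()`, `duplicate_item()`, `release_path()`,
`punch_hole_model()`, `eagain_budget`); it only mirrors the control flow
described above and is not the Btrfs implementation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an extent buffer and a search path. */
struct leaf { int item_size; };
struct path { struct leaf *nodes[1]; };

/* How many times duplicate_item() will ask the caller to retry. */
static int eagain_budget;

/* Model of a search: populate the path with a freshly allocated leaf. */
static int search_slot(struct path *p)
{
	p->nodes[0] = malloc(sizeof(*p->nodes[0]));
	if (!p->nodes[0])
		return -ENOMEM;
	p->nodes[0]->item_size = 16;
	return 0;
}

static void release_path(struct path *p)
{
	free(p->nodes[0]);
	p->nodes[0] = NULL;
}

/*
 * Model of a duplication that may need a retry and that, on success,
 * replaces the leaf the path points to (as a split could).
 */
static int duplicate_item(struct path *p)
{
	struct leaf *new_leaf;

	if (eagain_budget > 0) {
		eagain_budget--;
		return -EAGAIN;
	}
	new_leaf = malloc(sizeof(*new_leaf));
	if (!new_leaf)
		return -ENOMEM;
	new_leaf->item_size = 32;
	free(p->nodes[0]);	/* any pointer cached by the caller is stale */
	p->nodes[0] = new_leaf;
	return 0;
}

/* Returns the item size read after duplication, or a negative errno. */
static int punch_hole_model(int retries_needed)
{
	struct path p = { { NULL } };
	struct leaf *leaf;
	int ret;

	eagain_budget = retries_needed;
	while (1) {
		ret = search_slot(&p);
		if (ret)
			break;
		leaf = p.nodes[0];	/* cached before the call */
		ret = duplicate_item(&p);
		if (ret == -EAGAIN) {
			release_path(&p);	/* path invalid: search again */
			continue;
		}
		if (ret)
			break;
		leaf = p.nodes[0];	/* refresh before touching the item */
		ret = leaf->item_size;
		break;
	}
	release_path(&p);
	return ret;
}
```

Without the refresh, the model would read `item_size` through a freed
allocation; without the `-EAGAIN` branch, a retryable condition would be
returned to the caller as a hard error.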

Step 2.4 Record: The fix is obviously correct by local pattern: existing
`btrfs_duplicate_item()` callers in `fs/btrfs/file.c` already handle
`-EAGAIN` by releasing/retrying and refresh `leaf` after success.
Regression risk is low: no API change, no locking change, no new
feature, and the behavior change is limited to one special-case branch.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the buggy punch-hole code in
`btrfs_delete_raid_extent()` was introduced by `6aa0e7cc569eb` (`btrfs:
implement hole punching for RAID stripe extents`), first contained by
`v6.14-rc1~207^2~7`.

Step 3.2 Record: No `Fixes:` tag exists, so there was no tagged commit
to follow. I independently identified `6aa0e7cc569eb` as the introducer
by blaming the changed lines.

Step 3.3 Record: Recent `fs/btrfs/raid-stripe-tree.c` history shows this
commit is part of a six-patch bug-fix series for RAID stripe tree
deletion. The immediately following commit, `a8d58a7c02009`, fixes
unchecked return values in the same function, but this candidate’s stale
pointer and `-EAGAIN` fix is standalone and applied cleanly to the
current `7.0.y` checkout.

Step 3.4 Record: On `master`, the author has several nearby Btrfs fixes,
including this six-patch RAID stripe tree deletion series and older
Btrfs fixes. The commit was committed and reviewed by David Sterba, the
Btrfs maintainer.

Step 3.5 Record: Dependencies found: the buggy branch requires
`6aa0e7cc569eb`; stable branches without that commit do not need this
fix. No prerequisite commit is needed for the patch hunk itself on
`7.0.y`; `git apply --check` succeeded there.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c fe0cdfd7118d8...` found the original
submission at
`https://patch.msgid.link/20260413065249.2320122-6-robbieko@synology.com`.
`b4 dig -a` found only v1 of the series. The patch was `[PATCH 5/6]`.

Step 4.2 Record: `b4 dig -w -C` showed original recipients were
`robbieko <robbieko@synology.com>` and `linux-btrfs@vger.kernel.org`.
The mbox shows Johannes Thumshirn replied “Looks good” with
`Reviewed-by`, and David Sterba later said the series was “Added to
for-next, thanks.”

Step 4.3 Record: No `Reported-by` or bug-report `Link:` tag exists.
WebFetch to lore was blocked by Anubis, but `b4` successfully downloaded
the mbox.

Step 4.4 Record: The series contains six RAID stripe tree deletion
fixes. This patch is patch 5/6; the following patch is related but not a
prerequisite for this specific retry/stale-pointer fix.

Step 4.5 Record: Web/lore stable searches did not find a stable-specific
request or objection. This is unresolved only for stable-list
discussion; it does not affect the technical decision.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: Modified function: `btrfs_delete_raid_extent()`.

Step 5.2 Record: Callers/impact surface verified:
- `do_free_extent_accounting()` calls `btrfs_delete_raid_extent()` for
  data extents.
- `__btrfs_free_extent()` calls `do_free_extent_accounting()` when
  references drop to zero.
- `btrfs_free_extent()` queues data refs from several Btrfs paths,
  including `btrfs_drop_extents()` and `btrfs_mark_extent_written()`.
- `btrfs_punch_hole()` calls `btrfs_replace_file_extents()`, which calls
  `btrfs_drop_extents()`.

Step 5.3 Record: Key callees are `btrfs_search_slot()`,
`btrfs_duplicate_item()`, `btrfs_release_path()`, `btrfs_item_size()`,
`btrfs_item_ptr()`, and `btrfs_partially_delete_raid_extent()`.
`btrfs_duplicate_item()` calls `setup_leaf_for_split()`.

Step 5.4 Record: The path is reachable from normal Btrfs file extent
freeing, including file hole punching through `btrfs_punch_hole()`, but
only when the filesystem has the RAID stripe tree incompat feature and a
supported data profile.

Step 5.5 Record: Similar pattern found in `fs/btrfs/file.c`: callers of
`btrfs_duplicate_item()` handle `-EAGAIN` by releasing/retrying and
refresh `leaf` afterward. This strongly validates the fix pattern.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: The buggy introducer `6aa0e7cc569eb` is present in
`stable/linux-6.14.y` through `stable/linux-7.0.y`; it is absent from
`stable/linux-6.13.y` and older checked LTS branches.
`stable/linux-6.6.y` and `6.12.y` have `raid-stripe-tree.c` but lack
this punch-hole code.

Step 6.2 Record: Expected backport difficulty is clean or minor for
affected newer stable trees. Verified `git apply --check` succeeds on
the current `7.0.y` checkout. The exact affected hunk also exists in
`6.18.y`, `6.19.y`, and `7.0.y`; `6.18.y` differs elsewhere in the file
but not in the relevant hunk.

Step 6.3 Record: No same-title or same “stale leaf pointer” fix exists
in checked stable branches. Related six-patch series fixes are not
present in `7.0.y`.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is Btrfs filesystem metadata, specifically
RAID stripe tree. Criticality: important for users of Btrfs RAID stripe
tree; not universal. It is gated by `RAID_STRIPE_TREE` and supported
data profiles.

Step 7.2 Record: The file is actively developed: recent history shows
several RAID stripe tree fixes and cleanups, including this six-patch
deletion-path series.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected population is Btrfs users with RAID stripe
tree enabled, especially supported data profiles such as
DUP/RAID1/RAID0/RAID10 per `btrfs_need_stripe_tree_update()`.

Step 8.2 Record: Trigger condition is deleting/freeing a data extent
range that falls strictly inside a RAID stripe extent (`found_start <
start && found_end > end`) and then hits either `-EAGAIN` from
`btrfs_duplicate_item()` or a leaf change after duplication. User
reachability is verified through Btrfs file hole-punching and extent
dropping paths, but exact ease of triggering the leaf split was not
runtime-tested.
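
The "strictly inside" condition quoted above can be written as a tiny
predicate. The function name `is_hole_punch()` is hypothetical; only the
comparison itself comes from the condition cited in this step.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True only when the range to delete lies strictly inside the found
 * stripe extent, so that data remains on both sides of the hole.
 */
static bool is_hole_punch(uint64_t found_start, uint64_t found_end,
			  uint64_t start, uint64_t end)
{
	return found_start < start && found_end > end;
}
```

A range that shares either boundary with the found extent does not
satisfy the predicate, so in this sketch only a true middle cut reaches
the duplication path.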

Step 8.3 Record: Failure mode severity is HIGH: use-after-free/stale
metadata access in filesystem metadata code, plus incorrect fatal
handling of a retryable `-EAGAIN`. The commit body explicitly states
UAF/stale data access; code inspection verifies the stale pointer
mechanism.

Step 8.4 Record: Benefit is high for affected users because it prevents
memory-safety/stale metadata behavior in a filesystem metadata update
path. Risk is low because it adds only 10 lines in one branch and
follows existing Btrfs caller patterns.

## Phase 9: Final Synthesis
Step 9.1 Record:
Evidence for backporting:
- Fixes a real memory-safety/stale pointer bug and incorrect `-EAGAIN`
  handling.
- Small, contained, one-function change.
- Matches established `btrfs_duplicate_item()` caller behavior in
  `fs/btrfs/file.c`.
- Reviewed by Johannes Thumshirn and David Sterba; merged by David
  Sterba.
- Bug exists in newer stable branches from `6.14.y` onward and is absent
  from older checked branches.

Evidence against:
- Affects a narrower, experimental Btrfs RAID stripe tree feature, not
  all Btrfs users.
- No `Reported-by`, reproducer, or `Tested-by` tag.
- It is part of a six-patch series, so adjacent RAID stripe tree
  deletion fixes should also be reviewed.

Unresolved:
- I did not runtime-test the path.
- I could not read lore via WebFetch due to Anubis, but `b4` mbox
  retrieval succeeded.
- I did not prove exact clean application for every stable branch, only
  current `7.0.y`; source inspection indicates minor/clean for newer
  affected branches.

Step 9.2 Record:
1. Obviously correct and tested? Obviously correct by code and existing
   Btrfs pattern; no explicit `Tested-by`.
2. Fixes a real bug? Yes, stale `leaf` use and retryable `-EAGAIN`
   mishandling.
3. Important issue? Yes, HIGH severity memory-safety/stale metadata
   access in filesystem metadata code.
4. Small and contained? Yes, 10 insertions in one function.
5. No new features/APIs? Yes.
6. Can apply to stable trees? Yes for `7.0.y` verified; applicable to
   `6.14.y+` trees containing the introducer.

Step 9.3 Record: No automatic exception category applies; this is not a
device ID, quirk, DT, build, or documentation patch.

Step 9.4 Record: The risk-benefit balance favors backporting to stable
trees that contain `6aa0e7cc569eb`. It should not be backported to older
trees that lack the RAID stripe extent hole-punching code.

## Verification
- [Phase 1] Parsed `git show --format=fuller fe0cdfd7118d8`: confirmed
  subject, body, tags, author, committer, and absence of
  `Fixes:`/`Reported-by`/`Tested-by`.
- [Phase 2] Inspected `git diff --function-context fe0cdfd7118d8^
  fe0cdfd7118d8`: confirmed 10 insertions in
  `btrfs_delete_raid_extent()`.
- [Phase 2] Read `fs/btrfs/ctree.c`: confirmed `btrfs_duplicate_item()`
  calls `setup_leaf_for_split()` and refreshes its local `leaf`.
- [Phase 3] Ran `git blame` on affected lines: identified
  `6aa0e7cc569eb` as introducer.
- [Phase 3] Ran `git describe --contains 6aa0e7cc569eb`: first contained
  by `v6.14-rc1~207^2~7`.
- [Phase 3] Checked author history on `master`: found the six related
  Btrfs RAID stripe tree fixes.
- [Phase 4] Ran `b4 dig -c`, `-a`, `-w -C`, and `-m`: found patch URL,
  v1-only series, recipients, mbox, maintainer/reviewer responses.
- [Phase 4] Read mbox: Johannes gave `Reviewed-by`; David said series
  was added to for-next.
- [Phase 5] Used `rg` and source reads to trace `btrfs_punch_hole()` →
  `btrfs_replace_file_extents()` → `btrfs_drop_extents()` →
  `btrfs_free_extent()` → delayed free accounting →
  `btrfs_delete_raid_extent()`.
- [Phase 5] Searched similar patterns: `fs/btrfs/file.c` handles
  `-EAGAIN` and refreshes `leaf` after `btrfs_duplicate_item()`.
- [Phase 6] Checked stable branches with `git merge-base --is-ancestor`:
  `6.14.y` through `7.0.y` contain introducer and lack fix; `6.13.y`
  lacks introducer.
- [Phase 6] Ran `git apply --check` for the candidate patch on current
  `7.0.y`: success.
- [Phase 7] Read `fs/btrfs/Kconfig` and `fs/btrfs/raid-stripe-tree.h`:
  verified RAID stripe tree is feature-gated and profile-limited.
- [Phase 8] Verified failure mode from commit body plus code flow
  through `setup_leaf_for_split()` and stale caller `leaf`.

**YES**

 fs/btrfs/raid-stripe-tree.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 86ddc3ecb4060..22327f4833113 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -194,9 +194,19 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le

 			/* The "right" item. */
 			ret = btrfs_duplicate_item(trans, stripe_root, path, &newkey);
+			if (ret == -EAGAIN) {
+				btrfs_release_path(path);
+				continue;
+			}
 			if (ret)
 				break;

+			/*
+			 * btrfs_duplicate_item() may have triggered a leaf
+			 * split via setup_leaf_for_split(), so we must refresh
+			 * our leaf pointer from the path.
+			 */
+			leaf = path->nodes[0];
 			item_size = btrfs_item_size(leaf, path->slots[0]);
 			extent = btrfs_item_ptr(leaf, path->slots[0],
 						struct btrfs_stripe_extent);
-- 
2.53.0

* [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (3 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock Sasha Levin
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Geliang Tang, Christoph Hellwig, Keith Busch, Sasha Levin,
	justin.tee, nareshgottumukkala83, paul.ely, sagi, linux-nvme,
	linux-kernel

From: Geliang Tang <tanggeliang@kylinos.cn>

[ Upstream commit 723277b15ed97185ce6f75abbf19f06e00f0a6f5 ]

The generic fabrics layer uses request_module("nvme-%s", opts->transport)
to auto-load transport modules. Currently, the nvme-tcp, nvme-rdma, and
nvme-fc modules lack MODULE_ALIAS entries for these names, which prevents
the kernel from automatically finding and loading them when requested.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Phase 1 Record: subsystem `nvme` / NVMe host fabrics; action verb `add
missing`; intent is to add module aliases for existing fabrics
transports. Tags: `Reviewed-by: Christoph Hellwig <hch@lst.de>`,
`Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>`, `Signed-off-by:
Keith Busch <kbusch@kernel.org>`. No `Fixes:`, `Reported-by:`,
`Tested-by:`, `Link:`, or `Cc: stable`. Body describes a real autoload
bug: `fabrics.c` calls `request_module("nvme-%s", opts->transport)`, but
the transport modules do not advertise aliases matching `nvme-tcp`,
`nvme-rdma`, or `nvme-fc`. This is a hidden bug fix despite using “add”.

Phase 2 Record: files changed are `drivers/nvme/host/fc.c` `+1`,
`drivers/nvme/host/rdma.c` `+1`, `drivers/nvme/host/tcp.c` `+1`; no
functions changed, only module metadata at EOF. Before: modules had
license/description metadata but no matching alias. After: each existing
module exports the alias that the existing autoload path requests. Bug
category is logic/module-autoload correctness. Fix quality is excellent:
three literal aliases, each matching the existing transport name and
module filename; regression risk is very low.

Phase 3 Record: `request_module("nvme-%s", opts->transport)` was
introduced by `d1f1071f81513` (`nvme-fabrics: request transport
module`), first in `v4.15`. RDMA host came in `7110230719602` (`v4.8`),
FC in `e399441de9115` (`v4.10`), TCP in `3f2304f8c6d6` (`v5.0`). No
`Fixes:` tag to follow. Recent file history shows normal NVMe churn, not
a prerequisite series. The author has prior NVMe host commits, and the
patch was reviewed/committed by listed NVMe maintainers. No dependency
beyond the existing files and `MODULE_ALIAS`.

Phase 4 Record: `b4 dig -c 723277b15ed97` found the original patch at
`https://patch.msgid.link/e3076c80ee2e0c4a2cae0374afb617a3365946ea.1774944415.git.tanggeliang@kylinos.cn`.
`b4 dig -a` found only v1. `b4 dig -w` showed NVMe maintainers and
`linux-nvme` were included. Lore mirror
shows Christoph Hellwig replied “Looks good” with `Reviewed-by`, and
Keith Busch replied “applied to nvme-7.1”. No NAKs, no stable
nomination, and no separate bug report link found.

Phase 5 Record: modified “functions” are module metadata only. The
affected call path is verified as `nvmf_dev_write()` ->
`nvmf_create_ctrl()` -> `request_module("nvme-%s", opts->transport)` ->
`nvmf_lookup_transport()` -> transport `create_ctrl`. Transport
registration uses `.name = "fc"`, `.name = "rdma"`, `.name = "tcp"` and
module objects are `nvme-fc.o`, `nvme-rdma.o`, `nvme-tcp.o`, so the
alias strings exactly match the requested names. This path is reachable
from userspace via the `nvme-fabrics` misc device write path; I did not
verify whether unprivileged users can trigger it.
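
The name match described in Phase 5 can be checked with a small
userspace sketch. This is an illustration only, not kernel code:
`requested_name()` and `alias_matches()` are hypothetical helpers, and
the only facts taken from the patch are the `"nvme-%s"` format string
and the three alias literals. It compares strings and says nothing about
how a module loader resolves them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Alias literals added by the patch, one per transport module. */
static const char *const patch_aliases[] = {
	"nvme-fc", "nvme-rdma", "nvme-tcp",
};

/*
 * Hypothetical helper: build the requested module name from a transport
 * name using the same "nvme-%s" pattern as the fabrics layer.
 * Returns 0 on success, -1 if the name does not fit in the buffer.
 */
int requested_name(char *buf, size_t len, const char *transport)
{
	int n = snprintf(buf, len, "nvme-%s", transport);

	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Hypothetical helper: return 1 if the requested name for a transport
 * is exactly equal to one of the alias literals, 0 otherwise.
 */
int alias_matches(const char *transport)
{
	char name[32];
	size_t i;

	if (requested_name(name, sizeof(name), transport) != 0)
		return 0;

	for (i = 0; i < sizeof(patch_aliases) / sizeof(patch_aliases[0]); i++)
		if (strcmp(patch_aliases[i], name) == 0)
			return 1;

	return 0;
}
```

Calling `alias_matches()` with the registered transport names `"fc"`,
`"rdma"`, and `"tcp"` finds a match for each; a made-up transport name
finds none.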

Phase 6 Record: stable refs checked. `stable/linux-4.19.y` has the
autoload request and FC/RDMA files, but no TCP file.
`stable/linux-5.4.y` through `stable/linux-7.0.y` have the autoload
request and all three transport files. None of those checked stable refs
had the `MODULE_ALIAS("nvme-*")` entries. The patch applies cleanly to
current `stable/linux-7.0.y`; `5.4` through `6.6` lack the newer
`MODULE_DESCRIPTION` context, so they need a trivial backport placing
aliases after `MODULE_LICENSE`. `4.14.y` lacks the verified autoload
request and has no TCP file, so benefit there is not established.

Phase 7 Record: subsystem is NVMe host fabrics under storage/block;
criticality is IMPORTANT because it affects NVMe-oF connectivity for
users of these transports. The subsystem is active, with recent NVMe
host changes verified in history.

Phase 8 Record: affected users are NVMe-oF users with FC/RDMA/TCP
transports built as modules and not already loaded. Trigger is creating
a fabrics controller for one of those transports while relying on kernel
module autoload. Failure mode is not a crash: `nvmf_lookup_transport()`
fails, logs `no handler found for transport %s`, and returns `-EINVAL`,
preventing the connection. Severity is MEDIUM user-visible functional
failure for storage connectivity. Benefit is high for affected
configurations; risk is very low because it only adds module alias
metadata for existing modules.

Phase 9 Record: evidence for backporting: real autoload bug, existing
request path, exact alias/name/module matches, important storage
subsystem, very small patch, maintainer review, no API or new feature.
Evidence against: no crash/security/data-corruption impact, no explicit
stable tag, no explicit Tested-by, and older stable trees need minor
context adjustment. Stable rules: obviously correct yes; tested tag no
but maintainer-reviewed; fixes a real user-visible bug yes; important
enough for affected NVMe-oF users yes; small and contained yes; no new
feature/API yes; can apply cleanly to `7.0.y` and trivially to older
applicable trees. Exception category: closest to module/device autoload
enablement for existing drivers, not a new driver.

## Verification
- Phase 1: inspected upstream commit `723277b15ed97`; verified subject,
  body, tags, and absence of `Fixes:`/`Reported-by:`/stable tags.
- Phase 2: inspected diff; verified exactly three added `MODULE_ALIAS`
  lines.
- Phase 3: used blame/log/tag checks; verified autoload request
  introduced in `v4.15`, RDMA in `v4.8`, FC in `v4.10`, TCP in `v5.0`.
- Phase 4: ran `b4 dig -c`, `-a`, `-w`; fetched lore mirror; verified
  v1-only patch, maintainer review, maintainer application, no NAKs.
- Phase 5: read `fabrics.c`, `Makefile`, `Kconfig`, and transport ops;
  verified call path, transport names, and module names.
- Phase 6: checked stable refs `4.14.y`, `4.19.y`, `5.4.y`, `5.10.y`,
  `5.15.y`, `6.1.y`, `6.6.y`, `6.12.y`, `6.19.y`, `7.0.y`; verified
  applicable code and missing aliases; `git apply --check` passed on
  current `7.0.y`. One intermediate shell formatting command failed due
  to a bad `sed` expression; I reran the stable metadata check correctly
  afterward.
- Phase 8: verified the failure path returns `-EINVAL` after
  `nvmf_lookup_transport()` fails; privilege requirements were not
  verified.

This should be backported to applicable stable trees. It fixes an
existing intended autoload mechanism for existing NVMe fabrics
transports with a tiny, low-risk metadata-only change.

**YES**

 drivers/nvme/host/fc.c   | 1 +
 drivers/nvme/host/rdma.c | 1 +
 drivers/nvme/host/tcp.c  | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e1bb4707183ca..e4f4528fe2a2d 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -3968,3 +3968,4 @@ module_exit(nvme_fc_exit_module);

 MODULE_DESCRIPTION("NVMe host FC transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-fc");
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 57111139e84fa..1ec6e867aedb6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2432,3 +2432,4 @@ module_exit(nvme_rdma_cleanup_module);

 MODULE_DESCRIPTION("NVMe host RDMA transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-rdma");
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 243dab830dc84..02c95c32b07e3 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -3071,3 +3071,4 @@ module_exit(nvme_tcp_cleanup_module);

 MODULE_DESCRIPTION("NVMe host TCP transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-tcp");
-- 
2.53.0

* [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (4 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Ivan Vecera, Vadim Fedorenko, Petr Oros, Alexander Nowlin,
	Arkadiusz Kubalewski, Jacob Keller, Paolo Abeni, Sasha Levin,
	jiri, netdev, linux-kernel

From: Ivan Vecera <ivecera@redhat.com>

[ Upstream commit 620055cb1036a6125fd912e7a14b47a6572b809b ]

Export __dpll_pin_change_ntf() so that drivers can send pin change
notifications from within pin callbacks, which are already called
under dpll_lock. Using dpll_pin_change_ntf() in that context would
deadlock.

Add lockdep_assert_held() to catch misuse without the lock held.

Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `dpll`; action verb `export`; intent is to
expose `__dpll_pin_change_ntf()` so drivers can notify pin changes while
already under `dpll_lock`.

Step 1.2 Record: tags present: `Acked-by: Vadim Fedorenko`;
`Signed-off-by: Ivan Vecera`; `Signed-off-by: Petr Oros`; `Tested-by:
Alexander Nowlin`; `Reviewed-by: Arkadiusz Kubalewski`; `Signed-off-by:
Jacob Keller`; `Link:
https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com`;
`Signed-off-by: Paolo Abeni`. No `Fixes:`, `Reported-by:`, or `Cc:
stable`.

Step 1.3 Record: the message describes a real locking problem: pin
callbacks are already invoked with `dpll_lock` held, and calling
exported `dpll_pin_change_ntf()` there would try to acquire the same
mutex again and deadlock. It adds a lockdep assertion to catch misuse.

Step 1.4 Record: this is a hidden/preparatory bug fix. Alone it mostly
exports an internal helper, but in the verified series it enables the
following ice fix to send peer DPLL notifications without self-
deadlocking.

## Phase 2: Diff Analysis
Step 2.1 Record: files changed are `drivers/dpll/dpll_netlink.c` `+10`,
`drivers/dpll/dpll_netlink.h` `-2`, and `include/linux/dpll.h` `+1`;
total `11 insertions, 2 deletions`. Modified function:
`__dpll_pin_change_ntf()`. Scope is small and contained.

Step 2.2 Record: before, only `dpll_pin_change_ntf()` was exported and
it always acquired `dpll_lock`; the unlocked helper was internal. After,
the helper is documented, asserts `dpll_lock` is held, is exported GPL-
only, and its declaration moves from private dpll header to public
kernel dpll header.

Step 2.3 Record: bug category is synchronization/deadlock avoidance plus
prerequisite API exposure for a driver bug fix. The specific mechanism
is avoiding recursive acquisition of non-recursive `dpll_lock` from
callbacks already reached through `dpll_pin_pre_doit()`.
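
The locked/unlocked helper split behind this mechanism can be sketched
in userspace C. Everything here is a hypothetical stand-in: `demo_lock`
is a toy non-recursive lock that returns `-EDEADLK` on a second acquire
instead of hanging, and the `demo_*` functions only mirror the roles of
`dpll_pin_change_ntf()` (takes the lock) and `__dpll_pin_change_ntf()`
(expects the lock to be held). It is not the dpll implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool demo_lock_held;     /* toy stand-in for dpll_lock */
static int demo_notifications;  /* counts notifications sent */

/* Toy non-recursive lock: a second acquire fails instead of hanging. */
int demo_lock(void)
{
	if (demo_lock_held)
		return -EDEADLK;
	demo_lock_held = true;
	return 0;
}

void demo_unlock(void)
{
	demo_lock_held = false;
}

/* Unlocked variant: the caller must already hold the lock. */
int demo_change_ntf_locked(void)
{
	assert(demo_lock_held);  /* plays the role of a lockdep assertion */
	demo_notifications++;
	return 0;
}

/* Locking variant: acquires the lock itself, then notifies. */
int demo_change_ntf(void)
{
	int ret = demo_lock();

	if (ret)
		return ret;
	ret = demo_change_ntf_locked();
	demo_unlock();
	return ret;
}

/* Simulates a pin callback that runs with the lock already held. */
int demo_callback(bool use_locking_variant)
{
	int ret = demo_lock();

	if (ret)
		return ret;
	ret = use_locking_variant ? demo_change_ntf()
				  : demo_change_ntf_locked();
	demo_unlock();
	return ret;
}
```

In this model, `demo_callback(false)` succeeds and sends one
notification, while `demo_callback(true)` fails with `-EDEADLK` because
the locking variant tries to take a lock the callback already holds.
With a real non-recursive mutex that second acquire would block forever
rather than return an error.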

Step 2.4 Record: fix quality is good: minimal, obvious, lockdep-guarded,
no userspace ABI, no broad refactor. Regression risk is low; the main
concern is that it is useful only with the dependent ice fixes.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` showed `dpll_pin_change_ntf()` came from
`9d71b54b65b1` in v6.7-era dpll netlink code; `__dpll_pin_change_ntf()`
declaration was added by `58256a26bfb3`, first in v6.17. The ice bug
being fixed by the dependent commits was introduced by `2dd5d03c77e2`,
first in v6.17.

Step 3.2 Record: candidate has no `Fixes:` tag. Related follow-up
commits `1a41b58fd4dc` and `9e5dead140af` both fix `2dd5d03c77e2 ("ice:
redesign dpll sma/u.fl pins control")`.

Step 3.3 Record: recent history shows this is immediately followed by
`ice: fix missing dpll notifications for SW pins` and `ice: add dpll
peer notification for paired SMA and U.FL pins`; it is part of that fix
series and is a prerequisite for the third patch.

Step 3.4 Record: Ivan Vecera has multiple recent dpll commits in this
area; Petr Oros and Intel reviewers are active in the dpll/ice code. The
patch was committed through Paolo Abeni’s net tree path.

Step 3.5 Record: dependency found: the real user-visible fix needs the
two following ice commits. For older stable trees before v7.0, the patch
also needs a small backport adjustment because `dpll: Add notifier chain
for dpll events` is absent.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 620055cb1036a` found the original applied
submission as `[PATCH net 09/11]` at
`20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com`.
`b4 dig -a` showed earlier v4-v7
iterations of the ice DPLL fix series; v7 split the ice fix as requested
by review.

Step 4.2 Record: `b4 dig -w` showed netdev maintainers and relevant
Intel/dpll reviewers were CC’d, including David Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Pirko, Vadim Fedorenko, Arkadiusz
Kubalewski, Jacob Keller, Intel wired LAN, and netdev.

Step 4.3 Record: lore mirror confirmed the series purpose: SMA/U.FL
software-controlled pins missed DPLL notifications, and userspace
consumers such as `synce4l` would not learn about state/phase-offset
changes. Alexander Nowlin provided `Tested-by`; Arkadiusz Kubalewski
reviewed.

Step 4.4 Record: related patches are the following two ice commits;
patch 3 specifically calls `__dpll_pin_change_ntf()` from dpll pin
callbacks because `dpll_lock` remains held.

Step 4.5 Record: direct lore stable search via WebFetch was blocked by
Anubis; web search found no stable-specific objection or discussion.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: modified function is `__dpll_pin_change_ntf()`.

Step 5.2 Record: callers are `dpll_pin_change_ntf()`, internal dpll
netlink setters, dpll core ref-sync code, and after the dependent patch,
`ice_dpll_sw_pin_notify_peer()`.

Step 5.3 Record: callees are `lockdep_assert_held(&dpll_lock)`,
`dpll_pin_notify()` on v7.0+, and `dpll_pin_event_send()`.

Step 5.4 Record: user reachability is through generic netlink
`DPLL_CMD_PIN_SET`, which uses `dpll_pin_pre_doit()` and
`dpll_pin_post_doit()` around the operation. The netlink operation is
admin-permission gated, so this is not an unprivileged trigger.

Step 5.5 Record: existing internal `__dpll_pin_change_ntf()` callers are
in dpll code under dpll locking. The problematic missing pattern was an
exported helper for already-locked driver callback context.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: affected stable trees are v6.17+, because
`2dd5d03c77e2` exists in `stable/linux-6.17.y`, `6.18.y`, `6.19.y`, and
`7.0.y`. It is absent from `6.12.y` and older checked branches;
`drivers/dpll` is absent in `6.6.y`.

Step 6.2 Record: current `7.0.y` accepts the patch with `git apply
--check`. `git merge-tree` showed `6.17.y`-`6.19.y` need minor backport
adjustment because they lack the dpll notifier-chain change; `7.0.y` is
straightforward.

Step 6.3 Record: stable branches checked do not contain this candidate
or the dependent two ice fixes.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is dpll core plus Intel ice networking driver
context. Criticality is important but hardware/config-specific: users of
DPLL-capable Intel ice devices and DPLL netlink consumers.

Step 7.2 Record: dpll and ice DPLL code are actively developed; recent
history shows several related dpll/ice fixes, including missing SMA
initialization and paired-pin state fixes.

## Phase 8: Impact And Risk
Step 8.1 Record: affected population is driver-specific: DPLL-capable
Intel ice hardware using SMA/U.FL software-controlled pins.

Step 8.2 Record: trigger is DPLL pin state/direction changes or periodic
DPLL updates on affected hardware. Userspace impact is missed netlink
notifications and stale phase-offset reporting; triggering netlink
changes requires admin privileges.

Step 8.3 Record: failure mode severity is medium for the user-visible
bug: time-sync management software can miss DPLL pin transitions/phase-
offset changes. The locking issue would be a deadlock if the public
helper were used in the already-locked callback path.

Step 8.4 Record: benefit is medium-high for affected hardware and
required for the real ice fix; risk is low because the change is tiny,
GPL-only, lockdep-guarded, and does not alter userspace ABI.

## Phase 9: Final Synthesis
Evidence for backporting: it is a small, reviewed, tested prerequisite
for real ice DPLL notification fixes; it prevents a concrete self-
deadlock in the callback context verified through `dpll_pin_pre_doit()`
locking; affected stable trees contain the buggy ice redesign from v6.17
onward.

Evidence against backporting: by itself it is mostly an exported in-
kernel helper, so it should not be backported alone for standalone
value. Older affected stable trees need a minor manual backport due to
missing dpll notifier-chain code.

Stable rules checklist: obviously correct and tested: yes; fixes/enables
fix for real user-visible bug: yes, as part of the series; important
issue: medium functional correctness and deadlock avoidance; small and
contained: yes; no userspace API: yes; applies: clean on `7.0.y`, minor
backport for `6.17.y`-`6.19.y`.

Exception category: not a device ID or quirk. This is an acceptable
prerequisite/helper export for a stable-worthy driver fix series.

Verification:
- Phase 1: parsed `git show 620055cb1036a`; confirmed tags and no
  `Fixes:`.
- Phase 2: inspected diff; confirmed `11 insertions, 2 deletions` across
  three files.
- Phase 3: ran `git blame` and `git show` for `9d71b54b65b1`,
  `58256a26bfb3`, `2dd5d03c77e2`.
- Phase 4: ran `b4 dig -c`, `-a`, `-w`; fetched lore mirror thread
  showing v7 review, `Tested-by`, and `Reviewed-by`.
- Phase 5: searched callers and dpll netlink ops; verified
  `DPLL_CMD_PIN_SET` uses `dpll_pin_pre_doit()` and callbacks run while
  `dpll_lock` is held.
- Phase 6: checked stable branches for `2dd5d03c77e2`, candidate
  absence, dependent fix absence, and merge-tree/apply behavior.
- Phase 7: reviewed recent dpll and ice DPLL history.
- Phase 8: verified trigger/user impact from the dependent ice commit
  messages and diffs.

Backport recommendation: backport this commit to affected stable trees
together with `1a41b58fd4dc` and `9e5dead140af`; do not take it alone
unless it is queued as that prerequisite.

**YES**

 drivers/dpll/dpll_netlink.c | 10 ++++++++++
 drivers/dpll/dpll_netlink.h |  2 --
 include/linux/dpll.h        |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
index 83cbd64abf5a4..95ae786e98aab 100644
--- a/drivers/dpll/dpll_netlink.c
+++ b/drivers/dpll/dpll_netlink.c
@@ -842,11 +842,21 @@ int dpll_pin_delete_ntf(struct dpll_pin *pin)
 	return dpll_pin_event_send(DPLL_CMD_PIN_DELETE_NTF, pin);
 }

+/**
+ * __dpll_pin_change_ntf - notify that the pin has been changed
+ * @pin: registered pin pointer
+ *
+ * Context: caller must hold dpll_lock. Suitable for use inside pin
+ *          callbacks which are already invoked under dpll_lock.
+ * Return: 0 if succeeds, error code otherwise.
+ */
 int __dpll_pin_change_ntf(struct dpll_pin *pin)
 {
+	lockdep_assert_held(&dpll_lock);
 	dpll_pin_notify(pin, DPLL_PIN_CHANGED);
 	return dpll_pin_event_send(DPLL_CMD_PIN_CHANGE_NTF, pin);
 }
+EXPORT_SYMBOL_GPL(__dpll_pin_change_ntf);

 /**
  * dpll_pin_change_ntf - notify that the pin has been changed
diff --git a/drivers/dpll/dpll_netlink.h b/drivers/dpll/dpll_netlink.h
index dd28b56d27c56..a9cfd55f57fc4 100644
--- a/drivers/dpll/dpll_netlink.h
+++ b/drivers/dpll/dpll_netlink.h
@@ -11,5 +11,3 @@ int dpll_device_delete_ntf(struct dpll_device *dpll);
 int dpll_pin_create_ntf(struct dpll_pin *pin);

 int dpll_pin_delete_ntf(struct dpll_pin *pin);
-
-int __dpll_pin_change_ntf(struct dpll_pin *pin);
diff --git a/include/linux/dpll.h b/include/linux/dpll.h
index 2ce295b46b8cd..8f97120ee7b37 100644
--- a/include/linux/dpll.h
+++ b/include/linux/dpll.h
@@ -276,6 +276,7 @@ int dpll_pin_ref_sync_pair_add(struct dpll_pin *pin,

 int dpll_device_change_ntf(struct dpll_device *dpll);

+int __dpll_pin_change_ntf(struct dpll_pin *pin);
 int dpll_pin_change_ntf(struct dpll_pin *pin);

 int register_dpll_notifier(struct notifier_block *nb);
-- 
2.53.0

* [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (5 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Flavio Suligoi, Christoph Hellwig, Keith Busch, Sasha Levin, sagi,
	linux-nvme, linux-kernel

From: Flavio Suligoi <f.suligoi@asem.it>

[ Upstream commit e80e39f25567310c1c7392eed886890b5c6788ba ]

In the declaration of the structure "core_quirks[]", in the comment
referred to the devices "Kioxia CD6-V Series / HPE PE8030", the
parameter "default_ps_max_latency_us" is reported in a wrong way:

nvme_core.default_ps_max_latency=0

The correct form is, instead:

nvme_core.default_ps_max_latency_us=0

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `nvme-core`; action verb `fix`; claimed
intent is to correct the kernel module parameter name in a source
comment for the Kioxia CD6-V / HPE PE8030 NVMe quirk.

Step 1.2 Record: Tags present:
- `Reviewed-by: Christoph Hellwig <hch@lst.de>`
- `Signed-off-by: Flavio Suligoi <f.suligoi@asem.it>`
- `Signed-off-by: Keith Busch <kbusch@kernel.org>`
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable@vger.kernel.org` tag was present in the commit inspected with
`git show`.

Step 1.3 Record: The commit body describes an incorrect parameter
spelling in a comment: `nvme_core.default_ps_max_latency=0`; the actual
module parameter is `nvme_core.default_ps_max_latency_us=0`. Symptom is
not a runtime kernel failure from the patch itself, but incorrect in-
source guidance for disabling APST. Version information was not stated.
Root cause is a missing `_us` suffix in the comment.

Step 1.4 Record: This is not a hidden runtime bug fix. It is an explicit
comment/documentation correction for a real module parameter.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `drivers/nvme/host/core.c`, 1
insertion and 1 deletion. No function body is modified; the changed
object is the `core_quirks[]` table comment. Scope classification:
single-file, comment-only surgical fix.

Step 2.2 Record: Before: the comment suggested booting with nonexistent
or incorrect `nvme_core.default_ps_max_latency=0`. After: it suggests
the verified existing parameter `nvme_core.default_ps_max_latency_us=0`.
This affects only source-code guidance, not execution.

Step 2.3 Record: Bug category is documentation/comment correctness. No
resource leak, race, refcount bug, memory safety bug, type bug, or
executable logic change is present.

Step 2.4 Record: Fix quality is obviously correct:
`module_param(default_ps_max_latency_us, ulong, 0644)` exists in
`drivers/nvme/host/core.c`. Regression risk is effectively zero because
only a comment changes.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the incorrect comment line was
introduced by `5a6254d55e2a9f` (`nvme-pci: add NO APST quirk for Kioxia
device`). `git tag --contains 5a6254d55e2a9f` shows it is present from
`v5.16` onward in mainline tags available locally, and stable branch
snapshots for 5.10 and 5.15 also contain the line.

Step 3.2 Record: No `Fixes:` tag exists, so there was no tagged
introducer to follow. I separately inspected `5a6254d55e2a9f` because
blame identified it as the source of the wrong comment.

Step 3.3 Record: Recent file history shows this patch is standalone.
`block-next` contains `e80e39f255673 nvme-core: fix parameter name in
comment`; nearby commits touch unrelated NVMe behavior.

Step 3.4 Record: `git log block-next --author='Flavio Suligoi' --
drivers/nvme/host` found only this NVMe host commit locally. The patch
was reviewed by Christoph Hellwig and committed by Keith Busch, both
verified from commit metadata and lore.

Step 3.5 Record: No dependencies found. The patch is a one-line comment
correction and `git apply --check` against the current stable checkout
succeeded.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c e80e39f255673` found the original thread:
`https://patch.msgid.link/20260408124522.2375297-1-f.suligoi@asem.it`.
`b4 dig -a` found only v1, so no later revision was missed. Lore mirror
showed Christoph Hellwig replied “Looks good” with `Reviewed-by`, and
Keith Busch replied “applied to nvme-7.1”. No NAKs or risk concerns were
found.

Step 4.2 Record: `b4 dig -w` showed recipients included Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, `linux-nvme`, and
`linux-kernel`, so the right subsystem maintainers/lists were included.

Step 4.3 Record: No `Reported-by` or bug-report `Link:` tag exists.
External search confirmed this is about correcting the APST parameter
spelling, not a separate crash report.

Step 4.4 Record: `b4 dig -a` showed this is a single-patch v1 series,
not a multi-patch dependency chain.

Step 4.5 Record: Direct `lore.kernel.org/stable` fetch was blocked by
Anubis, and web search did not find stable-specific discussion for this
exact patch. Stable-specific discussion remains unverified.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: No functions are modified. The changed text is inside
the static `core_quirks[]` data table comment.

Step 5.2 Record: Caller tracing is not applicable to the changed line
because it is non-executable. I verified `core_quirks[]` is used by the
quirk scan path in `drivers/nvme/host/core.c`, but the patch does not
alter table values or matching logic.

Step 5.3 Record: Callee tracing is not applicable because no executable
statement changed.

Step 5.4 Record: Runtime reachability is not applicable. The only
“reachability” is human/developer/user reading the source comment.

Step 5.5 Record: `rg` verified the actual parameter is declared as
`default_ps_max_latency_us`; the old comment text is the mismatching
pattern.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Step 6.1 Record: The incorrect comment exists in checked refs `v5.16`,
`v6.1`, `v6.6`, `v6.12`, `v6.19`, `v7.0`, and stable branch snapshots
`for-greg/5.10-201`, `for-greg/5.15-201`, `for-greg/6.1-201`,
`for-greg/6.6-201`, `for-greg/6.12-201`, `for-greg/6.19-200`, and
`for-greg/7.0-200`.

Step 6.2 Record: Expected backport difficulty is clean or trivial. `git
apply --check` and `git apply --check --3way` both succeeded on the
current checkout.

Step 6.3 Record: Local history did not show a different stable-side fix
for this exact comment. `git log stable/linux-7.0.y --grep='fix
parameter name in comment' -- drivers/nvme/host/core.c` returned no
match.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is NVMe host core under `drivers/nvme/host`.
Criticality level: important driver subsystem, but this patch’s actual
affected surface is source documentation only.

Step 7.2 Record: The NVMe host core file is actively maintained; recent
local history shows multiple NVMe core fixes and feature changes. This
specific patch was reviewed and applied through the NVMe/block path.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected population is users/developers reading this
source comment for the Kioxia CD6-V / HPE PE8030 APST workaround.
Runtime users are not directly affected by the patch.

Step 8.2 Record: Trigger condition is consulting the source comment and
using the wrong boot parameter. I verified the correct parameter exists;
I did not verify kernel behavior for an unknown wrong boot parameter in
this investigation. Unprivileged runtime triggering is not applicable.

Step 8.3 Record: Failure mode of the patch’s target issue is incorrect
guidance, not a kernel crash, data corruption, deadlock, or security
issue. Severity is LOW as a code issue, but the guidance relates to a
real NVMe APST workaround.

Step 8.4 Record: Benefit is modest but real: stable source carries the
correct module parameter spelling for affected hardware guidance. Risk
is near zero because the patch is comment-only. Risk-benefit ratio is
favorable under the stable exception for documentation/comment fixes.

## Phase 9: Final Synthesis
Step 9.1 Record:
Evidence for backporting:
- Corrects an objectively wrong parameter name.
- Correct parameter is verified in the code.
- Incorrect text exists across active stable branch snapshots.
- Patch is one-line, comment-only, and applies cleanly.
- Reviewed by Christoph Hellwig and applied by Keith Busch.
- Falls under the documentation/comment-fix exception with effectively
  no runtime regression risk.

Evidence against backporting:
- Does not fix executable kernel behavior.
- Does not address crash, corruption, deadlock, security, memory safety,
  or resource leak.
- No explicit stable nomination or stable-list discussion was verified.

Unresolved:
- Stable-list search was partly blocked by Anubis; no stable-specific
  discussion was found via web search.
- I did not verify the exact runtime handling of the wrong boot
  parameter, because the commit only changes a source comment.

Step 9.2 Record:
1. Obviously correct and tested? Yes for the text correction; the actual
   parameter declaration was verified.
2. Fixes a real bug that affects users? It fixes a real
   documentation/comment bug; user impact is indirect.
3. Important issue? Not important as a runtime kernel bug; LOW severity.
4. Small and contained? Yes, one comment line in one file.
5. No new features or APIs? Yes.
6. Can apply to stable trees? Yes on the checked-out stable tree; likely
   trivial elsewhere due to the same comment text in checked stable refs.

Step 9.3 Record: Exception category applies: documentation/comment fix.
It corrects incorrect in-source documentation and has zero runtime
behavior change.

Step 9.4 Record: Despite not being a runtime fix, this is appropriate
for stable under the explicit documentation/comment-fix exception: it is
objectively correct, present in stable trees, useful for a real hardware
workaround, and essentially risk-free.

## Verification
- [Phase 1] `git show --format=fuller --stat --patch e80e39f255673`:
  verified subject, tags, author/committer, and one-line comment diff.
- [Phase 2] Diff inspection: verified only `drivers/nvme/host/core.c`
  changed, 1 insertion and 1 deletion, comment-only.
- [Phase 2] `rg default_ps_max_latency`: verified actual module
  parameter is `default_ps_max_latency_us`.
- [Phase 3] `git blame` around the quirk comment: verified wrong comment
  introduced by `5a6254d55e2a9f`.
- [Phase 3] `git show 5a6254d55e2a9f`: verified original Kioxia NO_APST
  quirk and wrong comment text.
- [Phase 3] `git tag --contains 5a6254d55e2a9f`: verified mainline
  availability from `v5.16` in local tags.
- [Phase 4] `b4 dig -c e80e39f255673`: found original patch submission.
- [Phase 4] `b4 dig -a`: verified single v1 patch.
- [Phase 4] `b4 dig -w`: verified NVMe maintainers and lists were
  included.
- [Phase 4] WebFetch of lore mirror: verified Reviewed-by from Christoph
  Hellwig and applied note from Keith Busch.
- [Phase 5] `rg core_quirks`: verified the changed line is in
  `core_quirks[]`; no executable behavior changed.
- [Phase 6] Ref checks with `git show <ref>:drivers/nvme/host/core.c |
  rg`: verified wrong comment exists in checked stable refs and
  corrected text exists in `block-next`.
- [Phase 6] `git apply --check` and `git apply --check --3way`: verified
  clean local application.
- [Phase 7] `git log block-next -- drivers/nvme/host/core.c`: verified
  active subsystem history and target commit in block-next.
- [Phase 8] Failure mode verified from diff: incorrect comment guidance
  only, no runtime code path changed.
- UNVERIFIED: stable mailing-list discussion, because direct lore stable
  search was blocked and web search found no exact stable discussion.
- UNVERIFIED: exact kernel warning/handling for the wrong boot
  parameter, because it is outside the changed code.

**YES**

 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 766e9cc4ffca5..ac5a7d9781f58 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3013,7 +3013,7 @@ static const struct nvme_core_quirk_entry core_quirks[] = {
 		 *
 		 * The device is left in a state where it is also not possible
 		 * to use "nvme set-feature" to disable APST, but booting with
-		 * nvme_core.default_ps_max_latency=0 works.
+		 * nvme_core.default_ps_max_latency_us=0 works.
 		 */
 		.vid = 0x1e0f,
 		.mn = "KCD6XVUL6T40",
-- 
2.53.0


* [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus )
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (6 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup Sasha Levin
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Alan Cui, Keith Busch, Sasha Levin, sagi, linux-nvme,
	linux-kernel

From: Alan Cui <me@alancui.cc>

[ Upstream commit 7f991e3f9b8f044640bcb5fa8570350a68932843 ]

The firmware for Samsung 970 Evo Plus / PM981 / PM983 does not support SUBNQN.
Make quirks to suppress warnings.

# nvme id-ctrl /dev/nvme1n1
NVME Identify Controller:
vid       : 0x144d
ssvid     : 0x144d
sn        : ***
mn        : Samsung SSD 970 EVO Plus 500GB
fr        : 2B2QEXM7

mcdqpc    : 0
subnqn    :
ioccsz    : 0

Signed-off-by: Alan Cui <me@alancui.cc>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
1. Phase 1: Commit message forensics
  Record: subsystem `nvme`, action `add quirk`, intent is to add
  `NVME_QUIRK_IGNORE_DEV_SUBNQN` for Samsung PCI ID `144d:a808`. Tags
  present: `Signed-off-by: Alan Cui`, `Signed-off-by: Keith Busch`; no
  `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Link:`, or
  `Cc: stable`. Body says Samsung 970 EVO Plus / PM981 / PM983 firmware
  leaves `subnqn` empty; supplied `nvme id-ctrl` output confirms
  `vid=0x144d`, model `Samsung SSD 970 EVO Plus 500GB`, firmware
  `2B2QEXM7`, and empty `subnqn`. This is a hardware quirk, not a hidden
  memory/race/resource bug.

2. Phase 2: Diff analysis
  Record: one file, `drivers/nvme/host/pci.c`, 2 insertions. Modified
  object is the `nvme_id_table` PCI ID table. Before: Samsung
  `144d:a808` matched the generic NVMe PCI class entry and got no
  `IGNORE_DEV_SUBNQN` quirk. After: it matches a specific PCI ID entry
  and sets `NVME_QUIRK_IGNORE_DEV_SUBNQN`. In `nvme_init_subnqn()`, that
  quirk skips device-provided SUBNQN handling and suppresses the
  “missing or invalid SUBNQN field” warning while still generating the
  synthetic NQN. Fix quality is surgical and consistent with nearby
  quirks. Regression risk is low, with the main caveat that the quirk
  applies to all `144d:a808` devices.

3. Phase 3: Git history investigation
  Record: target commit is `7f991e3f9b8f0`. There is no `Fixes:` tag.
  `NVME_QUIRK_IGNORE_DEV_SUBNQN` was introduced by `6299358d198a0`,
  described as handling firmware that reports invalid/non-unique SUBNQN,
  first contained around `v5.0-rc2`. Existing `144d:a808` handling for a
  suspend quirk was introduced by `1fae37accfc587`, around `v5.6-rc3`,
  confirming the PCI ID is already known in NVMe PCI code. Recent
  history shows this commit is standalone, not part of a required
  series. Author history in this subsystem showed only this commit;
  Keith Busch committed it.

4. Phase 4: Mailing list and external research
  Record: `b4 dig -c 7f991e3f9b8f0` found the original submission at
  `https://patch.msgid.link/9600680.CDJkKcVGEf@alanarchdesktop`.
  `b4 dig -a` showed only v1. `b4 dig -w` showed recipients included
  `linux-nvme`, `linux-kernel`, and Keith Busch. The fetched mbox contained the
  same patch and no review replies or objections. Web research found an
  earlier 2021 linux-nvme patch proposing the same `144d:a808` quirk for
  Samsung 970 EVO Plus/SM981/PM981/PM983, plus Debian and Proxmox user
  reports of the same warning. No stable-specific discussion or
  rejection reason was found.

5. Phase 5: Code semantic analysis
  Record: changed data structure is `nvme_id_table`. PCI core uses that
  table through `nvme_driver.id_table`, then calls `nvme_probe()`.
  `nvme_probe()` calls `nvme_pci_alloc_dev()`, which initializes `quirks
  = id->driver_data`, then passes those quirks to `nvme_init_ctrl()`.
  Later identify flow calls `nvme_init_identify()`,
  `nvme_init_subsystem()`, and `nvme_init_subnqn()`. The affected path
  is normal PCI NVMe device probe at boot or hotplug, not a syscall-
  triggered path. Similar `IGNORE_DEV_SUBNQN` quirks already exist for
  Intel, ADATA, Samsung PM1725a, Lexar, Phison, and other devices.

6. Phase 6: Stable tree analysis
  Record: checked `v5.4`, `v5.10`, `v5.15`, `v6.1`, `v6.6`, `v6.12`,
  `v6.17`, `v6.18`, and `v6.19`. All have the generic NVMe PCI class
  match and the `NVME_QUIRK_IGNORE_DEV_SUBNQN` infrastructure; none had
  the specific `144d:a808` `IGNORE_DEV_SUBNQN` entry. The insertion
  context around Memblaze `0x1c5f:0x0540` and Samsung PM1725/PM1725a
  exists in all checked tags, so backport difficulty should be clean or
  trivial.

7. Phase 7: Subsystem and maintainer context
  Record: subsystem is NVMe PCI host driver under `drivers/nvme/host`,
  important storage hardware support. It affects users with Samsung
  `144d:a808` NVMe SSDs, not all users. The subsystem is actively
  maintained; recent history shows multiple NVMe fixes and quirk
  additions. Keith Busch, an NVMe maintainer, committed the patch.

8. Phase 8: Impact and risk assessment
  Record: affected users are Samsung 970 EVO Plus / PM981 / PM983 /
  related `144d:a808` NVMe users. Trigger is device probe, typically
  boot. Verified failure mode is a persistent kernel warning for
  missing/invalid SUBNQN; for the empty-SUBNQN case, code already falls
  back to a synthetic NQN, so I did not verify a crash, data corruption,
  or probe failure for this exact firmware. Severity is low-to-medium,
  but it is a real firmware compliance issue on real hardware. Benefit
  is modest but real: suppresses a misleading warning and applies the
  established firmware workaround. Risk is very low: two lines, device-
  specific, no API changes.

9. Phase 9: Final synthesis
  Evidence for backporting: hardware quirk for an existing driver; real
  user-visible firmware issue; exact device ID; tiny and contained;
  infrastructure exists across stable trees; maintainer accepted
  upstream; stable context appears present across checked LTS tags.
  Evidence against: the verified symptom for this exact commit is
  warning suppression rather than a crash/data-loss fix; no
  `Reported-by`, `Tested-by`, or review tags; broad PCI ID match could
  affect all `144d:a808` variants. Stable checklist: obviously correct
  yes; real bug yes, as firmware reports empty SUBNQN; important issue
  only weakly under normal criteria, but it fits the stable exception
  for hardware quirks; small and contained yes; no new API or feature
  yes; expected to apply cleanly yes.
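
The table matching and warning logic described in Phases 2 and 5 can be
sketched in userspace C. This is a simplified model under stated
assumptions, not the kernel code: `struct id_entry`, `lookup_quirks()`,
`warns_about_subnqn()`, the `ANY_ID` wildcard, and the quirk bit values
are all hypothetical stand-ins, and the NVMe version check that guards
the real warning is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; the kernel defines its own. */
#define QUIRK_DELAY_BEFORE_CHK_RDY (1ul << 0)
#define QUIRK_IGNORE_DEV_SUBNQN    (1ul << 1)

/* Hypothetical wildcard standing in for the generic NVMe class match. */
#define ANY_ID 0xffffu

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

struct id_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned long driver_data;	/* quirk bits for this entry */
};

/* Before the patch: 144d:a808 only hits the catch-all entry. */
static const struct id_entry table_before[] = {
	{ 0x1c5f, 0x0540, QUIRK_DELAY_BEFORE_CHK_RDY },
	{ 0x144d, 0xa821, QUIRK_DELAY_BEFORE_CHK_RDY },
	{ ANY_ID, ANY_ID, 0 },
};

/* After the patch: a specific entry sits ahead of the catch-all. */
static const struct id_entry table_after[] = {
	{ 0x1c5f, 0x0540, QUIRK_DELAY_BEFORE_CHK_RDY },
	{ 0x144d, 0xa808, QUIRK_IGNORE_DEV_SUBNQN },
	{ 0x144d, 0xa821, QUIRK_DELAY_BEFORE_CHK_RDY },
	{ ANY_ID, ANY_ID, 0 },
};

/* Return the quirk bits of the first entry that matches the device. */
static unsigned long lookup_quirks(const struct id_entry *table, size_t n,
				   uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < n; i++) {
		bool v = table[i].vendor == ANY_ID || table[i].vendor == vendor;
		bool d = table[i].device == ANY_ID || table[i].device == device;

		if (v && d)
			return table[i].driver_data;
	}
	return 0;
}

/* An empty device SUBNQN warns unless the quirk says to ignore it. */
static bool warns_about_subnqn(unsigned long quirks, const char *subnqn)
{
	if (quirks & QUIRK_IGNORE_DEV_SUBNQN)
		return false;
	return subnqn[0] == '\0';
}
```

In this model, looking up `144d:a808` in `table_before` yields no quirk
bits, so an empty `subnqn` warns; the same lookup in `table_after`
yields the ignore bit and the warning is skipped. In both cases the
driver would still fall back to a synthetic NQN, as noted in Phase 2.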

## Verification
- Phase 1: `git show --format=fuller --stat --patch 7f991e3f9b8f0`
  verified subject, body, tags, author/committer, and 2-line diff.
- Phase 2: Read `drivers/nvme/host/core.c`, `drivers/nvme/host/nvme.h`,
  and `drivers/nvme/host/pci.c`; verified quirk definition, PCI table
  use, and `nvme_init_subnqn()` behavior.
- Phase 3: `git blame` around the quirk table and `144d:a808` suspend
  handling; `git show` and `git describe --contains` for `6299358d198a0`
  and `1fae37accfc587`; `git log` on `drivers/nvme/host` for related
  commits.
- Phase 4: `b4 dig -c`, `-a`, `-w`, and mbox read verified original lore
  submission, single v1 revision, recipients, and lack of visible review
  objections. WebFetch verified the older 2021 same-quirk patch and user
  reports.
- Phase 5: `rg` and file reads traced `nvme_id_table` through
  `nvme_probe()`, `nvme_pci_alloc_dev()`, `nvme_init_identify()`,
  `nvme_init_subsystem()`, and `nvme_init_subnqn()`.
- Phase 6: `git grep` and a Python `git show
  <tag>:drivers/nvme/host/pci.c` check verified stable tags have the
  infrastructure/context and lack the new `144d:a808` quirk.
- Unverified: I did not test-build or boot the patch, and I did not
  verify a functional failure beyond the warning for this exact Samsung
  firmware.

This should be backported because it is a classic low-risk hardware
quirk for a real, reported firmware non-compliance on an existing NVMe
driver path, even though the confirmed symptom is warning noise rather
than a severe failure.

**YES**

 drivers/nvme/host/pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b78ba239c8ea8..d59340982520a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -4104,6 +4104,8 @@ static const struct pci_device_id nvme_id_table[] = {
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x1c5f, 0x0540),	/* Memblaze Pblaze4 adapter */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
+	{ PCI_DEVICE(0x144d, 0xa808),	/* Samsung PM981/983 */
+		.driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, },
 	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
-- 
2.53.0


* [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (7 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster Sasha Levin
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Troy Mitchell, Mark Brown, Sasha Levin, lgirdwood, perex, tiwai,
	dlan, linux-sound, linux-riscv, spacemit, linux-kernel

From: Troy Mitchell <troy.mitchell@linux.spacemit.com>

[ Upstream commit 6b4afbaaa342eaa52172e0be5ef8d1fcbf9ff460 ]

Hardware constraints should be applied in the startup callback rather
than hw_params, as hw_params may be called too late for the constraints
to take effect properly.

Move the channel count and format constraints for I2S and DSP_A/DSP_B
modes into a new startup callback. This also tightens the I2S mode
channel constraint from 1-2 to exactly 2, matching the actual hardware
behavior.

Signed-off-by: Troy Mitchell <troy.mitchell@linux.spacemit.com>
Link: https://patch.msgid.link/20260429-k3-i2s-v1-2-2fe99db11ecb@linux.spacemit.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Step 1.1 Record: Subsystem `ASoC: spacemit`; action verb `move`; intent
is to apply PCM hardware constraints in `.startup` instead of
`.hw_params`.

Step 1.2 Record: Tags present: `Signed-off-by: Troy Mitchell`, `Link:
https://patch.msgid.link/20260429-k3-i2s-v1-2-2fe99db11ecb@linux.spacemit.com`,
`Signed-off-by: Mark Brown`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Reviewed-by:`, `Acked-by:`, or `Cc: stable`.

Step 1.3 Record: The body says constraints in `.hw_params` may be too
late to affect ALSA parameter negotiation, and that I2S should be
constrained to exactly 2 channels. Symptom is invalid PCM
formats/channel counts being allowed for the SpacemiT I2S mode. No crash
trace or affected kernel versions are stated.

Step 1.4 Record: This is a hidden bug fix: it is phrased as
moving/refactoring constraints, but it corrects when ALSA constraints
are installed and prevents unsupported hardware parameters.

Step 2.1 Record: One file changed, `sound/soc/spacemit/k1_i2s.c`, 32
insertions and 13 deletions. Modified functions/objects: new
`spacemit_i2s_startup`, existing `spacemit_i2s_hw_params`, and
`spacemit_i2s_dai_ops`. Scope is single-file, driver-local, surgical.

Step 2.2 Record: Before, the driver advertised broad 1-2 channel and
S16/S32 capabilities and only added mode-specific constraints inside
`.hw_params`, after parameters had already been selected. After,
`.startup` constrains I2S to 2ch/S16 and DSP_A/DSP_B to 1ch/S32 before
userspace parameter selection completes.
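
The ordering problem in Step 2.2 can be illustrated with a toy model of
the I2S case. This is only a sketch under stated assumptions: it does
not reproduce ALSA's real refinement algorithm, and `struct hw_space`,
the `FMT_*` bits, and every function name below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format bits for the model, not ALSA's SNDRV_PCM_FMTBIT_*. */
#define FMT_S16_LE (1u << 0)
#define FMT_S32_LE (1u << 1)

/* The set of configurations still allowed for one stream. */
struct hw_space {
	unsigned int ch_min;
	unsigned int ch_max;
	uint32_t formats;
};

/* What the driver advertises up front: 1-2 channels, S16 or S32. */
static struct hw_space advertised(void)
{
	return (struct hw_space){ 1, 2, FMT_S16_LE | FMT_S32_LE };
}

static void constrain_channels(struct hw_space *s,
			       unsigned int min, unsigned int max)
{
	if (s->ch_min < min)
		s->ch_min = min;
	if (s->ch_max > max)
		s->ch_max = max;
}

static void constrain_formats(struct hw_space *s, uint32_t mask)
{
	s->formats &= mask;
}

static bool fits(const struct hw_space *s, uint32_t fmt, unsigned int ch)
{
	return (s->formats & fmt) && ch >= s->ch_min && ch <= s->ch_max;
}

/* Old ordering: the request is judged first, constraints arrive after. */
static bool old_order_accepts(uint32_t fmt, unsigned int ch)
{
	struct hw_space s = advertised();
	bool ok = fits(&s, fmt, ch);

	constrain_channels(&s, 1, 2);	/* too late to affect 'ok' */
	constrain_formats(&s, FMT_S16_LE);
	return ok;
}

/* New ordering: constraints are in place before the request is judged. */
static bool new_order_accepts(uint32_t fmt, unsigned int ch)
{
	struct hw_space s = advertised();

	constrain_channels(&s, 2, 2);
	constrain_formats(&s, FMT_S16_LE);
	return fits(&s, fmt, ch);
}
```

In the model, an S32 stereo request passes under the old ordering and is
rejected under the new one, while S16 stereo passes in both. Mono S16 is
rejected only under the new ordering, which reflects the tightened
2-channel constraint rather than the timing change.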

Step 2.3 Record: Bug category is logic/correctness in hardware parameter
negotiation. There is no memory safety, locking, refcounting, or
resource lifetime change.

Step 2.4 Record: Fix quality is good: it moves existing constraints to
the ALSA callback used for early constraints and tightens I2S channels.
Regression risk is low but not zero: unsupported mono I2S streams will
now fail earlier, which is intended if the author’s hardware statement
is correct.

Step 3.1 Record: `git blame` shows the changed constraint code was
introduced by `fce217449075d` / local full SHA
`fce217449075d59b29052b8cdac567f0f3e22641`, “ASoC: spacemit: add i2s
support for K1 SoC”, first described as contained before `v6.19-rc1`.

Step 3.2 Record: No `Fixes:` tag in this candidate, so there was no tag
to follow. I inspected the driver-introduction commit instead.

Step 3.3 Record: Recent file history contains the original driver plus
two small later fixes: failure handling for `spacemit_i2s_init_dai()`
and an `sspa_clk` error-check fix. No prerequisite refactor is needed
for this patch.

Step 3.4 Record: Author Troy Mitchell authored the original K1 I2S
driver and one related later fix, so he is directly familiar with this
driver.

Step 3.5 Record: The patch is part of a 7-patch series, but `b4` and
`git apply --check` confirmed patch 2 applies cleanly standalone to the
current tree.

Step 4.1 Record: `b4 dig` against a temporary commit matched the lore
thread at the provided patch-msgid URL. `b4 dig -a` found only v1. The
full thread shows Mark Brown applied patch 2 to `broonie/sound.git
for-7.1` as commit `6b4afbaaa342`.

Step 4.2 Record: `b4 dig -w` shows ASoC/sound maintainers and lists were
included, including Mark Brown, Liam Girdwood, Jaroslav Kysela, Takashi
Iwai, `linux-sound`, and relevant SpacemiT/RISC-V lists.

Step 4.3 Record: No separate bug report, syzbot report, or user report
was linked for this specific patch.

Step 4.4 Record: Series context: patches 1-3 are described as K1 bug
fixes/refactoring; patches 4-7 are K3 feature/binding work. Mark Brown
explicitly noted fixes and new features had no textual overlap and
applied only patches 2 and 3 from the subset.

Step 4.5 Record: Stable-specific web search was blocked by lore Anubis,
and a local pending-branch grep was stopped after timing out. No stable-
specific discussion was verified.

Step 5.1 Record: Key function added/modified: `spacemit_i2s_startup`;
key function simplified: `spacemit_i2s_hw_params`.

Step 5.2 Record: Callers verified: `.startup` is called through
`snd_soc_dai_startup()` from `__soc_pcm_open()`. `.hw_params` is called
through `snd_soc_dai_hw_params()` from `__soc_pcm_hw_params()`.

Step 5.3 Record: Key callees are `snd_pcm_hw_constraint_minmax()` and
`snd_pcm_hw_constraint_mask64()` in startup, and clock/DMA/register
setup remains in `hw_params`.

Step 5.4 Record: Reachability is via ALSA PCM open and hw_params
operations; userspace audio applications can trigger the affected
negotiation path by opening/configuring a PCM device for this DAI.

Step 5.5 Record: Similar local ASoC pattern verified: many drivers
install constraints in `.startup`, and ASoC core comments confirm
startup is part of PCM open.

Step 6.1 Record: The file is absent at `v6.18` and present with the same
blob in `v6.19`, `v7.0`, `v7.0.3`, `pending-6.19`, and `pending-7.0`.
Likely relevant stable trees start at 6.19.y.

Step 6.2 Record: Backport difficulty is low: the patch applies cleanly
to the current 7.0.3 checkout, and the file blob is identical in checked
6.19/7.0 branches.

Step 6.3 Record: No related fix for this exact subject exists in checked
stable branches; the candidate commit exists locally as `6b4afbaaa342`
but is not contained in the checked pending stable branches I verified.

Step 7.1 Record: Subsystem is ASoC platform driver under
`sound/soc/spacemit`; criticality is PERIPHERAL/driver-specific.

Step 7.2 Record: Subsystem activity is low and localized: only four
local commits under `sound/soc/spacemit`, including the original driver
and small fixes.

Step 8.1 Record: Affected population is config/platform-specific:
`CONFIG_SND_SOC_K1_I2S`, depending on `COMPILE_TEST || ARCH_SPACEMIT`,
`HAVE_CLK`, and `DMA_CMA`.

Step 8.2 Record: Trigger is normal PCM device open/parameter selection
on SpacemiT K1 I2S hardware. Unprivileged reachability depends on local
audio device permissions, which I did not verify.

Step 8.3 Record: Failure mode is unsupported audio configuration being
accepted or constrained too late, causing bad/failed audio operation
rather than a verified crash, corruption, or security issue. Severity:
MEDIUM for affected hardware users.

Step 8.4 Record: Benefit is moderate: corrects real hardware parameter
negotiation for a supported driver. Risk is low: one driver, one file,
no ABI/API change, clean apply, and maintainer-applied.

Step 9.1 Record: Evidence for backporting: real driver correctness bug,
normal userspace audio path, introduced in stable-relevant 6.19-era
driver, small contained patch, clean apply, maintainer accepted.
Evidence against: no reporter/test tag, no crash/security/data-loss
evidence, exact “I2S is 2ch only” hardware fact not independently
verified outside author/maintainer review.

Step 9.2 Record: Stable rules: obviously correct enough: yes, based on
ALSA callback ordering and existing constraints; real bug: yes;
important issue: moderate hardware functionality bug, not critical;
small/contained: yes; no new API/feature: yes; can apply: yes to checked
6.19/7.0-family trees.

Step 9.3 Record: No automatic exception category like device ID, quirk,
DT-only, build-only, or documentation-only applies.

Step 9.4 Record: Decision is to backport. The risk is low and the patch
fixes a real, user-visible hardware configuration bug in a stable-
relevant driver.

## Verification
- [Phase 1] Parsed commit `6b4afbaaa342` with `git show`; confirmed tags
  and lack of `Fixes`/reporter/stable tags.
- [Phase 2] Inspected candidate diff; confirmed only `k1_i2s.c` changes,
  adding `.startup` and moving constraints out of `.hw_params`.
- [Phase 3] Ran `git blame` on changed lines; confirmed original code
  from `fce217449075d`.
- [Phase 3] Ran file history and author history; found only localized
  SpacemiT I2S changes.
- [Phase 4] Used `b4 am`, `b4 mbox`, and `b4 dig`; confirmed v1 thread,
  maintainer application to `for-7.1`, and reviewer/maintainer
  recipients.
- [Phase 5] Read ASoC core code; confirmed startup occurs during PCM
  open and hw_params occurs later during ALSA hardware parameter setup.
- [Phase 6] Checked tags/branches and patch application; confirmed file
  presence from `v6.19` onward and clean standalone apply.
- [Phase 7] Read `Kconfig`; confirmed driver-specific SpacemiT K1 I2S
  config scope.
- [Phase 8] Assessed trigger/failure from verified ALSA call paths and
  driver-advertised constraints.
- UNVERIFIED: Independent hardware documentation proving I2S mode is
  exactly 2 channels; this claim comes from the commit author and
  maintainer-accepted patch.
- UNVERIFIED: Stable mailing-list search results, because lore WebFetch
  was blocked and one local pending-branch grep was stopped after
  timeout.

**YES**

 sound/soc/spacemit/k1_i2s.c | 45 ++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/sound/soc/spacemit/k1_i2s.c b/sound/soc/spacemit/k1_i2s.c
index 1cb99f1abc7cd..bb73d32a1b097 100644
--- a/sound/soc/spacemit/k1_i2s.c
+++ b/sound/soc/spacemit/k1_i2s.c
@@ -106,6 +106,37 @@ static void spacemit_i2s_init(struct spacemit_i2s_dev *i2s)
 	writel(0, i2s->base + SSINTEN);
 }

+static int spacemit_i2s_startup(struct snd_pcm_substream *substream,
+	struct snd_soc_dai *dai)
+{
+	struct spacemit_i2s_dev *i2s = snd_soc_dai_get_drvdata(dai);
+
+	switch (i2s->dai_fmt & SND_SOC_DAIFMT_FORMAT_MASK) {
+	case SND_SOC_DAIFMT_I2S:
+		snd_pcm_hw_constraint_minmax(substream->runtime,
+					     SNDRV_PCM_HW_PARAM_CHANNELS,
+					     2, 2);
+		snd_pcm_hw_constraint_mask64(substream->runtime,
+					     SNDRV_PCM_HW_PARAM_FORMAT,
+					     SNDRV_PCM_FMTBIT_S16_LE);
+		break;
+	case SND_SOC_DAIFMT_DSP_A:
+	case SND_SOC_DAIFMT_DSP_B:
+		snd_pcm_hw_constraint_minmax(substream->runtime,
+					     SNDRV_PCM_HW_PARAM_CHANNELS,
+					     1, 1);
+		snd_pcm_hw_constraint_mask64(substream->runtime,
+					     SNDRV_PCM_HW_PARAM_FORMAT,
+					     SNDRV_PCM_FMTBIT_S32_LE);
+		break;
+	default:
+		dev_dbg(i2s->dev, "unexpected format type");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int spacemit_i2s_hw_params(struct snd_pcm_substream *substream,
 				  struct snd_pcm_hw_params *params,
 				  struct snd_soc_dai *dai)
@@ -157,22 +188,9 @@ static int spacemit_i2s_hw_params(struct snd_pcm_substream *substream,
 			dma_data->maxburst = 32;
 			dma_data->addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
 		}
-
-		snd_pcm_hw_constraint_minmax(substream->runtime,
-					     SNDRV_PCM_HW_PARAM_CHANNELS,
-					     1, 2);
-		snd_pcm_hw_constraint_mask64(substream->runtime,
-					     SNDRV_PCM_HW_PARAM_FORMAT,
-					     SNDRV_PCM_FMTBIT_S16_LE);
 		break;
 	case SND_SOC_DAIFMT_DSP_A:
 	case SND_SOC_DAIFMT_DSP_B:
-		snd_pcm_hw_constraint_minmax(substream->runtime,
-					     SNDRV_PCM_HW_PARAM_CHANNELS,
-					     1, 1);
-		snd_pcm_hw_constraint_mask64(substream->runtime,
-					     SNDRV_PCM_HW_PARAM_FORMAT,
-					     SNDRV_PCM_FMTBIT_S32_LE);
 		break;
 	default:
 		dev_dbg(i2s->dev, "unexpected format type");
@@ -303,6 +321,7 @@ static int spacemit_i2s_dai_remove(struct snd_soc_dai *dai)
 static const struct snd_soc_dai_ops spacemit_i2s_dai_ops = {
 	.probe = spacemit_i2s_dai_probe,
 	.remove = spacemit_i2s_dai_remove,
+	.startup = spacemit_i2s_startup,
 	.hw_params = spacemit_i2s_hw_params,
 	.set_sysclk = spacemit_i2s_set_sysclk,
 	.set_fmt = spacemit_i2s_set_fmt,
-- 
2.53.0


* [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (8 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Rosalie Wanders, Takashi Iwai, Sasha Levin, perex, tiwai,
	linux-sound, linux-kernel

From: Rosalie Wanders <rosalie@mailbox.org>

[ Upstream commit 110189f0268d0eb85895721526328cac5804a739 ]

This device, just like the Playstation 5's DualSense, has a volume
that's too low, hid-playstation solves this by raising the minimum
volume on the device itself by sending an output report, third party PS5
controllers/accessories do not support this output report format, so we
apply a quirk to raise the minimum volume by 6dB.

Signed-off-by: Rosalie Wanders <rosalie@mailbox.org>
Link: https://patch.msgid.link/20260426025520.3985-2-rosalie@mailbox.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem is `ALSA: usb-audio`; action verb is `apply`;
intent is to add a device-specific volume quirk for PlayStation PDP
Riffmaster controllers/accessories.

Step 1.2 Record: Tags present are:
- `Signed-off-by: Rosalie Wanders <rosalie@mailbox.org>`
- `Link:
  https://patch.msgid.link/20260426025520.3985-2-rosalie@mailbox.org`
- `Signed-off-by: Takashi Iwai <tiwai@suse.de>`
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Acked-by:`,
or `Cc: stable@vger.kernel.org` tags were present.

Step 1.3 Record: The commit says the PDP Riffmaster has too-low volume,
similar to the PS5 DualSense. It states `hid-playstation` compensates
for DualSense by sending an output report, but third-party PS5
controllers/accessories do not support that output report format, so
this patch raises the ALSA minimum volume by 6 dB. Symptom is functional
audio output being too quiet on this hardware. No affected kernel
version is stated.

Step 1.4 Record: This is not hidden as cleanup; it is explicitly a
hardware quirk/workaround. It fixes a device-specific functional
problem, not a crash, memory safety issue, or data corruption issue.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `sound/usb/mixer.c`, with 10
insertions and no removals. One function changed:
`volume_control_quirks()`. Scope is a single-file, single-switch-entry
hardware quirk.

Step 2.2 Record: Before, PDP Riffmaster USB IDs had no special handling,
so the device-reported volume range was used as-is. After, for USB IDs
`0x0e6f:0x024a` and `0x0e6f:0x0249`, if the ALSA control name is `PCM
Playback Volume`, the code logs the quirk and sets `cval->min = -2560`.
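
As a rough userspace illustration of the match described in Step 2.2,
the sketch below gates the override on the two USB IDs and the control
name, then converts the raw minimum to hundredths of a dB. Assumptions:
`struct mixer_value`, `apply_riffmaster_quirk()`, and
`raw_to_centi_db()` are hypothetical names, the raw value is taken to
use the USB Audio Class encoding of 1/256 dB per unit, and the logging
done by the real quirk is omitted.

```c
#include <string.h>

/* Pack vendor and product IDs into one value usable as a case label. */
#define USB_ID(vendor, product) (((unsigned int)(vendor) << 16) | (product))

/* Hypothetical stand-in for the per-control range the mixer tracks. */
struct mixer_value {
	int min;
	int max;
};

/*
 * Override the minimum for the two Riffmaster IDs, but only on the
 * playback volume control. Returns 1 if the quirk was applied.
 */
static int apply_riffmaster_quirk(unsigned int usb_id, const char *ctl_name,
				  struct mixer_value *cval)
{
	switch (usb_id) {
	case USB_ID(0x0e6f, 0x024a):
	case USB_ID(0x0e6f, 0x0249):
		if (!strcmp(ctl_name, "PCM Playback Volume")) {
			cval->min = -2560;
			return 1;
		}
		break;
	}
	return 0;
}

/* Convert a raw value in 1/256 dB units to hundredths of a dB. */
static int raw_to_centi_db(int raw)
{
	return (raw * 100) / 256;
}
```

Under that encoding assumption, the new minimum of -2560 corresponds to
-1000 hundredths of a dB, i.e. -10 dB. Other devices and other control
names are left untouched.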

Step 2.3 Record: Bug category is hardware workaround / device-specific
USB-audio volume quirk. The mechanism is correcting the exposed minimum
playback-volume value before ALSA dB conversion and control
registration.

Step 2.4 Record: Fix quality is high: tiny, isolated, matches existing
nearby patterns for device-specific volume quirks, changes no API, and
is gated by exact USB IDs plus exact control name. Regression risk is
very low and limited to the two PDP Riffmaster IDs.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows `volume_control_quirks()` was
introduced by `dcaaf9f2c16b56` and is present since `v3.2-rc2~23^2~3`.
The immediate insertion context includes the Asus USB DAC quirk from
`4020d1ccbe55bd` (`v5.7~16^2~2`) and, on newer branches, MOONDROP
quirks. For this commit, the “buggy code” is effectively the absence of
a quirk for this hardware, not a bad prior line.

Step 3.2 Record: No `Fixes:` tag is present, so there is no introducing
commit to follow.

Step 3.3 Record: Recent `sound/usb/mixer.c` history contains multiple
related USB-audio volume/control fixes and quirks, including MOONDROP,
Huawei, MS LifeChat, and volume-range checking work. No prerequisite
commit was identified for the Riffmaster quirk itself.

Step 3.4 Record: `git log --author='Rosalie Wanders'` under `sound/usb`
found this one sound commit. The patch was committed by Takashi Iwai,
and `MAINTAINERS` lists Takashi Iwai and Jaroslav Kysela as `SOUND`
maintainers.

Step 3.5 Record: No functional dependencies found. The code assumes only
existing `USB_ID()` matching and `volume_control_quirks()`
infrastructure. Backport may need minor contextual placement on some
branches.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 110189f0268d0` found the original thread at
`https://patch.msgid.link/20260426025520.3985-2-rosalie@mailbox.org`.
`b4 dig -a` found only v1. Patchew and b4 showed Takashi Iwai replied
“Applied now. Thanks.” No objections or NAKs found.

Step 4.2 Record: `b4 dig -w` showed recipients: Rosalie Wanders,
Jaroslav Kysela, Takashi Iwai, `linux-sound@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`. The appropriate sound maintainers/list
were included.

Step 4.3 Record: The only `Link:` is the patch submission itself. No
separate bugzilla, syzbot, or user report link was found.

Step 4.4 Record: The patch is standalone v1, not part of a multi-patch
series. No related Riffmaster patches were found in local
`origin/master` history.

Step 4.5 Record: Direct lore stable search was blocked by Anubis, and
web search did not find stable-list discussion for PDP Riffmaster. No
stable-specific nomination or rejection was verified.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: Modified function: `volume_control_quirks()`.

Step 5.2 Record: Call path verified manually: USB probe creates streams
and mixer via `snd_usb_create_mixer()`, mixer parsing reaches
`snd_usb_mixer_controls()`, feature units call `build_feature_ctl()`,
which calls `get_min_max_with_quirks()`, which calls
`volume_control_quirks()` when a `kctl` exists.

Step 5.3 Record: Relevant callees are `usb_audio_info()` and later ALSA
dB/range handling in `get_min_max_with_quirks()`. The patch changes
`cval->min` before `dBmin`, `dBmax`, initialization, and control
registration.

Step 5.4 Record: Reachability is via USB device enumeration and ALSA
mixer creation for this specific hardware. Userspace impact is through
normal audio/mixer use after the device is connected. I did not verify
the actual device descriptor/control name on hardware.

Step 5.5 Record: Similar patterns exist in the same switch for
CM102-A+/102S+, Asus USB DAC, MOONDROP Quark2, Huawei CM-Q3, and
MOONDROP JU Jiu. A similar MOONDROP JU Jiu quirk was explicitly tagged
for stable.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: Checked `remotes/stable/linux-5.10.y`, `5.15.y`,
`6.1.y`, `6.6.y`, `6.12.y`, `6.18.y`, `6.19.y`, and `7.0.y`: all contain
`volume_control_quirks()` and the call from `get_min_max_with_quirks()`.
None contained the Riffmaster IDs.

Step 6.2 Record: Current `7.0.y` working tree accepts the patch with
`git apply --check`. `git merge-tree` suggests most checked branches can
merge the addition automatically, but `5.15.y` shows a minor context
conflict around another quirk entry after the Asus DAC entry.

Step 6.3 Record: No related Riffmaster fix already present in the
checked stable branches or `origin/master` history beyond
`110189f0268d0`.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is ALSA USB audio under `sound/usb`.
Criticality is driver-specific/peripheral: it affects users of the PDP
Riffmaster hardware, not the whole system.

Step 7.2 Record: Recent history shows `sound/usb` is actively
maintained, with nearby fixes and quirks. The patch was committed by a
listed sound maintainer.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected population is hardware-specific: PDP
Riffmaster PS4/PS5 users with USB-audio playback exposed as `PCM
Playback Volume`.

Step 8.2 Record: Trigger is connecting/using that hardware and its ALSA
playback volume control. This is common for affected users but
irrelevant to everyone else. Unprivileged users can observe/use ALSA
controls depending on system policy, but the patch itself runs at device
enumeration/control setup.

Step 8.3 Record: Failure mode is too-low audio volume / poor device
usability, not a crash or corruption issue. Severity is medium for
affected hardware.

Step 8.4 Record: Benefit is high for affected device users, with zero
impact on other devices. Risk is very low because the change is 10
lines, gated on exact USB IDs, and follows existing quirk-table style.
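
The gating can be illustrated with a small userspace sketch. The `USB_ID()` macro here is assumed to mirror the kernel's vendor/product packing, and `struct quirk_ctl` and `apply_riffmaster_quirk()` are hypothetical stand-ins, not the real `volume_control_quirks()` signature.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror the kernel's packing: vendor ID in the high 16 bits. */
#define USB_ID(vendor, product) (((unsigned int)(vendor) << 16) | (product))

/* Hypothetical stand-in for the one field of the mixer element used here. */
struct quirk_ctl {
	int min;
};

/* Returns 1 if the quirk fired, 0 if the device or control did not match. */
int apply_riffmaster_quirk(unsigned int usb_id, const char *ctl_name,
			   struct quirk_ctl *cval)
{
	switch (usb_id) {
	case USB_ID(0x0e6f, 0x024a): /* PDP Riffmaster for PS4 */
	case USB_ID(0x0e6f, 0x0249): /* PDP Riffmaster for PS5 */
		if (!strcmp(ctl_name, "PCM Playback Volume")) {
			cval->min = -2560;
			return 1;
		}
		break;
	}
	return 0;
}
```

Both conditions must hold for the value to change: any other USB ID, or any other control name on the same device, leaves `min` untouched.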

## Phase 9: Final Synthesis
Step 9.1 Record:
Evidence for backporting: this is a classic hardware quirk, small and
contained, maintainer-applied, affects real hardware, and all checked
active stable branches have the infrastructure. Hardware quirks are an
accepted stable exception.
Evidence against: no `Cc: stable`, no `Tested-by`, no separate bug
report, and the failure is not crash/security/data corruption. `5.15.y`
may need minor manual placement.
Unresolved: no hardware test or descriptor dump was available to
independently verify that the device exposes the exact `PCM Playback
Volume` control.

Step 9.2 Stable rules:
1. Obviously correct and tested? Mostly yes on correctness; no explicit
   `Tested-by` or local hardware test.
2. Fixes a real bug? Yes, per commit and mailing-list submission: too-
   low volume on specific hardware.
3. Important issue? Medium severity hardware usability issue; stable
   exception applies for quirks.
4. Small and contained? Yes, 10 lines in one switch.
5. No new features/APIs? Yes, no API or broad behavior change.
6. Can apply to stable? Yes for current tree; likely minor context
   handling for at least `5.15.y`.

Step 9.3 Exception category Record: Hardware quirk/workaround for
specific USB devices. This is a stable-appropriate exception category.

Step 9.4 Decision: Backport. The technical risk is extremely low, the
change is device-specific, and stable policy explicitly allows
quirks/workarounds for real hardware problems.

## Verification
- Phase 1: Parsed `git show --format=fuller --stat --patch
  110189f0268d0`; confirmed subject, tags, commit body, author,
  committer, and 10-line diff.
- Phase 2: Verified the only code change is adding two `USB_ID()` cases
  in `volume_control_quirks()`.
- Phase 3: Ran `git blame` around the insertion context; identified
  historical context commits and version containment via `git describe
  --contains`.
- Phase 3: Ran local history searches without `--all`; found no
  prerequisite Riffmaster commits.
- Phase 4: Ran `b4 dig -c`, `b4 dig -a`, and `b4 dig -w`; confirmed one
  v1 patch, original recipients, and lore URL.
- Phase 4: Used b4 mbox and Patchew to read the thread; confirmed
  Takashi’s “Applied now. Thanks.” reply and no objections.
- Phase 4: Stable lore WebFetch was blocked by Anubis; WebSearch found
  no stable-specific Riffmaster discussion.
- Phase 5: Read `sound/usb/mixer.c`, `sound/usb/card.c`, and
  `sound/usb/quirks.c`; traced the mixer creation and feature-control
  path to `volume_control_quirks()`.
- Phase 6: Checked stable branches `5.10.y`, `5.15.y`, `6.1.y`, `6.6.y`,
  `6.12.y`, `6.18.y`, `6.19.y`, and `7.0.y`; confirmed infrastructure
  exists and Riffmaster IDs are absent.
- Phase 6: Ran `git apply --check` on current tree; passed. Ran `git
  merge-tree`; noted a minor conflict marker for `5.15.y`.
- Phase 7: Verified `MAINTAINERS` lists Jaroslav Kysela and Takashi Iwai
  for `SOUND`.
- Phase 8: Verified scope is exact-device gated by USB IDs and exact
  mixer control name.
- UNVERIFIED: Actual PDP Riffmaster USB descriptors and hardware
  behavior were not locally tested.
- UNVERIFIED: Older branches outside the checked active stable set were
  not analyzed.

**YES**

 sound/usb/mixer.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index 1ced9ba8be406..75c932ea77388 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -1190,6 +1190,16 @@ static void volume_control_quirks(struct usb_mixer_elem_info *cval,
 			cval->res = 1;
 		}
 		break;
+
+	case USB_ID(0x0e6f, 0x024a): /* PDP Riffmaster for PS4 */
+	case USB_ID(0x0e6f, 0x0249): /* PDP Riffmaster for PS5 */
+		if (!strcmp(kctl->id.name, "PCM Playback Volume")) {
+			usb_audio_info(chip,
+				"set volume quirk for PDP Riffmaster for PS4/PS5\n");
+			cval->min = -2560; /* Mute under it */
+		}
+		break;
+
 	case USB_ID(0x3302, 0x12db): /* MOONDROP Quark2 */
 		if (!strcmp(kctl->id.name, "PCM Playback Volume")) {
 			usb_audio_info(chip,
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (9 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists Sasha Levin
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Alistair Francis, Chris Leech, Hannes Reinecke, Keith Busch,
	Sasha Levin, sagi, kch, linux-nvme, linux-kernel

From: Alistair Francis <alistair.francis@wdc.com>

[ Upstream commit 5fc422951c962cc01e654950fc043ebd8fadd865 ]

Currently after the host sends a REPLACETLSPSK we free the TLS keys as
part of calling nvmet_auth_sq_free() on success. This means when the
host sends a follow up REPLACETLSPSK we return CONCAT_MISMATCH as the
check for !nvmet_queue_tls_keyid(req->sq) fails.

A previous attempt to fix this involved not calling nvmet_auth_sq_free()
on successful connections, but that results in memory leaks. Instead we
should not clear `tls_key` in nvmet_auth_sq_free(), as that was
incorrectly wiping the tls keys which are used for the session.

This patch ensures we correctly free the ephemeral session key on
connection, yet we don't free the TLS key unless closing the connection.

Reviewed-by: Chris Leech <cleech@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics

Step 1.1 Record: subsystem `nvmet-tcp` / NVMe target TCP; action verb
“Don’t clear”; intent is to stop `nvmet_auth_sq_free()` from wiping
`sq->tls_key` during authentication cleanup.

Step 1.2 Record: tags in upstream commit `5fc422951c962`: `Reviewed-by:
Chris Leech <cleech@redhat.com>`, `Reviewed-by: Hannes Reinecke
<hare@suse.de>`, `Signed-off-by: Alistair Francis
<alistair.francis@wdc.com>`, `Signed-off-by: Keith Busch
<kbusch@kernel.org>`. No `Fixes:`, no `Reported-by:`, no `Cc: stable`.

Step 1.3 Record: the described bug is deterministic: after a host sends
`REPLACETLSPSK`, successful auth cleanup calls `nvmet_auth_sq_free()`,
which clears `sq->tls_key`; a later `REPLACETLSPSK` then fails the
`!nvmet_queue_tls_keyid(req->sq)` check and returns `CONCAT_MISMATCH`.
The commit also states the earlier approach of not calling
`nvmet_auth_sq_free()` leaked memory.
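
The failure can be modelled in a few lines of userspace C. All names below (`demo_key`, `demo_sq`, `demo_queue_tls_keyid`, `demo_replace_tls_psk`) are hypothetical stand-ins, and returning a key serial as the key ID is an assumption. The only behaviour taken from the analysis is that a NULL `tls_key` yields key ID 0 and that a zero key ID produces `CONCAT_MISMATCH`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the key and submission queue objects. */
struct demo_key { int serial; };
struct demo_sq { struct demo_key *tls_key; };

enum demo_status { DEMO_OK = 0, DEMO_CONCAT_MISMATCH = 1 };

/* No key pointer means key ID 0, as described for nvmet_queue_tls_keyid(). */
int demo_queue_tls_keyid(const struct demo_sq *sq)
{
	return sq->tls_key ? sq->tls_key->serial : 0;
}

/* Models only the REPLACETLSPSK precondition, nothing else. */
enum demo_status demo_replace_tls_psk(const struct demo_sq *sq)
{
	if (!demo_queue_tls_keyid(sq))
		return DEMO_CONCAT_MISMATCH;
	return DEMO_OK;
}
```

With the key still attached the request passes the check; once the cleanup has set `tls_key` to NULL, every later request fails the same way.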

Step 1.4 Record: this is not a hidden cleanup; it is an explicit
correctness fix for broken TLS PSK replacement and for preserving the
key until connection close.

## Phase 2: Diff Analysis

Step 2.1 Record: one file changed, `drivers/nvme/target/auth.c`;
upstream diff is 3 deletions. Modified function: `nvmet_auth_sq_free()`.
Scope: single-file surgical fix.

Step 2.2 Record: before, `nvmet_auth_sq_free()` canceled auth timeout
work, cleared `sq->tls_key`, then freed DH-CHAP ephemeral buffers.
After, it still cancels work and frees `dhchap_c1`, `dhchap_c2`, and
`dhchap_skey`, but no longer clears the TCP TLS key.

Step 2.3 Record: bug category is logic/resource lifetime.
`nvmet_tcp_tls_key_lookup()` stores a key from `nvme_tls_key_lookup()`
into `queue->nvme_sq.tls_key`; `key_lookup()` increments the key
refcount. `nvmet_sq_put_tls_key()` later does `key_put()` only if
`sq->tls_key` is still set. Clearing the pointer in
`nvmet_auth_sq_free()` both breaks later `REPLACETLSPSK` validation and
prevents the normal close path from seeing the key pointer.
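
The reference-lifetime argument can be sketched with a toy refcount. This is an illustration under stated assumptions, not the kernel keyring API: `ref_key`, `ref_sq`, and the `toy_*` functions are hypothetical, and the refcount is a plain integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real key object is refcounted by the keyring. */
struct ref_key { int refcount; };
struct ref_sq { struct ref_key *tls_key; };

/* Lookup takes a reference, as the analysis says key_lookup() does. */
void toy_tls_key_lookup(struct ref_sq *sq, struct ref_key *key)
{
	key->refcount++;
	sq->tls_key = key;
}

/* Pre-patch auth cleanup: forgets the pointer without dropping the ref. */
void toy_auth_sq_free_old(struct ref_sq *sq)
{
	sq->tls_key = NULL;
}

/* Post-patch auth cleanup: leaves the TLS key alone. */
void toy_auth_sq_free_new(struct ref_sq *sq)
{
	(void)sq;
}

/* Close path: drops the reference only if the pointer is still set. */
void toy_sq_put_tls_key(struct ref_sq *sq)
{
	if (sq->tls_key) {
		sq->tls_key->refcount--;
		sq->tls_key = NULL;
	}
}
```

Running lookup, old cleanup, then the close path leaves the count at 1, because the close path no longer sees a pointer to release. With the new cleanup the same sequence returns the count to 0.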

Step 2.4 Record: fix quality is high. The patch removes only incorrect
ownership cleanup from the per-auth cleanup helper. Regression risk is
low because the verified TCP queue release path calls
`nvmet_sq_put_tls_key()` before `nvmet_sq_destroy()`, and non-TCP
transports do not populate `sq->tls_key`.

## Phase 3: Git History Investigation

Step 3.1 Record: `git blame` on the changed area shows `sq->tls_key`
handling was introduced by `fa2e0f8bbc689` (“nvmet-tcp: support secure
channel concatenation”), first contained at `v6.15-rc1~166^2^2~13`;
`sq->tls_key = NULL` was later style-adjusted by `b1efcc470eb30`. The
older `nvmet_auth_sq_free()` DH-CHAP cleanup itself dates to the
original auth code around v6.0, but the TLS-specific bug begins with
`fa2e0f8bbc689`.

Step 3.2 Record: no `Fixes:` tag is present. I manually inspected the
likely introducing commit `fa2e0f8bbc689`, which added secure channel
concatenation, `sq->tls_key`, `nvmet_queue_tls_keyid()`,
`nvmet_sq_put_tls_key()`, and the clear in `nvmet_auth_sq_free()`.

Step 3.3 Record: recent related upstream history includes
`2e6eb6b277f59` (“Don’t free SQ on authentication success”),
`f920ebd03cd13` reverting that due to leaks, and this commit
`5fc422951c962`. This commit was submitted as patch 2/2 after the
revert. On the checked stable branches, `2e6eb6b277f59` is not an
ancestor, so this patch can apply standalone there.

Step 3.4 Record: Alistair Francis has multiple recent NVMe target
auth/TLS commits in this subsystem, including `ecf4d2d883515`,
`2e6eb6b277f59`, `f920ebd03cd13`, `5fc422951c962`, and `5d10069e1a169`.

Step 3.5 Record: dependencies are minimal. The patch depends on
`sq->tls_key` and `nvmet_auth_sq_free()` existing, which local stable
refs show in `6.15.y` and newer. If a target stable tree had already
taken `2e6eb6b277f59`, then the paired revert `f920ebd03cd13` would also
be needed; in the local stable refs checked, that prerequisite is not
needed.

## Phase 4: Mailing List And External Research

Step 4.1 Record: `b4 dig -c 5fc422951c962...` found the original
submission at
`https://patch.msgid.link/20260417004809.2894745-2-alistair.francis@wdc.com`.
`b4 dig -a` found only a v1 two-patch series.
The committed version is the reviewed/applied version.

Step 4.2 Record: `b4 dig -w` showed the patch was sent to the expected
NVMe/block audience: Hannes Reinecke, Christoph Hellwig, Sagi Grimberg,
Chaitanya Kulkarni, Keith Busch, linux-nvme, linux-block, linux-kernel,
Yi Zhang, Maurizio Lombardi, and Shinichiro Kawasaki. Lore replies show
`Reviewed-by` from Hannes Reinecke and Chris Leech, and Keith Busch
replied that patches 1 and 2 were applied to `nvme-7.1`.

Step 4.3 Record: no direct bug-report link is in this commit. The paired
revert references a kmemleak report from Yi Zhang during blktests
(`nvme/041`, `nvme/042`, `nvme/043`, `nvme/044`, `nvme/045`, `nvme/051`,
`nvme/052`), confirming the earlier “don’t call cleanup” approach leaked
DH-CHAP allocations.

Step 4.4 Record: series context is important: patch 1 reverts the
earlier workaround that skipped `nvmet_auth_sq_free()` on success; patch
2, this commit, fixes the root cause by keeping cleanup but no longer
clearing `tls_key`.

Step 4.5 Record: stable-specific web search through
`lore.kernel.org/stable` was blocked by Anubis; the `yhbt.net` stable
path returned 404. I found an AUTOSEL posting for the earlier
`2e6eb6b277f59` workaround, but did not use that as selection evidence.

## Phase 5: Code Semantic Analysis

Step 5.1 Record: modified function is `nvmet_auth_sq_free()`.

Step 5.2 Record: callers found by `rg`: `nvmet_execute_auth_send()`,
`nvmet_execute_auth_receive()`, and `nvmet_sq_destroy()`. Auth
send/receive are assigned as handlers for `nvme_fabrics_type_auth_send`
and `nvme_fabrics_type_auth_receive` in `fabrics-cmd.c`.

Step 5.3 Record: key callees around the affected flow include
`nvmet_queue_tls_keyid()`, `nvmet_auth_insert_psk()`,
`nvmet_tcp_tls_key_lookup()`, `nvme_tls_key_lookup()`, `key_lookup()`,
`nvmet_sq_put_tls_key()`, and `key_put()`.

Step 5.4 Record: reachability is verified through the NVMe/TCP receive
path: `nvmet_tcp_done_recv_pdu()` initializes a request, assigns/uses
the fabrics auth execute handler, and runs `req->execute()`. This is
reachable to an NVMe-oF host using target auth/TLS, not a theoretical
internal path.

Step 5.5 Record: similar lifetime pattern found: the only proper TCP TLS
SQ key release helper is `nvmet_sq_put_tls_key()`, called from
`nvmet_tcp_release_queue_work()`. `nvmet_auth_sq_free()` is a DH-CHAP
ephemeral cleanup helper and should not own the TLS key.

## Phase 6: Cross-Referencing And Stable Tree Analysis

Step 6.1 Record: local stable refs show `sq->tls_key = NULL` in
`nvmet_auth_sq_free()` exists in `stable/linux-6.15.y`, `6.16.y`,
`6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`; it is absent from `6.14.y`,
`6.13.y`, `6.12.y`, `6.6.y`, and `6.1.y`.

Step 6.2 Record: `git apply --check` of the candidate patch succeeds on
local `stable/linux-6.15.y` through `stable/linux-7.0.y`, and fails on
`6.14.y` because the buggy TLS key clearing code is not there. Expected
backport difficulty: clean for affected local refs.

Step 6.3 Record: bounded related-history searches found no existing
equivalent fix in `stable/linux-7.0.y`. Upstream has the earlier
attempted fix `2e6eb6b277f59`, its revert `f920ebd03cd13`, and this
final fix.

## Phase 7: Subsystem And Maintainer Context

Step 7.1 Record: subsystem is NVMe target over TCP, with authentication
and TLS secure channel concatenation. Criticality: IMPORTANT, because it
affects networked storage authentication/TLS behavior, though not a
universal core-kernel path.

Step 7.2 Record: subsystem is active. Recent history in
`drivers/nvme/target` includes multiple auth/TLS fixes and refactors,
including secure concatenation support and follow-up fixes through the
6.15 to 7.1 development window.

## Phase 8: Impact And Risk Assessment

Step 8.1 Record: affected users are systems using NVMe target TCP with
`CONFIG_NVME_TARGET_AUTH` and `CONFIG_NVME_TARGET_TCP_TLS`, specifically
secure channel concatenation / TLS PSK replacement.

Step 8.2 Record: trigger is a host sending a follow-up `REPLACETLSPSK`
after successful authentication on a TLS-enabled admin queue. The
failure is deterministic from the verified code path. I did not verify
that an unprivileged local user can trigger it; this is a
remote/protocol operation by an NVMe host with access to the target.

Step 8.3 Record: failure mode is protocol/authentication failure with
`CONCAT_MISMATCH`, breaking TLS PSK replacement/key rotation.
Additionally, because `key_lookup()` increments the key refcount and the
close path only `key_put()`s if `sq->tls_key` remains set, clearing the
pointer early risks leaking the key reference. Severity: HIGH for
affected NVMe/TCP secure-channel users; not a crash/data-corruption fix.

Step 8.4 Record: benefit is high for affected users because it fixes a
broken authenticated TLS key replacement flow and restores the verified
close-time key release path. Risk is very low: 3 deleted lines, no new
API, no new feature, no locking changes, no cross-subsystem behavior
change.

## Phase 9: Final Synthesis

Step 9.1 Record: evidence for backporting: real deterministic protocol
failure; possible key-ref lifetime leak; tiny one-function patch;
reviewed by relevant NVMe/TLS people; applied by NVMe maintainer;
cleanly applies to affected stable refs; affects released stable
branches from 6.15.y onward. Evidence against: affects a relatively
specialized NVMe/TCP TLS configuration and needs no backport before
6.15.y. Unresolved: I could not access `lore.kernel.org/stable` directly
due to Anubis, and I did not run runtime blktests.

Step 9.2 Stable rules checklist: obviously correct and reviewed: yes.
Fixes a real bug: yes. Important issue: yes for NVMe/TCP TLS key
replacement and key lifetime. Small and contained: yes, 3 deleted lines
in one function. No new features/APIs: yes. Applies to stable trees: yes
for local `6.15.y` through `7.0.y`; not applicable to older checked
branches.

Step 9.3 Exception category: not a device ID, quirk, DT, build, or
documentation exception.

Step 9.4 Decision: backport to stable trees that contain `fa2e0f8bbc689`
/ the `sq->tls_key` clearing code, namely local refs `6.15.y` and newer.
Do not apply to older trees where the code is absent. If a target branch
contains the earlier workaround `2e6eb6b277f59`, include/reconcile the
paired revert `f920ebd03cd13`; otherwise this commit stands alone.

## Verification

- Phase 1: Found upstream commit `5fc422951c962` with `git log
  origin/master --grep`.
- Phase 1: Parsed commit tags from `git show --format=fuller --stat
  --patch 5fc422951c962`.
- Phase 2: Verified the diff removes only the
  `CONFIG_NVME_TARGET_TCP_TLS` block clearing `sq->tls_key`.
- Phase 2: Verified `nvmet_queue_tls_keyid()` returns 0 when
  `sq->tls_key` is NULL.
- Phase 2: Verified `REPLACETLSPSK` returns `CONCAT_MISMATCH` when
  `nvmet_queue_tls_keyid(req->sq)` is false.
- Phase 2: Verified `nvmet_sq_put_tls_key()` calls `key_put()` and NULLs
  the key on TCP queue release.
- Phase 2: Verified `nvme_tls_key_lookup()` uses `key_lookup()`, and
  `key_lookup()` increments the key refcount.
- Phase 3: Ran `git blame` on `auth.c`; TLS key clearing originates from
  `fa2e0f8bbc689`/`b1efcc470eb30`.
- Phase 3: Ran `git describe --contains fa2e0f8bbc689`; first contained
  at `v6.15-rc1`.
- Phase 3: Inspected `fa2e0f8bbc689`, `b1efcc470eb30`, `2e6eb6b277f59`,
  `f920ebd03cd13`, and `ecf4d2d883515`.
- Phase 4: Ran `b4 dig -c`, `b4 dig -a`, and `b4 dig -w`; found v1 two-
  patch lore series and recipient list.
- Phase 4: Fetched lore mirror thread; verified Hannes Reinecke and
  Chris Leech reviewed, Keith Busch applied patches 1 and 2.
- Phase 4: Fetched Yi Zhang kmemleak report linked by the paired revert.
- Phase 5: Used `rg` and file reads to trace auth command handlers, TCP
  receive execution, TLS key lookup, and queue release.
- Phase 6: Checked local stable refs with scripted `git show`; buggy
  code exists in `6.15.y` through `7.0.y`, absent in older checked
  stable refs.
- Phase 6: Ran `git apply --check` against local stable worktrees; clean
  for `6.15.y` through `7.0.y`, not applicable to `6.14.y`.
- Phase 7: Reviewed recent `drivers/nvme/target` history showing active
  auth/TLS development.
- Phase 8: Verified trigger and failure path from
  `nvmet_auth_negotiate()` and auth send/receive call flow.
- Unverified: no runtime blktests were run.
- Unverified: direct `lore.kernel.org/stable` search was blocked by
  Anubis; the mirror stable search path returned 404.

**YES**

 drivers/nvme/target/auth.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/nvme/target/auth.c b/drivers/nvme/target/auth.c
index 2eadeb7e06f26..3a905124afdee 100644
--- a/drivers/nvme/target/auth.c
+++ b/drivers/nvme/target/auth.c
@@ -239,9 +239,6 @@ u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq)
 void nvmet_auth_sq_free(struct nvmet_sq *sq)
 {
 	cancel_delayed_work(&sq->auth_expired_work);
-#ifdef CONFIG_NVME_TARGET_TCP_TLS
-	sq->tls_key = NULL;
-#endif
 	kfree(sq->dhchap_c1);
 	sq->dhchap_c1 = NULL;
 	kfree(sq->dhchap_c2);
-- 
2.53.0


* [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (10 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p Sasha Levin
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Pablo Neira Ayuso, Paul E. McKenney, Sasha Levin, frederic,
	neeraj.upadhyay, joelagnelf, josh, boqun, urezki, rcu,
	linux-kernel

From: Pablo Neira Ayuso <pablo@netfilter.org>

[ Upstream commit f902877b635551513729bdf9a8d1422c4aab7741 ]

This patch adds a helper function, list_splice_rcu(), to safely splice
a private (non-RCU-protected) list into an RCU-protected list.

The function ensures that only the pointer visible to RCU readers
(prev->next) is updated using rcu_assign_pointer(), while the rest of
the list manipulations are performed with regular assignments, as the
source list is private and not visible to concurrent RCU readers.

This is useful for moving elements from a private list into a global
RCU-protected list, ensuring safe publication for RCU readers.
Subsystems with some sort of batching mechanism from userspace can
benefit from this new function.

The function __list_splice_rcu() has been added for clarity and to
follow the same pattern as in the existing list_splice*() interfaces,
where there is a check to ensure that the list to splice is not
empty. Note that __list_splice_rcu() has no documentation for this
reason.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough

Phase 1 Record: Subject subsystem is `rculist`; action verb is `add`;
claimed intent is adding `list_splice_rcu()` for splicing private lists
into RCU-protected lists. Tags verified: `Reviewed-by: Paul E.
McKenney`, `Signed-off-by: Pablo Neira Ayuso`. No `Fixes:`,
`Reported-by:`, `Tested-by:`, `Cc: stable`, or bug-report `Link:` in this commit.
The body describes an RCU publication helper, not a direct user-visible
failure. Hidden-fix assessment: as a standalone commit, it is an
internal helper; as verified from the series, it is required by the
following `nf_tables` RCU list fix.

Phase 2 Record: One file changed, `include/linux/rculist.h`, 29
insertions. New functions: `__list_splice_rcu()` and
`list_splice_rcu()`. Before: no helper for private-list-to-RCU-list
splice. After: list links are prepared with normal stores, then
`prev->next` is published with `rcu_assign_pointer()`. Bug category: RCU
synchronization/publication correctness, but the direct behavioral fix
is in the dependent `nf_tables` patch. Fix quality: small and clear;
regression risk low, because it is unused unless paired with callers.
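
The publication order described above can be sketched in userspace C. This is a rough model, not the kernel helper: `struct node`, `node_init()`, `node_add_tail()`, and `sketch_splice()` are hypothetical names, and the GCC/Clang builtin `__atomic_store_n()` with release ordering is an assumed stand-in for `rcu_assign_pointer()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list node, modelled on list_head. */
struct node {
	struct node *next, *prev;
};

void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

void node_add_tail(struct node *item, struct node *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Splice the private list @list right after @head of the shared list. */
void sketch_splice(struct node *list, struct node *head)
{
	struct node *first, *last, *next;

	if (list->next == list)	/* empty private list: nothing to do */
		return;

	first = list->next;
	last = list->prev;
	next = head->next;

	/* Plain stores: these links are not yet reachable by readers. */
	last->next = next;
	first->prev = head;
	next->prev = last;

	/* Single publishing store, ordered after the stores above. */
	__atomic_store_n(&head->next, first, __ATOMIC_RELEASE);
}
```

A reader walking `next` pointers from `head` sees either the old list or the fully linked new one, because the only forward link that changes on the shared list is written last. As in the patch, the private list head is not reinitialised by the splice.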

Phase 3 Record: `git blame` showed neighboring RCU list helpers are
long-standing code; new lines have no prior blame. The dependent fix
targets `78d9f48f7f44` first contained by `v5.8-rc1` and `b9703ed44ffb`
first contained by `v6.4-rc1`. Local blame confirms the unsafe
`nf_tables` `list_splice()` sites came from those commits. Recent
history shows the helper is paired with `netfilter: nf_tables: join hook
list via splice_list_rcu() in commit phase`. Author context verified:
Pablo is listed as `NETFILTER` maintainer in `MAINTAINERS`.

Phase 4 Record: `b4 dig -c b93a4320874cf` found the original patch
discussion by patch-id, with the applied upstream helper as `[PATCH net
10/11]`. `b4 dig -a` found multiple revisions/submissions, including
v2/v3 and net pull submissions. `b4 dig -w` showed netfilter/netdev
maintainers and lists were included. Lore/Patchwork review verified Paul
requested documentation/order changes in early review, then gave
`Reviewed-by`; no NAK found for the helper. The cover letter explicitly
says this helper is required to fix unsafe splice into RCU-protected
hook lists. Stable lore search was blocked/inconclusive.

Phase 5 Record: Key functions are `list_splice_rcu()` and
`__list_splice_rcu()`. Verified call sites on the candidate branch are
only the two `nf_tables_commit()` updates from the dependent patch.
`nf_tables_commit()` is the `nfnetlink` batch commit callback;
`NFT_MSG_NEWCHAIN` and `NFT_MSG_NEWFLOWTABLE` are batch netlink
operations, while `GETCHAIN`/`GETFLOWTABLE` callbacks are RCU readers.
`nfnetlink_rcv()` requires `CAP_NET_ADMIN`, so this is reachable from
privileged userspace netlink operations, not verified as unprivileged.

Phase 6 Record: The helper itself is not present in checked-out
`v7.0.3`. The unsafe `list_splice()` patterns exist in active stable-
family branches checked: `5.10`, `5.15`, `6.1`, `6.6`, `6.12`, and
`6.19`, with chain-specific sites appearing from `6.6` onward and
flowtable sites in older branches. `git apply --check` verified both the
helper patch and dependent `nf_tables` patch apply cleanly to the
current checkout. Backport difficulty for this helper is low; older
branches may need only the relevant dependent fix subset.

Phase 7 Record: Subsystem is core RCU list infrastructure plus netfilter
usage context. Criticality: `include/linux/rculist.h` is core
infrastructure, but runtime impact here is limited to future callers;
the verified caller is `nf_tables`, an important networking subsystem.
Subsystem activity is active, with many recent `nf_tables` fixes in
history.

Phase 8 Record: Affected users are systems using nftables netdev
chain/flowtable hook updates and concurrent netlink dumps. Trigger is
privileged nftables batch update plus concurrent RCU dump traversal.
Verified severity is RCU list publication correctness; no concrete crash
report was found for this exact helper/dependent patch, so I do not rate
it as proven crash/security. Benefit is high when paired with the
dependent `nf_tables` fix; standalone benefit is none. Risk is low: 29
lines, internal static inline helper, no userspace API.

Phase 9 Record: Evidence for backporting: small helper, reviewed,
required by an actual `nf_tables` RCU correctness fix, clean apply
verified, bug-introducing commits exist in stable lines. Evidence
against: standalone commit does not fix runtime behavior and adds a new
internal helper API. Stable rules: obviously correct yes; tested only by
review/apply-check here, no runtime test verified; fixes a real bug only
as dependency; important issue is RCU list safety but no concrete crash
report verified; small/contained yes; no userspace API yes; applies
cleanly to current stable checkout yes. Exception category:
dependency/helper for a small bug fix, not a device ID/quirk/build/doc
exception.

## Verification

- Phase 1: Parsed `git show`/provided commit message; verified tags and
  absence of bug tags.
- Phase 2: Inspected diff; verified 29 insertions in
  `include/linux/rculist.h`.
- Phase 3: Ran `git blame` on `rculist.h` and `nf_tables_api.c`;
  verified introducers and `git describe --contains` versions for
  dependent fix targets.
- Phase 4: Ran `b4 dig -c`, `-a`, and `-w`; fetched lore/Patchwork
  mirrors; verified series context, review, and dependency statement.
- Phase 5: Used `git grep`, `rg`, and file reads; verified helper call
  sites, netlink callback paths, RCU dump traversal, and `CAP_NET_ADMIN`
  gate.
- Phase 6: Checked stable branches with `git grep`; verified helper
  absence and unsafe splice presence; ran `git apply --check` for helper
  and dependent fix.
- Phase 7: Verified maintainer entry in `MAINTAINERS` and recent
  netfilter history.
- Phase 8: Verified trigger path from nfnetlink batch operations; no
  concrete crash report was found.
- Unverified: Runtime testing was not performed; stable-specific lore
  discussion could not be fetched due to bot protection.

This should be backported only together with the dependent `nf_tables`
fix. As that dependency, it is small, reviewed, cleanly applicable, and
enables a real RCU list publication fix without userspace-visible API
changes.

**YES**

 include/linux/rculist.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 2abba7552605c..e3bc442256922 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -261,6 +261,35 @@ static inline void list_replace_rcu(struct list_head *old,
 	old->prev = LIST_POISON2;
 }

+static inline void __list_splice_rcu(struct list_head *list,
+				     struct list_head *prev,
+				     struct list_head *next)
+{
+	struct list_head *first = list->next;
+	struct list_head *last = list->prev;
+
+	last->next = next;
+	first->prev = prev;
+	next->prev = last;
+	rcu_assign_pointer(list_next_rcu(prev), first);
+}
+
+/**
+ * list_splice_rcu - splice a non-RCU list into an RCU-protected list,
+ *                   designed for stacks.
+ * @list:	the non RCU-protected list to splice
+ * @head:	the place in the existing RCU-protected list to splice
+ *
+ * The list pointed to by @head can be RCU-read traversed concurrently with
+ * this function.
+ */
+static inline void list_splice_rcu(struct list_head *list,
+				   struct list_head *head)
+{
+	if (!list_empty(list))
+		__list_splice_rcu(list, head, head->next);
+}
+
 /**
  * __list_splice_init_rcu - join an RCU-protected list into an existing list.
  * @list:	the RCU-protected list to splice
-- 
2.53.0


* [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (11 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt Sasha Levin
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Yuxuan Qiu, Takashi Iwai, Sasha Levin, perex, tiwai, linux-sound,
	linux-kernel

From: Yuxuan Qiu <yuxuanqiu596@gmail.com>

[ Upstream commit 597aa74b0e73f5e0c915b5d0c95cb296774589bd ]

On ThinkBook 16p systems the platform mute LED is present and
bound to the audio-mute trigger, but it does not react to Master
mute changes.

The affected fixup chain sets up the DAC routing, but does not enable
vmaster mute LED handling. Because of that, the generic HDA code does
not mark Master Playback Switch with SNDRV_CTL_ELEM_ACCESS_SPK_LED,
and the audio-mute trigger never receives speaker mute updates.

Add a ThinkBook-specific wrapper around alc287_fixup_bind_dacs() and
enable spec->gen.vmaster_mute_led during PRE_PROBE. This keeps the
existing DAC binding logic unchanged while allowing the normal generic
LED path to drive the mute LED.

Signed-off-by: Yuxuan Qiu <yuxuanqiu596@gmail.com>
Link: https://patch.msgid.link/20260424112107.22206-1-yuxuanqiu596@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics

Step 1.1 Record: subsystem `ALSA: hda/realtek`; action verb `enable`;
claimed intent is enabling speaker mute LED support on Lenovo ThinkBook
16p systems whose LED is already bound to `audio-mute` but not updated
by Master mute changes.

Step 1.2 Record: tags present are `Signed-off-by: Yuxuan Qiu
<yuxuanqiu596@gmail.com>`, `Link:
https://patch.msgid.link/20260424112107.22206-1-yuxuanqiu596@gmail.com`,
and `Signed-off-by: Takashi Iwai <tiwai@suse.de>`. No `Fixes:`,
`Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Acked-by:`, or `Cc:
stable@vger.kernel.org` tag was present.

Step 1.3 Record: the commit describes a real hardware behavior bug: the
platform mute LED exists and is registered with the `audio-mute`
trigger, but `Master Playback Switch` lacks
`SNDRV_CTL_ELEM_ACCESS_SPK_LED`, so ALSA control LED notifications never
drive the LED. No kernel crash, corruption, or security symptom is
claimed. No affected kernel version is explicitly stated.

Step 1.4 Record: this is a hardware quirk/fix disguised as an enablement
patch. It is not a new driver or new API; it enables an existing generic
HDA LED path for an existing Realtek/Lenovo fixup chain.

## Phase 2: Diff Analysis

Step 2.1 Record: one file changed, `sound/hda/codecs/realtek/alc269.c`,
with 12 insertions and 1 deletion in the submitted patch. Modified
areas are a new helper `alc287_fixup_tb_vmaster_led()` and the
`ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD` fixup table entry.
Scope is single-file and surgical.

Step 2.2 Record: before,
`ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD` directly called
`alc287_fixup_bind_dacs()`, which adjusts DAC routing and amp mute
behavior during `HDA_FIXUP_ACT_PRE_PROBE`. After, it calls a wrapper
that sets `spec->gen.vmaster_mute_led = 1` during `PRE_PROBE`, then
calls the same DAC binding function. Normal DAC binding behavior is
preserved.
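
The wrapper shape described in Step 2.2 can be sketched in plain C, outside
the kernel. Everything here is a hypothetical stand-in: `model_spec`,
`model_bind_dacs()`, `model_tb_vmaster_led()`, and the `ACT_*` action codes
are invented for illustration and only mirror the structure of the patch,
not the real HDA types.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the HDA fixup action codes. */
enum fixup_action { ACT_PRE_PROBE, ACT_PROBE, ACT_INIT };

/* Hypothetical, reduced codec state: only what this sketch needs. */
struct model_spec {
	bool vmaster_mute_led;  /* models spec->gen.vmaster_mute_led */
	int bind_dacs_calls;    /* counts delegated calls, for illustration */
};

/* Stand-in for alc287_fixup_bind_dacs(); the real one adjusts DAC routing. */
static void model_bind_dacs(struct model_spec *spec, enum fixup_action action)
{
	(void)action;
	spec->bind_dacs_calls++;
}

/* Wrapper: set the LED flag only at PRE_PROBE, then delegate every action. */
static void model_tb_vmaster_led(struct model_spec *spec,
				 enum fixup_action action)
{
	if (action == ACT_PRE_PROBE)
		spec->vmaster_mute_led = true;

	model_bind_dacs(spec, action);
}
```

The point of the sketch is that the delegate runs for every action, so the
pre-existing behavior is untouched, while the flag is written only once in
the earliest fixup phase.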

Step 2.3 Record: bug category is hardware quirk / logic correctness for
LED control metadata. The broken mechanism is missing
`vmaster_mute_led`, verified in `generic.c`: when set,
`snd_hda_gen_build_controls()` adds `SNDRV_CTL_ELEM_ACCESS_SPK_LED` to
`Master Playback Switch`; the ALSA control LED layer then follows that
control and fires the `audio-mute` LED trigger.
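
As a rough illustration of the mechanism in Step 2.3, the following
standalone C sketch models how a single flag decides whether the master
switch carries the speaker LED access bit. The `MODEL_ACCESS_*` values are
placeholders chosen for this example, not the kernel's definitions, and both
functions are hypothetical simplifications of the generic HDA and control LED
code.

```c
#include <stdbool.h>

/* Placeholder bit values for this sketch; not the kernel's definitions. */
#define MODEL_ACCESS_READWRITE 0x1u
#define MODEL_ACCESS_SPK_LED   0x2u

/* Models control creation: the LED bit is added only when the flag is set. */
static unsigned int model_master_switch_access(bool vmaster_mute_led)
{
	unsigned int access = MODEL_ACCESS_READWRITE;

	if (vmaster_mute_led)
		access |= MODEL_ACCESS_SPK_LED;
	return access;
}

/* Models the LED layer: it only follows controls carrying the LED bit. */
static bool model_led_follows_control(unsigned int access)
{
	return (access & MODEL_ACCESS_SPK_LED) != 0;
}
```

With the flag clear, the modeled control is created without the LED bit and
the LED layer ignores it, which corresponds to the symptom in the commit
message.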

Step 2.4 Record: the fix is obviously simple: a one-bit flag is set
before auto config parsing/building controls, and the existing fixup
function is still invoked. Main regression risk is that the changed
fixup ID is shared by several Lenovo ThinkPad/ThinkBook SSIDs, not only
ThinkBook 16p. Verified matching entries include ThinkPad P1/Z-series,
ThinkBook 16P, ThinkBook 13x, and ThinkBook 16P Gen5 on current
branches. The behavioral change is limited to LED control tagging, so
risk is low.

## Phase 3: Git History Investigation

Step 3.1 Record: `git blame` on current `v7.0.3` points changed lines to
the Realtek driver split commit `aeeb85f26c3b` because the file moved.
`git log -S` traced the actual related history: ThinkBook 16P IDs
`0x38a9/0x38ab` were added by `6214e24cae9b` in v6.8; Gen4 was switched
to the affected MG/CS35L41 fixup by `dca5f4dfa925` in v6.9; Gen5 was
switched by `34c8e74cd666` in v6.13. The affected MG/CS35L41 fixup was
introduced earlier by `d93eeca627db`.

Step 3.2 Record: no `Fixes:` tag is present, so there was no tagged
introducing commit to follow. Related history nevertheless shows the
affected ThinkBook 16P quirk chain exists in maintained stable-era
kernels.

Step 3.3 Record: recent file history shows many ALSA Realtek quirk and
mute LED fixes. Related ThinkBook commits are `dca5f4dfa925` and
`34c8e74cd666`, both fixing ThinkBook 16P volume/mute behavior by
selecting the same MG/CS35L41 fixup. No prerequisite patch for this new
wrapper was identified beyond the target tree already having
`alc287_fixup_bind_dacs()` and
`ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD`.

Step 3.4 Record: no prior local commits from Yuxuan Qiu were found under
the Realtek HDA paths. Takashi Iwai, the ALSA maintainer, applied the
patch in the mailing-list thread.

Step 3.5 Record: dependencies are minimal. Current `v7.0.3` accepts the
submitted patch with `git apply --check`. Older pre-driver-split stable
trees need a path adjustment from `sound/hda/codecs/realtek/alc269.c` to
`sound/pci/hda/patch_realtek.c`.

## Phase 4: Mailing List And External Research

Step 4.1 Record: no local commit hash for this candidate was available,
so `b4 dig -c <commit>` was not applicable. `b4 am` using message id
`20260424112107.22206-1-yuxuanqiu596@gmail.com` fetched the original
submission and reported one patch in a two-message thread. The yhbt lore
mirror confirmed the same patch text and one maintainer reply.

Step 4.2 Record: recipients verified from the mbox were Jaroslav Kysela,
Takashi Iwai, `linux-sound@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`. Takashi Iwai replied “Thanks, applied now.”
No NAKs or concerns were present in the fetched thread.

Step 4.3 Record: there was no separate `Reported-by:` or bugzilla/syzbot
link in this patch. External search found the indexed patch and related
Lenovo mute LED/platform LED context, but no independent user bug report
for this exact issue.

Step 4.4 Record: the thread is a single-patch submission, not a multi-
patch series. No later revision was found by `b4 am`; the thread
contains only the patch and maintainer apply reply.

Step 4.5 Record: web search did not find a stable-specific discussion or
explicit stable request for this exact patch. Direct lore WebFetch was
blocked by Anubis, but the yhbt mirror and `b4` provided the thread
content.

## Phase 5: Code Semantic Analysis

Step 5.1 Record: modified functions/objects are
`alc287_fixup_tb_vmaster_led()` and the
`ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD` fixup table entry.

Step 5.2 Record: call path verified: `alc269_probe()` picks Realtek
fixups, applies `HDA_FIXUP_ACT_PRE_PROBE`, then runs
`alc269_parse_auto_config()`; `alc_build_controls()` later calls
`snd_hda_gen_build_controls()`. The affected code runs during codec
probe for matching Lenovo HDA SSIDs.

Step 5.3 Record: key callees are `alc287_fixup_bind_dacs()`,
`snd_hda_override_conn_list()`, and generic HDA control creation.
`generic.c` verifies `spec->vmaster_mute_led` directly controls whether
`SNDRV_CTL_ELEM_ACCESS_SPK_LED` is passed to `__snd_hda_add_vmaster()`.

Step 5.4 Record: runtime reachability is verified through hardware probe
and ALSA mixer control updates. Userspace changing `Master Playback
Switch` can trigger ALSA control LED notifications once the control has
the speaker LED access flag.

Step 5.5 Record: similar Realtek patterns exist: many HDA Realtek
commits add or fix mute LED quirks; current code also has existing
`vmaster_mute_led = 1` usage in another mute LED fixup and many
`snd_hda_gen_add_mute_led_cdev()` users.

## Phase 6: Cross-Referencing And Stable Tree Analysis

Step 6.1 Record: `v6.12` contains ThinkBook 16P `0x38a9/0x38ab` using
the affected fixup; `v6.13+` contains ThinkBook 16P Gen5 `0x38f9` using
the affected fixup. `for-greg/6.12-201` contains both ThinkBook 16P and
Gen5 using the affected fixup. `for-greg/6.6-201` contains the affected
fixup for several ThinkPad SSIDs but not the original ThinkBook 16P
`0x38a9/0x38ab`; its Gen5 `0x38f9` entry still uses
`ALC287_FIXUP_CS35L41_I2C_2`, so this exact ThinkBook 16p bug is not
fully verified there.

Step 6.2 Record: expected backport difficulty is clean for current post-
split trees such as `v7.0.3`; minor manual backport for pre-split trees
because the same code lives in `sound/pci/hda/patch_realtek.c`.

Step 6.3 Record: no alternate stable fix for this exact mute LED issue
was found. Related ThinkBook volume/mute fixup selection commits exist
and are prerequisites for branches where the ThinkBook entries use the
affected MG/CS35L41 fixup.

## Phase 7: Subsystem And Maintainer Context

Step 7.1 Record: subsystem is ALSA HDA Realtek codec quirks, under sound
drivers. Criticality is peripheral/driver-specific, affecting Lenovo
systems with these Realtek/CS35L41 configurations.

Step 7.2 Record: subsystem activity is high; recent history shows many
Realtek quirk additions and mute LED fixes. This is typical stable
material when scoped to specific hardware behavior.

## Phase 8: Impact And Risk Assessment

Step 8.1 Record: affected population is driver-specific: Lenovo
ThinkBook/ThinkPad systems matching the shared
`ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD` quirk, especially
verified ThinkBook 16P/16P Gen5 entries in newer stable trees.

Step 8.2 Record: trigger condition is ordinary Master mute changes after
boot on affected hardware. This is user-triggerable through normal audio
controls. The failure is visible as a mute LED that does not track mute
state.

Step 8.3 Record: failure severity is LOW to MEDIUM. It is not a crash or
data-loss bug, but it is a real hardware/user-visible regression in mute
state indication.

Step 8.4 Record: benefit is moderate for affected laptops: restores
expected mute LED behavior using existing ALSA trigger infrastructure.
Risk is low: one bit is set during probe and existing DAC routing logic
remains unchanged. The only meaningful concern is wider impact on all
devices sharing the fixup ID, but the added effect is LED
metadata/control-trigger handling, not audio routing or power
sequencing.

## Phase 9: Final Synthesis

Step 9.1 Record:
Evidence for backporting: real user-visible hardware bug; tiny one-file
fix; uses existing generic HDA LED path; accepted by ALSA maintainer
Takashi Iwai; current tree patch applies cleanly; stable-style hardware
quirk; related ThinkBook fixup commits are already present in maintained
stable branches.
Evidence against backporting: no crash/security/corruption; no
`Reported-by` or `Tested-by`; fixup ID is shared beyond ThinkBook 16p;
older stable branches may need path adjustment and should only receive
this where the affected fixup/IDs exist.
Unresolved: exact upstream commit SHA was not available locally; no
independent bug report beyond the patch mail was found; exact
applicability to every active stable branch must be checked branch-by-
branch.

Step 9.2 Record:
1. Obviously correct and tested? Mostly yes by inspection and maintainer
   acceptance; no explicit `Tested-by`.
2. Fixes a real bug? Yes, verified mechanism: missing `vmaster_mute_led`
   prevents `SNDRV_CTL_ELEM_ACCESS_SPK_LED` tagging.
3. Important issue? Moderate, hardware/user-visible mute LED
   correctness, not critical stability.
4. Small and contained? Yes, one helper and one fixup function pointer
   change.
5. No new feature/API? Yes in stable sense: it enables existing LED
   trigger behavior for existing hardware; no new userspace API.
6. Can apply to stable trees? Yes on current post-split trees; older
   trees need path adjustment.

Step 9.3 Record: exception category applies: hardware-specific
quirk/workaround in an existing driver.

Step 9.4 Decision: backport. This is not a critical crash fix, but ALSA
HDA hardware quirks and mute LED fixes are normal stable material. The
patch is small, preserves existing routing logic, and fixes a concrete
hardware behavior problem on affected Lenovo systems. Backport only to
trees containing the affected fixup and relevant Lenovo entries, with
pre-split path adjustment where needed.

## Verification

- [Phase 1] Parsed supplied subject/body/tags; confirmed no `Fixes:`,
  `Reported-by`, `Tested-by`, `Reviewed-by`, or stable tag.
- [Phase 2] Read local `alc269.c`; confirmed `alc287_fixup_bind_dacs()`
  adjusts DAC routing and does not set `vmaster_mute_led`.
- [Phase 2] Read `generic.h` and `generic.c`; confirmed
  `vmaster_mute_led` means “add SPK-LED flag to vmaster mute switch” and
  is used when creating `Master Playback Switch`.
- [Phase 3] `git blame` and `git log -S` traced related ThinkBook/fixup
  history to `6214e24cae9b`, `dca5f4dfa925`, `34c8e74cd666`, and
  `d93eeca627db`.
- [Phase 3] `git show` confirmed `6214e24cae9b` added ThinkBook 16P IDs,
  `dca5f4dfa925` switched Gen4 to the affected fixup, and `34c8e74cd666`
  switched Gen5 to the affected fixup.
- [Phase 3] `git log --author="Yuxuan Qiu"` under Realtek paths found no
  prior local author commits.
- [Phase 4] `b4 am` fetched the two-message thread; `b4 mbox` saved and
  verified the full thread.
- [Phase 4] WebFetch of patch.msgid.link was blocked by Anubis; yhbt
  lore mirror successfully showed the patch and Takashi Iwai’s “Thanks,
  applied now.”
- [Phase 4] WebSearch found no stable-specific request for this exact
  patch.
- [Phase 5] `rg` and file reads verified the probe path:
  `alc269_probe()` applies PRE_PROBE fixups before auto config parsing,
  and `alc_build_controls()` calls generic control creation.
- [Phase 5] Read `control_led.c`; confirmed controls tagged with speaker
  LED access update the `audio-mute` LED trigger.
- [Phase 6] `git grep` verified `v6.12` and `for-greg/6.12-201` contain
  ThinkBook entries using the affected fixup; `for-greg/6.6-201`
  contains the affected fixup but not the original ThinkBook 16P IDs.
- [Phase 6] `git apply --check` against current `v7.0.3` succeeded.
- [Phase 8] Risk assessment verified the patch changes only LED metadata
  setup and preserves the existing `alc287_fixup_bind_dacs()` call.
- UNVERIFIED: exact candidate commit SHA, because the subject was not
  found as a local commit in the checked branches and only the mailing-
  list patch was available.
- UNVERIFIED: independent user bug report outside the patch thread.
- UNVERIFIED: clean application to every older stable branch without
  manual path adjustment.

**YES**

 sound/hda/codecs/realtek/alc269.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index ae74e1b69eb3a..44cf78ea6dc03 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -3707,6 +3707,17 @@ static void alc287_fixup_lenovo_thinkpad_with_alc1318(struct hda_codec *codec,
 	spec->power_hook = alc287_s4_power_gpio3_default;
 	spec->gen.pcm_playback_hook = alc287_alc1318_playback_pcm_hook;
 }
+
+static void alc287_fixup_tb_vmaster_led(struct hda_codec *codec,
+					const struct hda_fixup *fix, int action)
+{
+	struct alc_spec *spec = codec->spec;
+
+	if (action == HDA_FIXUP_ACT_PRE_PROBE)
+		spec->gen.vmaster_mute_led = 1;
+
+	alc287_fixup_bind_dacs(codec, fix, action);
+}
 /* GPIO2: mute led GPIO3: micmute led */
 static void alc245_tas2781_spi_hp_fixup_muteled(struct hda_codec *codec,
 					  const struct hda_fixup *fix, int action)
@@ -6460,7 +6471,7 @@ static const struct hda_fixup alc269_fixups[] = {
 	},
 	[ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD] = {
 		.type = HDA_FIXUP_FUNC,
-		.v.func = alc287_fixup_bind_dacs,
+		.v.func = alc287_fixup_tb_vmaster_led,
 		.chained = true,
 		.chain_id = ALC287_FIXUP_CS35L41_I2C_2_THINKPAD_ACPI,
 	},
-- 
2.53.0

* [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (12 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls Sasha Levin
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Dylan Wu, Jassi Brar, Sasha Levin, peter.chen, fugang.duan,
	linux-arm-kernel, linux-kernel

From: Dylan Wu <fredwudi0305@gmail.com>

[ Upstream commit 80784b427970219ebc338a6fb4118cde67a6c317 ]

During the system suspend process, device interrupts are masked in the
noirq phase. However, SCMI often needs to exchange final messages with the
firmware to complete the power-down transition. Without the IRQF_NO_SUSPEND
flag, the mailbox ISR cannot run during this late stage, leading to SCMI
communication timeouts and error messages like "SCMI protocol wait for
resp timeout" during suspend.

Add the IRQF_NO_SUSPEND flag to the interrupt request to ensure the mailbox
can continue to handle responses during the noirq stages of suspend and
resume, thereby ensuring a reliable power state transition.

Signed-off-by: Dylan Wu <fredwudi0305@gmail.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough

### Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem is `mailbox: cix`; action verb is `Add`;
claimed intent is to add `IRQF_NO_SUSPEND` so the CIX mailbox interrupt
can run during suspend/resume noirq phases.

Step 1.2 Record: tags present are `Signed-off-by: Dylan Wu
<fredwudi0305@gmail.com>` and `Signed-off-by: Jassi Brar
<jassisinghbrar@gmail.com>`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Reviewed-by:`, `Acked-by:`, `Link:`, or `Cc: stable@vger.kernel.org`
tags were present.

Step 1.3 Record: the commit describes SCMI response timeouts during
suspend because device IRQs are masked in noirq, preventing the mailbox
ISR from handling firmware responses. Symptom recorded in the message:
`"SCMI protocol wait for resp timeout"` during suspend. Root cause: the
CIX mailbox IRQ was requested with flags `0`, so it is suspended with
ordinary device IRQs.

Step 1.4 Record: yes, this is a hidden bug fix despite the `Add`
wording. It fixes a power-management failure mode on existing CIX SCMI
mailbox users.

### Phase 2: Diff Analysis
Step 2.1 Record: one file changed, `drivers/mailbox/cix-mailbox.c`; 1
insertion, 1 deletion. Modified function: `cix_mbox_startup()`. Scope:
single-file surgical fix.

Step 2.2 Record: before, `cix_mbox_startup()` called
`request_irq(priv->irq, cix_mbox_isr, 0, ...)`. After, it calls the same
ISR with `IRQF_NO_SUSPEND`. This affects mailbox channel startup and
later suspend/resume IRQ handling, not message formatting or normal send
logic.

Step 2.3 Record: bug category is suspend/resume hardware/firmware
communication workaround. The specific mechanism is that SCMI over the
CIX mailbox needs interrupt-driven completion during noirq
suspend/resume, but the IRQ was suspendable.
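
A minimal userspace model of that mechanism, under stated assumptions: the
`model_irq` structure, the `MODEL_IRQF_NO_SUSPEND` placeholder value, and
both functions are invented for this sketch and greatly simplify what the IRQ
core does during the noirq phase.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQF_NO_SUSPEND 0x1u  /* placeholder value for this sketch */

/* Hypothetical, reduced view of an interrupt line. */
struct model_irq {
	unsigned int flags;  /* flags passed when the line was requested */
	bool enabled;        /* whether the handler can currently run */
};

/* Models the noirq phase: disable every line not marked NO_SUSPEND. */
static void model_suspend_device_irqs(struct model_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!(irqs[i].flags & MODEL_IRQF_NO_SUSPEND))
			irqs[i].enabled = false;
}

/* A firmware response is only seen if the line is still enabled. */
static bool model_response_delivered(const struct model_irq *irq)
{
	return irq->enabled;
}
```

In this model a line requested with flags `0` stops delivering responses once
the suspend step has run, while a line requested with the no-suspend flag
keeps delivering them.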

Step 2.4 Record: fix quality is high: one flag added to an existing IRQ
request, no new API, no data structure change, no refactor. Regression
risk is low. The main theoretical risk of `IRQF_NO_SUSPEND` is keeping
an IRQ active during noirq, but kernel documentation says that is the
intended use for special-purpose interrupts, and this request is not
using `IRQF_SHARED`.

### Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the affected `request_irq(..., 0,
...)` line was introduced by `fe2aa2361ddba8f4ccf05123e0e14e9fb70ea701`
(`mailbox: add CIX mailbox driver`). `git describe --contains` places it
in `v6.17-rc1`; `git merge-base` confirmed it is not in `v6.16` and is
in `v6.17`.

Step 3.2 Record: no `Fixes:` tag is present. I still inspected the
introducing commit: it added the whole CIX mailbox driver, including the
suspendable IRQ request. The driver introduction exists in stable-
relevant `v6.17+` history, but not older trees.

Step 3.3 Record: recent file history shows `fe2aa2361ddba` driver
introduction, `dfd997b1e24c` typo fix, and candidate `80784b427970`.
Between `v7.0` and the candidate there is also `89e5d7d61600` removing
an internal header include, but `git apply --check` confirmed the
candidate patch applies to the current 7.0 tree without that being a
functional prerequisite.

Step 3.4 Record: `git log --author='Dylan Wu' -10 -- drivers/mailbox`
found no prior local mailbox commits by Dylan Wu. The patch was
committed and signed off by Jassi Brar, and `MAINTAINERS` verifies Jassi
is the `MAILBOX API` maintainer.

Step 3.5 Record: dependencies found: none for the actual change.
`IRQF_NO_SUSPEND` is available via existing interrupt headers, and the
same `request_irq` line exists in `v6.17` and `v7.0`.

### Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 80784b427970219ebc338a6fb4118cde67a6c317`
found the original thread at
`https://patch.msgid.link/20260209083452.154983-1-fredwudi0305@gmail.com`.
`b4 dig -a` found only v1. The saved mbox shows Jassi replied: “Applied to
mailbox/for-next”. No NAKs or concerns were present in the fetched thread.

Step 4.2 Record: `b4 dig -w` shows the original recipients included
Dylan Wu, Peter Chen, Fugang Duan, Jassi Brar,
`cix-kernel-upstream@cixtech.com`, `linux-arm-kernel`, and `linux-kernel`.
`MAINTAINERS` confirms Peter Chen/Fugang Duan as CIX maintainers and
Jassi Brar as mailbox maintainer.

Step 4.3 Record: there were no `Link:` or `Reported-by:` tags. Web
search for the exact timeout plus CIX did not find a separate public bug
report. The only verified bug report source is the commit/thread text
itself.

Step 4.4 Record: b4 found this as a standalone single-patch series, not
a multi-patch dependency.

Step 4.5 Record: direct `WebFetch` for lore stable search was blocked by
Anubis. Web search for exact subject plus `stable` found no stable-
specific discussion. This is an unresolved external-history gap, not a
technical blocker.

### Phase 5: Code Semantic Analysis
Step 5.1 Record: modified function is `cix_mbox_startup()`.

Step 5.2 Record: caller chain verified in code: SCMI mailbox transport
calls `mbox_request_channel()`; mailbox core calls
`__mbox_bind_client()`; that calls `chan->mbox->ops->startup(chan)`,
which is `cix_mbox_startup()` for this driver.
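
The dispatch in that caller chain is an ordinary ops-table call. The sketch
below models it in standalone C; `model_chan`, `model_chan_ops`, and the
`model_*` functions are hypothetical names that only imitate the shape of the
chain, not the real mailbox core structures.

```c
/* Hypothetical, reduced channel and ops table for this sketch. */
struct model_chan;

struct model_chan_ops {
	int (*startup)(struct model_chan *chan);
};

struct model_chan {
	const struct model_chan_ops *ops;
	int irq_requested;  /* set when the driver's startup hook has run */
};

/* Stand-in for the driver's startup hook, where the IRQ is requested. */
static int model_cix_startup(struct model_chan *chan)
{
	chan->irq_requested = 1;
	return 0;
}

static const struct model_chan_ops model_cix_ops = {
	.startup = model_cix_startup,
};

/* Stand-in for the core's bind step: call the driver hook if present. */
static int model_bind_client(struct model_chan *chan)
{
	if (chan->ops->startup)
		return chan->ops->startup(chan);
	return 0;
}

/* Stand-in for the transport's channel request entry point. */
static int model_request_channel(struct model_chan *chan)
{
	return model_bind_client(chan);
}
```

Because the IRQ is requested inside the startup hook, the flags chosen there
apply to every client that binds the channel, which is why a one-line change
in that hook covers the SCMI user.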

Step 5.3 Record: key callees in `cix_mbox_startup()` are
`request_irq()`, `cix_mbox_read()`, and `cix_mbox_write()` to enable
ACK, doorbell, FIFO, or fast-channel interrupts depending on channel
type.

Step 5.4 Record: reachability is verified for CIX SCMI:
`arch/arm64/boot/dts/cix/sky1.dtsi` defines `arm,scmi` with `mboxes =
<&mbox_ap2pm 8>, <&mbox_pm2ap 8>`, and those mailbox nodes use
`compatible = "cix,sky1-mbox"`. This path is used by firmware
communication, including SCMI. Unprivileged triggering was not verified.

Step 5.5 Record: similar patterns exist in mailbox drivers: `tegra-hsp`,
`sprd-mailbox`, `qcom-ipcc`, `qcom-cpucp`, `mailbox-th1520`,
`bcm2835-mailbox`, and `bcm74110-mailbox` use `IRQF_NO_SUSPEND`. Related
commits for `qcom-cpucp` and `bcm2835` fixed SCMI/firmware timeout
failures during suspend/resume with the same flag.

### Phase 6: Stable Tree Analysis
Step 6.1 Record: buggy CIX code exists from `v6.17-rc1` onward. Verified
not in `v6.16`, verified in `v6.17` and `v7.0`, and candidate is
contained in `v7.1-rc2` but not `v7.0`.

Step 6.2 Record: expected backport difficulty is clean or very minor.
`git apply --check` succeeded on the current 7.0 tree, and the exact
buggy request line exists in `v6.17` and `v7.0`.

Step 6.3 Record: no alternate CIX-specific fix was found in local
history or exact-subject stable search. Similar mailbox fixes exist for
other drivers, but not this CIX driver.

### Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is `drivers/mailbox`, platform-specific CIX
mailbox used by SCMI firmware communication. Criticality is platform-
specific but important for affected CIX systems because suspend/resume
and firmware messaging are core platform operations.

Step 7.2 Record: CIX mailbox file activity is low: driver introduction,
a typo fix, header cleanup, and this fix. Mailbox subsystem has
established maintainer coverage through Jassi Brar.

### Phase 8: Impact And Risk
Step 8.1 Record: affected users are CIX Sky1/ARCH_CIX systems with
`CONFIG_CIX_MBOX`; `arch/arm64/configs/defconfig` enables
`CONFIG_CIX_MBOX=y`.

Step 8.2 Record: trigger condition is system suspend/resume noirq while
SCMI needs mailbox responses. Commonness depends on affected hardware
and use of suspend/resume. Unprivileged trigger was not verified.

Step 8.3 Record: failure mode is SCMI response timeout during
suspend/resume, causing unreliable power-state transition. Severity:
HIGH for affected systems; not verified as panic/data
corruption/security.

Step 8.4 Record: benefit is high for affected CIX SCMI systems because
it fixes suspend/resume firmware communication. Risk is low: one-line
IRQ flag change, no shared IRQ flag, no ABI/API change.

### Phase 9: Final Synthesis
Step 9.1 Record: evidence for backporting: real suspend/resume timeout
described; code path verified from CIX SCMI DT through SCMI mailbox
transport into mailbox startup; kernel docs verify ordinary device IRQs
are disabled during noirq and `IRQF_NO_SUSPEND` is the intended
mechanism; fix is one line; analogous mailbox fixes exist; applies to
current 7.0 tree; buggy driver exists in `v6.17+`. Evidence against: no
`Fixes`, no reporter/tester tag, no external bug report found, no
hardware test performed here, and relevance is limited to CIX platforms.

Step 9.2 Record: stable rules checklist: obviously correct and tested by
maintainer acceptance, but hardware testing is unverified; fixes a real
user-visible bug per commit and verified code path; important issue
because suspend/resume firmware communication can time out; small and
contained at 1 line; no new feature/API; can apply to at least current
7.0 tree and likely `v6.17+`.

Step 9.3 Record: exception category: hardware/platform workaround for
suspend/resume mailbox/firmware communication, analogous to accepted
mailbox `IRQF_NO_SUSPEND` fixes.

Step 9.4 Record: decision is YES. This is a small, targeted platform
power-management fix for a real suspend/resume failure mode, with low
regression risk and verified applicability to stable trees that contain
the CIX mailbox driver.

## Verification
- [Phase 1] `git show --format=fuller --stat --patch
  80784b427970219ebc338a6fb4118cde67a6c317`: verified subject, body,
  tags, author/committer, and one-line diff.
- [Phase 2] Same `git show`: verified `drivers/mailbox/cix-mailbox.c`
  changes only `request_irq()` flags in `cix_mbox_startup()`.
- [Phase 3] `git blame -L 401,410 -- drivers/mailbox/cix-mailbox.c`:
  verified buggy line introduced by `fe2aa2361ddba`.
- [Phase 3] `git show --no-patch --format=fuller fe2aa2361ddba`:
  verified CIX mailbox driver introduction and maintainer/reviewer tags.
- [Phase 3] `git describe --contains` and
  `git merge-base --is-ancestor`: verified driver starts in `v6.17-rc1`,
  not `v6.16`; candidate is in `v7.1-rc2`, not `v7.0`.
- [Phase 3] `git apply --check`: verified candidate patch applies to
  current 7.0 tree.
- [Phase 4] `b4 dig -c`, `-a`, `-w`, and saved mbox: verified lore
  thread, single v1 series, recipients, and maintainer “Applied to
  mailbox/for-next” reply.
- [Phase 5] `ReadFile`/`rg` on `drivers/mailbox/mailbox.c`, SCMI mailbox
  transport, and CIX DTS: verified call path and CIX SCMI mailbox use.
- [Phase 5] `rg IRQF_NO_SUSPEND drivers/mailbox`: verified similar
  mailbox drivers use this flag.
- [Phase 6] `git show v6.17:drivers/mailbox/cix-mailbox.c` and
  `v7.0:...`: verified the same suspendable `request_irq(..., 0, ...)`
  line exists.
- [Phase 7] `MAINTAINERS`: verified CIX and mailbox maintainers.
- [Phase 8] `Documentation/power/suspend-and-interrupts.rst` and
  `kernel/irq/pm.c`: verified noirq IRQ masking and `IRQF_NO_SUSPEND`
  semantics.
- UNVERIFIED: real CIX hardware test result.
- UNVERIFIED: direct apply check on every stable branch/tag, though code
  presence and current-tree apply were verified.
- UNVERIFIED: separate public bug report beyond commit message/thread;
  lore WebFetch stable search was blocked by Anubis.

**YES**

 drivers/mailbox/cix-mailbox.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mailbox/cix-mailbox.c b/drivers/mailbox/cix-mailbox.c
index 443620e8ae37f..933820b12772d 100644
--- a/drivers/mailbox/cix-mailbox.c
+++ b/drivers/mailbox/cix-mailbox.c
@@ -405,7 +405,7 @@ static int cix_mbox_startup(struct mbox_chan *chan)
 	int index = cp->index, ret;
 	u32 val;

-	ret = request_irq(priv->irq, cix_mbox_isr, 0,
+	ret = request_irq(priv->irq, cix_mbox_isr, IRQF_NO_SUSPEND,
 			  dev_name(priv->dev), chan);
 	if (ret) {
 		dev_err(priv->dev, "Unable to acquire IRQ %d\n", priv->irq);
-- 
2.53.0

* [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (13 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback Sasha Levin
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Ajay Kumar Nandam, Mark Brown, Sasha Levin, srini, lgirdwood,
	perex, tiwai, linux-sound, linux-arm-msm, linux-kernel

From: Ajay Kumar Nandam <ajay.nandam@oss.qualcomm.com>

[ Upstream commit 74c876bfd71b1023029a483d7213015201f62b53 ]

Enable AUX PA sequencing during AUX DAC DAPM events and keep the
AUX-specific RX supplies enabled while the path is active.

Add the missing AUX-related mixer controls, including CLSH PA and
DSD left/right switches, so AUX playback can be routed from userspace.

Signed-off-by: Ajay Kumar Nandam <ajay.nandam@oss.qualcomm.com>
Link: https://patch.msgid.link/20260420180221.785113-1-ajay.nandam@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Decision Summary
This is a small WCD937x codec fix for an existing AUX playback path. The
verified problem is functional: the AUX path has DAPM routing and
SoundWire channel definitions, but lacks required AUX PA
sequencing/supplies and mixer controls needed for userspace routing.
That can result in broken AUX playback on affected Qualcomm WCD937x
systems.

It is not a security/crash/data-corruption fix, and it does add visible
ALSA mixer controls, which is the main stable-policy concern. On
balance, I would backport it to stable trees that already contain the
WCD937x AUX playback code, especially `v6.12+`; it is small, driver-
local, applies cleanly to `v6.12`, and fixes existing hardware
functionality rather than adding a new driver or subsystem.

## Phase Walkthrough

### Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `ASoC: codecs: wcd937x`; action verb `fix`;
intent is to fix AUX PA sequencing and missing AUX-related mixer
controls.

Step 1.2 Record: tags found: `Signed-off-by: Ajay Kumar Nandam
<ajay.nandam@oss.qualcomm.com>`, `Link:
https://patch.msgid.link/20260420180221.785113-1-ajay.nandam@oss.qualcomm.com`,
`Signed-off-by: Mark Brown <broonie@kernel.org>`. No `Fixes:`,
`Reported-by:`, `Tested-by:`, `Reviewed-by:`, or `Cc: stable`.

Step 1.3 Record: message says AUX PA sequencing must be enabled during
AUX DAC DAPM events, AUX-specific RX supplies must stay enabled while
active, and missing CLSH/DSD mixer controls prevent userspace from
routing AUX playback. No crash trace, board name, or kernel version is
stated.

Step 1.4 Record: this is a hidden functional hardware bug fix, not
cleanup. It repairs missing register sequencing and missing routing
controls for an already-present AUX playback path.

### Phase 2: Diff Analysis
Step 2.1 Record: one file, `sound/soc/codecs/wcd937x.c`, `25
insertions`, `1 deletion`. Modified functions/data:
`wcd937x_codec_aux_dac_event()`, `wcd937x_codec_enable_aux_pa()`,
`wcd937x_snd_controls[]`. Scope is single-file, driver-local, surgical.

Step 2.2 Record: `wcd937x_codec_aux_dac_event()` now sets/clears
`WCD937X_AUX_AUXPA` bit 4 with AUX DAC power-up/down.
`wcd937x_codec_enable_aux_pa()` now enables `WCD937X_ANA_RX_SUPPLIES`
bits 6 and 7 after AUX PA power-up and disables them before power-down
completes. `wcd937x_snd_controls[]` gains CLSH/DSD RX SoundWire
switches.
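
The register updates described in Step 2.2 can be sketched in plain
userspace C. This is a minimal model, assuming the usual masked
read-modify-write behaviour of `snd_soc_component_update_bits()`;
`model_update_bits()`, `model_aux_pa_supplies_on()`, and
`model_aux_pa_supplies_off()` are hypothetical names used only for
illustration and do not exist in the driver.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/*
 * Masked read-modify-write on a model register (assumption: same
 * semantics as snd_soc_component_update_bits()). Only the bits selected
 * by mask change; returns 1 if the stored value changed, 0 otherwise.
 */
static int model_update_bits(uint8_t *reg, uint8_t mask, uint8_t val)
{
	uint8_t old = *reg;
	uint8_t updated = (uint8_t)((old & ~mask) | (val & mask));

	*reg = updated;
	return updated != old;
}

/* Power-up side of the patch: RX supply bits 6 and 7 are set. */
static void model_aux_pa_supplies_on(uint8_t *rx_supplies)
{
	model_update_bits(rx_supplies, BIT(6), BIT(6));
	model_update_bits(rx_supplies, BIT(7), BIT(7));
}

/* PRE_PMD side of the patch: the same two bits are cleared again. */
static void model_aux_pa_supplies_off(uint8_t *rx_supplies)
{
	model_update_bits(rx_supplies, BIT(6), 0x00);
	model_update_bits(rx_supplies, BIT(7), 0x00);
}
```

Starting from a model register value of `0x02`, the power-up helper
leaves `0xC2` and the power-down helper restores `0x02`, so bits outside
the mask (such as bit 1) are not disturbed by the new writes.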

Step 2.3 Record: bug category is hardware sequencing plus
routing/control correctness. The affected path is normal AUX playback,
not an error path, race, memory safety issue, or refcount issue.

Step 2.4 Record: fix is small and easy to audit, but there is a
regression concern: Mark Brown noted in review that disabling RX
supplies in `SND_SOC_DAPM_PRE_PMD` before DAPM disables the PGA is “a
bit weird.” He later applied the patch anyway; no NAK or follow-up
correction was found in the thread.

### Phase 3: Git History Investigation
Step 3.1 Record: blame shows the AUX DAC/PA event code came from
`57fe69db7a015e` (“ASoC: codecs: wcd937x: add playback dapm widgets”),
first contained in `v6.11-rc1`. The base controls came from
`82be8c62a38c6a` (“add basic controls”), also first contained in
`v6.11-rc1`.

Step 3.2 Record: no `Fixes:` tag, so no tagged introducer to follow.
Manual blame identifies the relevant introducers above.

Step 3.3 Record: related history includes `041db4bbe04e8` adding a
missing `LO Switch` control, and `107a5c853eef5` relaxing the AUX PDM
watchdog. The candidate’s context includes the watchdog changes, which
are present from `v6.12`.

Step 3.4 Record: no earlier commits by Ajay Kumar Nandam touching
`sound/soc/codecs` or `wcd937x.c` were found on this branch. The patch
was applied by ASoC maintainer Mark Brown.

Step 3.5 Record: dependencies are the existing WCD937x playback widgets,
controls, SoundWire channel definitions, and for clean context the AUX
watchdog update. It applies cleanly to current `7.0` and to a checked
`v6.12` worktree.

### Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 74c876bfd71b` found the original patch at
the patch.msgid.link URL. `b4 dig -a` found only `v1`; no later
revision.

Step 4.2 Record: `b4 dig -w` showed appropriate recipients: Srinivas
Kandagatla, Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
`linux-sound`, `linux-arm-msm`, and `linux-kernel`.

Step 4.3 Record: no separate bug report or syzbot/bugzilla link was
present. Web/lore mirror confirmed the only substantive review comment
was Mark Brown’s sequencing concern.

Step 4.4 Record: no multi-patch series was found; this is a standalone
one-patch submission.

Step 4.5 Record: direct lore stable search via `WebFetch` was blocked by
Anubis; web search did not reveal stable-specific discussion for this
exact patch.

### Phase 5: Code Semantic Analysis
Step 5.1 Record: modified functions/data are
`wcd937x_codec_aux_dac_event()`, `wcd937x_codec_enable_aux_pa()`, and
`wcd937x_snd_controls[]`.

Step 5.2 Record: `rg` found both event functions are referenced only by
DAPM widgets: `RDAC4` uses `wcd937x_codec_aux_dac_event()`, and `AUX
PGA` uses `wcd937x_codec_enable_aux_pa()`.

Step 5.3 Record: key callees are `snd_soc_component_update_bits()`,
`wcd937x_rx_clk_enable()`, `wcd_clsh_ctrl_set_state()`, `enable_irq()`,
and `disable_irq_nosync()`. Mixer controls call `wcd937x_get_swr_port()`
/ `wcd937x_set_swr_port()`.

Step 5.4 Record: call chain is userspace ALSA routing/control changes ->
ASoC DAPM powers the AUX route (`IN3_AUX` -> `RX3` -> `RDAC4` ->
`AUX_RDAC` -> `AUX PGA` -> `AUX`) -> the modified DAPM event callbacks
program the codec. The mixer control path updates SoundWire port/channel
masks used later by `wcd937x_sdw_hw_params()` and
`sdw_stream_add_slave()`.

Step 5.5 Record: similar pattern exists in the same driver history:
`041db4bbe04e8` added a missing `LO Switch` for audio playback routing.
`wcd937x-sdw.c` already defines RX channels for `WCD937X_CLSH`,
`WCD937X_DSD_L`, and `WCD937X_DSD_R`, confirming the new controls target
existing channel definitions.

### Phase 6: Stable Tree Analysis
Step 6.1 Record: `v6.1` and `v6.6` do not contain the relevant WCD937x
AUX code. `v6.11` contains the AUX widgets and channel definitions.
`v6.12` contains the watchdog context used by this patch.

Step 6.2 Record: the backport is clean for `v6.12+`; verified by
creating a detached `v6.12` worktree and running `git apply --check`.
For `v6.11`, the patch may need adjustment or the prerequisite
`107a5c853eef5`.

Step 6.3 Record: no alternative fix for this exact AUX PA sequencing /
CLSH+DSD control issue was found in current branch history.

### Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is ALSA SoC codec driver under
`sound/soc/codecs`; criticality is driver-specific/peripheral, affecting
systems using Qualcomm WCD937x AUX playback.

Step 7.2 Record: WCD937x is active: recent history includes
probe/resource fixes, SoundWire port fixes, OF node leak fixes, and AUX
watchdog fixes.

### Phase 8: Impact And Risk
Step 8.1 Record: affected users are WCD937x codec users with AUX
playback routes.

Step 8.2 Record: trigger is normal AUX playback setup/routing through
ALSA/ASoC; userspace can trigger routing through mixer controls. Exact
affected board population was not verified.

Step 8.3 Record: failure mode is loss of AUX playback/routing, not a
crash or memory-safety bug. Severity is medium for the kernel generally,
high for affected hardware where AUX playback is required.

Step 8.4 Record: benefit is restoring existing hardware functionality on
affected WCD937x systems. Risk is low to medium: the patch is tiny and
driver-local, but adds visible mixer controls and has one noted
sequencing concern.

### Phase 9: Final Synthesis
Step 9.1 Record: evidence for backporting: fixes existing AUX playback
functionality; small single-driver patch; code and channels already
exist; applies cleanly to `v6.12`; reviewed/applied by ASoC maintainer.
Evidence against: no `Reported-by`/`Tested-by`; not a
crash/security/corruption fix; adds new ALSA controls; maintainer raised
a sequencing concern. Unresolved: exact board/user impact and whether
`v6.11` should receive a hand backport.

Step 9.2 Record: stable rules checklist:
1. Obviously correct and tested: mostly yes by code inspection and
   maintainer application, but no explicit `Tested-by`.
2. Fixes a real bug: yes, broken AUX routing/sequencing.
3. Important issue: yes for affected hardware, though not critical
   system-wide.
4. Small and contained: yes.
5. No new features/APIs: borderline, because mixer controls are
   user-visible, but they expose existing missing route controls.
6. Can apply to stable: yes for `v6.12+`; older active trees without
   WCD937x AUX code are not applicable.

Step 9.3 Record: exception category: closest match is hardware-specific
driver functionality fix, not a pure device-ID/quirk/build/doc
exception.

Step 9.4 Record: decision is YES for stable trees containing the WCD937x
AUX playback path, especially `v6.12+`. Do not backport to trees where
the driver/path does not exist.

## Verification
- Phase 1: Read candidate commit message and tags from `git show
  74c876bfd71b` and b4-fetched mbox.
- Phase 2: Verified diff stat: one file, `25 insertions`, `1 deletion`.
- Phase 3: Ran `git blame` on AUX DAC, AUX PA, and controls blocks;
  identified `57fe69db7a015e` and `82be8c62a38c6a`.
- Phase 3: Ran `git describe --contains`; WCD937x AUX/control code
  starts in `v6.11-rc1`; watchdog context starts in `v6.12-rc6`.
- Phase 4: Ran `b4 dig -c 74c876bfd71b`, `-a`, and `-w`; found v1-only
  patch and original recipients.
- Phase 4: Fetched lore mirror; confirmed Mark Brown’s review concern
  and later ASoC application.
- Phase 5: Used `rg` and file reads to trace DAPM widget references, AUX
  route, SoundWire channel definitions, and `wcd937x_sdw_hw_params()`.
- Phase 6: Used `git grep` against `v6.1`, `v6.6`, `v6.11`, and `v6.12`;
  verified absence/presence of relevant code.
- Phase 6: Verified `git apply --check` succeeds on current tree and on
  a temporary detached `v6.12` worktree.
- Unverified: exact affected product/board list; no explicit tester or
  reporter was found.

**YES**

 sound/soc/codecs/wcd937x.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/sound/soc/codecs/wcd937x.c b/sound/soc/codecs/wcd937x.c
index 10a2d598caa71..72a53f95d6887 100644
--- a/sound/soc/codecs/wcd937x.c
+++ b/sound/soc/codecs/wcd937x.c
@@ -546,6 +546,9 @@ static int wcd937x_codec_aux_dac_event(struct snd_soc_dapm_widget *w,
 		snd_soc_component_update_bits(component,
 					      WCD937X_DIGITAL_CDC_ANA_CLK_CTL,
 					      BIT(2), BIT(2));
+		snd_soc_component_update_bits(component,
+					      WCD937X_AUX_AUXPA,
+					      BIT(4), BIT(4));
 		snd_soc_component_update_bits(component,
 					      WCD937X_DIGITAL_CDC_DIG_CLK_CTL,
 					      BIT(2), BIT(2));
@@ -562,6 +565,9 @@ static int wcd937x_codec_aux_dac_event(struct snd_soc_dapm_widget *w,
 		snd_soc_component_update_bits(component,
 					      WCD937X_DIGITAL_CDC_ANA_CLK_CTL,
 					      BIT(2), 0x00);
+		snd_soc_component_update_bits(component,
+					      WCD937X_AUX_AUXPA,
+					      BIT(4), 0x00);
 		break;
 	}

@@ -730,10 +736,23 @@ static int wcd937x_codec_enable_aux_pa(struct snd_soc_dapm_widget *w,
 			snd_soc_component_update_bits(component,
 						      WCD937X_ANA_RX_SUPPLIES,
 						      BIT(1), BIT(1));
+		/* Enable AUX PA related RX supplies */
+		snd_soc_component_update_bits(component,
+					      WCD937X_ANA_RX_SUPPLIES,
+					      BIT(6), BIT(6));
+		snd_soc_component_update_bits(component,
+					      WCD937X_ANA_RX_SUPPLIES,
+					      BIT(7), BIT(7));
 		enable_irq(wcd937x->aux_pdm_wd_int);
 		break;
 	case SND_SOC_DAPM_PRE_PMD:
 		disable_irq_nosync(wcd937x->aux_pdm_wd_int);
+		snd_soc_component_update_bits(component,
+					      WCD937X_ANA_RX_SUPPLIES,
+					      BIT(6), 0x00);
+		snd_soc_component_update_bits(component,
+					      WCD937X_ANA_RX_SUPPLIES,
+					      BIT(7), 0x00);
 		break;
 	case SND_SOC_DAPM_POST_PMD:
 		usleep_range(2000, 2010);
@@ -2051,7 +2070,12 @@ static const struct snd_kcontrol_new wcd937x_snd_controls[] = {
 		       wcd937x_get_swr_port, wcd937x_set_swr_port),
 	SOC_SINGLE_EXT("LO Switch", WCD937X_LO, 0, 1, 0,
 		       wcd937x_get_swr_port, wcd937x_set_swr_port),
-
+	SOC_SINGLE_EXT("CLSH PA Switch", WCD937X_CLSH, 0, 1, 0,
+		       wcd937x_get_swr_port, wcd937x_set_swr_port),
+	SOC_SINGLE_EXT("DSD_L Switch", WCD937X_DSD_L, 0, 1, 0,
+		       wcd937x_get_swr_port, wcd937x_set_swr_port),
+	SOC_SINGLE_EXT("DSD_R Switch", WCD937X_DSD_R, 0, 1, 0,
+		       wcd937x_get_swr_port, wcd937x_set_swr_port),
 	SOC_SINGLE_EXT("ADC1 Switch", WCD937X_ADC1, 1, 1, 0,
 		       wcd937x_get_swr_port, wcd937x_set_swr_port),
 	SOC_SINGLE_EXT("ADC2 Switch", WCD937X_ADC2, 1, 1, 0,
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (14 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types Sasha Levin
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: robbieko, David Sterba, Sasha Levin, clm, linux-btrfs,
	linux-kernel

From: robbieko <robbieko@synology.com>

[ Upstream commit 653361585d251fbca0e19ac58b04ba95dd01e378 ]

After falling back to the previous item in btrfs_delete_raid_extent(),
the code uses ASSERT(found_start <= start) to verify the found extent
actually precedes our target range. If the B-tree state is unexpected
(e.g. no overlapping extent exists), this triggers a kernel BUG/panic
in debug builds, or silently continues with wrong data otherwise.

Replace the ASSERT with a proper bounds check that returns -ENOENT if
the found extent does not actually overlap with the start position.

Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `btrfs`; action verb `replace`; intent is to
replace an assertion in `btrfs_delete_raid_extent()` with real error
handling for stripe lookup fallback.

Step 1.2 Record: Tags in commit
`653361585d251fbca0e19ac58b04ba95dd01e378`: `Signed-off-by: robbieko
<robbieko@synology.com>`, `Reviewed-by: David Sterba
<dsterba@suse.com>`, `Signed-off-by: David Sterba <dsterba@suse.com>`.
No `Fixes:`, no `Reported-by:`, no `Tested-by:`, no `Link:`, no `Cc:
stable`.

Step 1.3 Record: The commit says that after fallback to the previous
item, `ASSERT(found_start <= start)` can BUG/panic when B-tree state is
unexpected, and non-assert builds can continue with wrong stripe data.
Root cause described: the previous item found may not actually overlap
the requested deletion range.

Step 1.4 Record: This is not hidden cleanup. It is explicitly a bug fix:
it converts an invariant-only check into a runtime bounds check
returning `-ENOENT`.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed: `fs/btrfs/raid-stripe-tree.c`, 4
insertions, 1 deletion. One function changed:
`btrfs_delete_raid_extent()`. Scope: single-file surgical fix.

Step 2.2 Record: Before, fallback loaded the previous key, computed
`found_start` and `found_end`, then asserted only `found_start <=
start`. After, it returns `-ENOENT` and exits the delete loop if
`found_start > start` or `found_end <= start`. This affects the stripe
extent deletion lookup path after `btrfs_search_slot()` chooses an item
after the deletion start and the code backs up to a candidate previous
item.

Step 2.3 Record: Bug category is logic/correctness with data-integrity
implications. Broken mechanism: the fallback candidate was assumed to
overlap the deletion start. The fix verifies both start and end bounds
before later code can truncate, split, or delete a stripe extent.
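
The bounds check described above can be sketched as a small standalone
predicate. This is a minimal model, not the kernel function:
`model_check_fallback()` is a hypothetical name, and the candidate
extent is assumed to be the half-open range `[found_start, found_end)`
with `found_end = found_start + found_len`, mirroring how the patched
code derives `found_end` from the key.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Model of the check applied after falling back to the previous item.
 * start is the first byte of the range being deleted.
 */
static int model_check_fallback(uint64_t found_start, uint64_t found_len,
				uint64_t start)
{
	uint64_t found_end = found_start + found_len;

	/*
	 * The old code only asserted found_start <= start. The new check
	 * also rejects a candidate that ends at or before start.
	 */
	if (found_start > start || found_end <= start)
		return -ENOENT;

	return 0;
}
```

With a candidate covering `[100, 150)`, a deletion start of 120 passes,
while starts of 150 and 90 are both rejected with `-ENOENT`.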

Step 2.4 Record: Fix quality is high: the check is local, easy to reason
about, and preserves existing error propagation. Regression risk is low,
but not zero: returning `-ENOENT` changes a previous silent/no-op path
into transaction abort at the caller. That is appropriate because the
caller already aborts on `btrfs_delete_raid_extent()` errors and the
condition means the stripe mapping lookup is inconsistent for the
deletion being performed.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the fallback block and
`ASSERT(found_start <= start)` were introduced by `76643119045eed`
(`btrfs: fix deletion of a range spanning parts two RAID stripe
extents`), first contained in `v6.14-rc1`. The broader partial deletion
machinery was introduced by `6aea95ee318890`, first contained in
`v6.13-rc1`.

Step 3.2 Record: No `Fixes:` tag is present, so there was no tagged
introducer to follow. Blame identifies the direct introducer.

Step 3.3 Record: Recent `master` history shows this was patch 4 in a
six-patch raid-stripe-tree deletion fix set: preceding fixes include
copying `devid`, fixing leaf-boundary lookup, and fixing
`btrfs_previous_item()` `min_objectid`; following fixes handle `-EAGAIN`
and missing return checks. The commit applies cleanly to the current
`stable/linux-7.0.y` checkout by itself, but it is best considered with
the neighboring raid-stripe-tree deletion fixes.

Step 3.4 Record: Author `robbieko` has multiple adjacent
`fs/btrfs/raid-stripe-tree.c` fixes in `master`. Reviewer/committer
David Sterba is listed as Btrfs maintainer in `MAINTAINERS`. Johannes
Thumshirn, who reviewed other patches in the series, has substantial
prior raid-stripe-tree history in the same file.

Step 3.5 Record: No new helper, structure, API, or external dependency
is introduced. Syntactically standalone: `git apply --check` of the
candidate patch succeeded on the current `stable/linux-7.0.y` checkout.
Semantically, it complements the surrounding lookup fixes.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 653361585d251...` found the original
submission at
`https://patch.msgid.link/20260413065249.2320122-5-robbieko@synology.com`.
`b4 dig -a -C` showed only v1. The thread was a six-patch series titled
`btrfs: fix multiple bugs in raid-stripe-tree deletion path`.

Step 4.2 Record: `b4 dig -w` showed the original recipients were
`robbieko` and `linux-btrfs@vger.kernel.org`. In-thread
review/maintainer discussion involved David Sterba and Johannes
Thumshirn. David added the series to `for-next`.

Step 4.3 Record: No `Reported-by:` or `Link:` tag exists for an external
bug report. No syzbot or bugzilla report was present in the commit or
mbox.

Step 4.4 Record: Related patches found in the same series:
`513f8a52eed88`, `2aef5cb1dcf9b`, `1871ae78ffa5c`, `fe0cdfd7118d8`, and
`a8d58a7c02009`. The cover letter states all six fix bugs in raid-
stripe-tree deletion/partial deletion paths.

Step 4.5 Record: No stable-specific nomination was found in the mbox.
WebFetch attempts to lore/stable and lore/all were blocked by Anubis, so
external stable-list search could not be independently verified.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: Modified function: `btrfs_delete_raid_extent()`.

Step 5.2 Record: Callers found by search: production caller
`do_free_extent_accounting()` in `fs/btrfs/extent-tree.c`, plus Btrfs
raid-stripe-tree selftests. `do_free_extent_accounting()` calls
`btrfs_delete_raid_extent()` for data extents and aborts the transaction
on error.

Step 5.3 Record: Key callees in `btrfs_delete_raid_extent()` include
`btrfs_find_chunk_map()`, `btrfs_need_stripe_tree_update()`,
`btrfs_search_slot()`, `btrfs_previous_item()`, `btrfs_del_item()`,
`btrfs_duplicate_item()`, and `btrfs_partially_delete_raid_extent()`.

Step 5.4 Record: Reachability verified: file extent removal paths call
`btrfs_free_extent()`, delayed refs call `__btrfs_free_extent()`, and
when refs reach zero `do_free_extent_accounting()` calls
`btrfs_delete_raid_extent()`. `btrfs_fallocate()` calls
`btrfs_punch_hole()` for `FALLOC_FL_PUNCH_HOLE`, which calls
`btrfs_replace_file_extents()`, which calls `btrfs_drop_extents()`,
which calls `btrfs_free_extent()`. So users with write access to files
on an affected Btrfs filesystem can reach this path via hole punching;
other data extent deletion paths also reach it.

Step 5.5 Record: Similar nearby issue pattern found in the same series:
multiple small fixes to `btrfs_delete_raid_extent()` and
`btrfs_partially_delete_raid_extent()` addressing missed entries, wrong
previous-item bounds, stale leaf pointer, missing `devid`, and unchecked
return values.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: The directly blamed fallback code is first in
`v6.14-rc1`; merge-base checks show it is not in `v6.13`, but is in
`v6.14`, `v6.15`, `v6.16`, and `v7.0`. It is therefore relevant to
stable trees based on `v6.14+`, including the current
`stable/linux-7.0.y` checkout.

Step 6.2 Record: Backport difficulty is low. `git apply --check` of the
candidate patch onto current `stable/linux-7.0.y` succeeded. Neighboring
series patches also applied cleanly in the same check.

Step 6.3 Record: Local stable refs `stable/linux-6.16.y` and
`stable/linux-7.0.y` do not contain the candidate or the adjacent April
2026 raid-stripe-tree deletion fixes checked by subject/ancestor tests.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: Subsystem is Btrfs filesystem, specifically raid-
stripe-tree. Criticality: important, because it is filesystem
data/extent mapping code, but affected population is feature-specific
rather than universal.

Step 7.2 Record: Subsystem activity is high. Recent history in
`fs/btrfs/raid-stripe-tree.c` shows many fixes and follow-up selftests.
The feature is also exposed only under `CONFIG_BTRFS_EXPERIMENTAL` sysfs
feature attributes in this tree, so impact is limited to systems using
that feature.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: Affected users are Btrfs users with `RAID_STRIPE_TREE`
enabled and stripe-tree updates needed for data block group profiles
covered by `BTRFS_RST_SUPP_BLOCK_GROUP_MASK`.

Step 8.2 Record: Trigger conditions: deleting/freeing data extents on
such filesystems when stripe lookup fallback selects a non-overlapping
candidate. Verified reachable through file hole punching and delayed-ref
extent free paths. I did not verify a concrete reproducer for the exact
bad B-tree state.

Step 8.3 Record: Failure mode severity is high. With
`CONFIG_BTRFS_ASSERT`, `ASSERT()` calls `BUG()`, so this can crash the
kernel. Without it, no assertion code is generated, so the later
deletion logic can operate on a non-overlapping stripe extent, which is
a filesystem mapping corruption risk. If the previous candidate ends
before the target, the old code could also silently stop with success.

Step 8.4 Record: Benefit is high for affected users because it prevents
panic or wrong stripe-tree mutation in filesystem deletion code. Risk is
low because the patch is a four-line validation check in one function
and changes invalid lookup state into an error returned to an existing
transaction-abort path.

## Phase 9: Final Synthesis
Step 9.1 Record: Evidence for backporting: real filesystem correctness
bug, possible kernel BUG in assert builds, possible wrong stripe mapping
mutation in non-assert builds, one-file minimal patch, maintainer review
by David Sterba, merged by Linus via Btrfs fixes pull, applies cleanly
to current stable checkout, affected code exists in `v6.14+`. Evidence
against: feature-specific and experimental; no concrete external bug
report or reproducer found; part of a six-patch series, so stable
maintainers should consider taking the nearby deletion fixes too.
Unresolved: no successful WebFetch of lore/stable search due to Anubis;
exact real-world frequency not verified.

Step 9.2 Record: Stable rules checklist:
1. Obviously correct and tested: mostly yes by inspection and maintainer
   review; no explicit `Tested-by`.
2. Fixes a real bug: yes, invalid stripe lookup fallback handling in
   deletion path.
3. Important issue: yes for affected users, because panic and filesystem
   mapping corruption are serious.
4. Small and contained: yes, 4 insertions/1 deletion in one function.
5. No new features/APIs: yes.
6. Can apply to stable: yes on current `stable/linux-7.0.y`; relevant
   for `v6.14+` trees.

Step 9.3 Record: No exception category applies. This is not a device ID,
quirk, DT, build, or documentation patch.

Step 9.4 Record: Decision is to backport. The limited affected
population does not outweigh the severity and low risk: this is a tiny
filesystem correctness fix for code already present in recent stable
trees.

## Verification
- [Phase 1] `git show --format=fuller --stat 653361585d251...`:
  confirmed subject, message, tags, reviewer, and 4 insertions/1
  deletion.
- [Phase 2] `git show 653361585d251... -- fs/btrfs/raid-stripe-tree.c`:
  confirmed exact hunk replacing `ASSERT(found_start <= start)` with
  `found_start > start || found_end <= start` check returning `-ENOENT`.
- [Phase 3] `git blame -L 114,146 -- fs/btrfs/raid-stripe-tree.c`:
  confirmed fallback block introduced by `76643119045eed`.
- [Phase 3] `git show 76643119045eed`: confirmed it added the range-
  spanning fallback logic.
- [Phase 3] `git describe --contains 76643119045eed`: confirmed first
  containment at `v6.14-rc1`.
- [Phase 3] `git show 6aea95ee318890`: confirmed earlier partial
  deletion implementation and its ASSERT-related context.
- [Phase 3] `git log master --oneline -30 --
  fs/btrfs/raid-stripe-tree.c`: confirmed related series commits before
  and after candidate.
- [Phase 3] `git apply --check` using the candidate diff: confirmed
  clean application on current `stable/linux-7.0.y`.
- [Phase 4] `b4 dig -c 653361585d251...`: found original lore patch URL.
- [Phase 4] `b4 dig -a -C`: found v1 only.
- [Phase 4] `b4 dig -w`: confirmed original recipients.
- [Phase 4] Saved and read b4 mbox: confirmed six-patch series, David
  Sterba maintainer discussion, David’s “Added to for-next”, Johannes’
  question about `ENOENT` vs `EUCLEAN`, and David’s acceptance of
  `ENOENT`.
- [Phase 5] `rg btrfs_delete_raid_extent`: confirmed production caller
  in `do_free_extent_accounting()` and selftests.
- [Phase 5] Read `fs/btrfs/extent-tree.c`: confirmed errors from
  `btrfs_delete_raid_extent()` abort transaction.
- [Phase 5] Read `fs/btrfs/file.c`: confirmed `btrfs_fallocate()` hole
  punching can reach extent replacement/drop code and
  `btrfs_free_extent()`.
- [Phase 5] Read `fs/btrfs/messages.h`: confirmed `ASSERT()` calls
  `BUG()` when `CONFIG_BTRFS_ASSERT` is enabled and compiles away
  otherwise.
- [Phase 6] `git merge-base --is-ancestor` checks: confirmed blamed code
  is in `v6.14+` and candidate is not in local stable refs checked.
- [Phase 7] `rg MAINTAINERS`: confirmed David Sterba is a Btrfs
  maintainer.
- [Phase 7] `rg RAID_STRIPE_TREE fs/btrfs/sysfs.c`: confirmed raid-
  stripe-tree feature attribute is under `CONFIG_BTRFS_EXPERIMENTAL`.
- [Phase 8] Read `fs/btrfs/raid-stripe-tree.h`: confirmed
  `btrfs_need_stripe_tree_update()` limits affected data profiles and
  requires `RAID_STRIPE_TREE`.
- UNVERIFIED: exact real-world frequency and a concrete reproducer for
  the bad fallback state.
- UNVERIFIED: lore/stable search results, because WebFetch was blocked
  by Anubis.

**YES**

 fs/btrfs/raid-stripe-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index dd924048c6659..a2e9ac2d97988 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -151,7 +151,10 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 			btrfs_item_key_to_cpu(leaf, &key, slot);
 			found_start = key.objectid;
 			found_end = found_start + key.offset;
-			ASSERT(found_start <= start);
+			if (found_start > start || found_end <= start) {
+				ret = -ENOENT;
+				break;
+			}
 		}

 		if (key.type != BTRFS_RAID_STRIPE_KEY)
-- 
2.53.0


* [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (15 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position Sasha Levin
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: David Sterba, Johannes Thumshirn, Sasha Levin, clm, linux-btrfs,
	linux-kernel

From: David Sterba <dsterba@suse.com>

[ Upstream commit 4d95b9efd783adca472e957b2f576983e789b839 ]

Replace the conditional assertions with proper error handling and
transaction abort if we find an unexpected key type in the free space
tree.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `btrfs`, action verb `handle`, intent:
replace assertions with real error handling for unexpected free-space-
tree key types.

Step 1.2 Record: tags found: `Reviewed-by: Johannes Thumshirn
<johannes.thumshirn@wdc.com>`, `Signed-off-by: David Sterba
<dsterba@suse.com>`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`,
or `Cc: stable`.

Step 1.3 Record: the commit says unexpected free-space-tree key types
currently hit conditional assertions; the fix logs an error, returns
`-EUCLEAN`, and aborts the transaction. No user report, stack trace, or
affected version is stated.

Step 1.4 Record: yes, this is a hidden bug fix. `ASSERT(0)` is not
runtime handling in normal non-`CONFIG_BTRFS_ASSERT` builds, so the
unexpected-key branch previously did not return an error or advance
iteration.

## Phase 2: Diff Analysis
Step 2.1 Record: one file changed: `fs/btrfs/free-space-tree.c`, 15
insertions and 3 deletions. Modified functions:
`btrfs_convert_free_space_to_bitmaps()`,
`btrfs_convert_free_space_to_extents()`,
`btrfs_remove_block_group_free_space()`. Scope: single-file surgical
fix.

Step 2.2 Record: each hunk changes `else { ASSERT(0); }` after
recognized free-space-tree item types into `btrfs_err()`, `ret =
-EUCLEAN`, `btrfs_abort_transaction()`, and exit. Affected paths are
transaction-time conversion between free-space extents/bitmaps and
block-group free-space removal.

Step 2.3 Record: bug category is logic/corruption error handling. With
assertions disabled, the unexpected branch leaves `path->slots[0]`
unchanged and continues the inner `while (path->slots[0] > 0)`, which
can loop indefinitely. With assertions enabled, it can `BUG()`. The fix
converts both cases to a controlled filesystem-corruption error and
transaction abort.
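
The no-progress loop described above can be sketched in userspace C.
This is a simplified model under stated assumptions: the enum values and
`model_walk_leaf()` are hypothetical, a leaf is reduced to an array of
key types, and both extent and bitmap items are accepted in one loop
(the real conversion functions each expect only one of them). `EUCLEAN`
is a Linux errno value, so a fallback definition is included for other
platforms.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value, assumed for non-Linux builds */
#endif

/* Hypothetical key types standing in for free-space-tree items. */
enum model_key_type {
	MODEL_FREE_SPACE_INFO,
	MODEL_FREE_SPACE_EXTENT,
	MODEL_FREE_SPACE_BITMAP,
	MODEL_UNEXPECTED,
};

/*
 * Walk slots from the end of a leaf toward slot 0. Returns the number
 * of extent/bitmap items visited, or -EUCLEAN on an unexpected type.
 */
static int model_walk_leaf(const enum model_key_type *keys, size_t nr)
{
	size_t slot = nr;
	int visited = 0;

	while (slot > 0) {
		enum model_key_type type = keys[slot - 1];

		if (type == MODEL_FREE_SPACE_INFO) {
			break;	/* reached the block group's info item */
		} else if (type == MODEL_FREE_SPACE_EXTENT ||
			   type == MODEL_FREE_SPACE_BITMAP) {
			visited++;
			slot--;	/* the only branch that advances */
		} else {
			/*
			 * A compiled-out assertion here would leave slot
			 * unchanged, so the loop would never terminate.
			 */
			return -EUCLEAN;
		}
	}

	return visited;
}
```

A leaf holding an info item followed by one extent and one bitmap item
yields 2, while a leaf with an unexpected type in the walked range
returns `-EUCLEAN` instead of spinning.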

Step 2.4 Record: fix quality is high: minimal, consistent with
surrounding transaction-abort handling, no new API, no data structure
change. Regression risk is low; the only changed behavior is for
unexpected on-disk key types.

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the affected `ASSERT(0)` branches
were introduced by `a5ed91828518ab` (“Btrfs: implement the free space
B-tree”), first contained in `v4.5-rc1`. Later nearby lines changed, but
the bad branches are old.

Step 3.2 Record: no `Fixes:` tag, so no tagged introducing commit to
follow. Blame identifies `a5ed91828518ab` as the source of these
branches.

Step 3.3 Record: recent file history shows related free-space-tree
churn, but this patch is standalone. The current stable `7.0` tree
accepts the patch with `git apply --check`. Older stable tags contain
the same logical branches but may need context/name adjustment.

Step 3.4 Record: author David Sterba is listed as a Btrfs maintainer in
`MAINTAINERS`; recent history shows many Btrfs commits from him.

Step 3.5 Record: no functional prerequisite was found. The patch uses
existing `btrfs_err()`, `-EUCLEAN`, and `btrfs_abort_transaction()`
paths.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 4d95b9efd783a` found the lore submission at
`https://patch.msgid.link/23bf53315a59a3acc0abdf4f4d8f4c9336c936e1.1776180388.git.dsterba@suse.com`.
It was `v1`, patch `3/6`.

Step 4.2 Record: `b4 dig -w` shows original recipients were David Sterba
and `linux-btrfs@vger.kernel.org`. The saved mbox contains Johannes
Thumshirn’s “Looks good to me” and `Reviewed-by`.

Step 4.3 Record: no reporter or bug-report link exists in the commit or
thread.

Step 4.4 Record: series title was “Debugging macro and kmalloc_obj*
cleanups”. Cover letter explicitly says this patch “updates error
handling” for “impossible” conditions. No dependency on other patches
was found.

Step 4.5 Record: stable lore WebFetch was blocked by Anubis. Web search
found the original patch discussion, but no stable-specific request or
rejection.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: modified functions:
`btrfs_convert_free_space_to_bitmaps()`,
`btrfs_convert_free_space_to_extents()`,
`btrfs_remove_block_group_free_space()`.

Step 5.2 Record: conversion functions are called from
`update_free_space_extent_count()`, which is reached by free-space-tree
add/remove operations. `btrfs_remove_block_group_free_space()` is called
from block-group removal and relocation paths.

Step 5.3 Record: key callees include `btrfs_search_prev_slot()`,
`btrfs_item_key_to_cpu()`, `btrfs_del_items()`,
`btrfs_search_free_space_info()`, and now `btrfs_abort_transaction()`.

Step 5.4 Record: the affected paths are reachable during Btrfs
allocation/freeing, relocation, balance/remap, and block-group removal.
Trigger requires an unexpected free-space-tree key type in the relevant
block-group key range; I did not verify an unprivileged direct trigger.

Step 5.5 Record: similar `ASSERT(0)` patterns exist elsewhere in older
Btrfs free-space-tree code, but this patch specifically fixes the three
branches where the loop otherwise makes no progress.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: the affected logic exists in stable-era tags checked:
`v5.15`, `v6.1`, `v6.6`, `v6.12`, `v6.19`, and `v7.0`. It originated in
`v4.5-rc1`.

Step 6.2 Record: clean apply verified on current `stable/linux-7.0.y`.
For older stable trees, function names differ before `v6.19`
(`convert_free_space_to_bitmaps()` etc.), so minor backport adjustment
is expected.

Step 6.3 Record: current branch search found no existing fix with the
same “unexpected free space tree key type” message.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is Btrfs filesystem, criticality IMPORTANT
because it affects users of a production filesystem and can impact
transaction progress and filesystem integrity handling.

Step 7.2 Record: Btrfs is active; recent `fs/btrfs` history includes
multiple fixes and the candidate was merged via the Btrfs maintainer
tree.

## Phase 8: Impact And Risk
Step 8.1 Record: affected users are Btrfs users with the free-space-tree
feature enabled and malformed/unexpected free-space-tree entries.

Step 8.2 Record: trigger is not every boot or every operation; it
requires unexpected free-space-tree key types encountered during
conversion/removal. It can be hit by mounted filesystem state or
corruption; unprivileged trigger was not verified.

Step 8.3 Record: failure mode is severe: non-assert builds can spin in
the inner loop because the slot is not decremented; assert builds can
`BUG()`. Severity: HIGH to CRITICAL depending on build/configuration.

Step 8.4 Record: benefit is high for affected systems because it
prevents a hang/BUG and cleanly aborts a corrupt transaction. Risk is
low because the patch only affects an invalid/corrupt branch and is 15
added lines in one file.

## Phase 9: Final Synthesis
Step 9.1 Record: evidence for backporting: old bug present since
`v4.5-rc1`, affects active stable trees, fixes hang/BUG behavior, small
and contained, authored by Btrfs maintainer, reviewed. Evidence against:
no user report, no `Fixes:` or stable tag, trigger appears to require
corrupted/unexpected on-disk state, older trees need minor backport
context adjustment.

Step 9.2 Record: stable checklist: obviously correct: yes, by code
inspection. Tested: no explicit `Tested-by`. Real bug: yes, verified
non-progress branch. Important issue: yes, possible hang/BUG.
Small/contained: yes. No new features/APIs: yes. Applies to stable:
clean on `7.0`, minor adjustments likely for older trees.

Step 9.3 Record: no automatic exception category applies; this is not a
device ID, quirk, DT, build, or documentation change.

Step 9.4 Record: decision is YES. The key technical point is that this
is not just replacing a debug assertion: in production builds the
assertion compiles away, leaving a no-progress loop on unexpected
free-space-tree item types. The fix is small and turns a hang/BUG into normal
Btrfs corruption handling.

## Verification
- Phase 1: Parsed `git show --format=fuller` for commit
  `4d95b9efd783adca472e957b2f576983e789b839`; confirmed tags and absence
  of reporter/fixes/stable tags.
- Phase 2: Inspected full diff; confirmed 15 insertions, 3 deletions,
  one file, three functions.
- Phase 2: Read `fs/btrfs/messages.h`; confirmed `ASSERT()` calls
  `BUG()` only under `CONFIG_BTRFS_ASSERT`, otherwise compiles away via
  `BUILD_BUG_ON_INVALID()`.
- Phase 3: Ran `git blame` on all three changed regions; confirmed
  `ASSERT(0)` branches came from `a5ed91828518ab`.
- Phase 3: Ran `git describe --contains a5ed91828518ab`; confirmed first
  contained around `v4.5-rc1`.
- Phase 3: Checked `MAINTAINERS`; confirmed David Sterba is Btrfs
  maintainer.
- Phase 4: Ran `b4 dig -c`, `-a`, `-w`; confirmed v1 patch 3/6,
  recipients, and lore URL.
- Phase 4: Saved and read mbox; confirmed patch text, cover-letter
  context, and Johannes Thumshirn `Reviewed-by`.
- Phase 4: WebFetch to lore/stable was blocked by Anubis; WebSearch
  found original patch mirror and no stable-specific discussion.
- Phase 5: Used `rg` and file reads to trace callers through
  `update_free_space_extent_count()`, free-space add/remove, block-group
  removal, relocation, and extent-tree allocation/freeing paths.
- Phase 6: Scripted checks against `v5.15`, `v6.1`, `v6.6`, `v6.12`,
  `v6.19`, `v7.0`, and `master`; confirmed the affected assertions exist
  in stable-era code and the fix message only in `master`.
- Phase 6: Ran `git apply --check` for the candidate patch on current
  `stable/linux-7.0.y`; it applies cleanly.
- Phase 8: Verified `EUCLEAN` maps to filesystem corruption via
  `include/uapi/asm-generic/errno.h` (`EFSCORRUPTED EUCLEAN`).
- Unverified: no runtime test, no kernel build, no proof of unprivileged
  trigger, and no direct check that every older stable branch applies
  without manual adjustment.

**YES**

 fs/btrfs/free-space-tree.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index 9efd1ec90f031..472b3060e5ac3 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -259,7 +259,11 @@ int btrfs_convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans,
 				nr++;
 				path->slots[0]--;
 			} else {
-				ASSERT(0);
+				btrfs_err(fs_info, "unexpected free space tree key type %u",
+					  found_key.type);
+				ret = -EUCLEAN;
+				btrfs_abort_transaction(trans, ret);
+				goto out;
 			}
 		}

@@ -405,7 +409,11 @@ int btrfs_convert_free_space_to_extents(struct btrfs_trans_handle *trans,

 				nr++;
 			} else {
-				ASSERT(0);
+				btrfs_err(fs_info, "unexpected free space tree key type %u",
+					  found_key.type);
+				ret = -EUCLEAN;
+				btrfs_abort_transaction(trans, ret);
+				goto out;
 			}
 		}

@@ -1518,7 +1526,11 @@ int btrfs_remove_block_group_free_space(struct btrfs_trans_handle *trans,
 				nr++;
 				path->slots[0]--;
 			} else {
-				ASSERT(0);
+				btrfs_err(trans->fs_info, "unexpected free space tree key type %u",
+					  found_key.type);
+				ret = -EUCLEAN;
+				btrfs_abort_transaction(trans, ret);
+				return ret;
 			}
 		}

-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (16 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible Sasha Levin
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Benjamin Marzinski, Xiao Ni, Yu Kuai, Sasha Levin, song,
	linux-raid, linux-kernel

From: Benjamin Marzinski <bmarzins@redhat.com>

[ Upstream commit 418b3e64e4459feb3f75979de9ec89e085745343 ]

If make_stripe_request() returns STRIPE_WAIT_RESHAPE,
raid5_make_request() will free the cloned bio. But raid5_make_request()
can call make_stripe_request() multiple times, writing to the various
stripes. If that bio got added to the toread or towrite lists of a
stripe disk in an earlier call to make_stripe_request(), then it's not
safe to just free the bio if a later part of it is found to cross the
reshape position. Doing so can lead to a UAF error, when bio_endio()
is called on the bio for the earlier stripes.

Instead, raid5_make_request() needs to wait until all parts of the bio
have called bio_endio(). To do this, bios that cross the reshape
position while the reshape can't make progress are flagged as needing to
wait for all parts to complete. When raid5_make_request() has a bio that
failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets
bi->bi_private to a completion struct and waits for completion after
ending the bio.  When the bio_endio() is called for the last time on a
clone bio with bi->bi_private set, it wakes up the waiter. This
guarantees that raid5_make_request() doesn't return until the cloned bio
needing a retry for io across the reshape boundary is safely cleaned up.

There is a simple reproducer available at [1]. Compile the kernel with
KASAN for more useful reporting when the error is triggered (this is not
necessary to see the bug).

[1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Link: https://lore.kernel.org/r/20260408043548.1695157-1-bmarzins@redhat.com
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
1. Commit message forensics
Record: subsystem `md/raid5`; action verb `Fix`; intent is to prevent a
use-after-free when RAID5 IO crosses a reshape position and
`make_stripe_request()` returns `STRIPE_WAIT_RESHAPE`. Tags found:
`Signed-off-by: Benjamin Marzinski`, `Reviewed-by: Xiao Ni`, lore
`Link`, `Signed-off-by: Yu Kuai`; no `Fixes:`, `Reported-by:`,
`Tested-by`, or `Cc: stable`. Body describes a real UAF, a KASAN-aided
reproducer, and the root cause: a cloned bio can already be linked into
earlier stripe `toread`/`towrite` lists when a later stripe path frees
it.

2. Diff analysis
Record: 3 files, `14 insertions/25 deletions`; functions changed are
`md_end_clone_io()`, `md_clone_bio()`, removed `md_free_cloned_bio()`,
and `raid5_make_request()`. Before: `STRIPE_WAIT_RESHAPE` directly
called `md_free_cloned_bio(bi)`. After: it sets `bi_private` to a stack
completion, calls `bio_endio(bi)`, and waits until the cloned bio’s
final endio completes. Bug category: memory safety/UAF, caused by
freeing a clone still referenced by stripe bio chains. Fix is small and
contained.
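
Illustration (not part of the patch): the diff switches `md_end_clone_io()`
from reading `bi_private` to `container_of()`, which frees `bi_private` to
carry the completion pointer. A minimal userspace C sketch of that idea
follows; `bio_model`, `io_clone_model`, `clone_from_bio()`, and
`demo_clone_id()` are hypothetical names, not kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down models of struct bio and struct md_io_clone. */
struct bio_model {
	void *bi_private;
};

struct io_clone_model {
	int id;
	struct bio_model bio_clone;	/* embedded, not pointed to */
};

/* Recover the enclosing clone from a pointer to its embedded bio. */
static struct io_clone_model *clone_from_bio(struct bio_model *bio)
{
	return container_of(bio, struct io_clone_model, bio_clone);
}

/* Build a clone, reuse bi_private for something else, still find the clone. */
static int demo_clone_id(int id)
{
	int waiter = 0;	/* stands in for an on-stack completion */
	struct io_clone_model clone = { .id = id };

	clone.bio_clone.bi_private = &waiter;	/* slot is free for other use */
	return clone_from_bio(&clone.bio_clone)->id;
}
```

Because the clone is located by address arithmetic rather than through
`bi_private`, the private pointer can be NULL or point at a waiter without
affecting the lookup.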

3. Git history investigation
Record: upstream commit is `418b3e64e4459`. Blame shows the problematic
`STRIPE_WAIT_RESHAPE`/`md_free_cloned_bio()` path came from
`41425f96d7aa` (`dm-raid456, md/raid456: fix a deadlock...`), first
contained in `v6.9-rc1`/`v6.9`. That introducing commit was itself
stable-marked for `v6.7+` and is present in checked stable branches
`6.12.y`, `6.18.y`, `6.19.y`, and `7.0.y`; the specific buggy
helper/path was not found in `6.6.y` or `6.1.y`. No `Fixes:` tag exists,
so blame was used instead.

4. Mailing list and external research
Record: `b4 dig -c 418b3e64e4459` found `[PATCH v2]` at
`https://patch.msgid.link/20260408043548.1695157-1-bmarzins@redhat.com`.
`b4 dig -a` found v1 RFC and v2; committed patch matches v2. `b4 dig -w`
shows `linux-raid`, `dm-devel`, Song Liu, Yu Kuai, Li Nan, Xiao Ni, and
Red Hat participants were included. Thread review: Xiao Ni gave
`Reviewed-by`; Yu Kuai replied “Applied.” Reviewer asked about
`WRITE_ONCE`; author explained it was unnecessary but harmless on the
slow path, and Xiao accepted keeping it. No NAKs found. WebFetch for
lore was blocked by Anubis, but b4 retrieved the mbox. The gist
reproducer uses dmsetup/LVM RAID5 reshape loops.

5. Code semantic analysis
Record: `md_submit_bio()` reaches `md_handle_request()`, which calls the
RAID personality `.make_request = raid5_make_request` for RAID4/5/6.
`raid5_make_request()` calls `make_stripe_request()` repeatedly over
stripe bits. `add_all_stripe_bios()` calls `__add_stripe_bio()`, which
links the cloned bio into `toread`/`towrite` and calls
`bio_inc_remaining()`. `bio_endio()` only invokes `bi_end_io` after
`__bi_remaining` reaches zero, so the completion wait correctly waits
for all earlier stripe references to drain.
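
Illustration (not part of the patch): a single-threaded userspace C model of
the remaining-count behavior described above. It only sketches the ordering;
real completions block and real bios are reference-counted atomically. All
names (`bio_model`, `completion_model`, `demo_wait_reshape()`) are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct completion_model {
	bool done;
};

struct bio_model {
	int remaining;				/* stands in for __bi_remaining */
	struct completion_model *waiter;	/* stands in for bi_private */
	bool orig_ended;			/* "bio_endio(orig_bio)" happened */
};

static void bio_model_init(struct bio_model *bio)
{
	bio->remaining = 1;	/* the submitter's own reference */
	bio->waiter = NULL;
	bio->orig_ended = false;
}

/* Called once for each stripe list the bio is linked into. */
static void bio_model_inc_remaining(struct bio_model *bio)
{
	bio->remaining++;
}

/* The end callback runs only when the last reference is dropped. */
static void bio_model_endio(struct bio_model *bio)
{
	if (--bio->remaining > 0)
		return;
	if (bio->waiter)
		bio->waiter->done = true;	/* wake the reshape waiter */
	else
		bio->orig_ended = true;		/* normal completion path */
}

/* True if the waiter fires only after every stripe reference has drained. */
static bool demo_wait_reshape(int nr_stripes)
{
	struct completion_model done = { false };
	struct bio_model bio;
	int i;

	bio_model_init(&bio);
	for (i = 0; i < nr_stripes; i++)
		bio_model_inc_remaining(&bio);

	bio.waiter = &done;
	bio_model_endio(&bio);			/* submitter ends its part */
	if (nr_stripes > 0 && done.done)
		return false;			/* signalled too early */

	for (i = 0; i < nr_stripes; i++)
		bio_model_endio(&bio);		/* earlier stripes complete */

	return done.done && !bio.orig_ended;
}
```

In this model the waiter is signalled exactly once, by whichever call drops
the count to zero, and the original bio is not ended on the retry path.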

6. Stable tree analysis
Record: buggy code exists in `stable/linux-7.0.y`, `6.19.y`, `6.18.y`,
and `6.12.y`; not in checked `6.6.y`/`6.1.y`. Patch applies cleanly to
current `7.0.y` with `git apply --check`. Direct upstream patch does not
apply cleanly to `6.19.y` and `6.12.y` because nearby context differs
(`ctx` allocation/field access and older bitmap helpers), but the same
core code is present, so a small backport adjustment is needed.

7. Subsystem context
Record: subsystem is software RAID / MD, `drivers/md`; MAINTAINERS lists
Song Liu and Yu Kuai as maintainers and `linux-raid@vger.kernel.org` as
the list. Criticality: important storage subsystem, with data
availability and kernel memory-safety impact during RAID reshape.
Activity level is high; recent logs show multiple md/raid fixes and
refactors.

8. Impact and risk assessment
Record: affected users are md/dm RAID4/5/6 users doing IO while reshape
is blocked/interrupted. Trigger is specialized but real; reproducer
exists. Admin privileges are generally needed to create/reshape the
array, but ordinary IO can hit the path once the system is in that
state. Failure mode is UAF after freeing a cloned bio still referenced
by stripe lists; severity HIGH/CRITICAL because it can crash or corrupt
memory. Benefit is high for affected stable users; risk is low to
medium: small fix, no API change, but it touches bio completion
semantics in a storage path.

9. Final synthesis
Record: Evidence for backporting: explicit UAF, reproducer, KASAN
usefulness, small contained patch, reviewed by linux-raid reviewer,
applied by md maintainer, buggy code present in active stable trees from
`6.12.y` upward. Evidence against: not applicable to older
`6.6.y`/`6.1.y`; older applicable trees need a manual backport rather
than direct cherry-pick. Stable rules: obviously correct by code
inspection and review; fixes a real serious bug; small and contained; no
new features/APIs; applies cleanly to `7.0.y`, needs minor backport for
older affected branches. Exception categories: none needed.

## Verification
- Phase 1: `git show --format=fuller --stat 418b3e64e4459` verified
  subject, body, tags, reviewer, lore link, and line counts.
- Phase 2: `git show --no-ext-diff 418b3e64e4459` verified exact changes
  to `drivers/md/md.c`, `drivers/md/md.h`, and `drivers/md/raid5.c`.
- Phase 3: `git blame` verified `41425f96d7aa` introduced
  `STRIPE_WAIT_RESHAPE` freeing via `md_free_cloned_bio()`.
- Phase 3: `git describe --contains 41425f96d7aa` and tag checks
  verified first mainline containment around `v6.9`.
- Phase 4: `b4 dig -c/-a/-w/-m 418b3e64e4459` verified v1/v2 series,
  recipients, review, and applied response.
- Phase 4: WebFetch verified the raw gist reproducer content; lore
  WebFetch was blocked by Anubis.
- Phase 5: `rg`/file reads verified call chain, stripe list insertion,
  `bio_inc_remaining()`, and `bio_endio()` final-callback semantics.
- Phase 6: `git grep` verified buggy code in `6.12.y`, `6.18.y`,
  `6.19.y`, and `7.0.y`, absent from checked `6.6.y` and `6.1.y`.
- Phase 6: `git apply --check` verified clean apply to `7.0.y`; worktree
  checks showed direct patch conflicts on `6.19.y` and `6.12.y`.
- Unverified: I did not run the reproducer locally or test a built
  kernel.

This is stable material for affected trees, especially `6.12.y+`, with
backport adjustment where context differs.

**YES**

 drivers/md/md.c    | 31 ++++++++-----------------------
 drivers/md/md.h    |  1 -
 drivers/md/raid5.c |  7 ++++++-
 3 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3ce6f9e9d38e6..4318d875a5f63 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9215,9 +9215,11 @@ static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)

 static void md_end_clone_io(struct bio *bio)
 {
-	struct md_io_clone *md_io_clone = bio->bi_private;
+	struct md_io_clone *md_io_clone = container_of(bio, struct md_io_clone,
+						       bio_clone);
 	struct bio *orig_bio = md_io_clone->orig_bio;
 	struct mddev *mddev = md_io_clone->mddev;
+	struct completion *reshape_completion = bio->bi_private;

 	if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
 		md_bitmap_end(mddev, md_io_clone);
@@ -9229,7 +9231,10 @@ static void md_end_clone_io(struct bio *bio)
 		bio_end_io_acct(orig_bio, md_io_clone->start_time);

 	bio_put(bio);
-	bio_endio(orig_bio);
+	if (unlikely(reshape_completion))
+		complete(reshape_completion);
+	else
+		bio_endio(orig_bio);
 	percpu_ref_put(&mddev->active_io);
 }

@@ -9254,7 +9259,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
 	}

 	clone->bi_end_io = md_end_clone_io;
-	clone->bi_private = md_io_clone;
+	clone->bi_private = NULL;
 	*bio = clone;
 }

@@ -9265,26 +9270,6 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
 }
 EXPORT_SYMBOL_GPL(md_account_bio);

-void md_free_cloned_bio(struct bio *bio)
-{
-	struct md_io_clone *md_io_clone = bio->bi_private;
-	struct bio *orig_bio = md_io_clone->orig_bio;
-	struct mddev *mddev = md_io_clone->mddev;
-
-	if (bio_data_dir(orig_bio) == WRITE && md_bitmap_enabled(mddev, false))
-		md_bitmap_end(mddev, md_io_clone);
-
-	if (bio->bi_status && !orig_bio->bi_status)
-		orig_bio->bi_status = bio->bi_status;
-
-	if (md_io_clone->start_time)
-		bio_end_io_acct(orig_bio, md_io_clone->start_time);
-
-	bio_put(bio);
-	percpu_ref_put(&mddev->active_io);
-}
-EXPORT_SYMBOL_GPL(md_free_cloned_bio);
-
 /* md_allow_write(mddev)
  * Calling this ensures that the array is marked 'active' so that writes
  * may proceed without blocking.  It is important to call this before
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ac84289664cd7..5d57fee22901f 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -917,7 +917,6 @@ extern void md_finish_reshape(struct mddev *mddev);
 void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 			struct bio *bio, sector_t start, sector_t size);
 void md_account_bio(struct mddev *mddev, struct bio **bio);
-void md_free_cloned_bio(struct bio *bio);

 extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
 void md_write_metadata(struct mddev *mddev, struct md_rdev *rdev,
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a8e8d431071ba..dc0c680ca199b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6217,7 +6217,12 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)

 	mempool_free(ctx, conf->ctx_pool);
 	if (res == STRIPE_WAIT_RESHAPE) {
-		md_free_cloned_bio(bi);
+		DECLARE_COMPLETION_ONSTACK(done);
+		WRITE_ONCE(bi->bi_private, &done);
+
+		bio_endio(bi);
+
+		wait_for_completion(&done);
 		return false;
 	}

-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (17 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer Sasha Levin
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Qu Wenruo, David Sterba, Sasha Levin, clm, linux-btrfs,
	linux-kernel

From: Qu Wenruo <wqu@suse.com>

[ Upstream commit a86a283430e1a44907b142c4f53e1f3ad24e87ae ]

Currently for tree block readahead we never pass a
btrfs_tree_parent_check with @has_first_key set.

Without @has_first_key set, btrfs will skip the following extra
checks:

- Header generation check
  This is a minor one.

- Empty leaf/node checks
  This is more serious, for certain trees like the csum tree, they are
  allowed to be empty, thus an empty leaf can pass the tree checker.
  But if there is a parent node for such an empty leaf, it indicates
  corruption.

  Without @has_first_key set, we can no longer detect such a problem.

  In fact there is already a fuzzed image report that a corrupted csum
  leaf which has zero nritems but still has a parent node can trigger
  a BUG_ON() during csum deletion.

However there are only two call sites of btrfs_readahead_tree_block():

- Inside relocate_tree_blocks()
  At this call site we are trying to grab the first key of the tree
  block, thus we are not able to pass a @first_key parameter.

- Inside btrfs_readahead_node_child()
  This is the more common call site, where we have the parent node and
  want to readahead the child tree blocks.

  In this case we can easily grab the node key and pass it for checks.

Add a new parameter @first_key to btrfs_readahead_tree_block() and pass
the node key to it inside btrfs_readahead_node_child().

This should plug the gap in empty leaf detection during readahead.

Link: https://lore.kernel.org/linux-btrfs/20260409071255.3358044-1-gality369@gmail.com/
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Record 1.1: Subsystem is `btrfs`; action is `apply` / `pass existing
first key`; intent is to enable existing parent/first-key validation for
tree-block readahead where the parent node is available.

Record 1.2: Tags found:
- `Link: https://lore.kernel.org/linux-btrfs/20260409071255.3358044-1-gality369@gmail.com/`
- `Reviewed-by: David Sterba <dsterba@suse.com>`
- `Signed-off-by: Qu Wenruo <wqu@suse.com>`
- `Signed-off-by: David Sterba <dsterba@suse.com>`
- No `Fixes:`, `Reported-by:`, `Tested-by:`, or `Cc:
  stable@vger.kernel.org` tag in the final commit.

Record 1.3: The commit describes a real corruption-detection gap:
readahead builds a `btrfs_tree_parent_check` without `has_first_key`, so
checks that require the parent slot key are skipped. The stated failure
mode is a fuzzed corrupt csum-tree leaf with zero items but a parent
node later reaching csum deletion and triggering a `BUG_ON()`.

Record 1.4: This is a hidden bug fix, not a feature. It adds no new
behavior for valid filesystems; it makes an existing metadata integrity
check run during readahead when the caller already has the parent key.

## Phase 2: Diff Analysis
Record 2.1: Final committed diff changes 3 files: `fs/btrfs/extent_io.c`
+12/-2, `fs/btrfs/extent_io.h` +2/-1, `fs/btrfs/relocation.c` +1/-1;
total +15/-4. Modified functions are `btrfs_readahead_tree_block()`,
`btrfs_readahead_node_child()`, and the call in
`relocate_tree_blocks()`. Scope is small and Btrfs-local.

Record 2.2: Before, `btrfs_readahead_tree_block()` always initialized
parent-check data with level/transid only. After, it optionally copies a
caller-provided `first_key` and sets `has_first_key`.
`btrfs_readahead_node_child()` now derives the key from the parent node
slot and passes it. The relocation caller passes `NULL` because it lacks
the first key at that point.

Record 2.3: Bug category is filesystem metadata validation / corruption
handling. Mechanism: missing parent first-key information let readahead
read and mark a corrupt child block valid under weaker checks. Passing
the parent slot key enables existing first-key/empty-child validation
paths.
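
Illustration (not part of the patch): a minimal userspace C sketch of why an
optional first key matters. It is a simplified model under stated
assumptions: the struct layouts, `fill_check()`, `verify_child()`, and the
choice of `-EUCLEAN` as the error are hypothetical and do not reproduce the
Btrfs verifier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* fallback for libcs that lack it; value as on Linux */
#endif

struct key_model {
	unsigned long long objectid;
	unsigned char type;
	unsigned long long offset;
};

struct parent_check_model {
	int level;
	unsigned long long transid;
	bool has_first_key;
	struct key_model first_key;
};

struct block_model {
	int nritems;
	struct key_model first;	/* meaningful only when nritems > 0 */
};

/* Fill the check; a NULL first_key models the relocation call site. */
static void fill_check(struct parent_check_model *check, int level,
		       unsigned long long transid,
		       const struct key_model *first_key)
{
	memset(check, 0, sizeof(*check));
	check->level = level;
	check->transid = transid;
	if (first_key) {
		check->first_key = *first_key;
		check->has_first_key = true;
	}
}

/* Without a first key the empty-child and key checks are skipped. */
static int verify_child(const struct block_model *eb,
			const struct parent_check_model *check)
{
	if (!check->has_first_key)
		return 0;
	if (eb->nritems == 0)
		return -EUCLEAN;	/* a child with a parent key cannot be empty */
	if (eb->first.objectid != check->first_key.objectid ||
	    eb->first.type != check->first_key.type ||
	    eb->first.offset != check->first_key.offset)
		return -EUCLEAN;
	return 0;
}

/* Verify an empty child leaf with and without the parent's key. */
static int demo_empty_child(bool pass_key)
{
	const struct key_model parent_key = { 256, 1, 0 };
	const struct block_model empty_leaf = { 0 };
	struct parent_check_model check;

	fill_check(&check, 0, 10, pass_key ? &parent_key : NULL);
	return verify_child(&empty_leaf, &check);
}
```

In this model the empty child passes when no key is supplied and is rejected
once the parent key is passed, which is the coverage gap the commit message
describes for readahead.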

Record 2.4: Fix quality is good: small, direct, no locking changes, no
public API changes, no unrelated refactor. Regression risk is low; the
new validation applies only where the parent node already provides the
expected key. Legitimate empty roots are not children of a parent slot
with a first key.

## Phase 3: Git History Investigation
Record 3.1: `git blame` shows the current readahead helper was
introduced by `bfb484d922a3` (`btrfs: cleanup extent buffer readahead`,
described as `v5.11-rc1~144^2~112`), owner/level plumbing by
`3fbaf25817f7`, and the parent-check struct initialization in this
helper by `947a629988f1` (`v6.2-rc1~143^2~39`).
`btrfs_verify_level_key()` empty-child validation exists in the local
stable tags examined, including `v5.15`, `v6.1`, and newer.

Record 3.2: No `Fixes:` tag is present, so there was no direct fixes
target to follow for this commit.

Record 3.3: Recent file history shows related Btrfs metadata/readahead
validation work, including `f04c6475c2db` (`btrfs: revalidate cached
tree blocks on the uptodate path`) and `908ab5634751` (`btrfs: remove
atomic parameter from btrfs_buffer_uptodate()`). The candidate itself is
standalone on `master`, but older stable trees have API differences.

Record 3.4: Qu Wenruo has many recent `fs/btrfs` commits; David Sterba
is listed in `MAINTAINERS` as Btrfs maintainer and reviewed/committed
this patch.

Record 3.5: Dependency/backport finding: the patch applies cleanly to
the current `v7.0.3` working tree with offsets. `v6.6`, `v6.12`, and
`v7.0` have the `btrfs_tree_parent_check` readahead infrastructure.
`v5.15`/`v6.1` have older `read_extent_buffer_pages()` interfaces, so
they would need a tailored backport rather than the exact diff.

## Phase 4: Mailing List And External Research
Record 4.1: `b4 dig -c a86a283430e1a` failed exact patch-id matching,
then found the candidate by author/subject at
`49ae0f6badb787c1111daff252f034b7ae94f257.1775991651.git.wqu@suse.com`.
`b4 am` fetched the candidate patch and showed 3 thread messages.

Record 4.2: `b4 dig -w` mis-associated the related earlier patch thread,
so I did not rely on it for candidate recipients. The candidate mbox
shows it was sent to `linux-btrfs@vger.kernel.org`; David Sterba replied
with `Reviewed-by`.

Record 4.3: The linked bug thread contains a concrete fuzzed-image
crash: `kernel BUG at fs/btrfs/ctree.c:3388`, call chain through
`push_leaf_left()`, `btrfs_del_items()`, `btrfs_del_csums()`, delayed
refs, transaction commit, and `sync`. Qu replied that detecting the
empty non-root leaf earlier during extent-buffer validation was
preferable.

Record 4.4: Related patch context: the linked patch proposed checking
`push_leaf_left()` directly. Qu rejected that as too late and followed
with this broader readahead validation coverage patch. No newer revision
of this candidate was found; the final commit incorporated David’s minor
review by dropping the unnecessary trailing comma from the submitted
diff.

Record 4.5: WebFetch of the stable lore search was blocked by Anubis, so
stable-list discussion could not be verified through WebFetch. No
stable-specific rationale against this patch was found locally.

## Phase 5: Code Semantic Analysis
Record 5.1: Modified functions: `btrfs_readahead_tree_block()` and
`btrfs_readahead_node_child()`.

Record 5.2: Callers of `btrfs_readahead_node_child()` in the local tree
are Btrfs tree search/balance readahead in `ctree.c`, tree deletion
walking in `extent-tree.c`, chunk-tree read at mount in `volumes.c`, and
send-tree traversal in `send.c`. `relocate_tree_blocks()` calls
`btrfs_readahead_tree_block()` directly.

Record 5.3: Key callees are `btrfs_node_key_to_cpu()`,
`btrfs_node_blockptr()`, `btrfs_node_ptr_generation()`,
`btrfs_find_create_tree_block()`, `btrfs_buffer_uptodate()`, and
`read_extent_buffer_pages_nowait()`.

Record 5.4: Reachability is real for Btrfs users: metadata readahead
occurs during tree searches, balancing, send traversal, chunk-tree mount
processing, and snapshot/drop-tree walking. The concrete bug report
reaches a crash via filesystem operation and `sync` on a corrupted
mounted image.

Record 5.5: Similar pattern exists in normal tree reads:
`read_block_for_search()` already builds a parent check with
`has_first_key = true`. This patch aligns readahead with that existing
checked read path.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Record 6.1: The affected readahead helper exists from `v5.11` onward.
Local tags show `v5.15`, `v6.1`, `v6.6`, `v6.12`, and `v7.0` contain the
readahead path; `v6.6+` contain the same parent-check structure in this
area.

Record 6.2: Backport difficulty: clean for current `v7.0.3` by `git
apply --check`; likely minor/manual for `v6.6` and `v6.12`; non-trivial
for `v5.15`/`v6.1` due to older metadata read APIs.

Record 6.3: No equivalent fix with the same subject was found in the
checked stable working tree. Related validation infrastructure already
exists, so this patch fills a coverage gap rather than introducing the
checker.

## Phase 7: Subsystem And Maintainer Context
Record 7.1: Subsystem is `fs/btrfs`, a production filesystem.
Criticality is IMPORTANT to CORE for Btrfs users because metadata
corruption handling can mean kernel crash or filesystem transaction
abort behavior.

Record 7.2: Btrfs is actively maintained; recent logs show ongoing Btrfs
commits from Qu Wenruo and David Sterba, and `MAINTAINERS` lists David
Sterba as maintainer.

## Phase 8: Impact And Risk Assessment
Record 8.1: Affected users are Btrfs users, especially systems
encountering corrupted metadata or fuzzed/untrusted images.

Record 8.2: Trigger requires a corrupt tree block with an empty non-root
child reachable through parent-node readahead. The crash report shows a
reachable path through Btrfs csum deletion and `sync`; unprivileged
creation/mounting of such an image was not verified.

Record 8.3: Failure mode is severe: the linked report shows a kernel
`BUG`/Oops during Btrfs metadata operations. Severity: CRITICAL for
affected systems.

Record 8.4: Benefit is high because it turns a missed corruption case
into earlier detection. Risk is low because the patch is small and only
supplies existing verifier data to existing verifier code.

## Phase 9: Final Synthesis
Record 9.1: Evidence for backporting: fixes a real fuzzed corruption
crash path, reviewed by Btrfs maintainer, tiny Btrfs-only diff, no new
feature/API, clean on `v7.0.3`, aligns readahead with normal parent-key
validation. Evidence against: older stable trees need tailored
backports; no `Tested-by`; WebFetch stable-list search was blocked.
Unresolved: exact applicability to every older LTS branch without
writing/testing dedicated backports.

Record 9.2: Stable rules:
1. Obviously correct and tested: mostly yes by inspection and maintainer
   review; no explicit `Tested-by`.
2. Fixes a real bug: yes, linked fuzzed image crash and Btrfs
   corruption-detection gap.
3. Important issue: yes, kernel BUG/Oops on filesystem metadata path.
4. Small and contained: yes, 15 insertions / 4 deletions in Btrfs.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current `v7.0.y`; likely minor/tailored
   for other modern stable trees; older LTS needs more care.

Record 9.3: No automatic exception category such as device ID, quirk,
DT, build, or docs. This is a filesystem bug fix.

Record 9.4: Decision: backport. The benefit of earlier metadata
corruption detection and avoiding a documented Btrfs BUG outweighs the
low risk of passing an existing parent key into existing checks.

## Verification
- Phase 1: Parsed `git show -s a86a283430e1a`; confirmed subject, body,
  tags, author, reviewer, and lack of `Fixes:`/`Tested-by`.
- Phase 2: Used `git show --stat --numstat` and full patch; confirmed
  files, line counts, and function changes.
- Phase 3: Used `git blame -L :btrfs_readahead_tree_block`,
  `:btrfs_readahead_node_child`, and `:btrfs_verify_level_key`;
  confirmed historical origins and validation code.
- Phase 3: Used `git log master --oneline -- fs/btrfs/...` and author
  logs; confirmed related Btrfs history and author activity.
- Phase 4: Used `b4 dig -c`, `b4 dig -a`, `b4 am`, and `b4 mbox`;
  confirmed candidate thread, related bug thread, maintainer review, and
  absence of substantive objections beyond a minor comma cleanup.
- Phase 4: WebFetch of lore pages was blocked by Anubis; b4 mbox
  provided the mailing-list content instead.
- Phase 5: Used `rg` and code reads for callers in `ctree.c`,
  `extent-tree.c`, `volumes.c`, and `send.c`; confirmed reachability
  through Btrfs metadata operations.
- Phase 6: Used `git grep` on `v5.15`, `v6.1`, `v6.6`, `v6.12`, `v7.0`;
  confirmed code presence and API differences.
- Phase 6: Ran `git apply --check` of the candidate diff on current
  `v7.0.3`; patch applies with offsets.
- Phase 7: Checked `MAINTAINERS`; David Sterba is Btrfs maintainer.
- Phase 8: Read the linked bug mbox; confirmed the kernel BUG/Oops stack
  through `btrfs_del_csums()` and `sync`.

**YES**

 fs/btrfs/extent_io.c  | 14 ++++++++++++--
 fs/btrfs/extent_io.h  |  3 ++-
 fs/btrfs/relocation.c |  2 +-
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5f97a3d2a8d72..83feb96b82673 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4593,7 +4593,8 @@ int try_release_extent_buffer(struct folio *folio)
  * to read the block we will not block on anything.
  */
 void btrfs_readahead_tree_block(struct btrfs_fs_info *fs_info,
-				u64 bytenr, u64 owner_root, u64 gen, int level)
+				u64 bytenr, u64 owner_root, u64 gen, int level,
+				const struct btrfs_key *first_key)
 {
 	struct btrfs_tree_parent_check check = {
 		.level = level,
@@ -4602,6 +4603,11 @@ void btrfs_readahead_tree_block(struct btrfs_fs_info *fs_info,
 	struct extent_buffer *eb;
 	int ret;

+	if (first_key) {
+		memcpy(&check.first_key, first_key, sizeof(struct btrfs_key));
+		check.has_first_key = true;
+	}
+
 	eb = btrfs_find_create_tree_block(fs_info, bytenr, owner_root, level);
 	if (IS_ERR(eb))
 		return;
@@ -4629,9 +4635,13 @@ void btrfs_readahead_tree_block(struct btrfs_fs_info *fs_info,
  */
 void btrfs_readahead_node_child(struct extent_buffer *node, int slot)
 {
+	struct btrfs_key node_key;
+
+	btrfs_node_key_to_cpu(node, &node_key, slot);
 	btrfs_readahead_tree_block(node->fs_info,
 				   btrfs_node_blockptr(node, slot),
 				   btrfs_header_owner(node),
 				   btrfs_node_ptr_generation(node, slot),
-				   btrfs_header_level(node) - 1);
+				   btrfs_header_level(node) - 1,
+				   &node_key);
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 8d05f1a58b7c3..6f7e3ead1dbaa 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -264,7 +264,8 @@ static inline void wait_on_extent_buffer_writeback(struct extent_buffer *eb)
 }

 void btrfs_readahead_tree_block(struct btrfs_fs_info *fs_info,
-				u64 bytenr, u64 owner_root, u64 gen, int level);
+				u64 bytenr, u64 owner_root, u64 gen, int level,
+				const struct btrfs_key *first_key);
 void btrfs_readahead_node_child(struct extent_buffer *node, int slot);

 /* Note: this can be used in for loops without caching the value in a variable. */
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2519cdb401eda..ff31266a44a4d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2610,7 +2610,7 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
 		if (!block->key_ready)
 			btrfs_readahead_tree_block(fs_info, block->bytenr,
 						   block->owner, 0,
-						   block->level);
+						   block->level, NULL);
 	}

 	/* Get first keys */
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (18 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: wangdicheng, Mark Brown, Sasha Levin, lgirdwood, perex, tiwai,
	linux-sound, linux-kernel

From: wangdicheng <wangdicheng@kylinos.cn>

[ Upstream commit 241ee17ecb6be210f7b231b2a81bfb68871950d0 ]

In aw88395_i2c_probe(), if `devm_gpiod_get_optional()` fails, it returns
an ERR_PTR() error pointer. The current code only prints a message and
continues execution, leaving `aw88395->reset_gpio` as an invalid pointer.

Later, in `aw88395_hw_reset()`, this invalid pointer is passed to
`gpiod_set_value_cansleep()`, which dereferences it and causes a kernel
panic.

For optional GPIOs, `devm_gpiod_get_optional()` returns NULL if the GPIO
is not defined in the DT, which is safe. If it returns an ERR_PTR, it
means a real error occurred (e.g., -EPROBE_DEFER) and the probe must be
aborted.

Also, since the GPIO is optional, remove the dev_err() log in
aw88395_hw_reset() when the GPIO is missing to match the optional
semantics. This also fixes a potential NULL pointer dereference as
aw_pa is not initialized when aw88395_hw_reset() is called.

Signed-off-by: wangdicheng <wangdicheng@kylinos.cn>
Link: https://patch.msgid.link/20260428023408.46420-1-wangdich9700@163.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough
Step 1.1 Record: Subsystem `ASoC: aw88395`; action verb `Fix`; claimed
intent is preventing probe/reset crashes from invalid or missing
optional reset GPIO handling.

Step 1.2 Record: Tags found: `Signed-off-by: wangdicheng
<wangdicheng@kylinos.cn>`, `Link:
https://patch.msgid.link/20260428023408.46420-1-wangdich9700@163.com`,
`Signed-off-by: Mark Brown <broonie@kernel.org>`. No `Fixes:`,
`Reported-by:`, `Tested-by:`, `Reviewed-by:`, `Acked-by:`, or `Cc:
stable`.

Step 1.3 Record: Commit body describes `devm_gpiod_get_optional()`
returning `ERR_PTR`, current code logging and continuing, then
`aw88395_hw_reset()` using the bad pointer. Verified an additional body
claim: `aw_pa` is uninitialized when `aw88395_hw_reset()` is called, so
the old missing-GPIO `dev_err(aw88395->aw_pa->dev, ...)` path can
dereference NULL.

Step 1.4 Record: This is a real bug fix despite also changing logging:
it fixes probe error handling and removes a pre-initialization NULL
dereference path.

Step 2.1 Record: One file changed: `sound/soc/codecs/aw88395/aw88395.c`,
4 insertions and 5 deletions. Modified functions: `aw88395_hw_reset()`
and `aw88395_i2c_probe()`. Scope is single-file surgical driver fix.

Step 2.2 Record: Hunk 1 changes missing reset GPIO from logging through
`aw88395->aw_pa->dev` to silent no-op. Hunk 2 changes
`IS_ERR(reset_gpio)` from “log and continue” to `return
dev_err_probe(...)`.

Step 2.3 Record: Bug categories: NULL pointer dereference in probe/reset
when optional GPIO is absent; probe error handling bug for `ERR_PTR`
values such as `-EPROBE_DEFER`. I could not verify the commit’s exact
claim that `gpiod_set_value_cansleep()` dereferences an `ERR_PTR`,
because current and checked stable gpiolib validate error pointers
before use.

Step 2.4 Record: Fix quality is good: minimal, contained, follows common
`devm_gpiod_get_optional()` patterns, and uses existing
`dev_err_probe()`. Regression risk is low; the behavior change is to
fail probe on real GPIO acquisition errors and stop logging through an
uninitialized pointer.
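
The three outcomes of an optional lookup (valid pointer, NULL, error
pointer) can be sketched in userspace as follows. This is a model under
stated assumptions: the `model_*` helpers imitate the kernel's
`ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` encoding, `lookup_result` stands in
for whatever `devm_gpiod_get_optional()` returned, and the counter
stands in for the reset pulse sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO    4095
#define MODEL_EPROBE_DEFER 517	/* numeric value used by the kernel */

/* Userspace imitation of the kernel's error-pointer encoding. */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical device state; not the real struct aw88395. */
struct model_dev {
	void *reset_gpio;	/* NULL: GPIO not described */
	int reset_pulses;	/* counts simulated reset sequences */
};

/* Returns 0 on success or a negative errno, like a probe callback. */
static int model_probe(struct model_dev *dev, void *lookup_result)
{
	assert(dev != NULL);

	dev->reset_gpio = lookup_result;
	if (model_is_err(dev->reset_gpio))
		return (int)model_ptr_err(dev->reset_gpio); /* abort probe */

	if (dev->reset_gpio)
		dev->reset_pulses++;	/* GPIO present: do the reset */
	/* NULL means the optional GPIO is absent: skip silently. */
	return 0;
}
```

Only the error-pointer case fails the probe; the NULL case is treated as
a normal configuration and produces no log message.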

Step 3.1 Record: `git blame` shows the buggy lines were introduced by
`62fc25fbab5f4`, “ASoC: codecs: Add i2c and codec registration for
aw88395...”, first contained in `v6.3`.

Step 3.2 Record: No `Fixes:` tag, so there was no tagged introducing
commit to follow; blame identifies `62fc25fbab5f4` as the origin.

Step 3.3 Record: Recent file history shows normal aw88395 codec churn
and no prerequisite patch for this fix. On `fixes-next`, the candidate
`241ee17ecb6b` is the direct relevant change to this file.

Step 3.4 Record: `git log --author=wangdicheng` found no other local
aw88395/codecs commits by this author. Maintainer Mark Brown applied the
patch.

Step 3.5 Record: No dependent commits found. `dev_err_probe()` and the
touched code exist in the checked stable trees; the patch applies
cleanly to representative stable tags.

Step 4.1 Record: `b4 dig -c 241ee17ecb6b` matched the lore submission
`20260428023408.46420-1-wangdich9700@163.com`. `b4 dig -a` found v1 and
v2; v2 is the committed/applied version.

Step 4.2 Record: `b4 dig -w` shows recipients included Mark Brown,
Takashi Iwai, `linux-sound`, and `linux-kernel`.

Step 4.3 Record: No `Reported-by` or external bug report tag. The Link
points to the patch thread. WebFetch of lore was blocked by Anubis, but
b4 successfully retrieved the mboxes.

Step 4.4 Record: v1 changed only the `IS_ERR()` handling. Mark Brown
replied that it looked OK but noted the missing-GPIO `dev_err()` bug. v2
removed that `dev_err()` and was applied by Mark Brown to
`broonie/sound.git for-7.1` as `241ee17ecb6b`.

Step 4.5 Record: Stable-specific lore search through WebFetch was
blocked by Anubis; no stable-specific discussion was verified.

Step 5.1 Record: Modified functions are `aw88395_hw_reset()` and
`aw88395_i2c_probe()`.

Step 5.2 Record: `aw88395_hw_reset()` is called only from
`aw88395_i2c_probe()` in this file. `aw88395_i2c_probe()` is registered
as `.probe` in `aw88395_i2c_driver`.

Step 5.3 Record: Key callees include `devm_gpiod_get_optional()`,
`gpiod_set_value_cansleep()`, `devm_regmap_init_i2c()`, and later
`aw88395_init()`.

Step 5.4 Record: Call chain is I2C driver registration via
`module_i2c_driver()` -> I2C core `i2c_device_probe()` ->
`driver->probe(client)` -> `aw88395_i2c_probe()` ->
`aw88395_hw_reset()`. Trigger is device probe, not an unprivileged
syscall path.

Step 5.5 Record: Similar codec patterns usually return on
`IS_ERR(devm_gpiod_get_optional())`; `aw88166.c`, `wcd937x.c`,
`tlv320dac33.c`, and others show this local convention.

Step 6.1 Record: The driver does not exist in `v5.15` or `v6.1`. The
buggy code exists in `v6.3`, `v6.6`, `v6.12`, `v6.19`, and `v7.0`.

Step 6.2 Record: `git apply --check` passed on current `v7.0.3`;
temporary worktree checks passed cleanly on `v6.6`, `v6.12`, and
`v6.19`.

Step 6.3 Record: No related fix for this aw88395 reset GPIO issue was
found in checked `master`/file history; current stable checkout still
has the old code.

Step 7.1 Record: Subsystem is ALSA SoC codec driver under
`sound/soc/codecs`; criticality is driver-specific/peripheral, but
failure mode is severe for affected hardware.

Step 7.2 Record: aw88395 file history shows moderate ongoing maintenance
since the driver was added in v6.3.

Step 8.1 Record: Affected users are systems using the Awinic AW88395 I2C
codec driver.

Step 8.2 Record: Trigger is probe with missing optional reset GPIO, or
real GPIO acquisition failure such as deferred probe. Unprivileged users
cannot directly trigger this through a normal syscall path based on the
traced call chain.

Step 8.3 Record: Verified failure mode is probe-time NULL dereference
through `aw88395->aw_pa->dev` when `reset_gpio` is NULL before
`aw88395_init()` initializes `aw_pa`. Severity: HIGH/CRITICAL for
affected systems.

Step 8.4 Record: Benefit is high for affected hardware because it
prevents probe-time crash/misprobe. Risk is low: 9-line single-driver
change, no ABI/API changes, clean apply to representative stable trees.

Step 9.1 Record: Evidence for backporting: real NULL deref, correct
handling of `ERR_PTR`, maintainer feedback led to v2, maintainer
applied, small clean patch, affected code present in stable v6.6+.
Evidence against: no explicit reporter/test tag; exact `ERR_PTR` panic
claim was not verified because gpiolib validates error pointers.
Unresolved: no stable-specific lore discussion verified due to WebFetch
blocking.

Step 9.2 Record: Stable rules: obviously correct yes; fixes real bug
yes; important issue yes, probe-time crash/oops; small and contained
yes; no new feature/API yes; applies to stable yes for checked
v6.6/v6.12/v6.19/v7.0.

Step 9.3 Record: No automatic exception category applies; this is not a
device ID, quirk, DT-only, build, or documentation patch.

Step 9.4 Record: Decision is backport. The verified NULL dereference
plus probe error handling fix outweigh the low regression risk.

## Verification
- [Phase 1] Parsed commit `241ee17ecb6be210f7b231b2a81bfb68871950d0`
  locally and confirmed tags/message.
- [Phase 2] Inspected patch and current source around
  `aw88395_hw_reset()` and `aw88395_i2c_probe()`.
- [Phase 2] Checked `drivers/gpio/gpiolib.c`: `validate_desc()` handles
  NULL and `IS_ERR()` descriptors before `gpiod_set_value_cansleep()`
  continues.
- [Phase 3] `git blame` traced both changed areas to `62fc25fbab5f4`,
  first present in `v6.3`.
- [Phase 4] `b4 dig` found v1/v2; v1 reviewer feedback from Mark Brown
  requested addressing the optional GPIO logging bug; v2 was applied.
- [Phase 5] Traced caller registration through `.probe`,
  `module_i2c_driver()`, and I2C core `driver->probe(client)`.
- [Phase 6] Verified code absent in `v5.15`/`v6.1`, present in `v6.3+`,
  and cleanly applying to `v6.6`, `v6.12`, `v6.19`, and current
  `v7.0.3`.
- [Phase 8] Verified `aw88395_malloc_init()` uses `devm_kzalloc()`, so
  `aw_pa` is NULL until later `aw88395_init()`.

**YES**

 sound/soc/codecs/aw88395/aw88395.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/sound/soc/codecs/aw88395/aw88395.c b/sound/soc/codecs/aw88395/aw88395.c
index 3602b5b9f7d77..dd09bac652f7f 100644
--- a/sound/soc/codecs/aw88395/aw88395.c
+++ b/sound/soc/codecs/aw88395/aw88395.c
@@ -456,8 +456,6 @@ static void aw88395_hw_reset(struct aw88395 *aw88395)
 		usleep_range(AW88395_1000_US, AW88395_1000_US + 10);
 		gpiod_set_value_cansleep(aw88395->reset_gpio, 1);
 		usleep_range(AW88395_1000_US, AW88395_1000_US + 10);
-	} else {
-		dev_err(aw88395->aw_pa->dev, "%s failed", __func__);
 	}
 }

@@ -522,9 +520,10 @@ static int aw88395_i2c_probe(struct i2c_client *i2c)
 	i2c_set_clientdata(i2c, aw88395);

 	aw88395->reset_gpio = devm_gpiod_get_optional(&i2c->dev, "reset", GPIOD_OUT_LOW);
-	if (IS_ERR(aw88395->reset_gpio))
-		dev_info(&i2c->dev, "reset gpio not defined\n");
-
+	if (IS_ERR(aw88395->reset_gpio)) {
+		return dev_err_probe(&i2c->dev, PTR_ERR(aw88395->reset_gpio),
+				"failed to get reset gpio\n");
+	}
 	/* hardware reset */
 	aw88395_hw_reset(aw88395);

-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (19 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call Sasha Levin
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Chaitanya Kulkarni, Christoph Hellwig, Sagi Grimberg,
	Hannes Reinecke, Keith Busch, Sasha Levin, linux-nvme,
	linux-kernel

From: Chaitanya Kulkarni <kch@nvidia.com>

[ Upstream commit 26bb12b9caafa2e62d638104bf2732f610cdbb0b ]

When a controller reset is triggered via sysfs (by writing to
/sys/class/nvme/<nvmedev>/reset_controller), the reset work tears down
and re-establishes all queues. The socket release using fput() defers
the actual cleanup to task_work delayed_fput workqueue. This deferred
cleanup can race with the subsequent queue re-allocation during reset,
potentially leading to use-after-free or resource conflicts.

Replace fput() with __fput_sync() to ensure synchronous socket release,
guaranteeing that all socket resources are fully cleaned up before the
function returns. This prevents races during controller reset where
new queue setup may begin before the old socket is fully released.

* Call chain during reset:
  nvme_reset_ctrl_work()
    -> nvme_tcp_teardown_ctrl()
      -> nvme_tcp_teardown_io_queues()
        -> nvme_tcp_free_io_queues()
          -> nvme_tcp_free_queue()       <-- fput() -> __fput_sync()
      -> nvme_tcp_teardown_admin_queue()
        -> nvme_tcp_free_admin_queue()
          -> nvme_tcp_free_queue()       <-- fput() -> __fput_sync()
    -> nvme_tcp_setup_ctrl()             <-- race with deferred fput

memalloc_noreclaim_save() sets PF_MEMALLOC which is intended for tasks
performing memory reclaim work that need reserve access. While PF_MEMALLOC
prevents the task from entering direct reclaim (causing __need_reclaim() to
return false), it does not strip __GFP_IO from gfp flags. The allocator can
therefore still trigger writeback I/O when __GFP_IO remains set, which is
unsafe when the caller holds block layer locks.

Switch to memalloc_noio_save() which sets PF_MEMALLOC_NOIO. This causes
current_gfp_context() to strip __GFP_IO|__GFP_FS from every allocation in
the scope, making it safe to allocate memory while holding elevator_lock and
set->srcu.

* The issue can be reproduced using blktests:

  nvme_trtype=tcp ./check nvme/005
blktests (master) # nvme_trtype=tcp ./check nvme/005
nvme/005 (tr=tcp) (reset local loopback target)              [failed]
    runtime  0.725s  ...  0.798s
    something found in dmesg:
    [  108.473940] run blktests nvme/005 at 2025-11-22 16:12:20

    [...]
    ...
    (See '/root/blktests/results/nodev_tr_tcp/nvme/005.dmesg' for the entire message)
blktests (master) # cat /root/blktests/results/nodev_tr_tcp/nvme/005.dmesg
[  108.473940] run blktests nvme/005 at 2025-11-22 16:12:20
[  108.526983] loop0: detected capacity change from 0 to 2097152
[  108.555606] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  108.572531] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  108.613061] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  108.616832] nvme nvme0: creating 48 I/O queues.
[  108.630791] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[  108.661892] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  108.746639] nvmet: Created nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  108.748466] nvme nvme0: creating 48 I/O queues.
[  108.802984] nvme nvme0: mapped 48/0/0 default/read/poll queues.
[  108.829983] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
[  108.854288] block nvme0n1: no available path - failing I/O
[  108.854344] block nvme0n1: no available path - failing I/O
[  108.854373] Buffer I/O error on dev nvme0n1, logical block 1, async page read

[  108.891693] ======================================================
[  108.895912] WARNING: possible circular locking dependency detected
[  108.900184] 6.17.0nvme+ #3 Tainted: G                 N
[  108.903913] ------------------------------------------------------
[  108.908171] nvme/2734 is trying to acquire lock:
[  108.911957] ffff88810210e610 (set->srcu){.+.+}-{0:0}, at: __synchronize_srcu+0x17/0x170
[  108.917587]
               but task is already holding lock:
[  108.921570] ffff88813abea198 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0xa8/0x1c0
[  108.927361]
               which lock already depends on the new lock.

[  108.933018]
               the existing dependency chain (in reverse order) is:
[  108.938223]
               -> #4 (&q->elevator_lock){+.+.}-{4:4}:
[  108.942988]        __mutex_lock+0xa2/0x1150
[  108.945873]        elevator_change+0xa8/0x1c0
[  108.948925]        elv_iosched_store+0xdf/0x140
[  108.952043]        kernfs_fop_write_iter+0x16a/0x220
[  108.955367]        vfs_write+0x378/0x520
[  108.957598]        ksys_write+0x67/0xe0
[  108.959721]        do_syscall_64+0x76/0xbb0
[  108.962052]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  108.965145]
               -> #3 (&q->q_usage_counter(io)){++++}-{0:0}:
[  108.968923]        blk_alloc_queue+0x30e/0x350
[  108.972117]        blk_mq_alloc_queue+0x61/0xd0
[  108.974677]        scsi_alloc_sdev+0x2a0/0x3e0
[  108.977092]        scsi_probe_and_add_lun+0x1bd/0x430
[  108.979921]        __scsi_add_device+0x109/0x120
[  108.982504]        ata_scsi_scan_host+0x97/0x1c0
[  108.984365]        async_run_entry_fn+0x2d/0x130
[  108.986109]        process_one_work+0x20e/0x630
[  108.987830]        worker_thread+0x184/0x330
[  108.989473]        kthread+0x10a/0x250
[  108.990852]        ret_from_fork+0x297/0x300
[  108.992491]        ret_from_fork_asm+0x1a/0x30
[  108.994159]
               -> #2 (fs_reclaim){+.+.}-{0:0}:
[  108.996320]        fs_reclaim_acquire+0x99/0xd0
[  108.998058]        kmem_cache_alloc_node_noprof+0x4e/0x3c0
[  109.000123]        __alloc_skb+0x15f/0x190
[  109.002195]        tcp_send_active_reset+0x3f/0x1e0
[  109.004038]        tcp_disconnect+0x50b/0x720
[  109.005695]        __tcp_close+0x2b8/0x4b0
[  109.007227]        tcp_close+0x20/0x80
[  109.008663]        inet_release+0x31/0x60
[  109.010175]        __sock_release+0x3a/0xc0
[  109.011778]        sock_close+0x14/0x20
[  109.013263]        __fput+0xee/0x2c0
[  109.014673]        delayed_fput+0x31/0x50
[  109.016183]        process_one_work+0x20e/0x630
[  109.017897]        worker_thread+0x184/0x330
[  109.019543]        kthread+0x10a/0x250
[  109.020929]        ret_from_fork+0x297/0x300
[  109.022565]        ret_from_fork_asm+0x1a/0x30
[  109.024194]
               -> #1 (sk_lock-AF_INET-NVME){+.+.}-{0:0}:
[  109.026634]        lock_sock_nested+0x2e/0x70
[  109.028251]        tcp_sendmsg+0x1a/0x40
[  109.029783]        sock_sendmsg+0xed/0x110
[  109.031321]        nvme_tcp_try_send_cmd_pdu+0x13e/0x260 [nvme_tcp]
[  109.034263]        nvme_tcp_try_send+0xb3/0x330 [nvme_tcp]
[  109.036375]        nvme_tcp_queue_rq+0x342/0x3d0 [nvme_tcp]
[  109.038528]        blk_mq_dispatch_rq_list+0x297/0x800
[  109.040448]        __blk_mq_sched_dispatch_requests+0x3db/0x5f0
[  109.042677]        blk_mq_sched_dispatch_requests+0x29/0x70
[  109.044787]        blk_mq_run_work_fn+0x76/0x1b0
[  109.046535]        process_one_work+0x20e/0x630
[  109.048245]        worker_thread+0x184/0x330
[  109.049890]        kthread+0x10a/0x250
[  109.051331]        ret_from_fork+0x297/0x300
[  109.053024]        ret_from_fork_asm+0x1a/0x30
[  109.054740]
               -> #0 (set->srcu){.+.+}-{0:0}:
[  109.056850]        __lock_acquire+0x1468/0x2210
[  109.058614]        lock_sync+0xa5/0x110
[  109.060048]        __synchronize_srcu+0x49/0x170
[  109.061802]        elevator_switch+0xc9/0x330
[  109.063950]        elevator_change+0x128/0x1c0
[  109.065675]        elevator_set_none+0x4c/0x90
[  109.067316]        blk_unregister_queue+0xa8/0x110
[  109.069165]        __del_gendisk+0x14e/0x3c0
[  109.070824]        del_gendisk+0x75/0xa0
[  109.072328]        nvme_ns_remove+0xf2/0x230 [nvme_core]
[  109.074365]        nvme_remove_namespaces+0xf2/0x150 [nvme_core]
[  109.076652]        nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[  109.078775]        nvme_delete_ctrl_sync+0x3b/0x50 [nvme_core]
[  109.081009]        nvme_sysfs_delete+0x34/0x40 [nvme_core]
[  109.083082]        kernfs_fop_write_iter+0x16a/0x220
[  109.085009]        vfs_write+0x378/0x520
[  109.086539]        ksys_write+0x67/0xe0
[  109.087982]        do_syscall_64+0x76/0xbb0
[  109.089577]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  109.091665]
               other info that might help us debug this:

[  109.095478] Chain exists of:
                 set->srcu --> &q->q_usage_counter(io) --> &q->elevator_lock

[  109.099544]  Possible unsafe locking scenario:

[  109.101708]        CPU0                    CPU1
[  109.103402]        ----                    ----
[  109.105103]   lock(&q->elevator_lock);
[  109.106530]                                lock(&q->q_usage_counter(io));
[  109.109022]                                lock(&q->elevator_lock);
[  109.111391]   sync(set->srcu);
[  109.112586]
                *** DEADLOCK ***

[  109.114772] 5 locks held by nvme/2734:
[  109.116189]  #0: ffff888101925410 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x67/0xe0
[  109.119143]  #1: ffff88817a914e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x10f/0x220
[  109.123141]  #2: ffff8881046313f8 (kn->active#185){++++}-{0:0}, at: sysfs_remove_file_self+0x26/0x50
[  109.126543]  #3: ffff88810470e1d0 (&set->update_nr_hwq_lock){++++}-{4:4}, at: del_gendisk+0x6d/0xa0
[  109.129891]  #4: ffff88813abea198 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0xa8/0x1c0
[  109.133149]
               stack backtrace:
[  109.134817] CPU: 6 UID: 0 PID: 2734 Comm: nvme Tainted: G                 N  6.17.0nvme+ #3 PREEMPT(voluntary)
[  109.134819] Tainted: [N]=TEST
[  109.134820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  109.134821] Call Trace:
[  109.134823]  <TASK>
[  109.134824]  dump_stack_lvl+0x75/0xb0
[  109.134828]  print_circular_bug+0x26a/0x330
[  109.134831]  check_noncircular+0x12f/0x150
[  109.134834]  __lock_acquire+0x1468/0x2210
[  109.134837]  ? __synchronize_srcu+0x17/0x170
[  109.134838]  lock_sync+0xa5/0x110
[  109.134840]  ? __synchronize_srcu+0x17/0x170
[  109.134842]  __synchronize_srcu+0x49/0x170
[  109.134843]  ? mark_held_locks+0x49/0x80
[  109.134845]  ? _raw_spin_unlock_irqrestore+0x2d/0x60
[  109.134847]  ? kvm_clock_get_cycles+0x14/0x30
[  109.134853]  ? ktime_get_mono_fast_ns+0x36/0xb0
[  109.134858]  elevator_switch+0xc9/0x330
[  109.134860]  elevator_change+0x128/0x1c0
[  109.134862]  ? kernfs_put.part.0+0x86/0x290
[  109.134864]  elevator_set_none+0x4c/0x90
[  109.134866]  blk_unregister_queue+0xa8/0x110
[  109.134868]  __del_gendisk+0x14e/0x3c0
[  109.134870]  del_gendisk+0x75/0xa0
[  109.134872]  nvme_ns_remove+0xf2/0x230 [nvme_core]
[  109.134879]  nvme_remove_namespaces+0xf2/0x150 [nvme_core]
[  109.134887]  nvme_do_delete_ctrl+0x71/0x90 [nvme_core]
[  109.134893]  nvme_delete_ctrl_sync+0x3b/0x50 [nvme_core]
[  109.134899]  nvme_sysfs_delete+0x34/0x40 [nvme_core]
[  109.134905]  kernfs_fop_write_iter+0x16a/0x220
[  109.134908]  vfs_write+0x378/0x520
[  109.134911]  ksys_write+0x67/0xe0
[  109.134913]  do_syscall_64+0x76/0xbb0
[  109.134915]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  109.134916] RIP: 0033:0x7fd68a737317
[  109.134917] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  109.134919] RSP: 002b:00007ffded1546d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  109.134920] RAX: ffffffffffffffda RBX: 000000000054f7e0 RCX: 00007fd68a737317
[  109.134921] RDX: 0000000000000001 RSI: 00007fd68a855719 RDI: 0000000000000003
[  109.134921] RBP: 0000000000000003 R08: 0000000030407850 R09: 00007fd68a7cd4e0
[  109.134922] R10: 00007fd68a65b130 R11: 0000000000000246 R12: 00007fd68a855719
[  109.134923] R13: 00000000304074c0 R14: 00000000304074c0 R15: 0000000030408660
[  109.134926]  </TASK>
[  109.962756] Key type psk unregistered

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Backport Analysis

### Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem is `nvme-tcp` under
`drivers/nvme/host/tcp.c`; action is a fix for teardown circular
locking. Claimed intent is to make socket teardown synchronous during
reset and use a NOIO allocation scope during socket release.

Step 1.2 Record: candidate tags present are:
- `Reviewed-by: Christoph Hellwig <hch@lst.de>`
- `Reviewed-by: Sagi Grimberg <sagi@grimberg.me>`
- `Reviewed-by: Hannes Reinecke <hare@suse.de>`
- `Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>`
- `Signed-off-by: Keith Busch <kbusch@kernel.org>`

No `Fixes:`, `Reported-by:`, `Tested-by:`, `Cc: stable`, or `Link:` tag
is present in the supplied commit message. `b4 am` additionally found
review trailers from Hannes Reinecke, Daniel Wagner, and Nilay Shroff on
the v4 thread.

Step 1.3 Record: the commit describes a real reset-time race and lockdep
issue. The verified reproducer is `nvme_trtype=tcp ./check nvme/005`.
The supplied dmesg shows a “possible circular locking dependency
detected” involving `set->srcu`, `q->elevator_lock`, `fs_reclaim`, and
NVMe/TCP socket teardown through `delayed_fput`. The root cause is that
`fput()` defers `__fput()` and `memalloc_noreclaim_save()` does not
strip `__GFP_IO`.
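
As a rough illustration of what a circular-dependency report means, the
toy below records "held -> acquired" edges and checks whether a new
acquisition would close a cycle. It is not how lockdep is implemented;
the enum names and helpers are hypothetical, and the edges are one
reading of the chain printed in the dmesg above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the lock classes named in the report. */
enum {
	MODEL_SRCU,
	MODEL_SK_LOCK,
	MODEL_FS_RECLAIM,
	MODEL_Q_USAGE,
	MODEL_ELEVATOR,
	MODEL_NR_LOCKS
};

/* model_dep[a][b] is true when b was acquired while a was held. */
static bool model_dep[MODEL_NR_LOCKS][MODEL_NR_LOCKS];

static void model_add_dep(int held, int acquired)
{
	assert(held >= 0 && held < MODEL_NR_LOCKS);
	assert(acquired >= 0 && acquired < MODEL_NR_LOCKS);
	model_dep[held][acquired] = true;
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool model_reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MODEL_NR_LOCKS; next++) {
		if (model_dep[from][next] && !seen[next] &&
		    model_reaches(next, to, seen))
			return true;
	}
	return false;
}

/* Taking 'wanted' while holding 'held' closes a cycle if 'wanted'
 * already leads back to 'held'. */
static bool model_would_cycle(int held, int wanted)
{
	bool seen[MODEL_NR_LOCKS] = { false };

	return model_reaches(wanted, held, seen);
}

/* Edges as read from the "existing dependency chain" in the report. */
static void model_record_reported_chain(void)
{
	memset(model_dep, 0, sizeof(model_dep));
	model_add_dep(MODEL_SRCU, MODEL_SK_LOCK);
	model_add_dep(MODEL_SK_LOCK, MODEL_FS_RECLAIM);
	model_add_dep(MODEL_FS_RECLAIM, MODEL_Q_USAGE);
	model_add_dep(MODEL_Q_USAGE, MODEL_ELEVATOR);
}
```

With those edges recorded, syncing `set->srcu` while holding
`q->elevator_lock` is flagged in the model, matching the shape of the
reported scenario.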

Step 1.4 Record: this is not merely cleanup. It is a synchronization and
deadlock-prevention fix, with a claimed possible UAF/resource conflict
from deferred socket release. The UAF aspect is confirmed only as
author/reviewer rationale, while the lockdep circular dependency is
directly backed by the reported trace.

### Phase 2: Diff Analysis
Step 2.1 Record: one file changed: `drivers/nvme/host/tcp.c`, 21
insertions and 7 deletions in the v4 patch. Functions modified:
`nvme_tcp_free_queue()` and the `err_sock` path in
`nvme_tcp_alloc_queue()`. Scope is a single-file surgical driver fix.

Step 2.2 Record:
- In `nvme_tcp_free_queue()`, before: drain page fragments, enter
  `memalloc_noreclaim_save()`, call deferred `fput()`, clear
  `queue->sock`, restore noreclaim. After: enter `memalloc_noio_save()`,
  call `__fput_sync()`, clear `queue->sock`, restore NOIO.
- In `nvme_tcp_alloc_queue()` error handling, before: failed queue setup
  used `fput()`. After: it uses `__fput_sync()`.

Step 2.3 Record: bug categories are race condition, lockdep/deadlock
prevention, and allocation-context correctness. The race mechanism is
deferred file/socket destruction via `fput()` while reset immediately
re-enters queue setup. The deadlock mechanism is socket close allocating
memory with I/O allowed while block teardown paths hold locks.
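
The race described in Step 2.3 can be illustrated with a small userspace model. This is only a sketch: `toy_sock`, `toy_fput()`, `toy_fput_sync()`, `toy_run_deferred()`, and `toy_reset()` are hypothetical names invented here, and the single-slot deferred list is a deliberate simplification of the kernel's delayed-fput machinery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket-backed resource; not a kernel type. */
struct toy_sock {
	bool released;
};

/* Single-slot stand-in for the kernel's deferred-fput work list. */
static struct toy_sock *toy_deferred;

/* Models fput(): queue the final release for later and return at once. */
static void toy_fput(struct toy_sock *s)
{
	toy_deferred = s;
}

/* Models __fput_sync(): the release has finished when this returns. */
static void toy_fput_sync(struct toy_sock *s)
{
	s->released = true;
}

/* Models the deferred worker running at some later, unrelated time. */
static void toy_run_deferred(void)
{
	if (toy_deferred) {
		toy_deferred->released = true;
		toy_deferred = NULL;
	}
}

/*
 * Models reset: tear down the old socket, then set up again right away.
 * Returns true if the old socket was fully released before setup began.
 */
static bool toy_reset(struct toy_sock *old, bool sync)
{
	if (sync)
		toy_fput_sync(old);
	else
		toy_fput(old);
	/* Queue setup would start here. */
	return old->released;
}
```

In this model, `toy_reset(&s, false)` returns false because the release is still pending when setup would begin, while `toy_reset(&s, true)` returns true.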

Step 2.4 Record: the fix is minimal and understandable. Risk is low to
medium: `__fput_sync()` is intentionally special-purpose and
`fs/file_table.c` warns not to blindly convert callers, but here the
code has a concrete need and subsystem reviewers accepted it.
`memalloc_noio_save()` is the right primitive for suppressing
`__GFP_IO|__GFP_FS`, verified in `include/linux/sched/mm.h`.

### Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows:
- `nvme_tcp_free_queue()` dates to `3f2304f8c6d6` (“nvme-tcp: add NVMe
  over TCP host driver”), described as first contained around
  `v5.0-rc1`.
- `memalloc_noreclaim_save()` in this area came from `83e1226b0ee2`
  (“nvme-tcp: fix possible circular locking when deleting a controller
  under memory pressure”), first contained around `v6.1-rc3`.
- `fput(queue->sock->file)` came from `e40d4eb84089` (“nvme-tcp:
  allocate socket file”), first contained around `v6.7-rc1`.

Step 3.2 Record: no `Fixes:` tag in the candidate. I inspected related
commits instead. `83e1226b0ee2` fixed an earlier lockdep circular
locking report by adding `memalloc_noreclaim_save()`, but this candidate
corrects that to `memalloc_noio_save()` for the newer lock chain.
`e40d4eb84089` introduced socket files for TLS upcalls and therefore the
`fput()` path.

Step 3.3 Record: recent file history shows active NVMe/TCP maintenance,
including queue teardown, TLS, request handling, and UAF/race fixes. No
required multi-patch series dependency was found for the exact current-
tree patch.

Step 3.4 Record: Chaitanya Kulkarni has multiple recent NVMe/block fixes
in `drivers/nvme/host`; the strongest quality signal is that Christoph
Hellwig, Sagi Grimberg, Hannes Reinecke, Daniel Wagner, and Nilay Shroff
reviewed or tested the patch on the thread.

Step 3.5 Record: dependencies are existing core APIs: `__fput_sync()` is
exported in `fs/file_table.c`, and `memalloc_noio_save()` is available
in `include/linux/sched/mm.h`. For older stable branches, the exact
`fput(queue->sock->file)` part only exists where `e40d4eb84089` is
present.

### Phase 4: Mailing List And External Research
Step 4.1 Record: I could not use `b4 dig -c` because the exact applied
commit hash was not present in local `master`, `linus-next/master`,
`storage-next`, or `pending-7.0`; `b4 dig` only accepts a commitish. I
used the message-id with `b4 am -c`, which found the v4 patch at
`https://patch.msgid.link/20260413171628.6204-1-kch@nvidia.com`.

Step 4.2 Record: original recipients/reviewers verified from the raw
thread: To `kbusch`, `sagi`; Cc `hch`, `linux-nvme`. Review trailers
found: Christoph Hellwig, Sagi Grimberg, Hannes Reinecke, Daniel Wagner,
Nilay Shroff. Keith Busch replied “applied to nvme-7.1”.

Step 4.3 Record: no syzbot/bugzilla report. The bug report evidence is
the included blktests `nvme/005` failure and lockdep trace. Daniel
Wagner replied that he tested locally with blktests and it passed,
though he could not reproduce the original failure.

Step 4.4 Record: patch evolution was v2 to v3 to v4. v2 only converted
`fput()` to `__fput_sync()`. v3 added the `memalloc_noio_save()` change
after feedback from Nilay/Christoph/Hannes. v4 rebased/retested and
added review tags. No NAKs found.

Step 4.5 Record: direct lore stable search was blocked by Anubis; web
search did not find stable-specific objections or a known reason to
avoid stable.

### Phase 5: Code Semantic Analysis
Step 5.1 Record: key functions modified are `nvme_tcp_free_queue()` and
`nvme_tcp_alloc_queue()`.

Step 5.2 Record: callers verified:
- `nvme_tcp_free_admin_queue()` calls `nvme_tcp_free_queue(ctrl, 0)`.
- `nvme_tcp_free_io_queues()` calls `nvme_tcp_free_queue()` for I/O
  queues.
- `nvme_tcp_teardown_ctrl()` calls I/O teardown then admin teardown.
- `nvme_reset_ctrl_work()` calls `nvme_tcp_teardown_ctrl()` followed
  immediately by `nvme_tcp_setup_ctrl()`.
- `nvme_sysfs_reset()` calls `nvme_reset_ctrl_sync()`, which queues and
  flushes `ctrl->reset_work`.

Step 5.3 Record: important callees are `fput()`/`__fput_sync()`, socket
close through `__fput()`, `tcp_close()`, `tcp_disconnect()`, and
allocations in `tcp_send_active_reset()` as shown in the trace.
`current_gfp_context()` strips `__GFP_IO|__GFP_FS` only for
`PF_MEMALLOC_NOIO`, not plain `PF_MEMALLOC`.
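
The flag-masking distinction in Step 5.3 can be sketched as a userspace model. The `TOY_` constants and `toy_gfp_context()` are hypothetical, the bit values are illustrative rather than the kernel's, and only the NOIO branch of the real helper is modeled.

```c
#include <stdint.h>

/* Illustrative bit values only; the kernel's real constants differ. */
#define TOY_GFP_IO            (1u << 0)
#define TOY_GFP_FS            (1u << 1)
#define TOY_GFP_RECLAIM       (1u << 2)
#define TOY_GFP_KERNEL        (TOY_GFP_IO | TOY_GFP_FS | TOY_GFP_RECLAIM)

#define TOY_PF_MEMALLOC       (1u << 0)  /* set by memalloc_noreclaim_save() */
#define TOY_PF_MEMALLOC_NOIO  (1u << 1)  /* set by memalloc_noio_save() */

/* Models the masking step: only the NOIO task flag removes IO and FS. */
static uint32_t toy_gfp_context(uint32_t task_flags, uint32_t gfp)
{
	if (task_flags & TOY_PF_MEMALLOC_NOIO)
		gfp &= ~(TOY_GFP_IO | TOY_GFP_FS);
	return gfp;
}
```

With `TOY_PF_MEMALLOC` set, a `TOY_GFP_KERNEL` request passes through unchanged, so I/O is still permitted. With `TOY_PF_MEMALLOC_NOIO` set, only `TOY_GFP_RECLAIM` remains.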

Step 5.4 Record: reachability is real. The reset path is reachable from
writable sysfs `reset_controller` and from NVMe reset ioctl paths; both
are privileged/admin operations. The send-side lock chain is reachable
through normal NVMe/TCP block I/O via `nvme_tcp_queue_rq()` ->
`nvme_tcp_queue_request()` -> workqueue send.

Step 5.5 Record: related pattern found: prior `83e1226b0ee2` was also an
NVMe/TCP circular locking fix around socket teardown under memory
pressure. No prior `__fput_sync()` fix in `drivers/nvme/host/tcp.c`
history was found.

### Phase 6: Stable Tree Analysis
Step 6.1 Record:
- `stable/linux-6.12.y`, `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`
  contain `sock_alloc_file()`, `fput(queue->sock->file)`, and
  `memalloc_noreclaim_save()`, so they contain the exact bug pattern.
- `stable/linux-6.1.y` and `6.6.y` contain the
  `memalloc_noreclaim_save()` plus `sock_release(queue->sock)` teardown
  pattern, so the NOIO part is relevant but the exact `fput()` hunk does
  not apply.
- `stable/linux-5.10.y` and `5.15.y` in this repo did not show the
  specific `memalloc_noreclaim_save()` or `fput(queue->sock->file)`
  patterns.

Step 6.2 Record: `git apply --check` succeeds on the current `7.0.y`
checkout. Raw v4 patch does not apply cleanly to `6.12.y`, `6.6.y`, or
`5.10.y` test worktrees; `6.12.y` has the bug pattern but nearby context
differs, while `6.6.y`/`5.10.y` lack the `fput(queue->sock->file)` form.
Expected backport difficulty: clean for current 7.0, minor rework for
6.12+, and an adapted backport, or none, for older branches.

Step 6.3 Record: related fix already in current history is
`83e1226b0ee2`; this candidate is a follow-up/correction rather than a
duplicate. I found no alternate `__fput_sync()` fix already in this file
history.

### Phase 7: Subsystem Context
Step 7.1 Record: subsystem is NVMe/TCP host driver, in storage/block.
Criticality is IMPORTANT: driver-specific, but it backs real block
devices and can affect I/O availability and teardown/reset reliability.

Step 7.2 Record: subsystem is active; recent `drivers/nvme/host/tcp.c`
history includes TLS, queue removal, congestion, stalls, UAF, and
failover fixes.

### Phase 8: Impact And Risk
Step 8.1 Record: affected users are systems using NVMe over TCP,
especially during controller reset/delete/reconnect or tests like
blktests `nvme/005`.

Step 8.2 Record: trigger is privileged/admin reset via sysfs or ioctl,
and teardown/delete paths. The I/O lock chain involves normal NVMe/TCP
request submission, but initiating reset/delete is not unprivileged in
the verified paths.

Step 8.3 Record: failure mode is at least HIGH and plausibly CRITICAL:
verified lockdep circular dependency with a possible deadlock scenario,
I/O failures in the reproducer trace, and a reviewed claim of possible
UAF/resource conflict from deferred socket cleanup.

Step 8.4 Record: benefit is high for affected NVMe/TCP stable users
because it prevents reset/teardown races and circular locking. Risk is
low-medium because the patch changes teardown/error paths only, is
small, and has strong review, though `__fput_sync()` is a sensitive
primitive.

### Phase 9: Final Synthesis
Step 9.1 Record:
Evidence for backporting: real reproduced lockdep issue, concrete
blktests reproducer, small single-file fix, no new API or feature,
strong subsystem review, exact bug pattern present in active stable
trees from 6.12+ and current 7.0.
Evidence against: exact patch does not apply to some older stable trees;
older branches need adaptation or may not contain the same bug pattern.
`__fput_sync()` has general cautionary documentation.
Unresolved: no applied upstream commit hash available locally, so `b4
dig -c` could not be performed; no direct stable-list discussion could
be fetched because lore was blocked by Anubis.

Step 9.2 Record:
1. Obviously correct and tested: yes, based on direct code inspection,
   blktests discussion, and review/test replies.
2. Fixes a real bug: yes, verified lockdep circular dependency and reset
   teardown race.
3. Important issue: yes, possible deadlock/hang during NVMe/TCP
   reset/teardown.
4. Small and contained: yes, one file and two localized hunks.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current 7.0; needs minor/adapted
   backports for some older branches.

Step 9.3 Record: no exception category such as device ID, quirk, DT,
build, or docs applies.

Step 9.4 Record: this should be backported to stable trees that contain
the affected NVMe/TCP socket-file teardown path, with branch-specific
adjustment where necessary.

## Verification
- Phase 1: Parsed supplied tags and verified additional review trailers
  via `b4 am -c`.
- Phase 2: Verified diff scope from v4 patch: `drivers/nvme/host/tcp.c`,
  21 insertions, 7 deletions.
- Phase 3: Ran `git blame` on both changed areas; identified
  `83e1226b0ee2`, `e40d4eb84089`, and `3f2304f8c6d6`.
- Phase 3: Ran `git show` on `83e1226b0ee2` and `e40d4eb84089`;
  confirmed prior circular-locking context and socket-file introduction.
- Phase 4: Fetched v2, v3, and v4 threads from public mirrors; verified
  reviewer feedback and no NAKs.
- Phase 5: Read `nvme_reset_ctrl_work()`, queue teardown/setup
  functions, sysfs reset, and `nvme_reset_ctrl_sync()` to verify
  reachability.
- Phase 5: Read `fs/file_table.c` and `include/linux/sched/mm.h` to
  verify `fput()` deferral, `__fput_sync()`, and NOIO semantics.
- Phase 6: Checked stable branch code patterns for `5.10.y`, `5.15.y`,
  `6.1.y`, `6.6.y`, `6.12.y`, `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`.
- Phase 6: Ran `git apply --check`; current `7.0.y` applies, older
  tested branches need rework or lack exact context.
- Unverified: exact applied upstream commit hash was not available
  locally, so `b4 dig -c`, `b4 dig -a`, and `b4 dig -w` could not be
  run.

**YES**

 drivers/nvme/host/tcp.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 02c95c32b07e3..15d36d6a728e8 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1438,18 +1438,32 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
-	unsigned int noreclaim_flag;
+	unsigned int noio_flag;

 	if (!test_and_clear_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
 		return;

 	page_frag_cache_drain(&queue->pf_cache);

-	noreclaim_flag = memalloc_noreclaim_save();
-	/* ->sock will be released by fput() */
-	fput(queue->sock->file);
+	/**
+	 * Prevent memory reclaim from triggering block I/O during socket
+	 * teardown. The socket release path fput -> tcp_close ->
+	 * tcp_disconnect -> tcp_send_active_reset may allocate memory, and
+	 * allowing reclaim to issue I/O could deadlock if we're being called
+	 * from block device teardown (e.g., del_gendisk -> elevator cleanup)
+	 * which holds locks that the I/O completion path needs.
+	 */
+	noio_flag = memalloc_noio_save();
+
+	/**
+	 * Release the socket synchronously. During reset in
+	 * nvme_reset_ctrl_work(), queue teardown is immediately followed by
+	 * re-allocation. fput() defers socket cleanup to delayed_fput_work
+	 * in workqueue context, which can race with new queue setup.
+	 */
+	__fput_sync(queue->sock->file);
 	queue->sock = NULL;
-	memalloc_noreclaim_restore(noreclaim_flag);
+	memalloc_noio_restore(noio_flag);

 	kfree(queue->pdu);
 	mutex_destroy(&queue->send_mutex);
@@ -1901,8 +1915,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 err_rcv_pdu:
 	kfree(queue->pdu);
 err_sock:
-	/* ->sock will be released by fput() */
-	fput(queue->sock->file);
+	/* Use sync variant - see nvme_tcp_free_queue() for explanation */
+	__fput_sync(queue->sock->file);
 	queue->sock = NULL;
 err_destroy_mutex:
 	mutex_destroy(&queue->send_mutex);
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (20 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent() Sasha Levin
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: robbieko, Johannes Thumshirn, David Sterba, Sasha Levin, clm,
	linux-btrfs, linux-kernel

From: robbieko <robbieko@synology.com>

[ Upstream commit 1871ae78ffa5ce7c0458e9ba5867958c1753e425 ]

When found_start > start and slot == 0, btrfs_previous_item() is called
with min_objectid=start to find the previous stripe extent. However, the
previous stripe extent we are looking for has objectid < start (it starts
before our deletion range), so passing start as min_objectid prevents
finding it.

Fix by passing 0 as min_objectid to allow finding any preceding stripe
extent regardless of its objectid.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Record 1.1: Subsystem `btrfs`; action verb `fix`; claimed intent is
correcting the `min_objectid` argument to `btrfs_previous_item()` in the
RAID stripe-tree deletion path.

Record 1.2: Tags found: `Reviewed-by: Johannes Thumshirn
<johannes.thumshirn@wdc.com>`, `Signed-off-by: robbieko
<robbieko@synology.com>`, `Reviewed-by: David Sterba
<dsterba@suse.com>`, `Signed-off-by: David Sterba <dsterba@suse.com>`.
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable@vger.kernel.org` tags were present.

Record 1.3: The commit describes a real lookup bug: when `found_start >
start` and `slot == 0`, the code tries to find the previous RAID stripe
extent, but passes `min_objectid=start`. `btrfs_previous_item()` stops
when it sees an item with `objectid < min_objectid`, so it cannot return
the previous extent that starts before `start`.

Record 1.4: This is not hidden cleanup. It is an explicit
logic/correctness fix for a failed previous-item lookup.

## Phase 2: Diff Analysis
Record 2.1: One file changed: `fs/btrfs/raid-stripe-tree.c`, 1 insertion
and 1 deletion. Modified function: `btrfs_delete_raid_extent()`. Scope:
single-file, one-line surgical fix.

Record 2.2: Before: `btrfs_previous_item(stripe_root, path, start,
BTRFS_RAID_STRIPE_KEY)` would reject any candidate with `objectid <
start`. After: `min_objectid=0` allows searching back to earlier stripe
extents.

Record 2.3: Bug category: logic/correctness bug in B-tree item lookup.
Mechanism verified in `btrfs_previous_item()`: it breaks when
`found_key.objectid < min_objectid` before returning a matching type, so
`min_objectid=start` prevents the intended previous extent from being
found.
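
The lookup behavior described in Record 2.3 can be sketched with a flat-array model. This is a simplification under stated assumptions: `toy_key` and `toy_previous_item()` are hypothetical names, the sorted array stands in for B-tree leaves, and leaf crossing plus the real function's other exit conditions are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat stand-in for a sorted run of (objectid, type) keys. */
struct toy_key {
	uint64_t objectid;
	int type;
};

/*
 * Walk backward from *slot. Stop as soon as a key's objectid drops below
 * min_objectid; otherwise return 0 at the first key of the wanted type.
 * Returns 1 when nothing was found.
 */
static int toy_previous_item(const struct toy_key *keys, size_t *slot,
			     uint64_t min_objectid, int type)
{
	while (*slot > 0) {
		(*slot)--;
		if (keys[*slot].objectid < min_objectid)
			break;
		if (keys[*slot].type == type)
			return 0;
	}
	return 1;
}
```

Given keys with objectids 100 and 200 and a deletion `start` of 150, searching back from the second key with `min_objectid=150` stops at objectid 100 without returning it. With `min_objectid=0` the same walk returns that key.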

Record 2.4: Fix quality is high: one argument change, no API changes, no
new feature. Regression risk is low, but not zero: the same series has
the immediately following patch `653361585d251` adding a proper overlap
check after this fallback, so stable backports should consider the
surrounding RAID stripe-tree deletion fixes too.

## Phase 3: Git History Investigation
Record 3.1: `git blame` shows the buggy call was introduced by
`76643119045ee` (`btrfs: fix deletion of a range spanning parts two RAID
stripe extents`), first contained around `v6.14-rc1`. The broader RAID
stripe tree code was introduced by `ca41504efda6`, first contained
around `v6.7-rc1`.

Record 3.2: No `Fixes:` tag is present, so there was no tag target to
follow.

Record 3.3: Recent file history shows this commit is patch 3 in a six-
patch RAID stripe-tree deletion bugfix series: `513f8a52eed88`,
`2aef5cb1dcf9b`, `1871ae78ffa5c`, `653361585d251`, `fe0cdfd7118d8`, and
`a8d58a7c02009`. This commit applies standalone to `v7.0.3`, but it is
part of a related correctness series.

Record 3.4: Author `robbieko` had several same-day Btrfs RAID stripe-
tree fixes in `fs/btrfs`. Committer/reviewer David Sterba is the Btrfs
maintainer.

Record 3.5: No hard prerequisite is needed for this one-line change to
apply. `git apply --check` against the current `stable/linux-7.0.y`
checkout succeeded. Semantic companion patches exist and should be
considered, especially the next ASSERT-to-error-handling fix.

## Phase 4: Mailing List And External Research
Record 4.1: `b4 dig -c 1871ae78ffa5c` found the original submission at
`https://patch.msgid.link/20260413065249.2320122-4-robbieko@synology.com`,
titled `[PATCH 3/6] btrfs: fix wrong min_objectid in
btrfs_previous_item() call`.

Record 4.2: `b4 dig -w` showed original recipients: `robbieko
<robbieko@synology.com>` and `linux-btrfs@vger.kernel.org`. The mbox
shows Johannes Thumshirn reviewed patch 3 with “Looks good” and
`Reviewed-by`.

Record 4.3: No external bug report, syzbot report, or user report was
linked.

Record 4.4: The cover letter says the series fixes six bugs in
`fs/btrfs/raid-stripe-tree.c`, all in stripe extent deletion and partial
deletion paths. Patch 3 is specifically this wrong `min_objectid` fix.

Record 4.5: `WebFetch` for lore and stable searches was blocked by
Anubis. Local `b4` mbox inspection found no stable-specific nomination
or objection.

## Phase 5: Code Semantic Analysis
Record 5.1: Modified function: `btrfs_delete_raid_extent()`.

Record 5.2: Verified callers: production caller is
`do_free_extent_accounting()` in `fs/btrfs/extent-tree.c`; tests also
call it from `fs/btrfs/tests/raid-stripe-tree-tests.c`.

Record 5.3: Key callees include `btrfs_search_slot()`,
`btrfs_previous_item()`, `btrfs_partially_delete_raid_extent()`, and
`btrfs_del_item()`. The affected path is the stripe extent deletion path
for Btrfs RAID stripe tree.

Record 5.4: Verified call chain: `__btrfs_free_extent()` calls
`do_free_extent_accounting()` when data extent refs drop to zero; that
calls `btrfs_delete_raid_extent()`. If `btrfs_delete_raid_extent()`
returns an error, `do_free_extent_accounting()` aborts the transaction.
I did not fully trace from VFS syscall entry, so unprivileged
triggerability is not relied on for the decision.

Record 5.5: Similar nearby bugfixes were found in the same series,
indicating this area had multiple deletion-path correctness bugs.

## Phase 6: Stable Tree Analysis
Record 6.1: The buggy `btrfs_previous_item(..., start, ...)` call exists
in release tags `v6.14`, `v6.14.11`, `v6.15`, `v6.15.11`, `v6.16`,
`v6.17`, `v6.18`, `v6.19`, `v7.0`, and current `v7.0.3`. It is absent
from `v6.13`, `v6.12`, `v6.6`, and `v6.1`.

Record 6.2: Backport difficulty is low for `v7.0.3`; `git apply --check`
succeeded. Older affected stable trees likely need only minor context
adjustment if their surrounding code differs.

Record 6.3: I did not find an alternate already-applied fix in the
checked release tags; `master` contains the corrected `min_objectid=0`
call.

## Phase 7: Subsystem And Maintainer Context
Record 7.1: Subsystem: Btrfs filesystem, specifically RAID stripe-tree
support. Criticality: important but feature-specific, not universal.

Record 7.2: `fs/btrfs` is actively maintained. Recent history shows
multiple April 2026 Btrfs fixes and the RAID stripe-tree deletion series
merged by David Sterba.

## Phase 8: Impact And Risk
Record 8.1: Affected users are Btrfs users with the `RAID_STRIPE_TREE`
incompat feature enabled and data block group profiles covered by
`BTRFS_RST_SUPP_BLOCK_GROUP_MASK` (`DUP`, `RAID0`, `RAID1` variants,
`RAID10`).

Record 8.2: Trigger condition is freeing/deleting a data extent range
that spans stripe extents where the next found stripe extent is at slot
0 and the needed previous stripe extent starts before `start`. Verified
from code flow; frequency in real workloads is not quantified.

Record 8.3: Failure mode: the old code can fail to find the needed
previous stripe extent, return `-ENOENT`, and cause
`do_free_extent_accounting()` to abort the transaction. Severity: high
for affected filesystems, because transaction abort is a serious
filesystem failure.

Record 8.4: Benefit is high for affected Btrfs RAID stripe-tree users.
Risk is low: one-line contained fix, reviewed by Btrfs developers, no
API change. Main concern is that this is part of a six-patch correctness
series, so stable maintainers should evaluate adjacent fixes as well.

## Phase 9: Final Synthesis
Record 9.1: Evidence for backporting: real Btrfs deletion-path bug;
verified wrong argument against `btrfs_previous_item()` semantics; bug
exists in `v6.14+` release lines; failure can abort Btrfs transactions;
fix is one line and reviewed by Johannes Thumshirn and David Sterba.
Evidence against: no reported user/syzbot case and feature-specific
impact; companion patches in same area should be considered. Unresolved:
exact real-world frequency and exact backport conflicts for every
affected stable branch were not fully checked.

Record 9.2 stable rules: obviously correct: yes, verified by code
semantics and reviews. Fixes a real bug: yes, incorrect lookup bound.
Important issue: yes for affected Btrfs filesystems, because it can
abort transactions. Small and contained: yes, one line in one function.
No new features/APIs: yes. Applies to stable: verified clean on current
`v7.0.3`; likely low difficulty for affected `v6.14+`.

Record 9.3: No exception category applies; this is a filesystem bug fix,
not a device ID, quirk, DT, build, or documentation change.

Record 9.4: Decision: backport. The patch is small, technically
justified, and fixes a serious correctness problem in a stable-visible
Btrfs feature. Prefer reviewing/backporting the surrounding RAID stripe-
tree deletion fixes too, but this commit itself is stable material.

## Verification
- Phase 1: Parsed commit `1871ae78ffa5c`; confirmed tags and absence of
  `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, and `Cc: stable`.
- Phase 2: Inspected diff; confirmed exactly 1-line argument change in
  `btrfs_delete_raid_extent()`.
- Phase 2: Read `btrfs_previous_item()`; confirmed it stops once
  `found_key.objectid < min_objectid`.
- Phase 3: Ran `git blame`; confirmed buggy call introduced by
  `76643119045ee`.
- Phase 3: Ran `git describe --contains`; confirmed `76643119045ee` is
  first contained around `v6.14-rc1`.
- Phase 3: Reviewed recent file history; confirmed this commit is part
  of a six-patch RAID stripe-tree deletion fix series.
- Phase 3: Ran `git apply --check`; confirmed the patch applies to
  current `v7.0.3` checkout.
- Phase 4: Ran `b4 dig -c`, `-a`, and `-w`; confirmed lore message ID,
  series context, and original recipients.
- Phase 4: Saved/read the `b4` mbox; confirmed cover letter says six
  bugs are fixed and patch 3 fixes this `min_objectid` issue.
- Phase 4: Read mbox reply; confirmed Johannes Thumshirn gave
  `Reviewed-by` for patch 3.
- Phase 4: `WebFetch` to lore/stable was blocked by Anubis; no web-
  search result was used as evidence.
- Phase 5: Searched callers; confirmed production caller path through
  `do_free_extent_accounting()` and tests.
- Phase 5: Read `do_free_extent_accounting()`; confirmed nonzero return
  from `btrfs_delete_raid_extent()` aborts the transaction.
- Phase 6: Checked release tags; confirmed buggy code exists in `v6.14+`
  through `v7.0.3`, absent in `v6.13`, `v6.12`, `v6.6`, and `v6.1`.
- Phase 7: Reviewed `fs/btrfs` recent history; confirmed active
  subsystem and maintainer merge context.
- Phase 8: Read `btrfs_need_stripe_tree_update()`; confirmed affected
  configurations are RAID stripe-tree data block groups with supported
  profiles.
- UNVERIFIED: Exact real-world frequency of the trigger.
- UNVERIFIED: Full VFS syscall-to-trigger trace and unprivileged
  triggerability.
- UNVERIFIED: Clean application to every affected stable branch older
  than current `v7.0.3`.

**YES**

 fs/btrfs/raid-stripe-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 2987cb7c686ea..d2b8995febec9 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -123,7 +123,7 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		 */
 		if (found_start > start) {
 			if (slot == 0) {
-				ret = btrfs_previous_item(stripe_root, path, start,
+				ret = btrfs_previous_item(stripe_root, path, 0,
 							  BTRFS_RAID_STRIPE_KEY);
 				if (ret) {
 					if (ret > 0)
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent()
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (21 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries Sasha Levin
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: robbieko, Johannes Thumshirn, David Sterba, Sasha Levin, clm,
	linux-btrfs, linux-kernel

From: robbieko <robbieko@synology.com>

[ Upstream commit a8d58a7c0200904ff24ca7f0d7c147017e25aa99 ]

btrfs_partially_delete_raid_extent() returns an error code (e.g.
-ENOMEM from kzalloc(), or errors from btrfs_del_item/btrfs_insert_item()),
but all three call sites in btrfs_delete_raid_extent() discard the
return value, silently losing errors and potentially leaving the stripe
tree in an inconsistent state.

Fix by capturing the return value into ret at all three call sites and
breaking out of the loop on error where appropriate.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Backport Analysis

### Phase 1: Commit Message Forensics
Record 1.1: Subsystem `btrfs`; action verb `check`; intent is to
propagate errors from `btrfs_partially_delete_raid_extent()` instead of
silently treating failed stripe-tree updates as success.

Record 1.2: Tags present: `Reviewed-by: Johannes Thumshirn
<johannes.thumshirn@wdc.com>`, `Signed-off-by: robbieko
<robbieko@synology.com>`, `Signed-off-by: David Sterba
<dsterba@suse.com>`. No `Fixes:`, no `Cc: stable`, no `Reported-by`.

Record 1.3: The body describes ignored `-ENOMEM`, `btrfs_del_item()`,
and `btrfs_insert_item()` errors, with the failure mode being silently
lost errors and possible RAID stripe tree inconsistency.

Record 1.4: This is a hidden bug fix despite the “check return value”
wording: it fixes error propagation and prevents filesystem metadata
update failures from being reported as success.

### Phase 2: Diff Analysis
Record 2.1: One file changed: `fs/btrfs/raid-stripe-tree.c`, 12
additions and 7 deletions. Modified function:
`btrfs_delete_raid_extent()`. Scope: single-file surgical error-handling
fix.

Record 2.2: Before, three calls to
`btrfs_partially_delete_raid_extent()` discarded the return value.
After, all three store it in `ret`; the middle loop case additionally
breaks immediately on error before advancing `start`/`length`.

Record 2.3: Bug category: error-path/correctness fix. The helper can
fail before modifying the tree (`kzalloc()`), during deletion, or during
insertion; ignoring those failures can leave the caller deleting more
extents or returning success after a failed partial update.
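
The difference between discarding and propagating the helper's result can be sketched as follows. This is a minimal userspace model, not the btrfs code: `toy_partial_delete()`, `toy_delete_ignoring()`, and `toy_delete_checked()` are hypothetical names, and the failure is injected at a chosen step.

```c
#include <errno.h>

/* Hypothetical helper: fails with -ENOMEM on the step number fail_at. */
static int toy_partial_delete(int step, int fail_at)
{
	return step == fail_at ? -ENOMEM : 0;
}

/* Old shape: the helper's result is dropped, so the loop reports success. */
static int toy_delete_ignoring(int steps, int fail_at, int *done)
{
	int ret = 0;

	for (*done = 0; *done < steps; (*done)++)
		toy_partial_delete(*done, fail_at);
	return ret;
}

/* New shape: capture the result and stop before advancing past a failure. */
static int toy_delete_checked(int steps, int fail_at, int *done)
{
	int ret = 0;

	for (*done = 0; *done < steps; (*done)++) {
		ret = toy_partial_delete(*done, fail_at);
		if (ret)
			break;
	}
	return ret;
}
```

With a failure injected at step 1 of 3, the ignoring variant runs all three steps and returns 0, while the checked variant stops with one step done and returns `-ENOMEM`.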

Record 2.4: Fix quality is high: it only propagates existing errors
through the existing `ret` path. Regression risk is low; the behavior
change is that real failures now abort/return instead of being hidden.

### Phase 3: Git History
Record 3.1: `git blame` shows `btrfs_partially_delete_raid_extent()` was
introduced by `6aea95ee318890` and changed to return errors by
`dc14ba10781bd`. The ignored call sites come from the partial
deletion/hole-punching work around `a678543e609df`, `50cae2ca69561`, and
`6aa0e7cc569eb`.

Record 3.2: No `Fixes:` tag is present, so there was no tagged
introducer to follow. History shows the relevant bug only exists once
`dc14ba10781bd` made the helper return `int`.

Record 3.3: Recent master history shows this is patch 6 of a six-patch
RAID stripe tree deletion bug-fix series. The candidate applies cleanly
to the current 7.0.y checkout without requiring the preceding five
patches for context.

Record 3.4: The author has multiple adjacent btrfs RAID stripe tree
fixes in master; David Sterba committed the patch, and Johannes
Thumshirn reviewed it.

Record 3.5: Dependency: target tree must have the `int`-returning helper
from `dc14ba10781bd`. Verified present in v6.14+ and absent in
v6.12/v6.13, so older trees without that helper form are not applicable.

### Phase 4: Mailing List And External Research
Record 4.1: `b4 dig -c a8d58a7c02009` found the original submission at
`https://patch.msgid.link/20260413065249.2320122-7-robbieko@synology.com`.
`b4 dig -a` found a single v1 series.

Record 4.2: `b4 dig -w` showed the patch was sent to
`linux-btrfs@vger.kernel.org`; direct recipients were limited, but the
subsystem list was included.

Record 4.3: No external bug report or syzbot link exists for this
specific patch.

Record 4.4: The cover letter states all six patches fix bugs in RAID
stripe tree deletion paths. Johannes requested tests for the series;
patch 6 itself received “Looks good” and `Reviewed-by`.

Record 4.5: Lore `WebFetch` was blocked by Anubis, but the yhbt mirror
and local `b4` mbox were readable. I found no stable-specific discussion
or explicit stable nomination.

### Phase 5: Code Semantic Analysis
Record 5.1: Modified function: `btrfs_delete_raid_extent()`.

Record 5.2: Callers found: `do_free_extent_accounting()` calls
`btrfs_delete_raid_extent()` for data extents; btrfs sanity tests also
call it.

Record 5.3: Key callees: `btrfs_partially_delete_raid_extent()` calls
`kzalloc()`, `btrfs_del_item()`, and `btrfs_insert_item()`.
`btrfs_delete_raid_extent()` also uses B-tree search/delete helpers.

Record 5.4: Reachability verified through delayed reference processing:
`run_one_delayed_ref()` -> `run_delayed_data_ref()` ->
`__btrfs_free_extent()` -> `do_free_extent_accounting()` ->
`btrfs_delete_raid_extent()`. This is reachable from normal Btrfs extent
freeing and transaction commit paths.

Record 5.5: Similar pattern search found only these three helper call
sites, and the patch fixes all of them.

### Phase 6: Stable Tree Analysis
Record 6.1: The buggy code exists in trees containing `dc14ba10781bd`:
verified v6.14, v6.15, v6.16, v6.17, v6.18, v6.19, and current 7.0.y
lineage. It is not applicable to v6.12/v6.13 as checked.

Record 6.2: Backport difficulty is low for the current 7.0.y checkout:
`git apply --check` for the candidate patch succeeded cleanly.

Record 6.3: Searches for this subject and “silently losing errors”
between v6.14 and v6.19 found no already-applied equivalent fix.

### Phase 7: Subsystem Context
Record 7.1: Subsystem is Btrfs filesystem metadata, specifically RAID
stripe tree support. Criticality: important to critical for users of
Btrfs filesystems with `RAID_STRIPE_TREE` enabled.

Record 7.2: The file has seen active recent maintenance and multiple bug
fixes, which indicates the area is being stabilized rather than churning
with feature-only changes.

### Phase 8: Impact And Risk
Record 8.1: Affected users are Btrfs users with the RAID stripe tree
incompat feature and supported RAID/DUP data profiles.

Record 8.2: Trigger is extent deletion/freeing where a stripe extent is
partially deleted and the helper hits allocation or B-tree operation
failure. User reachability depends on write access to such a mounted
filesystem; I verified ordinary filesystem extent-freeing paths, not a
standalone reproducer.

Record 8.3: Failure mode is hidden filesystem metadata update failure
and possible RAID stripe tree inconsistency. Severity: HIGH, with
data/metadata integrity risk.

Record 8.4: Benefit is high because errors propagate to transaction
abort handling instead of being hidden. Risk is low because the patch is
small, local, and only changes failure handling.
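
The difference between the pre-fix and post-fix call sites can be
sketched in plain C. This is a simplified userspace model, not the
kernel code: `model_partial_delete()`, `model_delete_ignoring()`, and
`model_delete_checking()` are hypothetical names, and the helper failure
is simulated with a flag rather than a real allocation or B-tree error.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a helper that can fail (e.g. on allocation). */
static int model_partial_delete(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Pre-fix shape: the helper's result is dropped, so ret stays 0. */
static int model_delete_ignoring(int should_fail)
{
	int ret = 0;

	(void)model_partial_delete(should_fail);
	return ret;
}

/* Post-fix shape: the result is stored and handed back to the caller. */
static int model_delete_checking(int should_fail)
{
	int ret;

	ret = model_partial_delete(should_fail);
	return ret;
}

/* Small self-check showing how the two shapes differ on failure. */
void model_error_demo(void)
{
	assert(model_delete_ignoring(1) == 0);        /* error hidden */
	assert(model_delete_checking(1) == -ENOMEM);  /* error visible */
}
```

In the model, only the checking variant gives the caller a nonzero
return to act on, which is the precondition for the transaction abort
described above.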

### Phase 9: Final Synthesis
Record 9.1: Evidence for backporting: real error propagation bug;
filesystem metadata consistency impact; small single-file patch; all
call sites fixed; reviewed by Johannes Thumshirn; committed by David
Sterba; applies cleanly to current 7.0.y. Evidence against: no explicit
stable tag, no external bug report, no direct test tag, and only
relevant to RAID stripe tree users.

Record 9.2: Stable checklist: obviously correct, yes; fixes a real bug,
yes; important issue, yes, due to filesystem metadata consistency; small
and contained, yes; no new features/APIs, yes; can apply to stable, yes
for current 7.0.y and likely v6.14+ trees with the helper.

Record 9.3: No special exception category applies; this is a normal bug
fix.

Record 9.4: The technical merit supports backporting to stable trees
that contain the `int`-returning `btrfs_partially_delete_raid_extent()`
implementation. Do not backport to trees where the helper is still
`void`.

## Verification
- Phase 1: Parsed `git show --format=fuller --stat --patch
  a8d58a7c02009`; confirmed tags and message.
- Phase 2: Inspected the diff; confirmed one file, `12+ / 7-`, and three
  ignored return values fixed.
- Phase 3: Ran `git blame` on helper and call-site ranges; identified
  relevant introducer/history commits.
- Phase 3: Checked related file history on current tree and master;
  found the six-patch deletion-path series.
- Phase 3/6: Ran `git apply --check` with the candidate patch against
  current checkout; it applies cleanly.
- Phase 4: Ran `b4 dig -c`, `-a`, `-w`, and saved/read the mbox;
  verified v1 submission, review, and no patch-specific objections.
- Phase 4: WebFetch to lore was blocked by Anubis; yhbt mirror fetch
  succeeded and matched the b4 thread.
- Phase 5: Used code search and file reads to trace
  `btrfs_delete_raid_extent()` through `do_free_extent_accounting()` and
  delayed refs.
- Phase 6: Used `git merge-base --is-ancestor` to verify v6.14+ contains
  the helper returning errors; v6.12/v6.13 do not.
- Phase 8: Verified `do_free_extent_accounting()` aborts the transaction
  on nonzero return from `btrfs_delete_raid_extent()`.

**YES**

 fs/btrfs/raid-stripe-tree.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 5909ad35a1b07..86ddc3ecb4060 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -213,8 +213,9 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 			/* The "left" item. */
 			path->slots[0]--;
 			btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
-			btrfs_partially_delete_raid_extent(trans, path, &key,
-							   diff_start, 0);
+			ret = btrfs_partially_delete_raid_extent(trans, path,
+								 &key,
+								 diff_start, 0);
 			break;
 		}

@@ -230,8 +231,11 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		if (found_start < start) {
 			u64 diff_start = start - found_start;

-			btrfs_partially_delete_raid_extent(trans, path, &key,
-							   diff_start, 0);
+			ret = btrfs_partially_delete_raid_extent(trans, path,
+								 &key,
+								 diff_start, 0);
+			if (ret)
+				break;

 			start += (key.offset - diff_start);
 			length -= (key.offset - diff_start);
@@ -254,9 +258,10 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		if (found_end > end) {
 			u64 diff_end = found_end - end;

-			btrfs_partially_delete_raid_extent(trans, path, &key,
-							   key.offset - length,
-							   length);
+			ret = btrfs_partially_delete_raid_extent(trans, path,
+								 &key,
+								 key.offset - length,
+								 length);
 			ASSERT(key.offset - diff_end == length);
 			break;
 		}
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (22 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent() Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent() Sasha Levin
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: robbieko, Johannes Thumshirn, David Sterba, Sasha Levin, clm,
	linux-btrfs, linux-kernel

From: robbieko <robbieko@synology.com>

[ Upstream commit 2aef5cb1dcf9b3e1be3895a6477dc065e618aab8 ]

In btrfs_delete_raid_extent(), the search key uses offset=0. When the
target stripe entry is the first item on a leaf, btrfs_search_slot()
may land on the previous leaf and decrementing the slot from nritems
still points to the wrong entry, causing the stripe extent to be
silently missed.

Fix this by searching with offset=(u64)-1 instead. Since no real stripe
entry has this offset, btrfs_search_slot() always returns 1 with the
slot pointing past the last matching objectid entry. Then unconditionally
decrement the slot with a proper slots[0]==0 early-exit check to handle
the case where no matching entry exists.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Decision Summary
Backport this to stable trees that contain the
`btrfs_delete_raid_extent()` `offset = 0` search logic, verified locally
in `v6.13` through `v7.0`. It is a small Btrfs filesystem metadata
correctness fix: a leaf-boundary search miss can leave a RAID stripe-
tree entry undeleted, and that tree is used later to derive read-side
physical stripe mappings.

The main caveat is scope: RAID stripe tree support is behind
`CONFIG_BTRFS_EXPERIMENTAL`, so the affected population is config and
filesystem-feature specific. Still, for those users, the benefit
outweighs the low patch risk.

## Phase Walkthrough
Phase 1: Commit Message Forensics

- Step 1.1 Record: subsystem `btrfs`; action verb `fix`; intent is to
  prevent RAID stripe-tree deletion from missing entries at B-tree leaf
  boundaries.
- Step 1.2 Record: tags are `Reviewed-by: Johannes Thumshirn
  <johannes.thumshirn@wdc.com>`, `Signed-off-by: robbieko
  <robbieko@synology.com>`, `Signed-off-by: David Sterba
  <dsterba@suse.com>`. No `Fixes:`, `Reported-by:`, `Tested-by:`, `Cc:
  stable`, or `Link:` tags.
- Step 1.3 Record: the described bug is that searching with `offset = 0`
  can land on the previous leaf when the target stripe item is the first
  item in the next leaf; the existing decrement then inspects the wrong
  entry and silently misses the stripe extent. No affected kernel
  versions are stated in the message.
- Step 1.4 Record: this is not hidden cleanup; it is an explicit
  correctness bug fix.

Phase 2: Diff Analysis

- Step 2.1 Record: one file changed, `fs/btrfs/raid-stripe-tree.c`,
  +15/-3 in `btrfs_delete_raid_extent()`. Scope is single-function
  surgical.
- Step 2.2 Record: before, the delete loop searched `(objectid=start,
  type=RAID_STRIPE, offset=0)` and only decremented the slot when the
  slot was at `nritems`. After, it searches with `offset=(u64)-1`,
  checks `slots[0] == 0`, then always decrements to the last candidate
  item.
- Step 2.3 Record: bug category is logic/correctness in B-tree search
  positioning. The fix uses Btrfs key ordering by `objectid`, then
  `type`, then `offset`, and `btrfs_bin_search()` semantics where a
  missing key returns the insertion slot.
- Step 2.4 Record: fix quality is good for valid filesystem state:
  small, local, no API changes. Regression risk is low. One review
  concern exists: David Sterba noted that an exact corrupt
  `offset=(u64)-1` key should ideally be handled as `-EUCLEAN`; that is
  robustness against corrupted/fuzzed images, not the normal valid-state
  bug being fixed.
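
The key-ordering and insertion-slot argument in Step 2.3 can be
illustrated with a small userspace model. This is a sketch under stated
assumptions, not the kernel implementation: `struct model_key`,
`model_bin_search()`, `model_find_prev()`, and `MODEL_TYPE` are
hypothetical names, the "tree" is a single sorted array, and leaf
boundaries are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TYPE 1	/* arbitrary type value, for illustration only */

/* Simplified model of a key ordered by objectid, then type, then offset. */
struct model_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

static const struct model_key demo_items[3] = {
	{ 100, MODEL_TYPE, 16 },
	{ 200, MODEL_TYPE, 8 },
	{ 300, MODEL_TYPE, 4 },
};

static int model_key_cmp(const struct model_key *a, const struct model_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Returns 0 on an exact match, else 1 with *slot set to the insertion slot. */
static int model_bin_search(const struct model_key *items, size_t nritems,
			    const struct model_key *key, size_t *slot)
{
	size_t low = 0, high = nritems;

	while (low < high) {
		size_t mid = low + (high - low) / 2;
		int cmp = model_key_cmp(&items[mid], key);

		if (cmp < 0) {
			low = mid + 1;
		} else if (cmp > 0) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}

/*
 * Search with offset = UINT64_MAX, then back up one slot. Returns 0 and
 * sets *slot to the last item sorting at or before the search key, or -1
 * if no such item exists (insertion slot 0). The caller still has to
 * check that the item's objectid is the one it wants.
 */
static int model_find_prev(const struct model_key *items, size_t nritems,
			   uint64_t objectid, size_t *slot)
{
	struct model_key key = { objectid, MODEL_TYPE, UINT64_MAX };

	model_bin_search(items, nritems, &key, slot);
	if (*slot == 0)
		return -1;
	(*slot)--;
	return 0;
}

/* Small self-check on the demo array. */
void model_search_demo(void)
{
	size_t slot;

	assert(model_find_prev(demo_items, 3, 200, &slot) == 0 && slot == 1);
	assert(model_find_prev(demo_items, 3, 50, &slot) == -1);
}
```

Because no item in the model carries `offset == UINT64_MAX`, the search
never matches exactly, and the slot before the insertion point is the
last candidate for the requested objectid.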

Phase 3: Git History Investigation

- Step 3.1 Record: blame shows the delete function was introduced by
  `ca41504efda646` in `v6.7`, but the specific `key.offset = 0` logic
  and `nritems` decrement pattern were introduced by `6aea95ee318890`
  (`btrfs: implement partial deletion of RAID stripe extents`), first
  contained in `v6.13`.
- Step 3.2 Record: no `Fixes:` tag, so no tagged original commit to
  follow. Blame nevertheless identifies `6aea95ee318890` as the relevant
  introducer.
- Step 3.3 Record: related recent commits in master are a six-patch
  deletion-path fix series: copy missing devid, this leaf-boundary
  search fix, wrong `btrfs_previous_item()` min objectid, ASSERT-to-
  error handling, `-EAGAIN`/stale leaf handling, and return-value
  checking.
- Step 3.4 Record: author `robbieko` authored all six related RAID
  stripe-tree deletion fixes in that series. I did not verify whether
  the author is a subsystem maintainer; David Sterba committed the
  series and Johannes reviewed this patch.
- Step 3.5 Record: candidate applies standalone to the current `7.0.3`
  worktree via `git apply --check`; companion patches are related fixes
  but not prerequisites for this hunk.

Phase 4: Mailing List And External Research

- Step 4.1 Record: `b4 dig -c 2aef5cb1dcf9b` found the original
  submission at `https://patch.msgid.link/20260413065249.2320122-3-
  robbieko@synology.com`. `b4 dig -a` found only v1 of the six-patch
  series.
- Step 4.2 Record: `b4 dig -w` showed original recipients were
  `robbieko` and `linux-btrfs@vger.kernel.org`. The thread later
  includes Johannes Thumshirn and David Sterba review/maintainer
  feedback.
- Step 4.3 Record: no separate bug report or syzbot report was linked.
- Step 4.4 Record: series context confirms this is patch 2/6 in “fix
  multiple bugs in raid-stripe-tree deletion path.” Johannes asked for
  tests for the conditions; Johannes also gave `Reviewed-by` on this
  patch. David later said the series was added to `for-next`.
- Step 4.5 Record: direct lore stable search was blocked by Anubis; web
  search did not find stable-specific discussion. This remains
  unverified.

Phase 5: Code Semantic Analysis

- Step 5.1 Record: modified function is `btrfs_delete_raid_extent()`.
- Step 5.2 Record: callers are `do_free_extent_accounting()` in
  `fs/btrfs/extent-tree.c` and RAID stripe-tree selftests.
  `do_free_extent_accounting()` is reached when a data extent’s refs
  drop to zero in `__btrfs_free_extent()`.
- Step 5.3 Record: key callees include `btrfs_search_slot()`,
  `btrfs_item_key_to_cpu()`, `btrfs_del_item()`,
  `btrfs_previous_item()`, and partial deletion helpers.
- Step 5.4 Record: user reachability is through Btrfs extent
  deletion/accounting paths; `file.c` calls `btrfs_free_extent()` while
  dropping file extents. Exact VFS entrypoints were not fully traced,
  but the path is filesystem-operation reachable on RST-enabled
  filesystems.
- Step 5.5 Record: similar verified patterns include
  `btrfs_search_prev_slot()` checking `slot == 0` before decrementing.
  `zoned.c` also demonstrates the Btrfs pattern of treating impossible
  exact matches as `-EUCLEAN`, matching David’s review concern.

Phase 6: Stable Tree Analysis

- Step 6.1 Record: `v6.6` lacks this file/feature; `v6.12` has
  `key.offset = length`; `v6.13` through `v7.0` have `key.offset = 0`
  plus the vulnerable `nritems` decrement. Candidate is in `v7.1-rc2`,
  not in `v7.0`.
- Step 6.2 Record: the backport is clean for current `7.0.3` per
  `git apply --check`; older affected trees appear to have the same
  local search pattern, but exact apply was not checked on separate
  worktrees.
- Step 6.3 Record: no alternate stable fix for the same subject was
  found locally; stable-list search was blocked, so external stable
  history is partially unverified.

Phase 7: Subsystem Context

- Step 7.1 Record: subsystem is Btrfs filesystem, specifically RAID
  stripe tree. Criticality is important for users of that filesystem
  feature, but not universal.
- Step 7.2 Record: file history shows active development and several
  recent bug fixes in the same deletion path, indicating this code is
  relatively new and still being hardened.

Phase 8: Impact And Risk

- Step 8.1 Record: affected population is config-specific and feature-
  specific: Btrfs filesystems using `RAID_STRIPE_TREE`, which is
  supported only when `CONFIG_BTRFS_EXPERIMENTAL` is enabled.
- Step 8.2 Record: trigger is deleting/freeing data extents where the
  matching stripe item is first in a leaf. Commonness of the leaf-
  boundary condition is data-layout dependent. User triggering is
  partially verified via file extent deletion paths, subject to
  filesystem permissions and the RST feature being enabled.
- Step 8.3 Record: failure mode is stale/missed RAID stripe-tree
  deletion. Severity is high for affected users because
  `btrfs_get_raid_extent_offset()` is used for read mapping and sets
  `stripe->physical` from stripe-tree contents.
- Step 8.4 Record: benefit is medium-high for affected RST users; risk
  is low because the change is +15/-3, one function, no ABI/API change,
  and accepted by Btrfs maintainers.

Phase 9: Final Synthesis

- Step 9.1 Record: evidence for backporting: real metadata correctness
  bug, verified affected stable code in `v6.13`-`v7.0`, small fix,
  reviewed by Johannes, committed by David, clean apply to current
  stable worktree. Evidence against: experimental/config-specific
  feature, no `Reported-by`/`Tested-by`, no explicit stable nomination,
  companion fixes in same area, and one unaddressed corrupt-image
  robustness comment.
- Step 9.2 Record: stable rules: obviously correct for valid state, yes;
  fixes a real bug, yes; important issue, yes, for RST metadata/read
  mapping correctness; small and contained, yes; no new feature/API,
  yes; applies to current stable tree, yes.
- Step 9.3 Record: no exception category like device ID, quirk, DT,
  build, or docs applies.
- Step 9.4 Record: decision is YES, limited to stable trees containing
  the `offset = 0` deletion search logic.

## Verification
- Phase 1: Parsed `git show 2aef5cb1dcf9b` and confirmed tags, subject,
  message, and +15/-3 diff.
- Phase 2: Read `fs/btrfs/raid-stripe-tree.c` and confirmed the exact
  modified code in `btrfs_delete_raid_extent()`.
- Phase 3: Ran blame around the changed lines; confirmed
  `6aea95ee318890` introduced `offset = 0`, and tag containment starts
  at `v6.13`.
- Phase 3: Inspected `ca41504efda646` and confirmed the function existed
  earlier with a different `offset = length` search.
- Phase 4: `b4 dig -c 2aef5cb1dcf9b` found the patch thread; `-a` found
  only v1; `-w` showed original recipients.
- Phase 4: Saved and searched the mbox; confirmed Johannes’ `Reviewed-
  by`, test request, David’s `EUCLEAN` robustness comment, and David’s
  “Added to for-next”.
- Phase 5: Used code search and reads to trace
  `btrfs_delete_raid_extent()` from `do_free_extent_accounting()` and
  `__btrfs_free_extent()`, with file deletion paths calling
  `btrfs_free_extent()`.
- Phase 5: Verified read-side RST use: `set_io_stripe()` calls
  `btrfs_get_raid_extent_offset()` for reads, and that function sets
  `stripe->physical` from the stripe-tree item.
- Phase 6: Checked historical tags: `v6.12` uses `offset = length`;
  `v6.13`-`v7.0` use `offset = 0` and the vulnerable slot handling.
- Phase 6: Ran `git apply --check` for the upstream patch against
  current `7.0.3`; it succeeded.
- UNVERIFIED: Direct lore stable search was blocked by Anubis, so
  stable-list discussion could not be confirmed.
- UNVERIFIED: Exact apply status on every older affected stable tree was
  not checked in separate worktrees.

**YES**

 fs/btrfs/raid-stripe-tree.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index d2b8995febec9..dd924048c6659 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -95,14 +95,26 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	while (1) {
 		key.objectid = start;
 		key.type = BTRFS_RAID_STRIPE_KEY;
-		key.offset = 0;
+		key.offset = (u64)-1;
 
 		ret = btrfs_search_slot(trans, stripe_root, &key, path, -1, 1);
 		if (ret < 0)
 			break;
 
-		if (path->slots[0] == btrfs_header_nritems(path->nodes[0]))
-			path->slots[0]--;
+		/*
+		 * Search with offset=(u64)-1 ensures we land on the correct
+		 * leaf even when the target entry is the first item on a leaf.
+		 * Since no real entry has offset=(u64)-1, ret is always 1 and
+		 * slot points past the last entry with objectid==start (or
+		 * past the end of the leaf if that entry is the last item).
+		 * Back up one slot to find the actual entry.
+		 */
+		if (path->slots[0] == 0) {
+			/* No entry with objectid <= start exists. */
+			ret = 0;
+			break;
+		}
+		path->slots[0]--;
 
 		leaf = path->nodes[0];
 		slot = path->slots[0];
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent()
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (23 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: robbieko, Johannes Thumshirn, David Sterba, Sasha Levin, clm,
	linux-btrfs, linux-kernel

From: robbieko <robbieko@synology.com>

[ Upstream commit 513f8a52eed880ea525dbb139b2127bd9bb793f1 ]

When btrfs_partially_delete_raid_extent() rebuilds a truncated/shifted
stripe extent into newitem, the loop copies the physical address for
each stride but forgets to copy the devid. The resulting item written
back to the stripe tree has zeroed-out devids, corrupting the stripe
mapping.

Fix this by reading the devid with btrfs_raid_stride_devid() and
writing it into the new item with btrfs_set_stack_raid_stride_devid()
before copying the physical address.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: robbieko <robbieko@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics

Step 1.1 Record: subsystem `btrfs`; action verb `copy`; claimed intent
is to preserve `devid` when `btrfs_partially_delete_raid_extent()`
rebuilds a truncated or shifted RAID stripe extent.

Step 1.2 Record: tags present are `Reviewed-by: Johannes Thumshirn
<johannes.thumshirn@wdc.com>`, `Signed-off-by: robbieko
<robbieko@synology.com>`, `Reviewed-by: David Sterba
<dsterba@suse.com>`, `Signed-off-by: David Sterba <dsterba@suse.com>`.
No `Fixes:`, `Reported-by:`, `Tested-by:`, `Link:`, or `Cc:
stable@vger.kernel.org` tag was present. David Sterba is listed as a
Btrfs maintainer in `MAINTAINERS`.

Step 1.3 Record: the body says the function rebuilds a stripe extent
into `newitem`, copies each stride’s physical address, but forgets to
copy each stride’s device id. Because `newitem` is allocated with
`kzalloc()`, omitted devid fields become zero. The described failure
mode is corrupted stripe mapping. No explicit affected kernel version is
stated.

Step 1.4 Record: this is not hidden behind cleanup wording; it is a
direct correctness fix for persistent Btrfs RAID stripe tree metadata.

## Phase 2: Diff Analysis

Step 2.1 Record: one file changed, `fs/btrfs/raid-stripe-tree.c`, with 3
insertions in `btrfs_partially_delete_raid_extent()`. Scope is a single-
file surgical fix.

Step 2.2 Record: before, the copy loop populated only
`newitem->strides[i].physical`; after, it reads `devid` from the old
stride using `btrfs_raid_stride_devid()` and stores it in the stack item
with `btrfs_set_stack_raid_stride_devid()` before copying the physical
address. The affected path is partial deletion/truncation/shift of RAID
stripe extents.

Step 2.3 Record: bug category is filesystem metadata correctness/data
corruption. Mechanism: `kzalloc()` zeroes the rebuilt item, and the old
code only writes the physical address, leaving device ids as zero. Later
lookup code in `btrfs_get_raid_extent_offset()` searches for a stride
whose stored devid matches `stripe->dev->devid`; zeroed devids can fail
that match and return `-ENODATA`.
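
The mechanism in Step 2.3 can be modeled in userspace C. This is a
minimal sketch, assuming nonzero device ids in the original item:
`struct model_stride`, `model_rebuild()`, and `model_lookup()` are
hypothetical names, `calloc()` stands in for `kzalloc()`, and the
`copy_devid` flag switches between the pre-fix and post-fix loop body.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of one stride: a device id plus a physical address. */
struct model_stride {
	uint64_t devid;
	uint64_t physical;
};

/*
 * Rebuild the strides into a zeroed allocation, shifting each physical
 * address by frontpad. With copy_devid == 0 (pre-fix shape) the devid
 * fields keep the zero value left by calloc().
 */
static struct model_stride *model_rebuild(const struct model_stride *old,
					  size_t n, uint64_t frontpad,
					  int copy_devid)
{
	struct model_stride *rebuilt = calloc(n, sizeof(*rebuilt));

	if (!rebuilt)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		if (copy_devid)
			rebuilt[i].devid = old[i].devid;
		rebuilt[i].physical = old[i].physical + frontpad;
	}
	return rebuilt;
}

/* Find the stride for devid; returns -ENODATA when none matches. */
static int model_lookup(const struct model_stride *strides, size_t n,
			uint64_t devid, uint64_t *physical)
{
	for (size_t i = 0; i < n; i++) {
		if (strides[i].devid != devid)
			continue;
		*physical = strides[i].physical;
		return 0;
	}
	return -ENODATA;
}

/* Small self-check: the pre-fix rebuild loses the devid, the fix keeps it. */
void model_devid_demo(void)
{
	const struct model_stride old[2] = { { 1, 4096 }, { 2, 8192 } };
	struct model_stride *bad = model_rebuild(old, 2, 512, 0);
	struct model_stride *good = model_rebuild(old, 2, 512, 1);
	uint64_t phys = 0;

	assert(bad && good);
	assert(model_lookup(bad, 2, 2, &phys) == -ENODATA);
	assert(model_lookup(good, 2, 2, &phys) == 0 && phys == 8704);
	free(bad);
	free(good);
}
```

In the model, the rebuilt item without the devid copy still holds the
shifted physical addresses, but no lookup by device id can reach them.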

Step 2.4 Record: the fix is obviously local and correct: it copies the
missing field from the old item to the rebuilt item. Regression risk is
very low: it adds no new behavior, no locking, no API changes, and
preserves existing physical-address handling.

## Phase 3: Git History Investigation

Step 3.1 Record: `git blame` on the pre-fix function showed the rebuild-
and-reinsert logic came from `dc14ba10781bd` and the partial-delete
function originated from `6aea95ee31889`. The specific zeroed-devid bug
was introduced by `dc14ba10781bd`, first contained in `v6.14-rc1`.

Step 3.2 Record: no `Fixes:` tag exists. I manually inspected
`dc14ba10781bd`; it replaced in-place key modification with allocation
of a new item and copied only physical addresses, omitting devids. That
commit fixed a prior kernel BUG but introduced this missing-field copy.

Step 3.3 Record: recent file history shows this patch is part of a
cluster of RAID stripe tree deletion fixes: search boundary fixes,
`btrfs_previous_item()` min-objectid fix, ASSERT-to-error handling,
stale leaf pointer handling, and return-value checking. This commit is
standalone; it does not depend on later patches, though the related
patches may be independently worth stable review.

Step 3.4 Record: author `robbieko` has multiple related Btrfs fixes in
the same file around the same time. Committer/reviewer David Sterba is a
Btrfs maintainer per `MAINTAINERS`.

Step 3.5 Record: no prerequisite commit beyond the buggy rebuild-and-
reinsert implementation was identified for affected trees. The patch
applies cleanly to the current `v7.0.3` stable worktree with `git apply
--check`.

## Phase 4: Mailing List And External Research

Step 4.1 Record: `b4 dig -c 513f8a52eed88` found the original submission
at `https://patch.msgid.link/20260413065249.2320122-2-
robbieko@synology.com`. `b4 dig -a` showed only v1. WebFetch of lore was
blocked by Anubis, so I used `b4 dig -m` and read the mbox locally. The
thread includes David Sterba saying the series was added to for-next. No
NAK was found in the fetched thread.

Step 4.2 Record: `b4 dig -w` showed original recipients were `robbieko`
and `linux-btrfs@vger.kernel.org`. The mbox showed Johannes Thumshirn
replied and asked for tests for the series conditions; the committed
patch later carries his `Reviewed-by`.

Step 4.3 Record: no external bug report, syzbot report, or bugzilla link
was present. The cover letter and patch text both describe corrupted
stripe mappings.

Step 4.4 Record: this is patch 1/6 in a series titled `btrfs: fix
multiple bugs in raid-stripe-tree deletion path`. The other five patches
address separate bugs in the same deletion path. Patch 1 is not a
preparatory change; it fixes a complete missing-field copy on its own.

Step 4.5 Record: WebFetch search of lore stable was blocked by Anubis.
Local stable branch inspection found no equivalent devid-copy fix in
`stable/linux-7.0.y`.

## Phase 5: Code Semantic Analysis

Step 5.1 Record: modified function is
`btrfs_partially_delete_raid_extent()`.

Step 5.2 Record: callers are the three partial-deletion cases in
`btrfs_delete_raid_extent()`: hole punch splitting a stripe extent,
front truncation, and tail truncation. `btrfs_delete_raid_extent()` is
called from Btrfs extent free accounting for data extents.

Step 5.3 Record: relevant callees are `kzalloc()`,
`btrfs_raid_stride_devid()`, `btrfs_set_stack_raid_stride_devid()`,
`btrfs_raid_stride_physical()`,
`btrfs_set_stack_raid_stride_physical()`, `btrfs_del_item()`, and
`btrfs_insert_item()`.

Step 5.4 Record: reachability is through Btrfs data extent
deletion/freeing and transaction paths. Lookup impact is through read
mapping: `btrfs_map_block()` calls `set_io_stripe()`, which calls
`btrfs_get_raid_extent_offset()` for reads when RAID stripe tree updates
are needed. If the zeroed devid does not match the real device id,
lookup returns `-ENODATA`.

Step 5.5 Record: similar correct pattern exists in
`btrfs_insert_one_raid_extent()`, which stores both devid and physical
address for each stride. The broken partial-delete rebuild copied only
physical address.

## Phase 6: Cross-Referencing And Stable Tree Analysis

Step 6.1 Record: latest checked tags show `v6.12.85` has the file but
not the affected partial-delete rebuild code; `v6.15.11`, `v6.16.12`,
`v6.17.13`, `v6.18.26`, `v6.19.14`, and `v7.0.3` contain the buggy loop
with physical copy but no devid copy. `v6.6` has no `fs/btrfs/raid-
stripe-tree.c`.

Step 6.2 Record: expected backport difficulty is low for affected trees.
The patch applies cleanly to current `v7.0.3`; older affected trees
before the `AUTO_KFREE` cleanup still have the same loop and should need
at most context adjustment.

Step 6.3 Record: no equivalent devid-copy fix was found in
`stable/linux-7.0.y`; current stable code still lacks
`btrfs_set_stack_raid_stride_devid(&newitem->strides[i], devid)`.

## Phase 7: Subsystem And Maintainer Context

Step 7.1 Record: subsystem is Btrfs filesystem code under `fs/btrfs`.
Criticality is IMPORTANT: it is filesystem metadata and I/O mapping, but
gated to filesystems using the RAID stripe tree incompat feature.

Step 7.2 Record: subsystem activity is high; file history shows many
recent RAID stripe tree deletion fixes. This does not reduce stable
suitability because the affected buggy code is already present in
several stable tags.

## Phase 8: Impact And Risk Assessment

Step 8.1 Record: affected users are Btrfs users with `RAID_STRIPE_TREE`
enabled and data profiles supported by
`btrfs_need_stripe_tree_update()`.

Step 8.2 Record: trigger is partial deletion of RAID stripe extents,
such as data extent freeing/truncation/hole-related deletion paths.
User-triggerability depends on having an affected Btrfs RAID stripe tree
filesystem; the exact unprivileged trigger was not separately reproduced
here.

Step 8.3 Record: failure mode is corrupted RAID stripe mapping metadata
with zeroed devids. Verified read-side code relies on devid matching and
returns `-ENODATA` when no matching devid is found. Severity is HIGH to
CRITICAL for affected filesystems because it can persistently break
logical-to-physical mapping.

Step 8.4 Record: benefit is high for affected Btrfs RST users because it
prevents persistent mapping corruption. Risk is very low because the
change is 3 lines, local, and copies an existing field without changing
control flow.

## Phase 9: Final Synthesis

Step 9.1 Record: evidence for backporting: real filesystem metadata
corruption, tiny surgical fix, reviewed by Btrfs developers including
maintainer David Sterba, affects current stable trees with the buggy
rebuild loop, and applies cleanly to `v7.0.3`. Evidence against: affects
only the RAID stripe tree feature, and no `Tested-by` or external bug
report was present. Unresolved: I did not runtime-test a reproducer, and
lore WebFetch was blocked, though b4 retrieved the thread mbox.

Step 9.2 Stable rules checklist:
1. Obviously correct and tested: obviously correct by inspection; no
   explicit `Tested-by`.
2. Fixes a real bug: yes, missing copied devid causes zeroed devids in
   rebuilt stripe items.
3. Important issue: yes, filesystem mapping corruption.
4. Small and contained: yes, 3 inserted lines in one function.
5. No new features or APIs: yes.
6. Can apply to stable trees: yes for `v7.0.3`; likely clean or needing
   only minor context adjustment for affected `v6.15+` stable trees.

Step 9.3 Record: no special exception category is needed; this is a
normal filesystem corruption fix.

Step 9.4 Decision: backport to stable trees that contain the rebuild-
and-reinsert version of `btrfs_partially_delete_raid_extent()` without
the devid copy, notably checked affected tags `v6.15.11` through
`v7.0.3`. Do not apply to trees where the affected code is absent, such
as checked `v6.12.85` and `v6.6`.

## Verification

- [Phase 1] Parsed commit `513f8a52eed88` with `git show
  --pretty=fuller`; confirmed subject, body, tags, and 3-line diff.
- [Phase 1] Checked `MAINTAINERS` with `rg`; confirmed David Sterba is a
  Btrfs maintainer.
- [Phase 2] Read `fs/btrfs/raid-stripe-tree.c`; confirmed `kzalloc()`
  allocation, old physical-only copy, and read-side devid matching.
- [Phase 3] Ran `git blame 513f8a52eed88^` on the changed function;
  identified relevant history from `6aea95ee31889`, `dc14ba10781bd`, and
  cleanup commits.
- [Phase 3] Ran `git show dc14ba10781bd`; confirmed the buggy new-item
  rebuild was introduced there and first appears in `v6.14-rc1`.
- [Phase 3] Ran recent file and author logs; confirmed related same-file
  fixes and that this patch is standalone.
- [Phase 4] Ran `b4 dig -c`, `-a`, `-w`, and `-m`; found the lore
  thread, single v1, original recipients, series context, and maintainer
  acceptance into for-next.
- [Phase 4] WebFetch lore and stable search were blocked by Anubis; b4
  mbox provided the usable mailing-list content.
- [Phase 5] Used `rg` and file reads to trace callers:
  `btrfs_delete_raid_extent()` from extent free accounting, and read
  mapping through `btrfs_map_block()`/`set_io_stripe()`/`btrfs_get_raid_
  extent_offset()`.
- [Phase 6] Checked stable tags with `git show <tag>:fs/btrfs/raid-
  stripe-tree.c`; confirmed affected code in `v6.15.11`, `v6.16.12`,
  `v6.17.13`, `v6.18.26`, `v6.19.14`, and `v7.0.3`, absent affected code
  in `v6.12.85`, and no file in `v6.6`.
- [Phase 6] Ran `git apply --check` for the upstream patch against
  current `v7.0.3`; it applies cleanly.
- [Phase 8] Verified failure mechanism in code: zeroed devids fail the
  `devid != stripe->dev->devid` match and lead to `-ENODATA`.

**YES**

 fs/btrfs/raid-stripe-tree.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index a2e9ac2d97988..5909ad35a1b07 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -45,8 +45,11 @@ static int btrfs_partially_delete_raid_extent(struct btrfs_trans_handle *trans,

 	for (int i = 0; i < btrfs_num_raid_stripes(item_size); i++) {
 		struct btrfs_raid_stride *stride = &extent->strides[i];
+		u64 devid;
 		u64 phys;

+		devid = btrfs_raid_stride_devid(leaf, stride);
+		btrfs_set_stack_raid_stride_devid(&newitem->strides[i], devid);
 		phys = btrfs_raid_stride_physical(leaf, stride) + frontpad;
 		btrfs_set_stack_raid_stride_physical(&newitem->strides[i], phys);
 	}
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (24 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent() Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu Sasha Levin
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: John Garry, Christoph Hellwig, Nilay Shroff, Chaitanya Kulkarni,
	Keith Busch, Sasha Levin, sagi, linux-nvme, linux-kernel

From: John Garry <john.g.garry@oracle.com>

[ Upstream commit 3f150f0f010f234f34a67897344f18e68fe803f7 ]

The delayed disk removal work is canceled when a NS (re)appears. However,
we do not put the module reference grabbed in nvme_mpath_remove_disk(), so
fix that.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Decision Summary
This should be backported to stable trees that contain NVMe multipath
delayed removal, especially `v6.16+` trees checked locally. It fixes a
real module reference leak: `nvme_mpath_remove_disk()` takes a
`THIS_MODULE` reference before scheduling delayed removal work, but
`nvme_init_ns_head()` can cancel that work when a namespace reappears.
If canceled, the work callback never runs and therefore never executes
its existing `module_put(THIS_MODULE)`. The patch balances that
reference only when `cancel_delayed_work()` actually canceled pending
work.

## Phase 1: Commit Message Forensics
Step 1.1 Record: subsystem `nvme-multipath`; action verb `put`/implicit
`fix`; intent is to release a module reference when delayed namespace
removal work is canceled.

Step 1.2 Record: tags present: `Reviewed-by: Christoph Hellwig
<hch@lst.de>`, `Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>`,
`Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>`, `Signed-off-by: John
Garry <john.g.garry@oracle.com>`, `Signed-off-by: Keith Busch
<kbusch@kernel.org>`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Link:`, or `Cc: stable` tag in the commit.

Step 1.3 Record: bug is a module reference leak. The commit body states
the delayed disk removal work is canceled when a namespace appears or
reappears, but the module reference acquired in
`nvme_mpath_remove_disk()` is not released. Symptom from related
blktests discussion: module refcount does not return to the original
value after reconnect. Root cause: cancellation bypasses
`nvme_remove_head_work()`, where the existing `module_put()` lives.

Step 1.4 Record: yes, this is a hidden resource/reference leak fix. The
subject does not say “fix leak”, but the body and code show a missing
`module_put()` on a cancellation path.

## Phase 2: Diff Analysis
Step 2.1 Record: one file changed, `drivers/nvme/host/core.c`, with 2
insertions and 1 deletion. Modified function: `nvme_init_ns_head()`.
Scope: single-file surgical fix.

Step 2.2 Record: before, `nvme_init_ns_head()` unconditionally called
`cancel_delayed_work(&head->remove_work)` after adding the namespace
head. After, it checks the return value and calls
`module_put(THIS_MODULE)` only if pending delayed work was canceled. The
affected path is namespace initialization/reappearance after delayed
multipath removal was scheduled.

Step 2.3 Record: bug category is reference counting/resource leak.
`nvme_mpath_remove_disk()` acquires a module reference with
`try_module_get(THIS_MODULE)` before scheduling `head->remove_work`;
`nvme_remove_head_work()` releases it with `module_put(THIS_MODULE)`
when the work runs. If `nvme_init_ns_head()` cancels the pending work,
the callback does not run, so the added `module_put()` balances the
acquired reference. Workqueue docs in `kernel/workqueue.c` verify
`cancel_delayed_work()` returns true if pending work was canceled and
false otherwise.

Step 2.4 Record: fix quality is high. It is minimal and correctly
conditional. Regression risk is very low: an extra `module_put()` is
only performed when the work was pending and canceled, matching the
exact reference lifetime. If the work was not pending or was already
running, no extra put is done.
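
The reference balance described in Steps 2.2 through 2.4 can be modeled
in userspace. This is a sketch under stated assumptions: every `model_*`
name is hypothetical, the counter stands in for the module reference
count, and a boolean stands in for pending delayed work. It does not
reproduce the kernel workqueue or module APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the multipath head; not the kernel type. */
struct model_head {
	bool remove_work_pending;	/* models queued delayed work */
};

static int model_module_refs;		/* models the module refcount */

/* Models taking a module reference and scheduling delayed removal. */
static void model_remove_disk(struct model_head *head)
{
	model_module_refs++;
	head->remove_work_pending = true;
}

/* Models the work callback running; it drops the reference itself. */
static void model_remove_work_fires(struct model_head *head)
{
	if (!head->remove_work_pending)
		return;
	head->remove_work_pending = false;
	assert(model_module_refs > 0);
	model_module_refs--;
}

/* Models cancellation: true only if pending work was canceled. */
static bool model_cancel_delayed_work(struct model_head *head)
{
	bool was_pending = head->remove_work_pending;

	head->remove_work_pending = false;
	return was_pending;
}

/* Pre-patch shape: the cancel result is ignored, so the ref leaks. */
static void model_init_ns_head_old(struct model_head *head)
{
	model_cancel_delayed_work(head);
}

/* Patched shape: drop the ref only when pending work was canceled. */
static void model_init_ns_head_new(struct model_head *head)
{
	if (model_cancel_delayed_work(head)) {
		assert(model_module_refs > 0);
		model_module_refs--;
	}
}
```

In this model the old shape leaves the counter one too high after a
remove/reappear cycle, while the new shape returns it to its starting
value and does nothing extra when no work was pending.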

## Phase 3: Git History Investigation
Step 3.1 Record: `git blame` shows the cancellation line in
`nvme_init_ns_head()` was introduced by `dd2c18548964` (`nvme: reset
delayed remove_work after reconnect`), first contained around
`v6.16-rc4`. The delayed removal work and callback `module_put()` came
from `62188639ec16` (`nvme-multipath: introduce delayed removal of the
multipath head node`), first contained around `v6.16-rc1`.

Step 3.2 Record: no `Fixes:` tag is present, so there was no tagged
commit to follow. Manual history points to the interaction between
`62188639ec16` and `dd2c18548964`.

Step 3.3 Record: recent related commits include `62188639ec16`
introducing delayed removal, `dd2c18548964` adding cancellation on
reconnect, and `0f5197ea9a73` fixing a different delayed-removal module
reference failure path. This commit is standalone; it only changes the
cancellation path in `core.c`.

Step 3.4 Record: John Garry has recent NVMe-related commits and reviewed
the related `0f5197ea9a73` fix. The patch was committed by Keith Busch
and reviewed by Christoph Hellwig, both key NVMe maintainers.

Step 3.5 Record: no prerequisite commit beyond the delayed-removal
feature and reconnect cancellation code. The current `core.c` already
includes `<linux/module.h>`, so the new `module_put(THIS_MODULE)`
requires no include change.

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 3f150f0f010f2` found the original patch at
`https://patch.msgid.link/20260415155358.1517871-1-john.g.garry@oracle.com`.
`b4 dig -a` found only v1, so the committed version is the submitted
revision. Lore mirror shows three positive reviews and Keith Busch
applying it to `nvme-7.1`.

Step 4.2 Record: `b4 dig -w` shows John Garry, Christoph Hellwig, Keith
Busch, Nilay Shroff, and `linux-nvme` were included. Appropriate
maintainers/reviewers were on the thread.

Step 4.3 Record: no `Reported-by` or bugzilla/syzbot link. Related
blktests thread reported a concrete failure: `module refcount not as
original`; John Garry replied that he had posted the kernel fix for it,
referring to this fix.

Step 4.4 Record: related patch thread for `nvme-multipath: fix leak on
try_module_get failure` contains John Garry identifying this missing
`module_put()` on timer cancellation; Keith Busch agreed the check and
`module_put()` were correct and that `_sync` was unnecessary.

Step 4.5 Record: direct lore stable search was blocked by Anubis; web
search did not find stable-specific objections or reasons not to
backport.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: modified function is `nvme_init_ns_head()`.

Step 5.2 Record: `nvme_init_ns_head()` is called by `nvme_alloc_ns()`.
`nvme_alloc_ns()` is called by `nvme_scan_ns()` when a namespace is
discovered during controller scan. Scan work is queued from namespace
change AEN handling, reset/start paths, sysfs rescan, and ioctl rescan
paths.

Step 5.3 Record: key callees around the bug are `cancel_delayed_work()`,
`module_put()`, `nvme_mpath_remove_disk()`, `try_module_get()`,
`mod_delayed_work()`, and `nvme_remove_head_work()`.

Step 5.4 Record: reachable path is: namespace/path removal ->
`nvme_ns_remove()` -> last path -> `nvme_mpath_remove_disk()` ->
`try_module_get()` + delayed work; later namespace reappears -> scan
path -> `nvme_scan_ns()` -> `nvme_alloc_ns()` -> `nvme_init_ns_head()`
-> cancel delayed work. This is reachable through real NVMe multipath
path loss/reconnect and through the blktests NVMe loop scenario. I did
not verify an unprivileged trigger; the tested path writes
`delayed_removal_secs` via sysfs and manipulates NVMe connectivity.

Step 5.5 Record: similar reference-balancing pattern exists elsewhere in
the kernel, e.g. Bluetooth code checks `cancel_delayed_work()` and drops
a reference only when cancellation succeeds. No additional identical
NVMe missing-put sites were found in the searched files.

## Phase 6: Cross-Referencing And Stable Tree Analysis
Step 6.1 Record: local stable branch checks show `stable/linux-6.16.y`,
`6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y` contain
`cancel_delayed_work(&head->remove_work)`,
`try_module_get(THIS_MODULE)`, `module_put(THIS_MODULE)`, and
`delayed_removal_secs`. `stable/linux-6.12.y` does not contain this
delayed-removal code, so this specific bug is not present there.

Step 6.2 Record: `git apply --check` for the patch succeeds on the
current `stable/linux-7.0.y` checkout. The exact hunk context exists in
local `6.16.y` through `7.0.y` branches, so the backport should be clean
or trivial. I did not create separate worktrees for branch-specific
apply checks.

Step 6.3 Record: no alternative fix for this exact cancellation leak was
found in checked stable branches; the candidate commit itself is not an
ancestor of the checked stable branches.

## Phase 7: Subsystem And Maintainer Context
Step 7.1 Record: subsystem is NVMe host multipath, under
`drivers/nvme/host`. Criticality: important for systems using NVMe
multipath and delayed removal, not universal core-kernel impact.

Step 7.2 Record: subsystem is actively developed; recent history
includes delayed removal introduction, reconnect cancellation, and
related leak fixes. This bug is in relatively new code introduced for
`v6.16`.

## Phase 8: Impact And Risk Assessment
Step 8.1 Record: affected users are `CONFIG_NVME_MULTIPATH` users who
enable nonzero `delayed_removal_secs` and experience path loss followed
by namespace/path return before the delayed removal work fires.

Step 8.2 Record: trigger is realistic for NVMe multipath transient path
failures or hot-remove/re-add scenarios. Related blktests exercises this
with NVMe loop. Unprivileged trigger was not verified.

Step 8.3 Record: failure mode is module reference leak, severity medium.
It can prevent `nvme_core` from being unloadable and causes persistent
refcount imbalance; the test discussion shows a concrete refcount
mismatch.

Step 8.4 Record: benefit is moderate for affected NVMe multipath users
because it fixes a real reference leak in a recovery path. Risk is very
low: 2 added lines, no API change, no behavior change except balancing a
reference when pending work is canceled.

## Phase 9: Final Synthesis
Step 9.1 Record: evidence for backporting: real reference leak; concrete
test failure in related blktests discussion; tiny one-function fix;
reviewed by three NVMe developers; maintainer agreed with the logic; applies
cleanly to current `7.0.y` and relevant code exists in `6.16+` stable
branches. Evidence against: affects a specific configured NVMe multipath
feature, not all systems; severity is not crash/data
corruption/security. Unresolved: I did not verify exact branch-specific
application in separate worktrees, and direct lore stable search was
blocked.

Step 9.2 Record: stable rules checklist:
1. Obviously correct and tested: yes, by code inspection and related
   blktests discussion; no `Tested-by` tag.
2. Fixes real bug: yes, module reference leak.
3. Important issue: yes enough for stable as a driver lifecycle
   reference leak, though severity is medium rather than critical.
4. Small and contained: yes, 2 insertions/1 deletion in one function.
5. No new feature/API: yes.
6. Can apply to stable: yes for current `7.0.y`; exact context exists in
   checked `6.16+` branches. Not applicable to trees lacking delayed
   removal.

Step 9.3 Record: no exception category such as device ID, quirk, DT,
build, or documentation. This is a normal bug fix.

Step 9.4 Record: backport recommended to stable trees containing NVMe
delayed removal cancellation, notably checked `6.16.y` through `7.0.y`;
not applicable to older trees where the feature is absent.

## Verification
- [Phase 1] Parsed commit `3f150f0f010f2` with `git show --format=fuller
  --stat --patch`; confirmed subject, body, tags, and 2-line fix.
- [Phase 2] Inspected `drivers/nvme/host/core.c` and
  `drivers/nvme/host/multipath.c`; confirmed `try_module_get()` in
  `nvme_mpath_remove_disk()` and `module_put()` in
  `nvme_remove_head_work()`.
- [Phase 2] Checked `kernel/workqueue.c`; confirmed
  `cancel_delayed_work()` returns true if pending work was canceled and
  false otherwise.
- [Phase 3] Ran `git blame` on the changed and related lines; identified
  `dd2c18548964`, `62188639ec16`, and `0f5197ea9a73`.
- [Phase 3] Ran `git show` on related commits; confirmed delayed removal
  introduction, reconnect cancellation, and prior try-module failure
  fix.
- [Phase 4] Ran `b4 dig -c`, `-a`, and `-w`; found original patch URL,
  single v1 revision, and appropriate NVMe recipients.
- [Phase 4] Fetched lore mirror thread; confirmed reviews by Christoph
  Hellwig, Nilay Shroff, Chaitanya Kulkarni, and application by Keith
  Busch.
- [Phase 4] Fetched related try-module failure and blktests threads;
  confirmed this missing cancellation `module_put()` was discussed and
  tied to module refcount mismatch.
- [Phase 5] Used `rg` and file reads to trace callers from
  `nvme_init_ns_head()` through `nvme_alloc_ns()`, `nvme_scan_ns()`, and
  `nvme_scan_work()`.
- [Phase 6] Used `git merge-base --is-ancestor` and branch object
  inspection to confirm the relevant code exists in local `6.16.y`,
  `6.17.y`, `6.18.y`, `6.19.y`, and `7.0.y`, but not `6.12.y`.
- [Phase 6] Ran `git apply --check` for the candidate diff on current
  `stable/linux-7.0.y`; it succeeded.
- UNVERIFIED: exact `git apply --check` on separate worktrees for every
  stable branch; unprivileged triggerability; direct stable-list search
  results, because Anubis blocked lore.

**YES**

 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ac5a7d9781f58..2e8402513189d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4052,7 +4052,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 	mutex_unlock(&ctrl->subsys->lock);

 #ifdef CONFIG_NVME_MULTIPATH
-	cancel_delayed_work(&head->remove_work);
+	if (cancel_delayed_work(&head->remove_work))
+		module_put(THIS_MODULE);
 #endif
 	return 0;

-- 
2.53.0


* [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (25 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu Sasha Levin
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: Mark Harmstone, Johannes Thumshirn, David Sterba, Sasha Levin,
	clm, linux-btrfs, linux-kernel

From: Mark Harmstone <mark@harmstone.com>

[ Upstream commit 73db0fad673af844772de964eebecae60eda0496 ]

If one of the calls made by do_remap_reloc_trans() fails, we can leave
the remap tree in an inconsistent state. Abort the transaction if this
happens, to prevent the corrupt state from reaching the disk.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics

Step 1.1 Record: subsystem `btrfs`; action verb `abort`; claimed intent
is to abort the transaction from `do_remap_reloc_trans()` failure paths
so an inconsistent remap tree cannot be committed.

Step 1.2 Record: tags found in the actual public patch are `Reviewed-by:
Johannes Thumshirn`, `Signed-off-by: Mark Harmstone`, and
`Signed-off-by: David Sterba`. No `Fixes:`, `Reported-by:`, `Tested-by:`,
`Link:`, or `Cc: stable` tag was present in the fetched patch. Notable
pattern: reviewed by a Btrfs developer and committed by David Sterba.

Step 1.3 Record: the body describes a real corruption-prevention bug:
failures in `do_remap_reloc_trans()` can leave the remap tree
inconsistent, and aborting the transaction prevents that state from
reaching disk. No stack trace, reproducer, or explicit affected versions
were in the message. Root cause, as verified from code, is that the
function can return through `fail` after transactional remap/free-space
mutations without marking the transaction aborted.

Step 1.4 Record: this is not a hidden cleanup; it is an explicit
filesystem consistency fix. The subject and body both describe
preventing corrupt on-disk state.

## Phase 2: Diff Analysis

Step 2.1 Record: one file changed, `fs/btrfs/relocation.c`, with +10/-4.
One function modified: `do_remap_reloc_trans()`. Scope is a
single-function surgical filesystem fix.

Step 2.2 Record: before the patch, four failure paths jumped to `fail`,
released local references, freed the reserved extent, unlocked
`remap_mutex`, and ended the transaction without aborting it. After the
patch, those paths call `btrfs_abort_transaction(trans, ret)` before
cleanup. The `add_remap_entry()` failure path also stops attempting
`btrfs_add_to_free_space_tree()` recovery and aborts instead.

Step 2.3 Record: bug category is filesystem metadata consistency / data
corruption prevention. `add_remap_entry()` was verified to delete or
shorten identity remap entries before adding new remap and backref
items; if a later insertion fails, returning normally can leave a
partially updated remap tree. `btrfs_abort_transaction()` was verified
to mark the transaction aborted and set filesystem error state,
preventing normal commit of that partial state.

Step 2.4 Record: the fix is minimal and consistent with Btrfs
transaction rules. `transaction.c` says after abort, call-site recovery
should be limited to freeing local allocations and passing the error up;
the patch does exactly that. Regression risk is low but not zero: the
filesystem will now abort the transaction and enter an error state on
these rare failures instead of attempting to continue, but that is the
correct trade-off for avoiding persistent metadata corruption.
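
The difference between returning an error and aborting, as described in
Steps 2.2 through 2.4, can be shown with a simplified userspace model.
This is a sketch with loud assumptions: all `model_*` names and `STATE_*`
values are hypothetical, a single integer stands in for the remap tree,
and ending the transaction is modeled as an immediate commit, which the
real Btrfs transaction machinery does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states of the modeled remap tree. */
enum {
	STATE_CONSISTENT = 0,	/* identity remap intact */
	STATE_PARTIAL = 1,	/* identity remap removed, new entry missing */
	STATE_REMAPPED = 2,	/* new remap entry fully inserted */
};

struct model_fs {
	int on_disk;		/* state that reached the disk */
};

struct model_trans {
	struct model_fs *fs;
	int staged;		/* in-memory state of this transaction */
	bool aborted;
};

/* Models marking the transaction aborted. */
static void model_abort(struct model_trans *trans)
{
	trans->aborted = true;
}

/* Models ending the transaction: staged state is written unless aborted. */
static void model_end(struct model_trans *trans)
{
	if (!trans->aborted)
		trans->fs->on_disk = trans->staged;
}

/* Models a two-step update that can fail after the first mutation. */
static int model_add_remap_entry(struct model_trans *trans, bool fail_insert)
{
	trans->staged = STATE_PARTIAL;
	if (fail_insert)
		return -ENOMEM;
	trans->staged = STATE_REMAPPED;
	return 0;
}

/* abort_on_error == false models the pre-patch error path. */
static int model_reloc(struct model_fs *fs, bool fail_insert,
		       bool abort_on_error)
{
	struct model_trans trans = { fs, fs->on_disk, false };
	int ret = model_add_remap_entry(&trans, fail_insert);

	if (ret && abort_on_error)
		model_abort(&trans);
	model_end(&trans);
	return ret;
}
```

Without the abort, the failed update in this model leaves
`STATE_PARTIAL` on disk. With it, the on-disk state stays at its
previous value.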

## Phase 3: Git History Investigation

Step 3.1 Record: `git blame` shows the changed error paths were
introduced by `fd6594b1446cc` (`btrfs: replace identity remaps with
actual remaps when doing relocations`), authored by Mark Harmstone in
Jan 2026 and merged for the 7.0 cycle. First containing tag verified as
`v7.0-rc1~232^2~74`.

Step 3.2 Record: no `Fixes:` tag is present, so there was no explicit
tag to follow. Independent blame identifies `fd6594b1446cc` as the
introducer, and that commit exists in `stable/linux-7.0.y` but not older
checked stable branches.

Step 3.3 Record: recent local history for `fs/btrfs/relocation.c` shows
several remap-tree fixes around the same code, including block-group
reference, NULL root checks, transaction handle leak, and remap-tree
setup fixes. The candidate itself is standalone; no prerequisite was
found for the shown hunk to apply to `stable/linux-7.0.y`.

Step 3.4 Record: the author, Mark Harmstone, authored the remap-tree
relocation code and several nearby Btrfs fixes. David Sterba committed
both the introducer and the candidate in the Btrfs development tree.

Step 3.5 Record: related nearby commit `942bcf6d1884` fixes a separate
`bytes_may_use` leak in the same function. It is not a prerequisite for
this abort fix; the candidate patch applies cleanly to current
`stable/linux-7.0.y` in a dry-run.

## Phase 4: Mailing List And External Research

Step 4.1 Record: `b4 dig -c a41c84ba2f51303b7dca2ccf426d99c4a3a757b3`
failed because the commit object is not present in this local clone. Web
research found the original v2 thread at
`https://yhbt.net/lore/linux-btrfs/3a0c1b94-4404-4726-aafe-809c425707fc@wdc.com/T/`.
The thread shows v2 and notes the change from aborting at the end to
aborting in place so line numbers are logged properly.

Step 4.2 Record: `b4 dig -w` also failed for the same nonlocal commit
reason. The fetched lore mirror shows recipients `linux-btrfs` and Boris
Burkov, and a review reply from Johannes Thumshirn with `Reviewed-by`.

Step 4.3 Record: no separate bug report, syzbot report, or Bugzilla link
was found in the commit or lore thread. Severity comes from the patch
description and verified code mechanism: possible persistent remap-tree
inconsistency.

Step 4.4 Record: lore shows this patch as a standalone `[PATCH v2]`, not
a numbered multi-patch series. Web search found a separate same-function
accounting leak fix, but no dependency relationship.

Step 4.5 Record: stable-specific search did not find a usable stable
discussion; direct lore stable fetches were blocked by Anubis. No
evidence of a known objection or rejection was found.

## Phase 5: Code Semantic Analysis

Step 5.1 Record: modified function is `do_remap_reloc_trans()`.

Step 5.2 Record: caller chain verified as `btrfs_ioctl_balance()` ->
`btrfs_balance()` -> `__btrfs_balance()` -> `btrfs_relocate_chunk()` ->
`btrfs_relocate_block_group()` -> `do_remap_reloc()` ->
`do_remap_reloc_trans()`. Other verified callers of
`btrfs_relocate_chunk()` include device shrink, block-group reclaim, and
zoned repair paths. Balance ioctl requires `CAP_SYS_ADMIN`.

Step 5.3 Record: key callees include
`btrfs_add_block_group_free_space()`, `copy_remapped_data()`,
`btrfs_remove_from_free_space_tree()`, `add_remap_entry()`,
`btrfs_free_reserved_extent()`, and `btrfs_end_transaction()`.
`add_remap_entry()` calls `btrfs_del_item()`,
`btrfs_set_item_key_safe()`, `add_remap_item()`,
`btrfs_insert_empty_item()`, and `add_remap_backref_item()`.

Step 5.4 Record: the buggy path is reachable from privileged userspace
via Btrfs balance and from kernel maintenance paths such as
reclaim/shrink. It is only active when
`should_relocate_using_remap_tree()` is true: filesystem has the
`REMAP_TREE` incompat feature and the block group is not system or
metadata-remap.

Step 5.5 Record: similar local patterns support aborting.
`move_existing_remap()` aborts the transaction on error before
ending/committing, and `start_block_group_remapping()` aborts on
remap-tree setup failures. `transaction.h` explicitly says to call
`btrfs_abort_transaction()` as early as possible.

## Phase 6: Cross-Referencing And Stable Tree Analysis

Step 6.1 Record: `stable/linux-7.0.y` contains `do_remap_reloc_trans()`.
`stable/linux-6.19.y`, `stable/linux-6.18.y`, `stable/linux-6.12.y`, and
`stable/linux-6.6.y` do not contain this function or remap-tree support
in the checked paths. The bug is therefore relevant to 7.0.y, not older
stable branches checked.

Step 6.2 Record: dry-run `git apply --check` of the candidate hunk
succeeded against current `stable/linux-7.0.y`, so expected backport
difficulty is clean for 7.0.y.

Step 6.3 Record: local `stable/linux-7.0.y` does not already contain the
candidate subject, and the current file still has the pre-fix failure
paths.

## Phase 7: Subsystem And Maintainer Context

Step 7.1 Record: subsystem is Btrfs filesystem code under `fs/btrfs`,
criticality IMPORTANT. It is not universal core code, but filesystem
metadata corruption is a high-severity stable concern for affected Btrfs
users.

Step 7.2 Record: subsystem activity is high in this area; local history
shows many Btrfs remap-tree commits and fixes in the 7.0 cycle. This is
recently introduced code, but it is already present in the 7.0 stable
tree.

## Phase 8: Impact And Risk Assessment

Step 8.1 Record: affected population is config/filesystem-specific:
users running kernels with Btrfs experimental/remap-tree support and
filesystems using the `REMAP_TREE` incompat feature, during
relocation/balance/shrink/reclaim paths.

Step 8.2 Record: trigger conditions are failures from the remap
relocation sequence after reservation and during
free-space/remap-tree/data-copy operations. Verified user trigger via
balance requires `CAP_SYS_ADMIN`; unprivileged trigger was not verified.

Step 8.3 Record: failure mode is potential persistent filesystem
metadata inconsistency / corruption of the remap tree. Severity is
CRITICAL for affected users because the commit’s stated goal is to
prevent corrupt state from reaching disk and code inspection confirms
partial remap-tree mutations can occur before error returns.

Step 8.4 Record: benefit is high for affected 7.0.y users because it
prevents committing inconsistent filesystem metadata. Risk is low: one
function, 10 additions and 4 removals, no API changes, and behavior only
changes rare error paths to abort the transaction.

## Phase 9: Final Synthesis

Step 9.1 Record: evidence for backporting is strong: real filesystem
consistency bug, verified partial transactional mutations in
`add_remap_entry()`, small single-function fix, reviewed on linux-btrfs,
clean dry-run apply to `stable/linux-7.0.y`, and Btrfs transaction
documentation supports early abort. Evidence against is limited to scope
of affected users: remap-tree is experimental/config-specific and the
bug only exists in 7.0.y among checked stable branches. Unresolved: no
independent reproducer or user report was found, and direct lore/stable
search was partially blocked.

Step 9.2 Record: stable rules checklist:
1. Obviously correct and tested: yes for correctness by inspection;
   reviewed by Johannes Thumshirn. No `Tested-by` found.
2. Fixes a real bug: yes, prevents inconsistent remap-tree state after
   transactional failure paths.
3. Important issue: yes, potential on-disk filesystem corruption.
4. Small and contained: yes, one function in one file, +10/-4.
5. No new features/APIs: yes.
6. Can apply to stable: yes for `stable/linux-7.0.y`; not applicable to
   older checked stable branches lacking the code.

Step 9.3 Record: no exception category like device ID, quirk, DT, build,
or docs applies. This is a normal serious filesystem bug fix.

Step 9.4 Decision: backport to stable trees that contain the remap-tree
relocation code, specifically 7.0.y based on the checked branches. Do
not try to apply it to older stable trees that lack
`do_remap_reloc_trans()`.

## Verification

- [Phase 1] Fetched public patch from GitHub commit
  `a41c84ba2f51303b7dca2ccf426d99c4a3a757b3`; confirmed subject, body,
  tags, and +10/-4 diff.
- [Phase 2] Read `fs/btrfs/relocation.c`; confirmed old failure paths
  end the transaction without aborting.
- [Phase 2] Read `add_remap_entry()`; confirmed it can delete/shorten
  identity remap state before later operations fail.
- [Phase 2] Read `transaction.h` and `transaction.c`; confirmed
  `btrfs_abort_transaction()` records abort state and transaction abort
  rules favor local cleanup only.
- [Phase 3] `git blame -L 5032,5049 -- fs/btrfs/relocation.c`: changed
  lines introduced by `fd6594b1446cc`.
- [Phase 3] `git show --stat --format=fuller fd6594b1446cc --
  fs/btrfs/relocation.c`: introducer is `btrfs: replace identity remaps
  with actual remaps when doing relocations`, +326 lines.
- [Phase 3] `git describe --contains fd6594b1446cc`: first containing
  point reported in the 7.0-rc1 range.
- [Phase 3] `git log --author='Mark Harmstone' -10 -- fs/btrfs`: author
  has multiple nearby Btrfs/remap fixes.
- [Phase 4] `b4 dig -c`, `-a`, and `-w` failed because the candidate
  commit object is not in this local clone.
- [Phase 4] Fetched lore mirror thread; confirmed `[PATCH v2]`, v2
  change note, and `Reviewed-by` reply from Johannes Thumshirn.
- [Phase 4] Web searches found no usable stable-specific discussion;
  direct lore stable fetch was blocked by Anubis.
- [Phase 5] `rg` and file reads traced callers from balance ioctl and
  maintenance paths to `do_remap_reloc_trans()`.
- [Phase 5] Read `relocation.h`; confirmed remap-tree relocation
  condition.
- [Phase 6] `git grep` on stable branches: function exists in
  `stable/linux-7.0.y`; not found in checked `6.19.y`, `6.18.y`,
  `6.12.y`, or `6.6.y`.
- [Phase 6] `git apply --check` of the candidate hunk succeeded against
  current `stable/linux-7.0.y`.
- [Phase 6] `git log --grep` in `stable/linux-7.0.y` found no existing
  candidate commit.
- UNVERIFIED: no runtime test, fstest, or reproducer was run.
- UNVERIFIED: direct canonical lore.kernel.org pages were blocked by
  Anubis, so I used the available lore mirror and GitHub commit content.

The risk-benefit balance is clearly favorable for affected stable trees:
this is a small, reviewed, error-path-only Btrfs fix that prevents
potentially persistent filesystem metadata corruption.

**YES**

 fs/btrfs/relocation.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 033f74fd6225c..2519cdb401eda 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -5031,21 +5031,27 @@ static int do_remap_reloc_trans(struct btrfs_fs_info *fs_info,

 	if (bg_needs_free_space) {
 		ret = btrfs_add_block_group_free_space(trans, dest_bg);
-		if (ret)
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
 			goto fail;
+		}
 	}

 	ret = copy_remapped_data(fs_info, start, new_addr, length);
-	if (ret)
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
 		goto fail;
+	}

 	ret = btrfs_remove_from_free_space_tree(trans, new_addr, length);
-	if (ret)
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
 		goto fail;
+	}

 	ret = add_remap_entry(trans, path, src_bg, start, new_addr, length);
 	if (ret) {
-		btrfs_add_to_free_space_tree(trans, new_addr, length);
+		btrfs_abort_transaction(trans, ret);
 		goto fail;
 	}

-- 
2.53.0


* [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu
  2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
                   ` (26 preceding siblings ...)
  2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure Sasha Levin
@ 2026-05-05  9:51 ` Sasha Levin
  27 siblings, 0 replies; 29+ messages in thread
From: Sasha Levin @ 2026-05-05  9:51 UTC (permalink / raw)
  To: patches, stable
  Cc: YuanShang, Philip Yang, Alex Deucher, Sasha Levin, Felix.Kuehling,
	christian.koenig, airlied, simona, amd-gfx, dri-devel,
	linux-kernel

From: YuanShang <YuanShang.Mao@amd.com>

[ Upstream commit d0f5711fa14a09c010537375cf34893cd33bc2ee ]

Don't map or unmap svm range to gpu if vm is not ready for updates.

Why: DRM entity may already be killed when the svm worker try to
update gpu vm.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 55f8e366c326980174a4f2b9501b524d8eb25135)
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase 1: Commit Message Forensics
Step 1.1 Record: Subsystem `drm/amdkfd`, action verb `check`, intent:
avoid SVM GPU VM map/unmap when the AMDGPU VM cannot accept updates.

Step 1.2 Record: Tags found in the actual commit: `Signed-off-by:
YuanShang <YuanShang.Mao@amd.com>`, `Reviewed-by: Philip Yang
<philip.yang@amd.com>`, `Signed-off-by: Alex Deucher
<alexander.deucher@amd.com>`. No `Fixes:`, no `Reported-by:`, no
`Tested-by:`, no `Cc: stable`.

Step 1.3 Record: The commit says the SVM worker may try to update a GPU
VM after the DRM scheduler entity has already been killed. The user-
visible symptom was verified from the lore thread: “Trying to push to a
killed entity”, SDMA timeout, GPU reset, and a hung
`svm_range_restore_work` kworker blocked in `dma_fence_wait_timeout()`
via `svm_range_validate_and_map()`.

Step 1.4 Record: This is a hidden bug fix despite the neutral “check”
wording. It prevents submitting VM update jobs to a stopped/killed VM
update entity, which otherwise can leave fences unsignaled and hang the
worker that waits on them.

## Phase 2: Diff Analysis
Step 2.1 Record: One file changed:
`drivers/gpu/drm/amd/amdkfd/kfd_svm.c`, 11 insertions. Modified
functions: `svm_range_unmap_from_gpu()` and `svm_range_map_to_gpu()`.
Scope: single-file surgical fix.

Step 2.2 Record: Before, both SVM unmap and map directly called
`amdgpu_vm_update_range()`. After, both first call `amdgpu_vm_ready(vm)`
and return `-EINVAL` if the VM is not ready. Affected path is VM page
table update submission from SVM map/unmap, including restore worker and
MMU notifier/unmap paths.

Step 2.3 Record: Bug category is synchronization/lifetime correctness
around process teardown. `amdgpu_vm_ready()` in current mainline
verifies the VM is not evicting, has no evicted PTs, and its
immediate/delayed VM update scheduler entities are not stopped. The fix
avoids queueing jobs after those entities are killed.
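The guard described in Steps 2.2 and 2.3 can be sketched as a small user-space model. This is not the kernel code: `struct model_vm`, `model_vm_ready()`, and `model_update_range()` are hypothetical names that only mirror the three readiness conditions listed above and the early `-EINVAL` return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM state the readiness check inspects. */
struct model_vm {
	bool evicting;               /* VM is currently being evicted */
	size_t evicted_pts;          /* number of evicted page tables */
	bool immediate_stopped;      /* immediate update entity stopped */
	bool delayed_stopped;        /* delayed update entity stopped */
	unsigned int jobs_submitted; /* counts modelled job submissions */
};

/* Models the three conditions described for the readiness check. */
static bool model_vm_ready(const struct model_vm *vm)
{
	assert(vm != NULL);
	return !vm->evicting && vm->evicted_pts == 0 &&
	       !vm->immediate_stopped && !vm->delayed_stopped;
}

/* Models a map/unmap helper: refuse to queue work on a VM that is not ready. */
static int model_update_range(struct model_vm *vm)
{
	if (!model_vm_ready(vm))
		return -EINVAL;

	vm->jobs_submitted++; /* stands in for submitting the update job */
	return 0;
}
```

In this model, a VM whose update entity has been stopped never reaches the submission step, which is the property the patch relies on.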

Step 2.4 Record: Fix quality is good: 11 lines, no new API, no feature,
no data structure changes. Regression risk is low: the only behavior
change is an early `-EINVAL` return when VM updates cannot run anyway.
Backport risk is
higher for older trees because `amdgpu_vm_ready()` only gained stopped-
entity checks in commit `f101c13a8720c7`; older stable trees need that
or an equivalent prerequisite for this patch to address the killed-
entity failure.

## Phase 3: Git History Investigation
Step 3.1 Record: Blame shows SVM map/unmap infrastructure was introduced
by `f80fe9d3c114` (“drm/amdkfd: map svm range to GPUs”, first in
`v5.14-rc1`) and later reshaped by commits including `6c1a7867734`
(`v5.18-rc1`). The missing readiness guard has existed in these SVM
paths for a long time.

Step 3.2 Record: No `Fixes:` tag, so no direct target to follow.

Step 3.3 Record: Recent file history contains many SVM fixes, including
UAF, address conversion, PTE clearing, restore work, and retry-fault
race fixes. Related commit `597eb70f7ff7` / upstream `10c382ec6c6d`
(“drm/amdkfd: Don’t clear PT after process killed”) added an
`amdgpu_vm_ready()` guard in a different KFD GPUVM path and was
explicitly stable-tagged.

Step 3.4 Record: `git log --author='YuanShang' -10 --
drivers/gpu/drm/amd/amdkfd` produced no reachable prior commits in this
checkout. The patch was reviewed by Philip Yang, a regular AMD KFD
contributor, and committed by Alex Deucher.

Step 3.5 Record: Dependency identified: `f101c13a8720c7` (“drm/amdgpu:
fix task hang from failed job submission during process kill”) teaches
`amdgpu_vm_ready()` to check stopped VM update entities. Without it,
this candidate’s guard does not fully detect the killed-entity condition
in older stable trees.
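To illustrate why the prerequisite matters, the following sketch contrasts a readiness predicate with and without the stopped-entity check. It is an assumption-laden model, not the real `amdgpu_vm_ready()`: the struct and both function names are hypothetical, and the "without prerequisite" form is inferred only from the statement that the stopped-entity checks were added by `f101c13a8720c7`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VM state; entity_stopped models a killed process. */
struct sketch_vm {
	bool evicting;
	bool has_evicted_pts;
	bool entity_stopped;
};

/* Assumed shape of the check in a tree that lacks the prerequisite. */
static bool sketch_ready_without_prereq(const struct sketch_vm *vm)
{
	assert(vm != NULL);
	return !vm->evicting && !vm->has_evicted_pts;
}

/* Assumed shape once stopped entities are also taken into account. */
static bool sketch_ready_with_prereq(const struct sketch_vm *vm)
{
	return sketch_ready_without_prereq(vm) && !vm->entity_stopped;
}
```

For a VM with only `entity_stopped` set, the first predicate still reports ready, so a guard built on it would let the update through; the second reports not ready.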

## Phase 4: Mailing List And External Research
Step 4.1 Record: `b4 dig -c 55f8e366c326...` found the original
submission at `https://patch.msgid.link/20260326103656.487304-1-
YuanShang.Mao@amd.com`. `b4 dig -a` found only v1, standalone. WebFetch
to lore was blocked by Anubis, but `b4 dig -m` retrieved the mbox
successfully.

Step 4.2 Record: `b4 dig -w` showed original recipients were YuanShang
and `amd-gfx@lists.freedesktop.org`. The thread later included Christian
König and Philip Yang.

Step 4.3 Record: No separate bugzilla/syzbot link. The thread itself
contains the bug log: killed entity error, SDMA timeout, GPU reset, a
recovered device wedge, and a hung kworker in `svm_range_restore_work`.

Step 4.4 Record: Philip Yang stated the earlier “Don’t clear PT after
process killed” patch fixed one path and this patch fixes another path,
then gave `Reviewed-by: Philip Yang <philip.yang@amd.com>`. No NAKs
found.

Step 4.5 Record: Stable-specific web search could not be completed
because WebFetch to lore/stable timed out or hit Anubis. No stable
nomination for this exact patch was found in the mbox.

## Phase 5: Code Semantic Analysis
Step 5.1 Record: Key functions: `svm_range_unmap_from_gpu()`,
`svm_range_map_to_gpu()`.

Step 5.2 Record: Callers verified: `svm_range_unmap_from_gpu()` is
called by `svm_range_unmap_from_gpus()`, reached from CPU unmap/MMU
notifier handling and SVM validation with PROT_NONE.
`svm_range_map_to_gpu()` is called by `svm_range_map_to_gpus()`, reached
from `svm_range_validate_and_map()`.

Step 5.3 Record: Key callees: both changed functions call
`amdgpu_vm_update_range()`. For SDMA VM updates, that path
allocates/submits an AMDGPU job; `amdgpu_job_submit()` arms the
scheduler job and calls `drm_sched_entity_push_job()`.

Step 5.4 Record: Reachability verified: `svm_range_restore_work()` calls
`svm_range_validate_and_map()`, which calls `svm_range_map_to_gpus()`
and then `svm_range_map_to_gpu()`. The lore log shows exactly this call
chain in a hung kworker. GPU page fault and MMU notifier paths also
reach the same validation/unmap functions.

Step 5.5 Record: Similar pattern verified: `amdgpu_amdkfd_gpuvm.c`
already has an `amdgpu_vm_ready()` guard with the comment “VM entity
stopped if process killed”; `amdgpu_cs.c` and `amdgpu_gem.c` also check
VM readiness before clearing freed mappings.

## Phase 6: Stable Tree Analysis
Step 6.1 Record: The SVM map/unmap functions exist in `v5.15`, `v6.1`,
`v6.6`, and `v6.8`, and none of those extracted versions had the new
guards. The reported log was from Ubuntu `6.8.0-90-generic`, confirming
a stable-derived affected kernel.

Step 6.2 Record: Backport difficulty: minor to moderate. `v6.8`, `v6.6`,
and `v6.1` have the same conceptual functions but older
`amdgpu_vm_update_range()` signatures. `v5.15` uses older
`amdgpu_vm_bo_update_mapping()` in this path. Older trees also need
`f101c13a8720c7` or equivalent stopped-entity readiness logic.

Step 6.3 Record: Related fix `597eb70f7ff7`/`10c382ec6c6d` addresses a
different process-kill VM update path and was stable-tagged. It does not
cover SVM map/unmap; Philip Yang explicitly confirmed this patch fixes
another path.

## Phase 7: Subsystem Context
Step 7.1 Record: Subsystem is AMDGPU KFD SVM/HMM GPU memory management.
Criticality: important, affecting AMD compute users using KFD SVM, GPU
page faults, migration, and process teardown.

Step 7.2 Record: Subsystem is active; recent history shows many SVM
correctness fixes. The bug is in a mature path present since `v5.14+`,
not just brand-new code.

## Phase 8: Impact And Risk
Step 8.1 Record: Affected population is driver/config/hardware specific:
AMDGPU KFD users with SVM-capable compute workloads.

Step 8.2 Record: Trigger requires SVM VM update work racing with forced
process kill or teardown after VM scheduler entities are stopped. The
lore log verifies a real trigger. Whether it is fully unprivileged
depends on render/KFD device permissions and was not independently
verified.

Step 8.3 Record: Failure mode is severe: verified killed-entity error,
SDMA ring timeout, GPU reset, recovered device wedge, and hung kworker
for more than 245 seconds. Severity: HIGH, arguably CRITICAL for
affected systems.

Step 8.4 Record: Benefit is high for affected AMD KFD users because it
avoids a real hung-task/GPU-reset failure. Risk is low in mainline-
shaped code because the fix only refuses impossible VM updates. Risk for
older stable trees is manageable but requires prerequisite/backport care
around `amdgpu_vm_ready()` semantics.

## Phase 9: Final Synthesis
Step 9.1 Record: Evidence for backporting: real user log, severe
hang/GPU reset, small surgical fix, reviewed by Philip Yang, related
already-stable process-kill fix shows same class of bug, affected code
exists in stable-derived kernels. Evidence against: no `Fixes:` or `Cc:
stable`, and older trees need dependency/backport adjustment.
Unresolved: exact clean-apply status for every active stable branch was
not tested.

Step 9.2 Record: Stable rules checklist: obviously correct and reviewed:
yes, with dependency caveat. Fixes real bug: yes, verified by lore log.
Important issue: yes, hung task/GPU reset. Small and contained: yes, 11
lines in one file. No new feature/API: yes. Applies to stable: likely
needs minor backport adjustment and `f101c13a8720c7` or equivalent for
older trees.

Step 9.3 Record: No exception category applies; this is not a device ID,
quirk, DT, build, or documentation-only patch.

Step 9.4 Record: Decision is to backport, but not as an isolated blind
cherry-pick to older stable trees. It should be taken with the stopped-
entity readiness prerequisite/backport so `amdgpu_vm_ready()` actually
detects the process-kill condition.

## Verification
- Phase 1: Parsed target commit with `git show`; confirmed tags and no
  `Fixes:`/stable tag.
- Phase 2: Diff verified as 11 insertions in
  `svm_range_unmap_from_gpu()` and `svm_range_map_to_gpu()`.
- Phase 3: `git blame` verified SVM map/unmap history; `git describe
  --contains` placed the original SVM mapping in `v5.14-rc1` and later
  function reshaping in `v5.18-rc1`.
- Phase 3: `git show f101c13a8720c7` verified `amdgpu_vm_ready()` gained
  stopped-entity checks.
- Phase 3: `git show 597eb70f7ff7` verified the related stable-tagged
  “Don’t clear PT after process killed” guard in another path.
- Phase 4: `b4 dig`, `b4 dig -a`, `b4 dig -w`, and decoded mbox verified
  the lore thread, single v1 patch, recipients, log, Christian’s “Looks
  correct”, and Philip’s `Reviewed-by`.
- Phase 5: `rg` and file reads traced map/unmap callers through
  `svm_range_validate_and_map()` and `svm_range_restore_work()`.
- Phase 6: Extracted `v5.15`, `v6.1`, `v6.6`, and `v6.8` files; verified
  the affected SVM functions exist without the new guard.
- Phase 8: Lore log verified severity: killed entity, SDMA timeout, GPU
  reset, device wedge recovery, and hung kworker.
- UNVERIFIED: WebFetch lore/stable searches were blocked/timed out, and
  I did not test applying the patch to each stable branch.

**YES**

 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 080242f9981b0..addb86803d9ae 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1363,6 +1363,12 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,

 	pr_debug("CPU[0x%llx 0x%llx] -> GPU[0x%llx 0x%llx]\n", start, last,
 		gpu_start, gpu_end);
+
+	if (!amdgpu_vm_ready(vm)) {
+		pr_debug("VM not ready, canceling unmap\n");
+		return -EINVAL;
+	}
+
 	return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, gpu_start,
 				      gpu_end, init_pte_value, 0, 0, NULL, NULL,
 				      fence);
@@ -1440,6 +1446,11 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
 	pr_debug("svms 0x%p [0x%lx 0x%lx] readonly %d\n", prange->svms,
 		 last_start, last_start + npages - 1, readonly);

+	if (!amdgpu_vm_ready(vm)) {
+		pr_debug("VM not ready, canceling map\n");
+		return -EINVAL;
+	}
+
 	for (i = offset; i < offset + npages; i++) {
 		uint64_t gpu_start;
 		uint64_t gpu_end;
-- 
2.53.0


Thread overview: 29+ messages
2026-05-05  9:51 [PATCH AUTOSEL 7.0-5.10] ALSA: hda: Avoid WARN_ON() for HDMI chmap slot checks Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.1] nvmet-tcp: check INIT_FAILED before nvmet_req_uninit in digest error path Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amd/pm: Update emit clock logic Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] smb: client: change allocation requirements in smb2_compound_op Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: handle -EAGAIN from btrfs_duplicate_item and refresh stale leaf pointer Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add missing MODULE_ALIAS for fabrics transports Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] dpll: export __dpll_pin_change_ntf() for use under dpll_lock Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme-core: fix parameter name in comment Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ASoC: spacemit: move hw constraints from hw_params to startup Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] ALSA: usb-audio: apply quirk for Playstation PDP Riffmaster Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvmet-tcp: Don't clear tls_key when freeing sq Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] rculist: add list_splice_rcu() for private lists Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] ALSA: hda/realtek: enable mute LED support on ThinkBook 16p Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] ASoC: codecs: wcd937x: fix AUX PA sequencing and mixer controls Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: replace ASSERT with proper error handling in stripe lookup fallback Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-5.10] btrfs: handle unexpected free-space-tree key types Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] md/raid5: Fix UAF on IO across the reshape position Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] btrfs: apply first key check for readahead when possible Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.6] ASoC: aw88395: Fix kernel panic caused by invalid GPIO error pointer Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.12] nvme-tcp: teardown circular locking fixes Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix wrong min_objectid in btrfs_previous_item() call Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: check return value of btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: fix raid stripe search missing entries at leaf boundaries Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] btrfs: copy devid in btrfs_partially_delete_raid_extent() Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0-6.18] nvme-multipath: put module reference when delayed removal work is canceled Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] btrfs: abort transaction in do_remap_reloc_trans() on failure Sasha Levin
2026-05-05  9:51 ` [PATCH AUTOSEL 7.0] drm/amdkfd: check if vm ready in svm map and unmap to gpu Sasha Levin
