Archive-only list for patches

* [PATCH AUTOSEL 6.18] dma-mapping: add DMA_ATTR_CPU_CACHE_CLEAN
From: Sasha Levin @ 2026-04-20 13:22 UTC
  To: patches, stable
  Cc: Michael S. Tsirkin, Petr Tesarik, Marek Szyprowski, Sasha Levin,
	iommu, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: "Michael S. Tsirkin" <mst@redhat.com>

[ Upstream commit 61868dc55a119a5e4b912d458fc2c48ba80a35fe ]

When multiple small DMA_FROM_DEVICE or DMA_BIDIRECTIONAL buffers share a
cacheline, and DMA_API_DEBUG is enabled, we get this warning:
	cacheline tracking EEXIST, overlapping mappings aren't supported.

This is because when one of the mappings is removed, while another one
is active, CPU might write into the buffer.

Add an attribute for the driver to promise not to do this, making the
overlapping safe, and suppressing the warning.

Message-ID: <2d5d091f9d84b68ea96abd545b365dd1d00bbf48.1767601130.git.mst@redhat.com>
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stable-dep-of: 3d48c9fd78dd ("dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 include/linux/dma-mapping.h | 7 +++++++
 kernel/dma/debug.c          | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 190eab9f5e8c2..3e63046b899bc 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -78,6 +78,13 @@
  */
 #define DMA_ATTR_MMIO		(1UL << 10)
 
+/*
+ * DMA_ATTR_CPU_CACHE_CLEAN: Indicates the CPU will not dirty any cacheline
+ * overlapping this buffer while it is mapped for DMA. All mappings sharing
+ * a cacheline must have this attribute for this to be considered safe.
+ */
+#define DMA_ATTR_CPU_CACHE_CLEAN	(1UL << 11)
+
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
  * be given to a device to use as a DMA source or target.  It is specific to a
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 138ede653de40..7e66d863d573f 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -595,7 +595,8 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
 	if (rc == -ENOMEM) {
 		pr_err_once("cacheline tracking ENOMEM, dma-debug disabled\n");
 		global_disable = true;
-	} else if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	} else if (rc == -EEXIST &&
+		   !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_CPU_CACHE_CLEAN)) &&
 		   !(IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
 		     is_swiotlb_active(entry->dev))) {
 		err_printk(entry->dev, entry,
-- 
2.53.0
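
The condition changed in add_dma_entry() can be modelled in userspace to see
which mappings still produce the overlap warning. This is only a sketch:
overlap_warning_fires() and its bounce_active parameter are hypothetical
stand-ins, and the DMA_ATTR_SKIP_CPU_SYNC bit value is assumed here (only the
DMA_ATTR_CPU_CACHE_CLEAN value is shown in the patch above).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Value taken from the patch above. */
#define DMA_ATTR_CPU_CACHE_CLEAN	(1UL << 11)
/* Assumed value; not shown in the patch context. */
#define DMA_ATTR_SKIP_CPU_SYNC		(1UL << 5)

/*
 * Hypothetical userspace model of the warning condition.
 *
 * rc            result of the cacheline-tracking insert (0 or -errno)
 * attrs         DMA attributes passed by the driver
 * bounce_active models IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
 *               is_swiotlb_active(dev)
 */
static bool overlap_warning_fires(int rc, unsigned long attrs,
				  bool bounce_active)
{
	assert(rc <= 0);	/* kernel-style return: 0 or negative errno */

	if (rc != -EEXIST)
		return false;	/* no overlapping cacheline entry was found */
	if (attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_CPU_CACHE_CLEAN))
		return false;	/* driver opted out or promised clean cachelines */
	return !bounce_active;	/* bounce buffering also suppresses the warning */
}
```

In this model, overlap_warning_fires(-EEXIST, 0, false) is true, while passing
DMA_ATTR_CPU_CACHE_CLEAN in attrs makes it false, which is the behaviour the
commit message describes.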


* [PATCH AUTOSEL 6.18] HID: quirks: add HID_QUIRK_ALWAYS_POLL for 8BitDo Pro 3
From: Sasha Levin @ 2026-04-20 13:22 UTC
  To: patches, stable
  Cc: leo vriska, Jiri Kosina, Sasha Levin, jikos, bentiss, linux-input,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: leo vriska <leo@60228.dev>

[ Upstream commit 532743944324a873bbaf8620fcabcd0e69e30c36 ]

According to a mailing list report [1], this controller's predecessor
has the same issue. However, it uses the xpad driver instead of HID, so
this quirk wouldn't apply.

[1]: https://lore.kernel.org/linux-input/unufo3$det$1@ciao.gmane.io/

Signed-off-by: leo vriska <leo@60228.dev>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 drivers/hid/hid-ids.h    | 3 +++
 drivers/hid/hid-quirks.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index d9d354f1b8847..a245928933454 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -22,6 +22,9 @@
 #define USB_DEVICE_ID_3M2256		0x0502
 #define USB_DEVICE_ID_3M3266		0x0506
 
+#define USB_VENDOR_ID_8BITDO		0x2dc8
+#define USB_DEVICE_ID_8BITDO_PRO_3	0x6009
+
 #define USB_VENDOR_ID_A4TECH		0x09da
 #define USB_DEVICE_ID_A4TECH_WCP32PU	0x0006
 #define USB_DEVICE_ID_A4TECH_X5_005D	0x000a
diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c
index 3217e436c052c..f6be3ffee0232 100644
--- a/drivers/hid/hid-quirks.c
+++ b/drivers/hid/hid-quirks.c
@@ -25,6 +25,7 @@
  */
 
 static const struct hid_device_id hid_quirks[] = {
+	{ HID_USB_DEVICE(USB_VENDOR_ID_8BITDO, USB_DEVICE_ID_8BITDO_PRO_3), HID_QUIRK_ALWAYS_POLL },
 	{ HID_USB_DEVICE(USB_VENDOR_ID_AASHIMA, USB_DEVICE_ID_AASHIMA_GAMEPAD), HID_QUIRK_BADPAD },
 	{ HID_USB_DEVICE(USB_VENDOR_ID_AASHIMA, USB_DEVICE_ID_AASHIMA_PREDATOR), HID_QUIRK_BADPAD },
 	{ HID_USB_DEVICE(USB_VENDOR_ID_ADATA_XPG, USB_VENDOR_ID_ADATA_XPG_WL_GAMING_MOUSE), HID_QUIRK_ALWAYS_POLL },
-- 
2.53.0


* [PATCH AUTOSEL 6.18] ALSA: hda/realtek: Add quirk for ASUS ROG Flow Z13-KJP GZ302EAC
From: Sasha Levin @ 2026-04-20 13:22 UTC
  To: patches, stable
  Cc: Matthew Schwartz, Takashi Iwai, Sasha Levin, perex, tiwai,
	linux-sound, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Matthew Schwartz <matthew.schwartz@linux.dev>

[ Upstream commit 59f68dc1d8df3142cb58fd2568966a9bb7b0ed8a ]

Fixes lack of audio output on the ASUS ROG Flow Z13-KJP GZ302EAC model,
similar to the ASUS ROG Flow Z13 GZ302EA.

Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Link: https://patch.msgid.link/20260313172503.285846-1-matthew.schwartz@linux.dev
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 sound/hda/codecs/realtek/alc269.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index 4b06cb48252e2..1959adb6c5189 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -7065,6 +7065,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
 	SND_PCI_QUIRK(0x1043, 0x14e3, "ASUS G513PI/PU/PV", ALC287_FIXUP_CS35L41_I2C_2),
 	SND_PCI_QUIRK(0x1043, 0x14f2, "ASUS VivoBook X515JA", ALC256_FIXUP_ASUS_MIC_NO_PRESENCE),
 	SND_PCI_QUIRK(0x1043, 0x1503, "ASUS G733PY/PZ/PZV/PYV", ALC287_FIXUP_CS35L41_I2C_2),
+	SND_PCI_QUIRK(0x1043, 0x1514, "ASUS ROG Flow Z13 GZ302EAC", ALC287_FIXUP_CS35L41_I2C_2),
 	SND_PCI_QUIRK(0x1043, 0x1517, "Asus Zenbook UX31A", ALC269VB_FIXUP_ASUS_ZENBOOK_UX31A),
 	SND_PCI_QUIRK(0x1043, 0x1533, "ASUS GV302XA/XJ/XQ/XU/XV/XI", ALC287_FIXUP_CS35L41_I2C_2),
 	SND_PCI_QUIRK(0x1043, 0x1573, "ASUS GZ301VV/VQ/VU/VJ/VA/VC/VE/VVC/VQC/VUC/VJC/VEC/VCC", ALC285_FIXUP_ASUS_HEADSET_MIC),
-- 
2.53.0


* [PATCH AUTOSEL 6.18] ALSA: hda/realtek: add HP Laptop 15-fd0xxx mute LED quirk
From: Sasha Levin @ 2026-04-20 13:22 UTC
  To: patches, stable
  Cc: Kshamendra Kumar Mishra, Takashi Iwai, Sasha Levin, perex, tiwai,
	linux-sound, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Kshamendra Kumar Mishra <kshamendrakumarmishra@gmail.com>

[ Upstream commit faceb5cf5d7a08f4a40335d22d833bb75f05d99e ]

HP Laptop 15-fd0xxx with ALC236 codec does not handle the toggling of
the mute LED.
This patch adds a quirk entry for subsystem ID 0x8dd7 using
ALC236_FIXUP_HP_MUTE_LED_COEFBIT2 fixup, enabling correct mute LED
behavior.

Signed-off-by: Kshamendra Kumar Mishra <kshamendrakumarmishra@gmail.com>
Link: https://patch.msgid.link/DHAB51ISUM96.2K9SZIABIDEQ0@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 sound/hda/codecs/realtek/alc269.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index c782a35f9239d..0c975005793e7 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6977,6 +6977,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
 	SND_PCI_QUIRK(0x103c, 0x8da7, "HP 14 Enstrom OmniBook X", ALC287_FIXUP_CS35L41_I2C_2),
 	SND_PCI_QUIRK(0x103c, 0x8da8, "HP 16 Piston OmniBook X", ALC287_FIXUP_CS35L41_I2C_2),
 	SND_PCI_QUIRK(0x103c, 0x8dd4, "HP EliteStudio 8 AIO", ALC274_FIXUP_HP_AIO_BIND_DACS),
+	SND_PCI_QUIRK(0x103c, 0x8dd7, "HP Laptop 15-fd0xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2),
 	SND_PCI_QUIRK(0x103c, 0x8de8, "HP Gemtree", ALC245_FIXUP_TAS2781_SPI_2),
 	SND_PCI_QUIRK(0x103c, 0x8de9, "HP Gemtree", ALC245_FIXUP_TAS2781_SPI_2),
 	SND_PCI_QUIRK(0x103c, 0x8dec, "HP EliteBook 640 G12", ALC236_FIXUP_HP_GPIO_LED),
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] ALSA: usb-audio: Add quirk flags for Feaulle Rainbow
From: Sasha Levin @ 2026-04-20 13:22 UTC
  To: patches, stable
  Cc: Rong Zhang, Takashi Iwai, Sasha Levin, perex, tiwai, linux-sound,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Rong Zhang <i@rong.moe>

[ Upstream commit 4f84e6caf38b05991b3b2afc0ddf4e48c2752d1d ]

Feaulle Rainbow is a wired USB-C dynamic in-ear monitor (IEM) featuring
active noise cancellation (ANC).

The supported sample rates are 48000Hz and 96000Hz at 16bit or 24bit,
but it does not support reading the current sample rate and results in
an error message printed to kmsg. Set QUIRK_FLAG_GET_SAMPLE_RATE to skip
the sample rate check.

Its playback mixer reports val = -15360/0/128. Setting -15360 (-60dB)
mutes the playback, so QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE is needed.

Add a quirk table entry matching VID/PID=0x0e0b/0xfa01 and applying
the mentioned quirk flags, so that it can work properly.

Quirky device sample:

  usb 7-1: New USB device found, idVendor=0e0b, idProduct=fa01, bcdDevice= 1.00
  usb 7-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
  usb 7-1: Product: Feaulle Rainbow
  usb 7-1: Manufacturer: Generic
  usb 7-1: SerialNumber: 20210726905926

Signed-off-by: Rong Zhang <i@rong.moe>
Link: https://patch.msgid.link/20260409-feaulle-rainbow-v1-1-09179e09000d@rong.moe
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed. Let me compile the complete
analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem**: `ALSA: usb-audio`
- **Action verb**: "Add" (quirk flags)
- **Summary**: Adds USB audio quirk table entry for the Feaulle Rainbow
  IEM device (VID 0x0e0b, PID 0xfa01)

### Step 1.2: Tags
- **Signed-off-by**: Rong Zhang `<i@rong.moe>` (author)
- **Signed-off-by**: Takashi Iwai `<tiwai@suse.de>` (ALSA subsystem
  maintainer - merged it)
- **Link**:
  `https://patch.msgid.link/20260409-feaulle-rainbow-v1-1-09179e09000d@rong.moe`
- No Fixes: tag (expected for quirk additions)
- No Reported-by: (author is the device user/tester)
- No Cc: stable (expected; that's why we're reviewing)

### Step 1.3: Commit Body Analysis
Two real issues described:
1. Device does not support reading current sample rate, producing error
   messages in kmsg. `QUIRK_FLAG_GET_SAMPLE_RATE` skips that unsupported
   operation.
2. Device's playback mixer reports val = -15360/0/128 where -15360
   (-60dB) mutes playback, but the driver treats it as minimum volume,
   not mute. `QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE` correctly treats the
   minimum as mute.

The commit includes USB enumeration output proving the device exists and
has been tested.
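
The -15360 figure in issue 2 can be checked with a little arithmetic. The
sketch below assumes that the three numbers in `val = -15360/0/128` are
min/max/resolution and that raw values use the USB Audio Class unit of
1/256 dB; the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>

/* USB Audio Class volume values are expressed in units of 1/256 dB. */
#define UAC_UNITS_PER_DB 256

/* Convert a raw volume value to hundredths of a dB. */
static int uac_raw_to_centi_db(int raw)
{
	return raw * 100 / UAC_UNITS_PER_DB;
}

/* Number of resolution-sized steps between the minimum and maximum. */
static int uac_step_count(int min, int max, int res)
{
	assert(res > 0 && max >= min);
	return (max - min) / res;
}
```

Under those assumptions, `uac_raw_to_centi_db(-15360)` gives -6000, i.e. the
-60dB quoted in the commit message, and a resolution of 128 corresponds to
0.5dB per step.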

### Step 1.4: Hidden Bug Fix Detection
This is an explicit hardware quirk addition. It fixes incorrect device
behavior without needing the word "fix" — the device doesn't work
properly without these flags.

Record: This is a hardware workaround, a well-known exception category
for stable.

---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory
- **Files changed**: 1 (`sound/usb/quirks.c`)
- **Lines added**: 2
- **Lines removed**: 0
- **Scope**: Single table entry addition; purely data, no logic changes

### Step 2.2: Code Flow Change
The diff adds a single `DEVICE_FLG()` entry to the `quirk_flags_table[]`
sorted array:

```c
DEVICE_FLG(0x0e0b, 0xfa01, /* Feaulle Rainbow */
	   QUIRK_FLAG_GET_SAMPLE_RATE | QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE),
```

Inserted in VID-sorted order between 0x0d8c and 0x0ecb entries.

### Step 2.3: Bug Mechanism
Category (h): **Hardware workarounds**. This is a device ID + quirk
flags addition to an existing quirk table. The flags are
well-established:
- `QUIRK_FLAG_GET_SAMPLE_RATE`: Causes `clock.c` to skip the unsupported
  get-sample-rate call
- `QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE`: Causes `mixer_quirks.c` to set
  `cval->min_mute = 1` for playback

### Step 2.4: Fix Quality
- Obviously correct: adds a table entry matching one VID/PID pair
- Minimal: 2 lines, data-only
- Zero regression risk: only affects this specific USB device
  (0x0e0b:0xfa01)
- Signed off by Takashi Iwai (ALSA maintainer)

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
The `quirk_flags_table` was introduced in commit `4d4dee0aefec3`
(2021-07-29, v5.15). The table structure has been stable for years with
entries regularly added.

### Step 3.2: Fixes Tag
No Fixes: tag — expected for a quirk/device-ID addition. Not a
regression fix; it's new hardware enablement.

### Step 3.3: File History
Recent commits to `sound/usb/quirks.c` are dominated by similar quirk
additions (Scarlett, NeuralDSP, AB17X, SPACETOUCH, etc.). This is a
well-trodden pattern.

### Step 3.4: Author
Rong Zhang is the device owner/user. The patch was accepted and merged
by Takashi Iwai, the ALSA subsystem maintainer, which is strong
validation.

### Step 3.5: Dependencies
- `QUIRK_FLAG_GET_SAMPLE_RATE`: existed since v5.15
- `QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE`: renamed from
  `QUIRK_FLAG_MIXER_MIN_MUTE` in v6.18 (commit `ace1817ab49b3`). Stable
  trees <6.18 would need the old flag name, which is a trivial one-word
  substitution.
- No other dependencies.

---

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1-4.5
b4 dig could not match the commit (it is not yet present in the local
tree), and lore.kernel.org was behind anti-bot protection. The Link in
the commit message
(`patch.msgid.link/20260409-feaulle-rainbow-v1-1-09179e09000d@rong.moe`)
confirms this is v1, patch 1/1: a standalone single-patch submission. It
was merged quickly by Takashi Iwai, indicating no review concerns.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1-5.5
The `quirk_flags_table` is consulted via
`snd_usb_init_quirk_flags_table()` during USB audio device
initialization. The function iterates the table, matches by USB ID, and
sets `chip->quirk_flags`. These flags are then checked in:
- `sound/usb/clock.c` (line ~490): if `QUIRK_FLAG_GET_SAMPLE_RATE` is
  set, skip reading sample rate → prevents error messages
- `sound/usb/mixer_quirks.c` (line ~4649): if
  `QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE`, set `cval->min_mute = 1` → makes
  the minimum volume level act as mute

Both code paths are well-exercised by the ~139 existing `DEVICE_FLG`
entries in the table.
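
The lookup described above can be sketched as a small userspace model. Every
name here (`QF_*`, `MAKE_USB_ID`, `quirk_entry`, `lookup_quirk_flags`) is
hypothetical, and the flag bit values are illustrative only; the real
definitions live in the `sound/usb/` sources and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits; not the kernel's actual values. */
#define QF_GET_SAMPLE_RATE		(1u << 0)
#define QF_MIXER_PLAYBACK_MIN_MUTE	(1u << 1)
#define QF_FIXED_RATE			(1u << 2)

/* Pack vendor and product IDs into one 32-bit key. */
#define MAKE_USB_ID(vid, pid)	(((uint32_t)(vid) << 16) | (uint32_t)(pid))

struct quirk_entry {
	uint32_t id;	/* packed VID/PID */
	uint32_t flags;	/* OR of QF_* bits */
};

static const struct quirk_entry quirk_table[] = {
	/* Feaulle Rainbow */
	{ MAKE_USB_ID(0x0e0b, 0xfa01),
	  QF_GET_SAMPLE_RATE | QF_MIXER_PLAYBACK_MIN_MUTE },
	/* JBL Quantum610 Wireless */
	{ MAKE_USB_ID(0x0ecb, 0x205c), QF_FIXED_RATE },
};

/*
 * Walk the table and report the flags for a matching device.
 * Returns false and stores 0 when the device has no entry.
 */
static bool lookup_quirk_flags(uint16_t vid, uint16_t pid, uint32_t *flags_out)
{
	uint32_t id = MAKE_USB_ID(vid, pid);
	size_t i;

	assert(flags_out != NULL);

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].id == id) {
			*flags_out = quirk_table[i].flags;
			return true;
		}
	}
	*flags_out = 0;
	return false;
}
```

Because a device without an entry simply gets no flags, adding one row affects
only the matching VID/PID, which is the basis for the low-risk assessment in
this analysis.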

---

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1
The `quirk_flags_table` exists in all active stable trees since v5.15.
Both quirk flags exist (though `MIXER_PLAYBACK_MIN_MUTE` was called
`MIXER_MIN_MUTE` before v6.18).

### Step 6.2: Backport Complications
- For 6.18+ and 7.0: clean apply
- For 6.6.y, 6.1.y, 5.15.y: trivial flag rename needed
  (`QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE` → `QUIRK_FLAG_MIXER_MIN_MUTE`)
- Surrounding table context may differ slightly (nearby entries may be
  absent), but this is a simple insertion.

---

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1
- **Subsystem**: Sound / USB audio (`sound/usb/`)
- **Criticality**: IMPORTANT — USB audio is widely used (headsets, DACs,
  IEMs, webcams)

### Step 7.2
Very active subsystem — frequent quirk additions, well-maintained by
Takashi Iwai.

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Affected Users
Users who own the Feaulle Rainbow USB-C IEM (VID 0x0e0b, PID 0xfa01).

### Step 8.2: Trigger Conditions
Triggers every time the device is plugged in. Without the quirk:
- Error messages appear in kmsg (sample rate read failure)
- Minimum playback volume is not treated as mute, causing incorrect
  volume behavior

### Step 8.3: Failure Mode Severity
- Without quirk: error messages in kernel log + incorrect audio mixer
  behavior (LOW-MEDIUM severity)
- With quirk: device works properly

### Step 8.4: Risk-Benefit
- **Benefit**: Makes a real USB audio device work correctly for its
  users
- **Risk**: Essentially zero — 2-line data-only addition, scoped to a
  single VID/PID, uses existing well-tested quirk infrastructure

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Summary
**FOR backporting:**
- Classic USB audio hardware quirk — explicitly listed as an exception
  category (always YES)
- 2 lines, data-only change, zero logic changes
- Uses well-established quirk flags present in all active stable trees
- Merged by ALSA subsystem maintainer Takashi Iwai
- Fixes real device behavior (error messages + incorrect volume
  handling)
- Zero regression risk (only affects this specific device)
- Identical pattern to dozens of previously backported quirk additions

**AGAINST backporting:**
- None

### Step 9.2: Stable Rules Checklist
1. Obviously correct? **YES** — trivial table entry
2. Fixes a real bug? **YES** — device doesn't work properly without it
3. Important issue? **YES** — hardware enablement quirk
4. Small and contained? **YES** — 2 lines, 1 file
5. No new features or APIs? **YES** — no new features
6. Can apply to stable? **YES** — trivially, with minor flag rename for
   <6.18

### Step 9.3: Exception Category
**Hardware quirk/workaround** — this is the textbook example of a
stable-appropriate quirk addition.

---

## Verification

- [Phase 1] Parsed tags: Signed-off-by Rong Zhang (author) and Takashi
  Iwai (maintainer), Link to patch.msgid.link
- [Phase 2] Diff analysis: 2 lines added to `quirk_flags_table[]` in
  `sound/usb/quirks.c`, single DEVICE_FLG entry
- [Phase 2] Verified both flags exist: `QUIRK_FLAG_GET_SAMPLE_RATE`
  checked at `clock.c:490`, `QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE` checked
  at `mixer_quirks.c:4649`
- [Phase 3] git blame: quirk_flags_table introduced in commit
  4d4dee0aefec3 (2021-07-29, v5.15)
- [Phase 3] git tag: `QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE` (renamed)
  present from v6.18+; older trees use `QUIRK_FLAG_MIXER_MIN_MUTE`
- [Phase 3] git log: recent quirks.c history shows regular quirk
  additions (identical pattern)
- [Phase 3] Author Rong Zhang has no prior commits in sound/usb/ — they
  are the device owner
- [Phase 4] b4 dig failed (commit not in local tree). Lore blocked by
  anti-bot. Patch is v1, 1/1 (standalone, no series dependencies).
- [Phase 5] `DEVICE_FLG` count: 139 existing entries in the table —
  well-established pattern
- [Phase 6] quirk_flags_table exists since v5.15 — present in all active
  stable trees
- [Phase 8] Risk: zero — data-only, single-device-scoped, well-tested
  infrastructure
- UNVERIFIED: Could not access lore.kernel.org discussion thread due to
  anti-bot protection. This does not affect the decision since the
  commit is a straightforward quirk addition signed off by the
  maintainer.

**YES**

 sound/usb/quirks.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
index d3a69995c1ad5..e95a228def2f0 100644
--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -2281,6 +2281,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
 		   QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE),
 	DEVICE_FLG(0x0d8c, 0x0014, /* C-Media */
 		   QUIRK_FLAG_CTL_MSG_DELAY_1M | QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE),
+	DEVICE_FLG(0x0e0b, 0xfa01, /* Feaulle Rainbow */
+		   QUIRK_FLAG_GET_SAMPLE_RATE | QUIRK_FLAG_MIXER_PLAYBACK_MIN_MUTE),
 	DEVICE_FLG(0x0ecb, 0x205c, /* JBL Quantum610 Wireless */
 		   QUIRK_FLAG_FIXED_RATE),
 	DEVICE_FLG(0x0ecb, 0x2069, /* JBL Quantum810 Wireless */
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.1] ext4: unmap invalidated folios from page tables in mpage_release_unused_pages()
From: Sasha Levin @ 2026-04-20 13:22 UTC
  To: patches, stable
  Cc: Deepanshu Kartikey, syzbot+b0a0670332b6b3230a0a, Matthew Wilcox,
	Theodore Ts'o, Sasha Levin, linux-ext4, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Deepanshu Kartikey <kartikey406@gmail.com>

[ Upstream commit 9b25f381de6b8942645f43735cb0a4fb0ab3a6d1 ]

When delayed block allocation fails (e.g., due to filesystem corruption
detected in ext4_map_blocks()), the writeback error handler calls
mpage_release_unused_pages(invalidate=true) which invalidates affected
folios by clearing their uptodate flag via folio_clear_uptodate().

However, these folios may still be mapped in process page tables. If a
subsequent operation (such as ftruncate calling ext4_block_truncate_page)
triggers a write fault, the existing page table entry allows access to
the now-invalidated folio. This leads to ext4_page_mkwrite() being called
with a non-uptodate folio, which then gets marked dirty, triggering:

    WARNING: CPU: 0 PID: 5 at mm/page-writeback.c:2960
    __folio_mark_dirty+0x578/0x880

    Call Trace:
     fault_dirty_shared_page+0x16e/0x2d0
     do_wp_page+0x38b/0xd20
     handle_pte_fault+0x1da/0x450

The sequence leading to this warning is:

1. Process writes to mmap'd file, folio becomes uptodate and dirty
2. Writeback begins, but delayed allocation fails due to corruption
3. mpage_release_unused_pages(invalidate=true) is called:
   - block_invalidate_folio() clears dirty flag
   - folio_clear_uptodate() clears uptodate flag
   - But folio remains mapped in page tables
4. Later, ftruncate triggers ext4_block_truncate_page()
5. This causes a write fault on the still-mapped folio
6. ext4_page_mkwrite() is called with folio that is !uptodate
7. block_page_mkwrite() marks buffers dirty
8. fault_dirty_shared_page() tries to mark folio dirty
9. block_dirty_folio() calls __folio_mark_dirty(warn=1)
10. WARNING triggers: WARN_ON_ONCE(warn && !uptodate && !dirty)

Fix this by unmapping folios from page tables before invalidating them
using unmap_mapping_pages(). This ensures that subsequent accesses
trigger new page faults rather than reusing invalidated folios through
stale page table entries.

Note that this results in data loss for any writes to the mmap'd region
that couldn't be written back, but this is expected behavior when
writeback fails due to filesystem corruption. The existing error message
already states "This should not happen!! Data will be lost".

Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com
Tested-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Link: https://patch.msgid.link/20251205055914.1393799-1-kartikey406@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have sufficient information to complete the forensic analysis.

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1 — Subject line parse**
- Record: Subsystem `ext4:`; action verb `unmap` (describes the fix
  mechanism, but semantically a fix); one-line summary: invalidated
  folios in the writeback error path must also be unmapped from user
  page tables to avoid a `__folio_mark_dirty` WARN.

**Step 1.2 — Commit message tags**
- Record:
  - `Reported-by: syzbot+b0a0670332b6b3230a0a@syzkaller.appspotmail.com`
  - `Tested-by: syzbot+...@syzkaller.appspotmail.com` (auto-test bot
    confirmed fix)
  - `Closes:
    https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a`
  - `Suggested-by: Matthew Wilcox <willy@infradead.org>` (MM maintainer)
  - `Signed-off-by: Deepanshu Kartikey` + `Signed-off-by: Theodore Ts'o`
    (ext4 maintainer)
  - `Link:` to lore message id
  - No Fixes:, no Cc: stable (expected – that's why this is up for
    review)

**Step 1.3 — Commit body analysis**
- Record: Very detailed 10-step reproduction flow. The author identifies
  the exact sequence: mmap write → dirty/uptodate folio → delayed-alloc
  failure (e.g., corruption) →
  `mpage_release_unused_pages(invalidate=true)` → folio invalidated but
  still mapped → later write fault (e.g., from
  `ext4_block_truncate_page()`) hits `ext4_page_mkwrite()` with
  `!uptodate` folio → `WARN_ON_ONCE(warn && !uptodate && !dirty)` fires
  in `__folio_mark_dirty()`. Author explicitly states this is not
  theoretical — syzbot has a C reproducer. Also notes data-loss is
  intentional/expected on writeback failure ("This should not happen!!
  Data will be lost" message is pre-existing).

**Step 1.4 — Hidden bug fix?**
- Record: Not hidden — the subject names the mechanism, and the body
  explicitly documents a WARN and a concrete syscall sequence. This is
  clearly a fix.

## PHASE 2: DIFF ANALYSIS

**Step 2.1 — Inventory**
- Record: 1 file changed (`fs/ext4/inode.c`), +15/-1 lines, all in
  `mpage_release_unused_pages()`. Single-file surgical fix, scope = very
  small.

**Step 2.2 — Code flow change**
- Record: Before: when `invalidate=true` and `folio_mapped(folio)` was
  true, we only `folio_clear_dirty_for_io(folio)` to clear the PTE-dirty
  bits (from 2016 commit `4e800c0359d9a`), then
  `block_invalidate_folio()` + `folio_clear_uptodate()`, and left the
  mapping in place. After: we additionally call
  `unmap_mapping_pages(folio->mapping, folio->index,
  folio_nr_pages(folio), false)` to tear the folio out of every
  process's page tables, so no stale PTE can resurface the
  now-invalidated folio.

**Step 2.3 — Bug mechanism classification**
- Record: Memory-safety / correctness in error path. Stale PTE pointing
  at an invalidated folio → `fault_dirty_shared_page()` reaches
  `__folio_mark_dirty()` with `!uptodate && !dirty`, firing a KERNEL
  WARN. It is a bug (WARN = kernel bug signal to syzbot) and also opens
  the door to suspicious follow-on state (dirty bits on a folio the
  filesystem has already written off).

**Step 2.4 — Fix quality**
- Record: Obvious and correct. `unmap_mapping_pages()` is the standard
  MM helper for exactly this purpose (also used by
  `invalidate_inode_pages2_range()` in mm/truncate.c). It runs only under
  `invalidate=true` — i.e., only on the writeback-failure path — so the
  runtime cost in the non-error case is zero. Very low regression risk:
  the worst case is forcing future access to re-fault, which is benign.

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1 — Blame**
- Record: The surrounding construct (`if (folio_mapped())
  folio_clear_dirty_for_io(...)`, then `block_invalidate_folio` +
  `folio_clear_uptodate`) was added by commit `4e800c0359d9a` ("ext4:
  bugfix for mmaped pages in mpage_release_unused_pages()"), released in
  v4.9-rc1 (2016). So the incomplete handling has existed since v4.9 —
  every current stable tree is affected.

**Step 3.2 — Fixes: tag**
- Record: No `Fixes:` tag is in the commit (expected — this is a
  candidate under review). The bug is logically introduced by
  `4e800c0359d9a` (v4.9), which is present in every active stable tree.

**Step 3.3 — File history**
- Record: Recent touches to `mpage_release_unused_pages()` include
  `d8be7607de039` (ext4: Move mpage_page_done() calls after error
  handling), `fb5a5be05fb45` (convert to filemap_get_folios),
  `a297b2fcee461` (unlock unused_pages timely). None address this
  specific stale-PTE issue. This patch is self-contained; not part of a
  series.

**Step 3.4 — Author**
- Record: `Deepanshu Kartikey` is a regular syzbot-driven contributor
  (many small fixes across ext4, gfs2, netfs, mac80211). Not the
  maintainer, but the commit was reviewed and applied by ext4 maintainer
  Theodore Ts'o.

**Step 3.5 — Dependencies**
- Record: Only depends on `unmap_mapping_pages()`, which exists since
  v4.16 (mm commit `977fbdcd5986c`) — verified present in every stable
  tree checked (5.10, 5.15, 6.1, 6.6, 6.12). No patch-series dependency.

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1 — Original submission**
- Record: `b4 dig -c 9b25f381...` resolved to the v3 thread at
  `https://lore.kernel.org/all/20251205055914.1393799-1-kartikey406@gmail.com/`.
  `b4 dig -a` shows this is v3 (earlier attempts v1/v2 tried to fix it
  in `ext4_page_mkwrite()` — see syzbot Discussions table linking
  `20251122015742.362444-1-...` and `20251121131305.332698-1-...`). The
  v3 approach was suggested by Matthew Wilcox and preferred by Ted Ts'o.
  Ted applied v3 directly with "Applied, thanks!" (mbox saved by b4
  shows `commit: 9b25f381de6b...`).

**Step 4.2 — Reviewers**
- Record: To/Cc from `b4 dig -w` includes `tytso@mit.edu` (ext4
  maintainer, applied the patch), `adilger.kernel@dilger.ca` (ext4
  co-maintainer), `willy@infradead.org` (MM maintainer, suggested the
  fix), `djwong@kernel.org`, `yi.zhang@huaweicloud.com`,
  `linux-ext4@vger.kernel.org`, `linux-kernel@vger.kernel.org`.
  Appropriate audience reviewed the change.

**Step 4.3 — Bug report**
- Record: Fetched
  https://syzkaller.appspot.com/bug?extid=b0a0670332b6b3230a0a. Syzbot
  has a C reproducer. First crash 254 days before fetch, last 5d ago.
  Label `Fix commit: 9b25f381de6b` confirms this commit closed the
  upstream bug. The sample crash shows `__folio_mark_dirty` WARN with
  call trace `block_dirty_folio → fault_dirty_shared_page → do_wp_page →
  handle_mm_fault → do_user_addr_fault` — exact match to the commit
  message. Linux-6.6 has a sibling report labeled `origin:lts-only` and
  linux-6.1 one labeled `missing-backport`, indicating stable trees
  still need a fix.

**Step 4.4 — Related patches**
- Record: This is a single-patch series (v3); v1/v2 were alternative
  approaches to the same bug, superseded. No dependent patches.

**Step 4.5 — Stable ML**
- Record: No explicit Cc: stable in the applied patch. Syzbot label
  `missing-backport` on 6.1 is effectively a public request for stable
  coverage of this bug.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1 — Functions in diff**
- Record: Only `mpage_release_unused_pages()` is modified.

**Step 5.2 — Callers**
- Record: Two call sites in `ext4_do_writepages()`:
  `mpage_release_unused_pages(mpd, false)` (normal completion, no
  invalidate) and `mpage_release_unused_pages(mpd, give_up_on_write)`
  (error path). The fix only triggers on the second (writeback-failure)
  path.

**Step 5.3 — Callees**
- Record: The fix adds `unmap_mapping_pages(folio->mapping,
  folio->index, folio_nr_pages(folio), false)`, the standard MM helper
  that tears down PTEs for the given pgoff range. The final `false` is
  the `even_cows` argument, so private COW copies are left in place.
  Existing callees: `folio_clear_dirty_for_io`, `block_invalidate_folio`,
  `folio_clear_uptodate`, `folio_unlock`.

**Step 5.4 — Call chain / reachability**
- Record: `ext4_do_writepages` is called from the ordinary writeback
  path (syscalls such as `fsync`, `sync`, `msync`, and
  memory-pressure-driven writeback). The `give_up_on_write=true` branch
  is taken when `ext4_map_blocks()` returns an error, e.g., on corruption
  detected by the extent tree. So an unprivileged user with a mmap of a
  corrupt ext4 image can trigger it, which is exactly what syzbot does.

**Step 5.5 — Similar patterns**
- Record: Related earlier fix in the same function — commit
  `4e800c0359d9a` from 2016 — covered the PTE-dirty bit but not the PTE
  itself. The new patch completes that earlier partial fix. The same
  philosophy (unmap before invalidating) is used by
  `truncate_inode_pages_range()` and `invalidate_inode_pages2_range()`
  in mm/truncate.c, so this brings ext4 in line with the mm convention.

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

**Step 6.1 — Code exists in stable**
- Record: Verified the vulnerable pattern exists:
  - `stable/linux-6.19.y`: `folio_mapped(folio) →
    folio_clear_dirty_for_io` without unmap ✓
  - `stable/linux-6.18.y`: same ✓
  - `stable/linux-6.17.y`: same ✓
  - `stable/linux-6.12.y`: same ✓
  - `stable/linux-6.6.y`: same ✓
  - `stable/linux-6.1.y`: same ✓
  - `stable/linux-5.15.y`, `5.10.y`: same logic but pre-folio
    (`page_mapped(page) → clear_page_dirty_for_io`) — needs port to page
    API.

**Step 6.2 — Backport complications**
- Record: For 6.1..6.19 the hunk is effectively identical and should
  apply cleanly or with trivial offsets. For 5.15/5.10, the patch must
  be re-expressed using `unmap_mapping_pages(page->mapping, page->index,
  compound_nr(page), false)`, with a count of `1` for non-compound
  pages. `unmap_mapping_pages()` itself has been available since v4.16,
  so it is present in all these trees.

**Step 6.3 — Already fixed?**
- Record: `git log --grep="unmap invalidated folios"` in
  `stable/linux-6.1/6.6/6.12/6.17/6.18/6.19` returned nothing. Not yet
  backported.

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1 — Subsystem**
- Record: `fs/ext4/` — one of the most widely deployed filesystems.
  Criticality: IMPORTANT (affects a large population of users,
  especially enterprise and Android).

**Step 7.2 — Activity**
- Record: ext4/inode.c is very actively maintained; the specific
  `mpage_release_unused_pages()` function has had targeted fixes before
  (2016, 2024). Writeback error path is exercised any time delayed
  allocation fails.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1 — Affected users**
- Record: Any user of ext4 who has a mmapped file where delayed block
  allocation fails (FS corruption, ENOSPC under certain delalloc
  conditions, etc.). Unprivileged users can trigger it with a
  crafted/corrupt image (syzbot proved this).

**Step 8.2 — Trigger conditions**
- Record: Mmap a file on ext4, dirty it, then force writeback to fail
  (syzbot does this with a corrupt FS image). A concrete C reproducer
  exists and still crashes unpatched 6.6.y as of ~5 days ago.

**Step 8.3 — Failure mode / severity**
- Record: Kernel WARN (`WARN_ON_ONCE(warn && !uptodate && !dirty)`),
  plus the page stays accessible via stale PTEs after invalidation. On
  systems with `panic_on_warn`, this is a kernel panic (DoS). Even
  without panic_on_warn, the invariant violation signals a genuine
  state-machine bug and can mislead subsequent writeback/truncate logic.
  Severity: MEDIUM-HIGH (WARN / potential DoS / invariant violation; a
  security-relevant WARN class that syzbot tracks specifically).

**Step 8.4 — Risk-benefit**
- Record: Benefit — closes a syzbot-tracked bug with public C
  reproducer, stops WARN/panic on corrupt FS workloads, on a core
  filesystem. Risk — fix is 15 lines, only executes in the writeback-
  error path, uses a well-understood MM API, reviewed by MM + ext4
  maintainers, and has syzbot `Tested-by`. Ratio strongly favors
  backporting.

## PHASE 9: SYNTHESIS

**Step 9.1 — Evidence**
- For: syzbot-reported + reproducible, WARN on a core filesystem, tiny
  surgical change, reviewed by the MM maintainer who suggested it and
  applied by the ext4 maintainer, Tested-by syzbot, the buggy code
  exists in every active stable tree, sibling syzbot reports on 6.1.y
  and 6.6.y explicitly labeled `missing-backport` / `origin:lts-only`,
  dependency `unmap_mapping_pages()` is present since v4.16.
- Against: No `Cc: stable` tag (not a decisive signal — this is
  precisely why it's under review). For 5.10/5.15 a minor port from
  folio to page API is needed.
- Unresolved: None that would flip the decision.

**Step 9.2 — Stable rules checklist**
1. Obviously correct and tested — YES (syzbot Tested-by; straightforward
   mm API use).
2. Fixes a real bug — YES (WARN with C reproducer).
3. Important issue — YES (kernel WARN/potential panic, invariant
   violation in core filesystem).
4. Small and contained — YES (+15/-1 in one function, error path only).
5. No new features — YES.
6. Can apply to stable — YES for 6.1+ cleanly; minor hand-port for
   5.15/5.10.

**Step 9.3 — Exception category**
- Not a device-ID/quirk/DT case; this is a straightforward bug fix that
  qualifies on its own merits.

**Step 9.4 — Decision**
- Backport.

## Verification
- [Phase 1] Parsed tags from the commit body: `Reported-by:
  syzbot+b0a0670332b6b3230a0a@...`, `Tested-by: syzbot+...`,
  `Suggested-by: Matthew Wilcox`, `Signed-off-by: Theodore Ts'o`,
  `Closes: syzkaller URL`. No Fixes: or Cc: stable (expected for
  candidates).
- [Phase 2] Ran `git show 9b25f381de6b...` — confirmed diff is +15/-1 in
  `fs/ext4/inode.c`, only inside `mpage_release_unused_pages()`'s `if
  (invalidate)` block, adds `unmap_mapping_pages(folio->mapping,
  folio->index, folio_nr_pages(folio), false)`.
- [Phase 3] `git log --oneline --grep="mpage_release_unused_pages"` —
  found 8 historical touches including the 2016 partial fix
  `4e800c0359d9a` ("ext4: bugfix for mmaped pages..."). `git describe
  --contains 4e800c0359d9a` → v4.9-rc1 — confirms the vulnerable
  construct has been in stable trees since v4.9.
- [Phase 3] Confirmed no Fixes: tag in commit; logical predecessor is
  `4e800c0359d9a`.
- [Phase 3] `git log --author="Deepanshu Kartikey"` — author is a
  syzbot-focused contributor with many accepted small fixes across
  subsystems.
- [Phase 4] `b4 dig -c 9b25f381de6b...` returned the v3 submission URL
  `https://lore.kernel.org/all/20251205055914.1393799-1-kartikey406@gmail.com/`.
- [Phase 4] `b4 dig -c ... -a` showed this is v3; earlier v1/v2 took a
  different (rejected) approach in `ext4_page_mkwrite()`.
- [Phase 4] `b4 dig -c ... -w` confirmed willy, tytso, adilger, djwong,
  yi.zhang, linux-ext4 were CC'd and reviewed.
- [Phase 4] `b4 dig -c ... -m` and read the mbox — Ted Ts'o applied v3
  with "Applied, thanks!", commit `9b25f381de6b`.
- [Phase 4] Fetched syzkaller URL: confirmed public C reproducer, `Fix
  commit: 9b25f381de6b`, first crashed 254 days ago and last seen
  5 days ago on unpatched trees. Sibling bugs `a92b613efd5e` (linux-6.1,
  label `missing-backport`) and `d429f1fb4bc9` (linux-6.6, label
  `origin:lts-only`) indicate stable trees still need the fix.
- [Phase 5] Manually traced: only two call sites in
  `ext4_do_writepages()`, the patched branch only hits the
  `give_up_on_write` error path. Confirmed `unmap_mapping_pages` is used
  elsewhere in mm/ with same pattern (truncate/invalidate).
- [Phase 6] Read `fs/ext4/inode.c` in stable/linux-6.1.y, 6.6.y, 6.12.y,
  6.17.y, 6.18.y, 6.19.y — all contain the unchanged vulnerable `if
  (folio_mapped(folio)) folio_clear_dirty_for_io(folio);
  block_invalidate_folio(...); folio_clear_uptodate(...);` pattern. Read
  5.10.y/5.15.y — contain the pre-folio `page_mapped(page) /
  clear_page_dirty_for_io / block_invalidatepage / ClearPageUptodate`
  pattern (needs small port).
- [Phase 6] `git log stable/linux-6.{1,6,12,17,18,19}.y --grep="unmap
  invalidated folios"` — none contain this fix yet.
- [Phase 6] Confirmed `unmap_mapping_pages` is defined in `mm/memory.c`
  of every stable tree checked; `git describe --contains 977fbdcd5986c`
  = v4.16-rc1.
- [Phase 7] `fs/ext4/` is a core, widely-used filesystem;
  `mpage_release_unused_pages()` touched by multiple targeted fixes over
  the years.
- [Phase 8] Failure mode verified from the sample syzbot crash:
  `block_dirty_folio → fault_dirty_shared_page → do_wp_page →
  handle_mm_fault → exc_page_fault`, exactly matching the commit
  message's description.
- UNVERIFIED: I did not attempt an actual patch apply against
  5.10.y/5.15.y — expect a minor manual port (folio→page API), not a
  semantic change.

The fix is small, surgical, obviously correct, syzbot-reported with a
reproducer, reviewed by ext4+MM maintainers, touches only the writeback-
error path, and the vulnerable code pattern is present in every active
stable tree. It satisfies all stable-kernel rules with very low
regression risk.

**YES**

 fs/ext4/inode.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1123d995494b5..025ea8f0c41bd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1757,8 +1757,22 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd,
 			BUG_ON(!folio_test_locked(folio));
 			BUG_ON(folio_test_writeback(folio));
 			if (invalidate) {
-				if (folio_mapped(folio))
+				if (folio_mapped(folio)) {
 					folio_clear_dirty_for_io(folio);
+					/*
+					 * Unmap folio from page
+					 * tables to prevent
+					 * subsequent accesses through
+					 * stale PTEs. This ensures
+					 * future accesses trigger new
+					 * page faults rather than
+					 * reusing the invalidated
+					 * folio.
+					 */
+					unmap_mapping_pages(folio->mapping,
+						folio->index,
+						folio_nr_pages(folio), false);
+				}
 				block_invalidate_folio(folio, 0,
 						folio_size(folio));
 				folio_clear_uptodate(folio);
-- 
2.53.0


^ permalink raw reply related

* [PATCH AUTOSEL 7.0-6.6] ALSA: hda/realtek: Add quirk for HP Spectre x360 14-ea
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: songxiebing, dzidmail, Takashi Iwai, Sasha Levin, perex, tiwai,
	linux-sound, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: songxiebing <songxiebing@kylinos.cn>

[ Upstream commit 882321ccaeea52dd645dff98bfea2f92b286e673 ]

HP Spectre x360 Convertible 14-ea0xxx (2021 model or so)
doesn't produce sound: the Bang & Olufsen speaker amplifier
is not enabled.

Root cause:
The PCI subsystem ID is 103c:0000 (HP left it unset), while the codec
subsystem ID is 103c:885b. The vendor-wide catch-all
SND_PCI_QUIRK_VENDOR(0x103c, "HP", ALC269_FIXUP_HP_MUTE_LED) matches
103c:0000 before the codec SSID fallback is reached, so
ALC245_FIXUP_HP_X360_AMP never applies.

So add the quirk in alc269_fixup_tbl.

Reported-by: dzidmail <dzidmail@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221341
Signed-off-by: songxiebing <songxiebing@kylinos.cn>
Link: https://patch.msgid.link/20260413011854.96520-1-songxiebing@kylinos.cn
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `ALSA: hda/realtek`
- Action: "Add quirk"
- Summary: Adding a codec SSID quirk for HP Spectre x360 14-ea to enable
  its speaker amplifier.
- Record: [ALSA/HDA/Realtek] [Add] [Hardware quirk for HP Spectre x360
  14-ea speaker amp]

**Step 1.2: Tags**
- Reported-by: dzidmail <dzidmail@gmail.com> — a real user reporting the
  issue
- Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221341 — filed bug
  report
- Signed-off-by: songxiebing <songxiebing@kylinos.cn> — author, a
  recurring HDA contributor
- Link: patch.msgid.link — lore submission link
- Signed-off-by: Takashi Iwai <tiwai@suse.de> — the HDA subsystem
  maintainer applied it
- Record: User-reported bug with bugzilla tracker. HDA maintainer
  Takashi Iwai merged it directly.

**Step 1.3: Commit Body**
- Bug: HP Spectre x360 14-ea (2021 model) produces no sound. Bang &
  Olufsen speaker amplifier is not enabled.
- Root cause explained clearly: PCI subsystem ID is `103c:0000` (HP left
  it unset). The vendor catch-all `SND_PCI_QUIRK_VENDOR(0x103c, "HP",
  ALC269_FIXUP_HP_MUTE_LED)` matches first because it checks PCI SSID,
  preventing the codec SSID fallback from ever reaching
  `ALC245_FIXUP_HP_X360_AMP`.
- Fix: Use `HDA_CODEC_QUIRK(0x103c, 0x885b, ...)` which sets
  `match_codec_ssid=true`, causing matching against codec SSID
  `103c:885b` in the primary loop, before vendor catch-all kicks in.
- Record: [No audio output] [Speaker amp not enabled] [Incorrect quirk
  applied due to unset PCI SSID]

**Step 1.4: Hidden Bug Fix?**
This is an explicit hardware quirk fix. Not hidden — it directly
addresses a broken hardware scenario. The commit explains the exact
mechanism.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- 1 file changed: `sound/hda/codecs/realtek/alc269.c`
- 1 line added: `HDA_CODEC_QUIRK(0x103c, 0x885b, "HP Spectre x360
  14-ea", ALC245_FIXUP_HP_X360_AMP),`
- Scope: Single-line surgical addition to an existing quirk table.
- Record: [+1 line in alc269_fixup_tbl quirk table] [Minimal scope]

**Step 2.2: Code Flow Change**
- Before: No entry for codec SSID `103c:885b`. The vendor catch-all
  applies `ALC269_FIXUP_HP_MUTE_LED`, which doesn't toggle the GPIO pin
  needed for the B&O speaker amp.
- After: `HDA_CODEC_QUIRK` with `match_codec_ssid=true` matches in the
  primary loop via codec SSID → `ALC245_FIXUP_HP_X360_AMP` applied →
  GPIO toggled → speaker amp enabled.

**Step 2.3: Bug Mechanism**
Category (h): Hardware workaround / codec quirk. The existing
`ALC245_FIXUP_HP_X360_AMP` fixup already exists and works for sibling
models (0x87f6, 0x87f7). This just adds the correct matching entry for a
model with an unset PCI SSID.

**Step 2.4: Fix Quality**
- Obviously correct: Uses the well-established `HDA_CODEC_QUIRK` pattern
  already present ~10 times in this same table.
- Minimal: Single table entry addition.
- Regression risk: Essentially zero. Only affects devices with codec
  SSID `103c:885b`.

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
The existing `ALC245_FIXUP_HP_X360_AMP` entries for sibling models
(0x87f6, 0x87f7) date back to commit `aeeb85f26c3bb` (2025-07-09 file
split), but originate from much earlier. The fixup function
`alc245_fixup_hp_x360_amp` exists at line 1448.

**Step 3.2: Fixes tag**
No Fixes: tag — expected for this type of quirk addition.

**Step 3.3: File History**
Recent history shows a steady stream of similar quirk additions (Lenovo
Yoga, Acer Swift, HP Laptop, Samsung, ASUS, Framework). This is routine
maintenance for this file.

**Step 3.4: Author**
songxiebing is a recurring HDA contributor with 4 other commits in this
tree. Patch was merged by Takashi Iwai, the HDA subsystem maintainer.

**Step 3.5: Dependencies**
No dependencies. All required infrastructure exists in the 7.0 tree:
- `HDA_CODEC_QUIRK` macro (verified in
  `sound/hda/common/hda_local.h:314-320`)
- `ALC245_FIXUP_HP_X360_AMP` fixup (line 4841-4846)
- `alc245_fixup_hp_x360_amp` function (line 1448)
- `match_codec_ssid` matching logic in `auto_parser.c`

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1-4.5:** Lore and bugzilla were unreachable due to anti-bot
protections. However, the commit message provides sufficient context:
- Bug reported via bugzilla.kernel.org (#221341)
- Patch submitted and applied within days by the subsystem maintainer
- The Link: tag confirms it went through normal mailing list review

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1-5.2:** The affected function `alc245_fixup_hp_x360_amp` (line
1448) toggles GPIO pin 0x01 to enable the speaker amplifier. This is
called during HDA codec initialization. Without this quirk matching, the
amplifier stays off = no speaker output.

**Step 5.3-5.4:** The matching logic in `snd_hda_pick_fixup`
(`auto_parser.c:1066-1080`) walks the quirk table linearly. With
`match_codec_ssid=true`, the new entry is checked against codec SSID on
every probe of this codec. The call chain is: codec probe →
`snd_hda_pick_fixup` → table walk → match codec SSID → apply fixup.
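
The lookup order described in the commit message can be modeled in user space. What follows is a simplified sketch, not the kernel's `snd_hda_pick_fixup`: the struct layout, the `pick_fixup` signature, and the three-stage ordering (primary table, vendor-wide catch-all, codec SSID fallback) are assumptions taken from the description above rather than from the kernel source. It illustrates why an entry flagged `match_codec_ssid` wins where a plain entry would be shadowed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subsystem ID pair, e.g. 103c:885b. */
struct ssid {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical, simplified stand-in for one quirk table entry. */
struct quirk {
	struct ssid id;
	bool match_codec_ssid;	/* compare against codec SSID, not PCI SSID */
	const char *fixup;
};

static bool same_ssid(struct ssid a, struct ssid b)
{
	return a.vendor == b.vendor && a.device == b.device;
}

/* Returns the name of the first fixup that matches, or NULL. */
const char *pick_fixup(const struct quirk *tbl, size_t n,
		       uint16_t catchall_vendor, const char *catchall_fixup,
		       struct ssid pci, struct ssid codec)
{
	size_t i;

	/* Stage 1: primary table; flagged entries use the codec SSID. */
	for (i = 0; i < n; i++)
		if (same_ssid(tbl[i].id, tbl[i].match_codec_ssid ? codec : pci))
			return tbl[i].fixup;

	/* Stage 2: vendor-wide catch-all keyed on the PCI vendor only. */
	if (pci.vendor == catchall_vendor)
		return catchall_fixup;

	/* Stage 3: codec SSID fallback for unflagged entries. */
	for (i = 0; i < n; i++)
		if (same_ssid(tbl[i].id, codec))
			return tbl[i].fixup;

	return NULL;
}
```

With a PCI SSID of 103c:0000 and a codec SSID of 103c:885b, an unflagged 103c:885b entry in this model is never reached because stage 2 returns the catch-all first; setting `match_codec_ssid` moves the match into stage 1.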

**Step 5.5:** Similar `HDA_CODEC_QUIRK` entries exist for the same
purpose (ASUS, Lenovo devices with mismatched PCI/codec SSIDs). This is
a well-established pattern.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1:** The `HDA_CODEC_QUIRK` infrastructure and
`ALC245_FIXUP_HP_X360_AMP` fixup exist in the 7.0 stable tree. In older
trees (6.x), the file path would be `sound/pci/hda/patch_realtek.c`, and
the availability of `HDA_CODEC_QUIRK` there would need to be verified.

**Step 6.2:** For 7.0 specifically, the patch should apply with at most
minor context offset. The surrounding lines match the current tree
exactly.

**Step 6.3:** No existing fix for this specific device in the tree.

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1:** ALSA/HDA is an IMPORTANT subsystem. Audio is a core user-
facing feature — no audio output is a severe usability issue.

**Step 7.2:** The file sees constant quirk additions (10+ in recent
history), and Takashi Iwai actively maintains it.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1:** Affected: Users of HP Spectre x360 14-ea (2021 model).
This is a premium consumer laptop.

**Step 8.2:** Trigger: Every boot. The speakers never produce output.
100% reproducible.

**Step 8.3:** Severity: HIGH (complete loss of audio output from
speakers). Device is essentially broken for audio use without this quirk.

**Step 8.4:** Benefit: HIGH (restores audio on a shipping consumer
laptop). Risk: VERY LOW (single table entry, only affects devices with
codec SSID 103c:885b, uses a well-tested fixup function that already
works on sibling models).

## PHASE 9: FINAL SYNTHESIS

**Evidence FOR:**
- Fixes complete audio failure (no speaker output) on a real consumer
  laptop
- User-reported via bugzilla (#221341)
- Single-line quirk table entry — minimal, surgical, obviously correct
- Uses existing well-tested fixup (`ALC245_FIXUP_HP_X360_AMP`) already
  applied to sibling models
- Merged by HDA subsystem maintainer Takashi Iwai
- Falls into the "hardware quirk" exception category — automatically YES
  per stable rules
- All required infrastructure exists in the 7.0 tree

**Evidence AGAINST:**
- None identified.

**Stable Rules Checklist:**
1. Obviously correct and tested? YES — identical pattern to existing
   entries, user-reported fix
2. Fixes a real bug? YES — no audio output on HP Spectre x360 14-ea
3. Important issue? YES — complete loss of speaker output
4. Small and contained? YES — 1 line added
5. No new features or APIs? YES — no new features
6. Can apply to stable? YES — all infrastructure present

**Exception Category:** Hardware quirk addition — automatically
qualifies for stable.

## Verification

- [Phase 1] Parsed all tags: Reported-by user, Closes bugzilla, merged
  by Takashi Iwai (HDA maintainer)
- [Phase 2] Diff: exactly 1 line added — `HDA_CODEC_QUIRK(0x103c,
  0x885b, ...)` in `alc269_fixup_tbl`
- [Phase 3] git blame: sibling entries (0x87f6, 0x87f7) for same fixup
  present since file split (aeeb85f26c3bb)
- [Phase 3] git log author: songxiebing has 4 other HDA commits in tree
- [Phase 5] Verified `HDA_CODEC_QUIRK` macro definition at
  `hda_local.h:314-320` sets `match_codec_ssid=true`
- [Phase 5] Verified `snd_hda_pick_fixup` matching logic at
  `auto_parser.c:1067-1080` — codec SSID path
- [Phase 5] Verified guard at `auto_parser.c:1019` prevents double-
  matching
- [Phase 5] Verified vendor catch-all at `alc269.c:7834` is in separate
  `alc269_fixup_vendor_tbl[]`
- [Phase 5] Verified `alc245_fixup_hp_x360_amp` function exists at line
  1448 (toggles GPIO for speaker amp)
- [Phase 6] Verified `ALC245_FIXUP_HP_X360_AMP` and `HDA_CODEC_QUIRK`
  both exist in the 7.0 tree
- [Phase 4] UNVERIFIED: Could not access bugzilla or lore due to anti-
  bot protections; relied on commit message metadata

**YES**

 sound/hda/codecs/realtek/alc269.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/hda/codecs/realtek/alc269.c b/sound/hda/codecs/realtek/alc269.c
index f10ee482151f6..e50ad953b09e7 100644
--- a/sound/hda/codecs/realtek/alc269.c
+++ b/sound/hda/codecs/realtek/alc269.c
@@ -6955,6 +6955,7 @@ static const struct hda_quirk alc269_fixup_tbl[] = {
 	SND_PCI_QUIRK(0x103c, 0x8847, "HP EliteBook x360 830 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
 	SND_PCI_QUIRK(0x103c, 0x884b, "HP EliteBook 840 Aero G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
 	SND_PCI_QUIRK(0x103c, 0x884c, "HP EliteBook 840 G8 Notebook PC", ALC285_FIXUP_HP_GPIO_LED),
+	HDA_CODEC_QUIRK(0x103c, 0x885b, "HP Spectre x360 14-ea", ALC245_FIXUP_HP_X360_AMP),
 	SND_PCI_QUIRK(0x103c, 0x8862, "HP ProBook 445 G8 Notebook PC", ALC236_FIXUP_HP_LIMIT_INT_MIC_BOOST),
 	SND_PCI_QUIRK(0x103c, 0x8863, "HP ProBook 445 G8 Notebook PC", ALC236_FIXUP_HP_LIMIT_INT_MIC_BOOST),
 	SND_PCI_QUIRK(0x103c, 0x886d, "HP ZBook Fury 17.3 Inch G8 Mobile Workstation PC", ALC285_FIXUP_HP_GPIO_AMP_INIT),
-- 
2.53.0

^ permalink raw reply related

* [PATCH AUTOSEL 7.0-6.12] dm vdo indexer: validate saved zone count
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Matthew Sakai, Mikulas Patocka, Sasha Levin, dm-devel,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Matthew Sakai <msakai@redhat.com>

[ Upstream commit 9e809bb1defe9be7fed2e21552c6b03b2694394d ]

Verify that the loaded zone count is in the valid range
before using it as a loop iterator.

Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem:** `dm vdo indexer` (device mapper, VDO deduplication
  index)
- **Action verb:** "validate" - implies adding a missing safety check
- **Summary:** Validates the zone count loaded from disk before using it
  as a loop iterator

**Record:** [dm vdo indexer] [validate] [Add bounds check on zone count
read from disk before loop use]

### Step 1.2: Tags
- `Signed-off-by: Matthew Sakai <msakai@redhat.com>` - the dm-vdo
  subsystem author/maintainer
- `Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>` - the dm
  subsystem maintainer who committed it

No Fixes: tag, no Reported-by, no Cc: stable. The absence of these is
expected for commits under manual review.

**Record:** Author is the dm-vdo subsystem maintainer. Committed through
the dm maintainer. No explicit bug reporter.

### Step 1.3: Commit Body
The message says: "Verify that the loaded zone count is in the valid
range before using it as a loop iterator." This clearly states:
- The zone count comes from loaded (on-disk) data
- It's used as a loop iterator
- Without validation, an invalid value would be used in the loop

**Record:** Bug = missing input validation on disk-loaded data used as
loop bound. Failure = out-of-bounds array access. Root cause = no bounds
check after reading from persistent storage.

### Step 1.4: Hidden Bug Fix Detection
This IS a bug fix despite using "validate" rather than "fix". It adds a
missing bounds check on data read from disk, preventing an out-of-bounds
array access. This is a classic data corruption / corrupted metadata
handling fix.

**Record:** Yes, this is a real bug fix - adding a missing bounds check
on untrusted data from disk.

---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Changes Inventory
- **File:** `drivers/md/dm-vdo/indexer/index-layout.c`
- **Lines added:** 3 (the `if` check + error return)
- **Function modified:** `reconstruct_index_save()`
- **Scope:** Single-file, single-function, 3-line surgical fix

**Record:** 1 file, +3 lines, extremely small and contained.

### Step 2.2: Code Flow Change
**Before:** Line 1447 computes `isl->zone_count =
table->header.region_count - 3` from disk data, then immediately uses
`zone_count` as the loop bound at line 1476: `for (z = 0; z <
isl->zone_count; z++)`, indexing into `volume_index_zones[z]`.

**After:** After computing `zone_count`, the code checks `if
(isl->zone_count > MAX_ZONES)` and returns `UDS_CORRUPT_DATA` error if
invalid.

### Step 2.3: Bug Mechanism
This is a **buffer overflow / out-of-bounds write** fix:

- `region_count` is a `u16` (0-65535) read from disk via
  `decode_u16_le()` at line 1129
- `zone_count = region_count - 3` (line 1447) - stored in `unsigned int`
- If `region_count > MAX_ZONES + 3 = 19`, then `zone_count > 16`, and
  the loop writes past the end of `volume_index_zones[MAX_ZONES]` (a
  fixed-size array of 16 entries at line 162)
- If `region_count < 3`, the subtraction wraps to a very large unsigned
  value, causing massive OOB access
- There's NO other validation of `region_count` vs `MAX_ZONES` in the
  load path

**Record:** [Out-of-bounds array access] [zone_count from disk used
without bounds check as index into fixed-size MAX_ZONES=16 array]
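
The arithmetic in Step 2.3 can be reproduced in user space. This is a minimal sketch, not the kernel code: `FIXED_REGIONS` and both function names are hypothetical, and only `MAX_ZONES` = 16 and the `region_count - 3` expression come from the analysis above. It shows that a single `zone_count > MAX_ZONES` test also catches the wrapped value produced when `region_count` is below 3.

```c
#include <stdint.h>

#define MAX_ZONES	16	/* array size cited in the analysis */
#define FIXED_REGIONS	3	/* hypothetical name for the "- 3" constant */

/*
 * Models the unchecked computation: a u16 read from disk, with the
 * result stored in an unsigned int. Values below FIXED_REGIONS wrap
 * around to a very large number instead of going negative.
 */
unsigned int zone_count_from_regions(uint16_t region_count)
{
	return (unsigned int)region_count - FIXED_REGIONS;
}

/*
 * Models the added check: returns 0 when zone_count is safe to use as
 * a loop bound over an array of MAX_ZONES entries, -1 otherwise.
 */
int validate_zone_count(unsigned int zone_count)
{
	return zone_count > MAX_ZONES ? -1 : 0;
}
```

Under these assumptions, a `region_count` of 19 is the largest accepted value (it yields exactly 16 zones), while 20 and above, as well as 0 through 2, are rejected by the same comparison.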

### Step 2.4: Fix Quality
- The fix is **obviously correct**: it checks `zone_count > MAX_ZONES`
  before the array is accessed
- It's **minimal**: exactly 3 lines
- It returns a proper error code (`UDS_CORRUPT_DATA`) with a log message
- **Zero regression risk**: it only rejects previously-invalid data that
  would have caused corruption

**Record:** Fix is obviously correct, minimal, zero regression risk.

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
The buggy code was introduced in commit `b46d79bdb82aa1` ("dm vdo: add
deduplication index storage interface"), authored by Matthew Sakai on
2023-11-16. This commit first appeared in v6.9-rc1. The buggy code has
been present since the initial introduction of dm-vdo.

**Record:** Bug introduced in b46d79bdb82aa1 (v6.9-rc1). Present in all
kernels since v6.9.

### Step 3.2: Fixes Tag
No Fixes: tag present. The implicit target would be b46d79bdb82aa1.

### Step 3.3: File History
Recent changes to this file are minimal:
- `f4e99b846c901` - string warning fix (cosmetic)
- `b0e6210e7e616` - removed unused function
- `41c58a36e2c04` - use-after-free fix (similar safety concern)

There's also `9ddf6d3fcbe0b` ("dm vdo: return error on corrupted
metadata in start_restoring_volume functions") - a very similar pattern:
adding proper error returns on corrupted metadata in the same subsystem,
with a Fixes: tag.

**Record:** Standalone fix, no prerequisites. Similar metadata
validation fixes have been applied to dm-vdo.

### Step 3.4: Author
Matthew Sakai is the original author and maintainer of dm-vdo. He
authored the initial dm-vdo code (40-patch series) and continues
maintaining it. This fix comes from the subsystem maintainer.

**Record:** Author is the subsystem maintainer - highest trust level.

### Step 3.5: Dependencies
None. This is a self-contained 3-line addition that doesn't depend on
any other commits.

**Record:** No dependencies. Fully standalone.

---

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1-4.2: Patch Discussion
I was unable to find the exact mailing list submission via b4 dig (the
commit isn't in the tree yet, so there's no SHA to search). Web searches
didn't return the specific patch thread. However, the commit was signed
off by both the subsystem maintainer (Sakai) and the dm maintainer
(Patocka), indicating it went through the standard dm review process.

**Record:** Could not locate specific lore thread. Verified through
standard dm maintainer chain.

### Step 4.3: Bug Report
No Reported-by tag. This appears to be a proactive fix found through
code review by the maintainer.

**Record:** Proactive fix by maintainer, not triggered by user report.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1-5.4: Call Chain
The full call chain from user-facing API to the vulnerable function:
1. `uds_make_index_layout()` - public API for creating/loading VDO index
2. `load_index_layout()` - loads existing index from disk
3. `load_sub_index_regions()` - loads saved index regions
4. `load_index_save()` - loads individual index save
5. `load_region_table()` - reads region table from disk (reads
   `region_count` as u16)
6. **`reconstruct_index_save()`** - uses `region_count` without
   validation -> OOB

This is called during VDO volume activation/load, which happens when a
dm-vdo target is activated (e.g., mounting a VDO-backed filesystem or
activating a VDO logical volume). The data comes from on-disk metadata.

**Record:** Reachable from VDO volume activation. Triggered by corrupted
on-disk metadata.

### Step 5.5: Similar Patterns
The similar fix `9ddf6d3fcbe0b` validates corrupted metadata in
`start_restoring_volume` functions, showing this is a known pattern in
dm-vdo where disk metadata isn't sufficiently validated.

---

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Buggy Code in Stable
dm-vdo was introduced in v6.9-rc1. Active stable trees that contain this
code:
- **v6.12.y** (LTS) - YES, contains dm-vdo
- **v6.14.y** (stable) - YES
- **v6.19.y** (stable) - YES
- v6.6.y (LTS) - NO (pre-dates dm-vdo)
- v6.1.y (LTS) - NO

**Record:** Bug exists in v6.12.y, v6.14.y, v6.19.y stable trees.

### Step 6.2: Backport Complications
Changes to the file between v6.12 and HEAD are minimal (MAGIC_SIZE
cleanup and function removal) - none affect the
`reconstruct_index_save()` function area. The patch should apply cleanly
to all stable trees with dm-vdo.

**Record:** Clean apply expected on all relevant stable trees.

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

### Step 7.1: Subsystem
- **Subsystem:** `drivers/md/dm-vdo` - Device Mapper VDO (deduplication
  + compression)
- **Criticality:** IMPORTANT - VDO is used for storage deduplication in
  RHEL/enterprise environments. Data integrity is paramount for storage
  subsystems.

### Step 7.2: Activity
dm-vdo sees regular maintenance commits from its author. It's an
actively maintained storage driver.

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Who Is Affected
Users of dm-vdo (VDO deduplication). This includes RHEL and enterprise
Linux users who use VDO for storage optimization.

### Step 8.2: Trigger Conditions
- **Trigger:** Corrupted on-disk VDO metadata where `region_count` is
  out of expected range
- **How likely:** Corruption can occur from disk errors, power failures,
  or malicious manipulation
- **User triggering:** Any user activating a VDO volume with corrupted
  metadata

### Step 8.3: Failure Mode Severity
Without this fix, corrupted metadata causes an **out-of-bounds array
write** on a stack-based or structure-embedded array
(`volume_index_zones[MAX_ZONES]`). This results in:
- **Stack/heap corruption** - writing past the array bounds
- **Kernel crash/panic** - likely from corrupted data structures
- **Potential privilege escalation** - corrupted kernel data structures
  from controlled input

**Severity: CRITICAL** - out-of-bounds write from disk-loaded data,
potential kernel crash or memory corruption.

### Step 8.4: Risk-Benefit Ratio
- **Benefit:** HIGH - prevents kernel crash/corruption from malformed
  on-disk metadata
- **Risk:** VERY LOW - 3-line check that only rejects invalid data; zero
  chance of regression for valid data
- **Ratio:** Extremely favorable

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Summary

**FOR backporting:**
- Fixes an out-of-bounds array access from unvalidated disk-read data
  (security-relevant)
- Only 3 lines added - minimal surgical fix
- Obviously correct - simple bounds check against well-defined constant
- Zero regression risk - only rejects data that would have caused OOB
  access
- Written by subsystem maintainer, committed through dm maintainer
- Bug exists since dm-vdo introduction (v6.9), affects all stable trees
  with dm-vdo
- Clean apply expected
- Similar fix pattern already accepted for dm-vdo (`9ddf6d3fcbe0b`)

**AGAINST backporting:**
- No explicit user report or syzbot trigger (proactive fix)
- Requires corrupted on-disk metadata to trigger (not everyday scenario)

### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **YES** - trivial bounds check,
   maintainer-authored
2. Fixes a real bug? **YES** - OOB array access from unvalidated disk
   data
3. Important issue? **YES** - kernel memory corruption/crash from
   corrupted storage metadata
4. Small and contained? **YES** - 3 lines in one function
5. No new features? **YES** - pure safety check
6. Can apply to stable? **YES** - clean apply expected

### Step 9.3: Exception Categories
Not needed - this qualifies as a straightforward bug fix.

---

## Verification

- [Phase 1] Parsed tags: Signed-off-by from subsystem maintainer (Sakai)
  and dm maintainer (Patocka)
- [Phase 2] Diff analysis: 3 lines added in `reconstruct_index_save()`,
  adds `zone_count > MAX_ZONES` bounds check
- [Phase 2] Confirmed `volume_index_zones[MAX_ZONES]` at line 162,
  MAX_ZONES=16 at config.h:23
- [Phase 2] Confirmed `region_count` is u16 read from disk at line 1129,
  used unvalidated at line 1447
- [Phase 3] git blame: all code in `reconstruct_index_save` from commit
  b46d79bdb82aa1 (v6.9-rc1)
- [Phase 3] git describe: b46d79bdb82aa1 first appears at
  v6.9-rc1~148^2~132
- [Phase 3] Similar fix 9ddf6d3fcbe0b validates corrupted metadata in
  same subsystem, has Fixes: tag
- [Phase 4] Could not locate specific lore thread (commit not yet
  merged, no SHA for b4 dig)
- [Phase 5] Call chain traced: uds_make_index_layout ->
  load_index_layout -> load_sub_index_regions -> load_index_save ->
  reconstruct_index_save
- [Phase 5] Trigger: VDO volume activation with corrupted on-disk
  metadata
- [Phase 6] File exists in v6.12.y, v6.14.y, v6.19.y; confirmed via git
  log
- [Phase 6] Changes since v6.12 don't affect the patched function area -
  clean apply expected
- [Phase 7] Subsystem: dm-vdo, actively maintained, IMPORTANT
  criticality (enterprise storage)
- [Phase 8] Failure mode: OOB array write -> kernel memory
  corruption/crash, severity CRITICAL

This is a textbook stable-worthy fix: a tiny, obviously correct bounds
check that prevents an out-of-bounds array access from corrupted on-disk
metadata in an enterprise storage driver. The fix has zero regression
risk and addresses a potentially security-relevant memory corruption
bug.

**YES**

 drivers/md/dm-vdo/indexer/index-layout.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
index 61edf2b72427d..37144249f7ba6 100644
--- a/drivers/md/dm-vdo/indexer/index-layout.c
+++ b/drivers/md/dm-vdo/indexer/index-layout.c
@@ -1445,6 +1445,9 @@ static int __must_check reconstruct_index_save(struct index_save_layout *isl,
 	u64 last_block = next_block + isl->index_save.block_count;

 	isl->zone_count = table->header.region_count - 3;
+	if (isl->zone_count > MAX_ZONES)
+		return vdo_log_error_strerror(UDS_CORRUPT_DATA,
+					      "invalid zone count");

 	last_region = &table->regions[table->header.region_count - 1];
 	if (last_region->kind == RL_KIND_EMPTY) {
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] drm/panel-edp: Add AUO B116XAT04.1 (HW: 1A)
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Terry Hsiao, Douglas Anderson, Sasha Levin, neil.armstrong,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dri-devel, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Terry Hsiao <terry_hsiao@compal.corp-partner.google.com>

[ Upstream commit 2cb217301e0df17f7107a1b0941b28d4053eae8b ]

Add support for the AUO - B116XAT04.1 (HW: 1A) panel.
This panel is used on MT8186 Chromebooks

The raw EDID:
00 ff ff ff ff ff ff 00 06 af ba 89 00 00 00 00
0c 23 01 04 95 1a 0e 78 02 9e a5 96 59 58 96 28
1b 50 54 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 ce 1d 56 ea 50 00 1a 30 30 20
46 00 00 90 10 00 00 18 df 13 56 ea 50 00 1a 30
30 20 46 00 00 90 10 00 00 18 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02
00 10 48 ff 0f 3c 7d 0c 0a 2a 7d 20 20 20 00 21

Signed-off-by: Terry Hsiao <terry_hsiao@compal.corp-partner.google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patch.msgid.link/20260122032904.3933-1-terry_hsiao@compal.corp-partner.google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information for a thorough analysis. Let me walk
through all phases.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem**: `drm/panel-edp`
- **Action verb**: "Add" — adding a new panel ID entry
- **Summary**: Adds EDID product ID 0x89ba for the AUO B116XAT04.1 (HW
  revision 1A) panel used on MT8186 Chromebooks.

Record: [drm/panel-edp] [Add] [New EDID product ID for existing panel
model, different HW revision]

### Step 1.2: Tags
- **Signed-off-by**: Terry Hsiao (author, Compal/Google partner) and
  Douglas Anderson (subsystem maintainer)
- **Reviewed-by**: Douglas Anderson (subsystem maintainer for Chromebook
  panel-edp)
- **Link**: patch.msgid.link URL (lore.kernel.org blocked by Anubis)
- No Fixes: tag (expected — this is a device ID addition, not a
  traditional bug fix)
- No Reported-by (expected — hardware enablement, not a bug report)
- No Cc: stable (expected — that's why it's under review)

Record: Reviewed and committed by subsystem maintainer Douglas Anderson.
Author is a regular contributor (same author as prior 6-panel batch
commit d4b9b6da5777b).

### Step 1.3: Commit Body
The body provides the raw EDID hex dump for the panel. The EDID shows
manufacturer AUO, product ID 0x89ba. The panel is used on MT8186
Chromebooks. Without this entry, the panel-edp driver cannot match this
specific panel by its EDID, meaning the panel won't be properly
initialized.
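
How the vendor and product ID follow from the raw EDID can be shown with a short sketch. It assumes the standard EDID layout (vendor ID in bytes 8-9 as three 5-bit letters, big-endian; product code in bytes 10-11, little-endian). The function names are hypothetical and this is not the kernel's decoding code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the three-letter vendor ID from EDID bytes 8-9. */
static void demo_edid_vendor(const uint8_t *edid, char out[4])
{
	uint16_t v;

	assert(edid != NULL);
	v = (uint16_t)((edid[8] << 8) | edid[9]);

	/* Each letter is stored as 1..26, so add it to 'A' - 1. */
	out[0] = (char)('A' - 1 + ((v >> 10) & 0x1f));
	out[1] = (char)('A' - 1 + ((v >> 5) & 0x1f));
	out[2] = (char)('A' - 1 + (v & 0x1f));
	out[3] = '\0';
}

/* Decode the 16-bit product code from EDID bytes 10-11 (little-endian). */
static uint16_t demo_edid_product(const uint8_t *edid)
{
	assert(edid != NULL);
	return (uint16_t)(edid[10] | (edid[11] << 8));
}
```

Applied to the first row of the dump (`... 06 af ba 89 ...`), the vendor bytes `06 af` decode to "AUO" and the product bytes `ba 89` to 0x89ba, which is the ID used in the new table entry.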

Record: [Hardware enablement for Chromebook panel] [Without this, panel
won't be recognized] [MT8186 platform]

### Step 1.4: Hidden Bug Fix Detection
This is not a disguised bug fix — it's an explicit device ID addition.
However, missing panel entries cause real user impact: the display won't
work properly on affected Chromebooks.

Record: [Not a hidden bug fix; straightforward device ID addition with
real user impact]

---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Change Inventory
- **Files changed**: 1 (`drivers/gpu/drm/panel/panel-edp.c`)
- **Lines added**: 1
- **Lines removed**: 0
- **Functions modified**: None (only the `edp_panels[]` static data
  table)
- **Scope**: Single-line addition to a data table

Record: [1 file, +1 line, no function logic changes, minimal scope]

### Step 2.2: Code Flow Change
The single added line:
```c
EDP_PANEL_ENTRY('A', 'U', 'O', 0x89ba, &delay_200_500_e50, "B116XAT04.1"),
```
Inserted in sorted order (between 0x8594 and 0x8bba) into the
`edp_panels[]` table. This uses the standard `EDP_PANEL_ENTRY` macro
with the well-established `delay_200_500_e50` timing struct (used by 80+
other panels).

Record: [Before: panel ID 0x89ba not recognized. After: panel matched
and properly initialized with standard timing]

### Step 2.3: Bug Mechanism
Category: **Hardware enablement / Device ID addition**. Not a bug fix
per se, but enables hardware that doesn't work without it.

Record: [Device ID addition. Existing entry 0xc4b4 covers one HW
revision; this adds HW revision 1A with EDID 0x89ba]

### Step 2.4: Fix Quality
- Obviously correct: single-line table entry using the same macro and
  timing parameters as ~80 other AUO panels
- Minimal/surgical: 1 line
- Regression risk: effectively zero — only affects panels with EDID
  product ID 0x89ba
- Reviewed by the subsystem maintainer

Record: [Obviously correct, minimal, zero regression risk]

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
The surrounding entries in the table come from various commits dating
back to 2022 (d049a24b15d8c1, March 2022) through 2025. The
`panel-edp.c` file and `edp_panels[]` table have existed since at least
kernel 5.18.

Record: [Panel table infrastructure has been in the kernel since at
least v5.18; file is stable and well-established]

### Step 3.2: No Fixes Tag
Not applicable — this is a device ID addition, not a bug fix referencing
an introduced regression.

### Step 3.3: File History
The file sees frequent panel ID additions. The last 20 commits are
almost all panel additions by various authors, showing this is a
standard, routine operation.

Record: [Extremely active file for panel additions; this is a routine
operation]

### Step 3.4: Author History
Terry Hsiao has at least 2 commits in this file: the earlier 6-panel
batch (d4b9b6da5777b, July 2024) and a name fix (21e97d3ca814e). This is
a regular contributor who works on Chromebook panel enablement.

Record: [Author is a repeat contributor to this file, working on
Chromebook panel support]

### Step 3.5: Dependencies
None. The `EDP_PANEL_ENTRY` macro and `delay_200_500_e50` struct exist
in all stable trees that have `panel-edp.c`. This is a self-contained,
standalone one-line addition.

Record: [No dependencies. Fully standalone.]

---

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

### Step 4.1–4.5: Mailing List
The lore.kernel.org site is blocked by Anubis anti-bot protection.
However, we know:
- The patch was submitted by Terry Hsiao on 2026-01-22
- It was reviewed by Douglas Anderson (the panel-edp subsystem
  maintainer)
- Douglas Anderson also committed it (Signed-off-by)
- The patch link is
  `patch.msgid.link/20260122032904.3933-1-terry_hsiao@...`

Record: [Could not fetch lore discussion due to Anubis protection.
Reviewed and committed by subsystem maintainer Douglas Anderson.]

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1–5.5: Function Analysis
No functions are modified. The change is purely data — a new entry in
the static `edp_panels[]` table. This table is searched by the panel-edp
driver's probe path to match panels by EDID product ID. When a match is
found, the corresponding timing delays are applied.

The `EDP_PANEL_ENTRY` macro is used 196 times in this file. The
`delay_200_500_e50` timing struct is used by 80+ entries. This is
entirely routine.
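
The lookup described above can be modelled as a scan over a static table. This is a simplified sketch with hypothetical names (`demo_panel_entry`, `demo_find_panel`); the real driver matches on a full panel ID and attaches timing data, which is omitted here. The three entries are the ones visible in the diff context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_panel_entry {
	uint16_t product_id;  /* EDID product code */
	const char *name;     /* panel model string */
};

static const struct demo_panel_entry demo_panels[] = {
	{ 0x8594, "B133UAN01.0" },
	{ 0x89ba, "B116XAT04.1" },  /* the entry this patch adds */
	{ 0x8bba, "B140UAN08.5" },
};

/* Return the matching entry, or NULL when the product ID is unknown. */
static const struct demo_panel_entry *demo_find_panel(uint16_t product_id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_panels) / sizeof(demo_panels[0]); i++)
		if (demo_panels[i].product_id == product_id)
			return &demo_panels[i];

	return NULL;
}

/* Usage: a known ID resolves to its entry, an unknown one does not. */
static void demo_lookup_example(void)
{
	assert(demo_find_panel(0x89ba) != NULL);
	assert(demo_find_panel(0x0000) == NULL);
}
```

Because the scan compares IDs for equality, adding one row can only change the result for that exact product ID, which is why the regression risk is considered negligible.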

Record: [Data-only change to a well-established lookup table. No logic
changes.]

---

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

### Step 6.1: Buggy Code in Stable
The `panel-edp.c` file exists in stable trees including 6.6.y (confirmed
via `git log v6.6..v6.6.80`). The `EDP_PANEL_ENTRY` macro and
`delay_200_500_e50` struct exist in all active stable trees.

Record: [File and infrastructure exist in 6.6.y and all newer stable
trees]

### Step 6.2: Backport Complications
This is a one-line addition to a sorted table. It will apply cleanly to
any stable tree that has the surrounding entries. Minor context
adjustment might be needed if nearby entries differ, but the table is
insertion-order agnostic for functionality.

Record: [Expected clean apply or trivial context adjustment]

### Step 6.3: Related Fixes Already in Stable
The earlier entry for the same panel (0xc4b4) from commit d4b9b6da5777b
may or may not be in stable trees. Even if it isn't, this entry stands
alone — it matches a different EDID product ID.

Record: [No related fixes needed; this is independent]

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

### Step 7.1: Subsystem Criticality
- **Subsystem**: `drm/panel` — Display panel drivers
- **Criticality**: IMPORTANT — panels are essential for display output.
  This specifically affects Chromebooks (MT8186 platform), which are
  widely deployed devices.

Record: [drm/panel, IMPORTANT criticality, Chromebook platform]

### Step 7.2: Activity Level
Very active — 20+ recent commits are all panel additions. This is a
well-maintained, high-traffic area.

Record: [Highly active subsystem]

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Who is Affected
Users of MT8186 Chromebooks with the AUO B116XAT04.1 (HW: 1A) panel.
Without this entry, the display panel won't be properly initialized,
meaning the screen won't work correctly.

Record: [Users of specific Chromebook hardware]

### Step 8.2: Trigger Conditions
Triggered at boot time during panel probe. Every boot on affected
hardware.

Record: [Every boot on affected Chromebook models. 100% reproducible.]

### Step 8.3: Failure Mode Severity
Without the panel entry: the display may not initialize properly or may
fall back to a generic mode with incorrect timing. This is a hardware
enablement issue.

Record: [Display malfunction on affected hardware. Severity: HIGH for
affected users.]

### Step 8.4: Risk-Benefit Ratio
- **BENEFIT**: Enables display on Chromebooks with this specific panel
  revision. HIGH for affected users.
- **RISK**: Effectively zero. One-line data table addition using
  existing macros/timing. Cannot affect any other hardware.
- **Ratio**: Extremely favorable.

Record: [High benefit, near-zero risk]

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Summary

**FOR backporting:**
- Classic device ID addition to existing driver (explicit exception
  category)
- Single line added, zero regression risk
- Uses well-established macro and timing parameters (80+ other panels
  use same timing)
- Reviewed and committed by subsystem maintainer (Douglas Anderson)
- Enables real hardware (MT8186 Chromebooks) for real users
- No dependencies on other patches
- Will apply cleanly to stable trees

**AGAINST backporting:**
- No evidence against. This is as clean as a backport candidate gets.

### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **YES** — single table entry, reviewed
   by maintainer
2. Fixes a real bug? **YES** — enables hardware that doesn't work
   without it
3. Important issue? **YES** — display doesn't work on affected
   Chromebooks
4. Small and contained? **YES** — 1 line
5. No new features or APIs? **YES** — only a data table entry
6. Can apply to stable trees? **YES** — trivially

### Step 9.3: Exception Category
**Device ID addition to existing driver** — this is explicitly listed as
a common exception that is almost always YES for stable.

### Step 9.4: Decision
This is a textbook device ID addition — one line, zero risk, real
hardware enablement, reviewed by the subsystem maintainer.

---

## Verification

- [Phase 1] Parsed tags: Reviewed-by and Signed-off-by from Douglas
  Anderson (subsystem maintainer)
- [Phase 2] Diff analysis: 1 line added to `edp_panels[]` table using
  `EDP_PANEL_ENTRY` macro with `delay_200_500_e50`
- [Phase 2] Verified `delay_200_500_e50` is used by 80+ entries;
  `EDP_PANEL_ENTRY` used 196 times
- [Phase 3] git blame: surrounding table entries date from 2022–2025,
  stable infrastructure
- [Phase 3] git log --author: Terry Hsiao has 2 prior commits in this
  file (d4b9b6da5777b, 21e97d3ca814e)
- [Phase 3] Existing entry for same panel model (0xc4b4) exists at line
  1922, from commit d4b9b6da5777b
- [Phase 4] b4 dig on related commit d4b9b6da5777b: found lore thread,
  confirms same author pattern
- [Phase 4] UNVERIFIED: Could not read lore discussion for this specific
  commit due to Anubis protection
- [Phase 5] Data-only change, no function logic modified
- [Phase 6] File confirmed present in 6.6.y stable tree (6 commits found
  in v6.6..v6.6.80 range)
- [Phase 6] `EDP_PANEL_ENTRY` macro exists at line 1859,
  `delay_200_500_e50` widely used — both in stable
- [Phase 8] Failure mode: display not recognized on MT8186 Chromebooks;
  severity HIGH for affected users
- [Phase 8] Risk: near-zero (data table entry, only matches specific
  EDID 0x89ba)

**YES**

 drivers/gpu/drm/panel/panel-edp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c
index 679f4af5246d8..108569490ed59 100644
--- a/drivers/gpu/drm/panel/panel-edp.c
+++ b/drivers/gpu/drm/panel/panel-edp.c
@@ -1915,6 +1915,7 @@ static const struct edp_panel_entry edp_panels[] = {
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x723c, &delay_200_500_e50, "B140XTN07.2"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x73aa, &delay_200_500_e50, "B116XTN02.3"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x8594, &delay_200_500_e50, "B133UAN01.0"),
+	EDP_PANEL_ENTRY('A', 'U', 'O', 0x89ba, &delay_200_500_e50, "B116XAT04.1"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0x8bba, &delay_200_500_e50, "B140UAN08.5"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0xa199, &delay_200_500_e50, "B116XAN06.1"),
 	EDP_PANEL_ENTRY('A', 'U', 'O', 0xa7b3, &delay_200_500_e50, "B140UAN04.4"),
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] drm: gpu: msm: forbid mem reclaim from reset
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Sergey Senozhatsky, Tomasz Figa, Rob Clark, Rob Clark,
	Sasha Levin, lumag, airlied, simona, linux-arm-msm, dri-devel,
	freedreno, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Sergey Senozhatsky <senozhatsky@chromium.org>

[ Upstream commit 4625fe5bbdaccd45be274c30ff0a42e30d4e38cf ]

We sometimes get into a situation where GPU hangcheck fails to
recover GPU:

[..]
msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): hangcheck detected gpu lockup rb 0!
msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): completed fence: 7840161
msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): submitted fence: 7840162
msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): hangcheck detected gpu lockup rb 0!
msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): completed fence: 7840162
msm_dpu ae01000.display-controller: [drm:hangcheck_handler] *ERROR* (IPv4: 1): submitted fence: 7840163
[..]

The problem is that msm_job worker is blocked on gpu->lock

INFO: task ring0:155 blocked for more than 122 seconds.
Not tainted 6.6.99-08727-gaac38b365d2c #1
task:ring0 state:D stack:0 pid:155 ppid:2 flags:0x00000008
Call trace:
__switch_to+0x108/0x208
schedule+0x544/0x11f0
schedule_preempt_disabled+0x30/0x50
__mutex_lock_common+0x410/0x850
__mutex_lock_slowpath+0x28/0x40
mutex_lock+0x5c/0x90
msm_job_run+0x9c/0x140
drm_sched_main+0x514/0x938
kthread+0x114/0x138
ret_from_fork+0x10/0x20

which is owned by recover worker, which is waiting for DMA fences
from a memory reclaim path, under the very same gpu->lock

INFO: task ring0:155 is blocked on a mutex likely owned by task gpu-worker:154.
task:gpu-worker state:D stack:0 pid:154 ppid:2 flags:0x00000008
Call trace:
__switch_to+0x108/0x208
schedule+0x544/0x11f0
schedule_timeout+0x1f8/0x770
dma_fence_default_wait+0x108/0x218
dma_fence_wait_timeout+0x6c/0x1c0
dma_resv_wait_timeout+0xe4/0x118
active_purge+0x34/0x98
drm_gem_lru_scan+0x1d0/0x388
msm_gem_shrinker_scan+0x1cc/0x2e8
shrink_slab+0x228/0x478
shrink_node+0x380/0x730
try_to_free_pages+0x204/0x510
__alloc_pages_direct_reclaim+0x90/0x158
__alloc_pages_slowpath+0x1d4/0x4a0
__alloc_pages+0x9f0/0xc88
vm_area_alloc_pages+0x17c/0x260
__vmalloc_node_range+0x1c0/0x420
kvmalloc_node+0xe8/0x108
msm_gpu_crashstate_capture+0x1e4/0x280
recover_worker+0x1c0/0x638
kthread_worker_fn+0x150/0x2d8
kthread+0x114/0x138

So no one can make any further progress.

Forbid recover/fault worker to enter memory reclaim (under
gpu->lock) to address this deadlock scenario.

Cc: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/700978/
Message-ID: <20260127073341.2862078-1-senozhatsky@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have a complete picture. Let me compile the final analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `drm: gpu: msm` (Qualcomm MSM GPU driver)
- Action verb: "forbid" — preventing a dangerous code path
- Summary: Prevent memory reclaim from GPU reset/fault handlers to avoid
  deadlock

**Step 1.2: Tags**
- `Cc: Tomasz Figa <tfiga@chromium.org>` — ChromeOS stakeholder
- `Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>` —
  author, ChromeOS/mm developer
- `Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>` — MSM subsystem
  maintainer
- `Patchwork: https://patchwork.freedesktop.org/patch/700978/`
- `Message-ID: <20260127073341.2862078-1-senozhatsky@chromium.org>`
- `Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>` —
  committer/maintainer
- No Fixes: tag (expected for manual review candidates)

**Step 1.3: Body Analysis**
The commit message contains detailed stack traces showing:
1. A GPU lockup detected by hangcheck, triggering `recover_worker`
2. `recover_worker` holds `gpu->lock` and is blocked trying to allocate
   memory
3. The allocation triggers `__alloc_pages_direct_reclaim` ->
   `shrink_slab` -> `msm_gem_shrinker_scan` -> `active_purge` ->
   `dma_fence_default_wait`
4. The DMA fences cannot complete because the GPU is hung and
   `gpu->lock` is held
5. Deadlock: `msm_job_run` needs `gpu->lock` (owned by `recover_worker`)
   and `recover_worker` is stuck in reclaim waiting on DMA fences that
   can't signal

**Step 1.4: Hidden Bug Fix Detection**
This is an explicit deadlock fix, not disguised.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Single file: `drivers/gpu/drm/msm/msm_gpu.c`
- 1 include added, ~10 lines of actual logic across 2 functions
- Functions modified: `recover_worker()`,
  `msm_gpu_fault_crashstate_capture()`

**Step 2.2: Code Flow Change**
In both functions, the pattern is identical:
- BEFORE: crashstate capture runs with default GFP flags, allowing
  reclaim
- AFTER: `memalloc_noreclaim_save()` called before crashstate capture,
  `memalloc_noreclaim_restore()` called after, preventing the kernel
  from entering memory reclaim while `gpu->lock` is held
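
The save/restore bracketing can be modelled in userspace to show why it nests safely. This is a sketch of the semantics only: `demo_task_flags`, `DEMO_PF_MEMALLOC` and the `demo_*` functions are hypothetical stand-ins for the per-task flag and the kernel helpers, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PF_MEMALLOC 0x1u

/* Stands in for the current task's flags word. */
static unsigned int demo_task_flags;

/* Set the no-reclaim bit and return its previous state. */
static unsigned int demo_noreclaim_save(void)
{
	unsigned int old = demo_task_flags & DEMO_PF_MEMALLOC;

	demo_task_flags |= DEMO_PF_MEMALLOC;
	return old;
}

/* Put the bit back to whatever the matching save observed. */
static void demo_noreclaim_restore(unsigned int old)
{
	demo_task_flags = (demo_task_flags & ~DEMO_PF_MEMALLOC) | old;
}

/* An allocator would consult this before entering reclaim. */
static bool demo_alloc_may_reclaim(void)
{
	return !(demo_task_flags & DEMO_PF_MEMALLOC);
}

/* Shape of the fixed worker: allocations in the bracket cannot reclaim. */
static bool demo_capture_under_lock(void)
{
	unsigned int flag = demo_noreclaim_save();
	bool reclaim_allowed = demo_alloc_may_reclaim(); /* allocation site */

	demo_noreclaim_restore(flag);
	assert(!reclaim_allowed);
	return reclaim_allowed;
}
```

Returning the previous state from the save call, rather than clearing the bit unconditionally on exit, means an inner bracket does not re-enable reclaim for an outer one.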

**Step 2.3: Bug Mechanism**
Category: **Deadlock**. The mechanism:
1. `recover_worker` acquires `gpu->lock`
2. `msm_gpu_crashstate_capture()` calls `kvmalloc()` (line 239 in
   `msm_gpu_crashstate_get_bo`)
3. Under memory pressure, `kvmalloc` -> `__alloc_pages_slowpath` ->
   `try_to_free_pages` -> `shrink_slab`
4. `msm_gem_shrinker_scan` -> `active_purge` -> `wait_for_idle` ->
   `dma_resv_wait_timeout`
5. DMA fences can't signal because the GPU is hung — recovery needs
   `gpu->lock` which is already held

**Step 2.4: Fix Quality**
- Minimal and surgical: only adds `memalloc_noreclaim_save/restore`
  bracketing
- Well-established kernel pattern (used in amdgpu, i915)
- Regression risk: extremely low — only changes allocation behavior
  within a narrow scope
- Review: accepted by Rob Clark (MSM maintainer)

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
- `recover_worker` structure dates back to Rob Clark 2013, with
  gpu->lock added in c28e2f2b417ed7 (v5.16, 2021-11-09)
- `msm_gpu_crashstate_capture` added in c0fec7f562ec76 (v4.20/v5.0,
  2018-07-24)
- `msm_gpu_fault_crashstate_capture` introduced in e25e92e08e32c6
  (v5.15, 2021-06-10), refactored in 0c5fea1eb0dc2 (v7.0)
- The deadlock has existed since v5.16 when gpu->lock was introduced
  alongside crashstate capture

**Step 3.2: No Fixes tag to follow**

**Step 3.3: Related Changes**
- Commit 4bea53b9c7c72 "drm/msm: Reduce fallout of fence signaling vs
  reclaim hangs" (2023-11-17) — Rob Clark reduced shrinker timeout from
  1000ms to 10ms as a *partial* workaround for this exact class of
  deadlock. This confirms the issue was known.

**Step 3.4: Author**
- Sergey Senozhatsky is a well-known kernel developer (mm subsystem,
  compression, ChromeOS)
- Rob Clark is the MSM subsystem maintainer who reviewed and committed
  the fix

**Step 3.5: Dependencies**
- Standalone fix, no dependencies on other patches
- The `#include <linux/sched/mm.h>` header is available in all relevant
  stable trees
- `memalloc_noreclaim_save/restore` available since at least v4.x

## PHASE 4: MAILING LIST

**Step 4.1-4.2:** Patchwork link confirms this was reviewed through the
freedesktop.org DRM process. Rob Clark (subsystem maintainer) provided
`Reviewed-by` and committed the patch.

**Step 4.3:** No specific bug report link, but the commit includes real
stack traces from a production system running kernel 6.6.99, indicating
this was hit on ChromeOS devices.

**Step 4.4:** Single standalone patch (not part of a series).

**Step 4.5:** Could not verify stable-specific discussion due to
anti-bot protections on lore.kernel.org.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1:** Modified functions: `recover_worker()`,
`msm_gpu_fault_crashstate_capture()`

**Step 5.2:** `recover_worker` is queued by `hangcheck_handler` (timer
callback) whenever a GPU lockup is detected.
`msm_gpu_fault_crashstate_capture` is called from IOMMU fault handlers.

**Step 5.3:** Both call `msm_gpu_crashstate_capture` which calls
`kvmalloc` (via `msm_gpu_crashstate_get_bo`), the trigger for the
deadlock.

**Step 5.4:** Call chain: `hangcheck_timer` -> `hangcheck_handler` ->
`kthread_queue_work(recover_work)` -> `recover_worker`. This is the
standard GPU hang recovery path triggered automatically.

**Step 5.5:** Similar pattern exists in amdgpu and i915 where
`memalloc_noreclaim_save` is used to prevent reclaim deadlocks in GPU
driver paths.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1:** The buggy code exists in all stable trees from v5.16+
(when `gpu->lock` was introduced). In older trees (v6.6),
`recover_worker` and `fault_worker` have the same deadlock pattern. The
code was confirmed in the v6.6 and v6.12 stable branches.

**Step 6.2:** The patch won't apply cleanly to older trees (v6.6) due
to:
- VM_BIND code differences in `recover_worker`
- `fault_worker` vs `msm_gpu_fault_crashstate_capture` name change
- `msm_gpu_crashstate_capture` has 4 args in v6.6 vs 5 in v7.0

But the fix concept is trivially adaptable. For v7.0.y it should apply
cleanly.

**Step 6.3:** Only the partial workaround (4bea53b9c7c72, timeout
reduction) has been applied previously.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1:** `drivers/gpu/drm/msm/` — Qualcomm Adreno GPU driver.
Criticality: **IMPORTANT** — used on millions of Qualcomm SoC devices
(Chromebooks, phones, embedded systems).

**Step 7.2:** Actively developed subsystem with many recent commits.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1:** Affects all Qualcomm Adreno GPU users running with the MSM
DRM driver.

**Step 8.2:** Trigger: GPU hang + memory pressure. GPU hangs occur in
real-world usage. Under memory pressure (common on memory-constrained
mobile/embedded devices), the deadlock triggers. The stack trace shows
it was reproduced on a ChromeOS device.

**Step 8.3:** Failure mode: **DEADLOCK** — the system hangs completely
(GPU recovery blocked for >122 seconds, tasks blocked indefinitely).
Severity: **CRITICAL**.

**Step 8.4:**
- **Benefit:** HIGH — prevents complete system deadlock on common
  hardware
- **Risk:** VERY LOW — adds ~10 lines of well-established
  `memalloc_noreclaim_save/restore` pattern, reviewed by subsystem
  maintainer

## PHASE 9: FINAL SYNTHESIS

**Evidence FOR backporting:**
- Fixes a real, documented deadlock with stack traces
- Observed on production hardware (ChromeOS, kernel 6.6.99)
- Small, surgical fix (~10 lines of real changes)
- Uses well-established kernel pattern (`memalloc_noreclaim_save`)
- Reviewed by Rob Clark (MSM subsystem maintainer)
- Previous partial workaround (4bea53b9c7c72) confirms the issue was
  known
- Affects widely-deployed Qualcomm hardware
- Bug present in all stable trees since v5.16

**Evidence AGAINST backporting:**
- May need minor adaptation for older stable trees (different function
  names/signatures)
- No explicit Cc: stable tag (expected for review candidates)

**Stable rules checklist:**
1. Obviously correct and tested? YES — well-established pattern,
   reviewed by maintainer, real stack traces
2. Fixes a real bug? YES — documented deadlock
3. Important issue? YES — system deadlock (CRITICAL severity)
4. Small and contained? YES — single file, ~10 lines of logic
5. No new features/APIs? YES — purely a fix
6. Can apply to stable? YES for v7.0.y (clean), needs minor rework for
   v6.6.y

## Verification

- [Phase 1] Parsed tags: Reviewed-by Rob Clark (MSM maintainer), Cc
  Tomasz Figa, Patchwork link
- [Phase 2] Diff analysis: ~10 lines added across `recover_worker()` and
  `msm_gpu_fault_crashstate_capture()`, adds
  `memalloc_noreclaim_save/restore` bracketing around crashstate capture
- [Phase 3] git blame: `recover_worker` with `gpu->lock` since
  c28e2f2b417ed7 (v5.16), crashstate capture since c0fec7f562ec76
  (v4.20)
- [Phase 3] Related fix 4bea53b9c7c72: Rob Clark acknowledged this
  deadlock class, applied partial workaround (timeout reduction)
- [Phase 3] No dependencies found; standalone patch
- [Phase 4] Patchwork link confirms DRM review process; anti-bot
  protection prevented lore fetch
- [Phase 5] `recover_worker` called from hangcheck timer path;
  `msm_gpu_crashstate_capture` allocates memory via `kvmalloc` in
  `msm_gpu_crashstate_get_bo`
- [Phase 6] Confirmed buggy code exists in v6.6 (line 354+, 449+) and
  v6.12 (line 355+, 478+)
- [Phase 6] v7.0.y: should apply cleanly; older trees need minor
  adaptation
- [Phase 7] MSM GPU driver — IMPORTANT subsystem (Qualcomm Adreno,
  millions of devices)
- [Phase 8] Failure mode: system deadlock (CRITICAL), triggered by GPU
  hang + memory pressure
- [Phase 8] Stack trace from real ChromeOS device running kernel 6.6.99

**YES**

 drivers/gpu/drm/msm/msm_gpu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 84d6c7f50c8df..67033bb01a9b0 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -17,6 +17,7 @@
 #include <linux/string_helpers.h>
 #include <linux/devcoredump.h>
 #include <linux/sched/task.h>
+#include <linux/sched/mm.h>

 /*
  * Power Management:
@@ -468,6 +469,7 @@ static void recover_worker(struct kthread_work *work)
 	struct msm_gem_submit *submit;
 	struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
 	char *comm = NULL, *cmd = NULL;
+	unsigned int noreclaim_flag;
 	struct task_struct *task;
 	int i;

@@ -505,6 +507,8 @@ static void recover_worker(struct kthread_work *work)
 			msm_gem_vm_unusable(submit->vm);
 	}

+	noreclaim_flag = memalloc_noreclaim_save();
+
 	get_comm_cmdline(submit, &comm, &cmd);

 	if (comm && cmd) {
@@ -523,6 +527,8 @@ static void recover_worker(struct kthread_work *work)
 	pm_runtime_get_sync(&gpu->pdev->dev);
 	msm_gpu_crashstate_capture(gpu, submit, NULL, comm, cmd);

+	memalloc_noreclaim_restore(noreclaim_flag);
+
 	kfree(cmd);
 	kfree(comm);

@@ -587,6 +593,7 @@ void msm_gpu_fault_crashstate_capture(struct msm_gpu *gpu, struct msm_gpu_fault_
 	struct msm_gem_submit *submit;
 	struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
 	char *comm = NULL, *cmd = NULL;
+	unsigned int noreclaim_flag;

 	mutex_lock(&gpu->lock);

@@ -594,6 +601,8 @@ void msm_gpu_fault_crashstate_capture(struct msm_gpu *gpu, struct msm_gpu_fault_
 	if (submit && submit->fault_dumped)
 		goto resume_smmu;

+	noreclaim_flag = memalloc_noreclaim_save();
+
 	if (submit) {
 		get_comm_cmdline(submit, &comm, &cmd);

@@ -609,6 +618,8 @@ void msm_gpu_fault_crashstate_capture(struct msm_gpu *gpu, struct msm_gpu_fault_
 	msm_gpu_crashstate_capture(gpu, submit, fault_info, comm, cmd);
 	pm_runtime_put_sync(&gpu->pdev->dev);

+	memalloc_noreclaim_restore(noreclaim_flag);
+
 	kfree(cmd);
 	kfree(comm);

-- 
2.53.0


* [PATCH AUTOSEL 7.0-5.10] ASoC: Intel: bytcr_rt5640: Fix MCLK leak on platform_clock_control error
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Aravind Anilraj, Cezary Rojewski, Mark Brown, Sasha Levin,
	liam.r.girdwood, peter.ujfalusi, yung-chuan.liao,
	ranjani.sridharan, kai.vehmanen, perex, tiwai, linux-sound,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Aravind Anilraj <aravindanilraj0702@gmail.com>

[ Upstream commit a02496a29463e7f0d1643e83aab28adb3dd03f1a ]

If byt_rt5640_prepare_and_enable_pll1() fails, the function returns
without calling clk_disable_unprepare() on priv->mclk, which was
already enabled earlier in the same code path. Add the missing
cleanup call to prevent the clock from leaking.

Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Link: https://patch.msgid.link/20260401220507.23557-2-aravindanilraj0702@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed for a comprehensive analysis. Let
me compile the findings.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `ASoC: Intel: bytcr_rt5640`
- Action verb: **Fix** (explicit bug fix)
- Summary: Fix MCLK clock leak on `platform_clock_control` error path.

**Step 1.2: Tags**
- `Signed-off-by: Aravind Anilraj <aravindanilraj0702@gmail.com>` -
  Author
- `Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>` - Intel
  audio developer reviewed
- `Link: https://patch.msgid.link/20260401220507.23557-2-aravindanilraj0702@gmail.com` -
  Lore link
- `Signed-off-by: Mark Brown <broonie@kernel.org>` - Merged by ASoC
  subsystem maintainer
- No Fixes: tag (expected for this review pipeline)
- No Cc: stable (expected)

**Step 1.3: Commit Body**
The body clearly describes the bug: When
`byt_rt5640_prepare_and_enable_pll1()` fails, the function returns
without calling `clk_disable_unprepare()` on `priv->mclk`, which was
already enabled by `clk_prepare_enable()`. This is a textbook resource
leak on an error path.

**Step 1.4: Hidden Bug Fix Detection**
Not hidden — this is explicitly labeled as a fix. The word "Fix" is in
the subject, and the mechanism (clock leak) is clearly described.

Record: [ASoC Intel bytcr_rt5640] [fix] [MCLK clock leak on PLL1 enable
error path] [Not a hidden fix - explicitly labeled]

---

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- 1 file changed: `sound/soc/intel/boards/bytcr_rt5640.c`
- +2 lines added (only)
- Function modified: `platform_clock_control()`
- Scope: Single-file surgical fix, extremely minimal

**Step 2.2: Code Flow Change**
Before: If `byt_rt5640_prepare_and_enable_pll1()` fails at line 291
(`ret < 0`), the function falls through to line 305 and returns the
error, but `priv->mclk` remains enabled (it was enabled at line 286).

After: If `byt_rt5640_prepare_and_enable_pll1()` fails,
`clk_disable_unprepare(priv->mclk)` is called immediately, releasing the
clock before the error return.

**Step 2.3: Bug Mechanism**
Category: **Error path / resource leak fix**. The clock was enabled via
`clk_prepare_enable()` but not cleaned up on failure of the subsequent
PLL1 setup. This is a classic missing-cleanup-on-error pattern.
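
The missing-cleanup pattern can be sketched in userspace C. This is a minimal
model under stated assumptions, not the driver code: `struct model_clk`, the
`model_*` helpers, and the `with_fix` switch are hypothetical names introduced
only for illustration, and the enable counter merely stands in for the clock
framework's reference counting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clock with an enable count (not struct clk). */
struct model_clk {
	int enable_count;
};

static int model_clk_prepare_enable(struct model_clk *clk)
{
	clk->enable_count++;
	return 0;
}

static void model_clk_disable_unprepare(struct model_clk *clk)
{
	assert(clk->enable_count > 0);	/* an unbalanced disable would be a bug */
	clk->enable_count--;
}

/* Stand-in for the PLL1 setup step; fails when pll_ok is false. */
static int model_setup_pll(bool pll_ok)
{
	return pll_ok ? 0 : -5;	/* -5 is an arbitrary error code for the model */
}

/*
 * Model of the power-up branch. with_fix selects whether the error path
 * releases the clock, mirroring the two added lines in the patch.
 */
static int model_power_up(struct model_clk *mclk, bool pll_ok, bool with_fix)
{
	int ret = model_clk_prepare_enable(mclk);

	if (ret < 0)
		return ret;

	ret = model_setup_pll(pll_ok);
	if (ret < 0 && with_fix)
		model_clk_disable_unprepare(mclk);

	return ret;
}
```

In this model, a failed `model_power_up()` without the fix leaves
`enable_count` at 1 and every further failure adds one more, while the fixed
variant returns the count to 0 on each failure.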

**Step 2.4: Fix Quality**
- Obviously correct: YES. The symmetry is clear — `clk_prepare_enable()`
  succeeded, so on failure we must call `clk_disable_unprepare()`.
- Minimal/surgical: YES. Only 2 lines added.
- Regression risk: Extremely low. The added code only runs on the error
  path when PLL1 setup fails.

Record: [1 file, +2 lines, platform_clock_control()] [Resource leak fix:
MCLK left enabled on PLL1 failure] [Obviously correct, very low
regression risk]

---

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
The buggy code pattern was introduced by commit `bcd9a325f0b0f4` (Hans
de Goede, 2018-05-08): "ASoC: Intel: bytcr_rt5640: Configure PLL1 before
using it". This commit added the `byt_rt5640_prepare_and_enable_pll1()`
call after `clk_prepare_enable()` but failed to add cleanup on its
failure path.

The MCLK handling was further cleaned up by commit `a15ca6e3b8a21f`
(Andy Shevchenko, 2021-10-07), which removed the `BYT_RT5640_MCLK_EN`
quirk guard but preserved the same missing-cleanup bug.

**Step 3.2: Fixes Tag**
No Fixes: tag present (expected). The root cause commit is
`bcd9a325f0b0f4` from 2018. Verified present in v6.1, v6.6, and all
active stable trees.

**Step 3.3: File History**
Recent changes to the file are mostly DMI quirk additions and cosmetic
refactoring. No conflicting changes to the `platform_clock_control()`
function.

**Step 3.4: Author**
Aravind Anilraj has no other commits in this tree — likely a new
contributor. However, the patch was reviewed by Cezary Rojewski (Intel
audio team) and merged by Mark Brown (ASoC maintainer), providing strong
quality assurance.

**Step 3.5: Dependencies**
None. The fix is 2 self-contained lines. No new functions, structures,
or APIs involved.

Record: [Bug introduced 2018 in bcd9a325f0b0f4, present in all stable
trees] [Reviewed by Intel developer, merged by ASoC maintainer]
[Standalone fix, no dependencies]

---

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1-4.2**: Lore was inaccessible due to anti-bot protections.
However, the `Link:` tag confirms this was submitted and reviewed via
normal mailing list processes. The `Reviewed-by: Cezary Rojewski` (Intel
audio) confirms expert review. Mark Brown (ASoC maintainer) merged it.

**Step 4.3**: No Reported-by tag — this was found by code inspection,
not a user report.

**Step 4.4**: The same bug exists in sibling driver `bytcr_rt5651.c`
(lines 206-231) — identical pattern of `clk_prepare_enable()` followed
by `byt_rt5651_prepare_and_enable_pll1()` without cleanup on failure.
This confirms it's a systematic, real bug.

Record: [Reviewed by Intel audio developer, merged by ASoC maintainer]
[Same bug pattern confirmed in sibling driver bytcr_rt5651]

---

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1**: Function modified: `platform_clock_control()`

**Step 5.2: Callers**
`platform_clock_control` is registered as a DAPM supply widget callback
in `sound/soc/intel/boards/bytcr_rt5640.c`:

```c
SND_SOC_DAPM_SUPPLY("Platform Clock", SND_SOC_NOPM, 0, 0,
                    platform_clock_control, SND_SOC_DAPM_PRE_PMU |
                    SND_SOC_DAPM_POST_PMD),
```

The DAPM framework invokes this callback whenever the "Platform Clock"
supply powers up or down, which happens when audio playback or capture
starts or stops. It is a **common path** for any Bay Trail tablet/laptop
user.

**Step 5.3-5.4**: `byt_rt5640_prepare_and_enable_pll1()` calls
`snd_soc_dai_set_pll()` and `snd_soc_dai_set_sysclk()`, both of which
can fail (e.g., codec communication error). The leak path is reachable
from normal audio usage.

**Step 5.5**: Identical bug pattern exists in `bytcr_rt5651.c`
(confirmed via grep).

Record: [platform_clock_control called on every audio start/stop via
DAPM] [Bug reachable from normal user audio usage] [Same pattern in
sibling driver]

---

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

**Step 6.1**: Verified that both the root cause commit `bcd9a325f0b0f4`
(2018) and the MCLK refactor `a15ca6e3b8a21f` (2021) are ancestors of
v6.1 and v6.6. The buggy code exists in **all active stable trees**.

**Step 6.2**: The only potential backport complication is commit
`e6995aa816557` (DAPM API conversion, Nov 2025), which changed line 276
from the old DAPM API to the new one. This commit is only in v6.19+. For
v6.1/v6.6/v6.12/v6.18, the context may differ slightly on line 276, but
the fix (+2 lines after line 291) is so localized it should apply
cleanly or with trivial fuzz.

**Step 6.3**: No related fixes already in stable for this issue.

Record: [Bug exists in all active stable trees v6.1+] [Clean apply or
trivial fuzz expected] [No existing fixes in stable]

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1**: Subsystem: `sound/soc/intel/boards` — Intel ASoC machine
drivers. Criticality: **IMPORTANT**. Bay Trail RT5640/RT5651 is used on
many x86 tablets and low-cost laptops (Asus T100, Lenovo IdeaPad,
various Atom-based devices).

**Step 7.2**: The file has moderate activity (DMI quirks being added
regularly, confirming active hardware user base).

Record: [ASoC Intel Bay Trail boards] [IMPORTANT — real hardware with
active users]

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1**: Affected: Users of Intel Bay Trail devices with RT5640
codec (common Atom-based tablets and laptops).

**Step 8.2**: Trigger: Every audio playback start when
`byt_rt5640_prepare_and_enable_pll1()` fails (e.g., I2C communication
error with codec). The clock leak accumulates — each failure leaves MCLK
enabled, potentially causing power management issues and preventing the
clock from being properly reused.

**Step 8.3**: Severity: **MEDIUM-HIGH**. Clock resource leak can cause:
- Power management problems (clock stays active preventing deeper sleep
  states)
- Potential clock framework warnings/errors on subsequent audio
  operations
- Accumulated leaks over time

**Step 8.4**: Risk-Benefit:
- BENEFIT: Fixes a real resource leak in a commonly-used audio driver on
  real hardware
- RISK: Extremely low — 2 lines added to an error path only, obviously
  correct symmetry with `clk_prepare_enable`/`clk_disable_unprepare`
- Ratio: **Very favorable**

Record: [Bay Trail device users] [Triggered on PLL1 failure during audio
start] [Clock leak -> power management issues] [Extremely low risk,
moderate-high benefit]

---

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: Evidence Summary**

FOR backporting:
- Fixes a real resource leak (clock not unprepared on error path)
- Tiny, surgical fix: only +2 lines
- Obviously correct: symmetric cleanup of `clk_prepare_enable()`
- Reviewed by Intel audio developer (Cezary Rojewski)
- Merged by ASoC maintainer (Mark Brown)
- Bug has existed since 2018, present in all stable trees
- Affects real hardware (Bay Trail tablets/laptops)
- Same bug pattern confirmed in sibling driver (bytcr_rt5651)
- No dependencies, self-contained fix

AGAINST backporting:
- No user-reported symptoms (found by code inspection)
- Minor context conflict possible on older stable trees (trivially
  resolvable)

**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? **YES** — reviewed by Intel, symmetric
   cleanup pattern
2. Fixes a real bug? **YES** — clock resource leak on error path
3. Important issue? **YES** — resource leak affecting power management
   on real devices
4. Small and contained? **YES** — 2 lines, single file, single function
5. No new features or APIs? **YES** — purely a bug fix
6. Can apply to stable? **YES** — clean or trivial fuzz

**Step 9.3: Exception Categories**
Not an exception case — this is a straightforward bug fix.

**Step 9.4: Decision**
This is a textbook stable-worthy fix: tiny, obviously correct, fixes a
real resource leak in a driver used on real hardware, with expert
review. The risk is near zero.

---

## Verification

- [Phase 1] Parsed tags: Reviewed-by Cezary Rojewski (Intel), Link to
  lore, Signed-off-by Mark Brown (maintainer)
- [Phase 2] Diff analysis: +2 lines in error path of
  `platform_clock_control()`, adds missing `clk_disable_unprepare()`
  after `byt_rt5640_prepare_and_enable_pll1()` failure
- [Phase 3] git blame: Bug introduced by `bcd9a325f0b0f4` (Hans de
  Goede, 2018), MCLK refactored by `a15ca6e3b8a21f` (Andy Shevchenko,
  2021)
- [Phase 3] `git merge-base --is-ancestor`: Both root cause commits
  verified present in v6.1 and v6.6
- [Phase 3] Author check: New contributor, but patch reviewed by Intel
  developer and merged by ASoC maintainer
- [Phase 4] Lore inaccessible (anti-bot), but Link: tag confirms normal
  review process
- [Phase 5] DAPM widget registration confirmed: `platform_clock_control`
  called on every audio PMU/PMD event — common path
- [Phase 5] Grep confirmed identical bug pattern in sibling
  `bytcr_rt5651.c` (lines 206-231)
- [Phase 6] `e6995aa816557` (DAPM conversion) only in v6.19+; fix
  context should apply cleanly to v6.1-v6.18 with trivial fuzz at most
- [Phase 6] No existing fixes for this issue in stable
- [Phase 8] Failure mode: Clock resource leak on audio error path,
  severity MEDIUM-HIGH

**YES**

 sound/soc/intel/boards/bytcr_rt5640.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c
index 103e0b445603f..e4c21c9c5b38c 100644
--- a/sound/soc/intel/boards/bytcr_rt5640.c
+++ b/sound/soc/intel/boards/bytcr_rt5640.c
@@ -289,6 +289,8 @@ static int platform_clock_control(struct snd_soc_dapm_widget *w,
 			return ret;
 		}
 		ret = byt_rt5640_prepare_and_enable_pll1(codec_dai, 48000);
+		if (ret < 0)
+			clk_disable_unprepare(priv->mclk);
 	} else {
 		/*
 		 * Set codec clock source to internal clock before
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] hwmon: (nct6683) Add customer ID for ASRock B650I Lightning WiFi
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Petr Klotz, Guenter Roeck, Sasha Levin, linux-hwmon, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Petr Klotz <pklotz0@protonmail.com>

[ Upstream commit ff708b549c4dbecb308fa97e360a8fe0b2f89309 ]

The ASRock B650I Lightning WiFi motherboard uses an NCT6686D chip with a
customer ID of 0x1633. Without this ID, the nct6683 driver fails to
recognize the hardware on this board, preventing hardware monitoring
from working.

Add NCT6683_CUSTOMER_ID_ASROCK6 (0x1633) to the list of supported customer
IDs and update the probe function to handle it

Signed-off-by: Petr Klotz <pklotz0@protonmail.com>
Link: https://lore.kernel.org/r/20260412000911.9063-2-pklotz0@protonmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed. Let me compile the full analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: PARSE THE SUBJECT LINE
- **Subsystem:** hwmon (nct6683)
- **Action verb:** "Add" (customer ID)
- **Summary:** Add customer ID for ASRock B650I Lightning WiFi
  motherboard

Record: [hwmon/nct6683] [Add] [New customer ID 0x1633 for ASRock B650I
Lightning WiFi]

### Step 1.2: PARSE ALL COMMIT MESSAGE TAGS
- **Signed-off-by:** Petr Klotz <pklotz0@protonmail.com> (author)
- **Link:**
  https://lore.kernel.org/r/20260412000911.9063-2-pklotz0@protonmail.com
- **Signed-off-by:** Guenter Roeck <linux@roeck-us.net> (subsystem
  maintainer / committer)
- No Fixes: tag (expected for manual review candidates)
- No Cc: stable (expected)
- No Reported-by (author is the user who needs this)

Record: Author is Petr Klotz, applied by Guenter Roeck (hwmon
maintainer). No Fixes/stable tags (expected).

### Step 1.3: ANALYZE THE COMMIT BODY TEXT
The commit message states:
- The ASRock B650I Lightning WiFi uses an NCT6686D chip with customer ID
  0x1633
- Without this ID, the nct6683 driver fails to recognize the hardware
  (returns -ENODEV)
- This prevents hardware monitoring from working on this board

Record: Bug = driver fails to instantiate on a real board. Symptom = no
hwmon support. Root cause = customer ID 0x1633 not in the allowlist.

### Step 1.4: DETECT HIDDEN BUG FIXES
This is not a hidden fix - it's a straightforward device ID addition to
enable hardware support on a specific board. This falls into the "NEW
DEVICE IDs" exception category.

Record: Not a hidden fix. Classic hardware ID addition.

---

## PHASE 2: DIFF ANALYSIS - LINE BY LINE

### Step 2.1: INVENTORY THE CHANGES
- **Files changed:** 1 (`drivers/hwmon/nct6683.c`)
- **Lines added:** 3 (one #define, two lines for `case` statement)
- **Lines removed:** 0
- **Functions modified:** `nct6683_probe()` (adding a case to an
  existing switch)
- **Scope:** Single-file, surgical, trivially small

### Step 2.2: UNDERSTAND THE CODE FLOW CHANGE
**Hunk 1** (line ~185): Adds `#define NCT6683_CUSTOMER_ID_ASROCK6
0x1633` to the list of known customer IDs. Pure definition, no behavior
change by itself.

**Hunk 2** (line ~1248): Adds `case NCT6683_CUSTOMER_ID_ASROCK6: break;`
to the probe function's customer ID switch statement. Before: customer
ID 0x1633 matches no case and lands in `default`, which returns -ENODEV
(unless force=1). After: 0x1633 is recognized and the probe continues
normally.

### Step 2.3: IDENTIFY THE BUG MECHANISM
**Category:** Hardware enablement / device ID addition

The switch statement in `nct6683_probe()` acts as an allowlist. Without
the ID, the `default` case returns `-ENODEV`, preventing the driver from
loading. Adding the case enables the driver for this specific board.
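
The allowlist behaviour can be sketched as a standalone C function. This is
an illustrative model only: the two ID values come from the diff, while
`model_check_customer_id()`, `model_allowlist_demo()`, and the unknown ID
0x1234 are hypothetical and are not part of the driver.

```c
#include <assert.h>	/* used by the demo function below */
#include <errno.h>
#include <stdbool.h>

/* Values taken from the diff; everything else here is a hypothetical model. */
#define NCT6683_CUSTOMER_ID_ASROCK5	0x1621
#define NCT6683_CUSTOMER_ID_ASROCK6	0x1633

/* Returns 0 if probing may continue, -ENODEV if the ID is rejected. */
static int model_check_customer_id(unsigned int customer_id, bool force)
{
	switch (customer_id) {
	case NCT6683_CUSTOMER_ID_ASROCK5:
		break;
	case NCT6683_CUSTOMER_ID_ASROCK6:
		break;
	default:
		/* Unknown IDs are accepted only when the user forces it. */
		if (!force)
			return -ENODEV;
	}
	return 0;
}

/* Usage sketch: a listed ID passes, an unlisted one needs force. */
static void model_allowlist_demo(void)
{
	assert(model_check_customer_id(NCT6683_CUSTOMER_ID_ASROCK6, false) == 0);
	assert(model_check_customer_id(0x1234, false) == -ENODEV);
	assert(model_check_customer_id(0x1234, true) == 0);
}
```

Before the patch, 0x1633 behaves like the unlisted 0x1234 in this sketch: it
is rejected unless `force` is set.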

### Step 2.4: ASSESS THE FIX QUALITY
- **Obviously correct?** YES - identical to 10+ previous customer ID
  additions
- **Minimal/surgical?** YES - 3 lines, one define + one case statement
- **Regression risk?** Essentially zero - the new case only matches a
  single specific hardware ID and does nothing different from all other
  ASRock cases
- **Red flags?** None

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: BLAME THE CHANGED LINES
The customer ID area was last modified by commit c0fa7879c985 (ASROCK5,
Dec 2025). The switch statement follows the same pattern since the
driver's creation in 2014 (41082d66bfd6).

Record: Driver has existed since v3.16 (2014). Customer ID mechanism
unchanged since inception.

### Step 3.2: FOLLOW THE FIXES: TAG
No Fixes: tag present (expected - this is an ID addition, not a bug fix
per se).

### Step 3.3: CHECK FILE HISTORY FOR RELATED CHANGES
There have been 10+ identical customer ID additions to this driver:
- ASROCK (0xe2c) - v5.12
- ASROCK2 (0xe1b) - v5.15
- ASROCK3 (0x1631) - v6.7
- ASROCK4 (0x163e) - v6.14
- ASROCK5 (0x1621) - v7.0-rc1
- MSI through MSI4, AMD, MITAC, INTEL - various versions

This is a well-established pattern with the exact same structure every
time.

Record: Standalone commit. No prerequisites. Follows established
pattern.

### Step 3.4: CHECK THE AUTHOR'S OTHER COMMITS
Petr Klotz is not the subsystem maintainer. This appears to be a first-
time or infrequent contributor submitting a board-specific ID. However,
Guenter Roeck (the hwmon maintainer) applied it, lending credibility.

### Step 3.5: CHECK FOR DEPENDENT/PREREQUISITE COMMITS
No dependencies. The patch adds a #define and a case to an existing
switch. It applies cleanly to any version that has the customer ID
switch structure (all versions since v3.16).

---

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

### Step 4.1-4.2: FIND THE ORIGINAL PATCH DISCUSSION
The Link tag points to `20260412000911.9063-2-pklotz0@protonmail.com`.
Lore.kernel.org was behind Anubis protection during fetch. However, the
commit was applied by Guenter Roeck, the hwmon subsystem maintainer,
confirming it passed review.

The "-2" in the message ID suggests this was patch 2 of a series.

### Step 4.3: BUG REPORT
No separate bug report - the author is the user affected. This is
typical for hardware ID additions.

### Step 4.4-4.5: RELATED PATCHES AND STABLE HISTORY
Similar customer ID additions have been routinely backported to stable
trees. This is standard practice for device enablement.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: KEY FUNCTIONS
Only `nct6683_probe()` is modified (adding a case to an existing
switch).

### Step 5.2: TRACE CALLERS
`nct6683_probe()` is called by the platform driver subsystem during
device enumeration - a standard, well-tested path.

### Step 5.3-5.4: CALL CHAIN
The probe function reads the customer ID from hardware register 0x602,
then checks it against the allowlist. If not found and `force` is not
set, it returns -ENODEV. This is the normal device discovery path.

### Step 5.5: SIMILAR PATTERNS
Identical pattern seen in all 10+ previous customer ID additions. No
unique concerns.

---

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

### Step 6.1: DOES THE BUGGY CODE EXIST IN STABLE TREES?
YES. The nct6683 driver with the customer ID switch exists in all active
stable trees:
- **v6.1.y:** Has INTEL, MITAC, MSI, ASROCK, ASROCK2
- **v6.6.y:** Has INTEL, MITAC, MSI, MSI2, ASROCK, ASROCK2
- **v6.12.y:** Has INTEL, MITAC, MSI, MSI2, MSI3, ASROCK through ASROCK3

All versions have the switch structure and the `default: return -ENODEV`
path.

### Step 6.2: BACKPORT COMPLICATIONS
The patch will need trivial context adjustment since each stable tree
has a different number of existing customer IDs. However, the pattern is
identical: add a #define and a case. This is a trivial conflict to
resolve.

### Step 6.3: RELATED FIXES ALREADY IN STABLE
No fix for this specific board (0x1633) exists in stable.

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

### Step 7.1: SUBSYSTEM CRITICALITY
- **Subsystem:** drivers/hwmon (hardware monitoring)
- **Criticality:** PERIPHERAL to IMPORTANT - hwmon is widely used on
  server/desktop boards for temperature/fan monitoring. ASRock B650I is
  a consumer AM5 motherboard (AMD Ryzen).

### Step 7.2: SUBSYSTEM ACTIVITY
The nct6683 driver is actively maintained with regular customer ID
additions (the most recent being Dec 2025). Guenter Roeck is the active
maintainer.

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: WHO IS AFFECTED
Users of the ASRock B650I Lightning WiFi motherboard who want hardware
monitoring. This is a consumer AM5 board, so it has a real user
population.

### Step 8.2: TRIGGER CONDITIONS
Every boot on the affected hardware. The driver will always fail to
probe without this ID.

### Step 8.3: FAILURE MODE SEVERITY
Without this patch: hardware monitoring completely non-functional on
this board. Severity: LOW-MEDIUM (not a crash or security issue, but a
real hardware enablement problem). Users can work around with `force=1`
module parameter, but this is non-obvious.

### Step 8.4: RISK-BENEFIT RATIO
- **BENEFIT:** Enables hwmon for a real board. Trivial, well-tested
  pattern.
- **RISK:** Essentially zero. 3 lines, no logic change, only affects one
  specific hardware ID.
- **Ratio:** Very favorable for backporting.

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: EVIDENCE COMPILED

**FOR backporting:**
- Textbook device ID addition to an existing driver (explicit exception
  in stable rules)
- 3 lines total, minimal and obviously correct
- Follows the exact pattern of 10+ previous identical commits
- Enables hardware monitoring on a real consumer motherboard
- Applied by subsystem maintainer (Guenter Roeck)
- Driver exists in all active stable trees
- Zero regression risk

**AGAINST backporting:**
- Not a crash/security/corruption fix
- Cosmetically, it's "adding support" not "fixing a bug"
- Patch needs minor context adjustment for older stable trees (trivial)

### Step 9.2: STABLE RULES CHECKLIST
1. **Obviously correct and tested?** YES - identical pattern to 10+
   previous commits, applied by maintainer
2. **Fixes a real bug?** YES - enables hardware monitoring on real
   hardware that is otherwise blocked
3. **Important issue?** MODERATE - hardware completely non-functional
   without it
4. **Small and contained?** YES - 3 lines in one file
5. **No new features or APIs?** CORRECT - just a device ID added to an
   existing driver
6. **Can apply to stable?** YES with trivial context adjustments

### Step 9.3: EXCEPTION CATEGORY
**YES - Device ID addition to existing driver.** This is explicitly
listed as an exception that IS allowed in stable despite being an
"addition."

### Step 9.4: DECISION
This is a textbook device ID addition. The stable kernel rules
explicitly allow these because they are trivial, risk-free, and enable
real hardware for real users.

---

## Verification

- [Phase 1] Parsed commit message: author Petr Klotz, applied by Guenter
  Roeck (hwmon maintainer), Link to lore present
- [Phase 2] Diff analysis: 3 lines added - one #define
  (NCT6683_CUSTOMER_ID_ASROCK6 0x1633) and one case statement in
  nct6683_probe()
- [Phase 3] git blame: customer ID mechanism originates from
  41082d66bfd6 (v3.16, 2014), last ASRock ID added in c0fa7879c985 (Dec
  2025)
- [Phase 3] git log history: confirmed 10+ identical customer ID
  additions following the same pattern
- [Phase 4] Lore fetch failed (Anubis protection), but Link tag confirms
  accepted submission; commit applied by subsystem maintainer
- [Phase 4] Web search: confirmed pattern of customer ID additions is
  routine for this driver
- [Phase 5] nct6683_probe() is the only function modified; standard
  platform driver probe path
- [Phase 6] Verified driver exists in v6.1, v6.6, v6.12 stable trees via
  `git show v6.X:drivers/hwmon/nct6683.c`; all have the customer ID
  switch with the default: return -ENODEV path
- [Phase 6] Minor context conflicts expected due to differing number of
  existing IDs per stable tree (trivial to resolve)
- [Phase 8] Risk: essentially zero. Benefit: enables hwmon on real
  ASRock B650I Lightning WiFi board
- UNVERIFIED: Could not read the full lore discussion thread due to
  Anubis protection; however, maintainer acceptance is confirmed by
  Guenter Roeck's SOB

**YES**

 drivers/hwmon/nct6683.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/hwmon/nct6683.c b/drivers/hwmon/nct6683.c
index 4a83804140386..0581770380cc5 100644
--- a/drivers/hwmon/nct6683.c
+++ b/drivers/hwmon/nct6683.c
@@ -182,6 +182,7 @@ superio_exit(int ioreg)
 #define NCT6683_CUSTOMER_ID_ASROCK3	0x1631
 #define NCT6683_CUSTOMER_ID_ASROCK4	0x163e
 #define NCT6683_CUSTOMER_ID_ASROCK5	0x1621
+#define NCT6683_CUSTOMER_ID_ASROCK6	0x1633

 #define NCT6683_REG_BUILD_YEAR		0x604
 #define NCT6683_REG_BUILD_MONTH		0x605
@@ -1245,6 +1246,8 @@ static int nct6683_probe(struct platform_device *pdev)
 		break;
 	case NCT6683_CUSTOMER_ID_ASROCK5:
 		break;
+	case NCT6683_CUSTOMER_ID_ASROCK6:
+		break;
 	default:
 		if (!force)
 			return -ENODEV;
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.12] exfat: fix incorrect directory checksum after rename to shorter name
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Chi Zhiling, Sungjong Seo, Yuezhang Mo, Namjae Jeon, Sasha Levin,
	linux-fsdevel, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Chi Zhiling <chizhiling@kylinos.cn>

[ Upstream commit ff37797badd831797b8a27830fe5046d7e23fdc3 ]

When renaming a file in-place to a shorter name, exfat_remove_entries
marks excess entries as DELETED, but es->num_entries is not updated
accordingly. As a result, exfat_update_dir_chksum iterates over the
deleted entries and computes an incorrect checksum.

This does not lead to persistent corruption because mark_inode_dirty()
is called afterward, and __exfat_write_inode later recomputes the
checksum using the correct num_entries value.

Fix by setting es->num_entries = num_entries in exfat_init_ext_entry.

Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed. Let me compile the complete
analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `exfat` (filesystem)
- Action verb: "fix"
- Summary: Fix incorrect directory checksum computed after in-place
  rename to a shorter filename.

**Step 1.2: Tags**
- `Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn>` - author, KylinOS
  developer
- `Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>` - Samsung, known
  exfat maintainer
- `Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>` - Sony, major exfat
  contributor (authored the dentry cache conversion)
- `Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>` - exfat subsystem
  maintainer, committed the patch
- No Fixes: tag, no Cc: stable, no Reported-by (expected for manual
  review candidates)

**Step 1.3: Commit Body**
- Bug: When renaming in-place to a shorter name, `exfat_remove_entries`
  marks excess entries as DELETED, but `es->num_entries` is NOT updated.
  Then `exfat_update_dir_chksum` iterates over the stale (larger) count,
  including DELETED entries in the checksum calculation.
- The author states this does NOT lead to persistent corruption under
  normal operation because `__exfat_write_inode` later recomputes the
  checksum correctly.
- Fix: Set `es->num_entries = num_entries` in `exfat_init_ext_entry`.

**Step 1.4: Hidden Bug Fix Detection**
This is explicitly labeled as a "fix", so nothing is hidden. It's a clear
correctness fix.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Single file changed: `fs/exfat/dir.c`
- 1 line added: `es->num_entries = num_entries;`
- Function modified: `exfat_init_ext_entry()`
- Scope: single-file, single-line surgical fix

**Step 2.2: Code Flow Change**
In `exfat_init_ext_entry` (lines 486-507):
- BEFORE: The function updates `file.num_ext`, stream entry, and name
  entries, then calls `exfat_update_dir_chksum(es)` which uses
  `es->num_entries` (which may be stale/larger).
- AFTER: The function first sets `es->num_entries = num_entries`,
  ensuring `exfat_update_dir_chksum` uses the correct count.

**Step 2.3: Bug Mechanism**
Category: **Logic/correctness fix** - stale state variable leading to
incorrect checksum computation.

The chain of events:
1. `exfat_rename_file()` calls `exfat_remove_entries(&old_es,
   ES_IDX_FIRST_FILENAME + 1)` which marks entries 3..old_num-1 as
   DELETED
2. `exfat_init_ext_entry(&old_es, num_new_entries, ...)` sets
   `file.num_ext = num_new_entries - 1` but doesn't update
   `es->num_entries`
3. `exfat_update_dir_chksum(es)` iterates `i = 0..es->num_entries-1` -
   this includes DELETED entries
4. Wrong checksum stored in file entry's `checksum` field
5. Written to disk via `exfat_put_dentry_set`
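
The effect of the stale count can be sketched with a small userspace model.
This is a simplified illustration, not the kernel implementation: the
`model_*` names, the fixed-size array, and the `with_fix` switch are
hypothetical. The rotate-and-add loop follows the entry-set checksum defined
in the exFAT specification (bytes 2 and 3 of the first entry, which hold the
checksum itself, are skipped).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_DENTRY_SIZE	32
#define MODEL_MAX_ENTRIES	8
#define MODEL_TYPE_INUSE	0x80	/* "in use" bit of the entry type byte */

/* Hypothetical, simplified stand-in for an exfat entry set cache. */
struct model_entry_set {
	unsigned int num_entries;
	uint8_t entries[MODEL_MAX_ENTRIES][MODEL_DENTRY_SIZE];
};

/* Build a set: file entry, stream entry, then file name entries. */
static void model_build(struct model_entry_set *es, unsigned int num_entries)
{
	unsigned int i;

	memset(es, 0, sizeof(*es));
	es->num_entries = num_entries;
	es->entries[0][0] = 0x85;			/* file entry */
	es->entries[0][1] = (uint8_t)(num_entries - 1);	/* secondary count */
	es->entries[1][0] = 0xC0;			/* stream extension entry */
	for (i = 2; i < num_entries; i++)
		es->entries[i][0] = 0xC1;		/* file name entry */
}

/* 16-bit rotate-right-and-add checksum over es->num_entries entries. */
static uint16_t model_chksum(const struct model_entry_set *es)
{
	const uint8_t *bytes = &es->entries[0][0];
	size_t len = (size_t)es->num_entries * MODEL_DENTRY_SIZE;
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (i == 2 || i == 3)	/* skip the stored checksum field */
			continue;
		sum = (uint16_t)(((sum & 1) ? 0x8000 : 0) + (sum >> 1) + bytes[i]);
	}
	return sum;
}

/* Mark entries [from, num_entries) as deleted by clearing the in-use bit. */
static void model_remove_entries(struct model_entry_set *es, unsigned int from)
{
	unsigned int i;

	for (i = from; i < es->num_entries; i++)
		es->entries[i][0] &= (uint8_t)~MODEL_TYPE_INUSE;
}

/* Update the secondary count, optionally sync num_entries, then checksum. */
static uint16_t model_init_ext_entry(struct model_entry_set *es,
				     unsigned int num_entries, int with_fix)
{
	assert(num_entries >= 1 && num_entries <= MODEL_MAX_ENTRIES);
	es->entries[0][1] = (uint8_t)(num_entries - 1);
	if (with_fix)
		es->num_entries = num_entries;	/* the one-line fix */
	return model_chksum(es);
}
```

Shrinking a four-entry set to three without the fix makes the checksum cover
the deleted fourth entry, so it differs from the checksum of a freshly built
three-entry set. With the fix the two values match.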

**Step 2.4: Fix Quality**
- Obviously correct: the function takes `num_entries` parameter and
  already uses it for loop bounds and `num_ext`; syncing
  `es->num_entries` is clearly the right thing.
- Minimal: 1 line.
- No regression risk: For all callers where `es->num_entries` already
  equals `num_entries`, this is a harmless no-op. Only the buggy rename-
  to-shorter path gets different behavior.

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
- `exfat_init_ext_entry` was created in `ca06197382bde0` (v5.7-rc1,
  Namjae Jeon, 2020-03-02) when exfat was first added.
- Converted to dentry cache in `d97e060673906d` (v6.9-rc1, Yuezhang Mo,
  2022-08-05).
- `exfat_update_dir_chksum(es)` added inside the function by
  `4d71455976891` (v6.9-rc1, Yuezhang Mo, 2022-08-05) - THIS is the
  commit that introduced the bug.

**Step 3.2: Bug Introduction**
The bug was introduced in commit `4d71455976891` ("exfat: remove unused
functions"), first in v6.9-rc1. Before this, `exfat_update_dir_chksum`
was called separately where the correct `num_entries` was used. After
this commit, the checksum computation moved into `exfat_init_ext_entry`
but relied on `es->num_entries` being correct, which isn't always the
case.

**Step 3.3: Affected Stable Trees**
- `4d71455976891` is in v6.12: **YES** (verified with `git merge-base
  --is-ancestor`)
- `4d71455976891` is in v6.6: **NO** (verified)
- `4d71455976891` is in v6.1: **NO** (verified)
- So only v6.12.y and later are affected.

**Step 3.4: Author Context**
Chi Zhiling has other exfat contributions (cache improvements). Yuezhang
Mo is the author of the original dentry cache conversion that
contributed to this bug, and reviewed this fix. The fix was applied by
Namjae Jeon, the exfat maintainer.

**Step 3.5: Dependencies**
None. The fix is self-contained - it adds one line to an existing
function. No prerequisites needed.

## PHASE 4: MAILING LIST RESEARCH

Lore.kernel.org is currently behind anti-bot protection, preventing
direct access. Unable to fetch mailing list discussion.

Record: Could not verify mailing list discussion due to lore access
restrictions.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Key Function**
`exfat_init_ext_entry()` is modified.

**Step 5.2: Callers**
Four call sites found:
1. `namei.c:512` - `exfat_add_entry()` (new file/dir creation) - `es` is
   freshly created, `num_entries` matches. Safe.
2. `namei.c:1057` - `exfat_rename_file()`, new entry path (rename to
   longer name) - `new_es` freshly created. Safe.
3. `namei.c:1073` - `exfat_rename_file()`, in-place path (rename to
   shorter name) - **THIS IS THE BUGGY CALLER**. `old_es.num_entries` is
   stale.
4. `namei.c:1117` - `exfat_move_file()` - `new_es` freshly created.
   Safe.

**Step 5.3: Callees**
`exfat_init_ext_entry` calls `exfat_update_dir_chksum(es)` which
iterates `es->num_entries` entries. This is where the wrong checksum is
computed.
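
The effect of the stale loop bound can be sketched in userspace C. This
is a simplified model, not the kernel code: `struct entry_set`,
`set_checksum()`, `shrink_and_checksum()` and `make_demo_set()` are
hypothetical names, and only the rotate-right-and-add checksum (skipping
bytes 2 and 3 of the first entry) and the clearing of the in-use bit on
removed entries are meant to mirror exfat behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DENTRY_SIZE 32

/* Hypothetical stand-in for struct exfat_entry_set_cache. */
struct entry_set {
	uint8_t raw[4 * DENTRY_SIZE];	/* cached directory entries */
	int num_entries;		/* loop bound used by the checksum */
};

/*
 * Entry-set checksum: rotate right by one bit, then add each byte,
 * skipping the checksum field itself (bytes 2 and 3 of the first entry).
 */
static uint16_t set_checksum(const struct entry_set *es)
{
	uint16_t sum = 0;

	for (int i = 0; i < es->num_entries * DENTRY_SIZE; i++) {
		if (i == 2 || i == 3)
			continue;
		sum = (uint16_t)((((uint32_t)sum << 15) | (sum >> 1)) +
				 es->raw[i]);
	}
	return sum;
}

/*
 * Model of the rename-to-shorter-name path: entries past new_count are
 * marked deleted, then the checksum is recomputed. With apply_fix == 0
 * the loop bound still covers the deleted entries.
 */
static uint16_t shrink_and_checksum(struct entry_set *es, int new_count,
				    int apply_fix)
{
	for (int i = new_count; i < es->num_entries; i++)
		es->raw[i * DENTRY_SIZE] &= 0x7f;	/* clear in-use bit */

	if (apply_fix)
		es->num_entries = new_count;	/* models the one-line fix */

	return set_checksum(es);
}

/* Four-entry set: file, stream extension, two file name entries. */
static struct entry_set make_demo_set(void)
{
	struct entry_set es;

	memset(&es, 0, sizeof(es));
	es.raw[0 * DENTRY_SIZE] = 0x85;
	es.raw[1 * DENTRY_SIZE] = 0xC0;
	es.raw[2 * DENTRY_SIZE] = 0xC1;
	es.raw[3 * DENTRY_SIZE] = 0xC1;
	es.num_entries = 4;
	return es;
}

static void self_check(void)
{
	struct entry_set stale = make_demo_set();
	struct entry_set fixed = make_demo_set();
	struct entry_set want = make_demo_set();

	want.num_entries = 3;	/* the set a reader sees after the rename */
	assert(shrink_and_checksum(&fixed, 3, 1) == set_checksum(&want));
	assert(shrink_and_checksum(&stale, 3, 0) != set_checksum(&want));
}
```

In this model the stale variant folds the deleted fourth entry into the
checksum, so it no longer matches what a reader computes over the three
live entries.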

**Step 5.4: Reachability**
The buggy path is reached via: `rename(2)` → `exfat_rename()` →
`__exfat_rename()` → `exfat_rename_file()` (else branch when
`old_es.num_entries >= num_new_entries`). This is triggered by any user
renaming a file to a shorter name on an exfat filesystem. **Directly
reachable from userspace.**

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1: Buggy Code in Stable**
The bug (commit `4d71455976891`) exists in v6.12.y but NOT in v6.6.y or
v6.1.y.

**Step 6.2: Backport Complications**
The patch is a single-line addition. The `exfat_init_ext_entry` function
exists with the same structure in all affected stable trees. Should
apply cleanly.

**Step 6.3: Related Fixes Already in Stable**
No related fixes found.

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1: Subsystem**
- Filesystem: exfat (`fs/exfat/`)
- Criticality: IMPORTANT. exfat is the standard filesystem for SDXC
  cards, USB drives >32GB, and cross-platform file exchange. Very widely
  used.

**Step 7.2: Activity**
Active subsystem with regular contributions from Samsung and Sony
engineers. Stable with well-maintained code.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1: Who Is Affected**
All users of exfat filesystems who rename files to shorter names. This
includes USB drive users, SD card users, and any system mounting exfat
volumes.

**Step 8.2: Trigger Conditions**
- Trigger: Renaming a file where the new name requires fewer directory
  entries (shorter name).
- Frequency: Common operation - users rename files regularly.
- Reachable from unprivileged user: Yes (any user with write access to
  the filesystem).

**Step 8.3: Failure Mode**
- Under normal operation: Transient incorrect checksum, corrected by
  inode writeback within ~30 seconds. Severity: LOW.
- Under crash (USB yank, power loss): On-disk checksum mismatch
  persists. Other operating systems (Windows, macOS) that validate exfat
  checksums may refuse to read the file. fsck.exfat tools will report
  corruption. Severity: MEDIUM.
- The Linux exfat driver does NOT validate checksums on read (confirmed
  by code review of `exfat_get_dentry_set`), so Linux itself would still
  read the entry, but cross-platform compatibility is compromised.

**Step 8.4: Risk-Benefit Ratio**
- BENEFIT: HIGH for crash resilience and cross-platform correctness.
  exfat is designed for removable media where surprise removal is
  common.
- RISK: VERY LOW. Single line, no-op for all callers except the buggy
  one, reviewed by two domain experts.

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: Evidence Summary**

FOR backporting:
- Fixes a real filesystem correctness bug (incorrect on-disk checksum)
- Single line fix, obviously correct, minimal risk
- Reviewed by Sungjong Seo (Samsung) and Yuezhang Mo (Sony) - the two
  primary exfat reviewers
- Applied by the subsystem maintainer (Namjae Jeon)
- Triggered by common user operation (rename) reachable from userspace
- exfat is widely used on removable media where crash/surprise removal
  is common
- Crash during the window leaves persistent checksum corruption visible
  to other operating systems

AGAINST backporting:
- Author states no persistent corruption under normal operation
  (writeback corrects it)
- Linux exfat driver doesn't validate checksums on read (so Linux users
  won't notice)
- Impact only manifests on crash during rename + subsequent read by
  another OS or fsck

**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? **YES** - single line, reviewed by two
   experts
2. Fixes a real bug? **YES** - incorrect checksum written to disk
3. Important issue? **YES** - filesystem data integrity (checksum
   corruption on crash)
4. Small and contained? **YES** - 1 line in 1 file
5. No new features or APIs? **YES** - pure bug fix
6. Can apply to stable? **YES** - should apply cleanly

**Step 9.3: Exception Categories**
Not applicable - this is a standard bug fix.

**Verification:**
- [Phase 1] Parsed tags: Reviewed-by from two key exfat contributors
  (Seo, Mo), applied by maintainer (Jeon)
- [Phase 2] Diff analysis: 1 line added (`es->num_entries =
  num_entries;`) in `exfat_init_ext_entry()`
- [Phase 3] git blame: `exfat_update_dir_chksum(es)` added to the
  function by commit `4d71455976891` (v6.9-rc1)
- [Phase 3] git describe --contains: bug commit `4d71455976891` first in
  v6.9-rc1
- [Phase 3] git merge-base: confirmed present in v6.12, NOT in v6.6 or
  v6.1
- [Phase 5] Grep for callers: 4 call sites, only `namei.c:1073` (rename-
  in-place to shorter name) is affected
- [Phase 5] Code review of `exfat_update_dir_chksum`: confirmed it uses
  `es->num_entries` for loop bound
- [Phase 5] Code review of `exfat_remove_entries`: confirmed it does NOT
  update `es->num_entries`
- [Phase 5] Code review of `exfat_get_dentry_set`: confirmed Linux
  driver doesn't validate checksums on read
- [Phase 6] Verified `exfat_rename_file` rename path at
  namei.c:1066-1075 shows the sequence: remove_entries then
  init_ext_entry
- [Phase 6] Verified `__exfat_write_inode` at inode.c:20-98 re-reads
  dentry set with correct count and recomputes checksum
- [Phase 8] Verified `exfat_rename()` at namei.c:1282-1285 calls
  `mark_inode_dirty(old_inode)` after rename, confirming eventual
  writeback
- UNVERIFIED: Could not access lore.kernel.org to read mailing list
  discussion (anti-bot protection)

The fix is a minimal, obviously correct single-line change that fixes a
real filesystem correctness bug (incorrect checksum written to disk
during rename-to-shorter-name). The incorrect checksum creates a crash-
vulnerability window where surprise removal or power loss can leave
persistent corruption visible to other operating systems and filesystem
checkers. The fix was reviewed by two exfat domain experts and applied
by the subsystem maintainer.

**YES**

 fs/exfat/dir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index e710dd196e2f0..2a4f6a131fbe7 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -490,6 +490,7 @@ void exfat_init_ext_entry(struct exfat_entry_set_cache *es, int num_entries,
 	unsigned short *uniname = p_uniname->name;
 	struct exfat_dentry *ep;

+	es->num_entries = num_entries;
 	ep = exfat_get_dentry_cached(es, ES_IDX_FILE);
 	ep->dentry.file.num_ext = (unsigned char)(num_entries - 1);

-- 
2.53.0

^ permalink raw reply related

* [PATCH AUTOSEL 6.18] arm64: dts: imx8mq: Set the correct gpu_ahb clock frequency
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Sebastian Krzyszkowiak, Frank Li, Peng Fan, Fabio Estevam,
	Sasha Levin, robh, krzk+dt, conor+dt, shawnguo, l.stach, agx,
	devicetree, imx, linux-arm-kernel, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>

[ Upstream commit 1f99b5d93d99ca17d50b386a674d0ce1f20932d8 ]

According to i.MX 8M Quad Reference Manual, GPU_AHB_CLK_ROOT's maximum
frequency is 400MHz.

Fixes: 45d2c84eb3a2 ("arm64: dts: imx8mq: add GPU node")
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 arch/arm64/boot/dts/freescale/imx8mq.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mq.dtsi b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
index 607962f807beb..6a25e219832ce 100644
--- a/arch/arm64/boot/dts/freescale/imx8mq.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mq.dtsi
@@ -1632,7 +1632,7 @@ gpu: gpu@38000000 {
 			                         <&clk IMX8MQ_GPU_PLL_OUT>,
 			                         <&clk IMX8MQ_GPU_PLL>;
 			assigned-clock-rates = <800000000>, <800000000>,
-			                       <800000000>, <800000000>, <0>;
+			                       <800000000>, <400000000>, <0>;
 			power-domains = <&pgc_gpu>;
 		};
 
-- 
2.53.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.18] xfrm_user: fix info leak in build_mapping()
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Greg Kroah-Hartman, Steffen Klassert, Herbert Xu, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Sasha Levin, martin, netdev, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 1beb76b2053b68c491b78370794b8ff63c8f8c02 ]

struct xfrm_usersa_id has a one-byte padding hole after the proto
field, which ends up never getting set to zero before copying out to
userspace.  Fix that up by zeroing out the whole structure before
setting individual variables.

Fixes: 3a2dfbe8acb1 ("xfrm: Notify changes in UDP encapsulation via netlink")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis
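
The padding-hole pattern described in the commit message can be
illustrated with a small userspace sketch. `struct usersa_id_model`,
`fill_id()` and `stale_tail_bytes()` are hypothetical names; the layout
only approximates the 23 bytes of fields plus one padding byte that the
message describes, and the exact padding depends on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mirror of the described layout: 16 + 4 + 2 + 1 = 23 bytes
 * of named fields, typically padded to 24 bytes by the compiler.
 */
struct usersa_id_model {
	uint8_t  daddr[16];
	uint32_t spi;
	uint16_t family;
	uint8_t  proto;
	/* compiler-inserted padding, if any, follows proto */
};

/* Fill the structure the way the patched code does: zero it first. */
static void fill_id(struct usersa_id_model *id, const uint8_t daddr[16],
		    uint32_t spi, uint16_t family, uint8_t proto)
{
	memset(id, 0, sizeof(*id));
	memcpy(id->daddr, daddr, sizeof(id->daddr));
	id->spi = spi;
	id->family = family;
	id->proto = proto;
}

/* Count bytes after the last named field that still hold `stale`. */
static size_t stale_tail_bytes(const struct usersa_id_model *id,
			       uint8_t stale)
{
	const unsigned char *raw = (const unsigned char *)id;
	size_t start = offsetof(struct usersa_id_model, proto) +
		       sizeof(id->proto);
	size_t n = 0;

	for (size_t i = start; i < sizeof(*id); i++)
		n += (raw[i] == stale);
	return n;
}

static void self_check(void)
{
	static const uint8_t daddr[16] = { 10, 0, 0, 1 };
	struct usersa_id_model id;

	memset(&id, 0xAA, sizeof(id));	/* simulate stale buffer contents */
	fill_id(&id, daddr, 0x1234, 2, 50);
	assert(stale_tail_bytes(&id, 0xAA) == 0);
}
```

Dropping the `memset()` in `fill_id()` would leave whatever was in the
buffer before in the trailing padding, which is the byte that would
otherwise be copied out to userspace.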

 net/xfrm/xfrm_user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 1ddcf2a1eff7a..b3f69c0760d4c 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -4164,6 +4164,7 @@ static int build_mapping(struct sk_buff *skb, struct xfrm_state *x,
 
 	um = nlmsg_data(nlh);
 
+	memset(&um->id, 0, sizeof(um->id));
 	memcpy(&um->id.daddr, &x->id.daddr, sizeof(um->id.daddr));
 	um->id.spi = x->id.spi;
 	um->id.family = x->props.family;
-- 
2.53.0


^ permalink raw reply related

* [PATCH AUTOSEL 7.0-6.18] cxl/region: Fix use-after-free from auto assembly failure
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Dan Williams, Ira Weiny, Alison Schofield, Dave Jiang,
	Sasha Levin, dave, jic23, vishal.l.verma, djbw, linux-cxl,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Dan Williams <dan.j.williams@intel.com>

[ Upstream commit 87805c32e6ad7b5ce2d9f7f47e76081857a4a335 ]

The following crash signature results from region destruction while an
endpoint decoder is staged, but not fully attached.

[ dj: Moved bus_find_device( to next line. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260327052821.440749-2-dan.j.williams@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

# Analysis: cxl/region: Fix use-after-free from auto assembly failure

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `cxl/region` (CXL memory region management)
- Action verb: "Fix" (explicit bug fix)
- Record: cxl/region Fix use-after-free bug triggered by region
  destruction after auto-assembly staging failure

**Step 1.2: Commit Tags (from original mbox found via b4)**
- Original mbox version (20260327052821.440749-2) contains:
  - `Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")` ←
    v6.3-rc1
  - `Cc: <stable@vger.kernel.org>` ← explicit stable nomination by
    author
  - `Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>`
- Reviewed-by: Ira Weiny, Alison Schofield, Dave Jiang (three maintainer
  reviewers)
- Signed-off-by: Dan Williams (author; CXL subsystem maintainer), Dave
  Jiang (committer)
- Link:
  patch.msgid.link/20260327052821.440749-2-dan.j.williams@intel.com
- Note: `[ dj: Moved bus_find_device( to next line. ]` - minor
  formatting adjustment at commit time
- Record: Author explicitly Cc'd stable, provides Fixes: tag, triple
  maintainer Reviewed-by

**Step 1.3: Commit Body**
- Candidate commit message is very short. Original mbox (before
  committer trimming) shows a full KASAN splat:
  ```
  BUG: KASAN: slab-use-after-free in __cxl_decoder_detach+0x724/0x830
  [cxl_core]
  Read of size 8 at addr ffff888265638840 by task modprobe/1287
  ... unregister_region+0x88/0x140 [cxl_core]
  ... devres_release_all+0x172/0x230
  ```
- The "staged" state is established by `cxl_region_attach_auto()` and
  finalized by `cxl_region_attach_position()`
- Memdev removal sees `cxled->cxld.region == NULL` (staged but not
  finalized) and falsely thinks decoder is unattached; later region
  removal finds stale pointer to freed endpoint decoder
- Record: Real bug, KASAN UAF, concrete crash, reachable via memdev
  unregister during autoassembly

**Step 1.4: Hidden Fix Detection**
- Not hidden - explicit "Fix use-after-free"
- Record: Explicit UAF fix, not disguised

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Files: `drivers/cxl/core/region.c` (+50), `drivers/cxl/cxl.h` (+4 -2)
- Functions modified: `cxl_rr_ep_add`, `cxl_region_attach_auto`,
  `__cxl_decoder_detach`
- New functions: `cxl_region_by_target`, `cxl_cancel_auto_attach`
- Scope: single-subsystem surgical fix
- Record: ~60 lines added in 2 files, contained in CXL core

**Step 2.2: Code Flow Changes**
- Before: `cxl_region_attach_auto()` places cxled into
  `p->targets[pos]`, increments `nr_targets`, but `cxld->region` remains
  NULL until `cxl_rr_ep_add()` runs later. If the auto-assembly fails
  (never reaches `cxl_rr_ep_add`), the stale pointer in `p->targets[]`
  persists.
- After: New intermediate state `CXL_DECODER_STATE_AUTO_STAGED` tracks
  the "attached to target array but not yet fully attached" window;
  `__cxl_decoder_detach` now cancels the staging when `cxlr == NULL`
- Record: Adds state tracking for the previously-untracked window
  between target-array placement and region attachment

**Step 2.3: Bug Mechanism**
- Category: (d) Memory safety / UAF fix + state machine gap
- Mechanism: Race between auto-assembly failure and memdev removal. When
  memdev is removed via `cxld_unregister()`, `cxl_decoder_detach(NULL,
  cxled, -1, DETACH_INVALIDATE)` is called. Path hits `cxlr =
  cxled->cxld.region` which is NULL for a staged-but-not-assembled
  decoder, returns NULL without removing the stale `p->targets[pos]`
  pointer. Later region destruction dereferences the freed cxled.
- Record: UAF in `__cxl_decoder_detach` call path from
  `unregister_region` -> iterates freed targets

**Step 2.4: Fix Quality**
- Surgical: introduces one new enum value, state transitions in 2
  places, one new cleanup helper, one new matcher
- No API changes, no locking changes, no hot-path changes
- Low regression risk: only affects auto-assembly path on failure
- Record: High-quality, well-contained fix

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
- `cxl_region_attach_auto()` and the `CXL_DECODER_STATE_AUTO` enum were
  introduced in the Fixes: target
- Record: Buggy code introduced in v6.3-rc1 via a32320b71f08

**Step 3.2: Follow Fixes: Tag**
- `git describe a32320b71f08 --contains` → `v6.3-rc1~89^2~6^2~7`
- Commit: "cxl/region: Add region autodiscovery" by Dan Williams, Feb
  2023
- Present in all stable trees from v6.3+: 6.6.y, 6.12.y, 6.15.y, 6.17.y
  (note: 6.1 predates the bug)
- Record: Bug exists in all stable trees from v6.3 onwards

**Step 3.3: File History**
- Recent changes relevant: `b3a88225519cf cxl/region: Consolidate
  cxl_decoder_kill_region() and cxl_region_detach()` (v6.17-rc1)
  refactored the two call sites into `__cxl_decoder_detach`;
  `d03fcf50ba56f cxl: Convert to ACQUIRE() for conditional rwsem
  locking` introduced new locking helpers
- Record: Code was refactored in v6.17; older stable trees (<6.17)
  use `cxl_region_detach()` with a similar `if (!cxlr) return 0;` pattern
  that has the same bug and would need an adapted backport

**Step 3.4: Author**
- Dan Williams is the CXL subsystem maintainer (originator of region
  autodiscovery); regular prolific contributor to drivers/cxl/
- Record: Subsystem maintainer authoring the fix → high trust

**Step 3.5: Dependencies**
- Fix uses `bus_find_device(&cxl_bus_type, ...)` - available since CXL
  bus exists
- Uses `__free(put_device)` scope-based cleanup - present in 6.6+
- No explicit prerequisites; part of a 9-patch series but patches 2-9
  are test/dax_hmem work unrelated to this fix
- Record: This patch (1/9) is self-contained; subsequent patches don't
  depend on it

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1: b4 dig / Lore Discussion**
- `b4 am` at
  patch.msgid.link/20260327052821.440749-2-dan.j.williams@intel.com
  fetched the full 9-patch thread
- This is the only revision (no v1/v2 indicated in cover letter)
- Cover letter states: "One use-after-free has been there since the
  original automatic region assembly code."
- Record: Single revision, clean review history, author explicitly flags
  UAF age

**Step 4.2: Reviewers**
- Ira Weiny, Alison Schofield, Dave Jiang - all CXL maintainers (DKIM-
  verified intel.com sign-offs)
- All three provided Reviewed-by on this patch
- Record: Thoroughly reviewed by core CXL maintainers

**Step 4.3: Bug Report**
- Bug was discovered by the author while writing test code (series 8/9:
  "Simulate auto-assembly failure"). Series 9/9 adds a test that
  exercises this path.
- Record: Discovered via new test harness; reproducible and tested in
  tree

**Step 4.4: Related Patches**
- 9-patch series: patch 1/9 (this) is a standalone UAF fix; remaining
  patches refactor dax_hmem and add tests
- No dependencies between this patch and 2-9
- Record: Standalone fix, no series dependencies

**Step 4.5: Stable Mailing List**
- Cc: stable@vger.kernel.org was present in original mbox posting
- Record: Explicitly nominated for stable by author

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Key Functions**
- Modified: `cxl_rr_ep_add`, `cxl_region_attach_auto`,
  `__cxl_decoder_detach`
- Added: `cxl_region_by_target`, `cxl_cancel_auto_attach`
- Record: 3 modified, 2 new helpers

**Step 5.2: Callers**
- `cxl_region_attach_auto` is called from `cxl_region_attach` during
  region creation
- `__cxl_decoder_detach` is called from `cxl_decoder_detach`, which is
  called from `cxld_unregister()` (on endpoint decoder device removal)
  and `detach_target()` (sysfs detach)
- `cxld_unregister` is registered via `devm_add_action_or_reset` in
  `cxl_decoder_autoremove` - fires on device/driver removal
- Record: Reachable via module unload, memdev hot-unplug, and sysfs-
  driven detach

**Step 5.3: Callees**
- `cxl_cancel_auto_attach` uses `bus_find_device` (existing API) with a
  simple matcher
- Record: Uses existing, well-established kernel APIs

**Step 5.4: Call Chain Reachability**
- modprobe / rmmod cxl_test / rmmod cxl_mem → memdev removal →
  cxld_unregister → cxl_decoder_detach → __cxl_decoder_detach → UAF
- Production scenarios: CXL hot-unplug, module unload during
  autoassembly, memdev probe failure during multi-decoder region
  assembly
- Record: Reachable from module-unload paths; triggerable on real
  hardware

**Step 5.5: Similar Patterns**
- The `state != CXL_DECODER_STATE_AUTO` guard in
  `cxl_region_attach_auto()` (line 1779) checks for the simpler two-
  state enum; adding a staged state does not regress this check because
  the staged->auto transition is managed internally
- Record: No parallel instances needing the same fix

## PHASE 6: CROSS-REFERENCING AND STABLE TREE ANALYSIS

**Step 6.1: Code in Stable Trees**
- `CXL_DECODER_STATE_AUTO` enum exists in v6.3 onwards (confirmed by
  checking v6.1 → missing, v6.3 → present)
- `cxl_region_attach_auto()` exists in v6.3 onwards
- The buggy `if (!cxlr) return 0;` (or `return NULL;`) pattern exists in
  v6.6, v6.12, v6.15 equivalents (verified by reading v6.6 and v6.12
  tags)
- Record: Bug exists in v6.3, v6.6, v6.12, v6.15, v6.17, v7.0 trees

**Step 6.2: Backport Complications**
- v6.17+: `__cxl_decoder_detach` exists with same structure → should
  apply cleanly or with minor offsets
- Pre-v6.17 (6.6, 6.12, 6.15): function was named `cxl_region_detach`
  and called directly from `cxl_decoder_kill_region` +
  `cxld_unregister`; fix would need adaptation - inserting
  `cxl_cancel_auto_attach(cxled)` before the `return 0` in
  `cxl_region_detach`
- `__free(put_device)` scope cleanup: relies on cleanup.h, which was
  introduced in v6.5, so it is not available in trees older than that
- Record: Clean apply on 6.17+/7.0; adapted backport needed for 6.6-6.15

**Step 6.3: Related Fixes in Stable**
- `101c268bd2f37 cxl/port: Fix use-after-free, permit out-of-order
  decoder shutdown` (v6.12-rc6) - different UAF, already backported
- `b3a88225519cf cxl/region: Consolidate...` (v6.17-rc1) - refactor, not
  a fix
- Record: No duplicate fix already in stable

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem Criticality**
- drivers/cxl/ - CXL (Compute Express Link) memory subsystem
- Used for CXL memory devices, increasingly common in server/datacenter
  deployments
- Bug triggers during module unload or memdev removal - important for
  operability
- Record: IMPORTANT (growing datacenter usage; data-tier memory path)

**Step 7.2: Activity**
- Very actively developed subsystem (~140 commits to region.c since
  v6.6)
- Record: Active subsystem; fix is current

## PHASE 8: IMPACT AND RISK

**Step 8.1: Affected Users**
- Users of CXL memory devices whose auto-assembly fails (e.g., firmware-
  programmed decoders that can't fully assemble, partial hardware
  configurations, module unload races)
- Record: CXL hardware users; scope grows as CXL adoption grows

**Step 8.2: Trigger Conditions**
- Memdev removed while at least one endpoint decoder is in staged-but-
  not-completed state
- Reproducible via cxl_test with `fail_autoassemble` module option
  (added in patch 8/9)
- Production trigger: module reload during partial assembly; hardware
  hotplug during assembly
- Record: Realistic trigger; concrete reproducer provided in same series

**Step 8.3: Failure Mode**
- With KASAN: slab-use-after-free report
- Without KASAN: silent memory corruption or crash in
  `__cxl_decoder_detach`
- Severity: CRITICAL (UAF with clear path to crash)
- Record: CRITICAL - memory safety violation

**Step 8.4: Risk/Benefit**
- Benefit: HIGH - eliminates real UAF in CXL subsystem
- Risk: LOW - adds new state, doesn't change successful path; all
  transitions are bounded
- Ratio: Strong positive
- Record: Clear net benefit

## PHASE 9: SYNTHESIS

**Step 9.1: Evidence Compilation**
- FOR: UAF with KASAN trace, Fixes: tag → v6.3 (affects all modern
  stable trees), explicit Cc: stable by author, triple maintainer
  Reviewed-by, author is subsystem maintainer, concrete reproducer in
  same series, contained ~60-line fix, no new userspace API
- AGAINST: Some adaptation needed for pre-v6.17 stable trees (function
  renamed), patch is very new (not in mainline yet - currently in linux-
  next)
- Record: FOR evidence overwhelming

**Step 9.2: Stable Rules Check**
1. Obviously correct: YES (state transitions are bounded and reviewed)
2. Real bug: YES (KASAN-confirmed UAF)
3. Important: YES (CRITICAL - UAF, potential crash/corruption)
4. Small/contained: YES (2 files, ~60 lines)
5. No new features/APIs: YES (internal state enum addition only)
6. Applies cleanly: Mostly - clean on v6.17+/v7.0, needs adaptation for
   6.6-6.15

**Step 9.3: Exception Categories**
- Not a device-ID-add or quirk; standalone UAF fix

**Step 9.4: Decision**
- Clear YES. Real UAF, author-nominated for stable, well-reviewed,
  contained scope.

## Verification

- [Phase 1] Read original mbox via `b4 am` at
  `/tmp/20260326_dan_j_williams_dax_hmem_add_tests_for_the_dax_hmem_takeover_capability.mbx`:
  confirmed `Fixes: a32320b71f08`, `Cc: <stable@vger.kernel.org>`, KASAN
  splat, three Reviewed-by from CXL maintainers
- [Phase 2] Read `drivers/cxl/core/region.c` 1040-1070, 1780-1810,
  2150-2220 and `drivers/cxl/cxl.h` 360-405 to validate code flow
- [Phase 3] `git show a32320b71f08 --stat`: confirmed introduction
  commit is "cxl/region: Add region autodiscovery" by Dan Williams, Feb
  2023
- [Phase 3] `git describe a32320b71f08 --contains` →
  `v6.3-rc1~89^2~6^2~7`: bug present since v6.3
- [Phase 3] `git show v6.1:drivers/cxl/cxl.h | grep cxl_decoder_state`:
  empty (enum didn't exist before v6.3)
- [Phase 3] `git show v6.3:drivers/cxl/cxl.h`: confirmed enum exists in
  v6.3
- [Phase 3] `git log --author="Dan Williams"` in drivers/cxl/: confirmed
  Dan Williams as subsystem maintainer
- [Phase 3] `git log --grep="cxl_decoder_detach"`: confirmed
  consolidation in `b3a88225519cf` (v6.17-rc1)
- [Phase 4] `b4 am https://patch.msgid.link/...`: fetched 9-patch
  series, confirmed triple DKIM-verified Reviewed-by
- [Phase 4] Cover letter read: confirmed "One use-after-free has been
  there since the original automatic region assembly code"
- [Phase 4] `git log linux-next/master --grep="use-after-free from auto
  assembly"`: commit `87805c32e6ad7` present in linux-next but not
  mainline yet
- [Phase 5] `grep -n CXL_DECODER_STATE` in drivers/cxl: identified all
  usage sites
- [Phase 5] Read `drivers/cxl/core/port.c` around line 2190: confirmed
  `cxld_unregister` calls `cxl_decoder_detach(NULL, cxled, -1,
  DETACH_INVALIDATE)`, matching the UAF trigger path
- [Phase 6] `git show v6.6:drivers/cxl/core/region.c` and `v6.12`:
  confirmed `cxl_region_detach()` has same `if (!cxlr) return 0;` bug
- [Phase 8] KASAN stack trace in original mbox shows
  `__cxl_decoder_detach+0x724 ... unregister_region+0x88 ...
  devres_release_all+0x172` - concrete reachability
- UNVERIFIED: Whether backport adaptation for pre-6.17 stable trees will
  be straightforward or require substantial rework beyond renaming
  `__cxl_decoder_detach` → `cxl_region_detach`

**Summary**

This is a genuine, well-reviewed use-after-free fix with a KASAN-
confirmed crash signature, originating from the CXL subsystem
maintainer. The bug has existed since v6.3 when region autodiscovery was
introduced, affects all current stable trees, and the author explicitly
Cc'd stable. The fix is small, contained, and introduces only an
internal enum value plus a cleanup helper. Reviewed by three CXL
maintainers. Pre-v6.17 stable trees will need minor contextual
adaptation due to the `__cxl_decoder_detach` refactor, but the
underlying logic is directly transferable.

**YES**

 drivers/cxl/core/region.c | 54 ++++++++++++++++++++++++++++++++++++++-
 drivers/cxl/cxl.h         |  6 +++--
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 373551022a2b3..1e97443535167 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1063,6 +1063,14 @@ static int cxl_rr_ep_add(struct cxl_region_ref *cxl_rr,
 
 	if (!cxld->region) {
 		cxld->region = cxlr;
+
+		/*
+		 * Now that cxld->region is set the intermediate staging state
+		 * can be cleared.
+		 */
+		if (cxld == &cxled->cxld &&
+		    cxled->state == CXL_DECODER_STATE_AUTO_STAGED)
+			cxled->state = CXL_DECODER_STATE_AUTO;
 		get_device(&cxlr->dev);
 	}
 
@@ -1804,6 +1812,7 @@ static int cxl_region_attach_auto(struct cxl_region *cxlr,
 	pos = p->nr_targets;
 	p->targets[pos] = cxled;
 	cxled->pos = pos;
+	cxled->state = CXL_DECODER_STATE_AUTO_STAGED;
 	p->nr_targets++;
 
 	return 0;
@@ -2153,6 +2162,47 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	return 0;
 }
 
+static int cxl_region_by_target(struct device *dev, const void *data)
+{
+	const struct cxl_endpoint_decoder *cxled = data;
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+
+	if (!is_cxl_region(dev))
+		return 0;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+	return p->targets[cxled->pos] == cxled;
+}
+
+/*
+ * When an auto-region fails to assemble the decoder may be listed as a target,
+ * but not fully attached.
+ */
+static void cxl_cancel_auto_attach(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_region_params *p;
+	struct cxl_region *cxlr;
+	int pos = cxled->pos;
+
+	if (cxled->state != CXL_DECODER_STATE_AUTO_STAGED)
+		return;
+
+	struct device *dev __free(put_device) =
+		bus_find_device(&cxl_bus_type, NULL, cxled, cxl_region_by_target);
+	if (!dev)
+		return;
+
+	cxlr = to_cxl_region(dev);
+	p = &cxlr->params;
+
+	p->nr_targets--;
+	cxled->state = CXL_DECODER_STATE_AUTO;
+	cxled->pos = -1;
+	p->targets[pos] = NULL;
+}
+
 static struct cxl_region *
 __cxl_decoder_detach(struct cxl_region *cxlr,
 		     struct cxl_endpoint_decoder *cxled, int pos,
@@ -2176,8 +2226,10 @@ __cxl_decoder_detach(struct cxl_region *cxlr,
 		cxled = p->targets[pos];
 	} else {
 		cxlr = cxled->cxld.region;
-		if (!cxlr)
+		if (!cxlr) {
+			cxl_cancel_auto_attach(cxled);
 			return NULL;
+		}
 		p = &cxlr->params;
 	}
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 9b947286eb9b0..30a31968f2663 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -378,12 +378,14 @@ struct cxl_decoder {
 };
 
 /*
- * Track whether this decoder is reserved for region autodiscovery, or
- * free for userspace provisioning.
+ * Track whether this decoder is free for userspace provisioning, reserved for
+ * region autodiscovery, whether it is started connecting (awaiting other
+ * peers), or has completed auto assembly.
  */
 enum cxl_decoder_state {
 	CXL_DECODER_STATE_MANUAL,
 	CXL_DECODER_STATE_AUTO,
+	CXL_DECODER_STATE_AUTO_STAGED,
 };
 
 /**
-- 
2.53.0


^ permalink raw reply related

* [PATCH AUTOSEL 7.0-5.10] fbdev: omap2: fix inconsistent lock returns in omapfb_mmap
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Hongling Zeng, kernel test robot, Helge Deller, Sasha Levin,
	linux-omap, linux-fbdev, dri-devel, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Hongling Zeng <zenghongling@kylinos.cn>

[ Upstream commit 98cf7df6e0844f7076df1db690c1ede9d69b61ff ]

Fix the warning about inconsistent returns for '&rg->lock' in
omapfb_mmap() function. The warning arises because the error path
uses 'ofbi->region' while the normal path uses 'rg'.

smatch warnings:
drivers/video/fbdev/omap2/omapfb/omapfb-main.c:1126 omapfb_mmap()
warn: inconsistent returns '&rg->lock'.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Line 1122 in v6.6 also has `omapfb_put_mem_region(ofbi->region)` on the
error path, confirming the bug exists in stable trees.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `fbdev: omap2`
- Action verb: "fix"
- Summary: Fix inconsistent lock returns in `omapfb_mmap`

**Step 1.2: Tags**
- Reported-by: kernel test robot `<lkp@intel.com>` — automated static
  analysis (smatch)
- Signed-off-by: Hongling Zeng (author), Helge Deller (fbdev maintainer)
- No Fixes: tag (expected)
- No Cc: stable (expected)

**Step 1.3: Commit Body**
- Bug: smatch warns about inconsistent returns for `&rg->lock` because
  the error path uses `ofbi->region` while the success path uses `rg`
- Symptom: smatch warning (not a runtime crash report)
- Root cause: The error path re-reads `ofbi->region` instead of using
  the captured local `rg`

**Step 1.4: Hidden Bug Fix?**
Yes. While described as a "warning fix," this is actually a real locking
correctness bug, as I'll demonstrate below.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Single file: `drivers/video/fbdev/omap2/omapfb/omapfb-main.c`
- 1 line changed: `ofbi->region` → `rg`
- Function: `omapfb_mmap`
- Scope: single-file surgical fix

**Step 2.2: Code Flow Change**
- BEFORE: Error path calls `omapfb_put_mem_region(ofbi->region)`, which
  re-reads the `ofbi->region` pointer
- AFTER: Error path calls `omapfb_put_mem_region(rg)`, which uses the
  locally captured pointer

**Step 2.3: Bug Mechanism**
This is a **synchronization/lock correctness** bug. Key details:

1. `omapfb_get_mem_region()` calls `down_read_nested(&rg->lock)` to take
   the read lock and returns its argument (lines 183-188 of omapfb.h)
2. At line 1100: `rg = omapfb_get_mem_region(ofbi->region)` acquires the
   read lock and stores the pointer locally
3. Success path (line 1119) correctly releases via `rg`
4. Error path (line 1124, the bug) releases via `ofbi->region`

Critically, `ofbi->region` **can be changed** by another thread — in
`omapfb-ioctl.c` line 98: `ofbi->region = new_rg` during
`omapfb_setup_plane()`. If this happens between get and put:
- `up_read()` is called on a semaphore **not held** by this thread →
  undefined behavior / corruption
- The **actual** locked semaphore is **never released** → deadlock
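
The race described in Step 2.3 can be modelled in plain C. The sketch
below is a single-threaded illustration, not driver code: `struct
mem_region`, `struct fb_priv`, `get_region()`, `put_region()` and
`mmap_error_path()` are hypothetical stand-ins, and an integer counter
stands in for the read-side holders of the rw_semaphore. The `swap_to`
argument simulates the ioctl replacing `ofbi->region` between get and
put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. `readers` models the
 * number of read-side holders of the region's rw_semaphore. */
struct mem_region { int readers; };
struct fb_priv { struct mem_region *region; };

/* Models omapfb_get_mem_region(): take the read lock, return the argument. */
static struct mem_region *get_region(struct mem_region *rg)
{
	assert(rg != NULL);
	rg->readers++;
	return rg;
}

/* Models omapfb_put_mem_region(): drop the read lock. */
static void put_region(struct mem_region *rg)
{
	assert(rg != NULL);
	rg->readers--;
}

/* Error path of an mmap-like function. A non-NULL `swap_to` models a
 * concurrent ioctl replacing ofbi->region between get and put. */
static void mmap_error_path(struct fb_priv *ofbi, struct mem_region *swap_to,
			    int use_fix)
{
	struct mem_region *rg = get_region(ofbi->region);

	if (swap_to)
		ofbi->region = swap_to;

	if (use_fix)
		put_region(rg);            /* fixed: release what was locked */
	else
		put_region(ofbi->region);  /* buggy: re-reads the pointer */
}
```

In this model, a swap with `use_fix == 0` leaves the originally locked
region with a holder count of 1 (never released) and drives the new
region to -1 (released without being held). With `use_fix == 1` both
counts end at 0.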

**Step 2.4: Fix Quality**
- Obviously correct: use the already-captured local variable
- Minimal: 1-line change
- Zero regression risk: the fix is strictly safer than the original code
- Pattern matches `omapfb-sysfs.c` line 73, which correctly uses `rg` on
  its error path

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
The buggy line was introduced in commit `3ed37d9aba486d` ("Revert
'OMAPFB: simplify locking'") by Tomi Valkeinen on 2012-12-13. This code
has been present since ~v3.8, meaning all active stable trees contain
it.

**Step 3.2: Fixes tag**
No Fixes: tag present. However, the buggy commit is `3ed37d9aba486d`
which reverted simplified locking and reintroduced per-region locking.
The error path was incorrectly written using `ofbi->region` instead of
`rg` at that time.

**Step 3.3: File History**
The file hasn't had many recent changes — last meaningful changes were
build system/boilerplate updates. No prerequisites needed.

**Step 3.4: Author**
Hongling Zeng is not the subsystem maintainer but has contributed other
small fixes (USB quirks, sysfs fixes). The commit was signed off by
Helge Deller, the fbdev maintainer.

**Step 3.5: Dependencies**
None. This is a standalone one-line fix.

## PHASE 4: MAILING LIST RESEARCH

**Step 4.1-4.2:** b4 dig could not find the original submission. Lore is
protected by anti-scraping measures. The commit carries a Signed-off-by
from the fbdev maintainer (Helge Deller), which indicates maintainer
acceptance but does not by itself document the review discussion.

**Step 4.3:** The bug was reported by kernel test robot (smatch static
analysis), not a runtime bug report.

**Step 4.4-4.5:** No related series; standalone patch.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1-5.2:** The function `omapfb_mmap` is registered as the
`.fb_mmap` callback in the framebuffer ops structure, called when
userspace mmaps the framebuffer device (`/dev/fb*`). This is a standard
userspace-reachable path.

**Step 5.3:** `omapfb_get_mem_region` → `down_read_nested` (acquires
rw_semaphore read lock). `omapfb_put_mem_region` → `up_read` (releases
read lock). These must operate on the same object.

**Step 5.4:** Reachable from userspace via `mmap()` on `/dev/fbX`. The
error path triggers when `vm_iomap_memory()` fails.

**Step 5.5:** In `omapfb-sysfs.c:59-73`, the identical pattern (`rg =
omapfb_get_mem_region(ofbi->region)` followed by
`omapfb_put_mem_region(rg)`) is used correctly. The bug in `omapfb_mmap`
is the sole instance of the incorrect pattern.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1:** The buggy code exists in v6.6 stable tree (verified: line
1122 has `omapfb_put_mem_region(ofbi->region)`). Present since v3.8
(~2012). All active stable trees are affected.

**Step 6.2:** The fix is a trivial 1-line change. Will apply cleanly to
all stable trees.

**Step 6.3:** No related fixes already in stable.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1:** Subsystem: `drivers/video/fbdev/omap2` — OMAP2 framebuffer
driver. Criticality: PERIPHERAL (legacy ARM platform, but real users
exist in embedded systems).

**Step 7.2:** Low activity — the file hasn't changed meaningfully in
years. Mature/stable code.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1:** Affected users: users of OMAP2 SoC framebuffer
(embedded/ARM platforms).

**Step 8.2:** Trigger conditions: Requires concurrent `mmap()` and
region-changing ioctl on the same framebuffer, plus `vm_iomap_memory()`
failure. The race window is narrow, making this unlikely to trigger in
practice.

**Step 8.3:** Failure mode: If triggered, results in **lock state
corruption** (releasing wrong lock) and potential **deadlock** (held
lock never released). Severity: HIGH (deadlock), though likelihood is
LOW.

**Step 8.4:** Risk-benefit ratio:
- BENEFIT: Fixes a genuine lock correctness bug reachable from
  userspace, prevents potential deadlock
- RISK: Essentially zero — replacing one expression with an equivalent
  one in the normal case, and a correct one in the race case. 1-line
  change. Obviously correct.

## PHASE 9: FINAL SYNTHESIS

**Step 9.1: Evidence Summary**

FOR backporting:
- Fixes a real locking correctness bug (inconsistent lock/unlock
  objects)
- `ofbi->region` can be changed concurrently via ioctl (verified in
  `omapfb-ioctl.c:98`)
- 1-line fix, trivially correct, zero regression risk
- Matches the correct pattern used elsewhere in the same codebase
  (`omapfb-sysfs.c`)
- Accepted by fbdev maintainer (Helge Deller SOB)
- Bug exists in all stable trees (since v3.8, ~2012)
- Clean apply expected

AGAINST backporting:
- OMAP2 fbdev is legacy peripheral hardware
- Race window is extremely narrow and unlikely to trigger in practice
- Only detected by static analysis, not runtime
- Low real-world impact

**Step 9.2: Stable Rules Checklist**
1. Obviously correct and tested? **YES** — trivially verifiable
2. Fixes a real bug? **YES** — inconsistent lock/unlock on potentially
   different objects
3. Important issue? **MEDIUM** — potential deadlock, but unlikely
   trigger
4. Small and contained? **YES** — 1-line change
5. No new features? **YES**
6. Can apply to stable? **YES** — clean apply

**Step 9.3: Exception Categories**
Not an exception category, but meets standard criteria as a small bug
fix.

**Step 9.4: Decision**
The fix is a 1-line change that is obviously correct, with zero
regression risk. It fixes a real (if unlikely to trigger) locking
correctness bug that could lead to deadlock. The cost of inclusion is
negligible, and it makes the code genuinely correct.

## Verification

- [Phase 1] Parsed tags: Reported-by kernel test robot, SOB from fbdev
  maintainer Helge Deller
- [Phase 2] Diff analysis: 1 line changed in error path of
  `omapfb_mmap()`, replacing `ofbi->region` with `rg`
- [Phase 2] Verified `omapfb_get_mem_region()` takes the read lock via
  `down_read_nested` and returns its argument (omapfb.h:183-188)
- [Phase 2] Verified `ofbi->region` can be changed concurrently
  (omapfb-ioctl.c:98: `ofbi->region = new_rg`)
- [Phase 3] git blame: buggy line introduced by commit 3ed37d9aba486d
  (2012-12-13, "Revert 'OMAPFB: simplify locking'"), present since ~v3.8
- [Phase 3] File history: no prerequisites needed, standalone fix
- [Phase 4] b4 dig: could not find original submission thread
- [Phase 5] Correct pattern exists in omapfb-sysfs.c:59-73 (uses `rg`
  not `ofbi->region`)
- [Phase 6] Verified buggy code exists in v6.6 stable tree (line 1122)
- [Phase 6] Fix will apply cleanly (1-line change, no surrounding churn)
- [Phase 8] Failure mode: lock corruption + potential deadlock (severity
  HIGH, likelihood LOW)

**YES**

 drivers/video/fbdev/omap2/omapfb/omapfb-main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/omap2/omapfb/omapfb-main.c b/drivers/video/fbdev/omap2/omapfb/omapfb-main.c
index a8b2930290e1f..d70deb6a91508 100644
--- a/drivers/video/fbdev/omap2/omapfb/omapfb-main.c
+++ b/drivers/video/fbdev/omap2/omapfb/omapfb-main.c
@@ -1121,7 +1121,7 @@ static int omapfb_mmap(struct fb_info *fbi, struct vm_area_struct *vma)
 	return 0;

 error:
-	omapfb_put_mem_region(ofbi->region);
+	omapfb_put_mem_region(rg);

 	return r;
 }
-- 
2.53.0

* [PATCH AUTOSEL 6.18] can: mcp251x: add error handling for power enable in open and resume
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Wenyuan Li, Marc Kleine-Budde, Sasha Levin, mailhol, linux-can,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Wenyuan Li <2063309626@qq.com>

[ Upstream commit 7a57354756c7df223abe2c33774235ad70cb4231 ]

Add missing error handling for mcp251x_power_enable() calls in both
mcp251x_open() and mcp251x_can_resume() functions.

In mcp251x_open(), if power enable fails, jump to error path to close
candev without attempting to disable power again.

In mcp251x_can_resume(), properly check return values of power enable calls
for both power and transceiver regulators. If any fails, return the error
code to the PM framework and log the failure.

This ensures the driver properly handles power control failures and
maintains correct device state.

Signed-off-by: Wenyuan Li <2063309626@qq.com>
Link: https://patch.msgid.link/tencent_F3EFC5D7738AC548857B91657715E2D3AA06@qq.com
[mkl: fix patch description]
[mkl: mcp251x_can_resume(): replace goto by return]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 drivers/net/can/spi/mcp251x.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index b46262e791301..5a7aa02092c7e 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1225,7 +1225,11 @@ static int mcp251x_open(struct net_device *net)
 	}
 
 	mutex_lock(&priv->mcp_lock);
-	mcp251x_power_enable(priv->transceiver, 1);
+	ret = mcp251x_power_enable(priv->transceiver, 1);
+	if (ret) {
+		dev_err(&spi->dev, "failed to enable transceiver power: %pe\n", ERR_PTR(ret));
+		goto out_close_candev;
+	}
 
 	priv->force_quit = 0;
 	priv->tx_skb = NULL;
@@ -1272,6 +1276,7 @@ static int mcp251x_open(struct net_device *net)
 	mcp251x_hw_sleep(spi);
 out_close:
 	mcp251x_power_enable(priv->transceiver, 0);
+out_close_candev:
 	close_candev(net);
 	mutex_unlock(&priv->mcp_lock);
 	if (release_irq)
@@ -1508,11 +1513,25 @@ static int __maybe_unused mcp251x_can_resume(struct device *dev)
 {
 	struct spi_device *spi = to_spi_device(dev);
 	struct mcp251x_priv *priv = spi_get_drvdata(spi);
+	int ret = 0;
 
-	if (priv->after_suspend & AFTER_SUSPEND_POWER)
-		mcp251x_power_enable(priv->power, 1);
-	if (priv->after_suspend & AFTER_SUSPEND_UP)
-		mcp251x_power_enable(priv->transceiver, 1);
+	if (priv->after_suspend & AFTER_SUSPEND_POWER) {
+		ret = mcp251x_power_enable(priv->power, 1);
+		if (ret) {
+			dev_err(dev, "failed to restore power: %pe\n", ERR_PTR(ret));
+			return ret;
+		}
+	}
+
+	if (priv->after_suspend & AFTER_SUSPEND_UP) {
+		ret = mcp251x_power_enable(priv->transceiver, 1);
+		if (ret) {
+			dev_err(dev, "failed to restore transceiver power: %pe\n", ERR_PTR(ret));
+			if (priv->after_suspend & AFTER_SUSPEND_POWER)
+				mcp251x_power_enable(priv->power, 0);
+			return ret;
+		}
+	}
 
 	if (priv->after_suspend & (AFTER_SUSPEND_POWER | AFTER_SUSPEND_UP))
 		queue_work(priv->wq, &priv->restart_work);
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] wifi: mt76: mt7996: fix frequency separation for station STR mode
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Peter Chiu, Lorenzo Bianconi, Felix Fietkau, Sasha Levin,
	ryder.lee, matthias.bgg, angelogioacchino.delregno,
	linux-wireless, linux-kernel, linux-arm-kernel, linux-mediatek
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Peter Chiu <chui-hao.chiu@mediatek.com>

[ Upstream commit 59a295335021f6973a34566554b2b9371f1c6f7d ]

Fix frequency separation field for STR in MLD capabilities to get the
correct chip capability.

Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260316-mt7996-sta-str-v1-1-666814e6ab2d@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem**: `wifi: mt76: mt7996:` — MediaTek mt7996 WiFi 7 wireless
  driver
- **Action verb**: "fix" — explicitly a bug fix
- **Summary**: Corrects frequency separation field for station STR mode
  in MLD capabilities

### Step 1.2: Tags
- `Signed-off-by: Peter Chiu <chui-hao.chiu@mediatek.com>` — author from
  MediaTek (the chip vendor)
- `Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>`
  (co-author/maintainer of mt76)
- `Link:` to patch.msgid.link — mailing list submission link
- `Signed-off-by: Felix Fietkau <nbd@nbd.name>` — mt76 maintainer
  applied the patch
- No Fixes: tag, no Cc: stable, no Reported-by — expected for candidate
  review

### Step 1.3: Commit Body
The message says: "Fix frequency separation field for STR in MLD
capabilities to get the correct chip capability." The bug is that the
station's MLD capabilities struct was missing the `FREQ_SEP_TYPE_IND`
field, causing the driver to advertise incorrect STR capabilities during
(Re)Association Request frames.

### Step 1.4: Hidden Bug Fix?
This is an explicit fix — the word "fix" is in the subject. The missing
capability field causes incorrect WiFi frame content to be advertised to
the AP during MLD association.

Record: Genuine bug fix — incorrect WiFi capability advertisement.

---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory
- **1 file** changed: `drivers/net/wireless/mediatek/mt76/mt7996/init.c`
- **1 line added**:
  `FIELD_PREP_CONST(IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND, 1) |`
- Scope: Single-file, static initializer change

### Step 2.2: Code Flow
- **Before**: Station iftype entry in `iftypes_ext_capa[]` only sets
  `IEEE80211_MLD_CAP_OP_MAX_SIMUL_LINKS`
- **After**: Station entry additionally sets
  `IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND` to value 1
- The `mld_capa_and_ops` field is consumed by mac80211 in `mlme.c`
  (lines 2069 and 10612-10613) and included directly in MLD capability
  elements of association frames

### Step 2.3: Bug Mechanism
This is a **logic/correctness fix** — a missing capability field in a
static const initializer. `IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND` (mask
`0x0f80`, bits 7-11) was not set, meaning the station reported frequency
separation type = 0 to the AP, which does not reflect the actual mt7996
hardware STR capability.
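
As a rough illustration of the bit arithmetic, the sketch below
recomputes the station `mld_capa_and_ops` value with and without the
added field. Only the `0x0f80` mask comes from the analysis above. The
`0x000f` mask for the max-simultaneous-links field and a radio count of
3 are assumptions made for this example, and `field_prep()` is a
userspace approximation of the kernel's `FIELD_PREP_CONST()` that relies
on the GCC/Clang builtin `__builtin_ctz`.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value quoted in the analysis above. */
#define MLD_CAP_OP_FREQ_SEP_TYPE_IND 0x0f80u
/* Assumed values for illustration only, not taken from this patch. */
#define MLD_CAP_OP_MAX_SIMUL_LINKS   0x000fu
#define EXAMPLE_MAX_RADIOS           3u

/* Userspace approximation of FIELD_PREP(): shift the value up to the
 * lowest set bit of the mask, then clip it to the mask. */
static uint16_t field_prep(uint16_t mask, uint16_t val)
{
	assert(mask != 0);
	return (uint16_t)(((uint32_t)val << __builtin_ctz(mask)) & mask);
}

/* Station capability word before the patch: only the link count is set. */
static uint16_t sta_mld_capa_before(void)
{
	return field_prep(MLD_CAP_OP_MAX_SIMUL_LINKS, EXAMPLE_MAX_RADIOS - 1);
}

/* After the patch: frequency separation type indication is set to 1. */
static uint16_t sta_mld_capa_after(void)
{
	return field_prep(MLD_CAP_OP_FREQ_SEP_TYPE_IND, 1) |
	       field_prep(MLD_CAP_OP_MAX_SIMUL_LINKS, EXAMPLE_MAX_RADIOS - 1);
}
```

Under these assumptions the word changes from `0x0002` to `0x0082`: the
only difference is bit 7, the lowest bit of the `0x0f80` field.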

### Step 2.4: Fix Quality
- Trivially correct — adding a missing field to a bitfield OR expression
- Cannot introduce a regression — it's a static const initializer
- Minimal and surgical
- No red flags

Record: 1 line added, static initializer, zero regression risk.

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
The station entry was introduced by commit `f5160304d57c55` ("wifi:
mt76: mt7996: Enable MLO support for client interfaces") by Lorenzo
Bianconi on 2025-09-01. The base `iftypes_ext_capa[]` array was
introduced by `a9eae65d97f3cb` ("Export MLO AP capabilities to
mac80211") on 2025-08-27.

### Step 3.2: Fixes Target
No explicit Fixes: tag, but the implicit target is `f5160304d57c55`
which first introduced the station MLD capabilities. That commit was
first present in **v6.18** (confirmed via
`git merge-base --is-ancestor`). NOT in v6.17 or earlier.

### Step 3.3: File History
The file has 20 recent commits, many related to MLO/MLD enablement. This
fix appears standalone — no series dependencies.

### Step 3.4: Author
- Peter Chiu is from MediaTek (the silicon vendor) — strong authority on
  hardware capabilities
- Lorenzo Bianconi is the mt76 co-maintainer who also signed off
- Felix Fietkau (nbd) is the mt76 maintainer who applied the patch

Record: Author is from the chip vendor. Both mt76 maintainers signed
off. Bug introduced in v6.18.

### Step 3.5: Dependencies
The fix only adds a line to an existing OR expression in a static
initializer. No functional dependencies. The prerequisite code
(`iftypes_ext_capa` with the station entry) exists in v6.18 and v7.0.

---

## PHASE 4: MAILING LIST

### Step 4.1-4.5
Lore/WebFetch was blocked by Anubis protection. `b4 dig` for the
message-id didn't find a match by commit hash (the commit hasn't been
applied to the tree I'm on). The link
`https://patch.msgid.link/20260316-mt7996-sta-str-v1-1-666814e6ab2d@kernel.org`
indicates this was a v1 single-patch submission from Lorenzo Bianconi.

Record: Could not fetch discussion. Patch submitted as v1, single patch.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1-5.4
The `iftypes_ext_capa[]` array is assigned to `wiphy->iftype_ext_capab`
at line 497 of `init.c`. This is consumed by mac80211's `mlme.c` via
`cfg80211_get_iftype_ext_capa()` — the `mld_capa_and_ops` field is
directly encoded into (Re)Association Request frames and MLD
reconfiguration frames. This is a hot path for any MLD station
association.

### Step 5.5: Similar Patterns
The AP section of the same array does NOT include `FREQ_SEP_TYPE_IND`
either, but only the station section is fixed here (STR is a
station-side mode). The mt7925 driver also sets `mld_capa_and_ops` but
dynamically.

Record: Capability is directly embedded in WiFi management frames during
association.

---

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Does the buggy code exist in stable trees?
- The buggy commit `f5160304d57c55` is in **v6.18 and v7.0 only**
- NOT in v6.17 or earlier — so this fix is irrelevant for all current
  LTS trees (6.12.y, 6.6.y, 6.1.y, 5.15.y)
- Only relevant for **7.0.y** stable and potentially 6.18.y if that is
  still maintained

### Step 6.2: Backport Complications
- 1 line addition to a static initializer — will apply cleanly to 7.0.y
- No conflicting changes expected

### Step 6.3: No related fixes already in stable

---

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1
- **Subsystem**: WiFi drivers / MediaTek mt76 / mt7996
- **Criticality**: PERIPHERAL (specific WiFi hardware) but WiFi is
  important for many users

### Step 7.2
- Very active subsystem — 188 mt7996 commits since v6.12
- mt7996 is a WiFi 7 chip with active MLO development

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Affected Users
- mt7996 WiFi 7 hardware users attempting MLD (Multi-Link Device)
  operation in station mode

### Step 8.2: Trigger Conditions
- Triggered whenever an mt7996 station performs MLD association (common
  for WiFi 7 users)
- The wrong capability is always advertised

### Step 8.3: Failure Mode
- Incorrect WiFi capability in association frames → potentially
  incorrect STR mode negotiation, possible performance degradation or
  failed MLD operation
- Severity: **MEDIUM** — functional correctness issue, not a crash or
  security vulnerability

### Step 8.4: Risk-Benefit
- **Benefit**: Medium — corrects WiFi MLD capability for mt7996 users
- **Risk**: Very low — 1-line static initializer change, cannot regress
- **Ratio**: Favorable

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence
**FOR backporting:**
- Genuine correctness fix (missing capability field)
- Trivially small (1 line) and obviously correct
- Zero regression risk (static const initializer)
- Written by chip vendor (MediaTek), signed by both mt76 maintainers
- Fixes real functional issue for WiFi 7 MLD users
- Buggy code exists in 7.0

**AGAINST backporting:**
- Not a crash, security, or data corruption fix
- Only affects mt7996 MLD station mode users (relatively narrow)
- Relatively new code (v6.18+)

### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **Yes** — trivially correct 1-line
   addition
2. Fixes a real bug? **Yes** — incorrect WiFi capability advertisement
3. Important issue? **Medium** — functional correctness for WiFi MLD
4. Small and contained? **Yes** — 1 line in 1 file
5. No new features? **Correct** — fixes existing code
6. Can apply to stable? **Yes** — clean apply expected

### Step 9.3: Exception Categories
Not an exception category; this is a standard driver bug fix.

### Step 9.4: Decision
The fix is a genuine 1-line correctness fix that corrects incorrect WiFi
MLD capability advertisement for mt7996 hardware. It's written by the
chip vendor, reviewed by both subsystem maintainers, and has zero
regression risk. While it's not a crash or security fix, it corrects
real WiFi functionality for mt7996 users in MLD/STR mode. The extremely
small scope and zero risk make this appropriate for stable.

---

## Verification

- [Phase 1] Parsed tags: Author from MediaTek, co-signed by mt76
  co-maintainer and maintainer
- [Phase 2] Diff analysis: 1 line added to static const initializer,
  adds missing `IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND` field
- [Phase 3] git blame: Station MLD caps introduced by `f5160304d57c55`
  (v6.18), confirmed via `git merge-base`
- [Phase 3] git merge-base: Buggy commit NOT in v6.17 or earlier, IS in
  v6.18 and v7.0
- [Phase 5] grep callers: `mld_capa_and_ops` consumed in
  `net/mac80211/mlme.c` lines 2069 and 10612-10613 for association
  frames
- [Phase 5] grep definition: `IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND` =
  0x0f80 in `include/linux/ieee80211-eht.h`
- [Phase 6] Buggy code only in v6.18+ and v7.0; irrelevant for older LTS
  trees
- [Phase 7] Peter Chiu confirmed as MediaTek contributor; Lorenzo
  Bianconi and Felix Fietkau are mt76 maintainers
- UNVERIFIED: Could not access lore.kernel.org discussion due to Anubis
  protection

**YES**

 drivers/net/wireless/mediatek/mt76/mt7996/init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/init.c b/drivers/net/wireless/mediatek/mt76/mt7996/init.c
index 00a8286bd1368..d940ca84bd0f1 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/init.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/init.c
@@ -85,6 +85,7 @@ static const struct wiphy_iftype_ext_capab iftypes_ext_capa[] = {
 		.extended_capabilities_mask = if_types_ext_capa_ap,
 		.extended_capabilities_len = sizeof(if_types_ext_capa_ap),
 		.mld_capa_and_ops =
+			FIELD_PREP_CONST(IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND, 1) |
 			FIELD_PREP_CONST(IEEE80211_MLD_CAP_OP_MAX_SIMUL_LINKS,
 					 MT7996_MAX_RADIOS - 1),
 	},
-- 
2.53.0

* [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Maciej Fijalkowski, Björn Töpel, Jakub Kicinski,
	Sasha Levin, magnus.karlsson, davem, edumazet, pabeni, ast,
	netdev, bpf, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

[ Upstream commit 36ee60b569ba0dfb6f961333b90d19ab5b323fa9 ]

AF_XDP bind currently accepts zero-copy pool configurations without
verifying that the device MTU fits into the usable frame space provided
by the UMEM chunk.

This becomes a problem since we started to respect tailroom which is
subtracted from chunk_size (among with headroom). 2k chunk size might
not provide enough space for standard 1500 MTU, so let us catch such
settings at bind time. Furthermore, validate whether underlying HW will
be able to satisfy configured MTU wrt XSK's frame size multiplied by
supported Rx buffer chain length (that is exposed via
net_device::xdp_zc_max_segs).

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-5-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis
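
The bind-time check described in the commit message can be sketched in
userspace C as follows. This is an illustration under stated
assumptions, not the kernel code: `mtu_fits()` and `align_down()` are
hypothetical helpers, `rx_frame_size` is assumed to be the chunk size
minus headroom, the tailroom is passed in as a parameter because its
real value depends on the kernel configuration, and the underflow guard
is an addition of this sketch. The Ethernet constants are the standard
values (14, 4 and 4 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_HLEN     14u
#define VLAN_HLEN     4u
#define ETH_FCS_LEN   4u
/* L2 overhead on top of the MTU: header, two VLAN tags, FCS = 26 bytes. */
#define ETH_PAD_LEN  (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)

/* Round x down to a multiple of a; a must be a power of two. */
static uint32_t align_down(uint32_t x, uint32_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Returns true when an MTU-sized frame fits into `segs` buffers.
 * Without multi-buffer (mbuf == false) only one buffer is usable. */
static bool mtu_fits(uint32_t mtu, uint32_t rx_frame_size, uint32_t tailroom,
		     uint32_t segs, bool mbuf)
{
	uint32_t needed = mtu + ETH_PAD_LEN;
	uint32_t frame_size;

	if (!mbuf)
		segs = 1;
	if (rx_frame_size <= tailroom)	/* guard added for this sketch */
		return false;

	frame_size = align_down(rx_frame_size - tailroom, 128);
	return needed <= frame_size * segs;
}
```

With hypothetical numbers (2048-byte chunk, 256 bytes of headroom, 320
bytes of tailroom), the usable frame is 1408 bytes after alignment, so a
1500-byte MTU (1526 bytes needed) does not fit in a single buffer, while
two chained buffers or a 4096-byte chunk do.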

 net/xdp/xsk_buff_pool.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 677c7d00f8c32..a129ce6f1c25f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -10,6 +10,8 @@
 #include "xdp_umem.h"
 #include "xsk.h"
 
+#define ETH_PAD_LEN (ETH_HLEN + 2 * VLAN_HLEN  + ETH_FCS_LEN)
+
 void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs)
 {
 	unsigned long flags;
@@ -165,8 +167,12 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
 int xp_assign_dev(struct xsk_buff_pool *pool,
 		  struct net_device *netdev, u16 queue_id, u16 flags)
 {
+	u32 needed = netdev->mtu + ETH_PAD_LEN;
+	u32 segs = netdev->xdp_zc_max_segs;
+	bool mbuf = flags & XDP_USE_SG;
 	bool force_zc, force_copy;
 	struct netdev_bpf bpf;
+	u32 frame_size;
 	int err = 0;
 
 	ASSERT_RTNL();
@@ -186,7 +192,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 	if (err)
 		return err;
 
-	if (flags & XDP_USE_SG)
+	if (mbuf)
 		pool->umem->flags |= XDP_UMEM_SG_FLAG;
 
 	if (flags & XDP_USE_NEED_WAKEUP)
@@ -208,8 +214,24 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 		goto err_unreg_pool;
 	}
 
-	if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) {
-		err = -EOPNOTSUPP;
+	if (mbuf) {
+		if (segs == 1) {
+			err = -EOPNOTSUPP;
+			goto err_unreg_pool;
+		}
+	} else {
+		segs = 1;
+	}
+
+	/* open-code xsk_pool_get_rx_frame_size() as pool->dev is not
+	 * set yet at this point; we are before getting down to driver
+	 */
+	frame_size = __xsk_pool_get_rx_frame_size(pool) -
+		     xsk_pool_get_tailroom(mbuf);
+	frame_size = ALIGN_DOWN(frame_size, 128);
+
+	if (needed > frame_size * segs) {
+		err = -EINVAL;
 		goto err_unreg_pool;
 	}
 
-- 
2.53.0


* [PATCH AUTOSEL 7.0-6.18] wifi: ath12k: Skip adding inactive partner vdev info
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Avula Sri Charan, Roopni Devanathan, Rameshkumar Sundaram,
	Baochen Qiang, Jeff Johnson, Sasha Levin, jjohnson,
	linux-wireless, ath12k, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Avula Sri Charan <quic_asrichar@quicinc.com>

[ Upstream commit 7d7dc26f72abb7a76abb4a68ebad75d5ab7b375e ]

Currently, a vdev that is created is considered active for partner link
population. In case of an MLD station, non-associated link vdevs can be
created but not started. Yet, they are added as partner links. This leads
to the creation of stale FW partner entries which accumulate and cause
assertions.

To resolve this issue, check if the vdev is started and operating on a
chosen frequency, i.e., arvif->is_started, instead of checking if the vdev
is created, i.e., arvif->is_created. This determines if the vdev is active
or not and skips adding it as a partner link if it's inactive.

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01181-QCAHKSWPL_SILICONZ-1

Signed-off-by: Avula Sri Charan <quic_asrichar@quicinc.com>
Signed-off-by: Roopni Devanathan <roopni.devanathan@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260330040732.1847263-1-roopni.devanathan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

---

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1:** Subsystem: `wifi: ath12k:`. Action verb: "Skip" (which
implies avoiding incorrect behavior). Summary: Skip adding inactive
partner vdev info to prevent firmware assertions.

**Step 1.2:** Tags:
- Tested-on: QCN9274 hw2.0 PCI (hardware-tested)
- Signed-off-by: Avula Sri Charan (author), Roopni Devanathan
  (submitter), Jeff Johnson (ath12k maintainer)
- Reviewed-by: Rameshkumar Sundaram, Baochen Qiang (both active ath12k
  contributors)
- Link: patch.msgid.link URL for the submission
- No Fixes: tag (expected for review candidates)
- No Cc: stable (expected for review candidates)

**Step 1.3:** Bug: When MLD station mode has non-associated link vdevs
that are created but not started, they are incorrectly added as partner
links. This leads to stale FW partner entries that accumulate and
**cause firmware assertions** (FW crash). Root cause: the check used
`is_created` but should use `is_started` to ensure only active vdevs are
added.

**Step 1.4:** This is clearly a bug fix despite not using the word "fix"
prominently. "Skip adding" = avoiding incorrect behavior that causes
firmware crashes.

## PHASE 2: DIFF ANALYSIS

**Step 2.1:** Single file changed:
`drivers/net/wireless/ath/ath12k/mac.c`. Change is 1 line: `is_created`
-> `is_started`. Minimal scope.

**Step 2.2:** In `ath12k_mac_mlo_get_vdev_args()`, when iterating
partner link vdevs:
- BEFORE: Check `arvif_p->is_created` (vdev allocated in firmware)
- AFTER: Check `arvif_p->is_started` (vdev started and operating on
  frequency)
- This is a more restrictive check that filters out vdevs that exist but
  are not active.

**Step 2.3:** Bug category: Logic/correctness fix. A
created-but-not-started vdev should not be treated as an active partner,
because it has no channel context yet. Adding it causes stale FW partner
entries -> firmware assertion.
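
The effect of switching the filter can be sketched with a reduced model.
`struct link_vif` and `count_partners()` below are hypothetical and only
carry the two flags named in the commit message; the real function fills
in per-partner arguments for firmware rather than counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a link vdev. */
struct link_vif {
	bool is_created;	/* vdev exists in firmware */
	bool is_started;	/* vdev is started on a channel */
};

/* Count the partner links that would be reported for link `self`.
 * use_fix == false models the old is_created filter,
 * use_fix == true models the new is_started filter. */
static size_t count_partners(const struct link_vif *links, size_t n,
			     size_t self, bool use_fix)
{
	size_t i, count = 0;

	assert(links != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (i == self)
			continue;	/* a link is not its own partner */
		if (use_fix ? !links[i].is_started : !links[i].is_created)
			continue;	/* filtered out as inactive */
		count++;
	}
	return count;
}
```

For three links where the third is created but not started, the old
filter reports two partners for link 0 and the new filter reports one.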

**Step 2.4:** Fix quality: Obviously correct. `is_started` is a subset
of `is_created` (a vdev must be created before it can be started), so
this is strictly more restrictive. The fix cannot introduce regressions
because any vdev that is started is also created. The fix is reviewed by
two ath12k developers and tested on real hardware.

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1:** Blame shows the buggy line (`is_created` check) was
introduced by commit `315d80be304ac1` ("wifi: ath12k: allocate new links
in change_vif_links()") by Aditya Kumar Singh. This commit appeared at
v6.13-rc6, landing in v6.15.

**Step 3.2:** The original function `ath12k_mac_mlo_get_vdev_args()` was
introduced by `1ea0cdee6fb3a4` ("wifi: ath12k: MLO vdev bringup
changes") at v6.12-rc4, landing in v6.14. The `is_created` check was an
addition on top in v6.15.

**Step 3.3:** The fix is standalone. No other patches are needed as
prerequisites.

**Step 3.4:** Avula Sri Charan has one other commit in ath12k (napi
fix). Roopni Devanathan has multiple ath12k contributions. Reviewers
(Rameshkumar Sundaram, Baochen Qiang) are active ath12k contributors.

**Step 3.5:** No dependent commits needed. The fix only changes one
condition.

## PHASE 4: MAILING LIST RESEARCH

**Step 4.1:** b4 dig could not find the commit (it's not yet merged).
The submission URL is
`https://patch.msgid.link/20260330040732.1847263-1-roopni.devanathan@oss.qualcomm.com`.
Lore is behind Anubis protection,
but we can confirm from the commit tags that it was reviewed by two
developers and accepted by the subsystem maintainer Jeff Johnson.

**Step 4.2:** Two reviewers (Rameshkumar Sundaram, Baochen Qiang)
reviewed the patch. Jeff Johnson (ath12k maintainer) signed off.

**Step 4.3-4.5:** Bug report details not available via web due to Anubis
protection. The commit message itself describes the bug mechanism
clearly.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1:** Modified function: `ath12k_mac_mlo_get_vdev_args()`

**Step 5.2:** Called from `ath12k_mac_vdev_start_restart()` (line
11210), which is a key function in the vdev start path. This is called
during channel context assignment (common MLO WiFi operation).

**Step 5.3-5.4:** The function populates partner link info that gets
sent to firmware via `ath12k_wmi_vdev_start()`. Incorrect partner
entries lead to firmware state corruption and assertion failures.

**Step 5.5:** The `is_started` flag is well-established in the codebase
with clear semantics: set when vdev starts operating, cleared when it
stops.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1:** The buggy code (`is_created` check in
`ath12k_mac_mlo_get_vdev_args()`) was introduced in v6.15 (commit
`315d80be304ac1`). It exists in:
- v7.0 (confirmed: `git merge-base --is-ancestor` = IN v7.0)
- v6.15+ (confirmed)
- NOT in v6.14 or earlier (MLO function is different or doesn't have the
  check)

**Step 6.2:** The fix is a single-line change. It will apply cleanly to
the 7.0 stable tree since the surrounding code is identical.

**Step 6.3:** No related fixes already in stable for this specific
issue.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1:** Subsystem: WiFi/ath12k (wireless driver). Criticality:
IMPORTANT - ath12k supports Qualcomm WiFi 7 hardware (QCN9274, WCN7850)
used in modern systems.

**Step 7.2:** Very active subsystem - 232 commits to this file since the
MLO function was introduced.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1:** Affected users: ath12k WiFi users in MLD/MLO station mode.
This is a growing user base as WiFi 7 hardware becomes more common.

**Step 8.2:** Trigger: MLD station connects with multiple links, non-
associated link vdevs are created but not started, then a vdev start
occurs. This is a normal MLO operation path, not an edge case.

**Step 8.3:** Failure mode: Firmware assertion (crash). Severity:
**CRITICAL** - the WiFi firmware crashes, requiring recovery.

**Step 8.4:**
- Benefit: HIGH - prevents firmware crashes during normal MLO operation
- Risk: VERY LOW - 1 line change, strictly more restrictive condition,
  cannot regress
- Ratio: Strongly favorable

## PHASE 9: FINAL SYNTHESIS

**Evidence FOR backporting:**
- Fixes firmware assertion (crash) - critical severity
- Single-line change, obviously correct
- `is_started` is a strict subset of `is_created` - no regression
  possible
- Reviewed by 2 developers, tested on real hardware
- Affects normal MLO WiFi operation path
- Clean apply to 7.0 stable

**Evidence AGAINST backporting:**
- Affects only MLO (WiFi 7) users with ath12k hardware (limited
  audience)
- Relatively new code (v6.15+)
- No explicit Fixes: tag or Cc: stable

**Stable rules checklist:**
1. Obviously correct and tested? **YES** - 1-line change, more
   restrictive check, HW-tested
2. Fixes a real bug? **YES** - firmware assertions/crashes
3. Important issue? **YES** - firmware crash
4. Small and contained? **YES** - 1 line in 1 file
5. No new features? **YES** - purely a fix
6. Can apply to stable? **YES** - clean apply

## Verification

- [Phase 1] Parsed tags: Tested-on: QCN9274, Reviewed-by: 2 developers,
  Signed-off-by: Jeff Johnson (maintainer)
- [Phase 2] Diff analysis: 1 line changed in
  `ath12k_mac_mlo_get_vdev_args()`, `is_created` -> `is_started`
- [Phase 3] git blame: buggy `is_created` check introduced in
  315d80be304ac1 (v6.15), added by Aditya Kumar Singh
- [Phase 3] git describe: 315d80be304ac1 = v6.13-rc6-1322, landed in
  v6.15
- [Phase 3] git merge-base: 315d80be304ac1 IS in v7.0, NOT in v6.14
- [Phase 3] git merge-base: 1ea0cdee6fb3a4 (original function) IS in
  v6.14, NOT in v6.13
- [Phase 5] Grep callers: `ath12k_mac_mlo_get_vdev_args()` called from
  `ath12k_mac_vdev_start_restart()` at line 11210
- [Phase 5] Grep `is_started`: set to true in vdev start path (lines
  11637, 12106, 12118), false in unassign path (lines 12168, 12179)
- [Phase 6] Buggy code exists in v6.15+ and v7.0 stable trees only
- [Phase 7] Active subsystem: 232 commits since MLO function
  introduction
- [Phase 8] Failure mode: firmware assertion = CRITICAL
- UNVERIFIED: Could not access lore.kernel.org discussion due to Anubis
  protection

The fix is a minimal, obviously correct, single-line change that
prevents firmware crashes during normal MLO WiFi operation. It meets all
stable kernel criteria.

**YES**

 drivers/net/wireless/ath/ath12k/mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 769d240e3ae24..6fca4418ab0bd 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -11122,7 +11122,7 @@ ath12k_mac_mlo_get_vdev_args(struct ath12k_link_vif *arvif,
 		if (arvif == arvif_p)
 			continue;

-		if (!arvif_p->is_created)
+		if (!arvif_p->is_started)
 			continue;

 		link_conf = wiphy_dereference(ahvif->ah->hw->wiphy,
-- 
2.53.0

^ permalink raw reply related

* [PATCH AUTOSEL 7.0-6.18] drm/amdgpu: fix some more bug in amdgpu_gem_va_ioctl
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Christian König, Sunil Khatri, Alex Deucher, Sasha Levin,
	airlied, simona, amd-gfx, dri-devel, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Christian König <christian.koenig@amd.com>

[ Upstream commit 68bd4f6b8310f309eb63b41e15088690c9cec0a9 ]

Some illegal combination of input flags were not checked and we need to
take the PDEs into account when returning the fence as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem**: drm/amdgpu
- **Action verb**: "fix" (explicit bug fix)
- **Summary**: Fix multiple bugs in the `amdgpu_gem_va_ioctl` function
  related to input flag validation and fence handling for PDE updates.

Record: [drm/amdgpu] [fix] [Multiple bugs in gem VA ioctl: illegal flag
combinations and missing PDE fence]

### Step 1.2: Tags
- **Signed-off-by**: Christian König (author, AMD DRM subsystem co-
  maintainer)
- **Acked-by**: Sunil Khatri (AMD developer)
- **Signed-off-by**: Alex Deucher (AMD DRM maintainer, committer)
- No Fixes: tag, no Reported-by, no Link: tag

Record: Author is Christian König, one of the primary amdgpu/drm
maintainers. Acked by AMD colleague, committed by the AMD DRM
maintainer.

### Step 1.3: Commit Body
The body says: "Some illegal combination of input flags were not checked
and we need to take the PDEs into account when returning the fence as
well."

Two distinct bugs identified:
1. Missing input validation for contradictory flag combinations
2. Missing PDE (Page Directory Entry) fence in the returned fence to
   userspace

Record: [Bug 1: invalid flag combinations not rejected] [Bug 2: PDE
updates missing from returned fence, could cause premature GPU memory
access] [No version info given] [Root cause: incomplete validation and
incomplete fence merging]

### Step 1.4: Hidden Bug Fix Detection
This is an explicit "fix" commit, not disguised as cleanup.

Record: This is explicitly labeled as a bug fix. No hidden intent.

---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory
- **File**: `drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c` (1 file)
- **Added**: `#include <linux/dma-fence-unwrap.h>` (1 line)
- **Functions modified**: `amdgpu_gem_va_update_vm()`,
  `amdgpu_gem_va_ioctl()`
- **Scope**: the diffstat reports 34 insertions and 42 deletions; most
  of the churn is in `amdgpu_gem_va_update_vm`, with a smaller change in
  `amdgpu_gem_va_ioctl`

Record: [1 file, 34 insertions / 42 deletions, 2 functions modified]
[Single-file contained fix]

### Step 2.2: Code Flow Changes

**Hunk 1 - `amdgpu_gem_va_update_vm` - VM-not-ready path**:
- Before: `fence = dma_fence_get(vm->last_update)` then if not ready,
  return that fence
- After: If not ready, return `dma_fence_get_stub()` immediately
- Effect: Cleaner early return; stub fence is sufficient when VM isn't
  ready

**Hunk 2 - `amdgpu_vm_clear_freed` argument**:
- Before: `amdgpu_vm_clear_freed(adev, vm, &fence)` (local variable)
- After: `amdgpu_vm_clear_freed(adev, vm, &vm->last_update)` (VM state
  directly)
- Effect: `vm->last_update` is kept current after clearing freed
  mappings, so subsequent `amdgpu_vm_update_pdes` properly syncs

**Hunk 3 - Fence return logic**:
- Before: Switch/case returning either `vm->last_update` or
  `bo_va->last_pt_update` (but NOT both)
- After: For non-always-valid MAP/REPLACE, merges both `vm->last_update`
  and `bo_va->last_pt_update` using `dma_fence_unwrap_merge()`; includes
  OOM fallback; for other cases returns `vm->last_update`
- Effect: Returned fence now accounts for both PTE and PDE updates

**Hunk 4 - Error path**:
- Before: Falls through from normal path to error label, always returns
  local fence
- After: Normal path returns fence via explicit `return`; error path
  returns `dma_fence_get(vm->last_update)`
- Effect: Cleaner separation of normal and error paths

**Hunk 5 - `amdgpu_gem_va_ioctl` - flag validation**:
- Added check: `AMDGPU_VM_DELAY_UPDATE && vm_timeline_syncobj_out`
  returns -EINVAL
- Effect: Rejects contradictory flags (delay + immediate fence request)

**Hunk 6 - `amdgpu_gem_va_ioctl` - update condition**:
- Before: `!adev->debug_vm`
- After: `(!adev->debug_vm || timeline_syncobj)`
- Effect: When timeline syncobj is requested, update happens even in
  debug mode

Record: [6 distinct hunks, all fixing correctness issues]
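
The two `amdgpu_gem_va_ioctl` conditions from Hunks 5 and 6 can be
written out as plain predicates. This is a minimal user-space sketch,
not the driver code: `MODEL_VM_DELAY_UPDATE` is a hypothetical stand-in
bit (the real flag value lives in the amdgpu UAPI header), and the
function names are invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in bit; not the real AMDGPU_VM_DELAY_UPDATE value. */
#define MODEL_VM_DELAY_UPDATE	(1u << 0)

/* Hunk 5: a delayed update cannot also hand back a fence via the syncobj. */
static int validate_va_flags(uint32_t flags, uint32_t timeline_syncobj_out)
{
	if ((flags & MODEL_VM_DELAY_UPDATE) && timeline_syncobj_out)
		return -EINVAL;
	return 0;
}

/* Hunk 6: decide whether the ioctl updates the VM right away. */
static bool should_update_now(int r, uint32_t flags, bool debug_vm,
			      bool have_timeline_syncobj)
{
	return !r && !(flags & MODEL_VM_DELAY_UPDATE) &&
	       (!debug_vm || have_timeline_syncobj);
}
```

In this model, `debug_vm` alone suppresses the immediate update, but a
requested timeline syncobj overrides it, which matches the condition
added by the patch.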

### Step 2.3: Bug Mechanism
This is a **logic/correctness fix** with two aspects:
1. **Missing fence merge**: `amdgpu_vm_update_pdes()` stores its fence
   into `vm->last_update` (verified at `amdgpu_vm.c:1006`). For non-
   always-valid BOs on MAP/REPLACE, the old code returned only
   `bo_va->last_pt_update`, missing the PDE fence. Userspace could start
   using the mapping before PDE updates complete.
2. **Input validation gap**: DELAY_UPDATE + syncobj_out is contradictory
   and wasn't rejected.

Record: [Logic/correctness fix] [Missing PDE fence could cause premature
GPU memory access; missing input validation for contradictory flags]
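
Why returning only `bo_va->last_pt_update` is insufficient can be shown
with a toy model in which a fence is reduced to the time its work
completes. This is a sketch under that assumption only; it says nothing
about how `dma_fence_unwrap_merge()` is implemented, and
`struct model_fence` and both helpers are hypothetical.

```c
#include <stdint.h>

/* Toy fence: just the time at which the work it tracks completes. */
struct model_fence {
	uint64_t done_at;
};

/* Old behaviour for non-always-valid MAP/REPLACE: only the PTE fence. */
static struct model_fence old_returned_fence(struct model_fence pde,
					     struct model_fence pte)
{
	(void)pde;	/* the PDE update is ignored, which is the bug */
	return pte;
}

/* New behaviour: a merged fence signals once both updates are done. */
static struct model_fence merged_fence(struct model_fence pde,
				       struct model_fence pte)
{
	struct model_fence out = {
		.done_at = pde.done_at > pte.done_at ? pde.done_at
						     : pte.done_at,
	};
	return out;
}
```

If the PDE update finishes later than the PTE update, the old fence
signals too early in this model, while the merged fence waits for the
later of the two.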

### Step 2.4: Fix Quality
- The fix uses `dma_fence_unwrap_merge()` which is the correct API for
  merging fences
- OOM fallback with `dma_fence_wait` + `dma_fence_get_stub()` is
  reasonable
- The flag validation check is trivially correct
- The debug_vm condition change is obviously correct
- Low regression risk: the fence merge is strictly more conservative
  (waits for more work)

Record: [Fix is well-designed with proper fallback] [Low regression risk
- waits for MORE work, not less]

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
The code being modified was primarily introduced by:
- `efdc66fe12b07` (2026-01-09) - "Refactor amdgpu_gem_va_ioctl v7"
- `bd8150a1b3370` (2025-12-11) - "Refactor amdgpu_gem_va_ioctl v4"

Both are in v7.0-rc1/v7.0. The bugs were introduced in the v7.0 cycle.

Record: [Buggy code from efdc66fe12b07 and bd8150a1b3370, both v7.0
cycle]

### Step 3.2: Fixes tag
No Fixes: tag present. The bugs were introduced by `bd8150a1b3370` and
partially addressed by `efdc66fe12b07`; this commit fixes the issues
that remained after `efdc66fe12b07`.

Record: [No explicit Fixes: tag; fixes bugs remaining from
efdc66fe12b07]

### Step 3.3: File History
44 commits changed this file since v6.12. The specific area
(va_update_vm, va_ioctl) has been actively modified in the v7.0 cycle
with bd8150a1b3370 and efdc66fe12b07.

Record: [Active development area; this is a follow-up fix for recent
refactoring]

### Step 3.4: Author
Christian König is one of the primary amdgpu/drm TTM maintainers. He
authored many critical fixes in this subsystem. He also
suggested/reviewed the v7 refactor that this commit fixes further. His
fixes carry high weight.

Record: [Author is subsystem co-maintainer with deep expertise]

### Step 3.5: Dependencies
- Depends on `efdc66fe12b07` (present in tree as v7.0)
- Requires `dma_fence_unwrap_merge` (present in tree via
  `include/linux/dma-fence-unwrap.h`)
- Requires `amdgpu_vm_is_bo_always_valid` (present in tree)
- The patch context matches the current tree state exactly - clean apply
  expected

Record: [All dependencies present in 7.0 tree; should apply cleanly]

---

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1: Original Discussion
lore.kernel.org blocked automated access. However, I found via b4 dig
that the predecessor commit (`efdc66fe12b07`) was submitted as v7 of the
refactoring series, meaning the fence handling went through 7 iterations
of review. The current fix is by Christian König himself who guided the
entire refactoring.

Record: [Unable to access lore directly; predecessor went through 7
review iterations]

### Step 4.2: Reviewers
- Christian König (author) - AMD DRM co-maintainer
- Sunil Khatri (acker) - AMD developer
- Alex Deucher (committer) - AMD DRM maintainer

Record: [Reviewed by top AMD DRM maintainers]

### Step 4.3-4.5: Bug Reports / Related Patches / Stable Discussion
The predecessor commit (`bd8150a1b3370`) had a documented crash
signature (refcount underflow, use-after-free, kernel panic). While
`efdc66fe12b07` fixed the worst of it, this commit addresses the
remaining correctness issues.

Record: [Predecessor had kernel panic crash signature; this fixes
remaining issues]

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: Key Functions
- `amdgpu_gem_va_update_vm()` - updates VM page tables after VA
  operation
- `amdgpu_gem_va_ioctl()` - userspace-facing ioctl handler

### Step 5.2: Callers
`amdgpu_gem_va_ioctl` is the DRM ioctl handler called via
`DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, ...)` at `amdgpu_drv.c:3082`. It's
callable by any process with DRM_AUTH|DRM_RENDER_ALLOW. This is a hot
path for all AMD GPU userspace (Mesa, ROCm, etc.).

`amdgpu_gem_va_update_vm` is called only from `amdgpu_gem_va_ioctl`.

Record: [Directly callable from userspace; affects all AMD GPU users]

### Step 5.4: Reachability
The buggy code path is reachable from any unprivileged process that
opens a DRM render node and performs VM address space management
(standard GPU operation).

Record: [Reachable from unprivileged userspace; common GPU operation
path]

---

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Code in Stable Trees
`bd8150a1b3370` first appears in v7.0-rc1. It does NOT exist in v6.14 or
earlier stable trees. The timeline syncobj support
(`vm_timeline_syncobj_out`, `AMDGPU_VM_DELAY_UPDATE` in this context) is
v7.0-only functionality. The fix is relevant ONLY for the 7.0.y stable
tree.

Record: [Buggy code only in 7.0.y; not in v6.14 or earlier]

### Step 6.2: Backport Complications
The diff context matches the current tree state perfectly. Clean apply
expected.

Record: [Clean apply expected for 7.0.y]

### Step 6.3: Related Fixes Already in Stable
`efdc66fe12b07` (the v7 refactor fix) is already in the 7.0 tree, but
this commit fixes remaining issues.

Record: [Predecessor fix present; this addresses additional bugs]

---

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1: Subsystem
- **Subsystem**: drivers/gpu/drm/amd/amdgpu (GPU driver)
- **Criticality**: IMPORTANT - AMD GPU driver is used by millions of
  users (desktop Linux, server compute, embedded)

Record: [GPU driver, IMPORTANT criticality, affects all AMD GPU users]

### Step 7.2: Subsystem Activity
Very actively developed - 44 changes to this file since v6.12. The VA
ioctl area specifically is being stabilized after recent refactoring.

Record: [Very active; area being stabilized after v7.0 refactoring]

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Who Is Affected
All users of AMD GPUs on kernel 7.0.y. This includes desktop,
workstation, and compute users.

Record: [All AMD GPU users on 7.0.y]

### Step 8.2: Trigger Conditions
The missing PDE fence bug triggers during MAP/REPLACE operations on non-
always-valid BOs, which is the normal case for application-managed GPU
memory. This is the common path for all GPU applications. The invalid
flag combination requires specific userspace to pass contradictory
flags.

Record: [PDE fence bug: common GPU operation; flag validation: requires
specific bad input]

### Step 8.3: Failure Mode Severity
- **Missing PDE fence**: Userspace sees mapping as ready before PDE
  updates complete. Could lead to GPU page faults, incorrect GPU memory
  access, or visual corruption. Severity: **HIGH**
- **Invalid flag combination**: Unexpected behavior with contradictory
  flags. Severity: **MEDIUM**
- **debug_vm condition**: Timeline syncobj never signals in debug mode.
  Severity: **MEDIUM** (only affects debug configs)

Record: [HIGH severity for PDE fence bug; MEDIUM for others]

### Step 8.4: Risk-Benefit
- **BENEFIT**: Prevents GPU page faults and incorrect memory access in
  common GPU operations; validates contradictory input flags
- **RISK**: Medium-sized change (34 insertions, 42 deletions);
  restructures fence return logic; adds `dma_fence_unwrap_merge`
  dependency. However, the new logic is strictly more conservative
  (waits for more work) so regression risk is low.

Record: [High benefit, low-medium risk; net positive]

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Compilation

**FOR backporting**:
- Fixes real correctness bugs: missing PDE fence, invalid input not
  rejected
- Author is Christian König, AMD DRM co-maintainer with deep expertise
- Acked by AMD developer, committed by AMD DRM maintainer
- Single file change, well-contained
- Userspace-facing ioctl - affects all AMD GPU applications
- The missing PDE fence could cause GPU page faults or incorrect memory
  access
- All dependencies present in 7.0 tree
- Clean apply expected
- The fence merge is strictly more conservative (lower regression risk)

**AGAINST backporting**:
- Medium-sized change (34 insertions, 42 deletions), not a tiny surgical
  fix
- Restructures fence return logic in `amdgpu_gem_va_update_vm`
- Adds new API usage (`dma_fence_unwrap_merge`)
- Bugs only exist in 7.0 code (limited affected trees)
- Area is under active development and may see more changes

### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **Yes** - from subsystem maintainer,
   well-reviewed
2. Fixes a real bug? **Yes** - missing PDE fence, invalid input not
   rejected
3. Important issue? **Yes** - GPU page faults, incorrect memory access
   (HIGH severity)
4. Small and contained? **Yes** - single file, 34 insertions and 42
   deletions, 2 functions
5. No new features or APIs? **Correct** - no new features
6. Can apply to stable? **Yes** - context matches exactly for 7.0.y

### Step 9.3: Exception Categories
Not an exception category; standard bug fix.

### Step 9.4: Decision
The bugs are real and affect all AMD GPU users on 7.0.y. The missing PDE
fence is a correctness issue that can cause GPU page faults during
normal operations. The fix is from the subsystem maintainer, well-
reviewed, contained to a single file, and all dependencies are met.

---

## Verification

- [Phase 1] Parsed tags: Author Christian König (AMD DRM co-maintainer),
  Acked-by Sunil Khatri, committed by Alex Deucher
- [Phase 2] Diff analysis: 34 insertions and 42 deletions in
  `amdgpu_gem.c`, modifies `amdgpu_gem_va_update_vm` and
  `amdgpu_gem_va_ioctl`
- [Phase 2] Verified `amdgpu_vm_update_pdes` stores its fence in
  `vm->last_update` (amdgpu_vm.c:1006) - confirms PDE fence is missed
  without the merge
- [Phase 2] Verified `amdgpu_vm_clear_freed` replaces `*fence` when work
  is done (amdgpu_vm.c:1583-1588) - confirms changing argument from
  `&fence` to `&vm->last_update` keeps VM state current
- [Phase 3] git blame: buggy code from `efdc66fe12b07` (2026-01-09) and
  `bd8150a1b3370` (2025-12-11)
- [Phase 3] git tag --contains: both commits first appear in
  v7.0-rc1/v7.0
- [Phase 3] Author's recent commits: 15 commits in amdgpu, including
  multiple critical fixes
- [Phase 4] b4 dig -c efdc66fe12b07: found v7 submission thread
- [Phase 4] b4 dig -a: patch went through v6->v7 iterations
- [Phase 5] `amdgpu_gem_va_ioctl` is registered as DRM ioctl at
  amdgpu_drv.c:3082 (DRM_AUTH|DRM_RENDER_ALLOW)
- [Phase 6] `bd8150a1b3370` NOT in v6.14 (verified via git log
  v6.12..v6.14); only in v7.0+
- [Phase 6] `dma_fence_unwrap_merge` exists in tree
  (include/linux/dma-fence-unwrap.h:69)
- [Phase 6] Current file state matches diff context exactly - clean
  apply expected
- [Phase 8] PDE fence bug: triggers on all MAP/REPLACE of non-always-
  valid BOs (common path)
- UNVERIFIED: Could not verify lore.kernel.org discussion for the
  specific commit under analysis (bot protection blocked access)

**YES**

 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 76 +++++++++++--------------
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index a6107109a2b86..c4839cf2dce37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -30,6 +30,7 @@
 #include <linux/pagemap.h>
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
+#include <linux/dma-fence-unwrap.h>
 
 #include <drm/amdgpu_drm.h>
 #include <drm/drm_drv.h>
@@ -744,11 +745,10 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 	struct dma_fence *fence;
 	int r = 0;
 
-	/* Always start from the VM's existing last update fence. */
-	fence = dma_fence_get(vm->last_update);
-
+	/* If the VM is not ready return only a stub. */
 	if (!amdgpu_vm_ready(vm))
-		return fence;
+		return dma_fence_get_stub();
+
 
 	/*
 	 * First clean up any freed mappings in the VM.
@@ -757,7 +757,7 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 	 * schedules GPU work. If nothing needs clearing, @fence can remain as
 	 * the original vm->last_update.
 	 */
-	r = amdgpu_vm_clear_freed(adev, vm, &fence);
+	r = amdgpu_vm_clear_freed(adev, vm, &vm->last_update);
 	if (r)
 		goto error;
 
@@ -774,47 +774,34 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 	if (r)
 		goto error;
 
-	/*
-	 * Decide which fence best represents the last update:
-	 *
-	 * MAP/REPLACE:
-	 *   - For always-valid mappings, use vm->last_update.
-	 *   - Otherwise, export bo_va->last_pt_update.
-	 *
-	 * UNMAP/CLEAR:
-	 *   Keep the fence returned by amdgpu_vm_clear_freed(). If no work was
-	 *   needed, it can remain as vm->last_pt_update.
-	 *
-	 * The VM and BO update fences are always initialized to a valid value.
-	 * vm->last_update and bo_va->last_pt_update always start as valid fences.
-	 * and are never expected to be NULL.
-	 */
-	switch (operation) {
-	case AMDGPU_VA_OP_MAP:
-	case AMDGPU_VA_OP_REPLACE:
+	if ((operation == AMDGPU_VA_OP_MAP ||
+	     operation == AMDGPU_VA_OP_REPLACE) &&
+	    !amdgpu_vm_is_bo_always_valid(vm, bo_va->base.bo)) {
+
 		/*
-		 * For MAP/REPLACE, return the page table update fence for the
-		 * mapping we just modified. bo_va is expected to be valid here.
+		 * For MAP/REPLACE of non per-VM BOs we need to sync to both the
+		 * bo_va->last_pt_update and vm->last_update or otherwise we
+		 * potentially miss the PDE updates.
 		 */
-		dma_fence_put(fence);
-
-		if (amdgpu_vm_is_bo_always_valid(vm, bo_va->base.bo))
-			fence = dma_fence_get(vm->last_update);
-		else
-			fence = dma_fence_get(bo_va->last_pt_update);
-		break;
-	case AMDGPU_VA_OP_UNMAP:
-	case AMDGPU_VA_OP_CLEAR:
-	default:
-		/* keep @fence as returned by amdgpu_vm_clear_freed() */
-		break;
+		fence = dma_fence_unwrap_merge(vm->last_update,
+					       bo_va->last_pt_update);
+		if (!fence) {
+			/* As fallback in OOM situations */
+			dma_fence_wait(vm->last_update, false);
+			dma_fence_wait(bo_va->last_pt_update, false);
+			fence = dma_fence_get_stub();
+		}
+	} else {
+		fence = dma_fence_get(vm->last_update);
 	}
 
+	return fence;
+
 error:
 	if (r && r != -ERESTARTSYS)
 		DRM_ERROR("Couldn't update BO_VA (%d)\n", r);
 
-	return fence;
+	return dma_fence_get(vm->last_update);
 }
 
 int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
@@ -835,7 +822,6 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	struct amdgpu_bo_va *bo_va;
 	struct drm_syncobj *timeline_syncobj = NULL;
 	struct dma_fence_chain *timeline_chain = NULL;
-	struct dma_fence *fence;
 	struct drm_exec exec;
 	uint64_t vm_size;
 	int r = 0;
@@ -887,6 +873,10 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	if (args->flags & AMDGPU_VM_DELAY_UPDATE &&
+	    args->vm_timeline_syncobj_out)
+		return -EINVAL;
+
 	if ((args->operation != AMDGPU_VA_OP_CLEAR) &&
 	    !(args->flags & AMDGPU_VM_PAGE_PRT)) {
 		gobj = drm_gem_object_lookup(filp, args->handle);
@@ -976,11 +966,13 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	 * that represents the last relevant update for this mapping. This
 	 * fence can then be exported to the user-visible VM timeline.
 	 */
-	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) && !adev->debug_vm) {
+	if (!r && !(args->flags & AMDGPU_VM_DELAY_UPDATE) &&
+	    (!adev->debug_vm || timeline_syncobj)) {
+		struct dma_fence *fence;
+
 		fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
 						args->operation);
-
-		if (timeline_syncobj && fence) {
+		if (timeline_syncobj) {
 			if (!args->vm_timeline_point) {
 				/* Replace the existing fence when no point is given. */
 				drm_syncobj_replace_fence(timeline_syncobj,
-- 
2.53.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.18] crypto: af_alg - limit RX SG extraction by receive buffer budget
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Douya Le, Yifan Wu, Juefei Pu, Yuan Tan, Xin Liu, Ren Wei,
	Herbert Xu, Sasha Levin, davem, smueller, linux-crypto,
	linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Douya Le <ldy3087146292@gmail.com>

[ Upstream commit 8eceab19eba9dcbfd2a0daec72e1bf48aa100170 ]

Make af_alg_get_rsgl() limit each RX scatterlist extraction to the
remaining receive buffer budget.

af_alg_get_rsgl() currently uses af_alg_readable() only as a gate
before extracting data into the RX scatterlist. Limit each extraction
to the remaining af_alg_rcvbuf(sk) budget so that receive-side
accounting matches the amount of data attached to the request.

If skcipher cannot obtain enough RX space for at least one chunk while
more data remains to be processed, reject the recvmsg call instead of
rounding the request length down to zero.

Fixes: e870456d8e7c8d57c059ea479b5aadbb55ff4c3a ("crypto: algif_skcipher - overhaul memory management")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Douya Le <ldy3087146292@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis
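
The clamping described in the commit message can be sketched in user
space as follows. This is an illustrative model only: `rx_seglen()` and
`round_to_blocks()` are hypothetical helpers, `rcvbuf_left` stands in
for the value returned by `af_alg_rcvbuf(sk)`, and `must_round` stands
in for the `ctx->more || len < ctx->used` condition in the patch.

```c
#include <errno.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Length of one RX extraction: bounded by what the request still
 * needs, what the message still holds, and the remaining RX budget.
 * Assumes len <= maxsize.
 */
static size_t rx_seglen(size_t maxsize, size_t len, size_t msg_left,
			size_t rcvbuf_left)
{
	size_t seglen = min_size(maxsize - len, msg_left);

	return min_size(seglen, rcvbuf_left);
}

/*
 * Round a non-final request down to whole blocks of size bs. Returns
 * -EINVAL instead of silently rounding down to zero.
 */
static long round_to_blocks(size_t len, size_t bs, int must_round)
{
	if (must_round) {
		if (len < bs)
			return -EINVAL;
		len -= len % bs;
	}
	return (long)len;
}
```

In this model, a request that obtained less than one block of RX space
while more data is pending is rejected, rather than proceeding with a
zero-length operation.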

 crypto/af_alg.c         | 2 ++
 crypto/algif_skcipher.c | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 3236601aa6dc0..6867d177f2a2d 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1229,6 +1229,8 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
 
 		seglen = min_t(size_t, (maxsize - len),
 			       msg_data_left(msg));
+		/* Never pin more pages than the remaining RX accounting budget. */
+		seglen = min_t(size_t, seglen, af_alg_rcvbuf(sk));
 
 		if (list_empty(&areq->rsgl_list)) {
 			rsgl = &areq->first_rsgl;
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 82735e51be108..ba0a17fd95aca 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -130,6 +130,11 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
 	 * full block size buffers.
 	 */
 	if (ctx->more || len < ctx->used) {
+		if (len < bs) {
+			err = -EINVAL;
+			goto free;
+		}
+
 		len -= len % bs;
 		cflags |= CRYPTO_SKCIPHER_REQ_NOTFINAL;
 	}
-- 
2.53.0


^ permalink raw reply related

* [PATCH AUTOSEL 6.18] dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Mikhail Gavrilov, Harry Yoo, Marek Szyprowski, Sasha Levin,
	someguy, iommu, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>

[ Upstream commit 3d48c9fd78dd0b1809669ec49c4d0997b8127512 ]

When CONFIG_DMA_API_DEBUG is enabled, the DMA debug infrastructure
tracks active mappings per cacheline and warns if two different DMA
mappings share the same cacheline ("cacheline tracking EEXIST,
overlapping mappings aren't supported").

On x86_64, ARCH_KMALLOC_MINALIGN defaults to 8, so small kmalloc
allocations (e.g. the 8-byte hub->buffer and hub->status in the USB
hub driver) frequently land in the same 64-byte cacheline.  When both
are DMA-mapped, this triggers a false positive warning.

This has been reported repeatedly since v5.14 (when the EEXIST check
was added) across various USB host controllers and devices including
xhci_hcd with USB hubs, USB audio devices, and USB ethernet adapters.

The cacheline overlap is only a real concern on architectures that
require DMA buffer alignment to cacheline boundaries (i.e. where
ARCH_DMA_MINALIGN >= L1_CACHE_BYTES).  On architectures like x86_64
where dma_get_cache_alignment() returns 1, the hardware is
cache-coherent and overlapping cacheline mappings are harmless.

Suppress the EEXIST warning when dma_get_cache_alignment() is less
than L1_CACHE_BYTES, indicating the architecture does not require
cacheline-aligned DMA buffers.

Verified with a kernel module reproducer that performs two kmalloc(8)
allocations back-to-back and DMA-maps both:

  Before: allocations share a cacheline, EEXIST fires within ~50 pairs
  After:  same cacheline pair found, but no warning emitted

Fixes: 2b4bbc6231d7 ("dma-debug: report -EEXIST errors in add_dma_entry")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215740
Suggested-by: Harry Yoo <harry@kernel.org>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20260327124156.24820-1-mikhail.v.gavrilov@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 kernel/dma/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index 43d6a996d7a78..596ea7abbda15 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -614,6 +614,7 @@ static void add_dma_entry(struct dma_debug_entry *entry, unsigned long attrs)
 	} else if (rc == -EEXIST &&
 		   !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 		   !(entry->is_cache_clean && overlap_cache_clean) &&
+		   dma_get_cache_alignment() >= L1_CACHE_BYTES &&
 		   !(IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
 		     is_swiotlb_active(entry->dev))) {
 		err_printk(entry->dev, entry,
-- 
2.53.0

^ permalink raw reply related

* [PATCH AUTOSEL 6.18] PCI: hv: Set default NUMA node to 0 for devices without affinity info
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Long Li, Michael Kelley, Wei Liu, Sasha Levin, kys, haiyangz,
	decui, lpieralisi, kwilczynski, mani, bhelgaas, mikelley,
	linux-hyperv, linux-pci, linux-kernel
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>

From: Long Li <longli@microsoft.com>

[ Upstream commit 7b3b1e5a87b2f5e35c52b5386d7c327be869454f ]

When hv_pci_assign_numa_node() processes a device that does not have
HV_PCI_DEVICE_FLAG_NUMA_AFFINITY set or has an out-of-range
virtual_numa_node, the device NUMA node is left unset. On x86_64,
the uninitialized default happens to be 0, but on ARM64 it is
NUMA_NO_NODE (-1).

Tests show that when no NUMA information is available from the Hyper-V
host, devices perform best when assigned to node 0. With NUMA_NO_NODE
the kernel may spread work across NUMA nodes, which degrades
performance on Hyper-V, particularly for high-throughput devices like
MANA.

Always set the device NUMA node to 0 before the conditional NUMA
affinity check, so that devices get a performant default when the host
provides no NUMA information, and behavior is consistent on both
x86_64 and ARM64.

Fixes: 999dd956d838 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
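
Editorial note, not part of the upstream commit: the "default first,
then override" ordering described above can be sketched as a pure
function in userspace C. The function name, the flag value and the
parameters are hypothetical. The real driver calls set_dev_node() on
the device and may translate the host-provided node further; that step
is omitted here.

```c
#include <assert.h>

/* Illustrative value; the real flag is defined in pci-hyperv.c. */
#define SKETCH_FLAG_NUMA_AFFINITY 0x1u

/*
 * Pick the NUMA node for a device: start from node 0 and override it
 * only when the host set the affinity flag and reported an in-range node.
 */
static int sketch_pick_node(unsigned int flags,
			    unsigned int virtual_numa_node,
			    unsigned int possible_nodes)
{
	int node = 0;	/* default applied unconditionally, as in the patch */

	assert(possible_nodes > 0);

	if ((flags & SKETCH_FLAG_NUMA_AFFINITY) &&
	    virtual_numa_node < possible_nodes)
		node = (int)virtual_numa_node;

	return node;
}
```

Because the default is assigned before the check, a missing flag and an
out-of-range node both fall back to node 0, regardless of what the
architecture would otherwise leave in place.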

 drivers/pci/controller/pci-hyperv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 146b43981b278..28b1572974879 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2486,6 +2486,14 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
 		if (!hv_dev)
 			continue;

+		/*
+		 * If the Hyper-V host doesn't provide a NUMA node for the
+		 * device, default to node 0. With NUMA_NO_NODE the kernel
+		 * may spread work across NUMA nodes, which degrades
+		 * performance on Hyper-V.
+		 */
+		set_dev_node(&dev->dev, 0);
+
 		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
 		    hv_dev->desc.virtual_numa_node < num_possible_nodes())
 			/*
-- 
2.53.0
