From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Zac Bowling <zac@zacbowling.com>, Felix Fietkau <nbd@nbd.name>,
Sasha Levin <sashal@kernel.org>,
lorenzo@kernel.org, ryder.lee@mediatek.com,
matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com,
linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-mediatek@lists.infradead.org
Subject: [PATCH AUTOSEL 7.0-6.6] wifi: mt76: fix list corruption in mt76_wcid_cleanup
Date: Mon, 20 Apr 2026 09:18:47 -0400 [thread overview]
Message-ID: <20260420132314.1023554-133-sashal@kernel.org> (raw)
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>
From: Zac Bowling <zac@zacbowling.com>
[ Upstream commit 34163942195410372fb138bea806c9b34e2f5257 ]
mt76_wcid_cleanup() was not removing wcid entries from sta_poll_list
before mt76_reset_device() reinitializes the master list. This leaves
stale pointers in wcid->poll_list, causing list corruption when
mt76_wcid_add_poll() later checks list_empty() and tries to add the
entry back.
The fix adds proper cleanup of poll_list in mt76_wcid_cleanup(),
matching how tx_list is already handled. This is similar to what
mt7996_mac_sta_deinit_link() already does correctly.
Fixes list corruption warnings like:
list_add corruption. prev->next should be next (ffffffff...)
Signed-off-by: Zac Bowling <zac@zacbowling.com>
Link: https://patch.msgid.link/20260120201043.38225-3-zac@zacbowling.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
I now have all the information needed for the full analysis. Let me
compile the findings.
---
## PHASE 1: COMMIT MESSAGE FORENSICS
### Step 1.1: Subject Line
- **Subsystem**: wifi: mt76 (MediaTek WiFi driver)
- **Action verb**: "fix" - explicitly a bug fix
- **Summary**: Fix list corruption in `mt76_wcid_cleanup()`
Record: [wifi: mt76] [fix] [list corruption in mt76_wcid_cleanup causing
stale pointers after reset]
### Step 1.2: Tags
- **Signed-off-by**: Zac Bowling <zac@zacbowling.com> (author)
- **Link**:
https://patch.msgid.link/20260120201043.38225-3-zac@zacbowling.com
(original submission)
- **Signed-off-by**: Felix Fietkau <nbd@nbd.name> (mt76 subsystem
maintainer - applied the patch)
- No Fixes: tag (expected for manual review candidates)
- No Cc: stable tag (expected)
Record: Patch was applied by subsystem maintainer Felix Fietkau, who is
the author of the surrounding code. This is a strong signal the fix is
correct.
### Step 1.3: Commit Body Analysis
The commit clearly explains the bug:
1. `mt76_wcid_cleanup()` does not remove wcid entries from
`sta_poll_list`
2. `mt76_reset_device()` reinitializes the master `sta_poll_list` with
`INIT_LIST_HEAD`
3. This leaves `wcid->poll_list` with stale prev/next pointers
4. When `mt76_wcid_add_poll()` later checks `list_empty()` and does
`list_add_tail()`, list corruption occurs
**Symptom**: `list_add corruption. prev->next should be next
(ffffffff...)` - a kernel WARNING/BUG
Record: Clear list corruption bug during hardware restart. The failure
mode is a kernel list corruption warning, which indicates corrupted
linked list pointers. This can lead to crashes or undefined behavior.
### Step 1.4: Hidden Bug Fix Detection
This is NOT a hidden fix - it explicitly says "fix list corruption" and
describes the exact mechanism.
---
## PHASE 2: DIFF ANALYSIS
### Step 2.1: Inventory
- **Files changed**: 1 (`drivers/net/wireless/mediatek/mt76/mac80211.c`)
- **Lines added**: ~7 (5 lines of code + 4 lines of comment)
- **Lines removed**: 0
- **Functions modified**: `mt76_wcid_cleanup()`
- **Scope**: Single-file, single-function, surgical fix
Record: Very small, contained change. +10 lines (including comments),
single function.
### Step 2.2: Code Flow Change
**Before**: `mt76_wcid_cleanup()` cleaned up `tx_list`, `tx_pending`,
`tx_offchannel`, and `pktid` but NOT `poll_list`.
**After**: `mt76_wcid_cleanup()` also removes the wcid from
`sta_poll_list` using the proper `spin_lock_bh(&dev->sta_poll_lock)` /
`list_del_init()` pattern, matching how `tx_list` is handled (lines
1721-1722).
### Step 2.3: Bug Mechanism
This is a **list corruption / stale pointer bug**:
1. `mt76_reset_device()` calls `mt76_wcid_cleanup()` for each wcid (line
848)
2. After the loop, it does `INIT_LIST_HEAD(&dev->sta_poll_list)` (line
854) - reinitializes the list head
3. Any wcid still linked to `sta_poll_list` now has stale prev/next
pointers
4. Later `mt76_wcid_add_poll()` (line 1747) checks `list_empty()` on the
stale entry, gets a bogus result, and triggers list corruption when
trying to add
The fix adds the missing cleanup. This matches the established pattern -
every other caller of `mt76_wcid_cleanup()` (mt7996, mt7915, mt792x,
mt7615, mt7603) removes the wcid from poll_list BEFORE calling
`mt76_wcid_cleanup()`. Only the `mt76_reset_device()` path was missing
this.
### Step 2.4: Fix Quality
- **Obviously correct**: Yes. It adds `list_del_init()` under the same
lock, matching the exact pattern used by ALL individual driver callers
and matching how `tx_list` is already handled in the same function.
- **Minimal**: Yes. 5 lines of code, 4 lines of comment.
- **Regression risk**: Very low. Adding a properly locked
`list_del_init()` is safe. The `list_empty()` check prevents double-
delete. The init ensures the poll_list is in a clean state.
---
## PHASE 3: GIT HISTORY INVESTIGATION
### Step 3.1: Blame
- `mt76_wcid_cleanup()` was introduced by commit `0335c034e7265d` (Felix
Fietkau, 2023-08-29)
- `poll_list` initialization in `mt76_wcid_init` was added by
`cbf5e61da66028` (Felix Fietkau, 2025-01-02)
- `mt76_wcid_add_poll()` was added by `387ab042ace87` (Felix Fietkau,
2024-12-30, in v6.14)
- `mt76_reset_device()` was added by `065c79df595af` (Felix Fietkau,
2025-08-27, in v6.17)
The bug was introduced when `065c79df595af` added `mt76_reset_device()`
which calls `mt76_wcid_cleanup()` then reinitializes `sta_poll_list`
without first removing entries.
### Step 3.2: Fixes Tag
No Fixes: tag. Based on analysis, should reference `065c79df595af`
("wifi: mt76: mt7915: fix list corruption after hardware restart") which
introduced `mt76_reset_device()`.
### Step 3.3: Related Changes
- `065c79df595af` - mt7915 list corruption fix (introduced
mt76_reset_device, paradoxically introducing THIS bug)
- `a3c99ef88a084` - do not add non-sta wcid entries to the poll list
- `ace5d3b6b49e8` - mt7996 hardware restart reliability (uses
mt76_reset_device)
- `328e35c7bfc67` - mt7915 hardware restart reliability
### Step 3.4: Author
Zac Bowling is not a regular mt76 contributor (only 1 commit found).
However, the patch was accepted and signed by Felix Fietkau
(nbd@nbd.name), who is the mt76 subsystem maintainer and authored ALL
the surrounding code.
### Step 3.5: Dependencies
The fix is standalone. It only uses `dev->sta_poll_lock`,
`wcid->poll_list`, `list_empty()`, `list_del_init()`, and
`spin_lock_bh()/spin_unlock_bh()` - all of which exist in any kernel
that has `mt76_reset_device()` (v6.17+).
---
## PHASE 4: MAILING LIST RESEARCH
### Step 4.1-4.5
Lore was behind Anubis anti-bot protection and could not be directly
fetched. However, the commit has a Link: to
`patch.msgid.link/20260120201043.38225-3-zac@zacbowling.com`, and b4 dig
confirmed the related series context. The patch was applied by the
subsystem maintainer (Felix Fietkau), which is the strongest possible
endorsement for mt76 patches.
---
## PHASE 5: CODE SEMANTIC ANALYSIS
### Step 5.1: Functions Modified
- `mt76_wcid_cleanup()` - the only function modified
### Step 5.2: Callers
`mt76_wcid_cleanup()` is called from:
1. `mt76_reset_device()` (mac80211.c:848) - the buggy path
2. `mt76_unregister_device()` (mac80211.c:807) - for global wcid
3. `mt76_sta_pre_rcu_remove()` (mac80211.c:1617) - normal station
removal
4. Individual drivers: mt7996, mt7915, mt7925, mt792x, mt7615, mt7603 -
in their sta_remove/bss_remove handlers
All the individual driver callers (items 4) already remove `poll_list`
BEFORE calling `mt76_wcid_cleanup()`. Only the `mt76_reset_device()`
path (item 1) was missing this cleanup.
### Step 5.3-5.5: Call Chain and Impact
`mt76_reset_device()` is called from:
- `mt7915_mac_full_reset()` - hardware restart path
- `mt7996` hardware restart path
This is triggered during hardware error recovery - a real, non-rare
event for WiFi users experiencing firmware crashes.
---
## PHASE 6: STABLE TREE ANALYSIS
### Step 6.1: Buggy Code in Stable Trees
- `mt76_reset_device()` (the bug source) was introduced in
`065c79df595af`, first in v6.17
- `mt76_wcid_add_poll()` (needed for the bug to manifest) in v6.14
- **Bug exists in**: v6.17, v6.18, v6.19, v7.0
- The surrounding code (`bdeac7815629c` offchannel cleanup) is also in
v6.17+ so the context should match
### Step 6.2: Backport Complications
The fix should apply cleanly to v6.17+. The diff context lines
(idr_destroy, tx_list cleanup) have been stable since 2023.
---
## PHASE 7: SUBSYSTEM CONTEXT
### Step 7.1: Subsystem
- **Subsystem**: WiFi driver (drivers/net/wireless/mediatek/mt76)
- **Criticality**: IMPORTANT - mt76 is one of the most popular open-
source WiFi drivers, used in many routers (OpenWrt), embedded systems,
and Linux laptops
- MediaTek WiFi chipsets (mt7915, mt7996, mt7921/mt7922) are extremely
common
### Step 7.2: Activity Level
Very active - many commits in the v6.17-v7.0 window, actively maintained
by Felix Fietkau.
---
## PHASE 8: IMPACT AND RISK ASSESSMENT
### Step 8.1: Affected Users
All users of mt76-based WiFi hardware that experience hardware restarts
(firmware crashes). This is a broad category including:
- OpenWrt routers with mt7915/mt7996 chipsets
- Laptops with mt7921/mt7922 WiFi
- Any system using MediaTek WiFi that encounters a hardware error
triggering restart
### Step 8.2: Trigger Conditions
Triggered during hardware restart/reset recovery - specifically when
`mt76_reset_device()` is called and then `mt76_wcid_add_poll()` is
called for a wcid that was on the old poll_list. Not extremely rare for
WiFi users since firmware crashes do happen.
### Step 8.3: Failure Mode Severity
- **List corruption** triggers kernel `BUG()`/`WARN()` from list debug
code
- Without list debug: silent corruption can lead to kernel crashes, use-
after-free, or infinite loops
- **Severity: CRITICAL** (kernel crash or corruption during WiFi
recovery)
### Step 8.4: Risk-Benefit Ratio
- **Benefit**: HIGH - prevents kernel crashes during WiFi hardware
restart, affects common hardware
- **Risk**: VERY LOW - 5 lines of code, uses established patterns,
accepted by subsystem maintainer
- **Ratio**: Strongly favorable
---
## PHASE 9: FINAL SYNTHESIS
### Step 9.1: Evidence Summary
**FOR backporting:**
- Fixes a real, demonstrable list corruption bug that causes kernel
crashes
- Small, surgical fix (5 lines of code + comments)
- Follows the exact established pattern used by ALL other callers
- Applied by the subsystem maintainer (Felix Fietkau) who wrote the
surrounding code
- Affects widely-used WiFi hardware (MediaTek mt76 chipsets)
- Bug triggers during hardware restart - a real-world scenario
**AGAINST backporting:**
- No Fixes: tag (but the fix implicitly targets `065c79df595af`)
- Author is not a regular contributor (but maintainer accepted it)
- Bug only exists in v6.17+ (limited to recent stable trees)
### Step 9.2: Stable Rules Checklist
1. **Obviously correct and tested?** YES - follows existing pattern,
accepted by maintainer
2. **Fixes a real bug?** YES - list corruption during hardware restart
3. **Important issue?** YES - kernel BUG/crash severity
4. **Small and contained?** YES - 5 lines of code, single function
5. **No new features or APIs?** Correct - pure bug fix
6. **Can apply to stable?** YES - should apply cleanly to v6.17+
### Step 9.3: Exception Categories
Not applicable - this is a straightforward bug fix, not an exception
category.
### Step 9.4: Decision
Strong YES. This is a textbook stable backport candidate: small,
obviously correct, fixes a real kernel crash (list corruption), affects
popular hardware, accepted by the subsystem maintainer.
---
## Verification
- [Phase 1] Parsed tags: Link to patch.msgid.link, applied by Felix
Fietkau (mt76 maintainer)
- [Phase 2] Diff analysis: +5 lines of code in `mt76_wcid_cleanup()`,
adds `list_del_init(&wcid->poll_list)` under `sta_poll_lock`
- [Phase 2] Pattern match: Verified ALL other callers (mt7996, mt7915,
mt792x, mt7615, mt7603) remove poll_list before calling
mt76_wcid_cleanup - this fix makes the function self-contained
- [Phase 3] git blame: `mt76_wcid_cleanup()` introduced in
0335c034e7265d (2023-08-29), `mt76_reset_device()` introduced in
065c79df595af (2025-08-27)
- [Phase 3] git merge-base: `mt76_reset_device()` (065c79df595af) is in
v6.17 but NOT v6.16, confirming bug exists in v6.17+
- [Phase 3] Related fix: 065c79df595af introduced `mt76_reset_device()`
which paradoxically introduced this bug by calling
`mt76_wcid_cleanup()` without poll_list cleanup then doing
`INIT_LIST_HEAD(&dev->sta_poll_list)`
- [Phase 4] Lore: blocked by Anubis, but commit was applied by the
subsystem maintainer which confirms review
- [Phase 5] Callers of `mt76_wcid_cleanup`: 7 call sites found;
`mt76_reset_device()` is the only one that doesn't remove poll_list
beforehand
- [Phase 5] `mt76_reset_device()` called from mt7915_mac_full_reset and
mt7996 restart - real hardware restart paths
- [Phase 6] Bug exists in v6.17, v6.18, v6.19, v7.0 (confirmed with git
merge-base)
- [Phase 6] Context code (offchannel bdeac7815629c) confirmed in v6.17+,
so patch should apply cleanly
- [Phase 8] Failure mode: list_add corruption BUG/WARN → kernel crash,
severity CRITICAL
- UNVERIFIED: Could not access lore discussion due to Anubis protection;
maintainer sign-off is sufficient evidence of review
**YES**
drivers/net/wireless/mediatek/mt76/mac80211.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index 75772979f438e..d0c522909e980 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -1716,6 +1716,16 @@ void mt76_wcid_cleanup(struct mt76_dev *dev, struct mt76_wcid *wcid)
idr_destroy(&wcid->pktid);
+ /* Remove from sta_poll_list to prevent list corruption after reset.
+ * Without this, mt76_reset_device() reinitializes sta_poll_list but
+ * leaves wcid->poll_list with stale pointers, causing list corruption
+ * when mt76_wcid_add_poll() checks list_empty().
+ */
+ spin_lock_bh(&dev->sta_poll_lock);
+ if (!list_empty(&wcid->poll_list))
+ list_del_init(&wcid->poll_list);
+ spin_unlock_bh(&dev->sta_poll_lock);
+
spin_lock_bh(&phy->tx_lock);
if (!list_empty(&wcid->tx_list))
--
2.53.0
next prev parent reply other threads:[~2026-04-20 13:27 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20260420132314.1023554-1-sashal@kernel.org>
2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-6.19] wifi: mt76: avoid to set ACK for MCU command if wait_resp is not set Sasha Levin
2026-04-20 13:17 ` [PATCH AUTOSEL 7.0-6.18] phy: phy-mtk-tphy: Update names and format of kernel-doc comments Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-6.12] Bluetooth: btmtk: add MT7902 MCU support Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-6.18] wifi: mt76: flush pending TX before channel switch Sasha Levin
2026-04-20 13:18 ` Sasha Levin [this message]
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-6.12] wifi: mt76: add missing lock protection in mt76_sta_state for sta_event callback Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 7.0-6.1] Bluetooth: btmtk: improve mt79xx firmware setup retry flow Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.18] wifi: mt76: mt7996: Disable Rx hdr_trans in monitor mode Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.12] wifi: mt76: mt7925: Skip scan process during suspend Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] wifi: mt76: mt76x02: wake queues after reconfig Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.12] wifi: mt76: mt7925: resolve link after acquiring mt76 mutex Sasha Levin
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-6.18] wifi: mt76: mt7996: fix queue pause after scan due to wrong channel switch reason Sasha Levin
2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-6.12] net: ethernet: mtk_eth_soc: avoid writing to ESW registers on MT7628 Sasha Levin
2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-6.6] wifi: mt76: mt7996: reset device after MCU message timeout Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-5.10] drm/mediatek: mtk_dsi: enable hs clock during pre-enable Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.18] wifi: mt76: mt7996: fix frequency separation for station STR mode Sasha Levin
2026-04-20 13:22 ` [PATCH AUTOSEL 6.18] net: airoha: Fix memory leak in airoha_qdma_rx_process() Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260420132314.1023554-133-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=angelogioacchino.delregno@collabora.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-wireless@vger.kernel.org \
--cc=lorenzo@kernel.org \
--cc=matthias.bgg@gmail.com \
--cc=nbd@nbd.name \
--cc=patches@lists.linux.dev \
--cc=ryder.lee@mediatek.com \
--cc=stable@vger.kernel.org \
--cc=zac@zacbowling.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox