From: Cole Leavitt <cole@unwrap.rs>
To: linux-wireless@vger.kernel.org
Cc: greearb@candelatech.com, miriam.rachel.korenblit@intel.com,
    johannes@sipsolutions.net, cole@unwrap.rs
Subject: [PATCH v3 0/3] wifi: iwlwifi: mld: stability fixes around firmware error recovery
Date: Mon, 20 Apr 2026 10:44:03 -0700
Message-ID: <20260420174406.128254-1-cole@unwrap.rs>

Three fixes for iwlmld sub-driver issues observed on Intel BE200
(Wi-Fi 7).

1/3 adds STATUS_FW_ERROR guards to the NAPI poll functions and to the
TX/BA notification handlers. iwl_trans_reclaim() already has its own
STATUS_FW_ERROR early return, so these guards are defense in depth
with WARN_ONCE instrumentation: if the suspected post-FW-error race
fires, we now catch it early (before reclaim) and log it. Tested by
Ben Greear, who confirmed the WARN fires on his systems during
firmware error recovery.

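As a rough illustration of the 1/3 pattern, here is a userspace model (not driver code) of a poll callback that bails out before touching any queue when a firmware-error flag is set, and warns only once. The struct, the `fw_error` flag standing in for STATUS_FW_ERROR, and the `warn_once()` helper standing in for WARN_ONCE() are all hypothetical; the actual hunks in 1/3 may be shaped differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for transport state; not the iwlwifi struct. */
struct fake_trans {
	bool fw_error;  /* models STATUS_FW_ERROR being set in trans->status */
	int warn_count; /* number of warnings actually printed */
	int pending;    /* RX entries waiting to be processed */
};

/* Models WARN_ONCE(): print on the first hit only. The kernel macro
 * latches per call site; this model latches per fake_trans instead. */
static void warn_once(struct fake_trans *t, const char *msg)
{
	if (t->warn_count == 0) {
		fprintf(stderr, "WARNING: %s\n", msg);
		t->warn_count++;
	}
}

/* Models a NAPI poll callback; returns the number of entries handled.
 * A real callback that returns less than its budget is expected to call
 * napi_complete_done(); NAPI state is not modelled here. */
static int fake_napi_poll(struct fake_trans *t, int budget)
{
	int done = 0;

	/* Early guard: leave the queues alone once firmware has crashed. */
	if (t->fw_error) {
		warn_once(t, "NAPI poll after firmware error");
		return 0;
	}

	while (t->pending > 0 && done < budget) {
		t->pending--;
		done++;
	}
	return done;
}
```

In this model the warning fires before any queue walk, so a reclaim-level check like the one already in iwl_trans_reclaim() stays the last line of defense rather than the first.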
2/3 fixes a real TSO segmentation bug. When the TLC notification
disables AMSDU for a TID, max_tid_amsdu_len is set to the sentinel
value 1 (see mld/tlc.c line 858). The existing zero-only check in
iwl_mld_tx_tso_segment() lets this sentinel through, producing
num_subframes=0, which feeds gso_size=0 into iwl_tx_tso_segment()
and downstream skb_gso_segment().
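The arithmetic behind the 2/3 bug can be checked with a small userspace model. It assumes MLD sizes the A-MSDU the way MVM does, num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad), and that iwl_tx_tso_segment() sets gso_size = num_subframes * mss before calling skb_gso_segment(). Both functions below are hypothetical helpers, not in-tree code.

```c
#include <assert.h>

/* Hypothetical model of the A-MSDU sizing step, assuming the
 * MVM-style formula; not the in-tree iwl_mld_tx_tso_segment(). */
static unsigned int amsdu_num_subframes(unsigned int max_tid_amsdu_len,
					unsigned int subf_len)
{
	/* Padding that brings each subframe to a 4-byte boundary. */
	unsigned int pad = (4 - subf_len) & 0x3;

	/* N subframes occupy N * subf_len + (N - 1) * pad bytes. */
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/* Assumed behaviour of iwl_tx_tso_segment(): the GSO size handed to
 * skb_gso_segment() is num_subframes MSS-sized chunks. */
static unsigned int resulting_gso_size(unsigned int num_subframes,
				       unsigned int mss)
{
	return num_subframes * mss;
}
```

With a 1522-byte subframe (14-byte Ethernet header, 8-byte SNAP, 20-byte IPv4 header, 32-byte TCP header, 1448-byte MSS), the pad is 2, so the sentinel limit of 1 gives (1 + 2) / (1522 + 2) = 0 subframes and therefore gso_size = 0, while a limit of 7935 gives 5.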
MVM is immune because it gates on mvmsta->amsdu_enabled; MLD has no
equivalent bitmap. Fix by also treating the sentinel 1 as "AMSDU
disabled" at the existing guard, and add a WARN_ON_ONCE(!num_subframes)
after the division so any future path that produces zero through a
different mechanism is caught and reported rather than silently
creating a pathological GSO skb.
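The 2/3 fix can be modelled the same way: the existing guard is widened from "== 0" to "<= 1", and a WARN_ON_ONCE-style backstop catches any zero that still comes out of the division. `choose_num_subframes()` and the `warn_hits` counter are hypothetical. Returning 1 assumes the existing guard segments with one MSS per MPDU, and falling back to 1 after the warning is likewise an assumption about what the real hunk does.

```c
#include <assert.h>
#include <stdio.h>

/* Counts every backstop hit; prints only on the first, like
 * WARN_ON_ONCE(). Hypothetical, for this model only. */
static int warn_hits;

/* Returns MSS-sized chunks per MPDU; 1 means "no A-MSDU aggregation". */
static unsigned int choose_num_subframes(unsigned int max_tid_amsdu_len,
					 unsigned int subf_len)
{
	unsigned int pad, num_subframes;

	/* 0 (existing check) and the TLC sentinel 1 both mean A-MSDU is
	 * disabled for this TID. */
	if (max_tid_amsdu_len <= 1)
		return 1;

	pad = (4 - subf_len) & 0x3;
	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);

	/* Backstop: any other path that still yields zero is reported. */
	if (num_subframes == 0) {
		if (warn_hits++ == 0)
			fprintf(stderr, "WARNING: num_subframes == 0\n");
		return 1; /* assumption: fall back to one MSS per MPDU */
	}
	return num_subframes;
}
```

A limit of 2 with a 1522-byte subframe is one example of a value that passes the widened guard but still divides to zero, which is exactly the case the backstop exists for.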
3/3 adds STATUS_FW_ERROR checks at the top of iwl_mld_tx_from_txq() and
iwl_mld_mac80211_tx() to stop pulling frames for dead firmware,
eliminating an observed soft lockup during firmware error recovery.
Revised per Johannes Berg's feedback to use status-bit checks rather
than ieee80211_stop_queues()/ieee80211_wake_queues(), which do not
interact well with TXQ-based APIs.

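For 3/3, the effect of the status-bit check in iwl_mld_tx_from_txq() can be sketched as a dequeue loop that refuses to run while the firmware is marked dead, so frames stay queued instead of being handed to dead firmware. `struct fake_txq`, `fw_error` and `fake_tx_from_txq()` are hypothetical; the real function works on mac80211 TXQs, and the equivalent check in iwl_mld_mac80211_tx() is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TXQ model; not a mac80211 structure. */
struct fake_txq {
	int queued; /* frames waiting in the TXQ */
	int sent;   /* frames handed to the (fake) firmware */
};

static bool fw_error; /* models STATUS_FW_ERROR */

static void fake_tx_from_txq(struct fake_txq *txq)
{
	/* Dead firmware: return before pulling anything from the TXQ. */
	if (fw_error)
		return;

	while (txq->queued > 0) {
		txq->queued--;
		txq->sent++;
	}
}
```

In this model the check changes no queue state at all; it simply declines to dequeue, which is the contrast the cover letter draws with stopping and waking the queues.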
Changes since v2:
- 1/3:
  * Stripped inadvertent clang-format style churn from v2; the diff
    is now only the functional STATUS_FW_ERROR guards (four hunks,
    ~30 lines added).
  * Rewrote commit message: the v2 message claimed a proven
    SSN-corruption -> iwl_trans_reclaim() UAF chain, but
    iwl_trans_reclaim() already checks STATUS_FW_ERROR itself
    (iwl-trans.c:~663), so that chain cannot actually reach the
    queue walk. The patch is more accurately described as
    "earlier STATUS_FW_ERROR guards with WARN_ONCE instrumentation
    for diagnosis of suspected post-FW-error NAPI scheduling."
  * Kept Tested-by: Ben Greear.
- 2/3:
  * Removed the speculative "TCP retransmit queue UAF / refcount
    underflow in tcp_shifted_skb / NULL deref in tcp_rack_detect_loss"
    chain from the commit message per Ben Greear's feedback. Those
    symptoms are real, but their causal link to this bug was not
    directly traced; describing them as consequences of this bug
    was overclaiming.
  * Commit message now states only what can be traced in-tree:
    sentinel 1 -> num_subframes=0 -> gso_size=0 -> unbounded
    skb_gso_segment() output. Downstream symptom attribution is
    left for the separate investigation Ben and I have underway.
  * Code change is unchanged from v2.
- 3/3: Unchanged from v2 beyond rebase context.

To Miriam's question in the v2 thread ("Was the soft lockup happening
as a consequence of the bug fixed in 2/3?"): yes. Our typical trace is:

  2/3's GSO explosion
  -> firmware receives malformed AMSDU descriptors
  -> firmware hangs in an MMIO poll (FSEQ_ERROR_CODE 0x67A00000,
     SYSTEM_STATISTICS_CMD timeout)
  -> the TX path fixed by 3/3 keeps spinning against the dead firmware
  -> soft lockup

The full dmesg is attached to a follow-up on the v2 thread so that the
Intel firmware team can investigate the c102 MMIO poll hang
separately; the kernel-side chain is independently reproducible with
a small test case.

Cole Leavitt (3):
  wifi: iwlwifi: add STATUS_FW_ERROR guards to NAPI/TX-notif paths
  wifi: iwlwifi: mld: fix TSO segmentation when AMSDU is disabled
  wifi: iwlwifi: mld: skip TX when firmware is dead

 drivers/net/wireless/intel/iwlwifi/mld/mac80211.c   |  4 ++++
 drivers/net/wireless/intel/iwlwifi/mld/tx.c         | 22 ++++++++++++++++++++++
 drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c | 18 ++++++++++++++++++
 3 files changed, 44 insertions(+)

base-commit: 3aae9383f42f687221c011d7ee87529398e826b3
--
2.52.0