* [PATCH v3 1/3] wifi: iwlwifi: add STATUS_FW_ERROR guards to NAPI/TX-notif paths
[not found] <20260420174406.128254-1-cole@unwrap.rs>
@ 2026-04-20 17:44 ` Cole Leavitt
2026-04-20 17:44 ` [PATCH v3 2/3] wifi: iwlwifi: mld: fix TSO segmentation when AMSDU is disabled Cole Leavitt
1 sibling, 0 replies; 2+ messages in thread
From: Cole Leavitt @ 2026-04-20 17:44 UTC (permalink / raw)
To: linux-wireless; +Cc: greearb, miriam.rachel.korenblit, johannes, cole, stable
After firmware error is detected and STATUS_FW_ERROR is set, NAPI may
still be in-flight from a prior interrupt or get scheduled by the MSIX
IRQ handler before the error bit is processed. The NAPI poll functions
have no STATUS_FW_ERROR check and will continue processing stale RX ring
entries from dying firmware.
iwl_trans_reclaim() already early-returns on STATUS_FW_ERROR, so any
TX-response notification that makes it through to reclaim is a no-op.
What remains is:
* CPU spent parsing stale RX inside iwl_pcie_rx_handle() before
dispatching to the op_mode.
* No signal in the logs when the race fires, making the
post-FW-error sequence harder to debug.
Add STATUS_FW_ERROR early-returns with WARN_ONCE() in four places:
* iwl_pcie_napi_poll() (legacy NAPI poll)
* iwl_pcie_napi_poll_msix() (MSIX NAPI poll)
* iwl_mld_handle_tx_resp_notif()
* iwl_mld_handle_compressed_ba_notif()
Rationale:
1. Stop NAPI from consuming any more RX budget once firmware is
declared dead; the restart path will re-initialise the rings.
2. Provide a single, one-shot log line via WARN_ONCE so we can tell
from a user's dmesg whether the post-error race actually fired in
their configuration, which has been hard to reproduce outside
Ben Greear's test rig.
_iwl_trans_pcie_gen2_stop_device() already calls iwl_pcie_rx_napi_sync()
to quiesce NAPI during device teardown, but that runs much later in the
restart sequence; these checks close the window between error detection
and device stop.
Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Cc: stable@vger.kernel.org
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
drivers/net/wireless/intel/iwlwifi/mld/tx.c | 19 +++++++++++++++++++
drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c | 18 ++++++++++++++++++
2 files changed, 37 insertions(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 546d09a38dab..e341d12e5233 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1082,6 +1082,15 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
bool mgmt = false;
bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
+ /* iwl_trans_reclaim() already guards on STATUS_FW_ERROR, but
+ * bail out earlier (and log once) so we can tell from dmesg
+ * whether this race actually fires in the field.
+ */
+ if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+ WARN_ONCE(1, "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+ sta_id, txq_id);
+ return;
+ }
if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
"Invalid tx_resp notif frame_count (%d)\n",
tx_resp->frame_count))
@@ -1360,6 +1369,16 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
u8 sta_id = ba_res->sta_id;
struct ieee80211_link_sta *link_sta;
+ /* Same rationale as iwl_mld_handle_tx_resp_notif: redundant with
+ * iwl_trans_reclaim()'s own STATUS_FW_ERROR check, but fails fast
+ * and logs via WARN_ONCE when the race is actually hit.
+ */
+ if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+ WARN_ONCE(1, "iwlwifi: BA notif (sta=%d) after FW error\n",
+ sta_id);
+ return;
+ }
+
if (!tfd_cnt)
return;
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index fe263cdc2e4f..554c22777ec1 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1012,6 +1012,15 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
trans = trans_pcie->trans;
+ /* Don't process RX for dying firmware; the restart path will
+ * re-init the rings. WARN_ONCE helps surface whether this race
+ * actually fires in user dmesg.
+ */
+ if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+ WARN_ONCE(1, "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+ rxq->id);
+ napi_complete_done(napi, 0);
+ return 0;
+ }
+
ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1039,6 +1048,15 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
trans = trans_pcie->trans;
+ if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+ WARN_ONCE(1,
+ "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+ rxq->id);
+ napi_complete_done(napi, 0);
+ return 0;
+ }
+
ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
budget);
--
2.52.0
^ permalink raw reply related [flat|nested] 2+ messages in thread* [PATCH v3 2/3] wifi: iwlwifi: mld: fix TSO segmentation when AMSDU is disabled
[not found] <20260420174406.128254-1-cole@unwrap.rs>
2026-04-20 17:44 ` [PATCH v3 1/3] wifi: iwlwifi: add STATUS_FW_ERROR guards to NAPI/TX-notif paths Cole Leavitt
@ 2026-04-20 17:44 ` Cole Leavitt
1 sibling, 0 replies; 2+ messages in thread
From: Cole Leavitt @ 2026-04-20 17:44 UTC (permalink / raw)
To: linux-wireless; +Cc: greearb, miriam.rachel.korenblit, johannes, cole, stable
When the TLC notification disables AMSDU for a TID, mld/tlc.c (line 858)
sets link_sta->agg.max_tid_amsdu_len[TID] to the sentinel value 1. The
TSO segmentation path in iwl_mld_tx_tso_segment() guards only against
zero, not this sentinel, so it reaches the num_subframes calculation:
num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad)
= (1 + 2) / (1534 + 2) = 0
and then passes num_subframes=0 to iwl_tx_tso_segment(), which sets
gso_size = num_subframes * mss = 0
Calling skb_gso_segment() with gso_size=0 produces an unbounded number
of tiny segments from a single GSO skb. On a BE200 we've observed the
expansion push thousands of micro-frames into the TX ring before the
rest are purged.
The MVM driver is immune because it gates on mvmsta->amsdu_enabled (see
mvm/tx.c lines 910-913) before reaching the num_subframes calculation.
MLD has no equivalent bitmap and relies solely on max_tid_amsdu_len,
which does not recognise the sentinel 1.
Fix by checking max_tid_amsdu_len == 1 at the existing guard and
falling back to non-AMSDU TSO segmentation (Suggested-by Miriam
Korenblit). Also add WARN_ON_ONCE(!num_subframes) after the division
as defense-in-depth so any future path that produces zero through a
different mechanism is logged rather than silently creating a
pathological GSO skb.
Downstream user-visible symptoms (TCP retransmit queue corruption on
some setups, firmware MMIO-poll hang on BE200 with c102 firmware) have
been reported in connection with this bug but the causal chain between
the GSO explosion and those symptoms is being investigated separately
with Ben Greear, so they are not claimed here.
Suggested-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
drivers/net/wireless/intel/iwlwifi/mld/tx.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index e341d12e5233..8af58aabcd68 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -823,7 +823,7 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
return -EINVAL;
max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
- if (!max_tid_amsdu_len)
+ if (!max_tid_amsdu_len || max_tid_amsdu_len == 1)
return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
/* Sub frame header + SNAP + IP header + TCP header + MSS */
@@ -835,6 +835,9 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
*/
num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
+ if (WARN_ON_ONCE(!num_subframes))
+ return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
+
if (sta->max_amsdu_subframes &&
num_subframes > sta->max_amsdu_subframes)
num_subframes = sta->max_amsdu_subframes;
--
2.52.0
^ permalink raw reply related [flat|nested] 2+ messages in thread