public inbox for linux-wireless@vger.kernel.org
* [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
       [not found] <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>
@ 2026-02-14 18:10 ` Cole Leavitt
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:10 UTC
  To: greearb, johannes.berg, miriam.rachel.korenblit
  Cc: linux-wireless, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.
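A minimal userspace model of the early-return guard described above may help when reasoning about the race. It assumes C11 atomics in place of the kernel's test_bit()/set_bit(), and every model_* name is hypothetical. The real poll functions also call napi_complete_done() before returning 0, which the model only notes in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical bit number; the driver uses STATUS_FW_ERROR. */
enum { MODEL_STATUS_FW_ERROR = 3 };

struct model_trans {
	atomic_ulong status;  /* stands in for trans->status */
	int rx_pending;       /* stale entries still sitting in the RX ring */
	int rx_processed;     /* entries handed on to notification handlers */
	bool warned;          /* emulates WARN_ONCE's one-shot report */
};

static bool model_test_bit(int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}

static void model_set_bit(int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

/* Stand-in for iwl_pcie_rx_handle(): consumes up to @budget entries. */
static int model_rx_handle(struct model_trans *t, int budget)
{
	int n = t->rx_pending < budget ? t->rx_pending : budget;

	t->rx_pending -= n;
	t->rx_processed += n;
	return n;
}

/* Poll callback carrying the same early-return check as the patch. */
static int model_napi_poll(struct model_trans *t, int budget)
{
	if (model_test_bit(MODEL_STATUS_FW_ERROR, &t->status)) {
		if (!t->warned) {
			fprintf(stderr, "poll invoked after FW error\n");
			t->warned = true;
		}
		/* The driver calls napi_complete_done(napi, 0) here. */
		return 0;
	}
	return model_rx_handle(t, budget);
}
```

Once the error bit is set, a late poll returns 0 without passing any RX entry to the handlers, so stale TX responses never reach reclaim. The check is a single load: it narrows the window but cannot close it, since a poll that read the bit just before it was set still proceeds.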

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
Tested on Intel BE200 (FW 101.6e695a70.0) by forcing NMI via debugfs.
The WARN_ONCE fires reliably:

  iwlwifi: NAPI MSIX poll[0] invoked after FW error
  WARNING: drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058
           at iwl_pcie_napi_poll_msix+0xff/0x130 [iwlwifi], CPU#22

This confirms that NAPI poll is invoked after STATUS_FW_ERROR is set. Without
this patch, that poll would process stale RX ring data from dead firmware.

 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
 		      budget);
-- 
2.52.0



* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
@ 2026-02-14 18:33     ` Cole Leavitt
  2026-02-16 18:12       ` Ben Greear
  0 siblings, 1 reply; 15+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:33 UTC
  To: greearb
  Cc: johannes.berg, miriam.rachel.korenblit, linux-wireless,
	Cole Leavitt

Ben,

Good catch on both fronts.

On the build_tfd dangling pointer -- you're right. The failure path at
line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
(set at lines 763-764). The caller gets -1 and presumably frees the
skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
not advancing means current unmap paths won't iterate to that index,
it's a latent UAF waiting for a flush path change or future code to
touch it. Two NULL stores inside a held spinlock cost nothing. I think
this should go upstream as its own patch.
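The two NULL stores suggested for the build_tfd failure path can be sketched in isolation as follows. The queue layout and the model_enqueue()/build_ok names are hypothetical, and the spinlock the driver holds around these stores is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_SIZE 16  /* hypothetical ring size */

/* Hypothetical stand-in for one TX queue entry. */
struct model_entry {
	void *skb;  /* caller-owned frame */
	void *cmd;  /* caller-owned device command */
};

struct model_txq {
	struct model_entry entries[MODEL_QUEUE_SIZE];
	int write_ptr;
};

/*
 * Record the caller's pointers, then "build" the descriptor.
 * @build_ok simulates whether the build_tfd step succeeds.
 * Returns 0 on success, -1 on failure.
 */
static int model_enqueue(struct model_txq *q, void *skb, void *cmd,
			 int build_ok)
{
	int idx = q->write_ptr;

	q->entries[idx].skb = skb;
	q->entries[idx].cmd = cmd;

	if (!build_ok) {
		/* The caller frees skb after a -1 return, so do not
		 * leave pointers to it behind in the ring. */
		q->entries[idx].skb = NULL;
		q->entries[idx].cmd = NULL;
		return -1;
	}

	q->write_ptr = (idx + 1) % MODEL_QUEUE_SIZE;
	return 0;
}
```

On failure write_ptr does not move and the slot is left empty, so any later walk over that index (a flush, or new code) sees NULL instead of a freed frame.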

On the TOCTOU question -- this is the part I spent the most time on.
The window you're asking about is: firmware starts producing corrupt
completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
checks can't help there because the flag isn't set yet.

The primary guard in that window is iwl_txq_used() in
iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
[read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
wraparound corruption, etc.
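The [read_ptr, write_ptr) check is the usual circular-buffer window test. The standalone sketch below follows the same idea as iwl_txq_used() but is not the driver's code; ring_index_in_flight() is a hypothetical name, and the SSN is treated directly as a ring index.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Is @idx inside the in-flight window [read_ptr, write_ptr) of a ring
 * with @size entries? Handles the case where the window wraps.
 */
static bool ring_index_in_flight(unsigned int idx, unsigned int read_ptr,
				 unsigned int write_ptr, unsigned int size)
{
	idx %= size;
	read_ptr %= size;
	write_ptr %= size;

	if (write_ptr >= read_ptr)
		/* Window does not wrap: read_ptr <= idx < write_ptr. */
		return idx >= read_ptr && idx < write_ptr;

	/* Window wraps past the end of the ring. */
	return idx >= read_ptr || idx < write_ptr;
}
```

With read_ptr = 5 and write_ptr = 20, the legitimate SSN 8 and a corrupt SSN 15 both pass, while 30 is rejected, so the check alone cannot tell a plausible-but-wrong SSN from the right one.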

What it can't catch is an in-range corrupt SSN -- e.g., firmware says
reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
That passes bounds checking and the reclaim loop frees skbs for
entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
loop catches double-reclaim but not first-time over-reclaim.
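To make the over-reclaim case concrete, here is a simplified model of a reclaim walk with a NULL-entry check. It assumes a 32-entry ring, counts frames instead of freeing them, and omits the bounds check; model_reclaim() and its fields are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING_SIZE 32  /* hypothetical ring size */

struct model_q {
	void *skb[MODEL_RING_SIZE];
	unsigned int read_ptr;
	unsigned int write_ptr;  /* kept for reference; bounds check omitted */
	unsigned int freed;      /* frames released back to the stack */
	unsigned int null_hits;  /* times the NULL-skb warning would fire */
};

/*
 * Release entries from read_ptr up to (not including) @ssn.
 * Assumes @ssn already passed the [read_ptr, write_ptr) check.
 */
static void model_reclaim(struct model_q *q, unsigned int ssn)
{
	while (q->read_ptr != ssn % MODEL_RING_SIZE) {
		unsigned int i = q->read_ptr;

		if (!q->skb[i])
			q->null_hits++;  /* entry already emptied elsewhere */
		else
			q->freed++;      /* driver would complete the skb here */
		q->skb[i] = NULL;
		q->read_ptr = (i + 1) % MODEL_RING_SIZE;
	}
}
```

Replaying the example (legitimate SSN 8, corrupt SSN 15, write_ptr 20) releases entries 8 to 14 without a single NULL hit; only an entry that was already emptied by another path trips the check.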

The complete fix for this would be a per-entry generation counter --
tag each entry on submit, validate on reclaim. But that adds per-entry
overhead on the TX hot path to protect against a condition (firmware
producing corrupt completions) that is already terminal. I think the
right trade-off is:

  1. Your build_tfd NULL fix (eliminates one dangling pointer class)
  2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
     shrinks the detection window to near-zero)
  3. The existing iwl_txq_used() bounds check (catches most corrupt
     SSNs)

Together these make the damage window small enough that a per-entry
generation scheme isn't justified -- by the time firmware is sending
corrupt SSNs, we're in dump-and-reset territory anyway.

That said, if you're seeing corruption patterns in your customer
testing where a valid-looking-but-wrong SSN gets through before
FW_ERROR fires, I'd be very interested in the traces. That would
change the cost/benefit on the generation counter approach.

Thanks,
Cole


* [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:10 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
@ 2026-02-14 18:41   ` Cole Leavitt
  2026-02-14 18:43   ` [PATCH v3] " Cole Leavitt
  2 siblings, 0 replies; 15+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:41 UTC
  To: johannes.berg, miriam.rachel.korenblit
  Cc: greearb, linux-wireless, stable, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
 		      budget);
-- 
2.52.0



* [PATCH v3] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:10 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
  2026-02-14 18:41   ` Cole Leavitt
@ 2026-02-14 18:43   ` Cole Leavitt
  2026-02-26 19:37     ` Ben Greear
  2 siblings, 1 reply; 15+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:43 UTC
  To: johannes.berg, miriam.rachel.korenblit
  Cc: greearb, linux-wireless, stable, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
Changes since v1:
  - Added Fixes: tag and Cc: stable@vger.kernel.org

Tested on Intel BE200 (FW 101.6e695a70.0) by forcing NMI via debugfs.
The WARN_ONCE fires reliably:

  iwlwifi: NAPI MSIX poll[0] invoked after FW error
  WARNING: drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058
           at iwl_pcie_napi_poll_msix+0xff/0x130 [iwlwifi], CPU#22

This confirms that NAPI poll is invoked after STATUS_FW_ERROR is set. Without
this patch, that poll would process stale RX ring data from dead firmware.

 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
 		      budget);
-- 
2.52.0



* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:33     ` Cole Leavitt
@ 2026-02-16 18:12       ` Ben Greear
  2026-02-18 14:44         ` Cole Leavitt
                           ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Ben Greear @ 2026-02-16 18:12 UTC
  To: Cole Leavitt; +Cc: johannes.berg, miriam.rachel.korenblit, linux-wireless

On 2/14/26 10:33 AM, Cole Leavitt wrote:
> Ben,
> 
> Good catch on both fronts.
> 
> On the build_tfd dangling pointer -- you're right. The failure path at
> line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
> (set at lines 763-764). The caller gets -1 and presumably frees the
> skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
> not advancing means current unmap paths won't iterate to that index,
> it's a latent UAF waiting for a flush path change or future code to
> touch it. Two NULL stores inside a held spinlock cost nothing. I think
> this should go upstream as its own patch.
> 
> On the TOCTOU question -- this is the part I spent the most time on.
> The window you're asking about is: firmware starts producing corrupt
> completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
> checks can't help there because the flag isn't set yet.
> 
> The primary guard in that window is iwl_txq_used() in
> iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
> [read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
> wraparound corruption, etc.
> 
> What it can't catch is an in-range corrupt SSN -- e.g., firmware says
> reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
> That passes bounds checking and the reclaim loop frees skbs for
> entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
> loop catches double-reclaim but not first-time over-reclaim.
> 
> The complete fix for this would be a per-entry generation counter --
> tag each entry on submit, validate on reclaim. But that adds per-entry
> overhead on the TX hot path to protect against a condition (firmware
> producing corrupt completions) that is already terminal. I think the
> right trade-off is:
> 
>    1. Your build_tfd NULL fix (eliminates one dangling pointer class)
>    2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
>       shrinks the detection window to near-zero)
>    3. The existing iwl_txq_used() bounds check (catches most corrupt
>       SSNs)
> 
> Together these make the damage window small enough that a per-entry
> generation scheme isn't justified -- by the time firmware is sending
> corrupt SSNs, we're in dump-and-reset territory anyway.
> 
> That said, if you're seeing corruption patterns in your customer
> testing where a valid-looking-but-wrong SSN gets through before
> FW_ERROR fires, I'd be very interested in the traces. That would
> change the cost/benefit on the generation counter approach.

Hello Cole,

Looks like even with your patches we are still seeing use-after-free. I tried
adding a lot of checks to detect already-freed skbs in iwlwifi, and those are not hitting,
so possibly the bug is very close to the end of the call chain, or I am doing it
wrong, or it is some sort of race or bug that my code will not catch.

We do not see any related crashes when using mt76 radios, so I am pretty sure this
is related to iwlwifi. A particular AP reproduces this problem within
a day, and we can run TCP tests for 30+ days against other APs with no problem.
I don't know what the AP could be doing to trigger this, though.

No FW crash was seen in my logs in this case.

My tree is here if you care to investigate any of my UAF debugging or see
what code is printing some of these logs.  Suggestions for improvement would
be welcome!

https://github.com/greearb/linux-ct-6.18

One problem I have seen for several years is an infinite busy-spin in iwl-mvm-tx-tso-segment. I added code to break
out after 32k loops and warn, and that hits here. The system crashes 28 minutes later, so I am not
sure whether that is directly related. I guess I can try to do more debugging around that bad TSO
segment path.

Feb 16 00:16:01 LF1-MobileStation1 kernel: skbuff: ERROR: Found more than 32000 packets in skbuff::skb_segment, bailing out.
Feb 16 00:16:01 LF1-MobileStation1 kernel: ERROR: iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001, bailing out.
Feb 16 00:16:06 LF1-MobileStation1 kernel: skbuff: ERROR: Found more than 32000 packets in skbuff::skb_segment, bailing out.
Feb 16 00:16:06 LF1-MobileStation1 kernel: ERROR: iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001, bailing out.
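The 32k-iteration breakout described above amounts to a bounded list walk. A sketch of that idea follows; struct seg and count_segments_capped() are hypothetical and are not taken from Ben's tree. A cycle in the list is one way such a walk could otherwise spin forever, but the logs do not show what the actual cause is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a GSO segment list node. */
struct seg {
	struct seg *next;
};

#define SEG_WALK_LIMIT 32000  /* cap matching the log messages above */

/*
 * Count segments, bailing out once the walk exceeds @limit.
 * Returns the count, or -1 if the cap was hit.
 */
static long count_segments_capped(const struct seg *head, long limit)
{
	long n = 0;

	for (const struct seg *s = head; s; s = s->next) {
		if (++n > limit) {
			fprintf(stderr,
				"segment list is huge: %ld, bailing out\n", n);
			return -1;
		}
	}
	return n;
}
```

The cap only turns a hang into a warning; it says nothing about how the list got that long, so the walk above is a diagnostic aid rather than a fix.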

Feb 16 00:44:06 LF1-MobileStation1 kernel: ------------[ cut here ]------------
Feb 16 00:44:06 LF1-MobileStation1 kernel: refcount_t: underflow; use-after-free.
Feb 16 00:44:06 LF1-MobileStation1 kernel: WARNING: CPU: 18 PID: 1203 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Feb 16 00:44:06 LF1-MobileStation1 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp 
stp llc macvlan wanlink(O) pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr nct7802 vfat fat intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency 
intel_uncore_frequency_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek_lib 
ofpart snd_hda_codec_generic i2c_designware_platform spi_nor kvm_intel spi_pxa2xx_platform iwlmld i2c_designware_core spd5118 dw_dmac iTCO_wdt intel_pmc_bxt ccp 
mtd regmap_i2c spi_pxa2xx_core uvcvideo kvm snd_hda_intel 8250_dw iTCO_vendor_support mac80211 uvc snd_intel_dspcfg irqbypass videobuf2_vmalloc snd_hda_codec 
videobuf2_memops btusb videobuf2_v4l2 btbcm snd_hda_core videobuf2_common snd_hwdep videodev btmtk snd_seq btrtl mc btintel iwlwifi cdc_acm onboard_usb_dev 
snd_seq_device bluetooth snd_pcm cfg80211 snd_timer intel_pmc_core snd intel_lpss_pci i2c_i801 pmt_telemetry
Feb 16 00:44:06 LF1-MobileStation1 kernel:  i2c_smbus soundcore intel_lpss pmt_discovery spi_intel_pci mei_hdcp idma64 i2c_mux pmt_class wmi_bmof spi_intel 
pcspkr mei_pxp intel_pmc_ssram_telemetry bfq acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sch_fq_codel sunrpc fuse zram raid1 dm_raid 
raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec 
drm_gpusvm_helper i915 i2c_algo_bit drm_buddy intel_gtt drm_client_lib drm_display_helper drm_kms_helper cec rc_core intel_oc_wdt ttm ixgbe agpgart mdio 
libie_fwlog e1000e igc dca hwmon drm mei_wdt intel_vsec i2c_core video wmi pinctrl_alderlake efivarfs [last unloaded: nfnetlink]
Feb 16 00:44:06 LF1-MobileStation1 kernel: CPU: 18 UID: 0 PID: 1203 Comm: irq/343-iwlwifi Tainted: G S         O        6.18.9+ #53 PREEMPT(full)
Feb 16 00:44:06 LF1-MobileStation1 kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Feb 16 00:44:06 LF1-MobileStation1 kernel: Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
Feb 16 00:44:06 LF1-MobileStation1 kernel: RIP: 0010:refcount_warn_saturate+0xd8/0xe0
Feb 16 00:44:07 LF1-MobileStation1 kernel: Code: ff 48 c7 c7 d8 a4 6d 82 c6 05 d0 4a 3e 01 01 e8 3e 83 a7 ff 0f 0b c3 48 c7 c7 80 a4 6d 82 c6 05 bc 4a 3e 01 01 e8 28 83 a7 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
Feb 16 00:44:07 LF1-MobileStation1 kernel: RSP: 0018:ffffc9000045c6d0 EFLAGS: 00010282
Feb 16 00:44:07 LF1-MobileStation1 kernel: RAX: 0000000000000000 RBX: ffff8882772db000 RCX: 0000000000000000
Feb 16 00:44:07 LF1-MobileStation1 kernel: RDX: ffff88885faa5f00 RSI: 0000000000000001 RDI: ffff88885fa98d00
Feb 16 00:44:07 LF1-MobileStation1 kernel: RBP: ffff8882447d9e00 R08: 0000000000000000 R09: 0000000000000003
Feb 16 00:44:07 LF1-MobileStation1 kernel: R10: ffffc9000045c570 R11: ffffffff82b58da8 R12: ffff88820165f200
Feb 16 00:44:07 LF1-MobileStation1 kernel: R13: 0000000000000001 R14: 00000000000005a8 R15: ffffc9000045c890
Feb 16 00:44:07 LF1-MobileStation1 kernel: FS:  0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
Feb 16 00:44:07 LF1-MobileStation1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 16 00:44:07 LF1-MobileStation1 kernel: CR2: 00007fd1022fdcb4 CR3: 0000000005a36004 CR4: 0000000000772ef0
Feb 16 00:44:07 LF1-MobileStation1 kernel: PKRU: 55555554
Feb 16 00:44:07 LF1-MobileStation1 kernel: Call Trace:
Feb 16 00:44:07 LF1-MobileStation1 kernel:  <IRQ>
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_shifted_skb+0x1d2/0x300
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_sacktag_walk+0x2da/0x4d0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_sacktag_write_queue+0x4a1/0x9a0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_ack+0xd66/0x16e0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? ip_finish_output2+0x189/0x570
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_rcv_established+0x211/0xc10
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? sk_filter_trim_cap+0x1a7/0x350
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_v4_do_rcv+0x1bf/0x350
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_v4_rcv+0xddf/0x1550
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? raw_local_deliver+0xcc/0x280
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_protocol_deliver_rcu+0x20/0x130
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_local_deliver_finish+0x85/0xf0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_sublist_rcv_finish+0x35/0x50
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_sublist_rcv+0x16f/0x200
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_list_rcv+0xfe/0x130
Feb 16 00:44:07 LF1-MobileStation1 kernel:  __netif_receive_skb_list_core+0x183/0x1f0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  netif_receive_skb_list_internal+0x1c8/0x2a0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  gro_receive_skb+0x12e/0x210
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ieee80211_rx_napi+0x82/0xc0 [mac80211]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  __napi_poll+0x25/0x1e0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  net_rx_action+0x2d3/0x340
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? try_to_wake_up+0x2e6/0x610
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? __handle_irq_event_percpu+0xa3/0x230
Feb 16 00:44:07 LF1-MobileStation1 kernel:  handle_softirqs+0xca/0x2b0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? irq_thread_dtor+0xa0/0xa0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  do_softirq.part.0+0x3b/0x60
Feb 16 00:44:07 LF1-MobileStation1 kernel:  </IRQ>
Feb 16 00:44:07 LF1-MobileStation1 kernel:  <TASK>
Feb 16 00:44:07 LF1-MobileStation1 kernel:  __local_bh_enable_ip+0x58/0x60
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_pcie_irq_rx_msix_handler+0xbb/0x100 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  irq_thread_fn+0x19/0x50
Feb 16 00:44:07 LF1-MobileStation1 kernel:  irq_thread+0x126/0x230
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? irq_finalize_oneshot.part.0+0xc0/0xc0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? irq_forced_thread_fn+0x40/0x40
Feb 16 00:44:07 LF1-MobileStation1 kernel:  kthread+0xf7/0x1f0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ret_from_fork+0x114/0x140
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ret_from_fork_asm+0x11/0x20
Feb 16 00:44:07 LF1-MobileStation1 kernel:  </TASK>
Feb 16 00:44:07 LF1-MobileStation1 kernel: ---[ end trace 0000000000000000 ]---
Feb 16 00:44:07 LF1-MobileStation1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

[NPE shortly after in TCP code, but the real problem is the use-after-free, I assume]
# serial console output of the crash following the UAF.

#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 18 UID: 0 PID: 1203 Comm: irq/343-iwlwifi Tainted: G S      W  O        6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:tcp_rack_detect_loss+0x11c/0x170
Code: 07 00 00 48 8b 87 b0 06 00 00 44 01 ee 48 29 d0 ba 00 00 00 00 48 0f 48 c2 29 c6 85 f6 7e 27 41 8b 06 39 f0 0f 42 c6 41 89 06 <48> 8b 45 58 4c 8d 65 58 48 
89 eb 48 83 e8 58 4d 39 fc 74 ab 48 89
RSP: 0018:ffffc9000045c758 EFLAGS: 00010293
RAX: 000000000000408d RBX: ffff88824fff7a00 RCX: 20c49ba5e353f7cf
RDX: 0000000000000000 RSI: 000000000000408d RDI: ffff88820165f200
RBP: ffffffffffffffa8 R08: 0000000083eed3f9 R09: 000000000000012c
R10: 00000000000005ba R11: 000000000000001d R12: ffff88824fff7a58
R13: 000000000000408d R14: ffffc9000045c79c R15: ffff88820165f888
FS:  0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36004 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
  <IRQ>
  tcp_rack_mark_lost+0x59/0xe0
  tcp_identify_packet_loss+0x30/0x70
  tcp_fastretrans_alert+0x366/0x810
  tcp_ack+0xc66/0x16e0
  ? ip_finish_output2+0x189/0x570
  tcp_rcv_established+0x211/0xc10
  ? sk_filter_trim_cap+0x1a7/0x350
  tcp_v4_do_rcv+0x1bf/0x350
  tcp_v4_rcv+0xddf/0x1550
  ? raw_local_deliver+0xcc/0x280
  ip_protocol_deliver_rcu+0x20/0x130
  ip_local_deliver_finish+0x85/0xf0
  ip_sublist_rcv_finish+0x35/0x50
  ip_sublist_rcv+0x16f/0x200
  ip_list_rcv+0xfe/0x130
  __netif_receive_skb_list_core+0x183/0x1f0
  netif_receive_skb_list_internal+0x1c8/0x2a0
  gro_receive_skb+0x12e/0x210
  ieee80211_rx_napi+0x82/0xc0 [mac80211]
  iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
  iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
  iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
  __napi_poll+0x25/0x1e0
  net_rx_action+0x2d3/0x340
  ? try_to_wake_up+0x2e6/0x610
  ? __handle_irq_event_percpu+0xa3/0x230
  handle_softirqs+0xca/0x2b0
  ? irq_thread_dtor+0xa0/0xa0
  do_softirq.part.0+0x3b/0x60
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x58/0x60
  iwl_pcie_irq_rx_msix_handler+0xbb/0x100 [iwlwifi]
  irq_thread_fn+0x19/0x50
  irq_thread+0x126/0x230
  ? irq_finalize_oneshot.part.0+0xc0/0xc0
  ? irq_forced_thread_fn+0x40/0x40
  kthread+0xf7/0x1f0
  ? kthreads_online_cpu+0x100/0x100
  ? kthreads_online_cpu+0x100/0x100
  ret_from_fork+0x114/0x140
  ? kthreads_online_cpu+0x100/0x100
  ret_from_fork_asm+0x11/0x20
  </TASK>
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc macvlan wanlink(O) pktgen rpcrdma 
rdma_cm iw_cm ib_cm ib_core qrtr nct7802 vfat fat intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common 
snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek_lib ofpart snd_hda_codec_generic 
i2c_designware_platform spi_nor kvm_intel spi_pxa2xx_platform iwlmld i2c_designware_core spd5118 dw_dmac iTCO_wdt intel_pmc_bxt ccp mtd regmap_i2c 
spi_pxa2xx_core uvcvideo kvm snd_hda_intel 8250_dw iTCO_vendor_support mac80211 uvc snd_intel_dspcfg irqbypass videobuf2_vmalloc snd_hda_codec videobuf2_memops 
btusb videobuf2_v4l2 btbcm snd_hda_core videobuf2_common snd_hwdep videodev btmtk snd_seq btrtl mc btintel iwlwifi cdc_acm onboard_usb_dev snd_seq_device 
bluetooth snd_pcm cfg80211 snd_timer intel_pmc_core snd intel_lpss_pci i2c_i801 pmt_telemetry
  i2c_smbus soundcore intel_lpss pmt_discovery spi_intel_pci mei_hdcp idma64 i2c_mux pmt_class wmi_bmof spi_intel pcspkr mei_pxp intel_pmc_ssram_telemetry bfq 
acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sch_fq_codel sunrpc fuse zram raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq 
async_xor xor async_tx raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec drm_gpusvm_helper i915 i2c_algo_bit drm_buddy intel_gtt 
drm_client_lib drm_display_helper drm_kms_helper cec rc_core intel_oc_wdt ttm ixgbe agpgart mdio libie_fwlog e1000e igc dca hwmon drm mei_wdt intel_vsec 
i2c_core video wmi pinctrl_alderlake efivarfs [last unloaded: nfnetlink]
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:tcp_rack_detect_loss+0x11c/0x170
Code: 07 00 00 48 8b 87 b0 06 00 00 44 01 ee 48 29 d0 ba 00 00 00 00 48 0f 48 c2 29 c6 85 f6 7e 27 41 8b 06 39 f0 0f 42 c6 41 89 06 <48> 8b 45 58 4c 8d 65 58 48 
89 eb 48 83 e8 58 4d 39 fc 74 ab 48 89
RSP: 0018:ffffc9000045c758 EFLAGS: 00010293
RAX: 000000000000408d RBX: ffff88824fff7a00 RCX: 20c49ba5e353f7cf
RDX: 0000000000000000 RSI: 000000000000408d RDI: ffff88820165f200
RBP: ffffffffffffffa8 R08: 0000000083eed3f9 R09: 000000000000012c
R10: 00000000000005ba R11: 000000000000001d R12: ffff88824fff7a58
R13: 000000000000408d R14: ffffc9000045c79c R15: ffff88820165f888
FS:  0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36004 CR4: 0000000000772ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 10 seconds..

Thanks,
Ben

> 
> Thanks,
> Cole
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-16 18:12       ` Ben Greear
@ 2026-02-18 14:44         ` Cole Leavitt
  2026-02-18 14:44         ` Cole Leavitt
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Cole Leavitt @ 2026-02-18 14:44 UTC (permalink / raw)
  To: Ben Greear; +Cc: Johannes Berg, linux-wireless, Miri Korenblit

Ben,

I've been digging into the use-after-free crash you reported on your
BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
followed by NULL deref in tcp_rack_detect_loss). I think I found the
root cause -- it's a missing guard in the MLD TSO segmentation path
that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
segment explosion you're seeing.

Here's the full chain:

1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
   a TID (bit not set in amsdu_enabled), the MLD driver sets:

     link_sta->agg.max_tid_amsdu_len[i] = 1;

   This sentinel value 1 means "AMSDU disabled on this TID".

2) mld/tx.c:836-837 -- the TSO path checks:

     max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
     if (!max_tid_amsdu_len)   // <-- only catches zero, not 1
         return iwl_tx_tso_segment(skb, 1, ...);

   Value 1 passes this check.

3) mld/tx.c:847 -- the division produces zero:

     num_subframes = (1 + 2) / (1534 + 2) = 0

   Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.

4) iwl-utils.c:27 -- gso_size is set to zero:

     skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0

5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
   tiny segments, which is the error you're seeing:

     "skbuff: ERROR: Found more than 32000 packets in skb_segment"
     "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"

6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
   TX ring before it fills up, then purges the rest. This creates a
   massive burst of tiny frames that stress the BA completion path.

The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
num_subframes calculation. MLD has no equivalent -- it relies solely on
max_tid_amsdu_len, and the sentinel value 1 slips through.

This explains all your observations:
- 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
- AP-specific: the problem AP causes firmware to disable AMSDU for the
  active TID (other APs enable it, so max_tid_amsdu_len gets a proper
  value from iwl_mld_get_amsdu_size_of_tid())
- 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
  creates massive alloc/free churn in the skb slab, which can corrupt
  TCP retransmit queue entries allocated from the same cache
- No firmware error: firmware is fine, the bug is purely in MLD's TSO
  parameter calculation

Fix below. It adds a guard after the num_subframes calculation -- if
it's zero, fall back to single-subframe TSO (num_subframes=1), which
correctly sets gso_size=mss. This matches what MVM effectively does via
its amsdu_enabled checks.

Could you test this against the problem AP? Two things that would help
confirm the theory:

1) Before applying the fix, add this debug print to see the actual
   max_tid_amsdu_len value with the problem AP:

     // In iwl_mld_tx_tso_segment(), after line 847
     if (!num_subframes)
         pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
                      "subf_len=%u mss=%u\n",
                      max_tid_amsdu_len, subf_len, mss);

2) After applying the fix, run against the problem AP for 1+ day and
   check if both the TSO explosion AND the UAF are gone.

I also noticed a few secondary defense-in-depth regressions in MLD's
TX completion path vs MVM:

- MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
  (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
- The transport-level reclaim_lock prevents direct double-free, but
  MLD is missing MVM's extra safety checks

These are probably not directly causing your crash, but worth noting.

---
 drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index fbb672f4d8c7..1d47254a4148 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
 	 */
 	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
 
+	/* If the AMSDU length limit is too small to fit even a single
+	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
+	 * the TLC notification when AMSDU is disabled for this TID), fall
+	 * back to non-AMSDU TSO segmentation. Without this guard,
+	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
+	 * which makes skb_gso_segment() produce tens of thousands of
+	 * 1-byte segments, overloading the TX ring and completion path.
+	 */
+	if (!num_subframes)
+		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
+
 	if (sta->max_amsdu_subframes &&
 	    num_subframes > sta->max_amsdu_subframes)
 		num_subframes = sta->max_amsdu_subframes;
-- 
2.52.0

Cole


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF
  2026-02-16 18:12       ` Ben Greear
  2026-02-18 14:44         ` Cole Leavitt
  2026-02-18 14:44         ` Cole Leavitt
@ 2026-02-18 14:47         ` Cole Leavitt
  2026-02-18 14:47           ` [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled Cole Leavitt
  2026-03-22 12:29           ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Korenblit, Miriam Rachel
  2026-02-18 17:35         ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Ben Greear
  3 siblings, 2 replies; 15+ messages in thread
From: Cole Leavitt @ 2026-02-18 14:47 UTC (permalink / raw)
  To: greearb; +Cc: johannes, linux-wireless, miriam.rachel.korenblit, Cole Leavitt

Ben,

I've been digging into the use-after-free crash you reported on your
BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
followed by NULL deref in tcp_rack_detect_loss). I think I found the
root cause -- it's a missing guard in the MLD TSO segmentation path
that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
segment explosion you're seeing.

Here's the full chain:

1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
   a TID (bit not set in amsdu_enabled), the MLD driver sets:

     link_sta->agg.max_tid_amsdu_len[i] = 1;

   This sentinel value 1 means "AMSDU disabled on this TID".

2) mld/tx.c:836-837 -- the TSO path checks:

     max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
     if (!max_tid_amsdu_len)   // <-- only catches zero, not 1
         return iwl_tx_tso_segment(skb, 1, ...);

   Value 1 passes this check.

3) mld/tx.c:847 -- the division produces zero:

     num_subframes = (1 + 2) / (1534 + 2) = 0

   Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.

4) iwl-utils.c:27 -- gso_size is set to zero:

     skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0

5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
   tiny segments, which is the error you're seeing:

     "skbuff: ERROR: Found more than 32000 packets in skb_segment"
     "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"

6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
   TX ring before it fills up, then purges the rest. This creates a
   massive burst of tiny frames that stress the BA completion path.

The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
num_subframes calculation. MLD has no equivalent -- it relies solely on
max_tid_amsdu_len, and the sentinel value 1 slips through.

This explains all your observations:
- 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
- AP-specific: the problem AP causes firmware to disable AMSDU for the
  active TID (other APs enable it, so max_tid_amsdu_len gets a proper
  value from iwl_mld_get_amsdu_size_of_tid())
- 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
  creates massive alloc/free churn in the skb slab, which can corrupt
  TCP retransmit queue entries allocated from the same cache
- No firmware error: firmware is fine, the bug is purely in MLD's TSO
  parameter calculation

The fix (in patch 1/1) adds a guard after the num_subframes
calculation -- if it's zero, fall back to single-subframe TSO
(num_subframes=1), which correctly sets gso_size=mss. This matches what
MVM effectively does via its amsdu_enabled checks.

Could you test this against the problem AP? Two things that would help
confirm the theory:

1) Before applying the fix, add this debug print to see the actual
   max_tid_amsdu_len value with the problem AP:

     // In iwl_mld_tx_tso_segment(), after line 847
     if (!num_subframes)
         pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
                      "subf_len=%u mss=%u\n",
                      max_tid_amsdu_len, subf_len, mss);

2) After applying the fix, run against the problem AP for 1+ day and
   check if both the TSO explosion AND the UAF are gone.

I also noticed a few secondary defense-in-depth regressions in MLD's TX
completion path vs MVM:

- MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
  (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
- The transport-level reclaim_lock prevents direct double-free, but
  MLD is missing MVM's extra safety checks

These are probably not directly causing your crash, but worth noting.

Cole Leavitt (1):
  wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is
    disabled

 drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled
  2026-02-18 14:47         ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Cole Leavitt
@ 2026-02-18 14:47           ` Cole Leavitt
  2026-03-22 12:28             ` Korenblit, Miriam Rachel
  2026-03-22 12:29           ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Korenblit, Miriam Rachel
  1 sibling, 1 reply; 15+ messages in thread
From: Cole Leavitt @ 2026-02-18 14:47 UTC (permalink / raw)
  To: greearb; +Cc: johannes, linux-wireless, miriam.rachel.korenblit, Cole Leavitt

When the TLC notification disables AMSDU for a TID, the MLD driver sets
max_tid_amsdu_len to the sentinel value 1. The TSO segmentation path in
iwl_mld_tx_tso_segment() checks for zero but not for this sentinel,
allowing it to reach the num_subframes calculation:

  num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad)
                = (1 + 2) / (1534 + 2) = 0

This zero propagates to iwl_tx_tso_segment() which sets:

  gso_size = num_subframes * mss = 0

Calling skb_gso_segment() with gso_size=0 creates over 32000 tiny
segments from a single GSO skb. This floods the TX ring with ~1024
micro-frames (the rest are purged), creating a massive burst of TX
completion events that can lead to memory corruption and a subsequent
use-after-free in TCP's retransmit queue (refcount underflow in
tcp_shifted_skb, NULL deref in tcp_rack_detect_loss).

The MVM driver is immune because it checks mvmsta->amsdu_enabled before
reaching the num_subframes calculation. The MLD driver has no equivalent
bitmap check and relies solely on max_tid_amsdu_len, which does not
catch the sentinel value.

Fix this by falling back to single-subframe TSO (num_subframes=1) when
the AMSDU length limit is too small to fit even one subframe.

Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
 drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index fbb672f4d8c7..1d47254a4148 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
 	 */
 	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
 
+	/* If the AMSDU length limit is too small to fit even a single
+	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
+	 * the TLC notification when AMSDU is disabled for this TID), fall
+	 * back to non-AMSDU TSO segmentation. Without this guard,
+	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
+	 * which makes skb_gso_segment() produce tens of thousands of
+	 * 1-byte segments, overloading the TX ring and completion path.
+	 */
+	if (!num_subframes)
+		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
+
 	if (sta->max_amsdu_subframes &&
 	    num_subframes > sta->max_amsdu_subframes)
 		num_subframes = sta->max_amsdu_subframes;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-16 18:12       ` Ben Greear
                           ` (2 preceding siblings ...)
  2026-02-18 14:47         ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Cole Leavitt
@ 2026-02-18 17:35         ` Ben Greear
  3 siblings, 0 replies; 15+ messages in thread
From: Ben Greear @ 2026-02-18 17:35 UTC (permalink / raw)
  To: Cole Leavitt; +Cc: Johannes Berg, linux-wireless, Miri Korenblit

On 2/18/26 09:17, Cole Leavitt wrote:
> Ben,
> 
> I've been digging into the use-after-free crash you reported on your
> BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
> followed by NULL deref in tcp_rack_detect_loss). I think I found the
> root cause -- it's a missing guard in the MLD TSO segmentation path
> that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
> segment explosion you're seeing

Hello Cole,

Thanks for this, I'll take a closer look and test this out.

But also, I first saw this back in 2024, and that was before mld split from
the mvm driver.  Possibly mvm added protection after I saw the problem and
that didn't make it into mld for some reason, or maybe there are other problems
as well.

Thanks,
Ben


> 
> Here's the full chain:
> 
> 1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
>     a TID (bit not set in amsdu_enabled), the MLD driver sets:
> 
>       link_sta->agg.max_tid_amsdu_len[i] = 1;
> 
>     This sentinel value 1 means "AMSDU disabled on this TID".
> 
> 2) mld/tx.c:836-837 -- the TSO path checks:
> 
>       max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>       if (!max_tid_amsdu_len)   // <-- only catches zero, not 1
>           return iwl_tx_tso_segment(skb, 1, ...);
> 
>     Value 1 passes this check.
> 
> 3) mld/tx.c:847 -- the division produces zero:
> 
>       num_subframes = (1 + 2) / (1534 + 2) = 0
> 
>     Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.
> 
> 4) iwl-utils.c:27 -- gso_size is set to zero:
> 
>       skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0
> 
> 5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
>     tiny segments, which is the error you're seeing:
> 
>       "skbuff: ERROR: Found more than 32000 packets in skb_segment"
>       "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"
> 
> 6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
>     TX ring before it fills up, then purges the rest. This creates a
>     massive burst of tiny frames that stress the BA completion path.
> 
> The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
> separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
> num_subframes calculation. MLD has no equivalent -- it relies solely on
> max_tid_amsdu_len, and the sentinel value 1 slips through.
> 
> This explains all your observations:
> - 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
> - AP-specific: the problem AP causes firmware to disable AMSDU for the
>    active TID (other APs enable it, so max_tid_amsdu_len gets a proper
>    value from iwl_mld_get_amsdu_size_of_tid())
> - 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
>    creates massive alloc/free churn in the skb slab, which can corrupt
>    TCP retransmit queue entries allocated from the same cache
> - No firmware error: firmware is fine, the bug is purely in MLD's TSO
>    parameter calculation
> 
> Fix below. It adds a guard after the num_subframes calculation -- if
> it's zero, fall back to single-subframe TSO (num_subframes=1), which
> correctly sets gso_size=mss. This matches what MVM effectively does via
> its amsdu_enabled checks.
> 
> Could you test this against the problem AP? Two things that would help
> confirm the theory:
> 
> 1) Before applying the fix, add this debug print to see the actual
>     max_tid_amsdu_len value with the problem AP:
> 
>       // In iwl_mld_tx_tso_segment(), after line 847
>       if (!num_subframes)
>           pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
>                        "subf_len=%u mss=%u\n",
>                        max_tid_amsdu_len, subf_len, mss);
> 
> 2) After applying the fix, run against the problem AP for 1+ day and
>     check if both the TSO explosion AND the UAF are gone.
> 
> I also noticed a few secondary defense-in-depth regressions in MLD's
> TX completion path vs MVM:
> 
> - MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
>    (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
> - The transport-level reclaim_lock prevents direct double-free, but
>    MLD is missing MVM's extra safety checks
> 
> These are probably not directly causing your crash, but worth noting.
> 
> ---
>   drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> index fbb672f4d8c7..1d47254a4148 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> @@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
>   	 */
>   	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
>   
> +	/* If the AMSDU length limit is too small to fit even a single
> +	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
> +	 * the TLC notification when AMSDU is disabled for this TID), fall
> +	 * back to non-AMSDU TSO segmentation. Without this guard,
> +	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
> +	 * which makes skb_gso_segment() produce tens of thousands of
> +	 * 1-byte segments, overloading the TX ring and completion path.
> +	 */
> +	if (!num_subframes)
> +		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
> +
>   	if (sta->max_amsdu_subframes &&
>   	    num_subframes > sta->max_amsdu_subframes)
>   		num_subframes = sta->max_amsdu_subframes;

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
       [not found] <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
@ 2026-02-18 18:47 ` Cole Leavitt
  2026-02-19 16:38   ` Ben Greear
  0 siblings, 1 reply; 15+ messages in thread
From: Cole Leavitt @ 2026-02-18 18:47 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless

Ben,

Thanks for the historical context. I dug through the git history and
your linux-ct repos to verify exactly what happened when. I want to
make sure I have this right - can you confirm whether this matches
what you saw?

2018 Bug (Bug 199209)
---------------------
Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of
AMSDU with fragmented SKBs"). That was a different trigger - NFS
created highly fragmented SKBs where nr_frags was so high that the
buffer descriptor limit check produced num_subframes=0. Emmanuel's
fix clamps that path to 1.

Current MLD Bug
---------------
Different path to the same symptom. When TLC disables AMSDU for a
TID, both MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel
value. The key difference in protection:

MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire
AMSDU path:

    if (!mvmsta->amsdu_enabled)
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

    if (!(mvmsta->amsdu_enabled & BIT(tid)))
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

MVM never reads max_tid_amsdu_len in its TX path - it uses its own
mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).
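
To make that gating concrete, here is a minimal user-space C model of the two early returns quoted above. struct demo_mvm_sta and mvm_style_bail_out() are hypothetical names, the 16-bit bitmap width is an assumption, and a true return stands for taking the iwl_tx_tso_segment(skb, 1, ...) path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))   /* same meaning as the kernel's BIT() macro */

/* Hypothetical stand-in for the MVM station; only the per-TID A-MSDU
 * bitmap is modeled, and its 16-bit width is an assumption. */
struct demo_mvm_sta {
	uint16_t amsdu_enabled;   /* bit N set: A-MSDU allowed on TID N */
};

/* true:  take the single-MSDU path ("return iwl_tx_tso_segment(skb, 1, ...)")
 * false: continue to the A-MSDU sizing code. */
static bool mvm_style_bail_out(const struct demo_mvm_sta *sta, unsigned int tid)
{
	assert(tid < 16);

	if (!sta->amsdu_enabled)
		return true;   /* A-MSDU disabled on every TID */
	if (!(sta->amsdu_enabled & BIT(tid)))
		return true;   /* A-MSDU disabled on this TID */
	return false;
}
```

Because the bitmap is tested before any length arithmetic, a disabled TID never reaches the num_subframes calculation in this model.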

MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len
directly, with no equivalent bitmap:

    max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
    if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
        return iwl_tx_tso_segment(skb, 1, ...);

    num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
    // When max_tid_amsdu_len=1 (subf_len=1534, so pad=2): num_subframes = (1 + 2) / (1534 + 2) = 0
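
The arithmetic in that excerpt can be checked with a small user-space model. The pad formula, (4 - subf_len) & 0x3, is assumed to match the driver's rounding to a 4-byte boundary. Both function names are hypothetical, and a return value of 1 from the unfixed model stands for the iwl_tx_tso_segment(skb, 1, ...) path.

```c
#include <assert.h>

/* Same arithmetic as the excerpt above. pad rounds a subframe up to a
 * 4-byte boundary: the unsigned subtraction wraps, and & 0x3 keeps the
 * low two bits. */
static unsigned int demo_num_subframes(unsigned int max_tid_amsdu_len,
				       unsigned int subf_len)
{
	unsigned int pad = (4 - subf_len) & 0x3;

	assert(subf_len > 0);
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/* Model of the pre-fix MLD flow: only a length of 0 takes the
 * single-MSDU path. */
static unsigned int demo_mld_subframes_unfixed(unsigned int max_tid_amsdu_len,
					       unsigned int subf_len)
{
	if (!max_tid_amsdu_len)
		return 1;
	return demo_num_subframes(max_tid_amsdu_len, subf_len);
}
```

The sentinel value 1 passes the zero check but still yields 0 subframes. A realistic limit such as 7935 bytes yields 5.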

What I found in your repos:

  - linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver,
    only MVM with amsdu_enabled bitmap protection
  - linux-ct-6.15, linux-ct-6.18: Have MLD driver
    (drivers/net/wireless/intel/iwlwifi/mld/)
  - backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89
    (2024-07-17)

So MVM should have been immune to this specific sentinel-value bug
due to the bitmap check.

Question for you: When you saw TSO segment explosions in 2024, what
kernel and driver were you using? If it was one of your 6.5-6.14
kernels with MVM, then there may be a different path to
num_subframes=0 that I haven't identified yet. If you were using
backport-iwlwifi with MLD enabled, that would explain it hitting the
same bug I'm fixing now.

Commit ae6d30a71521 (Feb 2024) added better error reporting for
skb_gso_segment failures, which suggests people were hitting GSO
segment errors around that time - but I don't have visibility into
what specific trigger you hit.

My fix catches the sentinel-induced zero after the calculation and falls
back to iwl_tx_tso_segment(skb, 1, ...), which for a TID with AMSDU
disabled has the same effect as MVM's bitmap check. This should prevent
the current MLD bug from reaching skb_gso_segment with gso_size=0.
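
A similarly hedged model of the flow with the added check, placed next to a simplified MVM-style outcome, shows why the two agree for a disabled TID. All function names are hypothetical. The MVM side is reduced to a single enabled/disabled flag plus the same sizing formula.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int demo_num_subframes(unsigned int len, unsigned int subf_len)
{
	unsigned int pad = (4 - subf_len) & 0x3;   /* round up to 4 bytes */

	assert(subf_len > 0);
	return (len + pad) / (subf_len + pad);
}

/* MLD flow with the added check: a computed 0 (e.g. from the sentinel
 * max_tid_amsdu_len == 1) also falls back to a single subframe. */
static unsigned int demo_mld_subframes_fixed(unsigned int max_tid_amsdu_len,
					     unsigned int subf_len)
{
	unsigned int n;

	if (!max_tid_amsdu_len)
		return 1;
	n = demo_num_subframes(max_tid_amsdu_len, subf_len);
	if (!n)
		return 1;   /* the new check, before any max_amsdu_subframes clamp */
	return n;
}

/* Simplified MVM-style outcome: a disabled TID always yields 1. */
static unsigned int demo_mvm_subframes(bool tid_amsdu_enabled,
				       unsigned int max_amsdu_len,
				       unsigned int subf_len)
{
	if (!tid_amsdu_enabled)
		return 1;
	return demo_num_subframes(max_amsdu_len, subf_len);
}
```

For the sentinel case both models send the frame down the single-MSDU path. For a normal length, the fixed MLD model still computes the full subframe count.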

Looking forward to your test results with the problem AP, and any
clarification on what setup you were using in 2024.

Cole

* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-18 18:47 ` [PATCH] " Cole Leavitt
@ 2026-02-19 16:38   ` Ben Greear
  0 siblings, 0 replies; 15+ messages in thread
From: Ben Greear @ 2026-02-19 16:38 UTC (permalink / raw)
  To: Cole Leavitt; +Cc: linux-wireless

On 2/18/26 10:47, Cole Leavitt wrote:
> Ben,
> 
> Thanks for the historical context. I dug through the git history and
> your linux-ct repos to verify exactly what happened when. I want to
> make sure I have this right - can you confirm whether this matches
> what you saw?

The bug was originally seen in a mainline kernel, before the MLD driver was
forked off from mvm, not in a backports kernel.

Adding your patch below didn't solve the UAF in the tcp_ack path,
at least.  I did not see any debugging output indicating that the code path
in the patch was taken.  I also did not see the 32k-iteration packet
segmentation loop in the last crash, so that loop is at least not the only
way a UAF can happen.

The problem reproduced overnight was:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP
CPU: 12 UID: 0 PID: 1234 Comm: irq/345-iwlwifi Tainted: G S         O        6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:rb_erase+0x173/0x350
Code: 08 48 8b 01 a8 01 75 97 48 83 c0 01 48 89 01 c3 c3 48 89 46 10 e9 27 ff ff ff 48 8b 56 10 48 8d 41 01 48 89 51 08 48 89 4e 10 <48> 89 02 48 8b 01 48 89 06 48 89 31 48 83 f8 03 0f 86 8e 00 00 00
RSP: 0018:ffffc9000038c820 EFLAGS: 00010246
RAX: ffff8881b0646601 RBX: 000000000000000c RCX: ffff8881b0646600
RDX: 0000000000000000 RSI: ffff8881e9cbea00 RDI: ffff8881b0646200
------------[ cut here ]------------
RBP: ffff8881b0646200 R08: ffff8881ce443108 R09: 0000000080200001
R10: 0000000000010000 R11: 00000000f0eaffb7 R12: ffff8881ce442f80
R13: 0000000000000004 R14: ffff8881b0646600 R15: 0000000000000001
refcount_t: underflow; use-after-free.
FS:  0000000000000000(0000) GS:ffff8888dc42e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36002 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
WARNING: CPU: 0 PID: 1224 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
  <IRQ>
Modules linked in:
  tcp_ack+0x635/0x16e0
  nf_conntrack_netlink
  tcp_rcv_established+0x211/0xc10
  nf_conntrack
  ? sk_filter_trim_cap+0x1a7/0x350
  nfnetlink
  tcp_v4_do_rcv+0x1bf/0x350
  tls
  tcp_v4_rcv+0xddf/0x1550
  vrf
  ? lock_timer_base+0x6d/0x90
  nf_defrag_ipv6
  ? raw_local_deliver+0xcc/0x280
  nf_defrag_ipv4
ip_protocol_deliver_rcu+0x20/0x130
  8021q
  ip_local_deliver_finish+0x85/0xf0
  garp
  ip_sublist_rcv_finish+0x35/0x50
  mrp
  ip_sublist_rcv+0x16f/0x200
  stp
  ip_list_rcv+0xfe/0x130
  llc
  __netif_receive_skb_list_core+0x183/0x1f0
  macvlan
  netif_receive_skb_list_internal+0x1c8/0x2a0
  wanlink(O)
  gro_receive_skb+0x12e/0x210
  pktgen
  ieee80211_rx_napi+0x82/0xc0 [mac80211]
  rpcrdma
  iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
  rdma_cm
  iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
  iw_cm
  iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
  ib_cm
  __napi_poll+0x25/0x1e0
  ib_core
  net_rx_action+0x2d3/0x340
  qrtr


I have enough guard/debugging logic in place that I'm pretty sure the skb coming
from iwlwifi in this particular path is fine.  It appears the problem is that
an already-freed skb is still in the socket's skb collection, and the code blows up
trying to follow a bad rbtree link, or something similar.  I'm continuing to narrow
down where the skb goes bad, but since the crash site moves around a lot, it seems
likely that some other thread of logic is racing to free the skb.  Maybe I can add
some sort of debugging to warn if an skb is freed while it is still in an rbtree...
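
The debugging idea in that last sentence can be sketched generically: mark an object while it is linked into a container, and have the free path complain if the mark is still set. This is a user-space illustration only. struct demo_buf, its linked flag and demo_free() are hypothetical, and nothing here corresponds to an existing sk_buff field or kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object standing in for an skb that can sit in a socket's
 * tree. "linked" is a debug-only marker, not a real sk_buff field. */
struct demo_buf {
	bool linked;
};

static unsigned int demo_bad_frees;   /* frees attempted while still linked */

static void demo_link(struct demo_buf *b)   { b->linked = true;  }
static void demo_unlink(struct demo_buf *b) { b->linked = false; }

/* Free wrapper: warn and refuse to free an object that is still linked,
 * so the bug is reported at the free site rather than later, when the
 * container follows a dangling pointer. */
static void demo_free(struct demo_buf *b)
{
	assert(b != NULL);
	if (b->linked) {
		demo_bad_frees++;
		fprintf(stderr, "demo_free: object %p still linked\n", (void *)b);
		return;   /* deliberately leak instead of leaving a dangling link */
	}
	free(b);
}
```

Leaking on a bad free trades memory for a crash site that points at the real culprit. A debug build that only warns, and still frees, would report the same event.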

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: [PATCH v3] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:43   ` [PATCH v3] " Cole Leavitt
@ 2026-02-26 19:37     ` Ben Greear
  0 siblings, 0 replies; 15+ messages in thread
From: Ben Greear @ 2026-02-26 19:37 UTC (permalink / raw)
  To: Cole Leavitt, johannes.berg, miriam.rachel.korenblit
  Cc: linux-wireless, stable

On 2/14/26 10:43, Cole Leavitt wrote:
> After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
> still be actively polling or get scheduled from a prior interrupt. The
> NAPI poll functions (both legacy and MSIX variants) have no check for
> STATUS_FW_ERROR and will continue processing stale RX ring entries from
> dying firmware. This can dispatch TX completion notifications containing
> corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
> to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
> queue entries that were already freed by a prior correct reclaim, the
> result is an skb use-after-free or double-free.
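
The guard that the quoted description proposes can be modeled in user-space C. The poll callback checks an error flag at the top, reports the first hit (in the spirit of WARN_ONCE), and returns without touching the RX ring. struct demo_trans, demo_poll() and the atomic_bool flag are hypothetical stand-ins for the transport status bit and the real NAPI poll functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: status_fw_error stands in for the STATUS_FW_ERROR
 * bit, and demo_poll() for a NAPI poll callback draining an RX ring. */
struct demo_trans {
	atomic_bool status_fw_error;
	unsigned int pending;   /* RX entries not yet handled */
	unsigned int handled;   /* RX entries dispatched to handlers */
};

/* Returns how many RX entries were processed, at most budget. */
static int demo_poll(struct demo_trans *t, int budget)
{
	static bool warned;   /* emulates WARN_ONCE: report only the first hit */
	int done = 0;

	assert(t != NULL && budget >= 0);

	if (atomic_load(&t->status_fw_error)) {
		if (!warned) {
			warned = true;
			fprintf(stderr, "demo_poll invoked after FW error\n");
		}
		return 0;   /* leave possibly stale RX entries untouched */
	}

	while (done < budget && t->pending > 0) {
		t->pending--;
		t->handled++;
		done++;
	}
	return done;
}
```

A real NAPI poll callback that returns less than its budget must still complete the NAPI context (for example with napi_complete_done()). That bookkeeping is deliberately left out of this model.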

Hello Cole,

We've been testing with this patch, and today managed to see its logic trigger.
The system had a cascade of other errors leading to a use-after-free that does
not appear to be related to the skb use-after-free, but I can at least confirm your
patch can be triggered.  Here are logs around this.  I believe that in this case the firmware
didn't actually bubble up a crash notification/interrupt; more likely the driver detected a timeout
and faked a crash.

Feb 26 11:09:35 ct523c-de7c kernel: workqueue: blk_mq_requeue_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 26 11:09:36 ct523c-de7c kernel: workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND
Feb 26 11:09:39 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Error sending SYSTEM_STATISTICS_CMD: time out after 2000ms.
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Current CMD queue read_ptr 92 write_ptr 93
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Start IWL Error Log Dump:
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Transport status: 0x0000004A, valid: 6
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Loaded firmware version: 101.6ef20b19.0 gl-c0-fm-c0-101.ucode
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000002F0 | trm_hw_status0
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | trm_hw_status1
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x002C438C | branchlink2
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00009C04 | interruptlink1
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00009C04 | interruptlink2
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00009C3C | data1
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x01000000 | data2
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | data3
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xD040FE38 | beacon time
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x3B6411CD | tsf low
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000026B | tsf hi
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | time gp1
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0616DB18 | time gp2
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000001 | uCode revision type
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000065 | uCode version major
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x6EF20B19 | uCode version minor
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000472 | hw version
Feb 26 11:09:40 ct523c-de7c kernel: ------------[ cut here ]------------
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi: NAPI MSIX poll[0] invoked after FW error
Feb 26 11:09:40 ct523c-de7c kernel: WARNING: CPU: 3 PID: 36 at drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058 iwl_pcie_napi_poll_msix+0x29e/0x310 [iwlwifi]
Feb 26 11:09:40 ct523c-de7c kernel: Modules linked in: vrf nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc 
macvlan pktgen rfcomm rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr bnep intel_rapl_msr iTCO_wdt intel_pmc_bxt ee1004 iTCO_vendor_support iwlmld coretemp 
intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi intel_tcc_cooling x86_pkg_temp_thermal 
snd_hda_codec_alc662 snd_hda_codec_realtek_lib intel_powerclamp intel_wmi_thunderbolt snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec 
mac80211 snd_hda_core snd_hwdep snd_seq btusb btbcm snd_seq_device btmtk btrtl btintel snd_pcm iwlwifi bluetooth snd_timer cfg80211 i2c_i801 snd i2c_smbus 
soundcore i2c_mux mei_pxp mei_hdcp intel_pch_thermal intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec acpi_pad bfq nfsd 
sch_fq_codel auth_rpcgss nfs_acl lockd grace nfs_localio fuse sunrpc raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq
Feb 26 11:09:40 ct523c-de7c kernel:  async_xor xor async_tx raid6_pq i915 drm_buddy intel_gtt drm_client_lib drm_display_helper drm_kms_helper cec rc_core ttm 
agpgart ixgbe mdio igb libie_fwlog i2c_algo_bit dca drm hwmon mei_wdt i2c_core intel_oc_wdt video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath [last 
unloaded: nfnetlink]
Feb 26 11:09:40 ct523c-de7c kernel: CPU: 3 UID: 0 PID: 36 Comm: ksoftirqd/3 Kdump: loaded Not tainted 6.18.9+ #18 PREEMPT(full)
Feb 26 11:09:40 ct523c-de7c kernel: Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/21/2023
Feb 26 11:09:40 ct523c-de7c kernel: RIP: 0010:iwl_pcie_napi_poll_msix+0x29e/0x310 [iwlwifi]
Feb 26 11:09:40 ct523c-de7c kernel: Code: 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 04 3c 03 7e 74 8b b3 48 ff ff ff 48 c7 c7 a0 d2 19 a2 e8 62 00 0f df <0f> 0b eb 96 4c 89 e7 e8 d6 92 9f df e9 0b fe ff ff e8 cc 92 9f df
Feb 26 11:09:40 ct523c-de7c kernel: RSP: 0018:ffff888120fe7b70 EFLAGS: 00010286
Feb 26 11:09:40 ct523c-de7c kernel: RAX: 0000000000000000 RBX: ffff888182e100b8 RCX: 0000000000000001
Feb 26 11:09:40 ct523c-de7c kernel: RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff88841dba4e48
Feb 26 11:09:40 ct523c-de7c kernel: RBP: ffff888132504028 R08: 0000000000000001 R09: ffffed1083b749c9
Feb 26 11:09:40 ct523c-de7c kernel: R10: ffff88841dba4e4b R11: 0000000000072ee8 R12: ffff888132504090
Feb 26 11:09:40 ct523c-de7c kernel: R13: ffff888132504d90 R14: 0000000000000040 R15: ffff888134b480b8
Feb 26 11:09:40 ct523c-de7c kernel: FS:  0000000000000000(0000) GS:ffff8884974c7000(0000) knlGS:0000000000000000
Feb 26 11:09:40 ct523c-de7c kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 26 11:09:40 ct523c-de7c kernel: CR2: 00007fd5beb80028 CR3: 000000013e2d4001 CR4: 00000000003706f0
Feb 26 11:09:40 ct523c-de7c kernel: Call Trace:
Feb 26 11:09:40 ct523c-de7c kernel:  <TASK>
Feb 26 11:09:40 ct523c-de7c kernel:  __napi_poll.constprop.0+0xa0/0x580
Feb 26 11:09:40 ct523c-de7c kernel:  net_rx_action+0x84d/0xe40
Feb 26 11:09:40 ct523c-de7c kernel:  ? __napi_poll.constprop.0+0x580/0x580
Feb 26 11:09:40 ct523c-de7c kernel:  ? do_raw_spin_lock+0x12c/0x270
Feb 26 11:09:40 ct523c-de7c kernel:  ? run_timer_softirq+0xf2/0x1b0
Feb 26 11:09:40 ct523c-de7c kernel:  ? lock_release+0xce/0x290
Feb 26 11:09:40 ct523c-de7c kernel:  ? trace_irq_enable.constprop.0+0xbe/0x100
Feb 26 11:09:40 ct523c-de7c kernel:  handle_softirqs+0x1c6/0x810
Feb 26 11:09:40 ct523c-de7c kernel:  run_ksoftirqd+0x2d/0x50
Feb 26 11:09:40 ct523c-de7c kernel:  smpboot_thread_fn+0x338/0x8c0
Feb 26 11:09:40 ct523c-de7c kernel:  ? sort_range+0x20/0x20
Feb 26 11:09:40 ct523c-de7c kernel:  kthread+0x3b7/0x770
Feb 26 11:09:40 ct523c-de7c kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Feb 26 11:09:40 ct523c-de7c kernel:  ? ret_from_fork+0x17/0x3a0
Feb 26 11:09:40 ct523c-de7c kernel:  ? lock_release+0xce/0x290
Feb 26 11:09:40 ct523c-de7c kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Feb 26 11:09:40 ct523c-de7c kernel:  ret_from_fork+0x28b/0x3a0
Feb 26 11:09:40 ct523c-de7c kernel:  ? kthread_is_per_cpu+0xb0/0xb0
Feb 26 11:09:40 ct523c-de7c kernel:  ret_from_fork_asm+0x11/0x20
Feb 26 11:09:40 ct523c-de7c kernel:  </TASK>
Feb 26 11:09:40 ct523c-de7c kernel: irq event stamp: 36429934
Feb 26 11:09:40 ct523c-de7c kernel: hardirqs last  enabled at (36429940): [<ffffffff816116ee>] __up_console_sem+0x5e/0x70
Feb 26 11:09:40 ct523c-de7c kernel: hardirqs last disabled at (36429945): [<ffffffff816116d3>] __up_console_sem+0x43/0x70
Feb 26 11:09:40 ct523c-de7c kernel: softirqs last  enabled at (36428910): [<ffffffff81471d3d>] run_ksoftirqd+0x2d/0x50
Feb 26 11:09:40 ct523c-de7c kernel: softirqs last disabled at (36428915): [<ffffffff81471d3d>] run_ksoftirqd+0x2d/0x50
Feb 26 11:09:40 ct523c-de7c kernel: ---[ end trace 0000000000000000 ]---
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00C80002 | board version
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x036D001C | hcmd
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xF6F38000 | isr0
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x01400000 | isr1
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x48F00002 | isr2
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00C00008 | isr3
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x18200000 | isr4
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x036D001C | last cmd Id
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00009C3C | wait_event
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x10000010 | l2p_control
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | l2p_duration
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | l2p_mhvalid
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | l2p_addr_match
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000009 | lmpm_pmg_sel
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | timestamp
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00009018 | flow_handler
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Start IWL Error Log Dump:
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Transport status: 0x0000004A, valid: 6
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Loaded firmware version: 101.6ef20b19.0 gl-c0-fm-c0-101.ucode
Feb 26 11:09:40 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000002F0 | trm_hw_status0
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | trm_hw_status1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x002C438C | branchlink2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x002B8AD0 | interruptlink1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x002B8AD0 | interruptlink2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x002A5822 | data1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x01000000 | data2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | data3
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xD040FE37 | beacon time
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x3B6411CE | tsf low
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000026B | tsf hi
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | time gp1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0616DB19 | time gp2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000001 | uCode revision type
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000065 | uCode version major
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x6EF20B19 | uCode version minor
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000472 | hw version
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00C80002 | board version
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x804AFC12 | hcmd
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00020000 | isr0
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | isr1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x48F00002 | isr2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00C0001C | isr3
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | isr4
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | last cmd Id
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x002A5822 | wait_event
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x10000010 | l2p_control
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | l2p_duration
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | l2p_mhvalid
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | l2p_addr_match
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000028 | lmpm_pmg_sel
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | timestamp
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00009018 | flow_handler
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Start IWL Error Log Dump:
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Transport status: 0x0000004A, valid: 7
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x20000066 | NMI_INTERRUPT_HOST
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | umac branchlink1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xC00808AA | umac branchlink2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x80287BFE | umac interruptlink1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0102163E | umac interruptlink2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x01000000 | umac data1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0102163E | umac data2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | umac data3
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000065 | umac major
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x6EF20B19 | umac minor
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0616DB0F | frame pointer
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xD00D6258 | stack pointer
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x005C020F | last host cmd
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000400 | isr status reg
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: TCM1 status:
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000070 | error ID
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001D2E | tcm branchlink2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000211C | tcm interruptlink1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000211C | tcm interruptlink2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x40000000 | tcm data1
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | tcm data2
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | tcm data3
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001DAC | tcm log PC
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803FF0 | tcm frame pointer
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803F5C | tcm stack pointer
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm msg ID
Feb 26 11:09:41 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x4000001F | tcm ISR status
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000002F0 | tcm HW status[0]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm HW status[1]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm HW status[2]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x40008300 | tcm HW status[3]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm HW status[4]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm SW status[0]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: RCM1 status:
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000070 | error ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001E2E | rcm branchlink2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000027A0 | rcm interruptlink1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000027A0 | rcm interruptlink2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x20000000 | rcm data1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | rcm data2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | rcm data3
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001E98 | rcm log PC
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803FF0 | rcm frame pointer
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803F5C | rcm stack pointer
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | rcm msg ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x2006F000 | rcm ISR status
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00020400 | frame HW status
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | LMAC-to-RCM request mbox
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | RCM-to-LMAC request mbox
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | MAC header control
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | MAC header addr1 low
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x003C0000 | MAC header info
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | MAC header error
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: TCM2 status:
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000070 | error ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001D2E | tcm branchlink2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000211C | tcm interruptlink1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000211C | tcm interruptlink2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x40000000 | tcm data1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | tcm data2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | tcm data3
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001DAC | tcm log PC
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803FF0 | tcm frame pointer
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803F5C | tcm stack pointer
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm msg ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x60000000 | tcm ISR status
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000002F0 | tcm HW status[0]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm HW status[1]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm HW status[2]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00008000 | tcm HW status[3]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm HW status[4]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | tcm SW status[0]
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: RCM2 status:
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000070 | error ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001E2E | rcm branchlink2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000027A0 | rcm interruptlink1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000027A0 | rcm interruptlink2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x20000000 | rcm data1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | rcm data2
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0xDEADBEEF | rcm data3
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00001E98 | rcm log PC
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803FF0 | rcm frame pointer
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00803F5C | rcm stack pointer
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | rcm msg ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x20000000 | rcm ISR status
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00020400 | frame HW status
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | LMAC-to-RCM request mbox
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | RCM-to-LMAC request mbox
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | MAC header control
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | MAC header addr1 low
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x003C0000 | MAC header info
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | MAC header error
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: IML/ROM dump:
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000B03 | IML/ROM error/state
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000EEE3 | IML/ROM data1
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Fseq Registers:
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x60000000 | FSEQ_ERROR_CODE
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x80B10006 | FSEQ_TOP_INIT_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00570000 | FSEQ_CNVIO_INIT_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000AA14 | FSEQ_OTP_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x0000000F | FSEQ_TOP_CONTENT_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x02001910 | FSEQ_CNVI_ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x02001910 | FSEQ_CNVR_ID
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x02001910 | CNVI_AUX_MISC_CHIP
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x02001910 | CNVR_AUX_MISC_CHIP
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x8F0F1BEF | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00000000 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00570000 | FSEQ_PREV_CNVIO_INIT_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00B10006 | FSEQ_WIFI_FSEQ_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x00B10005 | FSEQ_BT_FSEQ_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x000000DC | FSEQ_CLASS_TP_VERSION
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: UMAC CURRENT PC: 0x8028b720
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: LMAC1 CURRENT PC: 0xd0
Feb 26 11:09:42 ct523c-de7c kernel: iwlwifi 0000:28:00.0: LMAC2 CURRENT PC: 0xd0
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: UMAC CURRENT PC 1: 0x8028b722
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: TCM1 CURRENT PC: 0xd0
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: RCM1 CURRENT PC: 0xd0
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: RCM2 CURRENT PC: 0xd0
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: Function Scratch status:
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: 0x01010100 | Func Scratch
Feb 26 11:09:43 ct523c-de7c kernel: iwlwifi 0000:28:00.0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).


Thanks,
Ben

> 
> The race window opens when the MSIX IRQ handler schedules NAPI (lines
> 2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
> or when NAPI is already running on another CPU from a previous interrupt
> when STATUS_FW_ERROR gets set on the current CPU.
> 
> Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
> processing stale RX data after firmware error, and add early-return
> guards in the TX response and compressed BA notification handlers as
> defense-in-depth. Each check uses WARN_ONCE to log if the race is
> actually hit, which aids diagnosis of the hard-to-reproduce skb
> use-after-free reported on Intel BE200.
> 
> Note that _iwl_trans_pcie_gen2_stop_device() already calls
> iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
> runs much later in the restart sequence. These checks close the window
> between error detection and device stop.
> 
> Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
> Cc: stable@vger.kernel.org
> Signed-off-by: Cole Leavitt <cole@unwrap.rs>
> ---
> Changes since v1:
>    - Added Fixes: tag and Cc: stable@vger.kernel.org
> 
> Tested on Intel BE200 (FW 101.6e695a70.0) by forcing NMI via debugfs.
> The WARN_ONCE fires reliably:
> 
>    iwlwifi: NAPI MSIX poll[0] invoked after FW error
>    WARNING: drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058
>             at iwl_pcie_napi_poll_msix+0xff/0x130 [iwlwifi], CPU#22
> 
> Confirming NAPI poll is invoked after STATUS_FW_ERROR is set. Without
> this patch, that poll processes stale RX ring data from dead firmware.
> 
>   drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
>   .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
>   2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> index 3b4b575aadaa..3e99f3ded9bc 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> @@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
>   	bool mgmt = false;
>   	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
>   
> +	/* Firmware is dead — the TX response may contain corrupt SSN values
> +	 * from a dying firmware DMA. Processing it could cause
> +	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
> +	 * skb use-after-free or double-free.
> +	 */
> +	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
> +		WARN_ONCE(1,
> +			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
> +			  sta_id, txq_id);
> +		return;
> +	}
> +
>   	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
>   			 "Invalid tx_resp notif frame_count (%d)\n",
>   			 tx_resp->frame_count))
> @@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
>   	u8 sta_id = ba_res->sta_id;
>   	struct ieee80211_link_sta *link_sta;
>   
> +	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
> +		WARN_ONCE(1,
> +			  "iwlwifi: BA notif (sta=%d) after FW error\n",
> +			  sta_id);
> +		return;
> +	}
> +
>   	if (!tfd_cnt)
>   		return;
>   
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
> index 619a9505e6d9..ba18d35fa55d 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
> @@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
>   	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
>   	trans = trans_pcie->trans;
>   
> +	/* Stop processing RX if firmware has crashed. Stale notifications
> +	 * from dying firmware (e.g. TX completions with corrupt SSN values)
> +	 * can cause use-after-free in reclaim paths.
> +	 */
> +	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
> +		WARN_ONCE(1,
> +			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
> +			  rxq->id);
> +		napi_complete_done(napi, 0);
> +		return 0;
> +	}
> +
>   	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
>   
>   	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
> @@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
>   	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
>   	trans = trans_pcie->trans;
>   
> +	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
> +		WARN_ONCE(1,
> +			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
> +			  rxq->id);
> +		napi_complete_done(napi, 0);
> +		return 0;
> +	}
> +
>   	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
>   	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
>   		      budget);

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




* RE: [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled
  2026-02-18 14:47           ` [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled Cole Leavitt
@ 2026-03-22 12:28             ` Korenblit, Miriam Rachel
  0 siblings, 0 replies; 15+ messages in thread
From: Korenblit, Miriam Rachel @ 2026-03-22 12:28 UTC (permalink / raw)
  To: Cole Leavitt, greearb@candelatech.com
  Cc: johannes@sipsolutions.net, linux-wireless@vger.kernel.org



> -----Original Message-----
> From: Cole Leavitt <cole@unwrap.rs>
> Sent: Wednesday, February 18, 2026 4:47 PM
> To: greearb@candelatech.com
> Cc: johannes@sipsolutions.net; linux-wireless@vger.kernel.org; Korenblit, Miriam Rachel <miriam.rachel.korenblit@intel.com>; Cole Leavitt <cole@unwrap.rs>
> Subject: [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled
> 
> When the TLC notification disables AMSDU for a TID, the MLD driver sets
> max_tid_amsdu_len to the sentinel value 1. The TSO segmentation path in
> iwl_mld_tx_tso_segment() checks for zero but not for this sentinel, allowing it to
> reach the num_subframes calculation:
> 
>   num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad)
>                 = (1 + 2) / (1534 + 2) = 0
> 
> This zero propagates to iwl_tx_tso_segment() which sets:
> 
>   gso_size = num_subframes * mss = 0
> 
> Calling skb_gso_segment() with gso_size=0 creates over 32000 tiny segments
> from a single GSO skb. This floods the TX ring with ~1024 micro-frames (the rest
> are purged), creating a massive burst of TX completion events that can lead to
> memory corruption and a subsequent use-after-free in TCP's retransmit queue
> (refcount underflow in tcp_shifted_skb, NULL deref in tcp_rack_detect_loss).
> 
> The MVM driver is immune because it checks mvmsta->amsdu_enabled before
> reaching the num_subframes calculation. The MLD driver has no equivalent
> bitmap check and relies solely on max_tid_amsdu_len, which does not catch the
> sentinel value.
> 
> Fix this by falling back to single-subframe TSO (num_subframes=1) when the
> AMSDU length limit is too small to fit even one subframe.
> 
> Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
> Signed-off-by: Cole Leavitt <cole@unwrap.rs>
> ---
>  drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> index fbb672f4d8c7..1d47254a4148 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> @@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
>  	 */
>  	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
> 
> +	/* If the AMSDU length limit is too small to fit even a single
> +	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
> +	 * the TLC notification when AMSDU is disabled for this TID), fall
> +	 * back to non-AMSDU TSO segmentation. Without this guard,
> +	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
> +	 * which makes skb_gso_segment() produce tens of thousands of
> +	 * 1-byte segments, overloading the TX ring and completion path.
> +	 */
Having 0 subframes doesn't make sense and wouldn't have happened if not for the bug...

I would have checked whether AMSDU is disabled for the TID in question by correcting the check to "max_tid_amsdu_len == 1".

Then we can even WARN_ON(!num_subframes), to avoid more bugs like this in the future...

Also, I'd make sure that link_sta->agg.max_tid_amsdu_len can't be 0 (only a certain error case needs to be adjusted).

> +	if (!num_subframes)
> +		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
> +
>  	if (sta->max_amsdu_subframes &&
>  	    num_subframes > sta->max_amsdu_subframes)
>  		num_subframes = sta->max_amsdu_subframes;
> --
> 2.52.0



* RE: [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF
  2026-02-18 14:47         ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Cole Leavitt
  2026-02-18 14:47           ` [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled Cole Leavitt
@ 2026-03-22 12:29           ` Korenblit, Miriam Rachel
  1 sibling, 0 replies; 15+ messages in thread
From: Korenblit, Miriam Rachel @ 2026-03-22 12:29 UTC (permalink / raw)
  To: Cole Leavitt, greearb@candelatech.com
  Cc: johannes@sipsolutions.net, linux-wireless@vger.kernel.org



> -----Original Message-----
> From: Cole Leavitt <cole@unwrap.rs>
> Sent: Wednesday, February 18, 2026 4:47 PM
> To: greearb@candelatech.com
> Cc: johannes@sipsolutions.net; linux-wireless@vger.kernel.org; Korenblit, Miriam Rachel <miriam.rachel.korenblit@intel.com>; Cole Leavitt <cole@unwrap.rs>
> Subject: [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF
> 
> Ben,
> 
> I've been digging into the use-after-free crash you reported on your
> BE200 running the MLD driver (tcp_shifted_skb refcount underflow, followed by
> NULL deref in tcp_rack_detect_loss). I think I found the root cause -- it's a
> missing guard in the MLD TSO segmentation path that lets num_subframes=0
> reach skb_gso_segment(), producing the 32k+ segment explosion you're seeing.
> 
> Here's the full chain:
> 
> 1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
>    a TID (bit not set in amsdu_enabled), the MLD driver sets:
> 
>      link_sta->agg.max_tid_amsdu_len[i] = 1;
> 
>    This sentinel value 1 means "AMSDU disabled on this TID".
> 
> 2) mld/tx.c:836-837 -- the TSO path checks:
> 
>      max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>      if (!max_tid_amsdu_len)   // <-- only catches zero, not 1
>          return iwl_tx_tso_segment(skb, 1, ...);
> 
>    Value 1 passes this check.
> 
> 3) mld/tx.c:847 -- the division produces zero:
> 
>      num_subframes = (1 + 2) / (1534 + 2) = 0
> 
>    Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.
> 
> 4) iwl-utils.c:27 -- gso_size is set to zero:
> 
>      skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0
> 
> 5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
>    tiny segments, which is the error you're seeing:
> 
>      "skbuff: ERROR: Found more than 32000 packets in skb_segment"
>      "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"
> 
> 6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
>    TX ring before it fills up, then purges the rest. This creates a
>    massive burst of tiny frames that stress the BA completion path.
> 
> The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
> separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
> num_subframes calculation. MLD has no equivalent -- it relies solely on
> max_tid_amsdu_len, and the sentinel value 1 slips through.
> 
> This explains all your observations:
> - 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
> - AP-specific: the problem AP causes firmware to disable AMSDU for the
>   active TID (other APs enable it, so max_tid_amsdu_len gets a proper
>   value from iwl_mld_get_amsdu_size_of_tid())
> - 28-minute gap between TSO explosion and UAF: the ~1024 micro-frame burst
>   creates massive alloc/free churn in the skb slab, which can corrupt
>   TCP retransmit queue entries allocated from the same cache
> - No firmware error: firmware is fine, the bug is purely in MLD's TSO
>   parameter calculation
> 
> The fix (in patch 1/1) adds a guard after the num_subframes calculation -- if it's
> zero, fall back to single-subframe TSO (num_subframes=1), which correctly sets
> gso_size=mss. This matches what MVM effectively does via its amsdu_enabled
> checks.
> 
> Could you test this against the problem AP? Two things that would help confirm
> the theory:
> 
> 1) Before applying the fix, add this debug print to see the actual
>    max_tid_amsdu_len value with the problem AP:
> 
>      // In iwl_mld_tx_tso_segment(), after line 847
>      if (!num_subframes)
>          pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
>                       "subf_len=%u mss=%u\n",
>                       max_tid_amsdu_len, subf_len, mss);
> 
> 2) After applying the fix, run against the problem AP for 1+ day and
>    check if both the TSO explosion AND the UAF are gone.
> 
> I also noticed a few secondary defense-in-depth regressions in MLD's TX
> completion path vs MVM:
> 
> - MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
>   (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
> - The transport-level reclaim_lock prevents direct double-free, but
>   MLD is missing MVM's extra safety checks
> 
> These are probably not directly causing your crash, but worth noting.
> 
> Cole Leavitt (1):
>   wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is
>     disabled
> 
>  drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> --
> 2.52.0

Thank you for the clear analysis!

Miri




end of thread, other threads:[~2026-03-22 12:29 UTC | newest]

Thread overview: 15+ messages
     [not found] <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>
2026-02-14 18:10 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
     [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
2026-02-14 18:33     ` Cole Leavitt
2026-02-16 18:12       ` Ben Greear
2026-02-18 14:44         ` Cole Leavitt
2026-02-18 14:44         ` Cole Leavitt
2026-02-18 14:47         ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Cole Leavitt
2026-02-18 14:47           ` [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled Cole Leavitt
2026-03-22 12:28             ` Korenblit, Miriam Rachel
2026-03-22 12:29           ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Korenblit, Miriam Rachel
2026-02-18 17:35         ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Ben Greear
2026-02-14 18:41   ` Cole Leavitt
2026-02-14 18:43   ` [PATCH v3] " Cole Leavitt
2026-02-26 19:37     ` Ben Greear
     [not found] <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
2026-02-18 18:47 ` [PATCH] " Cole Leavitt
2026-02-19 16:38   ` Ben Greear
