Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error

public inbox for linux-wireless@vger.kernel.org

* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
       [not found] <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
@ 2026-02-18 18:47 ` Cole Leavitt
  2026-02-19 16:38   ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Cole Leavitt @ 2026-02-18 18:47 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless

Ben,

Thanks for the historical context. I dug through the git history and
your linux-ct repos to verify exactly what happened when. I want to
make sure I have this right - can you confirm whether this matches
what you saw?

2018 Bug (Bug 199209)
---------------------
Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of
AMSDU with fragmented SKBs"). That was a different trigger - NFS
created highly fragmented SKBs where nr_frags was so high that the
buffer descriptor limit check produced num_subframes=0. Emmanuel's
fix clamps that path to 1.

Current MLD Bug
---------------
Different path to the same symptom. When TLC disables AMSDU for a
TID, both MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel
value. The key difference in protection:

MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire
AMSDU path:

    if (!mvmsta->amsdu_enabled)
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

    if (!(mvmsta->amsdu_enabled & BIT(tid)))
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

MVM never reads max_tid_amsdu_len in its TX path - it uses its own
mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).

MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len
directly, with no equivalent bitmap:

    max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
    if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
        return iwl_tx_tso_segment(skb, 1, ...);

    num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
    // When max_tid_amsdu_len=1: num_subframes = (1 + 3) / (1534 + 3) = 0

What I found in your repos:

  - linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver,
    only MVM with amsdu_enabled bitmap protection
  - linux-ct-6.15, linux-ct-6.18: Have MLD driver
    (drivers/net/wireless/intel/iwlwifi/mld/)
  - backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89
    (2024-07-17)

So MVM should have been immune to this specific sentinel-value bug
due to the bitmap check.

Question for you: When you saw TSO segment explosions in 2024, what
kernel and driver were you using? If it was one of your 6.5-6.14
kernels with MVM, then there may be a different path to
num_subframes=0 that I haven't identified yet. If you were using
backport-iwlwifi with MLD enabled, that would explain it hitting the
same bug I'm fixing now.

The commit ae6d30a71521 (Feb 2024) added better error reporting for
skb_gso_segment failures, which suggests people were hitting GSO
segment errors around that time - but I don't have visibility into
what specific trigger you hit.

My fix catches the sentinel-induced zero after the calculation, which
is equivalent to what MVM's bitmap check accomplishes. This should
prevent the current MLD bug from reaching skb_gso_segment with
gso_size=0.
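
A minimal userspace sketch of the guard described above, assuming the MLD
calculation has the shape shown earlier in this mail. This is a model, not
the actual patch; the function names amsdu_num_subframes() and
amsdu_num_subframes_guarded() are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the subframe calculation quoted above. */
unsigned int amsdu_num_subframes(unsigned int max_tid_amsdu_len,
				 unsigned int subf_len,
				 unsigned int pad)
{
	assert(subf_len + pad > 0);	/* keep this model free of div-by-zero */
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/* Returns the subframe count to use; 1 means "do not aggregate". */
unsigned int amsdu_num_subframes_guarded(unsigned int max_tid_amsdu_len,
					 unsigned int subf_len,
					 unsigned int pad)
{
	unsigned int n;

	if (!max_tid_amsdu_len)		/* existing check: only catches 0 */
		return 1;

	n = amsdu_num_subframes(max_tid_amsdu_len, subf_len, pad);

	/* Sentinel 1 (or any limit below one subframe) yields 0 here. */
	if (!n)
		return 1;

	return n;
}
```

With max_tid_amsdu_len=1, subf_len=1534 and pad=3, the unguarded helper
returns 0 and the guarded one falls back to 1.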

Looking forward to your test results with the problem AP, and any
clarification on what setup you were using in 2024.

Cole


* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-18 18:47 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
@ 2026-02-19 16:38   ` Ben Greear
  0 siblings, 0 replies; 9+ messages in thread
From: Ben Greear @ 2026-02-19 16:38 UTC (permalink / raw)
  To: Cole Leavitt; +Cc: linux-wireless

On 2/18/26 10:47, Cole Leavitt wrote:
> Ben,
> 
> Thanks for the historical context. I dug through the git history and
> your linux-ct repos to verify exactly what happened when. I want to
> make sure I have this right - can you confirm whether this matches
> what you saw?

Bug was originally seen in mainline kernel before MLD driver was forked
off from mvm, not in a backports kernel.

Adding your patch below didn't solve the UAF in the tcp_ack path,
at least.  My debugging did not indicate that the code path in the
patch was taken.  I did not see any more instances of the 32k-loop bailout
in the packet segment loop in the last crash, so at least that is not the
only reason a UAF would happen.

The problem reproduced overnight was:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP
CPU: 12 UID: 0 PID: 1234 Comm: irq/345-iwlwifi Tainted: G S         O        6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:rb_erase+0x173/0x350
Code: 08 48 8b 01 a8 01 75 97 48 83 c0 01 48 89 01 c3 c3 48 89 46 10 e9 27 ff ff ff 48 8b 56 10 48 8d 41 01 48 89 51 08 48 89 4e 10 <48> 89 02 48 8b 01 48 89 06 
48 89 31 48 83 f8 03 0f 86 8e 00 00 00
RSP: 0018:ffffc9000038c820 EFLAGS: 00010246
RAX: ffff8881b0646601 RBX: 000000000000000c RCX: ffff8881b0646600
RDX: 0000000000000000 RSI: ffff8881e9cbea00 RDI: ffff8881b0646200
RBP: ffff8881b0646200 R08: ffff8881ce443108 R09: 0000000080200001
R10: 0000000000010000 R11: 00000000f0eaffb7 R12: ffff8881ce442f80
R13: 0000000000000004 R14: ffff8881b0646600 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8888dc42e000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36002 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
  <IRQ>
  tcp_ack+0x635/0x16e0
  tcp_rcv_established+0x211/0xc10
  ? sk_filter_trim_cap+0x1a7/0x350
  tcp_v4_do_rcv+0x1bf/0x350
  tcp_v4_rcv+0xddf/0x1550
  ? lock_timer_base+0x6d/0x90
  ? raw_local_deliver+0xcc/0x280
  ip_protocol_deliver_rcu+0x20/0x130
  ip_local_deliver_finish+0x85/0xf0
  ip_sublist_rcv_finish+0x35/0x50
  ip_sublist_rcv+0x16f/0x200
  ip_list_rcv+0xfe/0x130
  __netif_receive_skb_list_core+0x183/0x1f0
  netif_receive_skb_list_internal+0x1c8/0x2a0
  gro_receive_skb+0x12e/0x210
  ieee80211_rx_napi+0x82/0xc0 [mac80211]
  iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
  iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
  iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
  __napi_poll+0x25/0x1e0
  net_rx_action+0x2d3/0x340

------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 1224 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc macvlan wanlink(O) pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr


I have enough guard/debugging logic in place that I'm pretty sure the skb coming
from iwlwifi in this particular path is fine.  It appears the problem is that
there is an already freed skb in the socket's skb collection, and code blows up
trying to access a bad rbtree link, or something.  I'm continuing to try to narrow
down where skb goes bad, but it seems like probably some other thread of logic is
racing to free the skb since the crash site moves around a lot.  Maybe I can add
some sort of debugging to warn if skb is freed while in an rbtree...

Thanks,
Ben

> 
> 2018 Bug (Bug 199209)
> ---------------------
> Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of
> AMSDU with fragmented SKBs"). That was a different trigger - NFS
> created highly fragmented SKBs where nr_frags was so high that the
> buffer descriptor limit check produced num_subframes=0. Emmanuel's
> fix clamps that path to 1.
> 
> Current MLD Bug
> ---------------
> Different path to the same symptom. When TLC disables AMSDU for a
> TID, both MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel
> value. The key difference in protection:
> 
> MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire
> AMSDU path:
> 
>      if (!mvmsta->amsdu_enabled)
>          return iwl_tx_tso_segment(skb, 1, ...);  // bail out early
> 
>      if (!(mvmsta->amsdu_enabled & BIT(tid)))
>          return iwl_tx_tso_segment(skb, 1, ...);  // bail out early
> 
> MVM never reads max_tid_amsdu_len in its TX path - it uses its own
> mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
> ("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).
> 
> MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len
> directly, with no equivalent bitmap:
> 
>      max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>      if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
>          return iwl_tx_tso_segment(skb, 1, ...);
> 
>      num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
>      // When max_tid_amsdu_len=1: num_subframes = (1 + 3) / (1534 + 3) = 0
> 
> What I found in your repos:
> 
>    - linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver,
>      only MVM with amsdu_enabled bitmap protection
>    - linux-ct-6.15, linux-ct-6.18: Have MLD driver
>      (drivers/net/wireless/intel/iwlwifi/mld/)
>    - backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89
>      (2024-07-17)
> 
> So MVM should have been immune to this specific sentinel-value bug
> due to the bitmap check.
> 
> Question for you: When you saw TSO segment explosions in 2024, what
> kernel and driver were you using? If it was one of your 6.5-6.14
> kernels with MVM, then there may be a different path to
> num_subframes=0 that I haven't identified yet. If you were using
> backport-iwlwifi with MLD enabled, that would explain it hitting the
> same bug I'm fixing now.
> 
> The commit ae6d30a71521 (Feb 2024) added better error reporting for
> skb_gso_segment failures, which suggests people were hitting GSO
> segment errors around that time - but I don't have visibility into
> what specific trigger you hit.
> 
> My fix catches the sentinel-induced zero after the calculation, which
> is equivalent to what MVM's bitmap check accomplishes. This should
> prevent the current MLD bug from reaching skb_gso_segment with
> gso_size=0.
> 
> Looking forward to your test results with the problem AP, and any
> clarification on what setup you were using in 2024.
> 
> Cole
> 

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




[parent not found: <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>]

* [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
       [not found] <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>
@ 2026-02-14 18:10 ` Cole Leavitt
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
  2026-02-14 18:41   ` Cole Leavitt
  0 siblings, 2 replies; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:10 UTC (permalink / raw)
  To: greearb, johannes.berg, miriam.rachel.korenblit
  Cc: linux-wireless, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
Tested on Intel BE200 (FW 101.6e695a70.0) by forcing NMI via debugfs.
The WARN_ONCE fires reliably:

  iwlwifi: NAPI MSIX poll[0] invoked after FW error
  WARNING: drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058
           at iwl_pcie_napi_poll_msix+0xff/0x130 [iwlwifi], CPU#22

Confirming NAPI poll is invoked after STATUS_FW_ERROR is set. Without
this patch, that poll processes stale RX ring data from dead firmware.

 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
 		      budget);
-- 
2.52.0



[parent not found: <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>]

* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
@ 2026-02-14 18:33     ` Cole Leavitt
  2026-02-16 18:12       ` Ben Greear
  0 siblings, 1 reply; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:33 UTC (permalink / raw)
  To: greearb
  Cc: johannes.berg, miriam.rachel.korenblit, linux-wireless,
	Cole Leavitt

Ben,

Good catch on both fronts.

On the build_tfd dangling pointer -- you're right. The failure path at
line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
(set at lines 763-764). The caller gets -1 and presumably frees the
skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
not advancing means current unmap paths won't iterate to that index,
it's a latent UAF waiting for a flush path change or future code to
touch it. Two NULL stores inside a held spinlock cost nothing. I think
this should go upstream as its own patch.
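
A minimal sketch of the failure-path cleanup described above, written as a
userspace model rather than the driver code. The struct layout, the
ring_enqueue() name and the build_ok parameter (standing in for the TFD
build succeeding or failing) are all hypothetical.

```c
#include <stddef.h>

#define RING_SIZE 8u

struct tx_entry {
	void *skb;
	void *cmd;
};

struct tx_ring {
	struct tx_entry entries[RING_SIZE];
	unsigned int write_ptr;
};

/*
 * Model of an enqueue that records the caller's pointers before the
 * descriptor is built. On failure the caller keeps ownership, so the
 * slot must not keep pointing at objects the caller may free.
 */
int ring_enqueue(struct tx_ring *ring, void *skb, void *cmd, int build_ok)
{
	unsigned int idx = ring->write_ptr % RING_SIZE;

	ring->entries[idx].skb = skb;
	ring->entries[idx].cmd = cmd;

	if (!build_ok) {
		/* The two NULL stores: drop the stale references. */
		ring->entries[idx].skb = NULL;
		ring->entries[idx].cmd = NULL;
		return -1;
	}

	ring->write_ptr++;
	return 0;
}
```

After a failed call the slot holds NULL and write_ptr is unchanged, so no
later walk over the ring can reach the caller's freed skb through it.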

On the TOCTOU question -- this is the part I spent the most time on.
The window you're asking about is: firmware starts producing corrupt
completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
checks can't help there because the flag isn't set yet.

The primary guard in that window is iwl_txq_used() in
iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
[read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
wraparound corruption, etc.

What it can't catch is an in-range corrupt SSN -- e.g., firmware says
reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
That passes bounds checking and the reclaim loop frees skbs for
entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
loop catches double-reclaim but not first-time over-reclaim.
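
To make the in-range case concrete, here is a small userspace model of a
[read_ptr, write_ptr) window check with wraparound. It is only a sketch of
the idea, not a copy of iwl_txq_used(); the txq_model struct, the
txq_index_used() name and the power-of-two QUEUE_SIZE are assumptions.

```c
#include <stdbool.h>

#define QUEUE_SIZE 256u	/* assumed to be a power of two */

struct txq_model {
	unsigned int read_ptr;
	unsigned int write_ptr;
};

/* True if idx lies in [read_ptr, write_ptr) modulo QUEUE_SIZE. */
bool txq_index_used(const struct txq_model *q, unsigned int idx)
{
	unsigned int mask = QUEUE_SIZE - 1;
	unsigned int r = q->read_ptr & mask;
	unsigned int w = q->write_ptr & mask;

	idx &= mask;

	if (w >= r)	/* window does not wrap */
		return idx >= r && idx < w;

	/* window wraps: used unless idx falls in the gap [w, r) */
	return !(idx >= w && idx < r);
}
```

With read_ptr=5 and write_ptr=20, both the legitimate index 8 and the
corrupt index 15 are accepted, while an index such as 25 is rejected. The
bounds check alone cannot tell the first two apart.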

The complete fix for this would be a per-entry generation counter --
tag each entry on submit, validate on reclaim. But that adds per-entry
overhead on the TX hot path to protect against a condition (firmware
producing corrupt completions) that is already terminal. I think the
right trade-off is:

  1. Your build_tfd NULL fix (eliminates one dangling pointer class)
  2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
     shrinks the detection window to near-zero)
  3. The existing iwl_txq_used() bounds check (catches most corrupt
     SSNs)

Together these make the damage window small enough that a per-entry
generation scheme isn't justified -- by the time firmware is sending
corrupt SSNs, we're in dump-and-reset territory anyway.
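
For reference, one way a per-entry generation scheme could look, as a
userspace sketch only. It assumes the completion can echo back the tag
handed out at submit time, which is an assumption of this example and not
something the current firmware interface is known to provide. All names
(gen_ring, gen_submit, gen_reclaim) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define GEN_RING_SIZE 32u

struct gen_entry {
	void *skb;
	uint32_t gen;	/* tag assigned when the entry was submitted */
};

struct gen_ring {
	struct gen_entry entries[GEN_RING_SIZE];
	uint32_t next_gen;
};

/* Tag the entry on submit; the returned tag travels with the frame. */
uint32_t gen_submit(struct gen_ring *ring, unsigned int idx, void *skb)
{
	struct gen_entry *e = &ring->entries[idx % GEN_RING_SIZE];

	e->skb = skb;
	e->gen = ++ring->next_gen;
	return e->gen;
}

/*
 * Reclaim only if the completion's tag matches the stored one.
 * Returns the skb to free, or NULL if the completion is rejected.
 */
void *gen_reclaim(struct gen_ring *ring, unsigned int idx, uint32_t gen)
{
	struct gen_entry *e = &ring->entries[idx % GEN_RING_SIZE];
	void *skb;

	if (!e->skb || e->gen != gen)
		return NULL;

	skb = e->skb;
	e->skb = NULL;
	return skb;
}
```

The extra store on submit and compare on reclaim are the per-entry hot-path
cost weighed in the trade-off above.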

That said, if you're seeing corruption patterns in your customer
testing where a valid-looking-but-wrong SSN gets through before
FW_ERROR fires, I'd be very interested in the traces. That would
change the cost/benefit on the generation counter approach.

Thanks,
Cole


* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:33     ` Cole Leavitt
@ 2026-02-16 18:12       ` Ben Greear
  2026-02-18 14:44         ` Cole Leavitt
                           ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ben Greear @ 2026-02-16 18:12 UTC (permalink / raw)
  To: Cole Leavitt; +Cc: johannes.berg, miriam.rachel.korenblit, linux-wireless

On 2/14/26 10:33 AM, Cole Leavitt wrote:
> Ben,
> 
> Good catch on both fronts.
> 
> On the build_tfd dangling pointer -- you're right. The failure path at
> line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
> (set at lines 763-764). The caller gets -1 and presumably frees the
> skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
> not advancing means current unmap paths won't iterate to that index,
> it's a latent UAF waiting for a flush path change or future code to
> touch it. Two NULL stores inside a held spinlock cost nothing. I think
> this should go upstream as its own patch.
> 
> On the TOCTOU question -- this is the part I spent the most time on.
> The window you're asking about is: firmware starts producing corrupt
> completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
> checks can't help there because the flag isn't set yet.
> 
> The primary guard in that window is iwl_txq_used() in
> iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
> [read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
> wraparound corruption, etc.
> 
> What it can't catch is an in-range corrupt SSN -- e.g., firmware says
> reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
> That passes bounds checking and the reclaim loop frees skbs for
> entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
> loop catches double-reclaim but not first-time over-reclaim.
> 
> The complete fix for this would be a per-entry generation counter --
> tag each entry on submit, validate on reclaim. But that adds per-entry
> overhead on the TX hot path to protect against a condition (firmware
> producing corrupt completions) that is already terminal. I think the
> right trade-off is:
> 
>    1. Your build_tfd NULL fix (eliminates one dangling pointer class)
>    2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
>       shrinks the detection window to near-zero)
>    3. The existing iwl_txq_used() bounds check (catches most corrupt
>       SSNs)
> 
> Together these make the damage window small enough that a per-entry
> generation scheme isn't justified -- by the time firmware is sending
> corrupt SSNs, we're in dump-and-reset territory anyway.
> 
> That said, if you're seeing corruption patterns in your customer
> testing where a valid-looking-but-wrong SSN gets through before
> FW_ERROR fires, I'd be very interested in the traces. That would
> change the cost/benefit on the generation counter approach.

Hello Cole,

Looks like even with your patches we are still seeing use-after-free.  I tried
adding a lot of checks to detect already freed skbs in iwlwifi, and those are not hitting,
so possibly the bug is very close to the end of the call chain, or I am doing it
wrong, or it is some sort of race or bug that my code will not catch.

We do not see any related crashes when using mt76 radios, so pretty sure this
is related to iwlwifi.  A particular AP reproduces this problem within
a day, and we can run tcp tests for 30+ days against other APs with no problem.
I don't know what the AP could be doing to trigger this though.

No FW crash was seen in my logs in this case.

My tree is here if you care to investigate any of my UAF debugging or see
what code is printing some of these logs.  Suggestions for improvement would
be welcome!

https://github.com/greearb/linux-ct-6.18

One problem I have seen for several years is an infinite busy-spin in
iwl-mvm-tx-tso-segment.  I added code to break out after 32k loops and warn.
That hits here.  The system crashed 28 minutes later, so I am not sure whether
it is directly related.  I guess I can try to do more debugging around that
bad TSO segment path.

Feb 16 00:16:01 LF1-MobileStation1 kernel: skbuff: ERROR: Found more than 32000 packets in skbuff::skb_segment, bailing out.
Feb 16 00:16:01 LF1-MobileStation1 kernel: ERROR: iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001, bailing out.
Feb 16 00:16:06 LF1-MobileStation1 kernel: skbuff: ERROR: Found more than 32000 packets in skbuff::skb_segment, bailing out.
Feb 16 00:16:06 LF1-MobileStation1 kernel: ERROR: iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001, bailing out.
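
The kind of bailout that produces the messages above can be sketched in
userspace as a capped list walk. This is not the code from the linux-ct
tree; the seg struct, the count_segments_bounded() name and the exact cap
are assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_SEGMENTS 32000u

struct seg {
	struct seg *next;
};

/*
 * Count the segments on a singly linked list, giving up once the cap
 * is exceeded. Returns the count, or -1 if the list is longer than
 * MAX_SEGMENTS (including the case where it loops back on itself).
 */
long count_segments_bounded(const struct seg *head)
{
	unsigned long n = 0;

	for (; head; head = head->next) {
		if (++n > MAX_SEGMENTS)
			return -1;	/* caller should warn and bail out */
	}

	return (long)n;
}
```

A list that cycles, or one that simply never ends because the segment size
is degenerate, hits the cap on the 32001st element instead of spinning
forever.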

Feb 16 00:44:06 LF1-MobileStation1 kernel: ------------[ cut here ]------------
Feb 16 00:44:06 LF1-MobileStation1 kernel: refcount_t: underflow; use-after-free.
Feb 16 00:44:06 LF1-MobileStation1 kernel: WARNING: CPU: 18 PID: 1203 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Feb 16 00:44:06 LF1-MobileStation1 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp 
stp llc macvlan wanlink(O) pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr nct7802 vfat fat intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency 
intel_uncore_frequency_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek_lib 
ofpart snd_hda_codec_generic i2c_designware_platform spi_nor kvm_intel spi_pxa2xx_platform iwlmld i2c_designware_core spd5118 dw_dmac iTCO_wdt intel_pmc_bxt ccp 
mtd regmap_i2c spi_pxa2xx_core uvcvideo kvm snd_hda_intel 8250_dw iTCO_vendor_support mac80211 uvc snd_intel_dspcfg irqbypass videobuf2_vmalloc snd_hda_codec 
videobuf2_memops btusb videobuf2_v4l2 btbcm snd_hda_core videobuf2_common snd_hwdep videodev btmtk snd_seq btrtl mc btintel iwlwifi cdc_acm onboard_usb_dev 
snd_seq_device bluetooth snd_pcm cfg80211 snd_timer intel_pmc_core snd intel_lpss_pci i2c_i801 pmt_telemetry
Feb 16 00:44:06 LF1-MobileStation1 kernel:  i2c_smbus soundcore intel_lpss pmt_discovery spi_intel_pci mei_hdcp idma64 i2c_mux pmt_class wmi_bmof spi_intel 
pcspkr mei_pxp intel_pmc_ssram_telemetry bfq acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sch_fq_codel sunrpc fuse zram raid1 dm_raid 
raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec 
drm_gpusvm_helper i915 i2c_algo_bit drm_buddy intel_gtt drm_client_lib drm_display_helper drm_kms_helper cec rc_core intel_oc_wdt ttm ixgbe agpgart mdio 
libie_fwlog e1000e igc dca hwmon drm mei_wdt intel_vsec i2c_core video wmi pinctrl_alderlake efivarfs [last unloaded: nfnetlink]
Feb 16 00:44:06 LF1-MobileStation1 kernel: CPU: 18 UID: 0 PID: 1203 Comm: irq/343-iwlwifi Tainted: G S         O        6.18.9+ #53 PREEMPT(full)
Feb 16 00:44:06 LF1-MobileStation1 kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Feb 16 00:44:06 LF1-MobileStation1 kernel: Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
Feb 16 00:44:06 LF1-MobileStation1 kernel: RIP: 0010:refcount_warn_saturate+0xd8/0xe0
Feb 16 00:44:07 LF1-MobileStation1 kernel: Code: ff 48 c7 c7 d8 a4 6d 82 c6 05 d0 4a 3e 01 01 e8 3e 83 a7 ff 0f 0b c3 48 c7 c7 80 a4 6d 82 c6 05 bc 4a 3e 01 01 
e8 28 83 a7 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
Feb 16 00:44:07 LF1-MobileStation1 kernel: RSP: 0018:ffffc9000045c6d0 EFLAGS: 00010282
Feb 16 00:44:07 LF1-MobileStation1 kernel: RAX: 0000000000000000 RBX: ffff8882772db000 RCX: 0000000000000000
Feb 16 00:44:07 LF1-MobileStation1 kernel: RDX: ffff88885faa5f00 RSI: 0000000000000001 RDI: ffff88885fa98d00
Feb 16 00:44:07 LF1-MobileStation1 kernel: RBP: ffff8882447d9e00 R08: 0000000000000000 R09: 0000000000000003
Feb 16 00:44:07 LF1-MobileStation1 kernel: R10: ffffc9000045c570 R11: ffffffff82b58da8 R12: ffff88820165f200
Feb 16 00:44:07 LF1-MobileStation1 kernel: R13: 0000000000000001 R14: 00000000000005a8 R15: ffffc9000045c890
Feb 16 00:44:07 LF1-MobileStation1 kernel: FS:  0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
Feb 16 00:44:07 LF1-MobileStation1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 16 00:44:07 LF1-MobileStation1 kernel: CR2: 00007fd1022fdcb4 CR3: 0000000005a36004 CR4: 0000000000772ef0
Feb 16 00:44:07 LF1-MobileStation1 kernel: PKRU: 55555554
Feb 16 00:44:07 LF1-MobileStation1 kernel: Call Trace:
Feb 16 00:44:07 LF1-MobileStation1 kernel:  <IRQ>
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_shifted_skb+0x1d2/0x300
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_sacktag_walk+0x2da/0x4d0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_sacktag_write_queue+0x4a1/0x9a0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_ack+0xd66/0x16e0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? ip_finish_output2+0x189/0x570
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_rcv_established+0x211/0xc10
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? sk_filter_trim_cap+0x1a7/0x350
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_v4_do_rcv+0x1bf/0x350
Feb 16 00:44:07 LF1-MobileStation1 kernel:  tcp_v4_rcv+0xddf/0x1550
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? raw_local_deliver+0xcc/0x280
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_protocol_deliver_rcu+0x20/0x130
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_local_deliver_finish+0x85/0xf0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_sublist_rcv_finish+0x35/0x50
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_sublist_rcv+0x16f/0x200
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ip_list_rcv+0xfe/0x130
Feb 16 00:44:07 LF1-MobileStation1 kernel:  __netif_receive_skb_list_core+0x183/0x1f0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  netif_receive_skb_list_internal+0x1c8/0x2a0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  gro_receive_skb+0x12e/0x210
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ieee80211_rx_napi+0x82/0xc0 [mac80211]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  __napi_poll+0x25/0x1e0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  net_rx_action+0x2d3/0x340
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? try_to_wake_up+0x2e6/0x610
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? __handle_irq_event_percpu+0xa3/0x230
Feb 16 00:44:07 LF1-MobileStation1 kernel:  handle_softirqs+0xca/0x2b0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? irq_thread_dtor+0xa0/0xa0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  do_softirq.part.0+0x3b/0x60
Feb 16 00:44:07 LF1-MobileStation1 kernel:  </IRQ>
Feb 16 00:44:07 LF1-MobileStation1 kernel:  <TASK>
Feb 16 00:44:07 LF1-MobileStation1 kernel:  __local_bh_enable_ip+0x58/0x60
Feb 16 00:44:07 LF1-MobileStation1 kernel:  iwl_pcie_irq_rx_msix_handler+0xbb/0x100 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel:  irq_thread_fn+0x19/0x50
Feb 16 00:44:07 LF1-MobileStation1 kernel:  irq_thread+0x126/0x230
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? irq_finalize_oneshot.part.0+0xc0/0xc0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? irq_forced_thread_fn+0x40/0x40
Feb 16 00:44:07 LF1-MobileStation1 kernel:  kthread+0xf7/0x1f0
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ret_from_fork+0x114/0x140
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel:  ret_from_fork_asm+0x11/0x20
Feb 16 00:44:07 LF1-MobileStation1 kernel:  </TASK>
Feb 16 00:44:07 LF1-MobileStation1 kernel: ---[ end trace 0000000000000000 ]---
Feb 16 00:44:07 LF1-MobileStation1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

[NPE shortly after in tcp code, but the real problem is the use-after-free, I assume]
# serial console output of the crash following the UAF.

#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 18 UID: 0 PID: 1203 Comm: irq/343-iwlwifi Tainted: G S      W  O        6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:tcp_rack_detect_loss+0x11c/0x170
Code: 07 00 00 48 8b 87 b0 06 00 00 44 01 ee 48 29 d0 ba 00 00 00 00 48 0f 48 c2 29 c6 85 f6 7e 27 41 8b 06 39 f0 0f 42 c6 41 89 06 <48> 8b 45 58 4c 8d 65 58 48 
89 eb 48 83 e8 58 4d 39 fc 74 ab 48 89
RSP: 0018:ffffc9000045c758 EFLAGS: 00010293
RAX: 000000000000408d RBX: ffff88824fff7a00 RCX: 20c49ba5e353f7cf
RDX: 0000000000000000 RSI: 000000000000408d RDI: ffff88820165f200
RBP: ffffffffffffffa8 R08: 0000000083eed3f9 R09: 000000000000012c
R10: 00000000000005ba R11: 000000000000001d R12: ffff88824fff7a58
R13: 000000000000408d R14: ffffc9000045c79c R15: ffff88820165f888
FS:  0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36004 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
  <IRQ>
  tcp_rack_mark_lost+0x59/0xe0
  tcp_identify_packet_loss+0x30/0x70
  tcp_fastretrans_alert+0x366/0x810
  tcp_ack+0xc66/0x16e0
  ? ip_finish_output2+0x189/0x570
  tcp_rcv_established+0x211/0xc10
  ? sk_filter_trim_cap+0x1a7/0x350
  tcp_v4_do_rcv+0x1bf/0x350
  tcp_v4_rcv+0xddf/0x1550
  ? raw_local_deliver+0xcc/0x280
  ip_protocol_deliver_rcu+0x20/0x130
  ip_local_deliver_finish+0x85/0xf0
  ip_sublist_rcv_finish+0x35/0x50
  ip_sublist_rcv+0x16f/0x200
  ip_list_rcv+0xfe/0x130
  __netif_receive_skb_list_core+0x183/0x1f0
  netif_receive_skb_list_internal+0x1c8/0x2a0
  gro_receive_skb+0x12e/0x210
  ieee80211_rx_napi+0x82/0xc0 [mac80211]
  iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
  iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
  iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
  __napi_poll+0x25/0x1e0
  net_rx_action+0x2d3/0x340
  ? try_to_wake_up+0x2e6/0x610
  ? __handle_irq_event_percpu+0xa3/0x230
  handle_softirqs+0xca/0x2b0
  ? irq_thread_dtor+0xa0/0xa0
  do_softirq.part.0+0x3b/0x60
  </IRQ>
  <TASK>
  __local_bh_enable_ip+0x58/0x60
  iwl_pcie_irq_rx_msix_handler+0xbb/0x100 [iwlwifi]
  irq_thread_fn+0x19/0x50
  irq_thread+0x126/0x230
  ? irq_finalize_oneshot.part.0+0xc0/0xc0
  ? irq_forced_thread_fn+0x40/0x40
  kthread+0xf7/0x1f0
  ? kthreads_online_cpu+0x100/0x100
  ? kthreads_online_cpu+0x100/0x100
  ret_from_fork+0x114/0x140
  ? kthreads_online_cpu+0x100/0x100
  ret_from_fork_asm+0x11/0x20
  </TASK>
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc macvlan wanlink(O) pktgen rpcrdma 
rdma_cm iw_cm ib_cm ib_core qrtr nct7802 vfat fat intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common 
snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek_lib ofpart snd_hda_codec_generic 
i2c_designware_platform spi_nor kvm_intel spi_pxa2xx_platform iwlmld i2c_designware_core spd5118 dw_dmac iTCO_wdt intel_pmc_bxt ccp mtd regmap_i2c 
spi_pxa2xx_core uvcvideo kvm snd_hda_intel 8250_dw iTCO_vendor_support mac80211 uvc snd_intel_dspcfg irqbypass videobuf2_vmalloc snd_hda_codec videobuf2_memops 
btusb videobuf2_v4l2 btbcm snd_hda_core videobuf2_common snd_hwdep videodev btmtk snd_seq btrtl mc btintel iwlwifi cdc_acm onboard_usb_dev snd_seq_device 
bluetooth snd_pcm cfg80211 snd_timer intel_pmc_core snd intel_lpss_pci i2c_i801 pmt_telemetry
  i2c_smbus soundcore intel_lpss pmt_discovery spi_intel_pci mei_hdcp idma64 i2c_mux pmt_class wmi_bmof spi_intel pcspkr mei_pxp intel_pmc_ssram_telemetry bfq 
acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sch_fq_codel sunrpc fuse zram raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq 
async_xor xor async_tx raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec drm_gpusvm_helper i915 i2c_algo_bit drm_buddy intel_gtt 
drm_client_lib drm_display_helper drm_kms_helper cec rc_core intel_oc_wdt ttm ixgbe agpgart mdio libie_fwlog e1000e igc dca hwmon drm mei_wdt intel_vsec 
i2c_core video wmi pinctrl_alderlake efivarfs [last unloaded: nfnetlink]
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:tcp_rack_detect_loss+0x11c/0x170
Code: 07 00 00 48 8b 87 b0 06 00 00 44 01 ee 48 29 d0 ba 00 00 00 00 48 0f 48 c2 29 c6 85 f6 7e 27 41 8b 06 39 f0 0f 42 c6 41 89 06 <48> 8b 45 58 4c 8d 65 58 48 89 eb 48 83 e8 58 4d 39 fc 74 ab 48 89
RSP: 0018:ffffc9000045c758 EFLAGS: 00010293
RAX: 000000000000408d RBX: ffff88824fff7a00 RCX: 20c49ba5e353f7cf
RDX: 0000000000000000 RSI: 000000000000408d RDI: ffff88820165f200
RBP: ffffffffffffffa8 R08: 0000000083eed3f9 R09: 000000000000012c
R10: 00000000000005ba R11: 000000000000001d R12: ffff88824fff7a58
R13: 000000000000408d R14: ffffc9000045c79c R15: ffff88820165f888
FS:  0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36004 CR4: 0000000000772ef0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 10 seconds..

Thanks,
Ben

> 
> Thanks,
> Cole
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-16 18:12       ` Ben Greear
@ 2026-02-18 14:44         ` Cole Leavitt
  2026-02-18 14:44         ` Cole Leavitt
  2026-02-18 17:35         ` Ben Greear
  2 siblings, 0 replies; 9+ messages in thread
From: Cole Leavitt @ 2026-02-18 14:44 UTC (permalink / raw)
  To: Ben Greear; +Cc: Johannes Berg, linux-wireless, Miri Korenblit

Ben,

I've been digging into the use-after-free crash you reported on your
BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
followed by NULL deref in tcp_rack_detect_loss). I think I found the
root cause -- it's a missing guard in the MLD TSO segmentation path
that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
segment explosion you're seeing.

Here's the full chain:

1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
   a TID (bit not set in amsdu_enabled), the MLD driver sets:

     link_sta->agg.max_tid_amsdu_len[i] = 1;

   This sentinel value 1 means "AMSDU disabled on this TID".

2) mld/tx.c:836-837 -- the TSO path checks:

     max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
     if (!max_tid_amsdu_len)   // <-- only catches zero, not 1
         return iwl_tx_tso_segment(skb, 1, ...);

   Value 1 passes this check.

3) mld/tx.c:847 -- the division produces zero:

     num_subframes = (1 + 2) / (1534 + 2) = 0

   Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.

4) iwl-utils.c:27 -- gso_size is set to zero:

     skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0

5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
   tiny segments, which is the error you're seeing:

     "skbuff: ERROR: Found more than 32000 packets in skb_segment"
     "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"

6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
   TX ring before it fills up, then purges the rest. This creates a
   massive burst of tiny frames that stress the BA completion path.

The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
num_subframes calculation. MLD has no equivalent -- it relies solely on
max_tid_amsdu_len, and the sentinel value 1 slips through.
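
For illustration only, the arithmetic in steps 3 and 4 can be reproduced
in a small userspace sketch. This is not driver code: the function names
are hypothetical, and 1534, 2 and 1460 are just the example values used
above for subf_len, pad and mss.

```c
#include <assert.h>

/* Hypothetical stand-in for the num_subframes computation described in
 * step 3. Integer division truncates, so any AMSDU length limit smaller
 * than one padded subframe yields 0.
 */
static unsigned int sketch_num_subframes(unsigned int max_tid_amsdu_len,
					 unsigned int subf_len,
					 unsigned int pad)
{
	assert(subf_len + pad > 0);	/* avoid dividing by zero */
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/* Step 4 without any guard: gso_size = num_subframes * mss. */
static unsigned int sketch_gso_size_unguarded(unsigned int max_tid_amsdu_len,
					      unsigned int subf_len,
					      unsigned int pad,
					      unsigned int mss)
{
	return sketch_num_subframes(max_tid_amsdu_len, subf_len, pad) * mss;
}

/* Same computation with a zero guard: if not even one subframe fits,
 * fall back to a single subframe so that gso_size equals mss.
 */
static unsigned int sketch_gso_size_guarded(unsigned int max_tid_amsdu_len,
					    unsigned int subf_len,
					    unsigned int pad,
					    unsigned int mss)
{
	unsigned int n = sketch_num_subframes(max_tid_amsdu_len, subf_len, pad);

	if (!n)
		n = 1;
	return n * mss;
}
```

With the sentinel value 1 as the length limit, the unguarded variant
returns a gso_size of 0, while the guarded variant returns mss.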

This explains all your observations:
- 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
- AP-specific: the problem AP causes firmware to disable AMSDU for the
  active TID (other APs enable it, so max_tid_amsdu_len gets a proper
  value from iwl_mld_get_amsdu_size_of_tid())
- 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
  creates massive alloc/free churn in the skb slab, which can corrupt
  TCP retransmit queue entries allocated from the same cache
- No firmware error: firmware is fine, the bug is purely in MLD's TSO
  parameter calculation

Fix below. It adds a guard after the num_subframes calculation -- if
it's zero, fall back to single-subframe TSO (num_subframes=1), which
correctly sets gso_size=mss. This matches what MVM effectively does via
its amsdu_enabled checks.

Could you test this against the problem AP? Two things that would help
confirm the theory:

1) Before applying the fix, add this debug print to see the actual
   max_tid_amsdu_len value with the problem AP:

     // In iwl_mld_tx_tso_segment(), after line 847
     if (!num_subframes)
         pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
                      "subf_len=%u mss=%u\n",
                      max_tid_amsdu_len, subf_len, mss);

2) After applying the fix, run against the problem AP for 1+ day and
   check if both the TSO explosion AND the UAF are gone.

I also noticed a few secondary defense-in-depth regressions in MLD's
TX completion path vs MVM:

- MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
  (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
- The transport-level reclaim_lock prevents direct double-free, but
  MLD is missing MVM's extra safety checks

These are probably not directly causing your crash, but worth noting.

---
 drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index fbb672f4d8c7..1d47254a4148 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
 	 */
 	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);

+	/* If the AMSDU length limit is too small to fit even a single
+	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
+	 * the TLC notification when AMSDU is disabled for this TID), fall
+	 * back to non-AMSDU TSO segmentation. Without this guard,
+	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
+	 * which makes skb_gso_segment() produce tens of thousands of
+	 * 1-byte segments, overloading the TX ring and completion path.
+	 */
+	if (!num_subframes)
+		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
+
 	if (sta->max_amsdu_subframes &&
 	    num_subframes > sta->max_amsdu_subframes)
 		num_subframes = sta->max_amsdu_subframes;
-- 
2.52.0

Cole


* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-16 18:12       ` Ben Greear
  2026-02-18 14:44         ` Cole Leavitt
  2026-02-18 14:44         ` Cole Leavitt
@ 2026-02-18 17:35         ` Ben Greear
  2 siblings, 0 replies; 9+ messages in thread
From: Ben Greear @ 2026-02-18 17:35 UTC (permalink / raw)
  To: Cole Leavitt; +Cc: Johannes Berg, linux-wireless, Miri Korenblit

On 2/18/26 09:17, Cole Leavitt wrote:
> Ben,
> 
> I've been digging into the use-after-free crash you reported on your
> BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
> followed by NULL deref in tcp_rack_detect_loss). I think I found the
> root cause -- it's a missing guard in the MLD TSO segmentation path
> that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
> segment explosion you're seeing

Hello Cole,

Thanks for this, I'll take a closer look and test this out.

But also, I first saw this back in 2024, and that was before mld split from
mvm driver.  Possibly mvm added protection after I saw the problem and
that didn't make it into mld for some reason, or maybe there are other problems
as well.

Thanks,
Ben


> 
> Here's the full chain:
> 
> 1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
>     a TID (bit not set in amsdu_enabled), the MLD driver sets:
> 
>       link_sta->agg.max_tid_amsdu_len[i] = 1;
> 
>     This sentinel value 1 means "AMSDU disabled on this TID".
> 
> 2) mld/tx.c:836-837 -- the TSO path checks:
> 
>       max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>       if (!max_tid_amsdu_len)   // <-- only catches zero, not 1
>           return iwl_tx_tso_segment(skb, 1, ...);
> 
>     Value 1 passes this check.
> 
> 3) mld/tx.c:847 -- the division produces zero:
> 
>       num_subframes = (1 + 2) / (1534 + 2) = 0
> 
>     Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.
> 
> 4) iwl-utils.c:27 -- gso_size is set to zero:
> 
>       skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0
> 
> 5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
>     tiny segments, which is the error you're seeing:
> 
>       "skbuff: ERROR: Found more than 32000 packets in skb_segment"
>       "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"
> 
> 6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
>     TX ring before it fills up, then purges the rest. This creates a
>     massive burst of tiny frames that stress the BA completion path.
> 
> The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
> separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
> num_subframes calculation. MLD has no equivalent -- it relies solely on
> max_tid_amsdu_len, and the sentinel value 1 slips through.
> 
> This explains all your observations:
> - 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
> - AP-specific: the problem AP causes firmware to disable AMSDU for the
>    active TID (other APs enable it, so max_tid_amsdu_len gets a proper
>    value from iwl_mld_get_amsdu_size_of_tid())
> - 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
>    creates massive alloc/free churn in the skb slab, which can corrupt
>    TCP retransmit queue entries allocated from the same cache
> - No firmware error: firmware is fine, the bug is purely in MLD's TSO
>    parameter calculation
> 
> Fix below. It adds a guard after the num_subframes calculation -- if
> it's zero, fall back to single-subframe TSO (num_subframes=1), which
> correctly sets gso_size=mss. This matches what MVM effectively does via
> its amsdu_enabled checks.
> 
> Could you test this against the problem AP? Two things that would help
> confirm the theory:
> 
> 1) Before applying the fix, add this debug print to see the actual
>     max_tid_amsdu_len value with the problem AP:
> 
>       // In iwl_mld_tx_tso_segment(), after line 847
>       if (!num_subframes)
>           pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
>                        "subf_len=%u mss=%u\n",
>                        max_tid_amsdu_len, subf_len, mss);
> 
> 2) After applying the fix, run against the problem AP for 1+ day and
>     check if both the TSO explosion AND the UAF are gone.
> 
> I also noticed a few secondary defense-in-depth regressions in MLD's
> TX completion path vs MVM:
> 
> - MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
>    (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
> - The transport-level reclaim_lock prevents direct double-free, but
>    MLD is missing MVM's extra safety checks
> 
> These are probably not directly causing your crash, but worth noting.
> 
> ---
>   drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> index fbb672f4d8c7..1d47254a4148 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> @@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
>   	 */
>   	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
>   
> +	/* If the AMSDU length limit is too small to fit even a single
> +	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
> +	 * the TLC notification when AMSDU is disabled for this TID), fall
> +	 * back to non-AMSDU TSO segmentation. Without this guard,
> +	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
> +	 * which makes skb_gso_segment() produce tens of thousands of
> +	 * 1-byte segments, overloading the TX ring and completion path.
> +	 */
> +	if (!num_subframes)
> +		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
> +
>   	if (sta->max_amsdu_subframes &&
>   	    num_subframes > sta->max_amsdu_subframes)
>   		num_subframes = sta->max_amsdu_subframes;

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




* [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:10 ` Cole Leavitt
       [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
@ 2026-02-14 18:41   ` Cole Leavitt
  1 sibling, 0 replies; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:41 UTC (permalink / raw)
  To: johannes.berg, miriam.rachel.korenblit
  Cc: greearb, linux-wireless, stable, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
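Not part of the commit: a minimal userspace sketch of the guard pattern
described above, assuming a C11 atomic flag in place of
test_bit(STATUS_FW_ERROR, ...). All names here are hypothetical and only
model the control flow, not the real NAPI or iwlwifi APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical transport state: an error flag plus a counter of RX
 * entries that were handed to the (simulated) RX handler.
 */
struct sketch_trans {
	atomic_bool fw_error;
	int processed;
};

/* Simulated RX handler: pretends to consume the whole budget. */
static int sketch_rx_handle(struct sketch_trans *t, int budget)
{
	t->processed += budget;
	return budget;
}

/* Poll function with the early-return guard: once the error flag is
 * set, report zero work done and do not touch the RX ring.
 */
static int sketch_napi_poll(struct sketch_trans *t, int budget)
{
	assert(budget >= 0);

	if (atomic_load(&t->fw_error))
		return 0;

	return sketch_rx_handle(t, budget);
}
```

A check at function entry only covers polls that start after the flag
is set. A poll already past the check when the flag flips still runs to
completion, which is why the notification handlers get their own checks.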
 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
 		      budget);
-- 
2.52.0



end of thread, other threads:[~2026-02-19 16:38 UTC | newest]

Thread overview: 9+ messages
     [not found] <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
2026-02-18 18:47 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
2026-02-19 16:38   ` Ben Greear
     [not found] <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>
2026-02-14 18:10 ` Cole Leavitt
     [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
2026-02-14 18:33     ` Cole Leavitt
2026-02-16 18:12       ` Ben Greear
2026-02-18 14:44         ` Cole Leavitt
2026-02-18 14:44         ` Cole Leavitt
2026-02-18 17:35         ` Ben Greear
2026-02-14 18:41   ` Cole Leavitt
