* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
From: Cole Leavitt @ 2026-02-18 18:47 UTC
To: Ben Greear; +Cc: linux-wireless

Ben,

Thanks for the historical context. I dug through the git history and
your linux-ct repos to verify exactly what happened when. I want to
make sure I have this right - can you confirm whether this matches
what you saw?

2018 Bug (Bug 199209)
---------------------
Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of
AMSDU with fragmented SKBs"). That was a different trigger - NFS
created highly fragmented SKBs where nr_frags was so high that the
buffer descriptor limit check produced num_subframes=0. Emmanuel's
fix clamps that path to 1.

Current MLD Bug
---------------
Different path to the same symptom. When TLC disables AMSDU for a
TID, both MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel
value. The key difference in protection:

MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire
AMSDU path:

    if (!mvmsta->amsdu_enabled)
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

    if (!(mvmsta->amsdu_enabled & BIT(tid)))
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

MVM never reads max_tid_amsdu_len in its TX path - it uses its own
mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).

MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len
directly, with no equivalent bitmap:

    max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
    if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
        return iwl_tx_tso_segment(skb, 1, ...);

    num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
    // When max_tid_amsdu_len=1: num_subframes = (1 + 3) / (1534 + 3) = 0

What I found in your repos:

- linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver,
  only MVM with amsdu_enabled bitmap protection
- linux-ct-6.15, linux-ct-6.18: Have MLD driver
  (drivers/net/wireless/intel/iwlwifi/mld/)
- backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89
  (2024-07-17)

So MVM should have been immune to this specific sentinel-value bug
due to the bitmap check.

Question for you: When you saw TSO segment explosions in 2024, what
kernel and driver were you using? If it was one of your 6.5-6.14
kernels with MVM, then there may be a different path to
num_subframes=0 that I haven't identified yet. If you were using
backport-iwlwifi with MLD enabled, that would explain it hitting the
same bug I'm fixing now.

The commit ae6d30a71521 (Feb 2024) added better error reporting for
skb_gso_segment failures, which suggests people were hitting GSO
segment errors around that time - but I don't have visibility into
what specific trigger you hit.

My fix catches the sentinel-induced zero after the calculation, which
is equivalent to what MVM's bitmap check accomplishes. This should
prevent the current MLD bug from reaching skb_gso_segment with
gso_size=0.

Looking forward to your test results with the problem AP, and any
clarification on what setup you were using in 2024.

Cole
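The integer arithmetic described in the message above can be checked in isolation. What follows is a minimal user-space sketch, not driver code: the function names amsdu_num_subframes() and amsdu_num_subframes_guarded() are hypothetical, and the guard (clamping a zero result to 1) is only an assumption about what a post-calculation check might look like; the actual fix is not shown in this thread.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the subframe-count arithmetic quoted in the
 * message above; this is not the driver function itself.
 */
static unsigned int amsdu_num_subframes(unsigned int max_tid_amsdu_len,
					unsigned int subf_len,
					unsigned int pad)
{
	/* The divisor must be non-zero for the division to be defined. */
	assert(subf_len + pad > 0);

	/* Integer division truncates, so a small numerator yields 0. */
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/*
 * Assumed guard: treat a zero result as "one subframe at a time".
 * The real patch may handle this case differently.
 */
static unsigned int amsdu_num_subframes_guarded(unsigned int max_tid_amsdu_len,
						unsigned int subf_len,
						unsigned int pad)
{
	unsigned int n = amsdu_num_subframes(max_tid_amsdu_len, subf_len, pad);

	return n ? n : 1;
}
```

With the values from the message (max_tid_amsdu_len=1, subf_len=1534, pad=3), the unguarded function returns 0 because 4 / 1537 truncates to zero, while the guarded variant returns 1.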
* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
From: Ben Greear @ 2026-02-19 16:38 UTC
To: Cole Leavitt; +Cc: linux-wireless

On 2/18/26 10:47, Cole Leavitt wrote:
> Ben,
>
> Thanks for the historical context. I dug through the git history and
> your linux-ct repos to verify exactly what happened when. I want to
> make sure I have this right - can you confirm whether this matches
> what you saw?

The bug was originally seen in a mainline kernel before the MLD driver
was forked off from mvm, not in a backports kernel.

Adding your patch below didn't solve the UAF in the tcp_ack path, at
least. My debugging did not indicate that the code path in the patch
was taken. I have not seen any more instances of the 32k loops in the
packet segment loop in the last crash, so at least it is not the only
reason why a UAF would happen.

The problem that reproduced overnight was:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP
CPU: 12 UID: 0 PID: 1234 Comm: irq/345-iwlwifi Tainted: G S O 6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:rb_erase+0x173/0x350
Code: 08 48 8b 01 a8 01 75 97 48 83 c0 01 48 89 01 c3 c3 48 89 46 10 e9 27 ff ff ff 48 8b 56 10 48 8d 41 01 48 89 51 08 48 89 4e 10 <48> 89 02 48 8b 01 48 89 06 48 89 31 48 83 f8 03 0f 86 8e 00 00 00
RSP: 0018:ffffc9000038c820 EFLAGS: 00010246
RAX: ffff8881b0646601 RBX: 000000000000000c RCX: ffff8881b0646600
RDX: 0000000000000000 RSI: ffff8881e9cbea00 RDI: ffff8881b0646200
------------[ cut here ]------------
RBP: ffff8881b0646200 R08: ffff8881ce443108 R09: 0000000080200001
R10: 0000000000010000 R11: 00000000f0eaffb7 R12: ffff8881ce442f80
R13: 0000000000000004 R14: ffff8881b0646600 R15: 0000000000000001
refcount_t: underflow; use-after-free.
FS: 0000000000000000(0000) GS:ffff8888dc42e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36002 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
WARNING: CPU: 0 PID: 1224 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
<IRQ>
Modules linked in:
tcp_ack+0x635/0x16e0
nf_conntrack_netlink
tcp_rcv_established+0x211/0xc10
nf_conntrack
? sk_filter_trim_cap+0x1a7/0x350
nfnetlink
tcp_v4_do_rcv+0x1bf/0x350
tls
tcp_v4_rcv+0xddf/0x1550
vrf
? lock_timer_base+0x6d/0x90
nf_defrag_ipv6
? raw_local_deliver+0xcc/0x280
nf_defrag_ipv4
ip_protocol_deliver_rcu+0x20/0x130
8021q
ip_local_deliver_finish+0x85/0xf0
garp
ip_sublist_rcv_finish+0x35/0x50
mrp
ip_sublist_rcv+0x16f/0x200
stp
ip_list_rcv+0xfe/0x130
llc
__netif_receive_skb_list_core+0x183/0x1f0
macvlan
netif_receive_skb_list_internal+0x1c8/0x2a0
wanlink(O)
gro_receive_skb+0x12e/0x210
pktgen
ieee80211_rx_napi+0x82/0xc0 [mac80211]
rpcrdma
iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
rdma_cm
iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
iw_cm
iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
ib_cm
__napi_poll+0x25/0x1e0
ib_core
net_rx_action+0x2d3/0x340
qrtr

I have enough guard/debugging logic in place that I'm pretty sure the
skb coming from iwlwifi in this particular path is fine. It appears
the problem is that there is an already freed skb in the socket's skb
collection, and the code blows up trying to access a bad rbtree link,
or something.

I'm continuing to try to narrow down where the skb goes bad, but it
seems like some other thread of logic is probably racing to free the
skb, since the crash site moves around a lot. Maybe I can add some
sort of debugging to warn if an skb is freed while in an rbtree...

Thanks,
Ben

>
> 2018 Bug (Bug 199209)
> ---------------------
> Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of
> AMSDU with fragmented SKBs"). That was a different trigger - NFS
> created highly fragmented SKBs where nr_frags was so high that the
> buffer descriptor limit check produced num_subframes=0. Emmanuel's
> fix clamps that path to 1.
>
> Current MLD Bug
> ---------------
> Different path to the same symptom. When TLC disables AMSDU for a
> TID, both MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel
> value. The key difference in protection:
>
> MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire
> AMSDU path:
>
>     if (!mvmsta->amsdu_enabled)
>         return iwl_tx_tso_segment(skb, 1, ...);  // bail out early
>
>     if (!(mvmsta->amsdu_enabled & BIT(tid)))
>         return iwl_tx_tso_segment(skb, 1, ...);  // bail out early
>
> MVM never reads max_tid_amsdu_len in its TX path - it uses its own
> mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
> ("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).
>
> MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len
> directly, with no equivalent bitmap:
>
>     max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>     if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
>         return iwl_tx_tso_segment(skb, 1, ...);
>
>     num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
>     // When max_tid_amsdu_len=1: num_subframes = (1 + 3) / (1534 + 3) = 0
>
> What I found in your repos:
>
> - linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver,
>   only MVM with amsdu_enabled bitmap protection
> - linux-ct-6.15, linux-ct-6.18: Have MLD driver
>   (drivers/net/wireless/intel/iwlwifi/mld/)
> - backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89
>   (2024-07-17)
>
> So MVM should have been immune to this specific sentinel-value bug
> due to the bitmap check.
>
> Question for you: When you saw TSO segment explosions in 2024, what
> kernel and driver were you using? If it was one of your 6.5-6.14
> kernels with MVM, then there may be a different path to
> num_subframes=0 that I haven't identified yet. If you were using
> backport-iwlwifi with MLD enabled, that would explain it hitting the
> same bug I'm fixing now.
>
> The commit ae6d30a71521 (Feb 2024) added better error reporting for
> skb_gso_segment failures, which suggests people were hitting GSO
> segment errors around that time - but I don't have visibility into
> what specific trigger you hit.
>
> My fix catches the sentinel-induced zero after the calculation, which
> is equivalent to what MVM's bitmap check accomplishes. This should
> prevent the current MLD bug from reaching skb_gso_segment with
> gso_size=0.
>
> Looking forward to your test results with the problem AP, and any
> clarification on what setup you were using in 2024.
>
> Cole
>

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
* [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
From: Cole Leavitt @ 2026-02-14 18:10 UTC
To: greearb, johannes.berg, miriam.rachel.korenblit
Cc: linux-wireless, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware. This can dispatch TX completion notifications containing
corrupt SSN values to iwl_mld_handle_tx_resp_notif(), which passes them
to iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX
queue entries that were already freed by a prior correct reclaim, the
result is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous interrupt
when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
Tested on Intel BE200 (FW 101.6e695a70.0) by forcing NMI via debugfs.
The WARN_ONCE fires reliably:

  iwlwifi: NAPI MSIX poll[0] invoked after FW error
  WARNING: drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c:1058 at
    iwl_pcie_napi_poll_msix+0xff/0x130 [iwlwifi], CPU#22

Confirming NAPI poll is invoked after STATUS_FW_ERROR is set. Without
this patch, that poll processes stale RX ring data from dead firmware.

 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
 		      rxq->id, ret, budget);
-- 
2.52.0
* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
From: Cole Leavitt @ 2026-02-14 18:33 UTC
To: greearb
Cc: johannes.berg, miriam.rachel.korenblit, linux-wireless, Cole Leavitt

Ben,

Good catch on both fronts.

On the build_tfd dangling pointer -- you're right. The failure path at
line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
(set at lines 763-764). The caller gets -1 and presumably frees the
skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
not advancing means current unmap paths won't iterate to that index,
it's a latent UAF waiting for a flush path change or future code to
touch it. Two NULL stores inside a held spinlock cost nothing. I think
this should go upstream as its own patch.

On the TOCTOU question -- this is the part I spent the most time on.
The window you're asking about is: firmware starts producing corrupt
completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
checks can't help there because the flag isn't set yet.

The primary guard in that window is iwl_txq_used() in
iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
[read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
wraparound corruption, etc.

What it can't catch is an in-range corrupt SSN -- e.g., firmware says
reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
That passes bounds checking and the reclaim loop frees skbs for
entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
loop catches double-reclaim but not first-time over-reclaim.

The complete fix for this would be a per-entry generation counter --
tag each entry on submit, validate on reclaim. But that adds per-entry
overhead on the TX hot path to protect against a condition (firmware
producing corrupt completions) that is already terminal. I think the
right trade-off is:

  1. Your build_tfd NULL fix (eliminates one dangling pointer class)
  2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
     shrinks the detection window to near-zero)
  3. The existing iwl_txq_used() bounds check (catches most corrupt
     SSNs)

Together these make the damage window small enough that a per-entry
generation scheme isn't justified -- by the time firmware is sending
corrupt SSNs, we're in dump-and-reset territory anyway.

That said, if you're seeing corruption patterns in your customer
testing where a valid-looking-but-wrong SSN gets through before
FW_ERROR fires, I'd be very interested in the traces. That would
change the cost/benefit on the generation counter approach.

Thanks,
Cole
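The limitation of a [read_ptr, write_ptr) range check described in the message above can be illustrated with a small model. This is a simplified sketch under stated assumptions, not the driver's iwl_txq_used(): the name ring_index_in_use() and the explicit size parameter are hypothetical, and all indices are assumed to be already reduced to the range [0, size).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a circular-queue occupancy check: returns true
 * when idx lies in the half-open window [read_ptr, write_ptr), taking
 * wraparound into account. Not the driver's actual implementation.
 */
static bool ring_index_in_use(unsigned int idx, unsigned int read_ptr,
			      unsigned int write_ptr, unsigned int size)
{
	/* All indices are assumed to be already wrapped into [0, size). */
	assert(idx < size && read_ptr < size && write_ptr < size);

	if (write_ptr >= read_ptr)
		return idx >= read_ptr && idx < write_ptr;

	/* Wrapped window: everything except the gap [write_ptr, read_ptr). */
	return !(idx < read_ptr && idx >= write_ptr);
}
```

With an illustrative read_ptr of 5 and the write_ptr of 20 from the message, an index of 25 is rejected, but a wrong-yet-plausible index of 15 is accepted just like the legitimate index 8. A range check alone cannot tell those two apart, which is the gap the message describes.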
* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
From: Ben Greear @ 2026-02-16 18:12 UTC
To: Cole Leavitt; +Cc: johannes.berg, miriam.rachel.korenblit, linux-wireless

On 2/14/26 10:33 AM, Cole Leavitt wrote:
> Ben,
>
> Good catch on both fronts.
>
> On the build_tfd dangling pointer -- you're right. The failure path at
> line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
> (set at lines 763-764). The caller gets -1 and presumably frees the
> skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
> not advancing means current unmap paths won't iterate to that index,
> it's a latent UAF waiting for a flush path change or future code to
> touch it. Two NULL stores inside a held spinlock cost nothing. I think
> this should go upstream as its own patch.
>
> On the TOCTOU question -- this is the part I spent the most time on.
> The window you're asking about is: firmware starts producing corrupt
> completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
> checks can't help there because the flag isn't set yet.
>
> The primary guard in that window is iwl_txq_used() in
> iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
> [read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
> wraparound corruption, etc.
>
> What it can't catch is an in-range corrupt SSN -- e.g., firmware says
> reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
> That passes bounds checking and the reclaim loop frees skbs for
> entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
> loop catches double-reclaim but not first-time over-reclaim.
>
> The complete fix for this would be a per-entry generation counter --
> tag each entry on submit, validate on reclaim. But that adds per-entry
> overhead on the TX hot path to protect against a condition (firmware
> producing corrupt completions) that is already terminal. I think the
> right trade-off is:
>
>   1. Your build_tfd NULL fix (eliminates one dangling pointer class)
>   2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
>      shrinks the detection window to near-zero)
>   3. The existing iwl_txq_used() bounds check (catches most corrupt
>      SSNs)
>
> Together these make the damage window small enough that a per-entry
> generation scheme isn't justified -- by the time firmware is sending
> corrupt SSNs, we're in dump-and-reset territory anyway.
>
> That said, if you're seeing corruption patterns in your customer
> testing where a valid-looking-but-wrong SSN gets through before
> FW_ERROR fires, I'd be very interested in the traces. That would
> change the cost/benefit on the generation counter approach.

Hello Cole,

Looks like even with your patches we are still seeing use-after-free.

I tried adding a lot of checks to detect already freed skbs in iwlwifi,
and those are not hitting, so possibly the bug is very close to the end
of the call chain, or I am doing it wrong, or it is some sort of race
or bug that my code will not catch.

We do not see any related crashes when using mt76 radios, so pretty
sure this is related to iwlwifi.

A particular AP reproduces this problem within a day, and we can run
tcp tests for 30+ days against other APs with no problem. I don't know
what the AP could be doing to trigger this though. No FW crash was seen
in my logs in this case.

My tree is here if you care to investigate any of my UAF debugging or
see what code is printing some of these logs. Suggestions for
improvement would be welcome!

https://github.com/greearb/linux-ct-6.18

One problem I see (for several years) is an infinite busy-spin in
iwl-mvm-tx-tso-segment. I added code to break out after 32k loops, and
warn. That hits here.
The system crashes 28 minutes later, so not sure if that is directly
related. I guess I can try to do more debugging around that bad tso
segment path.

Feb 16 00:16:01 LF1-MobileStation1 kernel: skbuff: ERROR: Found more than 32000 packets in skbuff::skb_segment, bailing out.
Feb 16 00:16:01 LF1-MobileStation1 kernel: ERROR: iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001, bailing out.
Feb 16 00:16:06 LF1-MobileStation1 kernel: skbuff: ERROR: Found more than 32000 packets in skbuff::skb_segment, bailing out.
Feb 16 00:16:06 LF1-MobileStation1 kernel: ERROR: iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001, bailing out.
Feb 16 00:44:06 LF1-MobileStation1 kernel: ------------[ cut here ]------------
Feb 16 00:44:06 LF1-MobileStation1 kernel: refcount_t: underflow; use-after-free.
Feb 16 00:44:06 LF1-MobileStation1 kernel: WARNING: CPU: 18 PID: 1203 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
Feb 16 00:44:06 LF1-MobileStation1 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc macvlan wanlink(O) pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr nct7802 vfat fat intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek_lib ofpart snd_hda_codec_generic i2c_designware_platform spi_nor kvm_intel spi_pxa2xx_platform iwlmld i2c_designware_core spd5118 dw_dmac iTCO_wdt intel_pmc_bxt ccp mtd regmap_i2c spi_pxa2xx_core uvcvideo kvm snd_hda_intel 8250_dw iTCO_vendor_support mac80211 uvc snd_intel_dspcfg irqbypass videobuf2_vmalloc snd_hda_codec videobuf2_memops btusb videobuf2_v4l2 btbcm snd_hda_core videobuf2_common snd_hwdep videodev btmtk snd_seq btrtl mc btintel iwlwifi cdc_acm onboard_usb_dev snd_seq_device bluetooth snd_pcm cfg80211 snd_timer intel_pmc_core snd intel_lpss_pci i2c_i801 pmt_telemetry
Feb 16 00:44:06 LF1-MobileStation1 kernel:  i2c_smbus soundcore intel_lpss pmt_discovery spi_intel_pci mei_hdcp idma64 i2c_mux pmt_class wmi_bmof spi_intel pcspkr mei_pxp intel_pmc_ssram_telemetry bfq acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sch_fq_codel sunrpc fuse zram raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec drm_gpusvm_helper i915 i2c_algo_bit drm_buddy intel_gtt drm_client_lib drm_display_helper drm_kms_helper cec rc_core intel_oc_wdt ttm ixgbe agpgart mdio libie_fwlog e1000e igc dca hwmon drm mei_wdt intel_vsec i2c_core video wmi pinctrl_alderlake efivarfs [last unloaded: nfnetlink]
Feb 16 00:44:06 LF1-MobileStation1 kernel: CPU: 18 UID: 0 PID: 1203 Comm: irq/343-iwlwifi Tainted: G S O 6.18.9+ #53 PREEMPT(full)
Feb 16 00:44:06 LF1-MobileStation1 kernel: Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Feb 16 00:44:06 LF1-MobileStation1 kernel: Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
Feb 16 00:44:06 LF1-MobileStation1 kernel: RIP: 0010:refcount_warn_saturate+0xd8/0xe0
Feb 16 00:44:07 LF1-MobileStation1 kernel: Code: ff 48 c7 c7 d8 a4 6d 82 c6 05 d0 4a 3e 01 01 e8 3e 83 a7 ff 0f 0b c3 48 c7 c7 80 a4 6d 82 c6 05 bc 4a 3e 01 01 e8 28 83 a7 ff <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13
Feb 16 00:44:07 LF1-MobileStation1 kernel: RSP: 0018:ffffc9000045c6d0 EFLAGS: 00010282
Feb 16 00:44:07 LF1-MobileStation1 kernel: RAX: 0000000000000000 RBX: ffff8882772db000 RCX: 0000000000000000
Feb 16 00:44:07 LF1-MobileStation1 kernel: RDX: ffff88885faa5f00 RSI: 0000000000000001 RDI: ffff88885fa98d00
Feb 16 00:44:07 LF1-MobileStation1 kernel: RBP: ffff8882447d9e00 R08: 0000000000000000 R09: 0000000000000003
Feb 16 00:44:07 LF1-MobileStation1 kernel: R10: ffffc9000045c570 R11: ffffffff82b58da8 R12: ffff88820165f200
Feb 16 00:44:07 LF1-MobileStation1 kernel: R13: 0000000000000001 R14: 00000000000005a8 R15: ffffc9000045c890
Feb 16 00:44:07 LF1-MobileStation1 kernel: FS: 0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
Feb 16 00:44:07 LF1-MobileStation1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 16 00:44:07 LF1-MobileStation1 kernel: CR2: 00007fd1022fdcb4 CR3: 0000000005a36004 CR4: 0000000000772ef0
Feb 16 00:44:07 LF1-MobileStation1 kernel: PKRU: 55555554
Feb 16 00:44:07 LF1-MobileStation1 kernel: Call Trace:
Feb 16 00:44:07 LF1-MobileStation1 kernel: <IRQ>
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_shifted_skb+0x1d2/0x300
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_sacktag_walk+0x2da/0x4d0
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_sacktag_write_queue+0x4a1/0x9a0
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_ack+0xd66/0x16e0
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? ip_finish_output2+0x189/0x570
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_rcv_established+0x211/0xc10
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? sk_filter_trim_cap+0x1a7/0x350
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_v4_do_rcv+0x1bf/0x350
Feb 16 00:44:07 LF1-MobileStation1 kernel: tcp_v4_rcv+0xddf/0x1550
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? raw_local_deliver+0xcc/0x280
Feb 16 00:44:07 LF1-MobileStation1 kernel: ip_protocol_deliver_rcu+0x20/0x130
Feb 16 00:44:07 LF1-MobileStation1 kernel: ip_local_deliver_finish+0x85/0xf0
Feb 16 00:44:07 LF1-MobileStation1 kernel: ip_sublist_rcv_finish+0x35/0x50
Feb 16 00:44:07 LF1-MobileStation1 kernel: ip_sublist_rcv+0x16f/0x200
Feb 16 00:44:07 LF1-MobileStation1 kernel: ip_list_rcv+0xfe/0x130
Feb 16 00:44:07 LF1-MobileStation1 kernel: __netif_receive_skb_list_core+0x183/0x1f0
Feb 16 00:44:07 LF1-MobileStation1 kernel: netif_receive_skb_list_internal+0x1c8/0x2a0
Feb 16 00:44:07 LF1-MobileStation1 kernel: gro_receive_skb+0x12e/0x210
Feb 16 00:44:07 LF1-MobileStation1 kernel: ieee80211_rx_napi+0x82/0xc0 [mac80211]
Feb 16 00:44:07 LF1-MobileStation1 kernel: iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
Feb 16 00:44:07 LF1-MobileStation1 kernel: iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel: iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel: __napi_poll+0x25/0x1e0
Feb 16 00:44:07 LF1-MobileStation1 kernel: net_rx_action+0x2d3/0x340
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? try_to_wake_up+0x2e6/0x610
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? __handle_irq_event_percpu+0xa3/0x230
Feb 16 00:44:07 LF1-MobileStation1 kernel: handle_softirqs+0xca/0x2b0
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? irq_thread_dtor+0xa0/0xa0
Feb 16 00:44:07 LF1-MobileStation1 kernel: do_softirq.part.0+0x3b/0x60
Feb 16 00:44:07 LF1-MobileStation1 kernel: </IRQ>
Feb 16 00:44:07 LF1-MobileStation1 kernel: <TASK>
Feb 16 00:44:07 LF1-MobileStation1 kernel: __local_bh_enable_ip+0x58/0x60
Feb 16 00:44:07 LF1-MobileStation1 kernel: iwl_pcie_irq_rx_msix_handler+0xbb/0x100 [iwlwifi]
Feb 16 00:44:07 LF1-MobileStation1 kernel: irq_thread_fn+0x19/0x50
Feb 16 00:44:07 LF1-MobileStation1 kernel: irq_thread+0x126/0x230
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? irq_finalize_oneshot.part.0+0xc0/0xc0
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? irq_forced_thread_fn+0x40/0x40
Feb 16 00:44:07 LF1-MobileStation1 kernel: kthread+0xf7/0x1f0
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel: ret_from_fork+0x114/0x140
Feb 16 00:44:07 LF1-MobileStation1 kernel: ? kthreads_online_cpu+0x100/0x100
Feb 16 00:44:07 LF1-MobileStation1 kernel: ret_from_fork_asm+0x11/0x20
Feb 16 00:44:07 LF1-MobileStation1 kernel: </TASK>
Feb 16 00:44:07 LF1-MobileStation1 kernel: ---[ end trace 0000000000000000 ]---
Feb 16 00:44:07 LF1-MobileStation1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

[NPE shortly after in tcp code, but real problem is the use-after-free I assume]

# serial console output of the crash following the UAF.

#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 18 UID: 0 PID: 1203 Comm: irq/343-iwlwifi Tainted: G S W O 6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:tcp_rack_detect_loss+0x11c/0x170
Code: 07 00 00 48 8b 87 b0 06 00 00 44 01 ee 48 29 d0 ba 00 00 00 00 48 0f 48 c2 29 c6 85 f6 7e 27 41 8b 06 39 f0 0f 42 c6 41 89 06 <48> 8b 45 58 4c 8d 65 58 48 89 eb 48 83 e8 58 4d 39 fc 74 ab 48 89
RSP: 0018:ffffc9000045c758 EFLAGS: 00010293
RAX: 000000000000408d RBX: ffff88824fff7a00 RCX: 20c49ba5e353f7cf
RDX: 0000000000000000 RSI: 000000000000408d RDI: ffff88820165f200
RBP: ffffffffffffffa8 R08: 0000000083eed3f9 R09: 000000000000012c
R10: 00000000000005ba R11: 000000000000001d R12: ffff88824fff7a58
R13: 000000000000408d R14: ffffc9000045c79c R15: ffff88820165f888
FS: 0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36004 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<IRQ>
tcp_rack_mark_lost+0x59/0xe0
tcp_identify_packet_loss+0x30/0x70
tcp_fastretrans_alert+0x366/0x810
tcp_ack+0xc66/0x16e0
? ip_finish_output2+0x189/0x570
tcp_rcv_established+0x211/0xc10
? sk_filter_trim_cap+0x1a7/0x350
tcp_v4_do_rcv+0x1bf/0x350
tcp_v4_rcv+0xddf/0x1550
? raw_local_deliver+0xcc/0x280
ip_protocol_deliver_rcu+0x20/0x130
ip_local_deliver_finish+0x85/0xf0
ip_sublist_rcv_finish+0x35/0x50
ip_sublist_rcv+0x16f/0x200
ip_list_rcv+0xfe/0x130
__netif_receive_skb_list_core+0x183/0x1f0
netif_receive_skb_list_internal+0x1c8/0x2a0
gro_receive_skb+0x12e/0x210
ieee80211_rx_napi+0x82/0xc0 [mac80211]
iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
__napi_poll+0x25/0x1e0
net_rx_action+0x2d3/0x340
? try_to_wake_up+0x2e6/0x610
? __handle_irq_event_percpu+0xa3/0x230
handle_softirqs+0xca/0x2b0
? irq_thread_dtor+0xa0/0xa0
do_softirq.part.0+0x3b/0x60
</IRQ>
<TASK>
__local_bh_enable_ip+0x58/0x60
iwl_pcie_irq_rx_msix_handler+0xbb/0x100 [iwlwifi]
irq_thread_fn+0x19/0x50
irq_thread+0x126/0x230
? irq_finalize_oneshot.part.0+0xc0/0xc0
? irq_forced_thread_fn+0x40/0x40
kthread+0xf7/0x1f0
? kthreads_online_cpu+0x100/0x100
? kthreads_online_cpu+0x100/0x100
ret_from_fork+0x114/0x140
? kthreads_online_cpu+0x100/0x100
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink tls vrf nf_defrag_ipv6 nf_defrag_ipv4 8021q garp mrp stp llc macvlan wanlink(O) pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr nct7802 vfat fat intel_rapl_msr coretemp intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc882 x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek_lib ofpart snd_hda_codec_generic i2c_designware_platform spi_nor kvm_intel spi_pxa2xx_platform iwlmld i2c_designware_core spd5118 dw_dmac iTCO_wdt intel_pmc_bxt ccp mtd regmap_i2c spi_pxa2xx_core uvcvideo kvm snd_hda_intel 8250_dw iTCO_vendor_support mac80211 uvc snd_intel_dspcfg irqbypass videobuf2_vmalloc snd_hda_codec videobuf2_memops btusb videobuf2_v4l2 btbcm snd_hda_core videobuf2_common snd_hwdep videodev btmtk snd_seq btrtl mc btintel iwlwifi cdc_acm onboard_usb_dev snd_seq_device bluetooth snd_pcm cfg80211 snd_timer intel_pmc_core snd intel_lpss_pci i2c_i801 pmt_telemetry i2c_smbus soundcore intel_lpss pmt_discovery spi_intel_pci mei_hdcp idma64 i2c_mux pmt_class wmi_bmof spi_intel pcspkr mei_pxp intel_pmc_ssram_telemetry bfq acpi_tad acpi_pad nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sch_fq_codel sunrpc fuse zram raid1 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec drm_gpusvm_helper i915 i2c_algo_bit drm_buddy intel_gtt drm_client_lib drm_display_helper drm_kms_helper cec rc_core intel_oc_wdt ttm ixgbe agpgart mdio libie_fwlog e1000e igc dca hwmon drm mei_wdt intel_vsec i2c_core video wmi pinctrl_alderlake efivarfs [last unloaded: nfnetlink]
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:tcp_rack_detect_loss+0x11c/0x170
Code: 07 00 00 48 8b 87 b0 06 00 00 44 01 ee 48 29 d0 ba 00 00 00 00 48 0f 48 c2 29 c6 85 f6 7e 27 41 8b 06 39 f0 0f 
42 c6 41 89 06 <48> 8b 45 58 4c 8d 65 58 48 89 eb 48 83 e8 58 4d 39 fc 74 ab 48 89 RSP: 0018:ffffc9000045c758 EFLAGS: 00010293 RAX: 000000000000408d RBX: ffff88824fff7a00 RCX: 20c49ba5e353f7cf RDX: 0000000000000000 RSI: 000000000000408d RDI: ffff88820165f200 RBP: ffffffffffffffa8 R08: 0000000083eed3f9 R09: 000000000000012c R10: 00000000000005ba R11: 000000000000001d R12: ffff88824fff7a58 R13: 000000000000408d R14: ffffc9000045c79c R15: ffff88820165f888 FS: 0000000000000000(0000) GS:ffff8888dc5ae000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000005a36004 CR4: 0000000000772ef0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled Rebooting in 10 seconds.. Thanks, Ben > > Thanks, > Cole > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-16 18:12 ` Ben Greear
@ 2026-02-18 14:44 ` Cole Leavitt
  2026-02-18 14:44 ` Cole Leavitt
  2026-02-18 17:35 ` Ben Greear
  2 siblings, 0 replies; 9+ messages in thread
From: Cole Leavitt @ 2026-02-18 14:44 UTC (permalink / raw)
  To: Ben Greear; +Cc: Johannes Berg, linux-wireless, Miri Korenblit

Ben,

I've been digging into the use-after-free crash you reported on your
BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
followed by NULL deref in tcp_rack_detect_loss). I think I found the
root cause -- it's a missing guard in the MLD TSO segmentation path
that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
segment explosion you're seeing.

Here's the full chain:

1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
   a TID (bit not set in amsdu_enabled), the MLD driver sets:

       link_sta->agg.max_tid_amsdu_len[i] = 1;

   This sentinel value 1 means "AMSDU disabled on this TID".

2) mld/tx.c:836-837 -- the TSO path checks:

       max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
       if (!max_tid_amsdu_len)  // <-- only catches zero, not 1
           return iwl_tx_tso_segment(skb, 1, ...);

   Value 1 passes this check.

3) mld/tx.c:847 -- the division produces zero:

       num_subframes = (1 + 2) / (1534 + 2) = 0

   Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.

4) iwl-utils.c:27 -- gso_size is set to zero:

       skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0

5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
   tiny segments, which is the error you're seeing:

       "skbuff: ERROR: Found more than 32000 packets in skb_segment"
       "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"

6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
   TX ring before it fills up, then purges the rest. This creates a
   massive burst of tiny frames that stress the BA completion path.
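
Steps 3 and 4 of the chain can be checked outside the kernel. The
following is a minimal userspace sketch, not driver code: the function
names are hypothetical, and the pad formula (bytes needed to round
subf_len up to a multiple of 4) is an assumption, chosen because it
yields pad = 2 for subf_len = 1534 and so matches the numbers in step 3.

```c
#include <assert.h>

/* Assumed padding rule: bytes needed to align subf_len to 4.
 * For subf_len = 1534 this gives 2, as in step 3 of the chain.
 */
unsigned int amsdu_pad(unsigned int subf_len)
{
	return (4 - subf_len) & 0x3;
}

/* Hypothetical stand-in for the subframe count computed at mld/tx.c:847. */
unsigned int amsdu_num_subframes(unsigned int max_tid_amsdu_len,
				 unsigned int subf_len)
{
	unsigned int pad = amsdu_pad(subf_len);

	assert(subf_len > 0);
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/* gso_size as described in step 4: num_subframes * mss. */
unsigned int tso_gso_size(unsigned int num_subframes, unsigned int mss)
{
	return num_subframes * mss;
}
```

With these definitions, amsdu_num_subframes(1, 1534) evaluates to 0 and
tso_gso_size(0, 1460) to 0, which is the gso_size=0 case from step 4. A
limit of 1534 or more gives at least one subframe.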

The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
num_subframes calculation. MLD has no equivalent -- it relies solely on
max_tid_amsdu_len, and the sentinel value 1 slips through.

This explains all your observations:
- 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
- AP-specific: the problem AP causes firmware to disable AMSDU for the
  active TID (other APs enable it, so max_tid_amsdu_len gets a proper
  value from iwl_mld_get_amsdu_size_of_tid())
- 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
  creates massive alloc/free churn in the skb slab, which can corrupt
  TCP retransmit queue entries allocated from the same cache
- No firmware error: firmware is fine, the bug is purely in MLD's TSO
  parameter calculation

Fix below. It adds a guard after the num_subframes calculation -- if
it's zero, fall back to single-subframe TSO (num_subframes=1), which
correctly sets gso_size=mss. This matches what MVM effectively does via
its amsdu_enabled checks.

Could you test this against the problem AP? Two things that would help
confirm the theory:

1) Before applying the fix, add this debug print to see the actual
   max_tid_amsdu_len value with the problem AP:

       // In iwl_mld_tx_tso_segment(), after line 847
       if (!num_subframes)
           pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
                        "subf_len=%u mss=%u\n",
                        max_tid_amsdu_len, subf_len, mss);

2) After applying the fix, run against the problem AP for 1+ day and
   check if both the TSO explosion AND the UAF are gone.
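
The two kinds of guard discussed above can be compared side by side.
This is a hedged userspace model, not the driver code: the function
names and flat argument lists are hypothetical, and the pad formula is
an assumption that yields pad = 2 for subf_len = 1534.

```c
#include <assert.h>

#define MODEL_BIT(n) (1u << (n))

/* Shared arithmetic; pad rule is an assumption (align subf_len to 4). */
unsigned int model_subframes(unsigned int max_len, unsigned int subf_len)
{
	unsigned int pad = (4 - subf_len) & 0x3;

	assert(subf_len > 0);
	return (max_len + pad) / (subf_len + pad);
}

/* MVM-style gate (model): consult a per-TID enable bitmap before doing
 * any A-MSDU arithmetic; a disabled TID gets exactly one subframe.
 */
unsigned int subframes_bitmap_gate(unsigned int amsdu_enabled,
				   unsigned int tid,
				   unsigned int max_amsdu_len,
				   unsigned int subf_len)
{
	assert(tid < 32);
	if (!(amsdu_enabled & MODEL_BIT(tid)))
		return 1;
	return model_subframes(max_amsdu_len, subf_len);
}

/* Post-calculation guard (model of the proposed MLD fix): do the
 * arithmetic first, then fall back to one subframe if it yields zero.
 */
unsigned int subframes_clamped(unsigned int max_tid_amsdu_len,
			       unsigned int subf_len)
{
	unsigned int n = model_subframes(max_tid_amsdu_len, subf_len);

	return n ? n : 1;
}
```

For a disabled TID, subframes_bitmap_gate(0, 0, 1, 1534) and
subframes_clamped(1, 1534) both give 1, so gso_size would equal mss
under either scheme.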

I also noticed a few secondary defense-in-depth regressions in MLD's
TX completion path vs MVM:

- MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
  (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
- The transport-level reclaim_lock prevents direct double-free, but
  MLD is missing MVM's extra safety checks

These are probably not directly causing your crash, but worth noting.

---
 drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index fbb672f4d8c7..1d47254a4148 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
 	 */
 	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
 
+	/* If the AMSDU length limit is too small to fit even a single
+	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
+	 * the TLC notification when AMSDU is disabled for this TID), fall
+	 * back to non-AMSDU TSO segmentation. Without this guard,
+	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
+	 * which makes skb_gso_segment() produce tens of thousands of
+	 * 1-byte segments, overloading the TX ring and completion path.
+	 */
+	if (!num_subframes)
+		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
+
 	if (sta->max_amsdu_subframes &&
 	    num_subframes > sta->max_amsdu_subframes)
 		num_subframes = sta->max_amsdu_subframes;
--
2.52.0

Cole

^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-16 18:12 ` Ben Greear
  2026-02-18 14:44 ` Cole Leavitt
  2026-02-18 14:44 ` Cole Leavitt
@ 2026-02-18 17:35 ` Ben Greear
  2 siblings, 0 replies; 9+ messages in thread
From: Ben Greear @ 2026-02-18 17:35 UTC (permalink / raw)
  To: Cole Leavitt; +Cc: Johannes Berg, linux-wireless, Miri Korenblit

On 2/18/26 09:17, Cole Leavitt wrote:
> Ben,
>
> I've been digging into the use-after-free crash you reported on your
> BE200 running the MLD driver (tcp_shifted_skb refcount underflow,
> followed by NULL deref in tcp_rack_detect_loss). I think I found the
> root cause -- it's a missing guard in the MLD TSO segmentation path
> that lets num_subframes=0 reach skb_gso_segment(), producing the 32k+
> segment explosion you're seeing

Hello Cole,

Thanks for this, I'll take a closer look and test this out.

But also, I first saw this back in 2024, and that was before mld split
from mvm driver. Possibly mvm added protection after I saw the problem
and that didn't make it into mld for some reason, or maybe there are
other problems as well.

Thanks,
Ben

>
> Here's the full chain:
>
> 1) mld/tlc.c:790 -- when firmware's TLC notification disables AMSDU for
>    a TID (bit not set in amsdu_enabled), the MLD driver sets:
>
>        link_sta->agg.max_tid_amsdu_len[i] = 1;
>
>    This sentinel value 1 means "AMSDU disabled on this TID".
>
> 2) mld/tx.c:836-837 -- the TSO path checks:
>
>        max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>        if (!max_tid_amsdu_len)  // <-- only catches zero, not 1
>            return iwl_tx_tso_segment(skb, 1, ...);
>
>    Value 1 passes this check.
>
> 3) mld/tx.c:847 -- the division produces zero:
>
>        num_subframes = (1 + 2) / (1534 + 2) = 0
>
>    Any max_tid_amsdu_len below ~1534 (one subframe) produces 0 here.
>
> 4) iwl-utils.c:27 -- gso_size is set to zero:
>
>        skb_shinfo(skb)->gso_size = num_subframes * mss = 0 * 1460 = 0
>
> 5) iwl-utils.c:30 -- skb_gso_segment() with gso_size=0 creates 32001+
>    tiny segments, which is the error you're seeing:
>
>        "skbuff: ERROR: Found more than 32000 packets in skb_segment"
>        "iwl-mvm-tx-tso-segment, list gso-segment list is huge: 32001"
>
> 6) mld/tx.c:912-936 -- the loop queues ~1024 of those segments to the
>    TX ring before it fills up, then purges the rest. This creates a
>    massive burst of tiny frames that stress the BA completion path.
>
> The MVM driver is immune because it checks mvmsta->amsdu_enabled (a
> separate bitmap) at tx.c:912 and tx.c:936 BEFORE ever reaching the
> num_subframes calculation. MLD has no equivalent -- it relies solely on
> max_tid_amsdu_len, and the sentinel value 1 slips through.
>
> This explains all your observations:
> - 6.18 regression: BE200 moved from MVM (has guard) to MLD (no guard)
> - AP-specific: the problem AP causes firmware to disable AMSDU for the
>   active TID (other APs enable it, so max_tid_amsdu_len gets a proper
>   value from iwl_mld_get_amsdu_size_of_tid())
> - 28min gap between TSO explosion and UAF: the ~1024 micro-frame burst
>   creates massive alloc/free churn in the skb slab, which can corrupt
>   TCP retransmit queue entries allocated from the same cache
> - No firmware error: firmware is fine, the bug is purely in MLD's TSO
>   parameter calculation
>
> Fix below. It adds a guard after the num_subframes calculation -- if
> it's zero, fall back to single-subframe TSO (num_subframes=1), which
> correctly sets gso_size=mss. This matches what MVM effectively does via
> its amsdu_enabled checks.
>
> Could you test this against the problem AP? Two things that would help
> confirm the theory:
>
> 1) Before applying the fix, add this debug print to see the actual
>    max_tid_amsdu_len value with the problem AP:
>
>        // In iwl_mld_tx_tso_segment(), after line 847
>        if (!num_subframes)
>            pr_warn_once("iwlmld: num_subframes=0, max_tid_amsdu_len=%u "
>                         "subf_len=%u mss=%u\n",
>                         max_tid_amsdu_len, subf_len, mss);
>
> 2) After applying the fix, run against the problem AP for 1+ day and
>    check if both the TSO explosion AND the UAF are gone.
>
> I also noticed a few secondary defense-in-depth regressions in MLD's
> TX completion path vs MVM:
>
> - MLD's iwl_mld_tx_reclaim_txq() has no per-TID reclaim tracking
>   (MVM has tid_data->next_reclaimed and validates tid_data->txq_id)
> - The transport-level reclaim_lock prevents direct double-free, but
>   MLD is missing MVM's extra safety checks
>
> These are probably not directly causing your crash, but worth noting.
>
> ---
>  drivers/net/wireless/intel/iwlwifi/mld/tx.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> index fbb672f4d8c7..1d47254a4148 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
> @@ -846,6 +846,17 @@ static int iwl_mld_tx_tso_segment(struct iwl_mld *mld, struct sk_buff *skb,
>  	 */
>  	num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
>
> +	/* If the AMSDU length limit is too small to fit even a single
> +	 * subframe (e.g. max_tid_amsdu_len is the sentinel value 1 set by
> +	 * the TLC notification when AMSDU is disabled for this TID), fall
> +	 * back to non-AMSDU TSO segmentation. Without this guard,
> +	 * num_subframes=0 causes gso_size=0 in iwl_tx_tso_segment(),
> +	 * which makes skb_gso_segment() produce tens of thousands of
> +	 * 1-byte segments, overloading the TX ring and completion path.
> +	 */
> +	if (!num_subframes)
> +		return iwl_tx_tso_segment(skb, 1, netdev_flags, mpdus_skbs);
> +
>  	if (sta->max_amsdu_subframes &&
>  	    num_subframes > sta->max_amsdu_subframes)
>  		num_subframes = sta->max_amsdu_subframes;

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
  2026-02-14 18:10 ` Cole Leavitt
  [not found] ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
@ 2026-02-14 18:41 ` Cole Leavitt
  1 sibling, 0 replies; 9+ messages in thread
From: Cole Leavitt @ 2026-02-14 18:41 UTC (permalink / raw)
  To: johannes.berg, miriam.rachel.korenblit
  Cc: greearb, linux-wireless, stable, Cole Leavitt

After a firmware error is detected and STATUS_FW_ERROR is set, NAPI can
still be actively polling or get scheduled from a prior interrupt. The
NAPI poll functions (both legacy and MSIX variants) have no check for
STATUS_FW_ERROR and will continue processing stale RX ring entries from
dying firmware.

This can dispatch TX completion notifications containing corrupt SSN
values to iwl_mld_handle_tx_resp_notif(), which passes them to
iwl_trans_reclaim(). If the corrupt SSN causes reclaim to walk TX queue
entries that were already freed by a prior correct reclaim, the result
is an skb use-after-free or double-free.

The race window opens when the MSIX IRQ handler schedules NAPI (lines
2319-2321 in rx.c) before processing the error bit (lines 2382-2396),
or when NAPI is already running on another CPU from a previous
interrupt when STATUS_FW_ERROR gets set on the current CPU.

Add STATUS_FW_ERROR checks to both NAPI poll functions to prevent
processing stale RX data after firmware error, and add early-return
guards in the TX response and compressed BA notification handlers as
defense-in-depth. Each check uses WARN_ONCE to log if the race is
actually hit, which aids diagnosis of the hard-to-reproduce skb
use-after-free reported on Intel BE200.

Note that _iwl_trans_pcie_gen2_stop_device() already calls
iwl_pcie_rx_napi_sync() to quiesce NAPI during device teardown, but that
runs much later in the restart sequence. These checks close the window
between error detection and device stop.

Fixes: d1e879ec600f ("wifi: iwlwifi: add iwlmld sub-driver")
Cc: stable@vger.kernel.org
Signed-off-by: Cole Leavitt <cole@unwrap.rs>
---
 drivers/net/wireless/intel/iwlwifi/mld/tx.c   | 19 ++++++++++++++++++
 .../wireless/intel/iwlwifi/pcie/gen1_2/rx.c   | 20 +++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mld/tx.c b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
index 3b4b575aadaa..3e99f3ded9bc 100644
--- a/drivers/net/wireless/intel/iwlwifi/mld/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mld/tx.c
@@ -1071,6 +1071,18 @@ void iwl_mld_handle_tx_resp_notif(struct iwl_mld *mld,
 	bool mgmt = false;
 	bool tx_failure = (status & TX_STATUS_MSK) != TX_STATUS_SUCCESS;
 
+	/* Firmware is dead — the TX response may contain corrupt SSN values
+	 * from a dying firmware DMA. Processing it could cause
+	 * iwl_trans_reclaim() to free the wrong TX queue entries, leading to
+	 * skb use-after-free or double-free.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: TX resp notif (sta=%d txq=%d) after FW error\n",
+			  sta_id, txq_id);
+		return;
+	}
+
 	if (IWL_FW_CHECK(mld, tx_resp->frame_count != 1,
 			 "Invalid tx_resp notif frame_count (%d)\n",
 			 tx_resp->frame_count))
@@ -1349,6 +1361,13 @@ void iwl_mld_handle_compressed_ba_notif(struct iwl_mld *mld,
 	u8 sta_id = ba_res->sta_id;
 	struct ieee80211_link_sta *link_sta;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &mld->trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: BA notif (sta=%d) after FW error\n",
+			  sta_id);
+		return;
+	}
+
 	if (!tfd_cnt)
 		return;
 
diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
index 619a9505e6d9..ba18d35fa55d 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c
@@ -1015,6 +1015,18 @@ static int iwl_pcie_napi_poll(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	/* Stop processing RX if firmware has crashed. Stale notifications
+	 * from dying firmware (e.g. TX completions with corrupt SSN values)
+	 * can cause use-after-free in reclaim paths.
+	 */
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n",
@@ -1042,6 +1054,14 @@ static int iwl_pcie_napi_poll_msix(struct napi_struct *napi, int budget)
 	trans_pcie = iwl_netdev_to_trans_pcie(napi->dev);
 	trans = trans_pcie->trans;
 
+	if (unlikely(test_bit(STATUS_FW_ERROR, &trans->status))) {
+		WARN_ONCE(1,
+			  "iwlwifi: NAPI MSIX poll[%d] invoked after FW error\n",
+			  rxq->id);
+		napi_complete_done(napi, 0);
+		return 0;
+	}
+
 	ret = iwl_pcie_rx_handle(trans, rxq->id, budget);
 	IWL_DEBUG_ISR(trans, "[%d] handled %d, budget %d\n", rxq->id, ret,
 		      budget);
--
2.52.0

^ permalink raw reply related [flat|nested] 9+ messages in thread
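
The effect of the STATUS_FW_ERROR check in the poll functions can be
modelled in userspace. This is a minimal sketch under stated
assumptions: struct model_trans and model_napi_poll() are hypothetical
stand-ins, an atomic_bool replaces the status bit, and the RX ring is
reduced to a counter. It omits napi_complete_done() and WARN_ONCE(),
which the patch calls on this path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified transport state. */
struct model_trans {
	atomic_bool fw_error;	/* models the STATUS_FW_ERROR bit */
	int pending;		/* stale RX entries still in the ring */
	int handled;		/* entries dispatched to handlers */
};

/* Models the guarded poll: once the error flag is visible, no further
 * RX entries are dispatched and the poll reports zero work done.
 */
int model_napi_poll(struct model_trans *trans, int budget)
{
	int n;

	assert(budget >= 0);
	if (atomic_load(&trans->fw_error))
		return 0;

	n = trans->pending < budget ? trans->pending : budget;
	trans->pending -= n;
	trans->handled += n;
	return n;
}
```

A poll that runs before the flag is set still consumes entries; one that
runs afterwards leaves the remaining entries untouched, which is the
window the patch aims to close.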
end of thread, other threads:[~2026-02-19 16:38 UTC | newest]
Thread overview: 9+ messages
[not found] <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
2026-02-18 18:47 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
2026-02-19 16:38 ` Ben Greear
[not found] <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>
2026-02-14 18:10 ` Cole Leavitt
[not found] ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
2026-02-14 18:33 ` Cole Leavitt
2026-02-16 18:12 ` Ben Greear
2026-02-18 14:44 ` Cole Leavitt
2026-02-18 14:44 ` Cole Leavitt
2026-02-18 17:35 ` Ben Greear
2026-02-14 18:41 ` Cole Leavitt