From: Ben Greear <greearb@candelatech.com>
To: Cole Leavitt <cole@unwrap.rs>
Cc: linux-wireless@vger.kernel.org
Subject: Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
Date: Thu, 19 Feb 2026 08:38:36 -0800
Message-ID: <5ff61419-0d65-7cdf-4033-afb43df03a2e@candelatech.com>
In-Reply-To: <20260218184749.22675-1-cole@unwrap.rs>
On 2/18/26 10:47, Cole Leavitt wrote:
> Ben,
>
> Thanks for the historical context. I dug through the git history and
> your linux-ct repos to verify exactly what happened when. I want to
> make sure I have this right - can you confirm whether this matches
> what you saw?

The bug was originally seen in a mainline kernel, before the MLD driver was
forked off from mvm, not in a backports kernel.

Adding your patch below didn't solve the UAF in the tcp_ack path, at least.
My debugging did not indicate that the code path in the patch was taken.
I have not seen any more instances of the 32k loops in the packet segment
loop in the last crash, so at least that is not the only reason why a UAF
would happen.

The problem reproduced overnight was:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] SMP
CPU: 12 UID: 0 PID: 1234 Comm: irq/345-iwlwifi Tainted: G S O 6.18.9+ #53 PREEMPT(full)
Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
Hardware name: Default string /Default string, BIOS 5.27 11/12/2024
RIP: 0010:rb_erase+0x173/0x350
Code: 08 48 8b 01 a8 01 75 97 48 83 c0 01 48 89 01 c3 c3 48 89 46 10 e9 27 ff ff ff 48 8b 56 10 48 8d 41 01 48 89 51 08 48 89 4e 10 <48> 89 02 48 8b 01 48 89 06
48 89 31 48 83 f8 03 0f 86 8e 00 00 00
RSP: 0018:ffffc9000038c820 EFLAGS: 00010246
RAX: ffff8881b0646601 RBX: 000000000000000c RCX: ffff8881b0646600
RDX: 0000000000000000 RSI: ffff8881e9cbea00 RDI: ffff8881b0646200
------------[ cut here ]------------
RBP: ffff8881b0646200 R08: ffff8881ce443108 R09: 0000000080200001
R10: 0000000000010000 R11: 00000000f0eaffb7 R12: ffff8881ce442f80
R13: 0000000000000004 R14: ffff8881b0646600 R15: 0000000000000001
refcount_t: underflow; use-after-free.
FS: 0000000000000000(0000) GS:ffff8888dc42e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000005a36002 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
WARNING: CPU: 0 PID: 1224 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0
<IRQ>
Modules linked in:
tcp_ack+0x635/0x16e0
nf_conntrack_netlink
tcp_rcv_established+0x211/0xc10
nf_conntrack
? sk_filter_trim_cap+0x1a7/0x350
nfnetlink
tcp_v4_do_rcv+0x1bf/0x350
tls
tcp_v4_rcv+0xddf/0x1550
vrf
? lock_timer_base+0x6d/0x90
nf_defrag_ipv6
? raw_local_deliver+0xcc/0x280
nf_defrag_ipv4
ip_protocol_deliver_rcu+0x20/0x130
8021q
ip_local_deliver_finish+0x85/0xf0
garp
ip_sublist_rcv_finish+0x35/0x50
mrp
ip_sublist_rcv+0x16f/0x200
stp
ip_list_rcv+0xfe/0x130
llc
__netif_receive_skb_list_core+0x183/0x1f0
macvlan
netif_receive_skb_list_internal+0x1c8/0x2a0
wanlink(O)
gro_receive_skb+0x12e/0x210
pktgen
ieee80211_rx_napi+0x82/0xc0 [mac80211]
rpcrdma
iwl_mld_rx_mpdu+0xd0f/0xf00 [iwlmld]
rdma_cm
iwl_pcie_rx_handle+0x394/0xa00 [iwlwifi]
iw_cm
iwl_pcie_napi_poll_msix+0x3f/0x110 [iwlwifi]
ib_cm
__napi_poll+0x25/0x1e0
ib_core
net_rx_action+0x2d3/0x340
qrtr
I have enough guard/debugging logic in place that I'm pretty sure the skb coming
from iwlwifi in this particular path is fine. It appears the problem is that
there is an already freed skb in the socket's skb collection, and code blows up
trying to access a bad rbtree link, or something. I'm continuing to try to narrow
down where skb goes bad, but it seems like probably some other thread of logic is
racing to free the skb since the crash site moves around a lot. Maybe I can add
some sort of debugging to warn if skb is freed while in an rbtree...
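
A userspace sketch of what such a check could look like, purely as an
illustration of the idea. Everything here (struct fake_skb, the in_rbtree
flag, the fake_* helpers) is hypothetical and not a kernel API. In the real
struct sk_buff the rbnode shares storage with the next/prev list pointers,
so a kernel version would need its own way to track tree membership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb; NOT the kernel's struct sk_buff. */
struct fake_skb {
	int id;
	bool in_rbtree;	/* set on insert, cleared on erase */
};

/* Number of frees that happened while the object was still linked. */
int freed_while_queued;

struct fake_skb *fake_skb_alloc(int id)
{
	struct fake_skb *skb = calloc(1, sizeof(*skb));

	assert(skb != NULL);
	skb->id = id;
	return skb;
}

/* Stand-ins for the tree insert/erase paths: they only track membership. */
void fake_rb_insert(struct fake_skb *skb) { skb->in_rbtree = true; }
void fake_rb_erase(struct fake_skb *skb) { skb->in_rbtree = false; }

/* Free path with the debug check; returns true if the free was suspicious. */
bool fake_kfree_skb(struct fake_skb *skb)
{
	bool bad = skb->in_rbtree;

	if (bad) {
		freed_while_queued++;
		fprintf(stderr, "WARN: skb %d freed while still in rbtree\n",
			skb->id);
	}
	free(skb);
	return bad;
}
```

The point of warning in the free path is that it would flag the thread that
frees the object, rather than the later victim that walks the stale link.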
Thanks,
Ben
>
> 2018 Bug (Bug 199209)
> ---------------------
> Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of
> AMSDU with fragmented SKBs"). That was a different trigger - NFS
> created highly fragmented SKBs where nr_frags was so high that the
> buffer descriptor limit check produced num_subframes=0. Emmanuel's
> fix clamps that path to 1.
>
> Current MLD Bug
> ---------------
> Different path to the same symptom. When TLC disables AMSDU for a
> TID, both MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel
> value. The key difference in protection:
>
> MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire
> AMSDU path:
>
>     if (!mvmsta->amsdu_enabled)
>         return iwl_tx_tso_segment(skb, 1, ...);  // bail out early
>
>     if (!(mvmsta->amsdu_enabled & BIT(tid)))
>         return iwl_tx_tso_segment(skb, 1, ...);  // bail out early
>
> MVM never reads max_tid_amsdu_len in its TX path - it uses its own
> mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
> ("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).
>
> MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len
> directly, with no equivalent bitmap:
>
>     max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
>     if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
>         return iwl_tx_tso_segment(skb, 1, ...);
>
>     num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
>     // When max_tid_amsdu_len=1: num_subframes = (1 + 3) / (1534 + 3) = 0
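
The integer arithmetic quoted above can be checked in isolation with a small
standalone sketch. calc_num_subframes is a hypothetical helper name, not a
driver function; the values 1534 and 3 are taken from the quoted comment.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the quoted expression. Unsigned integer
 * division truncates, so any limit smaller than one padded subframe
 * yields 0.
 */
unsigned int calc_num_subframes(unsigned int max_tid_amsdu_len,
				unsigned int subf_len, unsigned int pad)
{
	assert(subf_len + pad > 0);	/* avoid dividing by zero */
	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}
```

With the sentinel, calc_num_subframes(1, 1534, 3) is 4 / 1537, which
truncates to 0, while a limit of 1534 gives exactly 1.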
>
> What I found in your repos:
>
> - linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver,
>   only MVM with amsdu_enabled bitmap protection
> - linux-ct-6.15, linux-ct-6.18: Have MLD driver
>   (drivers/net/wireless/intel/iwlwifi/mld/)
> - backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89
>   (2024-07-17)
>
> So MVM should have been immune to this specific sentinel-value bug
> due to the bitmap check.
>
> Question for you: When you saw TSO segment explosions in 2024, what
> kernel and driver were you using? If it was one of your 6.5-6.14
> kernels with MVM, then there may be a different path to
> num_subframes=0 that I haven't identified yet. If you were using
> backport-iwlwifi with MLD enabled, that would explain it hitting the
> same bug I'm fixing now.
>
> The commit ae6d30a71521 (Feb 2024) added better error reporting for
> skb_gso_segment failures, which suggests people were hitting GSO
> segment errors around that time - but I don't have visibility into
> what specific trigger you hit.
>
> My fix catches the sentinel-induced zero after the calculation, which
> is equivalent to what MVM's bitmap check accomplishes. This should
> prevent the current MLD bug from reaching skb_gso_segment with
> gso_size=0.
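
A guard of the kind described, checking for zero after the division, might
look roughly like the sketch below. This is not the actual patch, which is
not shown in this message; safe_num_subframes is a hypothetical name, and
falling back to 1 is an assumption modeled on the
iwl_tx_tso_segment(skb, 1, ...) bail-out quoted earlier.

```c
#include <assert.h>

/*
 * Hypothetical sketch: compute the subframe count, but never return 0.
 * A result of 1 stands for "send without aggregation".
 */
unsigned int safe_num_subframes(unsigned int max_tid_amsdu_len,
				unsigned int subf_len, unsigned int pad)
{
	unsigned int n;

	assert(subf_len + pad > 0);	/* avoid dividing by zero */

	if (!max_tid_amsdu_len)		/* existing check: limit of 0 */
		return 1;

	n = (max_tid_amsdu_len + pad) / (subf_len + pad);
	if (!n)				/* sentinel 1, or any too-small limit */
		return 1;

	return n;
}
```

Checking the result rather than the input also covers any other limit that
is smaller than one padded subframe, not just the sentinel value 1.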
>
> Looking forward to your test results with the problem AP, and any
> clarification on what setup you were using in 2024.
>
> Cole
>
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com