From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cole Leavitt
To: Ben Greear
Cc: linux-wireless@vger.kernel.org
Subject: Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
Date: Wed, 18 Feb 2026 11:47:49 -0700
Message-ID: <20260218184749.22675-1-cole@unwrap.rs>
X-Mailer: git-send-email 2.52.0
In-Reply-To: <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
References: <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
X-Mailing-List: linux-wireless@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Ben,

Thanks for the historical context. I dug through the git history and your
linux-ct repos to verify exactly what happened when. I want to make sure I
have this right - can you confirm whether this matches what you saw?

2018 Bug (Bug 199209)
---------------------

Fixed by Emmanuel in commit 0eac9abace16 ("iwlwifi: mvm: fix TX of AMSDU
with fragmented SKBs"). That was a different trigger - NFS created highly
fragmented SKBs where nr_frags was so high that the buffer descriptor limit
check produced num_subframes=0. Emmanuel's fix clamps that path to 1.

Current MLD Bug
---------------

Different path to the same symptom. When TLC disables AMSDU for a TID, both
MVM and MLD set max_tid_amsdu_len[tid] = 1 as a sentinel value.

The key difference in protection:

MVM has a private mvmsta->amsdu_enabled bitmap that gates the entire AMSDU
path:

    if (!mvmsta->amsdu_enabled)
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

    if (!(mvmsta->amsdu_enabled & BIT(tid)))
        return iwl_tx_tso_segment(skb, 1, ...);  // bail out early

MVM never reads max_tid_amsdu_len in its TX path - it uses its own
mvmsta->max_amsdu_len. This bitmap was added in commit 84226ca1c5d3
("iwlwifi: mvm: enable AMSDU for all TIDs", Nov 2017).

MLD was designed to use mac80211's sta->cur->max_tid_amsdu_len directly,
with no equivalent bitmap:

    max_tid_amsdu_len = sta->cur->max_tid_amsdu_len[tid];
    if (!max_tid_amsdu_len)  // only catches 0, not sentinel 1!
        return iwl_tx_tso_segment(skb, 1, ...);

    num_subframes = (max_tid_amsdu_len + pad) / (subf_len + pad);
    // When max_tid_amsdu_len=1: num_subframes = (1 + 3) / (1534 + 3) = 0

What I found in your repos:

- linux-ct-6.5-be200, linux-ct-6.10, linux-ct-6.14: No MLD driver, only MVM
  with amsdu_enabled bitmap protection
- linux-ct-6.15, linux-ct-6.18: Have MLD driver
  (drivers/net/wireless/intel/iwlwifi/mld/)
- backport-iwlwifi: MLD tx.c first appeared in commit 56f903a89 (2024-07-17)

So MVM should have been immune to this specific sentinel-value bug due to
the bitmap check.

Question for you: When you saw TSO segment explosions in 2024, what kernel
and driver were you using? If it was one of your 6.5-6.14 kernels with MVM,
then there may be a different path to num_subframes=0 that I haven't
identified yet. If you were using backport-iwlwifi with MLD enabled, that
would explain it hitting the same bug I'm fixing now.

The commit ae6d30a71521 (Feb 2024) added better error reporting for
skb_gso_segment failures, which suggests people were hitting GSO segment
errors around that time - but I don't have visibility into what specific
trigger you hit.

My fix catches the sentinel-induced zero after the calculation, which is
equivalent to what MVM's bitmap check accomplishes. This should prevent the
current MLD bug from reaching skb_gso_segment with gso_size=0.

Looking forward to your test results with the problem AP, and any
clarification on what setup you were using in 2024.

Cole
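
For reference, the sentinel arithmetic described above can be checked
outside the driver. What follows is a minimal standalone sketch, not the
driver code and not the patch itself: calc_num_subframes() and
calc_num_subframes_guarded() are hypothetical helper names, pad=3 and
subf_len=1534 are taken from the worked example in this mail, and the guard
is only an assumed shape for "catch the zero after the calculation".

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the division quoted in the mail.
 * Unsigned integer division truncates, so a tiny sentinel length
 * yields 0 subframes.
 */
unsigned int calc_num_subframes(unsigned int max_tid_amsdu_len,
                                unsigned int subf_len,
                                unsigned int pad)
{
    return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/*
 * Assumed shape of a post-calculation guard: if the division
 * produced 0, fall back to a single subframe so that no caller
 * ever sees a zero count. The real patch may look different.
 */
unsigned int calc_num_subframes_guarded(unsigned int max_tid_amsdu_len,
                                        unsigned int subf_len,
                                        unsigned int pad)
{
    unsigned int n = calc_num_subframes(max_tid_amsdu_len, subf_len, pad);

    return n ? n : 1;
}

/* Print both results side by side for one input length. */
void show_subframes(unsigned int max_tid_amsdu_len)
{
    printf("len=%u unguarded=%u guarded=%u\n",
           max_tid_amsdu_len,
           calc_num_subframes(max_tid_amsdu_len, 1534, 3),
           calc_num_subframes_guarded(max_tid_amsdu_len, 1534, 3));
}
```

With the sentinel value, show_subframes(1) reports unguarded=0 and
guarded=1, since (1 + 3) / (1534 + 3) truncates to 0. A larger length such
as 7935 (an illustrative value, not taken from the thread) is unaffected by
the guard: both variants give 5.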