public inbox for linux-wireless@vger.kernel.org
From: Cole Leavitt <cole@unwrap.rs>
To: linux-wireless@vger.kernel.org
Cc: greearb@candelatech.com, miriam.rachel.korenblit@intel.com,
	johannes@sipsolutions.net, cole@unwrap.rs
Subject: [PATCH v3 0/3] wifi: iwlwifi: mld: stability fixes around firmware error recovery
Date: Mon, 20 Apr 2026 10:44:03 -0700
Message-ID: <20260420174406.128254-1-cole@unwrap.rs>

Three fixes for the iwlmld sub-driver observed on Intel BE200 (Wi-Fi 7).

1/3 adds STATUS_FW_ERROR guards to the NAPI poll functions and to the
    TX/BA notification handlers.  iwl_trans_reclaim() already has its
    own STATUS_FW_ERROR early-return, so these are defense-in-depth
    with WARN_ONCE instrumentation: if the suspected post-FW-error race
    fires, we now catch it early (before reclaim) and log it.  Tested by
    Ben Greear, who confirmed the WARN fires on his systems during
    firmware error recovery.

2/3 fixes a real TSO segmentation bug.  When the TLC notification
    disables AMSDU for a TID, max_tid_amsdu_len is set to the sentinel
    value 1 (see mld/tlc.c line 858).  The existing zero-only check in
    iwl_mld_tx_tso_segment() lets this sentinel through, producing
    num_subframes=0, which feeds gso_size=0 into iwl_tx_tso_segment()
    and downstream skb_gso_segment().

    MVM is immune because it gates on mvmsta->amsdu_enabled; MLD has no
    equivalent bitmap.  Fix by also treating the sentinel 1 as "AMSDU
    disabled" at the existing guard, and add a WARN_ON_ONCE(!num_subframes)
    after the division so any future path that produces zero through a
    different mechanism is caught and reported rather than silently
    creating a pathological GSO skb.

3/3 adds STATUS_FW_ERROR checks at the top of iwl_mld_tx_from_txq() and
    iwl_mld_mac80211_tx() to stop pulling frames for dead firmware,
    eliminating an observed soft lockup during firmware error recovery.
    Revised per Johannes Berg's feedback to use status-bit checks rather
    than ieee80211_stop_queues()/wake_queues() which do not interact
    well with TXQ-based APIs.
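
For reviewers who want to check the 2/3 arithmetic without a device, here
is a minimal userspace sketch.  It assumes num_subframes is obtained by
integer division of the per-TID length limit by the padded subframe
length, as the description of 2/3 implies.  The function names, the
14-byte subframe header constant, and the header sizes used in the usage
note are illustrative assumptions, not copies of the driver code.

```c
/* Assumed length of the Ethernet-style A-MSDU subframe header. */
#define MODEL_SUBFRAME_HDR_LEN 14u

/*
 * Model of the subframe count: an A-MSDU with N subframes occupies
 * N * subf_len + (N - 1) * pad bytes, so
 * N = (max_len + pad) / (subf_len + pad) with integer division.
 */
static unsigned int model_num_subframes(unsigned int max_tid_amsdu_len,
					unsigned int snap_ip_tcp,
					unsigned int mss)
{
	unsigned int subf_len = MODEL_SUBFRAME_HDR_LEN + snap_ip_tcp + mss;
	unsigned int pad = (4 - subf_len) & 0x3;	/* pad to 4 bytes */

	return (max_tid_amsdu_len + pad) / (subf_len + pad);
}

/* gso_size as described in 2/3: subframe count times MSS. */
static unsigned int model_gso_size(unsigned int num_subframes,
				   unsigned int mss)
{
	return num_subframes * mss;
}

/* Guard as revised in 2/3: 0 and the sentinel 1 both mean "no A-MSDU". */
static int model_amsdu_disabled(unsigned int max_tid_amsdu_len)
{
	return max_tid_amsdu_len <= 1;
}
```

With an assumed 48 bytes of SNAP + IPv4 + TCP headers and an MSS of 1448,
model_num_subframes(1, 48, 1448) is 0, so model_gso_size() is also 0,
while a limit of 7935 gives 5 subframes.  A zero-only check does not
catch the sentinel; model_amsdu_disabled() does.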

Changes since v2:
  - 1/3:
      * Stripped inadvertent clang-format style churn from v2; the diff
        is now only the functional STATUS_FW_ERROR guards (four hunks,
        ~30 lines added).
      * Rewrote commit message: the v2 message claimed a proven
        SSN-corruption -> iwl_trans_reclaim() UAF chain, but
        iwl_trans_reclaim() already checks STATUS_FW_ERROR itself
        (iwl-trans.c:~663), so that chain cannot actually reach the
        queue walk.  The patch is more accurately described as
        "earlier STATUS_FW_ERROR guards with WARN_ONCE instrumentation
        for diagnosis of suspected post-FW-error NAPI scheduling."
      * Kept Tested-by: Ben Greear.

  - 2/3:
      * Removed the speculative "TCP retransmit queue UAF / refcount
        underflow in tcp_shifted_skb / NULL deref in tcp_rack_detect_loss"
        chain from the commit message per Ben Greear's feedback.  Those
        symptoms are real, but the causal link to this bug was not
        directly traced; describing them as consequences of this bug
        was overclaiming.
      * Commit message now states only what can be traced in-tree:
        sentinel 1 -> num_subframes=0 -> gso_size=0 -> unbounded
        skb_gso_segment() output.  Downstream symptom attribution is
        left for the separate investigation Ben and I have underway.
      * Code is unchanged from v2.

  - 3/3: Unchanged from v2 beyond rebase context.
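
The "earlier STATUS_FW_ERROR guards with WARN_ONCE instrumentation"
wording for 1/3 corresponds roughly to the pattern below.  This is a
userspace sketch only: struct model_trans, model_warn_once() and
model_handle_tx_notif() are hypothetical stand-ins, and the boolean
fw_error field stands in for a test of the STATUS_FW_ERROR bit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the transport state; not the driver's struct. */
struct model_trans {
	bool fw_error;		/* models the STATUS_FW_ERROR status bit */
	unsigned int reclaimed;	/* frames passed on to the modeled reclaim */
};

/*
 * Userspace approximation of WARN_ONCE(): print on the first hit only,
 * and always return the condition so the caller can bail out on it.
 */
static bool model_warn_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Modeled TX notification handler: refuse to reclaim after a FW error. */
static int model_handle_tx_notif(struct model_trans *trans,
				 unsigned int frames)
{
	static bool warned;

	if (model_warn_once(trans->fw_error, &warned,
			    "TX notification after firmware error"))
		return -1;

	trans->reclaimed += frames;
	return 0;
}
```

The point of returning the condition from the warn helper is that the
guard and the diagnostic stay in one place: the first late notification
is logged, and every late notification is dropped before the reclaim
step.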

To Miriam's question in the v2 thread ("Was the soft lockup happening
as a consequence of the bug fixed in 2/3?"):  Yes, our typical trace is
2/3's GSO explosion -> firmware receives malformed AMSDU descriptors
-> firmware hangs in an MMIO poll (FSEQ_ERROR_CODE 0x67A00000,
SYSTEM_STATISTICS_CMD timeout) -> 3/3's dead-firmware TX path keeps
spinning -> soft lockup.  The full dmesg is attached to a follow-up on
the v2 thread so the Intel firmware team can investigate the c102 MMIO
poll hang separately; the kernel-side chain is independently
reproducible with a small test case.
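
The last link of that chain, and what 3/3 changes about it, can be
sketched in userspace as follows.  This is a simplified model under
stated assumptions: model_fw, model_txq and model_tx_from_txq() are
hypothetical names, the fw_error field stands in for the
STATUS_FW_ERROR bit, and locking and restart handling are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the driver's or mac80211's structures. */
struct model_txq {
	size_t pending;		/* frames waiting in the software queue */
};

struct model_fw {
	bool fw_error;		/* models the STATUS_FW_ERROR status bit */
	size_t sent;		/* frames handed to the modeled firmware */
};

/*
 * Modeled TX pull loop.  With the status check at the top, a dead
 * firmware means no frame is dequeued at all, so the caller cannot
 * keep spinning on a queue the firmware will never drain.
 */
static size_t model_tx_from_txq(struct model_fw *fw, struct model_txq *txq)
{
	size_t pulled = 0;

	if (fw->fw_error)
		return 0;	/* leave frames queued for after recovery */

	while (txq->pending) {
		txq->pending--;
		fw->sent++;
		pulled++;
	}

	return pulled;
}
```

Checking a status bit on entry, rather than stopping and waking the
queues, keeps the queued frames where they are and does not depend on
queue stop/wake state, which is the direction Johannes suggested for
TXQ-based drivers.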

Cole Leavitt (3):
  wifi: iwlwifi: add STATUS_FW_ERROR guards to NAPI/TX-notif paths
  wifi: iwlwifi: mld: fix TSO segmentation when AMSDU is disabled
  wifi: iwlwifi: mld: skip TX when firmware is dead

 drivers/net/wireless/intel/iwlwifi/mld/mac80211.c  |  4 ++++
 drivers/net/wireless/intel/iwlwifi/mld/tx.c        | 22 ++++++++++++++++++++++
 drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/rx.c| 18 ++++++++++++++++++
 3 files changed, 44 insertions(+)

base-commit: 3aae9383f42f687221c011d7ee87529398e826b3
-- 
2.52.0

