Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error

public inbox for linux-wireless@vger.kernel.org
 help / color / mirror / Atom feed

From: Cole Leavitt <cole@unwrap.rs>
To: greearb@candelatech.com
Cc: johannes.berg@intel.com, miriam.rachel.korenblit@intel.com,
	linux-wireless@vger.kernel.org, Cole Leavitt <cole@unwrap.rs>
Subject: Re: [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error
Date: Sat, 14 Feb 2026 11:33:06 -0700	[thread overview]
Message-ID: <20260214183306.10188-1-cole@unwrap.rs> (raw)
In-Reply-To: <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>

Ben,

Good catch on both fronts.

On the build_tfd dangling pointer -- you're right. The failure path at
line 775 leaves entries[idx].skb/cmd pointing at caller-owned objects
(set at lines 763-764). The caller gets -1 and presumably frees the
skb, so entries[idx].skb becomes a dangling pointer. While write_ptr
not advancing means current unmap paths won't iterate to that index,
it's a latent UAF waiting for a flush path change or future code to
touch it. Two NULL stores inside a held spinlock cost nothing. I think
this should go upstream as its own patch.

On the TOCTOU question -- this is the part I spent the most time on.
The window you're asking about is: firmware starts producing corrupt
completion data *before* STATUS_FW_ERROR gets set. Our NAPI/TX handler
checks can't help there because the flag isn't set yet.

The primary guard in that window is iwl_txq_used() in
iwl_pcie_reclaim(). It validates that the firmware's SSN falls within
[read_ptr, write_ptr). This catches wild values -- out-of-range SSNs,
wraparound corruption, etc.

What it can't catch is an in-range corrupt SSN -- e.g., firmware says
reclaim up to index 15 when legitimate is 8, but write_ptr is 20.
That passes bounds checking and the reclaim loop frees skbs for
entries still in-flight (active DMA). The NULL skb WARN_ONCE in the
loop catches double-reclaim but not first-time over-reclaim.

The complete fix for this would be a per-entry generation counter --
tag each entry on submit, validate on reclaim. But that adds per-entry
overhead on the TX hot path to protect against a condition (firmware
producing corrupt completions) that is already terminal. I think the
right trade-off is:

  1. Your build_tfd NULL fix (eliminates one dangling pointer class)
  2. STATUS_FW_ERROR checks in NAPI poll + TX handlers (this series --
     shrinks the detection window to near-zero)
  3. The existing iwl_txq_used() bounds check (catches most corrupt
     SSNs)

Together these make the damage window small enough that a per-entry
generation scheme isn't justified -- by the time firmware is sending
corrupt SSNs, we're in dump-and-reset territory anyway.

That said, if you're seeing corruption patterns in your customer
testing where a valid-looking-but-wrong SSN gets through before
FW_ERROR fires, I'd be very interested in the traces. That would
change the cost/benefit on the generation counter approach.

Thanks,
Cole

next prev parent reply	other threads:[~2026-02-14 18:35 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <c6f886d4-b9ed-48a6-9723-a738af055b64@candelatech.com>
2026-02-14 18:10 ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Cole Leavitt
     [not found]   ` <5be8a502-d53a-4cce-821f-202368c44f6d@candelatech.com>
2026-02-14 18:33     ` Cole Leavitt [this message]
2026-02-16 18:12       ` Ben Greear
2026-02-18 14:44         ` Cole Leavitt
2026-02-18 14:44         ` Cole Leavitt
2026-02-18 14:47         ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Cole Leavitt
2026-02-18 14:47           ` [PATCH 1/1] wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled Cole Leavitt
2026-03-22 12:28             ` Korenblit, Miriam Rachel
2026-03-22 12:29           ` [PATCH 0/1] wifi: iwlwifi: mld: fix TSO segmentation explosion causing UAF Korenblit, Miriam Rachel
2026-02-18 17:35         ` [PATCH] wifi: iwlwifi: prevent NAPI processing after firmware error Ben Greear
2026-02-14 18:41   ` Cole Leavitt
2026-02-14 18:43   ` [PATCH v3] " Cole Leavitt
2026-02-26 19:37     ` Ben Greear
     [not found] <7f72ac08-6b4a-486b-a8f9-7b78ea0f5ae1@candelatech.com>
2026-02-18 18:47 ` [PATCH] " Cole Leavitt
2026-02-19 16:38   ` Ben Greear

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260214183306.10188-1-cole@unwrap.rs \
    --to=cole@unwrap.rs \
    --cc=greearb@candelatech.com \
    --cc=johannes.berg@intel.com \
    --cc=linux-wireless@vger.kernel.org \
    --cc=miriam.rachel.korenblit@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox