* [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by
@ 2025-07-14 12:21 Johannes Berg
2025-07-15 5:04 ` Kees Cook
0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2025-07-14 12:21 UTC (permalink / raw)
To: linux-wireless
Cc: Kees Cook, Haoyu Li, Johannes Berg, syzbot+e834e757bd9b3d3e1251
From: Johannes Berg <johannes.berg@intel.com>
This reverts commit e3eac9f32ec0 ("wifi: cfg80211: Annotate struct
cfg80211_scan_request with __counted_by").
This really has been a completely failed experiment. There were
no actual bugs found, and yet at this point we already have four
"fixes" to it, with nothing to show for but code churn, and it
never even made the code any safer.
In all of the cases that ended up getting "fixed", the structure
is also internally inconsistent after the n_channels setting as
the channel list isn't actually filled yet. You cannot scan with
such a structure, that's just wrong. In mac80211, the struct is
also reused multiple times, so initializing it once is no good.
Some previous "fixes" (e.g. one in brcm80211) are also just setting
n_channels before accessing the array, under the assumption that the
code is correct and the array can be accessed, further showing that
the whole thing is just pointless when the allocation count and use
count are not separate.
If we really wanted to fix it, we'd need to separately track the
number of channels allocated and the number of channels currently
used, but given that no bugs were found despite the numerous syzbot
reports, that'd just be a waste of time.
Remove the __counted_by() annotation. We really should also remove
a number of the n_channels settings that are setting up a structure
that's inconsistent, but that can wait.
Reported-by: syzbot+e834e757bd9b3d3e1251@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e834e757bd9b3d3e1251
Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
include/net/cfg80211.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index d1848dc8ec99..10248d527616 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2690,7 +2690,7 @@ struct cfg80211_scan_request {
s8 tsf_report_link_id;
/* keep last */
- struct ieee80211_channel *channels[] __counted_by(n_channels);
+ struct ieee80211_channel *channels[];
};
static inline void get_random_mask_addr(u8 *buf, const u8 *addr, const u8 *mask)
--
2.50.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by
2025-07-14 12:21 [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by Johannes Berg
@ 2025-07-15 5:04 ` Kees Cook
2025-07-15 8:24 ` Johannes Berg
0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2025-07-15 5:04 UTC (permalink / raw)
To: Johannes Berg
Cc: linux-wireless, Haoyu Li, Johannes Berg,
syzbot+e834e757bd9b3d3e1251
On Mon, Jul 14, 2025 at 02:21:30PM +0200, Johannes Berg wrote:
> If we really wanted to fix it, we'd need to separately track the
> number of channels allocated and the number of channels currently
> used, but given that no bugs were found despite the numerous syzbot
> reports, that'd just be a waste of time.
This mismatch between "quantity allocated" and "quantity used from the
allocation" is repeated in more places that just wifi, and I'd agree
that it has caused some confusion. The intent of __counted_by is to
track the _allocation_, so my mistake was trying to apply it in places
where the allocation size is not retained, and to shoe-horn it into the
"used" tracking member.
Any opposition to adding such a field here, maybe "avail_channels"?
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by
2025-07-15 5:04 ` Kees Cook
@ 2025-07-15 8:24 ` Johannes Berg
2025-07-18 0:05 ` Kees Cook
2025-07-21 18:36 ` Review of __counted_by in wireless (was Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by) Kees Cook
0 siblings, 2 replies; 7+ messages in thread
From: Johannes Berg @ 2025-07-15 8:24 UTC (permalink / raw)
To: Kees Cook; +Cc: linux-wireless, Haoyu Li, syzbot+e834e757bd9b3d3e1251
On Mon, 2025-07-14 at 22:04 -0700, Kees Cook wrote:
> On Mon, Jul 14, 2025 at 02:21:30PM +0200, Johannes Berg wrote:
> > If we really wanted to fix it, we'd need to separately track the
> > number of channels allocated and the number of channels currently
> > used, but given that no bugs were found despite the numerous syzbot
> > reports, that'd just be a waste of time.
>
> This mismatch between "quantity allocated" and "quantity used from the
> allocation" is repeated in more places that just wifi, and I'd agree
> that it has caused some confusion. The intent of __counted_by is to
> track the _allocation_, so my mistake was trying to apply it in places
> where the allocation size is not retained, and to shoe-horn it into the
> "used" tracking member.
I'd expect in most cases it's really the same - you allocate, fill, and
never touch it again before throwing it away at the end. I'd argue
though that in those cases the whole thing is quite pointless, although
it still allows finding out-of-bounds reads.
> Any opposition to adding such a field here, maybe "avail_channels"?
I guess fundamentally not, however, as described here, I don't think it
has actually all _that_ useful. We haven't found a single _real_ bug
with it in the two years this has been around, and quite frequently had
false positives. Now, tracking the allocation size would hopefully not
get into the false positives again but ...
Also, I still have 11 other such annotations that are probably at least
somewhat wrong in the same ways, and I guess it'll fall on me to review
them, since before forming an opinion on how it should be used I just
merged the changes, naively trusting you (and others.) The scheduled
scan for example is likely in a similar situation, though in that case
maybe we don't reuse the allocations as much.
So for now: no, I'm not going to apply any new counted_by() annotations.
It's cost me far too much time for having absolutely nothing to show for
the investment. Ask me again again next year maybe.
If you feel motivated you could help review and perhaps annotate the
still existing ones I guess, I'm thinking we should have comments there
like this perhaps:
--- a/net/mac80211/parse.c
+++ b/net/mac80211/parse.c
@@ -54,8 +54,9 @@ struct ieee80211_elems_parse {
* scratch buffer that can be used for various element parsing related
* tasks, e.g., element de-fragmentation etc.
*/
- size_t scratch_len;
u8 *scratch_pos;
+ /* __counted_by: scratch_len tracks the allocation and doesn't change */
+ size_t scratch_len;
u8 scratch[] __counted_by(scratch_len);
};
which also helps for otherwise understanding how scratch_len is used.
But I'm also not completely sure I've convinced myself that all the
above discussion about allocated vs. used is really the _entire_
explanation for it being such a spectacular failure here.
johannes
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by
2025-07-15 8:24 ` Johannes Berg
@ 2025-07-18 0:05 ` Kees Cook
2025-07-18 8:26 ` Johannes Berg
2025-07-21 18:36 ` Review of __counted_by in wireless (was Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by) Kees Cook
1 sibling, 1 reply; 7+ messages in thread
From: Kees Cook @ 2025-07-18 0:05 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless, Haoyu Li, syzbot+e834e757bd9b3d3e1251
On Tue, Jul 15, 2025 at 10:24:16AM +0200, Johannes Berg wrote:
> I'd expect in most cases it's really the same - you allocate, fill, and
> never touch it again before throwing it away at the end. I'd argue
> though that in those cases the whole thing is quite pointless, although
> it still allows finding out-of-bounds reads.
Adding counted_by has filled a large gap in the compiler's ability to
provide the bounds checking. Back in 2022 I had mostly written it off as
"unsolvable" with dynamic sizes being over 60% of the uninstrumentable
memcpy() destinations: https://outflux.net/slides/2022/lss-na/#/6
But thanks to counted_by, we can actually chip away at that percentage
and it's fallen under 50% now, which is nice. :)
> So for now: no, I'm not going to apply any new counted_by() annotations.
> It's cost me far too much time for having absolutely nothing to show for
> the investment. Ask me again again next year maybe.
Yup, totally understood. My desire to get them applied was mainly due to
there being so many (fixed size) array bounds problems in various wifi
drivers that it felt like a good place to focus: dynamic sizes in the
core wifi code. From the same slides above, you can arrow-down through
all of the then-recent array bounds flaws and excepting one in BT,
they're all in wifi: https://outflux.net/slides/2022/lss-na/#/1/6
> If you feel motivated you could help review and perhaps annotate the
> still existing ones I guess, I'm thinking we should have comments there
> like this perhaps:
>
> --- a/net/mac80211/parse.c
> +++ b/net/mac80211/parse.c
> @@ -54,8 +54,9 @@ struct ieee80211_elems_parse {
> * scratch buffer that can be used for various element parsing related
> * tasks, e.g., element de-fragmentation etc.
> */
> - size_t scratch_len;
> u8 *scratch_pos;
> + /* __counted_by: scratch_len tracks the allocation and doesn't change */
> + size_t scratch_len;
> u8 scratch[] __counted_by(scratch_len);
> };
Sure, I can add this to the TODO list.
> But I'm also not completely sure I've convinced myself that all the
> above discussion about allocated vs. used is really the _entire_
> explanation for it being such a spectacular failure here.
I think it's a big part of it -- having the counted member change after
initial assignment is even frowned upon by the compiler folks, but is
technically supported.
Anyway, sorry again for the wasting of time of yours (and others) that I
caused with this -- I really wasn't expecting it to go that way, and it
hasn't been anywhere near as troublesome in other areas of the kernel,
so it took me by surprise. I have tried to chase down and fix the glitches
when I became aware of them, FWIW.
I'll see if I can write up some patches for comments like you suggest
above with good "proof" attached to them. :)
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by
2025-07-18 0:05 ` Kees Cook
@ 2025-07-18 8:26 ` Johannes Berg
0 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2025-07-18 8:26 UTC (permalink / raw)
To: Kees Cook; +Cc: linux-wireless, Haoyu Li
On Thu, 2025-07-17 at 17:05 -0700, Kees Cook wrote:
> > But I'm also not completely sure I've convinced myself that all the
> > above discussion about allocated vs. used is really the _entire_
> > explanation for it being such a spectacular failure here.
>
> I think it's a big part of it -- having the counted member change after
> initial assignment is even frowned upon by the compiler folks, but is
> technically supported.
Right. So since I wrote the above, I've come to think that while it'd
almost certainly fix this issue, the num_channels had another side
effect of validating the *read* side in cases like this. While we may
allocate 25 channels, if we're only using 2 then reading the 10th would
be a bug since it's not been initialized (OK, it's probably NULL, but
still not filled to a valid value.)
So in some way, you almost want to be able to separate the two and write
__counted_by(write=n_alloc, read=n_used), but I guess I don't see
compilers supporting that any time.
I guess the only other way around it would be to re-allocate it all the
time, but that's also annoying in other ways (possibly context, copying
other large things in it, etc.)
> Anyway, sorry again for the wasting of time of yours (and others) that I
> caused with this -- I really wasn't expecting it to go that way, and it
> hasn't been anywhere near as troublesome in other areas of the kernel,
> so it took me by surprise. I have tried to chase down and fix the glitches
> when I became aware of them, FWIW.
Sure, I know, and thanks for that.
> I'll see if I can write up some patches for comments like you suggest
> above with good "proof" attached to them. :)
Thanks, help re-reviewing these is much appreciated.
johannes
^ permalink raw reply [flat|nested] 7+ messages in thread
* Review of __counted_by in wireless (was Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by)
2025-07-15 8:24 ` Johannes Berg
2025-07-18 0:05 ` Kees Cook
@ 2025-07-21 18:36 ` Kees Cook
2025-07-21 18:52 ` Johannes Berg
1 sibling, 1 reply; 7+ messages in thread
From: Kees Cook @ 2025-07-21 18:36 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless, linux-hardening
On Tue, Jul 15, 2025 at 10:24:16AM +0200, Johannes Berg wrote:
> If you feel motivated you could help review and perhaps annotate the
> still existing ones I guess, I'm thinking we should have comments there
> like this perhaps:
>
> --- a/net/mac80211/parse.c
> +++ b/net/mac80211/parse.c
> @@ -54,8 +54,9 @@ struct ieee80211_elems_parse {
> * scratch buffer that can be used for various element parsing related
> * tasks, e.g., element de-fragmentation etc.
> */
> - size_t scratch_len;
> u8 *scratch_pos;
> + /* __counted_by: scratch_len tracks the allocation and doesn't change */
> + size_t scratch_len;
> u8 scratch[] __counted_by(scratch_len);
> };
This seemed like a good project to compare my manual review against a
review performed by the LLM I've been playing with, and it did okay. It
missed several dynamic cases and I had to prod it into getting them
right, and it couldn't find some allocation patterns on its own. It did
spontaneously find the le/ge counted_by variants, which was nice. Anyway,
here are the results...
Static Counters (28 structures)
Almost all follow the pattern:
instance = alloc(...N...);
instance->counter = N;
where counter is set once and never changes. (Some are copying existing
data and never changing the counter variable.)
- struct ath10k_ce_ring (drivers/net/wireless/ath/ath10k/)
void *per_transfer_context[] __counted_by(nentries)
- struct wmi_set_link_monitor_cmd (drivers/net/wireless/ath/wil6210/)
s8 rssi_thresholds_list[] __counted_by(rssi_thresholds_list_size)
- struct mt76_rx_tid (drivers/net/wireless/mediatek/mt76/)
struct sk_buff *reorder_buf[] __counted_by(size)
- struct p54_cal_database (drivers/net/wireless/intersil/p54/)
u8 data[] __counted_by(len)
- struct libipw_txb (drivers/net/wireless/intel/ipw2x00/)
struct sk_buff *fragments[] __counted_by(nr_frags)
- struct at76_command (drivers/net/wireless/atmel/)
u8 data[] __counted_by_le(size)
- struct brcmf_gscan_config (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
struct brcmf_gscan_bucket_config bucket[] __counted_by(count_of_channel_buckets)
- struct brcmf_fw_request (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
struct brcmf_fw_item items[] __counted_by(n_items)
- struct brcmf_fweh_event_map (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
struct brcmf_fweh_event_name items[] __counted_by(n_items)
- struct brcmf_fweh_info (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
int evt_handler[] __counted_by(num_event_codes)
- struct brcmf_fweh_queue_item (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
u8 data[] __counted_by(datalen)
- struct brcmf_eventmsgs_ext (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
u8 mask[] __counted_by(len)
- struct brcmf_mf_params_le (drivers/net/wireless/broadcom/brcm80211/brcmfmac/)
u8 data[] __counted_by(len)
- struct wcn36xx_hal_ind_msg (drivers/net/wireless/ath/wcn36xx/)
u8 msg[] __counted_by(msg_len)
- struct iwl_mvm_acs_survey (drivers/net/wireless/intel/iwlwifi/)
struct iwl_mvm_acs_survey_channel channels[] __counted_by(n_channels)
- struct rtw89_btc_btf_set_slot_table (drivers/net/wireless/realtek/rtw89/)
struct rtw89_btc_btf_slot tbls[] __counted_by(tbl_num)
- struct rtw89_btc_btf_set_mon_reg_v1 (drivers/net/wireless/realtek/rtw89/)
struct rtw89_btc_btf_reg regs[] __counted_by(reg_num)
- struct rtw89_btc_btf_set_mon_reg_v7 (drivers/net/wireless/realtek/rtw89/)
struct rtw89_btc_btf_reg regs[] __counted_by(len)
- struct rtw89_h2c_chinfo (drivers/net/wireless/realtek/rtw89/)
struct rtw89_h2c_ch elem[] __counted_by(ch_num)
- struct rtw89_h2c_chinfo_be (drivers/net/wireless/realtek/rtw89/)
struct rtw89_h2c_ch_be elem[] __counted_by(ch_num)
- struct rtw89_h2c_mrc_req_tsf (drivers/net/wireless/realtek/rtw89/)
struct rtw89_h2c_mrc_req_tsf_info infos[] __counted_by(req_tsf_num)
- struct rtw89_acpi_data (drivers/net/wireless/realtek/rtw89/)
u8 buf[] __counted_by(len)
- struct rtw89_acpi_policy_6ghz (drivers/net/wireless/realtek/rtw89/)
char country_list[] __counted_by(country_count)
- struct rtw89_regd_data (drivers/net/wireless/realtek/rtw89/)
struct rtw89_regulatory_info map[] __counted_by(nr)
- struct cfg80211_cqm_config (net/wireless/)
s32 rssi_thresholds[] __counted_by(n_rssi_thresholds)
- struct ieee80211_elems_parse (net/mac80211/)
u8 scratch[] __counted_by(scratch_len)
- struct cfg80211_tid_config (include/net/cfg80211.h)
struct cfg80211_tid_cfg tid_conf[] __counted_by(n_tid_conf)
- struct cfg80211_acl_data (include/net/cfg80211.h)
struct mac_address mac_addrs[] __counted_by(n_acl_entries)
- struct cfg80211_coalesce (include/net/cfg80211.h)
struct cfg80211_coalesce_rules rules[] __counted_by(n_rules)
- struct cfg80211_pmsr_request (include/net/cfg80211.h)
struct cfg80211_pmsr_request_peer peers[] __counted_by(n_peers)
Dynamic Counters (10 structures):
- struct wmi_start_scan_cmd (drivers/net/wireless/ath/wil6210/)
struct wmi_start_scan_cmd_channel_list channel_list[] __counted_by(num_channels)
- struct rtw89_vif (drivers/net/wireless/realtek/rtw89/)
struct rtw89_vif_link links_inst[] __counted_by(links_inst_valid_num)
- struct rtw89_sta (drivers/net/wireless/realtek/rtw89/)
struct rtw89_sta_link links_inst[] __counted_by(links_inst_valid_num)
- struct cfg80211_scan_request (include/net/cfg80211.h)
struct ieee80211_channel *channels[] __counted_by(n_channels)
- struct cfg80211_sched_scan_request (include/net/cfg80211.h)
struct ieee80211_channel *channels[] __counted_by(n_channels)
- struct cfg80211_mbssid_elems (include/net/cfg80211.h)
struct cfg80211_mbssid_elem elem[] __counted_by(cnt)
- struct cfg80211_rnr_elems (include/net/cfg80211.h)
struct cfg80211_rnr_elem elem[] __counted_by(cnt)
- struct cfg80211_sar_specs (include/net/cfg80211.h)
struct cfg80211_sar_sub_specs sub_specs[] __counted_by(num_sub_specs)
- struct cfg80211_wowlan_nd_match (include/net/cfg80211.h)
u32 channels[] __counted_by(n_channels)
- struct cfg80211_wowlan_nd_info (include/net/cfg80211.h)
struct cfg80211_wowlan_nd_match *matches[] __counted_by(n_matches)
In the review, I found 3 existing bugs, which I've set as separate
patches now:
https://lore.kernel.org/lkml/20250721181810.work.575-kees@kernel.org/
https://lore.kernel.org/lkml/20250721182521.work.540-kees@kernel.org/
https://lore.kernel.org/lkml/20250721183125.work.183-kees@kernel.org/
At the end of the day, with the above fixes, I think the dynamic cases
appear safe, as they all follow basically the same pattern of allocating
some max size and then filling/filtering the actual population of the
array. (So I think cfg80211_scan_request is correctly used at this point,
but I understand your desire to remove __counted_by on it.)
Do you want me to send patches for the static cases to add comments or
is that too much churn?
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Review of __counted_by in wireless (was Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by)
2025-07-21 18:36 ` Review of __counted_by in wireless (was Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by) Kees Cook
@ 2025-07-21 18:52 ` Johannes Berg
0 siblings, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2025-07-21 18:52 UTC (permalink / raw)
To: Kees Cook; +Cc: linux-wireless, linux-hardening
On Mon, 2025-07-21 at 11:36 -0700, Kees Cook wrote:
> This seemed like a good project to compare my manual review against a
> review performed by the LLM I've been playing with, and it did okay. It
> missed several dynamic cases and I had to prod it into getting them
> right, and it couldn't find some allocation patterns on its own. It did
> spontaneously find the le/ge counted_by variants, which was nice. Anyway,
> here are the results...
Heh, cool, I'm glad you did that. Thanks.
[snip]
> In the review, I found 3 existing bugs, which I've set as separate
> patches now:
> https://lore.kernel.org/lkml/20250721181810.work.575-kees@kernel.org/
> https://lore.kernel.org/lkml/20250721182521.work.540-kees@kernel.org/
> https://lore.kernel.org/lkml/20250721183125.work.183-kees@kernel.org/
Right. I'm not sure I'll get those in for 6.16 still, I was really
hoping to not send more changes. I guess I'll decide today or tomorrow,
but I also haven't see any complaints about that.
> At the end of the day, with the above fixes, I think the dynamic cases
> appear safe, as they all follow basically the same pattern of allocating
> some max size and then filling/filtering the actual population of the
> array. (So I think cfg80211_scan_request is correctly used at this point,
> but I understand your desire to remove __counted_by on it.)
It's not correct, which is why I removed it now. In mac80211, we
allocate the int_scan_req and set the n_channels, but we can use that
request multiple times, and of course if e.g. the first use just fills a
single channel and the second use wants more than one, then it gets a
complaint from UBSAN although we allocated enough space for all the
channels the device actually has.
That case is something you can't really ever check though, it assumes
that channels are never going to be added twice and that the device
never changes the list of channels on the fly, which both semantically
makes sense so I don't think any code would ever really break it, but
it's definitely harder to reason about.
Or we could have n_chans_allocated which only makes it detect the memory
corruption part of this issue and can never do anything more than that
such as detection misuse during reading the channel list (which should
only go to n_chans_used.)
> Do you want me to send patches for the static cases to add comments or
> is that too much churn?
I may have changed my mind on that, it's easy to have a comment that
says "it never changes" but then new code comes along that does just
that and never notices the comment ... I guess I'm still not sure about
the whole thing after all this, it seems awfully hard to reason about in
many cases.
johannes
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-07-21 19:17 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-14 12:21 [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by Johannes Berg
2025-07-15 5:04 ` Kees Cook
2025-07-15 8:24 ` Johannes Berg
2025-07-18 0:05 ` Kees Cook
2025-07-18 8:26 ` Johannes Berg
2025-07-21 18:36 ` Review of __counted_by in wireless (was Re: [PATCH wireless] wifi: cfg80211: remove scan request n_channels counted_by) Kees Cook
2025-07-21 18:52 ` Johannes Berg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).