public inbox for linux-wireless@vger.kernel.org
From: James Prestwood <prestwoj@gmail.com>
To: Johannes Berg <johannes@sipsolutions.net>,
	linux-wireless@vger.kernel.org
Subject: Re: Association comeback delay behavior
Date: Fri, 23 May 2025 05:28:25 -0700
Message-ID: <56bcd608-bda2-40a8-9314-d978a39bf90f@gmail.com>
In-Reply-To: <2e1fdb77f2ed5f381323f6a493c62ea1bdec19a7.camel@sipsolutions.net>

Hi,

On 5/23/25 4:59 AM, Johannes Berg wrote:
> Hi James,
>
>> 1. The kernel takes the delay in the association response frame and
>> waits, but has no sane bounds for how long the wait is. An AP could send
>> 0xffffffff and the kernel will just block for that entire duration.
> For some value of "block", it's not really blocking in the (traditional)
> threading sense of the word :)
Yes, true, it's not blocking the kernel. It just blocks userspace unless
that event is handled, and AFAICT no userspace supplicant handles it.
>
>
>> 2. The first issue would appear to be guarded by the fact that
>> run_again() only reschedules if the new timeout is less than the current
>> time remaining but only if there is an existing timer set.
>>
>> Looking at the code, the association timer gets set when we begin an
>> association so it _should_ be set when we hit this comeback delay case.
>> But through testing I found that it is not. Hacking hostapd to use 10000
>> TUs as the comeback delay I see this:
>>
>> [    4.338185] wlan1: associate with 02:00:00:00:00:00 (try 1/3)
>> [    4.340023] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
>> [    4.340409] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
>> [   14.654103] wlan1: associate with 02:00:00:00:00:00 (try 2/3)
>> [   14.657405] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
>> [   14.658430] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
>> [   14.848706] wlan1: associate with 02:00:00:00:00:00 (try 3/3)
>> [   14.851596] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
>> [   14.854269] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
>>
>> So the first association attempt waited the full 10 seconds, then after
>> that the timer was presumably set, and we only waited the default 200ms
>> (ASSOC_TIMEOUT).
> That's not exactly how it works, run_again() multiplexes different
> things onto the same timer by tracking the various sources. So the
> _timer_ might be expiring again, but the actual "assoc handling" part
> should only happen after 10000 TU.
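To make the multiplexing point concrete, here is a minimal userspace sketch of how I read it. This is a model, not the mac80211 code: the struct, the function names and the millisecond clock are all made up for illustration, and plain integer comparison stands in for the wraparound-safe jiffies comparison the kernel would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of one timer shared by several deadline sources.
 * All names here are illustrative, not the mac80211 definitions. */
struct shared_timer {
	bool pending;      /* is the timer currently armed? */
	uint64_t expires;  /* absolute expiry time, in ms */
};

/* Arm the timer only if it is idle or the new deadline is earlier
 * than the one it is already armed for. */
static void run_again_model(struct shared_timer *t, uint64_t deadline)
{
	if (!t->pending || deadline < t->expires) {
		t->pending = true;
		t->expires = deadline;
	}
}

/* Called when the shared timer fires. The association step is due only
 * once its own deadline has passed; otherwise the timer is re-armed
 * for that deadline and nothing association-related happens. */
static bool assoc_due(struct shared_timer *t, uint64_t now,
		      uint64_t assoc_deadline)
{
	t->pending = false;
	if (now >= assoc_deadline)
		return true;
	run_again_model(t, assoc_deadline);
	return false;
}
```

In this model an early expiry at 200 ms with an association deadline of 10240 ms just re-arms the timer, so the timer firing early does not by itself retry the association.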
>
>> So to me, this feels like either a bug
> Yes. I can't reproduce it though:
>
> [    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
> [    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
> [    4.300000] wlan0: authenticated
> [    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
> [   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
> [   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   25.440000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   36.320000] wlan0: association with 02:00:00:00:00:00 timed out
>
>
> That last "timed out" should really come earlier though, oops. Let me
> fix that:
>
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index fa7cf3b8ad59..f4a5deedfaab 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -6383,7 +6383,8 @@ static void ieee80211_rx_mgmt_assoc_resp(struct ieee80211_sub_if_data *sdata,
>   
>   	if (status_code == WLAN_STATUS_ASSOC_REJECTED_TEMPORARILY &&
>   	    elems->timeout_int &&
> -	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK) {
> +	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK &&
> +	    assoc_data->tries < IEEE80211_ASSOC_MAX_TRIES) {
>   		u32 tu, ms;
>   
>   		cfg80211_assoc_comeback(sdata->dev, assoc_data->ap_addr,
>
>
> So now I see:
>
> [    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
> [    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
> [    4.300000] wlan0: authenticated
> [    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
> [   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
> [   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   25.440000] wlan0: 02:00:00:00:00:00 denied association (code=30)
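The effect of the extra condition can be sketched as a small decision function. This is only a model of the check in the diff: the enum, the function name and the boolean standing in for the timeout-interval element are my own, the limit of 3 is taken from the "try N/3" log lines, and I am assuming the try counter already includes the attempt that was just answered.

```c
#include <stdbool.h>

#define ASSOC_MAX_TRIES 3               /* matches "try N/3" in the logs */
#define STATUS_SUCCESS 0
#define STATUS_REJECTED_TEMPORARILY 30  /* status=30 in the logs */

enum assoc_action { ASSOC_SUCCESS, ASSOC_WAIT_COMEBACK, ASSOC_DENIED };

/* tries: number of association attempts sent so far, including the one
 * this response answers (an assumption of this sketch). */
static enum assoc_action classify_assoc_resp(int status,
					     bool has_comeback_elem,
					     int tries)
{
	/* Honor the comeback delay only while retries remain. */
	if (status == STATUS_REJECTED_TEMPORARILY && has_comeback_elem &&
	    tries < ASSOC_MAX_TRIES)
		return ASSOC_WAIT_COMEBACK;
	if (status != STATUS_SUCCESS)
		return ASSOC_DENIED;
	return ASSOC_SUCCESS;
}
```

With these assumptions the first two responses lead to a wait and the third is treated as a plain denial, which matches the "denied association (code=30)" line in the log above.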
>
>>    - If the timer being unset is expected, the kernel should be limiting
>> this wait to something reasonable.
> Define "reasonable"? I mean, sure, if it says 0xffffffff we'll even
> overflow the calculation and end up trying way too early, and if it says
> 0x100000 instead to avoid the overflow inside the calculation and in
> jiffies, we'll wait a very long time:
>
> [    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
> [    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
> [    4.300000] wlan0: authenticated
> [    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 1048576 TU (1073741 ms)
> [ 1078.240000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
> [ 1078.240000] wlan0: deauthenticated from 02:00:00:00:00:00 while associating (Reason: 6=CLASS2_FRAME_FROM_NONAUTH_STA)
>
> Long enough, in fact, that hostapd forgot the STA even existed ;-)
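For reference, the arithmetic behind those numbers can be sketched in userspace C. I am assuming the conversion is ms = tu * 1024 / 1000 done in 32 bits, which is what the log values and the `u32 tu, ms;` declaration in the patch context suggest; the function names are mine, and the later conversion to jiffies is not modeled.

```c
#include <stdint.h>

/* 1 TU = 1024 us, so ms = tu * 1024 / 1000. With a 32-bit product the
 * multiplication wraps once tu reaches 2^22 (4194304). */
static uint32_t tu_to_ms_u32(uint32_t tu)
{
	return (uint32_t)(tu * 1024u) / 1000u;
}

/* Same conversion with a 64-bit intermediate, which cannot wrap for
 * any 32-bit input. */
static uint64_t tu_to_ms_u64(uint32_t tu)
{
	return (uint64_t)tu * 1024u / 1000u;
}
```

Under that assumption 10000 TU gives 10240 ms and 0x100000 TU gives 1073741 ms in both versions, while 0xffffffff TU gives 4294966 ms in 32 bits against 4398046510 ms in 64 bits, so the wrapped value is about a thousand times too short.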
>
>
>> I also realize that CMD_ASSOC_COMEBACK was added and userspace gets
>> notified, but this feels excessive to handle in userspace when the
>> kernel could instead enforce a sane timeout all on its own without
>> requiring userspace disconnect/reconnect when the AP sends an absurd
>> timeout.
> Define "absurd". Bigger than around what I was demonstrating above
> doesn't actually work properly anyway due to the possible overflows, and
> sure, 15 minutes is long, but doesn't feel "absurd".
>
> I tend to think this is exactly right - the kernel will wait, but since
> it's not doing anything else that doesn't really matter. Maybe it'll
> work later (earlier tests above), maybe it won't (like when the AP
> forgot about the STA above), but it's not like the kernel is holding
> some important resource busy for all that time?
>
> And userspace gets notified and gets a choice, so of course it can give
> up on the association instead.
>
> And yeah I did "iw connect -w" and it'd be hard to actually work around
> it with that, but it could even make the assoc socket-owned and then
> it'd probably stop when you hit Ctrl-C, and anyway nobody really uses
> that.
>
>
>> My main concern here is a rogue AP scenario that can then DoS all your
>> clients that try and connect to it.
> Oh, so you're just trying to sell us a missing implementation in iwd as
> a kernel security bug? :-)
Depends on how you look at it, I guess. To me, handling the event in
userspace feels like an escape hatch for the kernel having acted on
untrusted input. Waiting 15 minutes for a WiFi connection that should
take 200ms is absurd and unreasonable in my view, but that's just my
opinion. It sounds like this is all by design, so we can just handle
the event in userspace.
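A minimal sketch of what that userspace handling could look like, as a policy check only. The helper name and the idea of a fixed cap are hypothetical, not existing iwd or kernel code, and how the supplicant pulls the comeback value out of the event is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy helper for a supplicant: given the comeback value
 * (in TU) reported with the comeback event and a cap chosen by the
 * supplicant, decide whether to give up on this association. */
static bool comeback_too_long(uint32_t comeback_tu, uint64_t max_wait_ms)
{
	/* 64-bit intermediate so a huge value from the AP cannot wrap. */
	uint64_t ms = (uint64_t)comeback_tu * 1024u / 1000u;

	return ms > max_wait_ms;
}
```

With a cap of, say, 15000 ms, the 10000 TU case above would be waited out, while 0x100000 TU or 0xffffffff TU would make the supplicant abandon the attempt and move on.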
>
> johannes

Thread overview: 7+ messages
2025-05-22 17:45 Association comeback delay behavior James Prestwood
2025-05-23 11:59 ` Johannes Berg
2025-05-23 12:28   ` James Prestwood [this message]
2025-05-23 12:34     ` Johannes Berg
2025-05-23 12:54       ` James Prestwood
2025-05-23 12:57         ` Johannes Berg
2025-05-23 13:07           ` James Prestwood
