Association comeback delay behavior

public inbox for linux-wireless@vger.kernel.org

* Association comeback delay behavior
@ 2025-05-22 17:45 James Prestwood
  2025-05-23 11:59 ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: James Prestwood @ 2025-05-22 17:45 UTC (permalink / raw)
  To: open list:MEDIATEK MT76 WIRELESS LAN DRIVER

Hi,

After noticing this log "rejected association temporarily; comeback 
duration 1000 TU (1024 ms)" I started looking more into how the kernel 
handles this and noticed a few things:

1. The kernel takes the delay in the association response frame and 
waits, but has no sane bounds for how long the wait is. An AP could send 
0xffffffff and the kernel will just block for that entire duration.

2. The first issue would appear to be guarded by the fact that 
run_again() only reschedules if the new timeout is less than the current 
time remaining but only if there is an existing timer set.

Looking at the code, the association timer gets set when we begin an 
association so it _should_ be set when we hit this comeback delay case. 
But through testing I found that it is not. Hacking hostapd to use 10000
TUs as the comeback delay, I see this:

[    4.338185] wlan1: associate with 02:00:00:00:00:00 (try 1/3)
[    4.340023] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
[    4.340409] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   14.654103] wlan1: associate with 02:00:00:00:00:00 (try 2/3)
[   14.657405] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
[   14.658430] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   14.848706] wlan1: associate with 02:00:00:00:00:00 (try 3/3)
[   14.851596] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
[   14.854269] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)

So the first association attempt waited the full 10 seconds, then after 
that the timer was presumably set, and we only waited the default 200ms 
(ASSOC_TIMEOUT). So to me, this feels like either a bug or an oversight 
on how this should be handled:

  - If the timer should already be set, this is a bug as I see the 
kernel waiting excessively.

  - If the timer being unset is expected, the kernel should be limiting 
this wait to something reasonable.

I also realize that CMD_ASSOC_COMEBACK was added and userspace gets 
notified, but this feels excessive to handle in userspace when the 
kernel could instead enforce a sane timeout all on its own without 
requiring userspace disconnect/reconnect when the AP sends an absurd 
timeout.

My main concern here is a rogue AP scenario that can then DoS all your
clients that try to connect to it.

Thanks,

James


* Re: Association comeback delay behavior
  2025-05-22 17:45 Association comeback delay behavior James Prestwood
@ 2025-05-23 11:59 ` Johannes Berg
  2025-05-23 12:28   ` James Prestwood
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2025-05-23 11:59 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

Hi James,

> 1. The kernel takes the delay in the association response frame and 
> waits, but has no sane bounds for how long the wait is. An AP could send 
> 0xffffffff and the kernel will just block for that entire duration.

For some value of "block", it's not really blocking in the (traditional)
threading sense of the word :)


> 2. The first issue would appear to be guarded by the fact that 
> run_again() only reschedules if the new timeout is less than the current 
> time remaining but only if there is an existing timer set.
> 
> Looking at the code, the association timer gets set when we begin an 
> association so it _should_ be set when we hit this comeback delay case. 
> But through testing I found that it is not. Hacking hostapd to use 10000
> TUs as the comeback delay, I see this:
> 
> [    4.338185] wlan1: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.340023] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
> [    4.340409] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   14.654103] wlan1: associate with 02:00:00:00:00:00 (try 2/3)
> [   14.657405] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
> [   14.658430] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   14.848706] wlan1: associate with 02:00:00:00:00:00 (try 3/3)
> [   14.851596] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
> [   14.854269] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> 
> So the first association attempt waited the full 10 seconds, then after 
> that the timer was presumably set, and we only waited the default 200ms 
> (ASSOC_TIMEOUT). 

That's not exactly how it works, run_again() multiplexes different
things onto the same timer by tracking the various sources. So the
_timer_ might be expiring again, but the actual "assoc handling" part
should only happen after 10000 TU.
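
To illustrate the multiplexing described above, here is a minimal
userspace model in C. It is only a sketch: struct shared_timer,
struct source and source_due() are hypothetical names invented for
this example, and the real run_again() in net/mac80211/mlme.c operates
on kernel timers and jiffies rather than on these stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single timer shared by all sources. */
struct shared_timer {
	bool pending;
	unsigned long expires;
};

/*
 * Re-arm the shared timer only if it is idle or if the new deadline is
 * earlier than the one already programmed. The signed difference keeps
 * the comparison correct across counter wraparound.
 */
static void run_again(struct shared_timer *t, unsigned long timeout)
{
	if (!t->pending || (long)(timeout - t->expires) < 0) {
		t->pending = true;
		t->expires = timeout;
	}
}

/* Hypothetical per-source deadline, e.g. the association comeback. */
struct source {
	bool armed;
	unsigned long deadline;
};

/*
 * Called from the timer handler at time `now`: the timer may fire early
 * on behalf of another source, so each source checks its own deadline.
 */
static bool source_due(const struct source *s, unsigned long now)
{
	return s->armed && (long)(now - s->deadline) >= 0;
}
```

In this model an early expiry of the shared timer does not by itself
trigger a new association attempt; the attempt happens only once
source_due() reports that the association's own deadline has passed.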

> So to me, this feels like either a bug

Yes. I can't reproduce it though:

[    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
[    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
[    4.300000] wlan0: authenticated
[    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
[    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
[   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
[   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   25.440000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   36.320000] wlan0: association with 02:00:00:00:00:00 timed out


That last "timed out" should really come earlier though, oops. Let me
fix that:

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index fa7cf3b8ad59..f4a5deedfaab 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -6383,7 +6383,8 @@ static void ieee80211_rx_mgmt_assoc_resp(struct ieee80211_sub_if_data *sdata,
 
 	if (status_code == WLAN_STATUS_ASSOC_REJECTED_TEMPORARILY &&
 	    elems->timeout_int &&
-	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK) {
+	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK &&
+	    assoc_data->tries < IEEE80211_ASSOC_MAX_TRIES) {
 		u32 tu, ms;
 
 		cfg80211_assoc_comeback(sdata->dev, assoc_data->ap_addr,


So now I see:

[    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
[    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
[    4.300000] wlan0: authenticated
[    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
[    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
[   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
[   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
[   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[   25.440000] wlan0: 02:00:00:00:00:00 denied association (code=30)
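
The effect of the extra condition in the diff can be modelled as a
small decision function. This is a sketch under assumptions, not the
kernel code: on_assoc_resp() and enum assoc_action are hypothetical
names, the constant 3 is taken from the "try N/3" log lines, 30 is the
status value shown in the logs, and `tries` is assumed to count the
association requests already sent.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSOC_MAX_TRIES             3	/* from "try N/3" in the logs */
#define STATUS_SUCCESS              0
#define STATUS_REJECTED_TEMPORARILY 30	/* status=30 in the logs */

enum assoc_action {
	ASSOC_ACCEPTED,
	ASSOC_WAIT_COMEBACK,
	ASSOC_DENIED,
};

/*
 * Honor a comeback request only while another attempt is still
 * available. On the last try there is nothing left to come back for,
 * so the response is treated as an ordinary rejection.
 */
static enum assoc_action on_assoc_resp(uint16_t status,
				       bool has_comeback_elem, int tries)
{
	if (status == STATUS_SUCCESS)
		return ASSOC_ACCEPTED;
	if (status == STATUS_REJECTED_TEMPORARILY && has_comeback_elem &&
	    tries < ASSOC_MAX_TRIES)
		return ASSOC_WAIT_COMEBACK;
	return ASSOC_DENIED;
}
```

With tries equal to 1 or 2 the model waits for the comeback; with
tries equal to 3 it reports a denial immediately, which matches the
"denied association (code=30)" line at 25.44 in the log above.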

>   - If the timer being unset is expected, the kernel should be limiting 
> this wait to something reasonable.

Define "reasonable"? I mean, sure, if it says 0xffffffff we'll even
overflow the calculation and end up trying way too early, and if it says
0x100000 instead to avoid the overflow inside the calculation and in
jiffies, we'll wait a very long time:

[    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
[    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
[    4.300000] wlan0: authenticated
[    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
[    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
[    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 1048576 TU (1073741 ms)
[ 1078.240000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
[ 1078.240000] wlan0: deauthenticated from 02:00:00:00:00:00 while associating (Reason: 6=CLASS2_FRAME_FROM_NONAUTH_STA)

Long enough, in fact, that hostapd forgot the STA even existed ;-)
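
The overflow mentioned above can be reproduced with plain 32-bit
arithmetic. The sketch below assumes the conversion is done as
tu * 1024 / 1000 in a 32-bit unsigned type, which is consistent with
the "10000 TU (10240 ms)" and "1048576 TU (1073741 ms)" log lines but
is an assumption about the implementation; the function names are
hypothetical.

```c
#include <stdint.h>

/* 1 TU is 1024 microseconds, so ms = tu * 1024 / 1000. */

/* 32-bit version: the product wraps modulo 2^32 once tu >= 0x400000. */
static uint32_t tu_to_ms_u32(uint32_t tu)
{
	return tu * 1024 / 1000;
}

/* 64-bit version: the product always fits, so no wraparound. */
static uint64_t tu_to_ms_u64(uint32_t tu)
{
	return (uint64_t)tu * 1024 / 1000;
}
```

For 0xffffffff TU the 32-bit version yields 4294966 ms (about 72
minutes) while the 64-bit version yields 4398046510 ms (about 51
days), which is why the retry would come far earlier than the AP
asked for.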


> I also realize that CMD_ASSOC_COMEBACK was added and userspace gets 
> notified, but this feels excessive to handle in userspace when the 
> kernel could instead enforce a sane timeout all on its own without 
> requiring userspace disconnect/reconnect when the AP sends an absurd 
> timeout.

Define "absurd". Bigger than around what I was demonstrating above
doesn't actually work properly anyway due to the possible overflows, and
sure, 15 minutes is long, but doesn't feel "absurd".

I tend to think this is exactly right - the kernel will wait, but since
it's not doing anything else that doesn't really matter. Maybe it'll
work later (earlier tests above), maybe it won't (like when the AP
forgot about the STA above), but it's not like the kernel is holding
some important resource busy for all that time?

And userspace gets notified and gets a choice, so of course it can give
up on the association instead.

And yeah I did "iw connect -w" and it'd be hard to actually work around
it with that, but it could even make the assoc socket-owned and then
it'd probably stop when you hit Ctrl-C, and anyway nobody really uses
that.


> My main concern here is a rogue AP scenario that can then DoS all your
> clients that try to connect to it.

Oh, so you're just trying to sell us a missing implementation in iwd as
a kernel security bug? :-)

johannes


* Re: Association comeback delay behavior
  2025-05-23 11:59 ` Johannes Berg
@ 2025-05-23 12:28   ` James Prestwood
  2025-05-23 12:34     ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: James Prestwood @ 2025-05-23 12:28 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

Hi,

On 5/23/25 4:59 AM, Johannes Berg wrote:
> Hi James,
>
>> 1. The kernel takes the delay in the association response frame and
>> waits, but has no sane bounds for how long the wait is. An AP could send
>> 0xffffffff and the kernel will just block for that entire duration.
> For some value of "block", it's not really blocking in the (traditional)
> threading sense of the word :)
Yes, true, it's not blocking the kernel. It just blocks userspace unless
that event is handled, which isn't handled by any userspace supplicants
AFAICT.
>
>
>> 2. The first issue would appear to be guarded by the fact that
>> run_again() only reschedules if the new timeout is less than the current
>> time remaining but only if there is an existing timer set.
>>
>> Looking at the code, the association timer gets set when we begin an
>> association so it _should_ be set when we hit this comeback delay case.
>> But through testing I found that it is not. Hacking hostapd to use 10000
>> TUs as the comeback delay, I see this:
>>
>> [    4.338185] wlan1: associate with 02:00:00:00:00:00 (try 1/3)
>> [    4.340023] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
>> [    4.340409] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
>> [   14.654103] wlan1: associate with 02:00:00:00:00:00 (try 2/3)
>> [   14.657405] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
>> [   14.658430] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
>> [   14.848706] wlan1: associate with 02:00:00:00:00:00 (try 3/3)
>> [   14.851596] wlan1: RX AssocResp from 02:00:00:00:00:00 (capab=0x411 status=30 aid=0)
>> [   14.854269] wlan1: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
>>
>> So the first association attempt waited the full 10 seconds, then after
>> that the timer was presumably set, and we only waited the default 200ms
>> (ASSOC_TIMEOUT).
> That's not exactly how it works, run_again() multiplexes different
> things onto the same timer by tracking the various sources. So the
> _timer_ might be expiring again, but the actual "assoc handling" part
> should only happen after 10000 TU.
>
>> So to me, this feels like either a bug
> Yes. I can't reproduce it though:
>
> [    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
> [    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
> [    4.300000] wlan0: authenticated
> [    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
> [   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
> [   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   25.440000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   36.320000] wlan0: association with 02:00:00:00:00:00 timed out
>
>
> That last "timed out" should really come earlier though, oops. Let me
> fix that:
>
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index fa7cf3b8ad59..f4a5deedfaab 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -6383,7 +6383,8 @@ static void ieee80211_rx_mgmt_assoc_resp(struct ieee80211_sub_if_data *sdata,
>   
>   	if (status_code == WLAN_STATUS_ASSOC_REJECTED_TEMPORARILY &&
>   	    elems->timeout_int &&
> -	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK) {
> +	    elems->timeout_int->type == WLAN_TIMEOUT_ASSOC_COMEBACK &&
> +	    assoc_data->tries < IEEE80211_ASSOC_MAX_TRIES) {
>   		u32 tu, ms;
>   
>   		cfg80211_assoc_comeback(sdata->dev, assoc_data->ap_addr,
>
>
> So now I see:
>
> [    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
> [    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
> [    4.300000] wlan0: authenticated
> [    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   14.560000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
> [   14.560000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   14.560000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 10000 TU (10240 ms)
> [   25.440000] wlan0: associate with 02:00:00:00:00:00 (try 3/3)
> [   25.440000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [   25.440000] wlan0: 02:00:00:00:00:00 denied association (code=30)
>
>>    - If the timer being unset is expected, the kernel should be limiting
>> this wait to something reasonable.
> Define "reasonable"? I mean, sure, if it says 0xffffffff we'll even
> overflow the calculation and end up trying way too early, and if it says
> 0x100000 instead to avoid the overflow inside the calculation and in
> jiffies, we'll wait a very long time:
>
> [    4.300000] wlan0: authenticate with 02:00:00:00:00:00 (local address=92:9c:4c:00:00:01)
> [    4.300000] wlan0: send auth to 02:00:00:00:00:00 (try 1/3)
> [    4.300000] wlan0: authenticated
> [    4.310000] wlan0: associate with 02:00:00:00:00:00 (try 1/3)
> [    4.310000] wlan0: RX AssocResp from 02:00:00:00:00:00 (capab=0x401 status=30 aid=0)
> [    4.310000] wlan0: 02:00:00:00:00:00 rejected association temporarily; comeback duration 1048576 TU (1073741 ms)
> [ 1078.240000] wlan0: associate with 02:00:00:00:00:00 (try 2/3)
> [ 1078.240000] wlan0: deauthenticated from 02:00:00:00:00:00 while associating (Reason: 6=CLASS2_FRAME_FROM_NONAUTH_STA)
>
> Long enough, in fact, that hostapd forgot the STA even existed ;-)
>
>
>> I also realize that CMD_ASSOC_COMEBACK was added and userspace gets
>> notified, but this feels excessive to handle in userspace when the
>> kernel could instead enforce a sane timeout all on its own without
>> requiring userspace disconnect/reconnect when the AP sends an absurd
>> timeout.
> Define "absurd". Bigger than around what I was demonstrating above
> doesn't actually work properly anyway due to the possible overflows, and
> sure, 15 minutes is long, but doesn't feel "absurd".
>
> I tend to think this is exactly right - the kernel will wait, but since
> it's not doing anything else that doesn't really matter. Maybe it'll
> work later (earlier tests above), maybe it won't (like when the AP
> forgot about the STA above), but it's not like the kernel is holding
> some important resource busy for all that time?
>
> And userspace gets notified and gets a choice, so of course it can give
> up on the association instead.
>
> And yeah I did "iw connect -w" and it'd be hard to actually work around
> it with that, but it could even make the assoc socket-owned and then
> it'd probably stop when you hit Ctrl-C, and anyway nobody really uses
> that.
>
>
>> My main concern here is a rogue AP scenario that can then DoS all your
>> clients that try to connect to it.
> Oh, so you're just trying to sell us a missing implementation in iwd as
> a kernel security bug? :-)
Depends on how you look at it I guess. Handling the event in userspace 
almost feels like an escape hatch for the kernel having used untrusted 
input but that's just how I see it. Waiting 15 minutes for a WiFi 
connection that should take 200ms is on the level of absurd and 
unreasonable from my view, but that's just my opinion. But it sounds 
like this is all by design, so we can just handle the event in userspace.
>
> johannes


* Re: Association comeback delay behavior
  2025-05-23 12:28   ` James Prestwood
@ 2025-05-23 12:34     ` Johannes Berg
  2025-05-23 12:54       ` James Prestwood
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2025-05-23 12:34 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

Hi,

(uh, please trim quotes)

> > > 1. The kernel takes the delay in the association response frame and
> > > waits, but has no sane bounds for how long the wait is. An AP could send
> > > 0xffffffff and the kernel will just block for that entire duration.
> > For some value of "block", it's not really blocking in the (traditional)
> > threading sense of the word :)
> Yes, true, it's not blocking the kernel. It just blocks userspace unless
> that event is handled, which isn't handled by any userspace supplicants
> AFAICT.

wpa_supplicant seems to handle it just fine? I guess in an ideal world
we'd have made it some kind of opt-in, or the supplicant would've given
some sort of maximum wait value.

> > Oh, so you're just trying to sell us a missing implementation in iwd as
> > a kernel security bug? :-)
> Depends on how you look at it I guess. Handling the event in userspace 
> almost feels like an escape hatch for the kernel having used untrusted 
> input but that's just how I see it. Waiting 15 minutes for a WiFi 
> connection that should take 200ms is on the level of absurd and 
> unreasonable from my view, but that's just my opinion.

But it shouldn't take 200ms. The comeback can be larger and you may
actually want to wait for it if you have no other choice of AP anyway.
Sure it takes less than 200ms to connect in normal cases, but comeback
already isn't an immediate connection. Whether that then is 15 seconds
or 15 minutes - sure, the latter seems excessive, but I'm not sure I'd
want to define a timeout somewhere between there. Given your
argumentation it sounds like you'd say "1 second is excessive", but
that's just under the _default_ setting in hostapd for
assoc_sa_query_max_timeout.

johannes


* Re: Association comeback delay behavior
  2025-05-23 12:34     ` Johannes Berg
@ 2025-05-23 12:54       ` James Prestwood
  2025-05-23 12:57         ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: James Prestwood @ 2025-05-23 12:54 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

Hi,

On 5/23/25 5:34 AM, Johannes Berg wrote:
> Hi,
>
> (uh, please trim quotes)
>
>>>> 1. The kernel takes the delay in the association response frame and
>>>> waits, but has no sane bounds for how long the wait is. An AP could send
>>>> 0xffffffff and the kernel will just block for that entire duration.
>>> For some value of "block", it's not really blocking in the (traditional)
>>> threading sense of the word :)
>> Yes, true, it's not blocking the kernel. It just blocks userspace unless
>> that event is handled, which isn't handled by any userspace supplicants
>> AFAICT.
> wpa_supplicant seems to handle it just fine? I guess in an ideal world
> we'd have made it some kind of opt-in, or the supplicant would've given
> some sort of maximum wait value.

It logs and returns:

static void nl80211_assoc_comeback(struct wpa_driver_nl80211_data *drv,
                    struct nlattr *mac, struct nlattr *timeout)
{
     if (!mac || !timeout)
         return;
     wpa_printf(MSG_DEBUG, "nl80211: Association comeback requested by "
            MACSTR " (timeout: %u ms)",
            MAC2STR((u8 *) nla_data(mac)), nla_get_u32(timeout));
}

It's relying on its connection timer though, which is why it "works".
Opt-in or userspace providing a maximum is definitely a step in the 
right direction IMO.
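
As a rough illustration of what "userspace providing a maximum" could
look like on the supplicant side, here is a sketch of a policy check
that a comeback event handler might call. Everything in it is
hypothetical: struct comeback_policy, max_wait_ms, have_other_bss and
decide_comeback() are invented for this example and are not existing
nl80211 attributes or wpa_supplicant/iwd options.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy; not an existing nl80211 or supplicant setting. */
struct comeback_policy {
	uint32_t max_wait_ms;	/* longest comeback we accept silently */
	bool have_other_bss;	/* is there another candidate to try? */
};

enum comeback_decision {
	COMEBACK_WAIT,
	COMEBACK_ABORT,
};

/*
 * Decide what to do with a comeback timeout reported in milliseconds.
 * If the timeout is over budget but no other BSS is available, keep
 * waiting, since giving up would gain nothing.
 */
static enum comeback_decision
decide_comeback(const struct comeback_policy *p, uint32_t timeout_ms)
{
	if (timeout_ms <= p->max_wait_ms)
		return COMEBACK_WAIT;
	return p->have_other_bss ? COMEBACK_ABORT : COMEBACK_WAIT;
}
```

The fallback to waiting when there is no alternative BSS is a design
choice for this sketch; a real implementation might still cap the wait
or leave the decision to a higher layer.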

>
>>> Oh, so you're just trying to sell us a missing implementation in iwd as
>>> a kernel security bug? :-)
>> Depends on how you look at it I guess. Handling the event in userspace
>> almost feels like an escape hatch for the kernel having used untrusted
>> input but that's just how I see it. Waiting 15 minutes for a WiFi
>> connection that should take 200ms is on the level of absurd and
>> unreasonable from my view, but that's just my opinion.
> But it shouldn't take 200ms. The comeback can be larger and you may
> actually want to wait for it if you have no other choice of AP anyway.
> Sure it takes less than 200ms to connect in normal cases, but comeback
> already isn't an immediate connection. Whether that then is 15 seconds
> or 15 minutes - sure, the latter seems excessive, but I'm not sure I'd
> want to define a timeout somewhere between there. Given your
> argumentation it sounds like you'd say "1 second is excessive", but
> that's just under the _default_ setting in hostapd for
> assoc_sa_query_max_timeout.

I'll admit, from a single AP use-case if the AP _REALLY_ needs you to 
wait that long you are right in that we have no other choice. I question 
under what circumstances the AP would need that though; when you're
talking on the order of minutes or even 10-15 seconds the AP feels
broken at that point. I'm also not a vendor and don't know what 
conditions would even trigger this in the first place. Maybe this piece 
of information is what I need to convince me either direction, a 
legitimate reason for an AP to tell the station to wait, and an amount 
of time that would be for.

From a user-experience point of view, I think watching a device try to
connect for more than about 20-30 seconds is going to trigger a "wtf,
this is broken" response in most people. And I know if my router was
taking that long to accept connections it would be promptly rebooted.
It's a long way of saying that I think there is some reasonable value here.

Thanks,

James

>
> johannes


* Re: Association comeback delay behavior
  2025-05-23 12:54       ` James Prestwood
@ 2025-05-23 12:57         ` Johannes Berg
  2025-05-23 13:07           ` James Prestwood
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2025-05-23 12:57 UTC (permalink / raw)
  To: James Prestwood, linux-wireless

On Fri, 2025-05-23 at 05:54 -0700, James Prestwood wrote:
> 
> It logs and returns:
> 
> static void nl80211_assoc_comeback(struct wpa_driver_nl80211_data *drv,
>                     struct nlattr *mac, struct nlattr *timeout)
> {
>      if (!mac || !timeout)
>          return;
>      wpa_printf(MSG_DEBUG, "nl80211: Association comeback requested by "
>             MACSTR " (timeout: %u ms)",
>             MAC2STR((u8 *) nla_data(mac)), nla_get_u32(timeout));
> }
> 
> It's relying on its connection timer though, which is why it "works".
> Opt-in or userspace providing a maximum is definitely a step in the 
> right direction IMO.

Hah, ok, I only saw it doing something other than "get stuck" in the
explicit tests :)

> I'll admit, from a single AP use-case if the AP _REALLY_ needs you to 
> wait that long you are right in that we have no other choice. I question 
> under what circumstances the AP would need that though; when you're
> talking on the order of minutes or even 10-15 seconds the AP feels
> broken at that point. I'm also not a vendor and don't know what 
> conditions would even trigger this in the first place. Maybe this piece 
> of information is what I need to convince me either direction, a 
> legitimate reason for an AP to tell the station to wait, and an amount 
> of time that would be for.

I don't really have that either, but simply looking at hostapd says 1
second is even default. So where do you draw the line? 10x that? 100x
that? Who knows? Why bother?

> From a user-experience point of view, I think watching a device try to
> connect for more than about 20-30 seconds is going to trigger a "wtf,
> this is broken" response in most people. And I know if my router was
> taking that long to accept connections it would be promptly rebooted.
> It's a long way of saying that I think there is some reasonable value here.

We don't just implement wifi for laptops though. I mean we, Intel, do,
but generally the stack gets used elsewhere. The random IOT device out
in my garage? I don't really care where it waits, it only gets a single
BSSID it could ever use.

johannes


* Re: Association comeback delay behavior
  2025-05-23 12:57         ` Johannes Berg
@ 2025-05-23 13:07           ` James Prestwood
  0 siblings, 0 replies; 7+ messages in thread
From: James Prestwood @ 2025-05-23 13:07 UTC (permalink / raw)
  To: Johannes Berg, linux-wireless

Hi,

On 5/23/25 5:57 AM, Johannes Berg wrote:
>> From a user-experience point of view, I think watching a device try to
>> connect for more than about 20-30 seconds is going to trigger a "wtf,
>> this is broken" response in most people. And I know if my router was
>> taking that long to accept connections it would be promptly rebooted.
>> It's a long way of saying that I think there is some reasonable value here.
> We don't just implement wifi for laptops though. I mean we, Intel, do,
> but generally the stack gets used elsewhere. The random IOT device out
> in my garage? I don't really care where it waits, it only gets a single
> BSSID it could ever use.

Yeah the use-case is important. And for mine where connectivity/uptime 
is very critical even a 1 second comeback delay is too long and I would 
want the device to roam/connect elsewhere, which then leads back to 
CMD_ASSOC_COMEBACK.

So no matter what I'll need that event, I was mainly trying to figure 
out why the kernel let these excessive wait times happen. I don't have 
an answer as to where the cutoff would be though, so I guess I'll drop 
it for now.

Thanks,

James

>
> johannes
