From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 0/4] Packet/beacon loss roaming improvements
Date: Mon, 30 Oct 2023 10:37:15 -0700 [thread overview]
Message-ID: <70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com> (raw)
In-Reply-To: <fb8dfeeb-3f5d-49d2-8a4e-063b4933d905@gmail.com>
Hi Denis,
On 10/30/23 10:05 AM, Denis Kenzior wrote:
> Hi James,
>
> On 10/30/23 10:37, James Prestwood wrote:
>> Hi Denis,
>>
>> On 10/30/23 8:00 AM, Denis Kenzior wrote:
>>> Hi James,
>>>
>>>>
>>>> We were seeing beacon loss events not resulting in an immediate
>>>> disconnnect (as I have always expected), still eventually but after
>>>
>>> If I recall correctly, Lost Beacon is sent after several beacons in
>>> succession were lost. You are right that this could just be bad luck
>>> and doesn't actually mean that no packets are getting through.
>>> However, in practice mac80211 almost always disconnected us soon
>>> after. Didn't we test this pretty thoroughly?
>>
>> Yes, it appears mac80211 by default waits for 7 missed beacons before
>> sending the event. It then probes the AP (either nullfunc or probe
>> request) so apparently the connection could be recovered if the AP
>> responded. Unfortunately we don't get any notification in userspace if
>> the AP responded or not...
>
> So this magic here?
> https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/tree/net/mac80211/mlme.c#n3215
Yep
>
>>
>> I can't remember what hardware was tested. But there really wasn't a
>> consistent way to test this. The testing involved me disabling roaming
>> and walking away from the AP until I got disconnected. Sometimes this
>> was due to beacon loss, sometimes the AP disconnected explicitly. But
>> what I do remember is when beacon loss occurred, a local disconnect
>> followed near immediately. This is why (I think) we thought there was
>> no reason to handle this event.
>
> So what does your ath10k driver/hw do? Does it send nullfuncs or probe
> requests?
Based on the driver it sets REPORTS_TX_ACK_STATUS, so nullfunc. Which
would add an extra second before disconnect by default (2 nullfuncs
500ms apart).
>
>>
>>>
>>> My memory is fuzzy here, but I seem to recall that power save has an
>>> effect on how lost beacon events are treated by mac80211. Maybe your
>>> recent power save patches had an effect?
>>
>> From what I can tell in mac80211 power save doesn't change handling.
>> Its the driver that tells mac80211 of the beacon loss but maybe the
>> driver (or firmware) could handle it differently depending on power save.
>>
>> When I was watching this device power save was disabled.
>
> Okay, fair enough.
>
>>> If this is a driver behavior quirk, then this belongs in src/wiphy.c
>>> driver_infos table somehow. I'd really rather not add a bazillion
>>> config options that address the bug-of-the-day.
>>
>> Yeah, adding a driver specific quirk doesn't seem like the right route.
>>
>> I think for now there is no harm in trying to roam on beacon loss,
>> basically the same handling as packet loss. If a disconnect comes
>> immediately the scan would be canceled. Otherwise maybe we get lucky
>> and be able to roam.
>
> So the problem is, we had the _exact_ same behavior you're proposing
> here. We took it out. See commit:
> 836beb1276d1 ("station/wsc: remove beacon loss handling")
>
> So when we do that, alarm bells start going off. Why did we get rid of
> it if it was useful?
Huh, apparently it was handled previously. I really have no idea,
apparently I thought something changed in the kernel. Maybe based on
testing only certain hardware.
Looking back to kernel 5.3 (since 5.4+ was mentioned in that commit) I
really don't see much difference either. This device was on 5.11, and
again, not much different than current upstream in that regard.
>
> 7 consecutive lost beacons is actually a lot. That's ~700ms with no
> connection with default settings. And you can maintain the connection
> after that for another 5-6? Something smells fishy.
According to the logs, yes. Looking back to the logs in the commit it
was actually 4 seconds.
Still, I'm not sure how it could get that far without a second lost
beacon event (e.g. miss beacons, but get a nullfunc reply, miss 7 more,
nullfunc reply etc.). With a single event the longest it could get drug
out would be 1.7 seconds (7 beacons + 2 nullfuncs) before either
disconnecting or recovering but missing more beacons and generating an
event, unless there is some other reason=4 failure path I'm missing.
> If the kernel has a hard limit after which it expects the connection to
> be disconnected, we can start a timer for 2-4x that limit? Looks like
> kernel uses probe_wait_ms parameter for this with a default of 500ms.
> Is your setup using the default values for beacon_loss_count and
> probe_wait_ms?
Yep, everything is default as far as nl80211/driver options.
>
> Regards,
> -Denis
next prev parent reply other threads:[~2023-10-30 17:37 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-30 13:48 [PATCH 0/4] Packet/beacon loss roaming improvements James Prestwood
2023-10-30 13:48 ` [PATCH 1/4] station: rename ap_directed_roam to force_roam James Prestwood
2023-10-30 13:48 ` [PATCH 2/4] station: start roam on beacon loss event James Prestwood
2023-10-30 13:48 ` [PATCH 3/4] netdev: handle/send " James Prestwood
2023-10-30 13:48 ` [PATCH 4/4] station: rate limit packet loss roam scans James Prestwood
2023-10-30 14:48 ` Denis Kenzior
2023-10-30 15:00 ` [PATCH 0/4] Packet/beacon loss roaming improvements Denis Kenzior
2023-10-30 15:37 ` James Prestwood
2023-10-30 17:05 ` Denis Kenzior
2023-10-30 17:37 ` James Prestwood [this message]
2023-11-01 12:07 ` James Prestwood
2023-11-02 1:39 ` Denis Kenzior
2023-11-02 11:58 ` James Prestwood
2023-11-02 14:10 ` Denis Kenzior
2023-11-02 14:33 ` James Prestwood
2023-11-02 15:17 ` Denis Kenzior
2023-11-02 15:41 ` James Prestwood
2023-11-02 16:10 ` Denis Kenzior
2023-11-02 16:13 ` James Prestwood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com \
--to=prestwoj@gmail.com \
--cc=denkenz@gmail.com \
--cc=iwd@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.