From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 0/4] Packet/beacon loss roaming improvements
Date: Mon, 30 Oct 2023 10:37:15 -0700
Message-ID: <70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com>
In-Reply-To: <fb8dfeeb-3f5d-49d2-8a4e-063b4933d905@gmail.com>

Hi Denis,

On 10/30/23 10:05 AM, Denis Kenzior wrote:
> Hi James,
>
> On 10/30/23 10:37, James Prestwood wrote:
>> Hi Denis,
>>
>> On 10/30/23 8:00 AM, Denis Kenzior wrote:
>>> Hi James,
>>>
>>>>
>>>> We were seeing beacon loss events not resulting in an immediate
>>>> disconnect (as I have always expected), still eventually but after
>>>
>>> If I recall correctly, Lost Beacon is sent after several beacons in
>>> succession were lost. You are right that this could just be bad luck
>>> and doesn't actually mean that no packets are getting through.
>>> However, in practice mac80211 almost always disconnected us soon
>>> after. Didn't we test this pretty thoroughly?
>>
>> Yes, it appears mac80211 by default waits for 7 missed beacons before
>> sending the event. It then probes the AP (either nullfunc or probe
>> request) so apparently the connection could be recovered if the AP
>> responded. Unfortunately we don't get any notification in userspace if
>> the AP responded or not...
>
> So this magic here?
> https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/tree/net/mac80211/mlme.c#n3215
Yep
>
>>
>> I can't remember what hardware was tested. But there really wasn't a
>> consistent way to test this. The testing involved me disabling roaming
>> and walking away from the AP until I got disconnected. Sometimes this
>> was due to beacon loss, sometimes the AP disconnected explicitly. But
>> what I do remember is when beacon loss occurred, a local disconnect
>> followed near immediately. This is why (I think) we thought there was
>> no reason to handle this event.
>
> So what does your ath10k driver/hw do? Does it send nullfuncs or probe
> requests?
Judging by the driver source, it sets REPORTS_TX_ACK_STATUS, so mac80211
sends nullfuncs. By default that adds an extra second before the
disconnect (2 nullfuncs, 500ms apart).
>
>>
>>>
>>> My memory is fuzzy here, but I seem to recall that power save has an
>>> effect on how lost beacon events are treated by mac80211. Maybe your
>>> recent power save patches had an effect?
>>
>> From what I can tell, in mac80211 power save doesn't change handling.
>> It's the driver that tells mac80211 of the beacon loss but maybe the
>> driver (or firmware) could handle it differently depending on power save.
>>
>> When I was watching this device power save was disabled.
>
> Okay, fair enough.
>
>>> If this is a driver behavior quirk, then this belongs in src/wiphy.c
>>> driver_infos table somehow. I'd really rather not add a bazillion
>>> config options that address the bug-of-the-day.
>>
>> Yeah, adding a driver specific quirk doesn't seem like the right route.
>>
>> I think for now there is no harm in trying to roam on beacon loss,
>> basically the same handling as packet loss. If a disconnect comes
>> immediately the scan would be canceled. Otherwise maybe we get lucky
>> and are able to roam.
>
> So the problem is, we had the _exact_ same behavior you're proposing
> here. We took it out. See commit:
> 836beb1276d1 ("station/wsc: remove beacon loss handling")
>
> So when we do that, alarm bells start going off. Why did we get rid of
> it if it was useful?
Huh, apparently it was handled previously. I really have no idea why it
was removed. I apparently thought something had changed in the kernel,
maybe based on testing only certain hardware.

Looking back to kernel 5.3 (since 5.4+ was mentioned in that commit) I
really don't see much difference either. This device was on 5.11, which,
again, is not much different from current upstream in that regard.
>
> 7 consecutive lost beacons is actually a lot. That's ~700ms with no
> connection with default settings. And you can maintain the connection
> after that for another 5-6? Something smells fishy.
According to the logs, yes. Looking back at the logs in the commit, it
was actually 4 seconds.

Still, I'm not sure how it could get that far without a second lost
beacon event (e.g. miss beacons, but get a nullfunc reply, miss 7 more,
nullfunc reply etc.). With a single event the longest it could be
dragged out is 1.7 seconds (7 beacons + 2 nullfuncs) before either
disconnecting, or recovering and then missing more beacons and
generating another event, unless there is some other reason=4 failure
path I'm missing.
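
To spell out where the 1.7 seconds comes from, here is a rough sketch of
the arithmetic. The function name is made up for illustration, and the
100ms beacon interval is an assumption (matching the ~700ms figure for 7
beacons); none of this is read from the kernel:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not iwd or kernel code): worst-case time in ms
 * between the first missed beacon and mac80211 either recovering or
 * dropping the link, for a single beacon-loss event.
 */
static uint32_t beacon_loss_window_ms(uint32_t beacon_loss_count,
					uint32_t beacon_interval_ms,
					uint32_t nullfunc_tries,
					uint32_t probe_wait_ms)
{
	/* Time spent missing beacons before the event is generated */
	uint32_t missed_ms = beacon_loss_count * beacon_interval_ms;
	/* Time spent waiting on unanswered nullfunc probes afterwards */
	uint32_t probing_ms = nullfunc_tries * probe_wait_ms;

	return missed_ms + probing_ms;
}
```

With the defaults discussed here, beacon_loss_window_ms(7, 100, 2, 500)
gives 1700, i.e. 700ms of missed beacons plus 1000ms of nullfunc probing.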
> If the kernel has a hard limit after which it expects the connection to
> be disconnected, we can start a timer for 2-4x that limit? Looks like
> kernel uses probe_wait_ms parameter for this with a default of 500ms.
> Is your setup using the default values for beacon_loss_count and
> probe_wait_ms?
Yep, everything is default as far as nl80211/driver options.
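
For the timer idea, a minimal sketch of how a userspace timeout could be
derived. This is only an illustration: the helper name is hypothetical,
and it assumes "the limit" means the whole probing window (number of
tries times probe_wait_ms) rather than a single probe_wait_ms:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not existing iwd code): timeout in ms to wait
 * after a beacon-loss event before assuming the kernel is not going to
 * disconnect. The kernel limit is assumed to be tries * probe_wait_ms.
 */
static uint32_t beacon_loss_timeout_ms(uint32_t probe_wait_ms,
					uint32_t tries,
					uint32_t multiplier)
{
	uint32_t kernel_limit_ms = tries * probe_wait_ms;

	return kernel_limit_ms * multiplier;
}
```

Under those assumptions, the 2-4x range with default values (500ms, 2
nullfuncs) works out to a timeout between 2000ms and 4000ms.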
>
> Regards,
> -Denis