Wireless Daemon for Linux
 help / color / mirror / Atom feed
From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 0/4] Packet/beacon loss roaming improvements
Date: Wed, 1 Nov 2023 05:07:20 -0700	[thread overview]
Message-ID: <68d50637-4b8d-4690-bfac-e379e1044492@gmail.com> (raw)
In-Reply-To: <70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com>

Hi Denis,

On 10/30/23 10:37 AM, James Prestwood wrote:
> Hi Denis,
> 
> On 10/30/23 10:05 AM, Denis Kenzior wrote:
>> Hi James,
>>
>> On 10/30/23 10:37, James Prestwood wrote:
>>> Hi Denis,
>>>
>>> On 10/30/23 8:00 AM, Denis Kenzior wrote:
>>>> Hi James,
>>>>
>>>>>
>>>>> We were seeing beacon loss events not resulting in an immediate
>>>>> disconnnect (as I have always expected), still eventually but after
>>>>
>>>> If I recall correctly, Lost Beacon is sent after several beacons in 
>>>> succession were lost.  You are right that this could just be bad 
>>>> luck and doesn't actually mean that no packets are getting through. 
>>>> However, in practice mac80211 almost always disconnected us soon 
>>>> after.  Didn't we test this pretty thoroughly?
>>>
>>> Yes, it appears mac80211 by default waits for 7 missed beacons before 
>>> sending the event. It then probes the AP (either nullfunc or probe 
>>> request) so apparently the connection could be recovered if the AP 
>>> responded. Unfortunately we don't get any notification in userspace 
>>> if the AP responded or not...
>>
>> So this magic here?
>> https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/tree/net/mac80211/mlme.c#n3215
> 
> Yep
> 
>>
>>>
>>> I can't remember what hardware was tested. But there really wasn't a 
>>> consistent way to test this. The testing involved me disabling 
>>> roaming and walking away from the AP until I got disconnected. 
>>> Sometimes this was due to beacon loss, sometimes the AP disconnected 
>>> explicitly. But what I do remember is when beacon loss occurred, a 
>>> local disconnect followed near immediately. This is why (I think) we 
>>> thought there was no reason to handle this event.
>>
>> So what does your ath10k driver/hw do?  Does it send nullfuncs or 
>> probe requests?
> 
> Based on the driver it sets REPORTS_TX_ACK_STATUS, so nullfunc. Which 
> would add an extra second before disconnect by default (2 nullfuncs 
> 500ms apart).
> 
>>
>>>
>>>>
>>>> My memory is fuzzy here, but I seem to recall that power save has an 
>>>> effect on how lost beacon events are treated by mac80211.  Maybe 
>>>> your recent power save patches had an effect?
>>>
>>>  From what I can tell in mac80211 power save doesn't change handling. 
>>> Its the driver that tells mac80211 of the beacon loss but maybe the 
>>> driver (or firmware) could handle it differently depending on power 
>>> save.
>>>
>>> When I was watching this device power save was disabled.
>>
>> Okay, fair enough.
>>
>>>> If this is a driver behavior quirk, then this belongs in src/wiphy.c 
>>>> driver_infos table somehow.  I'd really rather not add a bazillion 
>>>> config options that address the bug-of-the-day.
>>>
>>> Yeah, adding a driver specific quirk doesn't seem like the right route.
>>>
>>> I think for now there is no harm in trying to roam on beacon loss, 
>>> basically the same handling as packet loss. If a disconnect comes 
>>> immediately the scan would be canceled. Otherwise maybe we get lucky 
>>> and be able to roam.
>>
>> So the problem is, we had the _exact_ same behavior you're proposing 
>> here.  We took it out.  See commit:
>> 836beb1276d1 ("station/wsc: remove beacon loss handling")
>>
>> So when we do that, alarm bells start going off.  Why did we get rid 
>> of it if it was useful?
> 
> Huh, apparently it was handled previously. I really have no idea, 
> apparently I thought something changed in the kernel. Maybe based on 
> testing only certain hardware.
> 
> Looking back to kernel 5.3 (since 5.4+ was mentioned in that commit) I 
> really don't see much difference either. This device was on 5.11, and 
> again, not much different than current upstream in that regard.
> 
>>
>> 7 consecutive lost beacons is actually a lot.  That's ~700ms with no 
>> connection with default settings.  And you can maintain the connection 
>> after that for another 5-6?  Something smells fishy.
> 
> According to the logs, yes. Looking back to the logs in the commit it 
> was actually 4 seconds.
> 
> Still, I'm not sure how it could get that far without a second lost 
> beacon event (e.g. miss beacons, but get a nullfunc reply, miss 7 more, 
> nullfunc reply etc.). With a single event the longest it could get drug 
> out would be 1.7 seconds (7 beacons + 2 nullfuncs) before either 
> disconnecting or recovering but missing more beacons and generating an 
> event, unless there is some other reason=4 failure path I'm missing.
> 
> 
>> If the kernel has a hard limit after which it expects the connection 
>> to be disconnected, we can start a timer for 2-4x that limit?  Looks 
>> like kernel uses probe_wait_ms parameter for this with a default of 
>> 500ms. Is your setup using the default values for beacon_loss_count 
>> and probe_wait_ms?
> 
> Yep, everything is default as far as nl80211/driver options.

I found an old thread I asked on linux-wireless back when I removed the 
beacon loss handling [1]. Johannes explained that behavior did 
apparently change in order to support power save (your memory was correct).

I identified the behavior change with hwsim, and apparently took his 
word when he said:

"I'm pretty sure real hardware will behave just like hwsim here, albeit 
perhaps a bit slower"

And I never added "proper" testing of beacon loss, i.e. block several 
beacons as opposed to just tearing down the AP. And this appears closer 
to the behavior I'm seeing in reality (AP not going down, just dropping 
beacons).

So this is my bad, I didn't take into account the situation of beacons 
being dropped but then being picked up again, or the nullfuncs/probes 
coming back successfully.

[1] 
https://lore.kernel.org/linux-wireless/ada14dfad76b93d654606c3b397de059d968096b.camel@gmail.com/

Thanks,
James
> 
>>
>> Regards,
>> -Denis

  reply	other threads:[~2023-11-01 12:07 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30 13:48 [PATCH 0/4] Packet/beacon loss roaming improvements James Prestwood
2023-10-30 13:48 ` [PATCH 1/4] station: rename ap_directed_roam to force_roam James Prestwood
2023-10-30 13:48 ` [PATCH 2/4] station: start roam on beacon loss event James Prestwood
2023-10-30 13:48 ` [PATCH 3/4] netdev: handle/send " James Prestwood
2023-10-30 13:48 ` [PATCH 4/4] station: rate limit packet loss roam scans James Prestwood
2023-10-30 14:48   ` Denis Kenzior
2023-10-30 15:00 ` [PATCH 0/4] Packet/beacon loss roaming improvements Denis Kenzior
2023-10-30 15:37   ` James Prestwood
2023-10-30 17:05     ` Denis Kenzior
2023-10-30 17:37       ` James Prestwood
2023-11-01 12:07         ` James Prestwood [this message]
2023-11-02  1:39           ` Denis Kenzior
2023-11-02 11:58             ` James Prestwood
2023-11-02 14:10               ` Denis Kenzior
2023-11-02 14:33                 ` James Prestwood
2023-11-02 15:17                   ` Denis Kenzior
2023-11-02 15:41                     ` James Prestwood
2023-11-02 16:10                       ` Denis Kenzior
2023-11-02 16:13                         ` James Prestwood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=68d50637-4b8d-4690-bfac-e379e1044492@gmail.com \
    --to=prestwoj@gmail.com \
    --cc=denkenz@gmail.com \
    --cc=iwd@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox