public inbox for iwd@lists.linux.dev
 help / color / mirror / Atom feed
From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 0/4] Packet/beacon loss roaming improvements
Date: Mon, 30 Oct 2023 10:37:15 -0700	[thread overview]
Message-ID: <70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com> (raw)
In-Reply-To: <fb8dfeeb-3f5d-49d2-8a4e-063b4933d905@gmail.com>

Hi Denis,

On 10/30/23 10:05 AM, Denis Kenzior wrote:
> Hi James,
> 
> On 10/30/23 10:37, James Prestwood wrote:
>> Hi Denis,
>>
>> On 10/30/23 8:00 AM, Denis Kenzior wrote:
>>> Hi James,
>>>
>>>>
>>>> We were seeing beacon loss events not resulting in an immediate
>>>> disconnnect (as I have always expected), still eventually but after
>>>
>>> If I recall correctly, Lost Beacon is sent after several beacons in 
>>> succession were lost.  You are right that this could just be bad luck 
>>> and doesn't actually mean that no packets are getting through.  
>>> However, in practice mac80211 almost always disconnected us soon 
>>> after.  Didn't we test this pretty thoroughly?
>>
>> Yes, it appears mac80211 by default waits for 7 missed beacons before 
>> sending the event. It then probes the AP (either nullfunc or probe 
>> request) so apparently the connection could be recovered if the AP 
>> responded. Unfortunately we don't get any notification in userspace if 
>> the AP responded or not...
> 
> So this magic here?
> https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/tree/net/mac80211/mlme.c#n3215

Yep

> 
>>
>> I can't remember what hardware was tested. But there really wasn't a 
>> consistent way to test this. The testing involved me disabling roaming 
>> and walking away from the AP until I got disconnected. Sometimes this 
>> was due to beacon loss, sometimes the AP disconnected explicitly. But 
>> what I do remember is when beacon loss occurred, a local disconnect 
>> followed near immediately. This is why (I think) we thought there was 
>> no reason to handle this event.
> 
> So what does your ath10k driver/hw do?  Does it send nullfuncs or probe 
> requests?

Based on the driver it sets REPORTS_TX_ACK_STATUS, so nullfunc. Which 
would add an extra second before disconnect by default (2 nullfuncs 
500ms apart).

> 
>>
>>>
>>> My memory is fuzzy here, but I seem to recall that power save has an 
>>> effect on how lost beacon events are treated by mac80211.  Maybe your 
>>> recent power save patches had an effect?
>>
>>  From what I can tell in mac80211 power save doesn't change handling. 
>> Its the driver that tells mac80211 of the beacon loss but maybe the 
>> driver (or firmware) could handle it differently depending on power save.
>>
>> When I was watching this device power save was disabled.
> 
> Okay, fair enough.
> 
>>> If this is a driver behavior quirk, then this belongs in src/wiphy.c 
>>> driver_infos table somehow.  I'd really rather not add a bazillion 
>>> config options that address the bug-of-the-day.
>>
>> Yeah, adding a driver specific quirk doesn't seem like the right route.
>>
>> I think for now there is no harm in trying to roam on beacon loss, 
>> basically the same handling as packet loss. If a disconnect comes 
>> immediately the scan would be canceled. Otherwise maybe we get lucky 
>> and be able to roam.
> 
> So the problem is, we had the _exact_ same behavior you're proposing 
> here.  We took it out.  See commit:
> 836beb1276d1 ("station/wsc: remove beacon loss handling")
> 
> So when we do that, alarm bells start going off.  Why did we get rid of 
> it if it was useful?

Huh, apparently it was handled previously. I really have no idea, 
apparently I thought something changed in the kernel. Maybe based on 
testing only certain hardware.

Looking back to kernel 5.3 (since 5.4+ was mentioned in that commit) I 
really don't see much difference either. This device was on 5.11, and 
again, not much different than current upstream in that regard.

> 
> 7 consecutive lost beacons is actually a lot.  That's ~700ms with no 
> connection with default settings.  And you can maintain the connection 
> after that for another 5-6?  Something smells fishy.

According to the logs, yes. Looking back to the logs in the commit it 
was actually 4 seconds.

Still, I'm not sure how it could get that far without a second lost 
beacon event (e.g. miss beacons, but get a nullfunc reply, miss 7 more, 
nullfunc reply etc.). With a single event the longest it could get drug 
out would be 1.7 seconds (7 beacons + 2 nullfuncs) before either 
disconnecting or recovering but missing more beacons and generating an 
event, unless there is some other reason=4 failure path I'm missing.


> If the kernel has a hard limit after which it expects the connection to 
> be disconnected, we can start a timer for 2-4x that limit?  Looks like 
> kernel uses probe_wait_ms parameter for this with a default of 500ms.  
> Is your setup using the default values for beacon_loss_count and 
> probe_wait_ms?

Yep, everything is default as far as nl80211/driver options.

> 
> Regards,
> -Denis

  reply	other threads:[~2023-10-30 17:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30 13:48 [PATCH 0/4] Packet/beacon loss roaming improvements James Prestwood
2023-10-30 13:48 ` [PATCH 1/4] station: rename ap_directed_roam to force_roam James Prestwood
2023-10-30 13:48 ` [PATCH 2/4] station: start roam on beacon loss event James Prestwood
2023-10-30 13:48 ` [PATCH 3/4] netdev: handle/send " James Prestwood
2023-10-30 13:48 ` [PATCH 4/4] station: rate limit packet loss roam scans James Prestwood
2023-10-30 14:48   ` Denis Kenzior
2023-10-30 15:00 ` [PATCH 0/4] Packet/beacon loss roaming improvements Denis Kenzior
2023-10-30 15:37   ` James Prestwood
2023-10-30 17:05     ` Denis Kenzior
2023-10-30 17:37       ` James Prestwood [this message]
2023-11-01 12:07         ` James Prestwood
2023-11-02  1:39           ` Denis Kenzior
2023-11-02 11:58             ` James Prestwood
2023-11-02 14:10               ` Denis Kenzior
2023-11-02 14:33                 ` James Prestwood
2023-11-02 15:17                   ` Denis Kenzior
2023-11-02 15:41                     ` James Prestwood
2023-11-02 16:10                       ` Denis Kenzior
2023-11-02 16:13                         ` James Prestwood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com \
    --to=prestwoj@gmail.com \
    --cc=denkenz@gmail.com \
    --cc=iwd@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox