All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Prestwood <prestwoj@gmail.com>
To: Denis Kenzior <denkenz@gmail.com>, iwd@lists.linux.dev
Subject: Re: [PATCH 0/4] Packet/beacon loss roaming improvements
Date: Mon, 30 Oct 2023 10:37:15 -0700	[thread overview]
Message-ID: <70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com> (raw)
In-Reply-To: <fb8dfeeb-3f5d-49d2-8a4e-063b4933d905@gmail.com>

Hi Denis,

On 10/30/23 10:05 AM, Denis Kenzior wrote:
> Hi James,
> 
> On 10/30/23 10:37, James Prestwood wrote:
>> Hi Denis,
>>
>> On 10/30/23 8:00 AM, Denis Kenzior wrote:
>>> Hi James,
>>>
>>>>
>>>> We were seeing beacon loss events not resulting in an immediate
>>>> disconnnect (as I have always expected), still eventually but after
>>>
>>> If I recall correctly, Lost Beacon is sent after several beacons in 
>>> succession were lost.  You are right that this could just be bad luck 
>>> and doesn't actually mean that no packets are getting through.  
>>> However, in practice mac80211 almost always disconnected us soon 
>>> after.  Didn't we test this pretty thoroughly?
>>
>> Yes, it appears mac80211 by default waits for 7 missed beacons before 
>> sending the event. It then probes the AP (either nullfunc or probe 
>> request) so apparently the connection could be recovered if the AP 
>> responded. Unfortunately we don't get any notification in userspace if 
>> the AP responded or not...
> 
> So this magic here?
> https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/tree/net/mac80211/mlme.c#n3215

Yep

> 
>>
>> I can't remember what hardware was tested. But there really wasn't a 
>> consistent way to test this. The testing involved me disabling roaming 
>> and walking away from the AP until I got disconnected. Sometimes this 
>> was due to beacon loss, sometimes the AP disconnected explicitly. But 
>> what I do remember is when beacon loss occurred, a local disconnect 
>> followed near immediately. This is why (I think) we thought there was 
>> no reason to handle this event.
> 
> So what does your ath10k driver/hw do?  Does it send nullfuncs or probe 
> requests?

Based on the driver it sets REPORTS_TX_ACK_STATUS, so nullfunc. Which 
would add an extra second before disconnect by default (2 nullfuncs 
500ms apart).

> 
>>
>>>
>>> My memory is fuzzy here, but I seem to recall that power save has an 
>>> effect on how lost beacon events are treated by mac80211.  Maybe your 
>>> recent power save patches had an effect?
>>
>>  From what I can tell in mac80211 power save doesn't change handling. 
>> Its the driver that tells mac80211 of the beacon loss but maybe the 
>> driver (or firmware) could handle it differently depending on power save.
>>
>> When I was watching this device power save was disabled.
> 
> Okay, fair enough.
> 
>>> If this is a driver behavior quirk, then this belongs in src/wiphy.c 
>>> driver_infos table somehow.  I'd really rather not add a bazillion 
>>> config options that address the bug-of-the-day.
>>
>> Yeah, adding a driver specific quirk doesn't seem like the right route.
>>
>> I think for now there is no harm in trying to roam on beacon loss, 
>> basically the same handling as packet loss. If a disconnect comes 
>> immediately the scan would be canceled. Otherwise maybe we get lucky 
>> and be able to roam.
> 
> So the problem is, we had the _exact_ same behavior you're proposing 
> here.  We took it out.  See commit:
> 836beb1276d1 ("station/wsc: remove beacon loss handling")
> 
> So when we do that, alarm bells start going off.  Why did we get rid of 
> it if it was useful?

Huh, apparently it was handled previously. I really have no idea, 
apparently I thought something changed in the kernel. Maybe based on 
testing only certain hardware.

Looking back to kernel 5.3 (since 5.4+ was mentioned in that commit) I 
really don't see much difference either. This device was on 5.11, and 
again, not much different than current upstream in that regard.

> 
> 7 consecutive lost beacons is actually a lot.  That's ~700ms with no 
> connection with default settings.  And you can maintain the connection 
> after that for another 5-6?  Something smells fishy.

According to the logs, yes. Looking back to the logs in the commit it 
was actually 4 seconds.

Still, I'm not sure how it could get that far without a second lost 
beacon event (e.g. miss beacons, but get a nullfunc reply, miss 7 more, 
nullfunc reply etc.). With a single event the longest it could get drug 
out would be 1.7 seconds (7 beacons + 2 nullfuncs) before either 
disconnecting or recovering but missing more beacons and generating an 
event, unless there is some other reason=4 failure path I'm missing.


> If the kernel has a hard limit after which it expects the connection to 
> be disconnected, we can start a timer for 2-4x that limit?  Looks like 
> kernel uses probe_wait_ms parameter for this with a default of 500ms.  
> Is your setup using the default values for beacon_loss_count and 
> probe_wait_ms?

Yep, everything is default as far as nl80211/driver options.

> 
> Regards,
> -Denis

  reply	other threads:[~2023-10-30 17:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30 13:48 [PATCH 0/4] Packet/beacon loss roaming improvements James Prestwood
2023-10-30 13:48 ` [PATCH 1/4] station: rename ap_directed_roam to force_roam James Prestwood
2023-10-30 13:48 ` [PATCH 2/4] station: start roam on beacon loss event James Prestwood
2023-10-30 13:48 ` [PATCH 3/4] netdev: handle/send " James Prestwood
2023-10-30 13:48 ` [PATCH 4/4] station: rate limit packet loss roam scans James Prestwood
2023-10-30 14:48   ` Denis Kenzior
2023-10-30 15:00 ` [PATCH 0/4] Packet/beacon loss roaming improvements Denis Kenzior
2023-10-30 15:37   ` James Prestwood
2023-10-30 17:05     ` Denis Kenzior
2023-10-30 17:37       ` James Prestwood [this message]
2023-11-01 12:07         ` James Prestwood
2023-11-02  1:39           ` Denis Kenzior
2023-11-02 11:58             ` James Prestwood
2023-11-02 14:10               ` Denis Kenzior
2023-11-02 14:33                 ` James Prestwood
2023-11-02 15:17                   ` Denis Kenzior
2023-11-02 15:41                     ` James Prestwood
2023-11-02 16:10                       ` Denis Kenzior
2023-11-02 16:13                         ` James Prestwood

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=70935a8f-1f38-4e9e-8d77-40179c2b31f3@gmail.com \
    --to=prestwoj@gmail.com \
    --cc=denkenz@gmail.com \
    --cc=iwd@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.