From: James Prestwood <prestwoj@gmail.com>
To: Martin Petzold <martin.petzold@tavla.de>
Cc: iwd@lists.linux.dev, Arend Van Spriel <arend.vanspriel@broadcom.com>
Subject: Re: Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver
Date: Wed, 6 Nov 2024 13:35:15 -0800
Message-ID: <22221676-2961-4dc9-a1ae-1ffbaed34316@gmail.com>
In-Reply-To: <b718a3d3-aa26-4b95-9b16-f4c9acd27663@tavla.de>
Hi Martin,
On 11/6/24 12:32 PM, Martin Petzold wrote:
> Dear James,
>
> On 04.11.24 at 13:36, James Prestwood wrote:
>>
>> On 11/3/24 3:13 PM, Martin Petzold wrote:
>>> Dear James,
>>>
>>> On 25.10.24 at 17:17, James Prestwood wrote:
>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I open a new thread for this one: During the last weeks I
>>>>>>>>>>>> have seen connection losses for 30+ minutes, sometimes even
>>>>>>>>>>>> hours or just now even forever (IWD HEAD with v2 OWE / BSS
>>>>>>>>>>>> selection patches). Driver is brcmfmac (NXP 6.1.36 kernel)
>>>>>>>>>>>> and chip is BCM4339 (Laird LWB5).
>>>>>>>>>>>>
>>>>>>>>>>>> It happens in a) single router environment (WPA2-PSK;
>>>>>>>>>>>> Touchstone TG3442DE), and b) router + repeater environment
>>>>>>>>>>>> (WPA2 CCMP; Fritz!Box + Fritz!Repeater), and maybe also in
>>>>>>>>>>>> the WPA3 OWE Transition network (yesterday lost a
>>>>>>>>>>>> connection again).
>>>>>>>>>>>
>>>>>>>>>>> I lost now again 2 of 10 devices in the WPA3 OWE network
>>>>>>>>>>> (with roaming). However, now they don't disappear all after
>>>>>>>>>>> a shorter while. It seems to be later.
>>>>>>>>>>>
>>>>>>>>>>> I also lost one device in a Router+Repeater WPA2 (CCMP)
>>>>>>>>>>> network. It is confirmed here on router side, that the
>>>>>>>>>>> device is disconnected. Since more than a day.
>>>>>>>>>>
>>>>>>>>>> We can't do anything without logs. If you suspect it's the
>>>>>>>>>> blacklist you can lower the blacklist time in main.conf:
>>>>>>>>>>
>>>>>>>>>> [
>>>
>>> I am still losing devices. Sometimes they come back again, but
>>> mostly do not re-connect. I have observed the following:
>>>
>>> - Connection exists for several hours until about one day, or two.
>>> Then gone for several hours or mostly forever.
>>> - For FritzBox+FritzRepeater I have seen the connection coming back
>>> after like a day (here connection loss was also confirmed on router
>>> side!)
>>> - For the Aruba enterprise environment the connection never came
>>> back (until now no AP logs - waiting for an answer)
>>> - After reboot the connection comes back
>>> - It occurs only in an environment with multiple APs with same SSID
>>> (i.e. roaming environment), however my single AP environments have
>>> all strong signal
>>> - Some devices with identical configuration in this environment DO
>>> NOT get lost, those seem to have quite strong signal (maybe they
>>> don't roam)
>>> - Other devices in the same environment work without any problems
>>> (Intel+NetworkManager) and the APs are Aruba enterprise grade
>>> - I see almost the same in the Aruba enterprise environment, but
>>> ALSO in a FritzBox + FritzRepeater environment
>>> - We had a bug in our web socket connection, causing too many IWD
>>> requests. However, this was fixed. And why are all the other devices
>>> okay? Maybe a coincidence with roaming and anything related to
>>> dropping and re-connecting the web socket connection.
>>>
>>> Please find attached my currently available debug logs (they are a
>>> few days old, but I am quite sure this is the connection loss
>>> situation). These logs are from the FritzBox+FritzRepeater
>>> environment. There are no brcmfmac messages (but also no special
>>> debug level configured here)!
>>>
>>> I have now also disabled WiFi power saving and will deploy to the
>>> environment...hoping the best.
>>>
>>> Maybe you could check the logs and have an idea?
>>
>> Looks like the same thing as the last logs you sent. IWD tries to
>> connect (sends CMD_CONNECT to the kernel) but gets no associated
>> CMD_CONNECT event after that, which causes IWD to wait indefinitely
>> for that event. This, again, appears to be a driver problem because
>> it's expected that the kernel tells userspace the result of the
>> CMD_CONNECT request.
>>
>> Only similarity I can see between the two sets of logs is there is a
>> failed connection just prior to the hang. IWD then attempts to
>> connect again but the 4-way handshake is never started and this
>> results in a failure with status 16 (group key handshake timeout). In
>> your latest set of logs IWD actually again tries to connect to a
>> different BSS and gets status 16 before trying yet again and hanging.
>>
>> This actually seems similar to an issue I encountered with ath10k
>> where the network interface would time out being brought up. Retrying
>> would succeed but the driver would be in a similar state where IWD
>> could authenticate/associate but no data frames (i.e. 4-way
>> handshake) would be passed to userspace. Only solution (until
>> upstream fixed the bug) was to unload/reload the driver when we
>> detected this condition.
>>
>> If you are able to physically attach to a device currently in this
>> state you may be able to get more info. For example, if IWD is stuck
>> like this try disconnecting/reconnecting with iwctl or restarting IWD
>> to see what happens. If you end up in the same state right away I'm
>> 99.9% sure the driver is the entire reason you're running into this.
>
> -----
>
> Okt 24 02:30:33 tavla iwd[386]: src/netdev.c:netdev_mlme_notify() MLME
> notification Disconnect(48)
> Okt 24 02:30:33 tavla iwd[386]: src/netdev.c:netdev_disconnect_event()
> Okt 24 02:30:33 tavla iwd[386]: Received Deauthentication event,
> reason: 8, from_ap: true
>
> -----
>
> From where does this MLME notification Disconnect come before the AP
> Deauth event?
It's all part of the same call stack. The kernel sent a disconnect
notification and IWD handles it.
>
> And does reason 8 mean "Disassociated because sending station is
> leaving (or has left)" (which sounds like originally station decision)
> OR "AP moved the client to another access point using non-aggressive
> load balancing" (which then seems to be an AP decision)?
I've learned to take any disconnect reason code the AP sends with a huge
grain of salt. I work with devices running under various professional
WiFi network vendors (Cisco, Meraki, Aruba, etc.) and I've seen all kinds
of reason codes that make little to no sense at all. I've seen reason
code 8 specifically with Cisco APs and actually sat on calls with Cisco
engineers asking them why. They weren't able to give me a reason, and even
denied that reason code 8 was possible with their equipment (until I
showed them an OTA packet capture of it happening). Anyway, what I'm
saying is you could spend an eternity trying to make sense of why or how
a client gets disconnected by an AP, but it's generally not worth the time
(unless it's happening very frequently). If the disconnects are
infrequent, and the client recovers quickly, this is about par for the
course in my experience.
>
> Because we are talking about signals around -70. What would the event
> be, if the AP finds the station unacceptable because of bad signal
> strength (so this is signal in direction to AP). Or doesn't this exist?
You'd have to ask the AP vendors. IWD is just notified that it was
disconnected, along with a reason code. When you see "from_ap: true" all
we know is that the disconnect was initiated by the AP.
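
If you want to keep an eye on how often this happens, a rough sketch
along these lines could pull the reason code and the from_ap flag out of
the IWD debug output. It assumes the message format shown in your log
excerpt above; the function name parse_deauth and the REASON_CODES table
are made up for illustration and are not part of IWD. The table is only
a small subset of the IEEE 802.11 reason codes with paraphrased
descriptions, and as said above, what a given AP means by a code may
differ from the standard text.

```python
import re

# Small, illustrative subset of IEEE 802.11 reason codes (paraphrased).
# This table is an assumption for the example, not something IWD exports.
REASON_CODES = {
    1: "Unspecified reason",
    2: "Previous authentication no longer valid",
    3: "Deauthenticated because sending station is leaving (or has left)",
    4: "Disassociated due to inactivity",
    8: "Disassociated because sending station is leaving (or has left)",
    15: "4-way handshake timeout",
    16: "Group key handshake timeout",
}

# Matches the message format seen in the quoted log excerpt.
DEAUTH_RE = re.compile(
    r"Received Deauthentication event,\s+reason:\s*(\d+),\s*from_ap:\s*(true|false)"
)


def parse_deauth(log_text):
    """Return a list of (reason, from_ap, description) tuples."""
    # Long journal lines may be wrapped; collapse all whitespace so a
    # message split across two lines still matches the pattern.
    flat = " ".join(log_text.split())
    events = []
    for match in DEAUTH_RE.finditer(flat):
        reason = int(match.group(1))
        from_ap = match.group(2) == "true"
        description = REASON_CODES.get(reason, "unknown reason code")
        events.append((reason, from_ap, description))
    return events


sample = """Okt 24 02:30:33 tavla iwd[386]: Received Deauthentication event,
reason: 8, from_ap: true"""

for reason, from_ap, description in parse_deauth(sample):
    origin = "AP" if from_ap else "station"
    print(f"reason {reason} ({description}), initiated by {origin}")
```

Counting these per day would at least tell you whether the AP-initiated
disconnects are frequent enough to be worth chasing with the vendor.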
>
> Thanks,
>
> Martin
>