From: James Prestwood <prestwoj@gmail.com>
To: Martin Petzold <martin.petzold@tavla.de>
Cc: iwd@lists.linux.dev, Arend Van Spriel <arend.vanspriel@broadcom.com>
Subject: Re: Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver
Date: Tue, 5 Nov 2024 05:14:24 -0800
Message-ID: <43124b6c-cb86-4ac3-b632-01ed9322d685@gmail.com>
In-Reply-To: <49b9d8b9-2769-4c65-8f10-4ffafc822885@gmail.com>

Hi Martin,

On 11/4/24 3:20 PM, James Prestwood wrote:
> Hi Martin,
>
> On 11/4/24 2:42 PM, Martin Petzold wrote:
>> Dear James,
>>
>> On 04.11.24 at 13:36, James Prestwood wrote:
>>>
>>> On 11/3/24 3:13 PM, Martin Petzold wrote:
>>>> Dear James,
>>>>
>>>> On 25.10.24 at 17:17, James Prestwood wrote:
>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I open a new thread for this one: During the last weeks I
>>>>>>>>>>>>> have seen connection losses for 30+ minutes, sometimes
>>>>>>>>>>>>> even hours or just now even forever (IWD HEAD with v2 OWE
>>>>>>>>>>>>> / BSS selection patches). Driver is brcmfmac (NXP 6.1.36
>>>>>>>>>>>>> kernel) and chip is BCM4339 (Laird LWB5).
>>>>>>>>>>>>>
>>>>>>>>>>>>> It happens in a) single router environment (WPA2-PSK;
>>>>>>>>>>>>> Touchstone TG3442DE), and b) router + repeater environment
>>>>>>>>>>>>> (WPA2 CCMP; Fritz!Box + Fritz!Repeater), and maybe also in
>>>>>>>>>>>>> the WPA3 OWE Transition network (yesterday lost a
>>>>>>>>>>>>> connection again).
>>>>>>>>>>>>
>>>>>>>>>>>> I lost now again 2 of 10 devices in the WPA3 OWE network
>>>>>>>>>>>> (with roaming). However, now they don't disappear all after
>>>>>>>>>>>> a shorter while. It seems to be later.
>>>>>>>>>>>>
>>>>>>>>>>>> I also lost one device in a Router+Repeater WPA2 (CCMP)
>>>>>>>>>>>> network. It is confirmed here on router side, that the
>>>>>>>>>>>> device is disconnected. Since more than a day.
>>>>>>>>>>>
>>>>>>>>>>> We can't do anything without logs. If you suspect it's the
>>>>>>>>>>> blacklist you can lower the blacklist time in main.conf:
>>>>>>>>>>>
>>>>>>>>>>> [
>>>>
>>>> I am still losing devices. Sometimes they come back again, but
>>>> mostly do not re-connect. I have observed the following:
>>>>
>>>> - The connection lasts from several hours up to a day or two,
>>>> then is gone for several hours or, mostly, forever.
>>>> - For FritzBox+FritzRepeater I have seen the connection coming back
>>>> after like a day (here connection loss was also confirmed on router
>>>> side!)
>>>> - For the Aruba enterprise environment the connection never came
>>>> back (until now no AP logs - waiting for an answer)
>>>> - After reboot the connection comes back
>>>> - It occurs only in an environment with multiple APs with same SSID
>>>> (i.e. roaming environment), however my single AP environments have
>>>> all strong signal
>>>> - Some devices with identical configuration in this environment DO
>>>> NOT get lost, those seem to have quite strong signal (maybe they
>>>> don't roam)
>>>> - Other devices in the same environment work without any problems
>>>> (Intel+NetworkManager) and the APs are Aruba enterprise grade
>>>> - I see almost the same in the Aruba enterprise environment, but
>>>> ALSO in a FritzBox + FritzRepeater environment
>>>> - We had a bug in our web socket connection, causing too many IWD
>>>> requests. However, this was fixed. And why are all the other
>>>> devices okay? Maybe a coincidence with roaming and something related
>>>> to dropping and re-connecting the web socket connection.
>>>>
>>>> Please find attached my currently available debug logs (they are a
>>>> few days old, but I am quite sure this is the connection loss
>>>> situation). These logs are from the FritzBox+FritzRepeater
>>>> environment. There are no brcmfmac messages (but also no special
>>>> debug level configured here)!
>>>>
>>>> I have now also disabled WiFi power saving and will deploy to the
>>>> environment...hoping for the best.
>>>>
>>>> Maybe you could check the logs and have an idea?
>>>
>>> Looks like the same thing as the last logs you sent. IWD tries to
>>> connect (sends CMD_CONNECT to the kernel) but gets no associated
>>> CMD_CONNECT event after that which causes IWD to wait indefinitely
>>> for that event. This, again, looks like a driver problem because
>>> it's expected that the kernel tells userspace the result of the
>>> CMD_CONNECT request.
>>>
>>> Only similarity I can see between the two sets of logs is there is a
>>> failed connection just prior to the hang. IWD then attempts to
>>> connect again but the 4-way handshake is never started and this
>>> results in a failure with status 16 (group key handshake timeout).
>>> In your latest set of logs IWD actually again tries to connect to a
>>> different BSS and gets status 16 before trying yet again and hanging.
>>>
>>> This actually seems similar to an issue I encountered with ath10k
>>> where the network interface would time out being brought up.
>>> Retrying would succeed but the driver would be in a similar state
>>> where IWD could authenticate/associate but no data frames (i.e.
>>> 4-way handshake) would be passed to userspace. Only solution (until
>>> upstream fixed the bug) was to unload/reload the driver when we
>>> detected this condition.
>>>
>>> If you are able to physically attach to a device currently in this
>>> state you may be able to get more info. For example if IWD is stuck
>>> like this try disconnecting/reconnecting with iwctl or restarting
>>> IWD to see what happens. If you end up in the same state right away
>>> I'm 99.9% sure the driver is the entire reason you're running into this.
>>
>> Are you sure? Maybe you could double-check?
> I'm sure.
>>
>> Because my SOM vendor (Variscite), which sells a few hundred thousand
>> of these, does not report any issue with this kernel, firmware, and
>> NetworkManager (wpa_supplicant)...
>
> Because wpa_supplicant sets internal timers for these commands in case
> the driver is broken. I would expect you would see this exact behavior
> with wpa_supplicant, it would just disconnect/reconnect after 5
> seconds of no response from the kernel. And something like this either
> a) goes entirely unnoticed and/or b) works well enough for a hardware
> vendor to ship it to customers and not care.
>
> This is the commit adding these timers to wpa_supplicant:
>
> commit e29853bbff1eef781099a9108e3b51f26b477ac3
> Author: Ben Greear <greearb@candelatech.com>
> Date: Thu Feb 24 16:59:46 2011 +0200
>
> SME: Add timers for authentication and asscoiation
>
> mac80211 authentication or association operation may get stuck for some
> reasons, so wpa_supplicant better use an internal timer to recover from
> this.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
>
> I wish it surprised me that 13 years later this behavior still
> happens... We don't like adding special driver workarounds like this
> in IWD because a) it becomes difficult to maintain and b) it just
> hides the root cause and nobody ever fixes it. But my opinions aside,
> for a driver like brcmfmac which is very mainstream, I guess we have
> no choice but to adapt IWD to work around it like wpa_supplicant does.
>
> Thanks,
>
> James

I've sent a patch to the list which sets a timer within IWD in case the
connect event never arrives. Note that I cannot test this beyond
manually commenting out code to "trick" IWD into thinking this happens.
Applying that patch to the brcmfmac client you're using is going to be the
true test.
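
For illustration only, and not the patch itself: a minimal Python sketch
of the general idea, i.e. arm a timer when the connect request is sent,
disarm it when any connect event arrives, and treat expiry as a failed
connection so the caller can retry. The class name, method names, and the
5 second value are assumptions made up for this example. IWD is written in
C and the actual patch may look quite different.

```python
from typing import Callable, Optional

# Assumed value for this sketch only; not taken from the IWD patch.
CONNECT_TIMEOUT_S = 5.0


class ConnectWatchdog:
    """Hypothetical helper: tracks one outstanding connect request."""

    def __init__(self, timeout_s: float, on_timeout: Callable[[], None]) -> None:
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.deadline: Optional[float] = None

    def connect_sent(self, now: float) -> None:
        # Arm the timer when the connect request is handed to the kernel.
        self.deadline = now + self.timeout_s

    def connect_event(self) -> None:
        # The driver answered (success or failure): disarm the timer.
        self.deadline = None

    def tick(self, now: float) -> bool:
        # Called from the main loop; fires at most once per request.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.on_timeout()
            return True
        return False


if __name__ == "__main__":
    log = []
    wd = ConnectWatchdog(CONNECT_TIMEOUT_S, lambda: log.append("connect timed out"))
    wd.connect_sent(now=0.0)
    wd.tick(now=4.9)  # still waiting for the event
    wd.tick(now=5.0)  # no event arrived: treat as a failed connect
    print(log)
```

The clock is passed in explicitly so the behavior can be exercised without
real hardware, which is roughly the same limitation described above: the
stuck-driver case can only be simulated until it is tried on an affected
device.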

Thanks,

James