From: James Prestwood <prestwoj@gmail.com>
To: Martin Petzold <martin.petzold@tavla.de>
Cc: iwd@lists.linux.dev, Arend Van Spriel <arend.vanspriel@broadcom.com>
Subject: Re: Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver
Date: Tue, 12 Nov 2024 04:13:23 -0800
Message-ID: <a2ef307c-e0a9-438f-9bc4-1076b7bc5c1c@gmail.com>
In-Reply-To: <7fa8c348-9834-4a63-a700-eb3117c4ae89@tavla.de>
Hi Martin,
On 11/12/24 1:15 AM, Martin Petzold wrote:
> Dear James,
>
> On 05.11.24 at 16:16, Martin Petzold wrote:
>> Dear James, dear Arend,
>>
>> On 05.11.24 at 14:14, James Prestwood wrote:
>>> Hi Martin,
>>>
>>> On 11/4/24 3:20 PM, James Prestwood wrote:
>>>> Hi Martin,
>>>>
>>>> On 11/4/24 2:42 PM, Martin Petzold wrote:
>>>>> Dear James,
>>>>>
>>>>> On 04.11.24 at 13:36, James Prestwood wrote:
>>>>>>
>>>>>> On 11/3/24 3:13 PM, Martin Petzold wrote:
>>>>>>> Dear James,
>>>>>>>
>>>>>>> On 25.10.24 at 17:17, James Prestwood wrote:
>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I open a new thread for this one: During the last weeks
>>>>>>>>>>>>>>>> I have seen connection losses for 30+ minutes,
>>>>>>>>>>>>>>>> sometimes even hours or just now even forever (IWD HEAD
>>>>>>>>>>>>>>>> with v2 OWE / BSS selection patches). Driver is
>>>>>>>>>>>>>>>> brcmfmac (NXP 6.1.36 kernel) and chip is BCM4339 (Laird
>>>>>>>>>>>>>>>> LWB5).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It happens in a) single router environment (WPA2-PSK;
>>>>>>>>>>>>>>>> Touchstone TG3442DE), and b) router + repeater
>>>>>>>>>>>>>>>> environment (WPA2 CCMP; Fritz!Box + Fritz!Repeater),
>>>>>>>>>>>>>>>> and maybe also in the WPA3 OWE Transition network
>>>>>>>>>>>>>>>> (yesterday lost a connection again).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have now again lost 2 of 10 devices in the WPA3 OWE network
>>>>>>>>>>>>>>> (with roaming). However, they no longer all disappear after
>>>>>>>>>>>>>>> a short while; it seems to happen later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also lost one device in a Router+Repeater WPA2 (CCMP)
>>>>>>>>>>>>>>> network. The router confirms that the device is
>>>>>>>>>>>>>>> disconnected, and has been for more than a day.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We can't do anything without logs. If you suspect it's the
>>>>>>>>>>>>>> blacklist you can lower the blacklist time in
>>>>>>>>>>>>>> main.conf:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [
>>>>>>>
>>>>>>> I am still losing devices. Sometimes they come back again, but
>>>>>>> mostly do not re-connect. I have observed the following:
>>>>>>>
>>>>>>> - Connection exists for several hours until about one day, or
>>>>>>> two. Then gone for several hours or mostly forever.
>>>>>>> - For FritzBox+FritzRepeater I have seen the connection coming
>>>>>>> back after like a day (here connection loss was also confirmed
>>>>>>> on router side!)
>>>>>>> - For the Aruba enterprise environment the connection never came
>>>>>>> back (until now no AP logs - waiting for an answer)
>>>>>>> - After reboot the connection comes back
>>>>>>> - It occurs only in an environment with multiple APs with same
>>>>>>> SSID (i.e. roaming environment), however my single AP
>>>>>>> environments have all strong signal
>>>>>>> - Some devices with identical configuration in this environment
>>>>>>> DO NOT get lost, those seem to have quite strong signal (maybe
>>>>>>> they don't roam)
>>>>>>> - Other devices in the same environment work without any
>>>>>>> problems (Intel+NetworkManager) and the APs are Aruba enterprise
>>>>>>> grade
>>>>>>> - I see almost the same in the Aruba enterprise environment, but
>>>>>>> ALSO in a FritzBox + FritzRepeater environment
>>>>>>> - We had a bug in our web socket connection, causing too many IWD
>>>>>>> requests. However, this was fixed. And why are all the other
>>>>>>> devices okay? Maybe a coincidence between roaming and something
>>>>>>> related to dropping and re-connecting the web socket connection.
>>>>>>>
>>>>>>> Please find attached my currently available debug logs (they are
>>>>>>> a few days old, but I am quite sure this is the connection loss
>>>>>>> situation). These logs are from the FritzBox+FritzRepeater
>>>>>>> environment. There are no brcmfmac messages (but also no special
>>>>>>> debug level configured here)!
>>>>>>>
>>>>>>> I have now also disabled WiFi power saving and will deploy to
>>>>>>> the environment... hoping for the best.
>>>>>>>
>>>>>>> Maybe you could check the logs and have an idea?
>>>>>>
>>>>>> Looks like the same thing as the last logs you sent. IWD tries to
>>>>>> connect (sends CMD_CONNECT to the kernel) but never gets the
>>>>>> corresponding CMD_CONNECT event, so IWD waits indefinitely for
>>>>>> it. This, again, looks like a driver problem because the kernel
>>>>>> is expected to tell userspace the result of the CMD_CONNECT
>>>>>> request.
>>>>>>
>>>>>> Only similarity I can see between the two sets of logs is there
>>>>>> is a failed connection just prior to the hang. IWD then attempts
>>>>>> to connect again but the 4-way handshake is never started and
>>>>>> this results in a failure with status 16 (group key handshake
>>>>>> timeout). In your latest set of logs IWD actually again tries to
>>>>>> connect to a different BSS and gets status 16 before trying yet
>>>>>> again and hanging.
>>>>>>
>>>>>> This actually seems similar to an issue I encountered with ath10k
>>>>>> where the network interface would time out being brought up.
>>>>>> Retrying would succeed but the driver would be in a similar state
>>>>>> where IWD could authenticate/associate but no data frames (i.e.
>>>>>> 4-way handshake) would be passed to userspace. Only solution
>>>>>> (until upstream fixed the bug) was to unload/reload the driver
>>>>>> when we detected this condition.
>>>>>>
>>>>>> If you are able to physically attach to a device currently in
>>>>>> this state you may be able to get more info. For example if IWD
>>>>>> is stuck like this try disconnecting/reconnecting with iwctl or
>>>>>> restarting IWD to see what happens. If you end up in the same
>>>>>> state right away I'm 99.9% sure the driver is the entire reason
>>>>>> you're running into this.
>>>>>
>>>>> Are you sure? Maybe you could double-check?
>>>> I'm sure.
>>>>>
>>>>> Because my SOM vendor (Variscite), which sells a few hundred
>>>>> thousand of these, does not report any issue with this kernel,
>>>>> firmware, and NetworkManager (wpa_supplicant)...
>>>>
>>>> Because wpa_supplicant sets internal timers for these commands in
>>>> case the driver is broken. I would expect you would see this exact
>>>> behavior with wpa_supplicant, it would just disconnect/reconnect
>>>> after 5 seconds of no response from the kernel. And something like
>>>> this either a) goes entirely unnoticed and/or b) works well enough
>>>> for a hardware vendor to ship it to customers and not care.
>>>>
>>>> This is the commit adding these timers to wpa_supplicant:
>>>>
>>>> commit e29853bbff1eef781099a9108e3b51f26b477ac3
>>>> Author: Ben Greear <greearb@candelatech.com>
>>>> Date: Thu Feb 24 16:59:46 2011 +0200
>>>>
>>>> SME: Add timers for authentication and asscoiation
>>>>
>>>> mac80211 authentication or association operation may get stuck for some
>>>> reasons, so wpa_supplicant better use an internal timer to recover from
>>>> this.
>>>>
>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>>
>>>> I wish it surprised me that 13 years later this behavior still
>>>> happens... We don't like adding special driver workarounds like
>>>> this in IWD because a) it becomes difficult to maintain and b) it
>>>> just hides the root cause and nobody ever fixes it. But my opinions
>>>> aside, for a driver like brcmfmac which is very mainstream, I guess
>>>> we have no choice but to adapt IWD to work around it like
>>>> wpa_supplicant does.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>
>>> I've sent a patch to the list which sets a timer within IWD in case
>>> the connect event never arrives. Note that I cannot test this beyond
>>> manually commenting out code to "trick" IWD into thinking this
>>> happens. Applying that patch to the brcmfmac client you're using is
>>> going to be the true test.
>>
>> I may be able to test it, however only together with wifi power save
>> disabled (I also prepared NetworkManager branch, because the customer
>> will kill me otherwise).
>>
>> @Arend: Maybe you could check all this, as it seems to be related to
>> some brcmfmac state. Just now I cannot provide logs, and your debug
>> level for brcmfmac produces a lot of data, which I somehow also
>> need to handle (limited space).
>
> With this patch and power save disabled, we have a better connection,
> but we still see carrier losses several times a day (sometimes for
> minutes, sometimes for hours). We think this is still due to the
> driver / daemon. We have checked AP logs and the infrastructure is
> Aruba. We will try to get some more debug logs from that environment.
>
> My feeling is that there could still be a corner case related to BSS
> selection in WPA3 OWE. Or it is about failed roaming attempts by the
> driver.
>
> We will now a) upgrade to 6.6x kernel (driver) and b) add another wifi
> chip (NXP IW611) to our board and test that one too.
>
> @James: Do you know if NXP IW611 works well with IWD?
Sorry, I have no experience with that chipset.
>
> Thanks,
>
> Martin
>
>
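For anyone following along, the timer workaround discussed earlier in this thread (arm a timer when the connect request is sent, and treat the attempt as failed if no connect event shows up) can be sketched roughly as below. This is only an illustration in Python, not the actual IWD patch or wpa_supplicant code: the class name, method names and the injected "now" clock are all hypothetical, and the 5 second value is just the figure mentioned above for wpa_supplicant.

```python
class ConnectWatchdog:
    """Hypothetical sketch: track whether a connect request has gone
    unanswered for too long. Not taken from IWD or wpa_supplicant."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.deadline = None  # None means no request is outstanding

    def request_sent(self, now):
        # Arm (or re-arm) the timer when the connect request is issued.
        self.deadline = now + self.timeout_s

    def event_received(self):
        # The driver answered, so the timer is no longer needed.
        self.deadline = None

    def expired(self, now):
        # True only if a request is outstanding and its deadline passed.
        return self.deadline is not None and now >= self.deadline


# Assumed value for illustration only (the 5 seconds mentioned above).
watchdog = ConnectWatchdog(timeout_s=5.0)
watchdog.request_sent(now=0.0)

if watchdog.expired(now=6.0):
    # A real daemon would fail the connection attempt here and retry,
    # instead of waiting forever for an event that never arrives.
    print("connect timed out, treating as failure")
```

In a real daemon the "now" argument would come from a monotonic clock and the check would be driven by the main loop's timer facility; passing the time in explicitly just keeps the sketch deterministic.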