public inbox for iwd@lists.linux.dev
From: James Prestwood <prestwoj@gmail.com>
To: Martin Petzold <martin.petzold@tavla.de>
Cc: iwd@lists.linux.dev, Arend Van Spriel <arend.vanspriel@broadcom.com>
Subject: Re: Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver
Date: Tue, 12 Nov 2024 04:13:23 -0800
Message-ID: <a2ef307c-e0a9-438f-9bc4-1076b7bc5c1c@gmail.com>
In-Reply-To: <7fa8c348-9834-4a63-a700-eb3117c4ae89@tavla.de>

Hi Martin,

On 11/12/24 1:15 AM, Martin Petzold wrote:
> Dear James,
>
> On 05.11.24 at 16:16, Martin Petzold wrote:
>> Dear James, dear Arend,
>>
>> On 05.11.24 at 14:14, James Prestwood wrote:
>>> Hi Martin,
>>>
>>> On 11/4/24 3:20 PM, James Prestwood wrote:
>>>> Hi Martin,
>>>>
>>>> On 11/4/24 2:42 PM, Martin Petzold wrote:
>>>>> Dear James,
>>>>>
>>>>> On 04.11.24 at 13:36, James Prestwood wrote:
>>>>>>
>>>>>> On 11/3/24 3:13 PM, Martin Petzold wrote:
>>>>>>> Dear James,
>>>>>>>
>>>>>>> On 25.10.24 at 17:17, James Prestwood wrote:
>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I open a new thread for this one: During the last weeks 
>>>>>>>>>>>>>>>> I have seen connection losses for 30+ minutes, 
>>>>>>>>>>>>>>>> sometimes even hours or just now even forever (IWD HEAD 
>>>>>>>>>>>>>>>> with v2 OWE / BSS selection patches). Driver is 
>>>>>>>>>>>>>>>> brcmfmac (NXP 6.1.36 kernel) and chip is BCM4339 (Laird 
>>>>>>>>>>>>>>>> LWB5).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It happens in a) single router environment (WPA2-PSK; 
>>>>>>>>>>>>>>>> Touchstone TG3442DE), and b) router + repeater 
>>>>>>>>>>>>>>>> environment (WPA2 CCMP; Fritz!Box + Fritz!Repeater), 
>>>>>>>>>>>>>>>> and maybe also in the WPA3 OWE Transition network 
>>>>>>>>>>>>>>>> (yesterday lost a connection again).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have again lost 2 of 10 devices in the WPA3 OWE network 
>>>>>>>>>>>>>>> (with roaming). However, they no longer all disappear after 
>>>>>>>>>>>>>>> a short while; it now seems to happen later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also lost one device in a router + repeater WPA2 (CCMP) 
>>>>>>>>>>>>>>> network. The router side confirms that the device has been 
>>>>>>>>>>>>>>> disconnected for more than a day.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We can't do anything without logs. If you suspect it's the 
>>>>>>>>>>>>>> blacklist you can lower the blacklist time down in 
>>>>>>>>>>>>>> main.conf:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [
>>>>>>>
>>>>>>> I am still losing devices. Sometimes they come back, but mostly 
>>>>>>> they do not reconnect. I have observed the following:
>>>>>>>
>>>>>>> - The connection holds for several hours up to a day or two, then 
>>>>>>> is gone for several hours or, mostly, forever.
>>>>>>> - For FritzBox+FritzRepeater I have seen the connection come back 
>>>>>>> after about a day (here the connection loss was also confirmed 
>>>>>>> on the router side!)
>>>>>>> - For the Aruba enterprise environment the connection never came 
>>>>>>> back (until now no AP logs - waiting for an answer)
>>>>>>> - After reboot the connection comes back
>>>>>>> - It occurs only in environments with multiple APs with the same 
>>>>>>> SSID (i.e. roaming environments); however, my single-AP 
>>>>>>> environments all have strong signal
>>>>>>> - Some devices with identical configuration in this environment 
>>>>>>> DO NOT get lost, those seem to have quite strong signal (maybe 
>>>>>>> they don't roam)
>>>>>>> - Other devices in the same environment work without any 
>>>>>>> problems (Intel+NetworkManager) and the APs are Aruba enterprise 
>>>>>>> grade
>>>>>>> - I see almost the same in the Aruba enterprise environment, but 
>>>>>>> ALSO in a FritzBox + FritzRepeater environment
>>>>>>> - We had a bug in our web socket connection, causing too many IWD 
>>>>>>> requests. However, this was fixed. And why are all the other 
>>>>>>> devices okay? Maybe it is a coincidence of roaming and something 
>>>>>>> related to dropping and reconnecting the web socket connection.
>>>>>>>
>>>>>>> Please find attached my currently available debug logs (they are 
>>>>>>> a few days old, but I am quite sure this is the connection loss 
>>>>>>> situation). These logs are from the FritzBox+FritzRepeater 
>>>>>>> environment. There are no brcmfmac messages (but also no special 
>>>>>>> debug level configured here)!
>>>>>>>
>>>>>>> I have now also disabled WiFi power saving and will deploy to 
>>>>>>> the environment...hoping for the best.
>>>>>>>
>>>>>>> Maybe you could check the logs and have an idea?
>>>>>>
>>>>>> Looks like the same thing as the last logs you sent. IWD tries to 
>>>>>> connect (sends CMD_CONNECT to the kernel) but gets no associated 
>>>>>> CMD_CONNECT event after that which causes IWD to wait 
>>>>>> indefinitely for that event. This, again, appears like a driver 
>>>>>> problem because it's expected that the kernel tells userspace the 
>>>>>> result of the CMD_CONNECT request.
>>>>>>
>>>>>> Only similarity I can see between the two sets of logs is there 
>>>>>> is a failed connection just prior to the hang. IWD then attempts 
>>>>>> to connect again but the 4-way handshake is never started and 
>>>>>> this results in a failure with status 16 (group key handshake 
>>>>>> timeout). In your latest set of logs IWD actually again tries to 
>>>>>> connect to a different BSS and gets status 16 before trying yet 
>>>>>> again and hanging.
>>>>>>
>>>>>> This actually seems similar to an issue I encountered with ath10k 
>>>>>> where the network interface would time out being brought up. 
>>>>>> Retrying would succeed but the driver would be in a similar state 
>>>>>> where IWD could authenticate/associate but no data frames (i.e. 
>>>>>> 4-way handshake) would be passed to userspace. Only solution 
>>>>>> (until upstream fixed the bug) was to unload/reload the driver 
>>>>>> when we detected this condition.
>>>>>>
>>>>>> If you are able to physically attach to a device currently in 
>>>>>> this state you may be able to get more info. For example if IWD 
>>>>>> is stuck like this try disconnecting/reconnecting with iwctl or 
>>>>>> restarting IWD to see what happens. If you end up in the same 
>>>>>> state right away I'm 99.9% sure the driver is the entire reason 
>>>>>> you're running into this.
>>>>>
>>>>> Are you sure? Maybe you could double-check?
>>>> I'm sure.
>>>>>
>>>>> Because my SOM vendor (Variscite), which sells a few hundred 
>>>>> thousand of these, does not report any issue with this kernel, 
>>>>> firmware, and NetworkManager (wpa_supplicant)...
>>>>
>>>> Because wpa_supplicant sets internal timers for these commands in 
>>>> case the driver is broken. I would expect you would see this exact 
>>>> behavior with wpa_supplicant, it would just disconnect/reconnect 
>>>> after 5 seconds of no response from the kernel. And something like 
>>>> this either a) goes entirely unnoticed and/or b) works well enough 
>>>> for a hardware vendor to ship it to customers and not care.
>>>>
>>>> This is the commit adding these timers to wpa_supplicant:
>>>>
>>>> commit e29853bbff1eef781099a9108e3b51f26b477ac3
>>>> Author: Ben Greear <greearb@candelatech.com>
>>>> Date:   Thu Feb 24 16:59:46 2011 +0200
>>>>
>>>>     SME: Add timers for authentication and asscoiation
>>>>
>>>>     mac80211 authentication or association operation may get stuck 
>>>> for some
>>>>     reasons, so wpa_supplicant better use an internal timer to 
>>>> recover from
>>>>     this.
>>>>
>>>>     Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>>
>>>> I wish it surprised me that 13 years later this behavior still 
>>>> happens... We don't like adding special driver workarounds like 
>>>> this in IWD because a) it becomes difficult to maintain and b) it 
>>>> just hides the root cause and nobody ever fixes it. But my opinions 
>>>> aside, for a driver like brcmfmac which is very mainstream, I guess 
>>>> we have no choice but to adapt IWD to work around it like 
>>>> wpa_supplicant does.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>>
>>> I've sent a patch to the list which sets a timer within IWD in case 
>>> the connect event never arrives. Note that I cannot test this beyond 
>>> manually commenting out code to "trick" IWD into thinking this 
>>> happens. Applying that patch to the brcmfmac client you're using is 
>>> going to be the true test.
>>
>> I may be able to test it, however only together with wifi power save 
>> disabled (I also prepared NetworkManager branch, because the customer 
>> will kill me otherwise).
>>
>> @Arend: Maybe you could check all this, as it seems to be related to 
>> some brcmfmac state. Just now I cannot provide logs, and your debug 
>> level for brcmfmac produces a lot of data, which I somehow also 
>> need to handle (limited space).
>
> With this patch and power save disabled, we have a better connection, 
> but we still see carrier losses several times a day (sometimes for 
> minutes, sometimes for hours). We think this is still due to the 
> driver / daemon. We have checked the AP logs and the infrastructure is 
> Aruba. We will try to get some more debug logs from that environment.
>
> My feeling is that there could still be a corner case related to BSS 
> selection in WPA3 OWE. Or it is about failed roams by the driver.
>
> We will now a) upgrade to 6.6x kernel (driver) and b) add another wifi 
> chip (NXP IW611) to our board and test that one too.
>
> @James: Do you know if NXP IW611 works well with IWD?
Sorry, I have no experience with that chipset.
>
> Thanks,
>
> Martin
>
>

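The timeout workaround discussed in this thread (a userspace timer armed when CMD_CONNECT is sent, so that a missing connect event turns into a failure instead of an indefinite wait) can be sketched roughly as below. This is an illustrative Python model only, not the IWD patch and not wpa_supplicant code: the class ConnectWatchdog, its method names, and the 5-second default (borrowed from the figure quoted above) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    FAILED = auto()


@dataclass
class ConnectWatchdog:
    """Hypothetical model of a connect request guarded by a timer."""

    timeout_s: float = 5.0            # assumed value, not taken from any patch
    state: State = State.IDLE
    deadline: Optional[float] = None  # absolute time at which we give up

    def send_connect(self, now: float) -> None:
        # Stands in for sending CMD_CONNECT to the kernel: arm the timer.
        self.state = State.CONNECTING
        self.deadline = now + self.timeout_s

    def on_connect_event(self, success: bool) -> None:
        # The kernel answered: disarm the timer and record the result.
        if self.state is not State.CONNECTING:
            return
        self.deadline = None
        self.state = State.CONNECTED if success else State.FAILED

    def tick(self, now: float) -> bool:
        # Returns True if the request timed out, so the caller can
        # disconnect and retry rather than wait forever.
        if self.state is State.CONNECTING and now >= self.deadline:
            self.deadline = None
            self.state = State.FAILED
            return True
        return False
```

The caller passes the current time in explicitly, so the logic can be exercised without sleeping. In a real daemon the check in tick() would instead be a timer callback registered with the event loop.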