public inbox for iwd@lists.linux.dev
From: James Prestwood <prestwoj@gmail.com>
To: Martin Petzold <martin.petzold@tavla.de>
Cc: iwd@lists.linux.dev, Arend Van Spriel <arend.vanspriel@broadcom.com>
Subject: Re: Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver
Date: Mon, 4 Nov 2024 04:36:31 -0800
Message-ID: <f489d856-c788-47c5-8739-325a11d7ff42@gmail.com>
In-Reply-To: <f9ca5733-7cda-4494-8b6d-7a4de4157da2@tavla.de>


On 11/3/24 3:13 PM, Martin Petzold wrote:
> Dear James,
>
> On 25.10.24 at 17:17, James Prestwood wrote:
>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I open a new thread for this one: During the last weeks I 
>>>>>>>>>> have seen connection losses for 30+ minutes, sometimes even 
>>>>>>>>>> hours or just now even forever (IWD HEAD with v2 OWE / BSS 
>>>>>>>>>> selection patches). Driver is brcmfmac (NXP 6.1.36 kernel) 
>>>>>>>>>> and chip is BCM4339 (Laird LWB5).
>>>>>>>>>>
>>>>>>>>>> It happens in a) single router environment (WPA2-PSK; 
>>>>>>>>>> Touchstone TG3442DE), and b) router + repeater environment 
>>>>>>>>>> (WPA2 CCMP; Fritz!Box + Fritz!Repeater), and maybe also in 
>>>>>>>>>> the WPA3 OWE Transition network (yesterday lost a connection 
>>>>>>>>>> again).
>>>>>>>>>
>>>>>>>>> I lost now again 2 of 10 devices in the WPA3 OWE network (with 
>>>>>>>>> roaming). However, now they don't disappear all after a 
>>>>>>>>> shorter while. It seems to be later.
>>>>>>>>>
>>>>>>>>> I also lost one device in a Router+Repeater WPA2 (CCMP) 
>>>>>>>>> network. It is confirmed here on router side, that the device 
>>>>>>>>> is disconnected. Since more than a day.
>>>>>>>>
>>>>>>>> We can't do anything without logs. If you suspect it's the 
>>>>>>>> blacklist you can lower the blacklist time in main.conf:
>>>>>>>>
>>>>>>>> [
>
> I am still losing devices. Sometimes they come back again, but mostly 
> do not re-connect. I have observed the following:
>
> - Connection exists for several hours until about one day, or two. 
> Then gone for several hours or mostly forever.
> - For FritzBox+FritzRepeater I have seen the connection coming back 
> after like a day (here connection loss was also confirmed on router 
> side!)
> - For the Aruba enterprise environment the connection never came back 
> (until now no AP logs - waiting for an answer)
> - After reboot the connection comes back
> - It occurs only in an environment with multiple APs with same SSID 
> (i.e. roaming environment), however my single AP environments have all 
> strong signal
> - Some devices with identical configuration in this environment DO NOT 
> get lost, those seem to have quite strong signal (maybe they don't roam)
> - Other devices in the same environment work without any problems 
> (Intel+NetworkManager) and the APs are Aruba enterprise grade
> - I see almost the same in the Aruba enterprise environment, but ALSO 
> in a FritzBox + FritzRepeater environment
> - We had a bug in our web socket connection, causing too many IWD 
> requests. However, this was fixed. And why are all the other devices 
> okay? Maybe a coincidence with roaming and anything related to 
> dropping and re-connecting the web socket connection.
>
> Please find attached my currently available debug logs (they are a few 
> days old, but I am quite sure this is the connection loss situation). 
> These logs are from the FritzBox+FritzRepeater environment. There are 
> no brcmfmac messages (but also no special debug level configured here)!
>
> I have now also disabled WiFi power saving and will deploy to the 
> environment...hoping for the best.
>
> Maybe you could check the logs and have an idea?

This looks like the same thing as the last logs you sent. IWD tries to 
connect (sends CMD_CONNECT to the kernel) but never gets the 
corresponding CMD_CONNECT event, so it waits indefinitely for that 
event. This, again, looks like a driver problem because it's expected 
that the kernel tells userspace the result of the CMD_CONNECT request.

The only similarity I can see between the two sets of logs is that there 
is a failed connection just prior to the hang. IWD then attempts to 
connect again but the 4-way handshake is never started, and this results 
in a failure with status 16 (group key handshake timeout). In your 
latest set of logs IWD actually tries to connect again to a different 
BSS and gets status 16 before trying yet again and hanging.

This actually seems similar to an issue I encountered with ath10k where 
bringing up the network interface would time out. Retrying would 
succeed, but the driver would be left in a similar state where IWD could 
authenticate/associate but no data frames (i.e. 4-way handshake) would 
be passed to userspace. The only solution (until upstream fixed the bug) 
was to unload/reload the driver when we detected this condition.
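
To make the detect-and-reload idea concrete, here is a rough sketch of 
what such a watchdog could look like. It is not what we ran for ath10k 
and not anything IWD does itself. The class name, the 30 second deadline 
and the assumption that reloading the brcmfmac module clears the state 
are all hypothetical, so treat it as a starting point only:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConnectWatchdog:
    """Tracks whether a connect request has gone unanswered for too long."""

    # Assumed deadline in seconds; not a value taken from IWD or the driver.
    timeout_s: float = 30.0
    started_at: Optional[float] = None

    def connect_requested(self, now: float) -> None:
        # Call when a connect attempt starts (CMD_CONNECT sent).
        self.started_at = now

    def connect_event_received(self) -> None:
        # Call when the kernel reports a result, success or failure.
        self.started_at = None

    def is_stuck(self, now: float) -> bool:
        # Stuck means a request is pending and the deadline has passed.
        if self.started_at is None:
            return False
        return (now - self.started_at) > self.timeout_s


def reload_commands(module: str = "brcmfmac") -> List[List[str]]:
    # Commands to unload and reload the driver module. They need root and
    # are only returned here, not executed.
    return [["modprobe", "-r", module], ["modprobe", module]]
```

The caller would feed it timestamps (e.g. from time.monotonic()) and 
run the returned commands only once is_stuck() reports true. You would 
probably want to stop IWD around the reload as well.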

If you are able to physically attach to a device currently in this state 
you may be able to get more info. For example, if IWD is stuck like this, 
try disconnecting/reconnecting with iwctl or restarting IWD to see what 
happens. If you end up in the same state right away I'm 99.9% sure the 
driver is the entire reason you're running into this.
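
The manual check described above could be scripted roughly like this. 
The interface name and SSID are placeholders, the "iwd" systemd unit 
name is an assumption about your image, and the 60 second timeout is 
arbitrary. By default it only prints what it would run:

```python
import subprocess
from typing import List


def diagnostic_steps(iface: str, ssid: str) -> List[List[str]]:
    # Show state, force a disconnect/reconnect, then show state again.
    return [
        ["iwctl", "station", iface, "show"],
        ["iwctl", "station", iface, "disconnect"],
        ["iwctl", "station", iface, "connect", ssid],
        ["iwctl", "station", iface, "show"],
    ]


def restart_iwd_step() -> List[str]:
    # Assumes IWD runs as the systemd unit "iwd".
    return ["systemctl", "restart", "iwd"]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    # Returns one log line per step; only executes when dry_run is False.
    log = []
    for argv in steps:
        line = " ".join(argv)
        if not dry_run:
            try:
                result = subprocess.run(argv, check=False, timeout=60)
                line += " (exit %d)" % result.returncode
            except subprocess.TimeoutExpired:
                # A hang here is itself a useful data point.
                line += " (timed out)"
        log.append(line)
    return log


if __name__ == "__main__":
    # "wlan0" and "MySSID" are placeholders for your interface and network.
    for entry in run_steps(diagnostic_steps("wlan0", "MySSID")):
        print(entry)
```

If the connect step times out again right after a fresh disconnect or an 
IWD restart, that points at the driver rather than at IWD's own state.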

Thanks,

James

