public inbox for iwd@lists.linux.dev
 help / color / mirror / Atom feed
From: James Prestwood <prestwoj@gmail.com>
To: Martin Petzold <martin.petzold@tavla.de>
Cc: iwd@lists.linux.dev
Subject: Re: Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver
Date: Fri, 25 Oct 2024 05:54:45 -0700	[thread overview]
Message-ID: <d33154fa-9309-4285-a592-9e377e199f43@gmail.com> (raw)
In-Reply-To: <6d384377-ee6c-4a5d-8b67-75f367403acd@tavla.de>

Hi Martin,

On 10/25/24 5:28 AM, Martin Petzold wrote:
> Hi James,
>
> Am 25.10.24 um 13:48 schrieb James Prestwood:
>> Hi Martin,
>>
>> On 10/25/24 4:10 AM, Martin Petzold wrote:
>>> Hi all,
>>>
>>> Am 25.10.24 um 12:12 schrieb Martin Petzold:
>>>> Hi all,
>>>>
>>>> I open a new thread for this one: During the last weeks I have seen 
>>>> connection losses for 30+ minutes, sometimes even hours or just now 
>>>> even forever (IWD HEAD with v2 OWE / BSS selection patches). Driver 
>>>> is brcmfmac (NXP 6.1.36 kernel) and chip is BCM4339 (Laird LWB5).
>>>>
>>>> It happens in a) single router environment (WPA2-PSK; Touchstone 
>>>> TG3442DE), and b) router + repeater environment (WPA2 CCMP; 
>>>> Fritz!Box + Fritz!Repeater), and maybe also in the WPA3 OWE 
>>>> Transition network (yesterday lost a connection again).
>>>
>>> I lost now again 2 of 10 devices in the WPA3 OWE network (with 
>>> roaming). However, now they don't disappear all after a shorter 
>>> while. It seems to be later.
>>>
>>> I also lost one device in a Router+Repeater WPA2 (CCMP) network. It 
>>> is confirmed here on router side, that the device is disconnected. 
>>> Since more than a day.
>>
>> We can't do anything without logs. If you suspect its the blacklist 
>> you can lower the blacklist time down in main.conf:
>>
>> [Blacklist]
>> MaximumTimeout=5
>>
>> But logs would be best, we would be able to see if that is whats 
>> happening or if its something else.
>>
>>
> I can also now see this on one of our local devices (which was 
> connected to Ethernet and WiFi).
>
> Please find attached the log. The logs start at late 23rd with the 
> image upgrade.
>
> I can see on our server side, that devices continuously connect and 
> disconnect again (missed heartbeat = hard loss of websocket 
> connection). I see some type of connection loop on client side too 
> ("org.eclipse.jetty.websocket.api.UpgradeException: 0 null" caused by 
> "java.util.concurrent.TimeoutException: DNS timeout 15000 ms"). Maybe 
> it alternates between Ethernet and WiFi. You will understand better. 
> Or the network is re-configured all the time.
>
> Here is my local status:
>
> tavla@tavla:~$ iwctl station wlan0 show
>                                  Station: wlan0
> -------------------------------------------------------------------------------- 
>
>   Settable  Property Value
> -------------------------------------------------------------------------------- 
>
>             Scanning no
>             State connecting
>             Connected network     XYZ
>             No IP addresses       Is DHCP client configured?
>
> tavla@tavla:~$ networkctl status
> ●        State: routable
>   Online state: unknown
>        Address: 192.168.178.178 on eth0
>                 2a0a:a549:da80:0:fadc:7aff:fe67:2e4 on eth0
>                 fe80::fadc:7aff:fe67:2e4 on eth0
>                 fe80::c2ee:40ff:fe8a:dd62 on wlan0
>        Gateway: 192.168.178.1 on eth0
>                 fe80::3a10:d5ff:fe37:2c79 on eth0
>            DNS: 192.168.178.1
>            NTP: 192.168.178.1
> IDX LINK  TYPE     OPERATIONAL SETUP
>   1 lo    loopback carrier     unmanaged
>   2 eth0  ether    routable    configured
>   3 wlan0 wlan     no-carrier  unmanaged
>
> 3 links listed.

Yep, IWD is ultimately hung up waiting for a connect event from the 
kernel/driver. A few things I noticed prior to that:

1. Not really critical but you have an agent connecting and 
disconnecting multiple times a second. Are you polling info with iwctl 
or something?

2. I see this really odd local disconnect with reason code 0. This is 
likely something from brcmfmac because I doubt the wireless subsystem 
would use a "success" reason code in a local deauth:

Okt 24 06:05:00 tavla iwd[384]: Received Deauthentication event, reason: 
0, from_ap: false

3. You had another disconnect from the AP, but IWD reconnected:

Okt 24 06:06:12 tavla iwd[384]: Received Deauthentication event, reason: 
6, from_ap: true

4. You then got a disconnect from the AP yet again, reason code 3:

Okt 25 06:13:21 tavla iwd[384]: Received Deauthentication event, reason: 
3, from_ap: true

# You got disconnected by the AP here

Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_disconnect_event() 3
Okt 25 06:13:21 tavla iwd[384]: event: disconnect-info, reason: 3
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_disassociated() 3
Okt 25 06:13:21 tavla iwd[384]: 
src/station.c:station_reset_connection_state() 3
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_roam_state_clear() 3
Okt 25 06:13:21 tavla iwd[384]: 
src/netconfig.c:netconfig_event_handler() l_netconfig event 2
Okt 25 06:13:21 tavla iwd[384]: 
src/netconfig-commit.c:netconfig_commit_print_addrs() removing address: 
192.168.178.113
Okt 25 06:13:21 tavla iwd[384]: src/resolve.c:resolve_systemd_revert() 
ifindex: 3
Okt 25 06:13:21 tavla iwd[384]: event: state, old: connected, new: 
disconnected
Okt 25 06:13:21 tavla iwd[384]: event: state, old: disconnected, new: 
autoconnect_quick
Okt 25 06:13:21 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_insert() 
Inserting work item 17
Okt 25 06:13:21 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_next() 
Starting work item 17
Okt 25 06:13:21 tavla iwd[384]: src/wiphy.c:wiphy_reg_notify() 
Notification of command Reg Change(36)
Okt 25 06:13:21 tavla iwd[384]: src/netdev.c:netdev_link_notify() event 
16 on ifindex 3
Okt 25 06:13:21 tavla iwd[384]: src/scan.c:scan_notify() Scan 
notification Trigger Scan(33)
Okt 25 06:13:21 tavla iwd[384]: src/netdev.c:netdev_link_notify() event 
16 on ifindex 3
Okt 25 06:13:21 tavla iwd[384]: src/scan.c:scan_request_triggered() 
Passive scan triggered for wdev 1
Okt 25 06:13:21 tavla iwd[384]: 
src/station.c:station_quick_scan_triggered() Quick scan triggered for wlan0
Okt 25 06:13:21 tavla iwd[384]: src/scan.c:scan_notify() Scan 
notification New Scan Results(34)
Okt 25 06:13:21 tavla iwd[384]: src/netdev.c:netdev_link_notify() event 
16 on ifindex 3
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_print_scan_bss() 
Processing BSS '38:10:d5:37:2c:7b' with SSID: XYZ, freq: 2462, rank: 
442, strength: -3300, data_rate: 54.0, load: 21/255
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_print_scan_bss() 
Processing BSS '38:10:d5:7a:45:ca' with SSID: FRITZ!Box 7490, freq: 
5580, rank: 16, strength: -8700, data_rate: 2.0, load: 12/255
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_print_scan_bss() 
Processing BSS '84:47:65:1a:c8:14' with SSID: WLAN-T4MQUD, freq: 5260, 
rank: 442, strength: -6100, data_rate: 54.0, load: 46/255
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_print_scan_bss() 
Processing BSS '84:47:65:1a:c8:11' with SSID: WLAN-T4MQUD, freq: 2437, 
rank: 294, strength: -6700, data_rate: 36.0, load: 63/255
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_autoconnect_start()
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_autoconnect_next() 
autoconnect: Trying SSID: XYZ
Okt 25 06:13:21 tavla iwd[384]: src/station.c:station_autoconnect_next() 
autoconnect: '38:10:d5:37:2c:7b' freq: 2462, rank: 442, strength: -3300
Okt 25 06:13:21 tavla iwd[384]: src/netdev.c:netdev_cqm_rssi_update()
Okt 25 06:13:21 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_insert() 
Inserting work item 18
Okt 25 06:13:21 tavla iwd[384]: event: connect-info, ssid: XYZ, bss: 
38:10:d5:37:2c:7b, signal: -33, load: 21/255
Okt 25 06:13:21 tavla iwd[384]: event: state, old: autoconnect_quick, 
new: connecting (auto)
Okt 25 06:13:21 tavla iwd[384]: src/scan.c:scan_cancel() Trying to 
cancel scan id 17 for wdev 1
Okt 25 06:13:21 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_done() Work 
item 17 done
Okt 25 06:13:21 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_next() 
Starting work item 18
Okt 25 06:13:22 tavla iwd[384]: src/netdev.c:netdev_mlme_notify() MLME 
notification Connect(46)
Okt 25 06:13:22 tavla iwd[384]: src/netdev.c:netdev_connect_event()
Okt 25 06:13:22 tavla iwd[384]: src/netdev.c:netdev_connect_event() 
aborting and ignore_connect_event not set, proceed
Okt 25 06:13:22 tavla iwd[384]: src/netdev.c:netdev_connect_event() 
expect_connect_failure not set, proceed
Okt 25 06:13:22 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_done() Work 
item 18 done
Okt 25 06:13:22 tavla iwd[384]: src/station.c:station_connect_cb() 3, 
result: 2
Okt 25 06:13:22 tavla iwd[384]: event: connect-failed, status: 16

# AP rejects your next attempt to connect with status 16 (group 
handshake timeout). Looks like the 4-way handshake never even started so 
this is kinda odd.

Okt 25 06:13:22 tavla iwd[384]: src/netdev.c:netdev_cqm_rssi_update()
Okt 25 06:13:22 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_insert() 
Inserting work item 19
Okt 25 06:13:22 tavla iwd[384]: src/wiphy.c:wiphy_radio_work_next() 
Starting work item 19
Okt 25 06:13:22 tavla iwd[384]: event: connect-info, ssid: XYZ, bss: 
38:10:d5:37:2c:7b, signal: -33, load: 21/255
Okt 25 06:13:22 tavla iwd[384]: src/station.c:station_try_next_bss() 
Attempting to connect to next BSS 38:10:d5:37:2c:7b

# Then IWD tries to connect again and we get no connect even whatsoever.

I think brcmfmac got very confused and stuck. There are many cases in 
IWD where we trust the kernel (and associated driver) will send the 
events we expect. And in this case that did not happen and IWD waits 
indefinitely.

Unless I'm missing something this seems like a bug in the driver. When 
IWD issues a CMD_CONNECT call the driver needs to tell us if it 
succeeded or failed. Not replying at all is going to cause IWD to hang 
like this.

Thanks,

James

>
>>>
>>>>
>>>> We have 100% reproducable builds, test environments (testing 
>>>> sites), and web socket-based telemetry. Thus my feeling is, there 
>>>> is still something wrong with the BSS selection (maybe 
>>>> blacklisting)! Of course this can also be something about the 
>>>> driver (brcmfmac) or firmware (roaming?). Or can this be about IPv6 
>>>> disabled (my configuration below)?
>>>>
>>>> This is really really serious, I (still) lose connections to devices!
>>>>
>>>> I will investigate further and provide logs later today, if possible.
>>>>
>>>> SETUP:
>>>>
>>>> SoM: Variscite DART-IMX8M-PLUS
>>>> SoC: NXP i.MX8M Plus
>>>> Wi-Fi Chip: Laird Sterling LWB5 (Infineon CYW43353)
>>>> Antenna: Laird FlexPIFA
>>>> RootFS: Debian bookworm
>>>> Kernel: NXP Kernel 6.1.36 (with Variscite patches)
>>>> Driver: brcmfmac (from Kernel, only minor patches - device tree)
>>>> Firmware: Laird 12.29.0.22 (loaded via scripts)
>>>> Network Daemon: systemd-networkd 252.30
>>>> Wi-Fi Daemon: IWD 2.22+ (c459dc75c080dcbff6f9b9389344e11328dec411 + 
>>>> OWE and BSS selection patches) + ELL 0.69 
>>>> (7bed251f8020ef896043f008276f41cd13fdd43f)
>>>>
>>>> KERNEL:
>>>>
>>>> REPOSITORY=https://github.com/tavla/linux-imx.git (fork from 
>>>> https://github.com/varigit/linux-imx)
>>>> BRANCH=lf-6.1.y_var03
>>>> COMMIT=a4b3cc1dbbb396284e501b938aa79d2dfd4f1a98
>>>>
>>>> FIRMWARE:
>>>>
>>>> summit-lwb5-etsi-firmware-12.29.0.22.tar.bz2
>>>>
>>>> From: 
>>>> https://github.com/LairdCP/Sterling-LWB-and-LWB5-Release-Packages/releases/
>>>>
>>>> IWD
>>>>
>>>> -----
>>>>
>>>> [General]
>>>> EnableNetworkConfiguration=true
>>>>
>>>> [Network]
>>>> NameResolvingService=systemd
>>>> EnableIPv6=false
>>>>
>>>> [DriverQuirks]
>>>> OweDisable=brcmfmac
>>>>
>>>> -----
>>>>
>>>> Best regards,
>>>>
>>>> Martin
>>>>
>>>>

  parent reply	other threads:[~2024-10-25 12:54 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-25 10:12 Connection loss (IWD HEAD with latest OWE / BSS selection patches) - brcmfmac driver Martin Petzold
2024-10-25 11:10 ` Martin Petzold
2024-10-25 11:48   ` James Prestwood
2024-10-25 12:01     ` Martin Petzold
2024-10-25 12:28     ` Martin Petzold
2024-10-25 12:33       ` Martin Petzold
2024-10-25 12:39       ` Martin Petzold
2024-10-25 12:48       ` Martin Petzold
2024-10-25 12:54       ` James Prestwood [this message]
2024-10-25 13:05         ` Martin Petzold
2024-10-25 13:17           ` James Prestwood
2024-10-25 13:11         ` Martin Petzold
2024-10-25 13:18           ` James Prestwood
2024-10-25 15:03             ` Martin Petzold
2024-10-25 15:17               ` James Prestwood
2024-10-25 22:22                 ` Martin Petzold
2024-10-26 10:01                   ` Martin Petzold
2024-10-26  8:26                 ` Arend Van Spriel
2024-11-03 23:13                 ` Martin Petzold
2024-11-04  0:43                   ` Martin Petzold
2024-11-04 12:36                   ` James Prestwood
2024-11-04 22:42                     ` Martin Petzold
2024-11-04 23:20                       ` James Prestwood
2024-11-05  8:03                         ` Martin Petzold
2024-11-05 13:14                         ` James Prestwood
2024-11-05 15:16                           ` Martin Petzold
2024-11-12  9:15                             ` Martin Petzold
2024-11-12 12:13                               ` James Prestwood
2024-11-07 13:09                         ` Martin Petzold
2024-11-06 20:32                     ` Martin Petzold
2024-11-06 21:35                       ` James Prestwood
2024-10-25 15:17         ` Martin Petzold
2024-10-26  9:07           ` Arend Van Spriel
2024-10-26 10:08             ` Martin Petzold

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d33154fa-9309-4285-a592-9e377e199f43@gmail.com \
    --to=prestwoj@gmail.com \
    --cc=iwd@lists.linux.dev \
    --cc=martin.petzold@tavla.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox