* Segmentation fault when taking device for a walk @ 2024-08-16 0:24 Richard Acayan 2024-08-16 11:53 ` James Prestwood 0 siblings, 1 reply; 8+ messages in thread From: Richard Acayan @ 2024-08-16 0:24 UTC (permalink / raw) To: iwd Hi, A segmentation fault occurs in station_start_roam() when the station is disconnected from an access point, or in other words, when the station's connected_bss is NULL. Usually, this is triggered by a timeout, possibly scheduled in response to a weak signal event. This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when receding from an access point, on iwd 2.19. I have collected 6 coredumps of the crash in the span of around 2 weeks and would be willing to use GDB if more information is necessary for a patch. Sample: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 warning: 2880 src/station.c: No such file or directory (gdb) bt #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) at ell/timeout.c:68 #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) at ell/timeout.c:57 #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461 #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508 #5 l_main_run () at ell/main.c:490 #6 0x0000aaaadf28bce4 in l_main_run_with_signal ( callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0) at ell/main.c:630 #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611 (gdb) p station->connected_bss $1 = (struct scan_bss *) 0x0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-16 0:24 Segmentation fault when taking device for a walk Richard Acayan @ 2024-08-16 11:53 ` James Prestwood 2024-08-19 21:59 ` Richard Acayan 0 siblings, 1 reply; 8+ messages in thread From: James Prestwood @ 2024-08-16 11:53 UTC (permalink / raw) To: Richard Acayan, iwd Hi Richard, On 8/15/24 5:24 PM, Richard Acayan wrote: > Hi, > > A segmentation fault occurs in station_start_roam() when the station is > disconnected from an access point, or in other words, when the station's > connected_bss is NULL. Usually, this is triggered by a timeout, possibly > scheduled in response to a weak signal event. > > This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when > receding from an access point, on iwd 2.19. I have collected 6 coredumps > of the crash in the span of around 2 weeks and would be willing to use > GDB if more information is necessary for a patch. > > Sample: > > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 > > warning: 2880 src/station.c: No such file or directory > (gdb) bt > #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 > #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>, > user_data=0xffff876b2e20) at ell/timeout.c:68 > #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) > at ell/timeout.c:57 > #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461 > #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508 > #5 l_main_run () at ell/main.c:490 > #6 0x0000aaaadf28bce4 in l_main_run_with_signal ( > callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0) > at ell/main.c:630 > #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611 > (gdb) p station->connected_bss > $1 = (struct scan_bss *) 0x0 > Its hard to say without any debug logs as well but it appears the disconnect never cleared out the timer used for the next roam attempt. I did fix a hang due to a disconnect coming in during a roam attempt after 2.19, but I can't really make heads or tails without debug logs to see what happened before/after the disconnect. Thanks, James ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-16 11:53 ` James Prestwood @ 2024-08-19 21:59 ` Richard Acayan 2024-08-20 15:04 ` James Prestwood 0 siblings, 1 reply; 8+ messages in thread From: Richard Acayan @ 2024-08-19 21:59 UTC (permalink / raw) To: James Prestwood; +Cc: iwd On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote: > Hi Richard, > > On 8/15/24 5:24 PM, Richard Acayan wrote: >> Hi, >> >> A segmentation fault occurs in station_start_roam() when the station is >> disconnected from an access point, or in other words, when the station's >> connected_bss is NULL. Usually, this is triggered by a timeout, possibly >> scheduled in response to a weak signal event. >> >> This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when >> receding from an access point, on iwd 2.19. I have collected 6 coredumps >> of the crash in the span of around 2 weeks and would be willing to use >> GDB if more information is necessary for a patch. >> >> Sample: >> >> Program terminated with signal SIGSEGV, Segmentation fault. >> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 >> >> warning: 2880 src/station.c: No such file or directory >> (gdb) bt >> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 >> #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>, >> user_data=0xffff876b2e20) at ell/timeout.c:68 >> #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) >> at ell/timeout.c:57 >> #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461 >> #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508 >> #5 l_main_run () at ell/main.c:490 >> #6 0x0000aaaadf28bce4 in l_main_run_with_signal ( >> callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0) >> at ell/main.c:630 >> #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611 >> (gdb) p station->connected_bss >> $1 = (struct scan_bss *) 0x0 >> > Its hard to say without any debug logs as well but it appears the disconnect > never cleared out the timer used for the next roam attempt. I did fix a hang > due to a disconnect coming in during a roam attempt after 2.19, but I can't > really make heads or tails without debug logs to see what happened > before/after the disconnect. It happened again with debug logs enabled. Relevant snippet (from logread): [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5 [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20) [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5 [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39) [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48) [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1 [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36) [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5 Afterwards is the segmentation fault. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-19 21:59 ` Richard Acayan @ 2024-08-20 15:04 ` James Prestwood 2024-08-20 16:00 ` Richard Acayan 0 siblings, 1 reply; 8+ messages in thread From: James Prestwood @ 2024-08-20 15:04 UTC (permalink / raw) To: Richard Acayan; +Cc: iwd Hi Richard, On 8/19/24 2:59 PM, Richard Acayan wrote: > On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote: >> Hi Richard, >> >> On 8/15/24 5:24 PM, Richard Acayan wrote: >>> Hi, >>> >>> A segmentation fault occurs in station_start_roam() when the station is >>> disconnected from an access point, or in other words, when the station's >>> connected_bss is NULL. Usually, this is triggered by a timeout, possibly >>> scheduled in response to a weak signal event. >>> >>> This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when >>> receding from an access point, on iwd 2.19. I have collected 6 coredumps >>> of the crash in the span of around 2 weeks and would be willing to use >>> GDB if more information is necessary for a patch. >>> >>> Sample: >>> >>> Program terminated with signal SIGSEGV, Segmentation fault. >>> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 >>> >>> warning: 2880 src/station.c: No such file or directory >>> (gdb) bt >>> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 >>> #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>, >>> user_data=0xffff876b2e20) at ell/timeout.c:68 >>> #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) >>> at ell/timeout.c:57 >>> #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461 >>> #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508 >>> #5 l_main_run () at ell/main.c:490 >>> #6 0x0000aaaadf28bce4 in l_main_run_with_signal ( >>> callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0) >>> at ell/main.c:630 >>> #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611 >>> (gdb) p station->connected_bss >>> $1 = (struct scan_bss *) 0x0 >>> >> Its hard to say without any debug logs as well but it appears the disconnect >> never cleared out the timer used for the next roam attempt. I did fix a hang >> due to a disconnect coming in during a roam attempt after 2.19, but I can't >> really make heads or tails without debug logs to see what happened >> before/after the disconnect. > It happened again with debug logs enabled. Relevant snippet (from > logread): > > [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5 > [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20) > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5 > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39) > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48) > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() > [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1 > [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36) > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX > [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5 > > Afterwards is the segmentation fault. Do you happen to have the logs a few minutes prior. The roam timeout is defaulted to 60 seconds, so at some point it was re-armed but the logs don't go back that far. Its trivial to handle the segfault but I suspect the roam timeout being rearmed is also leaking memory so we should address that as the root cause. Thanks, James ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-20 15:04 ` James Prestwood @ 2024-08-20 16:00 ` Richard Acayan 2024-08-21 14:27 ` James Prestwood 0 siblings, 1 reply; 8+ messages in thread From: Richard Acayan @ 2024-08-20 16:00 UTC (permalink / raw) To: James Prestwood; +Cc: iwd On Tue, Aug 20, 2024 at 08:04:20AM -0700, James Prestwood wrote: > Hi Richard, > > On 8/19/24 2:59 PM, Richard Acayan wrote: > > On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote: > > > Hi Richard, > > > > > > On 8/15/24 5:24 PM, Richard Acayan wrote: > > > > Hi, > > > > > > > > A segmentation fault occurs in station_start_roam() when the station is > > > > disconnected from an access point, or in other words, when the station's > > > > connected_bss is NULL. Usually, this is triggered by a timeout, possibly > > > > scheduled in response to a weak signal event. > > > > > > > > This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when > > > > receding from an access point, on iwd 2.19. I have collected 6 coredumps > > > > of the crash in the span of around 2 weeks and would be willing to use > > > > GDB if more information is necessary for a patch. > > > > > > > > Sample: > > > > > > > > Program terminated with signal SIGSEGV, Segmentation fault. > > > > #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 > > > > > > > > warning: 2880 src/station.c: No such file or directory > > > > (gdb) bt > > > > #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 > > > > #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>, > > > > user_data=0xffff876b2e20) at ell/timeout.c:68 > > > > #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) > > > > at ell/timeout.c:57 > > > > #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461 > > > > #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508 > > > > #5 l_main_run () at ell/main.c:490 > > > > #6 0x0000aaaadf28bce4 in l_main_run_with_signal ( > > > > callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0) > > > > at ell/main.c:630 > > > > #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611 > > > > (gdb) p station->connected_bss > > > > $1 = (struct scan_bss *) 0x0 > > > > > > > Its hard to say without any debug logs as well but it appears the disconnect > > > never cleared out the timer used for the next roam attempt. I did fix a hang > > > due to a disconnect coming in during a roam attempt after 2.19, but I can't > > > really make heads or tails without debug logs to see what happened > > > before/after the disconnect. > > It happened again with debug logs enabled. Relevant snippet (from > > logread): > > > > [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5 > > [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting > > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20) > > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5 > > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39) > > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() > > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48) > > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() > > [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1 > > [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected > > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36) > > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX > > [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5 > > > > Afterwards is the segmentation fault. > > Do you happen to have the logs a few minutes prior. The roam timeout is > defaulted to 60 seconds, so at some point it was re-armed but the logs don't > go back that far. Its trivial to handle the segfault but I suspect the roam > timeout being rearmed is also leaking memory so we should address that as > the root cause. Unfortunately not, only for the last half-minute (with personal information). Here's a bit more that I don't need to redact: [Aug 17 21:22:07] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 17 21:22:07] daemon iwd: src/wiphy.c:wiphy_radio_work_done() Work item 279 done [Aug 17 21:22:12] daemon iwd: src/station.c:station_dbus_disconnect() [Aug 17 21:22:12] daemon iwd: src/station.c:station_reset_connection_state() 5 [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5 [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20) [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5 [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39) [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48) [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1 [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36) [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5 I can confirm, though, that there are more timeouts than expected, even after (presumably) clearing the last timeout: Core was generated by `/usr/libexec/iwd -d'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000aaaadf1686a0 in station_start_roam (station=0xffffb0527b30) at src/station.c:2880 warning: 2880 src/station.c: No such file or directory (gdb) p *watch_list[12] $1 = {fd = 12, events = 1073741825, flags = 1, callback = 0xaaaadf1ec4f0 <timeout_callback>, destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530dd0} (gdb) p *(struct l_timeout *) watch_list[12]->user_data $2 = {fd = 12, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, user_data = 0xffffb0527b30} (gdb) p *watch_list[13] $3 = {fd = 13, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>, destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530ef0} (gdb) p *(struct l_timeout *) watch_list[13]->user_data $4 = {fd = 13, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, user_data = 0xffffb0527b30} (gdb) p *watch_list[14] $5 = {fd = 14, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>, destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530c50} (gdb) p *(struct l_timeout *) watch_list[14]->user_data $6 = {fd = 14, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, user_data = 0xffffb0527b30} ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-20 16:00 ` Richard Acayan @ 2024-08-21 14:27 ` James Prestwood 2024-08-27 0:40 ` Richard Acayan 0 siblings, 1 reply; 8+ messages in thread From: James Prestwood @ 2024-08-21 14:27 UTC (permalink / raw) To: Richard Acayan; +Cc: iwd [-- Attachment #1: Type: text/plain, Size: 7619 bytes --] Hi Richard, On 8/20/24 9:00 AM, Richard Acayan wrote: > On Tue, Aug 20, 2024 at 08:04:20AM -0700, James Prestwood wrote: >> Hi Richard, >> >> On 8/19/24 2:59 PM, Richard Acayan wrote: >>> On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote: >>>> Hi Richard, >>>> >>>> On 8/15/24 5:24 PM, Richard Acayan wrote: >>>>> Hi, >>>>> >>>>> A segmentation fault occurs in station_start_roam() when the station is >>>>> disconnected from an access point, or in other words, when the station's >>>>> connected_bss is NULL. Usually, this is triggered by a timeout, possibly >>>>> scheduled in response to a weak signal event. >>>>> >>>>> This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when >>>>> receding from an access point, on iwd 2.19. I have collected 6 coredumps >>>>> of the crash in the span of around 2 weeks and would be willing to use >>>>> GDB if more information is necessary for a patch. >>>>> >>>>> Sample: >>>>> >>>>> Program terminated with signal SIGSEGV, Segmentation fault. >>>>> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 >>>>> >>>>> warning: 2880 src/station.c: No such file or directory >>>>> (gdb) bt >>>>> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880 >>>>> #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>, >>>>> user_data=0xffff876b2e20) at ell/timeout.c:68 >>>>> #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20) >>>>> at ell/timeout.c:57 >>>>> #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461 >>>>> #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508 >>>>> #5 l_main_run () at ell/main.c:490 >>>>> #6 0x0000aaaadf28bce4 in l_main_run_with_signal ( >>>>> callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0) >>>>> at ell/main.c:630 >>>>> #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611 >>>>> (gdb) p station->connected_bss >>>>> $1 = (struct scan_bss *) 0x0 >>>>> >>>> Its hard to say without any debug logs as well but it appears the disconnect >>>> never cleared out the timer used for the next roam attempt. I did fix a hang >>>> due to a disconnect coming in during a roam attempt after 2.19, but I can't >>>> really make heads or tails without debug logs to see what happened >>>> before/after the disconnect. >>> It happened again with debug logs enabled. Relevant snippet (from >>> logread): >>> >>> [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5 >>> [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting >>> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20) >>> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5 >>> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39) >>> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() >>> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48) >>> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() >>> [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1 >>> [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected >>> [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36) >>> [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX >>> [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5 >>> >>> Afterwards is the segmentation fault. >> Do you happen to have the logs a few minutes prior. The roam timeout is >> defaulted to 60 seconds, so at some point it was re-armed but the logs don't >> go back that far. Its trivial to handle the segfault but I suspect the roam >> timeout being rearmed is also leaking memory so we should address that as >> the root cause. > Unfortunately not, only for the last half-minute (with personal > information). Here's a bit more that I don't need to redact: > > [Aug 17 21:22:07] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 17 21:22:07] daemon iwd: src/wiphy.c:wiphy_radio_work_done() Work item 279 done > [Aug 17 21:22:12] daemon iwd: src/station.c:station_dbus_disconnect() > [Aug 17 21:22:12] daemon iwd: src/station.c:station_reset_connection_state() 5 > [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5 > [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20) > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5 > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39) > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48) > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() > [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1 > [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36) > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX > [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5 > > I can confirm, though, that there are more timeouts than expected, even > after (presumably) clearing the last timeout: > > Core was generated by `/usr/libexec/iwd -d'. > Program terminated with signal SIGSEGV, Segmentation fault. > #0 0x0000aaaadf1686a0 in station_start_roam (station=0xffffb0527b30) at src/station.c:2880 > > warning: 2880 src/station.c: No such file or directory > (gdb) p *watch_list[12] > $1 = {fd = 12, events = 1073741825, flags = 1, callback = 0xaaaadf1ec4f0 <timeout_callback>, > destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530dd0} > (gdb) p *(struct l_timeout *) watch_list[12]->user_data > $2 = {fd = 12, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, > user_data = 0xffffb0527b30} > (gdb) p *watch_list[13] > $3 = {fd = 13, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>, > destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530ef0} > (gdb) p *(struct l_timeout *) watch_list[13]->user_data > $4 = {fd = 13, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, > user_data = 0xffffb0527b30} > (gdb) p *watch_list[14] > $5 = {fd = 14, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>, > destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530c50} > (gdb) p *(struct l_timeout *) watch_list[14]->user_data > $6 = {fd = 14, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, > user_data = 0xffffb0527b30} Could you try the attached patch. Assuming this fixes it you should also see some WARNING debug logs, which would be nice to provide to us as this will tell us where this timer is being overwritten. Also do you happen to know the wifi chipset in the Pixel 3a? I cant seem to find that anywhere online. Thanks, James [-- Attachment #2: 0001-station-check-for-roam-timeout-before-rearming.patch --] [-- Type: text/x-patch, Size: 1152 bytes --] From 1366e47de0700c9235da03c58f22a2a423451a6e Mon Sep 17 00:00:00 2001 From: James Prestwood <prestwoj@gmail.com> Date: Tue, 20 Aug 2024 07:59:36 -0700 Subject: [PATCH] station: check for roam timeout before rearming --- src/station.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/station.c b/src/station.c index 509d919c..e90a64f7 100644 --- a/src/station.c +++ b/src/station.c @@ -2114,7 +2114,7 @@ static void station_roamed(struct station *station) * Schedule another roaming attempt in case the signal continues to * remain low. A subsequent high signal notification will cancel it. */ - if (station->signal_low) + if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout)) station_roam_timeout_rearm(station, roam_retry_interval); if (station->netconfig) @@ -2153,7 +2153,7 @@ static void station_roam_retry(struct station *station) station->roam_scan_full = false; station->ap_directed_roaming = false; - if (station->signal_low) + if (station->signal_low && L_WARN_ON(!station->roam_trigger_timeout)) station_roam_timeout_rearm(station, roam_retry_interval); } -- 2.34.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-21 14:27 ` James Prestwood @ 2024-08-27 0:40 ` Richard Acayan 2024-08-27 11:46 ` James Prestwood 0 siblings, 1 reply; 8+ messages in thread From: Richard Acayan @ 2024-08-27 0:40 UTC (permalink / raw) To: James Prestwood; +Cc: iwd On Wed, Aug 21, 2024 at 07:27:41AM -0700, James Prestwood wrote: > Could you try the attached patch. Assuming this fixes it you should also see > some WARNING debug logs, which would be nice to provide to us as this will > tell us where this timer is being overwritten. Yeah, I see some: [Aug 24 16:51:10] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 24 16:51:10] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed [Aug 24 16:52:54] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 24 16:52:54] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed [Aug 24 16:53:09] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 24 16:53:09] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed [Aug 24 16:53:22] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 24 16:53:22] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed [Aug 24 16:53:32] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 24 16:53:32] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed [Aug 24 16:54:33] daemon iwd: src/station.c:station_roam_failed() 5 [Aug 24 16:54:33] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed > Also do you happen to know the wifi chipset in the Pixel 3a? I cant seem to > find that anywhere online. The Pixel 3a is based on the SDM670 SoC, and has the WCN3990 WiFi chip. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Segmentation fault when taking device for a walk 2024-08-27 0:40 ` Richard Acayan @ 2024-08-27 11:46 ` James Prestwood 0 siblings, 0 replies; 8+ messages in thread From: James Prestwood @ 2024-08-27 11:46 UTC (permalink / raw) To: Richard Acayan; +Cc: iwd Hi Richard, On 8/26/24 5:40 PM, Richard Acayan wrote: > On Wed, Aug 21, 2024 at 07:27:41AM -0700, James Prestwood wrote: >> Could you try the attached patch. Assuming this fixes it you should also see >> some WARNING debug logs, which would be nice to provide to us as this will >> tell us where this timer is being overwritten. > Yeah, I see some: > > [Aug 24 16:51:10] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 24 16:51:10] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed > [Aug 24 16:52:54] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 24 16:52:54] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed > [Aug 24 16:53:09] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 24 16:53:09] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed > [Aug 24 16:53:22] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 24 16:53:22] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed > [Aug 24 16:53:32] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 24 16:53:32] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed > [Aug 24 16:54:33] daemon iwd: src/station.c:station_roam_failed() 5 > [Aug 24 16:54:33] daemon iwd: WARNING: src/station.c:station_roam_retry() condition !station->roam_trigger_timeout failed Are there any other messages in there in between those warnings and failed roam prints? > >> Also do you happen to know the wifi chipset in the Pixel 3a? I cant seem to >> find that anywhere online. > The Pixel 3a is based on the SDM670 SoC, and has the WCN3990 WiFi chip. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-08-27 11:46 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-16 0:24 Segmentation fault when taking device for a walk Richard Acayan 2024-08-16 11:53 ` James Prestwood 2024-08-19 21:59 ` Richard Acayan 2024-08-20 15:04 ` James Prestwood 2024-08-20 16:00 ` Richard Acayan 2024-08-21 14:27 ` James Prestwood 2024-08-27 0:40 ` Richard Acayan 2024-08-27 11:46 ` James Prestwood
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox