From: Richard Acayan <mailingradian@gmail.com>
To: James Prestwood <prestwoj@gmail.com>
Cc: iwd@lists.linux.dev
Subject: Re: Segmentation fault when taking device for a walk
Date: Tue, 20 Aug 2024 12:00:34 -0400 [thread overview]
Message-ID: <ZsS9oqna82kDmgJy@radian> (raw)
In-Reply-To: <e6ff554e-ad17-46e3-b614-ac6da796b337@gmail.com>
On Tue, Aug 20, 2024 at 08:04:20AM -0700, James Prestwood wrote:
> Hi Richard,
>
> On 8/19/24 2:59 PM, Richard Acayan wrote:
> > On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote:
> > > Hi Richard,
> > >
> > > On 8/15/24 5:24 PM, Richard Acayan wrote:
> > > > Hi,
> > > >
> > > > A segmentation fault occurs in station_start_roam() when the station is
> > > > disconnected from an access point, or in other words, when the station's
> > > > connected_bss is NULL. Usually, this is triggered by a timeout, possibly
> > > > scheduled in response to a weak signal event.
> > > >
> > > > This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when
> > > > receding from an access point, on iwd 2.19. I have collected 6 coredumps
> > > > of the crash in the span of around 2 weeks and would be willing to use
> > > > GDB if more information is necessary for a patch.
> > > >
> > > > Sample:
> > > >
> > > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > > #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
> > > >
> > > > warning: 2880 src/station.c: No such file or directory
> > > > (gdb) bt
> > > > #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
> > > > #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>,
> > > > user_data=0xffff876b2e20) at ell/timeout.c:68
> > > > #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20)
> > > > at ell/timeout.c:57
> > > > #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461
> > > > #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508
> > > > #5 l_main_run () at ell/main.c:490
> > > > #6 0x0000aaaadf28bce4 in l_main_run_with_signal (
> > > > callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0)
> > > > at ell/main.c:630
> > > > #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611
> > > > (gdb) p station->connected_bss
> > > > $1 = (struct scan_bss *) 0x0
> > > >
> > > Its hard to say without any debug logs as well but it appears the disconnect
> > > never cleared out the timer used for the next roam attempt. I did fix a hang
> > > due to a disconnect coming in during a roam attempt after 2.19, but I can't
> > > really make heads or tails without debug logs to see what happened
> > > before/after the disconnect.
> > It happened again with debug logs enabled. Relevant snippet (from
> > logread):
> >
> > [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5
> > [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting
> > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
> > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event()
> > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> > [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event()
> > [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1
> > [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected
> > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> > [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
> > [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5
> >
> > Afterwards is the segmentation fault.
>
> Do you happen to have the logs a few minutes prior. The roam timeout is
> defaulted to 60 seconds, so at some point it was re-armed but the logs don't
> go back that far. Its trivial to handle the segfault but I suspect the roam
> timeout being rearmed is also leaking memory so we should address that as
> the root cause.
Unfortunately not, only for the last half-minute (with personal
information). Here's a bit more that I don't need to redact:
[Aug 17 21:22:07] daemon iwd: src/station.c:station_roam_failed() 5
[Aug 17 21:22:07] daemon iwd: src/wiphy.c:wiphy_radio_work_done() Work item 279 done
[Aug 17 21:22:12] daemon iwd: src/station.c:station_dbus_disconnect()
[Aug 17 21:22:12] daemon iwd: src/station.c:station_reset_connection_state() 5
[Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5
[Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting
[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event()
[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event()
[Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1
[Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected
[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
[Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5
I can confirm, though, that there are more timeouts than expected, even
after (presumably) clearing the last timeout:
Core was generated by `/usr/libexec/iwd -d'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000aaaadf1686a0 in station_start_roam (station=0xffffb0527b30) at src/station.c:2880
warning: 2880 src/station.c: No such file or directory
(gdb) p *watch_list[12]
$1 = {fd = 12, events = 1073741825, flags = 1, callback = 0xaaaadf1ec4f0 <timeout_callback>,
destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530dd0}
(gdb) p *(struct l_timeout *) watch_list[12]->user_data
$2 = {fd = 12, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0,
user_data = 0xffffb0527b30}
(gdb) p *watch_list[13]
$3 = {fd = 13, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>,
destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530ef0}
(gdb) p *(struct l_timeout *) watch_list[13]->user_data
$4 = {fd = 13, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0,
user_data = 0xffffb0527b30}
(gdb) p *watch_list[14]
$5 = {fd = 14, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>,
destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530c50}
(gdb) p *(struct l_timeout *) watch_list[14]->user_data
$6 = {fd = 14, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0,
user_data = 0xffffb0527b30}
next prev parent reply other threads:[~2024-08-20 16:00 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-16 0:24 Segmentation fault when taking device for a walk Richard Acayan
2024-08-16 11:53 ` James Prestwood
2024-08-19 21:59 ` Richard Acayan
2024-08-20 15:04 ` James Prestwood
2024-08-20 16:00 ` Richard Acayan [this message]
2024-08-21 14:27 ` James Prestwood
2024-08-27 0:40 ` Richard Acayan
2024-08-27 11:46 ` James Prestwood
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZsS9oqna82kDmgJy@radian \
--to=mailingradian@gmail.com \
--cc=iwd@lists.linux.dev \
--cc=prestwoj@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox