From: James Prestwood <prestwoj@gmail.com>
To: Richard Acayan <mailingradian@gmail.com>
Cc: iwd@lists.linux.dev
Subject: Re: Segmentation fault when taking device for a walk
Date: Tue, 20 Aug 2024 08:04:20 -0700
Message-ID: <e6ff554e-ad17-46e3-b614-ac6da796b337@gmail.com>
In-Reply-To: <ZsPATiu0_31D8Qcq@radian>
Hi Richard,
On 8/19/24 2:59 PM, Richard Acayan wrote:
> On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote:
>> Hi Richard,
>>
>> On 8/15/24 5:24 PM, Richard Acayan wrote:
>>> Hi,
>>>
>>> A segmentation fault occurs in station_start_roam() when the station is
>>> disconnected from an access point, or in other words, when the station's
>>> connected_bss is NULL. Usually, this is triggered by a timeout, possibly
>>> scheduled in response to a weak signal event.
>>>
>>> This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when
>>> receding from an access point, on iwd 2.19. I have collected 6 coredumps
>>> of the crash in the span of around 2 weeks and would be willing to use
>>> GDB if more information is necessary for a patch.
>>>
>>> Sample:
>>>
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
>>>
>>> warning: 2880 src/station.c: No such file or directory
>>> (gdb) bt
>>> #0 0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
>>> #1 0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>,
>>> user_data=0xffff876b2e20) at ell/timeout.c:68
>>> #2 timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20)
>>> at ell/timeout.c:57
>>> #3 0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461
>>> #4 0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508
>>> #5 l_main_run () at ell/main.c:490
>>> #6 0x0000aaaadf28bce4 in l_main_run_with_signal (
>>> callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0)
>>> at ell/main.c:630
>>> #7 0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611
>>> (gdb) p station->connected_bss
>>> $1 = (struct scan_bss *) 0x0
>>>
>> Its hard to say without any debug logs as well but it appears the disconnect
>> never cleared out the timer used for the next roam attempt. I did fix a hang
>> due to a disconnect coming in during a roam attempt after 2.19, but I can't
>> really make heads or tails without debug logs to see what happened
>> before/after the disconnect.
> It happened again with debug logs enabled. Relevant snippet (from
> logread):
>
> [Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5
> [Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting
> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event()
> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> [Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event()
> [Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1
> [Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected
> [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> [Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
> [Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5
>
> Afterwards is the segmentation fault.
Do you happen to have the logs from a few minutes prior? The roam timeout
defaults to 60 seconds, so at some point it was re-armed, but the logs
don't go back that far. It's trivial to handle the segfault, but I suspect
the roam timeout being re-armed is also leaking memory, so we should
address that as the root cause.
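
To make the distinction between the two fixes concrete, here is a minimal
standalone sketch. Everything in it is a hypothetical stand-in written for
illustration (the struct layouts, the roam_timer_armed flag, and the
function names); it is not the actual iwd or ell code and is not a proposed
patch. It only shows the shape of the two options: a NULL guard in the
timeout path, and cancelling the pending timer when the station disconnects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real iwd/ell types. */
struct bss {
	const char *ssid;
};

struct station {
	struct bss *connected_bss;	/* NULL while disconnected */
	bool roam_timer_armed;		/* models a pending roam timeout */
};

/* Schedule the next roam attempt (models arming a one-shot timer). */
static void station_arm_roam_timer(struct station *st)
{
	st->roam_timer_armed = true;
}

/* Root-cause style fix: a disconnect always cancels the pending timer. */
static void station_disconnect(struct station *st)
{
	st->roam_timer_armed = false;
	st->connected_bss = NULL;
}

/*
 * Defensive fix: the timeout path refuses to roam without a BSS.
 * Returns true only if a roam was actually started.
 */
static bool station_roam_timeout(struct station *st)
{
	st->roam_timer_armed = false;	/* one-shot timer has fired */

	if (!st->connected_bss)
		return false;		/* stale timeout, nothing to do */

	printf("roaming away from %s\n", st->connected_bss->ssid);
	return true;
}

/* Replays the reported sequence: arm, disconnect, then a late timeout. */
static int demo_disconnect_then_timeout(void)
{
	struct bss ap = { "example-ap" };
	struct station st = { &ap, false };

	station_arm_roam_timer(&st);
	station_disconnect(&st);

	/* The disconnect cancelled the timer, so nothing should be pending. */
	assert(!st.roam_timer_armed);

	/* Even if a stale timeout fires anyway, the guard avoids the crash. */
	return station_roam_timeout(&st) ? 1 : 0;
}
```

The guard alone would hide the crash, but it would not explain why the
timer was still armed after the disconnect, which is why the earlier logs
matter.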
Thanks,
James
Thread overview: 8+ messages
2024-08-16 0:24 Segmentation fault when taking device for a walk Richard Acayan
2024-08-16 11:53 ` James Prestwood
2024-08-19 21:59 ` Richard Acayan
2024-08-20 15:04 ` James Prestwood [this message]
2024-08-20 16:00 ` Richard Acayan
2024-08-21 14:27 ` James Prestwood
2024-08-27 0:40 ` Richard Acayan
2024-08-27 11:46 ` James Prestwood