Re: Segmentation fault when taking device for a walk

public inbox for iwd@lists.linux.dev

From: James Prestwood <prestwoj@gmail.com>
To: Richard Acayan <mailingradian@gmail.com>
Cc: iwd@lists.linux.dev
Subject: Re: Segmentation fault when taking device for a walk
Date: Tue, 20 Aug 2024 08:04:20 -0700
Message-ID: <e6ff554e-ad17-46e3-b614-ac6da796b337@gmail.com>
In-Reply-To: <ZsPATiu0_31D8Qcq@radian>

Hi Richard,

On 8/19/24 2:59 PM, Richard Acayan wrote:
> On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote:
>> Hi Richard,
>>
>> On 8/15/24 5:24 PM, Richard Acayan wrote:
>>> Hi,
>>>
>>> A segmentation fault occurs in station_start_roam() when the station is
>>> disconnected from an access point, or in other words, when the station's
>>> connected_bss is NULL. Usually, this is triggered by a timeout, possibly
>>> scheduled in response to a weak signal event.
>>>
>>> This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when
>>> receding from an access point, on iwd 2.19. I have collected 6 coredumps
>>> of the crash in the span of around 2 weeks and would be willing to use
>>> GDB if more information is necessary for a patch.
>>>
>>> Sample:
>>>
>>> 	Program terminated with signal SIGSEGV, Segmentation fault.
>>> 	#0  0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
>>> 	
>>> 	warning: 2880	src/station.c: No such file or directory
>>> 	(gdb) bt
>>> 	#0  0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
>>> 	#1  0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>,
>>> 	    user_data=0xffff876b2e20) at ell/timeout.c:68
>>> 	#2  timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20)
>>> 	    at ell/timeout.c:57
>>> 	#3  0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461
>>> 	#4  0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508
>>> 	#5  l_main_run () at ell/main.c:490
>>> 	#6  0x0000aaaadf28bce4 in l_main_run_with_signal (
>>> 	    callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0)
>>> 	    at ell/main.c:630
>>> 	#7  0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611
>>> 	(gdb) p station->connected_bss
>>> 	$1 = (struct scan_bss *) 0x0
>>>
>> It's hard to say without debug logs, but it appears the disconnect never
>> cleared the timer used for the next roam attempt. After 2.19 I did fix a
>> hang caused by a disconnect arriving during a roam attempt, but I can't
>> make heads or tails of this without debug logs showing what happened
>> before and after the disconnect.
> It happened again with debug logs enabled. Relevant snippet (from
> logread):
>
> 	[Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5
> 	[Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting
> 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
> 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event()
> 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event()
> 	[Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1
> 	[Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected
> 	[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> 	[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
> 	[Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5
>
> Afterwards is the segmentation fault.

Do you happen to have the logs from a few minutes prior? The roam timeout
defaults to 60 seconds, so the timer was re-armed at some point, but the
logs don't go back that far. It's trivial to handle the segfault, but I
suspect that re-arming the roam timeout also leaks memory, so we should
address that as the root cause.
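
To illustrate the two fixes, here is a minimal sketch, not the actual iwd
code. The struct layouts and the names station_arm_roam_timer(),
station_disconnected() and station_roam_timeout() are hypothetical
stand-ins, and a plain heap allocation stands in for the real timeout
object. It assumes the crash is a NULL connected_bss being dereferenced
in the timer callback, as the backtrace suggests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not iwd's or ell's real types. */
struct bss {
	int signal;
};

struct timer {
	bool armed;
};

struct station {
	struct bss *connected_bss;	/* NULL while disconnected */
	struct timer *roam_timer;	/* NULL when no roam is scheduled */
};

/* Arm the roam timer, releasing any previous one so re-arming cannot leak. */
void station_arm_roam_timer(struct station *st)
{
	free(st->roam_timer);
	st->roam_timer = malloc(sizeof(*st->roam_timer));
	if (st->roam_timer)
		st->roam_timer->armed = true;
}

/* Root-cause fix: entering the disconnected state cancels a pending roam. */
void station_disconnected(struct station *st)
{
	free(st->roam_timer);
	st->roam_timer = NULL;
	st->connected_bss = NULL;
}

/* Timer callback. Returns true if a roam attempt was started. */
bool station_roam_timeout(struct station *st)
{
	/* The timer is one-shot, so release it before anything else. */
	free(st->roam_timer);
	st->roam_timer = NULL;

	/* Defensive guard: nothing to roam from when disconnected. */
	if (!st->connected_bss)
		return false;

	/* ... a real implementation would start the roam scan here ... */
	return true;
}
```

The guard in the callback is the trivial part. Freeing the old timer
before re-arming and cancelling it on disconnect are what would address
a stale or leaked timer.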

Thanks,

James

Thread overview: 8+ messages
2024-08-16  0:24 Segmentation fault when taking device for a walk Richard Acayan
2024-08-16 11:53 ` James Prestwood
2024-08-19 21:59   ` Richard Acayan
2024-08-20 15:04     ` James Prestwood [this message]
2024-08-20 16:00       ` Richard Acayan
2024-08-21 14:27         ` James Prestwood
2024-08-27  0:40           ` Richard Acayan
2024-08-27 11:46             ` James Prestwood
