public inbox for iwd@lists.linux.dev
From: Richard Acayan <mailingradian@gmail.com>
To: James Prestwood <prestwoj@gmail.com>
Cc: iwd@lists.linux.dev
Subject: Re: Segmentation fault when taking device for a walk
Date: Tue, 20 Aug 2024 12:00:34 -0400
Message-ID: <ZsS9oqna82kDmgJy@radian>
In-Reply-To: <e6ff554e-ad17-46e3-b614-ac6da796b337@gmail.com>

On Tue, Aug 20, 2024 at 08:04:20AM -0700, James Prestwood wrote:
> Hi Richard,
> 
> On 8/19/24 2:59 PM, Richard Acayan wrote:
> > On Fri, Aug 16, 2024 at 04:53:41AM -0700, James Prestwood wrote:
> > > Hi Richard,
> > > 
> > > On 8/15/24 5:24 PM, Richard Acayan wrote:
> > > > Hi,
> > > > 
> > > > A segmentation fault occurs in station_start_roam() when the station is
> > > > disconnected from an access point, or in other words, when the station's
> > > > connected_bss is NULL. Usually, this is triggered by a timeout, possibly
> > > > scheduled in response to a weak signal event.
> > > > 
> > > > This is occurring on my Pixel 3a running postmarketOS/Alpine Linux, when
> > > > receding from an access point, on iwd 2.19. I have collected 6 coredumps
> > > > of the crash in the span of around 2 weeks and would be willing to use
> > > > GDB if more information is necessary for a patch.
> > > > 
> > > > Sample:
> > > > 
> > > > 	Program terminated with signal SIGSEGV, Segmentation fault.
> > > > 	#0  0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
> > > > 	
> > > > 	warning: 2880	src/station.c: No such file or directory
> > > > 	(gdb) bt
> > > > 	#0  0x0000aaaadf2086a0 in station_start_roam (station=0xffff8776ae50) at src/station.c:2880
> > > > 	#1  0x0000aaaadf28c544 in timeout_callback (fd=<optimized out>, events=<optimized out>,
> > > > 	    user_data=0xffff876b2e20) at ell/timeout.c:68
> > > > 	#2  timeout_callback (fd=<optimized out>, events=<optimized out>, user_data=0xffff876b2e20)
> > > > 	    at ell/timeout.c:57
> > > > 	#3  0x0000aaaadf28b9d0 in l_main_iterate (timeout=<optimized out>) at ell/main.c:461
> > > > 	#4  0x0000aaaadf28bac0 in l_main_run () at ell/main.c:508
> > > > 	#5  l_main_run () at ell/main.c:490
> > > > 	#6  0x0000aaaadf28bce4 in l_main_run_with_signal (
> > > > 	    callback=callback@entry=0xaaaadf1f1110 <signal_handler>, user_data=user_data@entry=0x0)
> > > > 	    at ell/main.c:630
> > > > 	#7  0x0000aaaadf1f0b0c in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:611
> > > > 	(gdb) p station->connected_bss
> > > > 	$1 = (struct scan_bss *) 0x0
> > > > 
> > > It's hard to say without any debug logs, but it appears the disconnect
> > > never cleared out the timer used for the next roam attempt. I did fix a hang
> > > due to a disconnect coming in during a roam attempt after 2.19, but I can't
> > > really make heads or tails of it without debug logs to see what happened
> > > before/after the disconnect.
> > It happened again with debug logs enabled. Relevant snippet (from
> > logread):
> > 
> > 	[Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5
> > 	[Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting
> > 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> > 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
> > 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> > 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event()
> > 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> > 	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event()
> > 	[Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1
> > 	[Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected
> > 	[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
> > 	[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
> > 	[Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5
> > 
> > Afterwards is the segmentation fault.
> 
> Do you happen to have the logs from a few minutes prior? The roam timeout
> defaults to 60 seconds, so at some point it was re-armed, but the logs don't
> go back that far. It's trivial to handle the segfault, but I suspect the roam
> timeout being re-armed is also leaking memory, so we should address that as
> the root cause.

Unfortunately not. I only have the last half-minute, and part of it
contains personal information. Here's a bit more that I don't need to redact:

	[Aug 17 21:22:07] daemon iwd: src/station.c:station_roam_failed() 5
	[Aug 17 21:22:07] daemon iwd: src/wiphy.c:wiphy_radio_work_done() Work item 279 done
	[Aug 17 21:22:12] daemon iwd: src/station.c:station_dbus_disconnect() 
	[Aug 17 21:22:12] daemon iwd: src/station.c:station_reset_connection_state() 5
	[Aug 17 21:22:12] daemon iwd: src/station.c:station_roam_state_clear() 5
	[Aug 17 21:22:12] daemon iwd: event: state, old: connected, new: disconnecting
	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_deauthenticate_event() 
	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
	[Aug 17 21:22:15] daemon iwd: src/netdev.c:netdev_disconnect_event() 
	[Aug 17 21:22:15] daemon iwd: src/station.c:station_disconnect_cb() 5, success: 1
	[Aug 17 21:22:15] daemon iwd: event: state, old: disconnecting, new: disconnected
	[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_reg_notify() Notification of command Reg Change(36)
	[Aug 17 21:22:15] daemon iwd: src/wiphy.c:wiphy_update_reg_domain() New reg domain country code for (global) is XX
	[Aug 17 21:22:36] daemon iwd: src/station.c:station_roam_trigger_cb() 5

I can confirm, though, that there are more timeouts than expected, even
after (presumably) clearing the last timeout:

	Core was generated by `/usr/libexec/iwd -d'.
	Program terminated with signal SIGSEGV, Segmentation fault.
	#0  0x0000aaaadf1686a0 in station_start_roam (station=0xffffb0527b30) at src/station.c:2880
	
	warning: 2880	src/station.c: No such file or directory
	(gdb) p *watch_list[12]
	$1 = {fd = 12, events = 1073741825, flags = 1, callback = 0xaaaadf1ec4f0 <timeout_callback>, 
	  destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530dd0}
	(gdb) p *(struct l_timeout *) watch_list[12]->user_data
	$2 = {fd = 12, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, 
	  user_data = 0xffffb0527b30}
	(gdb) p *watch_list[13]
	$3 = {fd = 13, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>, 
	  destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530ef0}
	(gdb) p *(struct l_timeout *) watch_list[13]->user_data
	$4 = {fd = 13, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, 
	  user_data = 0xffffb0527b30}
	(gdb) p *watch_list[14]
	$5 = {fd = 14, events = 1073741825, flags = 0, callback = 0xaaaadf1ec4f0 <timeout_callback>, 
	  destroy = 0xaaaadf1ec350 <timeout_destroy>, user_data = 0xffffb0530c50}
	(gdb) p *(struct l_timeout *) watch_list[14]->user_data
	$6 = {fd = 14, callback = 0xaaaadf1687a0 <station_roam_trigger_cb>, destroy = 0x0, 
	  user_data = 0xffffb0527b30}
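
To illustrate the kind of failure being discussed (several pending timers
with the same callback and user_data, one of which fires after the
disconnect), here is a minimal self-contained sketch. It is not iwd or ell
code: every name prefixed with sk_ is hypothetical, and the "leaky" re-arm
path is only an assumption about how extra timers could pile up, since the
thread has not yet established where the real re-arm happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a one-shot timer; not ell's struct l_timeout. */
struct sk_timeout {
	void (*callback)(void *user_data);
	void *user_data;
	struct sk_timeout *next;
};

/* Plays the role of the main loop's list of pending timers. */
static struct sk_timeout *pending;

static struct sk_timeout *sk_timeout_create(void (*cb)(void *), void *user_data)
{
	struct sk_timeout *t = malloc(sizeof(*t));

	assert(t);
	t->callback = cb;
	t->user_data = user_data;
	t->next = pending;
	pending = t;
	return t;
}

/* Unlink and free a timer so that it can never fire. */
static void sk_timeout_remove(struct sk_timeout *t)
{
	struct sk_timeout **p;

	for (p = &pending; *p; p = &(*p)->next) {
		if (*p == t) {
			*p = t->next;
			free(t);
			return;
		}
	}
}

static int sk_pending_count(void)
{
	int n = 0;

	for (const struct sk_timeout *t = pending; t; t = t->next)
		n++;
	return n;
}

/* Fire and free every pending timer, as the main loop eventually would. */
static void sk_fire_all(void)
{
	while (pending) {
		struct sk_timeout *t = pending;

		pending = t->next;
		t->callback(t->user_data);
		free(t);
	}
}

/* Hypothetical, heavily reduced station state. */
struct sk_station {
	const char *connected_bss;	/* NULL once disconnected */
	struct sk_timeout *roam_trigger_timeout;
	int roams_started;
	int roams_skipped;
};

static void sk_roam_trigger_cb(void *user_data)
{
	struct sk_station *st = user_data;

	/* One-shot: the loop frees the timer after this returns. */
	st->roam_trigger_timeout = NULL;

	/* Guard against a stale timer firing after a disconnect. */
	if (!st->connected_bss) {
		st->roams_skipped++;
		return;
	}
	st->roams_started++;
}

/* Assumed buggy pattern: the old handle is overwritten, orphaning its timer. */
static void sk_rearm_leaky(struct sk_station *st)
{
	st->roam_trigger_timeout = sk_timeout_create(sk_roam_trigger_cb, st);
}

/* Safer pattern: drop any pending timer before arming a new one. */
static void sk_rearm_safe(struct sk_station *st)
{
	if (st->roam_trigger_timeout)
		sk_timeout_remove(st->roam_trigger_timeout);
	st->roam_trigger_timeout = sk_timeout_create(sk_roam_trigger_cb, st);
}

/* On disconnect, only the timer that is still tracked can be removed. */
static void sk_roam_state_clear(struct sk_station *st)
{
	if (st->roam_trigger_timeout) {
		sk_timeout_remove(st->roam_trigger_timeout);
		st->roam_trigger_timeout = NULL;
	}
}
```

In this sketch, re-arming three times through sk_rearm_leaky() and then
calling sk_roam_state_clear() leaves two orphaned timers pending, and the
NULL check in the callback is what keeps them from dereferencing a missing
BSS. That check only hides the symptom, though. The orphans still exist
until they fire, which is why the re-arm itself looks like the thing to fix.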
