Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
       [not found] <00000000000046efb605fec3def0@google.com>
@ 2024-12-13 12:30 ` syzbot
  2025-02-11 11:10   ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2024-12-13 12:30 UTC (permalink / raw)
  To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
	will

syzbot has found a reproducer for the following issue on:

HEAD commit:    2e7aff49b5da Merge branches 'for-next/core' and 'for-next/..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=11ed5be8580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=696fb014d05da3a3
dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10192d44580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ef408f67fde3/disk-2e7aff49.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/414ac17a20dc/vmlinux-2e7aff49.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a93415d2a7e7/Image-2e7aff49.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline]
WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline]
WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
Modules linked in:
CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69
lr : current_wq_worker kernel/workqueue_internal.h:69 [inline]
lr : is_chained_work kernel/workqueue.c:2199 [inline]
lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
sp : ffff800080017b40
x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88
x26: dfff800000000000 x25: ffff0000d1c63800 x24: 0000000000000008
x23: 1fffe0001b14d791 x22: ffff0000d1c639c0 x21: 0000000000000100
x20: ffff0000d8a6bc80 x19: ffff0000da2f0930 x18: 1fffe000366c5e7e
x17: ffff800123d64000 x16: ffff80008326d65c x15: 0000000000000002
x14: 1ffff00011f300ca x13: dfff800000000000 x12: 00000000b1aebe1c
x11: 00000000beaa9035 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d8a6bc80 x7 : ffff8000805819c0 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008df6a320
x2 : ffff0000da2f0930 x1 : 0000000000000100 x0 : 0000000000000000
Call trace:
 current_wq_worker kernel/workqueue_internal.h:69 [inline] (P)
 is_chained_work kernel/workqueue.c:2199 [inline] (P)
 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (P)
 current_wq_worker kernel/workqueue_internal.h:69 [inline] (L)
 is_chained_work kernel/workqueue.c:2199 [inline] (L)
 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L)
 delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485
 call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793
 expire_timers kernel/time/timer.c:1839 [inline]
 __run_timers kernel/time/timer.c:2418 [inline]
 __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430
 run_timer_base kernel/time/timer.c:2439 [inline]
 run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449
 handle_softirqs+0x320/0xd34 kernel/softirq.c:554
 __do_softirq+0x14/0x20 kernel/softirq.c:588
 ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:81
 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:86
 invoke_softirq kernel/softirq.c:435 [inline]
 __irq_exit_rcu+0x1d8/0x544 kernel/softirq.c:655
 irq_exit_rcu+0x14/0x84 kernel/softirq.c:671
 __el1_irq arch/arm64/kernel/entry-common.c:561 [inline]
 el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:575
 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
 el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
 raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (P)
 finish_lock_switch+0xc0/0x1e4 kernel/sched/core.c:5123 (P)
 raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (L)
 finish_lock_switch+0xbc/0x1e4 kernel/sched/core.c:5123 (L)
 finish_task_switch+0x120/0x608 kernel/sched/core.c:5241
 context_switch kernel/sched/core.c:5372 [inline]
 __schedule+0x1748/0x27f0 kernel/sched/core.c:6756
 preempt_schedule_irq+0x80/0x188 kernel/sched/core.c:7078
 arm64_preempt_schedule_irq arch/arm64/kernel/entry-common.c:301 [inline]
 __el1_irq arch/arm64/kernel/entry-common.c:563 [inline]
 el1_interrupt+0x4c/0x68 arch/arm64/kernel/entry-common.c:575
 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
 el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
 do_notify_resume+0x5c/0x1f4 arch/arm64/kernel/entry-common.c:133 (P)
 do_notify_resume+0x58/0x1f4 arch/arm64/kernel/entry-common.c:133 (L)
 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
 exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
 el0_interrupt+0xc8/0x248 arch/arm64/kernel/entry-common.c:832
 __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:837
 el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:842
 el0t_64_irq+0x198/0x19c arch/arm64/kernel/entry.S:601
irq event stamp: 35689
hardirqs last  enabled at (35688): [<ffff80008b687f2c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline]
hardirqs last  enabled at (35688): [<ffff80008b687f2c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95
hardirqs last disabled at (35689): [<ffff80008b6a9b44>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (35689): [<ffff80008b6a9b44>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170
softirqs last  enabled at (34558): [<ffff800080129934>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (35685): [<ffff800080020db4>] __do_softirq+0x14/0x20 kernel/softirq.c:588
---[ end trace 0000000000000000 ]---


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2024-12-13 12:30 ` [syzbot] [arm?] WARNING in delayed_work_timer_fn syzbot
@ 2025-02-11 11:10   ` Will Deacon
  2025-02-11 20:59     ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2025-02-11 11:10 UTC (permalink / raw)
  To: syzbot
  Cc: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
	maz, mark.rutland, richard.xnu.clark, jiangshanlai, tj, marcel,
	johan.hedberg, luiz.dentz, linux-bluetooth

[+ some other folks]

On Fri, Dec 13, 2024 at 04:30:24AM -0800, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    2e7aff49b5da Merge branches 'for-next/core' and 'for-next/..
> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=11ed5be8580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=696fb014d05da3a3
> dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10192d44580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/ef408f67fde3/disk-2e7aff49.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/414ac17a20dc/vmlinux-2e7aff49.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/a93415d2a7e7/Image-2e7aff49.gz.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com

I spent a bit of time looking at the reports in the "dashboard link"
above. There are quite a few in there and I ended up suspecting the
bluetooth subsystem and drivers using hci_register_dev(). Specifically,
it looks like one of the workqueues is serviced in softirq context after
its corresponding device has been destroyed.

I wasn't able to reproduce it locally (I don't have a setup for running
the 'syz' reproducers), but the program is pretty small (see the "syz repro"
link above).

Given that this seems to explode a few times a month, I wonder if it's
worth adding some instrumentation to e.g. dump the name of the workqueue?

Will

--->8

(mammoth backtrace below)

> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline]
> WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline]
> WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
> Modules linked in:
> CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69
> lr : current_wq_worker kernel/workqueue_internal.h:69 [inline]
> lr : is_chained_work kernel/workqueue.c:2199 [inline]
> lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
> sp : ffff800080017b40
> x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88
> x26: dfff800000000000 x25: ffff0000d1c63800 x24: 0000000000000008
> x23: 1fffe0001b14d791 x22: ffff0000d1c639c0 x21: 0000000000000100
> x20: ffff0000d8a6bc80 x19: ffff0000da2f0930 x18: 1fffe000366c5e7e
> x17: ffff800123d64000 x16: ffff80008326d65c x15: 0000000000000002
> x14: 1ffff00011f300ca x13: dfff800000000000 x12: 00000000b1aebe1c
> x11: 00000000beaa9035 x10: 0000000000ff0100 x9 : 0000000000000000
> x8 : ffff0000d8a6bc80 x7 : ffff8000805819c0 x6 : 0000000000000000
> x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008df6a320
> x2 : ffff0000da2f0930 x1 : 0000000000000100 x0 : 0000000000000000
> Call trace:
>  current_wq_worker kernel/workqueue_internal.h:69 [inline] (P)
>  is_chained_work kernel/workqueue.c:2199 [inline] (P)
>  __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (P)
>  current_wq_worker kernel/workqueue_internal.h:69 [inline] (L)
>  is_chained_work kernel/workqueue.c:2199 [inline] (L)
>  __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L)
>  delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485
>  call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793
>  expire_timers kernel/time/timer.c:1839 [inline]
>  __run_timers kernel/time/timer.c:2418 [inline]
>  __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430
>  run_timer_base kernel/time/timer.c:2439 [inline]
>  run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449
>  handle_softirqs+0x320/0xd34 kernel/softirq.c:554
>  __do_softirq+0x14/0x20 kernel/softirq.c:588
>  ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:81
>  call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
>  do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:86
>  invoke_softirq kernel/softirq.c:435 [inline]
>  __irq_exit_rcu+0x1d8/0x544 kernel/softirq.c:655
>  irq_exit_rcu+0x14/0x84 kernel/softirq.c:671
>  __el1_irq arch/arm64/kernel/entry-common.c:561 [inline]
>  el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:575
>  el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
>  el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
>  __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
>  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
>  raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (P)
>  finish_lock_switch+0xc0/0x1e4 kernel/sched/core.c:5123 (P)
>  raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (L)
>  finish_lock_switch+0xbc/0x1e4 kernel/sched/core.c:5123 (L)
>  finish_task_switch+0x120/0x608 kernel/sched/core.c:5241
>  context_switch kernel/sched/core.c:5372 [inline]
>  __schedule+0x1748/0x27f0 kernel/sched/core.c:6756
>  preempt_schedule_irq+0x80/0x188 kernel/sched/core.c:7078
>  arm64_preempt_schedule_irq arch/arm64/kernel/entry-common.c:301 [inline]
>  __el1_irq arch/arm64/kernel/entry-common.c:563 [inline]
>  el1_interrupt+0x4c/0x68 arch/arm64/kernel/entry-common.c:575
>  el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
>  el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
>  __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
>  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
>  do_notify_resume+0x5c/0x1f4 arch/arm64/kernel/entry-common.c:133 (P)
>  do_notify_resume+0x58/0x1f4 arch/arm64/kernel/entry-common.c:133 (L)
>  exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
>  exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
>  el0_interrupt+0xc8/0x248 arch/arm64/kernel/entry-common.c:832
>  __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:837
>  el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:842
>  el0t_64_irq+0x198/0x19c arch/arm64/kernel/entry.S:601
> irq event stamp: 35689
> hardirqs last  enabled at (35688): [<ffff80008b687f2c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline]
> hardirqs last  enabled at (35688): [<ffff80008b687f2c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95
> hardirqs last disabled at (35689): [<ffff80008b6a9b44>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
> hardirqs last disabled at (35689): [<ffff80008b6a9b44>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170
> softirqs last  enabled at (34558): [<ffff800080129934>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
> softirqs last disabled at (35685): [<ffff800080020db4>] __do_softirq+0x14/0x20 kernel/softirq.c:588
> ---[ end trace 0000000000000000 ]---
> 
> 
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2025-02-11 11:10   ` Will Deacon
@ 2025-02-11 20:59     ` Tejun Heo
  2025-02-14 16:45       ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2025-02-11 20:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel,
	syzkaller-bugs, maz, mark.rutland, richard.xnu.clark,
	jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth

On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote:
> Given that this seems to explode a few times a month, I wonder if it's
> worth adding some instrumentation to e.g. dump the name of the workqueue?

It's usually most useful to print out the work function when identifying the
culprit. I'd be happy to take the patch.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2025-02-11 20:59     ` Tejun Heo
@ 2025-02-14 16:45       ` Will Deacon
  2025-03-13 19:30         ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Will Deacon @ 2025-02-14 16:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel,
	syzkaller-bugs, maz, mark.rutland, richard.xnu.clark,
	jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth

On Tue, Feb 11, 2025 at 10:59:30AM -1000, Tejun Heo wrote:
> On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote:
> > Given that this seems to explode a few times a month, I wonder if it's
> > worth adding some instrumentation to e.g. dump the name of the workqueue?
> 
> It's usually most useful to print out the work function when identifying the
> culprit. I'd be happy to take the patch.

Thanks, Tejun. I sent a patch adding some more diagnostics:

https://lore.kernel.org/r/20250214164349.13694-1-will@kernel.org

Will


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2025-02-14 16:45       ` Will Deacon
@ 2025-03-13 19:30         ` Will Deacon
  0 siblings, 0 replies; 5+ messages in thread
From: Will Deacon @ 2025-03-13 19:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel,
	syzkaller-bugs, maz, mark.rutland, richard.xnu.clark,
	jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth

On Fri, Feb 14, 2025 at 04:45:57PM +0000, Will Deacon wrote:
> On Tue, Feb 11, 2025 at 10:59:30AM -1000, Tejun Heo wrote:
> > On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote:
> > > Given that this seems to explode a few times a month, I wonder if it's
> > > worth adding some instrumentation to e.g. dump the name of the workqueue?
> > 
> > It's usually most useful to print out the work function when identifying the
> > culprit. I'd be happy to take the patch.
> 
> Thanks, Tejun. I sent a patch adding some more diagnostics:
> 
> https://lore.kernel.org/r/20250214164349.13694-1-will@kernel.org

Ok, we have our first crash [1] with the new debug information:

[   77.133818][   T10] Bluetooth: hci3: Opcode 0x0c1a failed: -110
[   77.135329][   T10] Bluetooth: hci3: Error when powering off device on rfkill (-110)
[   77.143656][    C1] ------------[ cut here ]------------
[   77.145336][    C1] workqueue: cannot queue hci_cmd_timeout on wq hci3
[   77.147101][    C1] WARNING: CPU: 1 PID: 7433 at kernel/workqueue.c:2258 __queue_work+0xe94/0x1324
[   77.149301][    C1] Modules linked in:
[   77.150287][    C1] CPU: 1 UID: 0 PID: 7433 Comm: syz.2.193 Not tainted 6.14.0-rc5-syzkaller-g77c95b8c7a16 #0
[   77.152960][    C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[   77.155583][    C1] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   77.157546][    C1] pc : __queue_work+0xe94/0x1324
[   77.158811][    C1] lr : __queue_work+0xe94/0x1324
[   77.160114][    C1] sp : ffff800080017b40
[   77.161145][    C1] x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88
[   77.163285][    C1] x26: ffff80008fbbd600 x25: ffff0000dacac800 x24: 0000000000000008
[   77.165326][    C1] x23: ffff0000dacac9c0 x22: 0000000000000100 x21: 1fffe0001b595938
[   77.167421][    C1] x20: ffff800092f09000 x19: ffff0000ecf20948 x18: 0000000000000008
[   77.169356][    C1] x17: 0000000000000000 x16: ffff80008b72ce6c x15: ffff700011f87b38
[   77.171279][    C1] x14: 1ffff00011f87b38 x13: 0000000000000004 x12: ffffffffffffffff
[   77.173271][    C1] x11: 0000000000000103 x10: 0000000000ff0100 x9 : d5bcdc7e4519df00
[   77.175255][    C1] x8 : d5bcdc7e4519df00 x7 : 0000000000000001 x6 : 0000000000000001
[   77.177277][    C1] x5 : ffff8000800172f8 x4 : ffff80008fcaf7c0 x3 : ffff8000804a7bb4
[   77.179372][    C1] x2 : 0000000000000000 x1 : 0000000100000101 x0 : 0000000000000000
[   77.181374][    C1] Call trace:
[   77.182230][    C1]  __queue_work+0xe94/0x1324 (P)
[   77.183476][    C1]  delayed_work_timer_fn+0x74/0x90
[   77.184755][    C1]  call_timer_fn+0x1b4/0x8b8
[   77.185932][    C1]  __run_timer_base+0x59c/0x7b4
[   77.187119][    C1]  run_timer_softirq+0xcc/0x194
[   77.188360][    C1]  handle_softirqs+0x320/0xd34
[   77.189576][    C1]  __do_softirq+0x14/0x20
[   77.190672][    C1]  ____do_softirq+0x14/0x20
[   77.191884][    C1]  call_on_irq_stack+0x24/0x4c
[   77.193050][    C1]  do_softirq_own_stack+0x20/0x2c
[   77.194311][    C1]  __irq_exit_rcu+0x1d8/0x544
[   77.195550][    C1]  irq_exit_rcu+0x14/0x84
[   77.196669][    C1]  el1_interrupt+0x38/0x68
[   77.197818][    C1]  el1h_64_irq_handler+0x18/0x24
[   77.199152][    C1]  el1h_64_irq+0x6c/0x70
[   77.200245][    C1]  lock_acquire+0x278/0x724 (P)
[   77.201520][    C1]  rcu_lock_acquire+0x44/0x54
[   77.202722][    C1]  page_ext_get+0x2c/0x2e8
[   77.203908][    C1]  page_table_check_set+0xa0/0x408
[   77.205117][    C1]  __page_table_check_ptes_set+0x2d0/0x398
[   77.206601][    C1]  set_pte_range+0x618/0x644
[   77.207792][    C1]  finish_fault+0x968/0xd6c
[   77.208968][    C1]  handle_pte_fault+0x3528/0x57b0
[   77.210314][    C1]  handle_mm_fault+0xfa8/0x188c
[   77.211592][    C1]  __get_user_pages+0x1878/0x3400
[   77.212967][    C1]  populate_vma_page_range+0x220/0x2f0
[   77.214435][    C1]  __mm_populate+0x240/0x3d8
[   77.215600][    C1]  vm_mmap_pgoff+0x304/0x3c4
[   77.216833][    C1]  ksys_mmap_pgoff+0xd0/0x5c8
[   77.218066][    C1]  __arm64_sys_mmap+0xf8/0x110
[   77.219195][    C1]  invoke_syscall+0x98/0x2b8
[   77.220453][    C1]  el0_svc_common+0x130/0x23c
[   77.221747][    C1]  do_el0_svc+0x48/0x58
[   77.222881][    C1]  el0_svc+0x54/0x168
[   77.223946][    C1]  el0t_64_sync_handler+0x84/0x108
[   77.225305][    C1]  el0t_64_sync+0x198/0x19c
[   77.226425][    C1] irq event stamp: 3967
[   77.227573][    C1] hardirqs last  enabled at (3966): [<ffff80008b7ec538>] _raw_spin_unlock_irqrestore+0x38/0x98
[   77.230143][    C1] hardirqs last disabled at (3967): [<ffff80008b7ec3e0>] _raw_spin_lock_irq+0x28/0x70
[   77.232639][    C1] softirqs last  enabled at (2318): [<ffff800080311b48>] handle_softirqs+0xb44/0xd34
[   77.235151][    C1] softirqs last disabled at (3923): [<ffff800080020dbc>] __do_softirq+0x14/0x20
[   77.237528][    C1] ---[ end trace 0000000000000000 ]---


So it looks like I was right about bluetooth (the work function is
"hci_cmd_timeout") but, unfortunately, "hci3" doesn't tell us much about
the client.

Perhaps the failed power-off right before the warning suggests that
we're not tearing down the wq properly on that error path? Let's see...

hci_rfkill_set_block() calls hci_dev_do_close() if the power-off fails.
That in turn cancels the delayed 'cmd_timer' work but only if HCI_UP is
not set in the device flags. But if hci_dev_do_poweroff() failed, HCI_UP
may well still be set. Then somehow the workqueue is destroyed, I guess
via hci_unregister_dev() setting HCI_UNREGISTER which enables
hci_release_dev() but at that point it's hard to know which bluetooth
driver is responsible.

Assuming hci_unregister_dev() is triggered via hci_dev_do_close()
closing the device, then unconditionally cancelling the delayed cmd work
when clearing HCI_UP sounds like a reasonable starter for ten?

Dunno, this is all guesswork in code that I'm not familiar with, so I'm
hoping somebody can tell me why I'm wrong :)

Will

--->8

diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index dd770ef5ec36..d2623f6d0593 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -5147,11 +5147,9 @@ int hci_dev_close_sync(struct hci_dev *hdev)
        }

        err = hci_dev_shutdown(hdev);
-
-       if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
-               cancel_delayed_work_sync(&hdev->cmd_timer);
+       cancel_delayed_work_sync(&hdev->cmd_timer);
+       if (!test_and_clear_bit(HCI_UP, &hdev->flags))
                return err;
-       }

        hci_leds_update_powered(hdev, false);
 


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-03-13 19:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <00000000000046efb605fec3def0@google.com>
2024-12-13 12:30 ` [syzbot] [arm?] WARNING in delayed_work_timer_fn syzbot
2025-02-11 11:10   ` Will Deacon
2025-02-11 20:59     ` Tejun Heo
2025-02-14 16:45       ` Will Deacon
2025-03-13 19:30         ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox