public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [syzbot] [arm?] WARNING in delayed_work_timer_fn
@ 2023-06-23  3:48 syzbot
  2024-12-13 12:30 ` syzbot
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2023-06-23  3:48 UTC (permalink / raw)
  To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
	will

Hello,

syzbot found the following issue on:

HEAD commit:    177239177378 Merge branch 'for-next/core', remote-tracking..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=13d8a337280000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8776b67768a3c9af
dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0c47a40dd633/disk-17723917.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0ff319b6fb50/vmlinux-17723917.xz
kernel image: https://storage.googleapis.com/syzbot-assets/3ce1ea9e3b7e/Image-17723917.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 15561 at kernel/workqueue.c:1447 current_wq_worker kernel/workqueue_internal.h:68 [inline]
WARNING: CPU: 0 PID: 15561 at kernel/workqueue.c:1447 is_chained_work kernel/workqueue.c:1384 [inline]
WARNING: CPU: 0 PID: 15561 at kernel/workqueue.c:1447 __queue_work+0xe48/0x1374 kernel/workqueue.c:1446
Modules linked in:
CPU: 0 PID: 15561 Comm: syz-executor.2 Not tainted 6.4.0-rc5-syzkaller-g177239177378 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __queue_work+0xe48/0x1374 kernel/workqueue_internal.h:68
lr : current_wq_worker kernel/workqueue_internal.h:68 [inline]
lr : is_chained_work kernel/workqueue.c:1384 [inline]
lr : __queue_work+0xe48/0x1374 kernel/workqueue.c:1446
sp : ffff800080007b70
x29: ffff800080007bb0 x28: 1fffe000195506f1 x27: 0000000000000008
x26: ffff0000f6cfb800 x25: dfff800000000000 x24: ffff0000caa83788
x23: ffff0000f6cfb9c0 x22: ffff0000caa83780 x21: 0000000000000000
x20: 0000000000000100 x19: ffff000140bd8b30 x18: 1fffe00019bd827e
x17: ffff800126418000 x16: ffff80008a43bfbc x15: 0000000000010001
x14: 1ffff00011bd80ac x13: dfff800000000000 x12: 00000000beb1c7bb
x11: 0000000000000101 x10: 0000000000000100 x9 : 0000000000000000
x8 : ffff0000caa83780 x7 : ffff800080412e8c x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008c805480
x2 : ffff000140bd8b30 x1 : 0000000000000100 x0 : 0000000000000000
Call trace:
 current_wq_worker kernel/workqueue_internal.h:68 [inline]
 is_chained_work kernel/workqueue.c:1384 [inline]
 __queue_work+0xe48/0x1374 kernel/workqueue.c:1446
 delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:1652
 call_timer_fn+0x19c/0x8cc kernel/time/timer.c:1700
 expire_timers kernel/time/timer.c:1746 [inline]
 __run_timers+0x5b4/0x734 kernel/time/timer.c:2022
 run_timer_softirq+0x7c/0x114 kernel/time/timer.c:2035
 __do_softirq+0x2d0/0xd54 kernel/softirq.c:571
 ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80
 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:882
 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:85
 invoke_softirq kernel/softirq.c:452 [inline]
 __irq_exit_rcu+0x28c/0x534 kernel/softirq.c:650
 irq_exit_rcu+0x14/0x84 kernel/softirq.c:662
 __el1_irq arch/arm64/kernel/entry-common.c:474 [inline]
 el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:488
 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:493
 el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:587
 __daif_local_irq_restore arch/arm64/include/asm/irqflags.h:182 [inline]
 arch_local_irq_restore arch/arm64/include/asm/irqflags.h:202 [inline]
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
 _raw_spin_unlock_irqrestore+0x44/0x98 kernel/locking/spinlock.c:194
 spin_unlock_irqrestore include/linux/spinlock.h:405 [inline]
 __wake_up_common_lock kernel/sched/wait.c:140 [inline]
 __wake_up_sync_key+0x124/0x1ac kernel/sched/wait.c:208
 __unix_dgram_recvmsg+0x4c4/0xe2c net/unix/af_unix.c:2479
 unix_dgram_recvmsg+0xd0/0xe8 net/unix/af_unix.c:2549
 sock_recvmsg_nosec+0x90/0xec net/socket.c:1019
 ____sys_recvmsg+0x5c0/0x6e8 net/socket.c:2720
 ___sys_recvmsg net/socket.c:2764 [inline]
 do_recvmmsg+0x41c/0xb60 net/socket.c:2858
 __sys_recvmmsg net/socket.c:2937 [inline]
 __do_sys_recvmmsg net/socket.c:2960 [inline]
 __se_sys_recvmmsg net/socket.c:2953 [inline]
 __arm64_sys_recvmmsg+0x180/0x23c net/socket.c:2953
 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
 invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
 el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142
 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191
 el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647
 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
irq event stamp: 295995
hardirqs last  enabled at (295994): [<ffff80008a521910>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
hardirqs last  enabled at (295994): [<ffff80008a521910>] _raw_spin_unlock_irqrestore+0x38/0x98 kernel/locking/spinlock.c:194
hardirqs last disabled at (295995): [<ffff80008a5217b8>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (295995): [<ffff80008a5217b8>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170
softirqs last  enabled at (293288): [<ffff8000800218ec>] softirq_handle_end kernel/softirq.c:414 [inline]
softirqs last  enabled at (293288): [<ffff8000800218ec>] __do_softirq+0xac0/0xd54 kernel/softirq.c:600
softirqs last disabled at (295981): [<ffff80008002b660>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80
---[ end trace 0000000000000000 ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2023-06-23  3:48 [syzbot] [arm?] WARNING in delayed_work_timer_fn syzbot
@ 2024-12-13 12:30 ` syzbot
  2025-02-11 11:10   ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2024-12-13 12:30 UTC (permalink / raw)
  To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
	will

syzbot has found a reproducer for the following issue on:

HEAD commit:    2e7aff49b5da Merge branches 'for-next/core' and 'for-next/..
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=11ed5be8580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=696fb014d05da3a3
dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10192d44580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ef408f67fde3/disk-2e7aff49.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/414ac17a20dc/vmlinux-2e7aff49.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a93415d2a7e7/Image-2e7aff49.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline]
WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline]
WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
Modules linked in:
CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69
lr : current_wq_worker kernel/workqueue_internal.h:69 [inline]
lr : is_chained_work kernel/workqueue.c:2199 [inline]
lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
sp : ffff800080017b40
x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88
x26: dfff800000000000 x25: ffff0000d1c63800 x24: 0000000000000008
x23: 1fffe0001b14d791 x22: ffff0000d1c639c0 x21: 0000000000000100
x20: ffff0000d8a6bc80 x19: ffff0000da2f0930 x18: 1fffe000366c5e7e
x17: ffff800123d64000 x16: ffff80008326d65c x15: 0000000000000002
x14: 1ffff00011f300ca x13: dfff800000000000 x12: 00000000b1aebe1c
x11: 00000000beaa9035 x10: 0000000000ff0100 x9 : 0000000000000000
x8 : ffff0000d8a6bc80 x7 : ffff8000805819c0 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008df6a320
x2 : ffff0000da2f0930 x1 : 0000000000000100 x0 : 0000000000000000
Call trace:
 current_wq_worker kernel/workqueue_internal.h:69 [inline] (P)
 is_chained_work kernel/workqueue.c:2199 [inline] (P)
 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (P)
 current_wq_worker kernel/workqueue_internal.h:69 [inline] (L)
 is_chained_work kernel/workqueue.c:2199 [inline] (L)
 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L)
 delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485
 call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793
 expire_timers kernel/time/timer.c:1839 [inline]
 __run_timers kernel/time/timer.c:2418 [inline]
 __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430
 run_timer_base kernel/time/timer.c:2439 [inline]
 run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449
 handle_softirqs+0x320/0xd34 kernel/softirq.c:554
 __do_softirq+0x14/0x20 kernel/softirq.c:588
 ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:81
 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:86
 invoke_softirq kernel/softirq.c:435 [inline]
 __irq_exit_rcu+0x1d8/0x544 kernel/softirq.c:655
 irq_exit_rcu+0x14/0x84 kernel/softirq.c:671
 __el1_irq arch/arm64/kernel/entry-common.c:561 [inline]
 el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:575
 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
 el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
 raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (P)
 finish_lock_switch+0xc0/0x1e4 kernel/sched/core.c:5123 (P)
 raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (L)
 finish_lock_switch+0xbc/0x1e4 kernel/sched/core.c:5123 (L)
 finish_task_switch+0x120/0x608 kernel/sched/core.c:5241
 context_switch kernel/sched/core.c:5372 [inline]
 __schedule+0x1748/0x27f0 kernel/sched/core.c:6756
 preempt_schedule_irq+0x80/0x188 kernel/sched/core.c:7078
 arm64_preempt_schedule_irq arch/arm64/kernel/entry-common.c:301 [inline]
 __el1_irq arch/arm64/kernel/entry-common.c:563 [inline]
 el1_interrupt+0x4c/0x68 arch/arm64/kernel/entry-common.c:575
 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
 el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
 do_notify_resume+0x5c/0x1f4 arch/arm64/kernel/entry-common.c:133 (P)
 do_notify_resume+0x58/0x1f4 arch/arm64/kernel/entry-common.c:133 (L)
 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
 exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
 el0_interrupt+0xc8/0x248 arch/arm64/kernel/entry-common.c:832
 __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:837
 el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:842
 el0t_64_irq+0x198/0x19c arch/arm64/kernel/entry.S:601
irq event stamp: 35689
hardirqs last  enabled at (35688): [<ffff80008b687f2c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline]
hardirqs last  enabled at (35688): [<ffff80008b687f2c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95
hardirqs last disabled at (35689): [<ffff80008b6a9b44>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (35689): [<ffff80008b6a9b44>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170
softirqs last  enabled at (34558): [<ffff800080129934>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (35685): [<ffff800080020db4>] __do_softirq+0x14/0x20 kernel/softirq.c:588
---[ end trace 0000000000000000 ]---


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2024-12-13 12:30 ` syzbot
@ 2025-02-11 11:10   ` Will Deacon
  2025-02-11 20:59     ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2025-02-11 11:10 UTC (permalink / raw)
  To: syzbot
  Cc: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
	maz, mark.rutland, richard.xnu.clark, jiangshanlai, tj, marcel,
	johan.hedberg, luiz.dentz, linux-bluetooth

[+ some other folks]

On Fri, Dec 13, 2024 at 04:30:24AM -0800, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    2e7aff49b5da Merge branches 'for-next/core' and 'for-next/..
> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=11ed5be8580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=696fb014d05da3a3
> dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10192d44580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/ef408f67fde3/disk-2e7aff49.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/414ac17a20dc/vmlinux-2e7aff49.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/a93415d2a7e7/Image-2e7aff49.gz.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com

I spent a bit of time looking at the reports in the "dashboard link"
above. There are quite a few in there and I ended up suspecting the
bluetooth subsystem and drivers using hci_register_dev(). Specifically,
it looks like one of the workqueues is serviced in softirq context after
its corresponding device has been destroyed.

I wasn't able to reproduce it locally (I don't have a setup for running
the 'syz' reproducers), but the program is pretty small (see the "syz repro"
link above).

Given that this seems to explode a few times a month, I wonder if it's
worth adding some instrumentation to e.g. dump the name of the workqueue?

Will

--->8

(mammoth backtrace below)

> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline]
> WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline]
> WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
> Modules linked in:
> CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69
> lr : current_wq_worker kernel/workqueue_internal.h:69 [inline]
> lr : is_chained_work kernel/workqueue.c:2199 [inline]
> lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256
> sp : ffff800080017b40
> x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88
> x26: dfff800000000000 x25: ffff0000d1c63800 x24: 0000000000000008
> x23: 1fffe0001b14d791 x22: ffff0000d1c639c0 x21: 0000000000000100
> x20: ffff0000d8a6bc80 x19: ffff0000da2f0930 x18: 1fffe000366c5e7e
> x17: ffff800123d64000 x16: ffff80008326d65c x15: 0000000000000002
> x14: 1ffff00011f300ca x13: dfff800000000000 x12: 00000000b1aebe1c
> x11: 00000000beaa9035 x10: 0000000000ff0100 x9 : 0000000000000000
> x8 : ffff0000d8a6bc80 x7 : ffff8000805819c0 x6 : 0000000000000000
> x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008df6a320
> x2 : ffff0000da2f0930 x1 : 0000000000000100 x0 : 0000000000000000
> Call trace:
>  current_wq_worker kernel/workqueue_internal.h:69 [inline] (P)
>  is_chained_work kernel/workqueue.c:2199 [inline] (P)
>  __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (P)
>  current_wq_worker kernel/workqueue_internal.h:69 [inline] (L)
>  is_chained_work kernel/workqueue.c:2199 [inline] (L)
>  __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L)
>  delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485
>  call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793
>  expire_timers kernel/time/timer.c:1839 [inline]
>  __run_timers kernel/time/timer.c:2418 [inline]
>  __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430
>  run_timer_base kernel/time/timer.c:2439 [inline]
>  run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449
>  handle_softirqs+0x320/0xd34 kernel/softirq.c:554
>  __do_softirq+0x14/0x20 kernel/softirq.c:588
>  ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:81
>  call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
>  do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:86
>  invoke_softirq kernel/softirq.c:435 [inline]
>  __irq_exit_rcu+0x1d8/0x544 kernel/softirq.c:655
>  irq_exit_rcu+0x14/0x84 kernel/softirq.c:671
>  __el1_irq arch/arm64/kernel/entry-common.c:561 [inline]
>  el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:575
>  el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
>  el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
>  __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
>  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
>  raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (P)
>  finish_lock_switch+0xc0/0x1e4 kernel/sched/core.c:5123 (P)
>  raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (L)
>  finish_lock_switch+0xbc/0x1e4 kernel/sched/core.c:5123 (L)
>  finish_task_switch+0x120/0x608 kernel/sched/core.c:5241
>  context_switch kernel/sched/core.c:5372 [inline]
>  __schedule+0x1748/0x27f0 kernel/sched/core.c:6756
>  preempt_schedule_irq+0x80/0x188 kernel/sched/core.c:7078
>  arm64_preempt_schedule_irq arch/arm64/kernel/entry-common.c:301 [inline]
>  __el1_irq arch/arm64/kernel/entry-common.c:563 [inline]
>  el1_interrupt+0x4c/0x68 arch/arm64/kernel/entry-common.c:575
>  el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580
>  el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596
>  __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P)
>  arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P)
>  do_notify_resume+0x5c/0x1f4 arch/arm64/kernel/entry-common.c:133 (P)
>  do_notify_resume+0x58/0x1f4 arch/arm64/kernel/entry-common.c:133 (L)
>  exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
>  exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
>  el0_interrupt+0xc8/0x248 arch/arm64/kernel/entry-common.c:832
>  __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:837
>  el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:842
>  el0t_64_irq+0x198/0x19c arch/arm64/kernel/entry.S:601
> irq event stamp: 35689
> hardirqs last  enabled at (35688): [<ffff80008b687f2c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline]
> hardirqs last  enabled at (35688): [<ffff80008b687f2c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95
> hardirqs last disabled at (35689): [<ffff80008b6a9b44>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
> hardirqs last disabled at (35689): [<ffff80008b6a9b44>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170
> softirqs last  enabled at (34558): [<ffff800080129934>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
> softirqs last disabled at (35685): [<ffff800080020db4>] __do_softirq+0x14/0x20 kernel/softirq.c:588
> ---[ end trace 0000000000000000 ]---
> 
> 
> ---
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2025-02-11 11:10   ` Will Deacon
@ 2025-02-11 20:59     ` Tejun Heo
  2025-02-14 16:45       ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-02-11 20:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel,
	syzkaller-bugs, maz, mark.rutland, richard.xnu.clark,
	jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth

On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote:
> Given that this seems to explode a few times a month, I wonder if it's
> worth adding some instrumentation to e.g. dump the name of the workqueue?

It's usually most useful to print out the work function when identifying the
culprit. I'd be happy to take the patch.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2025-02-11 20:59     ` Tejun Heo
@ 2025-02-14 16:45       ` Will Deacon
  2025-03-13 19:30         ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Will Deacon @ 2025-02-14 16:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel,
	syzkaller-bugs, maz, mark.rutland, richard.xnu.clark,
	jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth

On Tue, Feb 11, 2025 at 10:59:30AM -1000, Tejun Heo wrote:
> On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote:
> > Given that this seems to explode a few times a month, I wonder if it's
> > worth adding some instrumentation to e.g. dump the name of the workqueue?
> 
> It's usually most useful to print out the work function when identifying the
> culprit. I'd be happy to take the patch.

Thanks, Tejun. I sent a patch adding some more diagnostics:

https://lore.kernel.org/r/20250214164349.13694-1-will@kernel.org

Will

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn
  2025-02-14 16:45       ` Will Deacon
@ 2025-03-13 19:30         ` Will Deacon
  0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2025-03-13 19:30 UTC (permalink / raw)
  To: Tejun Heo
  Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel,
	syzkaller-bugs, maz, mark.rutland, richard.xnu.clark,
	jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth

On Fri, Feb 14, 2025 at 04:45:57PM +0000, Will Deacon wrote:
> On Tue, Feb 11, 2025 at 10:59:30AM -1000, Tejun Heo wrote:
> > On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote:
> > > Given that this seems to explode a few times a month, I wonder if it's
> > > worth adding some instrumentation to e.g. dump the name of the workqueue?
> > 
> > It's usually most useful to print out the work function when identifying the
> > culprit. I'd be happy to take the patch.
> 
> Thanks, Tejun. I sent a patch adding some more diagnostics:
> 
> https://lore.kernel.org/r/20250214164349.13694-1-will@kernel.org

Ok, we have our first crash [1] with the new debug information:

[   77.133818][   T10] Bluetooth: hci3: Opcode 0x0c1a failed: -110
[   77.135329][   T10] Bluetooth: hci3: Error when powering off device on rfkill (-110)
[   77.143656][    C1] ------------[ cut here ]------------
[   77.145336][    C1] workqueue: cannot queue hci_cmd_timeout on wq hci3
[   77.147101][    C1] WARNING: CPU: 1 PID: 7433 at kernel/workqueue.c:2258 __queue_work+0xe94/0x1324
[   77.149301][    C1] Modules linked in:
[   77.150287][    C1] CPU: 1 UID: 0 PID: 7433 Comm: syz.2.193 Not tainted 6.14.0-rc5-syzkaller-g77c95b8c7a16 #0
[   77.152960][    C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[   77.155583][    C1] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   77.157546][    C1] pc : __queue_work+0xe94/0x1324
[   77.158811][    C1] lr : __queue_work+0xe94/0x1324
[   77.160114][    C1] sp : ffff800080017b40
[   77.161145][    C1] x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88
[   77.163285][    C1] x26: ffff80008fbbd600 x25: ffff0000dacac800 x24: 0000000000000008
[   77.165326][    C1] x23: ffff0000dacac9c0 x22: 0000000000000100 x21: 1fffe0001b595938
[   77.167421][    C1] x20: ffff800092f09000 x19: ffff0000ecf20948 x18: 0000000000000008
[   77.169356][    C1] x17: 0000000000000000 x16: ffff80008b72ce6c x15: ffff700011f87b38
[   77.171279][    C1] x14: 1ffff00011f87b38 x13: 0000000000000004 x12: ffffffffffffffff
[   77.173271][    C1] x11: 0000000000000103 x10: 0000000000ff0100 x9 : d5bcdc7e4519df00
[   77.175255][    C1] x8 : d5bcdc7e4519df00 x7 : 0000000000000001 x6 : 0000000000000001
[   77.177277][    C1] x5 : ffff8000800172f8 x4 : ffff80008fcaf7c0 x3 : ffff8000804a7bb4
[   77.179372][    C1] x2 : 0000000000000000 x1 : 0000000100000101 x0 : 0000000000000000
[   77.181374][    C1] Call trace:
[   77.182230][    C1]  __queue_work+0xe94/0x1324 (P)
[   77.183476][    C1]  delayed_work_timer_fn+0x74/0x90
[   77.184755][    C1]  call_timer_fn+0x1b4/0x8b8
[   77.185932][    C1]  __run_timer_base+0x59c/0x7b4
[   77.187119][    C1]  run_timer_softirq+0xcc/0x194
[   77.188360][    C1]  handle_softirqs+0x320/0xd34
[   77.189576][    C1]  __do_softirq+0x14/0x20
[   77.190672][    C1]  ____do_softirq+0x14/0x20
[   77.191884][    C1]  call_on_irq_stack+0x24/0x4c
[   77.193050][    C1]  do_softirq_own_stack+0x20/0x2c
[   77.194311][    C1]  __irq_exit_rcu+0x1d8/0x544
[   77.195550][    C1]  irq_exit_rcu+0x14/0x84
[   77.196669][    C1]  el1_interrupt+0x38/0x68
[   77.197818][    C1]  el1h_64_irq_handler+0x18/0x24
[   77.199152][    C1]  el1h_64_irq+0x6c/0x70
[   77.200245][    C1]  lock_acquire+0x278/0x724 (P)
[   77.201520][    C1]  rcu_lock_acquire+0x44/0x54
[   77.202722][    C1]  page_ext_get+0x2c/0x2e8
[   77.203908][    C1]  page_table_check_set+0xa0/0x408
[   77.205117][    C1]  __page_table_check_ptes_set+0x2d0/0x398
[   77.206601][    C1]  set_pte_range+0x618/0x644
[   77.207792][    C1]  finish_fault+0x968/0xd6c
[   77.208968][    C1]  handle_pte_fault+0x3528/0x57b0
[   77.210314][    C1]  handle_mm_fault+0xfa8/0x188c
[   77.211592][    C1]  __get_user_pages+0x1878/0x3400
[   77.212967][    C1]  populate_vma_page_range+0x220/0x2f0
[   77.214435][    C1]  __mm_populate+0x240/0x3d8
[   77.215600][    C1]  vm_mmap_pgoff+0x304/0x3c4
[   77.216833][    C1]  ksys_mmap_pgoff+0xd0/0x5c8
[   77.218066][    C1]  __arm64_sys_mmap+0xf8/0x110
[   77.219195][    C1]  invoke_syscall+0x98/0x2b8
[   77.220453][    C1]  el0_svc_common+0x130/0x23c
[   77.221747][    C1]  do_el0_svc+0x48/0x58
[   77.222881][    C1]  el0_svc+0x54/0x168
[   77.223946][    C1]  el0t_64_sync_handler+0x84/0x108
[   77.225305][    C1]  el0t_64_sync+0x198/0x19c
[   77.226425][    C1] irq event stamp: 3967
[   77.227573][    C1] hardirqs last  enabled at (3966): [<ffff80008b7ec538>] _raw_spin_unlock_irqrestore+0x38/0x98
[   77.230143][    C1] hardirqs last disabled at (3967): [<ffff80008b7ec3e0>] _raw_spin_lock_irq+0x28/0x70
[   77.232639][    C1] softirqs last  enabled at (2318): [<ffff800080311b48>] handle_softirqs+0xb44/0xd34
[   77.235151][    C1] softirqs last disabled at (3923): [<ffff800080020dbc>] __do_softirq+0x14/0x20
[   77.237528][    C1] ---[ end trace 0000000000000000 ]---


So it looks like I was right about bluetooth (the work function is
"hci_cmd_timeout") but, unfortunately, "hci3" doesn't tell us much about
the client.

Perhaps the failed power-off right before the warning suggests that
we're not tearing down the wq properly on that error path? Let's see...

hci_rfkill_set_block() calls hci_dev_do_close() if the power-off fails.
That in turn cancels the delayed 'cmd_timer' work but only if HCI_UP is
not set in the device flags. But if hci_dev_do_poweroff() failed, HCI_UP
may well still be set. Then somehow the workqueue is destroyed, I guess
via hci_unregister_dev() setting HCI_UNREGISTER which enables
hci_release_dev() but at that point it's hard to know which bluetooth
driver is responsible.

Assuming hci_unregister_dev() is triggered via hci_dev_do_close()
closing the device, then unconditionally cancelling the delayed cmd work
when clearing HCI_UP sounds like a reasonable starter for ten?

Dunno, this is all guesswork in code that I'm not familiar with, so I'm
hoping somebody can tell me why I'm wrong :)

Will

--->8

diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index dd770ef5ec36..d2623f6d0593 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -5147,11 +5147,9 @@ int hci_dev_close_sync(struct hci_dev *hdev)
        }

        err = hci_dev_shutdown(hdev);
-
-       if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
-               cancel_delayed_work_sync(&hdev->cmd_timer);
+       cancel_delayed_work_sync(&hdev->cmd_timer);
+       if (!test_and_clear_bit(HCI_UP, &hdev->flags))
                return err;
-       }

        hci_leds_update_powered(hdev, false);
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-03-13 19:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-23  3:48 [syzbot] [arm?] WARNING in delayed_work_timer_fn syzbot
2024-12-13 12:30 ` syzbot
2025-02-11 11:10   ` Will Deacon
2025-02-11 20:59     ` Tejun Heo
2025-02-14 16:45       ` Will Deacon
2025-03-13 19:30         ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox