* [syzbot] [arm?] WARNING in delayed_work_timer_fn @ 2023-06-23 3:48 syzbot 2024-12-13 12:30 ` syzbot 0 siblings, 1 reply; 6+ messages in thread From: syzbot @ 2023-06-23 3:48 UTC (permalink / raw) To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs, will Hello, syzbot found the following issue on: HEAD commit: 177239177378 Merge branch 'for-next/core', remote-tracking.. git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci console output: https://syzkaller.appspot.com/x/log.txt?x=13d8a337280000 kernel config: https://syzkaller.appspot.com/x/.config?x=8776b67768a3c9af dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2 userspace arch: arm64 Unfortunately, I don't have any reproducer for this issue yet. Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/0c47a40dd633/disk-17723917.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/0ff319b6fb50/vmlinux-17723917.xz kernel image: https://storage.googleapis.com/syzbot-assets/3ce1ea9e3b7e/Image-17723917.gz.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com ------------[ cut here ]------------ WARNING: CPU: 0 PID: 15561 at kernel/workqueue.c:1447 current_wq_worker kernel/workqueue_internal.h:68 [inline] WARNING: CPU: 0 PID: 15561 at kernel/workqueue.c:1447 is_chained_work kernel/workqueue.c:1384 [inline] WARNING: CPU: 0 PID: 15561 at kernel/workqueue.c:1447 __queue_work+0xe48/0x1374 kernel/workqueue.c:1446 Modules linked in: CPU: 0 PID: 15561 Comm: syz-executor.2 Not tainted 6.4.0-rc5-syzkaller-g177239177378 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023 pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __queue_work+0xe48/0x1374 kernel/workqueue_internal.h:68 lr : current_wq_worker kernel/workqueue_internal.h:68 [inline] lr : is_chained_work kernel/workqueue.c:1384 [inline] lr : __queue_work+0xe48/0x1374 kernel/workqueue.c:1446 sp : ffff800080007b70 x29: ffff800080007bb0 x28: 1fffe000195506f1 x27: 0000000000000008 x26: ffff0000f6cfb800 x25: dfff800000000000 x24: ffff0000caa83788 x23: ffff0000f6cfb9c0 x22: ffff0000caa83780 x21: 0000000000000000 x20: 0000000000000100 x19: ffff000140bd8b30 x18: 1fffe00019bd827e x17: ffff800126418000 x16: ffff80008a43bfbc x15: 0000000000010001 x14: 1ffff00011bd80ac x13: dfff800000000000 x12: 00000000beb1c7bb x11: 0000000000000101 x10: 0000000000000100 x9 : 0000000000000000 x8 : ffff0000caa83780 x7 : ffff800080412e8c x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008c805480 x2 : ffff000140bd8b30 x1 : 0000000000000100 x0 : 0000000000000000 Call trace: current_wq_worker kernel/workqueue_internal.h:68 [inline] is_chained_work kernel/workqueue.c:1384 [inline] __queue_work+0xe48/0x1374 kernel/workqueue.c:1446 delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:1652 call_timer_fn+0x19c/0x8cc kernel/time/timer.c:1700 expire_timers kernel/time/timer.c:1746 [inline] __run_timers+0x5b4/0x734 kernel/time/timer.c:2022 run_timer_softirq+0x7c/0x114 kernel/time/timer.c:2035 __do_softirq+0x2d0/0xd54 kernel/softirq.c:571 ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:882 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:85 invoke_softirq kernel/softirq.c:452 [inline] __irq_exit_rcu+0x28c/0x534 kernel/softirq.c:650 irq_exit_rcu+0x14/0x84 kernel/softirq.c:662 __el1_irq arch/arm64/kernel/entry-common.c:474 [inline] el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:488 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:493 el1h_64_irq+0x64/0x68 arch/arm64/kernel/entry.S:587 __daif_local_irq_restore arch/arm64/include/asm/irqflags.h:182 [inline] arch_local_irq_restore arch/arm64/include/asm/irqflags.h:202 [inline] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline] _raw_spin_unlock_irqrestore+0x44/0x98 kernel/locking/spinlock.c:194 spin_unlock_irqrestore include/linux/spinlock.h:405 [inline] __wake_up_common_lock kernel/sched/wait.c:140 [inline] __wake_up_sync_key+0x124/0x1ac kernel/sched/wait.c:208 __unix_dgram_recvmsg+0x4c4/0xe2c net/unix/af_unix.c:2479 unix_dgram_recvmsg+0xd0/0xe8 net/unix/af_unix.c:2549 sock_recvmsg_nosec+0x90/0xec net/socket.c:1019 ____sys_recvmsg+0x5c0/0x6e8 net/socket.c:2720 ___sys_recvmsg net/socket.c:2764 [inline] do_recvmmsg+0x41c/0xb60 net/socket.c:2858 __sys_recvmmsg net/socket.c:2937 [inline] __do_sys_recvmmsg net/socket.c:2960 [inline] __se_sys_recvmmsg net/socket.c:2953 [inline] __arm64_sys_recvmmsg+0x180/0x23c net/socket.c:2953 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x138/0x244 arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:191 el0_svc+0x4c/0x160 arch/arm64/kernel/entry-common.c:647 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 irq event stamp: 295995 hardirqs last enabled at (295994): [<ffff80008a521910>] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline] hardirqs last enabled at (295994): [<ffff80008a521910>] _raw_spin_unlock_irqrestore+0x38/0x98 kernel/locking/spinlock.c:194 hardirqs last disabled at (295995): [<ffff80008a5217b8>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline] hardirqs last disabled at (295995): [<ffff80008a5217b8>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170 softirqs last enabled at (293288): [<ffff8000800218ec>] softirq_handle_end kernel/softirq.c:414 [inline] softirqs last enabled at (293288): [<ffff8000800218ec>] __do_softirq+0xac0/0xd54 kernel/softirq.c:600 softirqs last disabled at (295981): [<ffff80008002b660>] ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:80 ---[ end trace 0000000000000000 ]--- --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the bug is already fixed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to change bug's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the bug is a duplicate of another bug, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn 2023-06-23 3:48 [syzbot] [arm?] WARNING in delayed_work_timer_fn syzbot @ 2024-12-13 12:30 ` syzbot 2025-02-11 11:10 ` Will Deacon 0 siblings, 1 reply; 6+ messages in thread From: syzbot @ 2024-12-13 12:30 UTC (permalink / raw) To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs, will syzbot has found a reproducer for the following issue on: HEAD commit: 2e7aff49b5da Merge branches 'for-next/core' and 'for-next/.. git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci console output: https://syzkaller.appspot.com/x/log.txt?x=11ed5be8580000 kernel config: https://syzkaller.appspot.com/x/.config?x=696fb014d05da3a3 dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 userspace arch: arm64 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10192d44580000 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/ef408f67fde3/disk-2e7aff49.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/414ac17a20dc/vmlinux-2e7aff49.xz kernel image: https://storage.googleapis.com/syzbot-assets/a93415d2a7e7/Image-2e7aff49.gz.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com ------------[ cut here ]------------ WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline] WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline] WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 Modules linked in: CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69 lr : current_wq_worker kernel/workqueue_internal.h:69 [inline] lr : is_chained_work kernel/workqueue.c:2199 [inline] lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 sp : ffff800080017b40 x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88 x26: dfff800000000000 x25: ffff0000d1c63800 x24: 0000000000000008 x23: 1fffe0001b14d791 x22: ffff0000d1c639c0 x21: 0000000000000100 x20: ffff0000d8a6bc80 x19: ffff0000da2f0930 x18: 1fffe000366c5e7e x17: ffff800123d64000 x16: ffff80008326d65c x15: 0000000000000002 x14: 1ffff00011f300ca x13: dfff800000000000 x12: 00000000b1aebe1c x11: 00000000beaa9035 x10: 0000000000ff0100 x9 : 0000000000000000 x8 : ffff0000d8a6bc80 x7 : ffff8000805819c0 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008df6a320 x2 : ffff0000da2f0930 x1 : 0000000000000100 x0 : 0000000000000000 Call trace: current_wq_worker kernel/workqueue_internal.h:69 [inline] (P) is_chained_work kernel/workqueue.c:2199 [inline] (P) __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (P) current_wq_worker kernel/workqueue_internal.h:69 [inline] (L) is_chained_work kernel/workqueue.c:2199 [inline] (L) __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L) delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485 call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793 expire_timers kernel/time/timer.c:1839 [inline] __run_timers kernel/time/timer.c:2418 [inline] __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430 run_timer_base kernel/time/timer.c:2439 [inline] run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449 handle_softirqs+0x320/0xd34 kernel/softirq.c:554 __do_softirq+0x14/0x20 kernel/softirq.c:588 ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:81 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:86 invoke_softirq kernel/softirq.c:435 [inline] __irq_exit_rcu+0x1d8/0x544 kernel/softirq.c:655 irq_exit_rcu+0x14/0x84 kernel/softirq.c:671 __el1_irq arch/arm64/kernel/entry-common.c:561 [inline] el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:575 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580 el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P) arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P) raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (P) finish_lock_switch+0xc0/0x1e4 kernel/sched/core.c:5123 (P) raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (L) finish_lock_switch+0xbc/0x1e4 kernel/sched/core.c:5123 (L) finish_task_switch+0x120/0x608 kernel/sched/core.c:5241 context_switch kernel/sched/core.c:5372 [inline] __schedule+0x1748/0x27f0 kernel/sched/core.c:6756 preempt_schedule_irq+0x80/0x188 kernel/sched/core.c:7078 arm64_preempt_schedule_irq arch/arm64/kernel/entry-common.c:301 [inline] __el1_irq arch/arm64/kernel/entry-common.c:563 [inline] el1_interrupt+0x4c/0x68 arch/arm64/kernel/entry-common.c:575 el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580 el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596 __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P) arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P) do_notify_resume+0x5c/0x1f4 arch/arm64/kernel/entry-common.c:133 (P) do_notify_resume+0x58/0x1f4 arch/arm64/kernel/entry-common.c:133 (L) exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_interrupt+0xc8/0x248 arch/arm64/kernel/entry-common.c:832 __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:837 el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:842 el0t_64_irq+0x198/0x19c arch/arm64/kernel/entry.S:601 irq event stamp: 35689 hardirqs last enabled at (35688): [<ffff80008b687f2c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline] hardirqs last enabled at (35688): [<ffff80008b687f2c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95 hardirqs last disabled at (35689): [<ffff80008b6a9b44>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline] hardirqs last disabled at (35689): [<ffff80008b6a9b44>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170 softirqs last enabled at (34558): [<ffff800080129934>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32 softirqs last disabled at (35685): [<ffff800080020db4>] __do_softirq+0x14/0x20 kernel/softirq.c:588 ---[ end trace 0000000000000000 ]--- --- If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach or paste a git patch, syzbot will apply it before testing. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn 2024-12-13 12:30 ` syzbot @ 2025-02-11 11:10 ` Will Deacon 2025-02-11 20:59 ` Tejun Heo 0 siblings, 1 reply; 6+ messages in thread From: Will Deacon @ 2025-02-11 11:10 UTC (permalink / raw) To: syzbot Cc: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs, maz, mark.rutland, richard.xnu.clark, jiangshanlai, tj, marcel, johan.hedberg, luiz.dentz, linux-bluetooth [+ some other folks] On Fri, Dec 13, 2024 at 04:30:24AM -0800, syzbot wrote: > syzbot has found a reproducer for the following issue on: > > HEAD commit: 2e7aff49b5da Merge branches 'for-next/core' and 'for-next/.. > git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci > console output: https://syzkaller.appspot.com/x/log.txt?x=11ed5be8580000 > kernel config: https://syzkaller.appspot.com/x/.config?x=696fb014d05da3a3 > dashboard link: https://syzkaller.appspot.com/bug?extid=e13e654d315d4da1277c > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > userspace arch: arm64 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10192d44580000 > > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/ef408f67fde3/disk-2e7aff49.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/414ac17a20dc/vmlinux-2e7aff49.xz > kernel image: https://storage.googleapis.com/syzbot-assets/a93415d2a7e7/Image-2e7aff49.gz.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+e13e654d315d4da1277c@syzkaller.appspotmail.com I spent a bit of time looking at the reports in the "dashboard link" above. There are quite a few in there and I ended up suspecting the bluetooth subsystem and drivers using hci_register_dev(). Specifically, it looks like one of the workqueues is serviced in softirq context after its corresponding device has been destroyed. I wasn't able to reproduce it locally (I don't have a setup for running the 'syz' reproducers), but the program is pretty small (see the "syz repro" link above). Given that this seems to explode a few times a month, I wonder if it's worth adding some instrumentation to e.g. dump the name of the workqueue? Will --->8 (mammoth backtrace below) > ------------[ cut here ]------------ > WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 current_wq_worker kernel/workqueue_internal.h:69 [inline] > WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 is_chained_work kernel/workqueue.c:2199 [inline] > WARNING: CPU: 1 PID: 6023 at kernel/workqueue.c:2257 __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 > Modules linked in: > CPU: 1 UID: 0 PID: 6023 Comm: klogd Not tainted 6.13.0-rc2-syzkaller-g2e7aff49b5da #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 > pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : __queue_work+0xe50/0x1308 kernel/workqueue_internal.h:69 > lr : current_wq_worker kernel/workqueue_internal.h:69 [inline] > lr : is_chained_work kernel/workqueue.c:2199 [inline] > lr : __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 > sp : ffff800080017b40 > x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88 > x26: dfff800000000000 x25: ffff0000d1c63800 x24: 0000000000000008 > x23: 1fffe0001b14d791 x22: ffff0000d1c639c0 x21: 0000000000000100 > x20: ffff0000d8a6bc80 x19: ffff0000da2f0930 x18: 1fffe000366c5e7e > x17: ffff800123d64000 x16: ffff80008326d65c x15: 0000000000000002 > x14: 1ffff00011f300ca x13: dfff800000000000 x12: 00000000b1aebe1c > x11: 00000000beaa9035 x10: 0000000000ff0100 x9 : 0000000000000000 > x8 : ffff0000d8a6bc80 x7 : ffff8000805819c0 x6 : 0000000000000000 > x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008df6a320 > x2 : ffff0000da2f0930 x1 : 0000000000000100 x0 : 0000000000000000 > Call trace: > current_wq_worker kernel/workqueue_internal.h:69 [inline] (P) > is_chained_work kernel/workqueue.c:2199 [inline] (P) > __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (P) > current_wq_worker kernel/workqueue_internal.h:69 [inline] (L) > is_chained_work kernel/workqueue.c:2199 [inline] (L) > __queue_work+0xe50/0x1308 kernel/workqueue.c:2256 (L) > delayed_work_timer_fn+0x74/0x90 kernel/workqueue.c:2485 > call_timer_fn+0x1b4/0x8b8 kernel/time/timer.c:1793 > expire_timers kernel/time/timer.c:1839 [inline] > __run_timers kernel/time/timer.c:2418 [inline] > __run_timer_base+0x59c/0x7b4 kernel/time/timer.c:2430 > run_timer_base kernel/time/timer.c:2439 [inline] > run_timer_softirq+0xcc/0x194 kernel/time/timer.c:2449 > handle_softirqs+0x320/0xd34 kernel/softirq.c:554 > __do_softirq+0x14/0x20 kernel/softirq.c:588 > ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:81 > call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891 > do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:86 > invoke_softirq kernel/softirq.c:435 [inline] > __irq_exit_rcu+0x1d8/0x544 kernel/softirq.c:655 > irq_exit_rcu+0x14/0x84 kernel/softirq.c:671 > __el1_irq arch/arm64/kernel/entry-common.c:561 [inline] > el1_interrupt+0x38/0x68 arch/arm64/kernel/entry-common.c:575 > el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580 > el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596 > __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P) > arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P) > raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (P) > finish_lock_switch+0xc0/0x1e4 kernel/sched/core.c:5123 (P) > raw_spin_rq_unlock_irq kernel/sched/sched.h:1526 [inline] (L) > finish_lock_switch+0xbc/0x1e4 kernel/sched/core.c:5123 (L) > finish_task_switch+0x120/0x608 kernel/sched/core.c:5241 > context_switch kernel/sched/core.c:5372 [inline] > __schedule+0x1748/0x27f0 kernel/sched/core.c:6756 > preempt_schedule_irq+0x80/0x188 kernel/sched/core.c:7078 > arm64_preempt_schedule_irq arch/arm64/kernel/entry-common.c:301 [inline] > __el1_irq arch/arm64/kernel/entry-common.c:563 [inline] > el1_interrupt+0x4c/0x68 arch/arm64/kernel/entry-common.c:575 > el1h_64_irq_handler+0x18/0x24 arch/arm64/kernel/entry-common.c:580 > el1h_64_irq+0x6c/0x70 arch/arm64/kernel/entry.S:596 > __daif_local_irq_enable arch/arm64/include/asm/irqflags.h:26 [inline] (P) > arch_local_irq_enable arch/arm64/include/asm/irqflags.h:48 [inline] (P) > do_notify_resume+0x5c/0x1f4 arch/arm64/kernel/entry-common.c:133 (P) > do_notify_resume+0x58/0x1f4 arch/arm64/kernel/entry-common.c:133 (L) > exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] > exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] > el0_interrupt+0xc8/0x248 arch/arm64/kernel/entry-common.c:832 > __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:837 > el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:842 > el0t_64_irq+0x198/0x19c arch/arm64/kernel/entry.S:601 > irq event stamp: 35689 > hardirqs last enabled at (35688): [<ffff80008b687f2c>] __exit_to_kernel_mode arch/arm64/kernel/entry-common.c:85 [inline] > hardirqs last enabled at (35688): [<ffff80008b687f2c>] exit_to_kernel_mode+0xdc/0x10c arch/arm64/kernel/entry-common.c:95 > hardirqs last disabled at (35689): [<ffff80008b6a9b44>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline] > hardirqs last disabled at (35689): [<ffff80008b6a9b44>] _raw_spin_lock_irq+0x28/0x70 kernel/locking/spinlock.c:170 > softirqs last enabled at (34558): [<ffff800080129934>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32 > softirqs last disabled at (35685): [<ffff800080020db4>] __do_softirq+0x14/0x20 kernel/softirq.c:588 > ---[ end trace 0000000000000000 ]--- > > > --- > If you want syzbot to run the reproducer, reply with: > #syz test: git://repo/address.git branch-or-commit-hash > If you attach or paste a git patch, syzbot will apply it before testing. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn 2025-02-11 11:10 ` Will Deacon @ 2025-02-11 20:59 ` Tejun Heo 2025-02-14 16:45 ` Will Deacon 0 siblings, 1 reply; 6+ messages in thread From: Tejun Heo @ 2025-02-11 20:59 UTC (permalink / raw) To: Will Deacon Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs, maz, mark.rutland, richard.xnu.clark, jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote: > Given that this seems to explode a few times a month, I wonder if it's > worth adding some instrumentation to e.g. dump the name of the workqueue? It's usually most useful to print out the work function when identifying the culprit. I'd be happy to take the patch. Thanks. -- tejun ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn 2025-02-11 20:59 ` Tejun Heo @ 2025-02-14 16:45 ` Will Deacon 2025-03-13 19:30 ` Will Deacon 0 siblings, 1 reply; 6+ messages in thread From: Will Deacon @ 2025-02-14 16:45 UTC (permalink / raw) To: Tejun Heo Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs, maz, mark.rutland, richard.xnu.clark, jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth On Tue, Feb 11, 2025 at 10:59:30AM -1000, Tejun Heo wrote: > On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote: > > Given that this seems to explode a few times a month, I wonder if it's > > worth adding some instrumentation to e.g. dump the name of the workqueue? > > It's usually most useful to print out the work function when identifying the > culprit. I'd be happy to take the patch. Thanks, Tejun. I sent a patch adding some more diagnostics: https://lore.kernel.org/r/20250214164349.13694-1-will@kernel.org Will ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [syzbot] [arm?] WARNING in delayed_work_timer_fn 2025-02-14 16:45 ` Will Deacon @ 2025-03-13 19:30 ` Will Deacon 0 siblings, 0 replies; 6+ messages in thread From: Will Deacon @ 2025-03-13 19:30 UTC (permalink / raw) To: Tejun Heo Cc: syzbot, catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs, maz, mark.rutland, richard.xnu.clark, jiangshanlai, marcel, johan.hedberg, luiz.dentz, linux-bluetooth On Fri, Feb 14, 2025 at 04:45:57PM +0000, Will Deacon wrote: > On Tue, Feb 11, 2025 at 10:59:30AM -1000, Tejun Heo wrote: > > On Tue, Feb 11, 2025 at 11:10:46AM +0000, Will Deacon wrote: > > > Given that this seems to explode a few times a month, I wonder if it's > > > worth adding some instrumentation to e.g. dump the name of the workqueue? > > > > It's usually most useful to print out the work function when identifying the > > culprit. I'd be happy to take the patch. > > Thanks, Tejun. I sent a patch adding some more diagnostics: > > https://lore.kernel.org/r/20250214164349.13694-1-will@kernel.org Ok, we have our first crash [1] with the new debug information: [ 77.133818][ T10] Bluetooth: hci3: Opcode 0x0c1a failed: -110 [ 77.135329][ T10] Bluetooth: hci3: Error when powering off device on rfkill (-110) [ 77.143656][ C1] ------------[ cut here ]------------ [ 77.145336][ C1] workqueue: cannot queue hci_cmd_timeout on wq hci3 [ 77.147101][ C1] WARNING: CPU: 1 PID: 7433 at kernel/workqueue.c:2258 __queue_work+0xe94/0x1324 [ 77.149301][ C1] Modules linked in: [ 77.150287][ C1] CPU: 1 UID: 0 PID: 7433 Comm: syz.2.193 Not tainted 6.14.0-rc5-syzkaller-g77c95b8c7a16 #0 [ 77.152960][ C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 [ 77.155583][ C1] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 77.157546][ C1] pc : __queue_work+0xe94/0x1324 [ 77.158811][ C1] lr : __queue_work+0xe94/0x1324 [ 77.160114][ C1] sp : ffff800080017b40 [ 77.161145][ C1] x29: ffff800080017b90 x28: dfff800000000000 x27: ffff700010002f88 [ 77.163285][ C1] x26: ffff80008fbbd600 x25: ffff0000dacac800 x24: 0000000000000008 [ 77.165326][ C1] x23: ffff0000dacac9c0 x22: 0000000000000100 x21: 1fffe0001b595938 [ 77.167421][ C1] x20: ffff800092f09000 x19: ffff0000ecf20948 x18: 0000000000000008 [ 77.169356][ C1] x17: 0000000000000000 x16: ffff80008b72ce6c x15: ffff700011f87b38 [ 77.171279][ C1] x14: 1ffff00011f87b38 x13: 0000000000000004 x12: ffffffffffffffff [ 77.173271][ C1] x11: 0000000000000103 x10: 0000000000ff0100 x9 : d5bcdc7e4519df00 [ 77.175255][ C1] x8 : d5bcdc7e4519df00 x7 : 0000000000000001 x6 : 0000000000000001 [ 77.177277][ C1] x5 : ffff8000800172f8 x4 : ffff80008fcaf7c0 x3 : ffff8000804a7bb4 [ 77.179372][ C1] x2 : 0000000000000000 x1 : 0000000100000101 x0 : 0000000000000000 [ 77.181374][ C1] Call trace: [ 77.182230][ C1] __queue_work+0xe94/0x1324 (P) [ 77.183476][ C1] delayed_work_timer_fn+0x74/0x90 [ 77.184755][ C1] call_timer_fn+0x1b4/0x8b8 [ 77.185932][ C1] __run_timer_base+0x59c/0x7b4 [ 77.187119][ C1] run_timer_softirq+0xcc/0x194 [ 77.188360][ C1] handle_softirqs+0x320/0xd34 [ 77.189576][ C1] __do_softirq+0x14/0x20 [ 77.190672][ C1] ____do_softirq+0x14/0x20 [ 77.191884][ C1] call_on_irq_stack+0x24/0x4c [ 77.193050][ C1] do_softirq_own_stack+0x20/0x2c [ 77.194311][ C1] __irq_exit_rcu+0x1d8/0x544 [ 77.195550][ C1] irq_exit_rcu+0x14/0x84 [ 77.196669][ C1] el1_interrupt+0x38/0x68 [ 77.197818][ C1] el1h_64_irq_handler+0x18/0x24 [ 77.199152][ C1] el1h_64_irq+0x6c/0x70 [ 77.200245][ C1] lock_acquire+0x278/0x724 (P) [ 77.201520][ C1] rcu_lock_acquire+0x44/0x54 [ 77.202722][ C1] page_ext_get+0x2c/0x2e8 [ 77.203908][ C1] page_table_check_set+0xa0/0x408 [ 77.205117][ C1] __page_table_check_ptes_set+0x2d0/0x398 [ 77.206601][ C1] set_pte_range+0x618/0x644 [ 77.207792][ C1] finish_fault+0x968/0xd6c [ 77.208968][ C1] handle_pte_fault+0x3528/0x57b0 [ 77.210314][ C1] handle_mm_fault+0xfa8/0x188c [ 77.211592][ C1] __get_user_pages+0x1878/0x3400 [ 77.212967][ C1] populate_vma_page_range+0x220/0x2f0 [ 77.214435][ C1] __mm_populate+0x240/0x3d8 [ 77.215600][ C1] vm_mmap_pgoff+0x304/0x3c4 [ 77.216833][ C1] ksys_mmap_pgoff+0xd0/0x5c8 [ 77.218066][ C1] __arm64_sys_mmap+0xf8/0x110 [ 77.219195][ C1] invoke_syscall+0x98/0x2b8 [ 77.220453][ C1] el0_svc_common+0x130/0x23c [ 77.221747][ C1] do_el0_svc+0x48/0x58 [ 77.222881][ C1] el0_svc+0x54/0x168 [ 77.223946][ C1] el0t_64_sync_handler+0x84/0x108 [ 77.225305][ C1] el0t_64_sync+0x198/0x19c [ 77.226425][ C1] irq event stamp: 3967 [ 77.227573][ C1] hardirqs last enabled at (3966): [<ffff80008b7ec538>] _raw_spin_unlock_irqrestore+0x38/0x98 [ 77.230143][ C1] hardirqs last disabled at (3967): [<ffff80008b7ec3e0>] _raw_spin_lock_irq+0x28/0x70 [ 77.232639][ C1] softirqs last enabled at (2318): [<ffff800080311b48>] handle_softirqs+0xb44/0xd34 [ 77.235151][ C1] softirqs last disabled at (3923): [<ffff800080020dbc>] __do_softirq+0x14/0x20 [ 77.237528][ C1] ---[ end trace 0000000000000000 ]--- So it looks like I was right about bluetooth (the work function is "hci_cmd_timeout") but, unfortunately, "hci3" doesn't tell us much about the client. Perhaps the failed power-off right before the warning suggests that we're not tearing down the wq properly on that error path? Let's see... hci_rfkill_set_block() calls hci_dev_do_close() if the power-off fails. That in turn cancels the delayed 'cmd_timer' work but only if HCI_UP is not set in the device flags. But if hci_dev_do_poweroff() failed, HCI_UP may well still be set. Then somehow the workqueue is destroyed, I guess via hci_unregister_dev() setting HCI_UNREGISTER which enables hci_release_dev() but at that point it's hard to know which bluetooth driver is responsible. Assuming hci_unregister_dev() is triggered via hci_dev_do_close() closing the device, then unconditionally cancelling the delayed cmd work when clearing HCI_UP sounds like a reasonable starter for ten? Dunno, this is all guesswork in code that I'm not familiar with, so I'm hoping somebody can tell me why I'm wrong :) Will --->8 diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index dd770ef5ec36..d2623f6d0593 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -5147,11 +5147,9 @@ int hci_dev_close_sync(struct hci_dev *hdev) } err = hci_dev_shutdown(hdev); - - if (!test_and_clear_bit(HCI_UP, &hdev->flags)) { - cancel_delayed_work_sync(&hdev->cmd_timer); + cancel_delayed_work_sync(&hdev->cmd_timer); + if (!test_and_clear_bit(HCI_UP, &hdev->flags)) return err; - } hci_leds_update_powered(hdev, false); ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-03-13 19:31 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-06-23 3:48 [syzbot] [arm?] WARNING in delayed_work_timer_fn syzbot 2024-12-13 12:30 ` syzbot 2025-02-11 11:10 ` Will Deacon 2025-02-11 20:59 ` Tejun Heo 2025-02-14 16:45 ` Will Deacon 2025-03-13 19:30 ` Will Deacon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox