* BUG: soft lockup in smp_call_function @ 2023-09-12 23:02 Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 0 siblings, 2 replies; 26+ messages in thread From: Sanan Hasanov @ 2023-09-12 23:02 UTC (permalink / raw) To: peterz@infradead.org, paulmck@kernel.org, jgross@suse.com, vschneid@redhat.com, yury.norov@gmail.com, linux-kernel@vger.kernel.org Cc: syzkaller@googlegroups.com, contact@pgazz.com Good day, dear maintainers, We found a bug using a modified kernel configuration file used by syzbot. We enhanced the coverage of the configuration file using our tool, klocalizer. Kernel Branch: 6.3.0-next-20230426 Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing Thank you! Best regards, Sanan Hasanov watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] Modules linked in: irq event stamp: 192794 hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: events_unbound toggle_allocation_gate RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 Call Trace: <TASK> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 jump_label_update+0x321/0x400 kernel/jump_label.c:829 static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 static_key_enable+0x1a/0x20 kernel/jump_label.c:218 toggle_allocation_gate mm/kfence/core.c:831 [inline] toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Sending NMI from CPU 5 to CPUs 0-4,6-7: NMI backtrace for cpu 1 CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5691 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdaabe8edcd Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 </TASK> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 4 CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 zap_drop_file_uffd_wp mm/memory.c:1352 [inline] zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] zap_pte_range mm/memory.c:1417 [inline] zap_pmd_range mm/memory.c:1564 [inline] zap_pud_range mm/memory.c:1593 [inline] zap_p4d_range mm/memory.c:1614 [inline] unmap_page_range+0x1046/0x4470 mm/memory.c:1635 unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 unmap_vmas+0x234/0x380 mm/memory.c:1720 exit_mmap+0x190/0x930 mm/mmap.c:3111 __mmput+0x128/0x4c0 kernel/fork.c:1351 mmput+0x60/0x70 kernel/fork.c:1373 exit_mm kernel/exit.c:564 [inline] do_exit+0x9d1/0x29f0 kernel/exit.c:858 do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 get_signal+0x2311/0x25c0 kernel/signal.c:2874 arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f281828edcd Code: Unable to access opcode bytes at 0x7f281828eda3. RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 </TASK> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov @ 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 1 sibling, 0 replies; 26+ messages in thread From: Peter Zijlstra @ 2023-09-13 10:05 UTC (permalink / raw) To: Sanan Hasanov Cc: paulmck@kernel.org, jgross@suse.com, vschneid@redhat.com, yury.norov@gmail.com, linux-kernel@vger.kernel.org, syzkaller@googlegroups.com, contact@pgazz.com, rodrigosiqueiramelo, melissa.srw, mairacanal, hamohammed.sa, daniel, airlied On Tue, Sep 12, 2023 at 11:02:56PM +0000, Sanan Hasanov wrote: > Good day, dear maintainers, > > We found a bug using a modified kernel configuration file used by syzbot. > > We enhanced the coverage of the configuration file using our tool, klocalizer. > > Kernel Branch: 6.3.0-next-20230426 > Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing > Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing > Thank you! AFAICT the thing is stuck in DRM somewhere... > watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] > Modules linked in: > irq event stamp: 192794 > hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 > softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Workqueue: events_unbound toggle_allocation_gate > RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] > RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 > Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 > RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 > RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 > R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 > on_each_cpu include/linux/smp.h:71 [inline] > text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] > text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 > text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] > text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] > text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 > arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 > jump_label_update+0x321/0x400 kernel/jump_label.c:829 > static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 > static_key_enable+0x1a/0x20 kernel/jump_label.c:218 > toggle_allocation_gate mm/kfence/core.c:831 [inline] > toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 > process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 > worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 > kthread+0x33e/0x440 kernel/kthread.c:379 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> Right, so this is waiting for an IPI to be processed.. while #1 has IRQs disabled > Sending NMI from CPU 5 to CPUs 0-4,6-7: > NMI backtrace for cpu 1 > CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] > RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] > RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 > Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 > RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 > RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 > RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f > R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 > R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 > FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > lock_acquire kernel/locking/lockdep.c:5691 [inline] > lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 > lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] > hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] > hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 > hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 And this is trying to cancel a hrtimer which is ran on CPU4 and won't be making much progress. > __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] So we're here, holding vbl_lock, vblank_time_lock one of which is what #4 is waiting on. This also has IRQs disabled, which is what #1 is waiting on. > drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 > drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 > drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 > vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 > commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 > drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] > drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 > drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 > drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 > drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 > drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] > drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 > fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 > do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 > fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:870 [inline] > __se_sys_ioctl fs/ioctl.c:856 [inline] > __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fdaabe8edcd > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd > RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 > RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 > </TASK> > CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 > Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c > RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 > RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 > RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 > R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 > R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 > FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <IRQ> > pv_wait arch/x86/include/asm/paravirt.h:598 [inline] > pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] > __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 > pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] > queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] > queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] > do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] > _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 > drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 stuck on a spinlock held by #5 > vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 > __run_hrtimer kernel/time/hrtimer.c:1685 [inline] > __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] > __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 > sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 > Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e > RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 > RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 > RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 > R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 > zap_drop_file_uffd_wp mm/memory.c:1352 [inline] > zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] > zap_pte_range mm/memory.c:1417 [inline] > zap_pmd_range mm/memory.c:1564 [inline] > zap_pud_range mm/memory.c:1593 [inline] > zap_p4d_range mm/memory.c:1614 [inline] > unmap_page_range+0x1046/0x4470 mm/memory.c:1635 > unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 > unmap_vmas+0x234/0x380 mm/memory.c:1720 > exit_mmap+0x190/0x930 mm/mmap.c:3111 > __mmput+0x128/0x4c0 kernel/fork.c:1351 > mmput+0x60/0x70 kernel/fork.c:1373 > exit_mm kernel/exit.c:564 [inline] > do_exit+0x9d1/0x29f0 kernel/exit.c:858 > do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 > get_signal+0x2311/0x25c0 kernel/signal.c:2874 > arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 > __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] > syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 > do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f281828edcd > Code: Unable to access opcode bytes at 0x7f281828eda3. > RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 > RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c > R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 > </TASK> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra @ 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:21 ` Tetsuo Handa 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa 1 sibling, 2 replies; 26+ messages in thread From: Hillf Danton @ 2023-09-13 11:07 UTC (permalink / raw) To: Sanan Hasanov Cc: peterz, Linus Torvalds, Tetsuo Handa, Thomas Gleixner, syzkaller, linux-mm, LKML On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> > Good day, dear maintainers, > > We found a bug using a modified kernel configuration file used by syzbot. > Thanks for your report. > We enhanced the coverage of the configuration file using our tool, klocalizer. > > Kernel Branch: 6.3.0-next-20230426 > Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing > Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing > Thank you! > > Best regards, > Sanan Hasanov > > watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] > Modules linked in: > irq event stamp: 192794 > hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 > softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Workqueue: events_unbound toggle_allocation_gate > RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] > RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 > Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 > RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 > RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 > R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 > on_each_cpu include/linux/smp.h:71 [inline] > text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] > text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 > text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] > text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] > text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 > arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 > jump_label_update+0x321/0x400 kernel/jump_label.c:829 > static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 > static_key_enable+0x1a/0x20 kernel/jump_label.c:218 > toggle_allocation_gate mm/kfence/core.c:831 [inline] > toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 > process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 > worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 > kthread+0x33e/0x440 kernel/kthread.c:379 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> > Sending NMI from CPU 5 to CPUs 0-4,6-7: > NMI backtrace for cpu 1 > CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] > RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] > RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 > Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 > RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 > RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 > RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f > R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 > R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 > FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > lock_acquire kernel/locking/lockdep.c:5691 [inline] > lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 > lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] > hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] > hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 > hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 > __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] > drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 > drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 cpu1 cpu4 (see below) ==== ==== drm_crtc_vblank_off __run_hrtimer spin_lock_irq(&dev->event_lock); ... drm_handle_vblank hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); Deadlock should have been reported instead provided the lockdep_map in struct timer_list were added also to hrtimer, so it is highly appreciated if Tetsuo or Thomas adds it before 6.8 or 6.10. Hillf > disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 > drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 > vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 > commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 > drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] > drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 > drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 > drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 > drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 > drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] > drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 > fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 > do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 > fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:870 [inline] > __se_sys_ioctl fs/ioctl.c:856 [inline] > __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fdaabe8edcd > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd > RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 > RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 > </TASK> > NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 4 > CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 > Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c > RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 > RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 > RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 > R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 > R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 > FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <IRQ> > pv_wait arch/x86/include/asm/paravirt.h:598 [inline] > pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] > __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 > pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] > queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] > queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] > do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] > _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 > drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 > vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 > __run_hrtimer kernel/time/hrtimer.c:1685 [inline] > __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] > __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 > sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 > Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e > RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 > RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 > RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 > R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 > zap_drop_file_uffd_wp mm/memory.c:1352 [inline] > zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] > zap_pte_range mm/memory.c:1417 [inline] > zap_pmd_range mm/memory.c:1564 [inline] > zap_pud_range mm/memory.c:1593 [inline] > zap_p4d_range mm/memory.c:1614 [inline] > unmap_page_range+0x1046/0x4470 mm/memory.c:1635 > unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 > unmap_vmas+0x234/0x380 mm/memory.c:1720 > exit_mmap+0x190/0x930 mm/mmap.c:3111 > __mmput+0x128/0x4c0 kernel/fork.c:1351 > mmput+0x60/0x70 kernel/fork.c:1373 > exit_mm kernel/exit.c:564 [inline] > do_exit+0x9d1/0x29f0 kernel/exit.c:858 > do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 > get_signal+0x2311/0x25c0 kernel/signal.c:2874 > arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 > __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] > syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 > do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f281828edcd > Code: Unable to access opcode bytes at 0x7f281828eda3. > RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 > RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c > R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 > </TASK> > NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 26+ messages in thread
* drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 11:07 ` Hillf Danton @ 2023-09-13 14:21 ` Tetsuo Handa 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa 1 sibling, 0 replies; 26+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:21 UTC (permalink / raw) To: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI Cc: Hillf Danton, syzkaller, LKML, Sanan Hasanov Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . It looks like this locking pattern remains as of 6.6-rc1. Please fix. void drm_crtc_vblank_off(struct drm_crtc *crtc) { spin_lock_irq(&dev->event_lock); drm_vblank_disable_and_save(dev, pipe) { __disable_vblank(dev, pipe) { crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. ret = hrtimer_try_to_cancel(timer) { base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. } } } } } spin_unlock_irq(&dev->event_lock); } static void __run_hrtimer(...) { restart = fn(timer) == vkms_vblank_simulate { drm_crtc_handle_vblank(crtc) { drm_handle_vblank(struct drm_device *dev, unsigned int pipe) { spin_lock_irqsave(&dev->event_lock, irqflags); // Trying to hold dev->event_lock inside timer interrupt handler. => Deadlock was reported as a soft lockup. spin_unlock_irqrestore(&dev->event_lock, irqflags); } } } } On 2023/09/13 20:07, Hillf Danton wrote: > On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> >> Good day, dear maintainers, >> >> We found a bug using a modified kernel configuration file used by syzbot. >> > Thanks for your report. > >> We enhanced the coverage of the configuration file using our tool, klocalizer. >> >> Kernel Branch: 6.3.0-next-20230426 >> Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing >> Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing >> Thank you! >> >> Best regards, >> Sanan Hasanov >> >> watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] >> Modules linked in: >> irq event stamp: 192794 >> hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 >> softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> Workqueue: events_unbound toggle_allocation_gate >> RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] >> RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 >> Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 >> RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 >> RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 >> RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 >> RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 >> R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 >> FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 >> on_each_cpu include/linux/smp.h:71 [inline] >> text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] >> text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 >> text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] >> text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] >> text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 >> arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 >> jump_label_update+0x321/0x400 kernel/jump_label.c:829 >> static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 >> static_key_enable+0x1a/0x20 kernel/jump_label.c:218 >> toggle_allocation_gate mm/kfence/core.c:831 [inline] >> toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 >> process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 >> worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 >> kthread+0x33e/0x440 kernel/kthread.c:379 >> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 >> </TASK> >> Sending NMI from CPU 5 to CPUs 0-4,6-7: >> NMI backtrace for cpu 1 >> CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] >> RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] >> RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 >> Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 >> RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 >> RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 >> RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 >> RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f >> R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 >> R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 >> FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> lock_acquire kernel/locking/lockdep.c:5691 [inline] >> lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] >> _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 >> lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] >> hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] >> hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 >> hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 >> __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] >> drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 >> drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Hillf > >> disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 >> drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 >> vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 >> commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 >> drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] >> drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 >> drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 >> drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 >> drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 >> drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] >> drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 >> fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 >> do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 >> fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 >> vfs_ioctl fs/ioctl.c:51 [inline] >> __do_sys_ioctl fs/ioctl.c:870 [inline] >> __se_sys_ioctl fs/ioctl.c:856 [inline] >> __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7fdaabe8edcd >> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 >> RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd >> RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 >> RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >> R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 >> </TASK> >> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 4 >> CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 >> Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c >> RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 >> RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 >> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 >> RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 >> R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 >> R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 >> FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <IRQ> >> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] >> pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] >> __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 >> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] >> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] >> queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] >> do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] >> _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 >> drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 >> vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 >> __run_hrtimer kernel/time/hrtimer.c:1685 [inline] >> __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 >> hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 >> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] >> __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 >> sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 >> </IRQ> >> <TASK> >> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] >> RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 >> Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e >> RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 >> RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 >> RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 >> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 >> R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 >> zap_drop_file_uffd_wp mm/memory.c:1352 [inline] >> zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] >> zap_pte_range mm/memory.c:1417 [inline] >> zap_pmd_range mm/memory.c:1564 [inline] >> zap_pud_range mm/memory.c:1593 [inline] >> zap_p4d_range mm/memory.c:1614 [inline] >> unmap_page_range+0x1046/0x4470 mm/memory.c:1635 >> unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 >> unmap_vmas+0x234/0x380 mm/memory.c:1720 >> exit_mmap+0x190/0x930 mm/mmap.c:3111 >> __mmput+0x128/0x4c0 kernel/fork.c:1351 >> mmput+0x60/0x70 kernel/fork.c:1373 >> exit_mm kernel/exit.c:564 [inline] >> do_exit+0x9d1/0x29f0 kernel/exit.c:858 >> do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 >> get_signal+0x2311/0x25c0 kernel/signal.c:2874 >> arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 >> exit_to_user_mode_loop kernel/entry/common.c:168 [inline] >> exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 >> __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] >> syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 >> do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7f281828edcd >> Code: Unable to access opcode bytes at 0x7f281828eda3. >> RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca >> RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd >> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 >> RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c >> R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 >> </TASK> >> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 26+ messages in thread
* drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-13 14:21 ` Tetsuo Handa 0 siblings, 0 replies; 26+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:21 UTC (permalink / raw) To: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI Cc: syzkaller, LKML, Hillf Danton, Sanan Hasanov Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . It looks like this locking pattern remains as of 6.6-rc1. Please fix. void drm_crtc_vblank_off(struct drm_crtc *crtc) { spin_lock_irq(&dev->event_lock); drm_vblank_disable_and_save(dev, pipe) { __disable_vblank(dev, pipe) { crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. ret = hrtimer_try_to_cancel(timer) { base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. } } } } } spin_unlock_irq(&dev->event_lock); } static void __run_hrtimer(...) { restart = fn(timer) == vkms_vblank_simulate { drm_crtc_handle_vblank(crtc) { drm_handle_vblank(struct drm_device *dev, unsigned int pipe) { spin_lock_irqsave(&dev->event_lock, irqflags); // Trying to hold dev->event_lock inside timer interrupt handler. => Deadlock was reported as a soft lockup. spin_unlock_irqrestore(&dev->event_lock, irqflags); } } } } On 2023/09/13 20:07, Hillf Danton wrote: > On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> >> Good day, dear maintainers, >> >> We found a bug using a modified kernel configuration file used by syzbot. >> > Thanks for your report. > >> We enhanced the coverage of the configuration file using our tool, klocalizer. >> >> Kernel Branch: 6.3.0-next-20230426 >> Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing >> Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing >> Thank you! >> >> Best regards, >> Sanan Hasanov >> >> watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] >> Modules linked in: >> irq event stamp: 192794 >> hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 >> softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> Workqueue: events_unbound toggle_allocation_gate >> RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] >> RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 >> Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 >> RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 >> RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 >> RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 >> RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 >> R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 >> FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 >> on_each_cpu include/linux/smp.h:71 [inline] >> text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] >> text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 >> text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] >> text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] >> text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 >> arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 >> jump_label_update+0x321/0x400 kernel/jump_label.c:829 >> static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 >> static_key_enable+0x1a/0x20 kernel/jump_label.c:218 >> toggle_allocation_gate mm/kfence/core.c:831 [inline] >> toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 >> process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 >> worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 >> kthread+0x33e/0x440 kernel/kthread.c:379 >> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 >> </TASK> >> Sending NMI from CPU 5 to CPUs 0-4,6-7: >> NMI backtrace for cpu 1 >> CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] >> RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] >> RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 >> Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 >> RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 >> RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 >> RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 >> RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f >> R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 >> R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 >> FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> lock_acquire kernel/locking/lockdep.c:5691 [inline] >> lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] >> _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 >> lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] >> hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] >> hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 >> hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 >> __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] >> drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 >> drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Hillf > >> disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 >> drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 >> vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 >> commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 >> drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] >> drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 >> drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 >> drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 >> drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 >> drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] >> drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 >> fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 >> do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 >> fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 >> vfs_ioctl fs/ioctl.c:51 [inline] >> __do_sys_ioctl fs/ioctl.c:870 [inline] >> __se_sys_ioctl fs/ioctl.c:856 [inline] >> __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7fdaabe8edcd >> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 >> RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd >> RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 >> RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >> R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 >> </TASK> >> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 4 >> CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 >> Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c >> RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 >> RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 >> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 >> RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 >> R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 >> R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 >> FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <IRQ> >> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] >> pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] >> __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 >> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] >> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] >> queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] >> do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] >> _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 >> drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 >> vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 >> __run_hrtimer kernel/time/hrtimer.c:1685 [inline] >> __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 >> hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 >> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] >> __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 >> sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 >> </IRQ> >> <TASK> >> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] >> RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 >> Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e >> RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 >> RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 >> RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 >> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 >> R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 >> zap_drop_file_uffd_wp mm/memory.c:1352 [inline] >> zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] >> zap_pte_range mm/memory.c:1417 [inline] >> zap_pmd_range mm/memory.c:1564 [inline] >> zap_pud_range mm/memory.c:1593 [inline] >> zap_p4d_range mm/memory.c:1614 [inline] >> unmap_page_range+0x1046/0x4470 mm/memory.c:1635 >> unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 >> unmap_vmas+0x234/0x380 mm/memory.c:1720 >> exit_mmap+0x190/0x930 mm/mmap.c:3111 >> __mmput+0x128/0x4c0 kernel/fork.c:1351 >> mmput+0x60/0x70 kernel/fork.c:1373 >> exit_mm kernel/exit.c:564 [inline] >> do_exit+0x9d1/0x29f0 kernel/exit.c:858 >> do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 >> get_signal+0x2311/0x25c0 kernel/signal.c:2874 >> arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 >> exit_to_user_mode_loop kernel/entry/common.c:168 [inline] >> exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 >> __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] >> syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 >> do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7f281828edcd >> Code: Unable to access opcode bytes at 0x7f281828eda3. >> RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca >> RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd >> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 >> RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c >> R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 >> </TASK> >> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 14:21 ` Tetsuo Handa @ 2023-09-13 16:47 ` Linus Torvalds -1 siblings, 0 replies; 26+ messages in thread From: Linus Torvalds @ 2023-09-13 16:47 UTC (permalink / raw) To: Tetsuo Handa Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, LKML, DRI, Melissa Wen, Maira Canal, syzkaller On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . > It looks like this locking pattern remains as of 6.6-rc1. Please fix. > > void drm_crtc_vblank_off(struct drm_crtc *crtc) { > spin_lock_irq(&dev->event_lock); > drm_vblank_disable_and_save(dev, pipe) { > __disable_vblank(dev, pipe) { > crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { > hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. > ret = hrtimer_try_to_cancel(timer) { > base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. Heh. Ok. This is clearly a bug, but it does seem to be limited to just the vkms driver, and literally only to the "simulate vblank" case. The worst part about it is that it's so subtle and not obvious. Some light grepping seems to show that amdgpu has almost the exact same pattern in its own vkms thing, except it uses hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); directly, which presumably fixes the livelock, but means that the cancel will fail if it's currently running. So just doing the same thing in the vkms driver probably fixes things. Maybe the vkms people need to add a flag to say "it's canceled" so that it doesn't then get re-enabled? Or maybe it doesn't matter and/or already happens for some reason I didn't look into. Linus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-13 16:47 ` Linus Torvalds 0 siblings, 0 replies; 26+ messages in thread From: Linus Torvalds @ 2023-09-13 16:47 UTC (permalink / raw) To: Tetsuo Handa Cc: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . > It looks like this locking pattern remains as of 6.6-rc1. Please fix. > > void drm_crtc_vblank_off(struct drm_crtc *crtc) { > spin_lock_irq(&dev->event_lock); > drm_vblank_disable_and_save(dev, pipe) { > __disable_vblank(dev, pipe) { > crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { > hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. > ret = hrtimer_try_to_cancel(timer) { > base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. Heh. Ok. This is clearly a bug, but it does seem to be limited to just the vkms driver, and literally only to the "simulate vblank" case. The worst part about it is that it's so subtle and not obvious. Some light grepping seems to show that amdgpu has almost the exact same pattern in its own vkms thing, except it uses hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); directly, which presumably fixes the livelock, but means that the cancel will fail if it's currently running. So just doing the same thing in the vkms driver probably fixes things. Maybe the vkms people need to add a flag to say "it's canceled" so that it doesn't then get re-enabled? Or maybe it doesn't matter and/or already happens for some reason I didn't look into. Linus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 16:47 ` Linus Torvalds @ 2023-09-13 21:08 ` Thomas Gleixner -1 siblings, 0 replies; 26+ messages in thread From: Thomas Gleixner @ 2023-09-13 21:08 UTC (permalink / raw) To: Linus Torvalds, Tetsuo Handa Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, LKML, DRI, Melissa Wen, Maira Canal, syzkaller On Wed, Sep 13 2023 at 09:47, Linus Torvalds wrote: > On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . >> It looks like this locking pattern remains as of 6.6-rc1. Please fix. >> >> void drm_crtc_vblank_off(struct drm_crtc *crtc) { >> spin_lock_irq(&dev->event_lock); >> drm_vblank_disable_and_save(dev, pipe) { >> __disable_vblank(dev, pipe) { >> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { >> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. >> ret = hrtimer_try_to_cancel(timer) { >> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. > > Heh. Ok. This is clearly a bug, but it does seem to be limited to just > the vkms driver, and literally only to the "simulate vblank" case. > > The worst part about it is that it's so subtle and not obvious. > > Some light grepping seems to show that amdgpu has almost the exact > same pattern in its own vkms thing, except it uses > > hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); > > directly, which presumably fixes the livelock, but means that the > cancel will fail if it's currently running. > > So just doing the same thing in the vkms driver probably fixes things. > > Maybe the vkms people need to add a flag to say "it's canceled" so > that it doesn't then get re-enabled? Or maybe it doesn't matter > and/or already happens for some reason I didn't look into. Maybe the VKMS people need to understand locking in the first place. The first thing I saw in this code is: static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) { ... mutex_unlock(&output->enabled_lock); What? Unlocking a mutex in the context of a hrtimer callback is simply violating all mutex locking rules. How has this code ever survived lock debugging without triggering a big fat warning? Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-13 21:08 ` Thomas Gleixner 0 siblings, 0 replies; 26+ messages in thread From: Thomas Gleixner @ 2023-09-13 21:08 UTC (permalink / raw) To: Linus Torvalds, Tetsuo Handa Cc: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov On Wed, Sep 13 2023 at 09:47, Linus Torvalds wrote: > On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . >> It looks like this locking pattern remains as of 6.6-rc1. Please fix. >> >> void drm_crtc_vblank_off(struct drm_crtc *crtc) { >> spin_lock_irq(&dev->event_lock); >> drm_vblank_disable_and_save(dev, pipe) { >> __disable_vblank(dev, pipe) { >> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { >> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. >> ret = hrtimer_try_to_cancel(timer) { >> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. > > Heh. Ok. This is clearly a bug, but it does seem to be limited to just > the vkms driver, and literally only to the "simulate vblank" case. > > The worst part about it is that it's so subtle and not obvious. > > Some light grepping seems to show that amdgpu has almost the exact > same pattern in its own vkms thing, except it uses > > hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); > > directly, which presumably fixes the livelock, but means that the > cancel will fail if it's currently running. > > So just doing the same thing in the vkms driver probably fixes things. > > Maybe the vkms people need to add a flag to say "it's canceled" so > that it doesn't then get re-enabled? Or maybe it doesn't matter > and/or already happens for some reason I didn't look into. Maybe the VKMS people need to understand locking in the first place. The first thing I saw in this code is: static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) { ... mutex_unlock(&output->enabled_lock); What? Unlocking a mutex in the context of a hrtimer callback is simply violating all mutex locking rules. How has this code ever survived lock debugging without triggering a big fat warning? Thanks, tglx ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 21:08 ` Thomas Gleixner @ 2023-09-14 6:33 ` Tetsuo Handa -1 siblings, 0 replies; 26+ messages in thread From: Tetsuo Handa @ 2023-09-14 6:33 UTC (permalink / raw) To: Maira Canal, Arthur Grillo Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, Linus Torvalds, LKML, DRI, Melissa Wen, syzkaller, Thomas Gleixner On 2023/09/14 6:08, Thomas Gleixner wrote: > Maybe the VKMS people need to understand locking in the first place. The > first thing I saw in this code is: > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > { > ... > mutex_unlock(&output->enabled_lock); > > What? > > Unlocking a mutex in the context of a hrtimer callback is simply > violating all mutex locking rules. > > How has this code ever survived lock debugging without triggering a big > fat warning? Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't tested with the lock debugging yet... Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. Please revert that patch immediately. I guess that a semaphore (down()/up()) could be used instead of a mutex. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-14 6:33 ` Tetsuo Handa 0 siblings, 0 replies; 26+ messages in thread From: Tetsuo Handa @ 2023-09-14 6:33 UTC (permalink / raw) To: Maira Canal, Arthur Grillo Cc: Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds On 2023/09/14 6:08, Thomas Gleixner wrote: > Maybe the VKMS people need to understand locking in the first place. The > first thing I saw in this code is: > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > { > ... > mutex_unlock(&output->enabled_lock); > > What? > > Unlocking a mutex in the context of a hrtimer callback is simply > violating all mutex locking rules. > > How has this code ever survived lock debugging without triggering a big > fat warning? Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't tested with the lock debugging yet... Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. Please revert that patch immediately. I guess that a semaphore (down()/up()) could be used instead of a mutex. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-14 6:33 ` Tetsuo Handa @ 2023-09-14 8:12 ` Daniel Vetter -1 siblings, 0 replies; 26+ messages in thread From: Daniel Vetter @ 2023-09-14 8:12 UTC (permalink / raw) To: Tetsuo Handa Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, Linus Torvalds, LKML, DRI, Melissa Wen, Maira Canal, syzkaller, Thomas Gleixner, Arthur Grillo On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote: > On 2023/09/14 6:08, Thomas Gleixner wrote: > > Maybe the VKMS people need to understand locking in the first place. The > > first thing I saw in this code is: > > > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > > { > > ... > > mutex_unlock(&output->enabled_lock); > > > > What? > > > > Unlocking a mutex in the context of a hrtimer callback is simply > > violating all mutex locking rules. > > > > How has this code ever survived lock debugging without triggering a big > > fat warning? > > Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer > and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't > tested with the lock debugging yet... Yeah that needs an immediate revert, there's not much that looks legit in that patch. I'll chat with Maira. Also yes how that landed without anyone running lockdep is ... not good. I guess we need a lockdep enabled drm ci target that runs vkms tests asap :-) > Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. > Please revert that patch immediately. > I guess that a semaphore (down()/up()) could be used instead of a mutex. From a quick look this smells like a classic "try to use locking when you want synchronization primitives", so semaphore here doesn't look any better. The vkms_set_composer() function was originally for crc generation, where it's userspace's job to make sure they wait for all the crc they need to be generated before they shut it down again. But for writeback the kernel must guarantee that the compositiona actually happens, and the current function just doesn't make any such guarantees. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-14 8:12 ` Daniel Vetter 0 siblings, 0 replies; 26+ messages in thread From: Daniel Vetter @ 2023-09-14 8:12 UTC (permalink / raw) To: Tetsuo Handa Cc: Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote: > On 2023/09/14 6:08, Thomas Gleixner wrote: > > Maybe the VKMS people need to understand locking in the first place. The > > first thing I saw in this code is: > > > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > > { > > ... > > mutex_unlock(&output->enabled_lock); > > > > What? > > > > Unlocking a mutex in the context of a hrtimer callback is simply > > violating all mutex locking rules. > > > > How has this code ever survived lock debugging without triggering a big > > fat warning? > > Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer > and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't > tested with the lock debugging yet... Yeah that needs an immediate revert, there's not much that looks legit in that patch. I'll chat with Maira. Also yes how that landed without anyone running lockdep is ... not good. I guess we need a lockdep enabled drm ci target that runs vkms tests asap :-) > Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. > Please revert that patch immediately. > I guess that a semaphore (down()/up()) could be used instead of a mutex. From a quick look this smells like a classic "try to use locking when you want synchronization primitives", so semaphore here doesn't look any better. The vkms_set_composer() function was originally for crc generation, where it's userspace's job to make sure they wait for all the crc they need to be generated before they shut it down again. But for writeback the kernel must guarantee that the compositiona actually happens, and the current function just doesn't make any such guarantees. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-14 8:12 ` Daniel Vetter (?) @ 2023-09-18 22:02 ` Helen Koike 2023-09-19 6:38 ` Daniel Stone -1 siblings, 1 reply; 26+ messages in thread From: Helen Koike @ 2023-09-18 22:02 UTC (permalink / raw) To: Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, Daniel Stone, David Heidelberg, Vignesh Raman On 14/09/2023 05:12, Daniel Vetter wrote: > On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote: >> On 2023/09/14 6:08, Thomas Gleixner wrote: >>> Maybe the VKMS people need to understand locking in the first place. The >>> first thing I saw in this code is: >>> >>> static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) >>> { >>> ... >>> mutex_unlock(&output->enabled_lock); >>> >>> What? >>> >>> Unlocking a mutex in the context of a hrtimer callback is simply >>> violating all mutex locking rules. >>> >>> How has this code ever survived lock debugging without triggering a big >>> fat warning? >> >> Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer >> and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't >> tested with the lock debugging yet... > > Yeah that needs an immediate revert, there's not much that looks legit in > that patch. I'll chat with Maira. > > Also yes how that landed without anyone running lockdep is ... not good. I > guess we need a lockdep enabled drm ci target that runs vkms tests asap > :-) btw, I just executed a draft version of vkms targed on the ci, we do get the warning: https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623 I'm just not sure if tests would fail (since it is a warning) and it has a chance to be ignored if people don't look at the logs (unless if igt already handles that). I still need to do some adjustments (it seems the tests is hanging somewhere and we got a timeout) but we should have vkms in drm ci soon. Regards, Helen > >> Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. >> Please revert that patch immediately. >> I guess that a semaphore (down()/up()) could be used instead of a mutex. > > From a quick look this smells like a classic "try to use locking when you > want synchronization primitives", so semaphore here doesn't look any > better. The vkms_set_composer() function was originally for crc > generation, where it's userspace's job to make sure they wait for all the > crc they need to be generated before they shut it down again. But for > writeback the kernel must guarantee that the compositiona actually > happens, and the current function just doesn't make any such guarantees. > > Cheers, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-18 22:02 ` Helen Koike @ 2023-09-19 6:38 ` Daniel Stone 0 siblings, 0 replies; 26+ messages in thread From: Daniel Stone @ 2023-09-19 6:38 UTC (permalink / raw) To: Helen Koike Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, David Heidelberg, Tetsuo Handa, Linus Torvalds, Vignesh Raman, LKML, DRI, Melissa Wen, Maira Canal, syzkaller, Thomas Gleixner, Arthur Grillo, Daniel Stone On Mon, 18 Sept 2023 at 23:02, Helen Koike <helen.koike@collabora.com> wrote: > On 14/09/2023 05:12, Daniel Vetter wrote: > > Also yes how that landed without anyone running lockdep is ... not good. I > > guess we need a lockdep enabled drm ci target that runs vkms tests asap > > :-) > > btw, I just executed a draft version of vkms targed on the ci, we do get > the warning: > > https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623 > > I'm just not sure if tests would fail (since it is a warning) and it has > a chance to be ignored if people don't look at the logs (unless if igt > already handles that). Hmm, dmesg-warn is already a separate igt test status. I guess it just needs to be handled explicitly. > I still need to do some adjustments (it seems the tests is hanging > somewhere and we got a timeout) but we should have vkms in drm ci soon. Might be due to the locking explosion? The kernels should probably have soft-lockup detection enabled as well as lockdep. Cheers, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-19 6:38 ` Daniel Stone 0 siblings, 0 replies; 26+ messages in thread From: Daniel Stone @ 2023-09-19 6:38 UTC (permalink / raw) To: Helen Koike Cc: Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, Daniel Stone, David Heidelberg, Vignesh Raman On Mon, 18 Sept 2023 at 23:02, Helen Koike <helen.koike@collabora.com> wrote: > On 14/09/2023 05:12, Daniel Vetter wrote: > > Also yes how that landed without anyone running lockdep is ... not good. I > > guess we need a lockdep enabled drm ci target that runs vkms tests asap > > :-) > > btw, I just executed a draft version of vkms targed on the ci, we do get > the warning: > > https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623 > > I'm just not sure if tests would fail (since it is a warning) and it has > a chance to be ignored if people don't look at the logs (unless if igt > already handles that). Hmm, dmesg-warn is already a separate igt test status. I guess it just needs to be handled explicitly. > I still need to do some adjustments (it seems the tests is hanging > somewhere and we got a timeout) but we should have vkms in drm ci soon. Might be due to the locking explosion? The kernels should probably have soft-lockup detection enabled as well as lockdep. Cheers, Daniel ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:21 ` Tetsuo Handa @ 2023-09-13 14:30 ` Tetsuo Handa 2023-09-14 12:21 ` Hillf Danton 1 sibling, 1 reply; 26+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:30 UTC (permalink / raw) To: Hillf Danton, Sanan Hasanov, Thomas Gleixner, peterz Cc: Linus Torvalds, syzkaller, LKML On 2023/09/13 20:07, Hillf Danton wrote: > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. Not me. ;-) Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, we want to add a lockdep annotation into hrtimer_cancel() so that we can detect this type of deadlock? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa @ 2023-09-14 12:21 ` Hillf Danton 2023-09-14 13:13 ` Tetsuo Handa 0 siblings, 1 reply; 26+ messages in thread From: Hillf Danton @ 2023-09-14 12:21 UTC (permalink / raw) To: Tetsuo Handa Cc: Sanan Hasanov, Thomas Gleixner, Linus Torvalds, syzkaller, linux-mm, LKML On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote: > On 2023/09/13 20:07, Hillf Danton wrote: > > > > cpu1 cpu4 (see below) > > ==== ==== > > drm_crtc_vblank_off __run_hrtimer > > spin_lock_irq(&dev->event_lock); > > ... > > drm_handle_vblank > > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > > > > Deadlock should have been reported instead provided the lockdep_map in > > struct timer_list were added also to hrtimer, so it is highly appreciated > > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Not me. ;-) > > Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, > we want to add a lockdep annotation into hrtimer_cancel() so that we can > detect this type of deadlock? Yes, you are right. The diff below is my two cents (only for thoughts). --- x/include/linux/timer.h +++ y/include/linux/timer.h @@ -124,6 +124,9 @@ struct hrtimer { u8 is_rel; u8 is_soft; u8 is_hard; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif }; /** @@ -369,33 +372,65 @@ static inline void hrtimer_cancel_wait_r /* Exported timer functions: */ /* Initialize timers: */ -extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, - enum hrtimer_mode mode); -extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode); +extern void hrtimer_init_key(struct hrtimer *timer, clockid_t which_clock, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +extern void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +#ifdef CONFIG_LOCKDEP +#define hrtimer_init(t, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_key(t, c, m, #t, &__key); \ + } while (0) + +#define hrtimer_init_sleeper(s, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_sleeper_key(s, c, m, #s, &__key); \ + } while (0) +#else +#define hrtimer_init(t, c, m) \ + hrtimer_init_key(t, c, m, NULL, NULL) + +#define hrtimer_init_sleeper(s, c, m) \ + hrtimer_init_sleeper_key(s, c, m, NULL, NULL) +#endif #ifdef CONFIG_DEBUG_OBJECTS_TIMERS -extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, - enum hrtimer_mode mode); -extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, +extern void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t which_clock, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +extern void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode); + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +#ifdef CONFIG_LOCKDEP + #define hrtimer_init_on_stack(t, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_on_stack_key(t, c, m, #t, &__key); \ + } while (0) + #define hrtimer_init_sleeper_on_stack(s, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_sleeper_on_stack_key(s, c, m, #s, &__key); \ + } while (0) +#else + #define hrtimer_init_on_stack(t, c, m) \ + hrtimer_init_on_stack_key(t, c, m, NULL, NULL) + #define hrtimer_init_sleeper_on_stack(s, c, m) \ + hrtimer_init_sleeper_on_stack_key(s, c, m, NULL, NULL) +#endif extern void destroy_hrtimer_on_stack(struct hrtimer *timer); #else -static inline void hrtimer_init_on_stack(struct hrtimer *timer, - clockid_t which_clock, - enum hrtimer_mode mode) -{ - hrtimer_init(timer, which_clock, mode); -} +#define hrtimer_init_on_stack(t, c, m) \ + hrtimer_init(t, c, m) -static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, - enum hrtimer_mode mode) -{ - hrtimer_init_sleeper(sl, clock_id, mode); -} +#define hrtimer_init_sleeper_on_stack(s, c, m) \ + hrtimer_init_sleeper(s, c, m) static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } #endif --- x/kernel/time/hrtimer.c +++ y/kernel/time/hrtimer.c @@ -428,22 +428,26 @@ static inline void debug_hrtimer_deactiv static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode); -void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(timer, &hrtimer_debug_descr); __hrtimer_init(timer, clock_id, mode); + lockdep_init_map(&timer->lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_on_stack); static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, enum hrtimer_mode mode); -void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, enum hrtimer_mode mode) +void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl, + clockid_t clock_id, enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr); __hrtimer_init_sleeper(sl, clock_id, mode); + lockdep_init_map(&sl->timer.lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack); @@ -1439,6 +1443,8 @@ int hrtimer_cancel(struct hrtimer *timer { int ret; + lock_map_acquire(&timer->lockdep_map); + lock_map_release(&timer->lockdep_map); do { ret = hrtimer_try_to_cancel(timer); @@ -1586,11 +1592,12 @@ static void __hrtimer_init(struct hrtime * but the PINNED bit is ignored as pinning happens * when the hrtimer is started */ -void hrtimer_init(struct hrtimer *timer, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_key(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_init(timer, clock_id, mode); __hrtimer_init(timer, clock_id, mode); + lockdep_init_map(&timer->lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init); @@ -1647,6 +1654,11 @@ static void __run_hrtimer(struct hrtimer enum hrtimer_restart (*fn)(struct hrtimer *); bool expires_in_hardirq; int restart; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; + + lockdep_copy_map(&lockdep_map, &timer->lockdep_map); +#endif lockdep_assert_held(&cpu_base->lock); @@ -1682,7 +1694,9 @@ static void __run_hrtimer(struct hrtimer trace_hrtimer_expire_entry(timer, now); expires_in_hardirq = lockdep_hrtimer_enter(timer); + lock_map_acquire(&lockdep_map); restart = fn(timer); + lock_map_release(&lockdep_map); lockdep_hrtimer_exit(expires_in_hardirq); trace_hrtimer_expire_exit(timer); @@ -2004,12 +2018,13 @@ static void __hrtimer_init_sleeper(struc * @clock_id: the clock to be used * @mode: timer mode abs/rel */ -void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_init(&sl->timer, clock_id, mode); __hrtimer_init_sleeper(sl, clock_id, mode); - + lockdep_init_map(&sl->timer.lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_sleeper); -- ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-14 12:21 ` Hillf Danton @ 2023-09-14 13:13 ` Tetsuo Handa 0 siblings, 0 replies; 26+ messages in thread From: Tetsuo Handa @ 2023-09-14 13:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Sanan Hasanov, Linus Torvalds, syzkaller, LKML, Hillf Danton On 2023/09/14 21:21, Hillf Danton wrote: > On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote: >> On 2023/09/13 20:07, Hillf Danton wrote: >>> >>> cpu1 cpu4 (see below) >>> ==== ==== >>> drm_crtc_vblank_off __run_hrtimer >>> spin_lock_irq(&dev->event_lock); >>> ... >>> drm_handle_vblank >>> hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); >>> >>> >>> Deadlock should have been reported instead provided the lockdep_map in >>> struct timer_list were added also to hrtimer, so it is highly appreciated >>> if Tetsuo or Thomas adds it before 6.8 or 6.10. >> >> Not me. ;-) >> >> Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, >> we want to add a lockdep annotation into hrtimer_cancel() so that we can >> detect this type of deadlock? Here is a reproducer. ---------------------------------------- #include <linux/module.h> #include <linux/delay.h> static DEFINE_SPINLOCK(lock1); static struct hrtimer timer1; static enum hrtimer_restart timer_func(struct hrtimer *timer) { unsigned long flags; mdelay(100); // Wait for test_init() to hold lock1. spin_lock_irqsave(&lock1, flags); spin_unlock_irqrestore(&lock1, flags); return HRTIMER_RESTART; } static int __init test_init(void) { unsigned long flags; hrtimer_init(&timer1, CLOCK_MONOTONIC, HRTIMER_MODE_REL); timer1.function = &timer_func; hrtimer_start(&timer1, 1, HRTIMER_MODE_REL); mdelay(10); // Wait for timer_func() to start. spin_lock_irqsave(&lock1, flags); hrtimer_cancel(&timer1); // Wait for timer_func() to finish. spin_unlock_irqrestore(&lock1, flags); return -EINVAL; } module_init(test_init); MODULE_LICENSE("GPL"); ---------------------------------------- ---------------------------------------- [ 996.507681] test: loading out-of-tree module taints kernel. [ 996.514019] test: module verification failed: signature and/or required key missing - tainting kernel [ 1061.893054] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 1061.900411] rcu: 4-...0: (1 GPs behind) idle=ed6c/1/0x4000000000000000 softirq=3304/3305 fqs=15784 [ 1061.909128] rcu: (detected by 0, t=65018 jiffies, g=12625, q=4422 ncpus=12) [ 1061.915003] Sending NMI from CPU 0 to CPUs 4: [ 1061.918972] NMI backtrace for cpu 4 [ 1061.919036] CPU: 4 PID: 3826 Comm: insmod Tainted: G OE 6.6.0-rc1+ #20 [ 1061.919093] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 1061.919095] RIP: 0010:delay_tsc+0x34/0xa0 [ 1061.919560] Code: ff 05 e8 b1 26 70 65 44 8b 0d e4 b1 26 70 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 <65> ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 [ 1061.919563] RSP: 0018:ffffb471c059cf00 EFLAGS: 00000083 [ 1061.919567] RAX: 0000028efe104ef6 RBX: 0000000000000041 RCX: 0000000000000004 [ 1061.919569] RDX: 00000000002192f8 RSI: 0000000000000004 RDI: 000000000027d81e [ 1061.919571] RBP: ffff8970dafe5040 R08: 0000028efdeebbfe R09: 0000000000000004 [ 1061.919572] R10: 0000000000000001 R11: ffffffffc0a8d600 R12: ffffffff90e030e0 [ 1061.919574] R13: ffff8970dafe5040 R14: ffffffffc0a8b010 R15: ffff8970dafe5100 [ 1061.919630] FS: 00007fdd998eb740(0000) GS:ffff8970dae00000(0000) knlGS:0000000000000000 [ 1061.919633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1061.919635] CR2: 0000000001edf678 CR3: 00000001a1060000 CR4: 0000000000350ee0 [ 1061.919639] Call Trace: [ 1061.919641] <NMI> [ 1061.919646] ? nmi_cpu_backtrace+0xb1/0x130 [ 1061.919711] ? nmi_cpu_backtrace_handler+0x11/0x20 [ 1061.922096] ? nmi_handle+0xe4/0x290 [ 1061.922163] ? default_do_nmi+0x49/0x100 [ 1061.922196] ? exc_nmi+0x152/0x1e0 [ 1061.922198] ? end_repeat_nmi+0x16/0x67 [ 1061.922340] ? __pfx_timer_func+0x10/0x10 [test] [ 1061.922347] ? delay_tsc+0x34/0xa0 [ 1061.922349] ? delay_tsc+0x34/0xa0 [ 1061.922350] ? delay_tsc+0x34/0xa0 [ 1061.922352] </NMI> [ 1061.922353] <IRQ> [ 1061.922353] timer_func+0x19/0xff0 [test] [ 1061.922358] __hrtimer_run_queues+0x177/0x3a0 [ 1061.922362] hrtimer_interrupt+0x104/0x240 [ 1061.922364] ? __do_softirq+0x2db/0x392 [ 1061.922827] __sysvec_apic_timer_interrupt+0x64/0x180 [ 1061.922833] sysvec_apic_timer_interrupt+0x65/0x80 [ 1061.922894] </IRQ> [ 1061.922896] <TASK> [ 1061.922898] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 1061.922902] RIP: 0010:delay_tsc+0x4d/0xa0 [ 1061.922907] Code: c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 65 ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 0f 01 f9 <66> 90 48 c1 e2 20 48 09 d0 48 89 c2 4c 29 c2 48 39 fa 72 c8 65 ff [ 1061.922909] RSP: 0018:ffffb471c1e63bd0 EFLAGS: 00000246 [ 1061.922912] RAX: 00000000751ed8ab RBX: 000000000000000a RCX: 0000000000000004 [ 1061.922914] RDX: 0000000000000267 RSI: 0000000000000004 RDI: 000000000027d81e [ 1061.922915] RBP: ffffffffc0a91010 R08: 00000267751adc59 R09: 0000000000000004 [ 1061.922917] R10: 0000000000000001 R11: ffffffff90cd85c8 R12: 0000000000000000 [ 1061.922918] R13: ffffb471c1e63d20 R14: 0000000000000000 R15: ffffffffc0a8d080 [ 1061.922923] ? __pfx_test_init+0x10/0x10 [test] [ 1061.922934] test_init+0x52/0xff0 [test] [ 1061.922941] do_one_initcall+0x5c/0x280 [ 1061.923004] ? kmalloc_trace+0xa9/0xc0 [ 1061.923105] do_init_module+0x60/0x240 [ 1061.923111] load_module+0x1f6e/0x20d0 [ 1061.923119] ? ima_post_read_file+0xe3/0xf0 [ 1061.923225] ? init_module_from_file+0x88/0xc0 [ 1061.923229] init_module_from_file+0x88/0xc0 [ 1061.923238] idempotent_init_module+0x19c/0x250 [ 1061.923244] ? security_capable+0x39/0x60 [ 1061.923304] __x64_sys_finit_module+0x5b/0xb0 [ 1061.923310] do_syscall_64+0x3b/0x90 [ 1061.923366] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 1061.923421] RIP: 0033:0x7fdd986f8e29 [ 1061.923427] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48 [ 1061.923429] RSP: 002b:00007ffe2f34dd18 EFLAGS: 00000206 ORIG_RAX: 0000000000000139 [ 1061.923432] RAX: ffffffffffffffda RBX: 0000000001ede240 RCX: 00007fdd986f8e29 [ 1061.923434] RDX: 0000000000000000 RSI: 000000000041a96e RDI: 0000000000000003 [ 1061.923435] RBP: 000000000041a96e R08: 0000000000000000 R09: 00007ffe2f34deb8 [ 1061.923436] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000 [ 1061.923437] R13: 0000000001ede210 R14: 0000000000000000 R15: 0000000000000000 [ 1061.923444] </TASK> [ 1061.923446] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 4.474 msecs ---------------------------------------- > > Yes, you are right. > > The diff below is my two cents (only for thoughts). > I'm thinking something like below. (Completely untested.) I haven't checked IRQ state handling. But in the last diff chunk, why raw_spin_unlock_irqrestore() (which does not re-enable IRQs if the caller already disabled IRQs) is used before calling the callback function and raw_spin_lock_irq() (which always disables IRQs) is used after calling the callback function? Is it legal to disable IRQs again when the caller already disabled IRQs? ---------------------------------------- diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 0ee140176f10..5640730ec31c 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -123,8 +123,11 @@ struct hrtimer { u8 state; u8 is_rel; u8 is_soft; u8 is_hard; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif }; /** * struct hrtimer_sleeper - simple sleeper structure @@ -440,15 +443,15 @@ static inline void hrtimer_restart(struct hrtimer *timer) hrtimer_start_expires(timer, HRTIMER_MODE_ABS); } /* Query timers: */ -extern ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust); +extern ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust); /** * hrtimer_get_remaining - get remaining time for the timer * @timer: the timer to read */ -static inline ktime_t hrtimer_get_remaining(const struct hrtimer *timer) +static inline ktime_t hrtimer_get_remaining(struct hrtimer *timer) { return __hrtimer_get_remaining(timer, false); } diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 238262e4aba7..fe0681d34b56 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -161,14 +161,23 @@ static inline bool is_migration_base(struct hrtimer_clock_base *base) * possible to set timer->base = &migration_base and drop the lock: the timer * remains locked. */ static -struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer, +struct hrtimer_clock_base *lock_hrtimer_base(struct hrtimer *timer, unsigned long *flags) __acquires(&timer->base->lock) { struct hrtimer_clock_base *base; +#ifdef CONFIG_LOCKDEP + unsigned long flags2; + + local_irq_save(flags2); + lock_map_acquire(&timer->lockdep_map); + lock_map_release(&timer->lockdep_map); + local_irq_restore(flags2); +#endif + for (;;) { base = READ_ONCE(timer->base); if (likely(base != &migration_base)) { raw_spin_lock_irqsave(&base->cpu_base->lock, *flags); @@ -1456,9 +1465,9 @@ EXPORT_SYMBOL_GPL(hrtimer_cancel); * __hrtimer_get_remaining - get remaining time for the timer * @timer: the timer to read * @adjust: adjust relative timers when CONFIG_TIME_LOW_RES=y */ -ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust) +ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust) { unsigned long flags; ktime_t rem; @@ -1574,8 +1583,14 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, timer->is_soft = softtimer; timer->is_hard = !!(mode & HRTIMER_MODE_HARD); timer->base = &cpu_base->clock_base[base]; timerqueue_init(&timer->node); +#ifdef CONFIG_LOCKDEP + { + static struct lock_class_key __key; + lockdep_init_map(&timer->lockdep_map, "hrtimer", &__key, 0); + } +#endif } /** * hrtimer_init - initialize a timer to the given clock @@ -1684,9 +1699,19 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, raw_spin_unlock_irqrestore(&cpu_base->lock, flags); trace_hrtimer_expire_entry(timer, now); expires_in_hardirq = lockdep_hrtimer_enter(timer); +#ifdef CONFIG_LOCKDEP + local_irq_save(flags); + lock_map_acquire(&timer->lockdep_map); + local_irq_restore(flags); +#endif restart = fn(timer); +#ifdef CONFIG_LOCKDEP + local_irq_save(flags); + lock_map_release(&timer->lockdep_map); + local_irq_restore(flags); +#endif lockdep_hrtimer_exit(expires_in_hardirq); trace_hrtimer_expire_exit(timer); raw_spin_lock_irq(&cpu_base->lock); ---------------------------------------- ^ permalink raw reply related [flat|nested] 26+ messages in thread
* BUG: soft lockup in smp_call_function @ 2025-11-20 6:48 Xianying Wang 0 siblings, 0 replies; 26+ messages in thread From: Xianying Wang @ 2025-11-20 6:48 UTC (permalink / raw) To: pbonzini; +Cc: vkuznets, kvm, linux-kernel, tglx@linutronix.de Hi, I hit a repeatable soft lockup in csd_lock_wait() via smp_call_function_many_cond() while running a KVM guest with a syzkaller workload. This soft lockup can be triggered by running the attached C reproducer inside a KVM guest for some time. The reproducer just loops perf_event_open() + ioctl(PERF_EVENT_IOC_REFRESH) + socket(AF_INET6, ...) in a child process, while normal userspace (systemd/journald) is running.This may be a soft lockup caused by an incomplete cross-CPU TLB flush (smp_call_function_many_cond / csd_lock_wait). The lockup occurs in csd_lock_wait() in kernel/smp.c (inlined into smp_call_function_many_cond()), with the upper call chain being flush_tlb_mm_range() → kvm_flush_tlb_multi(), triggered by an ext4 fsync(). Since this is a KVM guest and syzkaller typically does a lot of stressing, it looks like a possible race between kvm_flush_tlb_multi() and CPU state (e.g. CPU hotplug / vCPU offlining or an incorrect cpumask) in the paravirt TLB shootdown path, where one target CPU never processes the IPI. This can be reproduced on: HEAD commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a 6fab32bb6508abbb8b7b1c5498e44f0c32320ed5 report: https://pastebin.com/raw/Lu4Tz2SH console output :https://pastebin.com/raw/BxtNEXnq console output v6.17.0:https://pastebin.com/raw/PBytK7Wq kernel config : https://pastebin.com/raw/1grwrT16 C reproducer :https://pastebin.com/raw/ySCpMzk2 Let me know if you need more details or testing. Best regards, Xianying ^ permalink raw reply [flat|nested] 26+ messages in thread
* BUG: soft lockup in smp_call_function @ 2021-09-15 1:59 Hao Sun 2021-09-15 5:12 ` Zhang, Qiang 0 siblings, 1 reply; 26+ messages in thread From: Hao Sun @ 2021-09-15 1:59 UTC (permalink / raw) To: linux-kernel; +Cc: mingo Hello, When using Healer to fuzz the latest Linux kernel, the following two similar crashes was triggered. HEAD commit: 6880fa6c5660 Linux 5.15-rc1 git tree: upstream console output: https://drive.google.com/file/d/1W2g_mj5JPDXI5U7HibVepNdxMdChKS0_/view?usp=sharing kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing similar report: INFO: rcu detected stall in smp_call_function Sorry, I don't have a reproducer for this crash, hope the symbolized report can help. If you fix this issue, please add the following tag to the commit: Reported-by: Hao Sun <sunhao.th@gmail.com> watchdog: BUG: soft lockup - CPU#1 stuck for 157s! [kworker/1:8:14241] Modules linked in: irq event stamp: 371624 hardirqs last enabled at (371623): [<ffffffff84400c02>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 hardirqs last disabled at (371624): [<ffffffff8424818a>] sysvec_apic_timer_interrupt+0xa/0xc0 arch/x86/kernel/apic/apic.c:1097 softirqs last enabled at (337636): [<ffffffff84600398>] softirq_handle_end kernel/softirq.c:401 [inline] softirqs last enabled at (337636): [<ffffffff84600398>] __do_softirq+0x398/0x561 kernel/softirq.c:587 softirqs last disabled at (337625): [<ffffffff81235b42>] invoke_softirq kernel/softirq.c:432 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] __irq_exit_rcu kernel/softirq.c:636 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 CPU: 1 PID: 14241 Comm: kworker/1:8 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a3/0x220 kernel/smp.c:745 Code: 48 c7 c3 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 RSP: 0018:ffffc90001d4fd00 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88813dc2a880 RCX: 0000000000000000 RDX: ffff888112ac8000 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90001d4fd90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001d4fda8 R11: 0000000000000002 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a15807ccf0 CR3: 0000000025f6b000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 1 to CPUs 0,2-3: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire+0x14f/0x1d60 kernel/locking/lockdep.c:4955 Code: 00 48 89 43 08 48 8b 04 24 48 89 43 18 41 8b b5 dc 09 00 00 65 8b 05 08 ed d5 7e 85 c0 0f 95 c2 31 c0 85 f6 44 89 fe 0f 95 c0 <83> e1 03 c1 e6 07 8d 14 50 0f b6 43 21 83 e2 03 c1 e2 05 83 e0 1f RSP: 0018:ffffc90000003c40 EFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffffff858b2d70 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88813dd1a3d8 RBP: 0000000000000057 R08: 0000000000000001 R09: 0000000000000001 R10: ffff88813dd1a3d8 R11: 0000000000000003 R12: 0000000000000001 R13: ffffffff858b2300 R14: ffffffff858b2cf8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a1580bf6d7 CR3: 0000000025f6b000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <IRQ> lock_acquire+0x1f9/0x340 kernel/locking/lockdep.c:5625 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x38/0x50 kernel/locking/spinlock.c:162 lock_hrtimer_base+0x2f/0x70 kernel/time/hrtimer.c:173 hrtimer_try_to_cancel+0x6d/0x270 kernel/time/hrtimer.c:1331 hrtimer_cancel+0x12/0x30 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0xca/0x130 drivers/gpu/drm/drm_vblank.c:478 vblank_disable_fn+0x83/0xa0 drivers/gpu/drm/drm_vblank.c:495 call_timer_fn+0xcb/0x3f0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6bd/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x9e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:default_idle+0xb/0x10 arch/x86/kernel/process.c:717 Code: 60 ce 25 84 75 09 48 83 c4 18 5b 5d 41 5c c3 e8 ab c4 fe ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d ff 00 5c 00 fb f4 <c3> 0f 1f 40 00 65 48 8b 04 25 40 70 01 00 f0 80 48 02 20 48 8b 10 RSP: 0018:ffffffff85803e90 EFLAGS: 00000202 RAX: 00000000010c94ab RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: ffffffff853cbec6 RBP: ffffffff860d9290 R08: 0000000000000001 R09: 0000000000000001 R10: ffffffff85803be8 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff858b2300 default_idle_call+0x6a/0x260 kernel/sched/idle.c:112 cpuidle_idle_call kernel/sched/idle.c:194 [inline] do_idle+0x1e9/0x2b0 kernel/sched/idle.c:306 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:403 start_kernel+0x839/0x860 init/main.c:1141 secondary_startup_64_no_verify+0xb0/0xbb NMI backtrace for cpu 2 CPU: 2 PID: 9794 Comm: sshd Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__sanitizer_cov_trace_pc+0x37/0x60 kernel/kcov.c:197 Code: 65 48 8b 14 25 40 70 01 00 81 e1 00 01 00 00 a9 00 01 ff 00 74 0e 85 c9 74 35 8b 82 34 15 00 00 85 c0 74 2b 8b 82 10 15 00 00 <83> f8 02 75 20 48 8b 8a 18 15 00 00 8b 92 14 15 00 00 48 8b 01 48 RSP: 0018:ffffc900008cfa80 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88807dc318e0 RCX: 0000000000000000 RDX: ffff88800e9c4480 RSI: ffffffff8132ebc0 RDI: 00000000ffffffff RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc900008cfa30 R11: 0000000000000000 R12: ffff88807dd2a900 R13: ffff88807dd2a908 R14: ffffffff85a27620 R15: 0000000000000000 FS: 00007f831f8568c0(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd76d1df1f CR3: 000000000fe3c000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] csd_lock_wait kernel/smp.c:440 [inline] smp_call_function_many_cond+0x1d0/0x550 kernel/smp.c:969 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 __purge_vmap_area_lazy+0xd7/0xa40 mm/vmalloc.c:1683 _vm_unmap_aliases+0x233/0x280 mm/vmalloc.c:2107 change_page_attr_set_clr+0xb3/0x340 arch/x86/mm/pat/set_memory.c:1740 change_page_attr_clear arch/x86/mm/pat/set_memory.c:1797 [inline] set_memory_ro+0x2b/0x40 arch/x86/mm/pat/set_memory.c:1943 bpf_jit_binary_lock_ro include/linux/filter.h:890 [inline] bpf_int_jit_compile+0x65d/0x6b0 arch/x86/net/bpf_jit_comp.c:2338 bpf_prog_select_runtime+0x173/0x1e0 kernel/bpf/core.c:1914 bpf_migrate_filter net/core/filter.c:1295 [inline] bpf_prepare_filter+0x40d/0x5d0 net/core/filter.c:1343 bpf_prog_create_from_user+0xcf/0x150 net/core/filter.c:1437 seccomp_prepare_filter kernel/seccomp.c:666 [inline] seccomp_prepare_user_filter kernel/seccomp.c:703 [inline] seccomp_set_mode_filter kernel/seccomp.c:1824 [inline] do_seccomp+0x2ca/0xdb0 kernel/seccomp.c:1944 prctl_set_seccomp+0x2f/0x60 kernel/seccomp.c:1997 __do_sys_prctl kernel/sys.c:2346 [inline] __se_sys_prctl kernel/sys.c:2264 [inline] __x64_sys_prctl+0xd4/0x770 kernel/sys.c:2264 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f831d9cd54a Code: 48 8b 0d 51 f9 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e f9 2a 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd76d1c428 EFLAGS: 00000246 ORIG_RAX: 000000000000009d RAX: ffffffffffffffda RBX: 00005597d3a36f20 RCX: 00007f831d9cd54a RDX: 00005597d32bdf50 RSI: 0000000000000002 RDI: 0000000000000016 RBP: 00005597d3a35d40 R08: 0000000000000000 R09: 0000000000000005 R10: 00007f831d9cd54a R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: 00007ffd76d1c870 NMI backtrace for cpu 3 CPU: 3 PID: 9782 Comm: syz-executor Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:kvm_wait+0x4a/0x70 arch/x86/kernel/kvm.c:893 Code: 5d c3 89 f3 48 89 fd 9c 58 fa f6 c4 02 75 30 0f b6 45 00 38 c3 74 16 e8 94 82 25 00 fb 5b 5d c3 eb 07 0f 00 2d 95 c1 6d 03 f4 <5b> 5d c3 e8 7e 82 25 00 eb 07 0f 00 2d 83 c1 6d 03 fb f4 eb c0 e8 RSP: 0000:ffffc90000768e00 EFLAGS: 00000046 RAX: 0000000000000003 RBX: ffff888100998338 RCX: 0000000000000008 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888100998338 RBP: ffff88813dd2a600 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000768d20 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000100000 FS: 00007f849523b700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020400010 CR3: 0000000025f6b000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:597 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x262/0x330 kernel/locking/qspinlock.c:508 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:585 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0xb6/0xc0 kernel/locking/spinlock_debug.c:115 spin_lock include/linux/spinlock.h:363 [inline] drm_handle_vblank+0x86/0x530 drivers/gpu/drm/drm_vblank.c:1951 vkms_vblank_simulate+0x5a/0x190 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0xb8/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 Code: 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb 48 83 c7 18 e8 be ce 05 fd 48 89 df e8 f6 06 06 fd e8 e1 19 14 fd fb bf 01 00 00 00 <e8> 76 9c 01 fd 65 8b 05 df 93 db 7b 85 c0 74 02 5b c3 e8 9b 4c da RSP: 0000:ffffc90001503e18 EFLAGS: 00000202 RAX: 00000000000b619e RBX: ffff88813dd18240 RCX: 0000000000000100 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: 0000000000000001 RBP: ffffc90001503e50 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001503e28 R11: 0000000000000001 R12: ffff88810d86c9c0 R13: ffffc90001503e50 R14: ffffffff839a5e10 R15: ffffffff860d9290 expire_timers kernel/time/timer.c:1465 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6ae/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x52/0xc0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0033:0x418dcf Code: 1f 84 00 00 00 00 00 0f 1f 40 00 64 8b 04 25 08 03 00 00 41 89 c3 41 83 cb 02 41 39 c3 74 17 f0 64 44 0f b1 1c 25 08 03 00 00 <75> e7 41 83 e3 bb 41 83 fb 0a 74 01 c3 48 83 ec 08 64 48 c7 04 25 RSP: 002b:00007f849523a9b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: 00000000204413a0 ---------------- Code disassembly (best guess): 0: 48 c7 c3 80 a8 02 00 mov $0x2a880,%rbx 7: e8 cb b1 f1 02 callq 0x2f1b1d7 c: 89 c0 mov %eax,%eax e: 48 83 f8 07 cmp $0x7,%rax 12: 77 6c ja 0x80 14: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 1b: 85 1c: 8b 43 08 mov 0x8(%rbx),%eax 1f: a8 01 test $0x1,%al 21: 74 0e je 0x31 23: e8 3f 9d 04 00 callq 0x49d67 28: f3 90 pause * 2a: 8b 43 08 mov 0x8(%rbx),%eax <-- trapping instruction 2d: a8 01 test $0x1,%al 2f: 75 f2 jne 0x23 31: e8 31 9d 04 00 callq 0x49d67 36: 83 4b 08 01 orl $0x1,0x8(%rbx) 3a: 48 89 de mov %rbx,%rsi 3d: 44 89 e7 mov %r12d,%edi ===== INFO: rcu detected stall in smp_call_function rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (3203 ticks this GP) idle=417/1/0x4000000000000000 softirq=25875/25875 fqs=485 (t=10500 jiffies g=37621 q=38) rcu: rcu_preempt kthread starved for 486 jiffies! g37621 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:13832 pid: 16 ppid: 2 flags:0x00004000 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0x323/0xae0 kernel/sched/core.c:6287 schedule+0x36/0xe0 kernel/sched/core.c:6366 schedule_timeout+0x2cb/0x430 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x2f7/0x3e0 kernel/rcu/tree.c:1957 rcu_gp_kthread+0xd0/0x160 kernel/rcu/tree.c:2130 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 3: NMI backtrace for cpu 3 CPU: 3 PID: 14220 Comm: kworker/3:10 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a6/0x220 kernel/smp.c:745 Code: 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 8b 43 08 <a8> 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 4c 89 73 RSP: 0018:ffffc90005793d00 EFLAGS: 00000293 RAX: 0000000000000001 RBX: ffff88813dd2a880 RCX: 0000000000000000 RDX: ffff888103ddc480 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90005793d90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90005793da8 R11: 0000000000000000 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b72e9c2cd8 CR3: 000000000dedc000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 NMI backtrace for cpu 0 CPU: 0 PID: 9 Comm: kworker/u9:0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound toggle_allocation_gate Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105 nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x125/0x17b kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3880 [inline] rcu_sched_clock_irq+0xc20/0x1360 kernel/rcu/tree.c:2599 update_process_times+0xac/0x100 kernel/time/timer.c:1785 tick_sched_handle.isra.20+0x27/0x70 kernel/time/tick-sched.c:226 tick_sched_timer+0x8a/0xb0 kernel/time/tick-sched.c:1421 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x23e/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:smp_call_function_many_cond+0x1d2/0x550 kernel/smp.c:969 Code: a6 04 00 4c 63 fd 49 8b 1c 24 49 83 ff 07 0f 87 31 03 00 00 4a 03 1c fd 80 98 62 85 8b 43 08 a8 01 74 0e e8 40 a6 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 32 a6 04 00 eb ad 48 83 c4 40 5b 5d 41 5c RSP: 0018:ffffc9000067bc80 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88807dd2f9c0 RCX: 0000000000000000 RDX: ffff88800985c480 RSI: ffffffff8132ebc0 RDI: 0000000000000001 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc9000067bc28 R11: 0000000000000000 R12: ffff88807dc2a900 R13: ffff88807dc2a908 R14: ffffffff85a27620 R15: 0000000000000002 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:929 [inline] text_poke_bp_batch+0xb6/0x2c0 arch/x86/kernel/alternative.c:1114 text_poke_flush arch/x86/kernel/alternative.c:1268 [inline] text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1275 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146 jump_label_update+0xbc/0x190 kernel/jump_label.c:830 static_key_enable_cpuslocked+0x77/0xb0 kernel/jump_label.c:177 static_key_enable+0x16/0x20 kernel/jump_label.c:190 toggle_allocation_gate+0x71/0x240 mm/kfence/core.c:626 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 0 to CPUs 2: ---------------- Code disassembly (best guess), 2 bytes skipped: 0: 02 00 add (%rax),%al 2: e8 cb b1 f1 02 callq 0x2f1b1d2 7: 89 c0 mov %eax,%eax 9: 48 83 f8 07 cmp $0x7,%rax d: 77 6c ja 0x7b f: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 16: 85 17: 8b 43 08 mov 0x8(%rbx),%eax 1a: a8 01 test $0x1,%al 1c: 74 0e je 0x2c 1e: e8 3f 9d 04 00 callq 0x49d62 23: f3 90 pause 25: 8b 43 08 mov 0x8(%rbx),%eax * 28: a8 01 test $0x1,%al <-- trapping instruction 2a: 75 f2 jne 0x1e 2c: e8 31 9d 04 00 callq 0x49d62 31: 83 4b 08 01 orl $0x1,0x8(%rbx) 35: 48 89 de mov %rbx,%rsi 38: 44 89 e7 mov %r12d,%edi 3b: 4c rex.WR 3c: 89 .byte 0x89 3d: 73 .byte 0x73 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2021-09-15 1:59 Hao Sun @ 2021-09-15 5:12 ` Zhang, Qiang 0 siblings, 0 replies; 26+ messages in thread From: Zhang, Qiang @ 2021-09-15 5:12 UTC (permalink / raw) To: Hao Sun, linux-kernel@vger.kernel.org; +Cc: mingo@kernel.org Hi Sun It's like csd_lock cannot be acquired, causing the problem Enable CONFIG_CSD_LOCK_WAIT_DEBUG for more information. Thanks Qiang ________________________________________ From: Hao Sun <sunhao.th@gmail.com> Sent: Wednesday, 15 September 2021 09:59 To: linux-kernel@vger.kernel.org Cc: mingo@kernel.org Subject: BUG: soft lockup in smp_call_function [Please note: This e-mail is from an EXTERNAL e-mail address] Hello, When using Healer to fuzz the latest Linux kernel, the following two similar crashes was triggered. HEAD commit: 6880fa6c5660 Linux 5.15-rc1 git tree: upstream console output: https://drive.google.com/file/d/1W2g_mj5JPDXI5U7HibVepNdxMdChKS0_/view?usp=sharing kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing similar report: INFO: rcu detected stall in smp_call_function Sorry, I don't have a reproducer for this crash, hope the symbolized report can help. If you fix this issue, please add the following tag to the commit: Reported-by: Hao Sun <sunhao.th@gmail.com> watchdog: BUG: soft lockup - CPU#1 stuck for 157s! [kworker/1:8:14241] Modules linked in: irq event stamp: 371624 hardirqs last enabled at (371623): [<ffffffff84400c02>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 hardirqs last disabled at (371624): [<ffffffff8424818a>] sysvec_apic_timer_interrupt+0xa/0xc0 arch/x86/kernel/apic/apic.c:1097 softirqs last enabled at (337636): [<ffffffff84600398>] softirq_handle_end kernel/softirq.c:401 [inline] softirqs last enabled at (337636): [<ffffffff84600398>] __do_softirq+0x398/0x561 kernel/softirq.c:587 softirqs last disabled at (337625): [<ffffffff81235b42>] invoke_softirq kernel/softirq.c:432 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] __irq_exit_rcu kernel/softirq.c:636 [inline] softirqs last disabled at (337625): [<ffffffff81235b42>] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 CPU: 1 PID: 14241 Comm: kworker/1:8 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a3/0x220 kernel/smp.c:745 Code: 48 c7 c3 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 RSP: 0018:ffffc90001d4fd00 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88813dc2a880 RCX: 0000000000000000 RDX: ffff888112ac8000 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90001d4fd90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001d4fda8 R11: 0000000000000002 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a15807ccf0 CR3: 0000000025f6b000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 1 to CPUs 0,2-3: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__lock_acquire+0x14f/0x1d60 kernel/locking/lockdep.c:4955 Code: 00 48 89 43 08 48 8b 04 24 48 89 43 18 41 8b b5 dc 09 00 00 65 8b 05 08 ed d5 7e 85 c0 0f 95 c2 31 c0 85 f6 44 89 fe 0f 95 c0 <83> e1 03 c1 e6 07 8d 14 50 0f b6 43 21 83 e2 03 c1 e2 05 83 e0 1f RSP: 0018:ffffc90000003c40 EFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffffff858b2d70 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88813dd1a3d8 RBP: 0000000000000057 R08: 0000000000000001 R09: 0000000000000001 R10: ffff88813dd1a3d8 R11: 0000000000000003 R12: 0000000000000001 R13: ffffffff858b2300 R14: ffffffff858b2cf8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a1580bf6d7 CR3: 0000000025f6b000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <IRQ> lock_acquire+0x1f9/0x340 kernel/locking/lockdep.c:5625 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x38/0x50 kernel/locking/spinlock.c:162 lock_hrtimer_base+0x2f/0x70 kernel/time/hrtimer.c:173 hrtimer_try_to_cancel+0x6d/0x270 kernel/time/hrtimer.c:1331 hrtimer_cancel+0x12/0x30 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0xca/0x130 drivers/gpu/drm/drm_vblank.c:478 vblank_disable_fn+0x83/0xa0 drivers/gpu/drm/drm_vblank.c:495 call_timer_fn+0xcb/0x3f0 kernel/time/timer.c:1421 expire_timers kernel/time/timer.c:1466 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6bd/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x9e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:default_idle+0xb/0x10 arch/x86/kernel/process.c:717 Code: 60 ce 25 84 75 09 48 83 c4 18 5b 5d 41 5c c3 e8 ab c4 fe ff cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d ff 00 5c 00 fb f4 <c3> 0f 1f 40 00 65 48 8b 04 25 40 70 01 00 f0 80 48 02 20 48 8b 10 RSP: 0018:ffffffff85803e90 EFLAGS: 00000202 RAX: 00000000010c94ab RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: ffffffff853cbec6 RBP: ffffffff860d9290 R08: 0000000000000001 R09: 0000000000000001 R10: ffffffff85803be8 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff858b2300 default_idle_call+0x6a/0x260 kernel/sched/idle.c:112 cpuidle_idle_call kernel/sched/idle.c:194 [inline] do_idle+0x1e9/0x2b0 kernel/sched/idle.c:306 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:403 start_kernel+0x839/0x860 init/main.c:1141 secondary_startup_64_no_verify+0xb0/0xbb NMI backtrace for cpu 2 CPU: 2 PID: 9794 Comm: sshd Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:__sanitizer_cov_trace_pc+0x37/0x60 kernel/kcov.c:197 Code: 65 48 8b 14 25 40 70 01 00 81 e1 00 01 00 00 a9 00 01 ff 00 74 0e 85 c9 74 35 8b 82 34 15 00 00 85 c0 74 2b 8b 82 10 15 00 00 <83> f8 02 75 20 48 8b 8a 18 15 00 00 8b 92 14 15 00 00 48 8b 01 48 RSP: 0018:ffffc900008cfa80 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88807dc318e0 RCX: 0000000000000000 RDX: ffff88800e9c4480 RSI: ffffffff8132ebc0 RDI: 00000000ffffffff RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc900008cfa30 R11: 0000000000000000 R12: ffff88807dd2a900 R13: ffff88807dd2a908 R14: ffffffff85a27620 R15: 0000000000000000 FS: 00007f831f8568c0(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd76d1df1f CR3: 000000000fe3c000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] csd_lock_wait kernel/smp.c:440 [inline] smp_call_function_many_cond+0x1d0/0x550 kernel/smp.c:969 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 __purge_vmap_area_lazy+0xd7/0xa40 mm/vmalloc.c:1683 _vm_unmap_aliases+0x233/0x280 mm/vmalloc.c:2107 change_page_attr_set_clr+0xb3/0x340 arch/x86/mm/pat/set_memory.c:1740 change_page_attr_clear arch/x86/mm/pat/set_memory.c:1797 [inline] set_memory_ro+0x2b/0x40 arch/x86/mm/pat/set_memory.c:1943 bpf_jit_binary_lock_ro include/linux/filter.h:890 [inline] bpf_int_jit_compile+0x65d/0x6b0 arch/x86/net/bpf_jit_comp.c:2338 bpf_prog_select_runtime+0x173/0x1e0 kernel/bpf/core.c:1914 bpf_migrate_filter net/core/filter.c:1295 [inline] bpf_prepare_filter+0x40d/0x5d0 net/core/filter.c:1343 bpf_prog_create_from_user+0xcf/0x150 net/core/filter.c:1437 seccomp_prepare_filter kernel/seccomp.c:666 [inline] seccomp_prepare_user_filter kernel/seccomp.c:703 [inline] seccomp_set_mode_filter kernel/seccomp.c:1824 [inline] do_seccomp+0x2ca/0xdb0 kernel/seccomp.c:1944 prctl_set_seccomp+0x2f/0x60 kernel/seccomp.c:1997 __do_sys_prctl kernel/sys.c:2346 [inline] __se_sys_prctl kernel/sys.c:2264 [inline] __x64_sys_prctl+0xd4/0x770 kernel/sys.c:2264 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f831d9cd54a Code: 48 8b 0d 51 f9 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 9d 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e f9 2a 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd76d1c428 EFLAGS: 00000246 ORIG_RAX: 000000000000009d RAX: ffffffffffffffda RBX: 00005597d3a36f20 RCX: 00007f831d9cd54a RDX: 00005597d32bdf50 RSI: 0000000000000002 RDI: 0000000000000016 RBP: 00005597d3a35d40 R08: 0000000000000000 R09: 0000000000000005 R10: 00007f831d9cd54a R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: 00007ffd76d1c870 NMI backtrace for cpu 3 CPU: 3 PID: 9782 Comm: syz-executor Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:kvm_wait+0x4a/0x70 arch/x86/kernel/kvm.c:893 Code: 5d c3 89 f3 48 89 fd 9c 58 fa f6 c4 02 75 30 0f b6 45 00 38 c3 74 16 e8 94 82 25 00 fb 5b 5d c3 eb 07 0f 00 2d 95 c1 6d 03 f4 <5b> 5d c3 e8 7e 82 25 00 eb 07 0f 00 2d 83 c1 6d 03 fb f4 eb c0 e8 RSP: 0000:ffffc90000768e00 EFLAGS: 00000046 RAX: 0000000000000003 RBX: ffff888100998338 RCX: 0000000000000008 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff888100998338 RBP: ffff88813dd2a600 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc90000768d20 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000100 R14: 0000000000000000 R15: 0000000000100000 FS: 00007f849523b700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020400010 CR3: 0000000025f6b000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:597 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x262/0x330 kernel/locking/qspinlock.c:508 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:585 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0xb6/0xc0 kernel/locking/spinlock_debug.c:115 spin_lock include/linux/spinlock.h:363 [inline] drm_handle_vblank+0x86/0x530 drivers/gpu/drm/drm_vblank.c:1951 vkms_vblank_simulate+0x5a/0x190 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0xb8/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline] RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:202 Code: 0f 1f 44 00 00 53 48 8b 74 24 08 48 89 fb 48 83 c7 18 e8 be ce 05 fd 48 89 df e8 f6 06 06 fd e8 e1 19 14 fd fb bf 01 00 00 00 <e8> 76 9c 01 fd 65 8b 05 df 93 db 7b 85 c0 74 02 5b c3 e8 9b 4c da RSP: 0000:ffffc90001503e18 EFLAGS: 00000202 RAX: 00000000000b619e RBX: ffff88813dd18240 RCX: 0000000000000100 RDX: 0000000000000000 RSI: ffffffff85524f3d RDI: 0000000000000001 RBP: ffffc90001503e50 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90001503e28 R11: 0000000000000001 R12: ffff88810d86c9c0 R13: ffffc90001503e50 R14: ffffffff839a5e10 R15: ffffffff860d9290 expire_timers kernel/time/timer.c:1465 [inline] __run_timers kernel/time/timer.c:1734 [inline] run_timer_softirq+0x6ae/0x820 kernel/time/timer.c:1747 __do_softirq+0xe9/0x561 kernel/softirq.c:558 invoke_softirq kernel/softirq.c:432 [inline] __irq_exit_rcu kernel/softirq.c:636 [inline] irq_exit_rcu+0xe2/0x100 kernel/softirq.c:648 sysvec_apic_timer_interrupt+0x52/0xc0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0033:0x418dcf Code: 1f 84 00 00 00 00 00 0f 1f 40 00 64 8b 04 25 08 03 00 00 41 89 c3 41 83 cb 02 41 39 c3 74 17 f0 64 44 0f b1 1c 25 08 03 00 00 <75> e7 41 83 e3 bb 41 83 fb 0a 74 01 c3 48 83 ec 08 64 48 c7 04 25 RSP: 002b:00007f849523a9b8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: 00000000204413a0 ---------------- Code disassembly (best guess): 0: 48 c7 c3 80 a8 02 00 mov $0x2a880,%rbx 7: e8 cb b1 f1 02 callq 0x2f1b1d7 c: 89 c0 mov %eax,%eax e: 48 83 f8 07 cmp $0x7,%rax 12: 77 6c ja 0x80 14: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 1b: 85 1c: 8b 43 08 mov 0x8(%rbx),%eax 1f: a8 01 test $0x1,%al 21: 74 0e je 0x31 23: e8 3f 9d 04 00 callq 0x49d67 28: f3 90 pause * 2a: 8b 43 08 mov 0x8(%rbx),%eax <-- trapping instruction 2d: a8 01 test $0x1,%al 2f: 75 f2 jne 0x23 31: e8 31 9d 04 00 callq 0x49d67 36: 83 4b 08 01 orl $0x1,0x8(%rbx) 3a: 48 89 de mov %rbx,%rsi 3d: 44 89 e7 mov %r12d,%edi ===== INFO: rcu detected stall in smp_call_function rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-....: (3203 ticks this GP) idle=417/1/0x4000000000000000 softirq=25875/25875 fqs=485 (t=10500 jiffies g=37621 q=38) rcu: rcu_preempt kthread starved for 486 jiffies! g37621 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:13832 pid: 16 ppid: 2 flags:0x00004000 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0x323/0xae0 kernel/sched/core.c:6287 schedule+0x36/0xe0 kernel/sched/core.c:6366 schedule_timeout+0x2cb/0x430 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x2f7/0x3e0 kernel/rcu/tree.c:1957 rcu_gp_kthread+0xd0/0x160 kernel/rcu/tree.c:2130 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 3: NMI backtrace for cpu 3 CPU: 3 PID: 14220 Comm: kworker/3:10 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: rcu_gp wait_rcu_exp_gp RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:csd_lock kernel/smp.c:446 [inline] RIP: 0010:smp_call_function_single+0x1a6/0x220 kernel/smp.c:745 Code: 80 a8 02 00 e8 cb b1 f1 02 89 c0 48 83 f8 07 77 6c 48 03 1c c5 80 98 62 85 8b 43 08 a8 01 74 0e e8 3f 9d 04 00 f3 90 8b 43 08 <a8> 01 75 f2 e8 31 9d 04 00 83 4b 08 01 48 89 de 44 89 e7 4c 89 73 RSP: 0018:ffffc90005793d00 EFLAGS: 00000293 RAX: 0000000000000001 RBX: ffff88813dd2a880 RCX: 0000000000000000 RDX: ffff888103ddc480 RSI: ffffffff8132f4c1 RDI: ffffffff852d80e6 RBP: ffffc90005793d90 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc90005793da8 R11: 0000000000000000 R12: 0000000000000002 R13: 0000000000000000 R14: ffffffff812ef7c0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b72e9c2cd8 CR3: 000000000dedc000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: sync_rcu_exp_select_node_cpus+0x27b/0x440 kernel/rcu/tree_exp.h:393 sync_rcu_exp_select_cpus+0x238/0x5d0 kernel/rcu/tree_exp.h:441 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:615 [inline] wait_rcu_exp_gp+0xa/0x20 kernel/rcu/tree_exp.h:629 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 NMI backtrace for cpu 0 CPU: 0 PID: 9 Comm: kworker/u9:0 Not tainted 5.15.0-rc1 #16 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound toggle_allocation_gate Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106 nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105 nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x125/0x17b kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] rcu_pending kernel/rcu/tree.c:3880 [inline] rcu_sched_clock_irq+0xc20/0x1360 kernel/rcu/tree.c:2599 update_process_times+0xac/0x100 kernel/time/timer.c:1785 tick_sched_handle.isra.20+0x27/0x70 kernel/time/tick-sched.c:226 tick_sched_timer+0x8a/0xb0 kernel/time/tick-sched.c:1421 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x23e/0x610 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0xfe/0x280 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x9c/0x2c0 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x99/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] RIP: 0010:smp_call_function_many_cond+0x1d2/0x550 kernel/smp.c:969 Code: a6 04 00 4c 63 fd 49 8b 1c 24 49 83 ff 07 0f 87 31 03 00 00 4a 03 1c fd 80 98 62 85 8b 43 08 a8 01 74 0e e8 40 a6 04 00 f3 90 <8b> 43 08 a8 01 75 f2 e8 32 a6 04 00 eb ad 48 83 c4 40 5b 5d 41 5c RSP: 0018:ffffc9000067bc80 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff88807dd2f9c0 RCX: 0000000000000000 RDX: ffff88800985c480 RSI: ffffffff8132ebc0 RDI: 0000000000000001 RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 R10: ffffc9000067bc28 R11: 0000000000000000 R12: ffff88807dc2a900 R13: ffff88807dc2a908 R14: ffffffff85a27620 R15: 0000000000000002 on_each_cpu_cond_mask+0x48/0x90 kernel/smp.c:1135 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:929 [inline] text_poke_bp_batch+0xb6/0x2c0 arch/x86/kernel/alternative.c:1114 text_poke_flush arch/x86/kernel/alternative.c:1268 [inline] text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1275 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146 jump_label_update+0xbc/0x190 kernel/jump_label.c:830 static_key_enable_cpuslocked+0x77/0xb0 kernel/jump_label.c:177 static_key_enable+0x16/0x20 kernel/jump_label.c:190 toggle_allocation_gate+0x71/0x240 mm/kfence/core.c:626 process_one_work+0x359/0x850 kernel/workqueue.c:2297 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444 kthread+0x178/0x1b0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 0 to CPUs 2: ---------------- Code disassembly (best guess), 2 bytes skipped: 0: 02 00 add (%rax),%al 2: e8 cb b1 f1 02 callq 0x2f1b1d2 7: 89 c0 mov %eax,%eax 9: 48 83 f8 07 cmp $0x7,%rax d: 77 6c ja 0x7b f: 48 03 1c c5 80 98 62 add -0x7a9d6780(,%rax,8),%rbx 16: 85 17: 8b 43 08 mov 0x8(%rbx),%eax 1a: a8 01 test $0x1,%al 1c: 74 0e je 0x2c 1e: e8 3f 9d 04 00 callq 0x49d62 23: f3 90 pause 25: 8b 43 08 mov 0x8(%rbx),%eax * 28: a8 01 test $0x1,%al <-- trapping instruction 2a: 75 f2 jne 0x1e 2c: e8 31 9d 04 00 callq 0x49d62 31: 83 4b 08 01 orl $0x1,0x8(%rbx) 35: 48 89 de mov %rbx,%rsi 38: 44 89 e7 mov %r12d,%edi 3b: 4c rex.WR 3c: 89 .byte 0x89 3d: 73 .byte 0x73 ^ permalink raw reply [flat|nested] 26+ messages in thread
* BUG: soft lockup in smp_call_function
@ 2020-07-12 23:02 syzbot
2020-07-13 6:53 ` Dmitry Vyukov
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: syzbot @ 2020-07-12 23:02 UTC (permalink / raw)
To: bp, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86
Hello,
syzbot found the following crash on:
HEAD commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17183f0b100000
kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b
dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9
compiler: gcc (GCC) 10.1.0-syz 20200507
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com
watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.5:6928]
Modules linked in:
irq event stamp: 45064
hardirqs last enabled at (45063): [<ffffffff88000c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587
hardirqs last disabled at (45064): [<ffffffff87e4f2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649
softirqs last enabled at (44672): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319
softirqs last disabled at (44667): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
CPU: 0 PID: 6928 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline]
RIP: 0010:smp_call_function_single+0x197/0x4f0 kernel/smp.c:382
Code: 48 8d 74 24 40 48 89 44 24 50 48 8b 44 24 08 48 89 44 24 58 e8 fa f9 ff ff 41 89 c5 eb 07 e8 90 00 0b 00 f3 90 44 8b 64 24 48 <31> ff 41 83 e4 01 44 89 e6 e8 fb fc 0a 00 45 85 e4 75 e1 e8 71 00
RSP: 0018:ffffc90008b27aa0 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 1ffff92001164f58 RCX: ffffffff8168b985
RDX: ffff88805cd7e4c0 RSI: ffffffff8168b970 RDI: 0000000000000005
RBP: ffffc90008b27b68 R08: 0000000000000001 R09: ffff8880ae736dc7
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000011
R13: 0000000000000000 R14: 0000000000000001 R15: ffffc90008b27c88
FS: 00000000014c3940(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556f55a93ed3 CR3: 0000000213330000 CR4: 00000000001406f0
DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
task_function_call+0xd7/0x160 kernel/events/core.c:116
event_function_call+0x20a/0x420 kernel/events/core.c:283
perf_remove_from_context+0x9b/0x1c0 kernel/events/core.c:2338
perf_event_release_kernel+0x485/0xe00 kernel/events/core.c:4918
perf_release+0x33/0x40 kernel/events/core.c:5014
__fput+0x33c/0x880 fs/file_table.c:281
task_work_run+0xdd/0x190 kernel/task_work.c:135
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:239 [inline]
__prepare_exit_to_usermode+0x1e9/0x1f0 arch/x86/entry/common.c:269
do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:393
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x416721
Code: Bad RIP value.
RSP: 002b:00007ffdb12d68f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000416721
RDX: 0000000000000000 RSI: 0000000000000e0d RDI: 0000000000000004
RBP: 0000000000000001 R08: 00000000569bce0a R09: 00000000569bce0e
R10: 00007ffdb12d69e0 R11: 0000000000000293 R12: 000000000078c900
R13: 000000000078c900 R14: ffffffffffffffff R15: 000000000078bf0c
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 6932 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:hlock_class kernel/locking/lockdep.c:179 [inline]
RIP: 0010:mark_lock+0xc4/0x1710 kernel/locking/lockdep.c:3937
Code: 20 66 81 e3 ff 1f 0f b7 db be 08 00 00 00 48 89 d8 48 c1 f8 06 48 8d 3c c5 20 3a 59 8c e8 e4 6d 59 00 48 0f a3 1d 6c b6 ff 0a <0f> 83 c1 00 00 00 48 69 db b8 00 00 00 48 81 c3 40 3e 59 8c 48 8d
RSP: 0018:ffffc90000da8a48 EFLAGS: 00000047
RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff815983ac
RDX: fffffbfff18b2745 RSI: 0000000000000008 RDI: ffffffff8c593a20
RBP: ffff8880336c0540 R08: 0000000000000000 R09: ffffffff8c593a27
R10: fffffbfff18b2744 R11: 0000000000000001 R12: 0000000000000008
R13: ffff8880336c0e5a R14: ffff8880336c0e38 R15: ffff8880336c0e10
FS: 00007fd045d50700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000012ec08cc CR3: 0000000213330000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
<IRQ>
mark_usage kernel/locking/lockdep.c:3885 [inline]
__lock_acquire+0x944/0x56e0 kernel/locking/lockdep.c:4334
lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x8c/0xc0 kernel/locking/spinlock.c:159
debug_object_deactivate lib/debugobjects.c:710 [inline]
debug_object_deactivate+0x101/0x300 lib/debugobjects.c:698
debug_hrtimer_deactivate kernel/time/hrtimer.c:421 [inline]
debug_deactivate kernel/time/hrtimer.c:482 [inline]
__run_hrtimer kernel/time/hrtimer.c:1488 [inline]
__hrtimer_run_queues+0x3cb/0xfc0 kernel/time/hrtimer.c:1584
hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline]
__sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587
RIP: 0010:arch_static_branch arch/x86/include/asm/jump_label.h:25 [inline]
RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline]
RIP: 0010:native_write_msr+0x6/0x20 arch/x86/include/asm/msr.h:164
Code: c3 89 ef e8 0c 0b f9 ff b8 40 00 00 00 0f 00 d8 89 ef e8 dd 0b f9 ff eb d2 e8 96 6a b2 06 66 0f 1f 44 00 00 89 f9 89 f0 0f 30 <0f> 1f 44 00 00 c3 48 c1 e2 20 89 f6 48 09 d6 31 d2 e9 f4 7c 7a 02
RSP: 0018:ffffc90008b87970 EFLAGS: 00000282
RAX: 00000000000000fb RBX: ffff8880ae619374 RCX: 0000000000000830
RDX: 0000000000000000 RSI: 00000000000000fb RDI: 0000000000000830
RBP: 00000000000000fb R08: 0000000000000001 R09: ffff8880ae636dc7
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000040
paravirt_write_msr arch/x86/include/asm/paravirt.h:173 [inline]
wrmsrl arch/x86/include/asm/paravirt.h:206 [inline]
native_x2apic_icr_write arch/x86/include/asm/apic.h:249 [inline]
__x2apic_send_IPI_dest arch/x86/kernel/apic/x2apic_phys.c:112 [inline]
x2apic_send_IPI+0xb5/0x100 arch/x86/kernel/apic/x2apic_phys.c:41
__smp_call_single_queue kernel/smp.c:150 [inline]
generic_exec_single+0x118/0x480 kernel/smp.c:181
smp_call_function_single+0x186/0x4f0 kernel/smp.c:379
smp_call_function_many_cond+0x1a4/0x990 kernel/smp.c:518
smp_call_function_many kernel/smp.c:577 [inline]
smp_call_function kernel/smp.c:599 [inline]
on_each_cpu+0x4a/0x240 kernel/smp.c:699
clock_was_set+0x18/0x20 kernel/time/hrtimer.c:872
timekeeping_inject_offset+0x3e9/0x4d0 kernel/time/timekeeping.c:1305
do_adjtimex+0x28f/0x990 kernel/time/timekeeping.c:2332
do_clock_adjtime kernel/time/posix-timers.c:1109 [inline]
__do_sys_clock_adjtime+0x155/0x250 kernel/time/posix-timers.c:1121
do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cba9
Code: Bad RIP value.
RSP: 002b:00007fd045d4fc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000131
RAX: ffffffffffffffda RBX: 00000000004dbac0 RCX: 000000000045cba9
RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000000
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000077 R14: 00000000004c34f3 R15: 00007fd045d506d4
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: BUG: soft lockup in smp_call_function 2020-07-12 23:02 syzbot @ 2020-07-13 6:53 ` Dmitry Vyukov 2020-07-13 8:58 ` syzbot 2020-07-15 5:43 ` syzbot 2 siblings, 0 replies; 26+ messages in thread From: Dmitry Vyukov @ 2020-07-13 6:53 UTC (permalink / raw) To: syzbot Cc: Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, the arch/x86 maintainers On Mon, Jul 13, 2020 at 1:02 AM syzbot <syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following crash on: > > HEAD commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=17183f0b100000 > kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b > dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 > compiler: gcc (GCC) 10.1.0-syz 20200507 > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com It looks like something broken in the kernel recently and now instead of diagnosing a stall on one CPU, it diagnoses it as a stall in smp_call_function on another CPU. This produces a large number of actionable crash reports. > watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.5:6928] > Modules linked in: > irq event stamp: 45064 > hardirqs last enabled at (45063): [<ffffffff88000c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 > hardirqs last disabled at (45064): [<ffffffff87e4f2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649 > softirqs last enabled at (44672): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319 > softirqs last disabled at (44667): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 > CPU: 0 PID: 6928 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline] > RIP: 0010:smp_call_function_single+0x197/0x4f0 kernel/smp.c:382 > Code: 48 8d 74 24 40 48 89 44 24 50 48 8b 44 24 08 48 89 44 24 58 e8 fa f9 ff ff 41 89 c5 eb 07 e8 90 00 0b 00 f3 90 44 8b 64 24 48 <31> ff 41 83 e4 01 44 89 e6 e8 fb fc 0a 00 45 85 e4 75 e1 e8 71 00 > RSP: 0018:ffffc90008b27aa0 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: 1ffff92001164f58 RCX: ffffffff8168b985 > RDX: ffff88805cd7e4c0 RSI: ffffffff8168b970 RDI: 0000000000000005 > RBP: ffffc90008b27b68 R08: 0000000000000001 R09: ffff8880ae736dc7 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000011 > R13: 0000000000000000 R14: 0000000000000001 R15: ffffc90008b27c88 > FS: 00000000014c3940(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000556f55a93ed3 CR3: 0000000213330000 CR4: 00000000001406f0 > DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 > Call Trace: > task_function_call+0xd7/0x160 kernel/events/core.c:116 > event_function_call+0x20a/0x420 kernel/events/core.c:283 > perf_remove_from_context+0x9b/0x1c0 kernel/events/core.c:2338 > perf_event_release_kernel+0x485/0xe00 kernel/events/core.c:4918 > perf_release+0x33/0x40 kernel/events/core.c:5014 > __fput+0x33c/0x880 fs/file_table.c:281 > task_work_run+0xdd/0x190 kernel/task_work.c:135 > tracehook_notify_resume include/linux/tracehook.h:188 [inline] > exit_to_usermode_loop arch/x86/entry/common.c:239 [inline] > __prepare_exit_to_usermode+0x1e9/0x1f0 arch/x86/entry/common.c:269 > do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:393 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x416721 > Code: Bad RIP value. > RSP: 002b:00007ffdb12d68f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 > RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000416721 > RDX: 0000000000000000 RSI: 0000000000000e0d RDI: 0000000000000004 > RBP: 0000000000000001 R08: 00000000569bce0a R09: 00000000569bce0e > R10: 00007ffdb12d69e0 R11: 0000000000000293 R12: 000000000078c900 > R13: 000000000078c900 R14: ffffffffffffffff R15: 000000000078bf0c > Sending NMI from CPU 0 to CPUs 1: > NMI backtrace for cpu 1 > CPU: 1 PID: 6932 Comm: syz-executor.5 Not tainted 5.8.0-rc4-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:hlock_class kernel/locking/lockdep.c:179 [inline] > RIP: 0010:mark_lock+0xc4/0x1710 kernel/locking/lockdep.c:3937 > Code: 20 66 81 e3 ff 1f 0f b7 db be 08 00 00 00 48 89 d8 48 c1 f8 06 48 8d 3c c5 20 3a 59 8c e8 e4 6d 59 00 48 0f a3 1d 6c b6 ff 0a <0f> 83 c1 00 00 00 48 69 db b8 00 00 00 48 81 c3 40 3e 59 8c 48 8d > RSP: 0018:ffffc90000da8a48 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff815983ac > RDX: fffffbfff18b2745 RSI: 0000000000000008 RDI: ffffffff8c593a20 > RBP: ffff8880336c0540 R08: 0000000000000000 R09: ffffffff8c593a27 > R10: fffffbfff18b2744 R11: 0000000000000001 R12: 0000000000000008 > R13: ffff8880336c0e5a R14: ffff8880336c0e38 R15: ffff8880336c0e10 > FS: 00007fd045d50700(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000012ec08cc CR3: 0000000213330000 CR4: 00000000001406e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 > Call Trace: > <IRQ> > mark_usage kernel/locking/lockdep.c:3885 [inline] > __lock_acquire+0x944/0x56e0 kernel/locking/lockdep.c:4334 > lock_acquire+0x1f1/0xad0 kernel/locking/lockdep.c:4959 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x8c/0xc0 kernel/locking/spinlock.c:159 > debug_object_deactivate lib/debugobjects.c:710 [inline] > debug_object_deactivate+0x101/0x300 lib/debugobjects.c:698 > debug_hrtimer_deactivate kernel/time/hrtimer.c:421 [inline] > debug_deactivate kernel/time/hrtimer.c:482 [inline] > __run_hrtimer kernel/time/hrtimer.c:1488 [inline] > __hrtimer_run_queues+0x3cb/0xfc0 kernel/time/hrtimer.c:1584 > hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] > __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 > asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 > </IRQ> > __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] > run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] > sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 > asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 > RIP: 0010:arch_static_branch arch/x86/include/asm/jump_label.h:25 [inline] > RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline] > RIP: 0010:native_write_msr+0x6/0x20 arch/x86/include/asm/msr.h:164 > Code: c3 89 ef e8 0c 0b f9 ff b8 40 00 00 00 0f 00 d8 89 ef e8 dd 0b f9 ff eb d2 e8 96 6a b2 06 66 0f 1f 44 00 00 89 f9 89 f0 0f 30 <0f> 1f 44 00 00 c3 48 c1 e2 20 89 f6 48 09 d6 31 d2 e9 f4 7c 7a 02 > RSP: 0018:ffffc90008b87970 EFLAGS: 00000282 > RAX: 00000000000000fb RBX: ffff8880ae619374 RCX: 0000000000000830 > RDX: 0000000000000000 RSI: 00000000000000fb RDI: 0000000000000830 > RBP: 00000000000000fb R08: 0000000000000001 R09: ffff8880ae636dc7 > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000040 > paravirt_write_msr arch/x86/include/asm/paravirt.h:173 [inline] > wrmsrl arch/x86/include/asm/paravirt.h:206 [inline] > native_x2apic_icr_write arch/x86/include/asm/apic.h:249 [inline] > __x2apic_send_IPI_dest arch/x86/kernel/apic/x2apic_phys.c:112 [inline] > x2apic_send_IPI+0xb5/0x100 arch/x86/kernel/apic/x2apic_phys.c:41 > __smp_call_single_queue kernel/smp.c:150 [inline] > generic_exec_single+0x118/0x480 kernel/smp.c:181 > smp_call_function_single+0x186/0x4f0 kernel/smp.c:379 > smp_call_function_many_cond+0x1a4/0x990 kernel/smp.c:518 > smp_call_function_many kernel/smp.c:577 [inline] > smp_call_function kernel/smp.c:599 [inline] > on_each_cpu+0x4a/0x240 kernel/smp.c:699 > clock_was_set+0x18/0x20 kernel/time/hrtimer.c:872 > timekeeping_inject_offset+0x3e9/0x4d0 kernel/time/timekeeping.c:1305 > do_adjtimex+0x28f/0x990 kernel/time/timekeeping.c:2332 > do_clock_adjtime kernel/time/posix-timers.c:1109 [inline] > __do_sys_clock_adjtime+0x155/0x250 kernel/time/posix-timers.c:1121 > do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x45cba9 > Code: Bad RIP value. > RSP: 002b:00007fd045d4fc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000131 > RAX: ffffffffffffffda RBX: 00000000004dbac0 RCX: 000000000045cba9 > RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000000 > RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff > R13: 0000000000000077 R14: 00000000004c34f3 R15: 00007fd045d506d4 > > > --- > This bug is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this bug report. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > -- > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com. > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000080a5f05aa4691a8%40google.com. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2020-07-12 23:02 syzbot 2020-07-13 6:53 ` Dmitry Vyukov @ 2020-07-13 8:58 ` syzbot 2020-07-15 5:43 ` syzbot 2 siblings, 0 replies; 26+ messages in thread From: syzbot @ 2020-07-13 8:58 UTC (permalink / raw) To: bp, dvyukov, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86 syzbot has found a reproducer for the following crash on: HEAD commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15998b47100000 kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11a160bf100000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16d2bd77100000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com watchdog: BUG: soft lockup - CPU#1 stuck for 123s! [syz-executor644:6902] Modules linked in: irq event stamp: 173384 hardirqs last enabled at (173383): [<ffffffff88000c42>] asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 hardirqs last disabled at (173384): [<ffffffff87e4f2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649 softirqs last enabled at (172872): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319 softirqs last disabled at (172863): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 CPU: 1 PID: 6902 Comm: syz-executor644 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:csd_lock_wait kernel/smp.c:108 [inline] RIP: 0010:smp_call_function_single+0x192/0x4f0 kernel/smp.c:382 Code: 10 8b 7c 24 1c 48 8d 74 24 40 48 89 44 24 50 48 8b 44 24 08 48 89 44 24 58 e8 fa f9 ff ff 41 89 c5 eb 07 e8 90 00 0b 00 f3 90 <44> 8b 64 24 48 31 ff 41 83 e4 01 44 89 e6 e8 fb fc 0a 00 45 85 e4 RSP: 0018:ffffc900023a7a60 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 1ffff92000474f50 RCX: ffffffff8168b985 RDX: ffff8880955564c0 RSI: ffffffff8168b970 RDI: 0000000000000005 RBP: ffffc900023a7b28 R08: 0000000000000001 R09: ffff8880ae636dc7 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c8aa8 CR3: 0000000009a79000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: smp_call_function_many_cond+0x1a4/0x990 kernel/smp.c:518 __flush_tlb_others arch/x86/include/asm/paravirt.h:74 [inline] flush_tlb_others arch/x86/mm/tlb.c:856 [inline] flush_tlb_mm_range+0x295/0x4d0 arch/x86/mm/tlb.c:943 tlb_flush arch/x86/include/asm/tlb.h:24 [inline] tlb_flush_mmu_tlbonly include/asm-generic/tlb.h:425 [inline] tlb_flush_mmu_tlbonly include/asm-generic/tlb.h:415 [inline] tlb_flush_mmu mm/mmu_gather.c:248 [inline] tlb_finish_mmu+0x3e7/0x8c0 mm/mmu_gather.c:328 exit_mmap+0x2d1/0x510 mm/mmap.c:3152 __mmput+0x122/0x470 kernel/fork.c:1093 mmput+0x53/0x60 kernel/fork.c:1114 exit_mm kernel/exit.c:482 [inline] do_exit+0xa8f/0x2a40 kernel/exit.c:792 do_group_exit+0x125/0x310 kernel/exit.c:903 __do_sys_exit_group kernel/exit.c:914 [inline] __se_sys_exit_group kernel/exit.c:912 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x442478 Code: Bad RIP value. RSP: 002b:00007ffeb3621f28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000442478 RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001 RBP: 00000000004c8a70 R08: 00000000000000e7 R09: ffffffffffffffd0 R10: 0000000001bbbbbb R11: 0000000000000246 R12: 0000000000000001 R13: 00000000006dba40 R14: 0000000000000000 R15: 0000000000000000 Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 7110 Comm: kworker/0:0 Not tainted 5.8.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-crypt-wg1 wg_packet_tx_worker RIP: 0010:__lock_is_held kernel/locking/lockdep.c:4733 [inline] RIP: 0010:lock_is_held_type+0x72/0xe0 kernel/locking/lockdep.c:4996 Code: 00 00 85 c0 7f 0d eb 72 83 c3 01 3b 9d c8 08 00 00 7d 67 48 63 c3 4c 89 e6 48 8d 04 80 4d 8d 74 c5 00 4c 89 f7 e8 1e fe ff ff <85> c0 74 da 8b 4c 24 04 41 bc 01 00 00 00 83 f9 ff 74 11 41 0f b6 RSP: 0018:ffffc90000007d38 EFLAGS: 00000046 RAX: 0000000000000001 RBX: 0000000000000006 RCX: ffffffff86626577 RDX: dffffc0000000000 RSI: ffff888093ce4300 RDI: ffff888094a20c00 RBP: ffff888094a20240 R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888093ce4300 R13: ffff888094a20b10 R14: ffff888094a20c00 R15: 0000000000000082 FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000610 CR3: 000000009e49a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> lock_is_held include/linux/lockdep.h:425 [inline] advance_sched+0x638/0x990 net/sched/sch_taprio.c:702 __run_hrtimer kernel/time/hrtimer.c:1520 [inline] __hrtimer_run_queues+0x6a9/0xfc0 kernel/time/hrtimer.c:1584 hrtimer_interrupt+0x32a/0x930 kernel/time/hrtimer.c:1646 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1080 [inline] __sysvec_apic_timer_interrupt+0x142/0x5e0 arch/x86/kernel/apic/apic.c:1097 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline] sysvec_apic_timer_interrupt+0xe0/0x120 arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:587 RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:765 [inline] RIP: 0010:seqcount_lockdep_reader_access include/linux/seqlock.h:99 [inline] RIP: 0010:read_seqcount_begin.constprop.0+0x110/0x1f0 include/linux/seqlock.h:182 Code: 00 fc ff df 48 c1 e8 03 80 3c 10 00 0f 85 d0 00 00 00 48 83 3d 90 ff 44 03 00 0f 84 92 00 00 00 e8 05 d9 03 fb 48 89 df 57 9d <0f> 1f 44 00 00 e8 f6 d8 03 fb 44 8b 25 6f 4a 3f 04 31 ff 44 89 e3 RSP: 0018:ffffc90001b67630 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 0000000000000293 RCX: 0000000000000003 RDX: ffff888094a20240 RSI: ffffffff866fe0fb RDI: 0000000000000293 RBP: 0000000000000200 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8880941ed2c0 R13: 0000000011108e1c R14: 0000000000000000 R15: dffffc0000000000 nf_conntrack_get_ht include/net/netfilter/nf_conntrack.h:300 [inline] ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:740 [inline] __nf_conntrack_find_get+0xcb/0xf30 net/netfilter/nf_conntrack_core.c:778 resolve_normal_ct net/netfilter/nf_conntrack_core.c:1684 [inline] nf_conntrack_in+0x484/0x12a0 net/netfilter/nf_conntrack_core.c:1846 ipv4_conntrack_local+0x11c/0x220 net/netfilter/nf_conntrack_proto.c:200 nf_hook_entry_hookfn include/linux/netfilter.h:135 [inline] nf_hook_slow+0xc5/0x1e0 net/netfilter/core.c:512 nf_hook+0x2b8/0x540 include/linux/netfilter.h:262 __ip_local_out+0x26e/0x530 net/ipv4/ip_output.c:114 ip_local_out+0x26/0x1a0 net/ipv4/ip_output.c:123 iptunnel_xmit+0x5a0/0x9b0 net/ipv4/ip_tunnel_core.c:81 send4+0x4c6/0xd70 drivers/net/wireguard/socket.c:85 wg_socket_send_skb_to_peer+0x192/0x220 drivers/net/wireguard/socket.c:174 wg_packet_create_data_done drivers/net/wireguard/send.c:252 [inline] wg_packet_tx_worker+0x2f9/0x980 drivers/net/wireguard/send.c:280 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:291 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293 INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 0.000 msecs ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2020-07-12 23:02 syzbot 2020-07-13 6:53 ` Dmitry Vyukov 2020-07-13 8:58 ` syzbot @ 2020-07-15 5:43 ` syzbot 2 siblings, 0 replies; 26+ messages in thread From: syzbot @ 2020-07-15 5:43 UTC (permalink / raw) To: bp, davem, dvyukov, hpa, jhs, jiri, keescook, linux-kernel, luto, luto, mingo, netdev, syzkaller-bugs, tglx, vinicius.gomes, wad, x86, xiyou.wangcong syzbot has bisected this issue to: commit 5a781ccbd19e4664babcbe4b4ead7aa2b9283d22 Author: Vinicius Costa Gomes <vinicius.gomes@intel.com> Date: Sat Sep 29 00:59:43 2018 +0000 tc: Add support for configuring the taprio scheduler bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1196d06f100000 start commit: 4437dd6e Merge tag 'io_uring-5.8-2020-07-12' of git://git... git tree: upstream final oops: https://syzkaller.appspot.com/x/report.txt?x=1396d06f100000 console output: https://syzkaller.appspot.com/x/log.txt?x=1596d06f100000 kernel config: https://syzkaller.appspot.com/x/.config?x=66ad203c2bb6d8b dashboard link: https://syzkaller.appspot.com/bug?extid=cce3691658bef1b12ac9 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11a160bf100000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16d2bd77100000 Reported-by: syzbot+cce3691658bef1b12ac9@syzkaller.appspotmail.com Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2025-11-20 6:48 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:21 ` drm/vkms: deadlock between dev->event_lock and timer Tetsuo Handa 2023-09-13 14:21 ` Tetsuo Handa 2023-09-13 16:47 ` Linus Torvalds 2023-09-13 16:47 ` Linus Torvalds 2023-09-13 21:08 ` Thomas Gleixner 2023-09-13 21:08 ` Thomas Gleixner 2023-09-14 6:33 ` Tetsuo Handa 2023-09-14 6:33 ` Tetsuo Handa 2023-09-14 8:12 ` Daniel Vetter 2023-09-14 8:12 ` Daniel Vetter 2023-09-18 22:02 ` Helen Koike 2023-09-19 6:38 ` Daniel Stone 2023-09-19 6:38 ` Daniel Stone 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa 2023-09-14 12:21 ` Hillf Danton 2023-09-14 13:13 ` Tetsuo Handa -- strict thread matches above, loose matches on Subject: below -- 2025-11-20 6:48 Xianying Wang 2021-09-15 1:59 Hao Sun 2021-09-15 5:12 ` Zhang, Qiang 2020-07-12 23:02 syzbot 2020-07-13 6:53 ` Dmitry Vyukov 2020-07-13 8:58 ` syzbot 2020-07-15 5:43 ` syzbot
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.