* BUG: soft lockup in smp_call_function @ 2023-09-12 23:02 Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 0 siblings, 2 replies; 19+ messages in thread From: Sanan Hasanov @ 2023-09-12 23:02 UTC (permalink / raw) To: peterz@infradead.org, paulmck@kernel.org, jgross@suse.com, vschneid@redhat.com, yury.norov@gmail.com, linux-kernel@vger.kernel.org Cc: syzkaller@googlegroups.com, contact@pgazz.com Good day, dear maintainers, We found a bug using a modified kernel configuration file used by syzbot. We enhanced the coverage of the configuration file using our tool, klocalizer. Kernel Branch: 6.3.0-next-20230426 Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing Thank you! Best regards, Sanan Hasanov watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] Modules linked in: irq event stamp: 192794 hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: events_unbound toggle_allocation_gate RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 Call Trace: <TASK> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 on_each_cpu include/linux/smp.h:71 [inline] text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 jump_label_update+0x321/0x400 kernel/jump_label.c:829 static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 static_key_enable+0x1a/0x20 kernel/jump_label.c:218 toggle_allocation_gate mm/kfence/core.c:831 [inline] toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Sending NMI from CPU 5 to CPUs 0-4,6-7: NMI backtrace for cpu 1 CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5691 [inline] lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdaabe8edcd Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 </TASK> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 NMI backtrace for cpu 4 CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 Call Trace: <IRQ> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 zap_drop_file_uffd_wp mm/memory.c:1352 [inline] zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] zap_pte_range mm/memory.c:1417 [inline] zap_pmd_range mm/memory.c:1564 [inline] zap_pud_range mm/memory.c:1593 [inline] zap_p4d_range mm/memory.c:1614 [inline] unmap_page_range+0x1046/0x4470 mm/memory.c:1635 unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 unmap_vmas+0x234/0x380 mm/memory.c:1720 exit_mmap+0x190/0x930 mm/mmap.c:3111 __mmput+0x128/0x4c0 kernel/fork.c:1351 mmput+0x60/0x70 kernel/fork.c:1373 exit_mm kernel/exit.c:564 [inline] do_exit+0x9d1/0x29f0 kernel/exit.c:858 do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 get_signal+0x2311/0x25c0 kernel/signal.c:2874 arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f281828edcd Code: Unable to access opcode bytes at 0x7f281828eda3. RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 </TASK> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov @ 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 1 sibling, 0 replies; 19+ messages in thread From: Peter Zijlstra @ 2023-09-13 10:05 UTC (permalink / raw) To: Sanan Hasanov Cc: paulmck@kernel.org, jgross@suse.com, vschneid@redhat.com, yury.norov@gmail.com, linux-kernel@vger.kernel.org, syzkaller@googlegroups.com, contact@pgazz.com, rodrigosiqueiramelo, melissa.srw, mairacanal, hamohammed.sa, daniel, airlied On Tue, Sep 12, 2023 at 11:02:56PM +0000, Sanan Hasanov wrote: > Good day, dear maintainers, > > We found a bug using a modified kernel configuration file used by syzbot. > > We enhanced the coverage of the configuration file using our tool, klocalizer. > > Kernel Branch: 6.3.0-next-20230426 > Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing > Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing > Thank you! AFAICT the thing is stuck in DRM somewhere... > watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] > Modules linked in: > irq event stamp: 192794 > hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 > softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Workqueue: events_unbound toggle_allocation_gate > RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] > RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 > Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 > RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 > RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 > R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 > on_each_cpu include/linux/smp.h:71 [inline] > text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] > text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 > text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] > text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] > text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 > arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 > jump_label_update+0x321/0x400 kernel/jump_label.c:829 > static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 > static_key_enable+0x1a/0x20 kernel/jump_label.c:218 > toggle_allocation_gate mm/kfence/core.c:831 [inline] > toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 > process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 > worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 > kthread+0x33e/0x440 kernel/kthread.c:379 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> Right, so this is waiting for an IPI to be processed.. while #1 has IRQs disabled > Sending NMI from CPU 5 to CPUs 0-4,6-7: > NMI backtrace for cpu 1 > CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] > RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] > RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 > Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 > RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 > RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 > RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f > R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 > R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 > FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > lock_acquire kernel/locking/lockdep.c:5691 [inline] > lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 > lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] > hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] > hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 > hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 And this is trying to cancel a hrtimer which is ran on CPU4 and won't be making much progress. > __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] So we're here, holding vbl_lock, vblank_time_lock one of which is what #4 is waiting on. This also has IRQs disabled, which is what #1 is waiting on. > drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 > drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 > drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 > vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 > commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 > drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] > drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 > drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 > drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 > drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 > drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] > drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 > fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 > do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 > fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:870 [inline] > __se_sys_ioctl fs/ioctl.c:856 [inline] > __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fdaabe8edcd > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd > RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 > RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 > </TASK> > CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 > Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c > RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 > RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 > RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 > R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 > R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 > FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <IRQ> > pv_wait arch/x86/include/asm/paravirt.h:598 [inline] > pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] > __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 > pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] > queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] > queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] > do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] > _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 > drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 stuck on a spinlock held by #5 > vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 > __run_hrtimer kernel/time/hrtimer.c:1685 [inline] > __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] > __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 > sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 > Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e > RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 > RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 > RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 > R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 > zap_drop_file_uffd_wp mm/memory.c:1352 [inline] > zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] > zap_pte_range mm/memory.c:1417 [inline] > zap_pmd_range mm/memory.c:1564 [inline] > zap_pud_range mm/memory.c:1593 [inline] > zap_p4d_range mm/memory.c:1614 [inline] > unmap_page_range+0x1046/0x4470 mm/memory.c:1635 > unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 > unmap_vmas+0x234/0x380 mm/memory.c:1720 > exit_mmap+0x190/0x930 mm/mmap.c:3111 > __mmput+0x128/0x4c0 kernel/fork.c:1351 > mmput+0x60/0x70 kernel/fork.c:1373 > exit_mm kernel/exit.c:564 [inline] > do_exit+0x9d1/0x29f0 kernel/exit.c:858 > do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 > get_signal+0x2311/0x25c0 kernel/signal.c:2874 > arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 > __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] > syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 > do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f281828edcd > Code: Unable to access opcode bytes at 0x7f281828eda3. > RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 > RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c > R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 > </TASK> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra @ 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:21 ` Tetsuo Handa 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa 1 sibling, 2 replies; 19+ messages in thread From: Hillf Danton @ 2023-09-13 11:07 UTC (permalink / raw) To: Sanan Hasanov Cc: peterz, Linus Torvalds, Tetsuo Handa, Thomas Gleixner, syzkaller, linux-mm, LKML On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> > Good day, dear maintainers, > > We found a bug using a modified kernel configuration file used by syzbot. > Thanks for your report. > We enhanced the coverage of the configuration file using our tool, klocalizer. > > Kernel Branch: 6.3.0-next-20230426 > Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing > Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing > Thank you! > > Best regards, > Sanan Hasanov > > watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] > Modules linked in: > irq event stamp: 192794 > hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 > softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] > softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 > CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Workqueue: events_unbound toggle_allocation_gate > RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] > RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 > Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 > RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 > RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 > RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 > RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 > R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 > FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 > on_each_cpu include/linux/smp.h:71 [inline] > text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] > text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 > text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] > text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] > text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 > arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 > jump_label_update+0x321/0x400 kernel/jump_label.c:829 > static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 > static_key_enable+0x1a/0x20 kernel/jump_label.c:218 > toggle_allocation_gate mm/kfence/core.c:831 [inline] > toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 > process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 > worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 > kthread+0x33e/0x440 kernel/kthread.c:379 > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 > </TASK> > Sending NMI from CPU 5 to CPUs 0-4,6-7: > NMI backtrace for cpu 1 > CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] > RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] > RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 > Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 > RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 > RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 > RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f > R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 > R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 > FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 > Call Trace: > <TASK> > lock_acquire kernel/locking/lockdep.c:5691 [inline] > lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] > _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 > lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] > hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] > hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 > hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 > __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] > drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 > drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 cpu1 cpu4 (see below) ==== ==== drm_crtc_vblank_off __run_hrtimer spin_lock_irq(&dev->event_lock); ... drm_handle_vblank hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); Deadlock should have been reported instead provided the lockdep_map in struct timer_list were added also to hrtimer, so it is highly appreciated if Tetsuo or Thomas adds it before 6.8 or 6.10. Hillf > disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 > drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 > vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 > commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 > drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] > drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 > drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 > drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 > drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 > drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] > drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 > fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 > do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 > fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 > vfs_ioctl fs/ioctl.c:51 [inline] > __do_sys_ioctl fs/ioctl.c:870 [inline] > __se_sys_ioctl fs/ioctl.c:856 [inline] > __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 > do_syscall_x64 arch/x86/entry/common.c:50 [inline] > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7fdaabe8edcd > Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 > RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd > RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 > RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 > </TASK> > NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 > NMI backtrace for cpu 4 > CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 > Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c > RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 > RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 > RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 > R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 > R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 > FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 > Call Trace: > <IRQ> > pv_wait arch/x86/include/asm/paravirt.h:598 [inline] > pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] > __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 > pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] > queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] > queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] > do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] > _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 > drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 > vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 > __run_hrtimer kernel/time/hrtimer.c:1685 [inline] > __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 > hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] > __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 > sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] > RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 > Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e > RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 > RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 > RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 > RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 > R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 > R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 > zap_drop_file_uffd_wp mm/memory.c:1352 [inline] > zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] > zap_pte_range mm/memory.c:1417 [inline] > zap_pmd_range mm/memory.c:1564 [inline] > zap_pud_range mm/memory.c:1593 [inline] > zap_p4d_range mm/memory.c:1614 [inline] > unmap_page_range+0x1046/0x4470 mm/memory.c:1635 > unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 > unmap_vmas+0x234/0x380 mm/memory.c:1720 > exit_mmap+0x190/0x930 mm/mmap.c:3111 > __mmput+0x128/0x4c0 kernel/fork.c:1351 > mmput+0x60/0x70 kernel/fork.c:1373 > exit_mm kernel/exit.c:564 [inline] > do_exit+0x9d1/0x29f0 kernel/exit.c:858 > do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 > get_signal+0x2311/0x25c0 kernel/signal.c:2874 > arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 > __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] > syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 > do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > RIP: 0033:0x7f281828edcd > Code: Unable to access opcode bytes at 0x7f281828eda3. > RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca > RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd > RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 > RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c > R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 > </TASK> > NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] > NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] > NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 19+ messages in thread
* drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 11:07 ` Hillf Danton @ 2023-09-13 14:21 ` Tetsuo Handa 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa 1 sibling, 0 replies; 19+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:21 UTC (permalink / raw) To: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI Cc: Hillf Danton, syzkaller, LKML, Sanan Hasanov Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . It looks like this locking pattern remains as of 6.6-rc1. Please fix. void drm_crtc_vblank_off(struct drm_crtc *crtc) { spin_lock_irq(&dev->event_lock); drm_vblank_disable_and_save(dev, pipe) { __disable_vblank(dev, pipe) { crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. ret = hrtimer_try_to_cancel(timer) { base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. } } } } } spin_unlock_irq(&dev->event_lock); } static void __run_hrtimer(...) { restart = fn(timer) == vkms_vblank_simulate { drm_crtc_handle_vblank(crtc) { drm_handle_vblank(struct drm_device *dev, unsigned int pipe) { spin_lock_irqsave(&dev->event_lock, irqflags); // Trying to hold dev->event_lock inside timer interrupt handler. => Deadlock was reported as a soft lockup. spin_unlock_irqrestore(&dev->event_lock, irqflags); } } } } On 2023/09/13 20:07, Hillf Danton wrote: > On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> >> Good day, dear maintainers, >> >> We found a bug using a modified kernel configuration file used by syzbot. >> > Thanks for your report. > >> We enhanced the coverage of the configuration file using our tool, klocalizer. >> >> Kernel Branch: 6.3.0-next-20230426 >> Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing >> Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing >> Thank you! >> >> Best regards, >> Sanan Hasanov >> >> watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] >> Modules linked in: >> irq event stamp: 192794 >> hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 >> softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> Workqueue: events_unbound toggle_allocation_gate >> RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] >> RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 >> Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 >> RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 >> RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 >> RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 >> RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 >> R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 >> FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 >> on_each_cpu include/linux/smp.h:71 [inline] >> text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] >> text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 >> text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] >> text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] >> text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 >> arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 >> jump_label_update+0x321/0x400 kernel/jump_label.c:829 >> static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 >> static_key_enable+0x1a/0x20 kernel/jump_label.c:218 >> toggle_allocation_gate mm/kfence/core.c:831 [inline] >> toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 >> process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 >> worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 >> kthread+0x33e/0x440 kernel/kthread.c:379 >> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 >> </TASK> >> Sending NMI from CPU 5 to CPUs 0-4,6-7: >> NMI backtrace for cpu 1 >> CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] >> RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] >> RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 >> Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 >> RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 >> RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 >> RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 >> RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f >> R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 >> R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 >> FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> lock_acquire kernel/locking/lockdep.c:5691 [inline] >> lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] >> _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 >> lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] >> hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] >> hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 >> hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 >> __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] >> drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 >> drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Hillf > >> disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 >> drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 >> vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 >> commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 >> drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] >> drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 >> drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 >> drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 >> drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 >> drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] >> drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 >> fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 >> do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 >> fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 >> vfs_ioctl fs/ioctl.c:51 [inline] >> __do_sys_ioctl fs/ioctl.c:870 [inline] >> __se_sys_ioctl fs/ioctl.c:856 [inline] >> __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7fdaabe8edcd >> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 >> RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd >> RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 >> RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >> R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 >> </TASK> >> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 4 >> CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 >> Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c >> RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 >> RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 >> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 >> RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 >> R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 >> R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 >> FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <IRQ> >> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] >> pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] >> __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 >> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] >> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] >> queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] >> do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] >> _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 >> drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 >> vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 >> __run_hrtimer kernel/time/hrtimer.c:1685 [inline] >> __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 >> hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 >> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] >> __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 >> sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 >> </IRQ> >> <TASK> >> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] >> RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 >> Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e >> RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 >> RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 >> RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 >> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 >> R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 >> zap_drop_file_uffd_wp mm/memory.c:1352 [inline] >> zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] >> zap_pte_range mm/memory.c:1417 [inline] >> zap_pmd_range mm/memory.c:1564 [inline] >> zap_pud_range mm/memory.c:1593 [inline] >> zap_p4d_range mm/memory.c:1614 [inline] >> unmap_page_range+0x1046/0x4470 mm/memory.c:1635 >> unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 >> unmap_vmas+0x234/0x380 mm/memory.c:1720 >> exit_mmap+0x190/0x930 mm/mmap.c:3111 >> __mmput+0x128/0x4c0 kernel/fork.c:1351 >> mmput+0x60/0x70 kernel/fork.c:1373 >> exit_mm kernel/exit.c:564 [inline] >> do_exit+0x9d1/0x29f0 kernel/exit.c:858 >> do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 >> get_signal+0x2311/0x25c0 kernel/signal.c:2874 >> arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 >> exit_to_user_mode_loop kernel/entry/common.c:168 [inline] >> exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 >> __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] >> syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 >> do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7f281828edcd >> Code: Unable to access opcode bytes at 0x7f281828eda3. >> RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca >> RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd >> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 >> RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c >> R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 >> </TASK> >> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 19+ messages in thread
* drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-13 14:21 ` Tetsuo Handa 0 siblings, 0 replies; 19+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:21 UTC (permalink / raw) To: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI Cc: syzkaller, LKML, Hillf Danton, Sanan Hasanov Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . It looks like this locking pattern remains as of 6.6-rc1. Please fix. void drm_crtc_vblank_off(struct drm_crtc *crtc) { spin_lock_irq(&dev->event_lock); drm_vblank_disable_and_save(dev, pipe) { __disable_vblank(dev, pipe) { crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. ret = hrtimer_try_to_cancel(timer) { base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. } } } } } spin_unlock_irq(&dev->event_lock); } static void __run_hrtimer(...) { restart = fn(timer) == vkms_vblank_simulate { drm_crtc_handle_vblank(crtc) { drm_handle_vblank(struct drm_device *dev, unsigned int pipe) { spin_lock_irqsave(&dev->event_lock, irqflags); // Trying to hold dev->event_lock inside timer interrupt handler. => Deadlock was reported as a soft lockup. spin_unlock_irqrestore(&dev->event_lock, irqflags); } } } } On 2023/09/13 20:07, Hillf Danton wrote: > On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan.Hasanov@ucf.edu> >> Good day, dear maintainers, >> >> We found a bug using a modified kernel configuration file used by syzbot. >> > Thanks for your report. > >> We enhanced the coverage of the configuration file using our tool, klocalizer. >> >> Kernel Branch: 6.3.0-next-20230426 >> Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing >> Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing >> Thank you! >> >> Best regards, >> Sanan Hasanov >> >> watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12] >> Modules linked in: >> irq event stamp: 192794 >> hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106 >> softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline] >> softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650 >> CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> Workqueue: events_unbound toggle_allocation_gate >> RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline] >> RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828 >> Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31 >> RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293 >> RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000 >> RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005 >> RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1 >> R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001 >> FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996 >> on_each_cpu include/linux/smp.h:71 [inline] >> text_poke_sync arch/x86/kernel/alternative.c:1770 [inline] >> text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970 >> text_poke_flush arch/x86/kernel/alternative.c:2161 [inline] >> text_poke_flush arch/x86/kernel/alternative.c:2158 [inline] >> text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168 >> arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146 >> jump_label_update+0x321/0x400 kernel/jump_label.c:829 >> static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205 >> static_key_enable+0x1a/0x20 kernel/jump_label.c:218 >> toggle_allocation_gate mm/kfence/core.c:831 [inline] >> toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823 >> process_one_work+0x993/0x15e0 kernel/workqueue.c:2405 >> worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552 >> kthread+0x33e/0x440 kernel/kthread.c:379 >> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 >> </TASK> >> Sending NMI from CPU 5 to CPUs 0-4,6-7: >> NMI backtrace for cpu 1 >> CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline] >> RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline] >> RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024 >> Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90 >> RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047 >> RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521 >> RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08 >> RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f >> R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818 >> R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598 >> FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0 >> Call Trace: >> <TASK> >> lock_acquire kernel/locking/lockdep.c:5691 [inline] >> lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] >> _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 >> lock_hrtimer_base kernel/time/hrtimer.c:173 [inline] >> hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline] >> hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316 >> hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443 >> __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline] >> drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478 >> drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366 > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Hillf > >> disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202 >> drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397 >> vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71 >> commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812 >> drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline] >> drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985 >> drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503 >> drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045 >> drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226 >> drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline] >> drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356 >> fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088 >> do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180 >> fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204 >> vfs_ioctl fs/ioctl.c:51 [inline] >> __do_sys_ioctl fs/ioctl.c:870 [inline] >> __se_sys_ioctl fs/ioctl.c:856 [inline] >> __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 >> do_syscall_x64 arch/x86/entry/common.c:50 [inline] >> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7fdaabe8edcd >> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 >> RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >> RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd >> RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003 >> RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 >> R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80 >> </TASK> >> NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 >> NMI backtrace for cpu 4 >> CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064 >> Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c >> RSP: 0018:ffffc90000300b50 EFLAGS: 00000046 >> RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000 >> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8 >> RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8 >> R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000 >> R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0 >> FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0 >> Call Trace: >> <IRQ> >> pv_wait arch/x86/include/asm/paravirt.h:598 [inline] >> pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline] >> __pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511 >> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline] >> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] >> queued_spin_lock include/asm-generic/qspinlock.h:114 [inline] >> do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115 >> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline] >> _raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162 >> drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986 >> vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29 >> __run_hrtimer kernel/time/hrtimer.c:1685 [inline] >> __hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749 >> hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811 >> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline] >> __sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112 >> sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106 >> </IRQ> >> <TASK> >> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645 >> RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline] >> RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207 >> Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e >> RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286 >> RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001 >> RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8 >> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480 >> R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007 >> zap_drop_file_uffd_wp mm/memory.c:1352 [inline] >> zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline] >> zap_pte_range mm/memory.c:1417 [inline] >> zap_pmd_range mm/memory.c:1564 [inline] >> zap_pud_range mm/memory.c:1593 [inline] >> zap_p4d_range mm/memory.c:1614 [inline] >> unmap_page_range+0x1046/0x4470 mm/memory.c:1635 >> unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681 >> unmap_vmas+0x234/0x380 mm/memory.c:1720 >> exit_mmap+0x190/0x930 mm/mmap.c:3111 >> __mmput+0x128/0x4c0 kernel/fork.c:1351 >> mmput+0x60/0x70 kernel/fork.c:1373 >> exit_mm kernel/exit.c:564 [inline] >> do_exit+0x9d1/0x29f0 kernel/exit.c:858 >> do_group_exit+0xd4/0x2a0 kernel/exit.c:1021 >> get_signal+0x2311/0x25c0 kernel/signal.c:2874 >> arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307 >> exit_to_user_mode_loop kernel/entry/common.c:168 [inline] >> exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204 >> __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] >> syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297 >> do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86 >> entry_SYSCALL_64_after_hwframe+0x63/0xcd >> RIP: 0033:0x7f281828edcd >> Code: Unable to access opcode bytes at 0x7f281828eda3. >> RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca >> RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd >> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88 >> RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000 >> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c >> R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80 >> </TASK> >> NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline] >> NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline] >> NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729 ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 14:21 ` Tetsuo Handa @ 2023-09-13 16:47 ` Linus Torvalds -1 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2023-09-13 16:47 UTC (permalink / raw) To: Tetsuo Handa Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, LKML, DRI, Melissa Wen, Maira Canal, syzkaller On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . > It looks like this locking pattern remains as of 6.6-rc1. Please fix. > > void drm_crtc_vblank_off(struct drm_crtc *crtc) { > spin_lock_irq(&dev->event_lock); > drm_vblank_disable_and_save(dev, pipe) { > __disable_vblank(dev, pipe) { > crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { > hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. > ret = hrtimer_try_to_cancel(timer) { > base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. Heh. Ok. This is clearly a bug, but it does seem to be limited to just the vkms driver, and literally only to the "simulate vblank" case. The worst part about it is that it's so subtle and not obvious. Some light grepping seems to show that amdgpu has almost the exact same pattern in its own vkms thing, except it uses hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); directly, which presumably fixes the livelock, but means that the cancel will fail if it's currently running. So just doing the same thing in the vkms driver probably fixes things. Maybe the vkms people need to add a flag to say "it's canceled" so that it doesn't then get re-enabled? Or maybe it doesn't matter and/or already happens for some reason I didn't look into. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-13 16:47 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2023-09-13 16:47 UTC (permalink / raw) To: Tetsuo Handa Cc: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . > It looks like this locking pattern remains as of 6.6-rc1. Please fix. > > void drm_crtc_vblank_off(struct drm_crtc *crtc) { > spin_lock_irq(&dev->event_lock); > drm_vblank_disable_and_save(dev, pipe) { > __disable_vblank(dev, pipe) { > crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { > hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. > ret = hrtimer_try_to_cancel(timer) { > base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. Heh. Ok. This is clearly a bug, but it does seem to be limited to just the vkms driver, and literally only to the "simulate vblank" case. The worst part about it is that it's so subtle and not obvious. Some light grepping seems to show that amdgpu has almost the exact same pattern in its own vkms thing, except it uses hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); directly, which presumably fixes the livelock, but means that the cancel will fail if it's currently running. So just doing the same thing in the vkms driver probably fixes things. Maybe the vkms people need to add a flag to say "it's canceled" so that it doesn't then get re-enabled? Or maybe it doesn't matter and/or already happens for some reason I didn't look into. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 16:47 ` Linus Torvalds @ 2023-09-13 21:08 ` Thomas Gleixner -1 siblings, 0 replies; 19+ messages in thread From: Thomas Gleixner @ 2023-09-13 21:08 UTC (permalink / raw) To: Linus Torvalds, Tetsuo Handa Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, LKML, DRI, Melissa Wen, Maira Canal, syzkaller On Wed, Sep 13 2023 at 09:47, Linus Torvalds wrote: > On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . >> It looks like this locking pattern remains as of 6.6-rc1. Please fix. >> >> void drm_crtc_vblank_off(struct drm_crtc *crtc) { >> spin_lock_irq(&dev->event_lock); >> drm_vblank_disable_and_save(dev, pipe) { >> __disable_vblank(dev, pipe) { >> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { >> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. >> ret = hrtimer_try_to_cancel(timer) { >> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. > > Heh. Ok. This is clearly a bug, but it does seem to be limited to just > the vkms driver, and literally only to the "simulate vblank" case. > > The worst part about it is that it's so subtle and not obvious. > > Some light grepping seems to show that amdgpu has almost the exact > same pattern in its own vkms thing, except it uses > > hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); > > directly, which presumably fixes the livelock, but means that the > cancel will fail if it's currently running. > > So just doing the same thing in the vkms driver probably fixes things. > > Maybe the vkms people need to add a flag to say "it's canceled" so > that it doesn't then get re-enabled? Or maybe it doesn't matter > and/or already happens for some reason I didn't look into. Maybe the VKMS people need to understand locking in the first place. The first thing I saw in this code is: static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) { ... mutex_unlock(&output->enabled_lock); What? Unlocking a mutex in the context of a hrtimer callback is simply violating all mutex locking rules. How has this code ever survived lock debugging without triggering a big fat warning? Thanks, tglx ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-13 21:08 ` Thomas Gleixner 0 siblings, 0 replies; 19+ messages in thread From: Thomas Gleixner @ 2023-09-13 21:08 UTC (permalink / raw) To: Linus Torvalds, Tetsuo Handa Cc: Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov On Wed, Sep 13 2023 at 09:47, Linus Torvalds wrote: > On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa > <penguin-kernel@i-love.sakura.ne.jp> wrote: >> >> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . >> It looks like this locking pattern remains as of 6.6-rc1. Please fix. >> >> void drm_crtc_vblank_off(struct drm_crtc *crtc) { >> spin_lock_irq(&dev->event_lock); >> drm_vblank_disable_and_save(dev, pipe) { >> __disable_vblank(dev, pipe) { >> crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { >> hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds. >> ret = hrtimer_try_to_cancel(timer) { >> base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress. > > Heh. Ok. This is clearly a bug, but it does seem to be limited to just > the vkms driver, and literally only to the "simulate vblank" case. > > The worst part about it is that it's so subtle and not obvious. > > Some light grepping seems to show that amdgpu has almost the exact > same pattern in its own vkms thing, except it uses > > hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); > > directly, which presumably fixes the livelock, but means that the > cancel will fail if it's currently running. > > So just doing the same thing in the vkms driver probably fixes things. > > Maybe the vkms people need to add a flag to say "it's canceled" so > that it doesn't then get re-enabled? Or maybe it doesn't matter > and/or already happens for some reason I didn't look into. Maybe the VKMS people need to understand locking in the first place. The first thing I saw in this code is: static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) { ... mutex_unlock(&output->enabled_lock); What? Unlocking a mutex in the context of a hrtimer callback is simply violating all mutex locking rules. How has this code ever survived lock debugging without triggering a big fat warning? Thanks, tglx ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-13 21:08 ` Thomas Gleixner @ 2023-09-14 6:33 ` Tetsuo Handa -1 siblings, 0 replies; 19+ messages in thread From: Tetsuo Handa @ 2023-09-14 6:33 UTC (permalink / raw) To: Maira Canal, Arthur Grillo Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, Linus Torvalds, LKML, DRI, Melissa Wen, syzkaller, Thomas Gleixner On 2023/09/14 6:08, Thomas Gleixner wrote: > Maybe the VKMS people need to understand locking in the first place. The > first thing I saw in this code is: > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > { > ... > mutex_unlock(&output->enabled_lock); > > What? > > Unlocking a mutex in the context of a hrtimer callback is simply > violating all mutex locking rules. > > How has this code ever survived lock debugging without triggering a big > fat warning? Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't tested with the lock debugging yet... Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. Please revert that patch immediately. I guess that a semaphore (down()/up()) could be used instead of a mutex. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-14 6:33 ` Tetsuo Handa 0 siblings, 0 replies; 19+ messages in thread From: Tetsuo Handa @ 2023-09-14 6:33 UTC (permalink / raw) To: Maira Canal, Arthur Grillo Cc: Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds On 2023/09/14 6:08, Thomas Gleixner wrote: > Maybe the VKMS people need to understand locking in the first place. The > first thing I saw in this code is: > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > { > ... > mutex_unlock(&output->enabled_lock); > > What? > > Unlocking a mutex in the context of a hrtimer callback is simply > violating all mutex locking rules. > > How has this code ever survived lock debugging without triggering a big > fat warning? Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't tested with the lock debugging yet... Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. Please revert that patch immediately. I guess that a semaphore (down()/up()) could be used instead of a mutex. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-14 6:33 ` Tetsuo Handa @ 2023-09-14 8:12 ` Daniel Vetter -1 siblings, 0 replies; 19+ messages in thread From: Daniel Vetter @ 2023-09-14 8:12 UTC (permalink / raw) To: Tetsuo Handa Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, Linus Torvalds, LKML, DRI, Melissa Wen, Maira Canal, syzkaller, Thomas Gleixner, Arthur Grillo On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote: > On 2023/09/14 6:08, Thomas Gleixner wrote: > > Maybe the VKMS people need to understand locking in the first place. The > > first thing I saw in this code is: > > > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > > { > > ... > > mutex_unlock(&output->enabled_lock); > > > > What? > > > > Unlocking a mutex in the context of a hrtimer callback is simply > > violating all mutex locking rules. > > > > How has this code ever survived lock debugging without triggering a big > > fat warning? > > Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer > and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't > tested with the lock debugging yet... Yeah that needs an immediate revert, there's not much that looks legit in that patch. I'll chat with Maira. Also yes how that landed without anyone running lockdep is ... not good. I guess we need a lockdep enabled drm ci target that runs vkms tests asap :-) > Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. > Please revert that patch immediately. > I guess that a semaphore (down()/up()) could be used instead of a mutex. From a quick look this smells like a classic "try to use locking when you want synchronization primitives", so semaphore here doesn't look any better. The vkms_set_composer() function was originally for crc generation, where it's userspace's job to make sure they wait for all the crc they need to be generated before they shut it down again. But for writeback the kernel must guarantee that the compositiona actually happens, and the current function just doesn't make any such guarantees. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-14 8:12 ` Daniel Vetter 0 siblings, 0 replies; 19+ messages in thread From: Daniel Vetter @ 2023-09-14 8:12 UTC (permalink / raw) To: Tetsuo Handa Cc: Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote: > On 2023/09/14 6:08, Thomas Gleixner wrote: > > Maybe the VKMS people need to understand locking in the first place. The > > first thing I saw in this code is: > > > > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) > > { > > ... > > mutex_unlock(&output->enabled_lock); > > > > What? > > > > Unlocking a mutex in the context of a hrtimer callback is simply > > violating all mutex locking rules. > > > > How has this code ever survived lock debugging without triggering a big > > fat warning? > > Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer > and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't > tested with the lock debugging yet... Yeah that needs an immediate revert, there's not much that looks legit in that patch. I'll chat with Maira. Also yes how that landed without anyone running lockdep is ... not good. I guess we need a lockdep enabled drm ci target that runs vkms tests asap :-) > Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. > Please revert that patch immediately. > I guess that a semaphore (down()/up()) could be used instead of a mutex. From a quick look this smells like a classic "try to use locking when you want synchronization primitives", so semaphore here doesn't look any better. The vkms_set_composer() function was originally for crc generation, where it's userspace's job to make sure they wait for all the crc they need to be generated before they shut it down again. But for writeback the kernel must guarantee that the compositiona actually happens, and the current function just doesn't make any such guarantees. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-14 8:12 ` Daniel Vetter (?) @ 2023-09-18 22:02 ` Helen Koike 2023-09-19 6:38 ` Daniel Stone -1 siblings, 1 reply; 19+ messages in thread From: Helen Koike @ 2023-09-18 22:02 UTC (permalink / raw) To: Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, Daniel Stone, David Heidelberg, Vignesh Raman On 14/09/2023 05:12, Daniel Vetter wrote: > On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote: >> On 2023/09/14 6:08, Thomas Gleixner wrote: >>> Maybe the VKMS people need to understand locking in the first place. The >>> first thing I saw in this code is: >>> >>> static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer) >>> { >>> ... >>> mutex_unlock(&output->enabled_lock); >>> >>> What? >>> >>> Unlocking a mutex in the context of a hrtimer callback is simply >>> violating all mutex locking rules. >>> >>> How has this code ever survived lock debugging without triggering a big >>> fat warning? >> >> Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer >> and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't >> tested with the lock debugging yet... > > Yeah that needs an immediate revert, there's not much that looks legit in > that patch. I'll chat with Maira. > > Also yes how that landed without anyone running lockdep is ... not good. I > guess we need a lockdep enabled drm ci target that runs vkms tests asap > :-) btw, I just executed a draft version of vkms targed on the ci, we do get the warning: https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623 I'm just not sure if tests would fail (since it is a warning) and it has a chance to be ignored if people don't look at the logs (unless if igt already handles that). I still need to do some adjustments (it seems the tests is hanging somewhere and we got a timeout) but we should have vkms in drm ci soon. Regards, Helen > >> Maíra and Arthur, mutex_unlock() from interrupt context is not permitted. >> Please revert that patch immediately. >> I guess that a semaphore (down()/up()) could be used instead of a mutex. > > From a quick look this smells like a classic "try to use locking when you > want synchronization primitives", so semaphore here doesn't look any > better. The vkms_set_composer() function was originally for crc > generation, where it's userspace's job to make sure they wait for all the > crc they need to be generated before they shut it down again. But for > writeback the kernel must guarantee that the compositiona actually > happens, and the current function just doesn't make any such guarantees. > > Cheers, Daniel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer 2023-09-18 22:02 ` Helen Koike @ 2023-09-19 6:38 ` Daniel Stone 0 siblings, 0 replies; 19+ messages in thread From: Daniel Stone @ 2023-09-19 6:38 UTC (permalink / raw) To: Helen Koike Cc: Haneen Mohammed, Hillf Danton, Sanan Hasanov, Rodrigo Siqueira, David Heidelberg, Tetsuo Handa, Linus Torvalds, Vignesh Raman, LKML, DRI, Melissa Wen, Maira Canal, syzkaller, Thomas Gleixner, Arthur Grillo, Daniel Stone On Mon, 18 Sept 2023 at 23:02, Helen Koike <helen.koike@collabora.com> wrote: > On 14/09/2023 05:12, Daniel Vetter wrote: > > Also yes how that landed without anyone running lockdep is ... not good. I > > guess we need a lockdep enabled drm ci target that runs vkms tests asap > > :-) > > btw, I just executed a draft version of vkms targed on the ci, we do get > the warning: > > https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623 > > I'm just not sure if tests would fail (since it is a warning) and it has > a chance to be ignored if people don't look at the logs (unless if igt > already handles that). Hmm, dmesg-warn is already a separate igt test status. I guess it just needs to be handled explicitly. > I still need to do some adjustments (it seems the tests is hanging > somewhere and we got a timeout) but we should have vkms in drm ci soon. Might be due to the locking explosion? The kernels should probably have soft-lockup detection enabled as well as lockdep. Cheers, Daniel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: drm/vkms: deadlock between dev->event_lock and timer @ 2023-09-19 6:38 ` Daniel Stone 0 siblings, 0 replies; 19+ messages in thread From: Daniel Stone @ 2023-09-19 6:38 UTC (permalink / raw) To: Helen Koike Cc: Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, David Airlie, DRI, syzkaller, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, Daniel Stone, David Heidelberg, Vignesh Raman On Mon, 18 Sept 2023 at 23:02, Helen Koike <helen.koike@collabora.com> wrote: > On 14/09/2023 05:12, Daniel Vetter wrote: > > Also yes how that landed without anyone running lockdep is ... not good. I > > guess we need a lockdep enabled drm ci target that runs vkms tests asap > > :-) > > btw, I just executed a draft version of vkms targed on the ci, we do get > the warning: > > https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623 > > I'm just not sure if tests would fail (since it is a warning) and it has > a chance to be ignored if people don't look at the logs (unless if igt > already handles that). Hmm, dmesg-warn is already a separate igt test status. I guess it just needs to be handled explicitly. > I still need to do some adjustments (it seems the tests is hanging > somewhere and we got a timeout) but we should have vkms in drm ci soon. Might be due to the locking explosion? The kernels should probably have soft-lockup detection enabled as well as lockdep. Cheers, Daniel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:21 ` Tetsuo Handa @ 2023-09-13 14:30 ` Tetsuo Handa 2023-09-14 12:21 ` Hillf Danton 1 sibling, 1 reply; 19+ messages in thread From: Tetsuo Handa @ 2023-09-13 14:30 UTC (permalink / raw) To: Hillf Danton, Sanan Hasanov, Thomas Gleixner, peterz Cc: Linus Torvalds, syzkaller, LKML On 2023/09/13 20:07, Hillf Danton wrote: > > cpu1 cpu4 (see below) > ==== ==== > drm_crtc_vblank_off __run_hrtimer > spin_lock_irq(&dev->event_lock); > ... > drm_handle_vblank > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > Deadlock should have been reported instead provided the lockdep_map in > struct timer_list were added also to hrtimer, so it is highly appreciated > if Tetsuo or Thomas adds it before 6.8 or 6.10. Not me. ;-) Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, we want to add a lockdep annotation into hrtimer_cancel() so that we can detect this type of deadlock? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa @ 2023-09-14 12:21 ` Hillf Danton 2023-09-14 13:13 ` Tetsuo Handa 0 siblings, 1 reply; 19+ messages in thread From: Hillf Danton @ 2023-09-14 12:21 UTC (permalink / raw) To: Tetsuo Handa Cc: Sanan Hasanov, Thomas Gleixner, Linus Torvalds, syzkaller, linux-mm, LKML On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote: > On 2023/09/13 20:07, Hillf Danton wrote: > > > > cpu1 cpu4 (see below) > > ==== ==== > > drm_crtc_vblank_off __run_hrtimer > > spin_lock_irq(&dev->event_lock); > > ... > > drm_handle_vblank > > hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); > > > > > > Deadlock should have been reported instead provided the lockdep_map in > > struct timer_list were added also to hrtimer, so it is highly appreciated > > if Tetsuo or Thomas adds it before 6.8 or 6.10. > > Not me. ;-) > > Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, > we want to add a lockdep annotation into hrtimer_cancel() so that we can > detect this type of deadlock? Yes, you are right. The diff below is my two cents (only for thoughts). --- x/include/linux/timer.h +++ y/include/linux/timer.h @@ -124,6 +124,9 @@ struct hrtimer { u8 is_rel; u8 is_soft; u8 is_hard; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif }; /** @@ -369,33 +372,65 @@ static inline void hrtimer_cancel_wait_r /* Exported timer functions: */ /* Initialize timers: */ -extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock, - enum hrtimer_mode mode); -extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode); +extern void hrtimer_init_key(struct hrtimer *timer, clockid_t which_clock, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +extern void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +#ifdef CONFIG_LOCKDEP +#define hrtimer_init(t, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_key(t, c, m, #t, &__key); \ + } while (0) + +#define hrtimer_init_sleeper(s, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_sleeper_key(s, c, m, #s, &__key); \ + } while (0) +#else +#define hrtimer_init(t, c, m) \ + hrtimer_init_key(t, c, m, NULL, NULL) + +#define hrtimer_init_sleeper(s, c, m) \ + hrtimer_init_sleeper_key(s, c, m, NULL, NULL) +#endif #ifdef CONFIG_DEBUG_OBJECTS_TIMERS -extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock, - enum hrtimer_mode mode); -extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, +extern void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t which_clock, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +extern void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode); + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key); +#ifdef CONFIG_LOCKDEP + #define hrtimer_init_on_stack(t, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_on_stack_key(t, c, m, #t, &__key); \ + } while (0) + #define hrtimer_init_sleeper_on_stack(s, c, m) \ + do { \ + static struct lock_class_key __key; \ + hrtimer_init_sleeper_on_stack_key(s, c, m, #s, &__key); \ + } while (0) +#else + #define hrtimer_init_on_stack(t, c, m) \ + hrtimer_init_on_stack_key(t, c, m, NULL, NULL) + #define hrtimer_init_sleeper_on_stack(s, c, m) \ + hrtimer_init_sleeper_on_stack_key(s, c, m, NULL, NULL) +#endif extern void destroy_hrtimer_on_stack(struct hrtimer *timer); #else -static inline void hrtimer_init_on_stack(struct hrtimer *timer, - clockid_t which_clock, - enum hrtimer_mode mode) -{ - hrtimer_init(timer, which_clock, mode); -} +#define hrtimer_init_on_stack(t, c, m) \ + hrtimer_init(t, c, m) -static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, - enum hrtimer_mode mode) -{ - hrtimer_init_sleeper(sl, clock_id, mode); -} +#define hrtimer_init_sleeper_on_stack(s, c, m) \ + hrtimer_init_sleeper(s, c, m) static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { } #endif --- x/kernel/time/hrtimer.c +++ y/kernel/time/hrtimer.c @@ -428,22 +428,26 @@ static inline void debug_hrtimer_deactiv static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode); -void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(timer, &hrtimer_debug_descr); __hrtimer_init(timer, clock_id, mode); + lockdep_init_map(&timer->lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_on_stack); static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, enum hrtimer_mode mode); -void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl, - clockid_t clock_id, enum hrtimer_mode mode) +void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl, + clockid_t clock_id, enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr); __hrtimer_init_sleeper(sl, clock_id, mode); + lockdep_init_map(&sl->timer.lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack); @@ -1439,6 +1443,8 @@ int hrtimer_cancel(struct hrtimer *timer { int ret; + lock_map_acquire(&timer->lockdep_map); + lock_map_release(&timer->lockdep_map); do { ret = hrtimer_try_to_cancel(timer); @@ -1586,11 +1592,12 @@ static void __hrtimer_init(struct hrtime * but the PINNED bit is ignored as pinning happens * when the hrtimer is started */ -void hrtimer_init(struct hrtimer *timer, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_key(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_init(timer, clock_id, mode); __hrtimer_init(timer, clock_id, mode); + lockdep_init_map(&timer->lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init); @@ -1647,6 +1654,11 @@ static void __run_hrtimer(struct hrtimer enum hrtimer_restart (*fn)(struct hrtimer *); bool expires_in_hardirq; int restart; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; + + lockdep_copy_map(&lockdep_map, &timer->lockdep_map); +#endif lockdep_assert_held(&cpu_base->lock); @@ -1682,7 +1694,9 @@ static void __run_hrtimer(struct hrtimer trace_hrtimer_expire_entry(timer, now); expires_in_hardirq = lockdep_hrtimer_enter(timer); + lock_map_acquire(&lockdep_map); restart = fn(timer); + lock_map_release(&lockdep_map); lockdep_hrtimer_exit(expires_in_hardirq); trace_hrtimer_expire_exit(timer); @@ -2004,12 +2018,13 @@ static void __hrtimer_init_sleeper(struc * @clock_id: the clock to be used * @mode: timer mode abs/rel */ -void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id, - enum hrtimer_mode mode) +void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id, + enum hrtimer_mode mode, + const char *name, struct lock_class_key *key) { debug_init(&sl->timer, clock_id, mode); __hrtimer_init_sleeper(sl, clock_id, mode); - + lockdep_init_map(&sl->timer.lockdep_map, name, key, 0); } EXPORT_SYMBOL_GPL(hrtimer_init_sleeper); -- ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: BUG: soft lockup in smp_call_function 2023-09-14 12:21 ` Hillf Danton @ 2023-09-14 13:13 ` Tetsuo Handa 0 siblings, 0 replies; 19+ messages in thread From: Tetsuo Handa @ 2023-09-14 13:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Sanan Hasanov, Linus Torvalds, syzkaller, LKML, Hillf Danton On 2023/09/14 21:21, Hillf Danton wrote: > On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote: >> On 2023/09/13 20:07, Hillf Danton wrote: >>> >>> cpu1 cpu4 (see below) >>> ==== ==== >>> drm_crtc_vblank_off __run_hrtimer >>> spin_lock_irq(&dev->event_lock); >>> ... >>> drm_handle_vblank >>> hrtimer_cancel spin_lock_irqsave(&dev->event_lock, irqflags); >>> >>> >>> Deadlock should have been reported instead provided the lockdep_map in >>> struct timer_list were added also to hrtimer, so it is highly appreciated >>> if Tetsuo or Thomas adds it before 6.8 or 6.10. >> >> Not me. ;-) >> >> Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds, >> we want to add a lockdep annotation into hrtimer_cancel() so that we can >> detect this type of deadlock? Here is a reproducer. ---------------------------------------- #include <linux/module.h> #include <linux/delay.h> static DEFINE_SPINLOCK(lock1); static struct hrtimer timer1; static enum hrtimer_restart timer_func(struct hrtimer *timer) { unsigned long flags; mdelay(100); // Wait for test_init() to hold lock1. spin_lock_irqsave(&lock1, flags); spin_unlock_irqrestore(&lock1, flags); return HRTIMER_RESTART; } static int __init test_init(void) { unsigned long flags; hrtimer_init(&timer1, CLOCK_MONOTONIC, HRTIMER_MODE_REL); timer1.function = &timer_func; hrtimer_start(&timer1, 1, HRTIMER_MODE_REL); mdelay(10); // Wait for timer_func() to start. spin_lock_irqsave(&lock1, flags); hrtimer_cancel(&timer1); // Wait for timer_func() to finish. spin_unlock_irqrestore(&lock1, flags); return -EINVAL; } module_init(test_init); MODULE_LICENSE("GPL"); ---------------------------------------- ---------------------------------------- [ 996.507681] test: loading out-of-tree module taints kernel. [ 996.514019] test: module verification failed: signature and/or required key missing - tainting kernel [ 1061.893054] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 1061.900411] rcu: 4-...0: (1 GPs behind) idle=ed6c/1/0x4000000000000000 softirq=3304/3305 fqs=15784 [ 1061.909128] rcu: (detected by 0, t=65018 jiffies, g=12625, q=4422 ncpus=12) [ 1061.915003] Sending NMI from CPU 0 to CPUs 4: [ 1061.918972] NMI backtrace for cpu 4 [ 1061.919036] CPU: 4 PID: 3826 Comm: insmod Tainted: G OE 6.6.0-rc1+ #20 [ 1061.919093] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 [ 1061.919095] RIP: 0010:delay_tsc+0x34/0xa0 [ 1061.919560] Code: ff 05 e8 b1 26 70 65 44 8b 0d e4 b1 26 70 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 <65> ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 [ 1061.919563] RSP: 0018:ffffb471c059cf00 EFLAGS: 00000083 [ 1061.919567] RAX: 0000028efe104ef6 RBX: 0000000000000041 RCX: 0000000000000004 [ 1061.919569] RDX: 00000000002192f8 RSI: 0000000000000004 RDI: 000000000027d81e [ 1061.919571] RBP: ffff8970dafe5040 R08: 0000028efdeebbfe R09: 0000000000000004 [ 1061.919572] R10: 0000000000000001 R11: ffffffffc0a8d600 R12: ffffffff90e030e0 [ 1061.919574] R13: ffff8970dafe5040 R14: ffffffffc0a8b010 R15: ffff8970dafe5100 [ 1061.919630] FS: 00007fdd998eb740(0000) GS:ffff8970dae00000(0000) knlGS:0000000000000000 [ 1061.919633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1061.919635] CR2: 0000000001edf678 CR3: 00000001a1060000 CR4: 0000000000350ee0 [ 1061.919639] Call Trace: [ 1061.919641] <NMI> [ 1061.919646] ? nmi_cpu_backtrace+0xb1/0x130 [ 1061.919711] ? nmi_cpu_backtrace_handler+0x11/0x20 [ 1061.922096] ? nmi_handle+0xe4/0x290 [ 1061.922163] ? default_do_nmi+0x49/0x100 [ 1061.922196] ? exc_nmi+0x152/0x1e0 [ 1061.922198] ? end_repeat_nmi+0x16/0x67 [ 1061.922340] ? __pfx_timer_func+0x10/0x10 [test] [ 1061.922347] ? delay_tsc+0x34/0xa0 [ 1061.922349] ? delay_tsc+0x34/0xa0 [ 1061.922350] ? delay_tsc+0x34/0xa0 [ 1061.922352] </NMI> [ 1061.922353] <IRQ> [ 1061.922353] timer_func+0x19/0xff0 [test] [ 1061.922358] __hrtimer_run_queues+0x177/0x3a0 [ 1061.922362] hrtimer_interrupt+0x104/0x240 [ 1061.922364] ? __do_softirq+0x2db/0x392 [ 1061.922827] __sysvec_apic_timer_interrupt+0x64/0x180 [ 1061.922833] sysvec_apic_timer_interrupt+0x65/0x80 [ 1061.922894] </IRQ> [ 1061.922896] <TASK> [ 1061.922898] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 1061.922902] RIP: 0010:delay_tsc+0x4d/0xa0 [ 1061.922907] Code: c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 65 ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 0f 01 f9 <66> 90 48 c1 e2 20 48 09 d0 48 89 c2 4c 29 c2 48 39 fa 72 c8 65 ff [ 1061.922909] RSP: 0018:ffffb471c1e63bd0 EFLAGS: 00000246 [ 1061.922912] RAX: 00000000751ed8ab RBX: 000000000000000a RCX: 0000000000000004 [ 1061.922914] RDX: 0000000000000267 RSI: 0000000000000004 RDI: 000000000027d81e [ 1061.922915] RBP: ffffffffc0a91010 R08: 00000267751adc59 R09: 0000000000000004 [ 1061.922917] R10: 0000000000000001 R11: ffffffff90cd85c8 R12: 0000000000000000 [ 1061.922918] R13: ffffb471c1e63d20 R14: 0000000000000000 R15: ffffffffc0a8d080 [ 1061.922923] ? __pfx_test_init+0x10/0x10 [test] [ 1061.922934] test_init+0x52/0xff0 [test] [ 1061.922941] do_one_initcall+0x5c/0x280 [ 1061.923004] ? kmalloc_trace+0xa9/0xc0 [ 1061.923105] do_init_module+0x60/0x240 [ 1061.923111] load_module+0x1f6e/0x20d0 [ 1061.923119] ? ima_post_read_file+0xe3/0xf0 [ 1061.923225] ? init_module_from_file+0x88/0xc0 [ 1061.923229] init_module_from_file+0x88/0xc0 [ 1061.923238] idempotent_init_module+0x19c/0x250 [ 1061.923244] ? security_capable+0x39/0x60 [ 1061.923304] __x64_sys_finit_module+0x5b/0xb0 [ 1061.923310] do_syscall_64+0x3b/0x90 [ 1061.923366] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 1061.923421] RIP: 0033:0x7fdd986f8e29 [ 1061.923427] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48 [ 1061.923429] RSP: 002b:00007ffe2f34dd18 EFLAGS: 00000206 ORIG_RAX: 0000000000000139 [ 1061.923432] RAX: ffffffffffffffda RBX: 0000000001ede240 RCX: 00007fdd986f8e29 [ 1061.923434] RDX: 0000000000000000 RSI: 000000000041a96e RDI: 0000000000000003 [ 1061.923435] RBP: 000000000041a96e R08: 0000000000000000 R09: 00007ffe2f34deb8 [ 1061.923436] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000 [ 1061.923437] R13: 0000000001ede210 R14: 0000000000000000 R15: 0000000000000000 [ 1061.923444] </TASK> [ 1061.923446] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 4.474 msecs ---------------------------------------- > > Yes, you are right. > > The diff below is my two cents (only for thoughts). > I'm thinking something like below. (Completely untested.) I haven't checked IRQ state handling. But in the last diff chunk, why raw_spin_unlock_irqrestore() (which does not re-enable IRQs if the caller already disabled IRQs) is used before calling the callback function and raw_spin_lock_irq() (which always disables IRQs) is used after calling the callback function? Is it legal to disable IRQs again when the caller already disabled IRQs? ---------------------------------------- diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 0ee140176f10..5640730ec31c 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -123,8 +123,11 @@ struct hrtimer { u8 state; u8 is_rel; u8 is_soft; u8 is_hard; +#ifdef CONFIG_LOCKDEP + struct lockdep_map lockdep_map; +#endif }; /** * struct hrtimer_sleeper - simple sleeper structure @@ -440,15 +443,15 @@ static inline void hrtimer_restart(struct hrtimer *timer) hrtimer_start_expires(timer, HRTIMER_MODE_ABS); } /* Query timers: */ -extern ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust); +extern ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust); /** * hrtimer_get_remaining - get remaining time for the timer * @timer: the timer to read */ -static inline ktime_t hrtimer_get_remaining(const struct hrtimer *timer) +static inline ktime_t hrtimer_get_remaining(struct hrtimer *timer) { return __hrtimer_get_remaining(timer, false); } diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 238262e4aba7..fe0681d34b56 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -161,14 +161,23 @@ static inline bool is_migration_base(struct hrtimer_clock_base *base) * possible to set timer->base = &migration_base and drop the lock: the timer * remains locked. */ static -struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer, +struct hrtimer_clock_base *lock_hrtimer_base(struct hrtimer *timer, unsigned long *flags) __acquires(&timer->base->lock) { struct hrtimer_clock_base *base; +#ifdef CONFIG_LOCKDEP + unsigned long flags2; + + local_irq_save(flags2); + lock_map_acquire(&timer->lockdep_map); + lock_map_release(&timer->lockdep_map); + local_irq_restore(flags2); +#endif + for (;;) { base = READ_ONCE(timer->base); if (likely(base != &migration_base)) { raw_spin_lock_irqsave(&base->cpu_base->lock, *flags); @@ -1456,9 +1465,9 @@ EXPORT_SYMBOL_GPL(hrtimer_cancel); * __hrtimer_get_remaining - get remaining time for the timer * @timer: the timer to read * @adjust: adjust relative timers when CONFIG_TIME_LOW_RES=y */ -ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust) +ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust) { unsigned long flags; ktime_t rem; @@ -1574,8 +1583,14 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id, timer->is_soft = softtimer; timer->is_hard = !!(mode & HRTIMER_MODE_HARD); timer->base = &cpu_base->clock_base[base]; timerqueue_init(&timer->node); +#ifdef CONFIG_LOCKDEP + { + static struct lock_class_key __key; + lockdep_init_map(&timer->lockdep_map, "hrtimer", &__key, 0); + } +#endif } /** * hrtimer_init - initialize a timer to the given clock @@ -1684,9 +1699,19 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base, raw_spin_unlock_irqrestore(&cpu_base->lock, flags); trace_hrtimer_expire_entry(timer, now); expires_in_hardirq = lockdep_hrtimer_enter(timer); +#ifdef CONFIG_LOCKDEP + local_irq_save(flags); + lock_map_acquire(&timer->lockdep_map); + local_irq_restore(flags); +#endif restart = fn(timer); +#ifdef CONFIG_LOCKDEP + local_irq_save(flags); + lock_map_release(&timer->lockdep_map); + local_irq_restore(flags); +#endif lockdep_hrtimer_exit(expires_in_hardirq); trace_hrtimer_expire_exit(timer); raw_spin_lock_irq(&cpu_base->lock); ---------------------------------------- ^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-09-19 6:39 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-12 23:02 BUG: soft lockup in smp_call_function Sanan Hasanov 2023-09-13 10:05 ` Peter Zijlstra 2023-09-13 11:07 ` Hillf Danton 2023-09-13 14:21 ` drm/vkms: deadlock between dev->event_lock and timer Tetsuo Handa 2023-09-13 14:21 ` Tetsuo Handa 2023-09-13 16:47 ` Linus Torvalds 2023-09-13 16:47 ` Linus Torvalds 2023-09-13 21:08 ` Thomas Gleixner 2023-09-13 21:08 ` Thomas Gleixner 2023-09-14 6:33 ` Tetsuo Handa 2023-09-14 6:33 ` Tetsuo Handa 2023-09-14 8:12 ` Daniel Vetter 2023-09-14 8:12 ` Daniel Vetter 2023-09-18 22:02 ` Helen Koike 2023-09-19 6:38 ` Daniel Stone 2023-09-19 6:38 ` Daniel Stone 2023-09-13 14:30 ` BUG: soft lockup in smp_call_function Tetsuo Handa 2023-09-14 12:21 ` Hillf Danton 2023-09-14 13:13 ` Tetsuo Handa
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.