watchdog: BUG: soft lockup

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

* watchdog: BUG: soft lockup
@ 2024-12-23  6:32 wzs
  2025-01-02 16:10 ` Petr Mladek
  2025-01-08 17:27 ` Doug Anderson
  0 siblings, 2 replies; 4+ messages in thread
From: wzs @ 2024-12-23  6:32 UTC (permalink / raw)
  To: dianders, tglx, liusong, akpm, pmladek, kernelfans, deller,
	npiggin, tsbogend, James.Bottomley, jan.kiszka
  Cc: linux-kernel, linux-mips, linux-parisc, linuxppc-dev, yaoma

Hello,
when fuzzing the Linux kernel,
I triggered many "watch: BUG: soft lockup" warnings.
I am not sure whether this is an issue with the kernel or with the
fuzzing program I ran.
(The same fuzzing program, when tested on kernel versions from
Linux-6.7.0 to 6.12.0, triggers the 'watchdog: BUG: soft lockup'
warning on some versions, while others do not. Linux 6.12.0 is the
latest stable release where this error occurs.)

The bug information I provided below is from the Linux-6.12.0 kernel.
If you need bug information from other versions, I would be happy to provide it.

kernel config :https://pastebin.com/i4LPXNAN
console output :https://pastebin.com/uKVpvJ78

The beginning part of the crash report is as follows:
[283539.715873] rcu: INFO: rcu_preempt self-detected stall on CPU
[283540.766023] watchdog: BUG: soft lockup - CPU#9 stuck for 64s!
[kworker/u515:2:86301]
[283540.766046] Modules linked in: raw_gadget(OE) dummy_hcd(OE)
snd_ctl_led hid_sensor_hub hid_multitouch udc_core veth xt_conntrack
nft_chain_nat xt_MASQUERADE bridge stp llc nf_conntrack_netlink
xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables overlay
nfnetlink_cttimeout nfnetlink bnep intel_rapl_msr intel_rapl_common
intel_uncore_frequency_common intel_pmc_core intel_vsec pmt_telemetry
pmt_class openvswitch vsock_loopback vmw_vsock_virtio_transport_common
nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
kvm_intel psample snd_ens1371 crct10dif_pclmul libcrc32c
polyval_clmulni snd_ac97_codec vmw_vsock_vmci_transport
polyval_generic vsock sunrpc binfmt_misc gameport ghash_clmulni_intel
kvm sha256_ssse3 sha1_ssse3 ac97_bus vmw_balloon aesni_intel
crypto_simd
[283545.281916] rcu: 2-...!: (137 ticks this GP)
idle=ecec/1/0x4000000000000000 softirq=11168967/11168969 fqs=0
[283545.291031]  snd_pcm cryptd snd_seq_midi btusb snd_seq_midi_event
btrtl rapl nls_iso8859_1 btintel snd_rawmidi btbcm btmtk snd_seq
[283545.301366] rcu: (t=73327 jiffies g=69763773 q=104 ncpus=16)
[283545.301387] rcu: rcu_preempt kthread timer wakeup didn't happen
for 73324 jiffies! g69763773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[283545.307516] rcu: Possible timer handling issue on cpu=4
timer-softirq=4703160
[283545.308817]  input_leds snd_seq_device joydev bluetooth snd_timer
serio_raw snd soundcore vmw_vmci mac_hid sch_fq_codel vmwgfx
drm_ttm_helper ttm msr parport_pc ppdev lp
[283545.316000] rcu: rcu_preempt kthread starved for 73327 jiffies!
g69763773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[283545.316032] rcu: Unless rcu_preempt kthread gets sufficient CPU
time, OOM is now expected behavior.
[283545.316035] rcu: RCU grace-period kthread stack dump:
[283545.316038] task:rcu_preempt     state:I
[283545.779182]  parport efi_pstore ip_tables x_tables autofs4
hid_generic crc32_pclmul
[283545.783533] ------------[ cut here ]------------
[283545.785609]  psmouse usbhid hid mptspi mptscsih i2c_piix4 ahci
mptbase e1000 libahci scsi_transport_spi i2c_smbus pata_acpi [last
unloaded: raw_gadget(OE)]
[283545.789981] CPU: 9 UID: 0 PID: 86301 Comm: kworker/u515:2 Tainted:
G           OE      6.12.0 #1
[283545.790017] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[283545.790019] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[283545.790027] Workqueue: events_freezable_pwr_efficient disk_events_workfn
[283545.808817]  stack:29152 pid:17    tgid:17    ppid:2      flags:0x00004000
[283545.808838] Call Trace:
[283545.808843]  <TASK>
[283545.803556] RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x60
[283545.808853]  __schedule+0xc0c/0x4ee0
[283545.841937] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 49
89 f0 48 89 e5 c6 07 00 0f 1f 00 41 f7 c0 00 02 00 00 74 06 fb 0f 1f
44 00 00 <65> ff 0d 20 8d e9 73 74 13 5d 31 c0 31 d2 31 c9 31 f6 31 ff
45 31
[283545.841978] RSP: 0018:ffffc900309af550 EFLAGS: 00000206
[283545.841988] RAX: dffffc0000000000 RBX: ffff888114cf4000 RCX:
0000000000000000
[283545.841992] RDX: 1ffff1102299e802 RSI: 0000000000000246 RDI:
ffff888106bcd000
[283545.841995] RBP: ffffc900309af550 R08: 0000000000000246 R09:
0000000000000000
[283545.841997] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[283545.842001] R13: 0000000000000246 R14: ffff888114cf4010 R15:
ffff88822caa4000
[283545.842005] FS:  0000000000000000(0000) GS:ffff88862ea80000(0000)
knlGS:0000000000000000
[283545.846254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[283545.787091] WARNING: CPU: 13 PID: 2974 at
kernel/sched/deadline.c:1995 enqueue_dl_entity+0x167f/0x3300
[283545.812128]  ? sched_clock_cpu+0x6a/0x520
[283545.847597]  ? __kasan_check_write+0x14/0x30
[283545.847607]  ? __pfx___schedule+0x10/0x10
[283545.847616]  ? save_fpregs_to_fpstate+0xb0/0x230
[283545.847621]  ? raw_spin_rq_unlock+0x17/0xa0
[283545.847627]  schedule+0x83/0x350
[283545.847634]  schedule_timeout+0x106/0x350
[283545.847640]  ? __kasan_check_write+0x14/0x30
[283545.847639] Modules linked in:
[283545.847644]  ? __pfx_schedule_timeout+0x10/0x10
[283545.847647]  raw_gadget(OE)
[283545.847648]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[283545.847651]  dummy_hcd(OE) snd_ctl_led
[283545.847652]  ? __pfx_process_timeout+0x10/0x10
[283545.847657]  hid_sensor_hub hid_multitouch
[283545.847659]  ? __kasan_check_write+0x14/0x30
[283545.847664]  ? prepare_to_swait_event+0xb8/0x350
[283545.847665] CR2: 00002a0000c65280 CR3: 000000011b16e000 CR4:
0000000000752ef0
[283545.847670]  udc_core veth xt_conntrack nft_chain_nat
xt_MASQUERADE bridge stp llc nf_conntrack_netlink xfrm_user xfrm_algo
xt_addrtype nft_compat nf_tables overlay nfnetlink_cttimeout nfnetlink
bnep intel_rapl_msr intel_rapl_common
[283545.847709] PKRU: 55555554
[283545.847711]  intel_uncore_frequency_common intel_pmc_core
[283545.847715] Call Trace:
[283545.847718]  intel_vsec pmt_telemetry
[283545.847720]  <IRQ>
[283545.847723]  pmt_class openvswitch vsock_loopback
vmw_vsock_virtio_transport_common nsh nf_conncount nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 kvm_intel psample snd_ens1371
crct10dif_pclmul libcrc32c polyval_clmulni snd_ac97_codec
vmw_vsock_vmci_transport polyval_generic vsock sunrpc binfmt_misc
gameport ghash_clmulni_intel kvm sha256_ssse3 sha1_ssse3 ac97_bus
vmw_balloon aesni_intel crypto_simd snd_pcm cryptd snd_seq_midi btusb
snd_seq_midi_event btrtl rapl nls_iso8859_1 btintel snd_rawmidi btbcm
btmtk snd_seq input_leds snd_seq_device joydev bluetooth snd_timer
serio_raw snd soundcore vmw_vmci mac_hid sch_fq_codel vmwgfx
drm_ttm_helper ttm msr parport_pc ppdev lp parport efi_pstore
ip_tables x_tables autofs4 hid_generic crc32_pclmul psmouse usbhid hid
mptspi mptscsih i2c_piix4 ahci mptbase e1000 libahci
scsi_transport_spi i2c_smbus pata_acpi [last unloaded: raw_gadget(OE)]
[283545.847881] CPU: 13 UID: 1000 PID: 2974 Comm: gsd-housekeepin
Tainted: G           OE      6.12.0 #1
[283545.847888] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[283545.847891] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[283545.847895] RIP: 0010:enqueue_dl_entity+0x167f/0x3300
[283545.847901] Code: e8 f6 bb 8b 00 4c 8b 5d c8 4c 8b 55 d0 e9 18 f7
ff ff 85 c9 0f 84 51 fa ff ff 44 89 c6 4c 89 e7 e8 76 2b ff ff e9 41
fa ff ff <0f> 0b e9 bb e9 ff ff 0f 0b e9 c2 ef ff ff 48 b8 00 00 00 00
00 fc
[283545.847905] RSP: 0018:ffffc90005fd6ce0 EFLAGS: 00010082
[283545.847909] RAX: dffffc0000000000 RBX: ffff88862ecc7ca8 RCX:
0000000000000001
[283545.847912] RDX: 1ffff110c5d98f95 RSI: 0000000000000001 RDI:
ffff88862ecc7ca8
[283545.847915] RBP: ffffc90005fd6d40 R08: 0000000000000001 R09:
0000000000000000
[283545.847917] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff88862ecc7ca8
[283545.847920] R13: ffff88862ecc7d80 R14: ffff88862ecc72c0 R15:
dffffc0000000000
[283545.847923] FS:  000073dc1824d800(0000) GS:ffff88862ec80000(0000)
knlGS:0000000000000000
[283545.847926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[283545.847928] CR2: 0000049400328000 CR3: 000000018a21c000 CR4:
0000000000752ef0
[283545.847965] PKRU: 55555554
[283545.847968] Call Trace:
[283545.847971]  <TASK>
[283545.847668]  rcu_gp_fqs_loop+0x1a3/0x970
[283545.848430]  ? rcu_gp_init+0x8b7/0x1410
[283545.848436]  ? __pfx_rcu_gp_fqs_loop+0x10/0x10
[283545.848440]  ? finish_swait+0x12a/0x1f0
[283545.848445]  ? prepare_to_swait_event+0xb8/0x350
[283545.848449]  rcu_gp_kthread+0x27a/0x430
[283545.848452]  ? __pfx_rcu_gp_kthread+0x10/0x10
[283545.848456]  ? __kasan_check_read+0x11/0x20
[283545.847725]  ? show_regs+0x71/0x90
[283545.848459]  ? __kthread_parkme+0x8f/0x150
[283545.850678]  ? schedule+0x83/0x350
[283545.850687]  kthread+0x2b5/0x390
[283545.850691]  ? __pfx_rcu_gp_kthread+0x10/0x10
[283545.850695]  ? __pfx_kthread+0x10/0x10
[283545.850699]  ret_from_fork+0x43/0x90
[283545.850705]  ? __pfx_kthread+0x10/0x10
[283545.847975]  ? show_regs+0x71/0x90
[283545.850100]  ? watchdog_timer_fn+0x305/0x3d0
[283545.851262]  ? __hrtimer_run_queues+0x2f0/0x7b0
[283545.851399]  ? __pfx_watchdog_timer_fn+0x10/0x10
[283545.851405]  ? __pfx___hrtimer_run_queues+0x10/0x10
[283545.851409]  ? hrtimer_interrupt+0x2bc/0x750
[283545.850709]  ret_from_fork_asm+0x1a/0x30
[283545.853112]  </TASK>
[283545.853172] Sending NMI from CPU 2 to CPUs 0:
[283545.850722]  ? __warn+0xd3/0x2d0
[283545.854922]  ? enqueue_dl_entity+0x167f/0x3300
[283545.851414]  ? __sysvec_apic_timer_interrupt+0x8e/0x310
[283545.875438]  ? sysvec_apic_timer_interrupt+0x91/0xb0
[283545.875446]  </IRQ>
[283545.875449]  <TASK>
[283545.875471]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[283545.875483]  ? _raw_spin_unlock_irqrestore+0x21/0x60
[283545.854928]  ? report_bug+0x2ad/0x300
[283545.900589]  ? handle_bug+0x6e/0xc0
[283545.900593]  ? exc_invalid_op+0x19/0x50
[283545.900596]  ? asm_exc_invalid_op+0x1b/0x20
[283545.900602]  ? enqueue_dl_entity+0x167f/0x3300
[283545.900607]  ? update_cfs_group+0x274/0x2f0
[283545.901798]  dl_server_start+0xd3/0x8a0
[283545.901805]  enqueue_task_fair+0xb41/0x1830
[283545.901810]  ? __pfx_sched_mm_cid_migrate_to+0x10/0x10
[283545.901814]  enqueue_task+0x7f/0x540
[283545.901819]  activate_task+0x5f/0xd0
[283545.901822]  sched_balance_rq+0x13ba/0x3200
[283545.901825]  ? sched_clock_cpu+0x6a/0x520
[283545.901830]  ? sched_clock_noinstr+0x9/0x10
[283545.901836]  ? __pfx_sched_balance_rq+0x10/0x10
[283545.901841]  sched_balance_newidle+0x615/0x1040
[283545.901845]  ? __pfx_sched_balance_newidle+0x10/0x10
[283545.901848]  ? dequeue_entities+0x8ce/0x1330
[283545.901852]  ? sched_clock_cpu+0x6a/0x520
[283545.901856]  pick_next_task_fair+0x35/0xdb0
[283545.901859]  ? dequeue_task_fair+0x232/0x5a0
[283545.901863]  __pick_next_task+0x101/0x6d0
[283545.901866]  __schedule+0x53e/0x4ee0
[283545.901871]  ? update_stack_state+0x2d9/0x610
[283545.901877]  ? __pfx___schedule+0x10/0x10
[283545.901880]  ? _raw_write_unlock_irqrestore+0x80/0x90
[283545.901883]  ? enqueue_hrtimer+0x10b/0x290
[283545.901888]  ? hrtimer_start_range_ns+0x2f8/0xb60
[283545.901892]  schedule+0x83/0x350
[283545.901895]  schedule_hrtimeout_range_clock+0x15c/0x490
[283545.901900]  ? __pfx_schedule_hrtimeout_range_clock+0x10/0x10
[283545.901904]  ? __pfx_hrtimer_wakeup+0x10/0x10
[283545.901908]  schedule_hrtimeout_range+0x13/0x30
[283545.901912]  poll_schedule_timeout.constprop.0+0x79/0x100
[283545.901916]  do_sys_poll+0x6f6/0xc10
[283545.901921]  ? __pfx_do_sys_poll+0x10/0x10
[283545.901925]  ? is_bpf_text_address+0x24/0x40
[283545.901929]  ? kernel_text_address+0xd7/0xe0
[283545.901933]  ? __kernel_text_address+0x12/0x50
[283545.901936]  ? unwind_get_return_address+0x65/0xb0
[283545.901940]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[283545.901945]  ? arch_stack_walk+0x9e/0x100
[283545.901951]  ? __pfx_pollwake+0x10/0x10
[283545.901954]  ? __pfx_pollwake+0x10/0x10
[283545.901748] INFO: NMI handler (perf_event_nmi_handler) took too
long to run: 2.846 msecs
[283545.901956]  ? lockref_get_not_dead+0xd8/0x1f0
[283545.904986]  ? putname+0xf4/0x150
[283545.904991]  ? kasan_save_stack+0x3c/0x60
[283545.904995]  ? kasan_save_stack+0x28/0x60
[283545.904999]  ? kasan_save_track+0x18/0x70
[283545.905002]  ? kasan_save_free_info+0x3b/0x60
[283545.905006]  ? __kasan_slab_free+0x54/0x80
[283545.905009]  ? kmem_cache_free+0x165/0x590
[283545.905014]  ? putname+0xf4/0x150
[283545.905017]  ? user_path_at+0x46/0x70
[283545.905020]  ? user_statfs+0x8f/0x130
[283545.905751]  ? __do_sys_statfs+0x87/0x120
[283545.905757]  ? __x64_sys_statfs+0x54/0x80
[283545.905760]  ? x64_sys_call+0xa30/0x2670
[283545.905764]  ? do_syscall_64+0x7c/0x170
[283545.905768]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[283545.905775]  ? try_to_unlazy+0x1cc/0x4b0
[283545.905779]  ? terminate_walk+0x173/0x540
[283545.905783]  ? path_lookupat+0x172/0x670
[283545.905786]  ? x64_sys_call+0xa30/0x2670
[283545.905790]  ? filename_lookup+0x1a6/0x520
[283545.905793]  ? __pfx_timespec64_add_safe+0x10/0x10
[283545.905797]  ? sched_clock_noinstr+0x9/0x10
[283545.905802]  ? ktime_get_ts64+0x81/0x270
[283545.905807]  __x64_sys_poll+0x180/0x500
[283545.905811]  ? __pfx___x64_sys_poll+0x10/0x10
[283545.905814]  ? lockref_put_return+0xd0/0x150
[283545.905818]  ? __pfx_mntput_no_expire+0x10/0x10
[283545.905823]  x64_sys_call+0x140e/0x2670
[283545.905826]  do_syscall_64+0x7c/0x170
[283545.905829]  ? mntput+0x57/0x90
[283545.905832]  ? path_put+0x50/0x80
[283545.905836]  ? user_statfs+0xbe/0x130
[283545.905839]  ? __pfx_user_statfs+0x10/0x10
[283545.905842]  ? __pfx_cp_new_stat+0x10/0x10
[283545.907651]  ? __kasan_check_read+0x11/0x20
[283545.907656]  ? _copy_to_user+0x3b/0x80
[283545.909759]  ? __do_sys_statfs+0x101/0x120
[283545.909764]  ? __pfx___do_sys_statfs+0x10/0x10
[283545.909793]  ? __kasan_check_read+0x11/0x20
[283545.909803]  ? fpregs_assert_state_consistent+0x21/0xb0
[283545.909810]  ? syscall_exit_to_user_mode+0x4e/0x270
[283545.909815]  ? do_syscall_64+0x88/0x170
[283545.909818]  ? syscall_exit_to_user_mode+0x4e/0x270
[283545.909821]  ? __kasan_check_read+0x11/0x20
[283545.909825]  ? fpregs_assert_state_consistent+0x21/0xb0
[283545.909829]  ? syscall_exit_to_user_mode+0x4e/0x270
[283545.909832]  ? do_syscall_64+0x88/0x170
[283545.909834]  ? clear_bhb_loop+0x15/0x70
[283545.909838]  ? clear_bhb_loop+0x15/0x70
[283545.909842]  ? clear_bhb_loop+0x15/0x70
[283545.909845]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[283545.909849] RIP: 0033:0x73dc18118bcf
[283545.909855] Code: 54 24 1c 48 89 74 24 10 48 89 7c 24 08 e8 39 7e
f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 48 8b 7c 24 08 b8 07 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 44 24 08 e8 7d 7e f7 ff
8b 44
[283545.909859] RSP: 002b:00007fffd38bae60 EFLAGS: 00000293 ORIG_RAX:
0000000000000007
[283545.909865] RAX: ffffffffffffffda RBX: 000073dc1844e7a0 RCX:
000073dc18118bcf
[283545.875487]  ata_scsi_queuecmd+0x180/0x200
[283545.909868] RDX: 000000000000e9c5 RSI: 0000000000000002 RDI:
00005b604981cec0
[283545.909872] RBP: 00005b604981cec0 R08: 0000000000000000 R09:
0000000000000000
[283545.909875] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000002
[283545.909874]  ? blk_mq_start_request+0xc1/0x770
[283545.909878] R13: 00007fffd38baea4 R14: 000000000000e9c5 R15:
00005b60497f9970
[283545.909883]  </TASK>
[283545.909886] ---[ end trace 0000000000000000 ]---
[283545.909883]  scsi_queue_rq+0xb34/0x3320
[283545.922100]  blk_mq_dispatch_rq_list+0x391/0x2570
[283545.922107]  ? __kasan_check_read+0x11/0x20
[283545.922113]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
[283545.922118]  ? _raw_write_lock_nested+0x41/0xe0
[283545.922123]  __blk_mq_sched_dispatch_requests+0x1dd/0x1620
[283545.922127]  ? bio_kmalloc+0x23/0x40
[283545.923317]  ? sr_check_events+0x1bc/0xa50
[283545.924738]  ? cdrom_check_events+0x60/0x120
[283545.925485]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
[283545.925491]  ? _raw_spin_lock+0x82/0xf0
[283545.925495]  ? __pfx__raw_spin_lock+0x10/0x10
[283545.925498]  ? policy_nodemask+0x239/0x360
[283545.925504]  blk_mq_sched_dispatch_requests+0xb9/0x100
[283545.925510]  blk_mq_run_hw_queue+0x32f/0x560
[283545.925514]  blk_execute_rq+0x1a6/0x3d0
[283545.925518]  ? __pfx_bio_add_pc_page+0x10/0x10
[283545.925521]  ? __pfx_blk_execute_rq+0x10/0x10
[283545.925526]  scsi_execute_cmd+0x39d/0xcc0
[283545.925530]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[283545.925533]  ? __pfx_scsi_execute_cmd+0x10/0x10
[283545.925536]  ? _raw_spin_lock+0x82/0xf0
[283545.925539]  ? __kasan_check_write+0x14/0x30
[283545.925542]  ? enqueue_timer+0xec/0x5b0
[283545.925548]  sr_check_events+0x1bc/0xa50
[283545.925552]  ? __pfx_sr_check_events+0x10/0x10
[283545.925554]  ? __kasan_check_write+0x14/0x30
[283545.925558]  ? _raw_spin_lock_irqsave+0x96/0x110
[283545.925561]  ? rt_mutex_post_schedule+0x118/0x160
[283545.925580]  ? add_timer_global+0x90/0xd0
[283545.925584]  cdrom_check_events+0x60/0x120
[283545.925587]  sr_block_check_events+0xab/0xe0
[283545.925590]  disk_check_events+0xbb/0x3f0
[283545.925594]  ? wake_up_process+0x15/0x30
[283545.925597]  ? kick_pool+0x26c/0x680
[283545.926093]  disk_events_workfn+0x19/0x30
[283545.926100]  process_one_work+0x5f7/0x1060
[283545.926104]  ? __kasan_check_write+0x14/0x30
[283545.926109]  worker_thread+0x779/0x1200
[283545.926112]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[283545.926117]  kthread+0x2b5/0x390
[283545.926121]  ? __pfx_worker_thread+0x10/0x10
[283545.926124]  ? __pfx_kthread+0x10/0x10
[283545.926128]  ret_from_fork+0x43/0x90
[283545.926132]  ? __pfx_kthread+0x10/0x10
[283545.926136]  ret_from_fork_asm+0x1a/0x30
[283545.926141]  </TASK>


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: watchdog: BUG: soft lockup
  2024-12-23  6:32 watchdog: BUG: soft lockup wzs
@ 2025-01-02 16:10 ` Petr Mladek
  2025-01-08 17:27 ` Doug Anderson
  1 sibling, 0 replies; 4+ messages in thread
From: Petr Mladek @ 2025-01-02 16:10 UTC (permalink / raw)
  To: wzs
  Cc: dianders, tglx, liusong, akpm, kernelfans, deller, npiggin,
	tsbogend, James.Bottomley, jan.kiszka, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, yaoma

On Mon 2024-12-23 14:32:11, wzs wrote:
> Hello,
> when fuzzing the Linux kernel,
> I triggered many "watch: BUG: soft lockup" warnings.
> I am not sure whether this is an issue with the kernel or with the
> fuzzing program I ran.
> (The same fuzzing program, when tested on kernel versions from
> Linux-6.7.0 to 6.12.0, triggers the 'watchdog: BUG: soft lockup'
> warning on some versions, while others do not. Linux 6.12.0 is the
> latest stable release where this error occurs.)

I am not sure if I understand the above correctly.

Did you test also kernel versions older than 6.7.0 or newer then 6.12.0?

Was the result random? Like for example, did you saw the problem
on 6.7.0 and 6.12.0 but not on 6.9.0?

Did you try to run the test more times with the same version of the
kernel? Was it reproducible?

I guess that the problem is in all versions but it is not 100%
reproducible. So that the softlockup happens or does not happen by chance.

> The bug information I provided below is from the Linux-6.12.0 kernel.
> If you need bug information from other versions, I would be happy to provide it.
> 
> kernel config :https://pastebin.com/i4LPXNAN
> console output :https://pastebin.com/uKVpvJ78

It would be great to see at least three logs. It might help to find
a common pattern.

Also it would be great to provide a full log if possible. Or at least
the boot part and few more lines before the crash.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: watchdog: BUG: soft lockup
  2024-12-23  6:32 watchdog: BUG: soft lockup wzs
  2025-01-02 16:10 ` Petr Mladek
@ 2025-01-08 17:27 ` Doug Anderson
  2025-01-13  8:27   ` wzs
  1 sibling, 1 reply; 4+ messages in thread
From: Doug Anderson @ 2025-01-08 17:27 UTC (permalink / raw)
  To: wzs
  Cc: tglx, liusong, akpm, pmladek, kernelfans, deller, npiggin,
	tsbogend, James.Bottomley, jan.kiszka, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, yaoma

Hi,

On Sun, Dec 22, 2024 at 10:32 PM wzs <wangzhengshu39@gmail.com> wrote:
>
> Hello,
> when fuzzing the Linux kernel,
> I triggered many "watch: BUG: soft lockup" warnings.
> I am not sure whether this is an issue with the kernel or with the
> fuzzing program I ran.
> (The same fuzzing program, when tested on kernel versions from
> Linux-6.7.0 to 6.12.0, triggers the 'watchdog: BUG: soft lockup'
> warning on some versions, while others do not. Linux 6.12.0 is the
> latest stable release where this error occurs.)
>
> The bug information I provided below is from the Linux-6.12.0 kernel.
> If you need bug information from other versions, I would be happy to provide it.
>
> kernel config :https://pastebin.com/i4LPXNAN
> console output :https://pastebin.com/uKVpvJ78

IMO it's nearly always a bug if userspace can cause the kernel to soft
lockup. I'd expect this isn't a bug in the soft lockup detector but a
problem in whatever part of the kernel you're fuzzing. For some
details of the soft lockup detector, see
`Documentation/admin-guide/lockup-watchdogs.rst`.

Presumably you're fuzzing the kernel in a way that causes it to enter
a big loop while preemption is disabled, or something like that.
Presumably the kernel should be detecting something invalid that
userspace did and that would keep it from looping so long.

I tried looking at your pastebin and probably what's going on is
somewhere hidden in there, but unfortunately the beginning of the logs
are a bit jumbled since it looks like the RCU warning and the soft
lockup warning happened at about the same time and their stuff is
jumbled. There's also a lot of tasks to go through. Honestly, it's
probably less work just to look at whatever you were trying to fuzz to
help you pinpoint the problem.

I'll also note that you seem to be using KASAN and are running in a
virtual machine. It's not inconceivable that's contributing to your
problems. KASAN makes things _a lot_ slower and a VM may be getting
its time stolen by the host.

-Doug

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: watchdog: BUG: soft lockup
  2025-01-08 17:27 ` Doug Anderson
@ 2025-01-13  8:27   ` wzs
  0 siblings, 0 replies; 4+ messages in thread
From: wzs @ 2025-01-13  8:27 UTC (permalink / raw)
  To: Doug Anderson
  Cc: tglx, liusong, akpm, pmladek, kernelfans, deller, npiggin,
	tsbogend, James.Bottomley, jan.kiszka, linux-kernel, linux-mips,
	linux-parisc, linuxppc-dev, yaoma

Thanks for the tip!

Doug Anderson <dianders@chromium.org> 于2025年1月9日周四 01:33写道：
>
> Hi,
>
> On Sun, Dec 22, 2024 at 10:32 PM wzs <wangzhengshu39@gmail.com> wrote:
> >
> > Hello,
> > when fuzzing the Linux kernel,
> > I triggered many "watch: BUG: soft lockup" warnings.
> > I am not sure whether this is an issue with the kernel or with the
> > fuzzing program I ran.
> > (The same fuzzing program, when tested on kernel versions from
> > Linux-6.7.0 to 6.12.0, triggers the 'watchdog: BUG: soft lockup'
> > warning on some versions, while others do not. Linux 6.12.0 is the
> > latest stable release where this error occurs.)
> >
> > The bug information I provided below is from the Linux-6.12.0 kernel.
> > If you need bug information from other versions, I would be happy to provide it.
> >
> > kernel config :https://pastebin.com/i4LPXNAN
> > console output :https://pastebin.com/uKVpvJ78
>
> IMO it's nearly always a bug if userspace can cause the kernel to soft
> lockup. I'd expect this isn't a bug in the soft lockup detector but a
> problem in whatever part of the kernel you're fuzzing. For some
> details of the soft lockup detector, see
> `Documentation/admin-guide/lockup-watchdogs.rst`.
>
> Presumably you're fuzzing the kernel in a way that causes it to enter
> a big loop while preemption is disabled, or something like that.
> Presumably the kernel should be detecting something invalid that
> userspace did and that would keep it from looping so long.
>
> I tried looking at your pastebin and probably what's going on is
> somewhere hidden in there, but unfortunately the beginning of the logs
> are a bit jumbled since it looks like the RCU warning and the soft
> lockup warning happened at about the same time and their stuff is
> jumbled. There's also a lot of tasks to go through. Honestly, it's
> probably less work just to look at whatever you were trying to fuzz to
> help you pinpoint the problem.
>
> I'll also note that you seem to be using KASAN and are running in a
> virtual machine. It's not inconceivable that's contributing to your
> problems. KASAN makes things _a lot_ slower and a VM may be getting
> its time stolen by the host.
>
> -Doug


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-01-13  8:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-23  6:32 watchdog: BUG: soft lockup wzs
2025-01-02 16:10 ` Petr Mladek
2025-01-08 17:27 ` Doug Anderson
2025-01-13  8:27   ` wzs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).