* [REGRESSION] intermittent psi_avgs_work soft lockup
@ 2025-08-03 22:14 Chris Bainbridge
2025-08-04 13:32 ` Johannes Weiner
0 siblings, 1 reply; 4+ messages in thread
From: Chris Bainbridge @ 2025-08-03 22:14 UTC (permalink / raw)
To: linux-kernel
Cc: surenb, bsegall, dietmar.eggemann, mingo, hannes, juri.lelli,
mgorman, peterz, rostedt, vschneid, vincent.guittot, regressions
Hello,
I'm getting intermittent soft lockups with recent kernel builds. This is
a new error that I haven't seen before.
An example lockup from 6.16.0-08685-g260f6f4fda93:
[39389.154516] iwlwifi 0000:01:00.0: Queue 3 is stuck 4977 5129
[39400.400429] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:1751316]
[39400.400433] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat x_tables nf_tables br_netfilter bridge stp llc ccm overlay qrtr rfcomm cmac algif_hash algif_skcipher af_alg bnep binfmt_misc ext4 mbcache jbd2 nls_ascii nls_cp437 vfat fat snd_hda_codec_generic snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common iwlmvm snd_hda_intel snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn kvm_amd snd_hda_codec uvcvideo snd_soc_core mac80211 snd_usb_audio btusb snd_intel_dspcfg snd_compress videobuf2_vmalloc snd_usbmidi_lib btrtl libarc4 videobuf2_memops kvm snd_rawmidi snd_hwdep snd_pci_acp6x btintel uvc snd_seq_device snd_hda_core snd_pci_acp5x btbcm videobuf2_v4l2 irqbypass snd_pcm btmtk iwlwifi snd_rn_pci_acp3x sg videodev rapl snd_timer videobuf2_common wmi_bmof ee1004 snd_acp_config pcspkr bluetooth cfg80211 snd_soc_acpi k10temp snd mc snd_pci_acp3x soundcore ccp rfkill ac
[39400.400478] battery acpi_tad amd_pmc joydev evdev msr parport_pc ppdev lp parport efi_pstore fuse nvme_fabrics configfs nfnetlink efivarfs autofs4 crc32c_cryptoapi btrfs blake2b_generic xor raid6_pq hid_microsoft ff_memless hid_cmedia r8153_ecm cdc_ether usbnet r8152 mii libphy mdio_bus usbhid dm_crypt dm_mod sd_mod uas usb_storage scsi_mod scsi_common amdgpu drm_client_lib i2c_algo_bit drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched hid_multitouch drm_display_helper ucsi_acpi hid_generic drm_kms_helper typec_ucsi sp5100_tco roles xhci_pci cec i2c_hid_acpi watchdog typec xhci_hcd amd_sfh i2c_hid rc_core nvme i2c_piix4 thunderbolt video usbcore ghash_clmulni_intel serio_raw hid crc16 nvme_core fan i2c_smbus usb_common button wmi drm aesni_intel
[39400.400514] irq event stamp: 28884
[39400.400515] hardirqs last enabled at (28883): [<ffffffffb6200dc6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
[39400.400521] hardirqs last disabled at (28884): [<ffffffffb71185fa>] sysvec_apic_timer_interrupt+0xa/0xc0
[39400.400526] softirqs last enabled at (28882): [<ffffffffb64f934d>] __irq_exit_rcu+0xcd/0x140
[39400.400530] softirqs last disabled at (28877): [<ffffffffb64f934d>] __irq_exit_rcu+0xcd/0x140
[39400.400533] CPU: 2 UID: 0 PID: 1751316 Comm: kworker/2:1 Not tainted 6.16.0-08685-g260f6f4fda93 #489 PREEMPT(voluntary)
[39400.400535] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
[39400.400537] Workqueue: events psi_avgs_work
[39400.400541] RIP: 0010:collect_percpu_times+0x2d5/0x440
[39400.400543] Code: 00 00 00 00 00 41 8b 0c 94 48 0f af c8 48 01 4c d5 00 48 83 c2 01 48 83 fa 06 75 e9 8d 53 01 e9 aa fd ff ff f3 90 48 8b 3c 24 <48> 8b 14 fd 20 d0 6d b7 48 01 c2 8b 12 f6 c2 01 0f 84 ab fe ff ff
[39400.400545] RSP: 0018:ffffc06b07823cf8 EFLAGS: 00000202
[39400.400546] RAX: ffffffffb82abc80 RBX: ffffe06aff48f440 RCX: 0000000000000006
[39400.400548] RDX: 00000000000014b7 RSI: ffffffffb76b7293 RDI: 000000000000000d
[39400.400548] RBP: ffffc06b07823d70 R08: 0000000000000001 R09: 0000000000000000
[39400.400549] R10: 0000000000000001 R11: 0000000000000003 R12: ffffc06b07823d50
[39400.400550] R13: ffffe06aff48f454 R14: 000000000000000d R15: ffffffffb82abc80
[39400.400551] FS: 0000000000000000(0000) GS:ffff9d9f4e072000(0000) knlGS:0000000000000000
[39400.400552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39400.400553] CR2: 00000c2100382000 CR3: 0000000387c3b000 CR4: 0000000000750ef0
[39400.400554] PKRU: 55555554
[39400.400555] Call Trace:
[39400.400557] <TASK>
[39400.400571] psi_avgs_work+0x56/0xe0
[39400.400576] process_one_work+0x22b/0x5b0
[39400.400588] worker_thread+0x1d6/0x3c0
[39400.400592] ? bh_worker+0x260/0x260
[39400.400594] kthread+0x115/0x260
[39400.400599] ? kthreads_online_cpu+0x120/0x120
[39400.400603] ret_from_fork+0x231/0x2a0
[39400.400606] ? kthreads_online_cpu+0x120/0x120
[39400.400610] ret_from_fork_asm+0x11/0x20
[39400.400621] </TASK>
[39400.404429] watchdog: BUG: soft lockup - CPU#4 stuck for 21s! [kworker/4:0:1751752]
It appears to happen randomly when I have been away from the laptop for
some time and return, or sometimes if I leave it overnight. It also
looks like it occurs on 2% of system boots. Bisecting with such a low
failure probability takes a long time. I haven't identified the bad
commit yet, but I think I have narrowed it down to between v6.16-rc6
(good) and v6.16-rc6-79-g44e4e0297c3c (bad). At this rate, I should have
a more exact bisect result within a week.
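
To put a rough number on why this bisect is slow: a "bad" verdict needs only one lockup, but a "good" verdict needs enough clean boots to rule out bad luck. The sketch below is a back-of-the-envelope estimate, not part of the original report. It assumes the 2% per-boot rate quoted above, treats boots as independent, reads the "-79-" in v6.16-rc6-79-g44e4e0297c3c as 79 commits in the range, and picks a 5% tolerance for mislabelling a bad kernel as good. The function names are made up for illustration.

```python
import math


def boots_needed(p_fail: float, max_miss: float) -> int:
    """Clean boots required before calling a kernel 'good'.

    A bad kernel survives n boots without a lockup with probability
    (1 - p_fail) ** n. Return the smallest n that pushes this
    probability down to max_miss or below.
    """
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_fail))


def bisect_steps(n_commits: int) -> int:
    """Approximate number of kernels a bisection over n_commits must test."""
    return math.ceil(math.log2(n_commits))


if __name__ == "__main__":
    p = 0.02       # per-boot lockup rate reported above
    commits = 79   # assumption: taken from the "-79-" in the bad tag
    per_step = boots_needed(p, 0.05)  # assumption: 5% tolerance per step
    steps = bisect_steps(commits)
    print(f"{per_step} clean boots per 'good' verdict, about {steps} steps")
```

Under these assumptions each "good" verdict costs 149 clean boots, and the range takes about 7 bisection steps, which is consistent with the estimate of roughly a week.
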
#regzbot introduced: v6.16-rc6..v6.16-rc6-79-g44e4e0297c3c
* Re: [REGRESSION] intermittent psi_avgs_work soft lockup
From: Johannes Weiner @ 2025-08-04 13:32 UTC (permalink / raw)
To: Chris Bainbridge
Cc: linux-kernel, surenb, bsegall, dietmar.eggemann, mingo,
juri.lelli, mgorman, peterz, rostedt, vschneid, vincent.guittot,
regressions
Hi Chris,
On Sun, Aug 03, 2025 at 11:14:42PM +0100, Chris Bainbridge wrote:
> Hello,
>
> I'm getting intermittent soft lockups with recent kernel builds. This is
> a new error that I haven't seen before.
This smells of the seqlock re-init problem in 570c8efd5eb7. Could you
try to see if the below patch fixes it for you?
https://lore.kernel.org/lkml/20250716104050.GR1613200@noisy.programming.kicks-ass.net/
We probably want Cc: stable on this patch now that 6.16 is released.
Thanks
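
The thread does not spell out the mechanism, so the following is only a toy model. It assumes that "seqlock re-init" means a sequence counter being reset to zero while a write section is still open. In that case the writer's closing increment leaves the counter odd with no writer present, and a reader that waits for an even value never gets one. That would match a worker spinning in collect_percpu_times as in the trace above. The class and method names are hypothetical and are not the kernel's seqcount API. The spin limit exists only so the demo terminates.

```python
from typing import Optional, Tuple


class SeqCount:
    """Toy sequence counter: an odd value means a write section is open."""

    def __init__(self) -> None:
        self.sequence = 0

    def write_begin(self) -> None:
        self.sequence += 1  # even -> odd

    def write_end(self) -> None:
        self.sequence += 1  # odd -> even, if nothing interfered

    def reinit(self) -> None:
        self.sequence = 0  # unconditional reset, ignores any open writer

    def read_begin(self, max_spins: int = 1000) -> Optional[int]:
        """Wait for an even counter; return None after max_spins tries."""
        for _ in range(max_spins):
            if self.sequence % 2 == 0:
                return self.sequence
        return None


def demo() -> Tuple[Optional[int], Optional[int]]:
    healthy = SeqCount()
    healthy.write_begin()
    healthy.write_end()
    ok = healthy.read_begin()    # counter is even, reader proceeds

    broken = SeqCount()
    broken.write_begin()         # writer enters: 0 -> 1
    broken.reinit()              # reset under the writer: 1 -> 0
    broken.write_end()           # writer leaves: 0 -> 1, odd with no writer
    stuck = broken.read_begin()  # no even value ever appears
    return ok, stuck


if __name__ == "__main__":
    print(demo())
```

In the toy model the second reader gives up only because of the spin limit. A reader without such a limit would loop until a watchdog noticed.
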
* Re: [REGRESSION] intermittent psi_avgs_work soft lockup
From: Johannes Weiner @ 2025-08-04 16:54 UTC (permalink / raw)
To: Chris Bainbridge
Cc: linux-kernel, surenb, bsegall, dietmar.eggemann, mingo,
juri.lelli, mgorman, peterz, rostedt, vschneid, vincent.guittot,
regressions
On Mon, Aug 04, 2025 at 09:32:45AM -0400, Johannes Weiner wrote:
> We probably want Cc: stable on this patch now that 6.16 is released.
Never mind, this didn't go into v6.16. The v6.16-rc6 tags made it look
like it did.
* Re: [REGRESSION] intermittent psi_avgs_work soft lockup
From: Chris Bainbridge @ 2025-08-04 22:34 UTC (permalink / raw)
To: Johannes Weiner
Cc: linux-kernel, surenb, bsegall, dietmar.eggemann, mingo,
juri.lelli, mgorman, peterz, rostedt, vschneid, vincent.guittot,
regressions
On Mon, Aug 04, 2025 at 09:32:40AM -0400, Johannes Weiner wrote:
> Hi Chris,
>
> On Sun, Aug 03, 2025 at 11:14:42PM +0100, Chris Bainbridge wrote:
> > Hello,
> >
> > I'm getting intermittent soft lockups with recent kernel builds. This is
> > a new error that I haven't seen before.
>
> This smells of the seqlock re-init problem in 570c8efd5eb7. Could you
> try to see if the below patch fixes it for you?
>
> https://lore.kernel.org/lkml/20250716104050.GR1613200@noisy.programming.kicks-ass.net/
>
> We probably want Cc: stable on this patch now that 6.16 is released.
>
> Thanks
That's the one. Thank you.
#regzbot introduced: 570c8efd5eb7
#regzbot monitor: https://lore.kernel.org/lkml/0d86c527-27a7-44d5-9ddc-f9a153f67b4d@meta.com/
#regzbot fix: 99b773d720ae