* Re: the stuttering regression in 7.0: should I have done something different? [not found] <4dd98a32-d1d6-43de-910c-7e487503177e@leemhuis.info> @ 2026-05-08 5:51 ` John Paul Adrian Glaubitz 2026-05-08 6:33 ` Thorsten Leemhuis 0 siblings, 1 reply; 7+ messages in thread From: John Paul Adrian Glaubitz @ 2026-05-08 5:51 UTC (permalink / raw) To: Thorsten Leemhuis, Greg KH, Linus Torvalds Cc: Linux kernel regressions list, LKML, Tony Rodriguez Hi Thorsten, On Thu, 2026-04-23 at 18:30 +0200, Thorsten Leemhuis wrote: > FWIW, here is the rough timeline of the regression, just to be sure we > are all on the same page: > > * The regression I'm talking about is caused by d6e152d905bdb1 > ("clockevents: Prevent timer interrupt starvation") [authored: > 2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival: > next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12 22:48:06)] > > * On Monday and thus within 24 hours of the 7.0 release the first report about > the regression came in and immediately mentioned that a revert was able > to fix things: > https://lore.kernel.org/all/68d1e9ac-2780-4be3-8ee3-0788062dd3a4@gmail.com/ > > * On Tuesday someone else confirmed the findings and mentioned that > "several users" were seeing the problem: > https://lore.kernel.org/all/aeb848aa-404a-40fb-bd41-329644623b1d@cachyos.org/ > > * A few hours later (aka within 24 hours of the first report) Thomas had > a rough fix ready https://lore.kernel.org/all/87340xfeje.ffs@tglx/ (yeah!) 
> > * On Thursday the fix was committed to the tip tree: > https://lore.kernel.org/all/177636758252.1323100.5283878386670888513.tip-bot2@tip-bot2/ > > * On Sunday I asked when the fix was going to be mainlined (with Linus > in CC) -- I feared Greg would soon start preparing 7.0.1-rc1 and I > wanted to ensure the fix was included there: > https://lore.kernel.org/all/5cbb14d8-46f9-4197-917f-51da852d7500@leemhuis.info/ > > * On Monday morning (UTC) mingo submitted a PR with the fix: > https://lore.kernel.org/all/aeXYPt1FEbFRZNJf@gmail.com/ > > * On Monday Greg released 7.0.1-rc1 without the fix -- and a backport of > the culprit was in the -rc1 of various earlier series. Thomas quickly > told the stable team to not backport the culprit before the fix was > mainlined https://lore.kernel.org/all/87pl3ten5y.ffs@tglx/ > > * On Monday night Linus merged the PR from mingo as 4096fd0e8eaea1 > ("clockevents: Add missing resets of the next_event_forced flag") > [authored: 2026-04-14 22:55:01; committed: 2026-04-16 21:22:04; next > arrival: next-20260417; merged: 2026-04-21 00:30:08; v7.0-post] > > * On Tuesday morning I wrote a mail to Greg about including the fix in > 7.0.1; Thomas round about the same time provided the necessary backport, > which Greg then included out-of-band: > https://lore.kernel.org/all/2026042105-malformed-probation-232b@gregkh/ > https://lore.kernel.org/all/87jyu0de2c.ffs@tglx/ > > * v7.0.1 was released on Wednesday, 2026-04-22 13:32:23 Tony Rodriguez from the SPARC community has observed the regression on SPARC as well and proposed a fix to address it [1]. Not sure whether he has retested on the latest commit of Linus' tree yet. Tony, can you verify that 4096fd0e8eaea1 fixes the issue for you? Adrian > [1] https://github.com/sparclinux/issues/issues/79 -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: the stuttering regression in 7.0: should I have done something different? 2026-05-08 5:51 ` the stuttering regression in 7.0: should I have done something different? John Paul Adrian Glaubitz @ 2026-05-08 6:33 ` Thorsten Leemhuis [not found] ` <D5D19776-C809-4284-9417-F9A860877B98@gmail.com> 0 siblings, 1 reply; 7+ messages in thread From: Thorsten Leemhuis @ 2026-05-08 6:33 UTC (permalink / raw) To: John Paul Adrian Glaubitz, Greg KH, Linus Torvalds Cc: Linux kernel regressions list, LKML, Tony Rodriguez On 5/8/26 07:51, John Paul Adrian Glaubitz wrote: > On Thu, 2026-04-23 at 18:30 +0200, Thorsten Leemhuis wrote: >> FWIW, here is the rough timeline of the regression, just to be sure we >> are all on the same page: >> >> * The regression I'm talking about is caused by d6e152d905bdb1 >> ("clockevents: Prevent timer interrupt starvation") [authored: >> 2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival: >> next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12 22:48:06)] > [...] > Tony Rodriguez from the SPARC community has observed the regression on SPARC as well > and proposed a fix to address it [1]. Not sure whether he has retested on the latest > commit of Linus' tree yet. > > Tony, can you verify that 4096fd0e8eaea1 fixes the issue for you? > >> [1] https://github.com/sparclinux/issues/issues/79 It's likely a different regression, as that report's title says that v7.0.1, v7.0.2, v7.0.3, and v7.1-rc1 are affected, which all contain the fix, aka 4096fd0e8eaea1. Reporting in a new thread is likely best, as the authors of the culprit are not even CCed here. Ciao, Thorsten ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <D5D19776-C809-4284-9417-F9A860877B98@gmail.com>]
* Re: the stuttering regression in 7.0: should I have done something different? [not found] ` <D5D19776-C809-4284-9417-F9A860877B98@gmail.com> @ 2026-05-08 7:50 ` Thorsten Leemhuis 2026-05-08 20:15 ` Tony Rodriguez 0 siblings, 1 reply; 7+ messages in thread From: Thorsten Leemhuis @ 2026-05-08 7:50 UTC (permalink / raw) To: Tony Rodriguez Cc: John Paul Adrian Glaubitz, Greg KH, Linus Torvalds, Linux kernel regressions list, LKML, Thomas Gleixner [+tglx so he knows about it; details about the problem that Tony faces can be found in https://github.com/sparclinux/issues/issues/79 ] On 5/8/26 09:38, Tony Rodriguez wrote: > I still don't believe this is fixed upstream as of v7.0.3 and v7.1-rc1, Yes and no. It looks like d6e152d905bdb1 ("clockevents: Prevent timer interrupt starvation") causes two regressions. Thomas fixed one with 4096fd0e8eaea1 ("clockevents: Add missing resets of the next_event_forced flag") -- and feedback shows that it definitely solved the problem for quite a few people. If that's not the case for you, then you seem to face a different problem caused by the same change. Happens, that's life sometimes. Ciao, Thorsten > only when my patch is applied does the SPARC64 S7-2 system become stable > again. I also tested my patch with v7.0.4 and it works there as well. > Will perform additional tests without my fix against v7.0.4 and v7.1-rc2 > later today to revalidate the regression (USA Pacific time). 
> > Tony Rodriguez > www.linkedin.com/in/unixpro1970 > >> On May 7, 2026, at 11:33 PM, Thorsten Leemhuis <linux@leemhuis.info> >> wrote: >> >> On 5/8/26 07:51, John Paul Adrian Glaubitz wrote: >>> On Thu, 2026-04-23 at 18:30 +0200, Thorsten Leemhuis wrote: >>>> FWIW, here is the rough timeline of the regression, just to be sure we >>>> are all on the same page: >>>> >>>> * The regression I'm talking about is caused by d6e152d905bdb1 >>>> ("clockevents: Prevent timer interrupt starvation") [authored: >>>> 2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival: >>>> next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12 22:48:06)] >>> [...] >>> Tony Rodriguez from the SPARC community has observed the regression >>> on SPARC as well >>> and proposed a fix to address it [1]. Not sure whether he has >>> retested on the latest >>> commit of Linus' tree yet. >>> >>> Tony, can you verify that 4096fd0e8eaea1 fixes the issue for you? >>> >>>> [1] https://github.com/sparclinux/issues/issues/79 >> >> It's likely a different regression, as that report's title says that >> v7.0.1, v7.0.2, v7.0.3, and v7.1-rc1 are affected, which all contain the >> fix, aka 4096fd0e8eaea1. Reporting in a new thread is likely best, as >> the authors of the culprit are not even CCed here. >> >> Ciao, Thorsten ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: the stuttering regression in 7.0: should I have done something different? 2026-05-08 7:50 ` Thorsten Leemhuis @ 2026-05-08 20:15 ` Tony Rodriguez 2026-05-08 20:21 ` Tony Rodriguez 2026-05-10 21:29 ` Thomas Gleixner 0 siblings, 2 replies; 7+ messages in thread From: Tony Rodriguez @ 2026-05-08 20:15 UTC (permalink / raw) To: Thorsten Leemhuis Cc: John Paul Adrian Glaubitz, Greg KH, Linus Torvalds, Linux kernel regressions list, LKML, Thomas Gleixner Just confirmed on my end today. This regression also impacts both SPARC64 S7-2 and SPARC64 T7-1 on v7.0.4 and v7.1-rc2. Different systems using the same exact kernels. ** Please see points (A1) (A2) (B1) (B2) Once again, I am not experiencing such issues when "my patch" (link below) is added to address this regression. https://github.com/sparclinux/issues/issues/79#issuecomment-4362173884 Output demonstrating issues for SPARC64 S7-2 and T7-1 systems (without my regression patch): PS - On May 2nd, 2026 at 9:42 PM: I also sent an email to Thomas Gleixner regarding this issue. I will be happy to validate any patches from your end regarding this issue, as time permits me to do so. Best regards, Tony Rodriguez A1) SPARC64 S7-2: Kernel v7.1.0-rc2 uname -a Linux s7t7-debian-test 7.1.0-rc2-test01 #1 SMP Fri May 8 10:02:12 PDT 2026 sparc64 GNU/Linux cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-7.1.0-rc2-test01 root=UUID=ce937a4b-126a-41bd-a54b-03a424421086 ro console=ttyHV0,9600n81 systemd.log_level=info systemd.show_status=1 systemd.journald.forward_to_console=0 plymouth.enable=0 quiet [ 243.267359] INFO: task kworker/u512:1:706 blocked for more than 120 seconds. [ 243.281208] Not tainted 7.1.0-rc2-test01 #1 [ 243.290583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.306349] INFO: task kworker/127:1:714 blocked for more than 120 seconds. [ 243.320106] Not tainted 7.1.0-rc2-test01 #1 [ 243.329476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 364.099350] INFO: task kworker/u512:1:706 blocked for more than 241 seconds. [ 364.113199] Not tainted 7.1.0-rc2-test01 #1 [ 364.122585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 364.138328] INFO: task kworker/127:1:714 blocked for more than 241 seconds. [ 364.152086] Not tainted 7.1.0-rc2-test01 #1 [ 364.161470] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 485.295360] INFO: task kworker/u512:1:706 blocked for more than 362 seconds. [ 485.309209] Not tainted 7.1.0-rc2-test01 #1 [ 485.318581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 485.334345] INFO: task kworker/127:1:714 blocked for more than 362 seconds. [ 485.348099] Not tainted 7.1.0-rc2-test01 #1 [ 485.357467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 726.849598] INFO: task kworker/u512:1:706 blocked for more than 604 seconds. [ 726.863444] Not tainted 7.1.0-rc2-test01 #1 [ 726.872832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 726.888573] INFO: task kworker/127:1:714 blocked for more than 604 seconds. [ 726.902340] Not tainted 7.1.0-rc2-test01 #1 [ 726.911708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. sudo dmesg | grep -iE block | grep -iE worker [ 243.267359] INFO: task kworker/u512:1:706 blocked for more than 120 seconds. [ 243.306349] INFO: task kworker/127:1:714 blocked for more than 120 seconds. [ 364.099350] INFO: task kworker/u512:1:706 blocked for more than 241 seconds. [ 364.138328] INFO: task kworker/127:1:714 blocked for more than 241 seconds. [ 485.295360] INFO: task kworker/u512:1:706 blocked for more than 362 seconds. [ 485.334345] INFO: task kworker/127:1:714 blocked for more than 362 seconds. [ 605.849474] INFO: task kworker/u512:1:706 blocked for more than 483 seconds. [ 605.888461] INFO: task kworker/127:1:714 blocked for more than 483 seconds. 
sudo poweroff or sudo reboot NOTE(S): Random hangs during startup. Also, hangs during shutdown/reboot process. ------------------------------------------------------------------------------------------- A2) SPARC64 S7-2: Kernel v7.0.4 [ OK ] Finished e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots. Debian GNU/Linux forky/sid s7t7-debian-test ttyHV0 s7t7-debian-test login: tonyr Password: Linux s7t7-debian-test 7.0.4-test01 #1 SMP Fri May 8 09:27:58 PDT 2026 sparc64 [...] Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law. login: timed [ 114.687722] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 114.699319] rcu: 67-...!: (240 GPs behind) idle=e9c0/0/0x0 softirq=174/174 fqs=0 (false positive?) [ 114.717370] rcu: 102-...!: (73 GPs behind) idle=77e0/0/0x0 softirq=286/287 fqs=0 (false positive?) [ 114.735419] rcu: 111-...!: (52 GPs behind) idle=11d8/0/0x0 softirq=860/861 fqs=0 (false positive?) [ 114.753489] rcu: (detected by 11, t=5268 jiffies, g=4457, q=528 ncpus=128) [ 114.767628] rcu: rcu_sched kthread timer wakeup didn't happen for 5270 jiffies! g4457 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 114.789647] rcu: Possible timer handling issue on cpu=105 timer-softirq=98 [ 114.803535] rcu: rcu_sched kthread starved for 5280 jiffies! g4457 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=105 [ 114.824201] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 114.842080] rcu: RCU grace-period kthread stack dump: [ 114.852221] rcu: Stack dump where RCU GP kthread last ran: [ 135.867723] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 135.879326] rcu: 65-...!: (1 GPs behind) idle=35b0/0/0x0 softirq=483/484 fqs=0 (false positive?) [ 135.897024] rcu: 67-...!: (241 GPs behind) idle=ecc0/0/0x0 softirq=174/174 fqs=0 (false positive?) [ 135.915082] rcu: 102-...!: (74 GPs behind) idle=7800/0/0x0 softirq=286/287 fqs=0 (false positive?) 
[ 135.933123] rcu: 111-...!: (53 GPs behind) idle=1238/0/0x0 softirq=860/861 fqs=0 (false positive?) [ 135.951184] rcu: (detected by 64, t=5272 jiffies, g=4461, q=752 ncpus=128) [ 135.965398] rcu: rcu_sched kthread timer wakeup didn't happen for 5275 jiffies! g4461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 135.987393] rcu: Possible timer handling issue on cpu=105 timer-softirq=98 [ 136.001287] rcu: rcu_sched kthread starved for 5285 jiffies! g4461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=105 [ 136.021944] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 136.039829] rcu: RCU grace-period kthread stack dump: [ 136.049971] rcu: Stack dump where RCU GP kthread last ran: NOTE(S): Unable to login and random hangs during system startup. ------------------------------------------------------------------------- B1) SPARC64 T7-1: Kernel v7.1.0-rc2 lscpu;uname -a Architecture: sparc64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Big Endian CPU(s): 256 On-line CPU(s) list: 0-255 Model name: SPARC-M7 Thread(s) per core: 8 Core(s) per socket: 32 Socket(s): 1 Flags: sun4v Caches (sum of all): L1d: 4 MiB (256 instances) L1i: 4 MiB (256 instances) L2: 64 MiB (256 instances) Linux s7t7-debian-test 7.1.0-rc2-test01 #1 SMP Fri May 8 10:02:12 PDT 2026 sparc64 GNU/Linux [ 526.766867] rcu: 8-...!: (806 GPs behind) idle=069c/0/0x1 softirq=682/682 fqs=0 [ 526.781618] rcu: 22-...!: (0 ticks this GP) idle=7b40/0/0x0 softirq=739/739 fqs=0 (false positive?) [ 526.799841] rcu: 89-...!: (770 GPs behind) idle=7800/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 526.817901] rcu: 112-...!: (225 GPs behind) idle=c0c8/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 526.836131] rcu: 189-...!: (0 ticks this GP) idle=8ef0/0/0x0 softirq=1016/1016 fqs=0 (false positive?) [ 526.854885] rcu: 204-...!: (0 ticks this GP) idle=5d20/0/0x0 softirq=774/774 fqs=0 (false positive?) 
[ 526.873278] rcu: 219-...!: (225 GPs behind) idle=d580/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 526.891508] rcu: 226-...!: (233 GPs behind) idle=ec08/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 526.910079] rcu: (detected by 157, t=5289 jiffies, g=5989, q=5339 ncpus=256) [ 526.924916] rcu: rcu_sched kthread timer wakeup didn't happen for 5295 jiffies! g5989 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 526.946930] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 526.960818] rcu: rcu_sched kthread starved for 5302 jiffies! g5989 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 526.981300] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 526.999182] rcu: RCU grace-period kthread stack dump: [ 527.009301] rcu: Stack dump where RCU GP kthread last ran: [ 548.035259] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 548.046861] rcu: 8-...!: (807 GPs behind) idle=071c/0/0x1 softirq=682/682 fqs=0 [ 548.061608] rcu: 17-...!: (0 ticks this GP) idle=00e8/0/0x0 softirq=812/812 fqs=0 (false positive?) [ 548.079831] rcu: 84-...!: (0 ticks this GP) idle=d2b0/0/0x0 softirq=797/797 fqs=0 (false positive?) [ 548.098070] rcu: 89-...!: (771 GPs behind) idle=7be8/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 548.116122] rcu: 112-...!: (226 GPs behind) idle=c110/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 548.134342] rcu: 185-...!: (0 ticks this GP) idle=45b8/0/0x0 softirq=871/871 fqs=0 (false positive?) [ 548.152759] rcu: 193-...!: (0 ticks this GP) idle=1758/0/0x0 softirq=1520/1520 fqs=0 (false positive?) [ 548.171509] rcu: 205-...!: (0 ticks this GP) idle=1e98/0/0x0 softirq=852/852 fqs=0 (false positive?) [ 548.189893] rcu: 219-...!: (226 GPs behind) idle=d5c8/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 548.208128] rcu: 226-...!: (234 GPs behind) idle=eff0/0/0x0 softirq=1189/1190 fqs=0 (false positive?) 
[ 548.226699] rcu: (detected by 115, t=5300 jiffies, g=5993, q=5539 ncpus=256) [ 548.241699] rcu: rcu_sched kthread timer wakeup didn't happen for 5303 jiffies! g5993 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 548.263704] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 548.277593] rcu: rcu_sched kthread starved for 5311 jiffies! g5993 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 548.298081] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 548.315971] rcu: RCU grace-period kthread stack dump: [ 548.326084] rcu: Stack dump where RCU GP kthread last ran: [ 569.343268] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 569.354868] rcu: 8-...!: (808 GPs behind) idle=07ac/0/0x1 softirq=682/682 fqs=0 [ 569.369617] rcu: 89-...!: (772 GPs behind) idle=8518/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 569.387674] rcu: 112-...!: (227 GPs behind) idle=c168/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 569.405894] rcu: 219-...!: (227 GPs behind) idle=d620/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 569.424128] rcu: 226-...!: (235 GPs behind) idle=f920/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 569.442700] rcu: (detected by 76, t=5276 jiffies, g=5997, q=5665 ncpus=256) [ 569.457146] rcu: rcu_sched kthread timer wakeup didn't happen for 5278 jiffies! g5997 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 569.479149] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 569.493043] rcu: rcu_sched kthread starved for 5285 jiffies! g5997 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 569.513534] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. 
[ 569.531419] rcu: RCU grace-period kthread stack dump: [ 569.541536] rcu: Stack dump where RCU GP kthread last ran: [ 590.563260] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 590.574870] rcu: 8-...!: (809 GPs behind) idle=0824/0/0x1 softirq=682/682 fqs=0 [ 590.589618] rcu: 89-...!: (773 GPs behind) idle=8850/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 590.607682] rcu: 112-...!: (228 GPs behind) idle=c198/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 590.625904] rcu: 195-...!: (0 ticks this GP) idle=7178/0/0x0 softirq=1038/1038 fqs=0 (false positive?) [ 590.644660] rcu: 207-...!: (0 ticks this GP) idle=9440/0/0x0 softirq=809/809 fqs=0 (false positive?) [ 590.663056] rcu: 219-...!: (228 GPs behind) idle=d650/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 590.681285] rcu: 226-...!: (236 GPs behind) idle=fc78/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 590.699859] rcu: (detected by 138, t=5286 jiffies, g=6001, q=5524 ncpus=256) [ 590.714623] rcu: rcu_sched kthread timer wakeup didn't happen for 5288 jiffies! g6001 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 590.736635] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 590.750524] rcu: rcu_sched kthread starved for 5296 jiffies! g6001 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 590.771021] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 590.788903] rcu: RCU grace-period kthread stack dump: [ 590.799012] rcu: Stack dump where RCU GP kthread last ran: [ 606.363275] INFO: task kworker/u1024:0:12 blocked for more than 483 seconds. [ 606.377139] Tainted: G W 7.1.0-rc2-test01 #1 [ 606.389636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 611.823259] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 611.834860] rcu: 8-...!: (810 GPs behind) idle=08bc/0/0x1 softirq=682/682 fqs=0 [ 611.849612] rcu: 89-...!: (774 GPs behind) idle=91a8/0/0x0 softirq=270/273 fqs=0 (false positive?) 
[ 611.867665] rcu: 112-...!: (229 GPs behind) idle=c1d8/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 611.885887] rcu: 205-...!: (0 ticks this GP) idle=2160/0/0x0 softirq=865/865 fqs=0 (false positive?) [ 611.904290] rcu: 219-...!: (229 GPs behind) idle=d690/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 611.922525] rcu: 226-...!: (237 GPs behind) idle=05e0/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 611.941095] rcu: (detected by 166, t=5283 jiffies, g=6005, q=5522 ncpus=256) [ 611.955789] rcu: rcu_sched kthread timer wakeup didn't happen for 5285 jiffies! g6005 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 611.977793] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 611.991685] rcu: rcu_sched kthread starved for 5292 jiffies! g6005 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 612.012174] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 612.030060] rcu: RCU grace-period kthread stack dump: [ 612.040180] rcu: Stack dump where RCU GP kthread last ran: r[ 727.195272] INFO: task kworker/u1024:0:12 blocked for more than 604 seconds. [ 727.209134] Tainted: G W 7.1.0-rc2-test01 #1 [ 727.221628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. NOTE(S): Random hangs and same messages as S7-2. Takes about 15 minutes to see the messages. --------------------------------------------------------- B2) SPARC64 T7-1 Kernel v7.0.4 [ OK ] Finished e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots. Debian GNU/Linux forky/sid s7t7-debian-test ttyHV0 s7t7-debian-test login: tonyr Password: Linux s7t7-debian-test 7.0.4-test01 #1 SMP Fri May 8 09:27:58 PDT 2026 sparc64 [..] Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law. [ 79.468871] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 79.480483] rcu: 114-...!: (424 GPs behind) idle=8760/0/0x0 softirq=126/126 fqs=0 (false positive?) 
[ 79.498713] rcu: (detected by 90, t=5259 jiffies, g=3769, q=818 ncpus=256) [ 79.512702] rcu: rcu_sched kthread timer wakeup didn't happen for 5260 jiffies! g3769 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 79.534808] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 79.548516] rcu: rcu_sched kthread starved for 5267 jiffies! g3769 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 79.568838] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 79.586709] rcu: RCU grace-period kthread stack dump: [ 79.596867] rcu: Stack dump where RCU GP kthread last ran: [ 100.612874] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 100.624477] rcu: 114-...!: (425 GPs behind) idle=88f0/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 100.642709] rcu: 157-...!: (0 ticks this GP) idle=4c08/0/0x0 softirq=122/122 fqs=0 (false positive?) [ 100.661106] rcu: (detected by 3, t=5264 jiffies, g=3773, q=1046 ncpus=256) [ 100.675155] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 100.697211] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 100.710934] rcu: rcu_sched kthread starved for 5276 jiffies! g3773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 100.731244] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 100.749125] rcu: RCU grace-period kthread stack dump: [ 100.759255] rcu: Stack dump where RCU GP kthread last ran: login: ti[ 121.776867] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 121.788467] rcu: 114-...!: (426 GPs behind) idle=8a20/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 121.806703] rcu: (detected by 3, t=5259 jiffies, g=3777, q=1267 ncpus=256) [ 121.820664] rcu: rcu_sched kthread timer wakeup didn't happen for 5260 jiffies! 
g3777 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 121.842799] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 121.856521] rcu: rcu_sched kthread starved for 5271 jiffies! g3777 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 121.876836] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 121.894717] rcu: RCU grace-period kthread stack dump: [ 121.904824] rcu: Stack dump where RCU GP kthread last ran: [ 142.920877] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 142.932481] rcu: 114-...!: (427 GPs behind) idle=8b98/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 142.950709] rcu: 158-...!: (1 GPs behind) idle=5220/0/0x0 softirq=142/148 fqs=0 (false positive?) [ 142.968586] rcu: (detected by 122, t=5260 jiffies, g=3781, q=722 ncpus=256) [ 142.982808] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3781 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 143.004857] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 143.018583] rcu: rcu_sched kthread starved for 5273 jiffies! g3781 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 143.038893] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 143.056761] rcu: RCU grace-period kthread stack dump: [ 143.066898] rcu: Stack dump where RCU GP kthread last ran: [ 164.084863] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 164.096463] rcu: 14-...!: (0 ticks this GP) idle=56b0/0/0x0 softirq=165/165 fqs=0 (false positive?) [ 164.114695] rcu: 114-...!: (428 GPs behind) idle=8ed0/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 164.132916] rcu: (detected by 96, t=5264 jiffies, g=3785, q=750 ncpus=256) [ 164.146969] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3785 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 164.169019] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 164.182728] rcu: rcu_sched kthread starved for 5276 jiffies! 
g3785 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 164.203055] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 164.220922] rcu: RCU grace-period kthread stack dump: [ 164.231039] rcu: Stack dump where RCU GP kthread last ran: [ 185.248867] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 185.260468] rcu: 21-...!: (0 ticks this GP) idle=36c8/0/0x0 softirq=154/154 fqs=0 (false positive?) [ 185.278684] rcu: 114-...!: (429 GPs behind) idle=8f68/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 185.296922] rcu: (detected by 116, t=5264 jiffies, g=3789, q=760 ncpus=256) [ 185.311140] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3789 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 185.333205] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 185.346918] rcu: rcu_sched kthread starved for 5276 jiffies! g3789 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 185.367224] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 185.385113] rcu: RCU grace-period kthread stack dump: [ 185.395229] rcu: Stack dump where RCU GP kthread last ran: [ OK ] Reached target network-online.target - Network is Online. [ OK ] Started anacron.service - Run anacron jobs. [ OK ] Started cups-browsed.service - Make remote CUPS printers available locally. Starting exim4.service - exim Mail Transport Agent... Starting xrdp.service - xrdp daemon... [ OK ] Finished user-runtime-dir@1000.service - User Runtime Directory /run/user/1000. [ OK ] Started xrdp.service - xrdp daemon. [ OK ] Started serial-getty@ttyHV0.service - Serial Getty on ttyHV0. Starting user@1000.service - User Manager for UID 1000... [ OK ] Started exim4.service - exim Mail Transport Agent. [ OK ] Reached target multi-user.target - Multi-User System. [ OK ] Reached target graphical.target - Graphical Interface. [ OK ] Started user@1000.service - User Manager for UID 1000. 
[FAILED] Failed to start session-1.scope - Session 1 of User tonyr. See 'systemctl status session-1.scope' for details. [ 206.412865] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 206.424477] rcu: 114-...!: (430 GPs behind) idle=97b0/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 206.442691] rcu: (detected by 123, t=5259 jiffies, g=3793, q=5473 ncpus=256) [ 206.457056] rcu: rcu_sched kthread timer wakeup didn't happen for 5261 jiffies! g3793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 206.479157] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 206.492868] rcu: rcu_sched kthread starved for 5271 jiffies! g3793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 206.513173] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 206.531061] rcu: RCU grace-period kthread stack dump: [ 206.541202] rcu: Stack dump where RCU GP kthread last ran: NOTE(S): Unable to login and random hangs during system startup. Same messages/issues as S7-2. On 5/8/26 12:50 AM, Thorsten Leemhuis wrote: > [+tglx so he knows about it; details about the problem that Tony faces > can be found in https://github.com/sparclinux/issues/issues/79 ] > > On 5/8/26 09:38, Tony Rodriguez wrote: >> I still don't believe this is fixed upstream as of v7.0.3 and v7.1-rc1, > Yes and no. It looks like d6e152d905bdb1 ("clockevents: Prevent timer > interrupt starvation") causes two regressions. > > Thomas fixed one with 4096fd0e8eaea1 ("clockevents: Add missing resets > of the next_event_forced flag") -- and feedback shows that it definitely > solved the problem for quite a few people. If that's not the case for > you, then you seem to face a different problem caused by the same > change. Happens, that's life sometimes. > > Ciao, Thorsten > >> only when my patch is applied does the SPARC64 S7-2 system become stable >> again. I also tested my patch with v7.0.4 and it works there as well. 
>> Will perform additional tests without my fix against v7.0.4 and v7.1-rc2 >> later today to revalidate the regression (USA Pacific time). >> >> Tony Rodriguez >> www.linkedin.com/in/unixpro1970 >> >>> On May 7, 2026, at 11:33 PM, Thorsten Leemhuis <linux@leemhuis.info> >>> wrote: >>> >>> On 5/8/26 07:51, John Paul Adrian Glaubitz wrote: >>>> On Thu, 2026-04-23 at 18:30 +0200, Thorsten Leemhuis wrote: >>>>> FWIW, here is the rough timeline of the regression, just to be sure we >>>>> are all on the same page: >>>>> >>>>> * The regression I'm talking about is caused by d6e152d905bdb1 >>>>> ("clockevents: Prevent timer interrupt starvation") [authored: >>>>> 2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival: >>>>> next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12 22:48:06)] >>>> [...] >>>> Tony Rodriguez from the SPARC community has observed the regression >>>> on SPARC as well >>>> and proposed a fix to address it [1]. Not sure whether he has >>>> retested on the latest >>>> commit of Linus' tree yet. >>>> >>>> Tony, can you verify that 4096fd0e8eaea1 fixes the issue for you? >>>> >>>>> [1] https://github.com/sparclinux/issues/issues/79 >>> It's likely a different regression, as that report's title says that >>> v7.0.1, v7.0.2, v7.0.3, and v7.1-rc1 are affected, which all contain the >>> fix, aka 4096fd0e8eaea1. Reporting in a new thread is likely best, as >>> the authors of the culprit are not even CCed here. >>> >>> Ciao, Thorsten ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: the stuttering regression in 7.0: should I have done something different? 2026-05-08 20:15 ` Tony Rodriguez @ 2026-05-08 20:21 ` Tony Rodriguez 2026-05-10 21:29 ` Thomas Gleixner 1 sibling, 0 replies; 7+ messages in thread From: Tony Rodriguez @ 2026-05-08 20:21 UTC (permalink / raw) To: Thorsten Leemhuis Cc: John Paul Adrian Glaubitz, Greg KH, Linus Torvalds, Linux kernel regressions list, LKML, Thomas Gleixner Just confirmed on my end today. This regression also impacts both SPARC64 S7-2 and SPARC64 T7-1 on v7.0.4 and v7.1-rc2 as well. Different systems using the same exact kernels. ** Please see points (A1) (A2) (B1) (B2) Once again, I am not experiencing such issues when "my patch" (link below) is added to address this regression. https://github.com/sparclinux/issues/issues/79#issuecomment-4362173884 Output demonstrating issues for SPARC64 S7-2 and T7-1 systems (without my regression patch): PS - On May 2nd, 2026 at 9:42 PM: I also sent an email to Thomas Gleixner regarding this issue. I will be happy to validate any patches from your end regarding this issue, as time permits me to do so. Best regards, Tony Rodriguez A1) SPARC64 S7-2: Kernel v7.1.0-rc2 uname -a Linux s7t7-debian-test 7.1.0-rc2-test01 #1 SMP Fri May 8 10:02:12 PDT 2026 sparc64 GNU/Linux cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-7.1.0-rc2-test01 root=UUID=ce937a4b-126a-41bd-a54b-03a424421086 ro console=ttyHV0,9600n81 systemd.log_level=info systemd.show_status=1 systemd.journald.forward_to_console=0 plymouth.enable=0 quiet [ 243.267359] INFO: task kworker/u512:1:706 blocked for more than 120 seconds. [ 243.281208] Not tainted 7.1.0-rc2-test01 #1 [ 243.290583] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.306349] INFO: task kworker/127:1:714 blocked for more than 120 seconds. [ 243.320106] Not tainted 7.1.0-rc2-test01 #1 [ 243.329476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 364.099350] INFO: task kworker/u512:1:706 blocked for more than 241 seconds. [ 364.113199] Not tainted 7.1.0-rc2-test01 #1 [ 364.122585] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 364.138328] INFO: task kworker/127:1:714 blocked for more than 241 seconds. [ 364.152086] Not tainted 7.1.0-rc2-test01 #1 [ 364.161470] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 485.295360] INFO: task kworker/u512:1:706 blocked for more than 362 seconds. [ 485.309209] Not tainted 7.1.0-rc2-test01 #1 [ 485.318581] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 485.334345] INFO: task kworker/127:1:714 blocked for more than 362 seconds. [ 485.348099] Not tainted 7.1.0-rc2-test01 #1 [ 485.357467] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 726.849598] INFO: task kworker/u512:1:706 blocked for more than 604 seconds. [ 726.863444] Not tainted 7.1.0-rc2-test01 #1 [ 726.872832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 726.888573] INFO: task kworker/127:1:714 blocked for more than 604 seconds. [ 726.902340] Not tainted 7.1.0-rc2-test01 #1 [ 726.911708] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. sudo dmesg | grep -iE block | grep -iE worker [ 243.267359] INFO: task kworker/u512:1:706 blocked for more than 120 seconds. [ 243.306349] INFO: task kworker/127:1:714 blocked for more than 120 seconds. [ 364.099350] INFO: task kworker/u512:1:706 blocked for more than 241 seconds. [ 364.138328] INFO: task kworker/127:1:714 blocked for more than 241 seconds. [ 485.295360] INFO: task kworker/u512:1:706 blocked for more than 362 seconds. [ 485.334345] INFO: task kworker/127:1:714 blocked for more than 362 seconds. [ 605.849474] INFO: task kworker/u512:1:706 blocked for more than 483 seconds. [ 605.888461] INFO: task kworker/127:1:714 blocked for more than 483 seconds. 
sudo poweroff or sudo reboot NOTE(S): Random hangs during startup. Also, hangs during shutdown/reboot process. ------------------------------------------------------------------------------------------- A2) SPARC64 S7-2: Kernel v7.0.4 [ OK ] Finished e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots. Debian GNU/Linux forky/sid s7t7-debian-test ttyHV0 s7t7-debian-test login: tonyr Password: Linux s7t7-debian-test 7.0.4-test01 #1 SMP Fri May 8 09:27:58 PDT 2026 sparc64 [...] Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law. login: timed [ 114.687722] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 114.699319] rcu: 67-...!: (240 GPs behind) idle=e9c0/0/0x0 softirq=174/174 fqs=0 (false positive?) [ 114.717370] rcu: 102-...!: (73 GPs behind) idle=77e0/0/0x0 softirq=286/287 fqs=0 (false positive?) [ 114.735419] rcu: 111-...!: (52 GPs behind) idle=11d8/0/0x0 softirq=860/861 fqs=0 (false positive?) [ 114.753489] rcu: (detected by 11, t=5268 jiffies, g=4457, q=528 ncpus=128) [ 114.767628] rcu: rcu_sched kthread timer wakeup didn't happen for 5270 jiffies! g4457 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 114.789647] rcu: Possible timer handling issue on cpu=105 timer-softirq=98 [ 114.803535] rcu: rcu_sched kthread starved for 5280 jiffies! g4457 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=105 [ 114.824201] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 114.842080] rcu: RCU grace-period kthread stack dump: [ 114.852221] rcu: Stack dump where RCU GP kthread last ran: [ 135.867723] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 135.879326] rcu: 65-...!: (1 GPs behind) idle=35b0/0/0x0 softirq=483/484 fqs=0 (false positive?) [ 135.897024] rcu: 67-...!: (241 GPs behind) idle=ecc0/0/0x0 softirq=174/174 fqs=0 (false positive?) [ 135.915082] rcu: 102-...!: (74 GPs behind) idle=7800/0/0x0 softirq=286/287 fqs=0 (false positive?) 
[ 135.933123] rcu: 111-...!: (53 GPs behind) idle=1238/0/0x0 softirq=860/861 fqs=0 (false positive?) [ 135.951184] rcu: (detected by 64, t=5272 jiffies, g=4461, q=752 ncpus=128) [ 135.965398] rcu: rcu_sched kthread timer wakeup didn't happen for 5275 jiffies! g4461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 135.987393] rcu: Possible timer handling issue on cpu=105 timer-softirq=98 [ 136.001287] rcu: rcu_sched kthread starved for 5285 jiffies! g4461 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=105 [ 136.021944] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 136.039829] rcu: RCU grace-period kthread stack dump: [ 136.049971] rcu: Stack dump where RCU GP kthread last ran: NOTE(S): Unable to login and random hangs during system startup. ------------------------------------------------------------------------- B1) SPARC64 T7-1: Kernel v7.1.0-rc2 lscpu;uname -a Architecture: sparc64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Big Endian CPU(s): 256 On-line CPU(s) list: 0-255 Model name: SPARC-M7 Thread(s) per core: 8 Core(s) per socket: 32 Socket(s): 1 Flags: sun4v Caches (sum of all): L1d: 4 MiB (256 instances) L1i: 4 MiB (256 instances) L2: 64 MiB (256 instances) Linux s7t7-debian-test 7.1.0-rc2-test01 #1 SMP Fri May 8 10:02:12 PDT 2026 sparc64 GNU/Linux 526.766867] rcu: 8-...!: (806 GPs behind) idle=069c/0/0x1 softirq=682/682 fqs=0 [ 526.781618] rcu: 22-...!: (0 ticks this GP) idle=7b40/0/0x0 softirq=739/739 fqs=0 (false positive?) [ 526.799841] rcu: 89-...!: (770 GPs behind) idle=7800/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 526.817901] rcu: 112-...!: (225 GPs behind) idle=c0c8/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 526.836131] rcu: 189-...!: (0 ticks this GP) idle=8ef0/0/0x0 softirq=1016/1016 fqs=0 (false positive?) [ 526.854885] rcu: 204-...!: (0 ticks this GP) idle=5d20/0/0x0 softirq=774/774 fqs=0 (false positive?) 
[ 526.873278] rcu: 219-...!: (225 GPs behind) idle=d580/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 526.891508] rcu: 226-...!: (233 GPs behind) idle=ec08/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 526.910079] rcu: (detected by 157, t=5289 jiffies, g=5989, q=5339 ncpus=256) [ 526.924916] rcu: rcu_sched kthread timer wakeup didn't happen for 5295 jiffies! g5989 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 526.946930] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 526.960818] rcu: rcu_sched kthread starved for 5302 jiffies! g5989 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 526.981300] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 526.999182] rcu: RCU grace-period kthread stack dump: [ 527.009301] rcu: Stack dump where RCU GP kthread last ran: [ 548.035259] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 548.046861] rcu: 8-...!: (807 GPs behind) idle=071c/0/0x1 softirq=682/682 fqs=0 [ 548.061608] rcu: 17-...!: (0 ticks this GP) idle=00e8/0/0x0 softirq=812/812 fqs=0 (false positive?) [ 548.079831] rcu: 84-...!: (0 ticks this GP) idle=d2b0/0/0x0 softirq=797/797 fqs=0 (false positive?) [ 548.098070] rcu: 89-...!: (771 GPs behind) idle=7be8/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 548.116122] rcu: 112-...!: (226 GPs behind) idle=c110/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 548.134342] rcu: 185-...!: (0 ticks this GP) idle=45b8/0/0x0 softirq=871/871 fqs=0 (false positive?) [ 548.152759] rcu: 193-...!: (0 ticks this GP) idle=1758/0/0x0 softirq=1520/1520 fqs=0 (false positive?) [ 548.171509] rcu: 205-...!: (0 ticks this GP) idle=1e98/0/0x0 softirq=852/852 fqs=0 (false positive?) [ 548.189893] rcu: 219-...!: (226 GPs behind) idle=d5c8/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 548.208128] rcu: 226-...!: (234 GPs behind) idle=eff0/0/0x0 softirq=1189/1190 fqs=0 (false positive?) 
[ 548.226699] rcu: (detected by 115, t=5300 jiffies, g=5993, q=5539 ncpus=256) [ 548.241699] rcu: rcu_sched kthread timer wakeup didn't happen for 5303 jiffies! g5993 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 548.263704] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 548.277593] rcu: rcu_sched kthread starved for 5311 jiffies! g5993 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 548.298081] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 548.315971] rcu: RCU grace-period kthread stack dump: [ 548.326084] rcu: Stack dump where RCU GP kthread last ran: [ 569.343268] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 569.354868] rcu: 8-...!: (808 GPs behind) idle=07ac/0/0x1 softirq=682/682 fqs=0 [ 569.369617] rcu: 89-...!: (772 GPs behind) idle=8518/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 569.387674] rcu: 112-...!: (227 GPs behind) idle=c168/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 569.405894] rcu: 219-...!: (227 GPs behind) idle=d620/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 569.424128] rcu: 226-...!: (235 GPs behind) idle=f920/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 569.442700] rcu: (detected by 76, t=5276 jiffies, g=5997, q=5665 ncpus=256) [ 569.457146] rcu: rcu_sched kthread timer wakeup didn't happen for 5278 jiffies! g5997 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 569.479149] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 569.493043] rcu: rcu_sched kthread starved for 5285 jiffies! g5997 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 569.513534] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. 
[ 569.531419] rcu: RCU grace-period kthread stack dump: [ 569.541536] rcu: Stack dump where RCU GP kthread last ran: [ 590.563260] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 590.574870] rcu: 8-...!: (809 GPs behind) idle=0824/0/0x1 softirq=682/682 fqs=0 [ 590.589618] rcu: 89-...!: (773 GPs behind) idle=8850/0/0x0 softirq=270/273 fqs=0 (false positive?) [ 590.607682] rcu: 112-...!: (228 GPs behind) idle=c198/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 590.625904] rcu: 195-...!: (0 ticks this GP) idle=7178/0/0x0 softirq=1038/1038 fqs=0 (false positive?) [ 590.644660] rcu: 207-...!: (0 ticks this GP) idle=9440/0/0x0 softirq=809/809 fqs=0 (false positive?) [ 590.663056] rcu: 219-...!: (228 GPs behind) idle=d650/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 590.681285] rcu: 226-...!: (236 GPs behind) idle=fc78/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 590.699859] rcu: (detected by 138, t=5286 jiffies, g=6001, q=5524 ncpus=256) [ 590.714623] rcu: rcu_sched kthread timer wakeup didn't happen for 5288 jiffies! g6001 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 590.736635] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 590.750524] rcu: rcu_sched kthread starved for 5296 jiffies! g6001 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 590.771021] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 590.788903] rcu: RCU grace-period kthread stack dump: [ 590.799012] rcu: Stack dump where RCU GP kthread last ran: [ 606.363275] INFO: task kworker/u1024:0:12 blocked for more than 483 seconds. [ 606.377139] Tainted: G W 7.1.0-rc2-test01 #1 [ 606.389636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 611.823259] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 611.834860] rcu: 8-...!: (810 GPs behind) idle=08bc/0/0x1 softirq=682/682 fqs=0 [ 611.849612] rcu: 89-...!: (774 GPs behind) idle=91a8/0/0x0 softirq=270/273 fqs=0 (false positive?) 
[ 611.867665] rcu: 112-...!: (229 GPs behind) idle=c1d8/0/0x0 softirq=193/193 fqs=0 (false positive?) [ 611.885887] rcu: 205-...!: (0 ticks this GP) idle=2160/0/0x0 softirq=865/865 fqs=0 (false positive?) [ 611.904290] rcu: 219-...!: (229 GPs behind) idle=d690/0/0x0 softirq=605/607 fqs=0 (false positive?) [ 611.922525] rcu: 226-...!: (237 GPs behind) idle=05e0/0/0x0 softirq=1189/1190 fqs=0 (false positive?) [ 611.941095] rcu: (detected by 166, t=5283 jiffies, g=6005, q=5522 ncpus=256) [ 611.955789] rcu: rcu_sched kthread timer wakeup didn't happen for 5285 jiffies! g6005 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 611.977793] rcu: Possible timer handling issue on cpu=94 timer-softirq=279 [ 611.991685] rcu: rcu_sched kthread starved for 5292 jiffies! g6005 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=94 [ 612.012174] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 612.030060] rcu: RCU grace-period kthread stack dump: [ 612.040180] rcu: Stack dump where RCU GP kthread last ran: r[ 727.195272] INFO: task kworker/u1024:0:12 blocked for more than 604 seconds. [ 727.209134] Tainted: G W 7.1.0-rc2-test01 #1 [ 727.221628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. NOTE(S): Random hangs and same messages as S7-2. Takes about 15 minutes to see the messages. --------------------------------------------------------- B2) SPARC64 T7-1 Kernel v7.0.4 [ OK ] Finished e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots. Debian GNU/Linux forky/sid s7t7-debian-test ttyHV0 s7t7-debian-test login: tonyr Password: Linux s7t7-debian-test 7.0.4-test01 #1 SMP Fri May 8 09:27:58 PDT 2026 sparc64 [..] Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law. [ 79.468871] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 79.480483] rcu: 114-...!: (424 GPs behind) idle=8760/0/0x0 softirq=126/126 fqs=0 (false positive?) 
[ 79.498713] rcu: (detected by 90, t=5259 jiffies, g=3769, q=818 ncpus=256) [ 79.512702] rcu: rcu_sched kthread timer wakeup didn't happen for 5260 jiffies! g3769 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 79.534808] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 79.548516] rcu: rcu_sched kthread starved for 5267 jiffies! g3769 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 79.568838] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 79.586709] rcu: RCU grace-period kthread stack dump: [ 79.596867] rcu: Stack dump where RCU GP kthread last ran: [ 100.612874] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 100.624477] rcu: 114-...!: (425 GPs behind) idle=88f0/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 100.642709] rcu: 157-...!: (0 ticks this GP) idle=4c08/0/0x0 softirq=122/122 fqs=0 (false positive?) [ 100.661106] rcu: (detected by 3, t=5264 jiffies, g=3773, q=1046 ncpus=256) [ 100.675155] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 100.697211] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 100.710934] rcu: rcu_sched kthread starved for 5276 jiffies! g3773 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 100.731244] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 100.749125] rcu: RCU grace-period kthread stack dump: [ 100.759255] rcu: Stack dump where RCU GP kthread last ran: login: ti[ 121.776867] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 121.788467] rcu: 114-...!: (426 GPs behind) idle=8a20/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 121.806703] rcu: (detected by 3, t=5259 jiffies, g=3777, q=1267 ncpus=256) [ 121.820664] rcu: rcu_sched kthread timer wakeup didn't happen for 5260 jiffies! 
g3777 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 121.842799] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 121.856521] rcu: rcu_sched kthread starved for 5271 jiffies! g3777 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 121.876836] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 121.894717] rcu: RCU grace-period kthread stack dump: [ 121.904824] rcu: Stack dump where RCU GP kthread last ran: [ 142.920877] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 142.932481] rcu: 114-...!: (427 GPs behind) idle=8b98/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 142.950709] rcu: 158-...!: (1 GPs behind) idle=5220/0/0x0 softirq=142/148 fqs=0 (false positive?) [ 142.968586] rcu: (detected by 122, t=5260 jiffies, g=3781, q=722 ncpus=256) [ 142.982808] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3781 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 143.004857] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 143.018583] rcu: rcu_sched kthread starved for 5273 jiffies! g3781 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 143.038893] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 143.056761] rcu: RCU grace-period kthread stack dump: [ 143.066898] rcu: Stack dump where RCU GP kthread last ran: [ 164.084863] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 164.096463] rcu: 14-...!: (0 ticks this GP) idle=56b0/0/0x0 softirq=165/165 fqs=0 (false positive?) [ 164.114695] rcu: 114-...!: (428 GPs behind) idle=8ed0/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 164.132916] rcu: (detected by 96, t=5264 jiffies, g=3785, q=750 ncpus=256) [ 164.146969] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3785 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 164.169019] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 164.182728] rcu: rcu_sched kthread starved for 5276 jiffies! 
g3785 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 164.203055] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 164.220922] rcu: RCU grace-period kthread stack dump: [ 164.231039] rcu: Stack dump where RCU GP kthread last ran: [ 185.248867] rcu: INFO: rcu_sched detected stalls on CPUs/tasks: [ 185.260468] rcu: 21-...!: (0 ticks this GP) idle=36c8/0/0x0 softirq=154/154 fqs=0 (false positive?) [ 185.278684] rcu: 114-...!: (429 GPs behind) idle=8f68/0/0x0 softirq=126/126 fqs=0 (false positive?) [ 185.296922] rcu: (detected by 116, t=5264 jiffies, g=3789, q=760 ncpus=256) [ 185.311140] rcu: rcu_sched kthread timer wakeup didn't happen for 5265 jiffies! g3789 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 185.333205] rcu: Possible timer handling issue on cpu=2 timer-softirq=330 [ 185.346918] rcu: rcu_sched kthread starved for 5276 jiffies! g3789 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 185.367224] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior. [ 185.385113] rcu: RCU grace-period kthread stack dump: [ 185.395229] rcu: Stack dump where RCU GP kthread last ran: OK ] Reached target network-online.target - Network is Online. [ OK ] Started anacron.service - Run anacron jobs. [ OK ] Started cups-browsed.service - Make remote CUPS printers available locally. Starting exim4.service - exim Mail Transport Agent... Starting xrdp.service - xrdp daemon... [ OK ] Finished user-runtime-dir@1000.service - User Runtime Directory /run/user/1000. [ OK ] Started xrdp.service - xrdp daemon. [ OK ] Started serial-getty@ttyHV0.service - Serial Getty on ttyHV0. Starting user@1000.service - User Manager for UID 1000... [ OK ] Started exim4.service - exim Mail Transport Agent. [ OK ] Reached target multi-user.target - Multi-User System. [ OK ] Reached target graphical.target - Graphical Interface. [ OK ] Started user@1000.service - User Manager for UID 1000. 
[FAILED] Failed to start session-1.scope - Session 1 of User tonyr. See 'systemctl status session-1.scope' for details.
[ 206.412865] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 206.424477] rcu: 114-...!: (430 GPs behind) idle=97b0/0/0x0 softirq=126/126 fqs=0 (false positive?)
[ 206.442691] rcu: (detected by 123, t=5259 jiffies, g=3793, q=5473 ncpus=256)
[ 206.457056] rcu: rcu_sched kthread timer wakeup didn't happen for 5261 jiffies! g3793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 206.479157] rcu: Possible timer handling issue on cpu=2 timer-softirq=330
[ 206.492868] rcu: rcu_sched kthread starved for 5271 jiffies! g3793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
[ 206.513173] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 206.531061] rcu: RCU grace-period kthread stack dump:
[ 206.541202] rcu: Stack dump where RCU GP kthread last ran:

NOTE(S): Unable to login and random hangs during system startup. Same messages/issues as S7-2.

>
> On 5/8/26 12:50 AM, Thorsten Leemhuis wrote:
>> [+tglx so he knows about it; details about the problem that Tony faces
>> can be found in https://github.com/sparclinux/issues/issues/79 ]
>>
>> On 5/8/26 09:38, Tony Rodriguez wrote:
>>> I still don't believe this is fixed upstream as of v7.0.3 and v7.1-rc1,
>> Yes and no. It looks like d6e152d905bdb1 ("clockevents: Prevent timer
>> interrupt starvation") causes two regressions.
>>
>> Thomas fixed one with 4096fd0e8eaea1 ("clockevents: Add missing resets
>> of the next_event_forced flag") -- and feedback shows that it definitely
>> solved the problem for quite a few people. If that's not the case for
>> you, then you seem to face a different problem caused by the same
>> change. Happens, that's life sometimes.
>>
>> Ciao, Thorsten
>>
>>> only when my patch is applied does the SPARC64 S7-2 system become
>>> stable
>>> again. I also tested my patch with v7.0.4 and it works there as well.
>>> Will perform additional tests without my fix against v7.0.4 and
>>> v7.1-rc2
>>> later today to revalidate the regression (USA Pacific time).
>>>
>>> Tony Rodriguez
>>> www.linkedin.com/in/unixpro1970
>>>
>>>> On May 7, 2026, at 11:33 PM, Thorsten Leemhuis <linux@leemhuis.info>
>>>> wrote:
>>>>
>>>> On 5/8/26 07:51, John Paul Adrian Glaubitz wrote:
>>>>> On Thu, 2026-04-23 at 18:30 +0200, Thorsten Leemhuis wrote:
>>>>>> FWIW, here is the rough timeline of the regression, just to be
>>>>>> sure we
>>>>>> are all on the same page:
>>>>>>
>>>>>> * The regression I'm talking about is caused by d6e152d905bdb1
>>>>>> ("clockevents: Prevent timer interrupt starvation") [authored:
>>>>>> 2026-04-07 10:54:17; committed: 2026-04-10 22:45:38; next arrival:
>>>>>> next-20260413; merged: 2026-04-12 19:01:55; v7.0 (2026-04-12
>>>>>> 22:48:06)]
>>>>> [...]
>>>>> Tony Rodriguez from the SPARC community has observed the regression
>>>>> on SPARC as well
>>>>> and proposed a fix to address it [1]. Not sure whether he has
>>>>> retested on the latest
>>>>> commit of Linus' tree yet.
>>>>>
>>>>> Tony, can you verify that 4096fd0e8eaea1 fixes the issue for you?
>>>>>
>>>>>> [1] https://github.com/sparclinux/issues/issues/79
>>>> It's likely a different regression, as that report's title says that
>>>> v7.0.1, v7.0.2, v7.0.3, and v7.1-rc1 are affected, which all
>>>> contain the
>>>> fix, aka 4096fd0e8eaea1. Reporting in a new thread is likely best, as
>>>> the authors of the culprit are not even CCed here.
>>>>
>>>> Ciao, Thorsten

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: the stuttering regression in 7.0: should I have done something different? 2026-05-08 20:15 ` Tony Rodriguez 2026-05-08 20:21 ` Tony Rodriguez @ 2026-05-10 21:29 ` Thomas Gleixner 2026-05-11 3:13 ` Tony Rodriguez 1 sibling, 1 reply; 7+ messages in thread From: Thomas Gleixner @ 2026-05-10 21:29 UTC (permalink / raw) To: Tony Rodriguez, Thorsten Leemhuis Cc: John Paul Adrian Glaubitz, Greg KH, Linus Torvalds, Linux kernel regressions list, LKML On Fri, May 08 2026 at 13:15, Tony Rodriguez wrote: > Just confirmed on my end today. This regression also impacts both > SPARC64 S7-2 and SPARC64 T7-1 on v7.0.4 and v7.1-rc2 as well. Different > systems using the same exact kernels. > > ** Please see points (A1) (A2) (B1) (B2) > > Once again, I am not experiencing such issues when "my patch" (link > below) is added to address this regression. > > https://github.com/sparclinux/issues/issues/79#issuecomment-4362173884 Github issues are really not helpful. > PS - On May 2nd 2026 at 9:42 PM: I also sent an email to Thomas Gleixner > regarding this issue. Will be happy to validate any patches from your > end regarding this issue, as time permits me to do so. Sorry, that mail got lost as it was in reply to a random other archived thread which has absolutely nothing to do with the problem at hand. I just looked at your github thing. Despite your changelog claiming otherwise your "fix" breaks the DoS protection completely. It's a polished version of a revert. It also lacks a proper root cause analysis. This list: - skipped programming events when delta <= min_delta_ns - changed force semantics for overdue events - introduced a sticky next_event_forced state - returned success even when no event was programmed does not qualify and is actually wrong. The code does not unconditionally skip the programming of events when delta <= min_delta_ns. It only does so conditionally when the previous force programmed min_delta_ns event has not been delivered to the kernel yet, i.e. 
dev->next_event_forced is still set.

That flag is only set when the minimal value has been successfully
programmed and it _is_ cleared on the next timer interrupt, which should
obviously happen due to this minimal delta programming. It is also
cleared when a new event > min_delta_ns is successfully programmed
_before_ the previous one was delivered.

IOW, the core code programmed the hardware with the min_delta_ns
(min_delta_ticks) timeout and the SPARC clockevents driver returned
success (0). Now the core code refuses to do further reprogramming with
the min_delta_ns timeout as that would shift the expiry (interrupt)
further out until the interrupt actually is delivered or some other
event which is not below the min_delta_ns threshold is programmed.

So let's assume that this logic is causing the problem; then the only
explanation for the observed behaviour is that the expected interrupt
due to a forced min_delta_ns programming is never delivered.

That made me look into the SPARC specific set_next_event() functions. I
don't know which variant your machines are using, but all of them have
the same underlying problem. The interrupt is based on an equal
comparator, so the programming logic for each of the tick variants is:

$variant_add_compare(delta)
{
	cmp = read_timer() + delta;
	write_comparator(cmp);
	now = read_timer();
	return (now - cmp) > 0;
}

and the actual set_next_event() function which is invoked from the core
code does:

	return tick_operations.add_compare(delta) ? -ETIME : 0;

IOW, when the timer read _after_ writing the comparator value is ahead
of the comparator value the operation failed. Looks about right in
theory.

But then there is the reality of hardware which ruins everything. I've
banged my head against the wall many years ago when debugging a similar
issue with the x86 HPET which has the same hardware design failure of
using a compare equal comparator instead of having a compare less than
equal one.
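[For readers following along: the add_compare() pattern above can be modeled in plain user space. Everything below is invented for illustration — the struct, the function names, and the one-tick "latency" knob standing in for the time each hardware access takes; it is not the SPARC driver code, just a sketch of the check it performs.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a free-running counter with a compare-equal match
 * register. 'latency' is the number of ticks that elapse on every
 * hardware access (purely illustrative). */
struct sim_timer {
	uint64_t counter;	/* free-running tick counter */
	uint64_t cmp;		/* compare-equal match value */
	uint64_t latency;	/* ticks consumed by each hardware access */
};

static uint64_t read_timer(struct sim_timer *t)
{
	t->counter += t->latency;	/* time passes while we touch the hw */
	return t->counter;
}

/* Mirrors the add_compare() pattern: program counter+delta, then check
 * whether the counter already raced past the comparator. A nonzero
 * return is what the driver turns into -ETIME. */
static int add_compare(struct sim_timer *t, uint64_t delta)
{
	uint64_t cmp = read_timer(t) + delta;

	t->cmp = cmp;
	t->counter += t->latency;	/* the comparator write costs time too */
	return (int64_t)(read_timer(t) - cmp) > 0;
}
```

With a one-tick access latency, a delta of 1 is already stale by the time the comparator write lands, so the post-write check reports failure, while a comfortably larger delta succeeds. The trap described above is precisely the case this check cannot catch: the counter reaching cmp while the write is still in flight, so the equal-match never fires even though the read after the write still looks fine.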
See the lengthy comment in hpet_clkevt_set_next_event() for further
information.

Can you apply the debug patch below, which will disable tracing once it
hits the hung task detector and then retrieve the trace?

If that's not possible as the system is unresponsive, then please add
'ftrace_dump_on_oops' on the kernel command line or enable it after boot
in /proc/sys/kernel and let the kernel panic when it hits the hung task
detector.

Thanks,

        tglx
---
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -732,8 +732,10 @@ void __irq_entry timer_interrupt(int irq
 	if (unlikely(!evt->event_handler)) {
 		printk(KERN_WARNING
 		       "Spurious SPARC64 timer interrupt on cpu %d\n", cpu);
-	} else
+	} else {
+		trace_printk("Invoking handler %pS\n", evt->event_handler);
 		evt->event_handler(evt);
+	}
 
 	irq_exit();
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -248,6 +248,7 @@ static void hung_task_info(struct task_s
 	 * accordingly
 	 */
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
+		tracing_off();
 		if (sysctl_hung_task_warnings > 0)
 			sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked%s for more than %ld seconds.\n",
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -370,18 +370,22 @@ int clockevents_program_event(struct clo
 		delta = min(delta, (int64_t) dev->max_delta_ns);
 		cycles = ((u64)delta * dev->mult) >> dev->shift;
 
 		if (!dev->set_next_event((unsigned long) cycles, dev)) {
+			trace_printk("Successfully programmed %lld %lld\n", expires, delta);
 			dev->next_event_forced = 0;
 			return 0;
 		}
 	}
 
-	if (dev->next_event_forced)
+	if (dev->next_event_forced) {
+		trace_printk("Skipping %lld %lld\n", expires, delta);
 		return 0;
+	}
 
 	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
 		if (!force || clockevents_program_min_delta(dev))
 			return -ETIME;
 	}
 
+	trace_printk("Force programmed min delta %lld %lld\n", expires, delta);
 	dev->next_event_forced = 1;
 	return 0;
 }

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: the stuttering regression in 7.0: should I have done something different?
  2026-05-10 21:29 ` Thomas Gleixner
@ 2026-05-11  3:13 ` Tony Rodriguez
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Rodriguez @ 2026-05-11 3:13 UTC (permalink / raw)
To: Thomas Gleixner, Thorsten Leemhuis
Cc: John Paul Adrian Glaubitz, Greg KH, Linus Torvalds,
	Linux kernel regressions list, LKML

Hi Thomas,

Thank you for the detailed analysis — this helps clarify the situation
on the SPARC side.

You are correct that my earlier explanation focused too much on the core
changes and not enough on the SPARC clockevents behaviour under the new
forced-min-delta semantics. However, having a stable system is equally
important and is the main reason that I developed the test patch (to
help). In any case, your explanation makes sense.

I will apply your debug patch and capture the trace as requested. If the
system becomes unresponsive, I will enable ftrace_dump_on_oops so the
trace is emitted when the hung task detector triggers.

Once I have the trace, I'll send another email.

Thanks again for the guidance — I'll follow up with trace results
sometime tomorrow.

Tony

On 5/10/26 2:29 PM, Thomas Gleixner wrote:
> On Fri, May 08 2026 at 13:15, Tony Rodriguez wrote:
>> Just confirmed on my end today. This regression also impacts both
>> SPARC64 S7-2 and SPARC64 T7-1 on v7.0.4 and v7.1-rc2 as well. Different
>> systems using the same exact kernels.
>>
>> ** Please see points (A1) (A2) (B1) (B2)
>>
>> Once again, I am not experiencing such issues when "my patch" (link
>> below) is added to address this regression.
>>
>> https://github.com/sparclinux/issues/issues/79#issuecomment-4362173884
> Github issues are really not helpful.
>
>> PS - On May 2nd 2026 at 9:42 PM: I also sent an email to Thomas Gleixner
>> regarding this issue. Will be happy to validate any patches from your
>> end regarding this issue, as time permits me to do so.
> Sorry, that mail got lost as it was in reply to a random other archived
> thread which has absolutely nothing to do with the problem at hand.
>
> I just looked at your github thing. Despite your changelog claiming
> otherwise, your "fix" breaks the DoS protection completely. It's a
> polished version of a revert.
>
> It also lacks a proper root cause analysis. This list:
>
>  - skipped programming events when delta <= min_delta_ns
>  - changed force semantics for overdue events
>  - introduced a sticky next_event_forced state
>  - returned success even when no event was programmed
>
> does not qualify and is actually wrong.
>
> The code does not unconditionally skip the programming of events when
> delta <= min_delta_ns. It only does so conditionally, when the previous
> force-programmed min_delta_ns event has not been delivered to the kernel
> yet, i.e. dev->next_event_forced is still set.
>
> That flag is only set when the minimal value has been successfully
> programmed, and it _is_ cleared on the next timer interrupt, which should
> obviously happen due to this minimal delta programming. It is also
> cleared when a new event > min_delta_ns is successfully programmed
> _before_ the previous one was delivered.
>
> IOW, the core code programmed the hardware with the min_delta_ns
> (min_delta_ticks) timeout and the SPARC clockevents driver returned
> success (0). Now the core code refuses to do further reprogramming with
> the min_delta_ns timeout, as that would shift the expiry (interrupt)
> further out, until the interrupt actually is delivered or some other
> event which is not below the min_delta_ns threshold is programmed.
>
> So let's assume that this logic is causing the problem; then the only
> explanation for the observed behaviour is that the expected interrupt
> due to a forced min_delta_ns programming is never delivered.
>
> That made me look into the SPARC specific set_next_event() functions.
> I don't know which variant your machines are using, but all of them have
> the same underlying problem. The interrupt is based on an equal
> comparator, so the programming logic for each of the tick variants is:
>
> 	$variant_add_compare(delta)
> 	{
> 		cmp = read_timer() + delta;
> 		write_comparator(cmp);
> 		now = read_timer();
> 		return (now - cmp) > 0;
> 	}
>
> and the actual set_next_event() function which is invoked from the core
> code does:
>
> 	return tick_operations.add_compare(delta) ? -ETIME : 0;
>
> IOW, when the timer read _after_ writing the comparator value is ahead
> of the comparator value, the operation failed. Looks about right in
> theory.
>
> But then there is the reality of hardware, which ruins everything. I
> banged my head against the wall many years ago when debugging a similar
> issue with the x86 HPET, which has the same hardware design failure of
> using a compare-equal comparator instead of a compare-less-than-or-equal
> one. See the lengthy comment in hpet_clkevt_set_next_event() for
> further information.
>
> Can you apply the debug patch below, which will disable tracing once it
> hits the hung task detector, and then retrieve the trace?
>
> If that's not possible because the system is unresponsive, then please add
> 'ftrace_dump_on_oops' on the kernel command line or enable it after boot
> in /proc/sys/kernel and let the kernel panic when it hits the hung task
> detector.
>
> Thanks,
>
>         tglx
> ---
> --- a/arch/sparc/kernel/time_64.c
> +++ b/arch/sparc/kernel/time_64.c
> @@ -732,8 +732,10 @@ void __irq_entry timer_interrupt(int irq
>  	if (unlikely(!evt->event_handler)) {
>  		printk(KERN_WARNING
>  		       "Spurious SPARC64 timer interrupt on cpu %d\n", cpu);
> -	} else
> +	} else {
> +		trace_printk("Invoking handler %pS\n", evt->event_handler);
>  		evt->event_handler(evt);
> +	}
>  
>  	irq_exit();
>  
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -248,6 +248,7 @@ static void hung_task_info(struct task_s
>  	 * accordingly
>  	 */
>  	if (sysctl_hung_task_warnings || hung_task_call_panic) {
> +		tracing_off();
>  		if (sysctl_hung_task_warnings > 0)
>  			sysctl_hung_task_warnings--;
>  		pr_err("INFO: task %s:%d blocked%s for more than %ld seconds.\n",
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -370,18 +370,22 @@ int clockevents_program_event(struct clo
>  		delta = min(delta, (int64_t) dev->max_delta_ns);
>  		cycles = ((u64)delta * dev->mult) >> dev->shift;
>  		if (!dev->set_next_event((unsigned long) cycles, dev)) {
> +			trace_printk("Successfully programmed %lld %lld\n", expires, delta);
>  			dev->next_event_forced = 0;
>  			return 0;
>  		}
>  	}
>  
> -	if (dev->next_event_forced)
> +	if (dev->next_event_forced) {
> +		trace_printk("Skipping %lld %lld\n", expires, delta);
>  		return 0;
> +	}
>  
>  	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
>  		if (!force || clockevents_program_min_delta(dev))
>  			return -ETIME;
>  	}
> +	trace_printk("Force programmed min delta %lld %lld\n", expires, delta);
>  	dev->next_event_forced = 1;
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 7+ messages in thread
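[Editor's note] The compare-equal hazard described in the quoted mail can be illustrated with a small simulation. This is a hypothetical model, not SPARC or HPET code: `sim_timer`, `write_latency`, and the tick-by-tick `run()` loop are invented to show how an equal-match comparator can silently lose an event that the post-write `(now - cmp) > 0` check does not catch:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a free-running counter with a compare-EQUAL comparator:
 * the interrupt fires only if the counter ever equals cmp exactly. */
struct sim_timer {
	unsigned long counter;
	unsigned long cmp;
	unsigned long write_latency;	/* ticks that elapse during the write */
	bool irq_fired;
};

static unsigned long read_timer(struct sim_timer *t) { return t->counter; }

static void write_comparator(struct sim_timer *t, unsigned long cmp)
{
	/* The counter keeps running while the write is in flight. */
	t->counter += t->write_latency;
	t->cmp = cmp;
}

/* Mirrors the $variant_add_compare() pattern quoted above:
 * a non-zero return means "programming failed". */
static int add_compare(struct sim_timer *t, unsigned long delta)
{
	unsigned long cmp = read_timer(t) + delta;

	write_comparator(t, cmp);
	return (long)(read_timer(t) - cmp) > 0;
}

/* Advance time; the comparator evaluates on the edge AFTER each tick,
 * so a counter that is already at or past cmp never matches again. */
static void run(struct sim_timer *t, unsigned long ticks)
{
	while (ticks--) {
		t->counter++;
		if (t->counter == t->cmp)
			t->irq_fired = true;
	}
}
```

The boundary case is the interesting one: when the counter lands exactly on cmp while the write is still in flight, the check sees `now - cmp == 0` and reports success, yet the single equal match has already slipped by and the interrupt is lost, which is the "reality of hardware" failure mode the debug patch is hunting.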
end of thread, other threads:[~2026-05-11 3:13 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <4dd98a32-d1d6-43de-910c-7e487503177e@leemhuis.info>
2026-05-08 5:51 ` the stuttering regression in 7.0: should I have done something different? John Paul Adrian Glaubitz
2026-05-08 6:33 ` Thorsten Leemhuis
[not found] ` <D5D19776-C809-4284-9417-F9A860877B98@gmail.com>
2026-05-08 7:50 ` Thorsten Leemhuis
2026-05-08 20:15 ` Tony Rodriguez
2026-05-08 20:21 ` Tony Rodriguez
2026-05-10 21:29 ` Thomas Gleixner
2026-05-11 3:13 ` Tony Rodriguez