From: Thomas Gleixner <tglx@kernel.org>
To: Tony Rodriguez <unixpro1970@gmail.com>
Cc: Linux kernel regressions list <regressions@lists.linux.dev>,
LKML <linux-kernel@vger.kernel.org>,
sparclinux@vger.kernel.org,
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
Thorsten Leemhuis <regressions@leemhuis.info>
Subject: Re: the stuttering regression in 7.0: should I have done something different
Date: Wed, 13 May 2026 22:28:35 +0200
Message-ID: <87tssb6olo.ffs@tglx>
In-Reply-To: <64f465ca-6117-4375-9c4b-af771b8205fd@gmail.com>
Tony!
On Tue, May 12 2026 at 14:43, Tony Rodriguez wrote:
>> Can you add 'trace_buf_size=50k' to the kernel command line, which
>> limits the buffer size to about 640 entries. Assuming 115200 Baud this
>> should then take about 4 seconds per CPU to dump, which still is a bunch
>> on a large machine, but definitely way more workable than the default.
>
> Done. The complete trace file "s7-2-05122026-dump.tar.gz" can be
> obtained from my GitHub repo:
>
> https://github.com/unixpro1970/Sparc64-Kernel-Debugging-Dumps
Thanks for providing the data. So in both traces there is a clear
indication that the forced programmed min delta does not result in an
interrupt. Here are the last trace events on the affected CPUs.
No AHAVI CPU 116:
[ 280.939873] <idle>-0 116d.h.. 11612209us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 280.980493] <idle>-0 116d.h.. 11612213us : clockevents_program_event: Successfully programmed 9580000000 3991235
[ 281.023902] <idle>-0 116d.... 11612218us : clockevents_program_event: Successfully programmed 10112024440 536010830
[ 281.089687] <idle>-0 116dn... 11636205us : clockevents_program_event: Force programmed min delta 9600000000 10
No AHAVI CPU 100:
[ 299.943989] systemd-1 100d.h.. 27594794us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 299.964303] systemd-1 100d.h.. 27594796us : clockevents_program_event: Successfully programmed 25560000000 1407865
[ 299.986182] systemd-1 100d.... 27594932us : clockevents_program_event: Force programmed min delta 1 -25558727644
[ 300.007707] systemd-1 100d.h.. 27594933us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 300.028019] systemd-1 100d.h.. 27594934us : clockevents_program_event: Successfully programmed 25560000000 1269565
[ 300.049894] systemd-1 100d.... 27594971us : clockevents_program_event: Force programmed min delta 1 -25558767244
[ 300.071415] systemd-1 100d.... 27598043us : clockevents_program_event: Skipping 25560000000 -1838405
AHAVI CPU 6:
[ 1247.573212] <idle>-0 6d.h.. 84194945us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1247.613828] <idle>-0 6d.h.. 84194947us : clockevents_program_event: Successfully programmed 80140000000 3928334
[ 1247.762267] <idle>-0 6d.h.. 84198876us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1247.844549] <idle>-0 6d.h.. 84198878us : clockevents_program_event: Force programmed min delta 80140000000 771
AHAVI CPU 61:
[ 1258.222440] <idle>-0 61d.h.. 84234905us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1258.516354] <idle>-0 61dnh.. 84234910us : clockevents_program_event: Successfully programmed 84176000000 3999995280
[ 1258.648636] <idle>-0 61dn... 84234914us : clockevents_program_event: Successfully programmed 80180000000 3991863
[ 1258.868940] <idle>-0 61d.h.. 84238906us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1258.993594] <idle>-0 61d.h.. 84238908us : clockevents_program_event: Force programmed min delta 80180000000 612
So there is only one case (CPU 100) where a later attempt to program an
event in the past (delta < 0) is skipped due to the force bit being set.
But that skip happens ~3ms after the min delta was programmed, which
should have resulted in an interrupt which never happened.
The original code is not really different vs. that min delta
programming, except that it does not have the next_event_forced
logic. But as you can see above this logic is not really making a
difference.
So I went through the differences line by line again and I found a very
subtle difference, but I can't see how that would magically cure the
actual problem of the non-firing interrupt. The missing update of
dev->next_event in the force reprogram case of (delta <= 0) is
completely irrelevant as both events are in the past so it does not
matter at all. Nevertheless see the pointless and purely cosmetic delta
patch below.
But coming back to the trace data. There are tons of instances where the
forced programmed min delta results in an interrupt right afterwards:
[ 1258.868940] <idle>-0 61d.h.. 84238906us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1258.889262] <idle>-0 60d.h.. 84238906us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1258.909570] <idle>-0 63d.h.. 84238906us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1258.929877] <idle>-0 70d.h.. 84238906us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1258.950197] <idle>-0 60d.h.. 84238907us : clockevents_program_event: Force programmed min delta 80180000000 552
[ 1258.971896] <idle>-0 63d.h.. 84238908us : clockevents_program_event: Force programmed min delta 80180000000 627
[ 1258.993594] <idle>-0 61d.h.. 84238908us : clockevents_program_event: Force programmed min delta 80180000000 612
[ 1259.015292] <idle>-0 70d.h.. 84238908us : clockevents_program_event: Force programmed min delta 80180000000 313
[ 1259.036992] <idle>-0 60d.h.. 84238910us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1259.057313] <idle>-0 63d.h.. 84238910us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
[ 1259.077620] <idle>-0 70d.h.. 84238912us : timer_interrupt: Invoking handler hrtimer_interrupt+0x0/0x280
So all four involved CPUs force program min delta from the timer
interrupt context, but only three of them actually get an interrupt
afterwards. CPU61 fails to deliver one and as a result it goes stale.
As the set_next_event() callback returns 0 (success) in all cases -
otherwise we wouldn't see the trace entry - this all points to a problem
with that rearming logic:
	exp = read_cnt() + delta_ticks;
	write_cmp(exp);
	return (read_cnt() - exp) > 0 ? -ETIME : 0;
Your machine uses 'stick', which runs according to the conversion
factors in dmesg at 1GHz, but the CPU runs at 4.27GHz AFAIK. So you can
clearly run into a situation like this:
    TICK_CNT        CPU
    T1              exp = read_cnt() + D
                    ...                     // Some delay
    T1 + D
                    write_cmp(T1 + D)
                    now = read_cnt()        // Reads T1 + D
    T1 + D + 1
    ---> returns success and the interrupt is never firing
Why?
Just to be clear: I never saw the VHDL code of that CPU, but that
pattern is way too familiar.
Those equal comparators, which were designed by AI (Absence of
Intelligence) before AI got popular, generally work this way:
The comparator is only evaluated on the clock edge which increments
the counter, but not when the comparator value is written. So a write
of the same value does not result in an interrupt.
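A minimal sketch of that comparator behaviour as a software model, to make the failure sequence concrete. Everything here is an assumption for illustration: the struct, the function names and the starting counter value are hypothetical and are not taken from the SPARC hardware or from the kernel sources. The model only encodes the one property described above, namely that the match is evaluated on the counter increment and not on the compare register write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a free-running counter with an equality comparator. */
struct cmp_timer {
	uint64_t cnt;	/* counter value */
	uint64_t cmp;	/* compare register */
	bool irq;	/* latched interrupt */
};

/* Counter clock edge: increment first, then evaluate the comparator. */
static void timer_tick(struct cmp_timer *t)
{
	t->cnt++;
	if (t->cnt == t->cmp)
		t->irq = true;
}

/* Register write: only stores the value, the comparator is NOT evaluated. */
static void timer_write_cmp(struct cmp_timer *t, uint64_t val)
{
	t->cmp = val;
}

/*
 * Replay the rearm sequence: compute exp = cnt + delta, let 'delay' counter
 * ticks pass before the write lands, write the compare register and then
 * run 'after' further ticks. Returns true if the interrupt fired.
 */
static bool rearm_fires(uint64_t delta, uint64_t delay, uint64_t after)
{
	struct cmp_timer t = { .cnt = 1000, .cmp = 0, .irq = false };
	uint64_t exp = t.cnt + delta;
	uint64_t i;

	for (i = 0; i < delay; i++)
		timer_tick(&t);

	timer_write_cmp(&t, exp);

	for (i = 0; i < after; i++)
		timer_tick(&t);

	return t.irq;
}
```

In this model rearm_fires(10, 9, 100) reports an interrupt because the write lands one tick before the match, while rearm_fires(10, 10, 100) does not: the write lands exactly when the counter equals the compare value, the next tick moves the counter past it, and nothing fires short of a full counter wrap.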
That's an "optimization" which spares quite a few gates and is obviously
nowhere documented. So software has to deal with the consequences by
using a crystal ball, which is trivial to get wrong and can go unnoticed
for a long time until it rears its ugly head at some point for whatever
reasons.
I'm willing to bet a round of beers at the next conference that this is
the problem and that it will magically disappear when you change that
condition to:
return (read_cnt() - exp) >= 0 ? -ETIME : 0;
unless they managed to add some extra propagation delay to that
comparator write like the HPET folks did at some point without telling
anyone. I doubt the SPARC janitor who implemented it did so because
that would have made the failure way more likely.
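The difference between the two return conditions can be sketched as a pure function over the values read back after the compare write. This is an illustration only: rearm_status() and its 'inclusive' flag are hypothetical names, and the signed cast is an assumption about how the pseudo code above would be written for 64-bit unsigned counter values so that the comparison survives counter wraparound.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a rearm has to be reported as expired.
 *
 * now:       counter value read back after writing the compare register
 * exp:       value written to the compare register
 * inclusive: false models the '> 0' check, true models the '>= 0' check
 *
 * The unsigned subtraction wraps modulo 2^64. Interpreting the result as
 * signed (two's complement assumed) gives the distance between now and
 * exp as long as they are less than 2^63 ticks apart.
 */
static int rearm_status(uint64_t now, uint64_t exp, bool inclusive)
{
	int64_t diff = (int64_t)(now - exp);
	bool expired = inclusive ? diff >= 0 : diff > 0;

	return expired ? -ETIME : 0;
}
```

For now == exp the strict variant returns 0 and so claims the timer is armed, which is the case where an equality comparator that is only evaluated on increment has already missed the match. The inclusive variant returns -ETIME for that case, so the caller gets the chance to retry instead of waiting for an interrupt which never comes.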
I have truly no idea why the original code did not expose this problem,
though it might have been just papered over by sheer luck and timing.
Thanks,
tglx
---
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -381,6 +381,8 @@ int clockevents_program_event(struct clo
 	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
 		if (!force || clockevents_program_min_delta(dev))
 			return -ETIME;
+	} else if (delta <= 0) {
+		dev->next_event = ktime_add_ns(ktime_get(), dev->min_delta_ns);
 	}
 	dev->next_event_forced = 1;
 	return 0;