Re: the stuttering regression in 7.0: should I have done something different

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Tony Rodriguez <unixpro1970@gmail.com>
To: Thomas Gleixner <tglx@kernel.org>
Cc: Linux kernel regressions list <regressions@lists.linux.dev>,
	LKML <linux-kernel@vger.kernel.org>,
	sparclinux@vger.kernel.org,
	John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: the stuttering regression in 7.0: should I have done something different
Date: Thu, 14 May 2026 00:24:46 -0700	[thread overview]
Message-ID: <e45438e7-8501-4c10-95ee-07f118de8a51@gmail.com> (raw)
In-Reply-To: <87tssb6olo.ffs@tglx>

Hi Thomas,

Cheers!

Initial validation of the test patches for v7.0.6 and 7.1-rc3 on the 
S7-2 looks promising: I have not observed panics, timer delays, or other 
timer-related issues so far. I will pause broader validation on the S7-2 
and T7-1 until I receive your recommendation or any requested revisions 
(see inline comments below).

Note: I did see an intermittent error on the S7-2 running 7.1-rc3, 
usually when the system is under heavy load during a kernel build. I’m 
not sure whether it is a separate problem?

"[676.464681] BUG: Bad rss-counter state mm:000000008d9f1cf2 
type:MM_FILEPAGES val:-4096 Comm:cc1 Pid:78165".

On 5/13/26 1:28 PM, Thomas Gleixner wrote:

> Just to be clear: I never saw the VHDL code of that CPU, but that
> pattern is way too familiar.
>
> Those equal comparators, which were designed by AI (Absence of
> Intelligence) before AI got popular, generally work this way:
>
>    The comparator is only evaluated on the clock edge which increments
>    the counter, but not when the comparator value is written. So a write
>    of the same value does not result in an interrupt.
>
> That's an "optimization" which spares quite a few gates and is obviously
> nowhere documented. So software has to deal with the consequences by
> using a crystal ball, which is trivial to get wrong and can go unnoticed
> for a long time until it roars it's ugly head at some point for whatever
> reasons.
>
> I'm willing to bet a round of beers at the next conference that this is
> the problem and that it will magically disappear when you change that
> condition to:
>
>          return (read_cnt() - exp) >= 0 ? -ETIME : 0;

Attempted to locate "return (read_cnt() - exp) >= 0 ? -ETIME : 0;" but 
could not find an exact match. After additional inspection I updated the 
following functions "tick_add_compare()" and "stick_add_compare()" in 
arch/sparc/kernel/time_64.c to from "> 0L" to ">= 0L". This appears to 
have resolved the lost-timer behavior.

--- time_64.c.orig
+++ time_64.c
@@ -146,7 +146,7 @@
                              : "=r" (new_tick));
         new_tick &= ~TICKCMP_IRQ_BIT;

-       return ((long)(new_tick - (orig_tick+adj))) > 0L;
+       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
  }

  static unsigned long tick_add_tick(unsigned long adj)
@@ -277,7 +277,7 @@
                              : "=r" (new_tick));
         new_tick &= ~TICKCMP_IRQ_BIT;

-       return ((long)(new_tick - (orig_tick+adj))) > 0L;
+       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
  }

  static unsigned long stick_get_frequency(void)

>
> unless they managed to add some extra propagation delay to that
> comparator write like the HPET folks did at some point without telling
> anyone. I doubt the SPARC janitor who implemented it did so because
> that would have made the failure way more likely.
>
> I have truly no idea why the original code did not expose this problem,
> though it might have been just papered over by sheer luck and timing.
>
> Thanks,
>
>          tglx
> ---
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -381,6 +381,8 @@ int clockevents_program_event(struct clo
>   	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
>   		if (!force || clockevents_program_min_delta(dev))
>   			return -ETIME;
> +	} else if (delta <= 0) {
> +		dev->next_event = ktime_add_ns(ktime_get(), dev->min_delta_ns);
>   	}
>   	dev->next_event_forced = 1;
>   	return 0;
>
You mentioned this kernel/time/clockevents.c patch is optional, but I 
propose revising clockevents_program_event(). If the requested event 
time is already at or before now, record a sane next_event (now + 
min_delta) so core code sees a future expected time and can behave 
correctly. Does this seem reasonable?

  --- clockevents.c.orig
+++ clockevents.c
@@ -347,6 +347,11 @@
         if (dev->set_next_event(dev->min_delta_ticks, dev)) {
                 if (!force || clockevents_program_min_delta(dev))
                         return -ETIME;
+       } else {
+               ktime_t now = ktime_get();
+               s64 delta_ns = ktime_to_ns(ktime_sub(expires, now));
+               if (delta_ns <= 0)
+                       dev->next_event = ktime_add_ns(now, 
dev->min_delta_ns);
         }
         dev->next_event_forced = 1;
         return 0;

next prev parent reply	other threads:[~2026-05-14  7:24 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <4dd98a32-d1d6-43de-910c-7e487503177e@leemhuis.info>
2026-05-08  5:51 ` the stuttering regression in 7.0: should I have done something different? John Paul Adrian Glaubitz
2026-05-08  6:33   ` Thorsten Leemhuis
     [not found]     ` <D5D19776-C809-4284-9417-F9A860877B98@gmail.com>
2026-05-08  7:50       ` Thorsten Leemhuis
2026-05-08 20:15         ` Tony Rodriguez
2026-05-08 20:21           ` Tony Rodriguez
2026-05-10 21:29           ` Thomas Gleixner
2026-05-11  3:13             ` Tony Rodriguez
2026-05-12  5:03               ` the stuttering regression in 7.0: should I have done something different Tony Rodriguez
2026-05-12  8:17                 ` Thomas Gleixner
2026-05-12 21:43                   ` Tony Rodriguez
2026-05-13 20:28                     ` Thomas Gleixner
2026-05-14  7:24                       ` Tony Rodriguez [this message]
2026-05-14 10:24                         ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e45438e7-8501-4c10-95ee-07f118de8a51@gmail.com \
    --to=unixpro1970@gmail.com \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=sparclinux@vger.kernel.org \
    --cc=tglx@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox