All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tony Rodriguez <unixpro1970@gmail.com>
To: Thomas Gleixner <tglx@kernel.org>
Cc: Linux kernel regressions list <regressions@lists.linux.dev>,
	LKML <linux-kernel@vger.kernel.org>,
	sparclinux@vger.kernel.org,
	John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: the stuttering regression in 7.0: should I have done something different
Date: Thu, 14 May 2026 00:24:46 -0700	[thread overview]
Message-ID: <e45438e7-8501-4c10-95ee-07f118de8a51@gmail.com> (raw)
In-Reply-To: <87tssb6olo.ffs@tglx>

Hi Thomas,

Cheers!

Initial validation of the test patches for v7.0.6 and 7.1-rc3 on the 
S7-2 looks promising: I have not observed panics, timer delays, or other 
timer-related issues so far. I will pause broader validation on the S7-2 
and T7-1 until I receive your recommendation or any requested revisions 
(see inline comments below).

Note: I did see an intermittent error on the S7-2 running 7.1-rc3, 
usually when the system is under heavy load during a kernel build. I’m 
not sure whether it is a separate problem?

"[676.464681] BUG: Bad rss-counter state mm:000000008d9f1cf2 
type:MM_FILEPAGES val:-4096 Comm:cc1 Pid:78165".

On 5/13/26 1:28 PM, Thomas Gleixner wrote:

> Just to be clear: I never saw the VHDL code of that CPU, but that
> pattern is way too familiar.
>
> Those equal comparators, which were designed by AI (Absence of
> Intelligence) before AI got popular, generally work this way:
>
>    The comparator is only evaluated on the clock edge which increments
>    the counter, but not when the comparator value is written. So a write
>    of the same value does not result in an interrupt.
>
> That's an "optimization" which spares quite a few gates and is obviously
> nowhere documented. So software has to deal with the consequences by
> using a crystal ball, which is trivial to get wrong and can go unnoticed
> for a long time until it roars it's ugly head at some point for whatever
> reasons.
>
> I'm willing to bet a round of beers at the next conference that this is
> the problem and that it will magically disappear when you change that
> condition to:
>
>          return (read_cnt() - exp) >= 0 ? -ETIME : 0;

Attempted to locate "return (read_cnt() - exp) >= 0 ? -ETIME : 0;" but 
could not find an exact match. After additional inspection I updated the 
following functions "tick_add_compare()" and "stick_add_compare()" in 
arch/sparc/kernel/time_64.c to from "> 0L" to ">= 0L". This appears to 
have resolved the lost-timer behavior.

--- time_64.c.orig
+++ time_64.c
@@ -146,7 +146,7 @@
                              : "=r" (new_tick));
         new_tick &= ~TICKCMP_IRQ_BIT;

-       return ((long)(new_tick - (orig_tick+adj))) > 0L;
+       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
  }

  static unsigned long tick_add_tick(unsigned long adj)
@@ -277,7 +277,7 @@
                              : "=r" (new_tick));
         new_tick &= ~TICKCMP_IRQ_BIT;

-       return ((long)(new_tick - (orig_tick+adj))) > 0L;
+       return ((long)(new_tick - (orig_tick+adj))) >= 0L;
  }

  static unsigned long stick_get_frequency(void)

>
> unless they managed to add some extra propagation delay to that
> comparator write like the HPET folks did at some point without telling
> anyone. I doubt the SPARC janitor who implemented it did so because
> that would have made the failure way more likely.
>
> I have truly no idea why the original code did not expose this problem,
> though it might have been just papered over by sheer luck and timing.
>
> Thanks,
>
>          tglx
> ---
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -381,6 +381,8 @@ int clockevents_program_event(struct clo
>   	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
>   		if (!force || clockevents_program_min_delta(dev))
>   			return -ETIME;
> +	} else if (delta <= 0) {
> +		dev->next_event = ktime_add_ns(ktime_get(), dev->min_delta_ns);
>   	}
>   	dev->next_event_forced = 1;
>   	return 0;
>
You mentioned this kernel/time/clockevents.c patch is optional, but I 
propose revising clockevents_program_event(). If the requested event 
time is already at or before now, record a sane next_event (now + 
min_delta) so core code sees a future expected time and can behave 
correctly. Does this seem reasonable?

  --- clockevents.c.orig
+++ clockevents.c
@@ -347,6 +347,11 @@
         if (dev->set_next_event(dev->min_delta_ticks, dev)) {
                 if (!force || clockevents_program_min_delta(dev))
                         return -ETIME;
+       } else {
+               ktime_t now = ktime_get();
+               s64 delta_ns = ktime_to_ns(ktime_sub(expires, now));
+               if (delta_ns <= 0)
+                       dev->next_event = ktime_add_ns(now, 
dev->min_delta_ns);
         }
         dev->next_event_forced = 1;
         return 0;



  reply	other threads:[~2026-05-14  7:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-23 16:30 the stuttering regression in 7.0: should I have done something different? Thorsten Leemhuis
2026-04-26 21:16 ` Greg KH
2026-05-08  5:51 ` John Paul Adrian Glaubitz
2026-05-08  6:33   ` Thorsten Leemhuis
     [not found]     ` <D5D19776-C809-4284-9417-F9A860877B98@gmail.com>
2026-05-08  7:50       ` Thorsten Leemhuis
2026-05-08 20:15         ` Tony Rodriguez
2026-05-08 20:21           ` Tony Rodriguez
2026-05-10 21:29           ` Thomas Gleixner
2026-05-11  3:13             ` Tony Rodriguez
2026-05-12  5:03               ` the stuttering regression in 7.0: should I have done something different Tony Rodriguez
2026-05-12  8:17                 ` Thomas Gleixner
2026-05-12 21:43                   ` Tony Rodriguez
2026-05-13 20:28                     ` Thomas Gleixner
2026-05-14  7:24                       ` Tony Rodriguez [this message]
2026-05-14 10:24                         ` Thomas Gleixner
2026-05-15  4:47                           ` Tony Rodriguez
2026-05-15 15:35                             ` Thomas Gleixner
2026-05-15 17:51                               ` John Paul Adrian Glaubitz
2026-05-15 19:57                                 ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e45438e7-8501-4c10-95ee-07f118de8a51@gmail.com \
    --to=unixpro1970@gmail.com \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=sparclinux@vger.kernel.org \
    --cc=tglx@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.