* [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts
@ 2026-05-19 2:24 Tony Rodriguez
2026-05-19 2:24 ` Tony Rodriguez
0 siblings, 1 reply; 4+ messages in thread
From: Tony Rodriguez @ 2026-05-19 2:24 UTC (permalink / raw)
To: davem, sparclinux
Cc: linux-kernel, andreas, tglx, thomas.weissschuh, regressions,
glaubitz, linux, torvalds, Tony Rodriguez
sparc64: Fix comparator problem with timer interrupts
This patch fixes SPARC64 tick/stick comparator logic that can cause missed
timer interrupts
Full dmesg and crash dumps:
https://github.com/unixpro1970/Sparc64-Kernel-Debugging-Dumps/blob/main/s7-2-05122026-dump.tar.gz
Debugging discussion:
https://lore.kernel.org/all/87tssb6olo.ffs@tglx/
https://lore.kernel.org/all/871pfcznw0.ffs@tglx/
Short dmesg excerpt:
[ 48.632215] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 48.676018] rcu: rcu_sched kthread timer wakeup didn't happen for 5259 jiffies!
[ 48.743621] rcu: Possible timer handling issue on cpu=100 timer-softirq=15
Test summary:
After applying this patch, S7-2 and T7-1 systems no longer hang on v7 and v7.1 kernels.
Tony Rodriguez (1):
sparc64: Fix comparator problem with timer interrupts
arch/sparc/kernel/time_64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts
2026-05-19 2:24 [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts Tony Rodriguez
@ 2026-05-19 2:24 ` Tony Rodriguez
2026-05-19 14:22 ` Thomas Gleixner
0 siblings, 1 reply; 4+ messages in thread
From: Tony Rodriguez @ 2026-05-19 2:24 UTC (permalink / raw)
To: davem, sparclinux
Cc: linux-kernel, andreas, tglx, thomas.weissschuh, regressions,
glaubitz, linux, torvalds, Tony Rodriguez
On SPARC64 the check:
return ((long)(new_tick - (orig_tick + adj))) > 0L;
Is safe only if retries make forward progress. The comparator can
take effect with a latency, so the moment when counter == comparator
may be missed, which can cause delays or hangs on some SPARC64 systems.
For clarity:
exp = orig_tick + adj /* expected comparator value */
The current check requires new_tick to be strictly greater than exp;
equality (new_tick == exp) is treated as not yet passed and the caller
will retry.
By contrast, using:
return ((long)(new_tick - (orig_tick + adj))) >= 0L;
causes the caller to stop retrying and assume the timer is scheduled;
both equality and greater-than are accepted (new_tick == exp or
new_tick > exp).
Signed-off-by: Tony Rodriguez <unixpro1970@gmail.com>
---
arch/sparc/kernel/time_64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/sparc/kernel/time_64.c b/arch/sparc/kernel/time_64.c
index 87b267043ccd..783b60e547c4 100644
--- a/arch/sparc/kernel/time_64.c
+++ b/arch/sparc/kernel/time_64.c
@@ -146,7 +146,7 @@ static int tick_add_compare(unsigned long adj)
: "=r" (new_tick));
new_tick &= ~TICKCMP_IRQ_BIT;
- return ((long)(new_tick - (orig_tick+adj))) > 0L;
+ return ((long)(new_tick - (orig_tick+adj))) >= 0L;
}
static unsigned long tick_add_tick(unsigned long adj)
@@ -277,7 +277,7 @@ static int stick_add_compare(unsigned long adj)
: "=r" (new_tick));
new_tick &= ~TICKCMP_IRQ_BIT;
- return ((long)(new_tick - (orig_tick+adj))) > 0L;
+ return ((long)(new_tick - (orig_tick+adj))) >= 0L;
}
static unsigned long stick_get_frequency(void)
@@ -411,7 +411,7 @@ static int hbtick_add_compare(unsigned long adj)
val2 = __hbird_read_stick() & ~TICKCMP_IRQ_BIT;
- return ((long)(val2 - val)) > 0L;
+ return ((long)(val2 - val)) >= 0L;
}
static unsigned long hbtick_get_frequency(void)
--
2.53.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts
2026-05-19 2:24 ` Tony Rodriguez
@ 2026-05-19 14:22 ` Thomas Gleixner
2026-05-19 23:25 ` Tony Rodriguez
0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2026-05-19 14:22 UTC (permalink / raw)
To: Tony Rodriguez, davem, sparclinux
Cc: linux-kernel, andreas, thomas.weissschuh, regressions, glaubitz,
linux, torvalds, Tony Rodriguez
On Mon, May 18 2026 at 19:24, Tony Rodriguez wrote:
> On SPARC64 the check:
>
> return ((long)(new_tick - (orig_tick + adj))) > 0L;
>
> Is safe only if retries make forward progress. The comparator can
> take effect with a latency, so the moment when counter == comparator
> may be missed, which can cause delays or hangs on some SPARC64 systems.
>
> For clarity:
> exp = orig_tick + adj /* expected comparator value */
>
> The current check requires new_tick to be strictly greater than exp;
> equality (new_tick == exp) is treated as not yet passed and the caller
> will retry.
That's confusing at best. You really want to explain how the ordering is
similar to what I described in the analysis:
exp = read_cnt() + delta_ticks;
write_cmp(exp);
return (read_cnt() - exp) > 0;
If the counter advanced past the expected expiry time, after writing it,
then the caller will retry, as the calling code does:
return tick.add_compare(delta_ticks) ? -ETIME : 0;
But it won't do so when the counter is equal, which is causing the
problem.
> By contrast, using:
>
> return ((long)(new_tick - (orig_tick + adj))) >= 0L;
>
> causes the caller to stop retrying and assume the timer is scheduled;
> both equality and greater-than are accepted (new_tick == exp or
> new_tick > exp).
It's the other way round. When counter >= expiry time, then the write is
considered failed. If the counter has not yet reached expiry time,
i.e. it is smaller, then it assumes the timer is scheduled.
> Signed-off-by: Tony Rodriguez <unixpro1970@gmail.com>
It would be nice to have a link to the original thread in the change log
itself as that gives people quick access when they are wondering about
this a year down the road.
Thanks,
tglx
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts
2026-05-19 14:22 ` Thomas Gleixner
@ 2026-05-19 23:25 ` Tony Rodriguez
0 siblings, 0 replies; 4+ messages in thread
From: Tony Rodriguez @ 2026-05-19 23:25 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, andreas, thomas.weissschuh, regressions, glaubitz,
linux, torvalds, davem, sparclinux
Hi Thomas,
Thanks again for the careful analysis of the comparator ordering. After
applying the changes in the patch I confirmed S7‑2 and T7‑1 systems no
longer hang; timer scheduling now behaves as expected.
I don’t have bandwidth right now to finish the remaining tidy steps and
I’d rather avoid delaying this fix. Since we worked on this together and
you’re familiar with the issue, would you be willing to take it over so
upstream has everything required for approval? I’m happy to answer any
questions; if you’re busy I can ask the maintainer to reassign.
Thanks again for the review and the clear explanation.
Tony
On 5/19/26 7:22 AM, Thomas Gleixner wrote:
>> On SPARC64 the check:
>>
>> return ((long)(new_tick - (orig_tick + adj))) > 0L;
>>
>> Is safe only if retries make forward progress. The comparator can
>> take effect with a latency, so the moment when counter == comparator
>> may be missed, which can cause delays or hangs on some SPARC64 systems.
>>
>> For clarity:
>> exp = orig_tick + adj /* expected comparator value */
>>
>> The current check requires new_tick to be strictly greater than exp;
>> equality (new_tick == exp) is treated as not yet passed and the caller
>> will retry.
> That's confusing at best. You really want to explain how the ordering is
> similar to what I described in the analysis:
>
> exp = read_cnt() + delta_ticks;
> write_cmp(exp);
> return (read_cnt() - exp) > 0;
>
> If the counter advanced past the expected expiry time, after writing it,
> then the caller will retry, as the calling code does:
>
> return tick.add_compare(delta_ticks) ? -ETIME : 0;
>
> But it won't do so when the counter is equal, which is causing the
> problem.
>
>> By contrast, using:
>>
>> return ((long)(new_tick - (orig_tick + adj))) >= 0L;
>>
>> causes the caller to stop retrying and assume the timer is scheduled;
>> both equality and greater-than are accepted (new_tick == exp or
>> new_tick > exp).
> It's the other way round. When counter >= expiry time, then the write is
> considered failed. If the counter has not yet reached expiry time,
> i.e. it is smaller, then it assumes the timer is scheduled.
>
>> Signed-off-by: Tony Rodriguez<unixpro1970@gmail.com>
> It would be nice to have a link to the original thread in the change log
> itself as that gives people quick access when they are wondering about
> this a year down the road.
>
> Thanks,
>
> tglx
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-05-19 23:25 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-19 2:24 [PATCH v2 1/1] sparc64: Fix comparator problem with timer interrupts Tony Rodriguez
2026-05-19 2:24 ` Tony Rodriguez
2026-05-19 14:22 ` Thomas Gleixner
2026-05-19 23:25 ` Tony Rodriguez
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox