From: Thomas Gleixner <tglx@kernel.org>
To: Jiri Wiesner <jwiesner@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
John Stultz <jstultz@google.com>,
Waiman Long <longman@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Stephen Boyd <sboyd@kernel.org>,
x86@kernel.org, "Gautham R. Shenoy" <gautham.shenoy@amd.com>,
Daniel J Blueman <daniel@quora.org>,
Scott Hamilton <scott.hamilton@eviden.com>,
Helge Deller <deller@gmx.de>,
linux-parisc@vger.kernel.org,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
linux-mips@vger.kernel.org
Subject: Re: [patch 5/5] clocksource: Rewrite watchdog code completely
Date: Sun, 08 Mar 2026 11:05:31 +0100 [thread overview]
Message-ID: <87y0k21vro.ffs@tglx> (raw)
In-Reply-To: <aZ87wpdHJ5vajYoL@incl>
On Wed, Feb 25 2026 at 19:13, Jiri Wiesner wrote:
> On Sat, Jan 24, 2026 at 12:18:01AM +0100, Thomas Gleixner wrote:
>> To address this and bring back sanity to the watchdog, rewrite the code
>> completely with a different approach:
>>
>> 1) Restrict the validation against a reference clocksource to the boot
>> CPU, which is usually the CPU/Socket closest to the legacy block which
>> contains the reference source (HPET/ACPI-PM timer).
>
> The UEFI picks the boot CPU so the kernel does not have control over
> that. On the other hand, I think the CPU that is connected to the
> southbridge chip (by DMI or PCIe) will be selected in the majority of
> UEFI implementations.
Picking a remote node CPU would be insane, but yes BIOSes are insane by
definition.
> There is one issue: What if the reference clocksource itself
> experiences time skew? I have seen a case like this with the sgi_rtc
> clocksource. I created a debugging kernel with the HPET as a second
> watchdog (not affecting the decisions by the watchdog) and got this
> result:
>> clocksource: timekeeping watchdog on CPU118: Marking clocksource 'tsc' as unstable because the skew is too large:
>> clocksource: 'sgi_rtc' wd_nsec: 511302794 wd_now: 1cb50e4c4b wd_last: 1ca7097111 mask: ffffffffffffff
>> clocksource: 'hpet' wd2_nsec: 512005960 wd2_now: 65892719 wd2_last: 64c5d684 mask: ffffffff
>> clocksource: 'tsc' cs_nsec: 512006458 cs_now: 86b5982cb1 cs_last: 867581bbab mask: ffffffffffffffff
>> clocksource: 'tsc' skewed 703664 ns (0 ms) over watchdog 'sgi_rtc' interval of 511302794 ns (511 ms)
>> clocksource: 'tsc' is current clocksource.
>> tsc: Marking TSC unstable due to clocksource watchdog
>> clocksource: Checking clocksource tsc synchronization from CPU 610 to CPUs 0-609,611-767.
>> clocksource: Switched to clocksource sgi_rtc
>
> The intervals measured by the TSC and the HPET match very well; the
> sgi_rtc is off. Even the new implementation of the clocksource
> watchdog would be susceptible to the reference clocksource
> experiencing time skew. I think the clocksource watchdog needs to make
> the assumption that the reference clocksource is right, and the onus
> should be on hardware developers to make sure the reference
> clocksource is accurate. In reality, one has to resort to disabling
> the reference clocksource experiencing time skew or, at least,
> decreasing the rating of that clocksource.
Yes, we have to make the assumption that the watchdog clocksource is
actually stable and accurate. If the sgi_rtc is un-reliable, then it
should be rated down. AFAICT it is per blade and I have no idea how
synchronized it is accross blades.
>> +static bool watchdog_check_freq(struct clocksource *cs, bool reset_pending)
>> +{
>> + /*
>> + * Calculate and validate the skew against the allowed PPM
>> + * value of the maximum delta plus the watchdog readout
>> + * time.
>> + */
>> + if (abs(wd_delta - cs_delta) < (max_delta >> ppm_shift) + wd_seq)
>> + return true;
>
> Making the threshold proportional to the length of the interval
> resolves the issue with the (previously) fixed threshold and the
> interval being stretched on account of the timer running later than
> when it was meant to expire.
Indeed.
>> +static void watchdog_check_result(struct clocksource *cs)
>> {
>> - struct clocksource *cs;
>> + switch (watchdog_data.result) {
>> + case WD_SUCCESS:
>> + clocksource_tick_stable(cs);
>> + clocksource_enable_highres(cs);
>> + return;
>>
>> - list_for_each_entry(cs, &watchdog_list, wd_list)
>> + case WD_FREQ_TIMEOUT:
>> + watchdog_print_freq_timeout(cs);
>> + /* Try again later and invalidate the reference timestamps. */
>> cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>> -}
>> + return;
> I like that the new clocksource watchdog is far less punishing. A
> clocksource may be marked unstable only when the readout latency is
> below 50 us (and there is time skew or unsynchronized CPU
> sockets). There is no need for skipping watchdog checks to mitigate
> the clocksource being marked unstable on account of quite possibly
> unrelated readout latency, SMIs or vCPU preemption.
That was the design goal of that rewrite. Glad you like it.
Thanks,
tglx
next prev parent reply other threads:[~2026-03-08 10:05 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-23 23:17 [patch 0/5] clocksource: Rewrite clocksource watchdog and related cleanups Thomas Gleixner
2026-01-23 23:17 ` [patch 1/5] parisc: Remove unused clocksource flags Thomas Gleixner
2026-01-24 8:40 ` Helge Deller
2026-03-12 11:25 ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
2026-01-23 23:17 ` [patch 2/5] MIPS: Dont select CLOCKSOURCE_WATCHDOG Thomas Gleixner
2026-01-24 22:28 ` Maciej W. Rozycki
2026-01-26 9:10 ` Thomas Gleixner
2026-03-12 11:25 ` [tip: timers/core] MIPS: Don't " tip-bot2 for Thomas Gleixner
2026-01-23 23:17 ` [patch 3/5] x86/tsc: Handle CLOCK_SOURCE_VALID_FOR_HRES correctly Thomas Gleixner
2026-03-12 11:25 ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
2026-01-23 23:17 ` [patch 4/5] clocksource: Dont use non-continuous clocksources as watchdog Thomas Gleixner
2026-03-12 11:25 ` [tip: timers/core] clocksource: Don't " tip-bot2 for Thomas Gleixner
2026-01-23 23:18 ` [patch 5/5] clocksource: Rewrite watchdog code completely Thomas Gleixner
2026-02-02 6:45 ` Daniel J Blueman
2026-02-02 11:27 ` Thomas Gleixner
2026-02-15 12:18 ` Daniel J Blueman
2026-02-23 13:53 ` Thomas Gleixner
2026-03-08 9:53 ` Thomas Gleixner
2026-03-15 14:59 ` Daniel J Blueman
2026-03-17 9:01 ` Thomas Gleixner
2026-03-18 14:10 ` Daniel J Blueman
2026-03-19 20:31 ` Thomas Gleixner
2026-03-20 2:21 ` Daniel J Blueman
2026-03-20 8:26 ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
2026-03-20 12:42 ` tip-bot2 for Thomas Gleixner
2026-02-25 18:13 ` [patch 5/5] " Jiri Wiesner
2026-03-08 10:05 ` Thomas Gleixner [this message]
2026-03-11 13:12 ` Jiri Wiesner
2026-03-09 15:43 ` Borislav Petkov
2026-03-11 7:58 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y0k21vro.ffs@tglx \
--to=tglx@kernel.org \
--cc=daniel.lezcano@linaro.org \
--cc=daniel@quora.org \
--cc=deller@gmx.de \
--cc=gautham.shenoy@amd.com \
--cc=jstultz@google.com \
--cc=jwiesner@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-parisc@vger.kernel.org \
--cc=longman@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=sboyd@kernel.org \
--cc=scott.hamilton@eviden.com \
--cc=tsbogend@alpha.franken.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox