public inbox for linux-kernel@vger.kernel.org
From: Thomas Gleixner <tglx@kernel.org>
To: Jiri Wiesner <jwiesner@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	John Stultz <jstultz@google.com>,
	Waiman Long <longman@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Stephen Boyd <sboyd@kernel.org>,
	x86@kernel.org, "Gautham R. Shenoy" <gautham.shenoy@amd.com>,
	Daniel J Blueman <daniel@quora.org>,
	Scott Hamilton <scott.hamilton@eviden.com>,
	Helge Deller <deller@gmx.de>,
	linux-parisc@vger.kernel.org,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	linux-mips@vger.kernel.org
Subject: Re: [patch 5/5] clocksource: Rewrite watchdog code completely
Date: Sun, 08 Mar 2026 11:05:31 +0100
Message-ID: <87y0k21vro.ffs@tglx>
In-Reply-To: <aZ87wpdHJ5vajYoL@incl>

On Wed, Feb 25 2026 at 19:13, Jiri Wiesner wrote:
> On Sat, Jan 24, 2026 at 12:18:01AM +0100, Thomas Gleixner wrote:
>> To address this and bring back sanity to the watchdog, rewrite the code
>> completely with a different approach:
>> 
>>   1) Restrict the validation against a reference clocksource to the boot
>>      CPU, which is usually the CPU/Socket closest to the legacy block which
>>      contains the reference source (HPET/ACPI-PM timer).
>
> The UEFI picks the boot CPU so the kernel does not have control over
> that. On the other hand, I think the CPU that is connected to the
> southbridge chip (by DMI or PCIe) will be selected in the majority of
> UEFI implementations.

Picking a remote node CPU would be insane, but yes BIOSes are insane by
definition.

> There is one issue: What if the reference clocksource itself
> experiences time skew? I have seen a case like this with the sgi_rtc
> clocksource. I created a debugging kernel with the HPET as a second
> watchdog (not affecting the decisions by the watchdog) and got this
> result:

>> clocksource: timekeeping watchdog on CPU118: Marking clocksource 'tsc' as unstable because the skew is too large:
>> clocksource: 'sgi_rtc' wd_nsec: 511302794 wd_now: 1cb50e4c4b wd_last: 1ca7097111 mask: ffffffffffffff
>> clocksource: 'hpet' wd2_nsec: 512005960 wd2_now: 65892719 wd2_last: 64c5d684 mask: ffffffff
>> clocksource: 'tsc' cs_nsec: 512006458 cs_now: 86b5982cb1 cs_last: 867581bbab mask: ffffffffffffffff
>> clocksource: 'tsc' skewed 703664 ns (0 ms) over watchdog 'sgi_rtc' interval of 511302794 ns (511 ms)
>> clocksource: 'tsc' is current clocksource.
>> tsc: Marking TSC unstable due to clocksource watchdog
>> clocksource: Checking clocksource tsc synchronization from CPU 610 to CPUs 0-609,611-767.
>> clocksource: Switched to clocksource sgi_rtc
>
> The intervals measured by the TSC and the HPET match very well; the
> sgi_rtc is off. Even the new implementation of the clocksource
> watchdog would be susceptible to the reference clocksource
> experiencing time skew. I think the clocksource watchdog needs to make
> the assumption that the reference clocksource is right, and the onus
> should be on hardware developers to make sure the reference
> clocksource is accurate. In reality, one has to resort to disabling
> the reference clocksource experiencing time skew or, at least,
> decreasing the rating of that clocksource.

Yes, we have to make the assumption that the watchdog clocksource is
actually stable and accurate. If the sgi_rtc is unreliable, then it
should be rated down. AFAICT it is per blade and I have no idea how
synchronized it is across blades.
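
For reference, the intervals in the log quoted above can be compared
with a few lines of C. This is only a sketch for checking the
arithmetic: the helper names are made up for illustration and are not
kernel functions.

```c
#include <stdint.h>

/* Absolute difference between two measured intervals in nanoseconds. */
static uint64_t interval_skew_ns(uint64_t a_nsec, uint64_t b_nsec)
{
	return a_nsec > b_nsec ? a_nsec - b_nsec : b_nsec - a_nsec;
}

/*
 * Skew relative to the reference interval in parts per billion.
 * Hypothetical helper; the multiplication does not overflow for
 * intervals in the sub-second range seen in the log.
 */
static uint64_t skew_ppb(uint64_t skew_ns, uint64_t ref_nsec)
{
	return skew_ns * 1000000000ULL / ref_nsec;
}
```

Feeding in cs_nsec (512006458) and wd_nsec (511302794) yields the
703664 ns from the log, which is roughly 1376 ppm relative to the
sgi_rtc interval. The same comparison against wd2_nsec (512005960)
leaves 498 ns, which is below 1 ppm, so TSC and HPET agree and the
sgi_rtc is the odd one out.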

>> +static bool watchdog_check_freq(struct clocksource *cs, bool reset_pending)
>> +{
>> +		/*
>> +		 * Calculate and validate the skew against the allowed PPM
>> +		 * value of the maximum delta plus the watchdog readout
>> +		 * time.
>> +		 */
>> +		if (abs(wd_delta - cs_delta) < (max_delta >> ppm_shift) + wd_seq)
>> +			return true;
>
> Making the threshold proportional to the length of the interval
> resolves the issue with the (previously) fixed threshold and the
> interval being stretched on account of the timer running later than
> when it was meant to expire.

Indeed.
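
To illustrate why the proportional limit copes with a stretched
interval, here is a stand-alone sketch of the comparison in the hunk
above. It is not the patch code: it assumes that max_delta is the
larger of the two intervals, the function name is invented, and the
ppm_shift value of 11 used in the note below is picked only for the
example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the check. The allowed skew is
 * a fraction (1 / 2^ppm_shift) of the measured interval plus the time
 * it took to read the watchdog, so it grows with the interval.
 */
static bool skew_within_limit(int64_t wd_delta, int64_t cs_delta,
			      unsigned int ppm_shift, int64_t wd_readout)
{
	/* Assumption: max_delta is the larger of the two intervals. */
	int64_t max_delta = wd_delta > cs_delta ? wd_delta : cs_delta;
	int64_t skew = wd_delta > cs_delta ? wd_delta - cs_delta
					   : cs_delta - wd_delta;

	return skew < (max_delta >> ppm_shift) + wd_readout;
}
```

With ppm_shift = 11 (about 488 ppm) a skew of 400000 ns is rejected
over a 500 ms interval, but accepted when the timer fired late and the
interval became 1 s, because the limit scales from 244335 ns to
488476 ns.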

>> +static void watchdog_check_result(struct clocksource *cs)
>>  {
>> -	struct clocksource *cs;
>> +	switch (watchdog_data.result) {
>> +	case WD_SUCCESS:
>> +		clocksource_tick_stable(cs);
>> +		clocksource_enable_highres(cs);
>> +		return;
>>  
>> -	list_for_each_entry(cs, &watchdog_list, wd_list)
>> +	case WD_FREQ_TIMEOUT:
>> +		watchdog_print_freq_timeout(cs);
>> +		/* Try again later and invalidate the reference timestamps. */
>>  		cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>> -}
>> +		return;

> I like that the new clocksource watchdog is far less punishing. A
> clocksource may be marked unstable only when the readout latency is
> below 50 us (and there is time skew or unsynchronized CPU
> sockets). There is no need for skipping watchdog checks to mitigate
> the clocksource being marked unstable on account of quite possibly
> unrelated readout latency, SMIs or vCPU preemption.

That was the design goal of that rewrite. Glad you like it.
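
The behaviour described in the quoted paragraph can be summarized in a
small decision function. This is a simplified sketch and not the code
from the patch: the enum, the function and the constant name are
hypothetical, only the 50 us figure is taken from the text above, and
treating a readout of exactly 50 us as too slow is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical verdicts, not the WD_* values from the patch. */
enum wd_verdict { VERDICT_STABLE, VERDICT_RETRY, VERDICT_UNSTABLE };

/* 50 us readout limit as mentioned in the review. */
#define READOUT_LIMIT_NS 50000ULL

static enum wd_verdict classify(uint64_t readout_ns, bool skew_ok)
{
	/* A slow readout (SMI, vCPU preemption) says nothing about skew. */
	if (readout_ns >= READOUT_LIMIT_NS)
		return VERDICT_RETRY;

	/* Only a clean readout is allowed to condemn the clocksource. */
	return skew_ok ? VERDICT_STABLE : VERDICT_UNSTABLE;
}
```

The point of the ordering is that a disturbed measurement always ends
in a retry, no matter what the skew comparison says.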

Thanks,

        tglx
