From: Feng Tang <feng.tang@intel.com>
To: Jiri Wiesner <jwiesner@suse.de>
Cc: <linux-kernel@vger.kernel.org>, John Stultz <jstultz@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH] clocksource: Use proportional clocksource skew threshold
Date: Tue, 26 Dec 2023 22:16:33 +0800
Message-ID: <ZYrgQUTB3ayTtMqK@feng-clx>
In-Reply-To: <20231221160517.GA22919@incl>

On Thu, Dec 21, 2023 at 05:05:17PM +0100, Jiri Wiesner wrote:
> There have been reports of the watchdog marking clocksources unstable on
> machines with 8 NUMA nodes:
> > clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large:
> > clocksource:   'hpet' wd_nsec: 14523447520 wd_now: 5a749706 wd_last: 45adf1e0 mask: ffffffff
> > clocksource:   'tsc' cs_nsec: 14524115132 cs_now: 515ce2c5a96caa cs_last: 515cd9a9d83918 mask: ffffffffffffffff
> > clocksource:   'tsc' is current clocksource.
> > tsc: Marking TSC unstable due to clocksource watchdog
> > TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > sched_clock: Marking unstable (1950347883333462, 79649632569)<-(1950428279338308, -745776594)
> > clocksource: Checking clocksource tsc synchronization from CPU 400 to CPUs 0,46,52,54,138,208,392,397.
> > clocksource: Switched to clocksource hpet
> 
> The measured clocksource skew - the absolute difference between cs_nsec
> and wd_nsec - was 668 microseconds:
> > cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612
> 
> The kernel used 200 microseconds for the uncertainty_margin of both the
> clocksource and watchdog, resulting in a threshold of 400 microseconds.
> The discrepancy is that the measured clocksource skew was evaluated against
> a threshold suited for watchdog intervals of roughly WATCHDOG_INTERVAL,
> i.e. HZ >> 1. Both the cs_nsec and the wd_nsec value indicate that the
> actual watchdog interval was circa 14.5 seconds. Since the watchdog is
> executed in softirq context the expiration of the watchdog timer can get
> severely delayed on account of a ksoftirqd thread not getting to run in a
> timely manner. Surely, a system with such belated softirq execution is not
> working well and the scheduling issue should be looked into but the
> clocksource watchdog should be able to deal with it accordingly.
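
To make the numbers quoted above concrete, here is a minimal userspace
sketch of the arithmetic. It is not kernel code: the helper names
(skew_ns, fixed_threshold_ns, skew_per_sec_ns) are made up for
illustration, and only the values from the log and the 200 microsecond
margins come from the report.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_USEC 1000LL
#define NSEC_PER_SEC  1000000000LL

/* Absolute difference between the clocksource and watchdog intervals. */
static int64_t skew_ns(int64_t cs_nsec, int64_t wd_nsec)
{
	return llabs(cs_nsec - wd_nsec);
}

/* Fixed threshold: the sum of the two uncertainty margins. */
static int64_t fixed_threshold_ns(int64_t cs_margin_ns, int64_t wd_margin_ns)
{
	return cs_margin_ns + wd_margin_ns;
}

/* Skew normalised to nanoseconds of drift per second of interval. */
static int64_t skew_per_sec_ns(int64_t skew, int64_t wd_nsec)
{
	assert(wd_nsec > 0);
	return skew * NSEC_PER_SEC / wd_nsec;
}
```

With cs_nsec = 14524115132 and wd_nsec = 14523447520, skew_ns() gives
667612 ns, which exceeds the fixed 400000 ns threshold. Normalised over
the roughly 14.5 second interval, the same skew is only about 46
microseconds per second, well below the 500 microseconds per second
NTP limit mentioned in the patch description.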

We've seen similar reports on LKML where the watchdog timer was delayed
for a very long time (some were 100+ seconds). As you said, the
scheduling issue should be addressed.

Meanwhile, instead of adding new complex logic to the clocksource watchdog
code, can we just printk_once() a warning message and skip the current
watchdog check if the duration is too long? The ACPI_PM timer only has a
24-bit counter, which wraps around every few seconds (about 4.7 seconds
at its 3.579545 MHz rate). When the duration is that long, like the
14.5 seconds here, the check is already meaningless.
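
A rough userspace sketch of that idea, to show the shape of the check
rather than an actual patch. The function names (wrap_period_ns,
watchdog_skip_check) and the choice of the wrap period as the cutoff
are assumptions for illustration; fprintf() with a static flag stands
in for printk_once().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000ULL

/* Nanoseconds until a free-running counter of the given width wraps. */
static uint64_t wrap_period_ns(unsigned int bits, uint64_t freq_hz)
{
	assert(bits <= 32 && freq_hz > 0);	/* keeps the product within 64 bits */
	return (1ULL << bits) * NSEC_PER_SEC / freq_hz;
}

/*
 * Hypothetical policy: return true when the measured interval is too
 * long for the comparison to mean anything, warning only the first time.
 */
static bool watchdog_skip_check(uint64_t interval_ns, uint64_t max_ns)
{
	static bool warned;

	if (interval_ns <= max_ns)
		return false;

	if (!warned) {
		warned = true;
		fprintf(stderr,
			"clocksource watchdog: interval %llu ns too long, skipping check\n",
			(unsigned long long)interval_ns);
	}
	return true;
}
```

For a 24-bit counter at the ACPI PM timer's nominal 3.579545 MHz,
wrap_period_ns(24, 3579545) comes out just under 4.7 seconds, so the
14.5 second interval from the report would be skipped, while a normal
half-second interval would still be checked.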

Thanks,
Feng

> 
> To keep the limit imposed by NTP (500 microseconds of skew per second),
> the margins must be scaled so that the threshold value is proportional to
> the length of the actual watchdog interval. The solution in the patch
> utilizes left-shifting to approximate the division by
> WATCHDOG_INTERVAL * NSEC_PER_SEC / HZ, which leads to slightly narrower
> margins and a slightly lower threshold for longer watchdog intervals.
> 
> Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
> Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
> ---
>  kernel/time/clocksource.c | 60 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 7 deletions(-)
[...]
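
For comparison, the proportional scaling described in the quoted patch
description can be sketched as below. This uses an exact division
instead of the shift-based approximation the patch mentions, and it is
not taken from the patch itself: scaled_threshold_ns,
NOMINAL_INTERVAL_NS and the decision to leave the threshold unscaled
for intervals at or below the nominal one are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* WATCHDOG_INTERVAL is HZ >> 1 jiffies, i.e. half a second. */
#define NOMINAL_INTERVAL_NS (NSEC_PER_SEC / 2)

/*
 * Scale a threshold tuned for the nominal interval so that it grows in
 * proportion to the interval actually measured (assumed behaviour).
 */
static uint64_t scaled_threshold_ns(uint64_t base_threshold_ns,
				    uint64_t interval_ns)
{
	if (interval_ns <= NOMINAL_INTERVAL_NS)
		return base_threshold_ns;

	/* Guard the multiplication against 64-bit overflow. */
	assert(base_threshold_ns == 0 ||
	       interval_ns <= UINT64_MAX / base_threshold_ns);
	return base_threshold_ns * interval_ns / NOMINAL_INTERVAL_NS;
}
```

Under these assumptions, the 400000 ns threshold scaled to the
14523447520 ns interval from the report becomes roughly 11.6
milliseconds, so the measured 667612 ns skew would no longer trip the
watchdog.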
