From: Feng Tang <feng.tang@intel.com>
To: Jiri Wiesner <jwiesner@suse.de>
Cc: <linux-kernel@vger.kernel.org>, John Stultz <jstultz@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH] clocksource: Use proportional clocksource skew threshold
Date: Tue, 26 Dec 2023 22:16:33 +0800
Message-ID: <ZYrgQUTB3ayTtMqK@feng-clx>
In-Reply-To: <20231221160517.GA22919@incl>

On Thu, Dec 21, 2023 at 05:05:17PM +0100, Jiri Wiesner wrote:
> There have been reports of the watchdog marking clocksources unstable on
> machines with 8 NUMA nodes:
> > clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large:
> > clocksource:   'hpet' wd_nsec: 14523447520 wd_now: 5a749706 wd_last: 45adf1e0 mask: ffffffff
> > clocksource:   'tsc' cs_nsec: 14524115132 cs_now: 515ce2c5a96caa cs_last: 515cd9a9d83918 mask: ffffffffffffffff
> > clocksource:   'tsc' is current clocksource.
> > tsc: Marking TSC unstable due to clocksource watchdog
> > TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > sched_clock: Marking unstable (1950347883333462, 79649632569)<-(1950428279338308, -745776594)
> > clocksource: Checking clocksource tsc synchronization from CPU 400 to CPUs 0,46,52,54,138,208,392,397.
> > clocksource: Switched to clocksource hpet
> 
> The measured clocksource skew - the absolute difference between cs_nsec
> and wd_nsec - was 668 microseconds:
> > cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612
> 
> The kernel used 200 microseconds for the uncertainty_margin of both the
> clocksource and watchdog, resulting in a threshold of 400 microseconds.
> The discrepancy is that the measured clocksource skew was evaluated against
> a threshold suited for watchdog intervals of roughly WATCHDOG_INTERVAL,
> i.e. HZ >> 1. Both the cs_nsec and the wd_nsec value indicate that the
> actual watchdog interval was circa 14.5 seconds. Since the watchdog is
> executed in softirq context the expiration of the watchdog timer can get
> severely delayed on account of a ksoftirqd thread not getting to run in a
> timely manner. Surely, a system with such belated softirq execution is not
> working well and the scheduling issue should be looked into but the
> clocksource watchdog should be able to deal with it accordingly.
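
For reference, the arithmetic in the quoted report can be checked with a
small userspace sketch. The helper names below (skew_ns, skew_ns_per_sec)
are hypothetical and are not taken from kernel/time/clocksource.c; only the
numbers come from the log above. Normalizing the skew to the interval
length shows it stays far below the 500 microseconds per second NTP limit
even though it exceeds the fixed 400 microsecond threshold.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_USEC 1000LL

/* Absolute difference between the two measured intervals, in ns. */
static int64_t skew_ns(int64_t cs_nsec, int64_t wd_nsec)
{
	int64_t d = cs_nsec - wd_nsec;

	return d < 0 ? -d : d;
}

/* Skew normalized to the watchdog interval: ns of skew per second. */
static int64_t skew_ns_per_sec(int64_t cs_nsec, int64_t wd_nsec)
{
	return skew_ns(cs_nsec, wd_nsec) * NSEC_PER_SEC / wd_nsec;
}
```

With cs_nsec = 14524115132 and wd_nsec = 14523447520, skew_ns() gives
667612 ns, which is above the fixed 400 us threshold, while
skew_ns_per_sec() comes out at roughly 46 us per second.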

We've seen similar reports on LKML where the watchdog timer was delayed
for a very long time (some for 100+ seconds). As you said, the
scheduling issue should be addressed.

Meanwhile, instead of adding new complex logic to the clocksource watchdog
code, can we just printk_once() a warning message and skip the current
watchdog check if the duration is too long? The ACPI_PM timer only has a
24-bit counter, which wraps around every 3~4 seconds. When the duration
is that long, like the 14.5 seconds here, the check is already
meaningless.
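
A minimal userspace sketch of that idea, not actual kernel code: the
cutoff WD_MAX_INTERVAL_NS and the helper wd_should_skip() are hypothetical
names, the 2 second value is picked only for illustration, and a static
flag plus fprintf() stands in for printk_once().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000LL
/* Assumed cutoff, chosen only for illustration. */
#define WD_MAX_INTERVAL_NS (2 * NSEC_PER_SEC)

/*
 * Return true if this watchdog check should be skipped because the
 * measured interval is too long for the comparison to be meaningful.
 * The warning is emitted only on the first occurrence.
 */
static bool wd_should_skip(int64_t wd_nsec)
{
	static bool warned;

	if (wd_nsec <= WD_MAX_INTERVAL_NS)
		return false;

	if (!warned) {
		warned = true;
		fprintf(stderr,
			"clocksource: watchdog interval %lld ns too long, skipping check\n",
			(long long)wd_nsec);
	}
	return true;
}
```

A normal interval of about half a second would pass through, while the
14.5 second interval from the report would be skipped with one warning.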

Thanks,
Feng

> 
> To keep the limit imposed by NTP (500 microseconds of skew per second),
> the margins must be scaled so that the threshold value is proportional to
> the length of the actual watchdog interval. The solution in the patch
> utilizes left-shifting to approximate the division by
> WATCHDOG_INTERVAL * NSEC_PER_SEC / HZ, which leads to slightly narrower
> margins and a slightly lower threshold for longer watchdog intervals.
> 
> Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
> Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
> ---
>  kernel/time/clocksource.c | 60 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 7 deletions(-)
[...]
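
For comparison, the proportional threshold described in the quoted commit
message could be sketched roughly as follows. This is an illustration
under stated assumptions, not the patch itself: it uses a plain 64-bit
division where the patch approximates the division with shifts, the names
are hypothetical, and clamping to the base threshold for short intervals
is an assumption of this sketch.

```c
#include <stdint.h>

#define NSEC_PER_SEC      1000000000LL
#define NSEC_PER_USEC     1000LL
/* Nominal watchdog interval: HZ >> 1 jiffies, i.e. half a second. */
#define WD_INTERVAL_NS    (NSEC_PER_SEC / 2)
/* 200 us margin each for clocksource and watchdog, as in the report. */
#define WD_BASE_THRESH_NS (2 * 200 * NSEC_PER_USEC)

/*
 * Scale the skew threshold in proportion to the actual watchdog
 * interval. Assumption: never go below the base threshold.
 */
static int64_t scaled_threshold_ns(int64_t interval_ns)
{
	int64_t t = WD_BASE_THRESH_NS * interval_ns / WD_INTERVAL_NS;

	return t < WD_BASE_THRESH_NS ? WD_BASE_THRESH_NS : t;
}
```

For the 14.5 second interval in the report this yields a threshold of
about 11.6 milliseconds, so the measured 667612 ns skew would not trip it.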
