From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <clm@meta.com>,
<jstultz@google.com>, <tglx@linutronix.de>, <sboyd@kernel.org>,
<longman@redhat.com>
Subject: Re: [PATCH clocksource] Reject bogus watchdog clocksource measurements
Date: Thu, 20 Oct 2022 16:09:01 +0800
Message-ID: <Y1ECHVUHilqgKD9o@feng-clx>
In-Reply-To: <20221019230904.GA2502730@paulmck-ThinkPad-P17-Gen-1>
On Wed, Oct 19, 2022 at 04:09:04PM -0700, Paul E. McKenney wrote:
> One remaining clocksource-skew issue involves extreme CPU overcommit,
> which can cause the clocksource watchdog measurements to be delayed by
> tens of seconds. This in turn means that a clock-skew criterion that
> is appropriate for a 500-millisecond interval will instead give lots of
> false positives.
I remember seeing logs in which the watchdog was delayed by dozens or
even hundreds of seconds.
Thanks for the fix, which makes sense to me! Some nits below.
> Therefore, check for the watchdog clocksource reporting much larger or
> much less than the time specified by WATCHDOG_INTERVAL. In these cases,
> print a pr_warn() warning and refrain from marking the clocksource under
> test as being unstable.
>
> Reported-by: Chris Mason <clm@meta.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: John Stultz <jstultz@google.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Stephen Boyd <sboyd@kernel.org>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Waiman Long <longman@redhat.com>
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 8058bec87acee..dcaf38c062161 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -386,7 +386,7 @@ EXPORT_SYMBOL_GPL(clocksource_verify_percpu);
>
>  static void clocksource_watchdog(struct timer_list *unused)
>  {
> -	u64 csnow, wdnow, cslast, wdlast, delta;
> +	u64 csnow, wdnow, cslast, wdlast, delta, wdi;
>  	int next_cpu, reset_pending;
>  	int64_t wd_nsec, cs_nsec;
>  	struct clocksource *cs;
> @@ -440,6 +440,17 @@ static void clocksource_watchdog(struct timer_list *unused)
>  		if (atomic_read(&watchdog_reset_pending))
>  			continue;
>
> +		/* Check for bogus measurements. */
> +		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> +		if (wd_nsec < (wdi >> 2)) {
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +			continue;
> +		}
If this happens (the 500ms timer fires after less than 125ms),
there is some severe problem with the timer/interrupt system.
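To make the two bounds concrete, here is a rough userspace sketch (not
kernel code) of the arithmetic. It assumes the 500ms interval mentioned
above; WDI_NS, wd_verdict and classify_wd_interval() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 500 ms watchdog interval, expressed in nanoseconds. */
#define WDI_NS UINT64_C(500000000)

/* Compile-time check of the figures quoted in this thread. */
static_assert((WDI_NS >> 2) == UINT64_C(125000000), "lower bound is 125 ms");
static_assert((WDI_NS << 2) == UINT64_C(2000000000), "upper bound is 2 s");

/* Hypothetical names, for illustration only. */
enum wd_verdict { WD_OK, WD_TOO_SHORT, WD_TOO_LONG };

/* Classify a measured watchdog interval against interval/4 and interval*4. */
static enum wd_verdict classify_wd_interval(uint64_t wd_nsec)
{
	if (wd_nsec < (WDI_NS >> 2))
		return WD_TOO_SHORT;	/* timer fired far too early */
	if (wd_nsec > (WDI_NS << 2))
		return WD_TOO_LONG;	/* timer was delayed past 2 s */
	return WD_OK;
}
```

Both comparisons are strict, so a measurement of exactly 125ms or
exactly 2s would still go through the normal skew check.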
> +		if (wd_nsec > (wdi << 2)) {
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +			continue;
> +		}
I agree with Waiman that some rate limiting may be needed. As there
were reports of hundreds of seconds of delay, a 2-second delay could
easily happen if a system is too busy or misbehaving, triggering this
problem.
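As a rough illustration of what such rate limiting could look like,
here is a minimal userspace sketch that lets one warning through per
interval and counts the rest. It is only one possible approach, not
what the patch does; struct warn_ratelimit and warn_allowed() are
hypothetical names, and the caller is assumed to pass a monotonic
timestamp. The kernel has its own helpers for this kind of thing (for
example pr_warn_ratelimited()), which this sketch does not use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for a time-based warning limiter. */
struct warn_ratelimit {
	uint64_t interval_ns;	/* minimum spacing between warnings */
	uint64_t last_ns;	/* time of the last warning let through */
	bool armed;		/* false until the first warning is seen */
	unsigned int suppressed;	/* warnings dropped so far */
};

/*
 * Return true if a warning may be printed at time now_ns.
 * Assumption: now_ns comes from a monotonic clock and never goes back.
 */
static bool warn_allowed(struct warn_ratelimit *rl, uint64_t now_ns)
{
	assert(rl != NULL);

	if (rl->armed && now_ns - rl->last_ns < rl->interval_ns) {
		rl->suppressed++;
		return false;
	}
	rl->armed = true;
	rl->last_ns = now_ns;
	return true;
}
```

With a 10-second interval, for instance, a watchdog that keeps tripping
the 2-second bound on every run would print once and then stay quiet
until 10 seconds have passed, with the dropped warnings counted in
suppressed.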
Thanks,
Feng