From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <clm@meta.com>,
<jstultz@google.com>, <tglx@linutronix.de>, <sboyd@kernel.org>,
<longman@redhat.com>
Subject: Re: [PATCH clocksource] Reject bogus watchdog clocksource measurements
Date: Wed, 2 Nov 2022 10:58:43 +0800 [thread overview]
Message-ID: <Y2Hc47MqcGiT1lUE@feng-clx> (raw)
In-Reply-To: <20221101190627.GI5600@paulmck-ThinkPad-P17-Gen-1>
On Tue, Nov 01, 2022 at 12:06:27PM -0700, Paul E. McKenney wrote:
> On Tue, Nov 01, 2022 at 01:43:32PM +0800, Feng Tang wrote:
> > On Mon, Oct 31, 2022 at 10:42:12AM -0700, Paul E. McKenney wrote:
> >
> > [...]
> > > > > @@ -448,8 +448,26 @@ static void clocksource_watchdog(struct timer_list *unused)
> > > > > continue;
> > > > > }
> > > > > if (wd_nsec > (wdi << 2)) {
> > > >
> > > > Just recalled one thing, that it may be better to check 'cs_nsec'
> > > > instead of 'wd_nsec', as some watchdog may have small wrap-around
> > > > value. IIRC, HPET's counter is 32 bits long and wraps at about
> > > > 300 seconds, and PMTIMER's counter is 24 bits which wraps at about
> > > > 3 ~ 4 seconds. So when a long stall of the watchdog timer happens,
> > > > the watchdog's value could 'overflow' many times.
> > > >
> > > > And usually the 'current' closcksource has longer wrap time than
> > > > the watchdog.
> > >
> > > Why not both?
> >
> > You mean checking both clocksource and the watchdog? It's fine for
> > me, though I still trust clocksource more.
>
> OK, good, I will check both. You never know what future hardware
> might bring.
Makes sense to me.
> I also reversed the order of the checks, so that it now checks for heavy
> load before too-short interval. The purpose is to automatically avoid
> being fooled by clock wrap.
>
> > I checked some old emails and found some long stall logs for reference.
> >
> > * one stall of 471 seconds
> >
> > [ 2410.694068] clocksource: timekeeping watchdog on CPU262: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [ 2410.706920] clocksource: 'hpet' wd_nsec: 0 wd_now: ffd70be2 wd_last: 40da633b mask: ffffffff
> > [ 2410.718583] clocksource: 'tsc' cs_nsec: 471766594285 cs_now: 44f62c184e9 cs_last: 394a7a43771 mask: ffffffffffffffff
> > [ 2410.732568] clocksource: 'tsc' is current clocksource.
> > [ 2410.740553] tsc: Marking TSC unstable due to clocksource watchdog
> > [ 2410.747611] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > [ 2410.757321] sched_clock: Marking unstable (2398804490960, 11943006672)<-(2419023952548, -8276474713)
> > [ 2410.767741] clocksource: Checking clocksource tsc synchronization from CPU 233 to CPUs 0,73,93-94,226,454,602,821.
> > [ 2410.784045] clocksource: Switched to clocksource hpet
> >
> >
> > * another one of 5 seconds
> >
> > [ 3302.211708] clocksource: timekeeping watchdog on CPU9: Marking clocksource 'tsc' as unstable because the skew is too large:
> > [ 3302.211710] clocksource: 'acpi_pm' wd_nsec: 312227950 wd_now: 92367f wd_last: 8128bd mask: ffffff
> > [ 3302.211712] clocksource: 'tsc' cs_nsec: 4999196389 cs_now: 9e811223a9754 cs_last: 9e80e767df194 mask: ffffffffffffffff
> > [ 3302.211714] clocksource: 'tsc' is current clocksource.
> > [ 3302.211716] tsc: Marking TSC unstable due to clocksource watchdog
>
> Very good, thank you! I believe that both of these would be handled
> by the updated commit (see below for the update).
Yes, I think so too.
>
> > > if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > >
> > > > > - /* This can happen on busy systems, which can delay the watchdog. */
> > > > > - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > > > > + bool needwarn = false;
> > > > > + u64 wd_lb;
> > > > > +
> > > > > + cs->wd_bogus_count++;
> > > > > + if (!cs->wd_bogus_shift) {
> > > > > + needwarn = true;
> > > > > + } else {
> > > > > + delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > > > > + wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > > > > + if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > > > > + needwarn = true;
> > > >
> > > > I'm not sure if we need to check the last_bogus counter, or just
> > > > the current interval 'cs_nsec' is what we care, and some code
> > > > like this ?
> > >
> > > I thought we wanted exponential backoff? Do you really get that from
> > > the changes below?
> >
> > Aha, I misunderstood your words. I thought to only report one time for
> > each 2, 4, 8, ... 256 seconds stall, and after that only report stall
> > of 512+ seconds. So your approach looks good to me, as our intention is
> > to avoid the flood of warning message.
>
> Sounds good, thank you!
>
> Please see below for a patch to be squashed into the original.
>
> Thoughts?
It looks good to me, thanks!
- Feng
>
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit eaee921daa7091f0eb731c9217ccc638ed5f8baf
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Tue Nov 1 12:02:18 2022 -0700
>
> squash! clocksource: Exponential backoff for load-induced bogus watchdog reads
>
> [ paulmck: Apply Feng Tang feedback. ]
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 6537ffa02e445..de8047b6720f5 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -442,12 +442,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>
> /* Check for bogus measurements. */
> wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> - if (wd_nsec < (wdi >> 2)) {
> - /* This usually indicates broken timer code or hardware. */
> - pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> - continue;
> - }
> - if (wd_nsec > (wdi << 2)) {
> + if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> bool needwarn = false;
> u64 wd_lb;
>
> @@ -470,6 +465,12 @@ static void clocksource_watchdog(struct timer_list *unused)
> }
> continue;
> }
> + /* Check too-short measurements second to handle wrap. */
> + if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
> + /* This usually indicates broken timer code or hardware. */
> + pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> + continue;
> + }
>
> /* Check the deviation from the watchdog clocksource. */
> md = cs->uncertainty_margin + watchdog->uncertainty_margin;
prev parent reply other threads:[~2022-11-02 2:58 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-19 23:09 [PATCH clocksource] Reject bogus watchdog clocksource measurements Paul E. McKenney
2022-10-20 3:01 ` Waiman Long
2022-10-20 8:09 ` Feng Tang
2022-10-20 14:09 ` Paul E. McKenney
2022-10-20 17:29 ` Waiman Long
2022-10-21 0:46 ` Feng Tang
2022-10-28 17:52 ` Paul E. McKenney
2022-10-28 17:52 ` Paul E. McKenney
2022-10-31 5:59 ` Feng Tang
2022-10-31 17:42 ` Paul E. McKenney
2022-11-01 5:43 ` Feng Tang
2022-11-01 19:06 ` Paul E. McKenney
2022-11-02 2:58 ` Feng Tang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y2Hc47MqcGiT1lUE@feng-clx \
--to=feng.tang@intel.com \
--cc=clm@meta.com \
--cc=jstultz@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=paulmck@kernel.org \
--cc=sboyd@kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.