public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <clm@meta.com>,
	<jstultz@google.com>, <tglx@linutronix.de>, <sboyd@kernel.org>,
	<longman@redhat.com>
Subject: Re: [PATCH clocksource] Reject bogus watchdog clocksource measurements
Date: Wed, 2 Nov 2022 10:58:43 +0800	[thread overview]
Message-ID: <Y2Hc47MqcGiT1lUE@feng-clx> (raw)
In-Reply-To: <20221101190627.GI5600@paulmck-ThinkPad-P17-Gen-1>

On Tue, Nov 01, 2022 at 12:06:27PM -0700, Paul E. McKenney wrote:
> On Tue, Nov 01, 2022 at 01:43:32PM +0800, Feng Tang wrote:
> > On Mon, Oct 31, 2022 at 10:42:12AM -0700, Paul E. McKenney wrote:
> > 
> > [...]
> > > > > @@ -448,8 +448,26 @@ static void clocksource_watchdog(struct timer_list *unused)
> > > > >  			continue;
> > > > >  		}
> > > > >  		if (wd_nsec > (wdi << 2)) {
> > > > 
> > > > Just recalled one thing, that it may be better to check 'cs_nsec' 
> > > > instead of 'wd_nsec', as some watchdog may have small wrap-around
> > > > value. IIRC, HPET's counter is 32 bits long and wraps at about
> > > > 300 seconds, and PMTIMER's counter is 24 bits which wraps at about
> > > > 3 ~ 4 seconds. So when a long stall of the watchdog timer happens,
> > > > the watchdog's value could 'overflow' many times.
> > > > 
> > > > And usually the 'current' closcksource has longer wrap time than
> > > > the watchdog.
> > > 
> > > Why not both?
> > 
> > You mean checking both clocksource and the watchdog? It's fine for
> > me, though I still trust clocksource more.
> 
> OK, good, I will check both.  You never know what future hardware
> might bring.

Makes sense to me.

> I also reversed the order of the checks, so that it now checks for heavy
> load before too-short interval.  The purpose is to automatically avoid
> being fooled by clock wrap.
> 
> > I checked some old emails and found some long stall logs for reference.
> > 
> > * one stall of 471 seconds
> > 
> >  [ 2410.694068] clocksource: timekeeping watchdog on CPU262: Marking clocksource 'tsc' as unstable because the skew is too large:
> >  [ 2410.706920] clocksource:                       'hpet' wd_nsec: 0 wd_now: ffd70be2 wd_last: 40da633b mask: ffffffff
> >  [ 2410.718583] clocksource:                       'tsc' cs_nsec: 471766594285 cs_now: 44f62c184e9 cs_last: 394a7a43771 mask: ffffffffffffffff
> >  [ 2410.732568] clocksource:                       'tsc' is current clocksource.
> >  [ 2410.740553] tsc: Marking TSC unstable due to clocksource watchdog
> >  [ 2410.747611] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> >  [ 2410.757321] sched_clock: Marking unstable (2398804490960, 11943006672)<-(2419023952548, -8276474713)
> >  [ 2410.767741] clocksource: Checking clocksource tsc synchronization from CPU 233 to CPUs 0,73,93-94,226,454,602,821.
> >  [ 2410.784045] clocksource: Switched to clocksource hpet
> > 
> > 
> > * another one of 5 seconds
> > 
> >  [ 3302.211708] clocksource: timekeeping watchdog on CPU9: Marking clocksource 'tsc' as unstable because the skew is too large:
> >  [ 3302.211710] clocksource:                       'acpi_pm' wd_nsec: 312227950 wd_now: 92367f wd_last: 8128bd mask: ffffff
> >  [ 3302.211712] clocksource:                       'tsc' cs_nsec: 4999196389 cs_now: 9e811223a9754 cs_last: 9e80e767df194 mask: ffffffffffffffff
> >  [ 3302.211714] clocksource:                       'tsc' is current clocksource.
> >  [ 3302.211716] tsc: Marking TSC unstable due to clocksource watchdog
> 
> Very good, thank you!  I believe that both of these would be handled
> by the updated commit (see below for the update).

Yes, I think so too.

> 
> > >  		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > > 
> > > > > -			/* This can happen on busy systems, which can delay the watchdog. */
> > > > > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > > > > +			bool needwarn = false;
> > > > > +			u64 wd_lb;
> > > > > +
> > > > > +			cs->wd_bogus_count++;
> > > > > +			if (!cs->wd_bogus_shift) {
> > > > > +				needwarn = true;
> > > > > +			} else {
> > > > > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > > > > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > > > > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > > > > +					needwarn = true;
> > > > 
> > > > I'm not sure if we need to check the last_bogus counter, or just
> > > > the current interval 'cs_nsec' is what we care, and some code
> > > > like this ?
> > > 
> > > I thought we wanted exponential backoff?  Do you really get that from
> > > the changes below?
> > 
> > Aha, I misunderstood your words. I thought to only report one time for
> > each 2, 4, 8, ... 256 seconds stall, and after that only report stall
> > of 512+ seconds. So your approach looks good to me, as our intention is
> > to avoid the flood of warning message.
> 
> Sounds good, thank you!
> 
> Please see below for a patch to be squashed into the original.
> 
> Thoughts?

It looks good to me, thanks!

- Feng

>
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit eaee921daa7091f0eb731c9217ccc638ed5f8baf
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Tue Nov 1 12:02:18 2022 -0700
> 
>     squash! clocksource: Exponential backoff for load-induced bogus watchdog reads
>     
>     [ paulmck: Apply Feng Tang feedback. ]
>     
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 6537ffa02e445..de8047b6720f5 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -442,12 +442,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>  
>  		/* Check for bogus measurements. */
>  		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> -		if (wd_nsec < (wdi >> 2)) {
> -			/* This usually indicates broken timer code or hardware. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> -			continue;
> -		}
> -		if (wd_nsec > (wdi << 2)) {
> +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
>  			bool needwarn = false;
>  			u64 wd_lb;
>  
> @@ -470,6 +465,12 @@ static void clocksource_watchdog(struct timer_list *unused)
>  			}
>  			continue;
>  		}
> +		/* Check too-short measurements second to handle wrap. */
> +		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
> +			/* This usually indicates broken timer code or hardware. */
> +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +			continue;
> +		}
>  
>  		/* Check the deviation from the watchdog clocksource. */
>  		md = cs->uncertainty_margin + watchdog->uncertainty_margin;

      reply	other threads:[~2022-11-02  2:58 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-19 23:09 [PATCH clocksource] Reject bogus watchdog clocksource measurements Paul E. McKenney
2022-10-20  3:01 ` Waiman Long
2022-10-20  8:09 ` Feng Tang
2022-10-20 14:09   ` Paul E. McKenney
2022-10-20 17:29     ` Waiman Long
2022-10-21  0:46     ` Feng Tang
2022-10-28 17:52       ` Paul E. McKenney
2022-10-28 17:52       ` Paul E. McKenney
2022-10-31  5:59         ` Feng Tang
2022-10-31 17:42           ` Paul E. McKenney
2022-11-01  5:43             ` Feng Tang
2022-11-01 19:06               ` Paul E. McKenney
2022-11-02  2:58                 ` Feng Tang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2Hc47MqcGiT1lUE@feng-clx \
    --to=feng.tang@intel.com \
    --cc=clm@meta.com \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox