All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <clm@meta.com>,
	<jstultz@google.com>, <tglx@linutronix.de>, <sboyd@kernel.org>,
	<longman@redhat.com>
Subject: Re: [PATCH clocksource] Reject bogus watchdog clocksource measurements
Date: Mon, 31 Oct 2022 13:59:38 +0800	[thread overview]
Message-ID: <Y19kStu4zBFyvKPK@feng-clx> (raw)
In-Reply-To: <20221028175245.GN5600@paulmck-ThinkPad-P17-Gen-1>

On Fri, Oct 28, 2022 at 10:52:45AM -0700, Paul E. McKenney wrote:
[...]
> > > Left to myself, I would use a capped power-of-two backoff that was reset
> > > any time that the interval was within bounds.  Maybe a cap of 10 minutes?
> > 
> > Yep, that's more informative. Thanks!
> 
> And here is the resulting patch.  Thoughts?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit adc0512cf34fe7f0d73966d59644b826ee526742
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Fri Oct 28 10:38:58 2022 -0700
> 
>     clocksource: Exponential backoff for load-induced bogus watchdog reads
>     
>     The clocksource watchdog will reject measurements that are excessively
>     delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
>     watchdog interval.  On an extremely busy system, this can result in a
>     console message being printed every two seconds.  This is excessively
>     noisy for a non-error condition.
>     
>     Therefore, apply exponential backoff to these messages.  This exponential
>     backoff is capped at 1024 times the watchdog interval, which comes to
>     not quite one message per ten minutes.
>     
>     Please note that the bogus watchdog reads that occur when the watchdog
>     interval is less than 0.125 seconds are still printed unconditionally
>     because these likely correspond to a serious error condition in the
>     timer code or hardware.
>     
>     Reported-by: Waiman Long <longman@redhat.com>
>     Reported-by: Feng Tang <feng.tang@intel.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>     Cc: John Stultz <jstultz@google.com>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Cc: Stephen Boyd <sboyd@kernel.org>
>     Cc: Feng Tang <feng.tang@intel.com>
>     Cc: Waiman Long <longman@redhat.com>
> 
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 1d42d4b173271..daac05aedf56a 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -125,6 +125,10 @@ struct clocksource {
>  	struct list_head	wd_list;
>  	u64			cs_last;
>  	u64			wd_last;
> +	u64			wd_last_bogus;
> +	int			wd_bogus_shift;
> +	unsigned long		wd_bogus_count;
> +	unsigned long		wd_bogus_count_last;
>  #endif
>  	struct module		*owner;
>  };
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 3f5317faf891f..6537ffa02e445 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -448,8 +448,26 @@ static void clocksource_watchdog(struct timer_list *unused)
>  			continue;
>  		}
>  		if (wd_nsec > (wdi << 2)) {

Just recalled one thing, that it may be better to check 'cs_nsec' 
instead of 'wd_nsec', as some watchdog may have small wrap-around
value. IIRC, HPET's counter is 32 bits long and wraps at about
300 seconds, and PMTIMER's counter is 24 bits which wraps at about
3 ~ 4 seconds. So when a long stall of the watchdog timer happens,
the watchdog's value could 'overflow' many times.

And usually the 'current' closcksource has longer wrap time than
the watchdog.


> -			/* This can happen on busy systems, which can delay the watchdog. */
> -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> +			bool needwarn = false;
> +			u64 wd_lb;
> +
> +			cs->wd_bogus_count++;
> +			if (!cs->wd_bogus_shift) {
> +				needwarn = true;
> +			} else {
> +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> +					needwarn = true;

I'm not sure if we need to check the last_bogus counter, or just
the current interval 'cs_nsec' is what we care, and some code
like this ?

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index daac05aedf56..3910dbb9b960 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -125,7 +125,6 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
-	u64			wd_last_bogus;
 	int			wd_bogus_shift;
 	unsigned long		wd_bogus_count;
 	unsigned long		wd_bogus_count_last;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 6537ffa02e44..8e6d498b1492 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -442,28 +442,18 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
-		if (wd_nsec < (wdi >> 2)) {
+		if (cs_nsec < (wdi >> 2)) {
 			/* This usually indicates broken timer code or hardware. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+			pr_warn("timekeeping watchdog on CPU%d: clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), cs->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
-		if (wd_nsec > (wdi << 2)) {
-			bool needwarn = false;
-			u64 wd_lb;
-
+		if (cs_nsec > (wdi << 2)) {
 			cs->wd_bogus_count++;
-			if (!cs->wd_bogus_shift) {
-				needwarn = true;
-			} else {
-				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
-				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
-				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
-					needwarn = true;
-			}
-			if (needwarn) {
+			if (!cs->wd_bogus_shift ||
+			    (1 << cs->wd_bogus_shift) * wdi <= cs_nsec) {
 				/* This can happen on busy systems, which can delay the watchdog. */
-				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
-				cs->wd_last_bogus = wdnow;
+				pr_warn("timekeeping watchdog on CPU%d: clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), cs->name, cs_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
+
 				if (cs->wd_bogus_shift < 10)
 					cs->wd_bogus_shift++;
 				cs->wd_bogus_count_last = cs->wd_bogus_count;

Thanks,
Feng


> +			}
> +			if (needwarn) {
> +				/* This can happen on busy systems, which can delay the watchdog. */
> +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
> +				cs->wd_last_bogus = wdnow;
> +				if (cs->wd_bogus_shift < 10)
> +					cs->wd_bogus_shift++;
> +				cs->wd_bogus_count_last = cs->wd_bogus_count;
> +			}
>  			continue;
>  		}
>  

  reply	other threads:[~2022-10-31  5:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-19 23:09 [PATCH clocksource] Reject bogus watchdog clocksource measurements Paul E. McKenney
2022-10-20  3:01 ` Waiman Long
2022-10-20  8:09 ` Feng Tang
2022-10-20 14:09   ` Paul E. McKenney
2022-10-20 17:29     ` Waiman Long
2022-10-21  0:46     ` Feng Tang
2022-10-28 17:52       ` Paul E. McKenney
2022-10-28 17:52       ` Paul E. McKenney
2022-10-31  5:59         ` Feng Tang [this message]
2022-10-31 17:42           ` Paul E. McKenney
2022-11-01  5:43             ` Feng Tang
2022-11-01 19:06               ` Paul E. McKenney
2022-11-02  2:58                 ` Feng Tang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y19kStu4zBFyvKPK@feng-clx \
    --to=feng.tang@intel.com \
    --cc=clm@meta.com \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.