All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chao Gao <chao.gao@intel.com>
To: lkp@lists.01.org
Subject: Re: [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression
Date: Mon, 02 Aug 2021 14:20:09 +0800	[thread overview]
Message-ID: <20210802062008.GA24720@gao-cwp> (raw)
In-Reply-To: <20210527182959.GA437082@paulmck-ThinkPad-P17-Gen-1>

[-- Attachment #1: Type: text/plain, Size: 7111 bytes --]

[snip]
>commit 48ebcfbfd877f5d9cddcc03c91352a8ca7b190af
>Author: Paul E. McKenney <paulmck@kernel.org>
>Date:   Thu May 27 11:03:28 2021 -0700
>
>    clocksource: Forgive repeated long-latency watchdog clocksource reads
>    
>    Currently, the clocksource watchdog reacts to repeated long-latency
>    clocksource reads by marking that clocksource unstable on the theory that
>    these long-latency reads are a sign of a serious problem.  And this theory
>    does in fact have real-world support in the form of firmware issues [1].
>    
>    However, it is also possible to trigger this using stress-ng on what
>    the stress-ng man page terms "poorly designed hardware" [2].  And it
>    is not necessarily a bad thing for the kernel to diagnose cases where
>    high-stress workloads are being run on hardware that is not designed
>    for this sort of use.
>    
>    Nevertheless, it is quite possible that real-world use will result in
>    some situation requiring that high-stress workloads run on hardware
>    not designed to accommodate them, and also requiring that the kernel
>    refrain from marking clocksources unstable.
>    
>    Therefore, provide an out-of-tree patch that reacts to this situation
>    by leaving the clocksource alone, but using the old 62.5-millisecond
>    skew-detection threshold in response persistent long-latency reads.
>    In addition, the offending clocksource is marked for re-initialization
>    in this case, which both restarts that clocksource with a clean bill of
>    health and avoids false-positive skew reports on later watchdog checks.

Hi Paul,

Sorry to dig out this old thread.

I am testing with this patch in a VM, but I find sometimes re-initialization
after coarse-grained skew check may not happen as expected because ...

>    
>    Link: https://lore.kernel.org/lkml/20210513155515.GB23902(a)xsang-OptiPlex-9020/ # [1]
>    Link: https://lore.kernel.org/lkml/20210521083322.GG25531(a)xsang-OptiPlex-9020/ # [2]
>    Link: https://lore.kernel.org/lkml/20210521084405.GH25531(a)xsang-OptiPlex-9020/
>    Link: https://lore.kernel.org/lkml/20210511233403.GA2896757(a)paulmck-ThinkPad-P17-Gen-1/
>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
>diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c
>index 01df12395c0e..b72a969f7b93 100644
>--- a/kernel/time/clocksource-wdtest.c
>+++ b/kernel/time/clocksource-wdtest.c
>@@ -146,13 +146,12 @@ static int wdtest_func(void *arg)
> 		else if (i <= max_cswd_read_retries)
> 			s = ", expect message";
> 		else
>-			s = ", expect clock skew";
>+			s = ", expect coarse-grained clock skew check and re-initialization";
> 		pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s);
> 		WRITE_ONCE(wdtest_ktime_read_ndelays, i);
> 		schedule_timeout_uninterruptible(2 * HZ);
> 		WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays));
>-		WARN_ON_ONCE((i <= max_cswd_read_retries) !=
>-			     !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
>+		WARN_ON_ONCE(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE);
> 		wdtest_ktime_clocksource_reset();
> 	}
> 
>diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>index 4485635b69f5..6c0820779bd3 100644
>--- a/kernel/time/clocksource.c
>+++ b/kernel/time/clocksource.c
>@@ -225,13 +225,13 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> 				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
> 					smp_processor_id(), watchdog->name, nretries);
> 			}
>-			return true;
>+			return false;
> 		}
> 	}
> 
>-	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
>+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, coarse-grained skew check followed by re-initialization\n",
> 		smp_processor_id(), watchdog->name, wd_delay, nretries);
>-	return false;
>+	return true;
> }
> 
> static u64 csnow_mid;
>@@ -355,6 +355,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> 	int next_cpu, reset_pending;
> 	int64_t wd_nsec, cs_nsec;
> 	struct clocksource *cs;
>+	bool coarse;
> 	u32 md;
> 
> 	spin_lock(&watchdog_lock);
>@@ -372,11 +373,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> 			continue;
> 		}
> 
>-		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
>-			/* Clock readout unreliable, so give it up. */
>-			__clocksource_unstable(cs);
>-			continue;
>-		}
>+		coarse = cs_watchdog_read(cs, &csnow, &wdnow);
> 
> 		/* Clocksource initialized ? */
> 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
>@@ -402,7 +399,13 @@ static void clocksource_watchdog(struct timer_list *unused)
> 			continue;
> 
> 		/* Check the deviation from the watchdog clocksource. */
>-		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>+		if (coarse) {
>+			md = 62500 * NSEC_PER_USEC;
>+			cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>+			pr_warn("timekeeping watchdog on CPU%d: %s coarse-grained %lu.%03lu ms clock-skew check followed by re-initialization\n", smp_processor_id(), watchdog->name, md / NSEC_PER_MSEC, md % NSEC_PER_MSEC / NSEC_PER_USEC);

... this message on CPU5 doesn't show up in below kernel logs.
Do you think it is a bug? if yes, any idea to resolve it?

[  498.571086] clocksource: timekeeping watchdog on CPU1: hpet read-back delay of 432490ns, attempt 4, coarse-grained skew check followed by re-initialization
[  498.572867] clocksource: timekeeping watchdog on CPU1: hpet coarse-grained 62.500 ms clock-skew check followed by re-initialization
[  504.071959] clocksource: timekeeping watchdog on CPU4: hpet read-back delay of 1679880ns, attempt 4, coarse-grained skew check followed by re-initialization
[  504.073817] clocksource: timekeeping watchdog on CPU4: hpet coarse-grained 62.500 ms clock-skew check followed by re-initialization
[  504.568821] clocksource: timekeeping watchdog on CPU5: hpet read-back delay of 554880ns, attempt 4, coarse-grained skew check followed by re-initialization
[  505.067666] clocksource: timekeeping watchdog on CPU6: hpet retried 3 times before success
[  505.068593] clocksource: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large:
[  505.069596] clocksource:                       'hpet' wd_nsec: 499376790 wd_now: be2f200d wd_last: bb3522fe mask: ffffffff
[  505.071131] clocksource:                       'tsc' cs_nsec: 498867307 cs_now: 103895c060a cs_last: 1034aea96ea mask: ffffffffffffffff
[  505.072994] clocksource:                       'tsc' is current clocksource.
[  505.074748] tsc: Marking TSC unstable due to clocksource watchdog

Thanks
-Chao

>+		} else {
>+			md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>+		}
> 		if (abs(cs_nsec - wd_nsec) > md) {
> 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
> 				smp_processor_id(), cs->name);

WARNING: multiple messages have this Message-ID (diff)
From: Chao Gao <chao.gao@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Feng Tang <feng.tang@intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	John Stultz <john.stultz@linaro.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	Mark Rutland <Mark.Rutland@arm.com>,
	Marc Zyngier <maz@kernel.org>, Andi Kleen <ak@linux.intel.com>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	Chris Mason <clm@fb.com>, LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	lkp@lists.01.org, lkp@intel.com, ying.huang@intel.com,
	zhengjun.xing@intel.com
Subject: Re: [clocksource]  8901ecc231:  stress-ng.lockbus.ops_per_sec -9.5% regression
Date: Mon, 2 Aug 2021 14:20:09 +0800	[thread overview]
Message-ID: <20210802062008.GA24720@gao-cwp> (raw)
In-Reply-To: <20210527182959.GA437082@paulmck-ThinkPad-P17-Gen-1>

[snip]
>commit 48ebcfbfd877f5d9cddcc03c91352a8ca7b190af
>Author: Paul E. McKenney <paulmck@kernel.org>
>Date:   Thu May 27 11:03:28 2021 -0700
>
>    clocksource: Forgive repeated long-latency watchdog clocksource reads
>    
>    Currently, the clocksource watchdog reacts to repeated long-latency
>    clocksource reads by marking that clocksource unstable on the theory that
>    these long-latency reads are a sign of a serious problem.  And this theory
>    does in fact have real-world support in the form of firmware issues [1].
>    
>    However, it is also possible to trigger this using stress-ng on what
>    the stress-ng man page terms "poorly designed hardware" [2].  And it
>    is not necessarily a bad thing for the kernel to diagnose cases where
>    high-stress workloads are being run on hardware that is not designed
>    for this sort of use.
>    
>    Nevertheless, it is quite possible that real-world use will result in
>    some situation requiring that high-stress workloads run on hardware
>    not designed to accommodate them, and also requiring that the kernel
>    refrain from marking clocksources unstable.
>    
>    Therefore, provide an out-of-tree patch that reacts to this situation
>    by leaving the clocksource alone, but using the old 62.5-millisecond
>    skew-detection threshold in response persistent long-latency reads.
>    In addition, the offending clocksource is marked for re-initialization
>    in this case, which both restarts that clocksource with a clean bill of
>    health and avoids false-positive skew reports on later watchdog checks.

Hi Paul,

Sorry to dig out this old thread.

I am testing with this patch in a VM, but I find sometimes re-initialization
after coarse-grained skew check may not happen as expected because ...

>    
>    Link: https://lore.kernel.org/lkml/20210513155515.GB23902@xsang-OptiPlex-9020/ # [1]
>    Link: https://lore.kernel.org/lkml/20210521083322.GG25531@xsang-OptiPlex-9020/ # [2]
>    Link: https://lore.kernel.org/lkml/20210521084405.GH25531@xsang-OptiPlex-9020/
>    Link: https://lore.kernel.org/lkml/20210511233403.GA2896757@paulmck-ThinkPad-P17-Gen-1/
>    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>
>diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c
>index 01df12395c0e..b72a969f7b93 100644
>--- a/kernel/time/clocksource-wdtest.c
>+++ b/kernel/time/clocksource-wdtest.c
>@@ -146,13 +146,12 @@ static int wdtest_func(void *arg)
> 		else if (i <= max_cswd_read_retries)
> 			s = ", expect message";
> 		else
>-			s = ", expect clock skew";
>+			s = ", expect coarse-grained clock skew check and re-initialization";
> 		pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s);
> 		WRITE_ONCE(wdtest_ktime_read_ndelays, i);
> 		schedule_timeout_uninterruptible(2 * HZ);
> 		WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays));
>-		WARN_ON_ONCE((i <= max_cswd_read_retries) !=
>-			     !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE));
>+		WARN_ON_ONCE(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE);
> 		wdtest_ktime_clocksource_reset();
> 	}
> 
>diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>index 4485635b69f5..6c0820779bd3 100644
>--- a/kernel/time/clocksource.c
>+++ b/kernel/time/clocksource.c
>@@ -225,13 +225,13 @@ static bool cs_watchdog_read(struct clocksource *cs, u64 *csnow, u64 *wdnow)
> 				pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n",
> 					smp_processor_id(), watchdog->name, nretries);
> 			}
>-			return true;
>+			return false;
> 		}
> 	}
> 
>-	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n",
>+	pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, coarse-grained skew check followed by re-initialization\n",
> 		smp_processor_id(), watchdog->name, wd_delay, nretries);
>-	return false;
>+	return true;
> }
> 
> static u64 csnow_mid;
>@@ -355,6 +355,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> 	int next_cpu, reset_pending;
> 	int64_t wd_nsec, cs_nsec;
> 	struct clocksource *cs;
>+	bool coarse;
> 	u32 md;
> 
> 	spin_lock(&watchdog_lock);
>@@ -372,11 +373,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> 			continue;
> 		}
> 
>-		if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
>-			/* Clock readout unreliable, so give it up. */
>-			__clocksource_unstable(cs);
>-			continue;
>-		}
>+		coarse = cs_watchdog_read(cs, &csnow, &wdnow);
> 
> 		/* Clocksource initialized ? */
> 		if (!(cs->flags & CLOCK_SOURCE_WATCHDOG) ||
>@@ -402,7 +399,13 @@ static void clocksource_watchdog(struct timer_list *unused)
> 			continue;
> 
> 		/* Check the deviation from the watchdog clocksource. */
>-		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>+		if (coarse) {
>+			md = 62500 * NSEC_PER_USEC;
>+			cs->flags &= ~CLOCK_SOURCE_WATCHDOG;
>+			pr_warn("timekeeping watchdog on CPU%d: %s coarse-grained %lu.%03lu ms clock-skew check followed by re-initialization\n", smp_processor_id(), watchdog->name, md / NSEC_PER_MSEC, md % NSEC_PER_MSEC / NSEC_PER_USEC);

... this message on CPU5 doesn't show up in below kernel logs.
Do you think it is a bug? if yes, any idea to resolve it?

[  498.571086] clocksource: timekeeping watchdog on CPU1: hpet read-back delay of 432490ns, attempt 4, coarse-grained skew check followed by re-initialization
[  498.572867] clocksource: timekeeping watchdog on CPU1: hpet coarse-grained 62.500 ms clock-skew check followed by re-initialization
[  504.071959] clocksource: timekeeping watchdog on CPU4: hpet read-back delay of 1679880ns, attempt 4, coarse-grained skew check followed by re-initialization
[  504.073817] clocksource: timekeeping watchdog on CPU4: hpet coarse-grained 62.500 ms clock-skew check followed by re-initialization
[  504.568821] clocksource: timekeeping watchdog on CPU5: hpet read-back delay of 554880ns, attempt 4, coarse-grained skew check followed by re-initialization
[  505.067666] clocksource: timekeeping watchdog on CPU6: hpet retried 3 times before success
[  505.068593] clocksource: timekeeping watchdog on CPU6: Marking clocksource 'tsc' as unstable because the skew is too large:
[  505.069596] clocksource:                       'hpet' wd_nsec: 499376790 wd_now: be2f200d wd_last: bb3522fe mask: ffffffff
[  505.071131] clocksource:                       'tsc' cs_nsec: 498867307 cs_now: 103895c060a cs_last: 1034aea96ea mask: ffffffffffffffff
[  505.072994] clocksource:                       'tsc' is current clocksource.
[  505.074748] tsc: Marking TSC unstable due to clocksource watchdog

Thanks
-Chao

>+		} else {
>+			md = cs->uncertainty_margin + watchdog->uncertainty_margin;
>+		}
> 		if (abs(cs_nsec - wd_nsec) > md) {
> 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
> 				smp_processor_id(), cs->name);


  parent reply	other threads:[~2021-08-02  6:20 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-21  8:33 [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression kernel test robot
2021-05-21  8:33 ` kernel test robot
2021-05-21 13:56 ` Paul E. McKenney
2021-05-21 13:56   ` Paul E. McKenney
2021-05-22 16:08   ` Paul E. McKenney
2021-05-22 16:08     ` Paul E. McKenney
2021-05-26  6:49     ` Feng Tang
2021-05-26  6:49       ` Feng Tang
2021-05-26 13:49       ` Paul E. McKenney
2021-05-26 13:49         ` Paul E. McKenney
2021-05-27 18:29         ` Paul E. McKenney
2021-05-27 18:29           ` Paul E. McKenney
2021-05-27 19:01           ` Andi Kleen
2021-05-27 19:01             ` Andi Kleen
2021-05-27 19:19             ` Paul E. McKenney
2021-05-27 19:19               ` Paul E. McKenney
2021-05-27 19:29               ` Matthew Wilcox
2021-05-27 19:29                 ` Matthew Wilcox
2021-05-27 21:05                 ` Paul E. McKenney
2021-05-27 21:05                   ` Paul E. McKenney
2021-05-28  0:58                   ` Andi Kleen
2021-05-28  0:58                     ` Andi Kleen
2021-06-01 17:10                     ` Paul E. McKenney
2021-06-01 17:10                       ` Paul E. McKenney
2021-08-02  6:20           ` Chao Gao [this message]
2021-08-02  6:20             ` Chao Gao
2021-08-02 17:02             ` Paul E. McKenney
2021-08-02 17:02               ` Paul E. McKenney
2021-08-03  8:58               ` Chao Gao
2021-08-03  8:58                 ` Chao Gao
2021-08-03 13:48                 ` Paul E. McKenney
2021-08-03 13:48                   ` Paul E. McKenney
2021-08-05  2:16                   ` Chao Gao
2021-08-05  2:16                     ` Chao Gao
2021-08-05  4:03                     ` Paul E. McKenney
2021-08-05  4:03                       ` Paul E. McKenney
2021-08-05  4:34                       ` Andi Kleen
2021-08-05  4:34                         ` Andi Kleen
2021-08-05 15:33                         ` Paul E. McKenney
2021-08-05 15:33                           ` Paul E. McKenney
2021-08-05  5:39                       ` Chao Gao
2021-08-05  5:39                         ` Chao Gao
2021-08-05 15:37                         ` Paul E. McKenney
2021-08-05 15:37                           ` Paul E. McKenney
2021-08-06  2:10                           ` Chao Gao
2021-08-06  2:10                             ` Chao Gao
2021-08-06  4:15                             ` Paul E. McKenney
2021-08-06  4:15                               ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210802062008.GA24720@gao-cwp \
    --to=chao.gao@intel.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.