From: Feng Tang <feng.tang@intel.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Waiman Long <longman@redhat.com>,
John Stultz <jstultz@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Stephen Boyd <sboyd@kernel.org>, <x86@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
<linux-kernel@vger.kernel.org>, Tim Chen <tim.c.chen@intel.com>
Subject: Re: [RFC PATCH] clocksource: Suspend the watchdog temporarily when high read lantency detected
Date: Thu, 22 Dec 2022 14:00:42 +0800
Message-ID: <Y6PyisHYYtde/6Xk@feng-clx>
In-Reply-To: <20221222055515.GJ4001@paulmck-ThinkPad-P17-Gen-1>
On Wed, Dec 21, 2022 at 09:55:15PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 21, 2022 at 10:39:53PM -0500, Waiman Long wrote:
> > On 12/21/22 19:40, Paul E. McKenney wrote:
> > > commit 199dfa2ba23dd0d650b1482a091e2e15457698b7
> > > Author: Paul E. McKenney <paulmck@kernel.org>
> > > Date: Wed Dec 21 16:20:25 2022 -0800
> > >
> > > clocksource: Verify HPET and PMTMR when TSC unverified
> > >
> > > On systems with two or fewer sockets, when the boot CPU has CONSTANT_TSC,
> > > NONSTOP_TSC, and TSC_ADJUST, clocksource watchdog verification of the
> > > TSC is disabled. This works well much of the time, but there is the
> > > occasional system that meets all of these criteria, but which still
> > > has a TSC that skews significantly from atomic-clock time. This is
> > > usually attributed to a firmware or hardware fault. Yes, the various
> > > NTP daemons do express their opinions of userspace-to-atomic-clock time
> > > skew, but they put them in various places, depending on the daemon and
> > > distro in question. It would therefore be good for the kernel to have
> > > some clue that there is a problem.
> > >
> > > The old behavior of marking the TSC unstable is a non-starter because a
> > > great many workloads simply cannot tolerate the overheads and latencies
> > > of the various non-TSC clocksources. In addition, NTP-corrected systems
> > > often seem to be able to tolerate significant kernel-space time skew as
> > > long as the userspace time sources are within epsilon of atomic-clock
> > > time.
> > >
> > > Therefore, when watchdog verification of TSC is disabled, enable it for
> > > HPET and PMTMR (AKA ACPI PM timer). This provides the needed in-kernel
> > > time-skew diagnostic without degrading the system's performance.
> > >
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> > > Cc: Feng Tang <feng.tang@intel.com>
> > > Cc: Waiman Long <longman@redhat.com>
> > > Cc: <x86@kernel.org>
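The selection rule in the commit log above can be modeled as a small decision function: if the boot CPU meets all of the listed criteria, the TSC goes unverified and HPET and PMTMR are verified instead. This is a minimal userspace sketch, not the patch itself; the struct and function names are hypothetical, and the actual patch may well express this through per-clocksource flags rather than a plan struct.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of the boot CPU's TSC-related features and the
 * socket count; field names are illustrative, not kernel identifiers.
 */
struct boot_cpu_info {
	int nr_sockets;
	bool constant_tsc;	/* CONSTANT_TSC */
	bool nonstop_tsc;	/* NONSTOP_TSC */
	bool tsc_adjust;	/* TSC_ADJUST */
};

/* Hypothetical record of which clocksources the watchdog verifies. */
struct watchdog_plan {
	bool verify_tsc;
	bool verify_hpet;
	bool verify_pmtmr;
};

/* The commit log's criteria under which TSC verification is skipped. */
static bool tsc_watchdog_disabled(const struct boot_cpu_info *c)
{
	return c->nr_sockets <= 2 && c->constant_tsc &&
	       c->nonstop_tsc && c->tsc_adjust;
}

static struct watchdog_plan plan_watchdog(const struct boot_cpu_info *c)
{
	/* Unlisted members are zero-initialized, i.e. false. */
	struct watchdog_plan p = { .verify_tsc = true };

	if (tsc_watchdog_disabled(c)) {
		/*
		 * The TSC is trusted and acts as the reference, so the
		 * slower timers are verified against it instead.
		 */
		p.verify_tsc = false;
		p.verify_hpet = true;
		p.verify_pmtmr = true;
	}
	return p;
}
```

For example, a two-socket system with all three features gets `verify_hpet` and `verify_pmtmr` set, while a four-socket system keeps verifying the TSC itself.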
> >
> > As I currently understand it, you are trying to use TSC as a watchdog to check
> > against HPET and PMTMR. I do have two questions about this patch.
> >
> > First of all, why do you need to use both HPET and PMTMR? Can you just use
> > whichever one of them is available? Secondly, is it possible to enable this
> > time-skew diagnostic for a limited amount of time instead of running it
> > indefinitely? The running of the clocksource watchdog itself will still
> > consume a tiny amount of CPU cycles.
>
> I could certainly do something so that only the first of HPET and PMTMR
> is checked. Could you give me a quick run-through of the advantages of
> using only one? I would need to explain that in the commit log.
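Checking "only the first of HPET and PMTMR" amounts to walking an ordered candidate list and stopping at the first clocksource that is present. A minimal sketch, assuming HPET is preferred over PMTMR; the enum and function names are hypothetical. Verifying a single timer presumably means one fewer slow timer read per watchdog interval, which is the CPU-cost concern raised above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two candidate clocksources. */
enum cs_id { CS_NONE = 0, CS_HPET, CS_PMTMR };

/*
 * Pick a single clocksource to verify against the TSC: the first
 * available entry in the (assumed) order HPET, then PMTMR.
 * Returns CS_NONE if neither is present.
 */
static enum cs_id pick_first_verifiable(bool hpet_available,
					bool pmtmr_available)
{
	const struct {
		enum cs_id id;
		bool available;
	} candidates[] = {
		{ CS_HPET, hpet_available },
		{ CS_PMTMR, pmtmr_available },
	};

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
		if (candidates[i].available)
			return candidates[i].id;
	return CS_NONE;
}
```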
>
> Would it make sense to have a kernel boot variable giving the number of
> minutes for which the watchdog was to run, with a default of zero
> meaning "indefinitely"?
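The proposed boot variable reduces to two small pieces: parsing a minute count, and deciding at each watchdog tick whether uptime has passed the limit, with zero meaning "never stop". A minimal userspace sketch; the option name `watchdog_minutes` and both helpers are hypothetical, and a real kernel parameter would be registered through the kernel's own boot-option machinery rather than `strtoul()`.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse the value of a hypothetical "watchdog_minutes=N" boot option.
 * Malformed or out-of-range input falls back to 0 ("indefinitely").
 */
static unsigned int parse_watchdog_minutes(const char *val)
{
	char *end;
	unsigned long v;

	if (val == NULL || *val == '\0')
		return 0;
	errno = 0;
	v = strtoul(val, &end, 10);
	if (errno != 0 || *end != '\0' || v > UINT_MAX)
		return 0;
	return (unsigned int)v;
}

/* Should the watchdog still run at uptime_sec seconds after boot? */
static bool watchdog_should_run(uint64_t uptime_sec, unsigned int minutes)
{
	if (minutes == 0)
		return true;	/* 0 means "run indefinitely" */
	return uptime_sec < (uint64_t)minutes * 60;
}
```

With `minutes = 10`, the watchdog still runs at 599 seconds of uptime and stops at 600. Making a time-limited run the default, as suggested in the reply below, would only change the fallback value.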

We've discussed the "OS noise", which customers may really care about.
IIUC, this patch is intended to test whether the HPET/PMTIMER hardware is
broken, so how about making a run of a limited number of minutes the
default behavior?
Also, I've run the patch on an Alder Lake system, with both a working ACPI
pm_timer and a fake broken pm_timer, and it works without errors in both
cases.
Thanks,
Feng
> Thanx, Paul
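The working versus fake-broken pm_timer test described above exercises the core watchdog comparison: measure the same interval with the reference (here the TSC) and with the clocksource under test, convert both to nanoseconds, and flag the clocksource if they disagree by more than a threshold. A minimal sketch, assuming a hypothetical 2 GHz TSC, the ACPI PM timer's nominal 3.579545 MHz rate, a "broken" timer modeled as running 10% slow, and an illustrative 1 ms threshold; the kernel's actual threshold and conversion code differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Cycles elapsed between two reads of a counter whose valid bits are
 * given by `mask` (e.g. 0xFFFFFF for a 24-bit PM timer); unsigned
 * wraparound is folded back into range by the mask.
 */
static uint64_t counter_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/*
 * Convert cycles to nanoseconds from the counter's nominal frequency.
 * Splitting into whole seconds and a remainder avoids 64-bit overflow
 * for counters up to a few GHz. (The kernel uses precomputed mult/shift
 * pairs instead; this is a simplification.)
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	return (cycles / freq_hz) * NSEC_PER_SEC +
	       (cycles % freq_hz) * NSEC_PER_SEC / freq_hz;
}

/* Does the verified clocksource disagree with the watchdog by more than threshold_ns? */
static bool skew_exceeds(uint64_t wd_ns, uint64_t cs_ns, uint64_t threshold_ns)
{
	uint64_t diff = wd_ns > cs_ns ? wd_ns - cs_ns : cs_ns - wd_ns;

	return diff > threshold_ns;
}
```

Over 0.5 s of TSC cycles (500,000,000 ns), a healthy PM timer reading of 1,789,772 cycles lands within a few hundred nanoseconds, while a timer running 10% slow reads about 1,610,795 cycles, roughly 50 ms short, and exceeds the threshold.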
Thread overview: 18+ messages
2022-12-20 8:25 [RFC PATCH] clocksource: Suspend the watchdog temporarily when high read lantency detected Feng Tang
2022-12-20 16:11 ` Waiman Long
2022-12-20 18:34 ` Paul E. McKenney
2022-12-21 1:01 ` Feng Tang
2022-12-21 3:26 ` Waiman Long
2022-12-22 0:40 ` Paul E. McKenney
2022-12-22 3:39 ` Waiman Long
2022-12-22 5:55 ` Paul E. McKenney
2022-12-22 6:00 ` Feng Tang [this message]
2022-12-22 6:14 ` Paul E. McKenney
2022-12-22 6:37 ` Feng Tang
2022-12-22 18:24 ` Paul E. McKenney
2022-12-22 21:42 ` Paul E. McKenney
2022-12-22 23:28 ` Paul E. McKenney
2022-12-23 2:09 ` Feng Tang
2022-12-23 3:37 ` Paul E. McKenney
[not found] ` <ad71008d-4acc-d211-dc19-c33bb25ff42c@redhat.com>
2022-12-23 4:14 ` Feng Tang
2022-12-27 18:38 ` Paul E. McKenney