public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jiri Wiesner <jwiesner@suse.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, John Stultz <jstultz@google.com>,
	Stephen Boyd <sboyd@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Feng Tang <feng.tang@intel.com>
Subject: Re: [PATCH v2] clocksource: Skip watchdog check for large watchdog intervals
Date: Fri, 19 Jan 2024 15:11:29 +0100	[thread overview]
Message-ID: <20240119141129.GJ3303@incl> (raw)
In-Reply-To: <87cyu4hih1.ffs@tglx>

On Sun, Jan 14, 2024 at 01:22:18AM +0100, Thomas Gleixner wrote:
> On Sat, Jan 13 2024 at 12:44, Jiri Wiesner wrote:
> > On Fri, Jan 12, 2024 at 05:48:22PM +0100, Thomas Gleixner wrote:
> >> On Wed, Jan 10 2024 at 20:26, Jiri Wiesner wrote:
> >> > The measured clocksource skew - the absolute difference between cs_nsec
> >> > and wd_nsec - was 668 microseconds:
> >> >> cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612
> >> >
> >> > The kernel (based on 5.14.21) used 200 microseconds for the
> >> > uncertainty_margin of both the clocksource and watchdog, resulting in a
> >> > threshold of 400 microseconds.  The discrepancy is that the measured
> >> > clocksource skew was evaluated against a threshold suited for watchdog
> >> > intervals of roughly WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5
> >> > second.
> >> 
> >> This really took some time to decode. What you are trying to explain is:
> >> 
> >>    The comparison between the clocksource and the watchdog is not
> >>    working for large readout intervals because the conversion to
> >>    nanoseconds is imprecise. The reason is that the initialization
> >>    values of the shift/mult pairs which are used for conversion are not
> >>    sufficiently accurate and the accumulated inaccuracy causes the
> >>    comparison to exceed the threshold.
> >
> > The root cause of the bug does not concern the precision of the conversion 
> > to nanoseconds. The shift/mult pair of the TSC can convert diffs as large 
> > as 600 seconds. The HPET is limited to 179.0 seconds on account of being a 
> > 32-bit counter. The acpi_pm can convert only 4.7 seconds. With the 
> > CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE option enabled, the ranges are 
> > reduced to a half. The example above showed the TSC as the clocksource and 
> > the HPET as a watchdog both of which should be able to convert a diff of 
> > 14.5 seconds to nanoseconds with sufficient precision.
> 
> No. It _is_ an initialization and conversion precision problem, nothing
> else.
> 
> Assume a perfect world where the frequency of the TSC and the frequency
> of the HPET are precisely known at initialization time and the
> conversion factors to nanoseconds are precise as well. Assume further
> that the clock frequencies wont drift over time differently.
> 
> Then the relationship between the nanosecond converted readouts of TSC
> and HPET (or any other watchdog clocksource) would be constant
> independent of the readout interval with the following prerequisites:
> 
>   1) freq(TSC) / freq(HPET) == constant
> 
>   2) Conversion from TSC to nanoseconds is correct independent of the
>      interval
> 
>   3) Conversion from HPET to nanoseconds is correct independent of the
>      interval
> 
> So if all apply the uncertainty margin would be correct for any
> valid non wrapped around readout interval

Agreed.

> valid non wrapped around readout interval, which is a completely
> different issue to take care of.

It is. I did not mean to address that with the patch but did only 
coincidentally.

> Though because neither the frequency values are precise nor the
> conversion factors to nanoseconds are precise for a larger interval nor
> there is a guarantee that the clock frequencies of both clocks can't
> drift differently all prerequisites #1 - #3 above are not there.
> 
> As a consequence there _is_ a limitation to the readout interval where
> the comparison is valid.

I must admit the root cause of both inaccuracy of the shift/mult pairs and 
conversion precision fits the observed behaviour. The observed behaviour 
is that watchdog checks may fail for large readout intervals on 8 NUMA node 
machines. So, my hypothesis was that the size of the skew is directly 
proportinal to the length of the readout interval. This is what the patch 
means to address.
-- 
Jiri Wiesner
SUSE Labs

      reply	other threads:[~2024-01-19 14:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-10 19:26 [PATCH v2] clocksource: Skip watchdog check for large watchdog intervals Jiri Wiesner
2024-01-12 16:48 ` Thomas Gleixner
2024-01-13 11:44   ` Jiri Wiesner
2024-01-14  0:22     ` Thomas Gleixner
2024-01-19 14:11       ` Jiri Wiesner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240119141129.GJ3303@incl \
    --to=jwiesner@suse.de \
    --cc=feng.tang@intel.com \
    --cc=jstultz@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox