Re: [PATCH RFC] timekeeping: Fix clock stability with nohz

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Stultz <john.stultz@linaro.org>
To: Richard Cochran <richardcochran@gmail.com>,
	Miroslav Lichvar <mlichvar@redhat.com>
Cc: linux-kernel@vger.kernel.org, Prarit Bhargava <prarit@redhat.com>
Subject: Re: [PATCH RFC] timekeeping: Fix clock stability with nohz
Date: Mon, 18 Nov 2013 13:28:07 -0800	[thread overview]
Message-ID: <528A8667.7000304@linaro.org> (raw)
In-Reply-To: <20131116070304.GB4355@netboy>

On 11/15/2013 11:03 PM, Richard Cochran wrote:
> On Thu, Nov 14, 2013 at 03:50:40PM +0100, Miroslav Lichvar wrote:
>
>>  include/linux/timekeeper_internal.h |   4 +
>>  kernel/time/timekeeping.c           | 209 +++++-------------------------------
>>  2 files changed, 31 insertions(+), 182 deletions(-)
> This looks like an impressive simplification...

Indeed!

>
>> -	 * So the following can be confusing.
> Yep.
>
> So I really have no idea how the deleted code worked (or didn't work
> for nohz), but I can confirm that nohz time keeping is broken under
> light system load. Running a high load (like recompiling the kernel on
> all cores for CONFIG_HZ_PERIODIC=y ;) hides the issue, but that is
> obviously not the right solution.
Thanks for confirming the issue, and yes, forcing periodic hz isn't the
right solution.

That said there is a bit of a tension between very long nohz periods and
very accurate ntp syncing. Its a bit like sleeping at the wheel: you can
either get your rest, or stay on the road. :)  

It may be that we need to feed some of the ntp error state into the
tick-scheduling logic, so we don't try to sleep very long when we know
we're on a curvy bend.

Also, just for clarity's sake, when you're seeing things "broken",
curious how/what you are measuring?

Also is this brokenness something that has been around for awhile for
you or more recently cropped up?  I'm wondering as nohz idle has been
around for quite a few years and I've not seen too many complaints. So
I'm wondering if I'm looking in the right places, or if something has
regressed recently, or if its just that accuracy expectations have gone up?

> Out of my ignorance, two questions spring to mind.
>
> 1. Considering the simplicity of Miroslav's patch, what was the
>    benefit of the much more complicated code in the first place?

The much more complicated code was designed by Roman quite awhile back
(2006-ish). He was extremely bright, but produced very opaque code. It
made it very difficult to review, but once you sat down for awhile and
worked out the math, it ended up being correct and very efficient.

His concern was mostly about making the timer interrupt very fast on
very slow hardware (m68k was the architecture he co-maintained at the
time). So the code is very optimized for steady state, where the
adjustment value is 1,0, or -1. And we avoid doing any expensive math
operations (no multiplies, no divides), even in the non-steady state path.

Now it was designed before nohz was common (and back when nohz for a
full second was considered a aggresive length), so I could very well
believe that it has been stretched past its limit.  So I think your
point above about understanding exactly how it doesn't work for nohz is key.

Now, there is some parts of his look-ahead logic that I never quite
understood the rational for (see the top part of timekeeping_bigadjust).
At the time it was designed, I preferred a more PID-like approach, but
it was more computationally expensive and I couldn't measure any benefit
to my approach over his, so that part stuck around.

> 2. Does this patch work in the CONFIG_HZ_PERIODIC case just as well as
>    the deleted code?

As I mentioned in my other email, I'm a little concerned about some of
the accounting that is removed. At a blackbox level, its not handling
all the same values, so I suspect there's some issues here, but they may
not be very significant or difficult to fix.

So while I'm very cautious to throwing out the complex code which has
worked relatively well for quite awhile, I am very interested in coming
up with a solution that is easier to understand and validate as correct.

thanks
-john

next prev parent reply	other threads:[~2013-11-18 21:28 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-14 14:50 [PATCH RFC] timekeeping: Fix clock stability with nohz Miroslav Lichvar
2013-11-14 18:01 ` Rik van Riel
2013-11-16  7:03 ` Richard Cochran
2013-11-18 21:28   ` John Stultz [this message]
2013-11-19 14:13     ` Richard Cochran
2013-11-27 10:07       ` Richard Cochran
2013-11-21 10:12     ` Miroslav Lichvar
2013-11-18 20:46 ` John Stultz
2013-11-20 18:39   ` Miroslav Lichvar
2013-12-03  0:53     ` John Stultz
2013-12-03  4:03       ` John Stultz
2013-12-06 14:26         ` Miroslav Lichvar
2013-12-06 18:09           ` John Stultz
2013-12-06 18:37             ` Miroslav Lichvar
2013-12-07  1:43           ` John Stultz
2013-12-07 17:56             ` Richard Cochran
2013-12-07 22:16               ` John Stultz
2013-12-10 10:20             ` Miroslav Lichvar
2013-12-10 11:26               ` Miroslav Lichvar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=528A8667.7000304@linaro.org \
    --to=john.stultz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlichvar@redhat.com \
    --cc=prarit@redhat.com \
    --cc=richardcochran@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).