From: Ingo Molnar <mingo@elte.hu>
To: John Kacur <jkacur@redhat.com>
Cc: john stultz <johnstul@us.ibm.com>,
lkml <linux-kernel@vger.kernel.org>,
Clark Williams <williams@redhat.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/2] time: logrithmic time accumulation
Date: Mon, 5 Oct 2009 13:52:18 +0200
Message-ID: <20091005115218.GA7475@elte.hu>
In-Reply-To: <alpine.LFD.2.00.0910051228420.5132@localhost.localdomain>
* John Kacur <jkacur@redhat.com> wrote:
>
>
> On Fri, 2 Oct 2009, john stultz wrote:
>
> > Accumulating one tick at a time works well unless we're using NOHZ. Then
> > it can be an issue, since we may have to run through the loop a few
> > thousand times, which can increase timer interrupt caused latency.
> >
> > The current solution was to accumulate in half-second intervals with
> > NOHZ. This kept the number of loops down, however it did slightly change
> > how we make NTP adjustments. While not an issue with NTPd users, as NTPd
> > makes adjustments over a longer period of time, other adjtimex() users
> > have noticed the half-second granularity with which we can apply
> > frequency changes to the clock.
> >
> > For instance, if an application tries to apply a 100ppm frequency
> > correction for 20ms to correct a 2us offset, with NOHZ they either get
> > no correction, or a 50us correction.
> >
> > Now, there will always be some granularity error for applying frequency
> > corrections. However, users sensitive to this error have seen a
> > 50-500x increase with NOHZ compared to running without NOHZ.
> >
> > So I figured I'd try another approach than just simply increasing the
> > interval. My approach is to consume the time interval logarithmically.
> > This reduces the number of times through the loop needed, keeping
> > latency down, while still preserving the original granularity error for
> > adjtimex() changes.
> >
> > Further, this change allows us to remove the xtime_cache code (patch to
> > follow), as xtime is always within one tick of the current time, instead
> > of the half-second updates it saw before.
> >
> > An earlier version of this patch has been shipping to x86 users in the
> > Red Hat MRG releases for a while without issue, but I've reworked this
> > version to be even more careful about avoiding possible overflows if the
> > shift value gets too large.
> >
> > Since this is not the most trivial code, and it's slightly different than
> > what's been tested for a while, it would be good to get this into some
> > trees for testing. Be it -tip or -mm, either would work. If there are no
> > problems it could be a 2.6.33 or 2.6.34 item.
> >
> > Any comments or feedback would be appreciated!
> >
> > Signed-off-by: John Stultz <johnstul@us.ibm.com>
> >
> > diff --git a/include/linux/timex.h b/include/linux/timex.h
> > index e6967d1..0c0ef7d 100644
> > --- a/include/linux/timex.h
> > +++ b/include/linux/timex.h
> > @@ -261,11 +261,7 @@ static inline int ntp_synced(void)
> >
> > #define NTP_SCALE_SHIFT 32
> >
> > -#ifdef CONFIG_NO_HZ
> > -#define NTP_INTERVAL_FREQ (2)
> > -#else
> > #define NTP_INTERVAL_FREQ (HZ)
> > -#endif
> > #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
> >
> > /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index fb0f46f..4cc5656 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -721,6 +721,51 @@ static void timekeeping_adjust(s64 offset)
> > timekeeper.ntp_error_shift;
> > }
> >
> > +
> > +/**
> > + * logarithmic_accumulation - shifted accumulation of cycles
> > + *
> > + * This function accumulates a shifted interval of cycles into
> > + * a shifted interval of nanoseconds. Allows for an O(log) accumulation
> > + * loop.
> > + *
> > + * Returns the unconsumed cycles.
> > + */
> > +static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
> > +{
> > + u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
> > +
> > + /* If the offset is smaller than a shifted interval, do nothing */
> > + if (offset < timekeeper.cycle_interval<<shift)
> > + return offset;
> > +
> > + /* accumulate one shifted interval */
> > + offset -= timekeeper.cycle_interval << shift;
> > + timekeeper.clock->cycle_last += timekeeper.cycle_interval << shift;
> > +
> > + timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
> > + while (timekeeper.xtime_nsec >= nsecps) {
> > + timekeeper.xtime_nsec -= nsecps;
> > + xtime.tv_sec++;
> > + second_overflow();
> > + }
> > +
> > + /* accumulate into raw time */
> > + raw_time.tv_nsec += timekeeper.raw_interval << shift;
> > + while (raw_time.tv_nsec >= NSEC_PER_SEC) {
> > + raw_time.tv_nsec -= NSEC_PER_SEC;
> > + raw_time.tv_sec++;
> > + }
> > +
> > + /* accumulate error between NTP and clock interval */
> > + timekeeper.ntp_error += tick_length << shift;
> > + timekeeper.ntp_error -= timekeeper.xtime_interval <<
> > + (timekeeper.ntp_error_shift + shift);
> > +
> > + return offset;
> > +}
> > +
> > +
> > /**
> > * update_wall_time - Uses the current clocksource to increment the wall time
> > *
> > @@ -731,6 +776,7 @@ void update_wall_time(void)
> > struct clocksource *clock;
> > cycle_t offset;
> > u64 nsecs;
> > + int shift = 0, maxshift;
> >
> > /* Make sure we're fully resumed: */
> > if (unlikely(timekeeping_suspended))
> > @@ -744,33 +790,22 @@ void update_wall_time(void)
> > #endif
> > timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
> >
> > - /* normally this loop will run just once, however in the
> > - * case of lost or late ticks, it will accumulate correctly.
> > + /*
> > + * With NO_HZ we may have to accumulate many cycle_intervals
> > + * (think "ticks") worth of time at once. To do this efficiently,
> > + * we calculate the largest doubling multiple of cycle_intervals
> > + * that is smaller than the offset. We then accumulate that
> > + * chunk in one go, and then try to consume the next smaller
> > + * doubled multiple.
> > */
> > + shift = ilog2(offset) - ilog2(timekeeper.cycle_interval);
> > + shift = max(0, shift);
> > + /* Bound shift to one less than what overflows tick_length */
> > + maxshift = (8*sizeof(tick_length) - (ilog2(tick_length)+1)) - 1;
> > + shift = min(shift, maxshift);
> > while (offset >= timekeeper.cycle_interval) {
> > - u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
> > -
> > - /* accumulate one interval */
> > - offset -= timekeeper.cycle_interval;
> > - clock->cycle_last += timekeeper.cycle_interval;
> > -
> > - timekeeper.xtime_nsec += timekeeper.xtime_interval;
> > - if (timekeeper.xtime_nsec >= nsecps) {
> > - timekeeper.xtime_nsec -= nsecps;
> > - xtime.tv_sec++;
> > - second_overflow();
> > - }
> > -
> > - raw_time.tv_nsec += timekeeper.raw_interval;
> > - if (raw_time.tv_nsec >= NSEC_PER_SEC) {
> > - raw_time.tv_nsec -= NSEC_PER_SEC;
> > - raw_time.tv_sec++;
> > - }
> > -
> > - /* accumulate error between NTP and clock interval */
> > - timekeeper.ntp_error += tick_length;
> > - timekeeper.ntp_error -= timekeeper.xtime_interval <<
> > - timekeeper.ntp_error_shift;
> > + offset = logarithmic_accumulation(offset, shift);
> > + shift--;
> > }
> >
> > /* correct the clock when NTP error is too big */
> >
> >
> >
>
> There are several (6) trailing whitespace errors that checkpatch exposes,
> but other than that:

Yep, I fixed those already (also the few inconsistent comment
capitalizations).

The two patches held up fine in -tip testing so far.

> Signed-off-by: John Kacur <jkacur@redhat.com>

Thanks - I changed that to Reviewed-by (SoB is for being part of the
patch-flow) and added it to the two commits.

	Ingo
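
The logarithmic consumption described in the quoted patch can be illustrated
with a minimal userspace sketch. This is not the kernel code: struct toy_clock,
ilog2_u64(), accumulate_chunk() and accumulate() are hypothetical names, the
NTP and nanosecond bookkeeping is left out, and only whole ticks are counted.
It assumes offsets small enough that cycle_interval << shift cannot overflow
64 bits, so the maxshift bound from the patch is omitted. The guard that keeps
shift from dropping below zero is a defensive choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static int ilog2_u64(uint64_t v)
{
	int n = -1;

	while (v) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Toy state: only counts ticks; the real timekeeper tracks far more. */
struct toy_clock {
	uint64_t cycle_interval;	/* cycles per tick */
	uint64_t ticks;			/* ticks accumulated so far */
	unsigned int loops;		/* loop iterations, for illustration */
};

/* Consume one chunk of (cycle_interval << shift) cycles if it fits. */
static uint64_t accumulate_chunk(struct toy_clock *c, uint64_t offset, int shift)
{
	uint64_t chunk = c->cycle_interval << shift;

	/* If the offset is smaller than a shifted interval, do nothing */
	if (offset < chunk)
		return offset;

	c->ticks += (uint64_t)1 << shift;
	return offset - chunk;
}

/* Consume offset in halving power-of-two chunks; returns leftover cycles. */
static uint64_t accumulate(struct toy_clock *c, uint64_t offset)
{
	int shift;

	assert(c->cycle_interval > 0);
	if (offset < c->cycle_interval)
		return offset;

	/* Largest doubling multiple of cycle_interval worth trying first */
	shift = ilog2_u64(offset) - ilog2_u64(c->cycle_interval);
	if (shift < 0)
		shift = 0;

	while (offset >= c->cycle_interval) {
		offset = accumulate_chunk(c, offset, shift);
		c->loops++;
		if (shift > 0)	/* guard added in this sketch only */
			shift--;
	}
	return offset;
}
```

With cycle_interval = 1000 and an offset of 1000123 cycles, the loop starts at
shift 10 and works down to shift 3, so it accumulates the 1000 pending ticks
in 8 iterations rather than 1000, while each tick is still accounted at its
original granularity.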