From: Ingo Molnar <mingo@elte.hu>
To: John Kacur <jkacur@redhat.com>
Cc: john stultz <johnstul@us.ibm.com>,
    lkml <linux-kernel@vger.kernel.org>,
    Clark Williams <williams@redhat.com>,
    Martin Schwidefsky <schwidefsky@de.ibm.com>,
    Thomas Gleixner <tglx@linutronix.de>,
    Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/2] time: logrithmic time accumulation
Date: Mon, 5 Oct 2009 13:52:18 +0200
Message-ID: <20091005115218.GA7475@elte.hu>
In-Reply-To: <alpine.LFD.2.00.0910051228420.5132@localhost.localdomain>

* John Kacur <jkacur@redhat.com> wrote:
>
>
> On Fri, 2 Oct 2009, john stultz wrote:
>
> > Accumulating one tick at a time works well unless we're using NOHZ. Then
> > it can be an issue, since we may have to run through the loop a few
> > thousand times, which can increase timer interrupt caused latency.
> >
> > The current solution was to accumulate in half-second intervals with
> > NOHZ. This kept the number of loops down, however it did slightly change
> > how we make NTP adjustments. While not an issue with NTPd users, as NTPd
> > makes adjustments over a longer period of time, other adjtimex() users
> > have noticed the half-second granularity with which we can apply
> > frequency changes to the clock.
> >
> > For instance, if an application tries to apply a 100ppm frequency
> > correction for 20ms to correct a 2us offset, with NOHZ they either get
> > no correction, or a 50us correction.
> >
> > Now, there will always be some granularity error for applying frequency
> > corrections. However, users sensitive to this error have seen a
> > 50-500x increase with NOHZ compared to running without NOHZ.
> >
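To make the numbers above concrete: a frequency offset of p ppm applied
for t nanoseconds moves the clock by p * t / 10^6 ns, so the smallest
correction that can take effect is set by the accumulation interval.
The sketch below only illustrates that arithmetic and is not kernel
code. The helper names correction_ns() and granularity_ns() are
hypothetical, and the HZ values of 100 and 1000 are assumptions picked
because they reproduce the 50-500x range quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: clock shift (in ns) produced by a frequency
 * offset of 'ppm' parts per million held for 'duration_ns'.
 */
static uint64_t correction_ns(uint64_t ppm, uint64_t duration_ns)
{
	return ppm * duration_ns / 1000000ULL;
}

/*
 * Hypothetical helper: smallest step (in ns) in which a 'ppm'
 * adjustment can be applied when time is accumulated
 * 'interval_freq' times per second (the role NTP_INTERVAL_FREQ
 * plays in the patch).
 */
static uint64_t granularity_ns(uint64_t ppm, uint64_t interval_freq)
{
	assert(interval_freq > 0);
	return correction_ns(ppm, NSEC_PER_SEC / interval_freq);
}
```

Under these assumptions, correction_ns(100, 20000000) gives 2000 ns
(the intended 2us), while granularity_ns(100, 2) gives 50000 ns (the
50us step seen with NOHZ), against 100 ns at HZ=1000 and 1000 ns at
HZ=100.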
> > So I figured I'd try another approach than simply increasing the
> > interval. My approach is to consume the time interval logarithmically.
> > This reduces the number of times through the loop needed, keeping
> > latency down, while still preserving the original granularity error for
> > adjtimex() changes.
> >
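The idea described above can be sketched in user space roughly as
follows. This is a simplified model and not the patch itself: it only
counts whole intervals, it leaves out the nanosecond/NTP bookkeeping
and the maxshift overflow bound, and the names (struct acc_result,
accumulate_log(), ilog2_u64()) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0, -1 for v == 0; stand-in for ilog2() */
static int ilog2_u64(uint64_t v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

struct acc_result {
	uint64_t intervals;	/* whole intervals consumed */
	uint64_t remainder;	/* cycles left unconsumed */
	int iterations;		/* passes through the loop */
};

/*
 * Consume 'offset' cycles in chunks of (interval << shift), starting
 * from the largest power-of-two multiple that can fit and halving the
 * chunk on every pass.
 */
static struct acc_result accumulate_log(uint64_t offset, uint64_t interval)
{
	struct acc_result res = { 0, 0, 0 };
	int shift;

	assert(interval > 0);

	shift = ilog2_u64(offset) - ilog2_u64(interval);
	if (shift < 0)
		shift = 0;

	while (offset >= interval) {
		uint64_t chunk = interval << shift;

		/* Skip this chunk size if it does not fit */
		if (offset >= chunk) {
			offset -= chunk;
			res.intervals += 1ULL << shift;
		}
		shift--;
		res.iterations++;
	}

	res.remainder = offset;
	return res;
}
```

For an interval of 1000 cycles and an offset of 5000 intervals plus 123
leftover cycles, accumulate_log(5000123, 1000) consumes all 5000
intervals in 11 passes instead of 5000, and leaves the 123 cycles as
the remainder.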
> > Further, this change allows us to remove the xtime_cache code (patch to
> > follow), as xtime is always within one tick of the current time, instead
> > of the half-second updates it saw before.
> >
> > An earlier version of this patch has been shipping to x86 users in the
> > Red Hat MRG releases for a while without issue, but I've reworked this
> > version to be even more careful about avoiding possible overflows if the
> > shift value gets too large.
> >
> > Since this is not the most trivial code, and it's slightly different than
> > what's been tested for a while, it would be good to get this into some
> > trees for testing. Be it -tip or -mm, either would work. If there are no
> > problems it could be a 2.6.33 or 2.6.34 item.
> >
> > Any comments or feedback would be appreciated!
> >
> > Signed-off-by: John Stultz <johnstul@us.ibm.com>
> >
> > diff --git a/include/linux/timex.h b/include/linux/timex.h
> > index e6967d1..0c0ef7d 100644
> > --- a/include/linux/timex.h
> > +++ b/include/linux/timex.h
> > @@ -261,11 +261,7 @@ static inline int ntp_synced(void)
> >
> >  #define NTP_SCALE_SHIFT 32
> >
> > -#ifdef CONFIG_NO_HZ
> > -#define NTP_INTERVAL_FREQ (2)
> > -#else
> >  #define NTP_INTERVAL_FREQ (HZ)
> > -#endif
> >  #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
> >
> >  /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index fb0f46f..4cc5656 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -721,6 +721,51 @@ static void timekeeping_adjust(s64 offset)
> > timekeeper.ntp_error_shift;
> > }
> >
> > +
> > +/**
> > + * logarithmic_accumulation - shifted accumulation of cycles
> > + *
> > + * This functions accumulates a shifted interval of cycles into
> > + * into a shifted interval nanoseconds. Allows for O(log) accumulation
> > + * loop.
> > + *
> > + * Returns the unconsumed cycles.
> > + */
> > +static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
> > +{
> > +	u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
> > +
> > +	/* If the offset is smaller then a shifted interval, do nothing */
> > +	if (offset < timekeeper.cycle_interval<<shift)
> > +		return offset;
> > +
> > +	/* accumulate one shifted interval */
> > +	offset -= timekeeper.cycle_interval << shift;
> > +	timekeeper.clock->cycle_last += timekeeper.cycle_interval << shift;
> > +
> > +	timekeeper.xtime_nsec += timekeeper.xtime_interval << shift;
> > +	while (timekeeper.xtime_nsec >= nsecps) {
> > +		timekeeper.xtime_nsec -= nsecps;
> > +		xtime.tv_sec++;
> > +		second_overflow();
> > +	}
> > +
> > +	/* accumulate into raw time */
> > +	raw_time.tv_nsec += timekeeper.raw_interval << shift;;
> > +	while (raw_time.tv_nsec >= NSEC_PER_SEC) {
> > +		raw_time.tv_nsec -= NSEC_PER_SEC;
> > +		raw_time.tv_sec++;
> > +	}
> > +
> > +	/* accumulate error between NTP and clock interval */
> > +	timekeeper.ntp_error += tick_length << shift;
> > +	timekeeper.ntp_error -= timekeeper.xtime_interval <<
> > +				(timekeeper.ntp_error_shift + shift);
> > +
> > +	return offset;
> > +}
> > +
> > +
> > /**
> > * update_wall_time - Uses the current clocksource to increment the wall time
> > *
> > @@ -731,6 +776,7 @@ void update_wall_time(void)
> >  	struct clocksource *clock;
> >  	cycle_t offset;
> >  	u64 nsecs;
> > +	int shift = 0, maxshift;
> >
> >  	/* Make sure we're fully resumed: */
> >  	if (unlikely(timekeeping_suspended))
> > @@ -744,33 +790,22 @@ void update_wall_time(void)
> >  #endif
> >  	timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
> >
> > -	/* normally this loop will run just once, however in the
> > -	 * case of lost or late ticks, it will accumulate correctly.
> > +	/*
> > +	 * With NO_HZ we may have to accumulate many cycle_intervals
> > +	 * (think "ticks") worth of time at once. To do this efficiently,
> > +	 * we calculate the largest doubling multiple of cycle_intervals
> > +	 * that is smaller then the offset. We then accumulate that
> > +	 * chunk in one go, and then try to consume the next smaller
> > +	 * doubled multiple.
> >  	 */
> > +	shift = ilog2(offset) - ilog2(timekeeper.cycle_interval);
> > +	shift = max(0, shift);
> > +	/* Bound shift to one less then what overflows tick_length */
> > +	maxshift = (8*sizeof(tick_length) - (ilog2(tick_length)+1)) - 1;
> > +	shift = min(shift, maxshift);
> >  	while (offset >= timekeeper.cycle_interval) {
> > -		u64 nsecps = (u64)NSEC_PER_SEC << timekeeper.shift;
> > -
> > -		/* accumulate one interval */
> > -		offset -= timekeeper.cycle_interval;
> > -		clock->cycle_last += timekeeper.cycle_interval;
> > -
> > -		timekeeper.xtime_nsec += timekeeper.xtime_interval;
> > -		if (timekeeper.xtime_nsec >= nsecps) {
> > -			timekeeper.xtime_nsec -= nsecps;
> > -			xtime.tv_sec++;
> > -			second_overflow();
> > -		}
> > -
> > -		raw_time.tv_nsec += timekeeper.raw_interval;
> > -		if (raw_time.tv_nsec >= NSEC_PER_SEC) {
> > -			raw_time.tv_nsec -= NSEC_PER_SEC;
> > -			raw_time.tv_sec++;
> > -		}
> > -
> > -		/* accumulate error between NTP and clock interval */
> > -		timekeeper.ntp_error += tick_length;
> > -		timekeeper.ntp_error -= timekeeper.xtime_interval <<
> > -					timekeeper.ntp_error_shift;
> > +		offset = logarithmic_accumulation(offset, shift);
> > +		shift--;
> >  	}
> >
> >  	/* correct the clock when NTP error is too big */
> >
> >
> >
>
> There are several (6) trailing whitespace errors that checkpatch exposes,
> but other than that:

Yep, I fixed those already (along with the few inconsistent comment
capitalizations).

The two patches have held up fine in -tip testing so far.

> Signed-off-by: John Kacur <jkacur@redhat.com>

Thanks - I changed that to Reviewed-by (SoB is for being part of the
patch-flow) and added it to the two commits.

	Ingo