[PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error
  2014-04-24 23:04 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering John Stultz
@ 2014-04-24 23:04 ` John Stultz
  0 siblings, 0 replies; 15+ messages in thread
From: John Stultz @ 2014-04-24 23:04 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Miroslav Lichvar, Richard Cochran, Prarit Bhargava

By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accummulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rational.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timekeeper_internal.h | 7 +++++++
 kernel/time/timekeeping.c           | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index c1825eb..83fd40f 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -34,6 +34,13 @@ struct timekeeper {
 	/* Clock shifted nano seconds */
 	u64			xtime_nsec;
 
+	/* The ntp_tick_length() value currently being used.
+	 * This cached copy ensures we consistently apply the tick
+	 * length for an entire tick, as ntp_tick_length may change
+	 * mid-tick, and we don't want to apply that new value to
+	 * the tick in progress.
+	 */
+	u64			ntp_tick;
 	/* Difference between accumulated time and NTP time in ntp
 	 * shifted nano seconds. */
 	s64			ntp_error;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9aa8ccf..a130d00 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	 * to counteract clock drifting.
 	 */
 	tk->mult = clock->mult;
+	tk->ntp_tick = ntpinterval << tk->ntp_error_shift;
 }
 
 /* Timekeeper helper functions. */
@@ -1138,6 +1139,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	bool negative;
 	u32 adj;
 
+	tk->ntp_tick = ntp_tick_length();
 	/* Calculate current error per tick */
 	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
 	tick_error -= (tk->xtime_interval + tk->xtime_remainder);
@@ -1292,7 +1294,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
 	tk->raw_time.tv_nsec = raw_nsecs;
 
 	/* Accumulate error between NTP and clock interval */
-	tk->ntp_error += ntp_tick_length() << shift;
+	tk->ntp_error += tk->ntp_tick << shift;
 	tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) <<
 						(tk->ntp_error_shift + shift);
 
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
@ 2014-05-17  0:56 John Stultz
  2014-05-17  0:56 ` [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: John Stultz @ 2014-05-17  0:56 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Miroslav Lichvar, Richard Cochran, Prarit Bhargava

I managed to find some time to further work on the next iteration here.

This patch set, based on ideas from Miroslav, tries to improve the ntp
freq steering when using NOHZ. 

Rather then just doing error proportional correction, this patchset
splits the logic to two steps: frequency correction and error
correction. By doing this, we are able to cap the rate of the error
correction, so it is small and we avoid the strong over-corrections
seen with the previous implementation when NOHZ was enabled.

This version of the patch set corrects a few issues Miroslav pointed
out, as well as adapts his approach almost completely for the last
patch. This pulls the results in to be very close to his original
patch.

I'm not 100% sure we need the last patch in this series, as
it has additional computational cost and testing on real
hardware has shown NOHZ=y performance matching NOHZ=n with a
earlier version of just the first patch:
	https://lkml.org/lkml/2014/1/13/501
(though admittedly, the patch has changed since Richard's testing,
so the results are a bit stale).

Below are some of the simulator results comparing this patchset
to vanilla and Miroslav's original patch.

Vanilla 3.15-rc
---------------
$  ./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 2252427.1 max: 9614317.1 freq: -99.94417
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 1426381.1 max: 7957313.5 freq: -100.23188
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 1146520.7 max: 7795769.4 freq: -99.96096
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 942903.0 max: 7091089.3 freq: -100.24301
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 927526.2 max: 6688226.0 freq: -99.74349
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 843482.1 max: 10259091.7 freq: -100.01513
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 935390.6 max: 9055331.8 freq: -99.74709
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 760640.7 max: 8755998.4 freq: -99.75966

$ ./test1.sh 
		freq10          fre100          dev             max
nohz on		38.38368        2.72579         1468940.9       9846788.2
nohz off	3.89181         0.10436         0.2             0.6

Miroslav's original patch:
--------------------------
$ ./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 115.5 max: 300.1 freq: -100.00004
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 73.7 max: 289.6 freq: -100.00003
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 59.7 max: 283.9 freq: -100.00001
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 50.6 max: 291.6 freq: -100.00000
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 48.6 max: 285.7 freq: -100.00002
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 44.8 max: 293.5 freq: -99.99993
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 40.5 max: 303.8 freq: -99.99996
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 37.6 max: 288.5 freq: -99.99997

$ ./test1.sh 
		freq10          freq100         dev             max
nohz on		0.00601         0.00028         74.0            279.4
nohz off	0.05867         0.00204         0.2             0.6


This patchset:
--------------
$ ./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 111.0 max: 276.4 freq: -100.00000
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 76.5 max: 286.7 freq: -99.99998
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 59.3 max: 291.8 freq: -100.00003
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 51.2 max: 290.7 freq: -100.00001
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 48.0 max: 275.9 freq: -100.00002
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 44.8 max: 293.2 freq: -99.99993
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 40.1 max: 300.4 freq: -99.99998
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 37.6 max: 285.7 freq: -99.99999

$ ./test1.sh 
		freq10          freq100         dev             max
nohz on		0.00582         0.00033         74.1            279.9
nohz off	0.06275         0.06440         0.4             1.4


Again, many many thanks to Miroslav for pointing out this issue, providing
the simulator and initial patches, as well as helping to point out problems
with my rework of his change.

Comments, and feedback and testing (particularly on 32bit systems, and
systems like powerpc) would be greatly appreciated!

thanks
-john

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>

John Stultz (3):
  [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  [RFC] timekeeping: Use cached ntp_tick_length when accumulating error
  [RFC] timekeeping: Calculate freq adjustment directly

 include/linux/timekeeper_internal.h |  12 +++
 kernel/time/timekeeping.c           | 175 ++++++++++++++----------------------
 2 files changed, 77 insertions(+), 110 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-05-17  0:56 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) John Stultz
@ 2014-05-17  0:56 ` John Stultz
  2014-05-17  0:56 ` [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error John Stultz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: John Stultz @ 2014-05-17  0:56 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Miroslav Lichvar, Richard Cochran, Prarit Bhargava

The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.

2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.

This method performs fairly well in the simulator Miroslav created.

Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

I'd be very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timekeeper_internal.h |   3 +
 kernel/time/timekeeping.c           | 193 ++++++++++++++++--------------------
 2 files changed, 86 insertions(+), 110 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index c1825eb..4d398f1 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -41,6 +41,9 @@ struct timekeeper {
 	 * ntp shifted nano seconds. */
 	u32			ntp_error_shift;
 
+	/* Mult adjustment being applied to correct ntp_error */
+	u32			ntp_err_mult;
+
 	/*
 	 * wall_to_monotonic is what we need to add to xtime (or xtime corrected
 	 * for sub jiffie times) to get to monotonic time.  Monotonic is pegged
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 63f9ed7..d46bda2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -148,6 +148,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	 * to counteract clock drifting.
 	 */
 	tk->mult = clock->mult;
+	tk->ntp_err_mult = 0;
 }
 
 /* Timekeeper helper functions. */
@@ -1049,125 +1050,34 @@ static int __init timekeeping_init_ops(void)
 	register_syscore_ops(&timekeeping_syscore_ops);
 	return 0;
 }
-
 device_initcall(timekeeping_init_ops);
 
 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Apply a multiplier adjustment to the timekeeper
  */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-						 s64 error, s64 *interval,
-						 s64 *offset)
+static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
+							 s64 offset,
+							 bool negative,
+							 int adj_scale)
 {
-	s64 tick_error, i;
-	u32 look_ahead, adj;
-	s32 error2, mult;
-
-	/*
-	 * Use the current error value to determine how much to look ahead.
-	 * The larger the error the slower we adjust for it to avoid problems
-	 * with losing too many ticks, otherwise we would overadjust and
-	 * produce an even larger error.  The smaller the adjustment the
-	 * faster we try to adjust for it, as lost ticks can do less harm
-	 * here.  This is tuned so that an error of about 1 msec is adjusted
-	 * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-	 */
-	error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-	error2 = abs(error2);
-	for (look_ahead = 0; error2 > 0; look_ahead++)
-		error2 >>= 2;
+	s64 interval = tk->cycle_interval;
+	s32 mult_adj = 1;
 
-	/*
-	 * Now calculate the error in (1 << look_ahead) ticks, but first
-	 * remove the single look ahead already included in the error.
-	 */
-	tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
-	tick_error -= tk->xtime_interval >> 1;
-	error = ((error - tick_error) >> look_ahead) + tick_error;
-
-	/* Finally calculate the adjustment shift value.  */
-	i = *interval;
-	mult = 1;
-	if (error < 0) {
-		error = -error;
-		*interval = -*interval;
-		*offset = -*offset;
-		mult = -1;
+	if (negative) {
+		mult_adj = -mult_adj;
+		interval = -interval;
+		offset  = -offset;
 	}
-	for (adj = 0; error > i; adj++)
-		error >>= 1;
-
-	*interval <<= adj;
-	*offset <<= adj;
-	return mult << adj;
-}
+	mult_adj <<= adj_scale;
+	interval <<= adj_scale;
+	offset <<= adj_scale;
 
-/*
- * Adjust the multiplier to reduce the error value,
- * this is optimized for the most common adjustments of -1,0,1,
- * for other values we can do a bit more work.
- */
-static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
-{
-	s64 error, interval = tk->cycle_interval;
-	int adj;
-
-	/*
-	 * The point of this is to check if the error is greater than half
-	 * an interval.
-	 *
-	 * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
-	 *
-	 * Note we subtract one in the shift, so that error is really error*2.
-	 * This "saves" dividing(shifting) interval twice, but keeps the
-	 * (error > interval) comparison as still measuring if error is
-	 * larger than half an interval.
-	 *
-	 * Note: It does not "save" on aggravation when reading the code.
-	 */
-	error = tk->ntp_error >> (tk->ntp_error_shift - 1);
-	if (error > interval) {
-		/*
-		 * We now divide error by 4(via shift), which checks if
-		 * the error is greater than twice the interval.
-		 * If it is greater, we need a bigadjust, if its smaller,
-		 * we can adjust by 1.
-		 */
-		error >>= 2;
-		if (likely(error <= interval))
-			adj = 1;
-		else
-			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-	} else {
-		if (error < -interval) {
-			/* See comment above, this is just switched for the negative */
-			error >>= 2;
-			if (likely(error >= -interval)) {
-				adj = -1;
-				interval = -interval;
-				offset = -offset;
-			} else {
-				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-			}
-		} else {
-			goto out_adjust;
-		}
-	}
-
-	if (unlikely(tk->clock->maxadj &&
-		(tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
-		printk_once(KERN_WARNING
-			"Adjusting %s more than 11%% (%ld vs %ld)\n",
-			tk->clock->name, (long)tk->mult + adj,
-			(long)tk->clock->mult + tk->clock->maxadj);
-	}
 	/*
 	 * So the following can be confusing.
 	 *
-	 * To keep things simple, lets assume adj == 1 for now.
+	 * To keep things simple, lets assume mult_adj == 1 for now.
 	 *
-	 * When adj != 1, remember that the interval and offset values
+	 * When mult_adj != 1, remember that the interval and offset values
 	 * have been appropriately scaled so the math is the same.
 	 *
 	 * The basic idea here is that we're increasing the multiplier
@@ -1211,12 +1121,76 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	 *
 	 * XXX - TODO: Doc ntp_error calculation.
 	 */
-	tk->mult += adj;
+	tk->mult += mult_adj;
 	tk->xtime_interval += interval;
 	tk->xtime_nsec -= offset;
 	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
+}
+
+/*
+ * Calculate the multiplier adjustment needed to match the frequency
+ * specified by NTP
+ */
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+							s64 offset)
+{
+	s64 interval = tk->cycle_interval;
+	s64 xinterval = tk->xtime_interval;
+	s64 tick_error;
+	bool negative;
+	u32 adj;
+
+	/* Remove any current error adj from freq calculation */
+	if (tk->ntp_err_mult)
+		xinterval -= tk->cycle_interval;
+
+	/* Calculate current error per tick */
+	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+	tick_error -= (xinterval + tk->xtime_remainder);
+
+	/* Don't worry about correcting it if its small */
+	if (likely((tick_error >= 0) && (tick_error <= interval)))
+		return;
+
+	/* preserve the direction of correction */
+	negative = (tick_error < 0);
+
+	/* Sort out the magnitude of the correction */
+	tick_error = abs(tick_error);
+	for (adj = 0; tick_error > interval; adj++)
+		tick_error >>= 1;
+
+	/* scale the corrections */
+	timekeeping_apply_adjustment(tk, offset, negative, adj);
+}
+
+/*
+ * Adjust the timekeeper's multiplier to the correct frequency
+ * and also to reduce the accumulated error value.
+ */
+static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
+{
+	/* Correct for the current frequency error */
+	timekeeping_freqadjust(tk, offset);
+
+	/* Next make a small adjustment to fix any cumulative error */
+	if (!tk->ntp_err_mult && (tk->ntp_error > 0)) {
+		tk->ntp_err_mult = 1;
+		timekeeping_apply_adjustment(tk, offset, 0, 0);
+	} else if (tk->ntp_err_mult && (tk->ntp_error <= 0)) {
+		/* Undo any existing error adjustment */
+		timekeeping_apply_adjustment(tk, offset, 1, 0);
+		tk->ntp_err_mult = 0;
+	}
+
+	if (unlikely(tk->clock->maxadj &&
+		(tk->mult > tk->clock->mult + tk->clock->maxadj))) {
+		printk_once(KERN_WARNING
+			"Adjusting %s more than 11%% (%ld vs %ld)\n",
+			tk->clock->name, (long)tk->mult,
+			(long)tk->clock->mult + tk->clock->maxadj);
+	}
 
-out_adjust:
 	/*
 	 * It may be possible that when we entered this function, xtime_nsec
 	 * was very small.  Further, if we're slightly speeding the clocksource
@@ -1236,7 +1210,6 @@ out_adjust:
 		tk->xtime_nsec = 0;
 		tk->ntp_error += neg << tk->ntp_error_shift;
 	}
-
 }
 
 /**
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error
  2014-05-17  0:56 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) John Stultz
  2014-05-17  0:56 ` [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
@ 2014-05-17  0:56 ` John Stultz
  2014-05-17  0:56 ` [PATCH 3/3] [RFC] timekeeping: Calculate freq adjustment directly John Stultz
  2014-05-19 10:14 ` [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) Miroslav Lichvar
  3 siblings, 0 replies; 15+ messages in thread
From: John Stultz @ 2014-05-17  0:56 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Miroslav Lichvar, Richard Cochran, Prarit Bhargava

By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accumulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rational.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timekeeper_internal.h | 7 +++++++
 kernel/time/timekeeping.c           | 5 ++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 4d398f1..4fddf6c 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -34,6 +34,13 @@ struct timekeeper {
 	/* Clock shifted nano seconds */
 	u64			xtime_nsec;
 
+	/* The ntp_tick_length() value currently being used.
+	 * This cached copy ensures we consistently apply the tick
+	 * length for an entire tick, as ntp_tick_length may change
+	 * mid-tick, and we don't want to apply that new value to
+	 * the tick in progress.
+	 */
+	u64			ntp_tick;
 	/* Difference between accumulated time and NTP time in ntp
 	 * shifted nano seconds. */
 	s64			ntp_error;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d46bda2..846603e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -141,6 +141,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 
 	tk->ntp_error = 0;
 	tk->ntp_error_shift = NTP_SCALE_SHIFT - clock->shift;
+	tk->ntp_tick = ntpinterval << tk->ntp_error_shift;
 
 	/*
 	 * The timekeeper keeps its own mult values for the currently
@@ -1144,6 +1145,8 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	if (tk->ntp_err_mult)
 		xinterval -= tk->cycle_interval;
 
+	tk->ntp_tick = ntp_tick_length();
+
 	/* Calculate current error per tick */
 	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
 	tick_error -= (xinterval + tk->xtime_remainder);
@@ -1289,7 +1292,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
 	tk->raw_time.tv_nsec = raw_nsecs;
 
 	/* Accumulate error between NTP and clock interval */
-	tk->ntp_error += ntp_tick_length() << shift;
+	tk->ntp_error += tk->ntp_tick << shift;
 	tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) <<
 						(tk->ntp_error_shift + shift);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] [RFC] timekeeping: Calculate freq adjustment directly
  2014-05-17  0:56 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) John Stultz
  2014-05-17  0:56 ` [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
  2014-05-17  0:56 ` [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error John Stultz
@ 2014-05-17  0:56 ` John Stultz
  2014-05-19 10:14 ` [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) Miroslav Lichvar
  3 siblings, 0 replies; 15+ messages in thread
From: John Stultz @ 2014-05-17  0:56 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Miroslav Lichvar, Richard Cochran, Prarit Bhargava

After working with Mirolsav's simulator and his initial patch,
I've grown more comfortable with his approach of calculating
the freq adjustment directly using a division, rather then
trying to aproximate it over a number of ticks.

Part of the rational here is that now the error adjustment
is very small (only a 0 or 1 unit adjustment to the multiplier)
it can take quite some time to reduce any accumulated error.

The approximation method strains this, since it takes
log(adjustment) number of updates to approximate, and we can
accumulate quite a bit of error in that period.

The downside with this approach is it requires doing a 64bit
divide and a few multiplies to properly make and apply the
calculation, which will occur about every second or whenever
the ntp_tick_length() value is adjusted.

The positive side of this approach is that we see very small
error accumulation.

I do have some concern that compared with the simulator, the
scale of the error without this patch in the real world will
be hard to ever observe, the extra overhead of the computation
(particularly on 32bit systems) won't actually provide a benefit.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timekeeper_internal.h |  2 ++
 kernel/time/timekeeping.c           | 67 +++++++++++++------------------------
 2 files changed, 25 insertions(+), 44 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 4fddf6c..03e5dff 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -48,6 +48,8 @@ struct timekeeper {
 	 * ntp shifted nano seconds. */
 	u32			ntp_error_shift;
 
+	/* Mult adjustment being applied to correct freq error */
+	u32			ntp_freq_mult;
 	/* Mult adjustment being applied to correct ntp_error */
 	u32			ntp_err_mult;
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 846603e..27c54a6 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -149,6 +149,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	 * to counteract clock drifting.
 	 */
 	tk->mult = clock->mult;
+	tk->ntp_freq_mult = clock->mult;
 	tk->ntp_err_mult = 0;
 }
 
@@ -1058,20 +1059,20 @@ device_initcall(timekeeping_init_ops);
  */
 static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 							 s64 offset,
-							 bool negative,
-							 int adj_scale)
+							 int adj)
 {
 	s64 interval = tk->cycle_interval;
-	s32 mult_adj = 1;
 
-	if (negative) {
-		mult_adj = -mult_adj;
+	if (!adj)
+		return;
+
+	if (unlikely(abs(adj) > 1)) {
+		interval *= adj;
+		offset  *= adj;
+	} else if (adj < 0) {
 		interval = -interval;
 		offset  = -offset;
 	}
-	mult_adj <<= adj_scale;
-	interval <<= adj_scale;
-	offset <<= adj_scale;
 
 	/*
 	 * So the following can be confusing.
@@ -1079,7 +1080,7 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 	 * To keep things simple, lets assume mult_adj == 1 for now.
 	 *
 	 * When mult_adj != 1, remember that the interval and offset values
-	 * have been appropriately scaled so the math is the same.
+	 * have been appropriately multiplied so the math is the same.
 	 *
 	 * The basic idea here is that we're increasing the multiplier
 	 * by one, this causes the xtime_interval to be incremented by
@@ -1122,7 +1123,7 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 	 *
 	 * XXX - TODO: Doc ntp_error calculation.
 	 */
-	tk->mult += mult_adj;
+	tk->mult += adj;
 	tk->xtime_interval += interval;
 	tk->xtime_nsec -= offset;
 	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
@@ -1135,36 +1136,13 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 							s64 offset)
 {
-	s64 interval = tk->cycle_interval;
-	s64 xinterval = tk->xtime_interval;
-	s64 tick_error;
-	bool negative;
-	u32 adj;
-
-	/* Remove any current error adj from freq calculation */
-	if (tk->ntp_err_mult)
-		xinterval -= tk->cycle_interval;
-
-	tk->ntp_tick = ntp_tick_length();
-
-	/* Calculate current error per tick */
-	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-	tick_error -= (xinterval + tk->xtime_remainder);
+	s64 tick_ns;
 
-	/* Don't worry about correcting it if its small */
-	if (likely((tick_error >= 0) && (tick_error <= interval)))
+	if (likely(tk->ntp_tick == ntp_tick_length()))
 		return;
-
-	/* preserve the direction of correction */
-	negative = (tick_error < 0);
-
-	/* Sort out the magnitude of the correction */
-	tick_error = abs(tick_error);
-	for (adj = 0; tick_error > interval; adj++)
-		tick_error >>= 1;
-
-	/* scale the corrections */
-	timekeeping_apply_adjustment(tk, offset, negative, adj);
+	tk->ntp_tick = ntp_tick_length();
+	tick_ns = (tk->ntp_tick >> tk->ntp_error_shift) - tk->xtime_remainder;
+	tk->ntp_freq_mult = div64_u64(tick_ns, tk->cycle_interval);
 }
 
 /*
@@ -1173,18 +1151,19 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
  */
 static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
+	u32 new_mult;
+
 	/* Correct for the current frequency error */
 	timekeeping_freqadjust(tk, offset);
 
 	/* Next make a small adjustment to fix any cumulative error */
-	if (!tk->ntp_err_mult && (tk->ntp_error > 0)) {
+	if (tk->ntp_error > 0)
 		tk->ntp_err_mult = 1;
-		timekeeping_apply_adjustment(tk, offset, 0, 0);
-	} else if (tk->ntp_err_mult && (tk->ntp_error <= 0)) {
-		/* Undo any existing error adjustment */
-		timekeeping_apply_adjustment(tk, offset, 1, 0);
+	else
 		tk->ntp_err_mult = 0;
-	}
+
+	new_mult = tk->ntp_freq_mult + tk->ntp_err_mult;
+	timekeeping_apply_adjustment(tk, offset, new_mult - tk->mult);
 
 	if (unlikely(tk->clock->maxadj &&
 		(tk->mult > tk->clock->mult + tk->clock->maxadj))) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-05-17  0:56 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) John Stultz
                   ` (2 preceding siblings ...)
  2014-05-17  0:56 ` [PATCH 3/3] [RFC] timekeeping: Calculate freq adjustment directly John Stultz
@ 2014-05-19 10:14 ` Miroslav Lichvar
  2014-05-19 17:57   ` John Stultz
  3 siblings, 1 reply; 15+ messages in thread
From: Miroslav Lichvar @ 2014-05-19 10:14 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava

On Fri, May 16, 2014 at 05:56:41PM -0700, John Stultz wrote:
> This version of the patch set corrects a few issues Miroslav pointed
> out, as well as adapts his approach almost completely for the last
> patch. This pulls the results in to be very close to his original
> patch.

Ok, so it seems to be almost identical to my patch now. The only two
differences seem to be the removal of the ntp_error correction to
change the effective clock frequency at the current time instead of
aligning it to the start of the tick and the flag to handle the xtime
underflow.

Can we get them in too?

I think both are necessary to avoid having large steps in ntp_error
which take long time to correct. You can see this in the nohz off
freq100 result from the simulator, for example.

> I'm not 100% sure we need the last patch in this series, as
> it has additional computational cost and testing on real
> hardware has shown NOHZ=y performance matching NOHZ=n with a
> earlier version of just the first patch:
> 	https://lkml.org/lkml/2014/1/13/501
> (though admittedly, the patch has changed since Richard's testing,
> so the results are a bit stale).

If I could get a sufficiently low update rate on real HW or a more
accurate reference clock, I think I would be able to show you a
difference. But even if we can't at this time, the removal of the
complex feedback loop, which as you saw is difficult to keep working
well in all cases (and we have tested only a very small number of
them), is probably worth the extra computational cost.

Thanks,

> Below are some of the simulator results comparing this patchset
> to vanilla and Miroslav's original patch.

> Miroslav's original patch:
> --------------------------
> 
> $ ./test1.sh 
> 		freq10          freq100         dev             max
> nohz on		0.00601         0.00028         74.0            279.4
> nohz off	0.05867         0.00204         0.2             0.6

> This patchset:
> --------------
> 
> $ ./test1.sh 
> 		freq10          freq100         dev             max
> nohz on		0.00582         0.00033         74.1            279.9
> nohz off	0.06275         0.06440         0.4             1.4

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-05-19 10:14 ` [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) Miroslav Lichvar
@ 2014-05-19 17:57   ` John Stultz
  2014-05-20 10:26     ` Miroslav Lichvar
  2014-07-08 11:08     ` Miroslav Lichvar
  0 siblings, 2 replies; 15+ messages in thread
From: John Stultz @ 2014-05-19 17:57 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: LKML, Richard Cochran, Prarit Bhargava

On Mon, May 19, 2014 at 3:14 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Fri, May 16, 2014 at 05:56:41PM -0700, John Stultz wrote:
>> This version of the patch set corrects a few issues Miroslav pointed
>> out, as well as adapts his approach almost completely for the last
>> patch. This pulls the results in to be very close to his original
>> patch.
>
> Ok, so it seems to be almost identical to my patch now. The only two
> differences seem to be the removal of the ntp_error correction to
> change the effective clock frequency at the current time instead of
> aligning it to the start of the tick and the flag to handle the xtime
> underflow.
>
> Can we get them in too?

So as for the ntp_error correction when we adjust the multiplier, I
believe that is still needed, since when we adjust the freq we move
the xtime_nsec value, which means we need to adjust the error from
that xtime_nsec value to the ideal value at that point.  But if you
can provide a better rational as to why it can be safely dropped, I'd
be happy to listen.

As for the xtime underflow handling, I'm still hesitant here since its
just a bit ugly. :)   So I'm trying to think of a cleaner way.

> I think both are necessary to avoid having large steps in ntp_error
> which take long time to correct. You can see this in the nohz off
> freq100 result from the simulator, for example.

So one part of my concern is I'm not as comfortable bending over
backwards all the time to avoid ntp_error. Sure, if we can avoid it in
the common case, that makes things better, but having occasional
bursts for good reason (like the underflow issue) I think is important
so we can ensure that the logic to converge does actually function
with larger errors. If we constrain all the cases where error can be
added, I worry it will make the correction logic fragile in the end.
But I'm not totally settled on this.

Another area we have to be careful with is there are still
architectures (powerpc and ia64) which haven't switched from the old
vsyscall rounding logic (CONFIG_GENERIC_TIME_VSYSCALL_OLD). In these
cases we add up to 1ns of error each tick/update as we round up the
sub-nanoseconds to ensure we don't see inconsistencies.  If the
adjustment logic can't handle this, I don't want to regress those
arches.

>> I'm not 100% sure we need the last patch in this series, as
>> it has additional computational cost and testing on real
>> hardware has shown NOHZ=y performance matching NOHZ=n with a
>> earlier version of just the first patch:
>>       https://lkml.org/lkml/2014/1/13/501
>> (though admittedly, the patch has changed since Richard's testing,
>> so the results are a bit stale).
>
> If I could get a sufficiently low update rate on real HW or a more
> accurate reference clock, I think I would be able to show you a
> difference. But even if we can't at this time, the removal of the
> complex feedback loop, which as you saw is difficult to keep working
> well in all cases (and we have tested only a very small number of
> them), is probably worth the extra computational cost.

 I do like the straightforward aspect of calculating the freq
adjustment, but I still am worried about particularly 32bit systems
where the extra 64bit divide and multiplies may add quite a bit of
overhead in the irq path (even if it is only done ~once a second).
But I'm going to try to get some measurement numbers to validate that.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-05-19 17:57   ` John Stultz
@ 2014-05-20 10:26     ` Miroslav Lichvar
  2014-07-08 11:08     ` Miroslav Lichvar
  1 sibling, 0 replies; 15+ messages in thread
From: Miroslav Lichvar @ 2014-05-20 10:26 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava

On Mon, May 19, 2014 at 10:57:29AM -0700, John Stultz wrote:
> On Mon, May 19, 2014 at 3:14 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> > Ok, so it seems to be almost identical to my patch now. The only two
> > differences seem to be the removal of the ntp_error correction to
> > change the effective clock frequency at the current time instead of
> > aligning it to the start of the tick and the flag to handle the xtime
> > underflow.

> So as for the ntp_error correction when we adjust the multiplier, I
> believe that is still needed, since when we adjust the freq we move
> the xtime_nsec value, which means we need to adjust the error from
> that xtime_nsec value to the ideal value at that point.  But if you
> can provide a better rational as to why it can be safely dropped, I'd
> be happy to listen.

Ok, I tried to make a drawing to explain it better
http://mlichvar.fedorapeople.org/tmp/mult_update.png

My understanding of the current code is this. The update of mult value
doesn't happen exactly on tick boundary, so xtime needs to be
corrected to not break the continuity of the observed time (i.e. time
with old mult/xtime is equal to the time with new mult/xtime at the
time of the update), that's the start of the blue line in the drawing.
To make it appear as if the frequency was adjusted on tick boundary
(green line), the xtime correction is added to ntp_error and the time
is slowly corrected in the following updates (red line). 

I think that ntp_error correction is not necessary and we can stick
to the blue line. The frequency of the clock was changed at the
current time instead of the start of the current tick.

> So one part of my concern is I'm not as comfortable bending over
> backwards all the time to avoid ntp_error. Sure, if we can avoid it in
> the common case, that makes things better, but having occasional
> bursts for good reason (like the underflow issue) I think is important
> so we can ensure that the logic to converge does actually function
> with larger errors.

It will converge but very slowly, I'm not sure if that's acceptable
even if it happens rarely. If it was able to correct larger errors
quickly, we would have the other problems we were trying to fix back
again.

> If we constrain all the cases where error can be
> added, I worry it will make the correction logic fragile in the end.
> But I'm not totally settled on this.

The correction logic is simple and robust, but slow. It can be used
only if all these sources of the error are removed.

> Another area we have to be careful with is there are still
> architectures (powerpc and ia64) which haven't switched from the old
> vsyscall rounding logic (CONFIG_GENERIC_TIME_VSYSCALL_OLD). In these
> cases we add up to 1ns of error each tick/update as we round up the
> sub-nanoseconds to ensure we don't see inconsistencies.  If the
> adjustment logic can't handle this, I don't want to regress those
> arches.

Yes, this would probably need to be fixed too. If converting the
remaining vsyscalls is difficult, maybe a short-term fix could be
adding a new register for the rounding error and use it to select
which direction should be xtime rounded instead of always rounding up?

Thanks,

--
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-05-19 17:57   ` John Stultz
  2014-05-20 10:26     ` Miroslav Lichvar
@ 2014-07-08 11:08     ` Miroslav Lichvar
  2014-07-16  4:02       ` John Stultz
  1 sibling, 1 reply; 15+ messages in thread
From: Miroslav Lichvar @ 2014-07-08 11:08 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava

On Mon, May 19, 2014 at 10:57:29AM -0700, John Stultz wrote:
> Another area we have to be careful with is there are still
> architectures (powerpc and ia64) which haven't switched from the old
> vsyscall rounding logic (CONFIG_GENERIC_TIME_VSYSCALL_OLD). In these
> cases we add up to 1ns of error each tick/update as we round up the
> sub-nanoseconds to ensure we don't see inconsistencies.  If the
> adjustment logic can't handle this, I don't want to regress those
> arches.

I spent some time trying to figure out a workaround for the nanosecond
rounding, but I didn't find anything that wouldn't complicate the mult
adjustment logic and bring back the problems which the direct division
approach is supposed to solve.

It seems it may be a while before the old vsyscalls are fixed. How
about including only the first two patches from this set for now?

Thanks,

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-07-08 11:08     ` Miroslav Lichvar
@ 2014-07-16  4:02       ` John Stultz
  2014-07-16  6:59         ` Thomas Gleixner
                           ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: John Stultz @ 2014-07-16  4:02 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: LKML, Richard Cochran, Prarit Bhargava, Thomas Gleixner

On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
> On Mon, May 19, 2014 at 10:57:29AM -0700, John Stultz wrote:
>> Another area we have to be careful with is there are still
>> architectures (powerpc and ia64) which haven't switched from the old
>> vsyscall rounding logic (CONFIG_GENERIC_TIME_VSYSCALL_OLD). In these
>> cases we add up to 1ns of error each tick/update as we round up the
>> sub-nanoseconds to ensure we don't see inconsistencies.  If the
>> adjustment logic can't handle this, I don't want to regress those
>> arches.
> I spent some time trying to figure out a workaround for the nanosecond
> rounding, but I didn't find anything that wouldn't complicate the mult
> adjustment logic and bring back the problems which the direct division
> approach is supposed to solve.
>
> It seems it may be a while before the old vsyscalls are fixed. How
> about including only the first two patches from this set for now?

Hey! Sorry for the slow response here, I was on the road the last two weeks.

So thanks for the ping here. If you're happy with the first two as an
initial step, I'd be willing to try to push those in. The only trouble
is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
has cooked up. While there isn't likely to be direct conflicts in the
changes, I get nervous about mixing too many changes in subtle code at once.

But maybe I'm being overly cautious?

Thomas: Would you be ok if I push you some patches to improve ntp/ptp
freq steering w/ NOHZ for this merge window as well? So far the patches
have only shown nice improvements and no regressions, but its clearly
subtle code, and there's a lot going on in this merge window.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-07-16  4:02       ` John Stultz
@ 2014-07-16  6:59         ` Thomas Gleixner
  2014-07-16 11:57         ` Miroslav Lichvar
  2017-05-12 15:14         ` Miroslav Lichvar
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2014-07-16  6:59 UTC (permalink / raw)
  To: John Stultz; +Cc: Miroslav Lichvar, LKML, Richard Cochran, Prarit Bhargava

On Tue, 15 Jul 2014, John Stultz wrote:

> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
> > On Mon, May 19, 2014 at 10:57:29AM -0700, John Stultz wrote:
> >> Another area we have to be careful with is there are still
> >> architectures (powerpc and ia64) which haven't switched from the old
> >> vsyscall rounding logic (CONFIG_GENERIC_TIME_VSYSCALL_OLD). In these
> >> cases we add up to 1ns of error each tick/update as we round up the
> >> sub-nanoseconds to ensure we don't see inconsistencies.  If the
> >> adjustment logic can't handle this, I don't want to regress those
> >> arches.
> > I spent some time trying to figure out a workaround for the nanosecond
> > rounding, but I didn't find anything that wouldn't complicate the mult
> > adjustment logic and bring back the problems which the direct division
> > approach is supposed to solve.
> >
> > It seems it may be a while before the old vsyscalls are fixed. How
> > about including only the first two patches from this set for now?
> 
> Hey! Sorry for the slow response here, I was on the road the last two weeks.
> 
> So thanks for the ping here. If you're happy with the first two as an
> initial step, I'd be willing to try to push those in. The only trouble
> is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
> has cooked up. While there isn't likely to be direct conflicts in the
> changes, I get nervous about mixing too many changes in subtle code at once.
> 
> But maybe I'm being overly cautious?
> 
> Thomas: Would you be ok if I push you some patches to improve ntp/ptp
> freq steering w/ NOHZ for this merge window as well? So far the patches
> have only shown nice improvements and no regressions, but its clearly
> subtle code, and there's a lot going on in this merge window.

I think it's ok. The steering is orthogonal to the other stuff cooking.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-07-16  4:02       ` John Stultz
  2014-07-16  6:59         ` Thomas Gleixner
@ 2014-07-16 11:57         ` Miroslav Lichvar
  2017-05-12 15:14         ` Miroslav Lichvar
  2 siblings, 0 replies; 15+ messages in thread
From: Miroslav Lichvar @ 2014-07-16 11:57 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava, Thomas Gleixner

On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote:
> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
> > It seems it may be a while before the old vsyscalls are fixed. How
> > about including only the first two patches from this set for now?
> 
> Hey! Sorry for the slow response here, I was on the road the last two weeks.
> 
> So thanks for the ping here. If you're happy with the first two as an
> initial step, I'd be willing to try to push those in.

Ok, thanks.

> Thomas: Would you be ok if I push you some patches to improve ntp/ptp
> freq steering w/ NOHZ for this merge window as well? So far the patches
> have only shown nice improvements and no regressions, but its clearly
> subtle code, and there's a lot going on in this merge window.

For the record, here are test results from the simulator. There is a
regression in the performance with disabled nohz, it takes longer for
the clock to converge, but I think it's acceptable. When compared on
the same time scale, disabled nohz still seems to be significantly
better than nohz enabled.

3.16-rc5:
$ ./test1.sh 
                freq10          freq100         dev             max    
nohz on  	38.38368        2.72579         1468940.9       9846788.2
nohz off 	3.89181         0.10436         0.2             0.6

3.16-rc5 + two patches from this set:
$ ./test1.sh 
                freq10          freq100         dev             max    
nohz on		1.13791         0.05155         85.0            313.7
nohz off	1.02364         0.05149         0.5             1.7

Here are results from a new script I added to linux-tktest, which
I think shows better how the clock is converging.

3.16-rc5:

$ ./test2.sh 
nohz on             [1, samples/2]           [samples/2 + 1, samples]
samples         freq       dev       max      freq       dev       max
10          32.08360   16039.4   23435.5  39.61782   11077.4   30938.0
30          13.84304   35033.8   80321.5   8.52682   21495.6   52639.6
100          0.68945   31395.7   86405.6   1.26818   26426.9   98448.1
300          0.18118   33906.1  104062.0   0.11419   21147.4  102418.1
1000         0.02699   32254.1  120619.4   0.02234   21222.3  120107.6
3000         0.00515   31210.4  131688.9   0.00525   23067.8  140999.4
10000        0.00077   31904.1  149537.0   0.00079   21688.2  144957.5
30000        0.00015   31422.9  158176.5   0.00015   21867.9  159288.5
100000       0.00003   31301.3  170099.5   0.00003   22265.5  168985.6

nohz off            [1, samples/2]           [samples/2 + 1, samples]
samples         freq       dev       max      freq       dev       max
10           6.64956      14.0      18.8   7.10428       2.4       5.1
30           2.87697      13.1      33.0   0.02541       0.1       0.4
100          0.38990       9.9      35.1   0.00502       0.2       0.5
300          0.04733       6.5      42.9   0.00096       0.2       0.5
1000         0.00437       3.7      46.2   0.00023       0.2       0.6
3000         0.00049       2.2      47.1   0.00003       0.2       0.6
10000        0.00004       1.2      47.5   0.00000       0.2       0.6
30000        0.00001       0.8      47.6   0.00000       0.2       0.6
100000       0.00000       0.5      47.6   0.00000       0.2       0.6


3.16-rc5 + two patches from this set:

$ ./test2.sh          
nohz on             [1, samples/2]           [samples/2 + 1, samples]
samples         freq       dev       max      freq       dev       max
10           1.47222    1341.3    2217.8   0.06322       0.2       0.5
30           0.20799     849.5    2448.7   0.06311       0.2       0.6
100          0.04101     492.1    2895.2   0.06311       0.2       0.5
300          0.05660     295.5    3026.1   0.02064      28.3     108.9
1000         0.01994     409.8    2732.1   0.00355      13.7      52.2
3000         0.00477     469.1    3238.9   0.00070      11.0      40.9
10000        0.00081     377.3    3791.6   0.00013       9.4      36.2
30000        0.00016     259.9    4055.7   0.00004       8.9      34.1
100000       0.00003     159.0    4177.2   0.00000      13.7      58.4

nohz off            [1, samples/2]           [samples/2 + 1, samples]
samples         freq       dev       max      freq       dev       max
10           3.55062       6.2      10.8   0.05730       0.0       0.0
30           0.44672       4.5      14.1   0.05724       0.2       0.5
100          0.03649       2.7      17.4   0.05711       0.2       0.5
300          0.05815       1.7      18.7   0.06313       0.2       0.5
1000         0.06270       1.0      19.1   0.06315       0.2       0.5
3000         0.05720       1.9      19.9   0.02065       1.1       4.1
10000        0.01947      13.5      41.0   0.00339       0.5       1.7
30000        0.00448      17.5      75.9   0.00065       0.3       1.0
100000       0.00078      14.2     101.7   0.00012       0.2       0.7

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2014-07-16  4:02       ` John Stultz
  2014-07-16  6:59         ` Thomas Gleixner
  2014-07-16 11:57         ` Miroslav Lichvar
@ 2017-05-12 15:14         ` Miroslav Lichvar
  2017-05-12 17:26           ` John Stultz
  2 siblings, 1 reply; 15+ messages in thread
From: Miroslav Lichvar @ 2017-05-12 15:14 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava, Marcelo Tosatti

On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote:
> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
> > I spent some time trying to figure out a workaround for the nanosecond
> > rounding, but I didn't find anything that wouldn't complicate the mult
> > adjustment logic and bring back the problems which the direct division
> > approach is supposed to solve.
> >
> > It seems it may be a while before the old vsyscalls are fixed. How
> > about including only the first two patches from this set for now?

> So thanks for the ping here. If you're happy with the first two as an
> initial step, I'd be willing to try to push those in. The only trouble
> is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
> has cooked up. While there isn't likely to be direct conflicts in the
> changes, I get nervous about mixing too many changes in subtle code at once.

I'm sorry for digging up this skeleton. Are we any closer to being
able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented
simplifying the steering logic of the internal clock?

The two patches added in 3.17 helped, but there is still the problem
that it takes too long for the kernel clock to converge to the
internal NTP time when a large error was accumulated, e.g. a large
frequency adjustment was made to correct the initial offset of the
clock.

It seems the error can reach milliseconds and take hours or even days
to be corrected. As the rate at which the error is decreasing is not
constant (due to random clock updates), it doesn't seem to be possible
to synchronize the clock with better stability than few tens or
hundreds of nanoseconds.

With the new PTP KVM clock the problem can be easily observed. Here is
a graph of the offset and frequency as measured by chronyd when
configured to synchronize the guest's clock to the host using the
virtual PHC. In the middle is the point when the NTP error reached
zero. The apparent frequency jumped by about 50 ppb and the offset
improved by an order of magnitude.

https://mlichvar.fedorapeople.org/tmp/kvm_phc.png

I see this with real PHCs and PTP/NTP synchronization too. It's very
confusing when the timekeeping changes so much for no apparent reason.
If we can't remove the old vsyscalls yet, I was thinking maybe a new
flag could be added to adjtimex to report the error, so applications
can at least detect this problem and consider stepping the clock in
order to reset the error?

Thoughts?

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2017-05-12 15:14         ` Miroslav Lichvar
@ 2017-05-12 17:26           ` John Stultz
  2017-05-17 15:03             ` Miroslav Lichvar
  0 siblings, 1 reply; 15+ messages in thread
From: John Stultz @ 2017-05-12 17:26 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: LKML, Richard Cochran, Prarit Bhargava, Marcelo Tosatti,
	Paul Mackerras, Anton Blanchard, Benjamin Herrenschmidt,
	Tony Luck, Michael Ellerman, Fenghua Yu

On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote:
>> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
>> > I spent some time trying to figure out a workaround for the nanosecond
>> > rounding, but I didn't find anything that wouldn't complicate the mult
>> > adjustment logic and bring back the problems which the direct division
>> > approach is supposed to solve.
>> >
>> > It seems it may be a while before the old vsyscalls are fixed. How
>> > about including only the first two patches from this set for now?
>
>> So thanks for the ping here. If you're happy with the first two as an
>> initial step, I'd be willing to try to push those in. The only trouble
>> is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
>> has cooked up. While there isn't likely to be direct conflicts in the
>> changes, I get nervous about mixing too many changes in subtle code at once.
>
> I'm sorry for digging up this skeleton. Are we any closer to being
> able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented
> simplifying the steering logic of the internal clock?

Yea. I think we've waited for a few years w/o action on this from the
ppc and ia64 folks.

Probably time to put a compile warning in making it clear its going to
be removed in the next release or two to force the issue.


> With the new PTP KVM clock the problem can be easily observed. Here is
> a graph of the offset and frequency as measured by chronyd when
> configured to synchronize the guest's clock to the host using the
> virtual PHC. In the middle is the point when the NTP error reached
> zero. The apparent frequency jumped by about 50 ppb and the offset
> improved by an order of magnitude.
>
> https://mlichvar.fedorapeople.org/tmp/kvm_phc.png
>
> I see this with real PHCs and PTP/NTP synchronization too. It's very
> confusing when the timekeeping changes so much for no apparent reason.
> If we can't remove the old vsyscalls yet, I was thinking maybe a new
> flag could be added to adjtimex to report the error, so applications
> can at least detect this problem and consider stepping the clock in
> order to reset the error?
>
> Thoughts?

I'd rather not have short-term hacks that applications have to adapt.
So I think we should drop the old vsyscall method in the near term.
Sorry this sort of fell off my radar.

Do you have an updated set of patches you want to get ready to address
the issue? We can get those reviewed while we increase the pressure on
dropping the OLD_VSYSCALL implementations.

thanks
-john

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)
  2017-05-12 17:26           ` John Stultz
@ 2017-05-17 15:03             ` Miroslav Lichvar
  0 siblings, 0 replies; 15+ messages in thread
From: Miroslav Lichvar @ 2017-05-17 15:03 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Richard Cochran, Prarit Bhargava, Marcelo Tosatti,
	Paul Mackerras, Anton Blanchard, Benjamin Herrenschmidt,
	Tony Luck, Michael Ellerman, Fenghua Yu

On Fri, May 12, 2017 at 10:26:12AM -0700, John Stultz wrote:
> On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> > I see this with real PHCs and PTP/NTP synchronization too. It's very
> > confusing when the timekeeping changes so much for no apparent reason.
> > If we can't remove the old vsyscalls yet, I was thinking maybe a new
> > flag could be added to adjtimex to report the error, so applications
> > can at least detect this problem and consider stepping the clock in
> > order to reset the error?
> >
> > Thoughts?
> 
> I'd rather not have short-term hacks that applications have to adapt.
> So I think we should drop the old vsyscall method in the near term.
> Sorry this sort of fell off my radar.

Ok. Sounds good.

> Do you have an updated set of patches you want to get ready to address
> the issue? We can get those reviewed while we increase the pressure on
> dropping the OLD_VSYSCALL implementations.

I'll send an RFC series shortly.

Thanks,

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-05-17 15:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-17  0:56 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) John Stultz
2014-05-17  0:56 ` [PATCH 1/3] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
2014-05-17  0:56 ` [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error John Stultz
2014-05-17  0:56 ` [PATCH 3/3] [RFC] timekeeping: Calculate freq adjustment directly John Stultz
2014-05-19 10:14 ` [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2) Miroslav Lichvar
2014-05-19 17:57   ` John Stultz
2014-05-20 10:26     ` Miroslav Lichvar
2014-07-08 11:08     ` Miroslav Lichvar
2014-07-16  4:02       ` John Stultz
2014-07-16  6:59         ` Thomas Gleixner
2014-07-16 11:57         ` Miroslav Lichvar
2017-05-12 15:14         ` Miroslav Lichvar
2017-05-12 17:26           ` John Stultz
2017-05-17 15:03             ` Miroslav Lichvar
  -- strict thread matches above, loose matches on Subject: below --
2014-04-24 23:04 [PATCH 0/3] timekeeping: Improved NOHZ frequency steering John Stultz
2014-04-24 23:04 ` [PATCH 2/3] [RFC] timekeeping: Use cached ntp_tick_length when accumulating error John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).