[PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
@ 2014-01-07  3:57 John Stultz
  2014-01-13 17:51 ` Richard Cochran
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: John Stultz @ 2014-01-07  3:57 UTC (permalink / raw)
  To: LKML; +Cc: John Stultz, Miroslav Lichvar, Richard Cochran, Prarit Bhargava

Got a few cycles to take another look at this, and tried to address
Miroslav's latest comments. Please let me know if you have further
thoughts!

thanks
-john


The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct that error as best we can.

2) Then make a small single unit adjustment to correct any cumulative
error that we've collected.

This method performs fairly well in the simulator Miroslav created.

I also dropped the tk->ntp_error adjustments, as I've never quite
been able to recall how that was correct, and it doesn't seem to 
have any affect in the simulator. Will have to proceed carefully
with testing here.

I still think this is probably 3.15 or later material, but I'd be
very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 163 +++++++++++++++++-----------------------------
 1 file changed, 61 insertions(+), 102 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 87b4f00..15354d4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1048,54 +1048,50 @@ static int __init timekeeping_init_ops(void)
 device_initcall(timekeeping_init_ops);
 
 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Calculate the future error caused by incorrect freq value
+ * and adjust the timekeeper to correct that.
  */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-						 s64 error, s64 *interval,
-						 s64 *offset)
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+						 s64 interval,
+						 s64 offset)
 {
 	s64 tick_error, i;
-	u32 look_ahead, adj;
-	s32 error2, mult;
+	u32 adj;
+	s32 mult = 1;
 
-	/*
-	 * Use the current error value to determine how much to look ahead.
-	 * The larger the error the slower we adjust for it to avoid problems
-	 * with losing too many ticks, otherwise we would overadjust and
-	 * produce an even larger error.  The smaller the adjustment the
-	 * faster we try to adjust for it, as lost ticks can do less harm
-	 * here.  This is tuned so that an error of about 1 msec is adjusted
-	 * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-	 */
-	error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-	error2 = abs(error2);
-	for (look_ahead = 0; error2 > 0; look_ahead++)
-		error2 >>= 2;
+	/* Calculate current error per tick */
+	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+	tick_error -= tk->xtime_interval;
 
-	/*
-	 * Now calculate the error in (1 << look_ahead) ticks, but first
-	 * remove the single look ahead already included in the error.
-	 */
-	tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
-	tick_error -= tk->xtime_interval >> 1;
-	error = ((error - tick_error) >> look_ahead) + tick_error;
+	/* Don't worry about correcting it if its small */
+	if (likely(abs(tick_error) < 2*interval))
+		return;
 
-	/* Finally calculate the adjustment shift value.  */
-	i = *interval;
-	mult = 1;
-	if (error < 0) {
-		error = -error;
-		*interval = -*interval;
-		*offset = -*offset;
-		mult = -1;
+	if (tick_error < 0) {
+		interval = -interval;
+		offset = -offset;
+		mult = -mult;
 	}
-	for (adj = 0; error > i; adj++)
-		error >>= 1;
 
-	*interval <<= adj;
-	*offset <<= adj;
-	return mult << adj;
+	/* Sort out the magnitude of the correction */
+	tick_error = abs(tick_error);
+	i = abs(interval);
+	for (adj = 0; tick_error > i; adj++)
+		tick_error >>= 1;
+
+	/* scale the corrections */
+	interval <<= adj;
+	offset <<= adj;
+	mult <<= adj;
+
+	/*
+	 * Apply the correction to the timekeeper.
+	 * See long comment in timekeeping_adjust to explain the math.
+	 */
+	tk->mult += mult;
+	tk->xtime_interval += interval;
+	tk->xtime_nsec -= offset;
+
 }
 
 /*
@@ -1108,65 +1104,23 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	s64 error, interval = tk->cycle_interval;
 	int adj;
 
-	/*
-	 * The point of this is to check if the error is greater than half
-	 * an interval.
-	 *
-	 * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
-	 *
-	 * Note we subtract one in the shift, so that error is really error*2.
-	 * This "saves" dividing(shifting) interval twice, but keeps the
-	 * (error > interval) comparison as still measuring if error is
-	 * larger than half an interval.
-	 *
-	 * Note: It does not "save" on aggravation when reading the code.
-	 */
-	error = tk->ntp_error >> (tk->ntp_error_shift - 1);
-	if (error > interval) {
-		/*
-		 * We now divide error by 4(via shift), which checks if
-		 * the error is greater than twice the interval.
-		 * If it is greater, we need a bigadjust, if its smaller,
-		 * we can adjust by 1.
-		 */
-		error >>= 2;
-		/*
-		 * XXX - In update_wall_time, we round up to the next
-		 * nanosecond, and store the amount rounded up into
-		 * the error. This causes the likely below to be unlikely.
-		 *
-		 * The proper fix is to avoid rounding up by using
-		 * the high precision tk->xtime_nsec instead of
-		 * xtime.tv_nsec everywhere. Fixing this will take some
-		 * time.
-		 */
-		if (likely(error <= interval))
-			adj = 1;
-		else
-			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-	} else {
-		if (error < -interval) {
-			/* See comment above, this is just switched for the negative */
-			error >>= 2;
-			if (likely(error >= -interval)) {
-				adj = -1;
-				interval = -interval;
-				offset = -offset;
-			} else {
-				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-			}
-		} else {
-			goto out_adjust;
-		}
-	}
+	/* First correct for the current frequency error */
+	timekeeping_freqadjust(tk, interval, offset);
+
+
+	/* Next make a small adjustment to fix any cumulative error */
+	error = tk->ntp_error >> tk->ntp_error_shift;
+	if (likely(abs(error) <= interval/2))
+		goto out_adjust;
+
+	if (error < 0) {
+		adj = -1;
+		interval = -interval;
+		offset = -offset;
+	} else
+		adj = 1;
+
 
-	if (unlikely(tk->clock->maxadj &&
-		(tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
-		printk_once(KERN_WARNING
-			"Adjusting %s more than 11%% (%ld vs %ld)\n",
-			tk->clock->name, (long)tk->mult + adj,
-			(long)tk->clock->mult + tk->clock->maxadj);
-	}
 	/*
 	 * So the following can be confusing.
 	 *
@@ -1213,15 +1167,21 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	 *	xtime_nsec_2 = xtime_nsec_1 - offset
 	 * Which simplfies to:
 	 *	xtime_nsec -= offset
-	 *
-	 * XXX - TODO: Doc ntp_error calculation.
 	 */
 	tk->mult += adj;
 	tk->xtime_interval += interval;
 	tk->xtime_nsec -= offset;
-	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 
 out_adjust:
+
+	if (unlikely(tk->clock->maxadj &&
+		(tk->mult > tk->clock->mult + tk->clock->maxadj))) {
+		printk_once(KERN_WARNING
+			"Adjusting %s more than 11%% (%ld vs %ld)\n",
+			tk->clock->name, (long)tk->mult,
+			(long)tk->clock->mult + tk->clock->maxadj);
+	}
+
 	/*
 	 * It may be possible that when we entered this function, xtime_nsec
 	 * was very small.  Further, if we're slightly speeding the clocksource
@@ -1241,7 +1201,6 @@ out_adjust:
 		tk->xtime_nsec = 0;
 		tk->ntp_error += neg << tk->ntp_error_shift;
 	}
-
 }
 
 /**
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-07  3:57 [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
@ 2014-01-13 17:51 ` Richard Cochran
  2014-01-13 18:15   ` John Stultz
  2014-02-07 11:45 ` Miroslav Lichvar
  2014-02-12 15:42 ` Miroslav Lichvar
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Cochran @ 2014-01-13 17:51 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Miroslav Lichvar, Prarit Bhargava

On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> 
> I still think this is probably 3.15 or later material, but I'd be
> very interested in feedback, thoughts, and testing.

Over the weekend I did a short test of this new approach. I compiled
two kernels, one plain v3.12.7 and one with the proposed fix. The
test was run on three different kernel variants, the current nohz
kernel, the same with nohz=off, and the same but with John's patch.

I used a simple test script (see below). Using a PCIe express card as
a PHC, the Intel Corporation 82574L, I simply let the this clock run
free (not sync'ed to gps or anything), and then synchronized the Linux
system clock to the PHC using the phc2sys program with a sample rate
of once every 32 seconds. Each of the tests ran for at least three
hours on a system without any other load.

- Linux 3.12.7-nohz-plain-20140106                nohz-plain.log
- Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF       periodic-plain.log
- Linux 3.12.7-nohz-fix-20140106-00001-gd753140   nohz-fix.log

The performance in the log files as reflected in the clock offset is
summarized in this table. The values are in nanoseconds.

 |         | periodic-plain |      nohz-fix |    nohz-plain |
 |---------+----------------+---------------+---------------|
 | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
 | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
 | mean    |  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
 | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

I also made two graphs.

   http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
   http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

(The log files and scripts are also in that directory.)

So in this test, it looks like the new approach performed at least as
well as using regular, periodic ticks.

Thanks,
Richard

---
# nohz-fix-testing.sh
#
# set the ptp clock time from the system time
#
./testptp -s; ./testptp -g; date
#
# wait a bit to let the clocks drift
#
sleep 10
#
# start the servo
#
./phc2sys -s eth4 -m -q -l7 -O0 -P0 -I0 -R.03125

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-13 17:51 ` Richard Cochran
@ 2014-01-13 18:15   ` John Stultz
  2014-01-14  7:07     ` Richard Cochran
  2014-01-28 17:58     ` Richard Cochran
  0 siblings, 2 replies; 11+ messages in thread
From: John Stultz @ 2014-01-13 18:15 UTC (permalink / raw)
  To: Richard Cochran; +Cc: LKML, Miroslav Lichvar, Prarit Bhargava

On 01/13/2014 09:51 AM, Richard Cochran wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> I still think this is probably 3.15 or later material, but I'd be
>> very interested in feedback, thoughts, and testing.
> Over the weekend I did a short test of this new approach. I compiled
> two kernels, one plain v3.12.7 and one with the proposed fix. The
> test was run on three different kernel variants, the current nohz
> kernel, the same with nohz=off, and the same but with John's patch.
>
> I used a simple test script (see below). Using a PCIe express card as
> a PHC, the Intel Corporation 82574L, I simply let the this clock run
> free (not sync'ed to gps or anything), and then synchronized the Linux
> system clock to the PHC using the phc2sys program with a sample rate
> of once every 32 seconds. Each of the tests ran for at least three
> hours on a system without any other load.
>
> - Linux 3.12.7-nohz-plain-20140106                nohz-plain.log
> - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF       periodic-plain.log
> - Linux 3.12.7-nohz-fix-20140106-00001-gd753140   nohz-fix.log
>
> The performance in the log files as reflected in the clock offset is
> summarized in this table. The values are in nanoseconds.
>
>  |         | periodic-plain |      nohz-fix |    nohz-plain |
>  |---------+----------------+---------------+---------------|
>  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
>  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
>  | mean    |  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
>  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
>
> I also made two graphs.
>
>    http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
>    http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
>
> (The log files and scripts are also in that directory.)
>
> So in this test, it looks like the new approach performed at least as
> well as using regular, periodic ticks.

That's great to hear! Thanks so much, I really appreciate the testing!
And this is with HZ=?

If you do get a chance to look again, I'd also be interested if running
with nohz=off w/ the fix doesn't show any regression compared to the
unmodified nohz=off case.

I've been trying to validate the ntpd case to ensure there aren't
regressions when there's more erratic steering, but unfortunately got
side-tracked with what seems to be an ntpd/virtualization issue.

thanks
-john








^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-13 18:15   ` John Stultz
@ 2014-01-14  7:07     ` Richard Cochran
  2014-01-28 17:58     ` Richard Cochran
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Cochran @ 2014-01-14  7:07 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Miroslav Lichvar, Prarit Bhargava

On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
> That's great to hear! Thanks so much, I really appreciate the testing!
> And this is with HZ=?

HZ=1000
 
> If you do get a chance to look again, I'd also be interested if running
> with nohz=off w/ the fix doesn't show any regression compared to the
> unmodified nohz=off case.

Okay, will do.
 
Thanks,
Richard

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-13 18:15   ` John Stultz
  2014-01-14  7:07     ` Richard Cochran
@ 2014-01-28 17:58     ` Richard Cochran
  2014-01-29 17:56       ` John Stultz
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Cochran @ 2014-01-28 17:58 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Miroslav Lichvar, Prarit Bhargava

I tested for a regression using the patched kernel with the nohz=off
command line option...

On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
> On 01/13/2014 09:51 AM, Richard Cochran wrote:
> >
> > - Linux 3.12.7-nohz-plain-20140106                nohz-plain.log
> > - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF       periodic-plain.log
> > - Linux 3.12.7-nohz-fix-20140106-00001-gd753140   nohz-fix.log

Regression check:

 - Linux homeboy 3.12.7-nohz-fix-20140106-00001-gd753140 NOHZ=OFF
   nohz-fix-periodic.log

> > The performance in the log files as reflected in the clock offset is
> > summarized in this table. The values are in nanoseconds.
> >
> >  |         | periodic-plain |      nohz-fix |    nohz-plain |
> >  |---------+----------------+---------------+---------------|
> >  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
> >  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
> >  | mean    |  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
> >  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |

Comparing the nohz=off case with and without the patch, the three hour
test looks like this.

  |         | periodic-plain | nohz-fix-periodic |
  |---------+----------------+-------------------+
  | minimum |  -1.599000e+03 |     -1.427000e+03 |
  | maximum |  +1.311000e+03 |     +1.279000e+03 |
  | mean    |  +9.880240e-02 |     -2.710778e+01 |
  | stddev  |  +4.610021e+02 |     +3.974372e+02 |

> >    http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
> >    http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png

I also made a third graph showing before and after the patch.

    http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png

> If you do get a chance to look again, I'd also be interested if running
> with nohz=off w/ the fix doesn't show any regression compared to the
> unmodified nohz=off case.

Looks like there is no regression.

Thanks,
Richard



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-28 17:58     ` Richard Cochran
@ 2014-01-29 17:56       ` John Stultz
  0 siblings, 0 replies; 11+ messages in thread
From: John Stultz @ 2014-01-29 17:56 UTC (permalink / raw)
  To: Richard Cochran; +Cc: LKML, Miroslav Lichvar, Prarit Bhargava

On Tue, Jan 28, 2014 at 9:58 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> I tested for a regression using the patched kernel with the nohz=off
> command line option...
>
> On Mon, Jan 13, 2014 at 10:15:18AM -0800, John Stultz wrote:
>> On 01/13/2014 09:51 AM, Richard Cochran wrote:
>> >
>> > - Linux 3.12.7-nohz-plain-20140106                nohz-plain.log
>> > - Linux 3.12.7-nohz-plain-20140106 NOHZ=OFF       periodic-plain.log
>> > - Linux 3.12.7-nohz-fix-20140106-00001-gd753140   nohz-fix.log
>
> Regression check:
>
>  - Linux homeboy 3.12.7-nohz-fix-20140106-00001-gd753140 NOHZ=OFF
>    nohz-fix-periodic.log
>
>> > The performance in the log files as reflected in the clock offset is
>> > summarized in this table. The values are in nanoseconds.
>> >
>> >  |         | periodic-plain |      nohz-fix |    nohz-plain |
>> >  |---------+----------------+---------------+---------------|
>> >  | minimum |  -1.599000e+03 | -1.051000e+03 | -5.373700e+04 |
>> >  | maximum |  +1.311000e+03 | +1.048000e+03 | +6.389500e+04 |
>> >  | mean    |  +9.880240e-02 | -7.747305e+01 | +1.597904e+01 |
>> >  | stddev  |  +4.610021e+02 | +3.960978e+02 | +1.491263e+04 |
>
> Comparing the nohz=off case with and without the patch, the three hour
> test looks like this.
>
>   |         | periodic-plain | nohz-fix-periodic |
>   |---------+----------------+-------------------+
>   | minimum |  -1.599000e+03 |     -1.427000e+03 |
>   | maximum |  +1.311000e+03 |     +1.279000e+03 |
>   | mean    |  +9.880240e-02 |     -2.710778e+01 |
>   | stddev  |  +4.610021e+02 |     +3.974372e+02 |
>
>> >    http://linuxptp.sourceforge.net/nohz-fix/current_nohz.png
>> >    http://linuxptp.sourceforge.net/nohz-fix/periodic_vs_fix.png
>
> I also made a third graph showing before and after the patch.
>
>     http://linuxptp.sourceforge.net/nohz-fix/nohz_regression.png
>
>> If you do get a chance to look again, I'd also be interested if running
>> with nohz=off w/ the fix doesn't show any regression compared to the
>> unmodified nohz=off case.
>
> Looks like there is no regression.

That's great to hear! Looks like we might be able to queue it for 3.15...

Thanks so much again for the detailed testing! I really appreciate it!

thanks
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-07  3:57 [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
  2014-01-13 17:51 ` Richard Cochran
@ 2014-02-07 11:45 ` Miroslav Lichvar
  2014-02-07 18:21   ` John Stultz
  2014-02-12 15:42 ` Miroslav Lichvar
  2 siblings, 1 reply; 11+ messages in thread
From: Miroslav Lichvar @ 2014-02-07 11:45 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava

On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> Got a few cycles to take another look at this, and tried to address
> Miroslav's latest comments. Please let me know if you have further
> thoughts!

I've had finally some time to look at this, sorry for the delay.

> I also dropped the tk->ntp_error adjustments, as I've never quite
> been able to recall how that was correct, and it doesn't seem to 
> have any affect in the simulator. Will have to proceed carefully
> with testing here.

I see some effect of the ntp_error correction in the simulator, but it
seems rather disruptive than helpful. Perhaps the correction was meant
to account the fact that for the ntp_error accumulation ntp_tick
should change at the time when mult is changed instead of start of the
tick. I tried to find that correction and came up with this:

(ntp_tick1 - ntp_tick2) * offset / interval + offset

That doesn't look usable. But I don't think it's really necessary to
have any correction for that and I'm for dropping that line from the
code as your patch does.

Anyway, it seems there is a different problem in the code. I modified
the simulator to see how the code works when the clocksource frequency
is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
ntp_error doesn't settle down and the clock has a large frequency
error.

On top of your patch, this fixes the problem for me:

--- a/kernel/time/timekeeping.c  
+++ b/kernel/time/timekeeping.c
@@ -1065,7 +1065,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,  

        /* Calculate current error per tick */
        tick_error = ntp_tick_length() >> tk->ntp_error_shift;
-       tick_error -= tk->xtime_interval;
+       tick_error -= tk->xtime_interval + tk->xtime_remainder;

        /* Don't worry about correcting it if its small */
        if (likely(abs(tick_error) < 2*interval))

My patch has this problem too. The original code seems to be affected
to some extent, it's able to converge to the correct frequency, but
there is some residual ntp error. Adding xtime_remainder to
timekeeping_bigadjust() fixes that.

I've some comments on the performance of the proposed code, I'll send
them in a separate mail later.

Thanks,

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-02-07 11:45 ` Miroslav Lichvar
@ 2014-02-07 18:21   ` John Stultz
  0 siblings, 0 replies; 11+ messages in thread
From: John Stultz @ 2014-02-07 18:21 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: LKML, Richard Cochran, Prarit Bhargava

On 02/07/2014 03:45 AM, Miroslav Lichvar wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> Got a few cycles to take another look at this, and tried to address
>> Miroslav's latest comments. Please let me know if you have further
>> thoughts!
> I've had finally some time to look at this, sorry for the delay.
>
>> I also dropped the tk->ntp_error adjustments, as I've never quite
>> been able to recall how that was correct, and it doesn't seem to 
>> have any affect in the simulator. Will have to proceed carefully
>> with testing here.
> I see some effect of the ntp_error correction in the simulator, but it
> seems rather disruptive than helpful. Perhaps the correction was meant
> to account the fact that for the ntp_error accumulation ntp_tick
> should change at the time when mult is changed instead of start of the
> tick. I tried to find that correction and came up with this:
Yes, so I actually re-added the logic a few days after sending that
patch out.

I realized the issue is that for the accumulation point, we're adjusting
the time forward or backwards, so with the new freq the non-accumulated
offset lines up with the current time. Thus the ntp_error is the error
at that accumulation point, which also needs to be adjusted
appropriately (apologies its much easier to see when drawn).

> (ntp_tick1 - ntp_tick2) * offset / interval + offset
>
> That doesn't look usable. But I don't think it's really necessary to
> have any correction for that and I'm for dropping that line from the
> code as your patch does.

I'll have to try to look over your suggestion here another day. I
unfortunately have a head cold at the moment, so any complicated math is
a bit treacherous. :)


> Anyway, it seems there is a different problem in the code. I modified
> the simulator to see how the code works when the clocksource frequency
> is not divisible by HZ. With TSC_FREQ set to 3579545 (acpi_pm freq)
> ntp_error doesn't settle down and the clock has a large frequency
> error.
>
> On top of your patch, this fixes the problem for me:
>
> --- a/kernel/time/timekeeping.c  
> +++ b/kernel/time/timekeeping.c
> @@ -1065,7 +1065,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,  
>   
>         /* Calculate current error per tick */
>         tick_error = ntp_tick_length() >> tk->ntp_error_shift;
> -       tick_error -= tk->xtime_interval;
> +       tick_error -= tk->xtime_interval + tk->xtime_remainder;
>
>         /* Don't worry about correcting it if its small */
>         if (likely(abs(tick_error) < 2*interval))
>
> My patch has this problem too. The original code seems to be affected
> to some extent, it's able to converge to the correct frequency, but
> there is some residual ntp error. Adding xtime_remainder to
> timekeeping_bigadjust() fixes that.
>
> I've some comments on the performance of the proposed code, I'll send
> them in a separate mail later.

Ok.. I'll look into this as well.  Thanks so much for your review and fixes!

thanks!
-john


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-01-07  3:57 [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
  2014-01-13 17:51 ` Richard Cochran
  2014-02-07 11:45 ` Miroslav Lichvar
@ 2014-02-12 15:42 ` Miroslav Lichvar
  2014-04-24  4:22   ` John Stultz
  2 siblings, 1 reply; 11+ messages in thread
From: Miroslav Lichvar @ 2014-02-12 15:42 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava

[-- Attachment #1: Type: text/plain, Size: 4132 bytes --]

On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
> Got a few cycles to take another look at this, and tried to address
> Miroslav's latest comments. Please let me know if you have further
> thoughts!

In the simulations this version of the patch does indeed work better
than the previous one. I tested it with the ntp_error correction added
back as per your previous mail. It converges, the measured frequency
matches the value set by adjtimex and the ntp error stays close to
zero. I'm concerned about two things now and I'm not sure if they can
be fixed easily.

One is that the convergence still seems too slow to me.

./tk_test -t 500 -n 4000
samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: -100.08081
samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: -100.07168
samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: -100.07393
samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: -100.07177
samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: -100.03978
samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: -99.99987
samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: -99.99996
samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: -99.99990

You can see in this test it takes about 2500 updates to correct the
initial ntp error and settle down. That's with 1GHz clocksource. In
some tests I did with smaller clock frequencies or different frequency
offsets it took much longer than that.

$ ./tk_test -s 2500 -n 5000
samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: -100.00000

Here the regression line is calculated only through the latter half of
the samples (where it was already settled down) and we can see the
total ntp error corrected since the first update is around 390
microseconds.

I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
controllers seem to handle this slowness in the frequency adjustment
well. I'm more worried about designs that may change the frequency
quickly and rely on accurate prediction of the clock in their feedback
loop.

The other thing I'm concerned about is that the multiplier doesn't
change only between two closest values when the loop has settled down,
but in a larger range. With the 1GHz clock I see the multiplier is
moving in range of 8387767-8387772, which makes the ntp error several
times larger than it would be if the multiplier switched just between
8387769 and 8387770.

Let's make a comparison between the current kernel (A), your patch
(B), and my patch (C). The script used for these tests is in the
tktest git as test1.sh. The first two columns are errors in the
measured frequency offsets (in ppm) from the first 10 and 100 samples. The
other two columns are time error deviation and maximum (in
nanoseconds) when the loop has converged. Both nohz and nohz off are
tested. The tests are repeated with 100 slightly different frequencies
and the mean value or stddev is presented.

                freq10       freq100      dev          max
A, nohz on      38.38368     2.72579      1468940.9    9846788.2
B, nohz on      1.37940      0.15085      298.5        1312.5  
C, nohz on      0.00601      0.00028      74.0         279.4
A, nohz off     3.89181      0.10436      0.2          0.6
B, nohz off     1.29396      0.14372      0.2          0.6
C, nohz off     0.05867      0.00204      0.2          0.6

In these tests A with nohz is really bad. C is much better than B in
the frequency control with both nohz enabled and disabled. As for the
time error, with nohz disabled all perform similarly, with nohz
enabled C is about 4 times better than B.

I've attached the latest version of my patch. Some bugfixes and
optimizations were made, but that division is still used in the code.
I've noticed there is a division in logarithmic_accumulation(), which
is used as a workaround for an older compiler bug. Perhaps it could be
removed now, so there is more space for this one?

Please reconsider.

Thanks,

-- 
Miroslav Lichvar

[-- Attachment #2: 0001-timekeeping-Fix-clock-stability-with-nohz.patch --]
[-- Type: text/plain, Size: 10770 bytes --]

>From 150e7c8ed96d078d025f154c219c0a4367a401fc Mon Sep 17 00:00:00 2001
From: Miroslav Lichvar <mlichvar@redhat.com>
Date: Thu, 14 Nov 2013 11:06:02 +0100
Subject: [PATCHv2] timekeeping: Fix clock stability with nohz

Since commit 5eb6d205, NTP corrections are accumulated independently
from the system clock and the clock multiplier is adjusted in a feedback
loop according to the current error between the system time and NTP
time.

This works well when the multiplier is updated often and regularly. With
nohz and idle system, however, the update interval is so long that the
loop becomes unstable. The frequency of the clock doesn't settle down
and there is a large time error, which seems to grow quadratically with
update interval.

If the constants used in the loop were modified for a maximum update
interval, it would be stable, but too slow to keep up with NTP. Without
knowing when will be the next update it's difficult to implement a loop
that is both fast and stable.

This patch fixes the problem by postponing update of NTP tick length in
the clock and setting the multiplier directly without feedback loop by
dividing the tick length with clock cycles. Previously, a change in tick
length was applied immediately to all ticks since last update, which
caused a jump in the NTP error proportional to the change and the update
interval and which had to be corrected later by the loop.

The only source of the accounted NTP error is now lack of resolution in
the clock multiplier. The error is corrected by adding 0 or 1 to the
calculated multiplier. With a constant tick length, the multiplier will
be switching between the two values. The loop is greatly simplified and
there is no problem with stability. The maximum error is limited by the
update interval.

In a simulation with 1GHz TSC clock and 10Hz clock update the maximum
error went down from 4.7 microseconds to 5.5 nanoseconds. With 1Hz
update the maximum error went down from 480 microseconds to 55
nanoseconds.

In a real test on idle machine comparing raw TSC and clock_gettime()
time stamps, the maximum error went down from microseconds to tens of
nanoseconds. A test with clock synchronized to a PTP hardware clock by
phc2sys from linuxptp now shows no difference when running with nohz
enabled and disabled, the clock seems to be stable to few tens of
nanoseconds.

Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
---
 include/linux/timekeeper_internal.h |   6 ++
 kernel/time/timekeeping.c           | 156 ++++++++++++------------------------
 2 files changed, 58 insertions(+), 104 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index c1825eb..d55a12a 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -34,12 +34,18 @@ struct timekeeper {
 	/* Clock shifted nano seconds */
 	u64			xtime_nsec;

+	/* Tick length used in calculation of NTP error. */
+	u64			ntp_tick;
 	/* Difference between accumulated time and NTP time in ntp
 	 * shifted nano seconds. */
 	s64			ntp_error;
 	/* Shift conversion between clock shifted nano seconds and
 	 * ntp shifted nano seconds. */
 	u32			ntp_error_shift;
+	/* Correction applied to mult to reduce the error. */
+	s32			mult_ntp_correction;
+	/* Flag used to avoid updating NTP twice with same second */
+	u32			skip_second_overflow;

 	/*
 	 * wall_to_monotonic is what we need to add to xtime (or xtime corrected
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0aa4ce8..b94f1df 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -147,6 +147,10 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	 * to counteract clock drifting.
 	 */
 	tk->mult = clock->mult;
+
+	tk->ntp_tick = ntpinterval << tk->ntp_error_shift;
+	tk->mult_ntp_correction = 0;
+	tk->skip_second_overflow = 0;
 }

 /* Timekeeper helper functions. */
@@ -1052,106 +1056,49 @@ static int __init timekeeping_init_ops(void)
 device_initcall(timekeeping_init_ops);

 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
- */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-						 s64 error, s64 *interval,
-						 s64 *offset)
-{
-	s64 tick_error, i;
-	u32 look_ahead, adj;
-	s32 error2, mult;
-
-	/*
-	 * Use the current error value to determine how much to look ahead.
-	 * The larger the error the slower we adjust for it to avoid problems
-	 * with losing too many ticks, otherwise we would overadjust and
-	 * produce an even larger error.  The smaller the adjustment the
-	 * faster we try to adjust for it, as lost ticks can do less harm
-	 * here.  This is tuned so that an error of about 1 msec is adjusted
-	 * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-	 */
-	error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-	error2 = abs(error2);
-	for (look_ahead = 0; error2 > 0; look_ahead++)
-		error2 >>= 2;
-
-	/*
-	 * Now calculate the error in (1 << look_ahead) ticks, but first
-	 * remove the single look ahead already included in the error.
-	 */
-	tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
-	tick_error -= tk->xtime_interval >> 1;
-	error = ((error - tick_error) >> look_ahead) + tick_error;
-
-	/* Finally calculate the adjustment shift value.  */
-	i = *interval;
-	mult = 1;
-	if (error < 0) {
-		error = -error;
-		*interval = -*interval;
-		*offset = -*offset;
-		mult = -1;
-	}
-	for (adj = 0; error > i; adj++)
-		error >>= 1;
-
-	*interval <<= adj;
-	*offset <<= adj;
-	return mult << adj;
-}
-
-/*
  * Adjust the multiplier to reduce the error value,
  * this is optimized for the most common adjustments of -1,0,1,
  * for other values we can do a bit more work.
  */
 static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-	s64 error, interval = tk->cycle_interval;
+	s64 interval = tk->cycle_interval;
+	u32 mult;
 	int adj;

 	/*
-	 * The point of this is to check if the error is greater than half
-	 * an interval.
-	 *
-	 * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
-	 *
-	 * Note we subtract one in the shift, so that error is really error*2.
-	 * This "saves" dividing(shifting) interval twice, but keeps the
-	 * (error > interval) comparison as still measuring if error is
-	 * larger than half an interval.
-	 *
-	 * Note: It does not "save" on aggravation when reading the code.
+	 * Get multiplier matching the current NTP tick length. Use division
+	 * only when the tick length has changed (normally once per second
+	 * with active NTP PLL).
 	 */
-	error = tk->ntp_error >> (tk->ntp_error_shift - 1);
-	if (error > interval) {
-		/*
-		 * We now divide error by 4(via shift), which checks if
-		 * the error is greater than twice the interval.
-		 * If it is greater, we need a bigadjust, if its smaller,
-		 * we can adjust by 1.
-		 */
-		error >>= 2;
-		if (likely(error <= interval))
-			adj = 1;
-		else
-			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
+	if (tk->ntp_tick == ntp_tick_length()) {
+		mult = tk->mult - tk->mult_ntp_correction;
 	} else {
-		if (error < -interval) {
-			/* See comment above, this is just switched for the negative */
-			error >>= 2;
-			if (likely(error >= -interval)) {
-				adj = -1;
-				interval = -interval;
-				offset = -offset;
-			} else {
-				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-			}
-		} else {
-			goto out_adjust;
-		}
+		tk->ntp_tick = ntp_tick_length();
+		mult = div64_u64((tk->ntp_tick >> tk->ntp_error_shift) -
+				 tk->xtime_remainder, tk->cycle_interval);
+	}
+
+	/*
+	 * Speed the clock up by 1 if it's behind NTP time. If there is no
+	 * remainder from the tick division, the clock will stay ahead
+	 * of NTP time until a non-divisible tick length is encountered.
+	 * (slowing the clock by 1 would increase the frequency error)
+	 */
+	tk->mult_ntp_correction = tk->ntp_error > 0 ? 1 : 0;
+	mult += tk->mult_ntp_correction;
+	adj = mult - tk->mult;
+
+	if (adj == 0)
+		goto out_adjust;
+
+	/* Multiply offset and interval by adj */
+	if (adj == -1) {
+		interval = -interval;
+		offset = -offset;
+	} else if (adj != 1) {
+		interval *= adj;
+		offset *= adj;
 	}

 	if (unlikely(tk->clock->maxadj &&
@@ -1207,13 +1154,10 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	 *	xtime_nsec_2 = xtime_nsec_1 - offset
 	 * Which simplfies to:
 	 *	xtime_nsec -= offset
-	 *
-	 * XXX - TODO: Doc ntp_error calculation.
 	 */
 	tk->mult += adj;
 	tk->xtime_interval += interval;
 	tk->xtime_nsec -= offset;
-	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;

 out_adjust:
 	/*
@@ -1222,18 +1166,13 @@ out_adjust:
 	 * in the code above, its possible the required corrective factor to
 	 * xtime_nsec could cause it to underflow.
 	 *
-	 * Now, since we already accumulated the second, cannot simply roll
-	 * the accumulated second back, since the NTP subsystem has been
-	 * notified via second_overflow. So instead we push xtime_nsec forward
-	 * by the amount we underflowed, and add that amount into the error.
-	 *
-	 * We'll correct this error next time through this function, when
-	 * xtime_nsec is not as small.
+	 * Now, since we already accumulated the second and the NTP subsystem has
+	 * been notified via second_overflow(), we need to skip the next update.
 	 */
 	if (unlikely((s64)tk->xtime_nsec < 0)) {
-		s64 neg = -(s64)tk->xtime_nsec;
-		tk->xtime_nsec = 0;
-		tk->ntp_error += neg << tk->ntp_error_shift;
+		tk->xtime_nsec += (u64)NSEC_PER_SEC << tk->shift;
+		tk->xtime_sec--;
+		tk->skip_second_overflow = 1;
 	}

 }
@@ -1257,6 +1196,15 @@ static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
 		tk->xtime_nsec -= nsecps;
 		tk->xtime_sec++;

+		/*
+		 * Skip NTP update if this second was accumulated before,
+		 * i.e. xtime_nsec underflowed in timekeeping_adjust()
+		 */
+		if (unlikely(tk->skip_second_overflow)) {
+			tk->skip_second_overflow = 0;
+			continue;
+		}
+
 		/* Figure out if its a leap sec and apply if needed */
 		leap = second_overflow(tk->xtime_sec);
 		if (unlikely(leap)) {
@@ -1315,7 +1263,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
 	tk->raw_time.tv_nsec = raw_nsecs;

 	/* Accumulate error between NTP and clock interval */
-	tk->ntp_error += ntp_tick_length() << shift;
+	tk->ntp_error += tk->ntp_tick << shift;
 	tk->ntp_error -= (tk->xtime_interval + tk->xtime_remainder) <<
 						(tk->ntp_error_shift + shift);

@@ -1401,7 +1349,7 @@ void update_wall_time(void)
 			shift--;
 	}

-	/* correct the clock when NTP error is too big */
+	/* Update the multiplier */
 	timekeeping_adjust(tk, offset);

 	/*
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-02-12 15:42 ` Miroslav Lichvar
@ 2014-04-24  4:22   ` John Stultz
  2014-04-24 11:28     ` Miroslav Lichvar
  0 siblings, 1 reply; 11+ messages in thread
From: John Stultz @ 2014-04-24  4:22 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: LKML, Richard Cochran, Prarit Bhargava

Hey Miroslav!

  Once again, a few months pass and I finally get some more time to look
at this. :( Sorry for how slow this has been going.

On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
> On Mon, Jan 06, 2014 at 07:57:03PM -0800, John Stultz wrote:
>> Got a few cycles to take another look at this, and tried to address
>> Miroslav's latest comments. Please let me know if you have further
>> thoughts!
> In the simulations this version of the patch does indeed work better
> than the previous one. I tested it with the ntp_error correction added
> back as per your previous mail. It converges, the measured frequency
> matches the value set by adjtimex and the ntp error stays close to
> zero. I'm concerned about two things now and I'm not sure if they can
> be fixed easily.
>
> One is that the convergence still seems too slow to me.
>
> ./tk_test -t 500 -n 4000
> samples: 1-500 reg: 1-500 slope: 1.00 dev: 1947.5 max: 36940.9 freq: -100.08081
> samples: 501-1000 reg: 501-1000 slope: 1.00 dev: 430.7 max: 1669.6 freq: -100.07168
> samples: 1001-1500 reg: 1001-1500 slope: 1.00 dev: 675.3 max: 3172.9 freq: -100.07393
> samples: 1501-2000 reg: 1501-2000 slope: 1.00 dev: 453.9 max: 2223.4 freq: -100.07177
> samples: 2001-2500 reg: 2001-2500 slope: 1.00 dev: 2601.2 max: 10875.4 freq: -100.03978
> samples: 2501-3000 reg: 2501-3000 slope: 1.00 dev: 185.6 max: 1251.6 freq: -99.99987
> samples: 3001-3500 reg: 3001-3500 slope: 1.00 dev: 160.1 max: 1181.7 freq: -99.99996
> samples: 3501-4000 reg: 3501-4000 slope: 1.00 dev: 150.7 max: 1103.2 freq: -99.99990
>
> You can see in this test it takes about 2500 updates to correct the
> initial ntp error and settle down. That's with 1GHz clocksource. In
> some tests I did with smaller clock frequencies or different frequency
> offsets it took much longer than that.

So I started to look into this slow update issue. I was a little
confused, as the logarithmic approximation done in the frequency
correction shouldn't let *that* much initial error accumulate before get
to the +1/0 adjustments.

It ends up this is more a reflection of a different part of your patch.
Particularly the tk->ntp_tick storage. Bascially the ntp_tick variable
is a cache of the ntp_tick_length() value, however it doesn't get set to
ntp_tick_length() until *after* you do the first frequency correction.
Basically this avoids accumulating any error until after the first
correction is made.

My main concern is that this seems like an accounting error. By
basically avoiding accumulating the initial error it seems it would
never be corrected, no?

If I add a similar ntp_tick caching to my current implementation, it
converges practically as fast (with the only initial error coming from
the logarithmic approximation of the divide being quite small). 
Similarly, if you remove the usage of tk->ntp_tick in the error
accumulation loop, using ntp_tick_length() your patch behaves quite
similarly to mine.

Does this align with your view of the code? Or am I misunderstanding
something?

> $ ./tk_test -s 2500 -n 5000
> samples: 1-5000 reg: 2501-5000 slope: 1.00 dev: 135942.4 max: 390761.8 freq: -100.00000
>
> Here the regression line is calculated only through the latter half of
> the samples (where it was already settled down) and we can see the
> total ntp error corrected since the first update is around 390
> microseconds.
>
> I'm not saying ntpd or linuxptp can't work with this. The PLL and PI
> controllers seem to handle this slowness in the frequency adjustment
> well. I'm more worried about designs that may change the frequency
> quickly and rely on accurate prediction of the clock in their feedback
> loop.
>
> The other thing I'm concerned about is that the multiplier doesn't
> change only between two closest values when the loop has settled down,
> but in a larger range. With the 1GHz clock I see the multiplier is
> moving in range of 8387767-8387772, which makes the ntp error several
> times larger than it would be if the multiplier switched just between
> 8387769 and 8387770.

So this point is valid. I spent some time reworking things and I'm not
totally happy with it currently (there's a little mess to it), but its
much closer to your implementation logically, but again just avoids the
divide and keeps the accounting closer to what we already have.

I'll hopefully clean up my current work and send it out tomorrow.

Also I do really appreciate your submissions here and apologize I've not
been able to put more time towards it recently. I also know having me
rewrite your patch over and over with various mistakes is probably
frustrating, but I do really want to make sure I both understand all the
subtleties of your changes (which resynthesizing helps) as well as make
sure the accounting is really correct (or at least not changed subtlety
without clear reason).

Thanks so much again!
-john

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz
  2014-04-24  4:22   ` John Stultz
@ 2014-04-24 11:28     ` Miroslav Lichvar
  0 siblings, 0 replies; 11+ messages in thread
From: Miroslav Lichvar @ 2014-04-24 11:28 UTC (permalink / raw)
  To: John Stultz; +Cc: LKML, Richard Cochran, Prarit Bhargava

On Wed, Apr 23, 2014 at 09:22:45PM -0700, John Stultz wrote:
> On 02/12/2014 07:42 AM, Miroslav Lichvar wrote:
> > You can see in this test it takes about 2500 updates to correct the
> > initial ntp error and settle down. That's with 1GHz clocksource. In
> > some tests I did with smaller clock frequencies or different frequency
> > offsets it took much longer than that.
> 
> So I started to look into this slow update issue. I was a little
> confused, as the logarithmic approximation done in the frequency
> correction shouldn't let *that* much initial error accumulate before get
> to the +1/0 adjustments.
> 
> It ends up this is more a reflection of a different part of your patch.
> Particularly the tk->ntp_tick storage. Bascially the ntp_tick variable
> is a cache of the ntp_tick_length() value, however it doesn't get set to
> ntp_tick_length() until *after* you do the first frequency correction.
> Basically this avoids accumulating any error until after the first
> correction is made.

Yes, that was one of the main points of the patch. Postponing the tick
length change should remove the biggest source of the ntp error.

> My main concern is that this seems like an accounting error. By
> basically avoiding accumulating the initial error it seems it would
> never be corrected, no?

Not if we don't consider it to be something that should be corrected.
When the ntp tick length is changed (by adjtimex call or on second
overflow), to which ticks should be this change applied? The current
code always accumulates ntp error with the current tick length, i.e.
the change is effectively applied to already passed ticks since last
accumulation. I'm not saying this is necessarily wrong, but it causes
large ntp errors. I'm proposing to look at the frequency change
in a different way and apply it at the current time in the current
tick when the clock is updated.

In my understanding, there are three sources of ntp error in the
current code:
- change in the tick length is not effectively applied at the current
  time in the clock update
- mult is controlled by an iterative method
- insufficient resolution of mult

I think the first source can be removed by postpoing the tick length
change as explained above. The second source can be removed by
calculating mult precisely with division instead of an iterative
method. There is probably nothing we can do about the third source
(except switching to 64-bit mult), but it's small, predictable and can
be handled easily and cheaply by the +1/0 mult adjustment.

I'm still not convinced the clock can be controlled quickly and
accurately without information about when will be the next clock
update if the first and possibly second source of ntp error remain
there. As you have probably seen when working on the patches, the
requirements are in conflict and it's difficult or maybe not even
possible to get something working well with all different update
intervals, clock multipliers and frequency changes.

>From my view, as someone involved in development of algorithms
controlling clocks, I'd like the clock to be as deterministic as
possible. When I set the frequency, the kernel shouldn't be correcting
some large unknown phase error behind my back. I still wouldn't know
when exactly was the frequency actually set, but if that information
was exported by adjtimex (I have some ideas how to do that), it would
be perfect for me.

Thanks for still working on this.

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-04-24 11:28 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-07  3:57 [PATCH] [RFC] timekeeping: Rework frequency adjustments to work better w/ nohz John Stultz
2014-01-13 17:51 ` Richard Cochran
2014-01-13 18:15   ` John Stultz
2014-01-14  7:07     ` Richard Cochran
2014-01-28 17:58     ` Richard Cochran
2014-01-29 17:56       ` John Stultz
2014-02-07 11:45 ` Miroslav Lichvar
2014-02-07 18:21   ` John Stultz
2014-02-12 15:42 ` Miroslav Lichvar
2014-04-24  4:22   ` John Stultz
2014-04-24 11:28     ` Miroslav Lichvar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox