public inbox for linux-kernel@vger.kernel.org
* Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-08 13:32 Benjamin Herrenschmidt
  2003-10-08 15:48 ` Gabriel Paubert
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 13:32 UTC (permalink / raw)
  To: linuxppc-dev list, Linux Kernel list

Hi !

While fixing problems experienced by some scientific users who
found out that gettimeofday() could sometimes run backward, I
found a nasty issue I don't know if we can fix at all or if it's
not worth bothering.

So the problem is with any arch (ppc, x86, ...) that uses a HW
timer (like the CPU timebase on PPC) to provide better-than-jiffy
precision in do_gettimeofday().

The problem is that the offset added to xtime value (typically
the HW timer current value minus the HW timer value at the last
timer interrupt scaled to usec) uses a scaling factor which has
been calibrated once, and doesn't take into account the adjustments
made to the xtime increment by the adjtime/adjtimex algorithm.

That means that if, for example, adjtimex was called with a factor
that is trying to slow you down a bit, and you call gettimeofday
right before the end of a jiffy period, you may calculate an offset
based on the HW timer that is actually higher than what will be
really added to xtime on the next interrupt.

So you can end up returning non-monotonic values from gettimeofday().
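
The scenario above can be modelled in a few lines of C. This is a
simplified sketch, not kernel code: model_gettimeofday(), model_tick()
and model_regression() are hypothetical names, HZ is assumed to be 100,
and the slew is assumed to be a fixed number of microseconds removed
from each tick.

```c
#include <assert.h>
#include <stdint.h>

#define HZ            100
#define USEC_PER_TICK (1000000 / HZ)   /* 10000 usec nominal jiffy (assumed) */

/* Model of do_gettimeofday(): base time plus the interpolated HW offset,
 * which was scaled with the boot-time calibration only. */
static int64_t model_gettimeofday(int64_t xtime_usec, int64_t hw_elapsed_usec)
{
    assert(hw_elapsed_usec >= 0 && hw_elapsed_usec <= USEC_PER_TICK);
    return xtime_usec + hw_elapsed_usec;
}

/* Model of the timer interrupt: xtime advances by the nominal tick
 * minus whatever the adjtime slew removes for this tick. */
static int64_t model_tick(int64_t xtime_usec, int64_t slew_usec)
{
    return xtime_usec + USEC_PER_TICK - slew_usec;
}

/* How far time steps back across one tick boundary.
 * Positive result: the second reading is earlier than the first. */
static int64_t model_regression(int64_t slew_usec, int64_t hw_elapsed_usec)
{
    int64_t xtime  = 0;
    int64_t before = model_gettimeofday(xtime, hw_elapsed_usec);
    int64_t after  = model_gettimeofday(model_tick(xtime, slew_usec), 0);

    return before - after;
}
```

With a slew of 5 usec per tick and a call made 9999 usec into the
jiffy, model_regression(5, 9999) reports a 4 usec step backwards; with
no slew the result is negative, i.e. time moves forward.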

I don't see a way to fix that that wouldn't bloat do_gettimeofday(),
except if we can, at jiffy interrupt time, pre-calculate a scaling
factor for the next jiffy and just apply it on the HW timer value
on the next calls to do_gettimeofday(). But that option would need
better understanding of the adjtime(x) algorithm than what I have
at this point.
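
A rough sketch of what such a per-jiffy pre-calculation might look
like, under stated assumptions: struct interp, interp_on_tick() and
interp_offset_usec() are hypothetical names, hw_ticks_per_jiffy is
assumed to come from the boot-time calibration, and next_tick_usec is
assumed to be the amount the next interrupt will add to xtime (nominal
tick plus or minus the slew). It does not model the real adjtime(x)
algorithm.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 32   /* fixed-point fraction bits for the scale factor */

struct interp {
    uint64_t last_stamp;          /* HW timer value at the last tick */
    uint64_t scale;               /* usec per HW tick, << SCALE_SHIFT */
    uint32_t hw_ticks_per_jiffy;  /* calibrated HW ticks in one jiffy */
    uint32_t max_usec;            /* what xtime will gain at the next tick */
};

/* Called from the timer interrupt: one division per jiffy. */
static void interp_on_tick(struct interp *ip, uint64_t hw_now,
                           uint32_t hw_ticks_per_jiffy,
                           uint32_t next_tick_usec)
{
    assert(hw_ticks_per_jiffy != 0);
    ip->last_stamp = hw_now;
    ip->hw_ticks_per_jiffy = hw_ticks_per_jiffy;
    ip->max_usec = next_tick_usec;
    ip->scale = ((uint64_t)next_tick_usec << SCALE_SHIFT) / hw_ticks_per_jiffy;
}

/* Called from the gettimeofday path: one multiply and one shift.
 * The offset never exceeds what the next tick will add to xtime. */
static uint32_t interp_offset_usec(const struct interp *ip, uint64_t hw_now)
{
    uint64_t delta = hw_now - ip->last_stamp;

    if (delta >= ip->hw_ticks_per_jiffy)
        return ip->max_usec;      /* late interrupt: saturate */
    return (uint32_t)((delta * ip->scale) >> SCALE_SHIFT);
}
```

The read side stays cheap because the division happens once per
interrupt; whether the extra state is acceptable in the real
do_gettimeofday() is a separate question.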

Storing the last value to make sure we don't return a value that is
lower will defeat the read_lock/write_lock mechanism, forcing us to
take the write_lock(), and thus screwing up scalability.
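
For illustration, here is what "storing the last value" amounts to,
written as a user-space sketch with C11 atomics rather than the
kernel's write_lock(). This is a swapped-in technique, not what any
arch does: monotonic_clamp() and last_returned_usec are hypothetical
names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Largest timestamp handed out so far, shared by all callers. */
static _Atomic int64_t last_returned_usec;

/* Return candidate_usec, or the largest value already returned if the
 * candidate would make time appear to run backwards. */
static int64_t monotonic_clamp(int64_t candidate_usec)
{
    int64_t prev = atomic_load(&last_returned_usec);

    assert(candidate_usec >= 0);   /* timestamps are non-negative here */

    while (candidate_usec > prev) {
        /* On failure, prev is reloaded with the current shared value. */
        if (atomic_compare_exchange_weak(&last_returned_usec,
                                         &prev, candidate_usec))
            return candidate_usec;
    }
    return prev;
}
```

Even without a lock, every caller that advances the clock writes the
same shared location, so the scalability concern does not go away.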

Any idea ?

Note: In addition to the above, there seems to be a race on x86 2.4
(only, 2.6 doesn't have it) due to the fact that the actual xtime
increase is done from a bottom half. The HW timer "last stamp" is
stored from the HW interrupt, xtime is only updated on the BH, so
if gettimeofday is called in between those 2, you'll end up using
the "new" "last stamp" with the old xtime, thus returning an
incorrect value. A fix we use on PPC is to use

 jiffies - wall_jiffies

As an additional correction.
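
A minimal sketch of that correction, assuming HZ is 100 and that the
HW timer offset has already been scaled to microseconds;
gtod_offset_usec() is a hypothetical helper, not an existing kernel
function.

```c
#define HZ 100   /* assumed tick rate */

/* Offset to add to xtime: the HW timer offset since the last stamp,
 * plus one full jiffy for every tick the interrupt has counted in
 * jiffies but that has not yet been folded into xtime. */
static unsigned long gtod_offset_usec(unsigned long hw_usec_since_stamp,
                                      unsigned long jiffies,
                                      unsigned long wall_jiffies)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    unsigned long lost = jiffies - wall_jiffies;

    return hw_usec_since_stamp + lost * (1000000 / HZ);
}
```

When the bottom half has caught up, jiffies equals wall_jiffies and
the correction is zero.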

Ben.




* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 13:32 Time precision, adjtime(x) vs. gettimeofday Benjamin Herrenschmidt
@ 2003-10-08 15:48 ` Gabriel Paubert
  2003-10-08 16:22   ` Benjamin Herrenschmidt
  2003-10-08 18:25 ` [PATCH] " Stephen Hemminger
  2003-10-08 22:17 ` Pavel Machek
  2 siblings, 1 reply; 9+ messages in thread
From: Gabriel Paubert @ 2003-10-08 15:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list


	Hi!,

On Wed, Oct 08, 2003 at 03:32:31PM +0200, Benjamin Herrenschmidt wrote:
> 
> Hi !
> 
> While fixing problems experienced by some scientific users who
> found out that gettimeofday() could sometimes run backward, I
> found a nasty issue I don't know if we can fix at all or if it's
> not worth bothering.
> 
> So the problem is with any arch (ppc, x86, ...) that uses a HW
> timer (like the CPU timebase on PPC) to provide better-than-jiffy
> precision in do_gettimeofday().
> 
> The problem is that the offset added to xtime value (typically
> the HW timer current value minus the HW timer value at the last
> timer interrupt scaled to usec) uses a scaling factor which has
> been calibrated once, and doesn't take into account the adjustments
> made to the xtime increment by the adjtime/adjtimex algorithm.

Well, if it affects gettimeofday, which has a precision of 1 part in
10000 (100 ppm), it means that our boot-time timebase calibration was
not very good to start with. On my set of running VME machines I have
the following (values in ppm):

$cat  /nfsroots/v*/etc/ntp/drift
-10.191
-2.787
3.869
-5.645
-1.146
-7.383
4.400
5.824
4.640
0.014
-8.371
0.056
-2.324
-5.655
-5.828
-4.862
-3.380

I can understand that we'll certainly have more serious problems
of non-monotonicity for nanosecond precision timestamps.

I also have from time to time a bad timebase calibration at boot which
makes the drift go to about 400ppm. I just don't have this problem
on any machine right now. I believe I mentioned this issue once
on the list but never found time to track it down.

Maybe the boot-time timebase calibration could use a longer period.
However, I'd first like to know by how much the timebase calibration
of the user who has the problem varies between reboots.

> 
> That means that if, for example, adjtimex was called with a factor
> that is trying to slow you down a bit, and you call gettimeofday
> right before the end of a jiffy period, you may calculate an offset
> based on the HW timer that is actually higher than what will be
> really added to xtime on the next interrupt.
> 
> So you can end up returning non-monotonic values from gettimeofday().

As I said, only if you have fairly large corrections. Anything below


> 
> I don't see a way to fix that that wouldn't bloat do_gettimeofday(),
> except if we can, at jiffy interrupt time, pre-calculate a scaling
> factor for the next jiffy and just apply it on the HW timer value
> on the next calls to do_gettimeofday(). But that option would need
> better understanding of the adjtime(x) algorithm than what I have
> at this point.
> 
> Storing the last value to make sure we don't return a value that is
> lower will defeat the read_lock/write_lock mechanism, forcing us to
> take the write_lock(), and thus screwing up scalability.
> 
> Any idea ?
> 
> Note: In addition to the above, there seems to be a race on x86 2.4
> (only, 2.6 doesn't have it) due to the fact that the actual xtime
> increase is done from a bottom half. The HW timer "last stamp" is
> stored from the HW interrupt, xtime is only updated on the BH, so
> if gettimeofday is called in between those 2, you'll end up using
> the "new" "last stamp" with the old xtime, thus returning an
> incorrect value. A fix we use on PPC is to use
> 
>  jiffies - wall_jiffies
> 
> As an additional correction.

AFAIR, this correction is also done on x86.


	Regards,
	Gabriel


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 15:48 ` Gabriel Paubert
@ 2003-10-08 16:22   ` Benjamin Herrenschmidt
  2003-10-08 17:50     ` Gabriel Paubert
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 16:22 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev list, Linux Kernel list


> Well, if it affects gettimeofday which has a precision of 1 part in
> 10000 (100 ppm), it means that our boot time timebase calibration was
> not very good to start with, on my set of running VME machines I have
> the following (values in ppm):
>
> ../..

Boot time calibration can't be perfect... It depends very much on the
quality of what you are calibrating against, and the bus path to it.

On most pmacs, I'm calibrating either against a VIA timer which isn't
_that_ good or on OF value (which are themselves calibrated, I think,
against the KeyLargo timer).

In all cases, those will drift some way from what the NTP server
gives, whether by a lot or a little. So we may end up adjusting our
kernel rate and thus opening a window for the problem.

Ben.



* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 16:22   ` Benjamin Herrenschmidt
@ 2003-10-08 17:50     ` Gabriel Paubert
  2003-10-08 18:22       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: Gabriel Paubert @ 2003-10-08 17:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list

On Wed, Oct 08, 2003 at 06:22:58PM +0200, Benjamin Herrenschmidt wrote:
> 
> > Well, if it affects gettimeofday which has a precision of 1 part in
> > 10000 (100 ppm), it means that our boot time timebase calibration was
> > not very good to start with, on my set of running VME machines I have
> > the following (values in ppm):
> >
> > ../..
> 
> Boot time calibration can't be perfect... 

No, indeed.

> It depends very much on the quality of what you are calibrating against,
> and the bus path to it.

At the time it is performed, most devices should not be active
(no long DMA bursts) so the variations should be rather small.
Another solution is to increase the measurement period. I have to 
use one second on some machines because I don't have anything else
reliable (only the RTC which changes every second and its interrupt
pin is not routed), even a 1 to 2 second delay does not significantly
affect boot times.

> 
> On most pmacs, I'm calibrating either against a VIA timer which isn't
> _that_ good or on OF value (which are themselves calibrated, I think,
> against the KeyLargo timer).

On the Macs I have around here, the ntp drift values are:
- on a PB G3/400: +8ppm
- on a PM G4/466: -6ppm

that's not _that_ bad (I believe these come from OF). 

10 ppm of a 10ms jiffy is 0.1 microseconds. Increasing HZ can only 
improve this figure, although it is stupid to run the correction loop 
that often IMNSHO.
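
The arithmetic generalises to other drift values and tick rates. A
small sketch, where drift_error_usec_per_jiffy() is a hypothetical
helper and the drift is assumed to be constant over one jiffy:

```c
#include <assert.h>

/* Interpolation error accumulated over one jiffy, in microseconds,
 * when the scaling factor is off by drift_ppm parts per million.
 * jiffy length = 1e6 / hz usec, so the result is drift_ppm / hz. */
static double drift_error_usec_per_jiffy(double drift_ppm, unsigned int hz)
{
    double jiffy_usec;

    assert(hz > 0);
    jiffy_usec = 1e6 / hz;
    return drift_ppm * 1e-6 * jiffy_usec;
}
```

For 10 ppm at HZ=100 this gives 0.1 usec, matching the figure above;
the 400 ppm bad-calibration case would give 4 usec per jiffy.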

I repeat the question: what are the values of drift on the machines
that encounter the problem ? Is this drift stable or unstable? 

> 
> In all cases, those will drift some way from what the NTP server will
> give, either a lot or not, it will. So we may end up adjusting our
> kernel rate and thus opening a window for the problem.

The worst variations of drift I've seen are a few ppm for a given
machine, barring the occasional boot-time calibration problems that
I have encountered.

	Gabriel


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 17:50     ` Gabriel Paubert
@ 2003-10-08 18:22       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 18:22 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev list, Linux Kernel list


> I repeat the question: what are the values of drift on the machines
> that encounter the problem ? Is this drift stable or unstable?

So far, there is no problem. The problem that was happening
was a via_calibrate_decr() bug with HZ != 100, but when
investigating, I figured out that we had a potential problem
there, that's all and that's why I want people like you who
know those problems well to state if it's worth bothering ;)
> >
> > In all cases, those will drift some way from what the NTP server will
> > give, either a lot or not, it will. So we may end up adjusting our
> > kernel rate and thus opening a window for the problem.
> 
> The worst variations of drift I've seen are a few ppm for a given
> machine, barring the occasional boot-time calibration problems that
> I have encountered.

OK.





* [PATCH] Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 13:32 Time precision, adjtime(x) vs. gettimeofday Benjamin Herrenschmidt
  2003-10-08 15:48 ` Gabriel Paubert
@ 2003-10-08 18:25 ` Stephen Hemminger
  2003-10-08 18:43   ` Benjamin Herrenschmidt
  2003-10-08 19:11   ` john stultz
  2003-10-08 22:17 ` Pavel Machek
  2 siblings, 2 replies; 9+ messages in thread
From: Stephen Hemminger @ 2003-10-08 18:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Andrew Morton, john stultz; +Cc: linux-kernel

On Wed, 08 Oct 2003 15:32:31 +0200
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> Hi !
> 
> While fixing problems experienced by some scientific users who
> found out that gettimeofday() could sometimes run backward, I
> found a nasty issue I don't know if we can fix at all or if it's
> not worth bothering.
> 
> So the problem is with any arch (ppc, x86, ...) that uses a HW
> timer (like the CPU timebase on PPC) to provide better-than-jiffy
> precision in do_gettimeofday().
> 
> The problem is that the offset added to xtime value (typically
> the HW timer current value minus the HW timer value at the last
> timer interrupt scaled to usec) uses a scaling factor which has
> been calibrated once, and doesn't take into account the adjustments
> made to the xtime increment by the adjtime/adjtimex algorithm.
> 
> That means that if, for example, adjtimex was called with a factor
> that is trying to slow you down a bit, and you call gettimeofday
> right before the end of a jiffy period, you may calculate an offset
> based on the HW timer that is actually higher than what will be
> really added to xtime on the next interrupt.
> 
> So you can end up returning non-monotonic values from gettimeofday().
> 
> I don't see a way to fix that that wouldn't bloat do_gettimeofday(),
> except if we can, at jiffy interrupt time, pre-calculate a scaling
> factor for the next jiffy and just apply it on the HW timer value
> on the next calls to do_gettimeofday(). But that option would need
> better understanding of the adjtime(x) algorithm than what I have
> at this point.
> 
> Storing the last value to make sure we don't return a value that is
> lower will defeat the read_lock/write_lock mechanism, forcing us to
> take the write_lock(), and thus screwing up scalability.

The following will prevent adjtime from causing time regression.
It delays starting the adjtime mechanism for one tick, and 
keeps gettimeofday inside the window.

Only fixes i386, but changes to other arch would be similar.

Running a simple clock test program and playing with adjtime demonstrates
that this fixes the problem (and 2.6.0-test6 is broken). 
But given the fragile nature of the timer code, it should go through some
more testing before inclusion.  Andrew could you put this in the next
-mm tree?

diff -Nru a/arch/i386/kernel/time.c b/arch/i386/kernel/time.c
--- a/arch/i386/kernel/time.c	Wed Oct  8 11:20:55 2003
+++ b/arch/i386/kernel/time.c	Wed Oct  8 11:20:55 2003
@@ -102,6 +102,15 @@
 		lost = jiffies - wall_jiffies;
 		if (lost)
 			usec += lost * (1000000 / HZ);
+
+		/*
+		 * If time_adjust is negative then NTP is slowing the clock
+		 * so make sure not to go into next possible interval.
+		 * Better to lose some accuracy than have time go backwards..
+		 */
+		if (unlikely(time_adjust < 0) && usec > tickadj)
+			usec = tickadj;
+
 		sec = xtime.tv_sec;
 		usec += (xtime.tv_nsec / 1000);
 	} while (read_seqretry(&xtime_lock, seq));
diff -Nru a/include/linux/timex.h b/include/linux/timex.h
--- a/include/linux/timex.h	Wed Oct  8 11:20:55 2003
+++ b/include/linux/timex.h	Wed Oct  8 11:20:55 2003
@@ -302,6 +302,7 @@
 extern long time_reftime;	/* time at last adjustment (s) */
 
 extern long time_adjust;	/* The amount of adjtime left */
+extern long time_next_adjust;	/* Value for time_adjust at next tick */
 
 /* interface variables pps->timer interrupt */
 extern long pps_offset;		/* pps time offset (us) */
diff -Nru a/kernel/time.c b/kernel/time.c
--- a/kernel/time.c	Wed Oct  8 11:20:55 2003
+++ b/kernel/time.c	Wed Oct  8 11:20:55 2003
@@ -233,7 +233,7 @@
 	result = time_state;	/* mostly `TIME_OK' */
 
 	/* Save for later - semantics of adjtime is to return old value */
-	save_adjust = time_adjust;
+	save_adjust = time_next_adjust ? time_next_adjust : time_adjust;
 
 #if 0	/* STA_CLOCKERR is never set yet */
 	time_status &= ~STA_CLOCKERR;		/* reset STA_CLOCKERR */
@@ -280,7 +280,8 @@
 	    if (txc->modes & ADJ_OFFSET) {	/* values checked earlier */
 		if (txc->modes == ADJ_OFFSET_SINGLESHOT) {
 		    /* adjtime() is independent from ntp_adjtime() */
-		    time_adjust = txc->offset;
+		    if ((time_next_adjust = txc->offset) == 0)
+			 time_adjust = 0;
 		}
 		else if ( time_status & (STA_PLL | STA_PPSTIME) ) {
 		    ltemp = (time_status & (STA_PPSTIME | STA_PPSSIGNAL)) ==
diff -Nru a/kernel/timer.c b/kernel/timer.c
--- a/kernel/timer.c	Wed Oct  8 11:20:55 2003
+++ b/kernel/timer.c	Wed Oct  8 11:20:55 2003
@@ -463,6 +463,7 @@
 long time_adj;				/* tick adjust (scaled 1 / HZ)	*/
 long time_reftime;			/* time at last adjustment (s)	*/
 long time_adjust;
+long time_next_adjust;
 
 /*
  * this routine handles the overflow of the microsecond field
@@ -643,6 +644,12 @@
 	}
 	xtime.tv_nsec += delta_nsec;
 	time_interpolator_update(delta_nsec);
+
+	/* Changes by adjtime() do not take effect till next tick. */
+	if (time_next_adjust != 0) {
+		time_adjust = time_next_adjust;
+		time_next_adjust = 0;
+	}
 }
 
 /*
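
The deferral that the kernel/time.c and kernel/timer.c hunks introduce
can be read in isolation as a two-variable latch. The sketch below is
a user-space model only: struct ntp_model, model_adjtime() and
model_end_of_tick() are hypothetical names, and the per-tick
consumption of time_adjust by the real tick handler is not modelled.

```c
/* The two variables from the patch, gathered into one model struct. */
struct ntp_model {
    long time_adjust;       /* adjustment currently being applied */
    long time_next_adjust;  /* adjustment that starts at the next tick */
};

/* adjtime() path: a new offset is only queued; a zero offset cancels
 * the running adjustment immediately, as in the kernel/time.c hunk. */
static void model_adjtime(struct ntp_model *m, long offset)
{
    m->time_next_adjust = offset;
    if (offset == 0)
        m->time_adjust = 0;
}

/* End of the tick handler: latch the queued value so that it takes
 * effect from the following tick, as in the kernel/timer.c hunk. */
static void model_end_of_tick(struct ntp_model *m)
{
    if (m->time_next_adjust != 0) {
        m->time_adjust = m->time_next_adjust;
        m->time_next_adjust = 0;
    }
}
```

Until the next tick boundary, readers keep seeing the old time_adjust,
so the interpolation within the current jiffy is not affected by a
freshly requested slew.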


* Re: [PATCH] Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 18:25 ` [PATCH] " Stephen Hemminger
@ 2003-10-08 18:43   ` Benjamin Herrenschmidt
  2003-10-08 19:11   ` john stultz
  1 sibling, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 18:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, john stultz, Linux Kernel list


> The following will prevent adjtime from causing time regression.
> It delays starting the adjtime mechanism for one tick, and 
> keeps gettimeofday inside the window.
> 
> Only fixes i386, but changes to other arch would be similar.
> 
> Running a simple clock test program and playing with adjtime demonstrates
> that this fixes the problem (and 2.6.0-test6 is broken). 
> But given the fragile nature of the timer code, it should go through some
> more testing before inclusion.  Andrew could you put this in the next
> -mm tree?

I like that solution. There is still a possible small issue
in 2.4 but I don't think we need to care about it (see below)

Note about the 2.4 SMP race I talked about, x86 is indeed safe,
as it also uses (jiffies - wall_jiffies) to adjust the offset,
I missed it at first as it's not done from the do_gettimeoffset()
function where I was looking for it.

However, that means we may apply more than one jiffy to
xtime at once, thus the above workaround would still have a small
hole. But since that happens only with insane interrupt latencies
that I don't expect to see in real life, it's probably a non-issue.

2.6 should always have jiffies and wall_jiffies in perfect sync
as they are manipulated within the same write_lock block.

Ben.




* Re: [PATCH] Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 18:25 ` [PATCH] " Stephen Hemminger
  2003-10-08 18:43   ` Benjamin Herrenschmidt
@ 2003-10-08 19:11   ` john stultz
  1 sibling, 0 replies; 9+ messages in thread
From: john stultz @ 2003-10-08 19:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Benjamin Herrenschmidt, Andrew Morton, lkml

On Wed, 2003-10-08 at 11:25, Stephen Hemminger wrote:
> On Wed, 08 Oct 2003 15:32:31 +0200
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> > 
> > That means that if, for example, adjtimex was called with a factor
> > that is trying to slow you down a bit, and you call gettimeofday
> > right before the end of a jiffy period, you may calculate an offset
> > based on the HW timer that is actually higher than what will be
> > really added to xtime on the next interrupt.
[snip]
> The following will prevent adjtime from causing time regression.
> It delays starting the adjtime mechanism for one tick, and 
> keeps gettimeofday inside the window.

Hmm. Looks good to me. If we're losing ticks, it would make time
stair-step somewhat, but it doesn't seem to affect the actual time
accumulation. 

thanks
-john





* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 13:32 Time precision, adjtime(x) vs. gettimeofday Benjamin Herrenschmidt
  2003-10-08 15:48 ` Gabriel Paubert
  2003-10-08 18:25 ` [PATCH] " Stephen Hemminger
@ 2003-10-08 22:17 ` Pavel Machek
  2 siblings, 0 replies; 9+ messages in thread
From: Pavel Machek @ 2003-10-08 22:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list

Hi!

> While fixing problems experienced by some scientific users who
> found out that gettimeofday() could sometimes run backward, I

Having time run backward is not really an option; screensavers start
kicking randomly, make has problems, etc, etc.
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

