[PATCH] gettimeofday stability

* [PATCH] gettimeofday stability
@ 2001-04-11 19:00 Samuel Rydh
  2001-04-11 19:42 ` Gabriel Paubert
  0 siblings, 1 reply; 18+ messages in thread
From: Samuel Rydh @ 2001-04-11 19:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Benjamin Herrenschmidt

I'd like to suggest the following modification of do_gettimeofday():

===== time.c 1.8 vs edited =====
--- 1.8/arch/ppc/kernel/time.c  Mon Apr  2 03:36:42 2001
+++ edited/time.c       Wed Apr 11 20:38:42 2001
@@ -212,6 +212,10 @@
        sec = xtime.tv_sec;
        usec = xtime.tv_usec;
        delta = tb_ticks_since(tb_last_stamp);
+
+       if( (int)delta < 0 )
+               delta = 0;
+
 #ifdef CONFIG_SMP
        /* As long as timebases are not in sync, gettimeofday can only
         * have jiffy resolution on SMP.

Normally, delta should be strictly positive. However, if
coherency between DEC and TB is lost, then delta might turn
out to be (slightly) negative, which results in a
bogus time stamp.

The main reason why I want this modification is that MOL
touches both DEC and TB. I've not managed to maintain
exact coherency (appears to be more or less impossible).
The fix above would guard against an occasional drift.
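
A minimal sketch of the failure mode and of the guard, assuming that
tb_ticks_since() returns an unsigned 32-bit tick count. The helper names
ticks_since() and clamp_delta() are illustrative stand-ins, not kernel
functions:

```c
#include <stdint.h>

/* Illustrative stand-in for tb_ticks_since(): elapsed timebase ticks,
 * computed with unsigned 32-bit wrap-around arithmetic. */
static uint32_t ticks_since(uint32_t tb_now, uint32_t tb_last_stamp)
{
    return tb_now - tb_last_stamp;
}

/* The proposed guard: a delta whose top bit is set is read as "slightly
 * negative" and clamped to zero. The cast relies on two's complement
 * conversion, which is what the (int) cast in the patch relies on too. */
static uint32_t clamp_delta(uint32_t delta)
{
    if ((int32_t)delta < 0)
        delta = 0;
    return delta;
}
```

If the stamp is 3 ticks ahead of the timebase, ticks_since() yields
0xFFFFFFFD rather than -3, and clamp_delta() turns that into 0.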

/Samuel

----------------------------------------------------------
 E-mail <samuel@ibrium.se>  WWW: <http://www.ibrium.se>
  Phone/fax: (home) +46 8 4418431, (work) +46 8 7908470
----------------------------------------------------------

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

* Re: [PATCH] gettimeofday stability
  2001-04-11 19:00 [PATCH] gettimeofday stability Samuel Rydh
@ 2001-04-11 19:42 ` Gabriel Paubert
  2001-04-11 20:09   ` Karim Yaghmour
  2001-04-11 23:07   ` Samuel Rydh
  0 siblings, 2 replies; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-11 19:42 UTC (permalink / raw)
  To: Samuel Rydh; +Cc: linuxppc-dev, Benjamin Herrenschmidt

On Wed, 11 Apr 2001, Samuel Rydh wrote:

>
> I'd like to suggest the following modification of do_gettimeofday():
>
>
> ===== time.c 1.8 vs edited =====
> --- 1.8/arch/ppc/kernel/time.c  Mon Apr  2 03:36:42 2001
> +++ edited/time.c       Wed Apr 11 20:38:42 2001
> @@ -212,6 +212,10 @@
>         sec = xtime.tv_sec;
>         usec = xtime.tv_usec;
>         delta = tb_ticks_since(tb_last_stamp);
> +
> +       if( (int)delta < 0 )
> +               delta = 0;
> +
>  #ifdef CONFIG_SMP
>         /* As long as timebases are not in sync, gettimeofday can only
>          * have jiffy resolution on SMP.
>
>
> Normally, delta should be strictly positive. However, if
> coherency between DEC and TB is lost, then delta might turn
> out to be (slightly) negative, which results in a
> bogus time stamp.
>
> The main reason why I want this modification is that MOL
> touches both DEC and TB. I've not managed to maintain
> exact coherency (appears to be more or less impossible).
> The fix above would guard against an occasional drift.

Why in the hell does it touch TB ? I could understand touching the
decrementer, but the TB should be sacred. It is the only absolute time
reference we have, and apart from being synchronized at boot on SMP, it
should never be changed.

If you touch the TB, you will lose the nicely spaced regular interrupts
that we have and screw up NTP for example, get occasionally shorter or
longer jiffies etc... I wrote the code carefully to avoid all these kinds
of problems. Besides that, you have to touch all TB simultaneously on SMP,
it's not as easy as it seems.

Finally, if you _really_ run into this problem, given the delay between
the decrementer interrupt and the update of tb_last_stamp, it means that
you likely introduce uncertainties of several microseconds. I forgot also
to mention that, to complicate matters, you have to check CPU type before
you touch the TB (601 versus all others).

	Regards,
	Gabriel

* Re: [PATCH] gettimeofday stability
  2001-04-11 19:42 ` Gabriel Paubert
@ 2001-04-11 20:09   ` Karim Yaghmour
  2001-04-11 21:31     ` Benjamin Herrenschmidt
                       ` (2 more replies)
  2001-04-11 23:07   ` Samuel Rydh
  1 sibling, 3 replies; 18+ messages in thread
From: Karim Yaghmour @ 2001-04-11 20:09 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Samuel Rydh, linuxppc-dev, Benjamin Herrenschmidt

Gabriel Paubert wrote:
>
> Finally, if you _really_ run into this problem, given the delay between
> the decrementer interrupt and the update of tb_last_stamp, it means that
> you likely introduce uncertainties of several microseconds. I forgot also
> to mention that, to complicate matters, you have to check CPU type before
> you touch the TB (601 versus all others).
>

While porting the Linux Trace Toolkit to PPC I noticed a problem
that may be explained by the symptoms described. The way it works
is that LTT uses do_gettimeofday() to stamp the events that occur.
Occasionally, a trace would contain entries where the timestamp
jumps (from one event to the next) by approximately 4295 seconds.
Later on, this would come back to a "normal" value. And
4295 seconds is about 2^32/1000000. Hence the underflow.
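
The arithmetic behind the 4295 figure can be checked with a short sketch.
It assumes the microsecond offset ends up in an unsigned 32-bit variable,
which is an assumption and not something established here; the function
names are illustrative only, and the sketch says nothing about where in
the kernel such a wrap would occur:

```c
#include <stdint.h>

/* A signed microsecond offset stored in an unsigned 32-bit variable:
 * a small negative value wraps to just under 2^32. */
static uint32_t as_unsigned_usec(int32_t usec)
{
    return (uint32_t)usec;
}

/* Whole seconds carried out of an unsigned microsecond count. */
static uint32_t whole_seconds(uint32_t usec)
{
    return usec / 1000000u;
}
```

An offset of -5 microseconds becomes 4294967291, which is 4294 whole
seconds plus a remainder, i.e. roughly 4295 seconds.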

This has been noticed with both 2.2.x and 2.4.x kernels.

===================================================
                 Karim Yaghmour
               karym@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

* Re: [PATCH] gettimeofday stability
  2001-04-11 20:09   ` Karim Yaghmour
@ 2001-04-11 21:31     ` Benjamin Herrenschmidt
  2001-04-12 18:09     ` Gabriel Paubert
  2001-04-17 11:22     ` Gabriel Paubert
  2 siblings, 0 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2001-04-11 21:31 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Gabriel Paubert, linuxppc-dev


>Gabriel Paubert wrote:
>>
>> Finally, if you _really_ run into this problem, given the delay between
>> the decrementer interrupt and the update of tb_last_stamp, it means that
>> you likely introduce uncertainties of several microseconds. I forgot also
>> to mention that, to complicate matters, you have to check CPU type before
>> you touch the TB (601 versus all others).
>>
>
>While porting the Linux Trace Toolkit to PPC I noticed a problem
>that may be explained by the symptoms described. The way it works
>is that LTT uses do_gettimeofday() to stamp the events that occur.
>Occasionally, a trace would contain entries where the timestamp
>jumps (from one event to the next) by approximately 4295 seconds.
>Later on, this would come back to a "normal" value. And
>4295 seconds is about 2^32/1000000. Hence the underflow.
>
>This has been noticed with both 2.2.x and 2.4.x kernels.

Hrm... looks like we need to protect against a DEC interrupt arriving too early ?

That can happen on some rare occasions. I think Paulus has fixed
one case of that in the latest bk trees in order to force the DEC to
emulate lost interrupts.

Ben.


* Re: [PATCH] gettimeofday stability
  2001-04-11 19:42 ` Gabriel Paubert
  2001-04-11 20:09   ` Karim Yaghmour
@ 2001-04-11 23:07   ` Samuel Rydh
  2001-04-16 11:25     ` Gabriel Paubert
  1 sibling, 1 reply; 18+ messages in thread
From: Samuel Rydh @ 2001-04-11 23:07 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev

On Wed, Apr 11, 2001 at 09:42:58PM +0200, Gabriel Paubert wrote:
> > Normally, delta should be strictly positive. However, if
> > coherency between DEC and TB is lost, then delta might turn
> > out to be (slightly) negative, which results in a
> > bogus time stamp.
> >
> > The main reason why I want this modification is that MOL
> > touches both DEC and TB. I've not managed to maintain
> > exact coherency (appears to be more or less impossible).
> > The fix above would guard against an occasional drift.
>
> Why in the hell does it touch TB ? I could understand touching the
> decrementer, but the TB should be sacred. It is the only absolute time
> reference we have, and apart from being synchronized at boot on SMP, it
> should never be changed.

Ideally, TB should not be touched. Indeed, MOL can run without
touching TB (but DEC is essential). However, TB needs to be
modified for the 'save session' feature to work. Basically, the RAM
and cpu state of MacOS is flushed to disk. At a later time,
MacOS can be restarted instantly. The problem is that MacOS can't
deal with the TB skip that occurs if the timebase is not restored
(no big surprise there).

> If you touch the TB, you will lose the nicely spaced regular interrupts
> that we have and screw up NTP for example, get occasionally shorter or
> longer jiffies etc... I wrote the code carefully to avoid all these kinds
> of problems.

Yes, that is probably unavoidable. My thought was to let the user
choose the use of the save session feature at the price of slightly
less regularly spaced DEC interrupts while MOL is running (there
will probably be a minor clock drift too).

>Besides that, you have to touch all TB simultaneously on SMP,
> it's not as easy as it seems.

I know :-). It is difficult enough to keep DEC and TB coherent
on a single processor system (i.e. I don't think there is a
simple way - loading them simultaneously just after a clock
edge does not work since mtdec/mttb requires too many processor
cycles on certain CPUs).

> Finally, if you _really_ run into this problem, given the delay between
> the decrementer interrupt and the update of tb_last_stamp, it means that
> you likely introduce uncertainties of several microseconds.

Yes, that is probably true.

> I forgot also to mention that, to complicate matters, you have to
> check CPU type before you touch the TB (601 versus all others).

Well, the TB/RTC issue is a minor problem compared to the other differences
(in particular, the unified BATs and the lack of a no-execute bit
in the segment registers).

Anyway, the negative offset check is desirable even if it is
only the DEC that is touched. Of course, as a workaround one could
make sure the DEC register is only accessed in such a manner that
linux-DEC interrupts never occur early (this is what MOL does now).
However, this workaround is a bit tricky to implement without
introducing dependencies on the inner workings of the processor
(at least as long as one tries to avoid introducing a drift
in the opposite direction).

Regards,

/Samuel

----------------------------------------------------------
 E-mail <samuel@ibrium.se>  WWW: <http://www.ibrium.se>
  Phone/fax: (home) +46 8 4418431, (work) +46 8 7908470
----------------------------------------------------------

* Re: [PATCH] gettimeofday stability
  2001-04-11 20:09   ` Karim Yaghmour
  2001-04-11 21:31     ` Benjamin Herrenschmidt
@ 2001-04-12 18:09     ` Gabriel Paubert
  2001-04-14  6:49       ` Karim Yaghmour
  2001-04-17 11:22     ` Gabriel Paubert
  2 siblings, 1 reply; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-12 18:09 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Samuel Rydh, linuxppc-dev, Benjamin Herrenschmidt


On Wed, 11 Apr 2001, Karim Yaghmour wrote:

> While porting the Linux Trace Toolkit to PPC I noticed a problem
> that may be explained by the symptoms described. The way it works
> is that LTT uses do_gettimeofday() to stamp the events that occur.
> Occasionally, a trace would contain entries where the timestamp
> jumps (from one event to the next) by approximately 4295 seconds.
> Later on, this would come back to a "normal" value. And
> 4295 seconds is about 2^32/1000000. Hence the underflow.
>
> This has been noticed with both 2.2.x and 2.4.x kernels.

Given that time handling is completely different in 2.2 and 2.4, it is
surprising. This would indicate that a decrementer interrupt has happened
too early.

How often does it happen ? Does it still happen with recent kernels ?
Were you running something that touches the decrementer (MOL, RTLINUX) ?

	Regards,
	Gabriel


* Re: [PATCH] gettimeofday stability
  2001-04-12 18:09     ` Gabriel Paubert
@ 2001-04-14  6:49       ` Karim Yaghmour
  2001-04-16 11:56         ` Gabriel Paubert
  0 siblings, 1 reply; 18+ messages in thread
From: Karim Yaghmour @ 2001-04-14  6:49 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Samuel Rydh, linuxppc-dev, Benjamin Herrenschmidt

Gabriel Paubert wrote:
>
> Given that time handling is completely different in 2.2 and 2.4, it is
> surprising. This would indicate that a decrementer interrupt has happened
> too early.
>
> How often does it happen ? Does it still happen with recent kernels ?
> Were you running something that touches the decrementer (MOL, RTLINUX) ?
>

I can't tell you if the interrupt is happening too early or not,
although code could be added to check this.

I can tell you that this has been noticed by more than one person.
I noticed this and so did the person at MontaVista that contributed
the cross-platform reading code (to enable traces to be read across
different endian machines).

If you really want to see what I mean I could forward you a textual
trace sample to prove my point.

The latest kernel I've tested this on is 2.4.0-test10.

===================================================
                 Karim Yaghmour
               karym@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

* Re: [PATCH] gettimeofday stability
  2001-04-11 23:07   ` Samuel Rydh
@ 2001-04-16 11:25     ` Gabriel Paubert
  2001-04-19 20:43       ` Samuel Rydh
  0 siblings, 1 reply; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-16 11:25 UTC (permalink / raw)
  To: Samuel Rydh; +Cc: linuxppc-dev


On Thu, 12 Apr 2001, Samuel Rydh wrote:

> On Wed, Apr 11, 2001 at 09:42:58PM +0200, Gabriel Paubert wrote:
> > > Normally, delta should be strictly positive. However, if
> > > coherency between DEC and TB is lost, then delta might turn
> > > out to be (slightly) negative, which results in a
> > > bogus time stamp.
> > >
> > > The main reason why I want this modification is that MOL
> > > touches both DEC and TB. I've not managed to maintain
> > > exact coherency (appears to be more or less impossible).
> > > The fix above would guard against an occasional drift.
> >
> > Why in the hell does it touch TB ? I could understand touching the
> > decrementer, but the TB should be sacred. It is the only absolute time
> > reference we have, and apart from being synchronized at boot on SMP, it
> > should never be changed.
>
> Ideally, TB should not be touched. Indeed, MOL can run without
> touching TB (but DEC is essential). However, TB needs to be
> modified for the 'save session' feature to work. Basically, the RAM
> and cpu state of MacOS is flushed to disk. At a later time,
> MacOS can be restarted instantly. The problem is that MacOS can't
> deal with the TB skip that occurs if the timebase is not restored
> (no big surprise there).

By touching the TB, you'll also break all other Linux applications which
may have a valid use for the TB.

BTW: how do you handle multiple MOL sessions ?


>
> > If you touch the TB, you will lose the nicely spaced regular interrupts
> > that we have and screw up NTP for example, get occasionally shorter or
> > longer jiffies etc... I wrote the code carefully to avoid all these kinds
> > of problems.
>
> Yes, that is probably unavoidable. My thought was to let the user
> choose the use of the save session feature at the price of slightly
> less regularly spaced DEC interrupts while MOL is running (there
> will probably be a minor clock drift too).

Well, I'd say that losing the timebase is unacceptable. Screwing up the
rest of the system for the sake of a single application is never
acceptable. And on top of this, you suggest a patch which cures the
symptom, not the cause.


>
> >Besides that, you have to touch all TB simultaneously on SMP,
> > it's not as easy as it seems.
>
> I know :-). It is difficult enough to keep DEC and TB coherent
> on a single processor system (i.e. I don't think there is a
> simple way - loading them simultaneously just after a clock
> edge does not work since mtdec/mttb requires too many processor
> cycles on certain CPUs).
>
> > Finally, if you _really_ run into this problem, given the delay between
> > the decrementer interrupt and the update of tb_last_stamp, it means that
> > you likely introduce uncertainties of several microseconds.
>
> Yes, that is probably true.

I keep my clocks very well synchronized, this would break everything.



>
> > I forgot also to mention that, to complicate matters, you have to
> > check CPU type before you touch the TB (601 versus all others).
>
> Well, the TB/RTC issue is a minor problem compared to the other differences
> (in particular, the unified BATs and the lack of a no-execute bit
> in the segment registers).
>
> Anyway, the negative offset check is desirable even if it is
> only the DEC that is touched.

No, it is not; it is a textbook case of curing the symptom instead of the
cause.

	Regards,
	Gabriel.


* Re: [PATCH] gettimeofday stability
  2001-04-14  6:49       ` Karim Yaghmour
@ 2001-04-16 11:56         ` Gabriel Paubert
  2001-04-16 13:25           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-16 11:56 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Samuel Rydh, linuxppc-dev, Benjamin Herrenschmidt

On Sat, 14 Apr 2001, Karim Yaghmour wrote:

>
> Gabriel Paubert wrote:
> >
> > Given that time handling is completely different in 2.2 and 2.4, it is
> > surprising. This would indicate that a decrementer interrupt has happened
> > too early.
> >
> > How often does it happen ? Does it still happen with recent kernels ?
> > Were you running something that touches the decrementer (MOL, RTLINUX) ?
> >
>
> I can't tell you if the interrupt is happening too early or not,
> although code could be added to check this.

Yes, currently the decrementer interrupt assumes that it is the only user.
This is quite simple however, use a while() {} instead of a do {}
while() in the loop that catches up on pending decrementer interrupts.
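
The difference between the two loop forms can be shown with a simplified
model of the catch-up loop. This is a sketch under stated assumptions, not
the kernel's timer_interrupt(): TICKS_PER_JIFFY and both function names
are hypothetical, and the only work done per iteration is advancing the
stamp.

```c
#include <stdint.h>

#define TICKS_PER_JIFFY 1000 /* hypothetical value, for illustration only */

/* do {} while(): always accounts for at least one jiffy, even when the
 * interrupt arrived before a full jiffy had elapsed. */
static unsigned catch_up_do_while(uint32_t tb_now, uint32_t *last_stamp)
{
    unsigned jiffies_added = 0;

    do {
        *last_stamp += TICKS_PER_JIFFY;
        jiffies_added++;
    } while ((int32_t)(tb_now - *last_stamp) >= TICKS_PER_JIFFY);
    return jiffies_added;
}

/* while() {}: checks first, so an early interrupt accounts for nothing
 * and the stamp never moves ahead of the timebase. */
static unsigned catch_up_while(uint32_t tb_now, uint32_t *last_stamp)
{
    unsigned jiffies_added = 0;

    while ((int32_t)(tb_now - *last_stamp) >= TICKS_PER_JIFFY) {
        *last_stamp += TICKS_PER_JIFFY;
        jiffies_added++;
    }
    return jiffies_added;
}
```

With a stamp of 5000 and an interrupt arriving at tick 5400, the do/while
form moves the stamp to 6000, ahead of the timebase, so a later unsigned
"ticks since stamp" subtraction wraps. The while form leaves the stamp at
5000. Both forms behave the same when the interrupt is on time or late.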

> I can tell you that this has been noticed by more than one person.
> I noticed this and so did the person at MontaVista that contributed
> the cross-platform reading code (to enable traces to be read across
> different endian machines).
>
> If you really want to see what I mean I could forward you a textual
> trace sample to prove my point.

This could be very helpful (but from a recent kernel, please).

>
> The latest kernel I've tested this on is 2.4.0-test10.

Ok, the new code was introduced with test9. I'm very surprised since I've
heavily tested it (time stamping lots of interrupts+NTP), but I'm going to
add more thorough checks to see if I can catch some problem. I'll come
back after a few billion do_gettimeofday() calls.

	Regards,
	Gabriel.

* Re: [PATCH] gettimeofday stability
  2001-04-16 13:25           ` Benjamin Herrenschmidt
@ 2001-04-16 12:53             ` Gabriel Paubert
  0 siblings, 0 replies; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-16 12:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Samuel Rydh

On Mon, 16 Apr 2001, Benjamin Herrenschmidt wrote:

> >
> >Yes, currently the decrementer interrupt assumes that it is the only user.
> >This is quite simple however, use a while() {} instead of a do {}
> >while() in the loop that catches up on pending decrementer interrupts.
>
> Paulus already pushed such a patch to the bk repository (last week I
> think). He rewrote the pmac lost_interrupts stuff to use the DEC to cause
> interrupts.

Fine. I have no problem with using the decrementer as a "please interrupt
me asap" request... And it is nice to see some of the bloat of unmasking
interrupts go away with this (not that this affects me much though, my
sources were patched).

I still object strongly to anything touching the timebase, however.

	Regards,
	Gabriel.

* Re: [PATCH] gettimeofday stability
  2001-04-16 11:56         ` Gabriel Paubert
@ 2001-04-16 13:25           ` Benjamin Herrenschmidt
  2001-04-16 12:53             ` Gabriel Paubert
  0 siblings, 1 reply; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2001-04-16 13:25 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev, Samuel Rydh


>
>Yes, currently the decrementer interrupt assumes that it is the only user.
>This is quite simple however, use a while() {} instead of a do {}
>while() in the loop that catches up on pending decrementer interrupts.

Paulus already pushed such a patch to the bk repository (last week I
think). He rewrote the pmac lost_interrupts stuff to use the DEC to cause
interrupts.

Ben.


* Re: [PATCH] gettimeofday stability
@ 2001-04-16 16:00 Iain Sandoe
  2001-04-16 22:19 ` Dan Malek
  0 siblings, 1 reply; 18+ messages in thread
From: Iain Sandoe @ 2001-04-16 16:00 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev, Samuel Rydh

> I still object strongly to anything touching the timebase, however.

I'd prefer if it wasn't touched too... I've used it in both the IRQ blocking
measurement stuff (yet to be ported to 2.4.x) and the Audio Latency
measurement stuff (PPC port of Benno Senoner's stuff)...

I thought I could rely on TB ;-))

ciao,
Iain.

* Re: [PATCH] gettimeofday stability
  2001-04-16 16:00 Iain Sandoe
@ 2001-04-16 22:19 ` Dan Malek
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Malek @ 2001-04-16 22:19 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: Gabriel Paubert, linuxppc-dev, Samuel Rydh

Iain Sandoe wrote:
>
> > I still object strongly to anything touching the timebase, however.
>
> I'd prefer if it wasn't touched too...

As I recall, "touching" the TB simply means reading it and computing
elapsed ticks since the last read.  I don't remember anything changing
the value in the timebase registers (which isn't possible outside
of the kernel).

I regularly use the TB for user application event timing, and
haven't seen anything abnormal (other than the code I was testing :-).

	-- Dan

* Re: [PATCH] gettimeofday stability
  2001-04-11 20:09   ` Karim Yaghmour
  2001-04-11 21:31     ` Benjamin Herrenschmidt
  2001-04-12 18:09     ` Gabriel Paubert
@ 2001-04-17 11:22     ` Gabriel Paubert
  2 siblings, 0 replies; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-17 11:22 UTC (permalink / raw)
  To: Karim Yaghmour; +Cc: Samuel Rydh, linuxppc-dev, Benjamin Herrenschmidt

On Wed, 11 Apr 2001, Karim Yaghmour wrote:

>
> Gabriel Paubert wrote:
> >
> > Finally, if you _really_ run into this problem, given the delay between
> > the decrementer interrupt and the update of tb_last_stamp, it means that
> > you likely introduce uncertainties of several microseconds. I forgot also
> > to mention that, to complicate matters, you have to check CPU type before
> > you touch the TB (601 versus all others).
> >
>
> While porting the Linux Trace Toolkit to PPC I noticed a problem
> that may be explained by the symptoms described. The way it works
> is that LTT uses do_gettimeofday() to stamp the events that occur.
> Occasionally, a trace would contain entries where the timestamp
> jumps (from one event to the next) by approximately 4295 seconds.
> Later on, this would come back to a "normal" value. And
> 4295 seconds is about 2^32/1000000. Hence the underflow.

Wait a minute, my explanation was wrong. When you skip forward by 4295
seconds this means that the result of the mulhwu instruction has several
of the most significant bits set. The problem is that mulhwu(x, tb_to_us)
can never return a value larger than tb_to_us, or x for that matter.

An early decrementer interrupt would make the time jump forward by ~2^32
tb ticks, or closer to 256 seconds with a 16 MHz timebase for example.
Still unacceptable of course, but a _very_ different symptom.
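
The bound on mulhwu can be checked with a portable emulation. This is a
sketch only: it assumes tb_to_us is the microseconds-per-tick ratio scaled
by 2^32, so a 16 MHz timebase gives 2^32/16 = 2^28. TB_TO_US_16MHZ and
ticks_to_usec() are names made up for the illustration.

```c
#include <stdint.h>

/* Assumed scale factor for a 16 MHz timebase: (1/16 us per tick) * 2^32. */
#define TB_TO_US_16MHZ (1u << 28)

/* Portable emulation of the PowerPC mulhwu instruction: the upper 32 bits
 * of the 64-bit unsigned product of two 32-bit operands. */
static uint32_t mulhwu(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Convert a tick count to microseconds using the scaled factor. */
static uint32_t ticks_to_usec(uint32_t ticks, uint32_t tb_to_us)
{
    return mulhwu(ticks, tb_to_us);
}
```

Under that assumption, 16000000 ticks convert to exactly 1000000
microseconds, and even the largest possible tick count, 0xFFFFFFFF,
converts to 2^28 - 1 microseconds, about 268 seconds. The result stays
below tb_to_us, nowhere near the 2^32 microseconds needed for a 4295
second jump.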

That's even more puzzling than the previous hypothesis, and I would
certainly like to know if you can still reproduce it. I suspect now a
problem with lost ticks handling. Actually, this lost tick thing is a bad
implementation, it is the clock maintenance routine in the bottom half
handler that should change the values of the point of reference
(tb_last_stamp), killing more global variable references in gettimeofday.

I have started to write some code to do this with two structures
alternately referenced and a generation counter (this makes a
spinlock-free do_gettimeofday() possible). It should scale better on SMP
of course, but it's not yet in a publishable state :-(
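
One way such a scheme could look is sketched below with C11 atomics. This
is a guess at the general technique (two slots plus a generation counter,
a single writer, readers that retry), not the unpublished code mentioned
above. All names are hypothetical, and kernel code would use its own
barrier primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Snapshot handed back to readers. */
struct time_ref { uint32_t tb_stamp, sec, usec; };

/* Shared storage: two slots, the low bit of generation picks the live one. */
struct slot { _Atomic uint32_t tb_stamp, sec, usec; };

static struct slot slots[2];
static atomic_uint generation;

/* Single writer: fill the inactive slot, then publish it by bumping the
 * generation. The release fence keeps these slot stores from becoming
 * visible before the previous publish. */
static void publish_time_ref(uint32_t tb_stamp, uint32_t sec, uint32_t usec)
{
    unsigned gen = atomic_load_explicit(&generation, memory_order_relaxed);
    struct slot *s = &slots[(gen + 1u) & 1u];

    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->tb_stamp, tb_stamp, memory_order_relaxed);
    atomic_store_explicit(&s->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&s->usec, usec, memory_order_relaxed);
    atomic_store_explicit(&generation, gen + 1u, memory_order_release);
}

/* Reader: copy the live slot without taking a lock, and retry if the
 * generation changed while the copy was in progress. */
static struct time_ref read_time_ref(void)
{
    struct time_ref snap;
    unsigned before, after;

    do {
        before = atomic_load_explicit(&generation, memory_order_acquire);
        struct slot *s = &slots[before & 1u];
        snap.tb_stamp = atomic_load_explicit(&s->tb_stamp, memory_order_relaxed);
        snap.sec = atomic_load_explicit(&s->sec, memory_order_relaxed);
        snap.usec = atomic_load_explicit(&s->usec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&generation, memory_order_relaxed);
    } while (before != after);
    return snap;
}
```

Readers never block the writer. A reader only repeats its copy when a
publish lands in the middle of it, which should be rare if updates happen
once per clock maintenance pass.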

	Regards,
	Gabriel.

* Re: [PATCH] gettimeofday stability
  2001-04-16 11:25     ` Gabriel Paubert
@ 2001-04-19 20:43       ` Samuel Rydh
  2001-04-21 15:21         ` Gabriel Paubert
  0 siblings, 1 reply; 18+ messages in thread
From: Samuel Rydh @ 2001-04-19 20:43 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev

On Mon, Apr 16, 2001 at 01:25:56PM +0200, Gabriel Paubert wrote:
> > Ideally, TB should not be touched. Indeed, MOL can run without
> > touching TB (but DEC is essential). However, TB needs to be
> > modified for the 'save session' feature to work. Basically, the RAM
> > and cpu state of MacOS is flushed to disk. At a later time,
> > MacOS can be restarted instantly. The problem is that MacOS can't
> > deal with the TB skip that occurs if the timebase is not restored
> > (no big surprise there).
>
> By touching the TB, you'll also break all other Linux applications which
> may have a valid use for the TB.

The only noticeable effect is a small clock drift originating from
the loading and restoring of the timebase (and of course, only when
MOL is running). Whatever MOL puts into TB/DEC is completely
invisible to other processes.

Anyway, I'll see if I can locate and patch away the use of
the timebase register in MacOS - that would allow the
save-session feature to work without having to touch the TB.

> BTW: how do you handle multiple MOL sessions ?

Multi-session support was actually added a few days ago -
a matter of making sure the MOL kernel module keeps session
specific data in a single struct, passed as a parameter.

> > Anyway, the negative offset check is desirable even if it is
> > only the DEC that is touched.
>
> No, it is not; it is a textbook case of curing the symptom instead of the
> cause.

Well, what I was looking for was a change to remove the assumption
that the timer code was the sole user of the DEC register. Paul's
change seems to fix that neatly.

Regards,

/Samuel

----------------------------------------------------------
 E-mail <samuel@ibrium.se>  WWW: <http://www.ibrium.se>
  Phone/fax: (home) +46 8 4418431, (work) +46 8 7908470
----------------------------------------------------------

* Re: [PATCH] gettimeofday stability
  2001-04-19 20:43       ` Samuel Rydh
@ 2001-04-21 15:21         ` Gabriel Paubert
  2001-04-21 18:16           ` Samuel Rydh
  0 siblings, 1 reply; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-21 15:21 UTC (permalink / raw)
  To: Samuel Rydh; +Cc: linuxppc-dev

On Thu, 19 Apr 2001, Samuel Rydh wrote:

> > By touching the TB, you'll also break all other Linux applications which
> > may have a valid use for the TB.
>
> The only noticeable effect is a small clock drift originating from
> the loading and restoring of the timebase (and of course, only when
> MOL is running). Whatever MOL puts into TB/DEC is completely
> invisible to other processes.
>
> Anyway, I'll see if I can locate and patch away the use of
> the timebase register in MacOS - that would allow the
> save-session feature to work without having to touch the TB.
>
> > BTW: how do you handle multiple MOL sessions ?
>
> Multi-session support was actually added a few days ago -
> a matter of making sure the MOL kernel module keeps session
> specific data in a single struct, passed as a parameter.

Hmm, I'm not satisfied by the answer: consider the case of an SMP system
in which you have two processors running two instances of MOL which want a
different timebase. Now an interrupt comes in one processor, and this
handler needs a timestamp with do_gettimeofday(), how do you guarantee
that the time stamp does not depend on the processor on which the
interrupt arrives ?

Don't tell me that you fix the TB on each interrupt, please.

> Well, what I was looking for was a change to remove the assumption
> that the timer code was the sole user of the DEC register. Paul's
> change seems to fix that neatly.

Yes, I fully agree with Paul's patch, it is the Right Thing (TM) to do.
This wrong assumption on my part only resulted in a micro-optimization in
the branch structure of the code. Certainly not worth it for the
correctness in case the decrementer is used for other purposes (and
overall robustness if not).

I'm still worried by the reports of 4295 seconds time jump, however. But
as I explained earlier I don't understand how this can be due to the
decrementer interrupt code; basically the timestamp is computed as

	tv_sec = xtime.tv_sec
	tv_usec = xtime.tv_usec+mulhwu(tb_to_us, expression)
	while (tv_usec>=1000000) {tv_usec -=1000000; tv_sec++}
	tv->tv_sec = tv_sec; tv->tv_usec= tv_usec

The only way to iterate 4295 times in the loop is to have tv_usec close to
2**32, but since the upper bound of the result of mulhwu is tb_to_us-1,
which is typically of the order of 2**27-2**28 (and this would require
an extremely large value of delta or of lost_ticks), the only explanation
I have now is that xtime.tv_usec was corrupt. I've been looking at the
clock maintenance code and I don't think it can happen, furthermore it's
generic and shared by (almost?) all architectures so it has been heavily
tested. I'm still looking for the problem, but I'm unable to reproduce it.
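
The argument can be tried out with a small user-space model of the
computation above. It is a sketch, not the kernel code: struct timeval32
and make_timestamp() are made-up names, mulhwu() is emulated in portable C,
and the return value (the number of loop iterations) exists only to make
the bound visible.

```c
#include <stdint.h>

struct timeval32 { uint32_t tv_sec; uint32_t tv_usec; };

/* Portable emulation of PowerPC mulhwu: upper 32 bits of a 32x32 product. */
static uint32_t mulhwu(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Model of the timestamp computation. Returns how many times the
 * normalization loop ran, i.e. how many seconds were carried. */
static unsigned make_timestamp(uint32_t xtime_sec, uint32_t xtime_usec,
                               uint32_t tb_to_us, uint32_t ticks,
                               struct timeval32 *tv)
{
    unsigned carries = 0;
    uint32_t sec = xtime_sec;
    uint32_t usec = xtime_usec + mulhwu(tb_to_us, ticks);

    while (usec >= 1000000u) {
        usec -= 1000000u;
        sec++;
        carries++;
    }
    tv->tv_sec = sec;
    tv->tv_usec = usec;
    return carries;
}
```

With tb_to_us = 2^28 and the largest possible tick count, the loop runs
only 269 times. Getting to 4294 iterations in this model takes a starting
xtime_usec that is already close to 2^32, for example 0xFFFFFFF0 with
zero elapsed ticks.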

	Regards,
	Gabriel.

* Re: [PATCH] gettimeofday stability
  2001-04-21 15:21         ` Gabriel Paubert
@ 2001-04-21 18:16           ` Samuel Rydh
  2001-04-21 19:37             ` Gabriel Paubert
  0 siblings, 1 reply; 18+ messages in thread
From: Samuel Rydh @ 2001-04-21 18:16 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gabriel Paubert

On Sat, Apr 21, 2001 at 05:21:51PM +0200, Gabriel Paubert wrote:
> > > BTW: how do you handle multiple MOL sessions ?
> >
> > Multi-session support was actually added a few days ago -
> > a matter of making sure the MOL kernel module keeps session
> > specific data in a single struct, passed as a parameter.
>
> Hmm, I'm not satisfied by the answer: consider the case of an SMP system
> in which you have two processors running two instances of MOL which want a
> different timebase. Now an interrupt comes in one processor, and this
> handler needs a timestamp with do_gettimeofday(), how do you guarantee
> that the time stamp does not depend on the processor on which the
> interrupt arrives ?
>
> Don't tell me that you fix the TB on each interrupt, please.

The MOL user process runs in two modes, mac-mode and normal mode.
In mac-mode, the MOL module is in full control of the CPU (including
the MMU, DEC and the TB). When an interrupt occurs (or in general,
a non-MOL exception), everything is restored to what linux expects
before the exception is taken.

That is, TB is restored whenever an interrupt occurs in mac-mode.
The TB will lose an estimated 0-2 ticks at each switch
(depending on the exact moment the clock happens to tick).

Currently, MOL does not run on SMP due to certain MMU related
complications. Much has been done here though, and only minor
fixes should be needed in order to get MOL running on
SMP.

In any case, MOL won't touch TB on SMP since that would
desynchronize the timebases which is clearly unacceptable.
Currently this means that the save-session feature will
not be available. But as I said, I'll investigate if it is
possible to locate and patch out all mftb instructions
in MacOS.

Cheers,

/Samuel

----------------------------------------------------------
 E-mail <samuel@ibrium.se>  WWW: <http://www.ibrium.se>
  Phone/fax: (home) +46 8 4418431, (work) +46 8 7908470
----------------------------------------------------------

* Re: [PATCH] gettimeofday stability
  2001-04-21 18:16           ` Samuel Rydh
@ 2001-04-21 19:37             ` Gabriel Paubert
  0 siblings, 0 replies; 18+ messages in thread
From: Gabriel Paubert @ 2001-04-21 19:37 UTC (permalink / raw)
  To: Samuel Rydh; +Cc: linuxppc-dev

On Sat, 21 Apr 2001, Samuel Rydh wrote:

> > Don't tell me that you fix the TB on each interrupt, please.
>
> The MOL user process runs in two modes, mac-mode and normal mode.
> In mac-mode, the MOL module is in full control of the CPU (including
> the MMU, DEC and the TB). When an interrupt occurs (or in general,
> a non-MOL exception), everything is restored to what linux expects
> before the exception is taken.

Ouch !!! Playing with the TB on each interrupt (external/decrementer) and
syscall too (don't forget that filesystems will call do_gettimeofday()
for timestamps) ?

> That is, TB is restored whenever an interrupt occurs in mac-mode.
> The TB will lose an estimated 0-2 ticks at each switch
> (depending on the exact moment the clock happens to tick).

Well, this may mean thousands of times per second, it's bad. Anything which
can cause cumulative errors is not acceptable. It will badly interfere
with NTP for sure, especially if you have bursts of activity in MOL.

So I'd like to suggest a different solution. It will need an additional
(perhaps per-CPU) global variable, which I shall call tb_error. I'll
assume that what you want to do is to bump the TB forward or backwards by
a given 64-bit offset (let us ignore the 601 for now, please :-)).

When interrupts are masked tbl+dec is constant and this fact can be used
to avoid cumulative errors:

1:	mftbl 	y
	mfdec 	x
	mftbl 	z		# low 32 bit of timestamp
	cmpw	y,z
	bne-	1b
	mftbu	y
	mftbl	t
	add	x,x,z
	subf	x,tb_error,x	# Theoretical current sum
	subfc	t,z,t		# This carry manipulation
	addme	y,y		# needs to be checked!
	subfc	z,tb_error,z
	subfze	y,y
# Now y,z is a corrected 64 bit timestamp, and x is a time independent
# constant between decrementer interrupts.
	addc	z,z,offsetl	# Bump the timebase and store it.
	adde	y,y,offsetu
	li	t,0		# This avoids an accidental carry into
	mttbl	t		# the MSW between mttbu and the second
	mttbu	y		# mttbl.
	mttbl	z
# Let us measure again the error
2:	mftbl	z		# The new tbl+dec constant should have
	mfdec	y		# been bumped by offsetl. But we can
	mftbl	t		# measure the error, save it and use it
	cmpw	t,z		# at the next timebase adjustment to
	bne-	2b		# avoid cumulative errors.
	add	y,y,z		# New constant
	subf	x,y,x		# Difference with old constant
	subf	new_tb_error,offsetl,x
# save new_tb_error for the next invocation

I think the mftb+mfdec+mftb loop will always work even on the slowest
processors, because mftb and mfdec are rather fast and it is comparable
to the recommended mftbl/mftbu/mftbl sequence used to read the full
timebase.

Of course the existence of the 601 will complicate the code (better write
a different routine, modulo 1e9 arithmetic will be a nightmare). But that
solution should not have _any_ cumulative errors, which are _evil_ and the
only thing I care about (RTLinux folks might disagree, but then don't mix
RTLinux and MOL :-)).

Warning: untested, I just wrote it as is on the fly and that's assembly
(but with all the carry manipulations and privileged instructions, I have
the feeling that C would be even less readable ;-)).

Possible interface to this routine:

	tb_error = bump_tb(long long offset, int tb_error)

you just have to assign registers according to the standard API :-)

To test it a series of:
	tb_error = bump_tb(offset, tb_error);
	tb_error = bump_tb(-offset, tb_error);

(with interrupts disabled, using a small offset to avoid disrupting the
system) should never give large values for tb_error and the time should
not start drifting away. Actually tb_error should be a small negative
integer (or zero), which will vary depending on whether the i-cache is hot
or not when this code is executed. A small constant offset (well below one
microsecond) is no problem for timekeeping.
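
The expected behaviour of such a test can be previewed with a user-space
simulation. The sketch below is not the assembly above translated to C:
struct sim_clock, advance(), tb_dec_sum() and bump_tb_naive() are
hypothetical names, the `latency` argument models the ticks lost while the
write is in flight, and the sign convention (tb_error = measured sum minus
ideal sum, so lost ticks give a negative value) is an assumption of this
sketch.

```c
#include <stdint.h>

/* Simulated timebase and decrementer (assumption: both tick at one rate). */
struct sim_clock {
    uint64_t tb;   /* 64-bit timebase, counts up */
    uint32_t dec;  /* decrementer, counts down */
};

/* Let n ticks elapse on the simulated hardware. */
static void advance(struct sim_clock *c, uint32_t n)
{
    c->tb += n;
    c->dec -= n;
}

/* tbl + dec (mod 2**32): constant as long as neither register is written. */
static uint32_t tb_dec_sum(const struct sim_clock *c)
{
    return (uint32_t)c->tb + c->dec;
}

/* Naive bump: read the TB, lose `latency` ticks, write TB + offset.
 * Every call permanently loses `latency` ticks. */
static void bump_tb_naive(struct sim_clock *c, int64_t offset, uint32_t latency)
{
    uint64_t now = c->tb;

    advance(c, latency);
    c->tb = now + (uint64_t)offset;
}

/* Compensated bump: undo the previously measured error before adding the
 * offset, then measure the error of this write from the tbl + dec sum.
 * Returns the new tb_error (measured sum minus ideal sum). */
static int32_t bump_tb(struct sim_clock *c, int64_t offset,
                       int32_t tb_error, uint32_t latency)
{
    uint32_t ideal_sum = tb_dec_sum(c) - (uint32_t)tb_error;
    uint64_t now = c->tb - (uint64_t)(int64_t)tb_error;

    advance(c, latency);
    c->tb = now + (uint64_t)offset;
    return (int32_t)(tb_dec_sum(c) - ideal_sum - (uint32_t)offset);
}
```

In this model, alternating bump_tb(offset) and bump_tb(-offset) leaves the
clock behind by only the latency of the most recent write, whereas the
naive version falls behind by the sum of all latencies.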

> Currently, MOL does not run on SMP due to certain MMU related
> complications. Much has been done here though, and only minor
> fixes should be needed in order to get MOL running on
> SMP.
>
> In any case, MOL won't touch TB on SMP since that would
> desynchronize the timebases which is clearly unacceptable.

If the technique I suggest turns out to work (a big if), it should make it
acceptable even on SMP :-)

> Currently this means that the save-session feature will
> not be available. But as I said, I'll investigate if it is
> possible to locate and patch out all mftb instructions
> in MacOS.

Maybe it's not even necessary... And don't forget that some applications
might also use mftb, or libraries, extensions, whatever...

However, I still think that switching the tb at each interrupt may be
overkill: we have control of who uses mftb in the kernel. I'm not 100%
sure, but it is possible that keeping a global (per processor on SMP)
variable which would be the low order 32 bits of (offset+tb_error) and
subtracting it from the mftb result whenever the tb is read would make
modifying the timebase only necessary on context switches.
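
A rough C sketch of that last idea, under the assumption that the kernel
only ever needs 32-bit tick differences: kernel_tbl() and ticks_since()
are hypothetical helpers (the latter loosely modelled on the
tb_ticks_since() call in the original patch), and `adjust` stands for the
low order 32 bits of (offset+tb_error). Because everything is unsigned
32-bit arithmetic, the subtraction keeps working across a wrap of the low
timebase word.

```c
#include <stdint.h>

/* Kernel view of the low timebase word: the raw mftb value minus the
 * adjustment currently applied on behalf of MOL (hypothetical helper). */
static uint32_t kernel_tbl(uint32_t raw_tbl, uint32_t adjust)
{
    return raw_tbl - adjust;
}

/* Ticks elapsed since `stamp`, where `stamp` was itself taken through
 * kernel_tbl(). Modular arithmetic handles the 32-bit wrap. */
static uint32_t ticks_since(uint32_t stamp, uint32_t raw_tbl, uint32_t adjust)
{
    return kernel_tbl(raw_tbl, adjust) - stamp;
}
```

As long as the stamp and the later read use the same adjustment, the
adjustment cancels out of the difference, so the delta seen by the timer
code does not depend on the offset that was applied.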

	Regards,
	Gabriel.

end of thread, other threads:[~2001-04-21 19:37 UTC | newest]

Thread overview: 18+ messages
2001-04-11 19:00 [PATCH] gettimeofday stability Samuel Rydh
2001-04-11 19:42 ` Gabriel Paubert
2001-04-11 20:09   ` Karim Yaghmour
2001-04-11 21:31     ` Benjamin Herrenschmidt
2001-04-12 18:09     ` Gabriel Paubert
2001-04-14  6:49       ` Karim Yaghmour
2001-04-16 11:56         ` Gabriel Paubert
2001-04-16 13:25           ` Benjamin Herrenschmidt
2001-04-16 12:53             ` Gabriel Paubert
2001-04-17 11:22     ` Gabriel Paubert
2001-04-11 23:07   ` Samuel Rydh
2001-04-16 11:25     ` Gabriel Paubert
2001-04-19 20:43       ` Samuel Rydh
2001-04-21 15:21         ` Gabriel Paubert
2001-04-21 18:16           ` Samuel Rydh
2001-04-21 19:37             ` Gabriel Paubert
  -- strict thread matches above, loose matches on Subject: below --
2001-04-16 16:00 Iain Sandoe
2001-04-16 22:19 ` Dan Malek
