* sched_clock
@ 2004-05-26 11:59 Zoltan Menyhart
2004-06-03 22:46 ` sched_clock David Mosberger
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Zoltan Menyhart @ 2004-05-26 11:59 UTC (permalink / raw)
To: linux-ia64
Time can go backward.
At least for the IA64 implementation where "sched_clock()" overflows.
Example:
sched_clock[MFI] mov.m r3=ar.itc
// r3: 0x0000002d19db39e7
sched_clock+0xc addl r10=-2096336,r1;;
sched_clock+0x10[MFI] setf.sig f9=r3
sched_clock+0x20[MMI] ld8 r9=[r10];;
sched_clock+0x26 adds r8=24,r9;;
sched_clock+0x30[MMI] ld8 r2=[r8];;
// r2: 0x0000000050383e5f -- itc MHz: 797.809000
sched_clock+0x36 setf.sig f6=r2;;
sched_clock+0x46 xmpy.l f8=f9,f6;;
sched_clock+0x50[MMI] getf.sig r2=f8;;
// r2: 0x21fd270c8ae86eb9 -- has overflown
sched_clock+0x5c extr.u r8=r2,30,34
// r8: 0x0000000087f49c32 -- *Previously* I got 3fe001a68
Funny results can be obtained in "schedule()". E.g.:
unsigned long run_time;
now = sched_clock();
run_time = now - prev->timestamp;
I do not think it is a good programming solution to abuse the fact that
the variables are unsigned so that, should "sched_clock()" overflow, we would
be saved by the "else" branch.
if (likely(now - prev->timestamp < NS_MAX_SLEEP_AVG))
run_time = now - prev->timestamp;
else
run_time = NS_MAX_SLEEP_AVG;
BTW, it is completely unfair if a task (even one that has run for just a
fraction of a microsecond) is given "NS_MAX_SLEEP_AVG" just because
"sched_clock()" has overflown.
I do not think the comment below can be right (neither what it states nor
how it is used), because the ITC is a free-running counter: it is not restarted
every time a task is scheduled or time-stamped.
/*
* This shift should be large enough to be able to represent 1000000000/itc_freq with good
* accuracy while being small enough to fit 10*1000000000<<IA64_NSEC_PER_CYC_SHIFT in 64 bits
* (this will give enough slack to represent 10 seconds worth of time as a scaled number).
*/
I do not really see why we multiply the value read from the ITC by
"local_cpu_data->nsec_per_cyc" in "sched_clock()".
Why don't we simply count nanoseconds, as the scheduler wants us to?
We should carefully convert the ITC ticks into nanoseconds, doing something like:
unsigned long long sched_clock (void)
{
return ia64_get_itc() * local_cpu_data->mult / local_cpu_data->div;
}
By "carefully" I mean avoiding overflows.
Time stamps should form an ever increasing "chain of time".
Thanks.
Zoltán Menyhárt
* Re: sched_clock
2004-05-26 11:59 sched_clock Zoltan Menyhart
@ 2004-06-03 22:46 ` David Mosberger
2004-06-04 9:43 ` sched_clock Ingo Molnar
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2004-06-03 22:46 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 26 May 2004 13:59:11 +0200, Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org> said:
Zoltan> Time can go backward. At least for the IA64 implementation
Zoltan> where "sched_clock()" overflows.
Yes, you're right, there is an intermediate-result overflow problem in
sched_clock() that I missed. How does the attached patch work for you?
Zoltan> Funny results can be obtained in "schedule()". E.g.:
Zoltan> unsigned long run_time;
Zoltan> now = sched_clock();
Zoltan> run_time = now - prev->timestamp;
Zoltan> I do not think it is a good programming solution to abuse
Zoltan> the fact that the variables are unsigned so that, should
Zoltan> "sched_clock()" overflow, we would be saved by the "else"
Zoltan> branch.
Ingo, please correct me if I'm wrong, but I believe the code in
kernel/sched.c assumes that the cycle-counter will NOT overflow for
all practical purposes. For example, a 64-bit cycle-counter running
at 10GHz would overflow only once every 58 years or so. However,
this assumes that the cycle counter starts at (or near) zero at
boot-time. I don't think there is any such guarantee on ia64 so
perhaps we should reset AR.ITC to zero on the boot-strap processor at
boot-time (or on all CPUs if the cycle-counters are not synchronized).
Ingo, is there something on x86 that guarantees that the cycle-counter
will start out near zero at boot time?
--david
=== arch/ia64/kernel/head.S 1.22 vs edited ===
--- 1.22/arch/ia64/kernel/head.S	Thu May 27 15:44:02 2004
+++ edited/arch/ia64/kernel/head.S Thu Jun 3 14:36:56 2004
@@ -815,6 +815,36 @@
br.ret.sptk.many rp
END(ia64_delay_loop)
+/*
+ * Return a CPU-local timestamp in nanoseconds. This timestamp is NOT synchronized
+ * across CPUs; its return value must never be compared against values returned
+ * on another CPU. The usage in kernel/sched.c ensures that.
+ *
+ * The code below basically calculates:
+ *
+ * (ia64_get_itc() * local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CYC_SHIFT
+ *
+ * except that the multiplication and the shift are done with 128-bit intermediate
+ * precision so that we can produce a full 64-bit result.
+ */
+GLOBAL_ENTRY(sched_clock)
+ addl r8=THIS_CPU(cpu_info) + IA64_CPUINFO_NSEC_PER_CYC_OFFSET,r0
+ mov.m r9=ar.itc // fetch cycle-counter (35 cyc)
+ ;;
+ ldf8 f8=[r8]
+ ;;
+ setf.sig f9=r9 // certain to stall, so issue it _after_ ldf8...
+ ;;
+ xmpy.lu f10=f9,f8 // calculate low 64 bits of 128-bit product (4 cyc)
+ xmpy.hu f11=f9,f8 // calculate high 64 bits of 128-bit product
+ ;;
+ getf.sig r8=f10 // (5 cyc)
+ getf.sig r9=f11
+ ;;
+ shrp r8=r9,r8,IA64_NSEC_PER_CYC_SHIFT
+ br.ret.sptk.many rp
+END(sched_clock)
+
GLOBAL_ENTRY(start_kernel_thread)
.prologue
.save rp, r0 // this is the end of the call-chain
=== arch/ia64/kernel/time.c 1.41 vs edited ===
--- 1.41/arch/ia64/kernel/time.c	Fri May 14 19:00:12 2004
+++ edited/arch/ia64/kernel/time.c Thu Jun 3 14:26:19 2004
@@ -45,14 +45,6 @@
#endif
-unsigned long long
-sched_clock (void)
-{
- unsigned long offset = ia64_get_itc();
-
- return (offset * local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CYC_SHIFT;
-}
-
static void
itc_reset (void)
{
* Re: sched_clock
2004-05-26 11:59 sched_clock Zoltan Menyhart
2004-06-03 22:46 ` sched_clock David Mosberger
@ 2004-06-04 9:43 ` Ingo Molnar
2004-06-04 11:02 ` sched_clock Andi Kleen
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2004-06-04 9:43 UTC (permalink / raw)
To: linux-ia64
On Thu, 3 Jun 2004, David Mosberger wrote:
> Ingo, is there something on x86 that guarantees that the cycle-counter
> will start out near zero at boot time?
it starts at zero, but there's no guarantee as far as I know. Would there
be any reason for it to start at another value?
Ingo
* Re: sched_clock
2004-05-26 11:59 sched_clock Zoltan Menyhart
2004-06-03 22:46 ` sched_clock David Mosberger
2004-06-04 9:43 ` sched_clock Ingo Molnar
@ 2004-06-04 11:02 ` Andi Kleen
2004-06-04 11:11 ` sched_clock Zoltan Menyhart
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2004-06-04 11:02 UTC (permalink / raw)
To: linux-ia64
On Fri, Jun 04, 2004 at 05:43:08AM -0400, Ingo Molnar wrote:
>
> On Thu, 3 Jun 2004, David Mosberger wrote:
>
> > Ingo, is there something on x86 that guarantees that the cycle-counter
> > will start out near zero at boot time?
>
> it starts at zero, but there's no guarantee as far as I know. Would there
> be any reason for it to start at another value?
Software can change it using MSR 1. One reason might be that the easiest way
to synchronize them for multiple CPUs at boot time is to set it to
a high future value. But I don't know of any BIOS that does that.
At one point we considered doing it ourselves on x86-64 to store the
CPU number in the high bits for fast and race-free per-CPU gettimeofday.
But so far this hasn't been done. This won't work on i386 because
older Intel CPUs don't allow writing the full 64 bits.
BTW I had code relying on the TSC not wrapping in the x86-64 kernel
for some time (now fixed), but nothing bad has happened.
-Andi
* Re: sched_clock
2004-05-26 11:59 sched_clock Zoltan Menyhart
` (2 preceding siblings ...)
2004-06-04 11:02 ` sched_clock Andi Kleen
@ 2004-06-04 11:11 ` Zoltan Menyhart
2004-06-04 22:23 ` sched_clock David Mosberger
2004-06-04 22:52 ` sched_clock David Mosberger
5 siblings, 0 replies; 7+ messages in thread
From: Zoltan Menyhart @ 2004-06-04 11:11 UTC (permalink / raw)
To: linux-ia64
David Mosberger wrote:
>
> Yes, you're right, there is an intermediate-result overflow problem in
> sched_clock() that I missed. How does the attached patch work for you?
Thank you, it is O.K.
On the other hand, we might have similar problems in fsys.S. Some comments
like the one below make me worry a bit:
// if now < last_tick, set p7 = 1, p8 = 0
I cannot really say that I fully understand what is in fsys.S, but
shouldn't "now" always come after any earlier time stamp, last_tick included?
Thanks,
Zoltán
* Re: sched_clock
2004-05-26 11:59 sched_clock Zoltan Menyhart
` (3 preceding siblings ...)
2004-06-04 11:11 ` sched_clock Zoltan Menyhart
@ 2004-06-04 22:23 ` David Mosberger
2004-06-04 22:52 ` sched_clock David Mosberger
5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2004-06-04 22:23 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 4 Jun 2004 05:43:08 -0400 (EDT), Ingo Molnar <mingo@redhat.com> said:
Ingo> On Thu, 3 Jun 2004, David Mosberger wrote:
>> Ingo, is there something on x86 that guarantees that the
>> cycle-counter will start out near zero at boot time?
Ingo> it starts at zero, but there's no guarantee as far as I
Ingo> know. Would there be any reason for it to start at another
Ingo> value?
I was thinking that firmware might do some testing of the
cycle-counter and then leave the counter in an indeterminate state.
It's reasonable to expect that the firmware would clear the counter to
zero afterwards, but since it's not required by the specs, I wouldn't
want to _rely_ on it (especially if the resulting errors could be
obscure). I think I'll go ahead and clear the cycle-counter in
cpu_init(). That ought to be safe.
--david
* Re: sched_clock
2004-05-26 11:59 sched_clock Zoltan Menyhart
` (4 preceding siblings ...)
2004-06-04 22:23 ` sched_clock David Mosberger
@ 2004-06-04 22:52 ` David Mosberger
5 siblings, 0 replies; 7+ messages in thread
From: David Mosberger @ 2004-06-04 22:52 UTC (permalink / raw)
To: linux-ia64
>>>>> On Fri, 04 Jun 2004 13:11:26 +0200, Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org> said:
Zoltan> David Mosberger wrote:
>> Yes, you're right, there is an intermediate-result overflow
>> problem in sched_clock() that I missed. How does the attached
>> patch work for you?
Zoltan> Thank you, it is O.K.
Thanks for checking it out.
Zoltan> On the other hand, we might have similar problems in
Zoltan> fsys.S. Some comments like the one below make me worry a
Zoltan> bit:
Zoltan> // if now < last_tick, set p7 = 1, p8 = 0
Zoltan> I cannot really say that I fully understand what is in
Zoltan> fsys.S, but shouldn't "now" always come after any earlier
Zoltan> time stamp, last_tick included?
No, the "<" there is done with 64-bit modular arithmetic so this code
is fine (it's the equivalent of the time_before() macro).
The ia64-specific code is very careful to work correctly even when the
ITC wraps around. sched_clock() is a (platform-independent) exception
and it is OK because, in the worst case, an overflow will lead to a
scheduling hiccup. Basically, the scheduler will simply "think" that
a task slept for NS_MAX_SLEEP_AVG when it may have slept much less
(there may be some other minor hiccups, but that should be pretty much
it, AFAICT).
--david
Thread overview: 7+ messages
2004-05-26 11:59 sched_clock Zoltan Menyhart
2004-06-03 22:46 ` sched_clock David Mosberger
2004-06-04 9:43 ` sched_clock Ingo Molnar
2004-06-04 11:02 ` sched_clock Andi Kleen
2004-06-04 11:11 ` sched_clock Zoltan Menyhart
2004-06-04 22:23 ` sched_clock David Mosberger
2004-06-04 22:52 ` sched_clock David Mosberger