public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
@ 2003-12-18 20:44 John Hawkes
  2003-12-18 22:37 ` john stultz
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: John Hawkes @ 2003-12-18 20:44 UTC (permalink / raw)
  To: linux-ia64, linux-kernel; +Cc: mingo, johnstul

David Mosberger suggests raising this issue on LKML to encourage a search
for a more general solution to my ia64 problem.

My specific problem is that the generic ia64 sched_clock() is broken for
"drifty ITC" (the per-CPU cycle counter clock) platforms, such as the SGI
sn.  sched_clock() currently uses its local CPU's ITC and therefore on
drifty platforms its values are not synchronized across the CPUs.  This
results (in part) in an invalid load_balance() is-the-cache-hot-or-not
calculation.

I proposed fixing this in arch/ia64 by making sched_clock() support an
optional platform-specific implementation.  I insert a quick summary of
that patch here, and append the full patch at the end of this email.  In my
patch the generic ia64 implementation remains the same, and the sn platform-
specific implementation uses the sn's synchronized RTC to return a high-
resolution sched_clock() value.

The generic part of my patch:

+static unsigned long long
+itc_sched_clock (void)
+{
+	return (offset * local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CY
C_SHIFT;
+}
+
+unsigned long long (*ia64_sched_clock)(void) = &itc_sched_clock;
+
 unsigned long long
 sched_clock (void)
 {
-	unsigned long offset = ia64_get_itc();
-
-	return (offset * local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CYC_SHIFT;
+	return ia64_sched_clock();
 }

However, David Mosberger rejected this patch, and he seeks instead some
hypothetical more generic approach to "drifty timebase platforms".  One
possible generic change would be to relax the semantics of sched_clock()
to no longer expect that the values be synchronized across all CPUs.
Some amount of code would need to be added to sched.c to deal with
unsynchronized values: scheduler_tick() remembers a local jiffies-
granularity sched_clock() in the runqueue struct, and load_balance's
can_migrate_task() uses that saved timestamp to compare against the tested
task->timestamp to determine cache-hot-or-not, rather than using the local
CPU's sched_clock() value.  Also, task->timestamp needs to be readjusted
when the task migrates:

diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/kernel/sched.c linux-2.6.0-schedclock2/kernel/sched.c
--- linux-2.6.0/kernel/sched.c	Mon Nov 24 12:18:20 2003
+++ linux-2.6.0-schedclock2/kernel/sched.c	Mon Dec 15 17:13:24 2003
@@ -199,7 +199,7 @@
 struct runqueue {
 	spinlock_t lock;
 	unsigned long nr_running, nr_switches, expired_timestamp,
-			nr_uninterruptible;
+			nr_uninterruptible, timestamp_last_tick;
 	task_t *curr, *idle;
 	struct mm_struct *prev_mm;
 	prio_array_t *active, *expired, arrays[2];
@@ -1135,6 +1135,7 @@
 	set_task_cpu(p, this_cpu);
 	nr_running_inc(this_rq);
 	enqueue_task(p, this_rq->active);
+	p->timestamp = sched_clock() - (src_rq->timestamp_last_tick - p->timestamp);
 	/*
 	 * Note that idle threads have a prio of MAX_PRIO, for this test
 	 * to be always true for them.
@@ -1155,7 +1156,7 @@
 static inline int
 can_migrate_task(task_t *tsk, runqueue_t *rq, int this_cpu, int idle)
 {
-	unsigned long delta = sched_clock() - tsk->timestamp;
+	unsigned long delta = rq->timestamp_last_tick - tsk->timestamp;
 
 	if (!idle && (delta <= JIFFIES_TO_NS(cache_decay_ticks)))
 		return 0;
@@ -1361,6 +1362,8 @@
 	runqueue_t *rq = this_rq();
 	task_t *p = current;
 
+	rq->timestamp_last_tick = sched_clock();
+
 	if (rcu_pending(cpu))
 		rcu_check_callbacks(cpu, user_ticks);
 
@@ -2639,6 +2642,8 @@
 		if (p->prio < rq_dest->curr->prio)
 			resched_task(rq_dest->curr);
 	}
+	p->timestamp = rq_dest->timestamp_last_tick;
+
  out:
 	double_rq_unlock(this_rq(), rq_dest);
 	local_irq_restore(flags);


My personal preference is to keep sched_clock() semantics unchanged (i.e.,
to presume that all CPU values are synchronized), and to bury non-compliant
platform complexity in arch-specific and platform-specific implementations,
rather than add any unnecessary complications to sched.c.

As a reference point, the current i386 2.6.0-test11 algorithm is platform-
specific and is, in my opinion, less clean than the arch/ia64 change I have
proposed:  all i386 NUMA platforms are forced to use the low-resolution
"jiffies" value as the only officially synchronized timebase.  (However,
with my sched.c patch, it is possible that i386 NUMA platforms could use the
unsynchronized higher resolution TSC.)

unsigned long long sched_clock(void)
{
	unsigned long long this_offset;

	/*
	 * In the NUMA case we dont use the TSC as they are not
	 * synchronized across all CPUs.
	 */
#ifndef CONFIG_NUMA
	if (!use_tsc)
#endif
		return (unsigned long long)jiffies * (1000000000 / HZ);

	/* Read the Time Stamp Counter */
	rdtscll(this_offset);

	/* return the value in ns */
	return cycles_2_ns(this_offset);
}

This same "#ifndef CONFIG_NUMA" fallback-to-jiffies approach would work for
ia64, but I don't see it as either visually superior to my platform-specific
vector or as functionally superior, because all NUMA platforms would revert
back to using the low-resolution jiffies as the sched_clock() timebase.  The
2.4 kernel used jiffies, and someone decided that in 2.6 the scheduler
needs a higher-resolution timestamp to better calculate process priority.

An unworkable approach (before anyone suggests it) is to periodically resync
ITC clocks:  an sn platform can theoretically support CPUs that run at
different clock speeds, and thus these ITCs can never be synchronized.

Comments?

John Hawkes


Here is the full ia64 patch that Mosberger rejected:

diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/arch/ia64/kernel/time.c linux-2.6.0-schedclock/arch/ia64/kernel/time.c
--- linux-2.6.0/arch/ia64/kernel/time.c	Mon Oct 20 12:48:11 2003
+++ linux-2.6.0-schedclock/arch/ia64/kernel/time.c	Tue Nov 18 17:28:31 2003
@@ -42,12 +42,18 @@
 
 #endif
 
+static unsigned long long
+itc_sched_clock (void)
+{
+	return (offset * local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CY
C_SHIFT;
+}
+
+unsigned long long (*ia64_sched_clock)(void) = &itc_sched_clock;
+
 unsigned long long
 sched_clock (void)
 {
-	unsigned long offset = ia64_get_itc();
-
-	return (offset * local_cpu_data->nsec_per_cyc) >> IA64_NSEC_PER_CYC_SHIFT;
+	return ia64_sched_clock();
 }
 
 static void
diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/arch/ia64/sn/kernel/sn2/timer.c linux-2.6.0-schedclock/arch/ia64/sn/kernel/sn2/timer.c
--- linux-2.6.0/arch/ia64/sn/kernel/sn2/timer.c	Mon Jul 14 11:51:29 2003
+++ linux-2.6.0-schedclock/arch/ia64/sn/kernel/sn2/timer.c	Wed Nov 19 10:00:43 2003
@@ -22,6 +22,8 @@
 extern unsigned long sn_rtc_cycles_per_second;
 static volatile unsigned long last_wall_rtc;
 
+extern unsigned long long (*ia64_sched_clock)(void);
+
 static unsigned long rtc_offset;	/* updated only when xtime write-lock is held! */
 static long rtc_nsecs_per_cycle;
 static long rtc_per_timer_tick;
@@ -55,6 +57,12 @@
 	last_wall_rtc = GET_RTC_COUNTER();
 }
 
+static unsigned long long
+sn_sched_clock(void)
+{
+	return GET_RTC_COUNTER() * rtc_nsecs_per_cycle;
+}
+
 
 static struct time_interpolator sn2_interpolator = {
 	.get_offset =	getoffset,
@@ -73,4 +81,6 @@
 	rtc_nsecs_per_cycle = 1000000000 / sn_rtc_cycles_per_second;
 
 	last_wall_rtc = GET_RTC_COUNTER();
+
+	ia64_sched_clock = &sn_sched_clock;
 }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-18 20:44 [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC" John Hawkes
@ 2003-12-18 22:37 ` john stultz
  2003-12-20 10:41 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: john stultz @ 2003-12-18 22:37 UTC (permalink / raw)
  To: John Hawkes; +Cc: ia64, lkml, Ingo Molnar

On Thu, 2003-12-18 at 12:44, John Hawkes wrote:
> David Mosberger suggests raising this issue on LKML to encourage a search
> for a more general solution to my ia64 problem.
> 
> My specific problem is that the generic ia64 sched_clock() is broken for
> "drifty ITC" (the per-CPU cycle counter clock) platforms, such as the SGI
> sn.  sched_clock() currently uses its local CPU's ITC and therefore on
> drifty platforms its values are not synchronized across the CPUs.  This
> results (in part) in an invalid load_balance() is-the-cache-hot-or-not
> calculation.

[snip]

> My personal preference is to keep sched_clock() semantics unchanged (i.e.,
> to presume that all CPU values are synchronized), and to bury non-compliant
> platform complexity in arch-specific and platform-specific implementations,
> rather than add any unnecessary complications to sched.c.
> 
> As a reference point, the current i386 2.6.0-test11 algorithm is platform-
> specific and is, in my opinion, less clean than the arch/ia64 change I have
> proposed:  all i386 NUMA platforms are forced to use the low-resolution
> "jiffies" value as the only officially synchronized timebase.  (However,
> with my sched.c patch, it is possible that i386 NUMA platforms could use the
> unsynchronized higher resolution TSC.)

I've tried switching sched_clock() to call monotonic_clock() on i386
(both functions do basically the same thing, only monotonic_clock
redirects to the system's chosen time source). However,the consensus was
that the access time to read the systems non-"drifty" time source can be
too expensive to make it worth while. 

I'll concede that I haven't done enough testing to measure it in any
great detail, but the interest in my patch was very very low. So I'd be
interested in any measurements you have to the contrary.

Personally, I'd keep your changes arch specific for 2.6. When 2.7 opens
I'd be very interested in ideas and input for broader change in the time
subsystem.

thanks
-john


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-18 20:44 [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC" John Hawkes
  2003-12-18 22:37 ` john stultz
@ 2003-12-20 10:41 ` Andrew Morton
  2003-12-20 10:50 ` Ingo Molnar
  2003-12-29 18:51 ` John Hawkes
  3 siblings, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2003-12-20 10:41 UTC (permalink / raw)
  To: John Hawkes; +Cc: linux-ia64, linux-kernel, mingo, johnstul

John Hawkes <hawkes@babylon.engr.sgi.com> wrote:
>
> David Mosberger suggests raising this issue on LKML to encourage a search
>  for a more general solution to my ia64 problem.
> 
>  My specific problem is that the generic ia64 sched_clock() is broken for
>  "drifty ITC" (the per-CPU cycle counter clock) platforms, such as the SGI
>  sn.  sched_clock() currently uses its local CPU's ITC and therefore on
>  drifty platforms its values are not synchronized across the CPUs.  This
>  results (in part) in an invalid load_balance() is-the-cache-hot-or-not
>  calculation.

Requiring that sched_clock() be synchronised is difficult for some
platforms.  Clearly, it is better if we can relax that.

> However, David Mosberger rejected this patch, and he seeks instead some
> hypothetical more generic approach to "drifty timebase platforms".  One
> possible generic change would be to relax the semantics of sched_clock() to
> no longer expect that the values be synchronized across all CPUs.

Your patch to kernel/sched.c looks good: low overhead, simple, Ingo likes
it.

Could you please finalise it, cook up the ia64 and numaq implementations
and send it over?



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-18 20:44 [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC" John Hawkes
  2003-12-18 22:37 ` john stultz
  2003-12-20 10:41 ` Andrew Morton
@ 2003-12-20 10:50 ` Ingo Molnar
  2003-12-20 14:57   ` Nick Piggin
  2003-12-29 18:51 ` John Hawkes
  3 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2003-12-20 10:50 UTC (permalink / raw)
  To: John Hawkes; +Cc: linux-ia64, linux-kernel, johnstul, Andrew Morton

* John Hawkes <hawkes@babylon.engr.sgi.com> wrote:

> Some amount of code would need to be added to sched.c to deal with
> unsynchronized values: scheduler_tick() remembers a local jiffies-
> granularity sched_clock() in the runqueue struct, and load_balance's
> can_migrate_task() uses that saved timestamp to compare against the
> tested task->timestamp to determine cache-hot-or-not, rather than
> using the local CPU's sched_clock() value.  Also, task->timestamp
> needs to be readjusted when the task migrates:

this is a tough problem that wont go away.

Even platforms where the per-CPU clock is supposed to be synchronized,
sometimes it isnt. (this is a recurring problem on x86 SMP - so x86 will
benefit from it too.)

the relaxation means the effective granularity reduction of the
migration decisions - but this is not a problem, migration latencies are
always a high multiple of the timer irq frequency. The cycle accuracy of
sched_clock() is otherwise very important for correct interactivity
decisions - but this is only used locally and is thus preserved by the
patch. The only area where this change can reduce the quality of
interactivity estimatio is when a task oscillates very quickly between
multiple CPUs and is also somehow relevant to interactivity.

So i believe the generic relaxing of sched_clock() synchronization is
the right thing to do. I like your patch. It adds minimal overhead and
solves a hard problem - nice work! Andrew, please apply it.

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-20 10:50 ` Ingo Molnar
@ 2003-12-20 14:57   ` Nick Piggin
  2003-12-20 15:05     ` Andrew Morton
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2003-12-20 14:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: John Hawkes, linux-ia64, linux-kernel, johnstul, Andrew Morton



Ingo Molnar wrote:

>
>So i believe the generic relaxing of sched_clock() synchronization is
>the right thing to do. I like your patch. It adds minimal overhead and
>solves a hard problem - nice work! Andrew, please apply it.
>
>

Its a great looking patch if you must have high res sched_clock. So
I guess I agree with it.

Can we have a scheduler day when Andrew is ready to take patches for
it? I have a few small changes that I'd like to get merged soon too
(not sched domains - that should probably go to the mm tree for a while)

Relevant patches are
sched-ctx-count-preempt.patch
sched-fork-cleanup.patch
sched-migrate-comment.patch
sched-style.patch
sched-migrate-affinity-race.patch




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-20 14:57   ` Nick Piggin
@ 2003-12-20 15:05     ` Andrew Morton
  2003-12-20 15:12       ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2003-12-20 15:05 UTC (permalink / raw)
  To: Nick Piggin; +Cc: mingo, hawkes, linux-ia64, linux-kernel, johnstul

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> 
> 
> Ingo Molnar wrote:
> 
> >
> >So i believe the generic relaxing of sched_clock() synchronization is
> >the right thing to do. I like your patch. It adds minimal overhead and
> >solves a hard problem - nice work! Andrew, please apply it.
> >
> >
> 
> Its a great looking patch if you must have high res sched_clock. So
> I guess I agree with it.

miaow ;)

> Can we have a scheduler day when Andrew is ready to take patches for
> it? I have a few small changes that I'd like to get merged soon too
> (not sched domains - that should probably go to the mm tree for a while)
> 
> Relevant patches are
> sched-ctx-count-preempt.patch
> sched-fork-cleanup.patch
> sched-migrate-comment.patch
> sched-style.patch
> sched-migrate-affinity-race.patch
> 

Post 'em.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-20 15:05     ` Andrew Morton
@ 2003-12-20 15:12       ` Nick Piggin
  2003-12-20 16:36         ` Ingo Molnar
  2003-12-20 21:41         ` Zwane Mwaikambo
  0 siblings, 2 replies; 12+ messages in thread
From: Nick Piggin @ 2003-12-20 15:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mingo, hawkes, linux-ia64, linux-kernel, johnstul



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>
>>Ingo Molnar wrote:
>>
>>
>>>So i believe the generic relaxing of sched_clock() synchronization is
>>>the right thing to do. I like your patch. It adds minimal overhead and
>>>solves a hard problem - nice work! Andrew, please apply it.
>>>
>>>
>>>
>>Its a great looking patch if you must have high res sched_clock. So
>>I guess I agree with it.
>>
>
>miaow ;)
>

I'm just thinking that computers with unsynched clocks have less
need for good interactivity, but thats probably too narrow and x86
a view anyway.

>
>>Can we have a scheduler day when Andrew is ready to take patches for
>>it? I have a few small changes that I'd like to get merged soon too
>>(not sched domains - that should probably go to the mm tree for a while)
>>
>>Relevant patches are
>>sched-ctx-count-preempt.patch
>>sched-fork-cleanup.patch
>>sched-migrate-comment.patch
>>sched-style.patch
>>sched-migrate-affinity-race.patch
>>
>>
>
>Post 'em.
>

OK, I'll trim the cc list though...



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-20 15:12       ` Nick Piggin
@ 2003-12-20 16:36         ` Ingo Molnar
  2003-12-20 21:41         ` Zwane Mwaikambo
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2003-12-20 16:36 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, hawkes, linux-ia64, linux-kernel, johnstul


* Nick Piggin <piggin@cyberone.com.au> wrote:

> I'm just thinking that computers with unsynched clocks have less need
> for good interactivity, but thats probably too narrow and x86 a view
> anyway.

it's also a matter of predictable behavior. The scheduler should not
behave differently just because there's no synchronized clock. 
Behavioral forks like that tend to come back and cause trouble later.

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-20 15:12       ` Nick Piggin
  2003-12-20 16:36         ` Ingo Molnar
@ 2003-12-20 21:41         ` Zwane Mwaikambo
  1 sibling, 0 replies; 12+ messages in thread
From: Zwane Mwaikambo @ 2003-12-20 21:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, mingo, hawkes, linux-ia64, linux-kernel, johnstul

On Sun, 21 Dec 2003, Nick Piggin wrote:

> Andrew Morton wrote:
>
> >miaow ;)

Andrew what are they feeding you these days? ;)

> I'm just thinking that computers with unsynched clocks have less
> need for good interactivity, but thats probably too narrow and x86
> a view anyway.

x86 and TSCs don't do that great a job either ;)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-18 20:44 [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC" John Hawkes
                   ` (2 preceding siblings ...)
  2003-12-20 10:50 ` Ingo Molnar
@ 2003-12-29 18:51 ` John Hawkes
  2003-12-29 19:32   ` [Lse-tech] " Martin J. Bligh
  2003-12-29 20:16   ` John Hawkes
  3 siblings, 2 replies; 12+ messages in thread
From: John Hawkes @ 2003-12-29 18:51 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lse-tech, mingo, linux-kernel

> Could you please finalise it, cook up the ia64 and numaq implementations
> and send it over?

I believe the "ia64 implementation" stands as-is, since it uses the low-
overhead ITC.

I'm not familiar with NUMAQ issues, but perhaps this timer_tsc.c change
would be appropriate?  It allows i386 CONFIG_NUMA platforms to potentially
use the TSC for sched_clock() timings, given that sched_clock() no longer
requires that the TSC be synchronized across all CPUs.  It does, however,
require that "use_tsc" be properly initialized for i386 CONFIG_NUMA.  Is
that a valid assumption?

Also, I have added a comment to sched.h to make note of the change
in sched_clock() semantics.

John Hawkes


diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/arch/i386/kernel/timers/timer_tsc.c linux-2.6.0-schedclock2/arch/i386/kernel/timers/timer_tsc.c
--- linux-2.6.0/arch/i386/kernel/timers/timer_tsc.c	Mon Nov 24 12:18:20 2003
+++ linux-2.6.0-schedclock2/arch/i386/kernel/timers/timer_tsc.c	Sat Dec 13 11:33:04 2003
@@ -138,9 +138,7 @@
 	 * In the NUMA case we dont use the TSC as they are not
 	 * synchronized across all CPUs.
 	 */
-#ifndef CONFIG_NUMA
 	if (!use_tsc)
-#endif
 		return (unsigned long long)jiffies * (1000000000 / HZ);
 
 	/* Read the Time Stamp Counter */
diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/include/linux/sched.h linux-2.6.0-schedclock2/include/linux/sched.h
--- linux-2.6.0/include/linux/sched.h	Mon Nov 24 12:18:20 2003
+++ linux-2.6.0-schedclock2/include/linux/sched.h	Mon Dec 29 10:47:55 2003
@@ -510,6 +510,7 @@
 }
 #endif
 
+/* nanosecond granularity, not necessarily synchronized across all CPUs */
 extern unsigned long long sched_clock(void);
 
 #ifdef CONFIG_NUMA
diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/kernel/sched.c linux-2.6.0-schedclock2/kernel/sched.c
--- linux-2.6.0/kernel/sched.c	Mon Nov 24 12:18:20 2003
+++ linux-2.6.0-schedclock2/kernel/sched.c	Mon Dec 15 17:13:24 2003
@@ -199,7 +199,7 @@
 struct runqueue {
 	spinlock_t lock;
 	unsigned long nr_running, nr_switches, expired_timestamp,
-			nr_uninterruptible;
+			nr_uninterruptible, timestamp_last_tick;
 	task_t *curr, *idle;
 	struct mm_struct *prev_mm;
 	prio_array_t *active, *expired, arrays[2];
@@ -1135,6 +1135,7 @@
 	set_task_cpu(p, this_cpu);
 	nr_running_inc(this_rq);
 	enqueue_task(p, this_rq->active);
+	p->timestamp = sched_clock() - (src_rq->timestamp_last_tick - p->timestamp);
 	/*
 	 * Note that idle threads have a prio of MAX_PRIO, for this test
 	 * to be always true for them.
@@ -1155,7 +1156,7 @@
 static inline int
 can_migrate_task(task_t *tsk, runqueue_t *rq, int this_cpu, int idle)
 {
-	unsigned long delta = sched_clock() - tsk->timestamp;
+	unsigned long delta = rq->timestamp_last_tick - tsk->timestamp;
 
 	if (!idle && (delta <= JIFFIES_TO_NS(cache_decay_ticks)))
 		return 0;
@@ -1361,6 +1362,8 @@
 	runqueue_t *rq = this_rq();
 	task_t *p = current;
 
+	rq->timestamp_last_tick = sched_clock();
+
 	if (rcu_pending(cpu))
 		rcu_check_callbacks(cpu, user_ticks);
 
@@ -2639,6 +2642,8 @@
 		if (p->prio < rq_dest->curr->prio)
 			resched_task(rq_dest->curr);
 	}
+	p->timestamp = rq_dest->timestamp_last_tick;
+
  out:
 	double_rq_unlock(this_rq(), rq_dest);
 	local_irq_restore(flags);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Lse-tech] Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-29 18:51 ` John Hawkes
@ 2003-12-29 19:32   ` Martin J. Bligh
  2003-12-29 20:16   ` John Hawkes
  1 sibling, 0 replies; 12+ messages in thread
From: Martin J. Bligh @ 2003-12-29 19:32 UTC (permalink / raw)
  To: John Hawkes, Andrew Morton; +Cc: lse-tech, mingo, linux-kernel, johnstul

>> Could you please finalise it, cook up the ia64 and numaq implementations
>> and send it over?
> 
> I believe the "ia64 implementation" stands as-is, since it uses the low-
> overhead ITC.
> 
> I'm not familiar with NUMAQ issues, but perhaps this timer_tsc.c change
> would be appropriate?  It allows i386 CONFIG_NUMA platforms to potentially
> use the TSC for sched_clock() timings, given that sched_clock() no longer
> requires that the TSC be synchronized across all CPUs.  It does, however,
> require that "use_tsc" be properly initialized for i386 CONFIG_NUMA.  Is
> that a valid assumption?

I don't think there's anything special about use_tsc for NUMA-Q - it's
set to 1 AFAICS. I had a patch to force notsc on about 6 months or so
ago, but it made the machine crash hard in strange ways I never bothered
debugging. 

Is there any harm in dropping the first part of your patch, ie.

> diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/arch/i386/kernel/timers/timer_tsc.c linux-2.6.0-schedclock2/arch/i386/kernel/timers/timer_tsc.c
> --- linux-2.6.0/arch/i386/kernel/timers/timer_tsc.c	Mon Nov 24 12:18:20 2003
> +++ linux-2.6.0-schedclock2/arch/i386/kernel/timers/timer_tsc.c	Sat Dec 13 11:33:04 2003
> @@ -138,9 +138,7 @@
>  	 * In the NUMA case we dont use the TSC as they are not
>  	 * synchronized across all CPUs.
>  	 */
> -#ifndef CONFIG_NUMA
>  	if (!use_tsc)
> -#endif
>  		return (unsigned long long)jiffies * (1000000000 / HZ);
>  
>  	/* Read the Time Stamp Counter */

and leaving the rest of it? the CONFIG_NUMA affects both NUMA-Q and Summit,
BTW, which uses the cyclone timer, so it gets even more complex ;-)

M.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Lse-tech] Re: [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC"
  2003-12-29 18:51 ` John Hawkes
  2003-12-29 19:32   ` [Lse-tech] " Martin J. Bligh
@ 2003-12-29 20:16   ` John Hawkes
  1 sibling, 0 replies; 12+ messages in thread
From: John Hawkes @ 2003-12-29 20:16 UTC (permalink / raw)
  To: Andrew Morton, John Hawkes, Martin J. Bligh
  Cc: johnstul, linux-kernel, mingo, lse-tech

> [ Martin Bligh writes: ]
> Is there any harm in dropping the first part of your patch, ie.
...
> > diff -X /home/hawkes/Patches/ignore.dirs -Naur linux-2.6.0/arch/i386/kernel/timers/timer_tsc.c linux-2.6.0-schedclock2/arch/i386/kernel/timers/timer_tsc.c
...
> and leaving the rest of it? the CONFIG_NUMA affects both NUMA-Q and Summit,
> BTW, which uses the cyclone timer, so it gets even more complex ;-)

No, no harm.  Leaving out that timer_tsc.c part means only that i386
CONFIG_NUMA continues to use "jiffies" as the timebase, which is a
low-resolution timebase and may affect the quality of some of the
interactive scheduling decisions.

John Hawkes

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2003-12-29 20:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-12-18 20:44 [RFC][PATCH] 2.6.0-test11 sched_clock() broken for "drifty ITC" John Hawkes
2003-12-18 22:37 ` john stultz
2003-12-20 10:41 ` Andrew Morton
2003-12-20 10:50 ` Ingo Molnar
2003-12-20 14:57   ` Nick Piggin
2003-12-20 15:05     ` Andrew Morton
2003-12-20 15:12       ` Nick Piggin
2003-12-20 16:36         ` Ingo Molnar
2003-12-20 21:41         ` Zwane Mwaikambo
2003-12-29 18:51 ` John Hawkes
2003-12-29 19:32   ` [Lse-tech] " Martin J. Bligh
2003-12-29 20:16   ` John Hawkes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox