public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* bug in sched.c:activate_task()
@ 2004-10-05  2:16 Chen, Kenneth W
  2004-10-05  2:34 ` Con Kolivas
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Chen, Kenneth W @ 2004-10-05  2:16 UTC (permalink / raw)
  To: linux-kernel

Update p->timestamp to "now" in activate_task() doesn't look right
to me at all.  p->timestamp records last time it was running on a
cpu.  activate_task shouldn't update that variable when it queues
a task on the runqueue.

This bug (and combined with others) triggers improper load balancing.

Patch against linux-2.6.9-rc3.  Didn't diff it against 2.6.9-rc3-mm2
because mm tree has so many change in sched.c.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>


--- linux-2.6.9-rc3/kernel/sched.c.orig	2004-10-04 19:11:21.000000000 -0700
+++ linux-2.6.9-rc3/kernel/sched.c	2004-10-04 19:11:35.000000000 -0700
@@ -888,7 +888,6 @@ static void activate_task(task_t *p, run
 			p->activated = 1;
 		}
 	}
-	p->timestamp = now;

 	__activate_task(p, rq);
 }



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  2:16 bug in sched.c:activate_task() Chen, Kenneth W
@ 2004-10-05  2:34 ` Con Kolivas
  2004-10-05  3:14   ` Nick Piggin
  2004-10-05  5:05   ` Peter Williams
  2004-10-05  6:10 ` Nick Piggin
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread
From: Con Kolivas @ 2004-10-05  2:34 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: linux-kernel

Chen, Kenneth W writes:

> Update p->timestamp to "now" in activate_task() doesn't look right
> to me at all.  p->timestamp records last time it was running on a
> cpu.  activate_task shouldn't update that variable when it queues
> a task on the runqueue.
> 
> This bug (and combined with others) triggers improper load balancing.

The updated timestamp was placed there by Ingo to detect on-runqueue time. 
If it is being used for load balancing then it is being used in error.

Con


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  2:34 ` Con Kolivas
@ 2004-10-05  3:14   ` Nick Piggin
  2004-10-05  4:45     ` Con Kolivas
  2004-10-05  5:05   ` Peter Williams
  1 sibling, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2004-10-05  3:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Chen, Kenneth W, linux-kernel

Con Kolivas wrote:

> Chen, Kenneth W writes:
>
>> Update p->timestamp to "now" in activate_task() doesn't look right
>> to me at all.  p->timestamp records last time it was running on a
>> cpu.  activate_task shouldn't update that variable when it queues
>> a task on the runqueue.
>>
>> This bug (and combined with others) triggers improper load balancing.
>
>
> The updated timestamp was placed there by Ingo to detect on-runqueue 
> time. If it is being used for load balancing then it is being used in 
> error.
>

Load balancing wants to know if a task is considered cache hot.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  3:14   ` Nick Piggin
@ 2004-10-05  4:45     ` Con Kolivas
  0 siblings, 0 replies; 18+ messages in thread
From: Con Kolivas @ 2004-10-05  4:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Con Kolivas, Chen, Kenneth W, linux-kernel

Nick Piggin writes:

> Con Kolivas wrote:
> 
>> Chen, Kenneth W writes:
>>
>>> Update p->timestamp to "now" in activate_task() doesn't look right
>>> to me at all.  p->timestamp records last time it was running on a
>>> cpu.  activate_task shouldn't update that variable when it queues
>>> a task on the runqueue.
>>>
>>> This bug (and combined with others) triggers improper load balancing.
>>
>>
>> The updated timestamp was placed there by Ingo to detect on-runqueue 
>> time. If it is being used for load balancing then it is being used in 
>> error.
>>
> 
> Load balancing wants to know if a task is considered cache hot.

Yes I know. It used to be performed based on jiffies which was adequate 
resolution for cache warmth at the time. The timestamp was being used for on 
runqueue length measurement before the load balancing was modified to use 
that value.

Con


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  2:34 ` Con Kolivas
  2004-10-05  3:14   ` Nick Piggin
@ 2004-10-05  5:05   ` Peter Williams
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Williams @ 2004-10-05  5:05 UTC (permalink / raw)
  To: Kenneth W Chen; +Cc: Con Kolivas, Kenneth W, linux-kernel

Con Kolivas wrote:
> Chen, Kenneth W writes:
> 
>> Update p->timestamp to "now" in activate_task() doesn't look right
>> to me at all.  p->timestamp records last time it was running on a
>> cpu.  activate_task shouldn't update that variable when it queues
>> a task on the runqueue.
>>
>> This bug (and combined with others) triggers improper load balancing.
> 
> 
> The updated timestamp was placed there by Ingo to detect on-runqueue 
> time. If it is being used for load balancing then it is being used in 
> error.

The ZAPHOD scheduler (being trialled in 2.6.9-rc3-mm2) uses its own time 
stamp so as not to interfere with the use of timestamp by load balancing 
code so it should be OK to delete this line form activate_task() in 
2.6.9-rc3-mm2 without effecting ZAPHOD.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  2:16 bug in sched.c:activate_task() Chen, Kenneth W
  2004-10-05  2:34 ` Con Kolivas
@ 2004-10-05  6:10 ` Nick Piggin
  2004-10-05  6:31 ` Ingo Molnar
  2004-10-05  8:19 ` Ingo Molnar
  3 siblings, 0 replies; 18+ messages in thread
From: Nick Piggin @ 2004-10-05  6:10 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: linux-kernel

Chen, Kenneth W wrote:
> Update p->timestamp to "now" in activate_task() doesn't look right
> to me at all.  p->timestamp records last time it was running on a
> cpu.  activate_task shouldn't update that variable when it queues
> a task on the runqueue.
> 
> This bug (and combined with others) triggers improper load balancing.
> 
> Patch against linux-2.6.9-rc3.  Didn't diff it against 2.6.9-rc3-mm2
> because mm tree has so many change in sched.c.
> 
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
> 

Actually, now that I have the code in front of me, I was wrong
and this patch is right.

This timestamp is never used for anything, so the assignment is
pointless.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  2:16 bug in sched.c:activate_task() Chen, Kenneth W
  2004-10-05  2:34 ` Con Kolivas
  2004-10-05  6:10 ` Nick Piggin
@ 2004-10-05  6:31 ` Ingo Molnar
  2004-10-05  6:36   ` Con Kolivas
  2004-10-05  8:19 ` Ingo Molnar
  3 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2004-10-05  6:31 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: linux-kernel, Andrew Morton


On Mon, 4 Oct 2004, Chen, Kenneth W wrote:

> Update p->timestamp to "now" in activate_task() doesn't look right to me
> at all.  p->timestamp records last time it was running on a cpu.  
> activate_task shouldn't update that variable when it queues a task on
> the runqueue.

correct, we are overriding it in schedule():

        if (likely(prev != next)) {
                next->timestamp = now;
                rq->nr_switches++;

the line your patch removes is a remnant of an earlier logic when we
timestamped tasks when they touched the runqueue. (vs. timestamping when
they actually run on a CPU.) So the patch looks good to me. Andrew, please
apply.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

        Ingo


> 
> This bug (and combined with others) triggers improper load balancing.
> 
> Patch against linux-2.6.9-rc3.  Didn't diff it against 2.6.9-rc3-mm2
> because mm tree has so many change in sched.c.
> 
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
> 
> 
> --- linux-2.6.9-rc3/kernel/sched.c.orig	2004-10-04 19:11:21.000000000 -0700
> +++ linux-2.6.9-rc3/kernel/sched.c	2004-10-04 19:11:35.000000000 -0700
> @@ -888,7 +888,6 @@ static void activate_task(task_t *p, run
>  			p->activated = 1;
>  		}
>  	}
> -	p->timestamp = now;
> 
>  	__activate_task(p, rq);
>  }
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  6:31 ` Ingo Molnar
@ 2004-10-05  6:36   ` Con Kolivas
  2004-10-05  6:54     ` Ingo Molnar
  2004-10-05  6:57     ` Nick Piggin
  0 siblings, 2 replies; 18+ messages in thread
From: Con Kolivas @ 2004-10-05  6:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Chen, Kenneth W, linux-kernel, Andrew Morton

Ingo Molnar writes:

> 
> On Mon, 4 Oct 2004, Chen, Kenneth W wrote:
> 
>> Update p->timestamp to "now" in activate_task() doesn't look right to me
>> at all.  p->timestamp records last time it was running on a cpu.  
>> activate_task shouldn't update that variable when it queues a task on
>> the runqueue.
> 
> correct, we are overriding it in schedule():
> 
>         if (likely(prev != next)) {
>                 next->timestamp = now;
>                 rq->nr_switches++;
> 
> the line your patch removes is a remnant of an earlier logic when we
> timestamped tasks when they touched the runqueue. (vs. timestamping when
> they actually run on a CPU.) So the patch looks good to me. Andrew, please
> apply.

	unsigned long long delta = now - next->timestamp;

	if (next->activated == 1)
		delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;

is in schedule() before we update the timestamp, no?

Con


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  6:36   ` Con Kolivas
@ 2004-10-05  6:54     ` Ingo Molnar
  2004-10-05  6:59       ` Con Kolivas
  2004-10-05 17:25       ` Chen, Kenneth W
  2004-10-05  6:57     ` Nick Piggin
  1 sibling, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2004-10-05  6:54 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Chen, Kenneth W, linux-kernel, Andrew Morton


On Tue, 5 Oct 2004, Con Kolivas wrote:

> 	unsigned long long delta = now - next->timestamp;
> 
> 	if (next->activated == 1)
> 		delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
> 
> is in schedule() before we update the timestamp, no?

indeed ... so the patch is just random incorrect damage that happened to
distrub the scheduler fixing some balancing problem. Kenneth, what
precisely is the balancing problem you are seeing?

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  6:36   ` Con Kolivas
  2004-10-05  6:54     ` Ingo Molnar
@ 2004-10-05  6:57     ` Nick Piggin
  1 sibling, 0 replies; 18+ messages in thread
From: Nick Piggin @ 2004-10-05  6:57 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, Chen, Kenneth W, linux-kernel, Andrew Morton

Con Kolivas wrote:
> Ingo Molnar writes:
> 
>>
>> On Mon, 4 Oct 2004, Chen, Kenneth W wrote:
>>
>>> Update p->timestamp to "now" in activate_task() doesn't look right to me
>>> at all.  p->timestamp records last time it was running on a cpu.  
>>> activate_task shouldn't update that variable when it queues a task on
>>> the runqueue.
>>
>>
>> correct, we are overriding it in schedule():
>>
>>         if (likely(prev != next)) {
>>                 next->timestamp = now;
>>                 rq->nr_switches++;
>>
>> the line your patch removes is a remnant of an earlier logic when we
>> timestamped tasks when they touched the runqueue. (vs. timestamping when
>> they actually run on a CPU.) So the patch looks good to me. Andrew, 
>> please
>> apply.
> 
> 
>     unsigned long long delta = now - next->timestamp;
> 
>     if (next->activated == 1)
>         delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
> 
> is in schedule() before we update the timestamp, no?
> 

Yeah right, unfortunately.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  6:54     ` Ingo Molnar
@ 2004-10-05  6:59       ` Con Kolivas
  2004-10-05  7:08         ` Nick Piggin
  2004-10-05  7:09         ` Ingo Molnar
  2004-10-05 17:25       ` Chen, Kenneth W
  1 sibling, 2 replies; 18+ messages in thread
From: Con Kolivas @ 2004-10-05  6:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Con Kolivas, Chen, Kenneth W, linux-kernel, Andrew Morton

Ingo Molnar writes:

> 
> On Tue, 5 Oct 2004, Con Kolivas wrote:
> 
>> 	unsigned long long delta = now - next->timestamp;
>> 
>> 	if (next->activated == 1)
>> 		delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
>> 
>> is in schedule() before we update the timestamp, no?
> 
> indeed ... so the patch is just random incorrect damage that happened to
> distrub the scheduler fixing some balancing problem. Kenneth, what
> precisely is the balancing problem you are seeing?

We used to compare jiffy difference in can_migrate_task by comparing it to
cache_decay_ticks. Somewhere in the merging of sched_domains it was changed 
to task_hot which uses timestamp.

Con


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  6:59       ` Con Kolivas
@ 2004-10-05  7:08         ` Nick Piggin
  2004-10-05  7:09         ` Ingo Molnar
  1 sibling, 0 replies; 18+ messages in thread
From: Nick Piggin @ 2004-10-05  7:08 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Ingo Molnar, Chen, Kenneth W, linux-kernel, Andrew Morton

Con Kolivas wrote:
> Ingo Molnar writes:
> 
>>
>> On Tue, 5 Oct 2004, Con Kolivas wrote:
>>
>>>     unsigned long long delta = now - next->timestamp;
>>>
>>>     if (next->activated == 1)
>>>         delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
>>>
>>> is in schedule() before we update the timestamp, no?
>>
>>
>> indeed ... so the patch is just random incorrect damage that happened to
>> distrub the scheduler fixing some balancing problem. Kenneth, what
>> precisely is the balancing problem you are seeing?
> 
> 
> We used to compare jiffy difference in can_migrate_task by comparing it to
> cache_decay_ticks. Somewhere in the merging of sched_domains it was 
> changed to task_hot which uses timestamp.
> 

It always used ->timestamp though, even when that was in jiffies.
sched domains didn't change anything there (and IIRC it used
task_hot before sched domains).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  6:59       ` Con Kolivas
  2004-10-05  7:08         ` Nick Piggin
@ 2004-10-05  7:09         ` Ingo Molnar
  2004-10-05 17:30           ` Chen, Kenneth W
  1 sibling, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2004-10-05  7:09 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Chen, Kenneth W, linux-kernel, Andrew Morton, Nick Piggin


On Tue, 5 Oct 2004, Con Kolivas wrote:

> We used to compare jiffy difference in can_migrate_task by comparing it
> to cache_decay_ticks. Somewhere in the merging of sched_domains it was
> changed to task_hot which uses timestamp.

yep, that's fishy. Kenneth, could you try the simple patch below? It gets
rid of task_hot() in essence. If this works out we could try it - it gets
rid of some more code from sched.c too. Perhaps SD_WAKE_AFFINE is enough
control.

	Ingo

--- kernel/sched.c.orig	2004-10-05 08:28:42.295395160 +0200
+++ kernel/sched.c	2004-10-05 09:07:44.081389576 +0200
@@ -180,7 +180,7 @@ static unsigned int task_timeslice(task_
 	else
 		return SCALE_PRIO(DEF_TIMESLICE, p->static_prio);
 }
-#define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
+#define task_hot(p, now, sd) 0
 
 enum idle_type
 {

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bug in sched.c:activate_task()
  2004-10-05  2:16 bug in sched.c:activate_task() Chen, Kenneth W
                   ` (2 preceding siblings ...)
  2004-10-05  6:31 ` Ingo Molnar
@ 2004-10-05  8:19 ` Ingo Molnar
  2004-10-05 17:44   ` Chen, Kenneth W
  3 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2004-10-05  8:19 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: linux-kernel, Andrew Morton, Nick Piggin


* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:

> Update p->timestamp to "now" in activate_task() doesn't look right to
> me at all.  p->timestamp records last time it was running on a cpu. 
> activate_task shouldn't update that variable when it queues a task on
> the runqueue.

it is being used for multiple purposes: measuring on-CPU time, measuring
on-runqueue time (for interactivity) and measuring off-runqueue time
(for interactivity). It is also used to measure cache-hotness, by the 
balancing code.

now, this particular update of p->timestamp:

> @@ -888,7 +888,6 @@ static void activate_task(task_t *p, run

> -	p->timestamp = now;

is important for the interactivity code that runs in schedule() - it
wants to know how much time we spent on the runqueue without having run.

but you are right that the task_hot() use of p->timestamp is incorrect
for freshly woken up tasks - we want the balancer to be able to re-route
tasks to another CPU, as long as the task has not hit the runqueue yet
(which it hasnt where we consider it in the balancer).

the clean solution is to separate the multiple uses of p->timestamp:
with the patch below p->timestamp is purely used for the interactivity
code, and p->last_ran is for the rebalancer. The patch is ontop of your
task_hot() fix-patch. Does this work for your workload?

	Ingo

Signed-off-by: Ingo Molnar <mingo@elte.hu>

--- linux/kernel/sched.c.orig
+++ linux/kernel/sched.c
@@ -185,7 +185,8 @@ static unsigned int task_timeslice(task_
 	else
 		return SCALE_PRIO(DEF_TIMESLICE, p->static_prio);
 }
-#define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
+#define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran)	\
+				< (long long) (sd)->cache_hot_time)
 
 /*
  * These are the runqueue data structures:
@@ -2833,7 +2834,7 @@ switch_tasks:
 		if (!(HIGH_CREDIT(prev) || LOW_CREDIT(prev)))
 			prev->interactive_credit--;
 	}
-	prev->timestamp = now;
+	prev->timestamp = prev->last_ran = now;
 
 	sched_info_switch(prev, next);
 	if (likely(prev != next)) {
--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -527,7 +527,7 @@ struct task_struct {
 
 	unsigned long sleep_avg;
 	long interactive_credit;
-	unsigned long long timestamp;
+	unsigned long long timestamp, last_ran;
 	int activated;
 
 	unsigned long policy;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: bug in sched.c:activate_task()
  2004-10-05  6:54     ` Ingo Molnar
  2004-10-05  6:59       ` Con Kolivas
@ 2004-10-05 17:25       ` Chen, Kenneth W
  1 sibling, 0 replies; 18+ messages in thread
From: Chen, Kenneth W @ 2004-10-05 17:25 UTC (permalink / raw)
  To: 'Ingo Molnar', Con Kolivas; +Cc: linux-kernel, Andrew Morton

Ingo Molnar wrote on Monday, October 04, 2004 11:55 PM
> On Tue, 5 Oct 2004, Con Kolivas wrote:
>
> > 	unsigned long long delta = now - next->timestamp;
> >
> > 	if (next->activated == 1)
> > 		delta = delta * (ON_RUNQUEUE_WEIGHT * 128 / 100) / 128;
> >
> > is in schedule() before we update the timestamp, no?
>
> indeed ... so the patch is just random incorrect damage that happened to
> distrub the scheduler fixing some balancing problem. Kenneth, what
> precisely is the balancing problem you are seeing?

We are seeing truck load of move_task() which was originated from newly
idle load balance.  The problem is load balancing going nuts moving tons
of tasks around and what we are seeing is cache misses for the application
shots up to the roof and suffering from cache threshing.

- Ken



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: bug in sched.c:activate_task()
  2004-10-05  7:09         ` Ingo Molnar
@ 2004-10-05 17:30           ` Chen, Kenneth W
  0 siblings, 0 replies; 18+ messages in thread
From: Chen, Kenneth W @ 2004-10-05 17:30 UTC (permalink / raw)
  To: 'Ingo Molnar', Con Kolivas
  Cc: linux-kernel, Andrew Morton, Nick Piggin

On Tue, 5 Oct 2004, Con Kolivas wrote:
> We used to compare jiffy difference in can_migrate_task by comparing it
> to cache_decay_ticks. Somewhere in the merging of sched_domains it was
> changed to task_hot which uses timestamp.


On Tuesday, October 05, 2004 12:10 AM, Ingo Molnar wrote:
> yep, that's fishy. Kenneth, could you try the simple patch below? It gets
> rid of task_hot() in essence. If this works out we could try it - it gets
> rid of some more code from sched.c too. Perhaps SD_WAKE_AFFINE is enough
> control.
>
> --- kernel/sched.c.orig	2004-10-05 08:28:42.295395160 +0200
> +++ kernel/sched.c	2004-10-05 09:07:44.081389576 +0200
> @@ -180,7 +180,7 @@ static unsigned int task_timeslice(task_
>  	else
>  		return SCALE_PRIO(DEF_TIMESLICE, p->static_prio);
>  }
> -#define task_hot(p, now, sd) ((now) - (p)->timestamp < (sd)->cache_hot_time)
> +#define task_hot(p, now, sd) 0
>
>  enum idle_type
>  {

We have experimented with similar thing, via bumping up sd->cache_hot_time to
a very large number, like 1 sec.  What we measured was a equally low throughput.
But that was because of not enough load balancing, we are seeing is large amount
of idle time.

- Ken



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: bug in sched.c:activate_task()
  2004-10-05  8:19 ` Ingo Molnar
@ 2004-10-05 17:44   ` Chen, Kenneth W
  0 siblings, 0 replies; 18+ messages in thread
From: Chen, Kenneth W @ 2004-10-05 17:44 UTC (permalink / raw)
  To: 'Ingo Molnar'; +Cc: linux-kernel, Andrew Morton, Nick Piggin

* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:
> Update p->timestamp to "now" in activate_task() doesn't look right to
> me at all.  p->timestamp records last time it was running on a cpu.
> activate_task shouldn't update that variable when it queues a task on
> the runqueue.


Ingo Molnar wrote on Tuesday, October 05, 2004 1:19 AM
> it is being used for multiple purposes: measuring on-CPU time, measuring
> on-runqueue time (for interactivity) and measuring off-runqueue time
> (for interactivity). It is also used to measure cache-hotness, by the
> balancing code.
>
> now, this particular update of p->timestamp:
>
> > @@ -888,7 +888,6 @@ static void activate_task(task_t *p, run
>
> > -	p->timestamp = now;
>
> is important for the interactivity code that runs in schedule() - it
> wants to know how much time we spent on the runqueue without having run.
>
> but you are right that the task_hot() use of p->timestamp is incorrect
> for freshly woken up tasks - we want the balancer to be able to re-route
> tasks to another CPU, as long as the task has not hit the runqueue yet
> (which it hasnt where we consider it in the balancer).
>
> the clean solution is to separate the multiple uses of p->timestamp:
> with the patch below p->timestamp is purely used for the interactivity
> code, and p->last_ran is for the rebalancer. The patch is ontop of your
> task_hot() fix-patch. Does this work for your workload?

I will take this for a spin and report back the result.

- Ken



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: bug in sched.c:activate_task()
@ 2004-10-05 22:46 Chen, Kenneth W
  0 siblings, 0 replies; 18+ messages in thread
From: Chen, Kenneth W @ 2004-10-05 22:46 UTC (permalink / raw)
  To: 'Ingo Molnar'
  Cc: linux-kernel, 'Andrew Morton', 'Nick Piggin'

* Chen, Kenneth W <kenneth.w.chen@intel.com> wrote:
> Update p->timestamp to "now" in activate_task() doesn't look right to
> me at all.  p->timestamp records last time it was running on a cpu.
> activate_task shouldn't update that variable when it queues a task on
> the runqueue.

Ingo Molnar wrote on Tuesday, October 05, 2004 1:19 AM
> it is being used for multiple purposes: measuring on-CPU time, measuring
> on-runqueue time (for interactivity) and measuring off-runqueue time
> (for interactivity). It is also used to measure cache-hotness, by the
> balancing code.
>
> now, this particular update of p->timestamp:
>
> > @@ -888,7 +888,6 @@ static void activate_task(task_t *p, run
>
> > -	p->timestamp = now;
>
> is important for the interactivity code that runs in schedule() - it
> wants to know how much time we spent on the runqueue without having run.
>
> but you are right that the task_hot() use of p->timestamp is incorrect
> for freshly woken up tasks - we want the balancer to be able to re-route
> tasks to another CPU, as long as the task has not hit the runqueue yet
> (which it hasnt where we consider it in the balancer).
>
> the clean solution is to separate the multiple uses of p->timestamp:
> with the patch below p->timestamp is purely used for the interactivity
> code, and p->last_ran is for the rebalancer. The patch is ontop of your
> task_hot() fix-patch. Does this work for your workload?


Chen, Kenneth W wrote on Tuesday, October 05, 2004 10:45 AM
> I will take this for a spin and report back the result.

Confirmed that Ingo's latest patch does the right thing for the db workload.

- Ken



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-10-05 22:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-05  2:16 bug in sched.c:activate_task() Chen, Kenneth W
2004-10-05  2:34 ` Con Kolivas
2004-10-05  3:14   ` Nick Piggin
2004-10-05  4:45     ` Con Kolivas
2004-10-05  5:05   ` Peter Williams
2004-10-05  6:10 ` Nick Piggin
2004-10-05  6:31 ` Ingo Molnar
2004-10-05  6:36   ` Con Kolivas
2004-10-05  6:54     ` Ingo Molnar
2004-10-05  6:59       ` Con Kolivas
2004-10-05  7:08         ` Nick Piggin
2004-10-05  7:09         ` Ingo Molnar
2004-10-05 17:30           ` Chen, Kenneth W
2004-10-05 17:25       ` Chen, Kenneth W
2004-10-05  6:57     ` Nick Piggin
2004-10-05  8:19 ` Ingo Molnar
2004-10-05 17:44   ` Chen, Kenneth W
  -- strict thread matches above, loose matches on Subject: below --
2004-10-05 22:46 Chen, Kenneth W

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox