* [PATCH] sched: avg_overlap decay
From: Peter Zijlstra @ 2009-03-10 18:18 UTC (permalink / raw)
To: Ingo Molnar, Mike Galbraith; +Cc: lkml
Mike, are you good with this patch as it stands?
---
Subject: sched: avg_overlap decay
From: Mike Galbraith <efault@gmx.de>
Date: Tue Mar 10 19:08:11 CET 2009
avg_overlap is used to measure the runtime overlap of the waker and wakee.
However, when a process changes behaviour, e.g. a pipe becomes uncongested
and we don't need to go to sleep after a wakeup for a while, the avg_overlap
value grows stale.
While a task keeps running, we therefore use the average runtime between
preemptions as a measure for avg_overlap, since the amount of runtime
correlates with cache footprint.
The longer we run, the less likely we are to want to be migrated to another
CPU.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4692,6 +4692,28 @@ static inline void schedule_debug(struct
#endif
}
+static void put_prev_task(struct rq *rq, struct task_struct *prev)
+{
+ if (prev->state == TASK_RUNNING) {
+ u64 runtime = prev->se.sum_exec_runtime;
+
+ runtime -= prev->se.prev_sum_exec_runtime;
+ runtime = min_t(u64, runtime, 2*sysctl_sched_migration_cost);
+
+ /*
+ * In order to avoid avg_overlap growing stale when we are
+ * indeed overlapping and hence not getting put to sleep, grow
+ * the avg_overlap on preemption.
+ *
+ * We use the average preemption runtime because that
+ * correlates to the amount of cache footprint a task can
+ * build up.
+ */
+ update_avg(&prev->se.avg_overlap, runtime);
+ }
+ prev->sched_class->put_prev_task(rq, prev);
+}
+
/*
* Pick up the highest-prio task:
*/
@@ -4768,7 +4790,7 @@ need_resched_nonpreemptible:
if (unlikely(!rq->nr_running))
idle_balance(cpu, rq);
- prev->sched_class->put_prev_task(rq, prev);
+ put_prev_task(rq, prev);
next = pick_next_task(rq);
if (likely(prev != next)) {
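[Editorial note: for readers outside the kernel tree, the update_avg() helper used above is, in this era of kernel/sched.c, a simple exponential moving average (each sample moves the average 1/8 of the way toward itself). The following is a standalone sketch of the path the patch adds — clamp the between-preemption runtime, then feed it into the average — assuming that definition; the function name decayed_overlap and the 0.5 ms default migration cost are illustrative, not kernel API.]

```c
#include <stdint.h>

/* Exponential moving average, matching the assumed update_avg()
 * definition: each sample moves the average 1/8 of the way. */
void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);
	*avg += diff >> 3;
}

/* Model of the new put_prev_task() path: clamp the runtime between
 * preemptions to 2 * migration cost, then average it in once per
 * preemption.  Returns avg_overlap after `preemptions` rounds,
 * starting from zero. */
uint64_t decayed_overlap(uint64_t runtime_ns, uint64_t migration_cost_ns,
			 int preemptions)
{
	uint64_t avg = 0;
	int i;

	if (runtime_ns > 2 * migration_cost_ns)
		runtime_ns = 2 * migration_cost_ns;
	for (i = 0; i < preemptions; i++)
		update_avg(&avg, runtime_ns);
	return avg;
}
```

With a 0.5 ms migration cost, a task that runs 10 ms between preemptions has its sample clamped to 1 ms, and avg_overlap climbs toward that clamp within a few dozen preemptions — so a task that has stopped sleeping soon stops looking like a short overlapper.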
* Re: [PATCH] sched: avg_overlap decay
From: Mike Galbraith @ 2009-03-11 4:09 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ingo Molnar, lkml
On Tue, 2009-03-10 at 19:18 +0100, Peter Zijlstra wrote:
> Mike, are you good with this patch as it stands?
Yes, works for me.
-Mike
> [quoted patch snipped]
* [tip:sched/core] sched: add avg_overlap decay
From: Mike Galbraith @ 2009-03-11 11:14 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, mingo
Commit-ID: df1c99d416500da8d26a4d78777467c53ee7689e
Gitweb: http://git.kernel.org/tip/df1c99d416500da8d26a4d78777467c53ee7689e
Author: "Mike Galbraith" <efault@gmx.de>
AuthorDate: Tue, 10 Mar 2009 19:08:11 +0100
Commit: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 11 Mar 2009 11:31:50 +0100
sched: add avg_overlap decay
Impact: more precise avg_overlap metric - better load-balancing
avg_overlap is used to measure the runtime overlap of the waker and
wakee.
However, when a process changes behaviour, e.g. a pipe becomes
uncongested and we don't need to go to sleep after a wakeup
for a while, the avg_overlap value grows stale.
While a task keeps running, we therefore use the average runtime
between preemptions as a measure for avg_overlap, since the
amount of runtime correlates with cache footprint.
The longer we run, the less likely we are to want to be
migrated to another CPU.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1236709131.25234.576.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index af5cd1b..2f28351 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4620,6 +4620,28 @@ static inline void schedule_debug(struct task_struct *prev)
#endif
}
+static void put_prev_task(struct rq *rq, struct task_struct *prev)
+{
+ if (prev->state == TASK_RUNNING) {
+ u64 runtime = prev->se.sum_exec_runtime;
+
+ runtime -= prev->se.prev_sum_exec_runtime;
+ runtime = min_t(u64, runtime, 2*sysctl_sched_migration_cost);
+
+ /*
+ * In order to avoid avg_overlap growing stale when we are
+ * indeed overlapping and hence not getting put to sleep, grow
+ * the avg_overlap on preemption.
+ *
+ * We use the average preemption runtime because that
+ * correlates to the amount of cache footprint a task can
+ * build up.
+ */
+ update_avg(&prev->se.avg_overlap, runtime);
+ }
+ prev->sched_class->put_prev_task(rq, prev);
+}
+
/*
* Pick up the highest-prio task:
*/
@@ -4698,7 +4720,7 @@ need_resched_nonpreemptible:
if (unlikely(!rq->nr_running))
idle_balance(cpu, rq);
- prev->sched_class->put_prev_task(rq, prev);
+ put_prev_task(rq, prev);
next = pick_next_task(rq);
if (likely(prev != next)) {
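[Editorial note: the "decay" in the subject works in the other direction too — because update_avg() is an exponential average (assumed definition: avg += (sample - avg) >> 3), a stale, large avg_overlap is pulled back down once the task starts running only briefly between preemptions. A sketch under that assumption; the function name is illustrative:]

```c
#include <stdint.h>

/* Start from a stale, large avg_overlap and feed it the task's new,
 * short between-preemption runtimes, using the same exponential
 * average as the update_avg() helper in the patch.  Returns the
 * average after `preemptions` rounds. */
uint64_t overlap_after_behaviour_change(uint64_t stale_avg_ns,
					uint64_t new_runtime_ns,
					int preemptions)
{
	uint64_t avg = stale_avg_ns;
	int i;

	for (i = 0; i < preemptions; i++) {
		int64_t diff = (int64_t)(new_runtime_ns - avg);
		avg += diff >> 3;
	}
	return avg;
}
```

A 1 ms stale average fed 0.1 ms runtimes converges to roughly 0.1 ms within a few dozen preemptions, which is the staleness problem the changelog describes being fixed.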