From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@elte.hu>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Mike Galbraith <efault@gmx.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: [PATCH 19/30] sched: hierarchical load vs find_busiest_group
Date: Fri, 27 Jun 2008 13:41:28 +0200 [thread overview]
Message-ID: <20080627115211.899673668@chello.nl> (raw)
In-Reply-To: 20080627114109.724249622@chello.nl
[-- Attachment #1: sched-find_busiest_group.patch --]
[-- Type: text/plain, Size: 2904 bytes --]
find_busiest_group() has some assumptions about task weight being in the
NICE_0_LOAD range. Hierarchical task groups break this assumption - fix this
by replacing it with the average task weight, which will adapt the situation.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 26 +++++++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -3111,6 +3111,7 @@ find_busiest_group(struct sched_domain *
max_load = this_load = total_load = total_pwr = 0;
busiest_load_per_task = busiest_nr_running = 0;
this_load_per_task = this_nr_running = 0;
+
if (idle == CPU_NOT_IDLE)
load_idx = sd->busy_idx;
else if (idle == CPU_NEWLY_IDLE)
@@ -3125,6 +3126,8 @@ find_busiest_group(struct sched_domain *
int __group_imb = 0;
unsigned int balance_cpu = -1, first_idle_cpu = 0;
unsigned long sum_nr_running, sum_weighted_load;
+ unsigned long sum_avg_load_per_task;
+ unsigned long avg_load_per_task;
local_group = cpu_isset(this_cpu, group->cpumask);
@@ -3133,6 +3136,8 @@ find_busiest_group(struct sched_domain *
/* Tally up the load of all CPUs in the group */
sum_weighted_load = sum_nr_running = avg_load = 0;
+ sum_avg_load_per_task = avg_load_per_task = 0;
+
max_cpu_load = 0;
min_cpu_load = ~0UL;
@@ -3166,6 +3171,8 @@ find_busiest_group(struct sched_domain *
avg_load += load;
sum_nr_running += rq->nr_running;
sum_weighted_load += weighted_cpuload(i);
+
+ sum_avg_load_per_task += cpu_avg_load_per_task(i);
}
/*
@@ -3187,7 +3194,20 @@ find_busiest_group(struct sched_domain *
avg_load = sg_div_cpu_power(group,
avg_load * SCHED_LOAD_SCALE);
- if ((max_cpu_load - min_cpu_load) > SCHED_LOAD_SCALE)
+
+ /*
+ * Consider the group unbalanced when the imbalance is larger
+ * than the average weight of two tasks.
+ *
+ * APZ: with cgroup the avg task weight can vary wildly and
+ * might not be a suitable number - should we keep a
+ * normalized nr_running number somewhere that negates
+ * the hierarchy?
+ */
+ avg_load_per_task = sg_div_cpu_power(group,
+ sum_avg_load_per_task * SCHED_LOAD_SCALE);
+
+ if ((max_cpu_load - min_cpu_load) > 2*avg_load_per_task)
__group_imb = 1;
group_capacity = group->__cpu_power / SCHED_LOAD_SCALE;
@@ -3328,9 +3348,9 @@ small_imbalance:
if (busiest_load_per_task > this_load_per_task)
imbn = 1;
} else
- this_load_per_task = SCHED_LOAD_SCALE;
+ this_load_per_task = cpu_avg_load_per_task(this_cpu);
- if (max_load - this_load + SCHED_LOAD_SCALE_FUZZ >=
+ if (max_load - this_load + 2*busiest_load_per_task >=
busiest_load_per_task * imbn) {
*imbalance = busiest_load_per_task;
return busiest;
--
next prev parent reply other threads:[~2008-06-27 12:02 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-27 11:41 [PATCH 00/30] SMP-group balancer - take 3 Peter Zijlstra
2008-06-27 11:41 ` [PATCH 01/30] sched: clean up some unused variables Peter Zijlstra
2008-06-27 11:41 ` [PATCH 02/30] sched: revert the revert of: weight calculations Peter Zijlstra
2008-06-30 18:07 ` Balbir Singh
2008-07-15 20:16 ` Peter Zijlstra
2008-06-27 11:41 ` [PATCH 03/30] sched: fix calc_delta_asym() Peter Zijlstra
2008-06-27 11:41 ` [PATCH 04/30] sched: fix calc_delta_asym Peter Zijlstra
2008-06-27 11:41 ` [PATCH 05/30] sched: revert revert of: fair-group: SMP-nice for group scheduling Peter Zijlstra
2008-06-27 11:41 ` [PATCH 06/30] sched: sched_clock_cpu() based cpu_clock() Peter Zijlstra
2008-06-27 11:41 ` [PATCH 07/30] sched: fix wakeup granularity and buddy granularity Peter Zijlstra
2008-06-27 11:41 ` [PATCH 08/30] sched: add full schedstats to /proc/sched_debug Peter Zijlstra
2008-06-27 11:41 ` [PATCH 09/30] sched: fix sched_domain aggregation Peter Zijlstra
2008-06-27 11:41 ` [PATCH 10/30] sched: update aggregate when holding the RQs Peter Zijlstra
2008-06-27 11:41 ` [PATCH 11/30] sched: kill task_group balancing Peter Zijlstra
2008-06-27 11:41 ` [PATCH 12/30] sched: dont micro manage share losses Peter Zijlstra
2008-06-27 11:41 ` [PATCH 13/30] sched: no need to aggregate task_weight Peter Zijlstra
2008-06-27 11:41 ` [PATCH 14/30] sched: simplify the group load balancer Peter Zijlstra
2008-06-27 11:41 ` [PATCH 15/30] sched: fix newidle smp group balancing Peter Zijlstra
2008-06-27 11:41 ` [PATCH 16/30] sched: fix sched_balance_self() " Peter Zijlstra
2008-06-27 11:41 ` [PATCH 17/30] sched: persistent average load per task Peter Zijlstra
2008-06-27 11:41 ` [PATCH 18/30] sched: hierarchical load vs affine wakeups Peter Zijlstra
2008-06-27 11:41 ` Peter Zijlstra [this message]
2008-06-27 11:41 ` [PATCH 20/30] sched: fix load scaling in group balancing Peter Zijlstra
2008-06-27 11:41 ` [PATCH 21/30] sched: fix task_h_load() Peter Zijlstra
2008-06-27 11:41 ` [PATCH 22/30] sched: remove prio preference from balance decisions Peter Zijlstra
2008-06-27 11:41 ` [PATCH 23/30] sched: optimize effective_load() Peter Zijlstra
2008-06-27 11:41 ` [PATCH 24/30] sched: disable source/target_load bias Peter Zijlstra
2008-06-27 11:41 ` [PATCH 25/30] sched: fix shares boost logic Peter Zijlstra
2008-06-27 11:41 ` [PATCH 26/30] sched: update shares on wakeup Peter Zijlstra
2008-06-27 11:41 ` [PATCH 27/30] sched: fix mult overflow Peter Zijlstra
2008-06-27 11:41 ` [PATCH 28/30] sched: correct wakeup weight calculations Peter Zijlstra
2008-06-27 11:41 ` [PATCH 29/30] sched: incremental effective_load() Peter Zijlstra
2008-06-27 11:41 ` [PATCH 30/30] sched: bias effective_load() error towards failing wake_affine() Peter Zijlstra
2008-06-27 12:46 ` [PATCH 00/30] SMP-group balancer - take 3 Ingo Molnar
2008-06-27 17:33 ` Dhaval Giani
2008-06-28 17:08 ` Dhaval Giani
2008-06-30 12:59 ` Ingo Molnar
2008-06-30 14:53 ` Dhaval Giani
2008-07-01 10:57 ` Dhaval Giani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080627115211.899673668@chello.nl \
--to=a.p.zijlstra@chello.nl \
--cc=efault@gmx.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=vatsa@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox