From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Erich Focht <efocht@ess.nec.de>,
Michael Hohnbaum <hohnbaum@us.ibm.com>,
Robert Love <rml@tech9.net>, Ingo Molnar <mingo@elte.hu>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
lse-tech <lse-tech@lists.sourceforge.net>
Subject: Minature NUMA scheduler
Date: Thu, 09 Jan 2003 15:54:08 -0800 [thread overview]
Message-ID: <52570000.1042156448@flay> (raw)
I tried a small experiment today - did a simple restriction of
the O(1) scheduler to only balance inside a node. Coupled with
the small initial load balancing patch floating around, this
covers 95% of cases, is a trivial change (3 lines), performs
just as well as Erich's patch on a kernel compile, and actually
better on schedbench.
This is NOT meant to be a replacement for the code Erich wrote,
it's meant to be a simple way to get integration and acceptance.
Code that just forks and never execs will stay on one node - but
we can take the code Erich wrote, and put it in seperate rebalancer
that fires much less often to do a cross-node rebalance. All that
would be under #ifdef CONFIG_NUMA, the only thing that would touch
mainline is these three lines of change, and it's trivial to see
they're completely equivalent to the current code on non-NUMA systems.
I also believe that this is the more correct approach in design, it
should result in much less cross-node migration of tasks, and less
scanning of remote runqueues.
Opinions / comments?
M.
Kernbench:
Elapsed User System CPU
2.5.54-mjb3 19.41s 186.38s 39.624s 1191.4%
2.5.54-mjb3-mjbsched 19.508s 186.356s 39.888s 1164.6%
Schedbench 4:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 35.14 88.82 0.64
2.5.54-mjb3-mjbsched 0.00 31.84 88.91 0.49
Schedbench 8:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 47.55 269.36 1.48
2.5.54-mjb3-mjbsched 0.00 41.01 252.34 1.07
Schedbench 16:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 76.53 957.48 4.17
2.5.54-mjb3-mjbsched 0.00 69.01 792.71 2.74
Schedbench 32:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 145.20 1993.97 11.05
2.5.54-mjb3-mjbsched 0.00 117.47 1798.93 5.95
Schedbench 64:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 307.80 4643.55 20.36
2.5.54-mjb3-mjbsched 0.00 241.04 3589.55 12.74
-----------------------------------------
diff -purN -X /home/mbligh/.diff.exclude virgin/kernel/sched.c mjbsched/kernel/sched.c
--- virgin/kernel/sched.c Mon Dec 9 18:46:15 2002
+++ mjbsched/kernel/sched.c Thu Jan 9 14:09:17 2003
@@ -654,7 +654,7 @@ static inline unsigned int double_lock_b
/*
* find_busiest_queue - find the busiest runqueue.
*/
-static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
+static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance, unsigned long cpumask)
{
int nr_running, load, max_load, i;
runqueue_t *busiest, *rq_src;
@@ -689,7 +689,7 @@ static inline runqueue_t *find_busiest_q
busiest = NULL;
max_load = 1;
for (i = 0; i < NR_CPUS; i++) {
- if (!cpu_online(i))
+ if (!cpu_online(i) || !((1 << i) & cpumask) )
continue;
rq_src = cpu_rq(i);
@@ -764,7 +764,8 @@ static void load_balance(runqueue_t *thi
struct list_head *head, *curr;
task_t *tmp;
- busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
+ busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance,
+ __node_to_cpu_mask(__cpu_to_node(this_cpu)) );
if (!busiest)
goto out;
---------------------------------------------------
A tiny change in the current ilb patch is also needed to stop it
using a macro from the first patch:
diff -purN -X /home/mbligh/.diff.exclude ilbold/kernel/sched.c ilbnew/kernel/sched.c
--- ilbold/kernel/sched.c Thu Jan 9 15:20:53 2003
+++ ilbnew/kernel/sched.c Thu Jan 9 15:27:49 2003
@@ -2213,6 +2213,7 @@ static void sched_migrate_task(task_t *p
static int sched_best_cpu(struct task_struct *p)
{
int i, minload, load, best_cpu, node = 0;
+ unsigned long cpumask;
best_cpu = task_cpu(p);
if (cpu_rq(best_cpu)->nr_running <= 2)
@@ -2226,9 +2227,11 @@ static int sched_best_cpu(struct task_st
node = i;
}
}
+
minload = 10000000;
- loop_over_node(i,node) {
- if (!cpu_online(i))
+ cpumask = __node_to_cpu_mask(node);
+ for (i = 0; i < NR_CPUS; ++i) {
+ if (!(cpumask & (1 << i)))
continue;
if (cpu_rq(i)->nr_running < minload) {
best_cpu = i;
next reply other threads:[~2003-01-09 23:52 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-01-09 23:54 Martin J. Bligh [this message]
2003-01-10 5:36 ` [Lse-tech] Minature NUMA scheduler Michael Hohnbaum
2003-01-10 16:34 ` Erich Focht
2003-01-10 16:57 ` Martin J. Bligh
2003-01-12 23:35 ` Erich Focht
2003-01-12 23:55 ` NUMA scheduler 2nd approach Erich Focht
2003-01-13 8:02 ` Christoph Hellwig
2003-01-13 11:32 ` Erich Focht
2003-01-13 15:26 ` [Lse-tech] " Christoph Hellwig
2003-01-13 15:46 ` Erich Focht
2003-01-13 19:03 ` Michael Hohnbaum
2003-01-14 1:23 ` Michael Hohnbaum
2003-01-14 4:45 ` [Lse-tech] " Andrew Theurer
2003-01-14 4:56 ` Martin J. Bligh
2003-01-14 11:14 ` Erich Focht
2003-01-14 15:55 ` [PATCH 2.5.58] new NUMA scheduler Erich Focht
2003-01-14 16:07 ` [Lse-tech] " Christoph Hellwig
2003-01-14 16:23 ` [PATCH 2.5.58] new NUMA scheduler: fix Erich Focht
2003-01-14 16:43 ` Erich Focht
2003-01-14 19:02 ` Michael Hohnbaum
2003-01-14 21:56 ` [Lse-tech] " Michael Hohnbaum
2003-01-15 15:10 ` Erich Focht
2003-01-16 0:14 ` Michael Hohnbaum
2003-01-16 6:05 ` Martin J. Bligh
2003-01-16 16:47 ` Erich Focht
2003-01-16 18:07 ` Robert Love
2003-01-16 18:48 ` Martin J. Bligh
2003-01-16 19:07 ` Ingo Molnar
2003-01-16 18:59 ` Martin J. Bligh
2003-01-16 19:10 ` Christoph Hellwig
2003-01-16 19:44 ` Ingo Molnar
2003-01-16 19:43 ` Martin J. Bligh
2003-01-16 20:19 ` Ingo Molnar
2003-01-16 20:29 ` [Lse-tech] " Rick Lindsley
2003-01-16 23:31 ` Martin J. Bligh
2003-01-17 7:23 ` Ingo Molnar
2003-01-17 8:47 ` [patch] sched-2.5.59-A2 Ingo Molnar
2003-01-17 14:35 ` Erich Focht
2003-01-17 15:11 ` Ingo Molnar
2003-01-17 15:30 ` Erich Focht
2003-01-17 16:58 ` Martin J. Bligh
2003-01-18 20:54 ` NUMA sched -> pooling scheduler (inc HT) Martin J. Bligh
2003-01-18 21:34 ` [Lse-tech] " Martin J. Bligh
2003-01-19 0:13 ` Andrew Theurer
2003-01-17 18:19 ` [patch] sched-2.5.59-A2 Michael Hohnbaum
2003-01-18 7:08 ` William Lee Irwin III
2003-01-18 8:12 ` Martin J. Bligh
2003-01-18 8:16 ` William Lee Irwin III
2003-01-19 4:22 ` William Lee Irwin III
2003-01-17 17:21 ` Martin J. Bligh
2003-01-17 17:23 ` Martin J. Bligh
2003-01-17 18:11 ` Erich Focht
2003-01-17 19:04 ` Martin J. Bligh
2003-01-17 19:26 ` [Lse-tech] " Martin J. Bligh
2003-01-18 0:13 ` Michael Hohnbaum
2003-01-18 13:31 ` [patch] tunable rebalance rates for sched-2.5.59-B0 Erich Focht
2003-01-18 23:09 ` [patch] sched-2.5.59-A2 Erich Focht
2003-01-20 9:28 ` Ingo Molnar
2003-01-20 12:07 ` Erich Focht
2003-01-20 16:56 ` Ingo Molnar
2003-01-20 17:04 ` Ingo Molnar
2003-01-20 17:10 ` Martin J. Bligh
2003-01-20 17:24 ` Ingo Molnar
2003-01-20 19:13 ` Andrew Theurer
2003-01-20 19:33 ` Martin J. Bligh
2003-01-20 19:52 ` Andrew Theurer
2003-01-20 19:52 ` Martin J. Bligh
2003-01-20 21:18 ` [patch] HT scheduler, sched-2.5.59-D7 Ingo Molnar
2003-01-20 22:28 ` Andrew Morton
2003-01-21 1:11 ` Michael Hohnbaum
2003-01-22 3:15 ` Michael Hohnbaum
2003-01-22 16:41 ` Andrew Theurer
2003-01-22 16:17 ` Martin J. Bligh
2003-01-22 16:20 ` Andrew Theurer
2003-01-22 16:35 ` Michael Hohnbaum
2003-02-03 18:23 ` [patch] HT scheduler, sched-2.5.59-E2 Ingo Molnar
2003-02-03 20:47 ` Robert Love
2003-02-04 9:31 ` Erich Focht
2003-01-20 17:04 ` [patch] sched-2.5.59-A2 Martin J. Bligh
2003-01-21 17:44 ` Erich Focht
2003-01-20 16:23 ` Martin J. Bligh
2003-01-20 16:59 ` Ingo Molnar
2003-01-17 23:09 ` Matthew Dobson
2003-01-16 23:45 ` [PATCH 2.5.58] new NUMA scheduler: fix Michael Hohnbaum
2003-01-17 11:10 ` Erich Focht
2003-01-17 14:07 ` Ingo Molnar
2003-01-16 19:44 ` John Bradford
2003-01-14 16:51 ` Christoph Hellwig
2003-01-15 0:05 ` Michael Hohnbaum
2003-01-15 7:47 ` Martin J. Bligh
2003-01-14 5:50 ` [Lse-tech] Re: NUMA scheduler 2nd approach Michael Hohnbaum
2003-01-14 16:52 ` Andrew Theurer
2003-01-14 15:13 ` Erich Focht
2003-01-14 10:56 ` Erich Focht
2003-01-11 14:43 ` [Lse-tech] Minature NUMA scheduler Bill Davidsen
2003-01-12 23:24 ` Erich Focht
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52570000.1042156448@flay \
--to=mbligh@aracnet.com \
--cc=efocht@ess.nec.de \
--cc=hohnbaum@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lse-tech@lists.sourceforge.net \
--cc=mingo@elte.hu \
--cc=rml@tech9.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox