From: "Martin J. Bligh" <mbligh@aracnet.com>
To: Erich Focht <efocht@ess.nec.de>,
Michael Hohnbaum <hohnbaum@us.ibm.com>,
Robert Love <rml@tech9.net>, Ingo Molnar <mingo@elte.hu>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
lse-tech <lse-tech@lists.sourceforge.net>
Subject: Minature NUMA scheduler
Date: Thu, 09 Jan 2003 15:54:08 -0800 [thread overview]
Message-ID: <52570000.1042156448@flay> (raw)
I tried a small experiment today - did a simple restriction of
the O(1) scheduler to only balance inside a node. Coupled with
the small initial load balancing patch floating around, this
covers 95% of cases, is a trivial change (3 lines), performs
just as well as Erich's patch on a kernel compile, and actually
better on schedbench.
This is NOT meant to be a replacement for the code Erich wrote,
it's meant to be a simple way to get integration and acceptance.
Code that just forks and never execs will stay on one node - but
we can take the code Erich wrote, and put it in seperate rebalancer
that fires much less often to do a cross-node rebalance. All that
would be under #ifdef CONFIG_NUMA, the only thing that would touch
mainline is these three lines of change, and it's trivial to see
they're completely equivalent to the current code on non-NUMA systems.
I also believe that this is the more correct approach in design, it
should result in much less cross-node migration of tasks, and less
scanning of remote runqueues.
Opinions / comments?
M.
Kernbench:
Elapsed User System CPU
2.5.54-mjb3 19.41s 186.38s 39.624s 1191.4%
2.5.54-mjb3-mjbsched 19.508s 186.356s 39.888s 1164.6%
Schedbench 4:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 35.14 88.82 0.64
2.5.54-mjb3-mjbsched 0.00 31.84 88.91 0.49
Schedbench 8:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 47.55 269.36 1.48
2.5.54-mjb3-mjbsched 0.00 41.01 252.34 1.07
Schedbench 16:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 76.53 957.48 4.17
2.5.54-mjb3-mjbsched 0.00 69.01 792.71 2.74
Schedbench 32:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 145.20 1993.97 11.05
2.5.54-mjb3-mjbsched 0.00 117.47 1798.93 5.95
Schedbench 64:
AvgUser Elapsed TotalUser TotalSys
2.5.54-mjb3 0.00 307.80 4643.55 20.36
2.5.54-mjb3-mjbsched 0.00 241.04 3589.55 12.74
-----------------------------------------
diff -purN -X /home/mbligh/.diff.exclude virgin/kernel/sched.c mjbsched/kernel/sched.c
--- virgin/kernel/sched.c Mon Dec 9 18:46:15 2002
+++ mjbsched/kernel/sched.c Thu Jan 9 14:09:17 2003
@@ -654,7 +654,7 @@ static inline unsigned int double_lock_b
/*
* find_busiest_queue - find the busiest runqueue.
*/
-static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
+static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance, unsigned long cpumask)
{
int nr_running, load, max_load, i;
runqueue_t *busiest, *rq_src;
@@ -689,7 +689,7 @@ static inline runqueue_t *find_busiest_q
busiest = NULL;
max_load = 1;
for (i = 0; i < NR_CPUS; i++) {
- if (!cpu_online(i))
+ if (!cpu_online(i) || !((1 << i) & cpumask) )
continue;
rq_src = cpu_rq(i);
@@ -764,7 +764,8 @@ static void load_balance(runqueue_t *thi
struct list_head *head, *curr;
task_t *tmp;
- busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance);
+ busiest = find_busiest_queue(this_rq, this_cpu, idle, &imbalance,
+ __node_to_cpu_mask(__cpu_to_node(this_cpu)) );
if (!busiest)
goto out;
---------------------------------------------------
A tiny change in the current ilb patch is also needed to stop it
using a macro from the first patch:
diff -purN -X /home/mbligh/.diff.exclude ilbold/kernel/sched.c ilbnew/kernel/sched.c
--- ilbold/kernel/sched.c Thu Jan 9 15:20:53 2003
+++ ilbnew/kernel/sched.c Thu Jan 9 15:27:49 2003
@@ -2213,6 +2213,7 @@ static void sched_migrate_task(task_t *p
static int sched_best_cpu(struct task_struct *p)
{
int i, minload, load, best_cpu, node = 0;
+ unsigned long cpumask;
best_cpu = task_cpu(p);
if (cpu_rq(best_cpu)->nr_running <= 2)
@@ -2226,9 +2227,11 @@ static int sched_best_cpu(struct task_st
node = i;
}
}
+
minload = 10000000;
- loop_over_node(i,node) {
- if (!cpu_online(i))
+ cpumask = __node_to_cpu_mask(node);
+ for (i = 0; i < NR_CPUS; ++i) {
+ if (!(cpumask & (1 << i)))
continue;
if (cpu_rq(i)->nr_running < minload) {
best_cpu = i;
next reply other threads:[~2003-01-09 23:52 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-01-09 23:54 Martin J. Bligh [this message]
2003-01-10 5:36 ` [Lse-tech] Minature NUMA scheduler Michael Hohnbaum
2003-01-10 16:34 ` Erich Focht
2003-01-10 16:57 ` Martin J. Bligh
2003-01-12 23:35 ` Erich Focht
2003-01-12 23:55 ` NUMA scheduler 2nd approach Erich Focht
2003-01-13 8:02 ` Christoph Hellwig
2003-01-13 11:32 ` Erich Focht
2003-01-13 15:26 ` [Lse-tech] " Christoph Hellwig
2003-01-13 15:46 ` Erich Focht
2003-01-13 19:03 ` Michael Hohnbaum
2003-01-14 1:23 ` Michael Hohnbaum
2003-01-14 4:45 ` [Lse-tech] " Andrew Theurer
2003-01-14 4:56 ` Martin J. Bligh
2003-01-14 11:14 ` Erich Focht
2003-01-14 15:55 ` [PATCH 2.5.58] new NUMA scheduler Erich Focht
2003-01-14 16:07 ` [Lse-tech] " Christoph Hellwig
2003-01-14 16:23 ` [PATCH 2.5.58] new NUMA scheduler: fix Erich Focht
2003-01-14 16:43 ` Erich Focht
2003-01-14 19:02 ` Michael Hohnbaum
2003-01-14 21:56 ` [Lse-tech] " Michael Hohnbaum
2003-01-15 15:10 ` Erich Focht
2003-01-16 0:14 ` Michael Hohnbaum
2003-01-16 6:05 ` Martin J. Bligh
2003-01-16 16:47 ` Erich Focht
2003-01-16 18:07 ` Robert Love
2003-01-16 18:48 ` Martin J. Bligh
2003-01-16 19:07 ` Ingo Molnar
2003-01-16 18:59 ` Martin J. Bligh
2003-01-16 19:10 ` Christoph Hellwig
2003-01-16 19:44 ` Ingo Molnar
2003-01-16 19:43 ` Martin J. Bligh
2003-01-16 20:19 ` Ingo Molnar
2003-01-16 20:29 ` [Lse-tech] " Rick Lindsley
2003-01-16 23:31 ` Martin J. Bligh
2003-01-17 7:23 ` Ingo Molnar
2003-01-17 8:47 ` [patch] sched-2.5.59-A2 Ingo Molnar
2003-01-17 14:35 ` Erich Focht
2003-01-17 15:11 ` Ingo Molnar
2003-01-17 15:30 ` Erich Focht
2003-01-17 16:58 ` Martin J. Bligh
2003-01-18 20:54 ` NUMA sched -> pooling scheduler (inc HT) Martin J. Bligh
2003-01-18 21:34 ` [Lse-tech] " Martin J. Bligh
2003-01-19 0:13 ` Andrew Theurer
2003-01-17 18:19 ` [patch] sched-2.5.59-A2 Michael Hohnbaum
2003-01-18 7:08 ` William Lee Irwin III
2003-01-18 8:12 ` Martin J. Bligh
2003-01-18 8:16 ` William Lee Irwin III
2003-01-19 4:22 ` William Lee Irwin III
2003-01-17 17:21 ` Martin J. Bligh
2003-01-17 17:23 ` Martin J. Bligh
2003-01-17 18:11 ` Erich Focht
2003-01-17 19:04 ` Martin J. Bligh
2003-01-17 19:26 ` [Lse-tech] " Martin J. Bligh
2003-01-18 0:13 ` Michael Hohnbaum
2003-01-18 13:31 ` [patch] tunable rebalance rates for sched-2.5.59-B0 Erich Focht
2003-01-18 23:09 ` [patch] sched-2.5.59-A2 Erich Focht
2003-01-20 9:28 ` Ingo Molnar
2003-01-20 12:07 ` Erich Focht
2003-01-20 16:56 ` Ingo Molnar
2003-01-20 17:04 ` Ingo Molnar
2003-01-20 17:10 ` Martin J. Bligh
2003-01-20 17:24 ` Ingo Molnar
2003-01-20 19:13 ` Andrew Theurer
2003-01-20 19:33 ` Martin J. Bligh
2003-01-20 19:52 ` Andrew Theurer
2003-01-20 19:52 ` Martin J. Bligh
2003-01-20 21:18 ` [patch] HT scheduler, sched-2.5.59-D7 Ingo Molnar
2003-01-20 22:28 ` Andrew Morton
2003-01-21 1:11 ` Michael Hohnbaum
2003-01-22 3:15 ` Michael Hohnbaum
2003-01-22 16:41 ` Andrew Theurer
2003-01-22 16:17 ` Martin J. Bligh
2003-01-22 16:20 ` Andrew Theurer
2003-01-22 16:35 ` Michael Hohnbaum
2003-02-03 18:23 ` [patch] HT scheduler, sched-2.5.59-E2 Ingo Molnar
2003-02-03 20:47 ` Robert Love
2003-02-04 9:31 ` Erich Focht
2003-01-20 17:04 ` [patch] sched-2.5.59-A2 Martin J. Bligh
2003-01-21 17:44 ` Erich Focht
2003-01-20 16:23 ` Martin J. Bligh
2003-01-20 16:59 ` Ingo Molnar
2003-01-17 23:09 ` Matthew Dobson
2003-01-16 23:45 ` [PATCH 2.5.58] new NUMA scheduler: fix Michael Hohnbaum
2003-01-17 11:10 ` Erich Focht
2003-01-17 14:07 ` Ingo Molnar
2003-01-16 19:44 ` John Bradford
2003-01-14 16:51 ` Christoph Hellwig
2003-01-15 0:05 ` Michael Hohnbaum
2003-01-15 7:47 ` Martin J. Bligh
2003-01-14 5:50 ` [Lse-tech] Re: NUMA scheduler 2nd approach Michael Hohnbaum
2003-01-14 16:52 ` Andrew Theurer
2003-01-14 15:13 ` Erich Focht
2003-01-14 10:56 ` Erich Focht
2003-01-11 14:43 ` [Lse-tech] Minature NUMA scheduler Bill Davidsen
2003-01-12 23:24 ` Erich Focht
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52570000.1042156448@flay \
--to=mbligh@aracnet.com \
--cc=efocht@ess.nec.de \
--cc=hohnbaum@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lse-tech@lists.sourceforge.net \
--cc=mingo@elte.hu \
--cc=rml@tech9.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.