From: Christoph Hellwig <hch@infradead.org>
To: Erich Focht <efocht@ess.nec.de>
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
Michael Hohnbaum <hohnbaum@us.ibm.com>,
Robert Love <rml@tech9.net>, Ingo Molnar <mingo@elte.hu>,
linux-kernel <linux-kernel@vger.kernel.org>,
lse-tech <lse-tech@lists.sourceforge.net>
Subject: Re: NUMA scheduler 2nd approach
Date: Mon, 13 Jan 2003 08:02:08 +0000 [thread overview]
Message-ID: <20030113080207.A9119@infradead.org> (raw)
In-Reply-To: <200301130055.28005.efocht@ess.nec.de>; from efocht@ess.nec.de on Mon, Jan 13, 2003 at 12:55:28AM +0100
On Mon, Jan 13, 2003 at 12:55:28AM +0100, Erich Focht wrote:
> Hi Martin & Michael,
>
> as discussed on the LSE call I played around with a cross-node
> balancer approach put on top of the miniature NUMA scheduler. The
> patches are attached and it seems to be clear that we can regain the
> good performance for hackbench by adding a cross-node balancer.
The changes look fine to me. But I think there's some conding style
issues that need cleaning up (see below). Also is there a reason
patches 2/3 and 4/5 aren't merged into one patch each?
/*
- * find_busiest_queue - find the busiest runqueue.
+ * find_busiest_in_mask - find the busiest runqueue among the cpus in cpumask
*/
-static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
+static inline runqueue_t *find_busiest_in_mask(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance, unsigned long cpumask)
find_busiest_queue has just one caller in 2.5.56, I'd suggest just
changing the prototype and updating that single caller to pass in
the cpumask opencoded.
@@ -160,7 +160,6 @@ extern void update_one_process(struct ta
extern void scheduler_tick(int user_tick, int system);
extern unsigned long cache_decay_ticks;
-
#define MAX_SCHEDULE_TIMEOUT LONG_MAX
extern signed long FASTCALL(schedule_timeout(signed long timeout));
asmlinkage void schedule(void);
I don't think you need this spurious whitespace change :)
+#ifdef CONFIG_NUMA
+extern void sched_balance_exec(void);
+extern void node_nr_running_init(void);
+#define nr_running_inc(rq) atomic_inc(rq->node_ptr); \
+ rq->nr_running++
+#define nr_running_dec(rq) atomic_dec(rq->node_ptr); \
+ rq->nr_running--
static inline void nr_running_inc(runqueue_t *rq)
{
atomic_inc(rq->node_ptr);
rq->nr_running++
}
etc.. would look a bit nicer.
diff -urNp linux-2.5.55-ms/kernel/sched.c linux-2.5.55-ms-ilb/kernel/sched.c
--- linux-2.5.55-ms/kernel/sched.c 2003-01-10 23:01:02.000000000 +0100
+++ linux-2.5.55-ms-ilb/kernel/sched.c 2003-01-11 01:12:43.000000000 +0100
@@ -153,6 +153,7 @@ struct runqueue {
task_t *curr, *idle;
prio_array_t *active, *expired, arrays[2];
int prev_nr_running[NR_CPUS];
+ atomic_t * node_ptr;
atomic_t *node_ptr; would match the style above.
+#if CONFIG_NUMA
+static atomic_t node_nr_running[MAX_NUMNODES] ____cacheline_maxaligned_in_smp = {[0 ...MAX_NUMNODES-1] = ATOMIC_INIT(0)};
Maybe wants some linewrapping after 80 chars?
+__init void node_nr_running_init(void)
+{
+ int i;
+
+ for (i = 0; i < NR_CPUS; i++) {
+ cpu_rq(i)->node_ptr = &node_nr_running[__cpu_to_node(i)];
+ }
+ return;
+}
The braces and the return are superflous. Also kernel/sched.c (or
mingo codein general) seems to prefer array + i instead of &array[i]
(not that I have a general preferences, but you should try to match
the surrounding code)
+static void sched_migrate_task(task_t *p, int dest_cpu)
+{
+ unsigned long old_mask;
+
+ old_mask = p->cpus_allowed;
+ if (!(old_mask & (1UL << dest_cpu)))
+ return;
+ /* force the process onto the specified CPU */
+ set_cpus_allowed(p, 1UL << dest_cpu);
+
+ /* restore the cpus allowed mask */
+ set_cpus_allowed(p, old_mask);
This double set_cpus_allowed doesn't look nice to me. I don't
have a better suggestion of-hand, though :(
+#define decl_numa_int(ctr) int ctr
This is ugly as hell. I'd prefer wasting one int in each runqueue
or even an ifdef in the struct declaration over this obsfucation
all the time.
@@ -816,6 +834,16 @@ out:
static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
{
unsigned long cpumask = __node_to_cpu_mask(__cpu_to_node(this_cpu));
+#if CONFIG_NUMA
+ int node;
+# define INTERNODE_LB 10
This wants to be put to the other magic constants in the scheduler
and needs an explanation there.
#define nr_running_dec(rq) atomic_dec(rq->node_ptr); \
rq->nr_running--
#define decl_numa_int(ctr) int ctr
+#define decl_numa_nodeint(v) int v[MAX_NUMNODES]
Another one of those.. You should reall just stick the CONFIG_NUMA
ifdef into the actual struct declaration.
+/*
+ * Find the busiest node. All previous node loads contribute with a geometrically
+ * deccaying weight to the load measure:
Linewrapping again..
next prev parent reply other threads:[~2003-01-13 7:53 UTC|newest]
Thread overview: 96+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-01-09 23:54 Minature NUMA scheduler Martin J. Bligh
2003-01-10 5:36 ` [Lse-tech] " Michael Hohnbaum
2003-01-10 16:34 ` Erich Focht
2003-01-10 16:57 ` Martin J. Bligh
2003-01-12 23:35 ` Erich Focht
2003-01-12 23:55 ` NUMA scheduler 2nd approach Erich Focht
2003-01-13 8:02 ` Christoph Hellwig [this message]
2003-01-13 11:32 ` Erich Focht
2003-01-13 15:26 ` [Lse-tech] " Christoph Hellwig
2003-01-13 15:46 ` Erich Focht
2003-01-13 19:03 ` Michael Hohnbaum
2003-01-14 1:23 ` Michael Hohnbaum
2003-01-14 4:45 ` [Lse-tech] " Andrew Theurer
2003-01-14 4:56 ` Martin J. Bligh
2003-01-14 11:14 ` Erich Focht
2003-01-14 15:55 ` [PATCH 2.5.58] new NUMA scheduler Erich Focht
2003-01-14 16:07 ` [Lse-tech] " Christoph Hellwig
2003-01-14 16:23 ` [PATCH 2.5.58] new NUMA scheduler: fix Erich Focht
2003-01-14 16:43 ` Erich Focht
2003-01-14 19:02 ` Michael Hohnbaum
2003-01-14 21:56 ` [Lse-tech] " Michael Hohnbaum
2003-01-15 15:10 ` Erich Focht
2003-01-16 0:14 ` Michael Hohnbaum
2003-01-16 6:05 ` Martin J. Bligh
2003-01-16 16:47 ` Erich Focht
2003-01-16 18:07 ` Robert Love
2003-01-16 18:48 ` Martin J. Bligh
2003-01-16 19:07 ` Ingo Molnar
2003-01-16 18:59 ` Martin J. Bligh
2003-01-16 19:10 ` Christoph Hellwig
2003-01-16 19:44 ` Ingo Molnar
2003-01-16 19:43 ` Martin J. Bligh
2003-01-16 20:19 ` Ingo Molnar
2003-01-16 20:29 ` [Lse-tech] " Rick Lindsley
2003-01-16 23:31 ` Martin J. Bligh
2003-01-17 7:23 ` Ingo Molnar
2003-01-17 8:47 ` [patch] sched-2.5.59-A2 Ingo Molnar
2003-01-17 14:35 ` Erich Focht
2003-01-17 15:11 ` Ingo Molnar
2003-01-17 15:30 ` Erich Focht
2003-01-17 16:58 ` Martin J. Bligh
2003-01-18 20:54 ` NUMA sched -> pooling scheduler (inc HT) Martin J. Bligh
2003-01-18 21:34 ` [Lse-tech] " Martin J. Bligh
2003-01-19 0:13 ` Andrew Theurer
2003-01-17 18:19 ` [patch] sched-2.5.59-A2 Michael Hohnbaum
2003-01-18 7:08 ` William Lee Irwin III
2003-01-18 8:12 ` Martin J. Bligh
2003-01-18 8:16 ` William Lee Irwin III
2003-01-19 4:22 ` William Lee Irwin III
2003-01-17 17:21 ` Martin J. Bligh
2003-01-17 17:23 ` Martin J. Bligh
2003-01-17 18:11 ` Erich Focht
2003-01-17 19:04 ` Martin J. Bligh
2003-01-17 19:26 ` [Lse-tech] " Martin J. Bligh
2003-01-18 0:13 ` Michael Hohnbaum
2003-01-18 13:31 ` [patch] tunable rebalance rates for sched-2.5.59-B0 Erich Focht
2003-01-18 23:09 ` [patch] sched-2.5.59-A2 Erich Focht
2003-01-20 9:28 ` Ingo Molnar
2003-01-20 12:07 ` Erich Focht
2003-01-20 16:56 ` Ingo Molnar
2003-01-20 17:04 ` Ingo Molnar
2003-01-20 17:10 ` Martin J. Bligh
2003-01-20 17:24 ` Ingo Molnar
2003-01-20 19:13 ` Andrew Theurer
2003-01-20 19:33 ` Martin J. Bligh
2003-01-20 19:52 ` Andrew Theurer
2003-01-20 19:52 ` Martin J. Bligh
2003-01-20 21:18 ` [patch] HT scheduler, sched-2.5.59-D7 Ingo Molnar
2003-01-20 22:28 ` Andrew Morton
2003-01-21 1:11 ` Michael Hohnbaum
2003-01-22 3:15 ` Michael Hohnbaum
2003-01-22 16:41 ` Andrew Theurer
2003-01-22 16:17 ` Martin J. Bligh
2003-01-22 16:20 ` Andrew Theurer
2003-01-22 16:35 ` Michael Hohnbaum
2003-02-03 18:23 ` [patch] HT scheduler, sched-2.5.59-E2 Ingo Molnar
2003-02-03 20:47 ` Robert Love
2003-02-04 9:31 ` Erich Focht
2003-01-20 17:04 ` [patch] sched-2.5.59-A2 Martin J. Bligh
2003-01-21 17:44 ` Erich Focht
2003-01-20 16:23 ` Martin J. Bligh
2003-01-20 16:59 ` Ingo Molnar
2003-01-17 23:09 ` Matthew Dobson
2003-01-16 23:45 ` [PATCH 2.5.58] new NUMA scheduler: fix Michael Hohnbaum
2003-01-17 11:10 ` Erich Focht
2003-01-17 14:07 ` Ingo Molnar
2003-01-16 19:44 ` John Bradford
2003-01-14 16:51 ` Christoph Hellwig
2003-01-15 0:05 ` Michael Hohnbaum
2003-01-15 7:47 ` Martin J. Bligh
2003-01-14 5:50 ` [Lse-tech] Re: NUMA scheduler 2nd approach Michael Hohnbaum
2003-01-14 16:52 ` Andrew Theurer
2003-01-14 15:13 ` Erich Focht
2003-01-14 10:56 ` Erich Focht
2003-01-11 14:43 ` [Lse-tech] Minature NUMA scheduler Bill Davidsen
2003-01-12 23:24 ` Erich Focht
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20030113080207.A9119@infradead.org \
--to=hch@infradead.org \
--cc=efocht@ess.nec.de \
--cc=hohnbaum@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lse-tech@lists.sourceforge.net \
--cc=mbligh@aracnet.com \
--cc=mingo@elte.hu \
--cc=rml@tech9.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox