public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Erich Focht <efocht@ess.nec.de>
Cc: "Martin J. Bligh" <mbligh@aracnet.com>,
	Michael Hohnbaum <hohnbaum@us.ibm.com>,
	Robert Love <rml@tech9.net>, Ingo Molnar <mingo@elte.hu>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	lse-tech <lse-tech@lists.sourceforge.net>
Subject: Re: NUMA scheduler 2nd approach
Date: Mon, 13 Jan 2003 08:02:08 +0000	[thread overview]
Message-ID: <20030113080207.A9119@infradead.org> (raw)
In-Reply-To: <200301130055.28005.efocht@ess.nec.de>; from efocht@ess.nec.de on Mon, Jan 13, 2003 at 12:55:28AM +0100

On Mon, Jan 13, 2003 at 12:55:28AM +0100, Erich Focht wrote:
> Hi Martin & Michael,
> 
> as discussed on the LSE call I played around with a cross-node
> balancer approach put on top of the miniature NUMA scheduler. The
> patches are attached and it seems to be clear that we can regain the
> good performance for hackbench by adding a cross-node balancer.

The changes look fine to me.  But I think there's some conding style
issues that need cleaning up (see below).  Also is there a reason
patches 2/3 and 4/5 aren't merged into one patch each?

 /*
- * find_busiest_queue - find the busiest runqueue.
+ * find_busiest_in_mask - find the busiest runqueue among the cpus in cpumask
  */
-static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
+static inline runqueue_t *find_busiest_in_mask(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance, unsigned long cpumask)


	find_busiest_queue has just one caller in 2.5.56, I'd suggest just
	changing the prototype and updating that single caller to pass in
	the cpumask opencoded.


@@ -160,7 +160,6 @@ extern void update_one_process(struct ta
 extern void scheduler_tick(int user_tick, int system);
 extern unsigned long cache_decay_ticks;
 
-
 #define	MAX_SCHEDULE_TIMEOUT	LONG_MAX
 extern signed long FASTCALL(schedule_timeout(signed long timeout));
 asmlinkage void schedule(void);

	I don't think you need this spurious whitespace change :)
 
+#ifdef CONFIG_NUMA
+extern void sched_balance_exec(void);
+extern void node_nr_running_init(void);
+#define nr_running_inc(rq) atomic_inc(rq->node_ptr); \
+	rq->nr_running++
+#define nr_running_dec(rq) atomic_dec(rq->node_ptr); \
+	rq->nr_running--

	static inline void nr_running_inc(runqueue_t *rq)
	{
		atomic_inc(rq->node_ptr);
		rq->nr_running++
	}

	etc.. would look a bit nicer.

diff -urNp linux-2.5.55-ms/kernel/sched.c linux-2.5.55-ms-ilb/kernel/sched.c
--- linux-2.5.55-ms/kernel/sched.c	2003-01-10 23:01:02.000000000 +0100
+++ linux-2.5.55-ms-ilb/kernel/sched.c	2003-01-11 01:12:43.000000000 +0100
@@ -153,6 +153,7 @@ struct runqueue {
 	task_t *curr, *idle;
 	prio_array_t *active, *expired, arrays[2];
 	int prev_nr_running[NR_CPUS];
+	atomic_t * node_ptr;

	atomic_t *node_ptr; would match the style above.

+#if CONFIG_NUMA
+static atomic_t node_nr_running[MAX_NUMNODES] ____cacheline_maxaligned_in_smp = {[0 ...MAX_NUMNODES-1] = ATOMIC_INIT(0)};

	Maybe wants some linewrapping after 80 chars?


+__init void node_nr_running_init(void)
+{
+	int i;
+
+	for (i = 0; i < NR_CPUS; i++) {
+		cpu_rq(i)->node_ptr = &node_nr_running[__cpu_to_node(i)];
+	}
+	return;
+}

	The braces and the return are superflous.  Also kernel/sched.c (or
	mingo codein general) seems to prefer array + i instead of &array[i]
	(not that I have a general preferences, but you should try to match
	the surrounding code)

+static void sched_migrate_task(task_t *p, int dest_cpu)
+{
+	unsigned long old_mask;
+
+	old_mask = p->cpus_allowed;
+	if (!(old_mask & (1UL << dest_cpu)))
+		return;
+	/* force the process onto the specified CPU */
+	set_cpus_allowed(p, 1UL << dest_cpu);
+
+	/* restore the cpus allowed mask */
+	set_cpus_allowed(p, old_mask);

	This double set_cpus_allowed doesn't look nice to me.  I don't
	have a better suggestion of-hand, though :(

+#define decl_numa_int(ctr) int ctr

	This is ugly as hell.  I'd prefer wasting one int in each runqueue
	or even an ifdef in the struct declaration over this obsfucation
	all the time.

@@ -816,6 +834,16 @@ out:
 static inline runqueue_t *find_busiest_queue(runqueue_t *this_rq, int this_cpu, int idle, int *imbalance)
 {
 	unsigned long cpumask = __node_to_cpu_mask(__cpu_to_node(this_cpu));
+#if CONFIG_NUMA
+	int node;
+#       define INTERNODE_LB 10

	This wants to be put to the other magic constants in the scheduler
	and needs an explanation there.


 #define nr_running_dec(rq) atomic_dec(rq->node_ptr); \
 	rq->nr_running--
 #define decl_numa_int(ctr) int ctr
+#define decl_numa_nodeint(v) int v[MAX_NUMNODES]

	Another one of those..  You should reall just stick the CONFIG_NUMA
	ifdef into the actual struct declaration.

+/*
+ * Find the busiest node. All previous node loads contribute with a geometrically
+ * deccaying weight to the load measure:

	Linewrapping again..



  reply	other threads:[~2003-01-13  7:53 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-01-09 23:54 Minature NUMA scheduler Martin J. Bligh
2003-01-10  5:36 ` [Lse-tech] " Michael Hohnbaum
2003-01-10 16:34   ` Erich Focht
2003-01-10 16:57     ` Martin J. Bligh
2003-01-12 23:35       ` Erich Focht
2003-01-12 23:55       ` NUMA scheduler 2nd approach Erich Focht
2003-01-13  8:02         ` Christoph Hellwig [this message]
2003-01-13 11:32           ` Erich Focht
2003-01-13 15:26             ` [Lse-tech] " Christoph Hellwig
2003-01-13 15:46               ` Erich Focht
2003-01-13 19:03             ` Michael Hohnbaum
2003-01-14  1:23         ` Michael Hohnbaum
2003-01-14  4:45           ` [Lse-tech] " Andrew Theurer
2003-01-14  4:56             ` Martin J. Bligh
2003-01-14 11:14               ` Erich Focht
2003-01-14 15:55                 ` [PATCH 2.5.58] new NUMA scheduler Erich Focht
2003-01-14 16:07                   ` [Lse-tech] " Christoph Hellwig
2003-01-14 16:23                   ` [PATCH 2.5.58] new NUMA scheduler: fix Erich Focht
2003-01-14 16:43                     ` Erich Focht
2003-01-14 19:02                       ` Michael Hohnbaum
2003-01-14 21:56                         ` [Lse-tech] " Michael Hohnbaum
2003-01-15 15:10                         ` Erich Focht
2003-01-16  0:14                           ` Michael Hohnbaum
2003-01-16  6:05                           ` Martin J. Bligh
2003-01-16 16:47                             ` Erich Focht
2003-01-16 18:07                               ` Robert Love
2003-01-16 18:48                                 ` Martin J. Bligh
2003-01-16 19:07                                 ` Ingo Molnar
2003-01-16 18:59                                   ` Martin J. Bligh
2003-01-16 19:10                                   ` Christoph Hellwig
2003-01-16 19:44                                     ` Ingo Molnar
2003-01-16 19:43                                       ` Martin J. Bligh
2003-01-16 20:19                                         ` Ingo Molnar
2003-01-16 20:29                                           ` [Lse-tech] " Rick Lindsley
2003-01-16 23:31                                           ` Martin J. Bligh
2003-01-17  7:23                                             ` Ingo Molnar
2003-01-17  8:47                                             ` [patch] sched-2.5.59-A2 Ingo Molnar
2003-01-17 14:35                                               ` Erich Focht
2003-01-17 15:11                                                 ` Ingo Molnar
2003-01-17 15:30                                                   ` Erich Focht
2003-01-17 16:58                                                   ` Martin J. Bligh
2003-01-18 20:54                                                     ` NUMA sched -> pooling scheduler (inc HT) Martin J. Bligh
2003-01-18 21:34                                                       ` [Lse-tech] " Martin J. Bligh
2003-01-19  0:13                                                         ` Andrew Theurer
2003-01-17 18:19                                                   ` [patch] sched-2.5.59-A2 Michael Hohnbaum
2003-01-18  7:08                                                   ` William Lee Irwin III
2003-01-18  8:12                                                     ` Martin J. Bligh
2003-01-18  8:16                                                       ` William Lee Irwin III
2003-01-19  4:22                                                     ` William Lee Irwin III
2003-01-17 17:21                                                 ` Martin J. Bligh
2003-01-17 17:23                                                 ` Martin J. Bligh
2003-01-17 18:11                                                 ` Erich Focht
2003-01-17 19:04                                                   ` Martin J. Bligh
2003-01-17 19:26                                                     ` [Lse-tech] " Martin J. Bligh
2003-01-18  0:13                                                       ` Michael Hohnbaum
2003-01-18 13:31                                                         ` [patch] tunable rebalance rates for sched-2.5.59-B0 Erich Focht
2003-01-18 23:09                                                         ` [patch] sched-2.5.59-A2 Erich Focht
2003-01-20  9:28                                                           ` Ingo Molnar
2003-01-20 12:07                                                             ` Erich Focht
2003-01-20 16:56                                                               ` Ingo Molnar
2003-01-20 17:04                                                                 ` Ingo Molnar
2003-01-20 17:10                                                                   ` Martin J. Bligh
2003-01-20 17:24                                                                     ` Ingo Molnar
2003-01-20 19:13                                                                       ` Andrew Theurer
2003-01-20 19:33                                                                         ` Martin J. Bligh
2003-01-20 19:52                                                                           ` Andrew Theurer
2003-01-20 19:52                                                                             ` Martin J. Bligh
2003-01-20 21:18                                                                               ` [patch] HT scheduler, sched-2.5.59-D7 Ingo Molnar
2003-01-20 22:28                                                                                 ` Andrew Morton
2003-01-21  1:11                                                                                   ` Michael Hohnbaum
2003-01-22  3:15                                                                                 ` Michael Hohnbaum
2003-01-22 16:41                                                                                   ` Andrew Theurer
2003-01-22 16:17                                                                                     ` Martin J. Bligh
2003-01-22 16:20                                                                                       ` Andrew Theurer
2003-01-22 16:35                                                                                     ` Michael Hohnbaum
2003-02-03 18:23                                                                                 ` [patch] HT scheduler, sched-2.5.59-E2 Ingo Molnar
2003-02-03 20:47                                                                                   ` Robert Love
2003-02-04  9:31                                                                                   ` Erich Focht
2003-01-20 17:04                                                                 ` [patch] sched-2.5.59-A2 Martin J. Bligh
2003-01-21 17:44                                                                 ` Erich Focht
2003-01-20 16:23                                                             ` Martin J. Bligh
2003-01-20 16:59                                                               ` Ingo Molnar
2003-01-17 23:09                                                     ` Matthew Dobson
2003-01-16 23:45                                           ` [PATCH 2.5.58] new NUMA scheduler: fix Michael Hohnbaum
2003-01-17 11:10                                           ` Erich Focht
2003-01-17 14:07                                             ` Ingo Molnar
2003-01-16 19:44                                       ` John Bradford
2003-01-14 16:51                     ` Christoph Hellwig
2003-01-15  0:05                     ` Michael Hohnbaum
2003-01-15  7:47                     ` Martin J. Bligh
2003-01-14  5:50             ` [Lse-tech] Re: NUMA scheduler 2nd approach Michael Hohnbaum
2003-01-14 16:52               ` Andrew Theurer
2003-01-14 15:13                 ` Erich Focht
2003-01-14 10:56           ` Erich Focht
2003-01-11 14:43     ` [Lse-tech] Minature NUMA scheduler Bill Davidsen
2003-01-12 23:24       ` Erich Focht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030113080207.A9119@infradead.org \
    --to=hch@infradead.org \
    --cc=efocht@ess.nec.de \
    --cc=hohnbaum@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lse-tech@lists.sourceforge.net \
    --cc=mbligh@aracnet.com \
    --cc=mingo@elte.hu \
    --cc=rml@tech9.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox