public inbox for linux-kernel@vger.kernel.org
From: Erich Focht <efocht@ess.nec.de>
To: "Martin J. Bligh" <mbligh@aracnet.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Cc: LSE <lse-tech@lists.sourceforge.net>, Ingo Molnar <mingo@elte.hu>,
	Michael Hohnbaum <hohnbaum@us.ibm.com>
Subject: Re: [Lse-tech] [PATCH 1/2] node affine NUMA scheduler
Date: Tue, 24 Sep 2002 23:04:44 +0200	[thread overview]
Message-ID: <200209242304.44799.efocht@ess.nec.de> (raw)
In-Reply-To: <170330281.1032781640@[10.10.2.3]>

On Monday 23 September 2002 20:47, Martin J. Bligh wrote:
> > I have two problems with this approach:
> > 1: Freeing memory is quite expensive, as it currently involves finding
> > the maximum of the array node_mem[].
>
> Bleh ... why? This needs to be calculated much more lazily than this,
> or you're going to kick the hell out of any cache affinity. Can you
> recalc this in the rebalance code or something instead?

You're right, that would be too slow. I'm thinking of marking the tasks
that need recalculation and updating their homenode when their runqueue
is scanned for a task to be stolen.

> > 2: I have no idea how tasks sharing the mm structure will behave. I'd
> > like them to run on different nodes (that's why node_mem is not in mm),
> > but they could (legally) free pages which they did not allocate and
> > have wrong values in node_mem[].
>
> Yes, that really ought to be per-process, not per task. Which means
> locking or atomics ... and overhead. Ick.

Hmm, I think it is sometimes OK to have it per task. Take, for example,
OpenMP parallel jobs working on huge arrays: the "first touch" of these
arrays triggers pagefaults in the different tasks and thus builds a
different node_mem[] array for each task. As long as they just allocate
memory, all is well; the same holds if they only release it at the end
of the job. This probably goes wrong if we have a long-running task that
spawns short-lived clones. They inherit node_mem from the parent, but
pages they add to the common mm are not reflected in the parent's
node_mem after their death.

> For the first cut of the NUMA sched, maybe you could just leave page
> allocation alone, and do that separately? or is that what the second
> patch was meant to be?

The first patch needs a correction: in load_balance(), add
	if (!busiest) goto out;
after the call to find_busiest_queue(). With that fix it works on its
own. On top of this pooling NUMA scheduler we can put whichever node
affinity approach fits best, with or without memory allocation. I'll
update the patches and their setup code (thanks for the comments!) and
resend them.

Regards,
Erich


Thread overview: 27+ messages
2002-09-21  9:59 [PATCH 1/2] node affine NUMA scheduler Erich Focht
2002-09-21 10:02 ` [PATCH 2/2] " Erich Focht
2002-09-21 15:55 ` [Lse-tech] [PATCH 1/2] " Martin J. Bligh
2002-09-21 16:32   ` Martin J. Bligh
2002-09-21 16:46     ` Martin J. Bligh
2002-09-21 17:11       ` Martin J. Bligh
2002-09-21 17:32         ` Erich Focht
2002-09-21 17:38           ` William Lee Irwin III
2002-09-21 23:18       ` William Lee Irwin III
2002-09-22  8:09         ` William Lee Irwin III
2002-09-22  8:30           ` Erich Focht
2002-09-22 17:11             ` Martin J. Bligh
2002-09-22 19:20               ` Martin J. Bligh
2002-09-22 21:59                 ` Erich Focht
2002-09-22 22:36                   ` William Lee Irwin III
2002-09-22 22:51                     ` Martin J. Bligh
2002-09-23 18:19               ` node affine NUMA scheduler: simple benchmark Erich Focht
2002-09-22 10:35       ` [Lse-tech] [PATCH 1/2] node affine NUMA scheduler Erich Focht
2002-09-22 10:45   ` Erich Focht
2002-09-22 14:57     ` Martin J. Bligh
2002-09-23 18:38       ` Erich Focht
2002-09-23 18:47         ` Martin J. Bligh
2002-09-24 21:04           ` Erich Focht [this message]
2002-09-24 21:17             ` Martin J. Bligh
2002-09-22 15:52 ` Martin J. Bligh
2002-09-22 19:24   ` Martin J. Bligh
2002-09-24 23:59   ` Matthew Dobson
