public inbox for linux-kernel@vger.kernel.org
From: Larry McVoy <lm@bitmover.com>
To: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
Cc: Erich Focht <focht@ess.nec.de>, Mike Kravetz <kravetz@us.ibm.com>,
	Jesse Barnes <jbarnes@sgi.com>, Peter Rival <frival@zk3.dec.com>,
	lse-tech@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [Lse-tech] NUMA scheduling
Date: Mon, 25 Feb 2002 11:03:27 -0800	[thread overview]
Message-ID: <20020225110327.A22497@work.bitmover.com> (raw)
In-Reply-To: <Pine.LNX.4.21.0202251737420.30318-100000@sx6.ess.nec.de> <20940000.1014663303@flay>

On Mon, Feb 25, 2002 at 10:55:03AM -0800, Martin J. Bligh wrote:
> > - The load_balancing() concept is different:
> > 	- there are no special time intervals for balancing across pool
> > 	boundaries, the need for this can occur very quickly and I
> > 	have the feeling that 2*250ms is a long time for keeping the 
> > 	nodes unbalanced. This means: each time load_balance() is called
> > 	it _can_ balance across pool boundaries (but doesn't have to).
> 
> Imagine for a moment that there's a short spike in workload on one node.
> By aggressively balancing across nodes, won't you incur a high cost
> in terms of migrating all the cache data to the remote node (destroying
> the cache on both the remote and local node), when it would be cheaper 
> to wait for a few more ms, and run on the local node?

Great question!  The answer is that you are absolutely right.  SGI tried
a pile of things in this area, both on NUMA and on traditional SMPs (the
NUMA stuff was more page migration and the SMP stuff was more process
migration, but the problems are the same: you screw up the cache).  They
never got page migration to give them better performance while I was
there, and I doubt they have today.  And the process "migration" from CPU
to CPU didn't work either; people tended to lock processes to processors
for exactly the reason you alluded to.

If you read the early hardware papers on SMP, they all claim "Symmetric
Multi Processor", i.e., you can run any process on any CPU.  Skip forward
3 years and read the cache affinity papers from the same hardware people.
You have to step back and squint, but what you'll see is that these papers
could be summarized in one sentence:

	"Oops, we lied, it's not really symmetric at all"

You should treat each CPU as a mini system and think of a process reschedule
someplace else as a checkpoint/restart, and assume that is heavyweight.  In
fact, I'd love to see the scheduler code forcibly sleep the process for 
500 milliseconds each time it lands on a different CPU.  Tune the system
to work well with that, then take out the sleep, and you'll have the right
answer.
-- 
---
Larry McVoy            	 lm at bitmover.com           http://www.bitmover.com/lm 


Thread overview: 19+ messages
2002-02-22 18:56 NUMA scheduling Mike Kravetz
2002-02-22 19:14 ` [Lse-tech] " Jesse Barnes
2002-02-22 19:29   ` Peter Rival
2002-02-22 23:59 ` Mike Kravetz
2002-02-25 18:32 ` Erich Focht
2002-02-25 18:55   ` Martin J. Bligh
2002-02-25 19:03     ` Larry McVoy [this message]
2002-02-25 19:28       ` Davide Libenzi
2002-02-25 19:45         ` Davide Libenzi
2002-02-25 19:35       ` Timothy D. Witham
2002-02-25 19:49       ` Bill Davidsen
2002-02-25 20:02         ` Larry McVoy
2002-02-25 20:18           ` Davide Libenzi
2002-02-26  5:14           ` Bill Davidsen
2002-02-25 23:35     ` [Lse-tech] [rebalance at: do_fork() vs. do_execve()] " Andy Pfiffer
2002-02-26 10:33     ` [Lse-tech] " Erich Focht
2002-02-26 15:30       ` Martin J. Bligh
2002-02-27 16:56         ` Erich Focht
2002-02-26 19:03       ` Mike Kravetz
