Re: Re: NUMA and SMP - Emmanuel Ackaouy

From: Emmanuel Ackaouy <ack@xensource.com>
To: "Petersson, Mats" <Mats.Petersson@amd.com>
Cc: Anthony Liguori <aliguori@linux.vnet.ibm.com>,
	xen-devel <xen-devel@lists.xensource.com>,
	David Pilger <pilger.david@gmail.com>,
	Ryan Harper <ryanh@us.ibm.com>
Subject: Re: Re: NUMA and SMP
Date: Tue, 16 Jan 2007 17:13:43 +0100
Message-ID: <f4840fc0c699087f1057581dfd22555b@xensource.com>
In-Reply-To: <907625E08839C4409CE5768403633E0B018E187F@sefsexmb1.amd.com>

On Jan 16, 2007, at 15:19, Petersson, Mats wrote:
>> There is a strong argument for making hypervisors and OSes NUMA
>> aware in the sense that:
>> 1- They know about system topology
>> 2- They can export this information up the stack to applications and
>> users
>> 3- They can take in directives from users and applications to
>>      partition the host and place some threads and memory in
>>      specific partitions.
>> 4- They use an interleaved (or random) initial memory placement
>>      strategy by default.
>>
>> The argument that the OS on its own -- without user or application
>> directives -- can make better placement decisions than round-robin or
>> random placement is -- in my opinion -- flawed.
>
> Debatable - it depends a lot on WHAT applications you expect to run,
> and how they behave. If you consider an application that frequently
> allocates and de-allocates memory dynamically in a single threaded
> process (say compiler), then allocating memory in the local node should
> be the "first choice".
>
> Multithreaded apps can use a similar approach, if a thread is
> allocating memory, it's often a good chance that the memory is being
> used by that thread too [although this doesn't work for message
> passing between threads, obviously, this is again a case where
> "knowledge from the app" will be the only better solution than
> "random"].
>
> This approach is by far not perfect, but if you consider that
> applications often do short term allocations, it makes sense to
> allocate on the local node if possible.

I do not agree.

Just because a thread happens to run on processor X when
it first faults in a page off the process' heap doesn't give you
a good indication that the memory will be used mostly by
this thread or that the thread will continue running on the
same processor. There are at least as many cases when
this assumption is invalid as when it is valid. Without any
solid indication that something else will work better, round
robin allocation has to be the default strategy.

Also, if you allow one process to consume a large percentage
of one node's memory, you are indirectly hurting all competing
multi-threaded apps which benefit from higher total memory
bandwidth when they spread their data across nodes.

I understand your point that if a single threaded process quickly
shrinks its heap after growing it, it is less likely to migrate to a
different processor while it is using this memory. I'm not sure
how you would predict, at allocation time, that the memory will be
released quickly, though. Even if you could, I maintain you would
still need safeguards in place to balance that process' needs
with those of competing multi-threaded apps that benefit from
memory bandwidth scaling with the number of hosting nodes.

You could try to compromise and allocate round robin starting
locally, perhaps with diminishing strides as the total allocation
grows (i.e., allocate local and progressively move towards a
per-page round robin scheme as more memory is requested). I'm not
sure this would do any better than plain old dumb round robin in the
average case but it's worth a thought.
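
To make the compromise above concrete, here is a minimal sketch of one
possible reading of it. Everything in it is an assumption made for
illustration: the function name place_pages, the initial_stride
parameter, and the choice to halve the stride each time the allocator
moves to the next node. It is not taken from Xen or any other
allocator.

```python
def place_pages(num_pages, local_node, num_nodes, initial_stride=8):
    """Return the node chosen for each page, in allocation order.

    Hypothetical policy: start on the local node with a run of
    `initial_stride` pages, then move to the next node and halve the
    run length, until the run length is 1 (plain page round robin).
    """
    placement = []
    node = local_node
    stride = initial_stride
    while len(placement) < num_pages:
        # Never place more pages than were requested.
        run = min(stride, num_pages - len(placement))
        placement.extend([node] * run)
        # Advance to the next node and shrink the stride.
        node = (node + 1) % num_nodes
        stride = max(1, stride // 2)
    return placement


if __name__ == "__main__":
    # A small allocation stays entirely on the local node (node 1 here).
    print(place_pages(3, local_node=1, num_nodes=4))
    # A larger one spreads out and ends up as page round robin.
    print(place_pages(20, local_node=1, num_nodes=4))
```

With these assumed parameters, a short-lived small heap never leaves
the local node, while a large allocation converges on the same
behaviour as plain round robin. Whether that beats plain round robin
on average remains an open question.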

> However, supporting NUMA in the Hypervisor and forwarding arch-info to
> the guest would make sense. At the least the very basic principle of:
> If the guest is to run on a limited set of processors (nodes), allocate
> memory from that (those) node(s) for the guest would make a lot of
> sense.

I suspect there is widespread agreement on this point.
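
As a rough illustration of that principle, the sketch below derives
the set of nodes from the processors a guest is restricted to and
interleaves the guest's pages across only those nodes. The names
guest_nodes and place_guest_pages, and the cpu_to_node mapping, are
hypothetical and do not correspond to any Xen interface.

```python
def guest_nodes(cpu_to_node, guest_cpus):
    """Return the sorted set of nodes hosting the guest's processors."""
    return sorted({cpu_to_node[cpu] for cpu in guest_cpus})


def place_guest_pages(num_pages, nodes):
    """Interleave pages round robin across the guest's nodes only."""
    return [nodes[i % len(nodes)] for i in range(num_pages)]


if __name__ == "__main__":
    # Assumed topology: six processors, two per node.
    cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
    nodes = guest_nodes(cpu_to_node, guest_cpus=[2, 3, 4])
    print(nodes)
    print(place_guest_pages(5, nodes))
```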
