* NUMA-aware VM placement in Xen
@ 2012-02-24 10:12 Dario Faggioli
  2012-02-24 10:16 ` George Dunlap
  0 siblings, 1 reply; 4+ messages in thread
From: Dario Faggioli @ 2012-02-24 10:12 UTC
  To: xen-devel; +Cc: George Dunlap, Ian.Campbell, Keir Fraser, Tim Deegan


Hi guys,

As some of you know I'm working on putting some kind of NUMA-aware
placement of the various VMs within Xen. This means I'm investigating
deeply how memory allocation works, which isn't an easy task for me (I
started completely from scratch), so forgive me if I say something
wrong! :-P

The status is that I've been dealing for a while with a "design issue"
I'd be very glad to discuss with someone, as I'm not sure which path to
go for...

To keep it short, what I need is a place --ideally during VM creation--
where I can check how much memory a VM wants against how much memory is
available in the various NUMA-nodes, and use this as the basis of my
decision. The question is, where is this place?
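
A minimal sketch of the check described above, in Python for brevity:
given the free memory per NUMA node and the amount a VM wants, pick a
node that can hold it. The function name pick_node, the best-fit policy
and the sample numbers are all assumptions made for illustration;
nothing here is existing Xen or libxl code.

```python
from typing import Dict, Optional


def pick_node(free_kb_by_node: Dict[int, int], wanted_kb: int) -> Optional[int]:
    """Return the node that fits wanted_kb with the least leftover, or None.

    Hypothetical helper: best-fit is just one possible policy.
    """
    candidates = [
        (free_kb - wanted_kb, node)
        for node, free_kb in free_kb_by_node.items()
        if free_kb >= wanted_kb
    ]
    if not candidates:
        return None  # no single node can hold the whole VM
    # Smallest leftover first; ties are broken by the lowest node id.
    return min(candidates)[1]


# Made-up host with two nodes (values in KiB).
free = {0: 4 * 1024 * 1024, 1: 2 * 1024 * 1024}
print(pick_node(free, 1 * 1024 * 1024))  # node 1 is the tighter fit
print(pick_node(free, 8 * 1024 * 1024))  # nothing fits on a single node
```

Whatever the policy, the result is only as good as the free-memory
figures it is given, which is why the place where the check runs
matters.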

I traced memory-related calls (e.g., for HVMs) from libxl__build_hvm to
xc_hvm_build_target_mem to setup_guest to xc_domain_populate_physmap and
alloc_domheap_pages. The last two have been my target for a while, but
I'm not so sure they would be the right choice, mainly because both of
them are called for allocating _only_part_ of the VM's memory, i.e.,
some extents of it at each call (am I right?).

Basically, given alloc_domheap_pages uses d->node_affinity for deciding
which node(s) to actually take memory from, I was planning to
either use the same mask or build a new one with similar purposes, the
problem being _where_ to populate it with the proper nodes.
I'm now looking at xc_domain_setmaxmem-->do_domctl(XEN_DOMCTL_max_mem),
although I think it's too early, and I'd end up guessing wrt a lot of
aspects... But considering xm/xend was doing the same even earlier (at
least I think)...

Sorry for writing so much... Any help/ideas you feel comfortable
sharing would be much appreciated! :-)

Thanks a lot and regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: NUMA-aware VM placement in Xen
  2012-02-24 10:12 NUMA-aware VM placement in Xen Dario Faggioli
@ 2012-02-24 10:16 ` George Dunlap
  2012-02-24 10:50   ` Dario Faggioli
  0 siblings, 1 reply; 4+ messages in thread
From: George Dunlap @ 2012-02-24 10:16 UTC
  To: Dario Faggioli; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), Ian Campbell

On 24/02/12 10:12, Dario Faggioli wrote:
> Hi guys,
>
> As some of you know I'm working on putting some kind of NUMA-aware
> placement of the various VMs within Xen. This means I'm investigating
> deeply how memory allocation works, which isn't an easy task for me (I
> started completely from scratch), so forgive me if I say something
> wrong! :-P
>
> The status is that I've been dealing for a while with a "design issue"
> I'd be very glad to discuss with someone, as I'm not sure which path to
> go for...
>
> To keep it short, what I need is a place --ideally during VM creation--
> where I can check how much memory a VM wants against how much memory is
> available in the various NUMA-nodes, and use this as the basis of my
> decision. The question is, where is this place?
>
> I traced memory-related calls (e.g., for HVMs) from libxl__build_hvm to
> xc_hvm_build_target_mem to setup_guest to xc_domain_populate_physmap and
> alloc_domheap_pages. The last two have been my target for a while, but
> I'm not so sure they would be the right choice, mainly because both of
> them are called for allocating _only_part_ of the VM's memory, i.e.,
> some extents of it at each call (am I right?).
>
> Basically, given alloc_domheap_pages uses d->node_affinity for deciding
> which node(s) to actually take memory from, I was planning to
> either use the same mask or build a new one with similar purposes, the
> problem being _where_ to populate it with the proper nodes.
> I'm now looking at xc_domain_setmaxmem-->do_domctl(XEN_DOMCTL_max_mem),
> although I think it's too early, and I'd end up guessing wrt a lot of
> aspects... But considering xm/xend was doing the same even earlier (at
> least I think)...
So the first question is, where should the decision about NUMA placement 
be made, and the second is how that level should implement it.

Doing it at the libxc level I think is not right.  It seems to me we 
have two options:
* Have libxl do the NUMA placement on behalf of the toolstack.  In that 
case, the libxl_domain_create_new function should look at the available 
memory, the NUMA layout, &c, and then set d->node_affinity before 
calling xc_hvm_build.
* Have the toolstack do it.  In this case, you'd be modifying xl to set 
d->node_affinity before calling libxl's domain creation function.

Do those options work?  Let me know if I've misunderstood anything.

Any thoughts one way or the other from anyone?

I'd be tempted to have it be optional -- you can set "numa=auto" and the 
domain creation function will do the simple thing; or you can set 
"numa=manual" and have the toolstack / config file set the nodes 
manually.  That would translate pretty well to config files as well -- 
more "set the knobs" administrators could set the numa layout in the 
config file manually if they wanted.
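
To make the two modes concrete, here is a small Python sketch of how a
domain-creation path might resolve them. The "numa", "nodes" and
"memory_kb" keys, the dict-based config and the first-fit policy used
for "auto" are assumptions made up for this illustration; they are not
existing xl config options.

```python
from typing import Any, Dict, List


def resolve_nodes(cfg: Dict[str, Any], free_kb_by_node: Dict[int, int]) -> List[int]:
    """Return the NUMA nodes a domain should be placed on (hypothetical)."""
    mode = cfg.get("numa", "auto")
    if mode == "manual":
        # The administrator "sets the knobs": use the listed nodes as given.
        nodes = list(cfg.get("nodes", []))
        unknown = [n for n in nodes if n not in free_kb_by_node]
        if not nodes or unknown:
            raise ValueError("numa=manual needs a list of existing nodes")
        return nodes
    if mode == "auto":
        # "The simple thing": the first node with enough free memory.
        wanted_kb = int(cfg["memory_kb"])
        for node in sorted(free_kb_by_node):
            if free_kb_by_node[node] >= wanted_kb:
                return [node]
        # Nothing fits on one node: fall back to spreading over all nodes.
        return sorted(free_kb_by_node)
    raise ValueError(f"unknown numa mode: {mode!r}")


free = {0: 1_000_000, 1: 3_000_000}
print(resolve_nodes({"numa": "auto", "memory_kb": 2_000_000}, free))
print(resolve_nodes({"numa": "manual", "nodes": [0], "memory_kb": 2_000_000}, free))
```

In this sketch "manual" deliberately skips the free-memory check, on the
assumption that an administrator who names nodes explicitly wants them
honoured.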

Thoughts?

  -George

>
> Sorry for writing so much... Any help/ideas you feel comfortable
> sharing would be much appreciated! :-)
>
> Thanks a lot and regards,
> Dario
>


* Re: NUMA-aware VM placement in Xen
  2012-02-24 10:16 ` George Dunlap
@ 2012-02-24 10:50   ` Dario Faggioli
  2012-02-24 14:36     ` George Dunlap
  0 siblings, 1 reply; 4+ messages in thread
From: Dario Faggioli @ 2012-02-24 10:50 UTC
  To: George Dunlap; +Cc: Tim (Xen.org), xen-devel, Keir (Xen.org), Ian Campbell


On Fri, 2012-02-24 at 10:16 +0000, George Dunlap wrote: 
> > Basically, given alloc_domheap_pages uses d->node_affinity for deciding
> > which node(s) to actually take memory from, I was planning to
> > either use the same mask or build a new one with similar purposes, the
> > problem being _where_ to populate it with the proper nodes.
> > I'm now looking at xc_domain_setmaxmem-->do_domctl(XEN_DOMCTL_max_mem),
> > although I think it's too early, and I'd end up guessing wrt a lot of
> > aspects... But considering xm/xend was doing the same even earlier (at
> > least I think)...
> So the first question is, where should the decision about NUMA placement 
> be made, and the second is how that level should implement it.
> 
Yes, indeed.

> Doing it at the libxc level I think is not right.  
>
Ok, same here. Just to be sure I understood what you're saying: if you
refer to xc_domain_setmaxmem, I'd end up doing it in do_domctl, so it'd
be in Xen; but anyway, it still wouldn't look the way I wanted it to
(see below). :-(

> It seems to me we 
> have two options:
> * Have libxl do the NUMA placement on behalf of the toolstack.  In that 
> case, the libxl_domain_create_new function should look at the available 
> memory, the NUMA layout, &c, and then set d->node_affinity before 
> calling xc_hvm_build.
>
This can be done. If I got it correctly it is more or less what xm/xend
already does.

> * Have the toolstack do it.  In this case, you'd be modifying xl to set 
> d->node_affinity before calling libxl's domain creation function.
> 
I'm not sure I'm getting this right... It seems very similar to the one
above.

> Do those options work?  Let me know if I've misunderstood anything.
> 
I think they can be implemented. Whether they "work" depends on how we
define "work". :-D

That's why I was struggling to put this in the hypervisor and not
in the toolstack, because I really think it should live there if
possible. For example, it would be nice for the decision to be protected
by the proper locking. I mean, what's the point in checking the amount
of free memory in a node somewhere in (lib)xl if, by the time the actual
allocation happens (in Xen), that might be a completely different
value (due to concurrent domain creation, destruction, etc.)?
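
The concern can be illustrated with a deterministic toy model in Python.
Two domain creations each check a node's free memory, then allocate in
extents; since both checks happen before either allocation finishes,
both pass and both VMs come up short. The Node class, the extent size
and all numbers are invented for the illustration and do not model Xen's
allocator.

```python
from typing import List, Tuple


class Node:
    """Toy NUMA node with a free-memory counter (illustrative only)."""

    def __init__(self, free_mb: int) -> None:
        self.free_mb = free_mb

    def alloc_extent(self, mb: int) -> bool:
        """Take one extent from this node; False means it is exhausted."""
        if self.free_mb < mb:
            return False
        self.free_mb -= mb
        return True


def interleaved_create(
    node: Node, wanted_mb: int, extent_mb: int = 256
) -> Tuple[List[bool], List[int]]:
    """Two creations check first, then allocate extent by extent in turn."""
    # Time of check: both see the same free amount and both decide it fits.
    checks = [node.free_mb >= wanted_mb, node.free_mb >= wanted_mb]
    got = [0, 0]
    # Time of use: the allocations interleave, one extent each per round.
    for _ in range(wanted_mb // extent_mb):
        for vm in (0, 1):
            if node.alloc_extent(extent_mb):
                got[vm] += extent_mb
    return checks, got


checks, got = interleaved_create(Node(free_mb=1024), wanted_mb=768)
print(checks)  # both checks passed
print(got)     # yet neither VM got its full 768 MB from this node
```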

> Any thoughts one way or the other from anyone?
> 
Any ideas on how to put that thing _in_ Xen?

> I'd be tempted to have it be optional -- you can set "numa=auto" and the 
> domain creation function will do the simple thing; or you can set 
> "numa=manual" and have the toolstack / config file set the nodes 
> manually.  That would translate pretty well to config files as well -- 
> more "set the knobs" administrators could set the numa layout in the 
> config file manually if they wanted.
> 
I agree and that was already my plan: configurable and per-domain.

I think the config file, supporting cpupools and vcpu-pinning, already
offers almost all the facilities for manually deploying a VM reflecting a
specific NUMA-layout. What I was thinking of adding was the "numa=auto" or
whatever switch, so that if one does not (want to) specify cpupools or
pinning, the VM still gets NUMA-sensible placement.

But anyway, no problem adding other knobs if considered worthwhile, the
problem is the other part! :-P

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




* Re: NUMA-aware VM placement in Xen
  2012-02-24 10:50   ` Dario Faggioli
@ 2012-02-24 14:36     ` George Dunlap
  0 siblings, 0 replies; 4+ messages in thread
From: George Dunlap @ 2012-02-24 14:36 UTC
  To: Dario Faggioli; +Cc: Keir (Xen.org), xen-devel, Tim (Xen.org), Ian Campbell

On Fri, Feb 24, 2012 at 10:50 AM, Dario Faggioli <raistlin@linux.it> wrote:
>> It seems to me we
>> have two options:
>> * Have libxl do the NUMA placement on behalf of the toolstack.  In that
>> case, the libxl_domain_create_new function should look at the available
>> memory, the NUMA layout, &c, and then set d->node_affinity before
>> calling xc_hvm_build.
>>
> This can be done. If I got it correctly it is more or less what xm/xend
> already does.
>
>> * Have the toolstack do it.  In this case, you'd be modifying xl to set
>> d->node_affinity before calling libxl's domain creation function.
>>
> I'm not sure I'm getting this right... It seems very similar to the one
> above.

From Xen's perspective, yes.  But from the libxl perspective, no.
libxl is meant to be the interface we give to other external
toolstacks, so the interface there is important.

>> Do those options work?  Let me know if I've misunderstood anything.
>>
> I think they can be implemented. Whether they "work" depends on how we
> define "work". :-D
>
> That's why I was struggling to put this in the hypervisor and not
> in the toolstack, because I really think it should live there if
> possible. For example, it would be nice for the decision to be protected
> by the proper locking. I mean, what's the point in checking the amount
> of free memory in a node somewhere in (lib)xl if, by the time the actual
> allocation happens (in Xen), that might be a completely different
> value (due to concurrent domain creation, destruction, etc.)?

At the moment, pages for a VM are not allocated in one big chunk
anyway -- xc_hvm_build.c:setup_guest() calls
xc_domain_populate_physmap() in a loop.  So Xen is in less of a
position to avoid the TOCTTOU race than the toolstack is.  The
toolstack in theory, at least, can refrain from starting a second VM
until the first is completely allocated.
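
One way to read "refrain from starting a second VM until the first is
completely allocated" is a toolstack-side lock held across both the
check and the whole chunked allocation. The Python sketch below uses
threads to stand in for concurrent domain creations; create_domain, the
shared free_mb table and the sizes are hypothetical and not part of
libxl or xl.

```python
import threading
from typing import Dict, List, Optional

free_mb: Dict[int, int] = {0: 1024, 1: 1024}  # made-up per-node free memory
placement_lock = threading.Lock()             # serializes domain creation
results: List[Optional[int]] = []


def create_domain(wanted_mb: int, extent_mb: int = 256) -> Optional[int]:
    """Check and allocate under one lock so the check cannot go stale."""
    with placement_lock:
        node = next((n for n in sorted(free_mb) if free_mb[n] >= wanted_mb), None)
        if node is None:
            return None  # no node can hold this domain right now
        # Allocation still happens in extents, but nobody else can start
        # a creation until this loop has finished.
        for _ in range(wanted_mb // extent_mb):
            free_mb[node] -= extent_mb
        return node


def worker() -> None:
    results.append(create_domain(768))


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two creations land on a node each; the third is refused cleanly.
print(sorted(results, key=lambda n: -1 if n is None else n))
print(free_mb)
```

The lock only helps if every creation goes through the same toolstack
instance; it does nothing about memory that changes hands behind the
toolstack's back.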

I don't think there's any reason to do it in Xen -- it's not
time-critical, it doesn't require any information that the toolstack
and/or domain builder wouldn't have available to it.

> I think the config file, supporting cpupools and vcpu-pinning, already
> offers almost all the facilities for manually deploying a VM reflecting a
> specific NUMA-layout. What I was thinking of adding was the "numa=auto" or
> whatever switch, so that if one does not (want to) specify cpupools or
> pinning, the VM still gets NUMA-sensible placement.

Do we have a way to specify the NUMA layout?  That should be a
separate config option from vcpu pinning.  But yes, I think adding
"numa=auto" (on by default) is the big feature we need; and I think
that's probably best implemented in either the toolstack or libxl.

 -George
