* Re: [RFC] Xen NUMA strategy
@ 2007-09-20 10:26 André Przywara
  0 siblings, 0 replies; 14+ messages in thread
From: André Przywara @ 2007-09-20 10:26 UTC (permalink / raw)
  To: aron; +Cc: xen-devel, Anthony.Xu

Hi Aron,

 >> 1.) Guest NUMA support: spread a guest's resources (CPUs and memory)
 >> over several nodes and propagate the appropriate topology to the
 >> guest. ...
 >It seems like you are proposing two things at once here.  Let's call
 >these 1a and 1b
 >1a. Expose NUMA topology to the guests.  This isn't the topology of
 >    dom0, just the topology of the domU, i.e. it is constructed by
 >    dom0 when starting the domain.
 >1b. Spread the guest over nodes.  I can't tell if you mean to do this
 >    automatically or by request when starting the guest.  This seems
 >    to be separate from 1a.
 From an implementation point of view this is right. If you look at the 
patches I sent in mid-August, those parts are done in separate patches:
http://lists.xensource.com/archives/html/xen-devel/2007-08/msg00275.html
Patch 3/4 takes care of 1b), Patch 4/4 is about 1a).
But neither part makes much sense if done separately. If you spread 
the guest over several nodes and don't tell the guest OS about it, you 
will have about the same behaviour Xen had before the integration of the 
basic NUMA patches from Ryan Harper in October 2006.

 >>       ***Disadvantages***:
 >> - The guest has to support NUMA...
 >> - The guest's workload has to fit NUMA...
 >IMHO the list of disadvantages is only what we have in xen today.
 >Presently no guests can see the NUMA topology, so it's the same as if
 >they don't have support in the guest.  Adding NUMA topology
 >propagation does not create these disadvantages, it simply exposes the
 >weakness of the lesser operating systems.
These were mostly meant as disadvantages relative to solution 2).

 >> 2.) Dynamic load balancing and page migration:
 >Again, this seems like a two-part proposal.
 >2a. Add to xen the ability to run a guest within a node, so that cpus
 >    and ram are allocated from within the node instead of randomly
 >    across the system.
This is already in Xen, at least if you pin the guest manually to a 
certain node _before_ creating the guest (by saying for instance 
cpus=0,1 if the first node consists of the first two CPUs). Xen will try 
to allocate the guest's memory from within the node the first VCPU is 
currently scheduled on (at least for HVM guests).
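
As a minimal sketch of the manual pinning described above: the helper 
below builds the cpus= value for one node. The node-to-CPU map and the 
function name are assumptions made up for illustration; only the cpus= 
setting itself comes from the text.

```python
# Hypothetical topology: node id -> physical CPU ids. This map is an
# assumption for illustration; a real host reports its own layout.
NODE_CPUS = {0: [0, 1], 1: [2, 3]}


def cpus_for_node(node, node_cpus=NODE_CPUS):
    """Return the value for the guest config's cpus= setting for one node."""
    return ",".join(str(cpu) for cpu in node_cpus[node])


# Config fragment that pins the guest to node 0 before it is created.
config_line = 'cpus = "%s"' % cpus_for_node(0)
print(config_line)
```

With the example map, node 0 consists of the first two CPUs, so the 
fragment matches the cpus=0,1 case mentioned above.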

 >2b. NUMA balancing.  While this seems like a worthwhile goal, IMHO
 >    it's separate from the first part of the proposal.
This is most of the work that has to be done.

 > If the mechanics of migrating between NUMA nodes is implemented in the
 > hypervisor, then policy and control can be implemented in dom0
 > userland, so none of the automatic part of this needs to be in the
 > hypervisor.
This may be true. At the very least there should be some means to 
manually migrate domains between nodes, which must be triggered from 
Dom0. So automatic behavior could be triggered from there, too.

Andre.

--
Andre Przywara
AMD - Operating System Research Center, Dresden, Germany

* [RFC] Xen NUMA strategy
@ 2007-09-14 12:05 Andre Przywara
  2007-09-18  6:08 ` Akio Takebe
  2007-09-18 14:31 ` Aron Griffis
  0 siblings, 2 replies; 14+ messages in thread
From: Andre Przywara @ 2007-09-14 12:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Xu, Anthony

Hi,

Anthony Xu and I have had some fruitful discussions about the future 
direction of NUMA support in Xen. I want to share the results with 
the Xen community and start a discussion:

We came up with two different approaches for better NUMA support in Xen:
1.) Guest NUMA support: spread a guest's resources (CPUs and memory) 
over several nodes and propagate the appropriate topology to the guest.
The first part of this is in the patches I sent recently to the list (PV 
support will follow, as will bells and whistles like automatic 
placement).
	***Advantages***:
- The guest OS has better means to deal with the NUMA setup: it can more 
easily migrate _processes_ among the nodes (Xen-HV can only migrate 
whole domains).
- Changes to Xen are relatively small.
- Guest resources are not limited by node size, since a guest can use 
more resources than there are on one node.
- If guests are well spread over the nodes, the system is more balanced 
even if guests are destroyed and created later.
	***Disadvantages***:
- The guest has to support NUMA. This is not true for older guests 
(Win2K, older Linux).
- The guest's workload has to fit NUMA. If the guest's tasks are hardly 
parallelizable or use much shared memory, they cannot take advantage of 
NUMA and will degrade in performance. This includes all single-task 
problems.

In general this approach seems to fit better with smaller NUMA nodes and 
larger guests.
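
A minimal sketch of what "spreading" could mean in approach 1, assuming 
an even split of VCPUs and memory over the chosen nodes. The function 
name and the even-split policy are illustrative assumptions, not what 
the patches actually do.

```python
def spread_guest(vcpus, mem_mb, nodes):
    """Split VCPUs and memory as evenly as possible over the given nodes.

    Returns a list of (node, vcpu_ids, mem_mb) tuples. The same per-node
    layout would be what is propagated to the guest as its topology.
    """
    n = len(nodes)
    layout = []
    next_vcpu = 0
    for i, node in enumerate(nodes):
        # The first (vcpus % n) nodes get one extra VCPU; same for memory.
        count = vcpus // n + (1 if i < vcpus % n else 0)
        mem = mem_mb // n + (1 if i < mem_mb % n else 0)
        layout.append((node, list(range(next_vcpu, next_vcpu + count)), mem))
        next_vcpu += count
    return layout


# A 4-VCPU, 4096 MB guest spread over nodes 0 and 1.
print(spread_guest(4, 4096, [0, 1]))
```

Telling the guest about this layout is the part that lets a NUMA-aware 
guest OS keep its processes close to their memory.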

2.) Dynamic load balancing and page migration: create guests within one 
NUMA node and distribute all guests across the nodes. If the system 
becomes imbalanced, migrate guests to other nodes and copy (at least 
part of) their memory pages to the other node's local memory.
	***Advantages***:
- No guest NUMA support necessary. Older as well as recent guests should 
run fine.
- Smaller guests don't have to cope with NUMA and will have 'flat' 
memory available.
- Guests running on separate nodes usually don't disturb each other and 
can benefit from the higher distributed memory bandwidth.
	***Disadvantages***:
- Guests are limited to the resources available on one node. This 
applies to both the number of CPUs and the amount of memory.
- Costly migration of guests. In a simple implementation we'd use live 
migration, which requires the whole guest's memory to be copied before 
the guest starts to run on the other node. If this whole move proves to 
be unnecessary a few minutes later, all this was in vain. A more 
advanced implementation would do the page migration in the background 
and could thus avoid this problem if the hot pages are migrated first.
- Integration into Xen seems to be more complicated (at least for the 
less gifted hackers among us).
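
A minimal sketch of the "hot pages first" idea from the migration item 
above. It assumes that some per-page access count is available; how 
hotness would actually be measured is not specified here, so the 
access_counts input and both function names are hypothetical.

```python
def migration_order(access_counts):
    """Return page numbers sorted hottest first, ties by page number."""
    return sorted(access_counts, key=lambda page: (-access_counts[page], page))


def next_background_batch(access_counts, budget):
    """Pick the pages to copy in one background pass, at most `budget`."""
    return migration_order(access_counts)[:budget]


# Hypothetical access counts: page number -> accesses seen recently.
counts = {0: 5, 1: 50, 2: 0, 3: 50}
print(next_background_batch(counts, 2))
```

Copying in small batches like this means a move that turns out to be 
unnecessary costs only the pages copied so far, not the whole guest.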

This approach seems to be more reasonable if you have larger nodes (for 
instance 16 cores) and smaller guests (the more usual case nowadays?)
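
A minimal sketch of the placement side of approach 2, assuming a simple 
"most free memory" policy and a load-difference threshold for detecting 
imbalance. Both policies, the data layout and the names are assumptions 
for illustration only.

```python
def place_guest(guest_mem, guest_vcpus, nodes):
    """Pick the node with the most free memory that holds the whole guest.

    nodes maps node id -> {"free_mem": MB, "free_cpus": count}.
    Returns the node id, or None if no single node can hold the guest.
    """
    fitting = [n for n, res in nodes.items()
               if res["free_mem"] >= guest_mem
               and res["free_cpus"] >= guest_vcpus]
    if not fitting:
        return None
    # Prefer more free memory; break ties by the lowest node id.
    return max(fitting, key=lambda n: (nodes[n]["free_mem"], -n))


def is_imbalanced(load, threshold=2):
    """True if the busiest and idlest node differ by more than threshold.

    load maps node id -> number of runnable VCPUs on that node.
    """
    return max(load.values()) - min(load.values()) > threshold


nodes = {0: {"free_mem": 1024, "free_cpus": 2},
         1: {"free_mem": 4096, "free_cpus": 4}}
print(place_guest(2048, 2, nodes), is_imbalanced({0: 6, 1: 1}))
```

A None result from place_guest corresponds to the first disadvantage 
listed above: a guest larger than any single node cannot be placed.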

After some discussion we came to the conclusion that both approaches 
should be implemented. I want to put this to the list and am looking 
forward to any feedback.

Regards,
Andre.




-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden, 
Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered office Wilmington, 
Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
