[RFC] NUMA support - Andre Przywara

From: "Andre Przywara" <andre.przywara@amd.com>
To: "Duan, Ronghui" <ronghui.duan@intel.com>, xen-devel@lists.xensource.com
Cc: Anthony.Xu@intel.com
Subject: [RFC] NUMA support
Date: Fri, 23 Nov 2007 15:23:08 +0100
Message-ID: <4746E24C.9010403@amd.com>
In-Reply-To: <82C666AA63DC75449C51EAD62E8B2BEC337773@pdsmsx412.ccr.corp.intel.com>

All,
thanks, Ronghui, for your patches and ideas. To take a more structured 
approach to better NUMA support, I suggest we concentrate on 
one-node guests first:
* introduce CPU affinity to the memory allocation routines called from 
Dom0. This is basically my patch 2/4 from August. We should think about 
using a NUMA node number instead of a physical CPU. Is there anything to 
be said against this?

* find _some_ method of load balancing when creating guests. Ronghui's 
method 1 is a start, but a real decision based on each node's 
utilization (or free memory) would be more reasonable.

* patch the guest memory allocation routines to allocate memory from 
that specific node only (based on my patch 3/4)

* use live migration to localhost to allow node migration. Assuming 
that localhost live migration works reliably (is that really true?), it 
shouldn't be too hard to implement (basically just using node 
affinity while allocating guest memory). Since this is a rather 
expensive operation (it temporarily takes twice the memory and quite 
some time), I'd suggest having the admin trigger it explicitly via an xm 
command, maybe as an addition to migrate:
# xm migrate --live --node 1 <domid> localhost
Later on, a Dom0 daemon-based re-balancer could do this more or less 
automatically.
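
To make the load-balancing bullet a bit more concrete, a decision based 
on free memory could look roughly like the sketch below. It is only an 
illustration: struct node_info and pick_node() are made-up names, not 
existing Xen interfaces, and how the per-node free memory would be 
obtained is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node summary; not an existing Xen structure. */
struct node_info {
    uint64_t free_pages;   /* free memory on this node, in pages */
};

/*
 * Return the index of the node with the most free memory that can still
 * hold `needed_pages`, or -1 if no single node is large enough.
 * On a tie the lowest-numbered node wins.
 */
static int pick_node(const struct node_info *nodes, size_t nr_nodes,
                     uint64_t needed_pages)
{
    int best = -1;
    uint64_t best_free = 0;

    assert(nodes != NULL || nr_nodes == 0);

    for (size_t i = 0; i < nr_nodes; i++) {
        if (nodes[i].free_pages < needed_pages)
            continue;              /* guest would not fit on this node */
        if (best < 0 || nodes[i].free_pages > best_free) {
            best = (int)i;
            best_free = nodes[i].free_pages;
        }
    }
    return best;
}
```

A return value of -1 would be the point where a one-node guest cannot 
be placed and some fallback (spanning nodes, or refusing the request) 
has to be chosen.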

I would take care of the memory allocation patch and would look into 
node migration. It would be great if Ronghui or Anthony could help to 
improve the "load balancing" algorithm.

Meanwhile I will continue to patch that d*** Linux kernel to accept both 
CONFIG_NUMA and CONFIG_XEN without crashing that early ;-). This should 
allow both HVM and PV guests to support multiple NUMA nodes within one 
guest.

We should also start a discussion on which config file options to add. 
Shall we use "numanodes=<nr of nodes>", something like "numa=on" (for 
one-node guests only), or something like "numanode=0,1" to explicitly 
specify certain nodes?
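
For the explicit "numanode=0,1" variant, the value would have to be 
turned into a set of nodes somewhere in the tools. The sketch below 
shows one way to parse such a list into a bitmask. parse_node_list() 
and MAX_NODES are hypothetical names, and the 64-node limit is merely a 
consequence of using a single 64-bit mask here, not a proposal.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 64   /* assumption: limited by the width of the mask */

/*
 * Parse a comma-separated node list such as "0,1" into a bitmask.
 * Returns 0 on success, -1 on malformed input or an out-of-range node.
 * *mask is only written on success.
 */
static int parse_node_list(const char *s, uint64_t *mask)
{
    uint64_t m = 0;

    assert(s != NULL && mask != NULL);

    for (;;) {
        char *end;
        unsigned long node;

        if (*s < '0' || *s > '9')
            return -1;             /* each entry must start with a digit */
        errno = 0;
        node = strtoul(s, &end, 10);
        if (errno != 0 || node >= MAX_NODES)
            return -1;
        m |= (uint64_t)1 << node;

        if (*end == '\0')
            break;                 /* end of the list */
        if (*end != ',')
            return -1;             /* unexpected separator */
        s = end + 1;
    }
    *mask = m;
    return 0;
}
```

With this, "0,1" yields the mask 0x3, while inputs such as "0,,1", "1," 
or "64" are rejected.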

Any comments are appreciated.

> I read your patches and Anthony's commands. Write a patch based on
> 
> 1:    If guest set numanodes=n (default it will be 1 means that this
> guest will be restricted in one node); hypervisor will choose
> begin node to pin for this guest use round robin. But the method I use
> need a spin_lock to prevent create domain at same time. Are there any
> more good methods, hope for your suggestion.
That's a good start, thank you. Maybe Keir has some comments on the 
spinlock issue.
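
One possible way around the lock, shown only as a sketch: if the 
round-robin state is a single counter, an atomic fetch-and-add already 
gives every caller a distinct value. The code below uses user-space C11 
atomics for illustration, not the hypervisor's own atomic primitives, 
and next_node_counter / next_start_node() are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global count of guests placed so far (starts at zero). */
static atomic_uint next_node_counter;

/*
 * Return the starting node for a new guest. atomic_fetch_add returns the
 * old counter value and increments it in one indivisible step, so two
 * concurrent domain creations never see the same counter value and no
 * spinlock is needed around the selection.
 */
static unsigned int next_start_node(unsigned int nr_nodes)
{
    assert(nr_nodes > 0);
    return atomic_fetch_add(&next_node_counter, 1u) % nr_nodes;
}
```

This only covers choosing the start node. Anything else that has to 
happen atomically with the choice would still need its own protection.
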
> 2:    pass node parameter use higher bits in flags when create domain.
> At this time, domain can record node information in domain struct
> for further use, i.e. show which node to pin when setup_guest.
> If use this method, in your patch, can simply balance nodes just
> like below;
> 
>> +    for (i=0;i<=dominfo.max_vcpu_id;i++)
>> +    {
>> +        node= ( i * numanodes ) / (dominfo.max_vcpu_id+1)+
>> +               domaininfo.first_node;
>> +        xc_vcpu_setaffinity (xc_handle, dom, i, nodemasks[node]);
>> +    }
How many bits do you want to use? Maybe it's not a good idea to abuse 
some variable to hold only a limited number of nodes ("640K ought to be 
enough for anybody" ;-)). But the general idea is good.
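
To illustrate what the quoted loop computes, here is the same integer 
arithmetic as a stand-alone function. vcpu_to_node() is a made-up name, 
and nr_vcpus stands for max_vcpu_id + 1 from the snippet.

```c
#include <assert.h>

/*
 * Map vcpu `i` of a guest with `nr_vcpus` vcpus onto one of `numanodes`
 * consecutive nodes starting at `first_node`, using the same integer
 * division as the quoted loop. Vcpus are spread in contiguous blocks.
 */
static unsigned int vcpu_to_node(unsigned int i, unsigned int nr_vcpus,
                                 unsigned int numanodes,
                                 unsigned int first_node)
{
    assert(nr_vcpus > 0 && i < nr_vcpus);
    return (i * numanodes) / nr_vcpus + first_node;
}
```

For 4 vcpus on 2 nodes starting at node 0, this gives nodes 0, 0, 1, 1. 
For 3 vcpus on 2 nodes starting at node 1, it gives 1, 1, 2. Note that 
the result is not wrapped around, so as far as the snippet shows, the 
caller would have to make sure that first_node + numanodes does not 
exceed the number of nodes in the host.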

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 
Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered 
office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

Thread overview: 7+ messages
2007-08-13 10:01 [PATCH 0/4] [HVM] NUMA support in HVM guests Andre Przywara
2007-09-07  8:42 ` Xu, Anthony
2007-09-07 12:49   ` Andre Przywara
2007-09-10  1:14     ` Xu, Anthony
2007-11-23  8:42       ` [PATCH 0/4] [HVM][RFC] " Duan, Ronghui
2007-11-23 14:23         ` Andre Przywara [this message]
2007-11-24 15:57           ` [RFC] NUMA support Duan, Ronghui