From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Andre Przywara"
Subject: [RFC] NUMA support
Date: Fri, 23 Nov 2007 15:23:08 +0100
Message-ID: <4746E24C.9010403@amd.com>
References: <82C666AA63DC75449C51EAD62E8B2BEC337773@pdsmsx412.ccr.corp.intel.com>
In-Reply-To: <82C666AA63DC75449C51EAD62E8B2BEC337773@pdsmsx412.ccr.corp.intel.com>
To: "Duan, Ronghui", xen-devel@lists.xensource.com
Cc: Anthony.Xu@intel.com
List-Id: xen-devel@lists.xenproject.org

All,

thanks Ronghui for your patches and ideas. To take a more structured
approach to better NUMA support, I suggest concentrating on
one-node guests first:

* Introduce CPU affinity to the memory allocation routines called from
  Dom0. This is basically my patch 2/4 from August. We should think
  about using a NUMA node number instead of a physical CPU; is there
  anything to be said against this?
* Find _some_ method of load balancing when creating guests. Method 1
  from Ronghui is a start, but a real decision based on each node's
  utilization (or free memory) would be more reasonable.
* Patch the guest memory allocation routines to allocate memory from
  that specific node only (based on my patch 3/4).
* Use live migration to the local host to allow node migration.
  Assuming that localhost live migration works reliably (is that really
  true?), it shouldn't be too hard to implement this (basically just
  using node affinity while allocating guest memory).
  Since this is a rather expensive operation (it temporarily takes
  twice the memory and quite some time), I'd suggest having the admin
  trigger it explicitly via an xm command, maybe as an addition to
  migrate:

  # xm migrate --live --node 1 localhost

  There could be some Dom0 daemon-based re-balancer to do this somewhat
  automatically later on.

I would take care of the memory allocation patch and would look into
node migration. It would be great if Ronghui or Anthony would help to
improve the "load balancing" algorithm.

Meanwhile I will continue to patch that d*** Linux kernel to accept both
CONFIG_NUMA and CONFIG_XEN without crashing that early ;-). This should
allow both HVM and PV guests to support multiple NUMA nodes within one
guest.

Also, we should start a discussion on the config file options to add:
shall we use "numanodes=", something like "numa=on" (for
one-node guests only), or something like "numanode=0,1" to explicitly
specify certain nodes?

Any comments are appreciated.

> I read your patches and Anthony's commands. Write a patch based on
>
> 1: If guest set numanodes=n (default it will be 1 means that this
> guest will be restricted in one node); hypervisor will choose
> begin node to pin for this guest use round robin. But the method I use
> need a spin_lock to prevent create domain at same time. Are there any
> more good methods, hope for your suggestion.

That's a good start, thank you. Maybe Keir has some comments on the
spinlock issue.

> 2: pass node parameter use higher bits in flags when create domain.
> At this time, domain can record node information in domain struct
> for further use, i.e. show which node to pin when setup_guest.
> If use this method, in your patch, can simply balance nodes just
> like below;
>
>> + for (i=0;i<=dominfo.max_vcpu_id;i++)
>> + {
>> +     node= ( i * numanodes ) / (dominfo.max_vcpu_id+1)+
>> +         domaininfo.first_node;
>> +     xc_vcpu_setaffinity (xc_handle, dom, i, nodemasks[node]);
>> + }

How many bits do you want to use? Maybe it's not a good idea to abuse
some variable to hold a limited number of nodes only ("640K ought to be
enough for anybody" ;-) But the general idea is good.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden,
Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered office
Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
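
Illustration of the two placement ideas discussed in this mail. This is a
minimal sketch, not code from any of the patches: the arithmetic in
vcpu_to_node() is taken from the quoted loop, while the function names, the
parameter names and the free-memory array are assumptions made up for the
example. How Xen would actually obtain per-node free memory is not shown.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a vCPU index to a NUMA node so that nr_vcpus
 * vCPUs are spread evenly over `numanodes` consecutive nodes, starting at
 * `first_node`. Same arithmetic as the quoted loop, with
 * nr_vcpus == max_vcpu_id + 1.
 */
unsigned int vcpu_to_node(unsigned int vcpu, unsigned int nr_vcpus,
                          unsigned int numanodes, unsigned int first_node)
{
    assert(nr_vcpus > 0 && vcpu < nr_vcpus);
    return (vcpu * numanodes) / nr_vcpus + first_node;
}

/*
 * Hypothetical helper: choose the start node for a new guest by free
 * memory instead of round robin. free_bytes[n] is assumed to hold the
 * free memory of node n; ties go to the lowest node number.
 */
size_t pick_node_by_free_memory(const unsigned long long *free_bytes,
                                size_t nr_nodes)
{
    size_t best = 0;

    assert(nr_nodes > 0);
    for (size_t n = 1; n < nr_nodes; n++) {
        if (free_bytes[n] > free_bytes[best])
            best = n;
    }
    return best;
}
```

For a guest with 4 vCPUs and numanodes=2 starting at node 0, vCPUs 0 and 1
map to node 0 and vCPUs 2 and 3 map to node 1. Note that the formula does not
wrap around: first_node + numanodes must not exceed the number of nodes in
the host, so a caller would have to check that or apply a modulo.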