From: "Andre Przywara"
Subject: [RFC] Xen NUMA strategy
Date: Fri, 14 Sep 2007 14:05:26 +0200
Message-ID: <46EA7906.2010504@amd.com>
To: xen-devel@lists.xensource.com
Cc: "Xu, Anthony"
List-Id: xen-devel@lists.xenproject.org

Hi,

Anthony Xu and I have had some fruitful discussions about the future
direction of NUMA support in Xen. I want to share the results with
the Xen community and start a discussion.

We came up with two different approaches for better NUMA support in Xen:

1.) Guest NUMA support: spread a guest's resources (CPUs and memory)
over several nodes and propagate the appropriate topology to the guest.
The first part of this is in the patches I recently sent to the list
(PV support will follow, as will bells and whistles like automatic
placement).

***Advantages***:
- The guest OS has better means to deal with the NUMA setup; it can more
easily migrate _processes_ among the nodes (the Xen hypervisor can only
migrate whole domains).
- Changes to Xen are relatively small.
- Guest resources are not limited to one node: a guest can use more
resources than a single node provides.
- If guests are well spread over the nodes, the system stays more
balanced even as guests are destroyed and created later.

***Disadvantages***:
- The guest has to support NUMA. This is not true for older guests
(Win2K, older Linux).
- The guest's workload has to fit NUMA. If the guest's tasks are hardly
parallelizable or use a lot of shared memory, they cannot take advantage
of NUMA and will degrade in performance. This includes all single-task
problems.
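To make the idea of approach 1 concrete, here is a minimal sketch (not
taken from the patches) of how a guest's vCPUs and memory might be split
over several host nodes to form the topology handed to the guest. The
names `VNode` and `spread_guest` are hypothetical, and the even split is
an assumption; a real placement policy would also have to look at free
memory and load on each node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VNode:
    """One virtual NUMA node as the guest would see it (hypothetical)."""
    host_node: int      # physical node backing this virtual node
    vcpus: List[int]    # guest vCPU numbers placed on this node
    mem_mb: int         # guest memory taken from this node, in MB


def spread_guest(num_vcpus: int, mem_mb: int, host_nodes: List[int]) -> List[VNode]:
    """Split vCPUs and memory as evenly as possible over host_nodes.

    Assumption: an even split is good enough for illustration.
    """
    if not host_nodes:
        raise ValueError("need at least one host node")
    k = len(host_nodes)
    cpu_base, cpu_extra = divmod(num_vcpus, k)
    mem_base, mem_extra = divmod(mem_mb, k)

    layout = []
    next_vcpu = 0
    for i, node in enumerate(host_nodes):
        # The first few nodes absorb the remainder, one unit each.
        ncpu = cpu_base + (1 if i < cpu_extra else 0)
        nmem = mem_base + (1 if i < mem_extra else 0)
        layout.append(VNode(node, list(range(next_vcpu, next_vcpu + ncpu)), nmem))
        next_vcpu += ncpu
    return layout


if __name__ == "__main__":
    for vnode in spread_guest(num_vcpus=4, mem_mb=4096, host_nodes=[0, 1]):
        print(vnode)
```

The resulting list is what would have to be propagated to the guest as
its NUMA topology, so that the guest scheduler and allocator can keep
processes and their memory on the same node.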
In general this approach seems to fit better with smaller NUMA nodes and
larger guests.

2.) Dynamic load balancing and page migration: create each guest within
one NUMA node and distribute all guests across the nodes. If the system
becomes imbalanced, migrate guests to other nodes and copy (at least
part of) their memory pages to the other node's local memory.

***Advantages***:
- No guest NUMA support necessary. Older as well as recent guests should
run fine.
- Smaller guests don't have to cope with NUMA and will have 'flat'
memory available.
- Guests running on separate nodes usually don't disturb each other and
can benefit from the higher distributed memory bandwidth.

***Disadvantages***:
- Guests are limited to the resources available on one node. This
applies to both the number of CPUs and the amount of memory.
- Costly migration of guests. In a simple implementation we'd use live
migration, which requires the guest's whole memory to be copied before
the guest starts to run on the other node. If the move proves to be
unnecessary a few minutes later, all of this was in vain. A more
advanced implementation would do the page migration in the background
and could avoid this problem if only the hot pages are migrated first.
- Integration into Xen seems to be more complicated (at least for the
less gifted hackers among us).

This approach seems to be more reasonable if you have larger nodes (for
instance 16 cores) and smaller guests (the more usual case nowadays?).

After some discussion we came to the conclusion that both approaches
should be implemented. I want to put this to the list and am looking
forward to any feedback.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG
Registered office (business address): Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register court Dresden: HRA 4896
General partner authorized to represent: AMD Saxony LLC (registered office Wilmington, Delaware, USA)
Managing directors of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy