[PATCH] tools: avoid over-commitment if numa=on

From: Andre Przywara <andre.przywara@amd.com>
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	xen-devel@lists.xensource.com,
	Papagiannis Anastasios <apapag@ics.forth.gr>,
	Jan Beulich <JBeulich@novell.com>
Subject: [PATCH] tools: avoid over-commitment if numa=on
Date: Mon, 30 Nov 2009 16:40:48 +0100
Message-ID: <4B13E780.1000807@amd.com>
In-Reply-To: <4AF84100020000780001E8CC@vpn.id2.novell.com>

[-- Attachment #1: Type: text/plain, Size: 1271 bytes --]

Jan Beulich wrote:
>>>> Andre Przywara <andre.przywara@amd.com> 09.11.09 16:02 >>>
>> BTW: Shouldn't we set finally numa=on as the default value?
> 
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPU-s than
> a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
> not overcommitting CPUs outweigh the drawbacks of cross-node memory
> accesses at the very least for CPU-bound workloads).
That sounds reasonable.
Attached is a patch that lifts the restriction of one node per guest if
the number of VCPUs is greater than the number of cores per node.
This isn't optimal (the best way would be to inform the guest about it,
but that is another patchset ;-), but it should address the above concerns.

Please apply,
Andre.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

[-- Attachment #2: more_NUMA_nodes.patch --]
[-- Type: text/x-patch, Size: 2389 bytes --]

# HG changeset patch
# User Andre Przywara <andre.przywara@amd.com>
# Date 1259594006 -3600
# Node ID bdf4109edffbcc0cbac605a19d2fd7a7459f1117
# Parent  abc6183f486e66b5721dbf0313ee0d3460613a99
allocate enough NUMA nodes for all VCPUs

If numa=on, we constrain a guest to one node to keep its memory
accesses local. This will hurt performance if the number of VCPUs
is greater than the number of cores per node. We detect this case
now and allocate further NUMA nodes to allow all VCPUs to run
simultaneously.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>

diff -r abc6183f486e -r bdf4109edffb tools/python/xen/xend/XendDomainInfo.py
--- a/tools/python/xen/xend/XendDomainInfo.py	Mon Nov 30 10:58:23 2009 +0000
+++ b/tools/python/xen/xend/XendDomainInfo.py	Mon Nov 30 16:13:26 2009 +0100
@@ -2637,8 +2637,7 @@
                         nodeload[i] = int(nodeload[i] * 16 / len(info['node_to_cpu'][i]))
                     else:
                         nodeload[i] = sys.maxint
-                index = nodeload.index( min(nodeload) )    
-                return index
+                return map(lambda x: x[0], sorted(enumerate(nodeload), key=lambda x:x[1]))
 
             info = xc.physinfo()
             if info['nr_nodes'] > 1:
@@ -2648,8 +2647,15 @@
                 for i in range(0, info['nr_nodes']):
                     if node_memory_list[i] >= needmem and len(info['node_to_cpu'][i]) > 0:
                         candidate_node_list.append(i)
-                index = find_relaxed_node(candidate_node_list)
-                cpumask = info['node_to_cpu'][index]
+                best_node = find_relaxed_node(candidate_node_list)[0]
+                cpumask = info['node_to_cpu'][best_node]
+                cores_per_node = info['nr_cpus'] / info['nr_nodes']
+                nodes_required = (self.info['VCPUs_max'] + cores_per_node - 1) / cores_per_node
+                if nodes_required > 1:
+                    log.debug("allocating %d NUMA nodes", nodes_required)
+                    best_nodes = find_relaxed_node(filter(lambda x: x != best_node, range(0,info['nr_nodes'])))
+                    for i in best_nodes[:nodes_required - 1]:
+                        cpumask = cpumask + info['node_to_cpu'][i]
                 for v in range(0, self.info['VCPUs_max']):
                     xc.vcpu_setaffinity(self.domid, v, cpumask)
         return index
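
For readers who want to try the node-selection arithmetic outside of xend, here is a minimal standalone sketch of the idea in the patch above. It is not the xend code: the function name pick_nodes and its arguments are hypothetical, it is written for Python 3 (the patch targets Python 2), it derives the core count from node_to_cpu instead of info['nr_cpus'], and it skips the free-memory filtering that the real code applies to the first node.

```python
def pick_nodes(nodeload, node_to_cpu, vcpus_max):
    """Pick enough NUMA nodes to give every VCPU its own core.

    nodeload     -- per-node load figures, lower is better (assumed input)
    node_to_cpu  -- list of CPU-id lists, one per node (as in xc.physinfo())
    vcpus_max    -- number of VCPUs the guest may use
    Returns (chosen_nodes, cpumask).
    """
    nr_nodes = len(node_to_cpu)
    nr_cpus = sum(len(cpus) for cpus in node_to_cpu)
    # Assumption: nodes are symmetric, so an average is good enough.
    cores_per_node = max(1, nr_cpus // nr_nodes)
    # Ceiling division: how many nodes are needed to hold all VCPUs.
    nodes_required = (vcpus_max + cores_per_node - 1) // cores_per_node
    # Rank nodes by load; sorted() is stable, so ties keep index order.
    ranked = sorted(range(nr_nodes), key=lambda n: nodeload[n])
    chosen = ranked[:max(1, nodes_required)]
    # The affinity mask is the union of the CPUs of all chosen nodes.
    cpumask = [cpu for n in chosen for cpu in node_to_cpu[n]]
    return chosen, cpumask


if __name__ == "__main__":
    topology = [[0, 1], [2, 3], [4, 5], [6, 7]]
    print(pick_nodes([5, 1, 3, 2], topology, 3))
```

With two cores per node, a guest with up to two VCPUs stays on the least-loaded node, while a three-VCPU guest is additionally given the next least-loaded node.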
