xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Dario Faggioli <raistlin@linux.it>
To: xen-devel@lists.xen.org
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: [PATCH 10 of 10 v2] Some automatic NUMA placement documentation
Date: Fri, 15 Jun 2012 19:04:38 +0200	[thread overview]
Message-ID: <0e566a40afad1e3dc299.1339779878@Solace> (raw)
In-Reply-To: <patchbomb.1339779868@Solace>

About rationale, usage and (some small bits of) API.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Changes from v1:
 * API documentation moved close to the actual functions.

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,74 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+The Xen hypervisor deals with Non-Uniform Memory Access (NUMA])
+machines by assigning to its domain a "node affinity", i.e., a set of NUMA
+nodes of the host from which it gets its memory allocated.
+
+NUMA awareness becomes very important as soon as many domains start running
+memory-intensive workloads on a shared host. In fact, the cost of accessing
+non node-local memory locations is very high, and the performance degradation
+is likely to be noticeable.
+
+## Guest Placement in xl ##
+
+If using xl for creating and managing guests, it is very easy to ask
+for both manual or automatic placement of them across the host's NUMA
+nodes.
+
+Note that xm/xend does the very same thing, the only differences residing
+in the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify where a domain
+should be created and scheduled on, directly in its config file. This
+affects NUMA placement and memory accesses as the hypervisor constructs
+the node affinity of a VM basing right on its CPU affinity when it is
+created.
+
+This is very simple and effective, but requires the user/system
+administrator to explicitly specify affinities for each and every domain,
+or Xen won't be able to enable guarantee the locality for their memory
+accesses.
+
+### Automatic Guest Placement with xl ###
+
+In case no "cpus=" option is specified in the config file, xl tries
+to figure out on its own on which node(s) the domain could fit best.
+
+First of all, it needs to find a node (or a set of nodes) that have
+enough free memory for accommodating the domain. After that, the actual
+decision on where to put the new guest happens by generating all the
+possible combinations of nodes that satisfies the above and chose among
+them according to the following heuristics:
+
+  *  candidates involving fewer nodes come first. In case two (or more)
+     candidates span the same number of nodes,
+  *  candidates with greater amount of free memory come first. In case
+     two (or more) candidates differ in their amount of free memory by
+     less than 10%,
+  *  candidates with fewer domains already placed on them come first.
+
+Giving preference to small candidates ensures better performance for
+the guest, as it avoid spreading its memory among different nodes.
+Using the nodes that have the biggest amounts of free memory helps
+keeping the memory fragmentation small, from a system wide perspective.
+Finally, in case more candidates fulfil these criteria by the same
+extent, choosing the candidate that is hosting fewer domain helps
+balancing the load on the various nodes.
+
+The last step is figuring out whether the selected candidate contains
+at least as much CPUs as the number of VCPUs of the VM. The current
+solution for the case when this is not verified is just to add some
+more nodes, until the condition turns into being true. When doing
+this, the nodes with the least possible distance from the ones
+already in the nodemap are considered.
+
+## Guest Placement within libxl ##
+
+xl achieves automatic NUMA placement by means of the following API
+calls, provided by libxl.
+
+

  parent reply	other threads:[~2012-06-15 17:04 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-15 17:04 [PATCH 00 of 10 v2] Automatic NUMA placement for xl Dario Faggioli
2012-06-15 17:04 ` [PATCH 01 of 10 v2] libxl: fix a typo in the GCREALLOC_ARRAY macro Dario Faggioli
2012-06-21  8:53   ` Ian Campbell
2012-06-26 16:00     ` Ian Campbell
2012-06-26 16:26       ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 02 of 10 v2] libxl: add a new Array type to the IDL Dario Faggioli
2012-06-15 17:04 ` [PATCH 03 of 10 v2] libxl, libxc: introduce libxl_get_numainfo() Dario Faggioli
2012-06-21  9:02   ` Ian Campbell
2012-06-21 10:00     ` Dario Faggioli
2012-06-21 10:21       ` Ian Campbell
2012-06-15 17:04 ` [PATCH 04 of 10 v2] xl: add more NUMA information to `xl info -n' Dario Faggioli
2012-06-21  9:04   ` Ian Campbell
2012-06-15 17:04 ` [PATCH 05 of 10 v2] libxl: rename libxl_cpumap to libxl_bitmap Dario Faggioli
2012-06-21  9:12   ` Ian Campbell
2012-06-21  9:49     ` Dario Faggioli
2012-06-21 10:22       ` Ian Campbell
2012-06-15 17:04 ` [PATCH 06 of 10 v2] libxl: expand the libxl_bitmap API a bit Dario Faggioli
2012-06-21  9:30   ` Ian Campbell
2012-06-21  9:46     ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 07 of 10 v2] libxl: introduce some node map helpers Dario Faggioli
2012-06-21  9:35   ` Ian Campbell
2012-06-21  9:44     ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 08 of 10 v2] libxl: enable automatic placement of guests on NUMA nodes Dario Faggioli
2012-06-21 11:40   ` Ian Campbell
2012-06-21 16:34     ` Dario Faggioli
2012-06-22 10:14       ` Ian Campbell
2012-06-26 16:25         ` Dario Faggioli
2012-06-26 16:26           ` Ian Campbell
2012-06-26 17:23             ` Ian Jackson
2012-06-21 16:16   ` George Dunlap
2012-06-21 16:43     ` Dario Faggioli
2012-06-22 10:05       ` George Dunlap
2012-06-26 11:03         ` Ian Jackson
2012-06-26 15:20           ` Dario Faggioli
2012-06-27  8:15           ` Dario Faggioli
2012-06-28  7:25   ` Zhang, Yang Z
2012-06-28  8:36     ` George Dunlap
2012-06-29  5:38       ` Zhang, Yang Z
2012-06-29  9:46         ` Dario Faggioli
2012-06-28 10:12     ` Dario Faggioli
2012-06-28 12:41       ` Pasi Kärkkäinen
2012-06-28 17:03         ` Dario Faggioli
2012-06-29  5:29           ` Zhang, Yang Z
2012-06-29  9:38             ` Dario Faggioli
2012-06-15 17:04 ` [PATCH 09 of 10 v2] libxl: have NUMA placement deal with cpupools Dario Faggioli
2012-06-21 13:31   ` Ian Campbell
2012-06-21 13:54     ` Dario Faggioli
2012-06-21 13:58       ` Ian Campbell
2012-06-15 17:04 ` Dario Faggioli [this message]
2012-06-18 15:54   ` [PATCH 10 of 10 v2] Some automatic NUMA placement documentation Dario Faggioli
2012-06-21 13:38   ` Ian Campbell
2012-06-21 13:57     ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0e566a40afad1e3dc299.1339779878@Solace \
    --to=raistlin@linux.it \
    --cc=Ian.Campbell@citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=Stefano.Stabellini@eu.citrix.com \
    --cc=andre.przywara@amd.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=juergen.gross@ts.fujitsu.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).