From: Dario Faggioli <raistlin@linux.it>
To: xen-devel@lists.xen.org
Cc: Andre Przywara <andre.przywara@amd.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
George Dunlap <george.dunlap@eu.citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: [PATCH 11 of 11] Some automatic NUMA placement documentation
Date: Thu, 31 May 2012 14:11:16 +0200 [thread overview]
Message-ID: <e9b2e81e4afbde8c3ceb.1338466276@Solace> (raw)
In-Reply-To: <patchbomb.1338466265@Solace>
About rationale, usage and API.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,113 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+The Xen hypervisor deals with Non-Uniform Memory Access (NUMA)
+machines by assigning to each of its domains a "node affinity", i.e., the
+set of NUMA nodes of the host from which memory for that domain is allocated.
+
+NUMA awareness becomes very important as soon as many domains start running
+memory-intensive workloads on a shared host. In fact, the cost of accessing
+non node-local memory locations is very high, and the performance degradation
+is likely to be noticeable.
+
+## Guest Placement in xl ##
+
+When using xl to create and manage guests, it is very easy to ask
+for either manual or automatic placement of them across the host's NUMA
+nodes.
+
+Note that xm/xend does the very same thing, the only differences being
+the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify, directly in the
+domain's config file, where the domain should be created and scheduled.
+This affects NUMA placement and memory accesses, as the hypervisor
+constructs the node affinity of a VM based on its CPU affinity at
+creation time.
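For instance, a minimal config fragment pinning a guest to the first four
physical CPUs (which, on many hosts, are the CPUs of node 0) could look
like this; the names and values are purely illustrative:

```
# illustrative xl domain config fragment
name   = "numa-guest"
memory = 1024
vcpus  = 2
# pin the VCPUs (and hence, at creation time, the node affinity)
# to physical CPUs 0-3
cpus   = "0-3"
```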
+
+This is very simple and effective, but requires the user/system
+administrator to explicitly specify affinities for each and every domain,
+or Xen won't be able to guarantee locality for their memory
+accesses.
+
+### Automatic Guest Placement with xl ###
+
+If no "cpus=" option is specified in the config file, xl tries
+to figure out on its own on which node(s) the domain could fit best.
+
+First of all, it needs to find a node (or a set of nodes) with
+enough free memory to accommodate the domain. After that, the actual
+decision on where to put the new guest is made by generating all the
+possible combinations of nodes that satisfy the above and choosing among
+them according to the following heuristics:
+
+ * candidates involving fewer nodes come first. In case two (or more)
+ candidates span the same number of nodes,
+ * candidates with greater amount of free memory come first. In case
+ two (or more) candidates differ in their amount of free memory by
+ less than 10%,
+ * candidates with fewer domains already placed on them come first.
+
+Giving preference to small candidates ensures better performance for
+the guest, as it avoids spreading its memory among different nodes.
+Using the nodes with the largest amounts of free memory helps
+keep memory fragmentation small, from a system-wide perspective.
+Finally, if more candidates fulfil these criteria to the same
+extent, choosing the candidate that is hosting fewer domains helps
+balance the load on the various nodes.
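As an illustration, the ordering above could be implemented by a comparison
function along these lines. This is a simplified sketch, not the actual
libxl code, and the `struct candidate` layout is an assumption made for the
example:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, simplified candidate representation (not the libxl one). */
struct candidate {
    int nr_nodes;
    int nr_domains;
    uint64_t free_memkb;
};

/* Returns negative if a should be preferred over b, positive otherwise:
 * fewer nodes first; then more free memory, unless the amounts differ
 * by less than 10%; then fewer already-placed domains. */
static int candidate_cmp(const struct candidate *a, const struct candidate *b)
{
    if (a->nr_nodes != b->nr_nodes)
        return a->nr_nodes - b->nr_nodes;   /* fewer nodes first */

    uint64_t big = a->free_memkb > b->free_memkb ?
        a->free_memkb : b->free_memkb;
    uint64_t diff = a->free_memkb > b->free_memkb ?
        a->free_memkb - b->free_memkb : b->free_memkb - a->free_memkb;
    if (diff * 10 > big)                    /* differ by more than 10% */
        return a->free_memkb > b->free_memkb ? -1 : 1;

    return a->nr_domains - b->nr_domains;   /* fewer domains first */
}
```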
+
+The last step is figuring out whether the selected candidate contains
+at least as many CPUs as the number of VCPUs of the VM. If it does
+not, more nodes are added to the candidate until the condition
+holds. When doing this, the nodes with the smallest possible distance
+from the ones already in the nodemap are considered first.
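The node-extension step can be sketched as picking, at each iteration, the
out-of-map node closest to any node already selected. The bitmask nodemap
and the SLIT-style distance table below are assumptions made for this
example, not the actual libxl data structures:

```c
#include <assert.h>

#define MAX_NODES 8

/* Return the node not yet in `inmap` (a bitmask of selected nodes)
 * with the smallest distance to any node already in it, or -1 if
 * every node is already selected. `dist` is an (assumed) host
 * node-distance table, SLIT style. */
static int next_closest_node(const int dist[MAX_NODES][MAX_NODES],
                             unsigned inmap, int nr_nodes)
{
    int best = -1, best_dist = -1;

    for (int n = 0; n < nr_nodes; n++) {
        if (inmap & (1u << n))
            continue;                       /* already selected */
        for (int m = 0; m < nr_nodes; m++) {
            if (!(inmap & (1u << m)))
                continue;                   /* not a reference node */
            if (best < 0 || dist[n][m] < best_dist) {
                best = n;
                best_dist = dist[n][m];
            }
        }
    }
    return best;
}
```

Repeatedly adding `next_closest_node()` to the map, until the candidate has
enough CPUs, gives the behaviour described above.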
+
+## Guest Placement in libxl ##
+
+xl achieves automatic NUMA placement by means of the following API
+calls, provided by libxl.
+
+ libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx,
+ libxl_domain_build_info *b_info,
+ int min_nodes, int *nr_cndts);
+
+This is what should be used to generate the full set of placement
+candidates. The function returns an array containing nr_cndts
+libxl_numa_candidate (see below). Each candidate is basically a set of nodes
+that has been checked against the memory requirements derived from the
+provided libxl_domain_build_info.
+
+ int libxl_numa_candidate_add_cpus(libxl_ctx *ctx,
+ int min_cpus, int max_nodes,
+ libxl_numa_candidate *candidate);
+
+This is what should be used to ensure a placement candidate has at least
+min_cpus CPUs. If it does not, the function also takes care of
+adding more nodes to the candidate itself (until the value specified
+in max_nodes is reached). When adding new nodes, the one with the
+smallest "distance" from the current node map is selected at each step.
+
+    int libxl_numa_candidate_count_domains(libxl_ctx *ctx,
+                                           libxl_numa_candidate *candidate);
+
+This counts the number of domains that are currently pinned
+to the CPUs of the nodes of a given candidate.
+
+Finally, a placement candidate is represented by the following data
+structure:
+
+ typedef struct libxl_numa_candidate {
+ int nr_nodes;
+ int nr_domains;
+ uint32_t free_memkb;
+ libxl_nodemap nodemap;
+ } libxl_numa_candidate;
+
+It tells which nodes the candidate spans (in the nodemap),
+how many of them there are (nr_nodes), how much free memory they can
+provide all together (free_memkb) and how many domains are running pinned
+to their CPUs (nr_domains).
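Putting it all together, the flow xl follows can be sketched in C-like
pseudocode (error handling and the actual sorting code are omitted; this is
illustrative, not actual xl code):

```
int nr_cndts = 0;
libxl_numa_candidate *cndts =
    libxl_domain_numa_candidates(ctx, b_info, 1, &nr_cndts);

for (int i = 0; i < nr_cndts; i++) {
    /* make sure each candidate has enough CPUs for the VM's VCPUs */
    libxl_numa_candidate_add_cpus(ctx, nr_vcpus, max_nodes, &cndts[i]);
    /* count the domains already placed there, for the heuristics */
    libxl_numa_candidate_count_domains(ctx, &cndts[i]);
}

/* sort cndts[] according to the heuristics above and use cndts[0] */
```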