From: Dario Faggioli <raistlin@linux.it>
To: xen-devel@lists.xen.org
Cc: Andre Przywara <andre.przywara@amd.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
George Dunlap <george.dunlap@eu.citrix.com>,
Juergen Gross <juergen.gross@ts.fujitsu.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: [PATCH 11 of 11] Some automatic NUMA placement documentation
Date: Thu, 31 May 2012 14:11:16 +0200
Message-ID: <e9b2e81e4afbde8c3ceb.1338466276@Solace>
In-Reply-To: <patchbomb.1338466265@Solace>
About rationale, usage and API.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,113 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+The Xen hypervisor deals with Non-Uniform Memory Access (NUMA)
+machines by assigning to each domain a "node affinity", i.e., a set of NUMA
+nodes of the host from which it gets its memory allocated.
+
+NUMA awareness becomes very important as soon as many domains start running
+memory-intensive workloads on a shared host. In fact, the cost of accessing
+non node-local memory locations is very high, and the performance degradation
+is likely to be noticeable.
+
+## Guest Placement in xl ##
+
+If using xl for creating and managing guests, it is very easy to ask
+for either manual or automatic placement of them across the host's NUMA
+nodes.
+
+Note that xm/xend does the very same thing, the only differences residing
+in the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify where a domain
+should be created and scheduled, directly in its config file. This
+affects NUMA placement and memory accesses, as the hypervisor constructs
+the node affinity of a VM based directly on its CPU affinity when it is
+created.
+
+This is very simple and effective, but requires the user/system
+administrator to explicitly specify affinities for each and every domain,
+or Xen won't be able to guarantee the locality of their memory
+accesses.
+
+### Automatic Guest Placement with xl ###
+
+In case no "cpus=" option is specified in the config file, xl tries
+to figure out on its own on which node(s) the domain could fit best.
+
+First of all, it needs to find a node (or a set of nodes) that has
+enough free memory for accommodating the domain. After that, the actual
+decision on where to put the new guest happens by generating all the
+possible combinations of nodes that satisfy the above and choosing among
+them according to the following heuristics:
+
+ * candidates involving fewer nodes come first. In case two (or more)
+ candidates span the same number of nodes,
+ * candidates with a greater amount of free memory come first. In case
+ two (or more) candidates differ in their amount of free memory by
+ less than 10%,
+ * candidates with fewer domains already placed on them come first.
+
+Giving preference to small candidates ensures better performance for
+the guest, as it avoids spreading its memory among different nodes.
+Using the nodes that have the biggest amounts of free memory helps
+keep memory fragmentation small, from a system-wide perspective.
+Finally, in case several candidates fulfil these criteria to the same
+extent, choosing the candidate that is hosting fewer domains helps
+balance the load on the various nodes.
+
+The last step is figuring out whether the selected candidate contains
+at least as many CPUs as the number of VCPUs of the VM. The current
+solution for the case when this is not true is just to add some
+more nodes, until the condition becomes true. When doing
+this, the nodes with the least possible distance from the ones
+already in the nodemap are considered.
+
+## Guest Placement in libxl ##
+
+xl achieves automatic NUMA placement by means of the following API
+calls, provided by libxl.
+
+ libxl_numa_candidate *libxl_domain_numa_candidates(libxl_ctx *ctx,
+ libxl_domain_build_info *b_info,
+ int min_nodes, int *nr_cndts);
+
+This is what should be used to generate the full set of placement
+candidates. In fact, the function returns an array containing nr_cndts
+libxl_numa_candidate elements (see below). Each candidate is basically a set
+of nodes that has been checked against the memory requirement derived from
+the provided libxl_domain_build_info.
+
+ int libxl_numa_candidate_add_cpus(libxl_ctx *ctx,
+ int min_cpus, int max_nodes,
+ libxl_numa_candidate *candidate);
+
+This is what should be used to ensure a placement candidate has at least
+min_cpus CPUs. In case it does not, the function also takes care of
+adding more nodes to the candidate itself (until the value specified
+in max_nodes is reached). When adding new nodes, the one that has the
+smallest "distance" from the current node map is selected at each step.
+
+ libxl_numa_candidate_count_domains(libxl_ctx *ctx,
+ libxl_numa_candidate *candidate);
+
+This is what counts the number of domains that are currently pinned
+to the CPUs of the nodes of a given candidate.
+
+Finally, a placement candidate is represented by the following data
+structure:
+
+ typedef struct libxl_numa_candidate {
+ int nr_nodes;
+ int nr_domains;
+ uint32_t free_memkb;
+ libxl_nodemap nodemap;
+ } libxl_numa_candidate;
+
+It basically tells which nodes the candidate spans (in the nodemap),
+how many of them there are (nr_nodes), how much free memory they can
+provide all together (free_memkb) and how many domains are running pinned
+to their CPUs (nr_domains).
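
For illustration only, and not part of the patch above: a minimal C sketch of
the three-step candidate ordering described under "Automatic Guest Placement
with xl". `struct cand`, `cand_cmp()` and `pick_best()` are hypothetical names,
not libxl API; the struct only mirrors the counters of libxl_numa_candidate and
leaves out the nodemap. The sketch also assumes that "differ by less than 10%"
is measured against the larger of the two free memory amounts, which the text
does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for libxl_numa_candidate (nodemap omitted). */
struct cand {
    int nr_nodes;
    int nr_domains;
    uint32_t free_memkb;
};

/* Returns <0 if a is preferred, >0 if b is preferred, 0 if equivalent. */
static int cand_cmp(const struct cand *a, const struct cand *b)
{
    uint64_t hi, lo;

    /* 1. Candidates spanning fewer nodes come first. */
    if (a->nr_nodes != b->nr_nodes)
        return a->nr_nodes < b->nr_nodes ? -1 : 1;

    /*
     * 2. More free memory comes first, unless the two amounts are
     *    within 10% of each other (assumption: 10% of the larger one).
     */
    hi = a->free_memkb > b->free_memkb ? a->free_memkb : b->free_memkb;
    lo = a->free_memkb > b->free_memkb ? b->free_memkb : a->free_memkb;
    if (hi != lo && (hi - lo) * 10 >= hi)
        return a->free_memkb > b->free_memkb ? -1 : 1;

    /* 3. Otherwise, fewer domains already placed comes first. */
    if (a->nr_domains != b->nr_domains)
        return a->nr_domains < b->nr_domains ? -1 : 1;

    return 0;
}

/* Linear scan returning the index of the preferred candidate. */
static int pick_best(const struct cand *c, int n)
{
    int best = 0;

    assert(n > 0);
    for (int i = 1; i < n; i++)
        if (cand_cmp(&c[i], &c[best]) < 0)
            best = i;
    return best;
}
```

Because of the 10% tolerance, this pairwise comparison is not transitive, so
the sketch scans the candidates linearly rather than sorting them with
qsort(). The code in the series may well handle this differently.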