[PATCH 3 of 3 v5/leftover] Some automatic NUMA placement documentation

From: Dario Faggioli <raistlin@linux.it>
To: xen-devel <xen-devel@lists.xen.org>, Dario Faggioli <raistlin@linux.it>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Subject: [PATCH 3 of 3 v5/leftover] Some automatic NUMA placement documentation
Date: Mon, 16 Jul 2012 19:13:14 +0200
Message-ID: <e030cd2086a3332966c5.1342458794@Solace>
In-Reply-To: <patchbomb.1342458791@Solace>

About rationale, usage and (some small bits of) API.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

---
Changes from v3:
 * typos and rewording of some sentences, as suggested during review.

Changes from v1:
 * API documentation moved close to the actual functions.

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,89 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+NUMA means that the memory access times of a program running on a CPU depend
+on the relative distance between that CPU and that memory. In fact, most NUMA
+systems are built in such a way that each processor has its own local memory,
+on which it can operate very fast. On the other hand, reading and writing data
+from and to remote memory (that is, memory local to some other processor) is
+considerably more complex and slower. On these machines, a NUMA node is usually
+defined as a set of processor cores (typically a physical CPU package) and the
+memory directly attached to that set of cores.
+
+The Xen hypervisor deals with Non-Uniform Memory Access (NUMA) machines by
+assigning to each domain a "node affinity", i.e., a set of NUMA nodes of the
+host from which the domain gets its memory allocated.
+
+NUMA awareness becomes very important as soon as many domains start running
+memory-intensive workloads on a shared host. In fact, the cost of accessing
+memory that is not node-local is very high, and the performance degradation is
+likely to be noticeable.
+
+## Guest Placement in xl ##
+
+If using xl for creating and managing guests, it is very easy to ask for
+either manual or automatic placement of them across the host's NUMA nodes.
+
+Note that xm/xend does the very same thing, the only differences being in
+the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify where a domain should
+be created and scheduled, directly in its config file. This affects NUMA
+placement and memory accesses, as the hypervisor constructs the node affinity
+of a VM based directly on its CPU affinity when it is created.
+
+This is very simple and effective, but requires the user/system administrator
+to explicitly specify affinities for each and every domain, or Xen won't be
+able to guarantee the locality for their memory accesses.
+
+It is also possible to deal with NUMA by partitioning the system using cpupools
+(available in the upcoming release of Xen, 4.2). Again, this could be "The
+Right Answer" for many needs and occasions, but it has to be carefully
+considered and set up by hand.
+
+### Automatic Guest Placement with xl ###
+
+If no "cpus=" option is specified in the config file, libxl tries to figure out
+on its own on which node(s) the domain could fit best.  It is worth noting
+that optimally fitting a set of VMs on the NUMA nodes of a host is an
+instance of the Bin Packing Problem. In fact, the various VMs with different
+memory sizes are the items to be packed, and the host nodes are the bins. That
+problem is known to be NP-hard, so libxl relies on some heuristics instead.
+
+The first thing to do is to find a node, or even a set of nodes, that has
+enough free memory and enough physical CPUs to accommodate the new domain.
+The idea is to find a spot for the domain with at least as much free memory
+as it has configured, and at least as many pCPUs as it has vCPUs.  After that,
+the actual decision on which solution to go for is made according to the
+following heuristics:
+
+  *  candidates involving fewer nodes come first;
+  *  in case two (or more) candidates span the same number of nodes, the
+     amount of free memory and the number of domains assigned to the
+     candidates are considered. In doing that, candidates with more free
+     memory and fewer assigned domains are preferred, with free memory
+     "weighing" three times as much as the number of domains.
+
+Giving preference to candidates with fewer nodes ensures better performance for
+the guest, as it avoids spreading its memory among different nodes.  Favouring
+the nodes that have the largest amounts of free memory helps keep memory
+fragmentation small, from a system-wide perspective.  However, if several
+candidates fulfil these criteria to roughly the same extent, taking into
+account the number of domains the candidates are "hosting" helps balance the
+load on the various nodes.
+
+## Guest Placement within libxl ##
+
+xl achieves automatic NUMA placement just because libxl does it internally.
+No API is provided (yet) for interacting with this feature and modifying
+the library behaviour regarding automatic placement; it just happens
+by default if no affinity is specified (as is the case with xm/xend).
+
+For those who want to look at, and maybe tweak, the mechanism and the
+algorithms it uses, everything is implemented as a set of libxl internal
+interfaces and facilities. See "Automatic NUMA placement" in libxl\_internal.h.
+
+Note this may change in future versions of Xen/libxl.
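
As an illustration only (this is not part of the patch and not libxl code), the
candidate ordering described under "Automatic Guest Placement with xl" could be
sketched in Python roughly as follows. The `Candidate` type, the function names
and, in particular, the normalised difference used to put free memory and
domain counts on a comparable scale are all assumptions made for this sketch;
the document itself only states that fewer nodes come first and that free
memory weighs three times as much as the number of domains.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class Candidate:
    """Hypothetical placement candidate: a set of host NUMA nodes."""
    nodes: tuple        # node IDs spanned by this candidate
    free_mem_kb: int    # free memory summed over those nodes
    nr_cpus: int        # physical CPUs available on those nodes
    nr_domains: int     # domains already assigned to those nodes


def norm_diff(a, b):
    """Relative difference in [-1, 1]. This normalisation is an assumption
    of the sketch, used to make memory and domain counts comparable."""
    if a == b:
        return 0.0
    return (a - b) / max(a, b)


def compare(c1, c2):
    """Negative if c1 is the better candidate, positive if c2 is."""
    # Candidates involving fewer nodes always come first.
    if len(c1.nodes) != len(c2.nodes):
        return len(c1.nodes) - len(c2.nodes)
    # Same span: more free memory and fewer domains are preferred,
    # with free memory weighing three times as much.
    score = (3 * norm_diff(c2.free_mem_kb, c1.free_mem_kb)
             + norm_diff(c1.nr_domains, c2.nr_domains))
    return (score > 0) - (score < 0)


def rank(candidates, mem_kb, vcpus):
    """Drop candidates that cannot host the domain, then sort best-first."""
    fitting = [c for c in candidates
               if c.free_mem_kb >= mem_kb and c.nr_cpus >= vcpus]
    return sorted(fitting, key=cmp_to_key(compare))
```

With this sketch, `rank(candidates, mem_kb, vcpus)[0]` would be the preferred
placement, and an empty result means that no node or set of nodes can
accommodate the domain.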
