From: Dario Faggioli <raistlin@linux.it>
To: xen-devel <xen-devel@lists.xen.org>, Dario Faggioli <raistlin@linux.it>
Cc: Andre Przywara <andre.przywara@amd.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Juergen Gross <juergen.gross@ts.fujitsu.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Roger Pau Monne <roger.pau@citrix.com>
Subject: [PATCH 4 of 4 v7/leftover] Some automatic NUMA placement documentation
Date: Tue, 24 Jul 2012 16:42:11 +0200	[thread overview]
Message-ID: <b74f5e8ceff2512b4a96.1343140931@Solace> (raw)
In-Reply-To: <patchbomb.1343140927@Solace>

About rationale, usage and (some small bits of) API.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

---
Changes from v6:
 * text updated to reflect the modified behaviour.
 * Lines wrapped to a smaller column number.

Changes from v5:
 * text updated to reflect the modified behaviour.

Changes from v3:
 * typos and rewording of some sentences, as suggested during review.

Changes from v1:
 * API documentation moved close to the actual functions.

diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
new file mode 100644
--- /dev/null
+++ b/docs/misc/xl-numa-placement.markdown
@@ -0,0 +1,97 @@
+# Guest Automatic NUMA Placement in libxl and xl #
+
+## Rationale ##
+
+NUMA (which stands for Non-Uniform Memory Access) means that the memory
+access times of a program running on a CPU depend on the relative
+distance between that CPU and the memory it accesses. In fact, most
+NUMA systems are built in such a way that each processor has its own
+local memory, on which it can operate very fast. On the other hand,
+getting and storing data from and on remote memory (that is, memory
+local to some other processor) is quite a bit slower. On these
+machines, a NUMA node is usually defined as a set of processor cores
+(typically a physical CPU package) and the memory directly attached to
+that set of cores.
+
+The Xen hypervisor deals with NUMA machines by assigning to each domain
+a "node affinity", i.e., a set of NUMA nodes of the host from which its
+memory is allocated.
+
+NUMA awareness becomes very important as soon as many domains start
+running memory-intensive workloads on a shared host. In fact, the cost
+of accessing non node-local memory locations is very high, and the
+performance degradation is likely to be noticeable.
+
+## Guest Placement in xl ##
+
+If using xl for creating and managing guests, it is very easy to ask
+for either manual or automatic placement of guests across the host's
+NUMA nodes.
+
+Note that xm/xend does the very same thing; the only differences lie
+in the details of the heuristics adopted for the placement (see below).
+
+### Manual Guest Placement with xl ###
+
+Thanks to the "cpus=" option, it is possible to specify, directly in a
+domain's config file, the pCPUs on which the domain should be created
+and scheduled. This affects NUMA placement and memory accesses, as the
+hypervisor constructs the node affinity of a VM based on its CPU
+affinity when it is created.
+
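+For example, a minimal config file snippet pinning a domain's vCPUs to
+pCPUs 0-3 (assuming, purely for illustration, that those are the pCPUs
+of one of the host's NUMA nodes) could look like this:
+
+    memory = 1024
+    vcpus = 2
+    cpus = "0-3"
+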
+This is very simple and effective, but requires the user/system
+administrator to explicitly specify affinities for each and every
+domain, or Xen won't be able to guarantee locality for their memory
+accesses.
+
+It is also possible to deal with NUMA by partitioning the system using
+cpupools (available in the upcoming release of Xen, 4.2). Again, this
+could be "The Right Answer" for many needs and occasions, but has to
+be carefully considered and set up by hand.
+
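+As a rough illustration (the exact commands and pool names depend on
+the host and on how the pools are created), one could split the host
+into one cpupool per NUMA node:
+
+    xl cpupool-numa-split
+
+and then assign a domain to one of the resulting pools via its config
+file (the pool name below is just an example):
+
+    pool = "Pool-node0"
+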
+### Automatic Guest Placement with xl ###
+
+If no "cpus=" option is specified in the config file, libxl tries
+to figure out on its own on which node(s) the domain could fit best.
+It is worthwhile noting that optimally fitting a set of VMs on the NUMA
+nodes of an host is an incarnation of the Bin Packing Problem. In fact,
+the various VMs with different memory sizes are the items to be packed,
+and the host nodes are the bins. As such problem is known to be NP-hard,
+we will be using some heuristics.
+
+The first thing to do is find the nodes, or the sets of nodes (from now
+on referred to as 'candidates'), that have enough free memory and
+enough physical CPUs to accommodate the new domain. The idea is to find
+a spot for the domain with at least as much free memory as the domain
+is configured to have, and at least as many pCPUs as it has vCPUs.
+After that, the actual decision on which candidate to pick happens
+according to the following heuristics:
+
+  *  candidates involving fewer nodes are considered better. In case
+     two (or more) candidates span the same number of nodes,
+  *  candidates with a smaller number of vCPUs runnable on them (due
+     to previous placement and/or plain vCPU pinning) are considered
+     better. In case the same number of vCPUs can run on two (or more)
+     candidates,
+  *  the candidate with the greatest amount of free memory is
+     considered to be the best one.
+
+Giving preference to candidates with fewer nodes ensures better
+performance for the guest, as it avoids spreading its memory among
+different nodes. Favoring candidates with fewer vCPUs already runnable
+there ensures a good balance of the overall host load. Finally, if
+multiple candidates fulfil these criteria, prioritizing the candidates
+that have the largest amount of free memory helps keep memory
+fragmentation small, and maximizes the probability of being able to
+put more domains there.
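+
+What follows is a purely illustrative sketch, in C, of the selection
+logic just described. The type and function names are invented for
+this document and do not correspond to the actual libxl internals (see
+the next section for where the real implementation lives):
+
+    /* Illustrative sketch only: not the real libxl data structures. */
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    struct candidate {
+        int nr_nodes;        /* nodes spanned by the candidate */
+        int nr_cpus;         /* pCPUs belonging to those nodes */
+        int nr_vcpus;        /* vCPUs already runnable on those nodes */
+        uint64_t free_memkb; /* free memory summed over those nodes */
+    };
+
+    /* A candidate is only considered if it can host the new domain. */
+    static bool candidate_fits(const struct candidate *c,
+                               uint64_t dom_memkb, int dom_vcpus)
+    {
+        return c->free_memkb >= dom_memkb && c->nr_cpus >= dom_vcpus;
+    }
+
+    /* Returns true if c1 is a better choice than c2, applying the
+     * three heuristics above, in order. */
+    static bool candidate_better(const struct candidate *c1,
+                                 const struct candidate *c2)
+    {
+        if (c1->nr_nodes != c2->nr_nodes)
+            return c1->nr_nodes < c2->nr_nodes;   /* fewer nodes wins */
+        if (c1->nr_vcpus != c2->nr_vcpus)
+            return c1->nr_vcpus < c2->nr_vcpus;   /* less loaded wins */
+        return c1->free_memkb > c2->free_memkb;   /* more free memory wins */
+    }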
+
+## Guest Placement within libxl ##
+
+xl achieves automatic NUMA placement simply because libxl does it
+internally. No API is provided (yet) for interacting with this feature
+or for modifying the library's behaviour regarding automatic placement;
+it just happens by default if no affinity is specified (as it is with
+xm/xend).
+
+For actually inspecting, and possibly tweaking, the mechanism and the
+algorithms it uses, everything is implemented as a set of libxl
+internal interfaces and facilities. Look for the comment "Automatic
+NUMA placement" in libxl\_internal.h.
+
+Note this may change in future versions of Xen/libxl.

Thread overview: 15+ messages
2012-07-24 14:42 [PATCH 0 of 4 v7/leftover] Automatic NUMA placement for xl Dario Faggioli
2012-07-24 14:42 ` [PATCH 1 of 4 v7/leftover] libxl: kill the need for checking and linking to libm Dario Faggioli
2012-07-25 16:40   ` Ian Campbell
2012-07-24 14:42 ` [PATCH 2 of 4 v7/leftover] libxl: enable automatic placement of guests on NUMA nodes Dario Faggioli
2012-07-26 12:03   ` Andre Przywara
2012-07-26 13:54     ` Dario Faggioli
2012-07-26 14:14   ` Ian Jackson
2012-07-26 16:00     ` Dario Faggioli
2012-07-24 14:42 ` [PATCH 3 of 4 v7/leftover] libxl: have NUMA placement deal with cpupools Dario Faggioli
2012-07-26 14:20   ` Ian Jackson
2012-07-24 14:42 ` Dario Faggioli [this message]
2012-07-26 14:22   ` [PATCH 4 of 4 v7/leftover] Some automatic NUMA placement documentation Ian Jackson
2012-07-26 14:31   ` Ian Campbell
2012-07-26 14:42 ` [PATCH 0 of 4 v7/leftover] Automatic NUMA placement for xl Ian Campbell
2012-07-26 15:47   ` Dario Faggioli
