From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <raistlin@linux.it>
Cc: Andre Przywara <andre.przywara@amd.com>,
    Ian Campbell <Ian.Campbell@citrix.com>,
    Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
    Juergen Gross <juergen.gross@ts.fujitsu.com>,
    Ian Jackson <Ian.Jackson@eu.citrix.com>,
    "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
    Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [PATCH 10 of 10 v3] Some automatic NUMA placement documentation
Date: Fri, 6 Jul 2012 15:08:35 +0100
Message-ID: <4FF6F163.6010300@eu.citrix.com>
In-Reply-To: <f1523c3dc63746e07b11.1341418689@Solace>
On 04/07/12 17:18, Dario Faggioli wrote:
> # HG changeset patch
> # User Dario Faggioli<raistlin@linux.it>
> # Date 1341416324 -7200
> # Node ID f1523c3dc63746e07b11fada5be3d461c3807256
> # Parent 885e2f385601d66179058bfb6bd3960f17d5e068
> Some automatic NUMA placement documentation
>
> About rationale, usage and (some small bits of) API.
>
> Signed-off-by: Dario Faggioli<dario.faggioli@citrix.com>
> Acked-by: Ian Campbell<ian.campbell@citrix.com>
>
> Changes from v1:
> * API documentation moved close to the actual functions.
>
> diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
> new file mode 100644
> --- /dev/null
> +++ b/docs/misc/xl-numa-placement.markdown
> @@ -0,0 +1,91 @@
> +# Guest Automatic NUMA Placement in libxl and xl #
> +
> +## Rationale ##
> +
> +NUMA means the memory accessing times of a program running on a CPU depends on
> +the relative distance between that CPU and that memory. In fact, most of the
> +NUMA systems are built in such a way that each processor has its local memory,
> +on which it can operate very fast. On the other hand, getting and storing data
> +from and on remote memory (that is, memory local to some other processor) is
> +quite more complex and slow. On these machines, a NUMA node is usually defined
> +as a set of processor cores (typically a physical CPU package) and the memory
> +directly attached to the set of cores.
> +
> +The Xen hypervisor deals with Non-Uniform Memory Access (NUMA]) machines by
> +assigning to its domain a "node affinity", i.e., a set of NUMA nodes of the
> +host from which it gets its memory allocated.
> +
> +NUMA awareness becomes very important as soon as many domains start running
> +memory-intensive workloads on a shared host. In fact, the cost of accessing non
> +node-local memory locations is very high, and the performance degradation is
> +likely to be noticeable.
> +
> +## Guest Placement in xl ##
> +
> +If using xl for creating and managing guests, it is very easy to ask for both
> +manual or automatic placement of them across the host's NUMA nodes.
> +
> +Note that xm/xend does the very same thing, the only differences residing in
> +the details of the heuristics adopted for the placement (see below).
> +
> +### Manual Guest Placement with xl ###
> +
> +Thanks to the "cpus=" option, it is possible to specify where a domain should
> +be created and scheduled on, directly in its config file. This affects NUMA
> +placement and memory accesses as the hypervisor constructs the node affinity of
> +a VM basing right on its CPU affinity when it is created.
> +
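As an aside on the paragraph above: the way a node affinity follows from a CPU affinity can be sketched in a few lines of Python. This is only an illustration, not the hypervisor's code; the `cpu_to_node` mapping, the host layout, and the function name are all made up for the example.

```python
def node_affinity_from_cpus(cpu_affinity, cpu_to_node):
    """Return the set of NUMA nodes that the given pCPUs belong to."""
    return {cpu_to_node[cpu] for cpu in cpu_affinity}


# Hypothetical 2-node host: pCPUs 0-3 sit on node 0, pCPUs 4-7 on node 1.
cpu_to_node = {cpu: (0 if cpu < 4 else 1) for cpu in range(8)}

# A domain pinned to pCPUs 0-3 ends up with node 0 only.
pinned = node_affinity_from_cpus({0, 1, 2, 3}, cpu_to_node)

# A domain whose pCPUs straddle both nodes gets both nodes.
spread = node_affinity_from_cpus({2, 3, 4}, cpu_to_node)
```

The second case is the one to watch for when writing "cpus=" by hand: a CPU list that crosses a node boundary widens the node affinity as well.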
> +This is very simple and effective, but requires the user/system administrator
> +to explicitly specify affinities for each and every domain, or Xen won't be
> +able to guarantee the locality for their memory accesses.
> +
> +It is also possible to deal with NUMA by partitioning the system using cpupools
> +(available in the upcoming release of Xen, 4.2). Again, this could be "The
> +Right Answer" for many needs and occasions, but has to to be carefully
> +considered and manually setup by hand.
> +
> +### Automatic Guest Placement with xl ###
> +
> +In case no "cpus=" option is specified in the config file, libxl tries to
I think "If no 'cpus=' option..." is better here.
> +figure out on its own on which node(s) the domain could fit best. It is
> +worthwhile noting that optimally fitting a set of VMs on the NUMA nodes of an
> +host host is an incarnation of the Bin Packing Problem. In fact, the various
Duplicated word: "host host".
> +VMs with different memory sizes are the items to be packed, and the host nodes
> +are the bins. That is known to be NP-hard, thus, it is probably better to
> +tackle the problem with some sort of hauristics, as we do not have any oracle
> +available!
I think you can just say "...is an incarnation of the Bin Packing
Problem, which is known to be NP-hard. We will therefore be using some
heuristics."
(NB: note the spelling of "heuristics" as well.)
> +
> +The first thing to do is finding a node, or even a set of nodes, that have
> +enough free memory and enough physical CPUs for accommodating the one new
> +domain. The idea is to find a spot for the domain with at least as much free
> +memory as it has configured, and as much pCPUs as it has vCPUs. After that,
> +the actual decision on which solution to go for happens accordingly to the
> +following heuristics:
> +
> + * candidates involving fewer nodes come first. In case two (or more)
> + candidates span the same number of nodes,
> + * the amount of free memory and the number of domains assigned to the
> + candidates are considered. In doing that, candidates with greater amount
> + of free memory and fewer assigned domains are preferred, with free memory
> + "weighting" three times as much as number of domains.
> +
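For readers following along, the selection described above can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not the libxl implementation: the `Node` type, the function names, and the host data are hypothetical, and the way free memory and domain counts are normalised before applying the 3:1 weighting is my own guess, since the text only gives the weighting itself.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    node_id: int
    free_mem_mb: int
    nr_cpus: int
    nr_domains: int


def candidates(nodes, mem_mb, vcpus):
    """Yield every set of nodes with enough free memory and pCPUs."""
    for size in range(1, len(nodes) + 1):
        for combo in combinations(nodes, size):
            if (sum(n.free_mem_mb for n in combo) >= mem_mb
                    and sum(n.nr_cpus for n in combo) >= vcpus):
                yield combo


def place(nodes, mem_mb, vcpus):
    """Return the node ids of the preferred candidate, or None."""
    # Assumption: normalise against host-wide totals so that the two
    # quantities are comparable before weighting them.
    total_free = sum(n.free_mem_mb for n in nodes) or 1
    total_doms = sum(n.nr_domains for n in nodes) or 1

    def key(combo):
        free = sum(n.free_mem_mb for n in combo) / total_free
        doms = sum(n.nr_domains for n in combo) / total_doms
        # Fewer nodes first; then more free memory (weight 3) and
        # fewer assigned domains (weight 1).
        return (len(combo), -(3 * free - doms))

    best = min(candidates(nodes, mem_mb, vcpus), key=key, default=None)
    return None if best is None else [n.node_id for n in best]


# Hypothetical 3-node host.
host = [
    Node(0, free_mem_mb=4096, nr_cpus=4, nr_domains=3),
    Node(1, free_mem_mb=8192, nr_cpus=4, nr_domains=1),
    Node(2, free_mem_mb=2048, nr_cpus=4, nr_domains=0),
]

small = place(host, mem_mb=3072, vcpus=2)    # fits on a single node
large = place(host, mem_mb=10000, vcpus=4)   # needs two nodes
huge = place(host, mem_mb=20000, vcpus=4)    # does not fit at all
```

Enumerating every subset of nodes is exponential in the number of nodes, which is tolerable for a sketch on a small host but is exactly why the text talks about heuristics rather than an optimal packing.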
> +Giving preference to small candidates ensures better performance for the guest,
I think I would say "candidates with fewer nodes" here; "small
candidates" doesn't convey "fewer nodes" to me.
> +as it avoid spreading its memory among different nodes. Favouring the nodes
> +that have the biggest amounts of free memory helps keeping the memory
We normally don't say "big amount", but "large amount" (don't ask me why
-- just sounds a bit funny to me). So this would be "largest amount".
> +fragmentation small, from a system wide perspective. However, in case more
Again, s/in case/if/.

Other than that, looks good to me.

 -George
> +candidates fulfil these criteria by roughly the same extent, having the number
> +of domains the candidates are "hosting" helps balancing the load on the various
> +nodes.
> +
> +## Guest Placement within libxl ##
> +
> +xl achieves automatic NUMA just because libxl does it interrnally.
> +No API is provided (yet) for interacting with this feature and modify
> +the library behaviour regarding automatic placement, it just happens
> +by default if no affinity is specified (as it is with xm/xend).
> +
> +For actually looking and maybe tweaking the mechanism and the algorithms it
> +uses, all is implemented as a set of libxl internal interfaces and facilities.
> +Look at the comment "Automatic NUMA placement" in libxl\_internal.h.
> +
> +Note this may change in future versions of Xen/libxl.