xen-devel.lists.xenproject.org archive mirror
From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Keir Fraser <keir@xen.org>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH v6 2/9] xen: sched: introduce soft-affinity and use it instead d->node-affinity
Date: Mon, 2 Jun 2014 15:40:53 +0100
Message-ID: <538C8CF5.8060107@eu.citrix.com>
In-Reply-To: <1401237770-7003-3-git-send-email-dario.faggioli@citrix.com>

On 05/28/2014 01:42 AM, Dario Faggioli wrote:
> Before this change, each vcpu had its own vcpu-affinity
> (in v->cpu_affinity), representing the set of pcpus where
> the vcpu is allowed to run. Since NUMA-aware scheduling
> was introduced, the (credit1 only, for now) scheduler has
> also tried, as much as it can, to run all the vcpus of a
> domain on one of the nodes that constitute the domain's
> node-affinity.
>
> The idea here is to make the mechanism more general by:
>    * allowing this 'preference' for some pcpus/nodes to be
>      expressed on a per-vcpu basis, rather than for the domain
>      as a whole. That is to say, each vcpu should have its own
>      set of preferred pcpus/nodes, rather than it being the
>      very same for all the vcpus of the domain;
>    * generalizing the idea of 'preferred pcpus' beyond NUMA
>      awareness and support. That is to say, independently of
>      whether or not it is (mostly) useful on NUMA systems, it
>      should be possible to specify, for each vcpu, a set of
>      pcpus where it prefers to run (in addition to, and possibly
>      unrelated to, the set of pcpus where it is allowed to run).
>
> We will be calling this set of *preferred* pcpus the vcpu's
> soft affinity. This change introduces it and starts using it
> for scheduling, replacing the indirect use of the domain's NUMA
> node-affinity. This is more general, as soft affinity does not
> have to be related to NUMA. Nevertheless, the same results as
> NUMA-aware scheduling can still be achieved, just by making the
> soft affinity of all the vCPUs equal to the domain's node
> affinity (e.g., from the toolstack).
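
To make the hard/soft distinction above concrete, here is a minimal,
self-contained C sketch of the two per-vCPU masks and of the two-step
pCPU pick the series implements in credit1. This is illustration only,
not Xen code: the cpumask typedef, the helper name and the bitmask
values are invented for the example (in the actual patches the masks
are cpumask_t fields of struct vcpu, i.e. v->cpu_hard_affinity from
patch 1/9 and the v->cpu_soft_affinity added here).

    #include <stdint.h>
    #include <stdio.h>

    /* Each bit stands for one pCPU (illustrative only). */
    typedef uint32_t cpumask;

    struct vcpu_affinity {
        cpumask hard;   /* pCPUs the vCPU is allowed to run on */
        cpumask soft;   /* pCPUs the vCPU prefers to run on    */
    };

    /*
     * Two-step pick, as described above: consider the soft/hard/idle
     * intersection first; only if that is empty, fall back to any
     * idle pCPU in the hard mask.
     */
    static cpumask pick_candidates(const struct vcpu_affinity *a, cpumask idle)
    {
        cpumask preferred = a->hard & a->soft & idle;

        return preferred ? preferred : (a->hard & idle);
    }

    int main(void)
    {
        /* Allowed on pCPUs 0-7, prefers pCPUs 0-3, but only 4-7 are idle. */
        struct vcpu_affinity v = { .hard = 0xffu, .soft = 0x0fu };

        printf("candidate pCPUs: %#x\n",
               (unsigned)pick_candidates(&v, 0xf0u));
        return 0;
    }

Compiled and run, this prints 0xf0: the preferred pCPUs are all busy,
so the vCPU falls back to any idle pCPU it is allowed to run on, which
is the same fallback behaviour NUMA-aware scheduling relied on.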
>
> This also means renaming most of the functions related to NUMA-aware
> scheduling in credit1 to something more generic, hinting at the
> concept of soft affinity rather than directly at NUMA awareness.
>
> As a side effect, this simplifies the code quite a bit. In fact,
> prior to this change, we needed to cache the translation of
> d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
> is what scheduling decisions require (we used to keep it in
> node_affinity_cpumask). This, and all the complicated logic
> required to keep it updated, is no longer necessary.
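
For readers who have not looked at that code: node_affinity_cpumask
was, roughly, the union of the CPUs of every node in d->node_affinity,
kept so that the scheduler could work on a cpumask_t directly. The toy,
self-contained C sketch below (topology and mask values invented for
the example, not Xen code) shows what that translation amounts to; with
soft affinity already being a per-vCPU cpumask, nothing like this has
to be kept in sync any longer.

    #include <stdint.h>
    #include <stdio.h>

    #define NR_NODES 2

    int main(void)
    {
        /* Invented topology: node 0 owns pCPUs 0-3, node 1 owns pCPUs 4-7. */
        uint32_t node_to_cpus[NR_NODES] = { 0x0fu, 0xf0u };
        uint32_t node_affinity = 0x2u;   /* domain's memory lives on node 1 */
        uint32_t cpu_equiv = 0;

        /* Union of the CPUs of every node in the domain's node-affinity. */
        for (int n = 0; n < NR_NODES; n++)
            if (node_affinity & (1u << n))
                cpu_equiv |= node_to_cpus[n];

        printf("cached cpumask would be: %#x\n", (unsigned)cpu_equiv);
        return 0;
    }

Run as-is it prints 0xf0, i.e. the CPUs of node 1.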
>
> The high level description of NUMA placement and scheduling in
> docs/misc/xl-numa-placement.markdown is being updated too, to match
> the new architecture.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> Changes from v2:
>   * this patch folds patches 6 ("xen: sched: make space for
>     cpu_soft_affinity") and 10 ("xen: sched: use soft-affinity
>     instead of domain's node-affinity"), as suggested during
>     review. 'Reviewed-by' from George is there since both patch
>     6 and 10 had it, and I didn't do anything else than squashing
>     them.
>
> Changes from v1:
>   * in v1, "7/12 xen: numa-sched: use per-vcpu node-affinity for
>     actual scheduling" was doing something very similar to this
>     patch.
> ---
>   docs/misc/xl-numa-placement.markdown |  148 ++++++++++++++++++++------------
>   xen/common/domain.c                  |    5 +-
>   xen/common/keyhandler.c              |    2 +
>   xen/common/sched_credit.c            |  153 +++++++++++++---------------------
>   xen/common/schedule.c                |    3 +
>   xen/include/xen/sched.h              |    3 +
>   6 files changed, 168 insertions(+), 146 deletions(-)
>
> diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
> index caa3fec..b1ed361 100644
> --- a/docs/misc/xl-numa-placement.markdown
> +++ b/docs/misc/xl-numa-placement.markdown
> @@ -12,13 +12,6 @@ is quite more complex and slow. On these machines, a NUMA node is usually
>   defined as a set of processor cores (typically a physical CPU package) and
>   the memory directly attached to the set of cores.
>   
> -The Xen hypervisor deals with NUMA machines by assigning to each domain
> -a "node affinity", i.e., a set of NUMA nodes of the host from which they
> -get their memory allocated. Also, even if the node affinity of a domain
> -is allowed to change on-line, it is very important to "place" the domain
> -correctly when it is fist created, as the most of its memory is allocated
> -at that time and can not (for now) be moved easily.
> -
>   NUMA awareness becomes very important as soon as many domains start
>   running memory-intensive workloads on a shared host. In fact, the cost
>   of accessing non node-local memory locations is very high, and the
> @@ -27,14 +20,37 @@ performance degradation is likely to be noticeable.
>   For more information, have a look at the [Xen NUMA Introduction][numa_intro]
>   page on the Wiki.
>   
> +## Xen and NUMA machines: the concept of _node-affinity_ ##
> +
> +The Xen hypervisor deals with NUMA machines through the concept of
> +_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
> +of the host where the memory for the domain is being allocated (mostly,
> +at domain creation time). This is, at least in principle, different
> +from, and unrelated to, the vCPU (hard and soft, see below) scheduling
> +affinity, which instead is the set of pCPUs where the vCPU is allowed
> +(or prefers) to run.
> +
> +Of course, despite the fact that they belong to and affect different
> +subsystems, the domain node-affinity and the vCPUs' affinity are not
> +completely independent.
> +In fact, if the domain node-affinity is not explicitly specified by the
> +user, via the proper libxl calls or xl config item, it will be computed
> +based on the vCPUs' scheduling affinity.
> +
> +Notice that, even if the node affinity of a domain may change on-line,
> +it is very important to "place" the domain correctly when it is first
> +created, as most of its memory is allocated at that time and cannot
> +(for now) be moved easily.
> +
>   ### Placing via pinning and cpupools ###
>   
> -The simplest way of placing a domain on a NUMA node is statically pinning
> -the domain's vCPUs to the pCPUs of the node. This goes under the name of
> -CPU affinity and can be set through the "cpus=" option in the config file
> -(more about this below). Another option is to pool together the pCPUs
> -spanning the node and put the domain in such a cpupool with the "pool="
> -config option (as documented in our [Wiki][cpupools_howto]).
> +The simplest way of placing a domain on a NUMA node is setting the hard
> +scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
> +also goes under the name of vCPU pinning, and can be done through the
> +"cpus=" option in the config file (more about this below). Another option
> +is to pool together the pCPUs spanning the node and put the domain in
> +such a _cpupool_ with the "pool=" config option (as documented in our
> +[Wiki][cpupools_howto]).
>   
>   In both the above cases, the domain will not be able to execute outside
>   the specified set of pCPUs for any reasons, even if all those pCPUs are
> @@ -45,24 +61,45 @@ may come at he cost of some load imbalances.
>   
>   ### NUMA aware scheduling ###
>   
> -If the credit scheduler is in use, the concept of node affinity defined
> -above does not only apply to memory. In fact, starting from Xen 4.3, the
> -scheduler always tries to run the domain's vCPUs on one of the nodes in
> -its node affinity. Only if that turns out to be impossible, it will just
> -pick any free pCPU.
> -
> -This is, therefore, something more flexible than CPU affinity, as a domain
> -can still run everywhere, it just prefers some nodes rather than others.
> -Locality of access is less guaranteed than in the pinning case, but that
> -comes along with better chances to exploit all the host resources (e.g.,
> -the pCPUs).
> -
> -In fact, if all the pCPUs in a domain's node affinity are busy, it is
> -possible for the domain to run outside of there, but it is very likely that
> -slower execution (due to remote memory accesses) is still better than no
> -execution at all, as it would happen with pinning. For this reason, NUMA
> -aware scheduling has the potential of bringing substantial performances
> -benefits, although this will depend on the workload.
> +If the credit1 scheduler is in use, starting from Xen 4.3 the scheduler
> +itself always tries to run the domain's vCPUs on one of the nodes in
> +its node-affinity. Only if that turns out to be impossible will it
> +pick any free pCPU. Locality of access is less guaranteed than in the
> +pinning case, but that comes along with better chances of exploiting
> +all the host resources (e.g., the pCPUs).
> +
> +Starting from Xen 4.4, credit1 supports two forms of affinity: hard and

Just noticed, you need to s/4.4/4.5/g throughout this whole hunk.

Other than that, the Reviewed-by stands.

  -George

Thread overview: 28+ messages
2014-05-28  0:42 [PATCH v6 0/9] Implement vcpu soft affinity for credit1 Dario Faggioli
2014-05-28  0:42 ` [PATCH v6 1/9] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity Dario Faggioli
2014-05-28  7:28   ` Jan Beulich
2014-05-28  0:42 ` [PATCH v6 2/9] xen: sched: introduce soft-affinity and use it instead d->node-affinity Dario Faggioli
2014-05-28  7:33   ` Jan Beulich
2014-05-28 15:50     ` Dario Faggioli
2014-06-02 14:40   ` George Dunlap [this message]
2014-05-28  0:42 ` [PATCH v6 3/9] xen: derive NUMA node affinity from hard and soft CPU affinity Dario Faggioli
2014-05-28  7:36   ` Jan Beulich
2014-05-28  0:42 ` [PATCH v6 4/9] xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity Dario Faggioli
2014-05-28  7:40   ` Jan Beulich
2014-05-28 15:09     ` Ian Campbell
2014-05-28  0:42 ` [PATCH v6 5/9] libxc: get and set soft and hard affinity Dario Faggioli
2014-05-28  0:42 ` [PATCH v6 6/9] libxl: get and set soft affinity Dario Faggioli
2014-05-28 15:13   ` Ian Campbell
2014-05-28 15:15     ` Dario Faggioli
2014-05-28 15:23   ` Ian Campbell
2014-06-05 12:59     ` Dario Faggioli
2014-06-06  8:46       ` Ian Campbell
2014-06-06 22:11         ` Dario Faggioli
2014-05-28  0:42 ` [PATCH v6 7/9] xl: enable getting and setting soft Dario Faggioli
2014-05-28 15:33   ` Ian Campbell
2014-05-28 16:01     ` Dario Faggioli
2014-06-02 15:20   ` George Dunlap
2014-05-28  0:42 ` [PATCH v6 8/9] xl: enable for specifying node-affinity in the config file Dario Faggioli
2014-05-28 15:48   ` Ian Campbell
2014-05-28 16:55     ` Dario Faggioli
2014-05-28  0:42 ` [PATCH v6 9/9] libxl: automatic NUMA placement affects soft affinity Dario Faggioli
