From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Keir Fraser <keir@xen.org>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH v6 2/9] xen: sched: introduce soft-affinity and use it instead d->node-affinity
Date: Mon, 2 Jun 2014 15:40:53 +0100
Message-ID: <538C8CF5.8060107@eu.citrix.com>
In-Reply-To: <1401237770-7003-3-git-send-email-dario.faggioli@citrix.com>

On 05/28/2014 01:42 AM, Dario Faggioli wrote:
> Before this change, each vcpu had its own vcpu-affinity
> (in v->cpu_affinity), representing the set of pcpus where
> the vcpu is allowed to run. Since NUMA-aware scheduling
> was introduced, the (credit1 only, for now) scheduler also
> tries as much as it can to run all the vcpus of a domain
> on one of the nodes that constitute the domain's
> node-affinity.
>
> The idea here is making the mechanism more general by:
>    * allowing for this 'preference' for some pcpus/nodes to be
>      expressed on a per-vcpu basis, rather than for the domain
>      as a whole. That is to say, each vcpu should have its own
>      set of preferred pcpus/nodes, rather than the set being the
>      same for all the vcpus of the domain;
>    * generalizing the idea of 'preferred pcpus' beyond NUMA
>      awareness and support. That is to say, regardless of
>      whether it is (mostly) useful on NUMA systems, it should
>      be possible to specify, for each vcpu, a set of pcpus where
>      it prefers to run (in addition to, and possibly unrelated to,
>      the set of pcpus where it is allowed to run).
>
> We will be calling this set of *preferred* pcpus the vcpu's
> soft affinity. This change introduces it and starts using it
> for scheduling, replacing the indirect use of the domain's NUMA
> node-affinity. This is more general, as soft affinity does not
> have to be related to NUMA. Nevertheless, it allows achieving the
> same results as NUMA-aware scheduling, just by making soft affinity
> equal to the domain's node affinity, for all the vCPUs (e.g.,
> from the toolstack).
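
A minimal sketch of how hard and soft affinity can combine when picking a pCPU, assuming pCPU sets are modelled as plain 64-bit masks. `cpumask64_t`, `first_cpu()` and `pick_idle_cpu()` are hypothetical names for illustration only, not Xen's cpumask_t API and not credit1's actual code. The idea is to prefer an idle pCPU in the intersection of soft and hard affinity, and to fall back to any idle pCPU in the hard affinity only when that intersection has none.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical representation: bit i set => pCPU i is in the set.
 * (Xen uses cpumask_t; a single 64-bit word keeps the sketch small.)
 */
typedef uint64_t cpumask64_t;

/* Return the lowest-numbered pCPU in mask, or -1 if mask is empty. */
static int first_cpu(cpumask64_t mask)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & ((cpumask64_t)1 << cpu))
            return cpu;
    return -1;
}

/*
 * Two-step placement:
 *  1. look for an idle pCPU that is both preferred (soft) and allowed (hard);
 *  2. only if there is none, take any idle pCPU that is allowed (hard).
 * Returns -1 if no allowed pCPU is idle.
 */
static int pick_idle_cpu(cpumask64_t hard, cpumask64_t soft, cpumask64_t idle)
{
    assert(hard != 0); /* a vCPU must be allowed to run somewhere */

    cpumask64_t preferred = soft & hard & idle;
    if (preferred != 0)
        return first_cpu(preferred);

    /* Soft affinity can never widen the hard (allowed) set. */
    return first_cpu(hard & idle);
}
```

With hard = 0xF, soft = 0xC and idle = 0xB the soft step succeeds and pCPU 3 is chosen; with idle = 0x3 it finds nothing, and the fallback returns pCPU 0.
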
>
> This also means renaming most of the NUMA-aware scheduling related
> functions, in credit1, to something more generic, hinting toward
> the concept of soft affinity rather than directly to NUMA awareness.
>
> As a side effect, this simplifies the code quite a bit. In fact,
> prior to this change, we needed to cache the translation of
> d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
> is what scheduling decisions require (we used to keep it in
> node_affinity_cpumask). This cache, and all the complicated logic
> required to keep it updated, are no longer necessary.
>
> The high level description of NUMA placement and scheduling in
> docs/misc/xl-numa-placement.markdown is being updated too, to match
> the new architecture.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> Changes from v2:
>   * this patch folds patches 6 ("xen: sched: make space for
>     cpu_soft_affinity") and 10 ("xen: sched: use soft-affinity
>     instead of domain's node-affinity"), as suggested during
>     review. 'Reviewed-by' from George is there since both patch
>     6 and 10 had it, and I did nothing other than squash
>     them.
>
> Changes from v1:
>   * in v1, "7/12 xen: numa-sched: use per-vcpu node-affinity for
>     actual scheduling" was doing something very similar to this
>     patch.
> ---
>   docs/misc/xl-numa-placement.markdown |  148 ++++++++++++++++++++------------
>   xen/common/domain.c                  |    5 +-
>   xen/common/keyhandler.c              |    2 +
>   xen/common/sched_credit.c            |  153 +++++++++++++---------------------
>   xen/common/schedule.c                |    3 +
>   xen/include/xen/sched.h              |    3 +
>   6 files changed, 168 insertions(+), 146 deletions(-)
>
> diff --git a/docs/misc/xl-numa-placement.markdown b/docs/misc/xl-numa-placement.markdown
> index caa3fec..b1ed361 100644
> --- a/docs/misc/xl-numa-placement.markdown
> +++ b/docs/misc/xl-numa-placement.markdown
> @@ -12,13 +12,6 @@ is quite more complex and slow. On these machines, a NUMA node is usually
>   defined as a set of processor cores (typically a physical CPU package) and
>   the memory directly attached to the set of cores.
>   
> -The Xen hypervisor deals with NUMA machines by assigning to each domain
> -a "node affinity", i.e., a set of NUMA nodes of the host from which they
> -get their memory allocated. Also, even if the node affinity of a domain
> -is allowed to change on-line, it is very important to "place" the domain
> -correctly when it is fist created, as the most of its memory is allocated
> -at that time and can not (for now) be moved easily.
> -
>   NUMA awareness becomes very important as soon as many domains start
>   running memory-intensive workloads on a shared host. In fact, the cost
>   of accessing non node-local memory locations is very high, and the
> @@ -27,14 +20,37 @@ performance degradation is likely to be noticeable.
>   For more information, have a look at the [Xen NUMA Introduction][numa_intro]
>   page on the Wiki.
>   
> +## Xen and NUMA machines: the concept of _node-affinity_ ##
> +
> +The Xen hypervisor deals with NUMA machines through the concept of
> +_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
> +of the host where the memory for the domain is being allocated (mostly,
> +at domain creation time). This is, at least in principle, different from
> +and unrelated to the vCPU (hard and soft, see below) scheduling affinity,
> +which instead is the set of pCPUs where the vCPU is allowed (or prefers)
> +to run.
> +
> +Of course, despite the fact that they belong to and affect different
> +subsystems, the domain node-affinity and the vCPUs affinity are not
> +completely independent.
> +In fact, if the domain node-affinity is not explicitly specified by the
> +user, via the proper libxl calls or xl config item, it will be computed
> +based on the vCPUs' scheduling affinity.
> +
> +Notice that, even if the node affinity of a domain may change on-line,
> +it is very important to "place" the domain correctly when it is first
> +created, as most of its memory is allocated at that time and cannot
> +(for now) be moved easily.
> +
>   ### Placing via pinning and cpupools ###
>   
> -The simplest way of placing a domain on a NUMA node is statically pinning
> -the domain's vCPUs to the pCPUs of the node. This goes under the name of
> -CPU affinity and can be set through the "cpus=" option in the config file
> -(more about this below). Another option is to pool together the pCPUs
> -spanning the node and put the domain in such a cpupool with the "pool="
> -config option (as documented in our [Wiki][cpupools_howto]).
> +The simplest way of placing a domain on a NUMA node is setting the hard
> +scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
> +also goes under the name of vCPU pinning, and can be done through the
> +"cpus=" option in the config file (more about this below). Another option
> +is to pool together the pCPUs spanning the node and put the domain in
> +such a _cpupool_ with the "pool=" config option (as documented in our
> +[Wiki][cpupools_howto]).
>   
>   In both the above cases, the domain will not be able to execute outside
>   the specified set of pCPUs for any reasons, even if all those pCPUs are
> @@ -45,24 +61,45 @@ may come at he cost of some load imbalances.
>   
>   ### NUMA aware scheduling ###
>   
> -If the credit scheduler is in use, the concept of node affinity defined
> -above does not only apply to memory. In fact, starting from Xen 4.3, the
> -scheduler always tries to run the domain's vCPUs on one of the nodes in
> -its node affinity. Only if that turns out to be impossible, it will just
> -pick any free pCPU.
> -
> -This is, therefore, something more flexible than CPU affinity, as a domain
> -can still run everywhere, it just prefers some nodes rather than others.
> -Locality of access is less guaranteed than in the pinning case, but that
> -comes along with better chances to exploit all the host resources (e.g.,
> -the pCPUs).
> -
> -In fact, if all the pCPUs in a domain's node affinity are busy, it is
> -possible for the domain to run outside of there, but it is very likely that
> -slower execution (due to remote memory accesses) is still better than no
> -execution at all, as it would happen with pinning. For this reason, NUMA
> -aware scheduling has the potential of bringing substantial performances
> -benefits, although this will depend on the workload.
> +If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
> +itself always tries to run the domain's vCPUs on one of the nodes in
> +its node-affinity. Only if that turns out to be impossible, it will just
> +pick any free pCPU. Locality of access is less guaranteed than in the
> +pinning case, but that comes along with better chances to exploit all
> +the host resources (e.g., the pCPUs).
> +
> +Starting from Xen 4.4, credit1 supports two forms of affinity: hard and

Just noticed: you need to s/4.4/4.5/g throughout this whole hunk.

Other than that, the Reviewed-by stands.

  -George