[PATCH v8 03/13] xen: derive NUMA node affinity from hard and soft CPU affinity

From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xen.org
Cc: keir@xen.org, Ian.Campbell@citrix.com, Andrew.Cooper3@citrix.com,
	George.Dunlap@citrix.com, JBeulich@suse.com,
	Ian.Jackson@citrix.com, Wei Liu <wei.liu2@citrix.com>
Subject: [PATCH v8 03/13] xen: derive NUMA node affinity from hard and soft CPU affinity
Date: Fri, 13 Jun 2014 15:09:27 +0200
Message-ID: <20140613130926.4106.77055.stgit@Solace>
In-Reply-To: <20140613124847.4106.70161.stgit@Solace>

If a domain's NUMA node-affinity (which is what controls
memory allocations) is provided by the user/toolstack, it
is left untouched. However, if the user does not specify
anything, leaving it all to Xen, let's compute it in the
following way:

 1. cpupool's cpus & hard-affinity & soft-affinity
 2. if (1) is empty: cpupool's cpus & hard-affinity

This guarantees that memory is allocated from the narrowest
possible set of NUMA nodes, and makes it relatively easy to
set up NUMA-aware scheduling on top of soft affinity.

Note that such a 'narrowest set' is guaranteed to be
non-empty, since cpupool's cpus & hard-affinity is never
empty (and the patch ASSERT()s that).

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes from v7:
 * George goes from Reviewed-by to Acked-by

Changes from v6:
 * fixed a bug when a domain was being created inside a
   cpupool;
 * coding style.

Changes from v3:
 * avoid pointless calls to cpumask_clear(), as requested
   during review;
 * ASSERT() non-emptiness of cpupool & hard affinity, as
   suggested during review.

Changes from v2:
 * the loop computing the mask is now only executed when
   it really is useful, as suggested during review;
 * the loop, and all the cpumask handling is optimized,
   in a way similar to what was suggested during review.
---
 xen/common/domain.c   |   61 +++++++++++++++++++++++++++++++------------------
 xen/common/schedule.c |    4 ++-
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index e20d3bf..c3a576e 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -409,17 +409,17 @@ struct domain *domain_create(
 
 void domain_update_node_affinity(struct domain *d)
 {
-    cpumask_var_t cpumask;
-    cpumask_var_t online_affinity;
+    cpumask_var_t dom_cpumask, dom_cpumask_soft;
+    cpumask_t *dom_affinity;
     const cpumask_t *online;
     struct vcpu *v;
-    unsigned int node;
+    unsigned int cpu;
 
-    if ( !zalloc_cpumask_var(&cpumask) )
+    if ( !zalloc_cpumask_var(&dom_cpumask) )
         return;
-    if ( !alloc_cpumask_var(&online_affinity) )
+    if ( !zalloc_cpumask_var(&dom_cpumask_soft) )
     {
-        free_cpumask_var(cpumask);
+        free_cpumask_var(dom_cpumask);
         return;
     }
 
@@ -427,31 +427,48 @@ void domain_update_node_affinity(struct domain *d)
 
     spin_lock(&d->node_affinity_lock);
 
-    for_each_vcpu ( d, v )
-    {
-        cpumask_and(online_affinity, v->cpu_hard_affinity, online);
-        cpumask_or(cpumask, cpumask, online_affinity);
-    }
-
     /*
-     * If d->auto_node_affinity is true, the domain's node-affinity mask
-     * (d->node_affinity) is automaically computed from all the domain's
-     * vcpus' vcpu-affinity masks (the union of which we have just built
-     * above in cpumask). OTOH, if d->auto_node_affinity is false, we
-     * must leave the node-affinity of the domain alone.
+     * If d->auto_node_affinity is true, let's compute the domain's
+     * node-affinity and update d->node_affinity accordingly. If false,
+     * just leave d->node_affinity alone.
      */
     if ( d->auto_node_affinity )
     {
+        /*
+         * We want the narrowest possible set of pcpus (to get the narrowest
+         * possible set of nodes). What we need is the cpumask of where the
+         * domain can run (the union of the hard affinity of all its vcpus),
+         * and the full mask of where it would prefer to run (the union of
+         * the soft affinity of all its various vcpus). Let's build them.
+         */
+        for_each_vcpu ( d, v )
+        {
+            cpumask_or(dom_cpumask, dom_cpumask, v->cpu_hard_affinity);
+            cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
+                       v->cpu_soft_affinity);
+        }
+        /* Filter out non-online cpus */
+        cpumask_and(dom_cpumask, dom_cpumask, online);
+        ASSERT(!cpumask_empty(dom_cpumask));
+        /* And compute the intersection between hard, online and soft */
+        cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask);
+
+        /*
+         * If not empty, the intersection of hard, soft and online is the
+         * narrowest set we want. If empty, we fall back to hard&online.
+         */
+        dom_affinity = cpumask_empty(dom_cpumask_soft) ?
+                           dom_cpumask : dom_cpumask_soft;
+
         nodes_clear(d->node_affinity);
-        for_each_online_node ( node )
-            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
-                node_set(node, d->node_affinity);
+        for_each_cpu ( cpu, dom_affinity )
+            node_set(cpu_to_node(cpu), d->node_affinity);
     }
 
     spin_unlock(&d->node_affinity_lock);
 
-    free_cpumask_var(online_affinity);
-    free_cpumask_var(cpumask);
+    free_cpumask_var(dom_cpumask_soft);
+    free_cpumask_var(dom_cpumask);
 }
 
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index d76a425..e57cd91 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -310,7 +310,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         SCHED_OP(old_ops, free_vdata, vcpudata);
     }
 
-    domain_update_node_affinity(d);
+    /* Do we have vcpus already? If not, no need to update node-affinity */
+    if ( d->vcpu )
+        domain_update_node_affinity(d);
 
     domain_unpause(d);


Thread overview: 25+ messages
2014-06-13 13:09 [PATCH v8 00/13] Implement vcpu soft affinity for credit1 Dario Faggioli
2014-06-13 13:09 ` [PATCH v8 01/13] xen: sched: rename v->cpu_affinity into v->cpu_hard_affinity Dario Faggioli
2014-06-13 13:09 ` [PATCH v8 02/13] xen: sched: introduce soft-affinity and use it instead d->node-affinity Dario Faggioli
2014-06-13 13:09 ` Dario Faggioli [this message]
2014-06-13 13:09 ` [PATCH v8 04/13] xen/libxc: sched: DOMCTL_*vcpuaffinity works with hard and soft affinity Dario Faggioli
2014-06-13 13:09 ` [PATCH v8 05/13] libxc/libxl: bump library SONAMEs Dario Faggioli
2014-06-13 13:09 ` [PATCH v8 06/13] libxc: get and set soft and hard affinity Dario Faggioli
2014-06-13 13:10 ` [PATCH v8 07/13] libxl: get and set soft affinity Dario Faggioli
2014-06-13 13:10 ` [PATCH v8 08/13] xl: enable getting and setting " Dario Faggioli
2014-06-13 13:10 ` [PATCH v8 09/13] libxl/xl: push VCPU affinity pinning down to libxl Dario Faggioli
2014-06-13 13:25   ` Wei Liu
2014-06-17 10:09     ` Dario Faggioli
2014-06-13 13:10 ` [PATCH v8 10/13] libxl/xl: deprecate the build_info->cpumap field Dario Faggioli
2014-06-13 13:34   ` Wei Liu
2014-06-13 13:38     ` Ian Campbell
2014-06-17 10:11       ` Dario Faggioli
2014-06-13 13:49     ` Wei Liu
2014-06-17  9:59       ` Dario Faggioli
2014-06-17 10:16   ` Wei Liu
2014-06-17 10:29     ` Dario Faggioli
2014-06-17 10:45       ` Wei Liu
2014-06-13 13:10 ` [PATCH v8 11/13] xl: move the vcpu affinity parsing in a function Dario Faggioli
2014-06-13 13:10 ` [PATCH v8 12/13] xl: enable for specifying soft-affinity in the config file Dario Faggioli
2014-06-17 10:34   ` Wei Liu
2014-06-13 13:10 ` [PATCH v8 13/13] libxl: automatic NUMA placement affects soft affinity Dario Faggioli
