xen-devel.lists.xenproject.org archive mirror
* [PATCH] xen: make sure the node-affinity is always updated
@ 2013-09-12 15:57 Dario Faggioli
  2013-09-13  4:31 ` Juergen Gross
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dario Faggioli @ 2013-09-12 15:57 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser, Jan Beulich

in particular when a domain moves from one cpupool to another, and
when the list of cpus in a cpupool changes.

Not doing that may result in the domain's node-affinity mask
(as retrieved by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) )
and online mask (as retrieved by cpupool_scheduler_cpumask() ) having
an empty intersection.

This in turn means that, in _csched_cpu_pick(), we think it is
legitimate to do a node-affinity load balancing step and, when
running this:

    ...
    /* Pick an online CPU from the proper affinity mask */
    csched_balance_cpumask(vc, balance_step, &cpus);
    cpumask_and(&cpus, &cpus, online);
    ...

we end up with an empty cpumask (in cpus). At this point, in
the following code:

    ....
    /* If present, prefer vc's current processor */
    cpu = cpumask_test_cpu(vc->processor, &cpus)
            ? vc->processor
            : cpumask_cycle(vc->processor, &cpus);
    ....

an ASSERT (from inside cpumask_cycle() ) triggers like this:

(XEN) Xen call trace:
(XEN)    [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
(XEN)    [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
(XEN)    [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
(XEN)    [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
(XEN)    [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
(XEN)    [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
(XEN)    [<ffff82d080127793>] do_tasklet_work+0x78/0xab
(XEN)    [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
(XEN)    [<ffff82d080158985>] idle_loop+0x57/0x5e
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
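
For context, the assertion lives in cpumask_cycle() itself, whose shape
is roughly the following (a paraphrase of the helper in
xen/include/xen/cpumask.h, not a verbatim quote): with an empty mask
there is no set bit to find, so the wrap-around search also returns
nr_cpu_ids and the ASSERT fires.

    /* Paraphrased sketch of cpumask_cycle(); the real code may differ. */
    static inline int cpumask_cycle(int n, const cpumask_t *srcp)
    {
        int cpu = cpumask_next(n, srcp);   /* first set bit after n */

        if ( cpu == nr_cpu_ids )           /* none found: wrap around ...  */
            cpu = cpumask_first(srcp);     /* ... still nr_cpu_ids if srcp */
                                           /*     is empty                 */
        ASSERT(cpu < nr_cpu_ids);          /* the assertion that panics    */
        return cpu;
    }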

For example, it is sufficient to have a running domain with
node-affinity to NUMA node 1: issuing an `xl cpupool-numa-split'
then makes the above happen. That is because, by default, all
the existing domains remain assigned to the first cpupool, which
(after the cpupool-numa-split) only includes NUMA node 0.

This change prevents that by making sure that the node-affinity
mask is reset and automatically recomputed, when necessary, in
domain_update_node_affinity(), as well as by making sure that
the function is invoked whenever that is required.

In fact, it makes much more sense to reset the node-affinity
when it becomes inconsistent with either the domain's vcpu-affinity
or the cpumap of the domain's cpupool, rather than leaving it
untouched, as was done before. Also, updating the domain's node
affinity on the cpupool_unassign_cpu() path, as already happens
on the cpupool_assign_cpu_locked() path, was something we were
missing anyway, independently of the ASSERT triggering.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
---
 xen/common/cpupool.c |    8 ++++++++
 xen/common/domain.c  |   40 +++++++++++++++++++++++-----------------
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 2164a9f..23e461d 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -355,6 +355,14 @@ int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
     atomic_inc(&c->refcnt);
     cpupool_cpu_moving = c;
     cpumask_clear_cpu(cpu, c->cpu_valid);
+
+    rcu_read_lock(&domlist_read_lock);
+    for_each_domain_in_cpupool(d, c)
+    {
+        domain_update_node_affinity(d);
+    }
+    rcu_read_unlock(&domlist_read_lock);
+
     spin_unlock(&cpupool_lock);
 
     work_cpu = smp_processor_id();
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5999779..f708b3c 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
     cpumask_var_t cpumask;
     cpumask_var_t online_affinity;
     const cpumask_t *online;
-    nodemask_t nodemask = NODE_MASK_NONE;
     struct vcpu *v;
     unsigned int node;
 
@@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d)
         cpumask_or(cpumask, cpumask, online_affinity);
     }
 
-    if ( d->auto_node_affinity )
-    {
-        /* Node-affinity is automaically computed from all vcpu-affinities */
-        for_each_online_node ( node )
-            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
-                node_set(node, nodemask);
-
-        d->node_affinity = nodemask;
-    }
-    else
+    if ( !d->auto_node_affinity )
     {
         /* Node-affinity is provided by someone else, just filter out cpus
          * that are either offline or not in the affinity of any vcpus. */
-        nodemask = d->node_affinity;
         for_each_node_mask ( node, d->node_affinity )
             if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
-                node_clear(node, nodemask);//d->node_affinity);
+                node_clear(node, d->node_affinity);
+
+        /*
+         * If we end up with an empty node-affinity, e.g., because the user
+         * specified an incompatible vcpu-affinity, or moved the domain to
+         * a different cpupool, reset the node-affinity and re-calculate it
+         * (in the body of the if() below).
+         *
+         * This is necessary to prevent the schedulers from attempting
+         * node-affinity load balancing on empty cpumasks, with (potentially)
+         * nasty effects (increased overhead or even crash!).
+         */
+        if ( nodes_empty(d->node_affinity) )
+            d->auto_node_affinity = 1;
+    }
 
-        /* Avoid loosing track of node-affinity because of a bad
-         * vcpu-affinity has been specified. */
-        if ( !nodes_empty(nodemask) )
-            d->node_affinity = nodemask;
+    if ( d->auto_node_affinity )
+    {
+        /* Node-affinity is automatically computed from all vcpu-affinities */
+        nodes_clear(d->node_affinity);
+        for_each_online_node ( node )
+            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
+                node_set(node, d->node_affinity);
     }
 
     sched_set_node_affinity(d, &d->node_affinity);
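
Putting the two domain.c hunks together, the relevant part of
domain_update_node_affinity() after this patch reads roughly as follows
(reconstructed from the diff above; locking, allocation and the
per-vcpu loop that builds the local 'cpumask' are omitted):

    if ( !d->auto_node_affinity )
    {
        /* Explicit node-affinity: drop the nodes with no usable cpus. */
        for_each_node_mask ( node, d->node_affinity )
            if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
                node_clear(node, d->node_affinity);

        /* Everything was filtered out: fall back to automatic mode. */
        if ( nodes_empty(d->node_affinity) )
            d->auto_node_affinity = 1;
    }

    if ( d->auto_node_affinity )
    {
        /* Recompute the node-affinity from all the vcpu-affinities. */
        nodes_clear(d->node_affinity);
        for_each_online_node ( node )
            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
                node_set(node, d->node_affinity);
    }

    sched_set_node_affinity(d, &d->node_affinity);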


* Re: [PATCH] xen: make sure the node-affinity is always updated
  2013-09-12 15:57 [PATCH] xen: make sure the node-affinity is always updated Dario Faggioli
@ 2013-09-13  4:31 ` Juergen Gross
  2013-09-13  9:52 ` Dario Faggioli
  2013-09-13 10:45 ` George Dunlap
  2 siblings, 0 replies; 7+ messages in thread
From: Juergen Gross @ 2013-09-13  4:31 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, Keir Fraser, Jan Beulich, xen-devel

On 12.09.2013 17:57, Dario Faggioli wrote:
> in particular when a domain moves from one cpupool to another, and
> when the list of cpus in a cpupool changes.
>
> Not doing that may result in the domain's node-affinity mask
> (as retrieved by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) )
> and online mask (as retrieved by cpupool_scheduler_cpumask() ) having
> an empty intersection.
>
> This in turn means that, in _csched_cpu_pick(), we think it is
> legitimate to do a node-affinity load balancing step and, when
> running this:
>
>      ...
>      /* Pick an online CPU from the proper affinity mask */
>      csched_balance_cpumask(vc, balance_step, &cpus);
>      cpumask_and(&cpus, &cpus, online);
>      ...
>
> we end up with an empty cpumask (in cpus). At this point, in
> the following code:
>
>      ....
>      /* If present, prefer vc's current processor */
>      cpu = cpumask_test_cpu(vc->processor, &cpus)
>              ? vc->processor
>              : cpumask_cycle(vc->processor, &cpus);
>      ....
>
> an ASSERT (from inside cpumask_cycle() ) triggers like this:
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
> (XEN)    [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
> (XEN)    [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
> (XEN)    [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
> (XEN)    [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN)    [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [<ffff82d080127793>] do_tasklet_work+0x78/0xab
> (XEN)    [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
> (XEN)    [<ffff82d080158985>] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
>
> For example, it is sufficient to have a running domain with
> node-affinity to NUMA node 1: issuing an `xl cpupool-numa-split'
> then makes the above happen. That is because, by default, all
> the existing domains remain assigned to the first cpupool, which
> (after the cpupool-numa-split) only includes NUMA node 0.
>
> This change prevents that by making sure that the node-affinity
> mask is reset and automatically recomputed, when necessary, in
> domain_update_node_affinity(), as well as by making sure that
> the function is invoked whenever that is required.
>
> In fact, it makes much more sense to reset the node-affinity
> when it becomes inconsistent with either the domain's vcpu-affinity
> or the cpumap of the domain's cpupool, rather than leaving it
> untouched, as was done before. Also, updating the domain's node
> affinity on the cpupool_unassign_cpu() path, as already happens
> on the cpupool_assign_cpu_locked() path, was something we were
> missing anyway, independently of the ASSERT triggering.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> ---
>   xen/common/cpupool.c |    8 ++++++++
>   xen/common/domain.c  |   40 +++++++++++++++++++++++-----------------
>   2 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 2164a9f..23e461d 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -355,6 +355,14 @@ int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
>       atomic_inc(&c->refcnt);
>       cpupool_cpu_moving = c;
>       cpumask_clear_cpu(cpu, c->cpu_valid);
> +
> +    rcu_read_lock(&domlist_read_lock);
> +    for_each_domain_in_cpupool(d, c)
> +    {
> +        domain_update_node_affinity(d);
> +    }
> +    rcu_read_unlock(&domlist_read_lock);
> +
>       spin_unlock(&cpupool_lock);
>
>       work_cpu = smp_processor_id();
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5999779..f708b3c 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
>       cpumask_var_t cpumask;
>       cpumask_var_t online_affinity;
>       const cpumask_t *online;
> -    nodemask_t nodemask = NODE_MASK_NONE;
>       struct vcpu *v;
>       unsigned int node;
>
> @@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d)
>           cpumask_or(cpumask, cpumask, online_affinity);
>       }
>
> -    if ( d->auto_node_affinity )
> -    {
> -        /* Node-affinity is automaically computed from all vcpu-affinities */
> -        for_each_online_node ( node )
> -            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> -                node_set(node, nodemask);
> -
> -        d->node_affinity = nodemask;
> -    }
> -    else
> +    if ( !d->auto_node_affinity )
>       {
>           /* Node-affinity is provided by someone else, just filter out cpus
>            * that are either offline or not in the affinity of any vcpus. */
> -        nodemask = d->node_affinity;
>           for_each_node_mask ( node, d->node_affinity )
>               if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
> -                node_clear(node, nodemask);//d->node_affinity);
> +                node_clear(node, d->node_affinity);
> +
> +        /*
> +         * If we end up with an empty node-affinity, e.g., because the user
> +         * specified an incompatible vcpu-affinity, or moved the domain to
> +         * a different cpupool, reset the node-affinity and re-calculate it
> +         * (in the body of the if() below).
> +         *
> +         * This is necessary to prevent the schedulers from attempting
> +         * node-affinity load balancing on empty cpumasks, with (potentially)
> +         * nasty effects (increased overhead or even crash!).
> +         */
> +        if ( nodes_empty(d->node_affinity) )
> +            d->auto_node_affinity = 1;
> +    }
>
> -        /* Avoid loosing track of node-affinity because of a bad
> -         * vcpu-affinity has been specified. */
> -        if ( !nodes_empty(nodemask) )
> -            d->node_affinity = nodemask;
> +    if ( d->auto_node_affinity )
> +    {
> +        /* Node-affinity is automatically computed from all vcpu-affinities */
> +        nodes_clear(d->node_affinity);
> +        for_each_online_node ( node )
> +            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> +                node_set(node, d->node_affinity);
>       }
>
>       sched_set_node_affinity(d, &d->node_affinity);
>
>


-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html


* Re: [PATCH] xen: make sure the node-affinity is always updated
  2013-09-12 15:57 [PATCH] xen: make sure the node-affinity is always updated Dario Faggioli
  2013-09-13  4:31 ` Juergen Gross
@ 2013-09-13  9:52 ` Dario Faggioli
  2013-09-26  8:05   ` Jan Beulich
  2013-09-13 10:45 ` George Dunlap
  2 siblings, 1 reply; 7+ messages in thread
From: Dario Faggioli @ 2013-09-13  9:52 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Juergen Gross, Keir Fraser, Jan Beulich




On Thu, 2013-09-12 at 17:57 +0200, Dario Faggioli wrote:
> in particular when a domain moves from one cpupool to another, and
> when the list of cpus in a cpupool changes.
> 
> Not doing that may result in the domain's node-affinity mask
> (as retrieved by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) )
> and online mask (as retrieved by cpupool_scheduler_cpumask() ) having
> an empty intersection.
> 
> [..]
>
> an ASSERT (from inside cpumask_cycle() ) triggers like this:
> 
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
> (XEN)    [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
> (XEN)    [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
> (XEN)    [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
> (XEN)    [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN)    [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [<ffff82d080127793>] do_tasklet_work+0x78/0xab
> (XEN)    [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
> (XEN)    [<ffff82d080158985>] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
> 
BTW, I just checked, and this affects 4.3.0.

I'll request that it be properly backported as soon as it hits the
unstable repo.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH] xen: make sure the node-affinity is always updated
  2013-09-12 15:57 [PATCH] xen: make sure the node-affinity is always updated Dario Faggioli
  2013-09-13  4:31 ` Juergen Gross
  2013-09-13  9:52 ` Dario Faggioli
@ 2013-09-13 10:45 ` George Dunlap
  2013-09-13 12:43   ` Dario Faggioli
  2 siblings, 1 reply; 7+ messages in thread
From: George Dunlap @ 2013-09-13 10:45 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: Juergen Gross, Keir Fraser, Jan Beulich, xen-devel@lists.xen.org

On Thu, Sep 12, 2013 at 4:57 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> in particular when a domain moves from one cpupool to another, and
> when the list of cpus in a cpupool changes.
>
> Not doing that may result in the domain's node-affinity mask
> (as retrieved by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) )
> and online mask (as retrieved by cpupool_scheduler_cpumask() ) having
> an empty intersection.
>
> This in turn means that, in _csched_cpu_pick(), we think it is
> legitimate to do a node-affinity load balancing step and, when
> running this:
>
>     ...
>     /* Pick an online CPU from the proper affinity mask */
>     csched_balance_cpumask(vc, balance_step, &cpus);
>     cpumask_and(&cpus, &cpus, online);
>     ...
>
> we end up with an empty cpumask (in cpus). At this point, in
> the following code:
>
>     ....
>     /* If present, prefer vc's current processor */
>     cpu = cpumask_test_cpu(vc->processor, &cpus)
>             ? vc->processor
>             : cpumask_cycle(vc->processor, &cpus);
>     ....
>
> an ASSERT (from inside cpumask_cycle() ) triggers like this:
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
> (XEN)    [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
> (XEN)    [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
> (XEN)    [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
> (XEN)    [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN)    [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [<ffff82d080127793>] do_tasklet_work+0x78/0xab
> (XEN)    [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
> (XEN)    [<ffff82d080158985>] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
>
> For example, it is sufficient to have a running domain with
> node-affinity to NUMA node 1: issuing an `xl cpupool-numa-split'
> then makes the above happen. That is because, by default, all
> the existing domains remain assigned to the first cpupool, which
> (after the cpupool-numa-split) only includes NUMA node 0.
>
> This change prevents that by making sure that the node-affinity
> mask is reset and automatically recomputed, when necessary, in
> domain_update_node_affinity(), as well as by making sure that
> the function is invoked whenever that is required.
>
> In fact, it makes much more sense to reset the node-affinity
> when it becomes inconsistent with either the domain's vcpu-affinity
> or the cpumap of the domain's cpupool, rather than leaving it
> untouched, as was done before. Also, updating the domain's node
> affinity on the cpupool_unassign_cpu() path, as already happens
> on the cpupool_assign_cpu_locked() path, was something we were
> missing anyway, independently of the ASSERT triggering.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> ---
>  xen/common/cpupool.c |    8 ++++++++
>  xen/common/domain.c  |   40 +++++++++++++++++++++++-----------------
>  2 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 2164a9f..23e461d 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -355,6 +355,14 @@ int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
>      atomic_inc(&c->refcnt);
>      cpupool_cpu_moving = c;
>      cpumask_clear_cpu(cpu, c->cpu_valid);
> +
> +    rcu_read_lock(&domlist_read_lock);
> +    for_each_domain_in_cpupool(d, c)
> +    {
> +        domain_update_node_affinity(d);
> +    }
> +    rcu_read_unlock(&domlist_read_lock);
> +
>      spin_unlock(&cpupool_lock);
>
>      work_cpu = smp_processor_id();
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5999779..f708b3c 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
>      cpumask_var_t cpumask;
>      cpumask_var_t online_affinity;
>      const cpumask_t *online;
> -    nodemask_t nodemask = NODE_MASK_NONE;
>      struct vcpu *v;
>      unsigned int node;
>
> @@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d)
>          cpumask_or(cpumask, cpumask, online_affinity);
>      }
>
> -    if ( d->auto_node_affinity )
> -    {
> -        /* Node-affinity is automaically computed from all vcpu-affinities */
> -        for_each_online_node ( node )
> -            if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> -                node_set(node, nodemask);
> -
> -        d->node_affinity = nodemask;
> -    }
> -    else
> +    if ( !d->auto_node_affinity )
>      {
>          /* Node-affinity is provided by someone else, just filter out cpus
>           * that are either offline or not in the affinity of any vcpus. */
> -        nodemask = d->node_affinity;
>          for_each_node_mask ( node, d->node_affinity )
>              if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
> -                node_clear(node, nodemask);//d->node_affinity);
> +                node_clear(node, d->node_affinity);
> +
> +        /*
> +         * If we end up with an empty node-affinity, e.g., because the user
> +         * specified an incompatible vcpu-affinity, or moved the domain to
> +         * a different cpupool, reset the node-affinity and re-calculate it
> +         * (in the body of the if() below).
> +         *
> +         * This is necessary to prevent the schedulers from attempting
> +         * node-affinity load balancing on empty cpumasks, with (potentially)
> +         * nasty effects (increased overhead or even crash!).
> +         */
> +        if ( nodes_empty(d->node_affinity) )
> +            d->auto_node_affinity = 1;

Hmm -- silently erasing one set of parameters and changing it to
"auto" doesn't seem that great from a UI perspective -- "principle of
least surprise", and all that.

Also, consider someone who has already set both a node and a vcpu
affinity, and decides they now want to change it such that the new
vcpu affinity doesn't intersect with the old node affinity, and does
the following:

1. Set new node affinity.  The update notices that the node & vcpu
affinity clash, and effectively removes the node affinity
2. Set new vcpu affinity.

Now we have the desired vcpu affinity, but the node affinity has been
erased.  To avoid this you have to set the vcpu affinity first, which
isn't exactly obvious.  (Either that, or #1 will not actually remove
the node affinity, and we still have the potential for a disjoint set
tripping over an assert if someone only updates node affinity but not
vcpu affinity.)

Since node affinity is just advisory, wouldn't it make more sense to
make the scheduling system tolerate a disjoint set of affinities?  If
we could make cpumask_cycle() indicate somehow that the mask is empty,
instead of crashing, then we can detect that and skip the "look for an
idle processor within the NUMA affinity" step.
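
A minimal sketch of that idea (names as in the credit scheduler code
quoted earlier; exactly where this would sit inside _csched_cpu_pick()'s
balancing loop is an assumption, not committed code):

    /* Skip the node-affinity balancing step whenever its mask has no
     * online cpu left in this pool, so that cpumask_cycle() is never
     * handed an empty mask. */
    csched_balance_cpumask(vc, balance_step, &cpus);
    cpumask_and(&cpus, &cpus, online);

    if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
         && cpumask_empty(&cpus) )
        continue;   /* fall through to the plain vcpu-affinity step */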

For cpupool-numa-split, it's probably a good idea to do one of the following:
1. Detect the NUMA affinity and place the domain in the appropriate pool
2. Detect the conflict and break the affinity from xl (giving a
warning that you're doing so; see the sketch right after this list)
3. Detect the conflict and just give a warning.
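
Something like the sketch below could do for option 2; every function
name in it is a hypothetical placeholder for illustration, not existing
xl/libxl API:

    /* Hypothetical xl-side check while splitting pools per NUMA node:
     * warn about, and break, any node-affinity that conflicts with the
     * new pool layout. */
    for_each_domain_in_pool ( d, pool0 )
    {
        if ( !node_affinity_fits_pool(d, pool0) )
        {
            fprintf(stderr, "Warning: domain %u: node-affinity conflicts "
                    "with the new pool layout, breaking it\n", domid_of(d));
            reset_node_affinity(d);
        }
    }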

But that's a UI polish thing, rather than a bug fix, so is slightly
lower priority.

 -George


* Re: [PATCH] xen: make sure the node-affinity is always updated
  2013-09-13 10:45 ` George Dunlap
@ 2013-09-13 12:43   ` Dario Faggioli
  2013-09-13 15:45     ` Dario Faggioli
  0 siblings, 1 reply; 7+ messages in thread
From: Dario Faggioli @ 2013-09-13 12:43 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Keir Fraser, Jan Beulich, xen-devel@lists.xen.org



On Fri, 2013-09-13 at 11:45 +0100, George Dunlap wrote:
> On Thu, Sep 12, 2013 at 4:57 PM, Dario Faggioli
>
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 5999779..f708b3c 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d)
> >          cpumask_or(cpumask, cpumask, online_affinity);
>
> > +    if ( !d->auto_node_affinity )
> >      {
> >          /* Node-affinity is provided by someone else, just filter out cpus
> >           * that are either offline or not in the affinity of any vcpus. */
> > -        nodemask = d->node_affinity;
> >          for_each_node_mask ( node, d->node_affinity )
> >              if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
> > -                node_clear(node, nodemask);//d->node_affinity);
> > +                node_clear(node, d->node_affinity);
> > +
> > +        /*
> > +         * If we end up with an empty node-affinity, e.g., because the user
> > +         * specified an incompatible vcpu-affinity, or moved the domain to
> > +         * a different cpupool, reset the node-affinity and re-calculate it
> > +         * (in the body of the if() below).
> > +         *
> > +         * This is necessary to prevent the schedulers from attempting
> > +         * node-affinity load balancing on empty cpumasks, with (potentially)
> > +         * nasty effects (increased overhead or even crash!).
> > +         */
> > +        if ( nodes_empty(d->node_affinity) )
> > +            d->auto_node_affinity = 1;
> 
> Hmm -- silently erasing one set of parameters and changing it to
> "auto" doesn't seem that great from a UI perspective -- "principle of
> least surprise", and all that.
> 
I know, and I agree. That is actually why I had quite a hard time
trying to figure out what would best be done to fix the issue. :-(

> Also, consider someone who has already set both a node and a vcpu
> affinity, and decides they now want to change it such that the new
> vcpu affinity doesn't intersect with the old node affinity, and does
> the following:
> 
> 1. Set new node affinity.  The update notices that the node & vcpu
> affinity clash, and effectively removes the node affinity
> 2. Set new vcpu affinity.
> 
Well, that can't happen in that exact way, since it is _NOT_ possible to
change the node-affinity online. It is decided at domain creation time
and remains unchanged for the entire life of the domain.

I did it like this because, since node-affinity controls where the
memory is allocated, I think it would be rather confusing to allow it
to change and then leave the memory where it is. I've got patches to
enable that already, and I'll push them out together with the memory
migration work.

However, it is true that, before this patch, if someone sets a
vcpu-affinity which conflicts with the node-affinity, nothing happens to
the node-affinity. After this patch that changes: the node-affinity
will be reset to auto and, as I said, I don't like that either.

> Since node affinity is just advisory, wouldn't it make more sense to
> make the scheduling system tolerate a disjoint set of affinities?  If
> we could make cpumask_cycle() indicate somehow that the mask is empty,
> instead of crashing, then we can detect that and skip the "look for an
> idle processor within the NUMA affinity" step.
> 
That was the route I took for the first approach. It is indeed possible
to make the scheduler deal with that, especially since that is what
happens already (and that was by design!) in all other places: if
node-affinity and vcpu-affinity mismatch, what you say is exactly what
happens.

However, when cpupools are involved (or, in general, where there is the
need to filter things against the online cpus, as in _csched_cpu_pick() ),
we have a third mask coming into play, which I thought I was handling
correctly, but was not.

The outcome is that taking care of that properly in the scheduler will
add some overhead (a bit more cpumask_* manipulation) in a quite hot
path, and that is going to be paid by everyone, just for the sake of
this quite uncommon situation.

Anyway, let me try to look harder in that direction; perhaps I can
rearrange things in such a way that the cost is not that much higher.

> For cpupool-numa-split, it's probably a good idea to do one of the following:
> 1. Detect the NUMA affinity and place the domain in the appropriate pool
> 2. Detect the conflict and break the affinity from xl (giving a
> warning that you're doing so)
> 3. Detect the conflict and just give a warning.
> 
cpupools already (potentially) break vcpu-affinity, and ISTR the
warning is given in the form of a DEBUG_PRINTK() in Xen (or, anyway,
I've seen it in the logs, not on xl's stdout or stderr).

The reason why we can't do anything too fancy for cpupool-numa-split
only is that it is entirely implemented in xl, using the basic
building blocks of the cpupool interface (add-a-cpu-to-pool,
remove-a-cpu-from-pool, etc.).

The tricky part is _when_ to move the domain into the new pool, and
whether that will work or not really depends on how we deal with the
issue above (i.e., what we do when the masks end up with an empty
intersection).

Let's see...

> But that's a UI polish thing, rather than a bug fix, so is slightly
> lower priority.
> 
Sure, but I was really concerned about making the interface a mess, even
if that happens as a result of fixing the bug.

I'll think about it a bit more, and see if I can come up with a
different, less "surprising" solution from the UI standpoint in the
next few hours, ok?

Thanks for your comments,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH] xen: make sure the node-affinity is always updated
  2013-09-13 12:43   ` Dario Faggioli
@ 2013-09-13 15:45     ` Dario Faggioli
  0 siblings, 0 replies; 7+ messages in thread
From: Dario Faggioli @ 2013-09-13 15:45 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Keir Fraser, Jan Beulich, xen-devel@lists.xen.org



On Fri, 2013-09-13 at 14:43 +0200, Dario Faggioli wrote:
> On Fri, 2013-09-13 at 11:45 +0100, George Dunlap wrote:
> > But that's a UI polish thing, rather than a bug fix, so is slightly
> > lower priority.
> > 
> Sure, but I was really concerned about making the interface a mess, even
> if that happens as a result of fixing the bug.
> 
> I'll think about it a bit more, and see if I can come up with a
> different, less "surprising" solution from the UI standpoint in the
> next few hours, ok?
> 
Ok, I did find a way to deal with this entirely in the (credit)
scheduler, without adding too many cpumask operations (well, I added a
cpumask_and() in _csched_cpu_pick(), but that's it).
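
(A guess at the shape of that check, based only on the description
above; CSCHED_BALANCE_CPU_AFFINITY is assumed to be the name of the
plain vcpu-affinity step, and the actual follow-up patch may differ.)

    /* Hypothetical: early in _csched_cpu_pick(), decide whether the
     * node-affinity step is worth doing at all for this vcpu. */
    csched_balance_cpumask(vc, CSCHED_BALANCE_NODE_AFFINITY, &cpus);
    cpumask_and(&cpus, &cpus, online);            /* the one extra AND */

    balance_step = cpumask_empty(&cpus)
                   ? CSCHED_BALANCE_CPU_AFFINITY  /* skip the NUMA step */
                   : CSCHED_BALANCE_NODE_AFFINITY;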

I'll send it as a new patch, containing only the minimum required to fix
the bug, so as to make backporting easier.

Juergen, I'll bundle the cpupool hunk with a couple of other cpupool
changes I have in my queue and send a series in the coming days.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH] xen: make sure the node-affinity is always updated
  2013-09-13  9:52 ` Dario Faggioli
@ 2013-09-26  8:05   ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2013-09-26  8:05 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Juergen Gross, Keir Fraser

>>> On 13.09.13 at 11:52, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> BTW, I just checked, and this affects 4.3.0.
> 
> I'll request that it be properly backported as soon as it hits the
> unstable repo.

As this has undergone quite some changes (including patch
title), please indicate again which patch(es) need(s) backporting
once all constituent parts are in.

Jan

