From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
xen-devel@lists.xen.org
Subject: Re: [PATCH] xen: make sure the node-affinity is always updated
Date: Fri, 13 Sep 2013 06:31:55 +0200 [thread overview]
Message-ID: <5232953B.6090900@ts.fujitsu.com> (raw)
In-Reply-To: <20130912155657.20126.35983.stgit@hit-nxdomain.opendns.com>
On 12.09.2013 17:57, Dario Faggioli wrote:
> in particular when a domain moves from a cpupool to another, and
> when the list of cpus in a cpupool changes.
>
> Not doing that may result in the domain's node-affinity mask
> (as retrieved by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) )
> and online mask (as retrieved by cpupool_scheduler_cpumask() ) to have
> an empty intersection.
>
> This in turn means that, in _csched_cpu_pick(), we think it is
> legitimate to do a node-affinity load balancing step and, when
> running this:
>
> ...
> /* Pick an online CPU from the proper affinity mask */
> csched_balance_cpumask(vc, balance_step, &cpus);
> cpumask_and(&cpus, &cpus, online);
> ...
>
> we end up with an empty cpumask (in cpus). At this point, in
> the following code:
>
> ....
> /* If present, prefer vc's current processor */
> cpu = cpumask_test_cpu(vc->processor, &cpus)
> ? vc->processor
> : cpumask_cycle(vc->processor, &cpus);
> ....
>
> an ASSERT (from inside cpumask_cycle() ) triggers like this:
>
> (XEN) Xen call trace:
> (XEN) [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
> (XEN) [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
> (XEN) [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
> (XEN) [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
> (XEN) [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN) [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN) [<ffff82d080127793>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
> (XEN) [<ffff82d080158985>] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
>
> It is for example sufficient to have a domain with node-affinity
> to NUMA node 1 running, and issueing a `xl cpupool-numa-split'
> would make the above happen. That is because, by default, all
> the existing domains remain assigned to the first cpupool, and
> it now (after the cpupool-numa-split) only includes NUMA node 0.
>
> This change prevents that, by making sure that the node-affinity
> mask is reset and automatically recomputed, if that is the case,
> in domain_update_node_affinity(), as well as making sure that
> such function is invoked every time that is required.
>
> In fact, it makes much more sense to reset the node-affinity
> when either it and the domain's vcpu-affinity or the cpumap of
> the domain's cpupool are inconsistent, rather than not touching
> it, as it was before. Also, updating the domain's node affinity
> on the cpupool_unassing_cpu() path, as it already happens on
> the cpupool_assign_cpu_locked() path, was something we were
> missing anyway, even independently from the ASSERT triggering.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> ---
> xen/common/cpupool.c | 8 ++++++++
> xen/common/domain.c | 40 +++++++++++++++++++++++-----------------
> 2 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 2164a9f..23e461d 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -355,6 +355,14 @@ int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu)
> atomic_inc(&c->refcnt);
> cpupool_cpu_moving = c;
> cpumask_clear_cpu(cpu, c->cpu_valid);
> +
> + rcu_read_lock(&domlist_read_lock);
> + for_each_domain_in_cpupool(d, c)
> + {
> + domain_update_node_affinity(d);
> + }
> + rcu_read_unlock(&domlist_read_lock);
> +
> spin_unlock(&cpupool_lock);
>
> work_cpu = smp_processor_id();
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5999779..f708b3c 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d)
> cpumask_var_t cpumask;
> cpumask_var_t online_affinity;
> const cpumask_t *online;
> - nodemask_t nodemask = NODE_MASK_NONE;
> struct vcpu *v;
> unsigned int node;
>
> @@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d)
> cpumask_or(cpumask, cpumask, online_affinity);
> }
>
> - if ( d->auto_node_affinity )
> - {
> - /* Node-affinity is automaically computed from all vcpu-affinities */
> - for_each_online_node ( node )
> - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> - node_set(node, nodemask);
> -
> - d->node_affinity = nodemask;
> - }
> - else
> + if ( !d->auto_node_affinity )
> {
> /* Node-affinity is provided by someone else, just filter out cpus
> * that are either offline or not in the affinity of any vcpus. */
> - nodemask = d->node_affinity;
> for_each_node_mask ( node, d->node_affinity )
> if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) )
> - node_clear(node, nodemask);//d->node_affinity);
> + node_clear(node, d->node_affinity);
> +
> + /*
> + * If we end up with an empty node-affinity, e.g., because the user
> + * specified an incompatible vcpu-affinity, or moved the domain to
> + * a different cpupool, reset the node-affinity and re-calculate it
> + * (in the body of the if() below).
> + *
> + * This is necessary to prevent the schedulers from attempting
> + * node-affinity load balancing on empty cpumasks, with (potentially)
> + * nasty effects (increased overhead or even crash!).
> + */
> + if ( nodes_empty(d->node_affinity) )
> + d->auto_node_affinity = 1;
> + }
>
> - /* Avoid loosing track of node-affinity because of a bad
> - * vcpu-affinity has been specified. */
> - if ( !nodes_empty(nodemask) )
> - d->node_affinity = nodemask;
> + if ( d->auto_node_affinity )
> + {
> + /* Node-affinity is automatically computed from all vcpu-affinities */
> + nodes_clear(d->node_affinity);
> + for_each_online_node ( node )
> + if ( cpumask_intersects(&node_to_cpumask(node), cpumask) )
> + node_set(node, d->node_affinity);
> }
>
> sched_set_node_affinity(d, &d->node_affinity);
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
>
--
Juergen Gross Principal Developer Operating Systems
PBG PDG ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html
next prev parent reply other threads:[~2013-09-13 4:31 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-12 15:57 [PATCH] xen: make sure the node-affinity is always updated Dario Faggioli
2013-09-13 4:31 ` Juergen Gross [this message]
2013-09-13 9:52 ` Dario Faggioli
2013-09-26 8:05 ` Jan Beulich
2013-09-13 10:45 ` George Dunlap
2013-09-13 12:43 ` Dario Faggioli
2013-09-13 15:45 ` Dario Faggioli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5232953B.6090900@ts.fujitsu.com \
--to=juergen.gross@ts.fujitsu.com \
--cc=George.Dunlap@eu.citrix.com \
--cc=JBeulich@suse.com \
--cc=dario.faggioli@citrix.com \
--cc=keir@xen.org \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.