From mboxrd@z Thu Jan 1 00:00:00 1970 From: Juergen Gross Subject: Re: [PATCH] xen: make sure the node-affinity is always updated Date: Fri, 13 Sep 2013 06:31:55 +0200 Message-ID: <5232953B.6090900@ts.fujitsu.com> References: <20130912155657.20126.35983.stgit@hit-nxdomain.opendns.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20130912155657.20126.35983.stgit@hit-nxdomain.opendns.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Dario Faggioli Cc: George Dunlap , Keir Fraser , Jan Beulich , xen-devel@lists.xen.org List-Id: xen-devel@lists.xenproject.org On 12.09.2013 17:57, Dario Faggioli wrote: > in particular when a domain moves from a cpupool to another, and > when the list of cpus in a cpupool changes. > > Not doing that may result in the domain's node-affinity mask > (as retrieved by csched_balance_cpumask(..CSCHED_BALANCE_NODE_AFFINITY..) ) > and online mask (as retrieved by cpupool_scheduler_cpumask() ) to have > an empty intersection. > > This in turn means that, in _csched_cpu_pick(), we think it is > legitimate to do a node-affinity load balancing step and, when > running this: > > ... > /* Pick an online CPU from the proper affinity mask */ > csched_balance_cpumask(vc, balance_step, &cpus); > cpumask_and(&cpus, &cpus, online); > ... > > we end up with an empty cpumask (in cpus). At this point, in > the following code: > > .... > /* If present, prefer vc's current processor */ > cpu = cpumask_test_cpu(vc->processor, &cpus) > ? vc->processor > : cpumask_cycle(vc->processor, &cpus); > .... > > an ASSERT (from inside cpumask_cycle() ) triggers like this: > > (XEN) Xen call trace: > (XEN) [] _csched_cpu_pick+0x1d2/0x652 > (XEN) [] csched_cpu_pick+0xe/0x10 > (XEN) [] vcpu_migrate+0x167/0x31e > (XEN) [] cpu_disable_scheduler+0x1c8/0x287 > (XEN) [] cpupool_unassign_cpu_helper+0x20/0xb4 > (XEN) [] continue_hypercall_tasklet_handler+0x4a/0xb1 > (XEN) [] do_tasklet_work+0x78/0xab > (XEN) [] do_tasklet+0x5f/0x8b > (XEN) [] idle_loop+0x57/0x5e > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 1: > (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481 > > It is for example sufficient to have a domain with node-affinity > to NUMA node 1 running, and issueing a `xl cpupool-numa-split' > would make the above happen. That is because, by default, all > the existing domains remain assigned to the first cpupool, and > it now (after the cpupool-numa-split) only includes NUMA node 0. > > This change prevents that, by making sure that the node-affinity > mask is reset and automatically recomputed, if that is the case, > in domain_update_node_affinity(), as well as making sure that > such function is invoked every time that is required. > > In fact, it makes much more sense to reset the node-affinity > when either it and the domain's vcpu-affinity or the cpumap of > the domain's cpupool are inconsistent, rather than not touching > it, as it was before. Also, updating the domain's node affinity > on the cpupool_unassing_cpu() path, as it already happens on > the cpupool_assign_cpu_locked() path, was something we were > missing anyway, even independently from the ASSERT triggering. > > Signed-off-by: Dario Faggioli Acked-by: Juergen Gross > Cc: George Dunlap > Cc: Jan Beulich > Cc: Keir Fraser > Cc: Juergen Gross > --- > xen/common/cpupool.c | 8 ++++++++ > xen/common/domain.c | 40 +++++++++++++++++++++++----------------- > 2 files changed, 31 insertions(+), 17 deletions(-) > > diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c > index 2164a9f..23e461d 100644 > --- a/xen/common/cpupool.c > +++ b/xen/common/cpupool.c > @@ -355,6 +355,14 @@ int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) > atomic_inc(&c->refcnt); > cpupool_cpu_moving = c; > cpumask_clear_cpu(cpu, c->cpu_valid); > + > + rcu_read_lock(&domlist_read_lock); > + for_each_domain_in_cpupool(d, c) > + { > + domain_update_node_affinity(d); > + } > + rcu_read_unlock(&domlist_read_lock); > + > spin_unlock(&cpupool_lock); > > work_cpu = smp_processor_id(); > diff --git a/xen/common/domain.c b/xen/common/domain.c > index 5999779..f708b3c 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -352,7 +352,6 @@ void domain_update_node_affinity(struct domain *d) > cpumask_var_t cpumask; > cpumask_var_t online_affinity; > const cpumask_t *online; > - nodemask_t nodemask = NODE_MASK_NONE; > struct vcpu *v; > unsigned int node; > > @@ -374,28 +373,35 @@ void domain_update_node_affinity(struct domain *d) > cpumask_or(cpumask, cpumask, online_affinity); > } > > - if ( d->auto_node_affinity ) > - { > - /* Node-affinity is automaically computed from all vcpu-affinities */ > - for_each_online_node ( node ) > - if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) > - node_set(node, nodemask); > - > - d->node_affinity = nodemask; > - } > - else > + if ( !d->auto_node_affinity ) > { > /* Node-affinity is provided by someone else, just filter out cpus > * that are either offline or not in the affinity of any vcpus. */ > - nodemask = d->node_affinity; > for_each_node_mask ( node, d->node_affinity ) > if ( !cpumask_intersects(&node_to_cpumask(node), cpumask) ) > - node_clear(node, nodemask);//d->node_affinity); > + node_clear(node, d->node_affinity); > + > + /* > + * If we end up with an empty node-affinity, e.g., because the user > + * specified an incompatible vcpu-affinity, or moved the domain to > + * a different cpupool, reset the node-affinity and re-calculate it > + * (in the body of the if() below). > + * > + * This is necessary to prevent the schedulers from attempting > + * node-affinity load balancing on empty cpumasks, with (potentially) > + * nasty effects (increased overhead or even crash!). > + */ > + if ( nodes_empty(d->node_affinity) ) > + d->auto_node_affinity = 1; > + } > > - /* Avoid loosing track of node-affinity because of a bad > - * vcpu-affinity has been specified. */ > - if ( !nodes_empty(nodemask) ) > - d->node_affinity = nodemask; > + if ( d->auto_node_affinity ) > + { > + /* Node-affinity is automatically computed from all vcpu-affinities */ > + nodes_clear(d->node_affinity); > + for_each_online_node ( node ) > + if ( cpumask_intersects(&node_to_cpumask(node), cpumask) ) > + node_set(node, d->node_affinity); > } > > sched_set_node_affinity(d, &d->node_affinity); > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > > -- Juergen Gross Principal Developer Operating Systems PBG PDG ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html