From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: [PATCH] xen: sched_credit: filter node-affinity mask against online cpus
Date: Mon, 16 Sep 2013 15:00:09 +0100
Message-ID: <52370EE9.3040005@eu.citrix.com>
References: <20130913160856.25674.4965.stgit@hit-nxdomain.opendns.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20130913160856.25674.4965.stgit@hit-nxdomain.opendns.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli
Cc: Keir Fraser, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 13/09/13 17:09, Dario Faggioli wrote:
> in _csched_cpu_pick(), as not doing so may result in the domain's
> node-affinity mask (as retrieved by csched_balance_cpumask()) and
> online mask (as retrieved by cpupool_scheduler_cpumask()) having an
> empty intersection.
>
> Therefore, when attempting a node-affinity load balancing step and
> running this:
>
>     ...
>     /* Pick an online CPU from the proper affinity mask */
>     csched_balance_cpumask(vc, balance_step, &cpus);
>     cpumask_and(&cpus, &cpus, online);
>     ...
>
> we end up with an empty cpumask (in cpus). At this point, in the
> following code:
>
>     ....
>     /* If present, prefer vc's current processor */
>     cpu = cpumask_test_cpu(vc->processor, &cpus)
>             ? vc->processor
>             : cpumask_cycle(vc->processor, &cpus);
>     ....
>
> an ASSERT (from inside cpumask_cycle()) triggers like this:
>
> (XEN) Xen call trace:
> (XEN)    [] _csched_cpu_pick+0x1d2/0x652
> (XEN)    [] csched_cpu_pick+0xe/0x10
> (XEN)    [] vcpu_migrate+0x167/0x31e
> (XEN)    [] cpu_disable_scheduler+0x1c8/0x287
> (XEN)    [] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN)    [] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [] do_tasklet_work+0x78/0xab
> (XEN)    [] do_tasklet+0x5f/0x8b
> (XEN)    [] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
>
> It is, for example, sufficient to have a domain with node-affinity to
> NUMA node 1 running: issuing an `xl cpupool-numa-split' then makes the
> above happen. That is because, by default, all the existing domains
> remain assigned to the first cpupool, which now (after the
> cpupool-numa-split) only includes NUMA node 0.
>
> This change prevents that by generalizing the function used for
> figuring out whether a node-affinity load balancing step is legit or
> not. This way we can, in _csched_cpu_pick(), figure out early enough
> that the mask would end up empty, skip the step altogether and avoid
> the splat.
>
> Signed-off-by: Dario Faggioli
> Cc: George Dunlap
> Cc: Jan Beulich
> Cc: Keir Fraser
> ---
> This was originally submitted (yesterday) as "xen: make sure the
> node-affinity is always updated". After discussing it with George, I
> got convinced that a fix like this is more appropriate, since it deals
> with the issue in the scheduler, without making a mess out of the
> user interface.
>
> This is therefore not being posted as a real v2 of that patch because
> the approach radically changed.
> ---
>  xen/common/sched_credit.c |   41 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index dbe6de6..d76cd88 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -296,15 +296,25 @@ static void csched_set_node_affinity(
>   * vcpu-affinity balancing is always necessary and must never be skipped.
>   * OTOH, if a domain's node-affinity is said to be automatically computed
>   * (or if it just spans all the nodes), we can safely avoid dealing with
> - * node-affinity entirely. Ah, node-affinity is also deemed meaningless
> - * in case it has empty intersection with the vcpu's vcpu-affinity, as it
> - * would mean trying to schedule it on _no_ pcpu!
> + * node-affinity entirely.
> + *
> + * Node-affinity is also deemed meaningless in case it has an empty
> + * intersection with mask, to cover the cases where using the node-affinity
> + * mask seems legit, but would instead lead to trying to schedule the vcpu
> + * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu's
> + * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus
> + * in the domain's cpupool.
>   */
> -#define __vcpu_has_node_affinity(vc) \
> -    ( !(cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask) \
> -        || !cpumask_intersects(vc->cpu_affinity, \
> -                               CSCHED_DOM(vc->domain)->node_affinity_cpumask) \
> -        || vc->domain->auto_node_affinity == 1) )
> +static inline int __vcpu_has_node_affinity(struct vcpu *vc, cpumask_t *mask)
> +{
> +    if ( vc->domain->auto_node_affinity == 1
> +         || cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask)
> +         || !cpumask_intersects(CSCHED_DOM(vc->domain)->node_affinity_cpumask,
> +                                mask) )
> +        return 0;
> +
> +    return 1;
> +}
>
>  /*
>   * Each csched-balance step uses its own cpumask. This function determines
> @@ -393,7 +403,8 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
>              int new_idlers_empty;
>
>              if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
> -                 && !__vcpu_has_node_affinity(new->vcpu) )
> +                 && !__vcpu_has_node_affinity(new->vcpu,
> +                                              new->vcpu->cpu_affinity) )
>                  continue;
>
>              /* Are there idlers suitable for new (for this balance step)? */
> @@ -627,10 +638,18 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
>      int balance_step;
>
>      online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> +    cpumask_and(&cpus, vc->cpu_affinity, online);
> +
>      for_each_csched_balance_step( balance_step )
>      {
> +        /*
> +         * We filter the node-affinity mask against
> +         * [vc->cpu_affinity && online] here to avoid problems if all
> +         * the cpus in the node-affinity mask are offline (e.g.,
> +         * because the domain moved to a different cpupool).
> +         */

It's probably worth mentioning that the reason you do the cpumask_and
here, and not at the other place this is called, is that cpumask_cycle
(which will choke on an empty mask) is called after this one, but not
after the other call site.

Other than that, looks good -- thanks.

 -George