From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: [PATCH v2] xen: sched_credit: filter node-affinity mask against online cpus
Date: Thu, 19 Sep 2013 11:46:51 +0100
Message-ID: <523AD61B.7000102@eu.citrix.com>
References: <20130917151644.2240.68085.stgit@hit-nxdomain.opendns.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20130917151644.2240.68085.stgit@hit-nxdomain.opendns.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli
Cc: Keir Fraser, Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 17/09/13 16:16, Dario Faggioli wrote:
> in _csched_cpu_pick(), as not doing so may result in the domain's
> node-affinity mask (as retrieved by csched_balance_cpumask() )
> and online mask (as retrieved by cpupool_scheduler_cpumask() )
> having an empty intersection.
>
> Therefore, when attempting a node-affinity load balancing step
> and running this:
>
>     ...
>     /* Pick an online CPU from the proper affinity mask */
>     csched_balance_cpumask(vc, balance_step, &cpus);
>     cpumask_and(&cpus, &cpus, online);
>     ...
>
> we end up with an empty cpumask (in cpus). At this point, in
> the following code:
>
>     ....
>     /* If present, prefer vc's current processor */
>     cpu = cpumask_test_cpu(vc->processor, &cpus)
>             ? vc->processor
>             : cpumask_cycle(vc->processor, &cpus);
>     ....
>
> an ASSERT (from inside cpumask_cycle() ) triggers like this:
>
> (XEN) Xen call trace:
> (XEN)    [] _csched_cpu_pick+0x1d2/0x652
> (XEN)    [] csched_cpu_pick+0xe/0x10
> (XEN)    [] vcpu_migrate+0x167/0x31e
> (XEN)    [] cpu_disable_scheduler+0x1c8/0x287
> (XEN)    [] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN)    [] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN)    [] do_tasklet_work+0x78/0xab
> (XEN)    [] do_tasklet+0x5f/0x8b
> (XEN)    [] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
>
> For example, it is sufficient to have a domain with node-affinity
> to NUMA node 1 running: issuing `xl cpupool-numa-split' then
> makes the above happen. That is because, by default, all
> the existing domains remain assigned to the first cpupool, and
> it now (after the cpupool-numa-split) only includes NUMA node 0.
>
> This change prevents that by generalizing the function used
> for figuring out whether a node-affinity load balancing step
> is legit or not. This way we can, in _csched_cpu_pick(),
> figure out early enough that the mask would end up empty,
> skip the step altogether and avoid the splat.
>
> Signed-off-by: Dario Faggioli

Reviewed-by: George Dunlap
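
For anyone following along, the failure mode in the changelog can be reproduced in a small standalone toy model. This is only a sketch and not the Xen code: toy_cpumask_t, toy_mask_cycle(), toy_step_is_usable() and toy_cpu_pick() are hypothetical names made up for illustration, the masks are plain 64-bit integers, and the fallback is simplified to "all online CPUs of the pool" rather than a real cpu-affinity balancing step. It assumes 8 CPUs and a non-empty online mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one bit per CPU. This is NOT Xen's cpumask_t. */
#define NR_CPU_IDS 8u

typedef uint64_t toy_cpumask_t;

/*
 * Hypothetical stand-in for a "cycle" helper: return the next CPU set in
 * 'mask' after 'cpu', wrapping around. For an empty mask there is nothing
 * to return, so the out-of-range value NR_CPU_IDS comes back.
 */
static unsigned int toy_mask_cycle(unsigned int cpu, toy_cpumask_t mask)
{
    for (unsigned int i = 1; i <= NR_CPU_IDS; i++) {
        unsigned int cand = (cpu + i) % NR_CPU_IDS;

        if (mask & ((toy_cpumask_t)1 << cand))
            return cand;
    }
    return NR_CPU_IDS;
}

/*
 * The early check: a node-affinity step only makes sense if the
 * node-affinity mask and the pool's online mask actually overlap.
 */
static bool toy_step_is_usable(toy_cpumask_t node_affinity,
                               toy_cpumask_t online)
{
    return (node_affinity & online) != 0;
}

/*
 * Pick a CPU. The node-affinity step is skipped when its mask would end
 * up empty; the simplified fallback here is every online CPU.
 */
static unsigned int toy_cpu_pick(unsigned int cur,
                                 toy_cpumask_t node_affinity,
                                 toy_cpumask_t online)
{
    toy_cpumask_t cpus = online;

    assert(cur < NR_CPU_IDS);

    if (toy_step_is_usable(node_affinity, online))
        cpus = node_affinity & online;

    /* If present, prefer the current processor. */
    if (cpus & ((toy_cpumask_t)1 << cur))
        return cur;

    return toy_mask_cycle(cur, cpus);
}
```

With node 0 modelled as CPUs 0-3 (0x0F) and node 1 as CPUs 4-7 (0xF0), a domain with affinity 0xF0 left in a pool whose online mask is 0x0F gives an empty intersection: cycling over it directly yields the out-of-range NR_CPU_IDS, whereas toy_cpu_pick() skips the step and returns a CPU from the pool.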