From: George Dunlap <george.dunlap@eu.citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
xen-devel@lists.xen.org
Subject: Re: [PATCH] xen: sched_credit: filter node-affinity mask against online cpus
Date: Mon, 16 Sep 2013 15:00:09 +0100 [thread overview]
Message-ID: <52370EE9.3040005@eu.citrix.com> (raw)
In-Reply-To: <20130913160856.25674.4965.stgit@hit-nxdomain.opendns.com>
On 13/09/13 17:09, Dario Faggioli wrote:
> in _csched_cpu_pick(), as not doing so may result in the domain's
> node-affinity mask (as retrieved by csched_balance_cpumask() )
> and online mask (as retrieved by cpupool_scheduler_cpumask() )
> having an empty intersection.
>
> Therefore, when attempting a node-affinity load balancing step
> and running this:
>
> ...
> /* Pick an online CPU from the proper affinity mask */
> csched_balance_cpumask(vc, balance_step, &cpus);
> cpumask_and(&cpus, &cpus, online);
> ...
>
> we end up with an empty cpumask (in cpus). At this point, in
> the following code:
>
> ....
> /* If present, prefer vc's current processor */
> cpu = cpumask_test_cpu(vc->processor, &cpus)
> ? vc->processor
> : cpumask_cycle(vc->processor, &cpus);
> ....
>
> an ASSERT (from inside cpumask_cycle() ) triggers like this:
>
> (XEN) Xen call trace:
> (XEN) [<ffff82d08011b124>] _csched_cpu_pick+0x1d2/0x652
> (XEN) [<ffff82d08011b5b2>] csched_cpu_pick+0xe/0x10
> (XEN) [<ffff82d0801232de>] vcpu_migrate+0x167/0x31e
> (XEN) [<ffff82d0801238cc>] cpu_disable_scheduler+0x1c8/0x287
> (XEN) [<ffff82d080101b3f>] cpupool_unassign_cpu_helper+0x20/0xb4
> (XEN) [<ffff82d08010544f>] continue_hypercall_tasklet_handler+0x4a/0xb1
> (XEN) [<ffff82d080127793>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080127a70>] do_tasklet+0x5f/0x8b
> (XEN) [<ffff82d080158985>] idle_loop+0x57/0x5e
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at /home/dario/Sources/xen/xen/xen.git/xen/include/xe:16481
>
> It is for example sufficient to have a domain with node-affinity
> to NUMA node 1 running, and issueing a `xl cpupool-numa-split'
> would make the above happen. That is because, by default, all
> the existing domains remain assigned to the first cpupool, and
> it now (after the cpupool-numa-split) only includes NUMA node 0.
>
> This change prevents that by generalizing the function used
> for figuring out whether a node-affinity load balancing step
> is legit or not. This way we can, in _csched_cpu_pick(),
> figure out early enough that the mask would end up empty,
> skip the step all together and avoid the splat.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
> This was originally submitted (yesterday) as "xen: make sure the node-affinity
> is always updated". After discussing it with George, I got convinced that a fix
> like this is more appropriate, since it deals with the issue in the scheduler,
> without making a mess out of the user interface.
>
> This is therefore not being posted as a real v2 of that patch because the
> approach radically changed.
> ---
> xen/common/sched_credit.c | 41 ++++++++++++++++++++++++++++++-----------
> 1 file changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index dbe6de6..d76cd88 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -296,15 +296,25 @@ static void csched_set_node_affinity(
> * vcpu-affinity balancing is always necessary and must never be skipped.
> * OTOH, if a domain's node-affinity is said to be automatically computed
> * (or if it just spans all the nodes), we can safely avoid dealing with
> - * node-affinity entirely. Ah, node-affinity is also deemed meaningless
> - * in case it has empty intersection with the vcpu's vcpu-affinity, as it
> - * would mean trying to schedule it on _no_ pcpu!
> + * node-affinity entirely.
> + *
> + * Node-affinity is also deemed meaningless in case it has empty
> + * intersection with mask, to cover the cases where using the node-affinity
> + * mask seems legit, but would instead led to trying to schedule the vcpu
> + * on _no_ pcpu! Typical use cases are for mask to be equal to the vcpu's
> + * vcpu-affinity, or to the && of vcpu-affinity and the set of online cpus
> + * in the domain's cpupool.
> */
> -#define __vcpu_has_node_affinity(vc) \
> - ( !(cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask) \
> - || !cpumask_intersects(vc->cpu_affinity, \
> - CSCHED_DOM(vc->domain)->node_affinity_cpumask) \
> - || vc->domain->auto_node_affinity == 1) )
> +static inline int __vcpu_has_node_affinity(struct vcpu *vc, cpumask_t *mask)
> +{
> + if ( vc->domain->auto_node_affinity == 1
> + || cpumask_full(CSCHED_DOM(vc->domain)->node_affinity_cpumask)
> + || !cpumask_intersects(CSCHED_DOM(vc->domain)->node_affinity_cpumask,
> + mask) )
> + return 0;
> +
> + return 1;
> +}
>
> /*
> * Each csched-balance step uses its own cpumask. This function determines
> @@ -393,7 +403,8 @@ __runq_tickle(unsigned int cpu, struct csched_vcpu *new)
> int new_idlers_empty;
>
> if ( balance_step == CSCHED_BALANCE_NODE_AFFINITY
> - && !__vcpu_has_node_affinity(new->vcpu) )
> + && !__vcpu_has_node_affinity(new->vcpu,
> + new->vcpu->cpu_affinity) )
> continue;
>
> /* Are there idlers suitable for new (for this balance step)? */
> @@ -627,10 +638,18 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit)
> int balance_step;
>
> online = cpupool_scheduler_cpumask(vc->domain->cpupool);
> + cpumask_and(&cpus, vc->cpu_affinity, online);
> +
> for_each_csched_balance_step( balance_step )
> {
> + /*
> + * We filter the node-affinity mask against
> + * [vc->cpu_affinity && online] here to avoid problems if all
> + * the cpus in in the node-affinity mask are offline (e.g.,
> + * because the domain moved to a different cpupool).
> + */
It's probably worth mentioning that the reason you do the cpumask_and
here and not in the other place this is called is that cpumask_cycle is
called after this one (which will choke on an empy mask), but not in the
other place it's called.
Other than that, looks good -- thanks.
-George
next prev parent reply other threads:[~2013-09-16 14:00 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-13 16:09 [PATCH] xen: sched_credit: filter node-affinity mask against online cpus Dario Faggioli
2013-09-16 14:00 ` George Dunlap [this message]
2013-09-16 15:23 ` Dario Faggioli
2013-09-16 15:27 ` George Dunlap
2013-09-17 15:14 ` Dario Faggioli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52370EE9.3040005@eu.citrix.com \
--to=george.dunlap@eu.citrix.com \
--cc=JBeulich@suse.com \
--cc=dario.faggioli@citrix.com \
--cc=keir@xen.org \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.