All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juergen Gross <jgross@suse.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Subject: Re: [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
Date: Tue, 07 Jul 2015 13:30:34 +0200	[thread overview]
Message-ID: <559BB85A.10708@suse.com> (raw)
In-Reply-To: <20150703154937.23194.24091.stgit@Solace.station>

On 07/03/2015 05:49 PM, Dario Faggioli wrote:
> And this time, do it right. In fact, a similar change was
> attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
> on the cpupool_unassign_cpu() path"). But that was buggy, and got
> reverted with 8395b67ab0b8a86.
>
> However, even though reverting was the right thing to do, it
> remains true that:
>   - calling the function is better done in the cpupool cpu removal
>     code, even if just for simmetry with the cpupool cpu adding path;
>   - it is not necessary to call it during cpu teardown (for suspend
>     or shutdown) code as we either are going down and will never
>     come up (shutdown) or, when coming up, we want everything to be
>     as before the tearing down process started, and so we would just
>     undo any update made during the process.
>   - calling it from the teardown path is not only unnecessary, but
>     it can trigger an ASSERT(), in case we get, during the process,
>     to remove the last online pcpu of a domain's node affinity:
>
>    (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
>    (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
>    ... ... ...
>    (XEN) Xen call trace:
>    (XEN)    [<ffff82d0801055b9>] domain_update_node_affinity+0x113/0x240
>    (XEN)    [<ffff82d08012e676>] cpu_disable_scheduler+0x334/0x3f2
>    (XEN)    [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
>    (XEN)    [<ffff82d080101424>] take_cpu_down+0x34/0x3b
>    (XEN)    [<ffff82d080130ad9>] stopmachine_action+0x70/0x99
>    (XEN)    [<ffff82d08013274f>] do_tasklet_work+0x78/0xab
>    (XEN)    [<ffff82d080132a85>] do_tasklet+0x5e/0x8a
>    (XEN)    [<ffff82d08016478c>] idle_loop+0x56/0x6b
>    (XEN)
>    (XEN)
>    (XEN) ****************************************
>    (XEN) Panic on CPU 12:
>    (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
>    (XEN) ****************************************
>
> Therefore, for all these reasons, move the call from
> cpu_disable_schedule() to cpupool_unassign_cpu_helper().
>
> While there, add some sanity checking (in the latter function), and
> make sure that scanning the domain list is done with domlist_read_lock
> held, at least when the system is 'live'.
>
> I re-tested the scenario described in here:
>   http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
>
> which is what led to the revert of 93be8285a79c6, and that is
> working ok after this commit.
>
> Signed-off-by: <dario.faggioli@citrix.com>

Acked-by: Juergen Gross <jgross@suse.com>

> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
>   xen/common/cpupool.c  |   18 ++++++++++++++++++
>   xen/common/schedule.c |    7 ++++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..79bcb21 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
>   static long cpupool_unassign_cpu_helper(void *info)
>   {
>       int cpu = cpupool_moving_cpu;
> +    struct cpupool *c = info;
> +    struct domain *d;
>       long ret;
>
>       cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
>                       cpupool_cpu_moving->cpupool_id, cpu);
>
>       spin_lock(&cpupool_lock);
> +    if ( c != cpupool_cpu_moving )
> +    {
> +        ret = -EBUSY;
> +        goto out;
> +    }
> +
> +    /*
> +     * We need this for scanning the domain list, both in
> +     * cpu_disable_scheduler(), and at the bottom of this function.
> +     */
> +    rcu_read_lock(&domlist_read_lock);
>       ret = cpu_disable_scheduler(cpu);
>       cpumask_set_cpu(cpu, &cpupool_free_cpus);
>       if ( !ret )
> @@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info)
>           cpupool_cpu_moving = NULL;
>       }
>
> +    for_each_domain_in_cpupool(d, c)
> +    {
> +        domain_update_node_affinity(d);
> +    }
> +    rcu_read_unlock(&domlist_read_lock);
>   out:
>       spin_unlock(&cpupool_lock);
>       cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index eac8804..a1840c9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -652,6 +652,12 @@ int cpu_disable_scheduler(unsigned int cpu)
>       if ( c == NULL )
>           return ret;
>
> +    /*
> +     * We'd need the domain RCU lock, but:
> +     *  - when we are called from cpupool code, it's acquired there already;
> +     *  - when we are called for CPU teardown, we're in stop-machine context,
> +     *    so that's not be a problem.
> +     */
>       for_each_domain_in_cpupool ( d, c )
>       {
>           for_each_vcpu ( d, v )
> @@ -741,7 +747,6 @@ int cpu_disable_scheduler(unsigned int cpu)
>                       ret = -EAGAIN;
>               }
>           }
> -        domain_update_node_affinity(d);
>       }
>
>       return ret;
>
>

  reply	other threads:[~2015-07-07 11:30 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-03 15:49 [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases Dario Faggioli
2015-07-03 15:49 ` [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function Dario Faggioli
2015-07-08 14:47   ` George Dunlap
2015-07-03 15:49 ` [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu " Dario Faggioli
2015-07-08 14:56   ` George Dunlap
2015-07-08 15:09     ` Dario Faggioli
2015-07-03 15:49 ` [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler() Dario Faggioli
2015-07-07 11:16   ` Juergen Gross
2015-07-08 15:13     ` Dario Faggioli
2015-07-09 10:24       ` Dario Faggioli
2015-07-09 10:45         ` Juergen Gross
2015-07-15 14:54           ` Dario Faggioli
2015-07-15 15:07             ` Juergen Gross
2015-07-08 16:01   ` George Dunlap
2015-07-08 16:37     ` Dario Faggioli
2015-07-09 10:33       ` George Dunlap
2015-07-03 15:49 ` [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Dario Faggioli
2015-07-07 11:30   ` Juergen Gross [this message]
2015-07-08 16:19   ` George Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=559BB85A.10708@suse.com \
    --to=jgross@suse.com \
    --cc=dario.faggioli@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.