From: Juergen Gross
Subject: Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
Date: Tue, 07 Jul 2015 13:16:12 +0200
Message-ID: <559BB4FC.1070106@suse.com>
References: <20150703152743.23194.15530.stgit@Solace.station> <20150703154930.23194.20319.stgit@Solace.station>
In-Reply-To: <20150703154930.23194.20319.stgit@Solace.station>
To: Dario Faggioli, xen-devel@lists.xenproject.org
Cc: George Dunlap
List-Id: xen-devel@lists.xenproject.org

On 07/03/2015 05:49 PM, Dario Faggioli wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> times) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ...
> (XEN) Xen call trace:
> (XEN)    [] _csched_cpu_pick+0x156/0x61f
> (XEN)    [] csched_cpu_pick+0xe/0x10
> (XEN)    [] vcpu_migrate+0x18e/0x321
> (XEN)    [] cpu_disable_scheduler+0x1cf/0x2ac
> (XEN)    [] __cpu_disable+0x313/0x36e
> (XEN)    [] take_cpu_down+0x34/0x3b
> (XEN)    [] stopmachine_action+0x70/0x99
> (XEN)    [] do_tasklet_work+0x78/0xab
> (XEN)    [] do_tasklet+0x5e/0x8a
> (XEN)    [] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 15:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
> (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
> root@Zhaman:~# xl create /etc/xen/test.cfg
> root@Zhaman:~# xl cpupool-migrate test Pool-1
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
> Pool-1             12
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU:    12
> ...
> (XEN) Xen call trace:
> (XEN)    [] __cpu_disable+0x317/0x36e
> (XEN)    [] take_cpu_down+0x34/0x3b
> (XEN)    [] stopmachine_action+0x70/0x99
> (XEN)    [] do_tasklet_work+0x78/0xab
> (XEN)    [] do_tasklet+0x5e/0x8a
> (XEN)    [] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Xen BUG at smpboot.c:895
> (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
>  - move all the vcpus to the boot cpu (as the boot cpu is
>    not in the cpupool), in the former;
>  - move the vcpus away from a cpu at all (as that is the
>    only one cpu in the cpupool), in the latter.
>
> Solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situations (like there not
> being any cpu in the pool where to send vcpus) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli

Just 2 minor nits (see below), otherwise:

Acked-by: Juergen Gross

> ---
> Cc: George Dunlap
> Cc: Juergen Gross
> ---
>  xen/common/schedule.c | 109 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e83c666..eac8804 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
>   * Do the actual movemet of a vcpu from old to new CPU. Locks for *both*
>   * CPUs needs to have been taken already when calling this!
>   */
> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> -                      unsigned int new_cpu)
> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> +                       unsigned int new_cpu)
>  {
>      /*
>       * Transfer urgency status to new CPU before switching CPUs, as
> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
>      v->processor = new_cpu;
>  }
>
> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
> +{
> +    unsigned long flags;
> +    unsigned int cpu = v->processor;
> +    spinlock_t *lock, *new_lock;
> +
> +    /*
> +     * When here, the vcpu should be ready for being moved. This means:
> +     *  - both its original and target processor must be quiet;
> +     *  - it must not be marked as currently running;
> +     *  - the proper flag must be set (i.e., no one must have had any
> +     *    chance to reset it).
> +     */
> +    ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> +           is_idle_vcpu(curr_on_cpu(new_cpu)));
> +    ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> +
> +    lock = per_cpu(schedule_data, v->processor).schedule_lock;
> +    new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> +    sched_spin_lock_double(lock, new_lock, &flags);
> +    ASSERT(new_cpu != v->processor);
> +    _vcpu_move(v, cpu, new_cpu);
> +    sched_spin_unlock_double(lock, new_lock, flags);
> +
> +    sched_move_irqs(v);
> +    vcpu_wake(v);
> +}
> +
>  static void vcpu_migrate(struct vcpu *v)
>  {
>      unsigned long flags;
> @@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
>          return;
>      }
>
> -    vcpu_move(v, old_cpu, new_cpu);
> +    _vcpu_move(v, old_cpu, new_cpu);
>
>      sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>      struct vcpu *v;
>      struct cpupool *c;
>      cpumask_t online_affinity;
> -    int    ret = 0;
> +    unsigned int new_cpu;
> +    int ret = 0;
>
>      c = per_cpu(cpupool, cpu);
>      if ( c == NULL )
> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
>                  cpumask_setall(v->cpu_hard_affinity);
>              }
>
> -            if ( v->processor == cpu )
> +            if ( v->processor != cpu )
>              {
> -                set_bit(_VPF_migrating, &v->pause_flags);
> +                /* This vcpu is not on cpu, so we can move on. */
>                  vcpu_schedule_unlock_irqrestore(lock, flags, v);
> -                vcpu_sleep_nosync(v);
> -                vcpu_migrate(v);
> +                continue;
>              }
> -            else
> -                vcpu_schedule_unlock_irqrestore(lock, flags, v);
>
>              /*
> -             * A vcpu active in the hypervisor will not be migratable.
> -             * The caller should try again after releasing and reaquiring
> -             * all locks.
> +             * If we're here, it means that the vcpu is on cpu. Let's see how
> +             * it's best to send it away, depending on whether we are shutting
> +             * down/suspending, or doing cpupool manipulations.
>               */
> -            if ( v->processor == cpu )
> -                ret = -EAGAIN;
> -        }
> +            set_bit(_VPF_migrating, &v->pause_flags);
> +            vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +            vcpu_sleep_nosync(v);
> +
> +            /*
> +             * In case of shutdown/suspend, it is not necessary to ask the
> +             * scheduler to chime in. In fact:
> +             *  * there is no reason for it: the end result we are after is
> +             *    just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> +             *    else', so let's just go for it;
> +             *  * it's wrong, when dealing with a cpupool with only non-boot
> +             *    pcpus, as the scheduler will always fail to send the vcpus
> +             *    away from the last online (non boot) pcpu!

I'd add a comment that in the shutdown/suspend case all domains are being
paused, so we can only be active in dom0/Pool-0.

> +             *
> +             * Therefore, in the shutdown/suspend case, let's just pick one
> +             * of the still online pcpus, and send everyone there. Ideally,
> +             * we would pick up the boot pcpu directly, but we don't know
> +             * which one it is.
> +             *
> +             * OTOH, if the system is still live, and we are here because of
> +             * cpupool manipulations:
> +             *  * it would be best to call the scheduler, as that would serve
> +             *    as a re-evaluation of the placement of the vcpu, taking into
> +             *    account the modified cpupool configuration;
> +             *  * the scheduler will always find a suitable solution, or
> +             *    things would have failed before getting in here.
> +             *
> +             * Therefore, in the cpupool manipulation case, let's just ask the
> +             * scheduler to do its job, via calling vcpu_migrate().
> +             */
> +            if ( unlikely(system_state == SYS_STATE_suspend) )
> +            {
> +                /*
> +                 * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> +                 * actually helps us to achieve our ultimate goal quicker.
> +                 */
> +                cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));

What about an ASSERT/BUG regarding a non-empty online_affinity? (A rough
sketch of what I have in mind is at the bottom of this mail.)

Juergen

> +                new_cpu = cpumask_first(&online_affinity);
> +                vcpu_move(v, new_cpu);
> +            }
> +            else
> +            {
> +                /*
> +                 * The only caveat, in this case, is that if the vcpu is active
> +                 * in the hypervisor, it won't be migratable. In this case,
> +                 * the caller should try again after releasing and reacquiring
> +                 * all locks.
> +                 */
> +                vcpu_migrate(v);
>
> +                if ( v->processor == cpu )
> +                    ret = -EAGAIN;
> +            }
> +        }
>          domain_update_node_affinity(d);
>
>
>
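As said above, here is a rough and untested sketch of how the
suspend/shutdown branch could end up looking with both nits addressed. It
is only meant to illustrate the idea: the exact comment wording, and the
choice between ASSERT() and BUG_ON() for the non-empty online_affinity
check, is of course Dario's call.

            if ( unlikely(system_state == SYS_STATE_suspend) )
            {
                /*
                 * We are suspending/shutting down, so all domains are being
                 * paused already: only dom0/Pool-0 can still have active
                 * vcpus at this point.
                 *
                 * The boot pcpu is, usually, pcpu #0, so using
                 * cpumask_first() actually helps us to achieve our ultimate
                 * goal quicker.
                 */
                cpumask_andnot(&online_affinity, &cpu_online_map,
                               cpumask_of(cpu));

                /* There must be another online pcpu left to receive the vcpu. */
                ASSERT(!cpumask_empty(&online_affinity));

                new_cpu = cpumask_first(&online_affinity);
                vcpu_move(v, new_cpu);
            }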