From: Juergen Gross
Subject: Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
Date: Tue, 07 Jul 2015 13:16:12 +0200
Message-ID: <559BB4FC.1070106@suse.com>
References: <20150703152743.23194.15530.stgit@Solace.station> <20150703154930.23194.20319.stgit@Solace.station>
In-Reply-To: <20150703154930.23194.20319.stgit@Solace.station>
To: Dario Faggioli, xen-devel@lists.xenproject.org
Cc: George Dunlap
List-Id: xen-devel@lists.xenproject.org

On 07/03/2015 05:49 PM, Dario Faggioli wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> times) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ...
> (XEN) Xen call trace:
> (XEN)    [] _csched_cpu_pick+0x156/0x61f
> (XEN)    [] csched_cpu_pick+0xe/0x10
> (XEN)    [] vcpu_migrate+0x18e/0x321
> (XEN)    [] cpu_disable_scheduler+0x1cf/0x2ac
> (XEN)    [] __cpu_disable+0x313/0x36e
> (XEN)    [] take_cpu_down+0x34/0x3b
> (XEN)    [] stopmachine_action+0x70/0x99
> (XEN)    [] do_tasklet_work+0x78/0xab
> (XEN)    [] do_tasklet+0x5e/0x8a
> (XEN)    [] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 15:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
> (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
> root@Zhaman:~# xl create /etc/xen/test.cfg
> root@Zhaman:~# xl cpupool-migrate test Pool-1
> root@Zhaman:~# xl cpupool-list -c
> Name               CPU list
> Pool-0             0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
> Pool-1             12
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU:    12
> ...
> (XEN) Xen call trace:
> (XEN)    [] __cpu_disable+0x317/0x36e
> (XEN)    [] take_cpu_down+0x34/0x3b
> (XEN)    [] stopmachine_action+0x70/0x99
> (XEN)    [] do_tasklet_work+0x78/0xab
> (XEN)    [] do_tasklet+0x5e/0x8a
> (XEN)    [] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Xen BUG at smpboot.c:895
> (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
>  - move all the vcpus to the boot cpu (as the boot cpu is
>    not in the cpupool), in the former;
>  - move the vcpus away from a cpu at all (as that is the
>    only one cpu in the cpupool), in the latter.
>
> Solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situations (like there not
> being any cpu in the pool where to send vcpus) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli

Just 2 minor nits (see below), otherwise:

Acked-by: Juergen Gross

> ---
> Cc: George Dunlap
> Cc: Juergen Gross
> ---
>  xen/common/schedule.c | 109 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e83c666..eac8804 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
>   * Do the actual movemet of a vcpu from old to new CPU. Locks for *both*
>   * CPUs needs to have been taken already when calling this!
>   */
> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> -                      unsigned int new_cpu)
> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> +                       unsigned int new_cpu)
>  {
>      /*
>       * Transfer urgency status to new CPU before switching CPUs, as
> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
>      v->processor = new_cpu;
>  }
>
> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
> +{
> +    unsigned long flags;
> +    unsigned int cpu = v->processor;
> +    spinlock_t *lock, *new_lock;
> +
> +    /*
> +     * When here, the vcpu should be ready for being moved. This means:
> +     *  - both its original and target processor must be quiet;
> +     *  - it must not be marked as currently running;
> +     *  - the proper flag must be set (i.e., no one must have had any
> +     *    chance to reset it).
> +     */
> +    ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> +           is_idle_vcpu(curr_on_cpu(new_cpu)));
> +    ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> +
> +    lock = per_cpu(schedule_data, v->processor).schedule_lock;
> +    new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> +    sched_spin_lock_double(lock, new_lock, &flags);
> +    ASSERT(new_cpu != v->processor);
> +    _vcpu_move(v, cpu, new_cpu);
> +    sched_spin_unlock_double(lock, new_lock, flags);
> +
> +    sched_move_irqs(v);
> +    vcpu_wake(v);
> +}
> +
>  static void vcpu_migrate(struct vcpu *v)
>  {
>      unsigned long flags;
> @@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
>          return;
>      }
>
> -    vcpu_move(v, old_cpu, new_cpu);
> +    _vcpu_move(v, old_cpu, new_cpu);
>
>      sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
>      struct vcpu *v;
>      struct cpupool *c;
>      cpumask_t online_affinity;
> -    int    ret = 0;
> +    unsigned int new_cpu;
> +    int ret = 0;
>
>      c = per_cpu(cpupool, cpu);
>      if ( c == NULL )
> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
>                  cpumask_setall(v->cpu_hard_affinity);
>              }
>
> -            if ( v->processor == cpu )
> +            if ( v->processor != cpu )
>              {
> -                set_bit(_VPF_migrating, &v->pause_flags);
> +                /* This vcpu is not on cpu, so we can move on. */
>                  vcpu_schedule_unlock_irqrestore(lock, flags, v);
> -                vcpu_sleep_nosync(v);
> -                vcpu_migrate(v);
> +                continue;
>              }
> -            else
> -                vcpu_schedule_unlock_irqrestore(lock, flags, v);
>
>              /*
> -             * A vcpu active in the hypervisor will not be migratable.
> -             * The caller should try again after releasing and reaquiring
> -             * all locks.
> +             * If we're here, it means that the vcpu is on cpu. Let's see how
> +             * it's best to send it away, depending on whether we are shutting
> +             * down/suspending, or doing cpupool manipulations.
>               */
> -            if ( v->processor == cpu )
> -                ret = -EAGAIN;
> -        }
> +            set_bit(_VPF_migrating, &v->pause_flags);
> +            vcpu_schedule_unlock_irqrestore(lock, flags, v);
> +            vcpu_sleep_nosync(v);
> +
> +            /*
> +             * In case of shutdown/suspend, it is not necessary to ask the
> +             * scheduler to chime in. In fact:
> +             *  * there is no reason for it: the end result we are after is
> +             *    just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> +             *    else', so let's just go for it;
> +             *  * it's wrong, when dealing with a cpupool with only non-boot
> +             *    pcpus, as the scheduler will always fail to send the vcpus
> +             *    away from the last online (non boot) pcpu!

I'd add a comment that in the shutdown/suspend case all domains are being
paused, so we can only be active in dom0/Pool-0.

> +             *
> +             * Therefore, in the shutdown/suspend case, let's just pick one
> +             * of the still online pcpus, and send everyone there. Ideally,
> +             * we would pick up the boot pcpu directly, but we don't know
> +             * which one it is.
> +             *
> +             * OTOH, if the system is still live, and we are here because of
> +             * cpupool manipulations:
> +             *  * it would be best to call the scheduler, as that would serve
> +             *    as a re-evaluation of the placement of the vcpu, taking into
> +             *    account the modified cpupool configuration;
> +             *  * the scheduler will always find a suitable solution, or
> +             *    things would have failed before getting in here.
> +             *
> +             * Therefore, in the cpupool manipulation case, let's just ask the
> +             * scheduler to do its job, via calling vcpu_migrate().
> +             */
> +            if ( unlikely(system_state == SYS_STATE_suspend) )
> +            {
> +                /*
> +                 * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> +                 * actually helps us to achieve our ultimate goal quicker.
> +                 */
> +                cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));

What about an ASSERT/BUG regarding a non-empty online_affinity? (A rough
sketch of what I have in mind is at the bottom of this mail.)

Juergen

> +                new_cpu = cpumask_first(&online_affinity);
> +                vcpu_move(v, new_cpu);
> +            }
> +            else
> +            {
> +                /*
> +                 * The only caveat, in this case, is that if the vcpu is active
> +                 * in the hypervisor, it won't be migratable. In this case,
> +                 * the caller should try again after releasing and reacquiring
> +                 * all locks.
> +                 */
> +                vcpu_migrate(v);
>
> +                if ( v->processor == cpu )
> +                    ret = -EAGAIN;
> +            }
> +        }
>          domain_update_node_affinity(d);
>
>
>
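As said above, here is a rough and untested sketch of how the
suspend/shutdown branch could end up looking with both nits addressed. It
is only meant to illustrate the idea: the exact comment wording, and the
choice between ASSERT() and BUG_ON() for the non-empty online_affinity
check, is of course Dario's call.

            if ( unlikely(system_state == SYS_STATE_suspend) )
            {
                /*
                 * We are suspending/shutting down, so all domains are being
                 * paused already: only dom0/Pool-0 can still have active
                 * vcpus at this point.
                 *
                 * The boot pcpu is, usually, pcpu #0, so using
                 * cpumask_first() actually helps us to achieve our ultimate
                 * goal quicker.
                 */
                cpumask_andnot(&online_affinity, &cpu_online_map,
                               cpumask_of(cpu));

                /* There must be another online pcpu left to receive the vcpu. */
                ASSERT(!cpumask_empty(&online_affinity));

                new_cpu = cpumask_first(&online_affinity);
                vcpu_move(v, new_cpu);
            }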