* [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases
@ 2015-07-03 15:49 Dario Faggioli
2015-07-03 15:49 ` [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function Dario Faggioli
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Dario Faggioli @ 2015-07-03 15:49 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap, Juergen Gross
More specifically, this time, of cases when we try to shut the system down or
suspend it in the following situations:
- when the boot cpu (i.e., most of the times, cpu 0) is not assigned to any
cpupool,
- when a (non default) cpupool only has one cpu (and that is not the boot
cpu).
In both these cases, without this series, Xen crashes. The splats are in
the changelogs of the relevant patches in the series.
The core of the fix is recognising that cpu_disable_scheduler() currently
serves two purposes, and is hence called from two different code paths:
(1) when a cpu is removed from a pool, and (2) during teardown of a cpu,
for shutdown or suspend.
In turn, cpu_disable_scheduler() calls the scheduler to move the vcpus
away from the cpu that is either being removed or going offline.
Well, while in the former case (cpupool manipulation) that works fine, and is
the right thing to do, in the latter (shutdown or suspend), it just can't
work... And in fact we Oops! :-/
When the final goal is to move all the vcpus to the boot cpu (as that
is the only one that is not actually going offline, or at least is doing so in
a special way), things simply can't work if the boot cpu is not a possible
choice for the scheduler.
Therefore, in this series (more specifically, in patch 3),
cpu_disable_scheduler() is changed in such a way that the two use cases it
serves are better identified, isolated and handled properly.
About other patches: 1 and 2 are preparatory changes for 3, and they're pure
code motion and refactoring.
Patch 4 fixes another crash, and makes the code look a bit better, in terms of
improved symmetry between the operations of removing and adding a cpu to a
cpupool.[*]
Thanks and Regards,
Dario
[*] Note that this was attempted already, but in the wrong way, and hence such
change was then reverted. Now it's correct (and necessary!). See the changelog
for more details.
---
Dario Faggioli (4):
xen: sched: factor the code for taking two runq locks in a function
xen: sched: factor code that moves a vcpu to a new pcpu in a function
xen: sched: reorganize cpu_disable_scheduler()
xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
xen/common/cpupool.c | 18 ++++
xen/common/schedule.c | 222 ++++++++++++++++++++++++++++++++++++-------------
2 files changed, 179 insertions(+), 61 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function
2015-07-03 15:49 [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases Dario Faggioli
@ 2015-07-03 15:49 ` Dario Faggioli
2015-07-08 14:47 ` George Dunlap
2015-07-03 15:49 ` [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu " Dario Faggioli
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2015-07-03 15:49 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap
No functional change intended.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
xen/common/schedule.c | 64 ++++++++++++++++++++++++++++---------------------
1 file changed, 36 insertions(+), 28 deletions(-)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ecf1545..26e8430 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -185,6 +185,38 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
return state.time[RUNSTATE_running];
}
+/*
+ * If locks are different, take the one with the lower address first.
+ * This avoids dead- or live-locks when this code is running on both
+ * cpus at the same time.
+ */
+static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
+ unsigned long *flags)
+{
+ if ( lock1 == lock2 )
+ {
+ spin_lock_irqsave(lock1, *flags);
+ }
+ else if ( lock1 < lock2 )
+ {
+ spin_lock_irqsave(lock1, *flags);
+ spin_lock(lock2);
+ }
+ else
+ {
+ spin_lock_irqsave(lock2, *flags);
+ spin_lock(lock1);
+ }
+}
+
+static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
+ unsigned long flags)
+{
+ if ( lock1 != lock2 )
+ spin_unlock(lock2);
+ spin_unlock_irqrestore(lock1, flags);
+}
+
int sched_init_vcpu(struct vcpu *v, unsigned int processor)
{
struct domain *d = v->domain;
@@ -430,31 +462,14 @@ static void vcpu_migrate(struct vcpu *v)
for ( ; ; )
{
/*
- * If per-cpu locks for old and new cpu are different, take the one
- * with the lower lock address first. This avoids dead- or live-locks
- * when this code is running on both cpus at the same time.
* We need another iteration if the pre-calculated lock addresses
* are not correct any longer after evaluating old and new cpu holding
* the locks.
*/
-
old_lock = per_cpu(schedule_data, old_cpu).schedule_lock;
new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
- if ( old_lock == new_lock )
- {
- spin_lock_irqsave(old_lock, flags);
- }
- else if ( old_lock < new_lock )
- {
- spin_lock_irqsave(old_lock, flags);
- spin_lock(new_lock);
- }
- else
- {
- spin_lock_irqsave(new_lock, flags);
- spin_lock(old_lock);
- }
+ sched_spin_lock_double(old_lock, new_lock, &flags);
old_cpu = v->processor;
if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
@@ -485,9 +500,7 @@ static void vcpu_migrate(struct vcpu *v)
pick_called = 0;
}
- if ( old_lock != new_lock )
- spin_unlock(new_lock);
- spin_unlock_irqrestore(old_lock, flags);
+ sched_spin_unlock_double(old_lock, new_lock, flags);
}
/*
@@ -498,9 +511,7 @@ static void vcpu_migrate(struct vcpu *v)
if ( v->is_running ||
!test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
{
- if ( old_lock != new_lock )
- spin_unlock(new_lock);
- spin_unlock_irqrestore(old_lock, flags);
+ sched_spin_unlock_double(old_lock, new_lock, flags);
return;
}
@@ -524,10 +535,7 @@ static void vcpu_migrate(struct vcpu *v)
else
v->processor = new_cpu;
-
- if ( old_lock != new_lock )
- spin_unlock(new_lock);
- spin_unlock_irqrestore(old_lock, flags);
+ sched_spin_unlock_double(old_lock, new_lock, flags);
if ( old_cpu != new_cpu )
sched_move_irqs(v);
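The ordering discipline that sched_spin_lock_double() encodes can be exercised outside Xen too. Below is a minimal, self-contained sketch of the same address-ordered double locking, transplanted onto pthread mutexes (lock_double/unlock_double are names made up for this illustration; Xen's real code also handles IRQ flags, omitted here):

```c
#include <pthread.h>

/* Address-ordered double locking, mirroring sched_spin_lock_double():
 * if the locks differ, always take the lower-addressed one first, so
 * two CPUs locking the same pair can never deadlock. */
static void lock_double(pthread_mutex_t *l1, pthread_mutex_t *l2)
{
    if ( l1 == l2 )
        pthread_mutex_lock(l1);
    else if ( l1 < l2 )
    {
        pthread_mutex_lock(l1);
        pthread_mutex_lock(l2);
    }
    else
    {
        pthread_mutex_lock(l2);
        pthread_mutex_lock(l1);
    }
}

static void unlock_double(pthread_mutex_t *l1, pthread_mutex_t *l2)
{
    /* Mirror of sched_spin_unlock_double(): drop the second lock only
     * if it is actually a different one. */
    if ( l1 != l2 )
        pthread_mutex_unlock(l2);
    pthread_mutex_unlock(l1);
}
```

Whatever order the callers pass the two locks in, acquisition always happens in the same global (address) order, which is what rules out the dead- and live-locks mentioned in the comment.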
* [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu in a function
2015-07-03 15:49 [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases Dario Faggioli
2015-07-03 15:49 ` [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function Dario Faggioli
@ 2015-07-03 15:49 ` Dario Faggioli
2015-07-08 14:56 ` George Dunlap
2015-07-03 15:49 ` [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler() Dario Faggioli
2015-07-03 15:49 ` [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Dario Faggioli
3 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2015-07-03 15:49 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap
No functional change intended.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
xen/common/schedule.c | 48 +++++++++++++++++++++++++++++-------------------
1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 26e8430..e83c666 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -451,6 +451,34 @@ void vcpu_unblock(struct vcpu *v)
vcpu_wake(v);
}
+/*
+ * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
+ * CPUs need to have been taken already when calling this!
+ */
+static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
+ unsigned int new_cpu)
+{
+ /*
+ * Transfer urgency status to new CPU before switching CPUs, as
+ * once the switch occurs, v->is_urgent is no longer protected by
+ * the per-CPU scheduler lock we are holding.
+ */
+ if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
+ {
+ atomic_inc(&per_cpu(schedule_data, new_cpu).urgent_count);
+ atomic_dec(&per_cpu(schedule_data, old_cpu).urgent_count);
+ }
+
+ /*
+ * Actual CPU switch to new CPU. This is safe because the lock
+ * pointer can't change while the current lock is held.
+ */
+ if ( VCPU2OP(v)->migrate )
+ SCHED_OP(VCPU2OP(v), migrate, v, new_cpu);
+ else
+ v->processor = new_cpu;
+}
+
static void vcpu_migrate(struct vcpu *v)
{
unsigned long flags;
@@ -515,25 +543,7 @@ static void vcpu_migrate(struct vcpu *v)
return;
}
- /*
- * Transfer urgency status to new CPU before switching CPUs, as once
- * the switch occurs, v->is_urgent is no longer protected by the per-CPU
- * scheduler lock we are holding.
- */
- if ( unlikely(v->is_urgent) && (old_cpu != new_cpu) )
- {
- atomic_inc(&per_cpu(schedule_data, new_cpu).urgent_count);
- atomic_dec(&per_cpu(schedule_data, old_cpu).urgent_count);
- }
-
- /*
- * Switch to new CPU, then unlock new and old CPU. This is safe because
- * the lock pointer cant' change while the current lock is held.
- */
- if ( VCPU2OP(v)->migrate )
- SCHED_OP(VCPU2OP(v), migrate, v, new_cpu);
- else
- v->processor = new_cpu;
+ vcpu_move(v, old_cpu, new_cpu);
sched_spin_unlock_double(old_lock, new_lock, flags);
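The urgency handover that vcpu_move() now owns can be illustrated in isolation. The sketch below stubs the per-CPU urgent_count as a plain array of C11 atomics (NR_CPUS and transfer_urgency are invented for this example; they are not Xen's):

```c
#include <stdatomic.h>

#define NR_CPUS 4

/* Stand-in for per_cpu(schedule_data, cpu).urgent_count. */
static atomic_int urgent_count[NR_CPUS];

/* Mirror of the urgency transfer in vcpu_move(): bump the new CPU's
 * count before dropping the old one's, so an urgent vcpu is always
 * accounted on at least one CPU while it is in flight. */
static void transfer_urgency(int old_cpu, int new_cpu, int is_urgent)
{
    if ( is_urgent && old_cpu != new_cpu )
    {
        atomic_fetch_add(&urgent_count[new_cpu], 1);
        atomic_fetch_sub(&urgent_count[old_cpu], 1);
    }
}
```

Note the guard on old_cpu != new_cpu: when the vcpu stays put, or is not urgent, the counters are untouched, matching the condition in the patch.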
* [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-03 15:49 [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases Dario Faggioli
2015-07-03 15:49 ` [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function Dario Faggioli
2015-07-03 15:49 ` [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu " Dario Faggioli
@ 2015-07-03 15:49 ` Dario Faggioli
2015-07-07 11:16 ` Juergen Gross
2015-07-08 16:01 ` George Dunlap
2015-07-03 15:49 ` [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Dario Faggioli
3 siblings, 2 replies; 19+ messages in thread
From: Dario Faggioli @ 2015-07-03 15:49 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap, Juergen Gross
The function is called both when we want to remove a cpu
from a cpupool, and during cpu teardown, for suspend or
shutdown. If, however, the boot cpu (cpu 0, most of the
times) is not present in the default cpupool, during
suspend or shutdown, Xen crashes like this:
root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
root@Zhaman:~# shutdown -h now
(XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
...
(XEN) Xen call trace:
(XEN) [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
(XEN) [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
(XEN) [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
(XEN) [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
(XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
(XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
(XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
(XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
(XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
(XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 15:
(XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
(XEN) ****************************************
There also are problems when we try to suspend or shutdown
with a cpupool configured with just one cpu (no matter, in
this case, whether that is the boot cpu or not):
root@Zhaman:~# xl create /etc/xen/test.cfg
root@Zhaman:~# xl cpupool-migrate test Pool-1
root@Zhaman:~# xl cpupool-list -c
Name CPU list
Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
Pool-1 12
root@Zhaman:~# shutdown -h now
(XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 12
...
(XEN) Xen call trace:
(XEN) [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
(XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
(XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
(XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
(XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
(XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 12:
(XEN) Xen BUG at smpboot.c:895
(XEN) ****************************************
In both cases, the problem is the scheduler not being able
to:
- move all the vcpus to the boot cpu (as the boot cpu is
not in the cpupool), in the former;
- move the vcpus away from a cpu at all (as that is the
only one cpu in the cpupool), in the latter.
The solution is to distinguish, inside cpu_disable_scheduler(),
the two cases of cpupool manipulation and teardown. For
cpupool manipulation, it is correct to ask the scheduler to
take an action, as pathological situations (like there not
being any cpu in the pool to which to send vcpus) are taken
care of (i.e., forbidden!) already. For suspend and shutdown,
we don't want the scheduler to be involved at all, as the
final goal is pretty simple: "send all the vcpus to the
boot cpu ASAP", so we just go for it.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
xen/common/schedule.c | 109 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 93 insertions(+), 16 deletions(-)
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index e83c666..eac8804 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
 * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
 * CPUs need to have been taken already when calling this!
*/
-static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
- unsigned int new_cpu)
+static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
+ unsigned int new_cpu)
{
/*
* Transfer urgency status to new CPU before switching CPUs, as
@@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
v->processor = new_cpu;
}
+static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
+{
+ unsigned long flags;
+ unsigned int cpu = v->processor;
+ spinlock_t *lock, *new_lock;
+
+ /*
+ * When here, the vcpu should be ready for being moved. This means:
+ * - both its original and target processor must be quiet;
+ * - it must not be marked as currently running;
+ * - the proper flag must be set (i.e., no one must have had any
+ * chance to reset it).
+ */
+ ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
+ is_idle_vcpu(curr_on_cpu(new_cpu)));
+ ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
+
+ lock = per_cpu(schedule_data, v->processor).schedule_lock;
+ new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
+
+ sched_spin_lock_double(lock, new_lock, &flags);
+ ASSERT(new_cpu != v->processor);
+ _vcpu_move(v, cpu, new_cpu);
+ sched_spin_unlock_double(lock, new_lock, flags);
+
+ sched_move_irqs(v);
+ vcpu_wake(v);
+}
+
static void vcpu_migrate(struct vcpu *v)
{
unsigned long flags;
@@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
return;
}
- vcpu_move(v, old_cpu, new_cpu);
+ _vcpu_move(v, old_cpu, new_cpu);
sched_spin_unlock_double(old_lock, new_lock, flags);
@@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
struct vcpu *v;
struct cpupool *c;
cpumask_t online_affinity;
- int ret = 0;
+ unsigned int new_cpu;
+ int ret = 0;
c = per_cpu(cpupool, cpu);
if ( c == NULL )
@@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
cpumask_setall(v->cpu_hard_affinity);
}
- if ( v->processor == cpu )
+ if ( v->processor != cpu )
{
- set_bit(_VPF_migrating, &v->pause_flags);
+ /* This vcpu is not on cpu, so we can move on. */
vcpu_schedule_unlock_irqrestore(lock, flags, v);
- vcpu_sleep_nosync(v);
- vcpu_migrate(v);
+ continue;
}
- else
- vcpu_schedule_unlock_irqrestore(lock, flags, v);
/*
- * A vcpu active in the hypervisor will not be migratable.
- * The caller should try again after releasing and reaquiring
- * all locks.
+ * If we're here, it means that the vcpu is on cpu. Let's see how
+ * it's best to send it away, depending on whether we are shutting
+ * down/suspending, or doing cpupool manipulations.
*/
- if ( v->processor == cpu )
- ret = -EAGAIN;
- }
+ set_bit(_VPF_migrating, &v->pause_flags);
+ vcpu_schedule_unlock_irqrestore(lock, flags, v);
+ vcpu_sleep_nosync(v);
+
+ /*
+ * In case of shutdown/suspend, it is not necessary to ask the
+ * scheduler to chime in. In fact:
+ * * there is no reason for it: the end result we are after is
+ * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
+ * else', so let's just go for it;
+ * it's wrong, when dealing with a cpupool with only non-boot pcpus,
+ * as the scheduler will always fail to send the vcpus away
+ * from the last online (non boot) pcpu!
+ *
+ * Therefore, in the shutdown/suspend case, let's just pick one
+ * of the still online pcpus, and send everyone there. Ideally,
+ * we would pick up the boot pcpu directly, but we don't know
+ * which one it is.
+ *
+ * OTOH, if the system is still live, and we are here because of
+ * cpupool manipulations:
+ * * it would be best to call the scheduler, as that would serve
+ * as a re-evaluation of the placement of the vcpu, taking into
+ * account the modified cpupool configuration;
+ * the scheduler will always find a suitable solution, or
+ * things would have failed before getting in here.
+ *
+ * Therefore, in the cpupool manipulation case, let's just ask the
+ * scheduler to do its job, via calling vcpu_migrate().
+ */
+ if ( unlikely(system_state == SYS_STATE_suspend) )
+ {
+ /*
+ * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
+ * actually helps us to achieve our ultimate goal quicker.
+ */
+ cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));
+ new_cpu = cpumask_first(&online_affinity);
+ vcpu_move(v, new_cpu);
+ }
+ else
+ {
+ /*
+ * The only caveat, in this case, is that, if the vcpu is active
+ * in the hypervisor, it won't be migratable. In that case,
+ * the caller should try again after releasing and reacquiring
+ * all locks.
+ */
+ vcpu_migrate(v);
+ if ( v->processor == cpu )
+ ret = -EAGAIN;
+ }
+ }
domain_update_node_affinity(d);
}
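The suspend/shutdown path above boils down to "drop the dying cpu from the online mask and take the first remaining one". A toy model of that selection (the cpumask stubbed as a 32-bit mask, and pick_fallback_cpu invented for this sketch; `__builtin_ctz` assumes GCC/Clang) looks like:

```c
#include <stdint.h>

/* Toy model of cpumask_andnot(&online_affinity, &cpu_online_map,
 * cpumask_of(cpu)) followed by cpumask_first(): up to 32 pcpus,
 * bit n set <=> pcpu n online. */
static int pick_fallback_cpu(uint32_t online_map, unsigned int cpu)
{
    uint32_t mask = online_map & ~(UINT32_C(1) << cpu);

    if ( mask == 0 )
        return -1;  /* no other online pcpu; the real code relies on
                       the boot pcpu still being online at this point */
    return __builtin_ctz(mask);  /* lowest set bit, like cpumask_first() */
}
```

With pcpus 0-3 online and pcpu 2 going down, this returns 0; since the boot pcpu is usually pcpu #0, taking the lowest set bit tends to land everything there directly, as the comment in the patch notes.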
* [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
2015-07-03 15:49 [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases Dario Faggioli
` (2 preceding siblings ...)
2015-07-03 15:49 ` [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler() Dario Faggioli
@ 2015-07-03 15:49 ` Dario Faggioli
2015-07-07 11:30 ` Juergen Gross
2015-07-08 16:19 ` George Dunlap
3 siblings, 2 replies; 19+ messages in thread
From: Dario Faggioli @ 2015-07-03 15:49 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, George Dunlap
And this time, do it right. In fact, a similar change was
attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
on the cpupool_unassign_cpu() path"). But that was buggy, and got
reverted with 8395b67ab0b8a86.
However, even though reverting was the right thing to do, it
remains true that:
- calling the function is better done in the cpupool cpu removal
code, even if just for symmetry with the cpupool cpu adding path;
- it is not necessary to call it during cpu teardown code (for suspend
or shutdown), as either we are going down and will never
come up (shutdown) or, when coming up, we want everything to be
as it was before the tearing down process started, and so we would just
undo any update made during the process.
- calling it from the teardown path is not only unnecessary, but
it can trigger an ASSERT(), in case we get, during the process,
to remove the last online pcpu of a domain's node affinity:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
(XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
... ... ...
(XEN) Xen call trace:
(XEN) [<ffff82d0801055b9>] domain_update_node_affinity+0x113/0x240
(XEN) [<ffff82d08012e676>] cpu_disable_scheduler+0x334/0x3f2
(XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
(XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
(XEN) [<ffff82d080130ad9>] stopmachine_action+0x70/0x99
(XEN) [<ffff82d08013274f>] do_tasklet_work+0x78/0xab
(XEN) [<ffff82d080132a85>] do_tasklet+0x5e/0x8a
(XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 12:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
(XEN) ****************************************
Therefore, for all these reasons, move the call from
cpu_disable_scheduler() to cpupool_unassign_cpu_helper().
While there, add some sanity checking (in the latter function), and
make sure that scanning the domain list is done with domlist_read_lock
held, at least when the system is 'live'.
I re-tested the scenario described in here:
http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
which is what led to the revert of 93be8285a79c6, and that is
working ok after this commit.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
xen/common/cpupool.c | 18 ++++++++++++++++++
xen/common/schedule.c | 7 ++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 563864d..79bcb21 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
static long cpupool_unassign_cpu_helper(void *info)
{
int cpu = cpupool_moving_cpu;
+ struct cpupool *c = info;
+ struct domain *d;
long ret;
cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
cpupool_cpu_moving->cpupool_id, cpu);
spin_lock(&cpupool_lock);
+ if ( c != cpupool_cpu_moving )
+ {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ /*
+ * We need this for scanning the domain list, both in
+ * cpu_disable_scheduler(), and at the bottom of this function.
+ */
+ rcu_read_lock(&domlist_read_lock);
ret = cpu_disable_scheduler(cpu);
cpumask_set_cpu(cpu, &cpupool_free_cpus);
if ( !ret )
@@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info)
cpupool_cpu_moving = NULL;
}
+ for_each_domain_in_cpupool(d, c)
+ {
+ domain_update_node_affinity(d);
+ }
+ rcu_read_unlock(&domlist_read_lock);
out:
spin_unlock(&cpupool_lock);
cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index eac8804..a1840c9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -652,6 +652,12 @@ int cpu_disable_scheduler(unsigned int cpu)
if ( c == NULL )
return ret;
+ /*
+ * We'd need the domain RCU lock, but:
+ * - when we are called from cpupool code, it's acquired there already;
+ * - when we are called for CPU teardown, we're in stop-machine context,
+ * so that's not a problem.
+ */
for_each_domain_in_cpupool ( d, c )
{
for_each_vcpu ( d, v )
@@ -741,7 +747,6 @@ int cpu_disable_scheduler(unsigned int cpu)
ret = -EAGAIN;
}
}
- domain_update_node_affinity(d);
}
return ret;
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-03 15:49 ` [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler() Dario Faggioli
@ 2015-07-07 11:16 ` Juergen Gross
2015-07-08 15:13 ` Dario Faggioli
2015-07-08 16:01 ` George Dunlap
1 sibling, 1 reply; 19+ messages in thread
From: Juergen Gross @ 2015-07-07 11:16 UTC (permalink / raw)
To: Dario Faggioli, xen-devel; +Cc: George Dunlap
On 07/03/2015 05:49 PM, Dario Faggioli wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> times) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
> (XEN) [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
> (XEN) [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
> (XEN) [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 15:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
> (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
> root@Zhaman:~# xl create /etc/xen/test.cfg
> root@Zhaman:~# xl cpupool-migrate test Pool-1
> root@Zhaman:~# xl cpupool-list -c
> Name CPU list
> Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
> Pool-1 12
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 12
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Xen BUG at smpboot.c:895
> (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
> - move all the vcpus to the boot cpu (as the boot cpu is
> not in the cpupool), in the former;
> - move the vcpus away from a cpu at all (as that is the
> only one cpu in the cpupool), in the latter.
>
> The solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situations (like there not
> being any cpu in the pool to which to send vcpus) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Just 2 minor nits (see below), otherwise:
Acked-by: Juergen Gross <jgross@suse.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> xen/common/schedule.c | 109 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e83c666..eac8804 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
> * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
> * CPUs need to have been taken already when calling this!
> */
> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> - unsigned int new_cpu)
> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> + unsigned int new_cpu)
> {
> /*
> * Transfer urgency status to new CPU before switching CPUs, as
> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> v->processor = new_cpu;
> }
>
> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
> +{
> + unsigned long flags;
> + unsigned int cpu = v->processor;
> + spinlock_t *lock, *new_lock;
> +
> + /*
> + * When here, the vcpu should be ready for being moved. This means:
> + * - both its original and target processor must be quiet;
> + * - it must not be marked as currently running;
> + * - the proper flag must be set (i.e., no one must have had any
> + * chance to reset it).
> + */
> + ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> + is_idle_vcpu(curr_on_cpu(new_cpu)));
> + ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> +
> + lock = per_cpu(schedule_data, v->processor).schedule_lock;
> + new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> + sched_spin_lock_double(lock, new_lock, &flags);
> + ASSERT(new_cpu != v->processor);
> + _vcpu_move(v, cpu, new_cpu);
> + sched_spin_unlock_double(lock, new_lock, flags);
> +
> + sched_move_irqs(v);
> + vcpu_wake(v);
> +}
> +
> static void vcpu_migrate(struct vcpu *v)
> {
> unsigned long flags;
> @@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
> return;
> }
>
> - vcpu_move(v, old_cpu, new_cpu);
> + _vcpu_move(v, old_cpu, new_cpu);
>
> sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
> struct vcpu *v;
> struct cpupool *c;
> cpumask_t online_affinity;
> - int ret = 0;
> + unsigned int new_cpu;
> + int ret = 0;
>
> c = per_cpu(cpupool, cpu);
> if ( c == NULL )
> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
> cpumask_setall(v->cpu_hard_affinity);
> }
>
> - if ( v->processor == cpu )
> + if ( v->processor != cpu )
> {
> - set_bit(_VPF_migrating, &v->pause_flags);
> + /* This vcpu is not on cpu, so we can move on. */
> vcpu_schedule_unlock_irqrestore(lock, flags, v);
> - vcpu_sleep_nosync(v);
> - vcpu_migrate(v);
> + continue;
> }
> - else
> - vcpu_schedule_unlock_irqrestore(lock, flags, v);
>
> /*
> - * A vcpu active in the hypervisor will not be migratable.
> - * The caller should try again after releasing and reaquiring
> - * all locks.
> + * If we're here, it means that the vcpu is on cpu. Let's see how
> + * it's best to send it away, depending on whether we are shutting
> + * down/suspending, or doing cpupool manipulations.
> */
> - if ( v->processor == cpu )
> - ret = -EAGAIN;
> - }
> + set_bit(_VPF_migrating, &v->pause_flags);
> + vcpu_schedule_unlock_irqrestore(lock, flags, v);
> + vcpu_sleep_nosync(v);
> +
> + /*
> + * In case of shutdown/suspend, it is not necessary to ask the
> + * scheduler to chime in. In fact:
> + * * there is no reason for it: the end result we are after is
> + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> + * else', so let's just go for it;
> + * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> + * as the scheduler will always fail to send the vcpus away
> + * from the last online (non boot) pcpu!
I'd add a comment that in shutdown/suspend case all domains are being
paused, so we can be active in dom0/Pool-0 only.
> + *
> + * Therefore, in the shutdown/suspend case, let's just pick one
> + * of the still online pcpus, and send everyone there. Ideally,
> + * we would pick up the boot pcpu directly, but we don't know
> + * which one it is.
> + *
> + * OTOH, if the system is still live, and we are here because of
> + * cpupool manipulations:
> + * * it would be best to call the scheduler, as that would serve
> + * as a re-evaluation of the placement of the vcpu, taking into
> + * account the modified cpupool configuration;
> + * the scheduler will always find a suitable solution, or
> + * things would have failed before getting in here.
> + *
> + * Therefore, in the cpupool manipulation case, let's just ask the
> + * scheduler to do its job, via calling vcpu_migrate().
> + */
> + if ( unlikely(system_state == SYS_STATE_suspend) )
> + {
> + /*
> + * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> + * actually helps us to achieve our ultimate goal quicker.
> + */
> + cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));
What about an ASSERT/BUG regarding a non-empty online_affinity?
Juergen
> + new_cpu = cpumask_first(&online_affinity);
> + vcpu_move(v, new_cpu);
> + }
> + else
> + {
> + /*
> + * The only caveat, in this case, is that, if the vcpu is active
> + * in the hypervisor, it won't be migratable. In that case,
> + * the caller should try again after releasing and reacquiring
> + * all locks.
> + */
> + vcpu_migrate(v);
>
> + if ( v->processor == cpu )
> + ret = -EAGAIN;
> + }
> + }
> domain_update_node_affinity(d);
> }
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
2015-07-03 15:49 ` [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Dario Faggioli
@ 2015-07-07 11:30 ` Juergen Gross
2015-07-08 16:19 ` George Dunlap
1 sibling, 0 replies; 19+ messages in thread
From: Juergen Gross @ 2015-07-07 11:30 UTC (permalink / raw)
To: Dario Faggioli, xen-devel; +Cc: George Dunlap
On 07/03/2015 05:49 PM, Dario Faggioli wrote:
> And this time, do it right. In fact, a similar change was
> attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
> on the cpupool_unassign_cpu() path"). But that was buggy, and got
> reverted with 8395b67ab0b8a86.
>
> However, even though reverting was the right thing to do, it
> remains true that:
> - calling the function is better done in the cpupool cpu removal
> code, even if just for symmetry with the cpupool cpu adding path;
> - it is not necessary to call it during cpu teardown (for suspend
> or shutdown) code as we either are going down and will never
> come up (shutdown) or, when coming up, we want everything to be
> as before the tearing down process started, and so we would just
> undo any update made during the process.
> - calling it from the teardown path is not only unnecessary, but
> it can trigger an ASSERT(), in case we get, during the process,
> to remove the last online pcpu of a domain's node affinity:
>
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ... ... ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801055b9>] domain_update_node_affinity+0x113/0x240
> (XEN) [<ffff82d08012e676>] cpu_disable_scheduler+0x334/0x3f2
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d080130ad9>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d08013274f>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132a85>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ****************************************
>
> Therefore, for all these reasons, move the call from
> cpu_disable_schedule() to cpupool_unassign_cpu_helper().
>
> While there, add some sanity checking (in the latter function), and
> make sure that scanning the domain list is done with domlist_read_lock
> held, at least when the system is 'live'.
>
> I re-tested the scenario described in here:
> http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
>
> which is what led to the revert of 93be8285a79c6, and that is
> working ok after this commit.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> xen/common/cpupool.c | 18 ++++++++++++++++++
> xen/common/schedule.c | 7 ++++++-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..79bcb21 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
> static long cpupool_unassign_cpu_helper(void *info)
> {
> int cpu = cpupool_moving_cpu;
> + struct cpupool *c = info;
> + struct domain *d;
> long ret;
>
> cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
> cpupool_cpu_moving->cpupool_id, cpu);
>
> spin_lock(&cpupool_lock);
> + if ( c != cpupool_cpu_moving )
> + {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + /*
> + * We need this for scanning the domain list, both in
> + * cpu_disable_scheduler(), and at the bottom of this function.
> + */
> + rcu_read_lock(&domlist_read_lock);
> ret = cpu_disable_scheduler(cpu);
> cpumask_set_cpu(cpu, &cpupool_free_cpus);
> if ( !ret )
> @@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info)
> cpupool_cpu_moving = NULL;
> }
>
> + for_each_domain_in_cpupool(d, c)
> + {
> + domain_update_node_affinity(d);
> + }
> + rcu_read_unlock(&domlist_read_lock);
> out:
> spin_unlock(&cpupool_lock);
> cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index eac8804..a1840c9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -652,6 +652,12 @@ int cpu_disable_scheduler(unsigned int cpu)
> if ( c == NULL )
> return ret;
>
> + /*
> + * We'd need the domain RCU lock, but:
> + * - when we are called from cpupool code, it's acquired there already;
> + * - when we are called for CPU teardown, we're in stop-machine context,
> + * so that's not a problem.
> + */
> for_each_domain_in_cpupool ( d, c )
> {
> for_each_vcpu ( d, v )
> @@ -741,7 +747,6 @@ int cpu_disable_scheduler(unsigned int cpu)
> ret = -EAGAIN;
> }
> }
> - domain_update_node_affinity(d);
> }
>
> return ret;
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function
2015-07-03 15:49 ` [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function Dario Faggioli
@ 2015-07-08 14:47 ` George Dunlap
0 siblings, 0 replies; 19+ messages in thread
From: George Dunlap @ 2015-07-08 14:47 UTC (permalink / raw)
To: Dario Faggioli; +Cc: xen-devel
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> No functional change intended.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> xen/common/schedule.c | 64 ++++++++++++++++++++++++++++---------------------
> 1 file changed, 36 insertions(+), 28 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ecf1545..26e8430 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -185,6 +185,38 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
> return state.time[RUNSTATE_running];
> }
>
> +/*
> + * If locks are different, take the one with the lower address first.
> + * This avoids dead- or live-locks when this code is running on both
> + * cpus at the same time.
> + */
> +static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
> + unsigned long *flags)
> +{
> + if ( lock1 == lock2 )
> + {
> + spin_lock_irqsave(lock1, *flags);
> + }
> + else if ( lock1 < lock2 )
> + {
> + spin_lock_irqsave(lock1, *flags);
> + spin_lock(lock2);
> + }
> + else
> + {
> + spin_lock_irqsave(lock2, *flags);
> + spin_lock(lock1);
> + }
> +}
> +
> +static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
> + unsigned long flags)
> +{
> + if ( lock1 != lock2 )
> + spin_unlock(lock2);
> + spin_unlock_irqrestore(lock1, flags);
> +}
> +
> int sched_init_vcpu(struct vcpu *v, unsigned int processor)
> {
> struct domain *d = v->domain;
> @@ -430,31 +462,14 @@ static void vcpu_migrate(struct vcpu *v)
> for ( ; ; )
> {
> /*
> - * If per-cpu locks for old and new cpu are different, take the one
> - * with the lower lock address first. This avoids dead- or live-locks
> - * when this code is running on both cpus at the same time.
> * We need another iteration if the pre-calculated lock addresses
> * are not correct any longer after evaluating old and new cpu holding
> * the locks.
> */
> -
> old_lock = per_cpu(schedule_data, old_cpu).schedule_lock;
> new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
>
> - if ( old_lock == new_lock )
> - {
> - spin_lock_irqsave(old_lock, flags);
> - }
> - else if ( old_lock < new_lock )
> - {
> - spin_lock_irqsave(old_lock, flags);
> - spin_lock(new_lock);
> - }
> - else
> - {
> - spin_lock_irqsave(new_lock, flags);
> - spin_lock(old_lock);
> - }
> + sched_spin_lock_double(old_lock, new_lock, &flags);
>
> old_cpu = v->processor;
> if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
> @@ -485,9 +500,7 @@ static void vcpu_migrate(struct vcpu *v)
> pick_called = 0;
> }
>
> - if ( old_lock != new_lock )
> - spin_unlock(new_lock);
> - spin_unlock_irqrestore(old_lock, flags);
> + sched_spin_unlock_double(old_lock, new_lock, flags);
> }
>
> /*
> @@ -498,9 +511,7 @@ static void vcpu_migrate(struct vcpu *v)
> if ( v->is_running ||
> !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
> {
> - if ( old_lock != new_lock )
> - spin_unlock(new_lock);
> - spin_unlock_irqrestore(old_lock, flags);
> + sched_spin_unlock_double(old_lock, new_lock, flags);
> return;
> }
>
> @@ -524,10 +535,7 @@ static void vcpu_migrate(struct vcpu *v)
> else
> v->processor = new_cpu;
>
> -
> - if ( old_lock != new_lock )
> - spin_unlock(new_lock);
> - spin_unlock_irqrestore(old_lock, flags);
> + sched_spin_unlock_double(old_lock, new_lock, flags);
>
> if ( old_cpu != new_cpu )
> sched_move_irqs(v);
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu in a function
2015-07-03 15:49 ` [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu " Dario Faggioli
@ 2015-07-08 14:56 ` George Dunlap
2015-07-08 15:09 ` Dario Faggioli
0 siblings, 1 reply; 19+ messages in thread
From: George Dunlap @ 2015-07-08 14:56 UTC (permalink / raw)
To: Dario Faggioli; +Cc: xen-devel
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> No functional change intended.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
With one nit...
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> xen/common/schedule.c | 48 +++++++++++++++++++++++++++++-------------------
> 1 file changed, 29 insertions(+), 19 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 26e8430..e83c666 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -451,6 +451,34 @@ void vcpu_unblock(struct vcpu *v)
> vcpu_wake(v);
> }
>
> +/*
> + * Do the actual movemet of a vcpu from old to new CPU. Locks for *both*
"movement" (missing an 'n')
-G
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu in a function
2015-07-08 14:56 ` George Dunlap
@ 2015-07-08 15:09 ` Dario Faggioli
0 siblings, 0 replies; 19+ messages in thread
From: Dario Faggioli @ 2015-07-08 15:09 UTC (permalink / raw)
To: George Dunlap; +Cc: xen-devel
On Wed, 2015-07-08 at 15:56 +0100, George Dunlap wrote:
> On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > No functional change intended.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
>
Thanks.
> With one nit...
> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 26e8430..e83c666 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -451,6 +451,34 @@ void vcpu_unblock(struct vcpu *v)
> > vcpu_wake(v);
> > }
> >
> > +/*
> > + * Do the actual movemet of a vcpu from old to new CPU. Locks for *both*
>
> "movement" (missing an 'n')
>
Yeah, well, what I was missing was 'set spell' in .vimrc. Done now,
hopefully, this will help limit things like this! :-/
Anyway, I'll resend and take care of this.
Thanks again,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-07 11:16 ` Juergen Gross
@ 2015-07-08 15:13 ` Dario Faggioli
2015-07-09 10:24 ` Dario Faggioli
0 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2015-07-08 15:13 UTC (permalink / raw)
To: Juergen Gross; +Cc: George Dunlap, xen-devel
On Tue, 2015-07-07 at 13:16 +0200, Juergen Gross wrote:
> On 07/03/2015 05:49 PM, Dario Faggioli wrote:
> > Solution is to distinguish, inside cpu_disable_scheduler(),
> > the two cases of cpupool manipulation and teardown. For
> > cpupool manipulation, it is correct to ask the scheduler to
> > take an action, as pathological situations (like there not
> > being any cpu in the pool to send vcpus to) are taken
> > care of (i.e., forbidden!) already. For suspend and shutdown,
> > we don't want the scheduler to be involved at all, as the
> > final goal is pretty simple: "send all the vcpus to the
> > boot cpu ASAP", so we just go for it.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> Just 2 minor nits (see below), otherwise:
>
> Acked-by: Juergen Gross <jgross@suse.com>
>
Thanks.
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
> > cpumask_setall(v->cpu_hard_affinity);
> > }
> >
> > - if ( v->processor == cpu )
> > + if ( v->processor != cpu )
> > {
> > - set_bit(_VPF_migrating, &v->pause_flags);
> > + /* This vcpu is not on cpu, so we can move on. */
> > vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > - vcpu_sleep_nosync(v);
> > - vcpu_migrate(v);
> > + continue;
> > }
> > - else
> > - vcpu_schedule_unlock_irqrestore(lock, flags, v);
> >
> > /*
> > - * A vcpu active in the hypervisor will not be migratable.
> > - * The caller should try again after releasing and reaquiring
> > - * all locks.
> > + * If we're here, it means that the vcpu is on cpu. Let's see how
> > + * it's best to send it away, depending on whether we are shutting
> > + * down/suspending, or doing cpupool manipulations.
> > */
> > - if ( v->processor == cpu )
> > - ret = -EAGAIN;
> > - }
> > + set_bit(_VPF_migrating, &v->pause_flags);
> > + vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > + vcpu_sleep_nosync(v);
> > +
> > + /*
> > + * In case of shutdown/suspend, it is not necessary to ask the
> > + * scheduler to chime in. In fact:
> > + * * there is no reason for it: the end result we are after is
> > + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> > + * else', so let's just go for it;
> > + * * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> > + * as the scheduler will always fail to send the vcpus away
> > + * from the last online (non boot) pcpu!
>
> I'd add a comment that in shutdown/suspend case all domains are being
> paused, so we can be active in dom0/Pool-0 only.
>
Sure, I'll add this.
> > + *
> > + * Therefore, in the shutdown/suspend case, let's just pick one
> > + * of the still online pcpus, and send everyone there. Ideally,
> > + * we would pick up the boot pcpu directly, but we don't know
> > + * which one it is.
> > + *
> > + * OTOH, if the system is still live, and we are here because of
> > + * cpupool manipulations:
> > + * * it would be best to call the scheduler, as that would serve
> > + * as a re-evaluation of the placement of the vcpu, taking into
> > + * account the modified cpupool configuration;
> > + * * the scheduler will always find a suitable solution, or
> > + * things would have failed before getting in here.
> > + *
> > + * Therefore, in the cpupool manipulation case, let's just ask the
> > + * scheduler to do its job, via calling vcpu_migrate().
> > + */
> > + if ( unlikely(system_state == SYS_STATE_suspend) )
> > + {
> > + /*
> > + * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> > + * actually helps us to achieve our ultimate goal quicker.
> > + */
> > + cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));
>
> What about an ASSERT/BUG regarding a non-empty online_affinity?
>
Sounds good.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-03 15:49 ` [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler() Dario Faggioli
2015-07-07 11:16 ` Juergen Gross
@ 2015-07-08 16:01 ` George Dunlap
2015-07-08 16:37 ` Dario Faggioli
1 sibling, 1 reply; 19+ messages in thread
From: George Dunlap @ 2015-07-08 16:01 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Juergen Gross, xen-devel
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> The function is called both when we want to remove a cpu
> from a cpupool, and during cpu teardown, for suspend or
> shutdown. If, however, the boot cpu (cpu 0, most of the
> times) is not present in the default cpupool, during
> suspend or shutdown, Xen crashes like this:
>
> root@Zhaman:~# xl cpupool-cpu-remove Pool-0 0
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801238de>] _csched_cpu_pick+0x156/0x61f
> (XEN) [<ffff82d080123db5>] csched_cpu_pick+0xe/0x10
> (XEN) [<ffff82d08012de3c>] vcpu_migrate+0x18e/0x321
> (XEN) [<ffff82d08012e4f8>] cpu_disable_scheduler+0x1cf/0x2ac
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 15:
> (XEN) Assertion 'cpu < nr_cpu_ids' failed at ...URCES/xen/xen/xen.git/xen/include/xen/cpumask.h:97
> (XEN) ****************************************
>
> There also are problems when we try to suspend or shutdown
> with a cpupool configured with just one cpu (no matter, in
> this case, whether that is the boot cpu or not):
>
> root@Zhaman:~# xl create /etc/xen/test.cfg
> root@Zhaman:~# xl cpupool-migrate test Pool-1
> root@Zhaman:~# xl cpupool-list -c
> Name CPU list
> Pool-0 0,1,2,3,4,5,6,7,8,9,10,11,13,14,15
> Pool-1 12
> root@Zhaman:~# shutdown -h now
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 12
> ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d08018bb91>] __cpu_disable+0x317/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d08013097a>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d0801325f0>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132926>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Xen BUG at smpboot.c:895
> (XEN) ****************************************
>
> In both cases, the problem is the scheduler not being able
> to:
> - move all the vcpus to the boot cpu (as the boot cpu is
> not in the cpupool), in the former;
> - move the vcpus away from a cpu at all (as that is the
> only one cpu in the cpupool), in the latter.
>
> Solution is to distinguish, inside cpu_disable_scheduler(),
> the two cases of cpupool manipulation and teardown. For
> cpupool manipulation, it is correct to ask the scheduler to
> take an action, as pathological situations (like there not
> being any cpu in the pool to send vcpus to) are taken
> care of (i.e., forbidden!) already. For suspend and shutdown,
> we don't want the scheduler to be involved at all, as the
> final goal is pretty simple: "send all the vcpus to the
> boot cpu ASAP", so we just go for it.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> xen/common/schedule.c | 109 ++++++++++++++++++++++++++++++++++++++++++-------
> 1 file changed, 93 insertions(+), 16 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index e83c666..eac8804 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
> * Do the actual movemet of a vcpu from old to new CPU. Locks for *both*
> * CPUs needs to have been taken already when calling this!
> */
> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> - unsigned int new_cpu)
> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> + unsigned int new_cpu)
> {
> /*
> * Transfer urgency status to new CPU before switching CPUs, as
> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> v->processor = new_cpu;
> }
>
> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
Not quite a fan of the naming scheme here. Perhaps vcpu_move and
vcpu_move_locked?
Actually -- looking at this again, is there a reason to pass old_cpu
into _vcpu_move? It looks like old_cpu should always be equal to
v->processor. That would make the function prototypes the same.
> +{
> + unsigned long flags;
> + unsigned int cpu = v->processor;
> + spinlock_t *lock, *new_lock;
> +
> + /*
> + * When here, the vcpu should be ready for being moved. This means:
> + * - both its original and target processor must be quiet;
> + * - it must not be marked as currently running;
> + * - the proper flag must be set (i.e., no one must have had any
> + * chance to reset it).
> + */
> + ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> + is_idle_vcpu(curr_on_cpu(new_cpu)));
> + ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> +
> + lock = per_cpu(schedule_data, v->processor).schedule_lock;
> + new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> +
> + sched_spin_lock_double(lock, new_lock, &flags);
> + ASSERT(new_cpu != v->processor);
Don't we need to do the "schedule lock dance" here as well --
double-check to make sure that v->processor hasn't changed since the
time we grabbed the lock?
> + _vcpu_move(v, cpu, new_cpu);
> + sched_spin_unlock_double(lock, new_lock, flags);
> +
> + sched_move_irqs(v);
> + vcpu_wake(v);
> +}
> +
> static void vcpu_migrate(struct vcpu *v)
> {
> unsigned long flags;
> @@ -543,7 +572,7 @@ static void vcpu_migrate(struct vcpu *v)
> return;
> }
>
> - vcpu_move(v, old_cpu, new_cpu);
> + _vcpu_move(v, old_cpu, new_cpu);
>
> sched_spin_unlock_double(old_lock, new_lock, flags);
>
> @@ -616,7 +645,8 @@ int cpu_disable_scheduler(unsigned int cpu)
> struct vcpu *v;
> struct cpupool *c;
> cpumask_t online_affinity;
> - int ret = 0;
> + unsigned int new_cpu;
> + int ret = 0;
>
> c = per_cpu(cpupool, cpu);
> if ( c == NULL )
> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
> cpumask_setall(v->cpu_hard_affinity);
> }
>
> - if ( v->processor == cpu )
> + if ( v->processor != cpu )
> {
> - set_bit(_VPF_migrating, &v->pause_flags);
> + /* This vcpu is not on cpu, so we can move on. */
> vcpu_schedule_unlock_irqrestore(lock, flags, v);
> - vcpu_sleep_nosync(v);
> - vcpu_migrate(v);
> + continue;
> }
> - else
> - vcpu_schedule_unlock_irqrestore(lock, flags, v);
>
> /*
> - * A vcpu active in the hypervisor will not be migratable.
> - * The caller should try again after releasing and reaquiring
> - * all locks.
> + * If we're here, it means that the vcpu is on cpu. Let's see how
> + * it's best to send it away, depending on whether we are shutting
> + * down/suspending, or doing cpupool manipulations.
> */
> - if ( v->processor == cpu )
> - ret = -EAGAIN;
> - }
> + set_bit(_VPF_migrating, &v->pause_flags);
> + vcpu_schedule_unlock_irqrestore(lock, flags, v);
> + vcpu_sleep_nosync(v);
I don't quite understand this. By calling _nosync(), you're not
guaranteed that the vcpu has actually stopped running when you call
vcpu_move() down below; and yet inside vcpu_move(), you assert
!v->is_running.
So either at this point, when doing suspend, the vcpu has already been
paused; in which case this is unnecessary; or it hasn't been paused,
in which case we're potentially racing the IPI / deschedule, and will
trip over the ASSERT if we "win".
Did I miss something? (I'm perfectly ready to believe that I have...)
-George
> +
> + /*
> + * In case of shutdown/suspend, it is not necessary to ask the
> + * scheduler to chime in. In fact:
> + * * there is no reason for it: the end result we are after is
> + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> + * else', so let's just go for it;
> + * * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> + * as the scheduler will always fail to send the vcpus away
> + * from the last online (non boot) pcpu!
> + *
> + * Therefore, in the shutdown/suspend case, let's just pick one
> + * of the still online pcpus, and send everyone there. Ideally,
> + * we would pick up the boot pcpu directly, but we don't know
> + * which one it is.
> + *
> + * OTOH, if the system is still live, and we are here because of
> + * cpupool manipulations:
> + * * it would be best to call the scheduler, as that would serve
> + * as a re-evaluation of the placement of the vcpu, taking into
> + * account the modified cpupool configuration;
> + * * the scheduler will always find a suitable solution, or
> + * things would have failed before getting in here.
> + *
> + * Therefore, in the cpupool manipulation case, let's just ask the
> + * scheduler to do its job, via calling vcpu_migrate().
> + */
> + if ( unlikely(system_state == SYS_STATE_suspend) )
> + {
> + /*
> + * The boot pcpu is, usually, pcpu #0, so using cpumask_first()
> + * actually helps us to achieve our ultimate goal quicker.
> + */
> + cpumask_andnot(&online_affinity, &cpu_online_map, cpumask_of(cpu));
> + new_cpu = cpumask_first(&online_affinity);
> + vcpu_move(v, new_cpu);
> + }
> + else
> + {
> + /*
> + * The only caveat, in this case, is that, if the vcpu is active
> + * in the hypervisor, it won't be migratable. In that case,
> + * the caller should try again after releasing and reacquiring
> + * all locks.
> + */
> + vcpu_migrate(v);
>
> + if ( v->processor == cpu )
> + ret = -EAGAIN;
> + }
> + }
> domain_update_node_affinity(d);
> }
>
>
>
* Re: [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool
2015-07-03 15:49 ` [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Dario Faggioli
2015-07-07 11:30 ` Juergen Gross
@ 2015-07-08 16:19 ` George Dunlap
1 sibling, 0 replies; 19+ messages in thread
From: George Dunlap @ 2015-07-08 16:19 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Juergen Gross, xen-devel
On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> And this time, do it right. In fact, a similar change was
> attempted in 93be8285a79c6 ("cpupools: update domU's node-affinity
> on the cpupool_unassign_cpu() path"). But that was buggy, and got
> reverted with 8395b67ab0b8a86.
>
> However, even though reverting was the right thing to do, it
> remains true that:
> - calling the function is better done in the cpupool cpu removal
> code, even if just for symmetry with the cpupool cpu adding path;
> - it is not necessary to call it during cpu teardown (for suspend
> or shutdown) code as we either are going down and will never
> come up (shutdown) or, when coming up, we want everything to be
> as before the tearing down process started, and so we would just
> undo any update made during the process.
> - calling it from the teardown path is not only unnecessary, but
> it can trigger an ASSERT(), in case we get, during the process,
> to remove the last online pcpu of a domain's node affinity:
>
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Tainted: C ]----
> ... ... ...
> (XEN) Xen call trace:
> (XEN) [<ffff82d0801055b9>] domain_update_node_affinity+0x113/0x240
> (XEN) [<ffff82d08012e676>] cpu_disable_scheduler+0x334/0x3f2
> (XEN) [<ffff82d08018bb8d>] __cpu_disable+0x313/0x36e
> (XEN) [<ffff82d080101424>] take_cpu_down+0x34/0x3b
> (XEN) [<ffff82d080130ad9>] stopmachine_action+0x70/0x99
> (XEN) [<ffff82d08013274f>] do_tasklet_work+0x78/0xab
> (XEN) [<ffff82d080132a85>] do_tasklet+0x5e/0x8a
> (XEN) [<ffff82d08016478c>] idle_loop+0x56/0x6b
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 12:
> (XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:466
> (XEN) ****************************************
>
> Therefore, for all these reasons, move the call from
> cpu_disable_schedule() to cpupool_unassign_cpu_helper().
>
> While there, add some sanity checking (in the latter function), and
> make sure that scanning the domain list is done with domlist_read_lock
> held, at least when the system is 'live'.
>
> I re-tested the scenario described in here:
> http://permalink.gmane.org/gmane.comp.emulators.xen.devel/235310
>
> which is what led to the revert of 93be8285a79c6, and that is
> working ok after this commit.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> Cc: Juergen Gross <jgross@suse.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> xen/common/cpupool.c | 18 ++++++++++++++++++
> xen/common/schedule.c | 7 ++++++-
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..79bcb21 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -297,12 +297,25 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
> static long cpupool_unassign_cpu_helper(void *info)
> {
> int cpu = cpupool_moving_cpu;
> + struct cpupool *c = info;
> + struct domain *d;
> long ret;
>
> cpupool_dprintk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n",
> cpupool_cpu_moving->cpupool_id, cpu);
>
> spin_lock(&cpupool_lock);
> + if ( c != cpupool_cpu_moving )
> + {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + /*
> + * We need this for scanning the domain list, both in
> + * cpu_disable_scheduler(), and at the bottom of this function.
> + */
> + rcu_read_lock(&domlist_read_lock);
> ret = cpu_disable_scheduler(cpu);
> cpumask_set_cpu(cpu, &cpupool_free_cpus);
> if ( !ret )
> @@ -319,6 +332,11 @@ static long cpupool_unassign_cpu_helper(void *info)
> cpupool_cpu_moving = NULL;
> }
>
> + for_each_domain_in_cpupool(d, c)
> + {
> + domain_update_node_affinity(d);
> + }
> + rcu_read_unlock(&domlist_read_lock);
> out:
> spin_unlock(&cpupool_lock);
> cpupool_dprintk("cpupool_unassign_cpu ret=%ld\n", ret);
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index eac8804..a1840c9 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -652,6 +652,12 @@ int cpu_disable_scheduler(unsigned int cpu)
> if ( c == NULL )
> return ret;
>
> + /*
> + * We'd need the domain RCU lock, but:
> + * - when we are called from cpupool code, it's acquired there already;
> + * - when we are called for CPU teardown, we're in stop-machine context,
> > + * so that's not a problem.
> + */
> for_each_domain_in_cpupool ( d, c )
> {
> for_each_vcpu ( d, v )
> @@ -741,7 +747,6 @@ int cpu_disable_scheduler(unsigned int cpu)
> ret = -EAGAIN;
> }
> }
> - domain_update_node_affinity(d);
> }
>
> return ret;
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-08 16:01 ` George Dunlap
@ 2015-07-08 16:37 ` Dario Faggioli
2015-07-09 10:33 ` George Dunlap
0 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2015-07-08 16:37 UTC (permalink / raw)
To: George Dunlap; +Cc: Juergen Gross, xen-devel
On Wed, 2015-07-08 at 17:01 +0100, George Dunlap wrote:
> On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
> > * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
> > * CPUs need to have been taken already when calling this!
> > */
> > -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> > - unsigned int new_cpu)
> > +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
> > + unsigned int new_cpu)
> > {
> > /*
> > * Transfer urgency status to new CPU before switching CPUs, as
> > @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
> > v->processor = new_cpu;
> > }
> >
> > +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
>
> Not quite a fan of the naming scheme here. Perhaps vcpu_move and
> vcpu_move_locked?
>
I'm fine with that.
> Actually -- looking at this again, is there a reason to pass old_cpu
> into _vcpu_move? It looks like old_cpu should always be equal to
> v->processor. That would make the function prototypes the same.
>
It should yes, I think I can get rid of it.
> > +{
> > + unsigned long flags;
> > + unsigned int cpu = v->processor;
> > + spinlock_t *lock, *new_lock;
> > +
> > + /*
> > + * When here, the vcpu should be ready for being moved. This means:
> > + * - both its original and target processor must be quiet;
> > + * - it must not be marked as currently running;
> > + * - the proper flag must be set (i.e., no one must have had any
> > + * chance to reset it).
> > + */
> > + ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
> > + is_idle_vcpu(curr_on_cpu(new_cpu)));
> > + ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
> > +
> > + lock = per_cpu(schedule_data, v->processor).schedule_lock;
> > + new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
> > +
> > + sched_spin_lock_double(lock, new_lock, &flags);
> > + ASSERT(new_cpu != v->processor);
>
> Don't we need to do the "schedule lock dance" here as well --
> double-check to make sure that v->processor hasn't changed since the
> time we grabbed the lock?
>
This is intended to be called pretty much only from the place where it's
called, i.e., during system teardown, when everything is already quite
quiet.
So, no, I don't think we need that, but I probably could have made this
_a_lot_ more clear both with comments and ASSERT()-s. Would that be ok?
> > @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
> > cpumask_setall(v->cpu_hard_affinity);
> > }
> >
> > - if ( v->processor == cpu )
> > + if ( v->processor != cpu )
> > {
> > - set_bit(_VPF_migrating, &v->pause_flags);
> > + /* This vcpu is not on cpu, so we can move on. */
> > vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > - vcpu_sleep_nosync(v);
> > - vcpu_migrate(v);
> > + continue;
> > }
> > - else
> > - vcpu_schedule_unlock_irqrestore(lock, flags, v);
> >
> > /*
> > - * A vcpu active in the hypervisor will not be migratable.
> > - * The caller should try again after releasing and reaquiring
> > - * all locks.
> > + * If we're here, it means that the vcpu is on cpu. Let's see how
> > + * it's best to send it away, depending on whether we are shutting
> > + * down/suspending, or doing cpupool manipulations.
> > */
> > - if ( v->processor == cpu )
> > - ret = -EAGAIN;
> > - }
> > + set_bit(_VPF_migrating, &v->pause_flags);
> > + vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > + vcpu_sleep_nosync(v);
>
> I don't quite understand this. By calling _nosync(), you're not
> guaranteed that the vcpu has actually stopped running when you call
> vcpu_move() down below; and yet inside vcpu_move(), you assert
> !v->is_running.
>
> So either at this point, when doing suspend, the vcpu has already been
> paused; in which case this is unnecessary; or it hasn't been paused,
> in which case we're potentially racing the IPI / deschedule, and will
> trip over the ASSERT if we "win".
>
The former: if we're suspending, at this stage, everything is paused
already. My aim was to minimize the code being special cased based on
system_state, and I left this out because it is required for the
!SYS_STATE_suspend case, and does no harm in the SYS_STATE_suspend
case.
However, I see how you find it confusing, and agree it could trip people
over. I'll restructure the code in such a way that we go through this
only in the non-suspending case (and change vcpu_move() accordingly).
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-08 15:13 ` Dario Faggioli
@ 2015-07-09 10:24 ` Dario Faggioli
2015-07-09 10:45 ` Juergen Gross
0 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2015-07-09 10:24 UTC (permalink / raw)
To: Juergen Gross; +Cc: George Dunlap, xen-devel
Hey,
1 thing...
On Wed, 2015-07-08 at 17:13 +0200, Dario Faggioli wrote:
> On Tue, 2015-07-07 at 13:16 +0200, Juergen Gross wrote:
> > > @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
> > > cpumask_setall(v->cpu_hard_affinity);
> > > }
> > >
> > > - if ( v->processor == cpu )
> > > + if ( v->processor != cpu )
> > > {
> > > - set_bit(_VPF_migrating, &v->pause_flags);
> > > + /* This vcpu is not on cpu, so we can move on. */
> > > vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > > - vcpu_sleep_nosync(v);
> > > - vcpu_migrate(v);
> > > + continue;
> > > }
> > > - else
> > > - vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > >
> > > /*
> > > - * A vcpu active in the hypervisor will not be migratable.
> > > - * The caller should try again after releasing and reaquiring
> > > - * all locks.
> > > + * If we're here, it means that the vcpu is on cpu. Let's see how
> > > + * it's best to send it away, depending on whether we are shutting
> > > + * down/suspending, or doing cpupool manipulations.
> > > */
> > > - if ( v->processor == cpu )
> > > - ret = -EAGAIN;
> > > - }
> > > + set_bit(_VPF_migrating, &v->pause_flags);
> > > + vcpu_schedule_unlock_irqrestore(lock, flags, v);
> > > + vcpu_sleep_nosync(v);
> > > +
> > > + /*
> > > + * In case of shutdown/suspend, it is not necessary to ask the
> > > + * scheduler to chime in. In fact:
> > > + * * there is no reason for it: the end result we are after is
> > > + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> > > + * else', so let's just go for it;
> > > + * * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> > > + * as the scheduler will always fail to send the vcpus away
> > > + * from the last online (non boot) pcpu!
> >
> > I'd add a comment that in shutdown/suspend case all domains are being
> > paused, so we can be active in dom0/Pool-0 only.
> >
> Sure, I'll add this.
>
...while putting such a comment together, I'm realizing that I'm not
sure about what you meant, or what you wanted the comment itself to
express.
I mean, it is certainly true that all domains are being paused (they've
been paused already, actually), but that includes Dom0 too. Also, we are
in Xen, in stop_machine context, so I'm not sure what you meant either
with "we can be active in dom0/Pool-0 only".
So, I'm adding a line about things being paused. If you think I should
say anything more than that, let me know.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-08 16:37 ` Dario Faggioli
@ 2015-07-09 10:33 ` George Dunlap
0 siblings, 0 replies; 19+ messages in thread
From: George Dunlap @ 2015-07-09 10:33 UTC (permalink / raw)
To: Dario Faggioli; +Cc: Juergen Gross, xen-devel
On 07/08/2015 05:37 PM, Dario Faggioli wrote:
> On Wed, 2015-07-08 at 17:01 +0100, George Dunlap wrote:
>> On Fri, Jul 3, 2015 at 4:49 PM, Dario Faggioli
>
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -455,8 +455,8 @@ void vcpu_unblock(struct vcpu *v)
>>> * Do the actual movement of a vcpu from old to new CPU. Locks for *both*
>>> * CPUs need to have been taken already when calling this!
>>> */
>>> -static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
>>> - unsigned int new_cpu)
>>> +static void _vcpu_move(struct vcpu *v, unsigned int old_cpu,
>>> + unsigned int new_cpu)
>>> {
>>> /*
>>> * Transfer urgency status to new CPU before switching CPUs, as
>>> @@ -479,6 +479,35 @@ static void vcpu_move(struct vcpu *v, unsigned int old_cpu,
>>> v->processor = new_cpu;
>>> }
>>>
>>> +static void vcpu_move(struct vcpu *v, unsigned int new_cpu)
>>
>> Not quite a fan of the naming scheme here. Perhaps vcpu_move and
>> vcpu_move_locked?
>>
> I'm fine with that.
>
>> Actually -- looking at this again, is there a reason to pass old_cpu
>> into _vcpu_move? It looks like old_cpu should always be equal to
>> v->processor. That would make the function prototypes the same.
>>
> It should yes, I think I can get rid of it.
>
>>> +{
>>> + unsigned long flags;
>>> + unsigned int cpu = v->processor;
>>> + spinlock_t *lock, *new_lock;
>>> +
>>> + /*
>>> + * When here, the vcpu should be ready for being moved. This means:
>>> + * - both its original and target processor must be quiet;
>>> + * - it must not be marked as currently running;
>>> + * - the proper flag must be set (i.e., no one must have had any
>>> + * chance to reset it).
>>> + */
>>> + ASSERT(is_idle_vcpu(curr_on_cpu(cpu)) &&
>>> + is_idle_vcpu(curr_on_cpu(new_cpu)));
>>> + ASSERT(!v->is_running && test_bit(_VPF_migrating, &v->pause_flags));
>>> +
>>> + lock = per_cpu(schedule_data, v->processor).schedule_lock;
>>> + new_lock = per_cpu(schedule_data, new_cpu).schedule_lock;
>>> +
>>> + sched_spin_lock_double(lock, new_lock, &flags);
>>> + ASSERT(new_cpu != v->processor);
>>
>> Don't we need to do the "schedule lock dance" here as well --
>> double-check to make sure that v->processor hasn't changed since the
>> time we grabbed the lock?
>>
> This is intended to be called pretty much only from the place where it's
> called, i.e., during system teardown, when everything is already quite
> quiet.
>
> So, no, I don't think we need that, but I probably could have made this
> _a_lot_ more clear both with comments and ASSERT()-s. Would that be ok?
Yes, a comment and an assert are important. We might give it a more
descriptive name too -- vcpu_move_teardown()? I can't immediately think
of something that I like though. :-/
Anyway, I guess use your best judgement on the name.
>
>>> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
>>> cpumask_setall(v->cpu_hard_affinity);
>>> }
>>>
>>> - if ( v->processor == cpu )
>>> + if ( v->processor != cpu )
>>> {
>>> - set_bit(_VPF_migrating, &v->pause_flags);
>>> + /* This vcpu is not on cpu, so we can move on. */
>>> vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>> - vcpu_sleep_nosync(v);
>>> - vcpu_migrate(v);
>>> + continue;
>>> }
>>> - else
>>> - vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>>
>>> /*
>>> - * A vcpu active in the hypervisor will not be migratable.
>>> - * The caller should try again after releasing and reaquiring
>>> - * all locks.
>>> + * If we're here, it means that the vcpu is on cpu. Let's see how
>>> + * it's best to send it away, depending on whether we are shutting
>>> + * down/suspending, or doing cpupool manipulations.
>>> */
>>> - if ( v->processor == cpu )
>>> - ret = -EAGAIN;
>>> - }
>>> + set_bit(_VPF_migrating, &v->pause_flags);
>>> + vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>> + vcpu_sleep_nosync(v);
>>
>> I don't quite understand this. By calling _nosync(), you're not
>> guaranteed that the vcpu has actually stopped running when you call
>> vcpu_move() down below; and yet inside vcpu_move(), you assert
>> !v->is_running.
>>
>> So either at this point, when doing suspend, the vcpu has already been
>> paused; in which case this is unnecessary; or it hasn't been paused,
>> in which case we're potentially racing the IPI / deschedule, and will
>> trip over the ASSERT if we "win".
>>
> The former: if we're suspending, at this stage, everything is paused
> already. My aim was to minimize the code being special cased based on
> system_state, and I left this out because it is required for the
> !SYS_STATE_suspend case, and does no harm in the SYS_STATE_suspend
> case.
>
> However, I see how you find it confusing, and agree it could trip people
> over. I'll restructure the code in such a way that we go through this
> only in the non-suspending case (and change vcpu_move() accordingly).
Great, thanks.
-George
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-09 10:24 ` Dario Faggioli
@ 2015-07-09 10:45 ` Juergen Gross
2015-07-15 14:54 ` Dario Faggioli
0 siblings, 1 reply; 19+ messages in thread
From: Juergen Gross @ 2015-07-09 10:45 UTC (permalink / raw)
To: Dario Faggioli; +Cc: George Dunlap, xen-devel
On 07/09/2015 12:24 PM, Dario Faggioli wrote:
> Hey,
>
> 1 thing...
>
> On Wed, 2015-07-08 at 17:13 +0200, Dario Faggioli wrote:
>> On Tue, 2015-07-07 at 13:16 +0200, Juergen Gross wrote:
>
>>>> @@ -645,25 +675,72 @@ int cpu_disable_scheduler(unsigned int cpu)
>>>> cpumask_setall(v->cpu_hard_affinity);
>>>> }
>>>>
>>>> - if ( v->processor == cpu )
>>>> + if ( v->processor != cpu )
>>>> {
>>>> - set_bit(_VPF_migrating, &v->pause_flags);
>>>> + /* This vcpu is not on cpu, so we can move on. */
>>>> vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>>> - vcpu_sleep_nosync(v);
>>>> - vcpu_migrate(v);
>>>> + continue;
>>>> }
>>>> - else
>>>> - vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>>>
>>>> /*
>>>> - * A vcpu active in the hypervisor will not be migratable.
>>>> - * The caller should try again after releasing and reaquiring
>>>> - * all locks.
>>>> + * If we're here, it means that the vcpu is on cpu. Let's see how
>>>> + * it's best to send it away, depending on whether we are shutting
>>>> + * down/suspending, or doing cpupool manipulations.
>>>> */
>>>> - if ( v->processor == cpu )
>>>> - ret = -EAGAIN;
>>>> - }
>>>> + set_bit(_VPF_migrating, &v->pause_flags);
>>>> + vcpu_schedule_unlock_irqrestore(lock, flags, v);
>>>> + vcpu_sleep_nosync(v);
>>>> +
>>>> + /*
>>>> + * In case of shutdown/suspend, it is not necessary to ask the
>>>> + * scheduler to chime in. In fact:
>>>> + * * there is no reason for it: the end result we are after is
>>>> + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
>>>> + * else', so let's just go for it;
> >>>> + * * it's wrong, when dealing with a cpupool with only non-boot pcpus,
>>>> + * as the scheduler will always fail to send the vcpus away
>>>> + * from the last online (non boot) pcpu!
>>>
>>> I'd add a comment that in shutdown/suspend case all domains are being
>>> paused, so we can be active in dom0/Pool-0 only.
>>>
>> Sure, I'll add this.
>>
> ...while putting such a comment together, I'm realizing that I'm not
> sure about what you meant, or what you wanted the comment itself to
> express.
>
> I mean, it is certainly true that all domains are being paused (they've
> been paused already, actually), but that includes Dom0 too. Also, we are
> in Xen, in stop_machine context, so I'm not sure what you meant either
> with "we can be active in dom0/Pool-0 only".
We are running on the vcpu which issued the hypercall resulting in
pausing the domains. A vcpu can't pause itself.
> So, I'm adding a line about things being paused. If you think I should
> say anything more than that, let me know.
I think dom0/Pool-0 should be mentioned. The code is written with this
assumption, so it should be documented.
Juergen
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-09 10:45 ` Juergen Gross
@ 2015-07-15 14:54 ` Dario Faggioli
2015-07-15 15:07 ` Juergen Gross
0 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2015-07-15 14:54 UTC (permalink / raw)
To: Juergen Gross; +Cc: George Dunlap, xen-devel
On Thu, 2015-07-09 at 12:45 +0200, Juergen Gross wrote:
> On 07/09/2015 12:24 PM, Dario Faggioli wrote:
> > On Wed, 2015-07-08 at 17:13 +0200, Dario Faggioli wrote:
> >> On Tue, 2015-07-07 at 13:16 +0200, Juergen Gross wrote:
> >>>> + /*
> >>>> + * In case of shutdown/suspend, it is not necessary to ask the
> >>>> + * scheduler to chime in. In fact:
> >>>> + * * there is no reason for it: the end result we are after is
> >>>> + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
> >>>> + * else', so let's just go for it;
> >>>> + * * it's wrong, when dealing with a cpupool with only non-boot pcpus,
> >>>> + * as the scheduler will always fail to send the vcpus away
> >>>> + * from the last online (non boot) pcpu!
> >>>
> >>> I'd add a comment that in shutdown/suspend case all domains are being
> >>> paused, so we can be active in dom0/Pool-0 only.
> >>>
> >> Sure, I'll add this.
> >>
> > ...while putting such a comment together, I'm realizing that I'm not
> > sure about what you meant, or what you wanted the comment itself to
> > express.
> >
> > I mean, it is certainly true that all domains are being paused (they've
> > been paused already, actually), but that include Dom0 too. Also, we are
> > in Xen, in stop_machine context, so I'm not sure what you meant either
> > with "we can be active in dom0/Pool-0 only".
>
> We are running on the vcpu which issued the hypercall resulting in
> pausing the domains. A vcpu can't pause itself.
>
Hey, sorry it took me a bit to reply.
Actually, I'm still not getting the "we are running on the vcpu" part.
Perhaps it's all a terminology issue that we have, and it may not be
that much worthwhile to spend a lot of time on it.
However, the hypercall to suspend/shutdown the system has indeed been
issued by a dom0 vcpu. The call chain is this:
XENPF_enter_acpi_sleep:
acpi_enter_sleep()
continue_hypercall_on_cpu(0, enter_state_helper)
enter_state()
disable_nonboot_cpus()
for_each_online_cpu { cpu_down() }
stop_machine_run(take_cpu_down)
take_cpu_down()
__cpu_disable()
cpu_disable_scheduler()
The doc comment of acpi_enter_sleep() is as follows:
/*
* Dom0 issues this hypercall in place of writing pm1a_cnt. Xen then
* takes over the control and put the system into sleep state really.
*/
The pausing happens from within enter_state(), before calling
disable_nonboot_cpus(), in freeze_domains(). freeze_domains() does this:
/*
* Note that we iterate in order of domain-id. Hence we will pause dom0
* first which is required for correctness (as only dom0 can add domains to
* the domain list). Otherwise we could miss concurrently-created domains.
*/
for_each_domain ( d )
domain_pause(d);
Looking inside domain_pause(), I found this ASSERT:
ASSERT(d != current->domain);
meaning that, if one of the dom0 vcpus were running when we reach this
point, we would explode, wouldn't we?
I've added some quick-&-dirty printk in freeze_domains(), checking
what's current, and here's what they say:
(XEN) XXX freezing d0 from cpu 0, running d32767.v0
(XEN) XXX pausing d0.v0, while running d32767.v0
(XEN) XXX pausing d0.v1, while running d32767.v0
(XEN) XXX pausing d0.v2, while running d32767.v0
(XEN) XXX pausing d0.v3, while running d32767.v0
(XEN) XXX pausing d0.v4, while running d32767.v0
(XEN) XXX pausing d0.v5, while running d32767.v0
(XEN) XXX pausing d0.v6, while running d32767.v0
(XEN) XXX pausing d0.v7, while running d32767.v0
So, I'm probably missing what you actually mean with "we can be active
in dom0/Pool-0 only" and "We are running on the vcpu which issued the
hypercall resulting in pausing the domains".
The way I interpreted it was that a dom0 vcpu was current on some pcpu,
but that is clearly not the case.
> > So, I'm adding a line about things being paused. If you think I should
> > say anything more than that, let me know.
>
> I think dom0/Pool-0 should be mentioned. The coding is written with this
> assumption so it should be documented.
>
Yeah, but, given all the above, I'm not sure how...
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler()
2015-07-15 14:54 ` Dario Faggioli
@ 2015-07-15 15:07 ` Juergen Gross
0 siblings, 0 replies; 19+ messages in thread
From: Juergen Gross @ 2015-07-15 15:07 UTC (permalink / raw)
To: Dario Faggioli; +Cc: George Dunlap, xen-devel
On 07/15/2015 04:54 PM, Dario Faggioli wrote:
> On Thu, 2015-07-09 at 12:45 +0200, Juergen Gross wrote:
>> On 07/09/2015 12:24 PM, Dario Faggioli wrote:
>>> On Wed, 2015-07-08 at 17:13 +0200, Dario Faggioli wrote:
>>>> On Tue, 2015-07-07 at 13:16 +0200, Juergen Gross wrote:
>>>>>> + /*
>>>>>> + * In case of shutdown/suspend, it is not necessary to ask the
>>>>>> + * scheduler to chime in. In fact:
>>>>>> + * * there is no reason for it: the end result we are after is
>>>>>> + * just 'all the vcpus on the boot pcpu, and no vcpu anywhere
>>>>>> + * else', so let's just go for it;
>>>>>> + * * it's wrong, when dealing with a cpupool with only non-boot pcpus,
>>>>>> + * as the scheduler will always fail to send the vcpus away
>>>>>> + * from the last online (non boot) pcpu!
>>>>>
>>>>> I'd add a comment that in shutdown/suspend case all domains are being
>>>>> paused, so we can be active in dom0/Pool-0 only.
>>>>>
>>>> Sure, I'll add this.
>>>>
>>> ...while putting such a comment together, I'm realizing that I'm not
>>> sure about what you meant, or what you wanted the comment itself to
>>> express.
>>>
>>> I mean, it is certainly true that all domains are being paused (they've
>>> been paused already, actually), but that includes Dom0 too. Also, we are
>>> in Xen, in stop_machine context, so I'm not sure what you meant either
>>> with "we can be active in dom0/Pool-0 only".
>>
>> We are running on the vcpu which issued the hypercall resulting in
>> pausing the domains. A vcpu can't pause itself.
>>
> Hey, sorry it took me a bit to reply.
>
> Actually, I'm still not getting the "we are running on the vcpu" part.
> Perhaps it's all a terminology issue that we have, and it may not be
> that much worthwhile to spend a lot of time on it.
>
> However, the hypercall to suspend/shutdown the system has indeed been
> issued by a dom0 vcpu. The call chain is this:
>
> XENPF_enter_acpi_sleep:
> acpi_enter_sleep()
> continue_hypercall_on_cpu(0, enter_state_helper)
Interesting. I managed to miss this one.
I'm rather sure this was handled differently when I initially wrote the
cpupool stuff. So please forget my comment about Pool-0/dom0; it was
from memory and is either no longer correct or never was.
Juergen
end of thread, other threads:[~2015-07-15 15:07 UTC | newest]
Thread overview: 19+ messages
2015-07-03 15:49 [PATCH 0/4] xen: sched/cpupool: more fixing of (corner?) cases Dario Faggioli
2015-07-03 15:49 ` [PATCH 1/4] xen: sched: factor the code for taking two runq locks in a function Dario Faggioli
2015-07-08 14:47 ` George Dunlap
2015-07-03 15:49 ` [PATCH 2/4] xen: sched: factor code that moves a vcpu to a new pcpu " Dario Faggioli
2015-07-08 14:56 ` George Dunlap
2015-07-08 15:09 ` Dario Faggioli
2015-07-03 15:49 ` [PATCH 3/4] xen: sched: reorganize cpu_disable_scheduler() Dario Faggioli
2015-07-07 11:16 ` Juergen Gross
2015-07-08 15:13 ` Dario Faggioli
2015-07-09 10:24 ` Dario Faggioli
2015-07-09 10:45 ` Juergen Gross
2015-07-15 14:54 ` Dario Faggioli
2015-07-15 15:07 ` Juergen Gross
2015-07-08 16:01 ` George Dunlap
2015-07-08 16:37 ` Dario Faggioli
2015-07-09 10:33 ` George Dunlap
2015-07-03 15:49 ` [PATCH 4/4] xen: sched/cpupool: properly update affinity when removing a cpu from a cpupool Dario Faggioli
2015-07-07 11:30 ` Juergen Gross
2015-07-08 16:19 ` George Dunlap