xen-devel.lists.xenproject.org archive mirror
* [PATCH] fix locking in cpu_disable_scheduler()
@ 2013-10-28 16:05 Jan Beulich
  2013-10-28 15:24 ` Keir Fraser
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jan Beulich @ 2013-10-28 16:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser


Commit eedd6039 ("scheduler: adjust internal locking interface") - by
now using proper spin lock constructs - uncovered a pre-existing bug:
when bringing down a CPU, cpu_disable_scheduler() gets called with
interrupts disabled, and hence the use of vcpu_schedule_lock_irq() was
never really correct (i.e. the caller ended up with interrupts enabled
despite having disabled them explicitly).

Fixing this however surfaced another problem: The call path
vcpu_migrate() -> evtchn_move_pirqs() wants to acquire the event lock,
which however is a non-IRQ-safe one, and hence check_lock() doesn't
like this lock to be acquired when interrupts are already off. As we're
in stop-machine context here, getting things wrong wrt interrupt state
management during lock acquire/release is out of the question though,
so the simple solution appears to be to just suppress spin lock
debugging while the stop machine callback runs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -601,7 +601,8 @@ int cpu_disable_scheduler(unsigned int c
     {
         for_each_vcpu ( d, v )
         {
-            spinlock_t *lock = vcpu_schedule_lock_irq(v);
+            unsigned long flags;
+            spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
 
             cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
@@ -622,14 +623,12 @@ int cpu_disable_scheduler(unsigned int c
             if ( v->processor == cpu )
             {
                 set_bit(_VPF_migrating, &v->pause_flags);
-                vcpu_schedule_unlock_irq(lock, v);
+                vcpu_schedule_unlock_irqrestore(lock, flags, v);
                 vcpu_sleep_nosync(v);
                 vcpu_migrate(v);
             }
             else
-            {
-                vcpu_schedule_unlock_irq(lock, v);
-            }
+                vcpu_schedule_unlock_irqrestore(lock, flags, v);
 
             /*
              * A vcpu active in the hypervisor will not be migratable.
--- a/xen/common/stop_machine.c
+++ b/xen/common/stop_machine.c
@@ -110,6 +110,7 @@ int stop_machine_run(int (*fn)(void *), 
     local_irq_disable();
     stopmachine_set_state(STOPMACHINE_DISABLE_IRQ);
     stopmachine_wait_state();
+    spin_debug_disable();
 
     stopmachine_set_state(STOPMACHINE_INVOKE);
     if ( (cpu == smp_processor_id()) || (cpu == NR_CPUS) )
@@ -117,6 +118,7 @@ int stop_machine_run(int (*fn)(void *), 
     stopmachine_wait_state();
     ret = stopmachine_data.fn_result;
 
+    spin_debug_enable();
     stopmachine_set_state(STOPMACHINE_EXIT);
     stopmachine_wait_state();
     local_irq_enable();




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Thread overview: 5+ messages
2013-10-28 16:05 [PATCH] fix locking in cpu_disable_scheduler() Jan Beulich
2013-10-28 15:24 ` Keir Fraser
2013-10-28 16:24 ` Andrew Cooper
2013-10-29  7:39   ` Jan Beulich
2013-10-29 11:30 ` George Dunlap
