xen-devel.lists.xenproject.org archive mirror
* [PATCH] fix locking in cpu_disable_scheduler()
@ 2013-10-28 16:05 Jan Beulich
  2013-10-28 15:24 ` Keir Fraser
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jan Beulich @ 2013-10-28 16:05 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Keir Fraser


Commit eedd6039 ("scheduler: adjust internal locking interface") - by
now using proper spin lock constructs - uncovered a pre-existing bug:
when bringing down a CPU, cpu_disable_scheduler() gets called with
interrupts disabled, and hence the use of vcpu_schedule_lock_irq() was
never really correct (i.e. the caller ended up with interrupts enabled
despite having disabled them explicitly).

Fixing this however surfaced another problem: The call path
vcpu_migrate() -> evtchn_move_pirqs() wants to acquire the event lock,
which however is a non-IRQ-safe one, and hence check_lock() doesn't
like this lock to be acquired when interrupts are already off. As we're
in stop-machine context here, getting things wrong wrt interrupt state
management during lock acquire/release is out of the question though,
so the simple solution appears to be to just suppress spin lock
debugging while the stop machine callback runs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -601,7 +601,8 @@ int cpu_disable_scheduler(unsigned int c
     {
         for_each_vcpu ( d, v )
         {
-            spinlock_t *lock = vcpu_schedule_lock_irq(v);
+            unsigned long flags;
+            spinlock_t *lock = vcpu_schedule_lock_irqsave(v, &flags);
 
             cpumask_and(&online_affinity, v->cpu_affinity, c->cpu_valid);
             if ( cpumask_empty(&online_affinity) &&
@@ -622,14 +623,12 @@ int cpu_disable_scheduler(unsigned int c
             if ( v->processor == cpu )
             {
                 set_bit(_VPF_migrating, &v->pause_flags);
-                vcpu_schedule_unlock_irq(lock, v);
+                vcpu_schedule_unlock_irqrestore(lock, flags, v);
                 vcpu_sleep_nosync(v);
                 vcpu_migrate(v);
             }
             else
-            {
-                vcpu_schedule_unlock_irq(lock, v);
-            }
+                vcpu_schedule_unlock_irqrestore(lock, flags, v);
 
             /*
              * A vcpu active in the hypervisor will not be migratable.
--- a/xen/common/stop_machine.c
+++ b/xen/common/stop_machine.c
@@ -110,6 +110,7 @@ int stop_machine_run(int (*fn)(void *), 
     local_irq_disable();
     stopmachine_set_state(STOPMACHINE_DISABLE_IRQ);
     stopmachine_wait_state();
+    spin_debug_disable();
 
     stopmachine_set_state(STOPMACHINE_INVOKE);
     if ( (cpu == smp_processor_id()) || (cpu == NR_CPUS) )
@@ -117,6 +118,7 @@ int stop_machine_run(int (*fn)(void *), 
     stopmachine_wait_state();
     ret = stopmachine_data.fn_result;
 
+    spin_debug_enable();
     stopmachine_set_state(STOPMACHINE_EXIT);
     stopmachine_wait_state();
     local_irq_enable();




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Thread overview: 5+ messages
2013-10-28 16:05 [PATCH] fix locking in cpu_disable_scheduler() Jan Beulich
2013-10-28 15:24 ` Keir Fraser
2013-10-28 16:24 ` Andrew Cooper
2013-10-29  7:39   ` Jan Beulich
2013-10-29 11:30 ` George Dunlap
