xen-devel.lists.xenproject.org archive mirror
* [PATCH] xen: credit1: fix a race when picking initial pCPU for a vCPU
@ 2016-08-12  4:07 Dario Faggioli
From: Dario Faggioli @ 2016-08-12  4:07 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

In the Credit1 hunk of 9f358ddd69463 ("xen: Have
schedulers revise initial placement") csched_cpu_pick()
is called without taking the runqueue lock of the
(temporary) pCPU that the vCPU has been assigned to
(e.g., in XEN_DOMCTL_max_vcpus).

However, although 'hidden' in the IS_RUNQ_IDLE() macro,
that function does access the runq (for doing load
balancing calculations), and hence the appropriate lock
must be taken.

Races have been observed, in the form of IS_RUNQ_IDLE()
falling over LIST_POISON.
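
For illustration only, the locking pattern the patch applies can be modelled in plain, single-threaded user-space C. Everything here is hypothetical: `struct toy_lock`, `toy_runqs[]`, `toy_cpu_idle[]`, `toy_cpu_pick()` and `toy_vcpu_insert()` stand in for the per-pCPU scheduler lock, the Credit1 runqueues, idle-pCPU tracking, csched_cpu_pick() and csched_vcpu_insert(), and the picking policy is invented. What the sketch does mirror is why the lock is taken twice: the lock protecting a vCPU is looked up from vc->processor, and picking can change that field, so the first lock is dropped and the lock of the newly chosen pCPU is then taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a per-pCPU scheduler spinlock. */
struct toy_lock {
    atomic_bool held;
};

static void toy_spin_lock(struct toy_lock *l)
{
    bool expected = false;

    /* Spin until we are the one flipping 'held' from false to true. */
    while ( !atomic_compare_exchange_weak(&l->held, &expected, true) )
        expected = false;
}

static void toy_spin_unlock(struct toy_lock *l)
{
    atomic_store(&l->held, false);
}

static bool toy_spin_is_locked(struct toy_lock *l)
{
    return atomic_load(&l->held);
}

/* Hypothetical per-pCPU runqueue, reduced to a lock and a length. */
struct toy_runq {
    struct toy_lock lock;
    unsigned int nr_queued;
};

static struct toy_runq toy_runqs[TOY_NR_CPUS];
/* Hypothetical idle map, read without taking any runq lock. */
static atomic_bool toy_cpu_idle[TOY_NR_CPUS];

struct toy_vcpu {
    unsigned int processor;   /* plays the role of vc->processor */
};

/*
 * Stand-in for csched_cpu_pick(): it peeks at the runq of the vCPU's
 * current pCPU, so the caller must hold that pCPU's lock. The policy
 * (stay if our runq is empty, else take the first idle pCPU) is made up.
 */
static unsigned int toy_cpu_pick(const struct toy_vcpu *v)
{
    unsigned int cpu = v->processor;

    assert(toy_spin_is_locked(&toy_runqs[cpu].lock));
    if ( toy_runqs[cpu].nr_queued == 0 )
        return cpu;

    for ( unsigned int c = 0; c < TOY_NR_CPUS; c++ )
        if ( atomic_load(&toy_cpu_idle[c]) )
            return c;
    return cpu;
}

/* Stand-in for csched_vcpu_insert(), using the patch's locking pattern. */
static void toy_vcpu_insert(struct toy_vcpu *v)
{
    /* Lock the runq the picker looks at: the *current* pCPU's one. */
    struct toy_lock *lock = &toy_runqs[v->processor].lock;

    toy_spin_lock(lock);
    v->processor = toy_cpu_pick(v);
    toy_spin_unlock(lock);

    /* v->processor may have changed, so take the new pCPU's lock. */
    lock = &toy_runqs[v->processor].lock;
    toy_spin_lock(lock);
    toy_runqs[v->processor].nr_queued++;
    toy_spin_unlock(lock);
}
```

For example, with pCPU 0's runq non-empty and pCPU 2 flagged idle, inserting a vCPU that starts on pCPU 0 moves it to pCPU 2 and leaves both locks released.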

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
---
 xen/common/sched_credit.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 220ff0d..b6f82e8 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -998,9 +998,13 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc)
 
     BUG_ON( is_idle_vcpu(vc) );
 
-    /* This is safe because vc isn't yet being scheduled */
+    /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */
+    lock = vcpu_schedule_lock_irq(vc);
+
     vc->processor = csched_cpu_pick(ops, vc);
 
+    spin_unlock_irq(lock);
+
     lock = vcpu_schedule_lock_irq(vc);
 
     if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

* Re: [PATCH] xen: credit1: fix a race when picking initial pCPU for a vCPU
@ 2016-08-12  9:14 ` George Dunlap
From: George Dunlap @ 2016-08-12  9:14 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

On 12/08/16 05:07, Dario Faggioli wrote:
> In the Credit1 hunk of 9f358ddd69463 ("xen: Have
> schedulers revise initial placement") csched_cpu_pick()
> is called without taking the runqueue lock of the
> (temporary) pCPU that the vCPU has been assigned to
> (e.g., in XEN_DOMCTL_max_vcpus).
> 
> However, although 'hidden' in the IS_RUNQ_IDLE() macro,
> that function does access the runq (for doing load
> balancing calculations), and hence the appropriate lock
> must be taken.
> 
> Races have been observed, in the form of IS_RUNQ_IDLE()
> falling over LIST_POISON.
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

It might be nice if we could add an ASSERT() that the appropriate
runqueue was locked, to make sure we don't get caught out again like
this in the future, but I think that would probably require turning it
into a static inline (which probably wouldn't be so bad anyway).
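
A minimal user-space sketch of what such a change could look like, with hypothetical names throughout (`toy_runqs[]`, `TOY_IS_RUNQ_IDLE()`, `toy_is_runq_idle()`); it is not the Credit1 code, and the idle check is simplified to an empty-list test. The lock query is an assumption modelled on spin_is_locked(): it only reports whether the lock is held by anyone, not by the caller, so the assertion catches a missing lock but not a wrong owner.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Hypothetical stand-in for a per-pCPU scheduler spinlock. */
struct toy_lock {
    atomic_bool held;
};

static void toy_spin_lock(struct toy_lock *l)
{
    bool expected = false;

    while ( !atomic_compare_exchange_weak(&l->held, &expected, true) )
        expected = false;
}

static void toy_spin_unlock(struct toy_lock *l)
{
    atomic_store(&l->held, false);
}

/* True if *anyone* holds the lock (not necessarily the caller). */
static bool toy_spin_is_locked(struct toy_lock *l)
{
    return atomic_load(&l->held);
}

/* Minimal circular list head, in the style of struct list_head. */
struct toy_list {
    struct toy_list *next, *prev;
};

struct toy_runq {
    struct toy_lock lock;
    struct toy_list head;
};

static struct toy_runq toy_runqs[TOY_NR_CPUS];

static void toy_runqs_init(void)
{
    for ( unsigned int c = 0; c < TOY_NR_CPUS; c++ )
        toy_runqs[c].head.next = toy_runqs[c].head.prev = &toy_runqs[c].head;
}

/* Macro form: just the expression; nothing checks the caller's locking. */
#define TOY_IS_RUNQ_IDLE(cpu) \
    (toy_runqs[cpu].head.next == &toy_runqs[cpu].head)

/*
 * static inline form: same (simplified) check, but it first asserts that
 * the runq lock is held. assert() compiles away under NDEBUG, much like
 * ASSERT() in non-debug Xen builds.
 */
static inline bool toy_is_runq_idle(unsigned int cpu)
{
    assert(toy_spin_is_locked(&toy_runqs[cpu].lock));
    return toy_runqs[cpu].head.next == &toy_runqs[cpu].head;
}
```

A function also gets its argument type-checked and evaluated exactly once, which is a side benefit of moving away from the macro.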

But in any case:

Acked-by: George Dunlap <george.dunlap@citrix.com>

Let me know if you want me to check this in as-is or if you think you
might send a follow-up patch adding an ASSERT.

 -George


* Re: [PATCH] xen: credit1: fix a race when picking initial pCPU for a vCPU
@ 2016-08-12  9:46   ` Dario Faggioli
From: Dario Faggioli @ 2016-08-12  9:46 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich


On Fri, 2016-08-12 at 10:14 +0100, George Dunlap wrote:
> On 12/08/16 05:07, Dario Faggioli wrote:
> > 
> > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> It might be nice if we could add an ASSERT() that the appropriate
> runqueue was locked, to make sure we don't get caught out again like
> this in the future, but I think that would probably require turning it
> into a static inline (which probably wouldn't be so bad anyway).
> 
Mmm... good point.

> But in any case:
> 
> Acked-by: George Dunlap <george.dunlap@citrix.com>
> 
> Let me know if you want me to check this in as-is or if you think you
> might send a follow-up patch adding an ASSERT.
> 
Yes, I'll send a new patch.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH] xen: credit1: fix a race when picking initial pCPU for a vCPU
@ 2016-08-12 15:17   ` Dario Faggioli
From: Dario Faggioli @ 2016-08-12 15:17 UTC (permalink / raw)
  To: George Dunlap, xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich


On Fri, 2016-08-12 at 10:14 +0100, George Dunlap wrote:
> On 12/08/16 05:07, Dario Faggioli wrote:

> Let me know if you want me to check this in as-is or if you think you
> might send a follow-up patch adding an ASSERT.
> 
Done, and it actually explodes like this:

(XEN) [    4.870128] Xen call trace:
(XEN) [    4.870130]    [<ffff82d080131cba>] spinlock.c#check_lock+0x42/0x46
(XEN) [    4.870133]    [<ffff82d080131db2>] _spin_is_locked+0x11/0x4d
(XEN) [    4.870139]    [<ffff82d080126c2b>] sched_credit.c#_csched_cpu_pick+0x1a9/0x632
(XEN) [    4.870142]    [<ffff82d08012747f>] sched_credit.c#csched_tick+0x1fd/0x385
(XEN) [    4.870146]    [<ffff82d080134a16>] timer.c#execute_timer+0x47/0x62
(XEN) [    4.870148]    [<ffff82d080134b0c>] timer.c#timer_softirq_action+0xdb/0x22c
(XEN) [    4.870151]    [<ffff82d080131487>] softirq.c#__do_softirq+0x7f/0x8a
(XEN) [    4.870153]    [<ffff82d0801314dc>] do_softirq+0x13/0x15
(XEN) [    4.870157]    [<ffff82d080243e01>] entry.o#process_softirqs+0x21/0x30
(XEN) [    4.870159] 
(XEN) [    5.619096] 
(XEN) [    5.621085] ****************************************
(XEN) [    5.626536] Panic on CPU 0:
(XEN) [    5.629826] Xen BUG at spinlock.c:48
(XEN) [    5.633895] ****************************************

And looking at csched_tick(), it is indeed the case that we call
csched_vcpu_acct() **without** holding the runq lock, and that is where
the _csched_cpu_pick() call in the trace above comes from.

It in turn calls things like burn_credits(), accesses current, and does
other stuff that I'm having a bit of a hard time convincing myself is
safe... Although it must be, if there have been no issues after all
these years. :-O

csched_runq_sort(), which csched_tick() calls later, acquires the lock
by itself. And we can't just acquire it in csched_tick() either, because
__csched_vcpu_acct_start() acquires the private lock, and taking that
while holding the runq lock would violate the nesting rule.
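
The ordering constraint can be illustrated with a small user-space sketch. It assumes the rule implied here: when both locks are needed, the scheduler-private lock is taken first, and it must never be acquired while a runq lock is held. All names (`toy_private_lock`, `toy_runq_lock`, the per-thread counter) are hypothetical, and this is not how Xen itself checks lock ordering. Instead of deadlocking, the sketch refuses the out-of-order acquisition, which is what taking the runq lock in csched_tick() and then reaching __csched_vcpu_acct_start() would amount to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Xen spinlock. */
struct toy_lock {
    atomic_bool held;
};

static void toy_spin_lock(struct toy_lock *l)
{
    bool expected = false;

    while ( !atomic_compare_exchange_weak(&l->held, &expected, true) )
        expected = false;
}

static void toy_spin_unlock(struct toy_lock *l)
{
    atomic_store(&l->held, false);
}

static struct toy_lock toy_private_lock;  /* like the scheduler-private lock */
static struct toy_lock toy_runq_lock;     /* like one pCPU's runq lock */

/* Number of runq locks held by the current thread. */
static _Thread_local unsigned int toy_runq_locks_held;

static void toy_runq_lock_acquire(void)
{
    toy_spin_lock(&toy_runq_lock);
    toy_runq_locks_held++;
}

static void toy_runq_lock_release(void)
{
    toy_runq_locks_held--;
    toy_spin_unlock(&toy_runq_lock);
}

/*
 * Assumed rule: private lock first. Refuse (return false) rather than
 * take the private lock while this thread holds a runq lock.
 */
static bool toy_private_lock_acquire(void)
{
    if ( toy_runq_locks_held != 0 )
        return false;
    toy_spin_lock(&toy_private_lock);
    return true;
}

static void toy_private_lock_release(void)
{
    toy_spin_unlock(&toy_private_lock);
}
```

Taking the private lock and then the runq lock succeeds; taking the runq lock first and then asking for the private lock is refused.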

In summary, this is looking more complicated than it seemed, and I'll
have to look at it again on Tuesday (Monday is a public holiday here).

Gosh, how much I hate this scheduler!! :-/

Regards,
Dario
