[PATCH] xen: Send spinlock IPI to all waiters

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] xen: Send spinlock IPI to all waiters
@ 2013-02-15 10:52 Stefan Bader
  2013-02-15 11:10 ` Ian Campbell
  2013-02-15 15:05 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Bader @ 2013-02-15 10:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Jan Beulich, Konrad Rzeszutek Wilk

Hopefully not mis-parsing Jan's last comments on the other thread,
this would be the fix covering things until a better implementation
is done.
This also prevents the hang on older kernels, where it could be re-
produced reliably.

-Stefan

>From 7e042a253b06da96409a0e059744c217f396a17f Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Fri, 15 Feb 2013 09:48:52 +0100
Subject: [PATCH] xen: Send spinlock IPI to all waiters

There is a loophole between Xen's current implementation of
pv-spinlocks and the scheduler. This was triggerable through
a testcase until v3.6 changed the TLB flushing code. The
problem potentially is still there just not observable in the
same way.

What could happen was (is):

1. CPU n tries to schedule task x away and goes into a slow
   wait for the runq lock of CPU n-# (must be one with a lower
   number).
2. CPU n-#, while processing softirqs, tries to balance domains
   and goes into a slow wait for its own runq lock (for updating
   some records). Since this is a spin_lock_irqsave in softirq
   context, interrupts will be re-enabled for the duration of
   the poll_irq hypercall used by Xen.
3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
   an interrupt (e.g. endio) and when processing the interrupt,
   tries to wake up task x. But that is in schedule and still
   on_cpu, so try_to_wake_up goes into a tight loop.
4. The runq lock of CPU n-# gets unlocked, but the message only
   gets sent to the first waiter, which is CPU n-# and that is
   busily stuck.

To avoid this and since the unlocking code has no real sense of
which waiter is best suited to grab the lock, just send the IPI
to all of them. This causes the waiters to return from the hyper-
call (those not interrupted at least) and do active spinlocking.

BugLink: http://bugs.launchpad.net/bugs/1011792

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@vger.kernel.org
---
 arch/x86/xen/spinlock.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 83e866d..f7a080e 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
 		if (per_cpu(lock_spinners, cpu) == xl) {
 			ADD_STATS(released_slow_kicked, 1);
 			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
-			break;
 		}
 	}
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 10:52 [PATCH] xen: Send spinlock IPI to all waiters Stefan Bader
@ 2013-02-15 11:10 ` Ian Campbell
  2013-02-15 11:26   ` Jan Beulich
  2013-02-15 15:05 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 9+ messages in thread
From: Ian Campbell @ 2013-02-15 11:10 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel@lists.xen.org

On Fri, 2013-02-15 at 10:52 +0000, Stefan Bader wrote:
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 83e866d..f7a080e 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>  		if (per_cpu(lock_spinners, cpu) == xl) {
>  			ADD_STATS(released_slow_kicked, 1);
>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -			break;

It would be more efficient to build a mask and use xen_send_IPI_mask().

Ian.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 11:10 ` Ian Campbell
@ 2013-02-15 11:26   ` Jan Beulich
  2013-02-15 11:31     ` Ian Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-02-15 11:26 UTC (permalink / raw)
  To: Stefan Bader, Ian Campbell; +Cc: Konrad Rzeszutek Wilk, xen-devel@lists.xen.org

>>> On 15.02.13 at 12:10, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Fri, 2013-02-15 at 10:52 +0000, Stefan Bader wrote:
>> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
>> index 83e866d..f7a080e 100644
>> --- a/arch/x86/xen/spinlock.c
>> +++ b/arch/x86/xen/spinlock.c
>> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct 
> xen_spinlock *xl)
>>  		if (per_cpu(lock_spinners, cpu) == xl) {
>>  			ADD_STATS(released_slow_kicked, 1);
>>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
>> -			break;
> 
> It would be more efficient to build a mask and use xen_send_IPI_mask().

In order for __xen_send_IPI_mask() to then take the list apart
again and call xen_send_IPI_one()? There's no batching
implemented currently...

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 11:26   ` Jan Beulich
@ 2013-02-15 11:31     ` Ian Campbell
  2013-02-15 15:14       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Campbell @ 2013-02-15 11:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Stefan Bader, xen-devel@lists.xen.org

On Fri, 2013-02-15 at 11:26 +0000, Jan Beulich wrote:
> >>> On 15.02.13 at 12:10, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Fri, 2013-02-15 at 10:52 +0000, Stefan Bader wrote:
> >> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> >> index 83e866d..f7a080e 100644
> >> --- a/arch/x86/xen/spinlock.c
> >> +++ b/arch/x86/xen/spinlock.c
> >> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct 
> > xen_spinlock *xl)
> >>  		if (per_cpu(lock_spinners, cpu) == xl) {
> >>  			ADD_STATS(released_slow_kicked, 1);
> >>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> >> -			break;
> > 
> > It would be more efficient to build a mask and use xen_send_IPI_mask().
> 
> In order for __xen_send_IPI_mask() to then take the list apart
> again and call xen_send_IPI_one()? There's no batching
> implemented currently...

Oh, I simply assumed it must obviously do that!

Ian.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 11:31     ` Ian Campbell
@ 2013-02-15 15:14       ` Konrad Rzeszutek Wilk
  2013-02-15 15:46         ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-02-15 15:14 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Stefan Bader, Jan Beulich, xen-devel@lists.xen.org

On Fri, Feb 15, 2013 at 11:31:23AM +0000, Ian Campbell wrote:
> On Fri, 2013-02-15 at 11:26 +0000, Jan Beulich wrote:
> > >>> On 15.02.13 at 12:10, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > > On Fri, 2013-02-15 at 10:52 +0000, Stefan Bader wrote:
> > >> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> > >> index 83e866d..f7a080e 100644
> > >> --- a/arch/x86/xen/spinlock.c
> > >> +++ b/arch/x86/xen/spinlock.c
> > >> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct 
> > > xen_spinlock *xl)
> > >>  		if (per_cpu(lock_spinners, cpu) == xl) {
> > >>  			ADD_STATS(released_slow_kicked, 1);
> > >>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> > >> -			break;
> > > 
> > > It would be more efficient to build a mask and use xen_send_IPI_mask().
> > 
> > In order for __xen_send_IPI_mask() to then take the list apart
> > again and call xen_send_IPI_one()? There's no batching
> > implemented currently...
> 
> Oh, I simply assumed it must obviously do that!

Perhaps if it was done via a multicall? We could batch then things up. But I recall
you saying that for homogeneous calls it is silly to use multicalls.

> 
> Ian.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 15:14       ` Konrad Rzeszutek Wilk
@ 2013-02-15 15:46         ` Jan Beulich
  2013-02-15 15:59           ` Ian Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2013-02-15 15:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Ian Campbell, Stefan Bader, xen-devel@lists.xen.org

>>> On 15.02.13 at 16:14, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Feb 15, 2013 at 11:31:23AM +0000, Ian Campbell wrote:
>> On Fri, 2013-02-15 at 11:26 +0000, Jan Beulich wrote:
>> > >>> On 15.02.13 at 12:10, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> > > On Fri, 2013-02-15 at 10:52 +0000, Stefan Bader wrote:
>> > >> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
>> > >> index 83e866d..f7a080e 100644
>> > >> --- a/arch/x86/xen/spinlock.c
>> > >> +++ b/arch/x86/xen/spinlock.c
>> > >> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct 
>> > > xen_spinlock *xl)
>> > >>  		if (per_cpu(lock_spinners, cpu) == xl) {
>> > >>  			ADD_STATS(released_slow_kicked, 1);
>> > >>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
>> > >> -			break;
>> > > 
>> > > It would be more efficient to build a mask and use xen_send_IPI_mask().
>> > 
>> > In order for __xen_send_IPI_mask() to then take the list apart
>> > again and call xen_send_IPI_one()? There's no batching
>> > implemented currently...
>> 
>> Oh, I simply assumed it must obviously do that!
> 
> Perhaps if it was done via a multicall? We could batch then things up. But I 
> recall you saying that for homogeneous calls it is silly to use multicalls.

Aren't you mixing this up with it being pointless to use multicalls
for things that already allow for batching, like grant table ops?

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 15:46         ` Jan Beulich
@ 2013-02-15 15:59           ` Ian Campbell
  0 siblings, 0 replies; 9+ messages in thread
From: Ian Campbell @ 2013-02-15 15:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xen.org, Stefan Bader, Konrad Rzeszutek Wilk

On Fri, 2013-02-15 at 15:46 +0000, Jan Beulich wrote:
> >>> On 15.02.13 at 16:14, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > On Fri, Feb 15, 2013 at 11:31:23AM +0000, Ian Campbell wrote:
> >> On Fri, 2013-02-15 at 11:26 +0000, Jan Beulich wrote:
> >> > >>> On 15.02.13 at 12:10, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> >> > > On Fri, 2013-02-15 at 10:52 +0000, Stefan Bader wrote:
> >> > >> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> >> > >> index 83e866d..f7a080e 100644
> >> > >> --- a/arch/x86/xen/spinlock.c
> >> > >> +++ b/arch/x86/xen/spinlock.c
> >> > >> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct 
> >> > > xen_spinlock *xl)
> >> > >>  		if (per_cpu(lock_spinners, cpu) == xl) {
> >> > >>  			ADD_STATS(released_slow_kicked, 1);
> >> > >>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> >> > >> -			break;
> >> > > 
> >> > > It would be more efficient to build a mask and use xen_send_IPI_mask().
> >> > 
> >> > In order for __xen_send_IPI_mask() to then take the list apart
> >> > again and call xen_send_IPI_one()? There's no batching
> >> > implemented currently...
> >> 
> >> Oh, I simply assumed it must obviously do that!
> > 
> > Perhaps if it was done via a multicall? We could batch then things up. But I 
> > recall you saying that for homogeneous calls it is silly to use multicalls.
> 
> Aren't you mixing this up with it being pointless to use multicalls
> for things that already allow for batching, like grant table ops?

Right, it is only silly to use multicalls for homogeneous calls which
already natively support batching.

Ian.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
  2013-02-15 10:52 [PATCH] xen: Send spinlock IPI to all waiters Stefan Bader
  2013-02-15 11:10 ` Ian Campbell
@ 2013-02-15 15:05 ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-02-15 15:05 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Jan Beulich, xen-devel

On Fri, Feb 15, 2013 at 11:52:35AM +0100, Stefan Bader wrote:
> Hopefully not mis-parsing Jan's last comments on the other thread,
> this would be the fix covering things until a better implementation
> is done.
> This also prevents the hang on older kernels, where it could be re-
> produced reliably.
> 
> -Stefan
> 
> >From 7e042a253b06da96409a0e059744c217f396a17f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Fri, 15 Feb 2013 09:48:52 +0100
> Subject: [PATCH] xen: Send spinlock IPI to all waiters
> 
> There is a loophole between Xen's current implementation of
> pv-spinlocks and the scheduler. This was triggerable through
> a testcase until v3.6 changed the TLB flushing code. The
> problem potentially is still there just not observable in the
> same way.
> 
> What could happen was (is):
> 
> 1. CPU n tries to schedule task x away and goes into a slow
>    wait for the runq lock of CPU n-# (must be one with a lower
>    number).
> 2. CPU n-#, while processing softirqs, tries to balance domains
>    and goes into a slow wait for its own runq lock (for updating
>    some records). Since this is a spin_lock_irqsave in softirq
>    context, interrupts will be re-enabled for the duration of
>    the poll_irq hypercall used by Xen.
> 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
>    an interrupt (e.g. endio) and when processing the interrupt,
>    tries to wake up task x. But that is in schedule and still
>    on_cpu, so try_to_wake_up goes into a tight loop.
> 4. The runq lock of CPU n-# gets unlocked, but the message only
>    gets sent to the first waiter, which is CPU n-# and that is
>    busily stuck.

Just for completness:

5. The 3) (so CPU n-1) sits in its tight loop and never exits
   as nothing ever interrupted it.


> 
> To avoid this and since the unlocking code has no real sense of
> which waiter is best suited to grab the lock, just send the IPI
> to all of them. This causes the waiters to return from the hyper-
> call (those not interrupted at least) and do active spinlocking.
> 
> BugLink: http://bugs.launchpad.net/bugs/1011792
> 
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/xen/spinlock.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 83e866d..f7a080e 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>  		if (per_cpu(lock_spinners, cpu) == xl) {
>  			ADD_STATS(released_slow_kicked, 1);
>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -			break;
>  		}
>  	}
>  }
> -- 
> 1.7.9.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] xen: Send spinlock IPI to all waiters
@ 2013-02-15 11:20 Jan Beulich
  0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2013-02-15 11:20 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Konrad Rzeszutek Wilk, xen-devel

(resending with proper list address)

>>> On 15.02.13 at 11:50, Stefan Bader <stefan.bader@canonical.com> wrote:
> Hopefully not mis-parsing Jan's last comments on the other thread,
> this would be the fix covering things until a better implementation
> is done.
> This also prevents the hang on older kernels, where it could be re-
> produced reliably.
> 
> -Stefan
> 
> From 7e042a253b06da96409a0e059744c217f396a17f Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Fri, 15 Feb 2013 09:48:52 +0100
> Subject: [PATCH] xen: Send spinlock IPI to all waiters
> 
> There is a loophole between Xen's current implementation of
> pv-spinlocks and the scheduler. This was triggerable through
> a testcase until v3.6 changed the TLB flushing code. The
> problem potentially is still there just not observable in the
> same way.
> 
> What could happen was (is):
> 
> 1. CPU n tries to schedule task x away and goes into a slow
>    wait for the runq lock of CPU n-# (must be one with a lower
>    number).
> 2. CPU n-#, while processing softirqs, tries to balance domains
>    and goes into a slow wait for its own runq lock (for updating
>    some records). Since this is a spin_lock_irqsave in softirq
>    context, interrupts will be re-enabled for the duration of
>    the poll_irq hypercall used by Xen.
> 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
>    an interrupt (e.g. endio) and when processing the interrupt,
>    tries to wake up task x. But that is in schedule and still
>    on_cpu, so try_to_wake_up goes into a tight loop.
> 4. The runq lock of CPU n-# gets unlocked, but the message only
>    gets sent to the first waiter, which is CPU n-# and that is
>    busily stuck.
> 
> To avoid this and since the unlocking code has no real sense of
> which waiter is best suited to grab the lock, just send the IPI
> to all of them. This causes the waiters to return from the hyper-
> call (those not interrupted at least) and do active spinlocking.
> 
> BugLink: http://bugs.launchpad.net/bugs/1011792 
> 
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

(but see below)

> Cc: stable@vger.kernel.org 

(back to 3.0; earlier kernels shouldn't be affected)

> ---
>  arch/x86/xen/spinlock.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 83e866d..f7a080e 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct 
> xen_spinlock *xl)
>  		if (per_cpu(lock_spinners, cpu) == xl) {
>  			ADD_STATS(released_slow_kicked, 1);
>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -			break;

Of course that's the minimal fix. Would be nice if this could be done
as a batch, since even if congestion on locks might be rare, it's
specifically that case where the biggest gain from doing things
quickly can be had.

But of course, as said earlier, switching to ticket locks should be the
real goal.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-02-15 15:59 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-15 10:52 [PATCH] xen: Send spinlock IPI to all waiters Stefan Bader
2013-02-15 11:10 ` Ian Campbell
2013-02-15 11:26   ` Jan Beulich
2013-02-15 11:31     ` Ian Campbell
2013-02-15 15:14       ` Konrad Rzeszutek Wilk
2013-02-15 15:46         ` Jan Beulich
2013-02-15 15:59           ` Ian Campbell
2013-02-15 15:05 ` Konrad Rzeszutek Wilk
  -- strict thread matches above, loose matches on Subject: below --
2013-02-15 11:20 Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.