From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH] xen: Send spinlock IPI to all waiters
Date: Fri, 15 Feb 2013 10:05:18 -0500
Message-ID: <20130215150518.GB12178@phenom.dumpdata.com>
References: <1360925555-15078-1-git-send-email-stefan.bader@canonical.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1360925555-15078-1-git-send-email-stefan.bader@canonical.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefan Bader
Cc: Jan Beulich, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Fri, Feb 15, 2013 at 11:52:35AM +0100, Stefan Bader wrote:
> Hopefully not mis-parsing Jan's last comments on the other thread,
> this would be the fix covering things until a better implementation
> is done.
> This also prevents the hang on older kernels, where it could be
> reproduced reliably.
>
> -Stefan
>
> >From 7e042a253b06da96409a0e059744c217f396a17f Mon Sep 17 00:00:00 2001
> From: Stefan Bader
> Date: Fri, 15 Feb 2013 09:48:52 +0100
> Subject: [PATCH] xen: Send spinlock IPI to all waiters
>
> There is a loophole between Xen's current implementation of
> pv-spinlocks and the scheduler. This was triggerable through
> a testcase until v3.6 changed the TLB flushing code. The
> problem potentially is still there just not observable in the
> same way.
>
> What could happen was (is):
>
> 1. CPU n tries to schedule task x away and goes into a slow
>    wait for the runq lock of CPU n-# (must be one with a lower
>    number).
> 2. CPU n-#, while processing softirqs, tries to balance domains
>    and goes into a slow wait for its own runq lock (for updating
>    some records). Since this is a spin_lock_irqsave in softirq
>    context, interrupts will be re-enabled for the duration of
>    the poll_irq hypercall used by Xen.
> 3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
>    an interrupt (e.g. endio) and when processing the interrupt,
>    tries to wake up task x. But that is in schedule and still
>    on_cpu, so try_to_wake_up goes into a tight loop.
> 4. The runq lock of CPU n-# gets unlocked, but the message only
>    gets sent to the first waiter, which is CPU n-# and that is
>    busily stuck.

Just for completeness:

  5. The CPU from 3) (so CPU n-1) sits in its tight loop and never
     exits, as nothing ever interrupts it.

>
> To avoid this and since the unlocking code has no real sense of
> which waiter is best suited to grab the lock, just send the IPI
> to all of them. This causes the waiters to return from the
> hypercall (those not interrupted at least) and do active spinlocking.
>
> BugLink: http://bugs.launchpad.net/bugs/1011792
>
> Signed-off-by: Stefan Bader
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/xen/spinlock.c |    1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 83e866d..f7a080e 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -328,7 +328,6 @@ static noinline void xen_spin_unlock_slow(struct xen_spinlock *xl)
>  		if (per_cpu(lock_spinners, cpu) == xl) {
>  			ADD_STATS(released_slow_kicked, 1);
>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
> -			break;
>  		}
>  	}
>  }
> --
> 1.7.9.5
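
For anyone following along, the effect of dropping the "break" can be
sketched in plain user-space C. This is only a toy model and not kernel
code: NR_CPUS, stuck_in_irq[], unlock_slow() and demo_runnable_waiters()
are made-up names for illustration, and lock_spinners[] merely stands in
for the per-CPU variable of the same name. It counts how many kicked
waiters are actually able to act on the kick.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for the toy model */

/* Stand-in for per_cpu(lock_spinners, cpu): the lock each CPU waits on. */
static const void *lock_spinners[NR_CPUS];
/* Assumption: true if the CPU is spinning in an interrupt handler and
 * therefore cannot react to a kick (step 3 of the scenario). */
static bool stuck_in_irq[NR_CPUS];

/* Walk the CPUs like the unlock slow path does and return how many
 * kicked waiters can actually go on to take the lock. */
int unlock_slow(const void *lock, bool kick_all)
{
	int runnable = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (lock_spinners[cpu] != lock)
			continue;
		/* The real code sends XEN_SPIN_UNLOCK_VECTOR here. */
		if (!stuck_in_irq[cpu])
			runnable++;
		if (!kick_all)
			break;	/* behaviour before the patch */
	}
	return runnable;
}

/* Set up the scenario: CPU 1 plays "CPU n-#" (a waiter that is stuck
 * in its interrupt handler), CPU 3 plays "CPU n" (a healthy waiter). */
int demo_runnable_waiters(bool kick_all)
{
	static int runq_lock;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		lock_spinners[cpu] = NULL;
		stuck_in_irq[cpu] = false;
	}
	lock_spinners[1] = &runq_lock;
	stuck_in_irq[1] = true;
	lock_spinners[3] = &runq_lock;

	return unlock_slow(&runq_lock, kick_all);
}
```

With kick_all false the only CPU kicked is the stuck one, so
demo_runnable_waiters() returns 0 and nobody takes the lock; with
kick_all true the higher-numbered waiter is kicked as well and the
result is 1. The model ignores timing, the hypercall itself and the
extra wakeups that kicking every waiter costs.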