From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754021AbbCFVMd (ORCPT ); Fri, 6 Mar 2015 16:12:33 -0500
Received: from g4t3425.houston.hp.com ([15.201.208.53]:38156 "EHLO
	g4t3425.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750789AbbCFVMb (ORCPT );
	Fri, 6 Mar 2015 16:12:31 -0500
Message-ID: <1425676346.2475.135.camel@j-VirtualBox>
Subject: Re: softlockups in multi_cpu_stop
From: Jason Low
To: Linus Torvalds
Cc: Davidlohr Bueso, Ingo Molnar, Sasha Levin, Peter Zijlstra, LKML,
	Dave Jones, jason.low2@hp.com
Date: Fri, 06 Mar 2015 13:12:26 -0800
In-Reply-To: <1425670188.2475.113.camel@j-VirtualBox>
References: <54F41516.6060608@oracle.com> <54F98F1F.3080107@oracle.com>
	<20150306123233.GA9972@gmail.com>
	<1425662342.19505.41.camel@stgolabs.net>
	<1425668223.2475.94.camel@j-VirtualBox>
	<1425670188.2475.113.camel@j-VirtualBox>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2015-03-06 at 11:29 -0800, Jason Low wrote:
> Hi Linus,
>
> Agreed, this is an issue we need to address, though we're just trying to
> figure out if the change to rwsem_can_spin_on_owner() in "commit:
> 37e9562453b" is really the one that's causing the issue.
>
> For example, it looks like Ming recently found another change in the
> same patchset: commit b3fd4f03ca0b995(locking/rwsem: Avoid deceiving
> lock spinners) to be causing lockups.
>
> https://lkml.org/lkml/2015/3/6/521

So I think I may have spotted a problem in the tip commit:

Commit b3fd4f03ca0b995 (locking/rwsem: Avoid deceiving lock spinners).

In owner_running() there are 2 conditions that would make it return
false: if the owner changed or if the owner is not running.
However, that patch continues spinning if there is a "new owner" but it
does not take into account that we may want to stop spinning if the
owner is not running (due to getting rescheduled).

So we really want this, right? (not yet tested):

---
Subject: [PATCH] locking/rwsem: Avoid spinning when owner is not running

not-yet-Signed-off-by: Jason Low
---
 kernel/locking/rwsem-xadd.c |   28 ++++++++--------------------
 1 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 06e2214..e9379ee 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -324,32 +324,20 @@ done:
 	return ret;
 }
 
-static inline bool owner_running(struct rw_semaphore *sem,
-				 struct task_struct *owner)
-{
-	if (sem->owner != owner)
-		return false;
-
-	/*
-	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
-	 * sem->owner still matches owner, if that fails, owner might
-	 * point to free()d memory, if it still matches, the rcu_read_lock()
-	 * ensures the memory stays valid.
-	 */
-	barrier();
-
-	return owner->on_cpu;
-}
-
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
 	long count;
 
 	rcu_read_lock();
-	while (owner_running(sem, owner)) {
-		/* abort spinning when need_resched */
-		if (need_resched()) {
+	while (true) {
+		if (sem->owner != owner)
+			break;
+
+		barrier();
+
+		/* abort spinning when need_resched or owner is not running */
+		if (!owner->on_cpu || need_resched()) {
 			rcu_read_unlock();
 			return false;
 		}
-- 
1.7.2.5
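
To spell out the three outcomes the reworked loop is meant to
distinguish, here is a minimal userspace sketch. It is not kernel code:
struct task, struct sem, enum spin_result and spin_step() are
hypothetical stand-ins I made up for illustration, need_resched is
passed in as a plain flag, and the barrier()/RCU handling from the patch
is deliberately left out. It models a single pass through the loop body
only.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for task_struct and rw_semaphore (illustration only). */
struct task {
	bool on_cpu;		/* models owner->on_cpu */
};

struct sem {
	struct task *owner;	/* models sem->owner */
};

/* Hypothetical result codes for one pass through the proposed loop body. */
enum spin_result {
	SPIN_CONTINUE,		/* same owner, still running: keep spinning */
	SPIN_OWNER_CHANGED,	/* "break": leave the loop, caller re-evaluates */
	SPIN_STOP		/* "return false": give up optimistic spinning */
};

/*
 * One iteration of the loop from the patch, reduced to its decisions.
 * The owner check comes first, so owner->on_cpu is only read while
 * sem->owner still matches the owner we started spinning on.
 */
static enum spin_result spin_step(const struct sem *sem,
				  const struct task *owner,
				  bool need_resched)
{
	if (sem->owner != owner)
		return SPIN_OWNER_CHANGED;

	if (!owner->on_cpu || need_resched)
		return SPIN_STOP;

	return SPIN_CONTINUE;
}
```

The point of the split is that "owner changed" and "owner not running"
no longer collapse into the same false return from owner_running(): only
the first one lets the caller go on to consider spinning again.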