From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753410AbbCGB6s (ORCPT ); Fri, 6 Mar 2015 20:58:48 -0500
Received: from g1t5425.austin.hp.com ([15.216.225.55]:54422 "EHLO
	g1t5425.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750830AbbCGB6r (ORCPT );
	Fri, 6 Mar 2015 20:58:47 -0500
X-Greylist: delayed 25300 seconds by postgrey-1.27 at vger.kernel.org;
	Fri, 06 Mar 2015 20:58:47 EST
Message-ID: <1425693523.2475.319.camel@j-VirtualBox>
Subject: Re: softlockups in multi_cpu_stop
From: Jason Low
To: Davidlohr Bueso
Cc: Linus Torvalds, Ingo Molnar, Sasha Levin, Peter Zijlstra, LKML,
	Dave Jones, Ming Lei, jason.low2@hp.com
Date: Fri, 06 Mar 2015 17:58:43 -0800
In-Reply-To: <1425680137.19505.63.camel@stgolabs.net>
References: <54F41516.6060608@oracle.com> <54F98F1F.3080107@oracle.com>
	<20150306123233.GA9972@gmail.com>
	<1425662342.19505.41.camel@stgolabs.net>
	<1425668223.2475.94.camel@j-VirtualBox>
	<1425670188.2475.113.camel@j-VirtualBox>
	<1425676346.2475.135.camel@j-VirtualBox>
	<1425680137.19505.63.camel@stgolabs.net>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2015-03-06 at 14:15 -0800, Davidlohr Bueso wrote:
> On Fri, 2015-03-06 at 13:12 -0800, Jason Low wrote:
> > In owner_running() there are 2 conditions that would make it return
> > false: if the owner changed or if the owner is not running. However,
> > that patch continues spinning if there is a "new owner" but it does not
> > take into account that we may want to stop spinning if the owner is not
> > running (due to getting rescheduled).
>
> So your rationale is that we're missing this need_resched:
>
> 	while (owner_running(sem, owner)) {
> 		/* abort spinning when need_resched */
> 		if (need_resched()) {
> 			rcu_read_unlock();
> 			return false;
> 		}
> 	}
>
> Because the owner_running() would return false, right? Yeah that makes
> sense, as missing a resched is a bug, as opposed to our heuristics being
> so painfully off.

Actually, the rationale is that when the lock owner reschedules while
holding the lock, we'd want the spinners to stop spinning. The original
owner_running() check takes care of this since it returns false if
->on_cpu gets set to false and the sem->owner != NULL would be false
causing us to stop spinning.

However, with the patch, when owner_running() returns false, we check
sem->owner, which causes the ->on_cpu check to essentially get ignored.
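To make the difference concrete, here is a rough user-space sketch of
the two behaviours. It is not the kernel code: struct task, struct
rwsem, keep_spinning_original() and keep_spinning_patched() are
hypothetical stand-ins, and the shape of the patched check is only my
reading of the description above. The point is what happens once
owner_running() has returned false because the owner was descheduled
while still holding the lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumption). */
struct task {
	bool on_cpu;		/* true while the task runs on a CPU */
};

struct rwsem {
	struct task *owner;	/* NULL when no writer holds the lock */
};

/* False if the owner changed OR the owner is no longer running. */
static bool owner_running(const struct rwsem *sem, const struct task *owner)
{
	if (sem->owner != owner)
		return false;
	return owner->on_cpu;
}

/*
 * Original-style decision once owner_running() has returned false:
 * keep going only if the lock looks free. A descheduled owner still
 * holds the lock, so sem->owner is non-NULL and the spinner stops.
 */
static bool keep_spinning_original(const struct rwsem *sem)
{
	return sem->owner == NULL;
}

/*
 * Assumed shape of the patched decision: a non-NULL sem->owner is
 * treated as a "new owner" worth spinning on. ->on_cpu is never
 * consulted here, so a sleeping owner looks the same as a new one.
 */
static bool keep_spinning_patched(const struct rwsem *sem)
{
	if (sem->owner != NULL)
		return true;	/* "new owner", continue spinning */
	return true;		/* no owner: simplified to "go retry the lock" */
}
```

With an owner that still holds the lock but has ->on_cpu == false,
owner_running() returns false in both cases, yet only the original
decision tells the spinner to give up.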