From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756784AbbCFT3z (ORCPT ); Fri, 6 Mar 2015 14:29:55 -0500
Received: from g9t5009.houston.hp.com ([15.240.92.67]:47270 "EHLO
        g9t5009.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1756751AbbCFT3w (ORCPT );
        Fri, 6 Mar 2015 14:29:52 -0500
Message-ID: <1425670188.2475.113.camel@j-VirtualBox>
Subject: Re: sched: softlockups in multi_cpu_stop
From: Jason Low
To: Linus Torvalds
Cc: Davidlohr Bueso , Ingo Molnar , Sasha Levin , Peter Zijlstra ,
        LKML , Dave Jones , jason.low2@hp.com
Date: Fri, 06 Mar 2015 11:29:48 -0800
In-Reply-To:
References: <54F41516.6060608@oracle.com> <54F98F1F.3080107@oracle.com>
        <20150306123233.GA9972@gmail.com>
        <1425662342.19505.41.camel@stgolabs.net>
        <1425668223.2475.94.camel@j-VirtualBox>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3-0ubuntu6
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2015-03-06 at 11:05 -0800, Linus Torvalds wrote:
> On Fri, Mar 6, 2015 at 10:57 AM, Jason Low wrote:
> >
> > Right, the can_spin_on_owner() was originally added to the mutex
> > spinning code for optimization purposes, particularly so that we can
> > avoid adding the spinner to the OSQ only to find that it doesn't need to
> > spin. This function needing to return a correct value should really only
> > affect performance, so yes, lockups due to this seems surprising.
>
> Well, softlockups aren't about "correct behavior". They are about
> certain things not happening in a timely manner.
>
> Clearly the mutex code now tries to hold on to the CPU too aggressively.
>
> At some point people need to admit that busy-looping isn't always a
> good idea.
> Especially if
>
>  (a) we could idle the core instead
>
>  (b) the tuning has been done based on some special-purpose benchmark
> that is likely not realistic
>
>  (c) we get reports from people that it causes problems.
>
> In other words: Let's just undo that excessive busy-looping. The
> performance numbers were dubious to begin with. Real scalability comes
> from fixing the locking, not from trying to play games with the locks
> themselves. Particularly games that then cause problems.

Hi Linus,

Agreed, this is an issue we need to address, though we're just trying to
figure out if the change to rwsem_can_spin_on_owner() in "commit:
37e9562453b" is really the one that's causing the issue.

For example, it looks like Ming recently found another change in the
same patchset: commit b3fd4f03ca0b995 (locking/rwsem: Avoid deceiving
lock spinners) to be causing lockups.

https://lkml.org/lkml/2015/3/6/521