Date: Wed, 24 Jun 2015 06:39:31 -0700
From: "Paul E. McKenney"
To: Ingo Molnar
Cc: Peter Zijlstra, Oleg Nesterov, tj@kernel.org, mingo@redhat.com,
	linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net,
	riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150624133859.GA3892@linux.vnet.ibm.com>
In-Reply-To: <20150624084248.GA27873@gmail.com>

On Wed, Jun 24, 2015 at 10:42:48AM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra wrote:
>
> > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote:
> > > >
> > > > I really think you're making that expedited nonsense far too accessible.
> > >
> > > This has nothing to do with accessibility and everything to do with
> > > robustness. And with me not becoming the triage center for too many non-RCU
> > > bugs.
> >
> > But by making it so you're rewarding abuse instead of flagging it :-(
>
> Btw., being a 'triage center' is the bane of APIs that are overly successful,
> so we should take that burden with pride! :-)

I will gladly accept that compliment. And the burden. But, lazy as I am,
I intend to automate it. ;-)

> Lockdep (and the scheduler APIs as well) frequently got into such situations as
> well, and we mostly solved it by being more informative with debug splats.
>
> I don't think a kernel API should (ever!) stay artificially silent, just for fear
> of flagging too many problems in other code.

I agree, as attested by RCU CPU stall warnings, lockdep-RCU, sparse-based
RCU checks, and the object-debug-based checks for double call_rcu().
That said, in all of these cases, including your example of lockdep,
the diagnostic is a debug splat rather than a mutex-contention meltdown.
And it is the mutex-contention meltdown that I will continue making
synchronize_sched_expedited() avoid.

But given the change from bulk try_stop_cpus() to either stop_one_cpu() or
IPIs, it would not be hard to splat if a given CPU didn't come back fast
enough. The latency tracer would of course provide better information,
but synchronize_sched_expedited() could do a coarse-grained job with less
setup required.

My first guess for the timeout would be something like 500 milliseconds.
Thoughts?

							Thanx, Paul
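
For illustration only, the coarse-grained check proposed above might look
roughly like the userspace sketch below. This is not kernel code and not
what synchronize_sched_expedited() actually does: the struct, the function
name, and the idea of recording per-CPU response latencies in an array are
all assumptions made up for the example. The only value taken from the
message is the 500-millisecond first guess.

```c
#include <stddef.h>
#include <stdio.h>

/* First-guess timeout from the message above; the macro name is hypothetical. */
#define EXPEDITED_SPLAT_TIMEOUT_MS 500u

/* Hypothetical record of how long one CPU took to respond to a request. */
struct cpu_response {
	int cpu;
	unsigned int latency_ms;
};

/*
 * Print one warning per CPU whose response took longer than timeout_ms,
 * and return the number of such CPUs.  A response that takes exactly
 * timeout_ms is not treated as slow.
 */
static size_t report_slow_cpus(const struct cpu_response *resp, size_t n,
			       unsigned int timeout_ms)
{
	size_t slow = 0;

	for (size_t i = 0; i < n; i++) {
		if (resp[i].latency_ms > timeout_ms) {
			fprintf(stderr,
				"expedited request: CPU %d took %u ms (limit %u ms)\n",
				resp[i].cpu, resp[i].latency_ms, timeout_ms);
			slow++;
		}
	}
	return slow;
}
```

The point of the sketch is only that the check is per-CPU and cheap: the
caller already waits for each CPU individually, so comparing the elapsed
time against a fixed limit needs no extra setup, unlike the latency tracer.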