Date: Wed, 24 Jun 2015 15:43:37 +0200
From: Ingo Molnar
To: "Paul E. McKenney"
Cc: Peter Zijlstra, Oleg Nesterov, tj@kernel.org, mingo@redhat.com, linux-kernel@vger.kernel.org, der.herr@hofr.at, dave@stgolabs.net, riel@redhat.com, viro@ZenIV.linux.org.uk, torvalds@linux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock
Message-ID: <20150624134337.GA10662@gmail.com>
In-Reply-To: <20150624133859.GA3892@linux.vnet.ibm.com>

* Paul E. McKenney wrote:

> On Wed, Jun 24, 2015 at 10:42:48AM +0200, Ingo Molnar wrote:
> >
> > * Peter Zijlstra wrote:
> >
> > > On Tue, Jun 23, 2015 at 11:26:26AM -0700, Paul E. McKenney wrote:
> > > > >
> > > > > I really think you're making that expedited nonsense far too accessible.
> > > >
> > > > This has nothing to do with accessibility and everything to do with
> > > > robustness. And with me not becoming the triage center for too many non-RCU
> > > > bugs.
> > >
> > > But by making it so you're rewarding abuse instead of flagging it :-(
> >
> > Btw., being a 'triage center' is the bane of APIs that are overly successful,
> > so we should take that burden with pride! :-)
>
> I will gladly accept that compliment.
>
> And the burden. But, lazy as I am, I intend to automate it. ;-)

lol :)

> > Lockdep (and the scheduler APIs as well) frequently got into such situations as
> > well, and we mostly solved it by being more informative with debug splats.
> >
> > I don't think a kernel API should (ever!) stay artificially silent, just for fear
> > of flagging too many problems in other code.
>
> I agree, as attested by RCU CPU stall warnings, lockdep-RCU, sparse-based
> RCU checks, and the object-debug-based checks for double call_rcu().
> That said, in all of these cases, including your example of lockdep,
> the diagnostic is a debug splat rather than a mutex-contention meltdown.
> And it is the mutex-contention meltdown that I will continue making
> synchronize_sched_expedited() avoid.
>
> But given the change from bulk try_stop_cpus() to either stop_one_cpu() or
> IPIs, it would not be hard to splat if a given CPU didn't come back fast
> enough. The latency tracer would of course provide better information,
> but synchronize_sched_expedited() could do a coarse-grained job with
> less setup required.
>
> My first guess for the timeout would be something like 500 milliseconds.
> Thoughts?

So I'd start with 5,000 milliseconds and observe the results first ...

Thanks,

	Ingo