From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 8 May 2009 05:50:23 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Christoph Lameter, Alok Kataria, "H. Peter Anvin", Ingo Molnar,
	Thomas Gleixner, the arch/x86 maintainers, LKML,
	"alan@lxorguk.ukuu.org.uk"
Subject: Re: [PATCH] x86: Reduce the default HZ value
Message-ID: <20090508125023.GA6935@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <4A00ADDE.9000908@zytor.com>
	<1241560625.8665.17.camel@alok-dev1>
	<1241716053.6311.1514.camel@laptop>
	<1241716422.6311.1524.camel@laptop>
	<1241716718.6311.1531.camel@laptop>
	<20090507173608.GC6693@linux.vnet.ibm.com>
	<1241717904.6311.1558.camel@laptop>
	<20090507180112.GE6693@linux.vnet.ibm.com>
	<1241778776.6311.2585.camel@laptop>
In-Reply-To: <1241778776.6311.2585.camel@laptop>
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 08, 2009 at 12:32:56PM +0200, Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 11:01 -0700, Paul E. McKenney wrote:
>
> > In general, I agree.  However, in the case where you have a single
> > CPU-bound task running in user mode, you don't care that much about
> > syscall performance.  So, yes, this would mean having yet another
> > config variable that users running big CPU-bound scientific
> > applications would need to worry about, which is not perfect either.
> >
> > For whatever it is worth, the added overhead on entry would be
> > something like the following:
> >
> > void rcu_irq_enter(void)
> > {
> > 	struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);
> >
> > 	if (rdtp->dynticks_nesting++)
> > 		return;
> > 	rdtp->dynticks++;
> > 	WARN_ON_RATELIMIT(!(rdtp->dynticks & 0x1), &rcu_rs);
> > 	smp_mb(); /* CPUs seeing ++ must see later RCU read-side crit sects */
> > }
> >
> > On exit, a bit more:
> >
> > void rcu_irq_exit(void)
> > {
> > 	struct rcu_dynticks *rdtp = &__get_cpu_var(rcu_dynticks);
> >
> > 	if (--rdtp->dynticks_nesting)
> > 		return;
> > 	smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */
> > 	rdtp->dynticks++;
> > 	WARN_ON_RATELIMIT(rdtp->dynticks & 0x1, &rcu_rs);
> >
> > 	/* If the interrupt queued a callback, get out of dyntick mode. */
> > 	if (__get_cpu_var(rcu_data).nxtlist ||
> > 	    __get_cpu_var(rcu_bh_data).nxtlist)
> > 		set_need_resched();
> > }
> >
> > But I could move the callback check into call_rcu(), which would get
> > the overhead of rcu_irq_exit() down to about that of rcu_irq_enter().
>
> Can't you simply enter idle state after a grace period completes and
> finds no pending callbacks for the next period.  And leave idle state
> at the next call_rcu()?

If there were no RCU callbacks -globally- across all CPUs, yes.  But the
check at the end of rcu_irq_exit() is testing only on the current CPU.
Checking across all CPUs is expensive and racy.

So what happens instead is that there is rcu_needs_cpu(), which gates
entry into dynticks-idle mode.  This function returns 1 if there are
callbacks on the current CPU.

So, if no CPU has an RCU callback, then all CPUs can enter dynticks-idle
mode so that the entire system is quiescent from an RCU viewpoint -- no
RCU processing at all.

Or am I missing what you are getting at with your question?

							Thanx, Paul