From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751656AbcGMXCt (ORCPT ); Wed, 13 Jul 2016 19:02:49 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:40609 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751523AbcGMXCY (ORCPT ); Wed, 13 Jul 2016 19:02:24 -0400 X-IBM-Helo: d01dlp01.pok.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com Date: Wed, 13 Jul 2016 16:02:38 -0700 From: "Paul E. McKenney" To: John Stultz Cc: Tejun Heo , Peter Zijlstra , Ingo Molnar , lkml , Dmitry Shmidt , Rom Lemarchand , Colin Cross , Todd Kjos , Oleg Nesterov Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes Reply-To: paulmck@linux.vnet.ibm.com References: <20160713202657.GW30154@twins.programming.kicks-ass.net> <20160713203944.GC29670@mtj.duckdns.org> <20160713205102.GZ30909@twins.programming.kicks-ass.net> <20160713210315.GO7094@linux.vnet.ibm.com> <20160713210526.GF29670@mtj.duckdns.org> <20160713211841.GQ7094@linux.vnet.ibm.com> <20160713214238.GA15996@linux.vnet.ibm.com> <20160713221750.GR7094@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16071323-0056-0000-0000-000000C6E86E X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16071323-0057-0000-0000-000004E0F9F6 Message-Id: <20160713230238.GU7094@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-07-13_12:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1607130252 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 13, 2016 at 03:39:37PM -0700, John Stultz wrote: > On Wed, Jul 13, 2016 at 3:17 PM, Paul E. McKenney > wrote: > > On Wed, Jul 13, 2016 at 02:46:37PM -0700, John Stultz wrote: > >> On Wed, Jul 13, 2016 at 2:42 PM, Paul E. McKenney > >> wrote: > >> > On Wed, Jul 13, 2016 at 02:18:41PM -0700, Paul E. McKenney wrote: > >> >> On Wed, Jul 13, 2016 at 05:05:26PM -0400, Tejun Heo wrote: > >> >> > On Wed, Jul 13, 2016 at 02:03:15PM -0700, Paul E. McKenney wrote: > >> >> > > Take the patch that I just sent out and make the choice of normal > >> >> > > vs. expedited depend on CONFIG_PREEMPT_RT or whatever the -rt guys are > >> >> > > calling it these days. Is there a low-latency Kconfig option other > >> >> > > than CONFIG_NO_HZ_FULL? > >> >> > > >> >> > Sounds like a plan to me. > >> >> > >> >> I like the way we like each other's idea. Mutually assured laziness? ;-) > >> > > >> > But here is what mine might look like. Untested, probably does > >> > not even build. Note that the default is -no- expediting, use the > >> > rcusync.expedited kernel parameter to enable it. > >> > >> I was working on something similar, but using a config option. Would > >> adding a config option for the default make sense here, since I'd > >> probably prefer to have one less thing to always specify on the > >> cmdline? > > > > As long as you don't mind it depending on CONFIG_RCU_EXPERT, no problem. > > > > Perhaps like the following, on top of the previous patch? > > > > Or if you are going to put it in defconfig files only, I can make it > > so that it isn't changeable at menuconfig time. > > I think having it discoverable via menuconfig is useful, and I've got > no objections to it being under RCU_EXPERT > (assuming I don't badly muck up my RCU settings accidentally :). But isn't mucking up your RCU settings half of the fun? ;-) > I only had that one nit about maybe wanting to put something in dmesg > when we're using the expedited methods. Updated, please see below. > But otherwise both patches look great and are working well! > > Do you mind marking them both for stable 4.4+? OK, looks like it does qualify in the "fix a notable performance or interactivity issue" category. > Tested-by: John Stultz > Acked-by: John Stultz > > Also, do make sure Dmitry gets the reported-by credit for the first patch. Done! The updated first patch is below, and the second will follow. Thanx, Paul ------------------------------------------------------------------------ commit 59435eb836ee73b30ed6ada525125b67b4029321 Author: Paul E. McKenney Date: Wed Jul 13 14:43:46 2016 -0700 rcu: Provide rcusync.expedited kernel boot parameter Dmitry Shmidt and John Stultz noticed that __cgroup_procs_write() sometimes incurred excessive overheads, ranging up into the tens of milliseconds. Further testing confirmed speculation that this was due to synchronize_sched() within rcusync being invoked by per-CPU rwsems. This testing also showed that substituting synchronize_sched_expedited() for synchronize_sched() greatly reduced the overheads to below 200 microseconds, with the occasional excursion into the low single digits worth of milliseconds. This commit therefore provides a rcusync.expedited kernel boot parameter that causes rcusync to use expedited grace-period primitives. Reported-by: Dmitry Shmidt Reported-by: John Stultz Signed-off-by: Paul E. McKenney Tested-by: John Stultz Acked-by: John Stultz Cc: # 4.4.x- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 82b42c958d1c..b8bc9854e548 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3229,6 +3229,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. energy efficiency by requiring that the kthreads periodically wake up to do the polling. + rcusync.expedited [KNL] + Specify that the rcusync mechanism use expedited + grace periods. As of mid-2016, this affects + per-CPU rwsems. + rcutree.blimit= [KNL] Set maximum number of finished RCU callbacks to process in one batch. diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c index be922c9f3d37..0d0dc992cce7 100644 --- a/kernel/rcu/sync.c +++ b/kernel/rcu/sync.c @@ -22,6 +22,14 @@ #include #include +#include +#include + +MODULE_ALIAS("rcusync"); +#ifdef MODULE_PARAM_PREFIX +#undef MODULE_PARAM_PREFIX +#endif +#define MODULE_PARAM_PREFIX "rcusync." #ifdef CONFIG_PROVE_RCU #define __INIT_HELD(func) .held = func, @@ -29,14 +37,14 @@ #define __INIT_HELD(func) #endif -static const struct { +static struct { void (*sync)(void); void (*call)(struct rcu_head *, void (*)(struct rcu_head *)); void (*wait)(void); #ifdef CONFIG_PROVE_RCU int (*held)(void); #endif -} gp_ops[] = { +} gp_ops[] __read_mostly = { [RCU_SYNC] = { .sync = synchronize_rcu, .call = call_rcu, @@ -62,6 +70,21 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY }; #define rss_lock gp_wait.lock +static bool expedited; +module_param(expedited, bool, 0444); + +static int __init rcu_sync_early_init(void) +{ + if (expedited) { + pr_info("RCU_SYNC: Expedited operation in effect.\n"); + gp_ops[RCU_SYNC].sync = synchronize_rcu_expedited; + gp_ops[RCU_SCHED_SYNC].sync = synchronize_sched_expedited; + gp_ops[RCU_BH_SYNC].sync = synchronize_rcu_bh_expedited; + } + return 0; +} +early_initcall(rcu_sync_early_init); + #ifdef CONFIG_PROVE_RCU void rcu_sync_lockdep_assert(struct rcu_sync *rsp) {