From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 May 2017 07:27:42 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Steven Rostedt, linux-kernel@vger.kernel.org, Thomas Gleixner,
	Ingo Molnar, Mathieu Desnoyers, Masami Hiramatsu
Subject: Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170512171544.100715273@goodmis.org>
	<20170512194956.GH4626@worktop.programming.kicks-ass.net>
	<20170512173448.5e2106b6@gandalf.local.home>
	<20170513134003.GA30927@linux.vnet.ibm.com>
	<20170515090318.amup4tbqujg27nrl@hirez.programming.kicks-ass.net>
	<20170515184043.GU3956@linux.vnet.ibm.com>
	<20170516081923.fxg67gawc44eg6i6@hirez.programming.kicks-ass.net>
	<20170516124606.GC3956@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20170516124606.GC3956@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20170516142742.GA17599@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:
> On Tue, May 16, 2017 at 10:19:23AM +0200, Peter Zijlstra wrote:
> > On Mon, May 15, 2017 at 11:40:43AM -0700, Paul E. McKenney wrote:
> > 
> > > Given that you acquire the global pmus_lock when doing the
> > > get_online_cpus(), and given that CPU hotplug is rare, is it possible
> > > to momentarily acquire the global pmus_lock in perf_event_init_cpu()
> > > and perf_event_exit_cpu() and interact directly with that?  Then perf
> > > would presumably leave alone any outgoing CPU that had already executed
> > > perf_event_exit_cpu(), and also any incoming CPU that had not already
> > > executed perf_event_init_cpu().
> > > 
> > > What prevents this approach from working?
> > 
> > Lack of sleep probably ;-)
> 
> I know that feeling...
> 
> > I'd blame the kids, but those have actually been very good lately.
> 
> I don't get that excuse anymore, all are on their own.  So I need
> to come up with some fresh excuses.  ;-)
> 
> > You're suggesting the below on top, right? I'll run it with lockdep
> > enabled after I chase some regression..
> 
> Something like this, yes.  Maybe even exactly like this.  ;-)

Ah, one thing I forgot...  If you are avoiding use of get_online_cpus(),
you usually also have to be very careful about how you use things like
cpu_online() and cpu_is_offline().
							Thanx, Paul

> > ---
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
> >  {
> >  	int cpu, ret;
> >  
> > -	get_online_cpus();
> >  	mutex_lock(&pmus_lock);
> >  	ret = -ENOMEM;
> >  	pmu->pmu_disable_count = alloc_percpu(int);
> 
> There is usually also some state check in here somewhere for the CPU
> being offline from a perf perspective.  Such a check might already exist,
> but I must plead ignorance of perf.
> 
> > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
> >  	ret = 0;
> > unlock:
> >  	mutex_unlock(&pmus_lock);
> > -	put_online_cpus();
> >  
> >  	return ret;
> > 
> > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
> >  	struct perf_cpu_context *cpuctx;
> >  	struct perf_event_context *ctx;
> >  	struct pmu *pmu;
> > -	int idx;
> >  
> > -	idx = srcu_read_lock(&pmus_srcu);
> > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > +	mutex_lock(&pmus_lock);
> 
> If the state change checked for by perf_pmu_register() needs to be also
> guarded by ctx->mutex, this looks right to me.
> 
> Just for completeness, the other style is to maintain separate per-CPU
> state, in which case you would instead acquire pmus_lock, mark this
> CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
> then clean up any old usage.
> 
> The approach you have here seems to work best when the cleanup
> and initialization naturally mark the CPU as off limits and ready,
> respectively.  The other style seems to work best when you need a separate
> indication of which CPUs are off limits and usable.
> 
> RCU is an example of the other style, with the rcu_node structure's
> ->qsmaskinitnext mask serving to mark which CPUs are usable.  One reason
> that the other style works so well for RCU is that a CPU coming online
> has no effect on the current grace period, so rcu_cpu_starting() just
> sets the CPU's bit in ->qsmaskinitnext, which takes effect only once
> the next grace period starts.
> 
> It is quite possible that many of the other use cases instead need to
> use something like what you have here.  I suspect that the common case
> is that a CPU appearing or disappearing must have some immediate effect.
> 
> > +	list_for_each_entry(pmu, &pmus, entry) {
> >  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> >  		ctx = &cpuctx->ctx;
> >  
> > @@ -11014,7 +11011,7 @@ static void perf_event_exit_cpu_context(
> >  		cpuctx->online = 0;
> >  		mutex_unlock(&ctx->mutex);
> >  	}
> > -	srcu_read_unlock(&pmus_srcu, idx);
> > +	mutex_unlock(&pmus_lock);
> >  }
> >  #else
> > 
> > @@ -11027,12 +11024,11 @@ int perf_event_init_cpu(unsigned int cpu
> >  	struct perf_cpu_context *cpuctx;
> >  	struct perf_event_context *ctx;
> >  	struct pmu *pmu;
> > -	int idx;
> >  
> >  	perf_swevent_init_cpu(cpu);
> >  
> > -	idx = srcu_read_lock(&pmus_srcu);
> > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > +	mutex_lock(&pmus_lock);
> > +	list_for_each_entry(pmu, &pmus, entry) {
> >  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
> >  		ctx = &cpuctx->ctx;
> >  
> > @@ -11040,7 +11036,7 @@ int perf_event_init_cpu(unsigned int cpu
> >  		cpuctx->online = 1;
> >  		mutex_unlock(&ctx->mutex);
> >  	}
> > -	srcu_read_unlock(&pmus_srcu, idx);
> > +	mutex_unlock(&pmus_lock);
> 
> And same here.
> 
> Again for completeness, the other style would be to mark this CPU
> as ready for perf usage at the very end, protected by pmus_lock.
> 
> 							Thanx, Paul
> 
> >  	return 0;
> >  }
> > 