From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754974AbdEQOzk (ORCPT ); Wed, 17 May 2017 10:55:40 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:37550 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753752AbdEQOzg (ORCPT ); Wed, 17 May 2017 10:55:36 -0400 Date: Wed, 17 May 2017 07:55:20 -0700 From: "Paul E. McKenney" To: Peter Zijlstra Cc: Steven Rostedt , linux-kernel@vger.kernel.org, Thomas Gleixner , Ingo Molnar , Mathieu Desnoyers , Masami Hiramatsu Subject: Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Reply-To: paulmck@linux.vnet.ibm.com References: <20170512171544.100715273@goodmis.org> <20170512194956.GH4626@worktop.programming.kicks-ass.net> <20170512173448.5e2106b6@gandalf.local.home> <20170513134003.GA30927@linux.vnet.ibm.com> <20170515090318.amup4tbqujg27nrl@hirez.programming.kicks-ass.net> <20170515184043.GU3956@linux.vnet.ibm.com> <20170516081923.fxg67gawc44eg6i6@hirez.programming.kicks-ass.net> <20170516124606.GC3956@linux.vnet.ibm.com> <20170516142742.GA17599@linux.vnet.ibm.com> <20170517104010.5dfz7qqeigwbzb2u@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170517104010.5dfz7qqeigwbzb2u@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17051714-0040-0000-0000-000003404BDB X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007078; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000212; SDB=6.00861782; UDB=6.00427466; IPR=6.00641442; BA=6.00005355; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00015491; XFM=3.00000015; UTC=2017-05-17 14:55:23 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17051714-0041-0000-0000-000007347A75 Message-Id: <20170517145520.GI3956@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-05-17_11:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1703280000 definitions=main-1705170114 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 17, 2017 at 12:40:10PM +0200, Peter Zijlstra wrote: > On Tue, May 16, 2017 at 07:27:42AM -0700, Paul E. McKenney wrote: > > On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote: > > > > Something like this, yes. Maybe even exactly like this. ;-) > > > > Ah, one thing I forgot... If you are avoiding use of get_online_cpus(), > > you usually also have to be very careful about how you use things like > > cpu_online() and cpu_is_offline. > > OK, so I think I got it wrong there. This hunk should close any race > between perf_pmu_register() and hotplug by tracking a global state > protected by pmus_lock. Thereby insuring the cpuctx->online state gets > initialized right. Looks much better! > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -8993,6 +8993,8 @@ static int pmu_dev_alloc(struct pmu *pmu > static struct lock_class_key cpuctx_mutex; > static struct lock_class_key cpuctx_lock; > > +static cpumask_var_t perf_online_mask; > + > int perf_pmu_register(struct pmu *pmu, const char *name, int type) > { > int cpu, ret; > @@ -9056,7 +9058,7 @@ int perf_pmu_register(struct pmu *pmu, c > lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex); > lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock); > cpuctx->ctx.pmu = pmu; > - cpuctx->online = cpu_online(cpu); > + cpuctx->online = cpumask_test_cpu(cpu, perf_online_mask); > > __perf_mux_hrtimer_init(cpuctx, cpu); > } > @@ -10952,6 +10954,8 @@ static void __init perf_event_init_all_c > struct swevent_htable *swhash; > int cpu; > > + zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL); > + > for_each_possible_cpu(cpu) { > swhash = &per_cpu(swevent_htable, cpu); > mutex_init(&swhash->hlist_mutex); > @@ -11011,6 +11015,7 @@ static void perf_event_exit_cpu_context( > cpuctx->online = 0; > mutex_unlock(&ctx->mutex); > } > + cpumask_clear_cpu(cpu, perf_online_mask); > mutex_unlock(&pmus_lock); > } > #else > @@ -11028,6 +11033,7 @@ int perf_event_init_cpu(unsigned int cpu > perf_swevent_init_cpu(cpu); > > mutex_lock(&pmus_lock); > + cpumask_set_cpu(cpu, perf_online_mask); > list_for_each_entry(pmu, &pmus, entry) { > cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu); > ctx = &cpuctx->ctx; > > > > > --- > > > > --- a/kernel/events/core.c > > > > +++ b/kernel/events/core.c > > > > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c > > > > { > > > > int cpu, ret; > > > > > > > > - get_online_cpus(); > > > > mutex_lock(&pmus_lock); > > > > ret = -ENOMEM; > > > > pmu->pmu_disable_count = alloc_percpu(int); > > > > > > There is usually also some state check in here somewhere for the CPU > > > being offline from a perf perspective. Such a check might already exist, > > > but I must plead ignorance of perf. > > This just allocates per-cpu storage, that is per definition on the > possible mask and unrelated to the online mask. Got it! > > > > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c > > > > ret = 0; > > > > unlock: > > > > mutex_unlock(&pmus_lock); > > > > - put_online_cpus(); > > > > > > > > return ret; > > > > > > > > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context( > > > > struct perf_cpu_context *cpuctx; > > > > struct perf_event_context *ctx; > > > > struct pmu *pmu; > > > > - int idx; > > > > > > > > - idx = srcu_read_lock(&pmus_srcu); > > > > - list_for_each_entry_rcu(pmu, &pmus, entry) { > > > > + mutex_lock(&pmus_lock); > > > > > > If the state change checked for by perf_pmu_register() needs to be also > > > guarded by ctx->mutex, this looks right to me. > > Right, so we have two locks, pmus_lock that serializes hotplug vs > perf_pmu_register and ctx->lock that serializes find_get_context() vs > hotplug. > > Together they ensure that if a PMU is observed, the PMU's cpuctx's have > the correct ->online state. Sounds good, and now the added pmus_lock allows dropping get_online_cpus(). > > > Just for completeness, the other style is to maintain separate per-CPU > > > state, in which case you would instead acquire pmus_lock, mark this > > > CPU off-limits to more perf_pmu_register() usage, release pmus_lock, > > > then clean up any old usage. > > I'm not immediately seeing the other style, but per the above, I need > that too. Because the previous could race against hotplug if > perf_pmu_register() would observe cpu_online() as set after > perf_event_exit_cpu() or something. > > With the above change any chance of a race is gone and we don't need to > worry about hotplug ordering too much. Nice!!! > Now I just need to do something about the swevent hash, because that's > got a hole in now. On the styles, here is a more coherence list: 1. Use get_online_cpus(), but no CPU-hotplug notifiers. You can use cpu_online() and cpu_is_offline() straightforwardly while get_online_cpus() is in effect, but these two may only be used for statistical/heuristic purposes otherwise. 2a. Use CPU-hotplug notifiers, but not get_online_cpus(). The notifiers maintain some sort of bitmask tracking which CPUs are present from the viewpoint of this subsystem. This bitmask provides exact CPU presence/absence indications, at least assuming the appropriate lock is held. The cpu_online() and cpu_is_offline() macros should be avoided, except possibly for statistical/heuristic purposes. For your perf patch, the bitmask is a cpumask_var_t. For RCU, it is the combination of the ->qsmaskinitnext fields of the leaf rcu_node structures. 2b. Like 2a, except that instead of a bitmask, the CPU online/offline information is implicit in other data structures. For example, per-CPU structures might be allocated only when the corresponding CPU is online, so that a given per-CPU pointer is non-NULL iff the corresponding CPU is online. So I was (confusingly) initially using "style" to distinguish #1 on the one hand from #2a and #2b on the other. Later on, I was using "style" to distinguish #2a from #2b. At this point, I am not sure that it makes all that much sense to distinguish 2a from 2b. Or you could argue that use of a cpumask_var_t is its own substyle, with hand-crafted bitmasks (such as RCU's) are separate substyles. Can't say that I care all that much about that fine-grained gradiations. ;-) Thanx, Paul