Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order
Date: Wed, 17 May 2017 07:55:20 -0700	[thread overview]
Message-ID: <20170517145520.GI3956@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170517104010.5dfz7qqeigwbzb2u@hirez.programming.kicks-ass.net>

On Wed, May 17, 2017 at 12:40:10PM +0200, Peter Zijlstra wrote:
> On Tue, May 16, 2017 at 07:27:42AM -0700, Paul E. McKenney wrote:
> > On Tue, May 16, 2017 at 05:46:06AM -0700, Paul E. McKenney wrote:
> 
> > > Something like this, yes.  Maybe even exactly like this.  ;-)
> > 
> > Ah, one thing I forgot...  If you are avoiding use of get_online_cpus(),
> > you usually also have to be very careful about how you use things like
> > cpu_online() and cpu_is_offline.
> 
> OK, so I think I got it wrong there. This hunk should close any race
> between perf_pmu_register() and hotplug by tracking a global state
> protected by pmus_lock. Thereby insuring the cpuctx->online state gets
> initialized right.

Looks much better!

> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8993,6 +8993,8 @@ static int pmu_dev_alloc(struct pmu *pmu
>  static struct lock_class_key cpuctx_mutex;
>  static struct lock_class_key cpuctx_lock;
> 
> +static cpumask_var_t perf_online_mask;
> +
>  int perf_pmu_register(struct pmu *pmu, const char *name, int type)
>  {
>  	int cpu, ret;
> @@ -9056,7 +9058,7 @@ int perf_pmu_register(struct pmu *pmu, c
>  		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
>  		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
>  		cpuctx->ctx.pmu = pmu;
> -		cpuctx->online = cpu_online(cpu);
> +		cpuctx->online = cpumask_test_cpu(cpu, perf_online_mask);
> 
>  		__perf_mux_hrtimer_init(cpuctx, cpu);
>  	}
> @@ -10952,6 +10954,8 @@ static void __init perf_event_init_all_c
>  	struct swevent_htable *swhash;
>  	int cpu;
> 
> +	zalloc_cpumask_var(&perf_online_mask, GFP_KERNEL);
> +
>  	for_each_possible_cpu(cpu) {
>  		swhash = &per_cpu(swevent_htable, cpu);
>  		mutex_init(&swhash->hlist_mutex);
> @@ -11011,6 +11015,7 @@ static void perf_event_exit_cpu_context(
>  		cpuctx->online = 0;
>  		mutex_unlock(&ctx->mutex);
>  	}
> +	cpumask_clear_cpu(cpu, perf_online_mask);
>  	mutex_unlock(&pmus_lock);
>  }
>  #else
> @@ -11028,6 +11033,7 @@ int perf_event_init_cpu(unsigned int cpu
>  	perf_swevent_init_cpu(cpu);
> 
>  	mutex_lock(&pmus_lock);
> +	cpumask_set_cpu(cpu, perf_online_mask);
>  	list_for_each_entry(pmu, &pmus, entry) {
>  		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>  		ctx = &cpuctx->ctx;
> 
> > > > ---
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > > >  {
> > > >  	int cpu, ret;
> > > > 
> > > > -	get_online_cpus();
> > > >  	mutex_lock(&pmus_lock);
> > > >  	ret = -ENOMEM;
> > > >  	pmu->pmu_disable_count = alloc_percpu(int);
> > > 
> > > There is usually also some state check in here somewhere for the CPU
> > > being offline from a perf perspective.  Such a check might already exist,
> > > but I must plead ignorance of perf.
> 
> This just allocates per-cpu storage, that is per definition on the
> possible mask and unrelated to the online mask.

Got it!

> > > > @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
> > > >  	ret = 0;
> > > >  unlock:
> > > >  	mutex_unlock(&pmus_lock);
> > > > -	put_online_cpus();
> > > > 
> > > >  	return ret;
> > > > 
> > > > @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
> > > >  	struct perf_cpu_context *cpuctx;
> > > >  	struct perf_event_context *ctx;
> > > >  	struct pmu *pmu;
> > > > -	int idx;
> > > > 
> > > > -	idx = srcu_read_lock(&pmus_srcu);
> > > > -	list_for_each_entry_rcu(pmu, &pmus, entry) {
> > > > +	mutex_lock(&pmus_lock);
> > > 
> > > If the state change checked for by perf_pmu_register() needs to be also
> > > guarded by ctx->mutex, this looks right to me.
> 
> Right, so we have two locks, pmus_lock that serializes hotplug vs
> perf_pmu_register and ctx->lock that serializes find_get_context() vs
> hotplug.
> 
> Together they ensure that if a PMU is observed, the PMU's cpuctx's have
> the correct ->online state.

Sounds good, and now the added pmus_lock allows dropping get_online_cpus().

> > > Just for completeness, the other style is to maintain separate per-CPU
> > > state, in which case you would instead acquire pmus_lock, mark this
> > > CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
> > > then clean up any old usage.
> 
> I'm not immediately seeing the other style, but per the above, I need
> that too. Because the previous could race against hotplug if
> perf_pmu_register() would observe cpu_online() as set after
> perf_event_exit_cpu() or something.
> 
> With the above change any chance of a race is gone and we don't need to
> worry about hotplug ordering too much.

Nice!!!

> Now I just need to do something about the swevent hash, because that's
> got a hole in now.

On the styles, here is a more coherence list:

1.	Use get_online_cpus(), but no CPU-hotplug notifiers.  You can
	use cpu_online() and cpu_is_offline() straightforwardly while
	get_online_cpus() is in effect, but these two may only be used
	for statistical/heuristic purposes otherwise.

2a.	Use CPU-hotplug notifiers, but not get_online_cpus().
	The notifiers maintain some sort of bitmask tracking which CPUs
	are present from the viewpoint of this subsystem.  This bitmask
	provides exact CPU presence/absence indications, at least
	assuming the appropriate lock is held.	The cpu_online() and
	cpu_is_offline() macros should be avoided, except possibly for
	statistical/heuristic purposes.

	For your perf patch, the bitmask is a cpumask_var_t.  For RCU,
	it is the combination of the ->qsmaskinitnext fields of the leaf
	rcu_node structures.

2b.	Like 2a, except that instead of a bitmask, the CPU online/offline
	information is implicit in other data structures.  For example,
	per-CPU structures might be allocated only when the corresponding
	CPU is online, so that a given per-CPU pointer is non-NULL
	iff the corresponding CPU is online.

So I was (confusingly) initially using "style" to distinguish #1 on the
one hand from #2a and #2b on the other.  Later on, I was using "style"
to distinguish #2a from #2b.

At this point, I am not sure that it makes all that much sense to
distinguish 2a from 2b.  Or you could argue that use of a cpumask_var_t
is its own substyle, with hand-crafted bitmasks (such as RCU's) are
separate substyles.  Can't say that I care all that much about that
fine-grained gradiations.  ;-)

							Thanx, Paul

next prev parent reply	other threads:[~2017-05-17 14:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-12 17:15 [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 1/5] tracing: Make sure RCU is watching before calling a stack trace Steven Rostedt
2017-05-12 18:25   ` Paul E. McKenney
2017-05-12 18:36     ` Steven Rostedt
2017-05-12 18:50       ` Paul E. McKenney
2017-05-12 20:05         ` Steven Rostedt
2017-05-12 20:31           ` Paul E. McKenney
2017-05-17 16:46             ` Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 2/5] cpu-hotplug: Allow get_online_cpus() to nest Steven Rostedt
2017-05-12 18:35   ` Paul E. McKenney
2017-05-12 18:40     ` Steven Rostedt
2017-05-12 18:52       ` Paul E. McKenney
2017-05-12 22:15   ` Thomas Gleixner
2017-05-13  0:23     ` Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 3/5] kprobes: Take get_online_cpus() before taking jump_label_lock() Steven Rostedt
2017-05-12 18:39   ` Paul E. McKenney
2017-05-12 18:44     ` Steven Rostedt
2017-05-17 17:50   ` Masami Hiramatsu
2017-05-12 17:15 ` [RFC][PATCH 4/5] tracepoints: Grab get_online_cpus() before taking tracepoints_mutex Steven Rostedt
2017-05-12 17:15 ` [RFC][PATCH 5/5] perf: Grab event_mutex before taking get_online_cpus() Steven Rostedt
2017-05-12 18:13 ` [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order Paul E. McKenney
2017-05-12 19:49 ` Peter Zijlstra
2017-05-12 20:14   ` Steven Rostedt
2017-05-12 21:34   ` Steven Rostedt
2017-05-13 13:40     ` Paul E. McKenney
2017-05-15  9:03       ` Peter Zijlstra
2017-05-15 18:40         ` Paul E. McKenney
2017-05-16  8:19           ` Peter Zijlstra
2017-05-16 12:46             ` Paul E. McKenney
2017-05-16 14:27               ` Paul E. McKenney
2017-05-17 10:40                 ` Peter Zijlstra
2017-05-17 14:55                   ` Paul E. McKenney [this message]
2017-05-18  3:58                     ` Paul E. McKenney
2017-05-15 19:06   ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170517145520.GI3956@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.