All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Mike Galbraith <efault@gmx.de>, Paul Mackerras <paulus@samba.org>,
	Stephane Eranian <eranian@google.com>,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH] [RFC] perf: Fix a race between ring_buffer_detach() and ring_buffer_wakeup()
Date: Fri, 14 Mar 2014 17:00:09 -0700	[thread overview]
Message-ID: <20140315000009.GA28008@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140314230231.GM21124@linux.vnet.ibm.com>

On Fri, Mar 14, 2014 at 04:02:31PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 14, 2014 at 11:43:17PM +0100, Peter Zijlstra wrote:
> > On Fri, Mar 14, 2014 at 01:47:37PM -0700, Paul E. McKenney wrote:
> > > This general idea can be made to work, but it will need some
> > > internal-to-RCU help.  One vulnerability of the patch below is the
> > > following sequence of steps:
> > > 
> > > 1.	RCU has just finished a grace period, and is doing the
> > > 	end-of-grace-period accounting.
> > > 
> > > 2.	The code below invokes rcu_batches_completed().  Let's assume
> > > 	the result returned is 42.
> > > 
> > > 3.	RCU completes the end-of-grace-period accounting, and increments
> > > 	rcu_sched_state.completed.
> > > 
> > > 4.	The code below checks ->rcu_batches against the result from
> > > 	another invocation of rcu_batches_completed() and sees that
> > > 	the 43 is not equal to 42, so skips the synchronize_rcu().
> > > 
> > > Except that a grace period has not actually completed.  Boom!!!
> > > 
> > > The problem is that rcu_batches_completed() is only intended to give
> > > progress information on RCU.
> > 
> > Ah, I thought I was missing something when I was looking through the rcu
> > code in a hurry :-)
> 
> Well, given that I sometimes miss things when looking through RCU code
> carefuly, I guess I cannot give you too much trouble about it.
> 
> > I knew there'd be some subtlety between completed and gpnum and such :-)
> 
> Some of which I have learned about one RCU bug at a time.  ;-)
> 
> > > What I can do is give you a pair of functions, one to take a snapshot of
> > > the current grace-period state (returning an unsigned long) and another
> > > to evaluate a previous snapshot, invoking synchronize_rcu() if there has
> > > not been a full grace period in the meantime.
> > > 
> > > The most straightforward approach would invoke acquiring the global
> > > rcu_state ->lock on each call, which I am guessing just might be
> > > considered to be excessive overhead.  ;-)  I should be able to decrease
> > > the overhead to a memory barrier on each call, and perhaps even down
> > > to an smp_load_acquire().  Accessing the RCU state probably costs you
> > > a cache miss both times.
> > > 
> > > Thoughts?
> > 
> > Sounds fine, the attach isn't a hotpath, so even the locked version
> > should be fine, but I won't keep you from making it all fancy and such
> > :-)
> 
> Fair enough, let me see what I can come up with.

And here is an untested patch.  Thoughts?

(And yes, I need to update documentation and torture tests accordingly.)

							Thanx, Paul

------------------------------------------------------------------------

rcu: Provide grace-period piggybacking API
    
The following pattern is currently not well supported by RCU:
    
1.	Make data element inaccessible to RCU readers.
    
2.	Do work that probably lasts for more than one grace period.
    
3.	Do something to make sure RCU readers in flight before #1 above
    	have completed.
    
Here are some things that could currently be done:
    
a.	Do a synchronize_rcu() unconditionally at either #1 or #3 above.
    	This works, but imposes needless work and latency.
    
b.	Post an RCU callback at #1 above that does a wakeup, then
    	wait for the wakeup at #3.  This works well, but likely results
    	in an extra unneeded grace period.  Open-coding this is also
    	a bit more semi-tricky code than would be good.

This commit therefore adds get_state_synchronize_rcu() and
cond_synchronize_rcu() APIs.  Call get_state_synchronize_rcu() at #1
above and pass its return value to cond_synchronize_rcu() at #3 above.
This results in a call to synchronize_rcu() if no grace period has
elapsed between #1 and #3, but requires only a load, comparison, and
memory barrier if a full grace period did elapse.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c9be2235712c..dbf0f225bca0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1503,13 +1503,14 @@ static int rcu_gp_init(struct rcu_state *rsp)
 
 	/* Advance to a new grace period and initialize state. */
 	record_gp_stall_check_time(rsp);
-	smp_wmb(); /* Record GP times before starting GP. */
-	rsp->gpnum++;
+	/* Record GP times before starting GP, hence smp_store_release(). */
+	smp_store_release(&rsp->gpnum, rsp->gpnum + 1);
 	trace_rcu_grace_period(rsp->name, rsp->gpnum, TPS("start"));
 	raw_spin_unlock_irq(&rnp->lock);
 
 	/* Exclude any concurrent CPU-hotplug operations. */
 	mutex_lock(&rsp->onoff_mutex);
+	smp_mb__after_unlock_lock(); /* ->gpnum increment before GP! */
 
 	/*
 	 * Set the quiescent-state-needed bits in all the rcu_node
@@ -1638,10 +1639,11 @@ static void rcu_gp_cleanup(struct rcu_state *rsp)
 	}
 	rnp = rcu_get_root(rsp);
 	raw_spin_lock_irq(&rnp->lock);
-	smp_mb__after_unlock_lock();
+	smp_mb__after_unlock_lock(); /* Order GP before ->completed update. */
 	rcu_nocb_gp_set(rnp, nocb);
 
-	rsp->completed = rsp->gpnum; /* Declare grace period done. */
+	/* Declare grace period done. */
+	ACCESS_ONCE(rsp->completed) = rsp->gpnum;
 	trace_rcu_grace_period(rsp->name, rsp->completed, TPS("end"));
 	rsp->fqs_state = RCU_GP_IDLE;
 	rdp = this_cpu_ptr(rsp->rda);
@@ -2728,6 +2730,37 @@ void synchronize_rcu_bh(void)
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
 
+/**
+ * get_state_synchronize_rcu - Snapshot current RCU state
+ *
+ * Returns a cookie that is used by a later call to cond_synchronize_rcu()
+ * to determine whether or not a full grace period has elapsed in the
+ * meantime.
+ */
+unsigned long get_state_synchronize_rcu(void)
+{
+	return smp_load_acquire(&rcu_state->gpnum);
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
+
+/**
+ * cond_synchronize_rcu - Conditionally wait for an RCU grace period
+ *
+ * @oldstate: return value from earlier call to get_state_synchronize_rcu()
+ *
+ * If a full RCU grace period has elapsed since the earlier call to
+ * get_state_synchronize_rcu(), just return.  Otherwise, invoke
+ * synchronize_rcu() to wait for a full grace period.
+ */
+void cond_synchronize_rcu(unsigned long oldstate)
+{
+	unsigned long newstate = smp_load_acquire(&rcu_state->completed);
+
+	if (ULONG_CMP_GE(oldstate, newstate))
+		synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(cond_synchronize_rcu);
+
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
 	/*


  reply	other threads:[~2014-03-15  0:00 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-07 13:38 [PATCH] [RFC] perf: Fix a race between ring_buffer_detach() and ring_buffer_wakeup() Alexander Shishkin
2014-03-13 19:58 ` Paul E. McKenney
2014-03-14  9:50   ` Peter Zijlstra
2014-03-14 20:47     ` Paul E. McKenney
2014-03-14 22:43       ` Peter Zijlstra
2014-03-14 23:02         ` Paul E. McKenney
2014-03-15  0:00           ` Paul E. McKenney [this message]
2014-03-17 11:18             ` Peter Zijlstra
2014-03-17 16:48               ` Paul E. McKenney
2014-03-17 17:30                 ` Peter Zijlstra
2014-03-18  2:45                   ` Paul E. McKenney
2014-03-18  8:26                     ` Peter Zijlstra
2014-05-07 12:35     ` Peter Zijlstra
2014-05-07 18:06       ` Peter Zijlstra
2014-05-08 15:37         ` Paul E. McKenney
2014-05-08 16:09           ` Peter Zijlstra
2014-05-19 12:48       ` [tip:perf/urgent] perf: Fix a race between ring_buffer_detach() and ring_buffer_attach() tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140315000009.GA28008@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=efault@gmx.de \
    --cc=eranian@google.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.