linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, dvhart@linux.intel.com,
	fweisbec@gmail.com, oleg@redhat.com, bobby.prani@gmail.com
Subject: Re: [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation
Date: Wed, 7 Oct 2015 09:48:58 -0700	[thread overview]
Message-ID: <20151007164858.GQ3910@linux.vnet.ibm.com> (raw)
In-Reply-To: <20151007144024.GI17308@twins.programming.kicks-ass.net>

On Wed, Oct 07, 2015 at 04:40:24PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 07, 2015 at 07:33:25AM -0700, Paul E. McKenney wrote:
> > > I'm sure you know what that means, but I've no clue ;-) That is, I
> > > wouldn't know where to start looking in the RCU implementation to verify
> > > the barrier is either needed or sufficient. Unless you mean _everywhere_
> > > :-)
> > 
> > Pretty much everywhere.
> > 
> > Let's take the usual RCU removal pattern as an example:
> > 
> > 	void f1(struct foo *p)
> > 	{
> > 		list_del_rcu(p);
> > 		synchronize_rcu_expedited();
> > 		kfree(p);
> > 	}
> > 
> > 	void f2(void)
> > 	{
> > 		struct foo *p;
> > 
> > 		list_for_each_entry_rcu(p, &my_head, next)
> > 			do_something_with(p);
> > 	}
> > 
> > So the synchronize_rcu_expedited() acts as an extremely heavyweight
> > memory barrier that pairs with the rcu_dereference() inside of
> > list_for_each_entry_rcu().  Easy enough, right?
> > 
> > But what exactly within synchronize_rcu_expedited() provides the
> > ordering?  The answer is a web of lock-based critical sections and
> > explicit memory barriers, with the one you called out as needing
> > a comment being one of them.
> 
> Right, but seeing there's possible implementations of sync_rcu(_exp)*()
> that do not have the whole rcu_node tree like thing, there's more to
> this particular barrier than the semantics of sync_rcu().
> 
> Some implementation choice requires this barrier upgrade -- and in
> another email I suggest its the whole tree thing, we need to firmly
> establish the state of one level before propagating the state up etc.
> 
> Now I'm not entirely sure this is fully correct, but its the best I
> could come up.

It is pretty close.  Ignoring dyntick idle for the moment, things
go (very) roughly like this:

o	The RCU grace-period kthread notices that a new grace period
	is needed.  It initializes the tree, which includes acquiring
	every rcu_node structure's ->lock.

o	CPU A notices that there is a new grace period.  It acquires
	the ->lock of its leaf rcu_node structure, which forces full
	ordering against the grace-period kthread.

o	Some time later, that CPU A realizes that it has passed
	through a quiescent state, and again acquires its leaf rcu_node
	structure's ->lock, again enforcing full ordering, but this
	time against all CPUs corresponding to this same leaf rcu_node
	structure that previously noticed quiescent states for this
	same grace period.  Also against all prior readers on this
	same CPU.

o	Some time later, CPU B (corresponding to that same leaf
	rcu_node structure) is the last of that leaf's group of CPUs
	to notice a quiescent state.  It has also acquired that leaf's
	->lock, again forcing ordering against its prior RCU read-side
	critical sections, but also against all the prior RCU
	read-side critical sections of all other CPUs corresponding
	to this same leaf.

o	CPU B therefore moves up the tree, acquiring the parent
	rcu_node structures' ->lock.  In so doing, it forces full
	ordering against all prior RCU read-side critical sections
	of all CPUs corresponding to all leaf rcu_node structures
	subordinate to the current (non-leaf) rcu_node structure.

o	And so on, up the tree.

o	When CPU C reaches the root of the tree, and realizes that
	it is the last CPU to report a quiescent state for the
	current grace period, its acquisition of the root rcu_node
	structure's ->lock has forced full ordering against all
	RCU read-side critical sections that started before this
	grace period -- on all CPUs.

	CPU C therefore awakens the grace-period kthread.

o	When the grace-period kthread wakes up, it does cleanup,
	which (you guessed it!) requires acquiring the ->lock of
	each rcu_node structure.  This not only forces full ordering
	against each pre-existing RCU read-side critical section,
	it also sets up things so that...

o	When CPU D notices that the grace period ended, it does so
	while holding its leaf rcu_node structure's ->lock.  This
	forces full ordering against all relevant RCU read-side
	critical sections.  This ordering prevails when CPU D later
	starts invoking RCU callbacks.

o	Just for fun, suppose that one of those callbacks does an
	"smp_store_release(&leak_gp, 1)".  Suppose further that some
	CPU E that is not yet aware that the grace period is finished
	does an "r1 = smp_load_acquire(&lead_gp)" and gets 1.  Even
	if CPU E was the very first CPU to report a quiescent state
	for the grace period, and even if CPU E has not executed any
	sort of ordering operations since, CPU E's subsequent code is
	-still- guaranteed to be fully ordered after each and every
	RCU read-side critical section that started before the grace
	period.

Hey, you asked!!!  ;-)

Again, this is a cartoon-like view of the ordering that leaves out a
lot of details, but it should get across the gist of the ordering.

							Thanx, Paul


  reply	other threads:[~2015-10-07 16:49 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-06 16:29 [PATCH tip/core/rcu 0/18] Expedited grace-period improvements for 4.4 Paul E. McKenney
2015-10-06 16:29 ` [PATCH tip/core/rcu 01/18] rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 02/18] rcu: Move rcu_report_exp_rnp() to allow consolidation Paul E. McKenney
2015-10-06 20:29     ` Peter Zijlstra
2015-10-06 20:58       ` Paul E. McKenney
2015-10-07  7:51         ` Peter Zijlstra
2015-10-07  8:42           ` Mathieu Desnoyers
2015-10-07 11:01             ` Peter Zijlstra
2015-10-07 11:50               ` Peter Zijlstra
2015-10-07 12:03                 ` Peter Zijlstra
2015-10-07 12:05                 ` kbuild test robot
2015-10-07 12:09                 ` kbuild test robot
2015-10-07 12:11                 ` kbuild test robot
2015-10-07 12:17                   ` Peter Zijlstra
2015-10-07 13:44                     ` [kbuild-all] " Fengguang Wu
2015-10-07 13:55                       ` Peter Zijlstra
2015-10-07 14:21                         ` Fengguang Wu
2015-10-07 14:28                           ` Peter Zijlstra
2015-10-07 15:18                 ` Paul E. McKenney
2015-10-08 10:24                   ` Peter Zijlstra
2015-10-07 15:15               ` Paul E. McKenney
2015-10-07 14:33           ` Paul E. McKenney
2015-10-07 14:40             ` Peter Zijlstra
2015-10-07 16:48               ` Paul E. McKenney [this message]
2015-10-08  9:49                 ` Peter Zijlstra
2015-10-08 15:33                   ` Paul E. McKenney
2015-10-08 17:12                     ` Peter Zijlstra
2015-10-08 17:46                       ` Paul E. McKenney
2015-10-09  0:10                       ` Paul E. McKenney
2015-10-09  8:44                         ` Peter Zijlstra
2015-10-06 16:29   ` [PATCH tip/core/rcu 03/18] rcu: Consolidate tree setup for synchronize_rcu_expedited() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 04/18] rcu: Use single-stage IPI algorithm for RCU expedited grace period Paul E. McKenney
2015-10-07 13:24     ` Peter Zijlstra
2015-10-07 18:11       ` Paul E. McKenney
2015-10-07 13:35     ` Peter Zijlstra
2015-10-07 15:44       ` Paul E. McKenney
2015-10-07 13:43     ` Peter Zijlstra
2015-10-07 13:49       ` Peter Zijlstra
2015-10-07 16:14         ` Paul E. McKenney
2015-10-08  9:00           ` Peter Zijlstra
2015-10-07 16:13       ` Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 05/18] rcu: Move synchronize_sched_expedited() to combining tree Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 06/18] rcu: Rename qs_pending to core_needs_qs Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 07/18] rcu: Invert passed_quiesce and rename to cpu_no_qs Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 08/18] rcu: Make ->cpu_no_qs be a union for aggregate OR Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 09/18] rcu: Switch synchronize_sched_expedited() to IPI Paul E. McKenney
2015-10-07 14:18     ` Peter Zijlstra
2015-10-07 16:24       ` Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 10/18] rcu: Stop silencing lockdep false positive for expedited grace periods Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 11/18] rcu: Stop excluding CPU hotplug in synchronize_sched_expedited() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 12/18] cpu: Remove try_get_online_cpus() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 13/18] rcu: Prepare for consolidating expedited CPU selection Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 14/18] rcu: Consolidate " Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 15/18] rcu: Add online/offline info to expedited stall warning message Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 16/18] rcu: Add tasks to expedited stall-warning messages Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 17/18] rcu: Enable stall warnings for synchronize_rcu_expedited() Paul E. McKenney
2015-10-06 16:29   ` [PATCH tip/core/rcu 18/18] rcu: Better hotplug handling for synchronize_sched_expedited() Paul E. McKenney
2015-10-07 14:26     ` Peter Zijlstra
2015-10-07 16:26       ` Paul E. McKenney
2015-10-08  9:01         ` Peter Zijlstra
2015-10-08 15:06           ` Paul E. McKenney
2015-10-08 15:12             ` Peter Zijlstra
2015-10-08 15:19               ` Paul E. McKenney
2015-10-08 18:01                 ` Josh Triplett
2015-10-09  0:11                   ` Paul E. McKenney
2015-10-09  0:48                     ` Josh Triplett
2015-10-09  3:54                       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151007164858.GQ3910@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=bobby.prani@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=dvhart@linux.intel.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).