From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Pranith Kumar <bobby.prani@gmail.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>,
"open list:READ-COPY UPDATE..." <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
Date: Wed, 11 Jun 2014 11:18:40 -0700 [thread overview]
Message-ID: <20140611181840.GA26554@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140611044242.GY4581@linux.vnet.ibm.com>
On Tue, Jun 10, 2014 at 09:42:42PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 11, 2014 at 12:23:57AM -0400, Pranith Kumar wrote:
> > Hi Paul,
> >
> > On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >> if (rnp->gpnum != rnp->completed ||
> > >> - ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> > >> + ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
> > >
> > > At this point in the code, we are checking the current rcu_node structure,
> > > which might or might not be the root. If it is not the root, we absolutely
> > > cannot compare against the root because we don't yet hold the root's lock.
> > >
> >
> > I was a bit thrown by the double checking which is being done
> > (rnp->gpnum != rnp->complete) in that if condition. Once without
> > ACCESS_ONCE and one with. Is there any particular reason for this?
> >
> > I now understand that we are comparing ->gpnum and ->completed of the
> > root node which might change from under us if we don't hold the root's
> > lock. I will keep looking :)
>
> Hmmm... Now that you mention it, that does look a bit strange.
And it turns out that you were right to begin with! I queue your change,
but with a full explanation in the commit log and with some additions to
the comment. Please see below.
Thanx, Paul
------------------------------------------------------------------------
rcu: Check both root and current rcu_node when setting up future grace period
The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
and ->completed twice, once without ACCESS_ONCE() and once with it.
Which is pointless because we hold that rcu_node's ->lock at that point.
The intent was to check the current rcu_node structure and the root
rcu_node structure, the latter locklessly with ACCESS_ONCE(). This
commit therefore makes that change.
The reason that it is safe to locklessly check the root rcu_nodes's
->gpnum and ->completed fields is that we hold the current rcu_node's
->lock, which constrains the root rcu_node's ability to change its
->gpnum and ->completed fields. Of course, if there is a single rcu_node
structure, then rnp_root==rnp, and holding the lock prevents all changes.
If there is more than one rcu_node structure, then the code updates the
fields in the following order:
1. Increment rnp_root->gpnum to start new grace period.
2. Increment rnp->gpnum to initialize the current rcu_node,
continuing initialization for the new grace period.
3. Increment rnp_root->completed to end the current grace period.
4. Increment rnp->completed to continue cleaning up after the
old grace period.
So there are four possible combinations of relative values of these
four fields:
N N N N: RCU idle, new grace period must be initiated.
Although rnp_root->gpnum might be incremented immediately
after we check, that will just result in unnecessary work.
The grace period already started, and we try to start it.
N+1 N N N: RCU grace period just started. No further change is
possible because we hold rnp->lock, so the checks of
rnp_root->gpnum and rnp_root->completed are stable.
We know that our request for a future grace period will
be seen during grace-period cleanup.
N+1 N N+1 N: RCU grace period is ongoing. Because rnp->gpnum is
different than rnp->completed, we won't even look at
rnp_root->gpnum and rnp_root->completed, so the possible
concurrent change to rnp_root->completed does not matter.
We know that our request for a future grace period will
be seen during grace-period cleanup, which cannot pass
this rcu_node because we hold its ->lock.
N+1 N+1 N+1 N: RCU grace period has ended, but not yet been cleaned up.
Because rnp->gpnum is different than rnp->completed, we
won't look at rnp_root->gpnum and rnp_root->completed, so
the possible concurrent change to rnp_root->completed does
not matter. We know that our request for a future grace
period will be seen during grace-period cleanup, which
cannot pass this rcu_node because we hold its ->lock.
Therefore, despite initial appearances, the lockless check is safe.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[ paulmck: Update comment to say why the lockless check is safe. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b14ea3693b79..ebafb08f2b2a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1224,10 +1224,16 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
* believe that a grace period is in progress, then we must wait
* for the one following, which is in "c". Because our request
* will be noticed at the end of the current grace period, we don't
- * need to explicitly start one.
+ * need to explicitly start one. We only do the lockless check
+ * of rnp_root's fields if the current rcu_node structure thinks
+ * there is no grace period in flight, and because we hold rnp->lock,
+ * the only possible change is when rnp_root's two fields are
+ * equal, in which case rnp_root->gpnum might be concurrently
+ * incremented. But that is OK, as it will just result in our
+ * doing some extra useless work.
*/
if (rnp->gpnum != rnp->completed ||
- ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
+ ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
rnp->need_future_gp[c & 0x1]++;
trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
goto out;
next prev parent reply other threads:[~2014-06-11 18:18 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-11 3:20 [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress Pranith Kumar
2014-06-11 4:12 ` Paul E. McKenney
2014-06-11 4:23 ` Pranith Kumar
2014-06-11 4:42 ` Paul E. McKenney
2014-06-11 18:18 ` Paul E. McKenney [this message]
2014-06-11 18:34 ` Pranith Kumar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140611181840.GA26554@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=bobby.prani@gmail.com \
--cc=dipankar@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox