From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Pranith Kumar <bobby.prani@gmail.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>,
"open list:READ-COPY UPDATE..." <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
Date: Wed, 11 Jun 2014 11:18:40 -0700 [thread overview]
Message-ID: <20140611181840.GA26554@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140611044242.GY4581@linux.vnet.ibm.com>
On Tue, Jun 10, 2014 at 09:42:42PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 11, 2014 at 12:23:57AM -0400, Pranith Kumar wrote:
> > Hi Paul,
> >
> > On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >> if (rnp->gpnum != rnp->completed ||
> > >> - ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> > >> + ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
> > >
> > > At this point in the code, we are checking the current rcu_node structure,
> > > which might or might not be the root. If it is not the root, we absolutely
> > > cannot compare against the root because we don't yet hold the root's lock.
> > >
> >
> > I was a bit thrown by the double checking which is being done
> > (rnp->gpnum != rnp->complete) in that if condition. Once without
> > ACCESS_ONCE and one with. Is there any particular reason for this?
> >
> > I now understand that we are comparing ->gpnum and ->completed of the
> > root node which might change from under us if we don't hold the root's
> > lock. I will keep looking :)
>
> Hmmm... Now that you mention it, that does look a bit strange.
And it turns out that you were right to begin with! I queue your change,
but with a full explanation in the commit log and with some additions to
the comment. Please see below.
Thanx, Paul
------------------------------------------------------------------------
rcu: Check both root and current rcu_node when setting up future grace period
The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
and ->completed twice, once without ACCESS_ONCE() and once with it.
Which is pointless because we hold that rcu_node's ->lock at that point.
The intent was to check the current rcu_node structure and the root
rcu_node structure, the latter locklessly with ACCESS_ONCE(). This
commit therefore makes that change.
The reason that it is safe to locklessly check the root rcu_nodes's
->gpnum and ->completed fields is that we hold the current rcu_node's
->lock, which constrains the root rcu_node's ability to change its
->gpnum and ->completed fields. Of course, if there is a single rcu_node
structure, then rnp_root==rnp, and holding the lock prevents all changes.
If there is more than one rcu_node structure, then the code updates the
fields in the following order:
1. Increment rnp_root->gpnum to start new grace period.
2. Increment rnp->gpnum to initialize the current rcu_node,
continuing initialization for the new grace period.
3. Increment rnp_root->completed to end the current grace period.
4. Increment rnp->completed to continue cleaning up after the
old grace period.
So there are four possible combinations of relative values of these
four fields:
N N N N: RCU idle, new grace period must be initiated.
Although rnp_root->gpnum might be incremented immediately
after we check, that will just result in unnecessary work.
The grace period already started, and we try to start it.
N+1 N N N: RCU grace period just started. No further change is
possible because we hold rnp->lock, so the checks of
rnp_root->gpnum and rnp_root->completed are stable.
We know that our request for a future grace period will
be seen during grace-period cleanup.
N+1 N N+1 N: RCU grace period is ongoing. Because rnp->gpnum is
different than rnp->completed, we won't even look at
rnp_root->gpnum and rnp_root->completed, so the possible
concurrent change to rnp_root->completed does not matter.
We know that our request for a future grace period will
be seen during grace-period cleanup, which cannot pass
this rcu_node because we hold its ->lock.
N+1 N+1 N+1 N: RCU grace period has ended, but not yet been cleaned up.
Because rnp->gpnum is different than rnp->completed, we
won't look at rnp_root->gpnum and rnp_root->completed, so
the possible concurrent change to rnp_root->completed does
not matter. We know that our request for a future grace
period will be seen during grace-period cleanup, which
cannot pass this rcu_node because we hold its ->lock.
Therefore, despite initial appearances, the lockless check is safe.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[ paulmck: Update comment to say why the lockless check is safe. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b14ea3693b79..ebafb08f2b2a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1224,10 +1224,16 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
* believe that a grace period is in progress, then we must wait
* for the one following, which is in "c". Because our request
* will be noticed at the end of the current grace period, we don't
- * need to explicitly start one.
+ * need to explicitly start one. We only do the lockless check
+ * of rnp_root's fields if the current rcu_node structure thinks
+ * there is no grace period in flight, and because we hold rnp->lock,
+ * the only possible change is when rnp_root's two fields are
+ * equal, in which case rnp_root->gpnum might be concurrently
+ * incremented. But that is OK, as it will just result in our
+ * doing some extra useless work.
*/
if (rnp->gpnum != rnp->completed ||
- ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
+ ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
rnp->need_future_gp[c & 0x1]++;
trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
goto out;
next prev parent reply other threads:[~2014-06-11 18:18 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-11 3:20 [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress Pranith Kumar
2014-06-11 4:12 ` Paul E. McKenney
2014-06-11 4:23 ` Pranith Kumar
2014-06-11 4:42 ` Paul E. McKenney
2014-06-11 18:18 ` Paul E. McKenney [this message]
2014-06-11 18:34 ` Pranith Kumar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140611181840.GA26554@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=bobby.prani@gmail.com \
--cc=dipankar@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.