public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
@ 2014-06-11  3:20 Pranith Kumar
  2014-06-11  4:12 ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Pranith Kumar @ 2014-06-11  3:20 UTC (permalink / raw)
  To: Dipankar Sarma, Paul E. McKenney, open list:READ-COPY UPDATE...

The comment above the code says that we are checking both the current node and
the parent node to see if a grace period is in progress. Change the code
accordingly.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f1ba773..b632189 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1227,7 +1227,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * need to explicitly start one.
 	 */
 	if (rnp->gpnum != rnp->completed ||
-	    ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
+	    ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
 		rnp->need_future_gp[c & 0x1]++;
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
  2014-06-11  3:20 [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress Pranith Kumar
@ 2014-06-11  4:12 ` Paul E. McKenney
  2014-06-11  4:23   ` Pranith Kumar
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2014-06-11  4:12 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Dipankar Sarma, open list:READ-COPY UPDATE...

On Tue, Jun 10, 2014 at 11:20:19PM -0400, Pranith Kumar wrote:
> The comment above the code says that we are checking both the current node and
> the parent node to see if a grace period is in progress. Change the code
> accordingly.

Almost...  Please see below.

							Thanx, Paul

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  kernel/rcu/tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f1ba773..b632189 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1227,7 +1227,7 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
>  	 * need to explicitly start one.
>  	 */
>  	if (rnp->gpnum != rnp->completed ||
> -	    ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> +	    ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {

At this point in the code, we are checking the current rcu_node structure,
which might or might not be the root.  If it is not the root, we absolutely
cannot compare against the root because we don't yet hold the root's lock.

So I cannot take this change.

That said, I do heartily encourage you to keep looking.  After all, there
are bound to be at least a few bugs in RCU somewhere.

>  		rnp->need_future_gp[c & 0x1]++;
>  		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
>  		goto out;


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
  2014-06-11  4:12 ` Paul E. McKenney
@ 2014-06-11  4:23   ` Pranith Kumar
  2014-06-11  4:42     ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Pranith Kumar @ 2014-06-11  4:23 UTC (permalink / raw)
  To: Paul McKenney; +Cc: Dipankar Sarma, open list:READ-COPY UPDATE...

Hi Paul,

On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
>>       if (rnp->gpnum != rnp->completed ||
>> -         ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
>> +         ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
>
> At this point in the code, we are checking the current rcu_node structure,
> which might or might not be the root.  If it is not the root, we absolutely
> cannot compare against the root because we don't yet hold the root's lock.
>

I was a bit thrown by the double checking which is being done
(rnp->gpnum != rnp->complete) in that if condition. Once without
ACCESS_ONCE and one with. Is there any particular reason for this?

I now understand that we are comparing ->gpnum and ->completed of the
root node which might change from under us if we don't hold the root's
lock. I will keep looking :)

Thanks!
-- 
Pranith

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
  2014-06-11  4:23   ` Pranith Kumar
@ 2014-06-11  4:42     ` Paul E. McKenney
  2014-06-11 18:18       ` Paul E. McKenney
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2014-06-11  4:42 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Dipankar Sarma, open list:READ-COPY UPDATE...

On Wed, Jun 11, 2014 at 12:23:57AM -0400, Pranith Kumar wrote:
> Hi Paul,
> 
> On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >>       if (rnp->gpnum != rnp->completed ||
> >> -         ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> >> +         ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
> >
> > At this point in the code, we are checking the current rcu_node structure,
> > which might or might not be the root.  If it is not the root, we absolutely
> > cannot compare against the root because we don't yet hold the root's lock.
> >
> 
> I was a bit thrown by the double checking which is being done
> (rnp->gpnum != rnp->complete) in that if condition. Once without
> ACCESS_ONCE and one with. Is there any particular reason for this?
> 
> I now understand that we are comparing ->gpnum and ->completed of the
> root node which might change from under us if we don't hold the root's
> lock. I will keep looking :)

Hmmm...  Now that you mention it, that does look a bit strange.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
  2014-06-11  4:42     ` Paul E. McKenney
@ 2014-06-11 18:18       ` Paul E. McKenney
  2014-06-11 18:34         ` Pranith Kumar
  0 siblings, 1 reply; 6+ messages in thread
From: Paul E. McKenney @ 2014-06-11 18:18 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Dipankar Sarma, open list:READ-COPY UPDATE...

On Tue, Jun 10, 2014 at 09:42:42PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 11, 2014 at 12:23:57AM -0400, Pranith Kumar wrote:
> > Hi Paul,
> > 
> > On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > >>       if (rnp->gpnum != rnp->completed ||
> > >> -         ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
> > >> +         ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
> > >
> > > At this point in the code, we are checking the current rcu_node structure,
> > > which might or might not be the root.  If it is not the root, we absolutely
> > > cannot compare against the root because we don't yet hold the root's lock.
> > >
> > 
> > I was a bit thrown by the double checking which is being done
> > (rnp->gpnum != rnp->complete) in that if condition. Once without
> > ACCESS_ONCE and one with. Is there any particular reason for this?
> > 
> > I now understand that we are comparing ->gpnum and ->completed of the
> > root node which might change from under us if we don't hold the root's
> > lock. I will keep looking :)
> 
> Hmmm...  Now that you mention it, that does look a bit strange.

And it turns out that you were right to begin with!  I queue your change,
but with a full explanation in the commit log and with some additions to
the comment.  Please see below.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Check both root and current rcu_node when setting up future grace period

The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
and ->completed twice, once without ACCESS_ONCE() and once with it.
Which is pointless because we hold that rcu_node's ->lock at that point.
The intent was to check the current rcu_node structure and the root
rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
commit therefore makes that change.

The reason that it is safe to locklessly check the root rcu_nodes's
->gpnum and ->completed fields is that we hold the current rcu_node's
->lock, which constrains the root rcu_node's ability to change its
->gpnum and ->completed fields.  Of course, if there is a single rcu_node
structure, then rnp_root==rnp, and holding the lock prevents all changes.
If there is more than one rcu_node structure, then the code updates the
fields in the following order:

1.	Increment rnp_root->gpnum to start new grace period.
2.	Increment rnp->gpnum to initialize the current rcu_node,
    	continuing initialization for the new grace period.
3.	Increment rnp_root->completed to end the current grace period.
4.	Increment rnp->completed to continue cleaning up after the
    	old grace period.
    
So there are four possible combinations of relative values of these
four fields:

N   N   N   N:  RCU idle, new grace period must be initiated.
    		Although rnp_root->gpnum might be incremented immediately
    		after we check, that will just result in unnecessary work.
    		The grace period already started, and we try to start it.
    
N+1 N   N   N:  RCU grace period just started.  No further change is
    		possible because we hold rnp->lock, so the checks of
    		rnp_root->gpnum and rnp_root->completed are stable.
    		We know that our request for a future grace period will
    		be seen during grace-period cleanup.
    
N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
    		different than rnp->completed, we won't even look at
    		rnp_root->gpnum and rnp_root->completed, so the possible
    		concurrent change to rnp_root->completed does not matter.
    		We know that our request for a future grace period will
    		be seen during grace-period cleanup, which cannot pass
    		this rcu_node because we hold its ->lock.
    
N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
    		Because rnp->gpnum is different than rnp->completed, we
    		won't look at rnp_root->gpnum and rnp_root->completed, so
    		the possible concurrent change to rnp_root->completed does
    		not matter.  We know that our request for a future grace
    		period will be seen during grace-period cleanup, which
    		cannot pass this rcu_node because we hold its ->lock.
    
Therefore, despite initial appearances, the lockless check is safe.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[ paulmck: Update comment to say why the lockless check is safe. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b14ea3693b79..ebafb08f2b2a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1224,10 +1224,16 @@ rcu_start_future_gp(struct rcu_node *rnp, struct rcu_data *rdp,
 	 * believe that a grace period is in progress, then we must wait
 	 * for the one following, which is in "c".  Because our request
 	 * will be noticed at the end of the current grace period, we don't
-	 * need to explicitly start one.
+	 * need to explicitly start one.  We only do the lockless check
+	 * of rnp_root's fields if the current rcu_node structure thinks
+	 * there is no grace period in flight, and because we hold rnp->lock,
+	 * the only possible change is when rnp_root's two fields are
+	 * equal, in which case rnp_root->gpnum might be concurrently
+	 * incremented.  But that is OK, as it will just result in our
+	 * doing some extra useless work.
 	 */
 	if (rnp->gpnum != rnp->completed ||
-	    ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
+	    ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
 		rnp->need_future_gp[c & 0x1]++;
 		trace_rcu_future_gp(rnp, rdp, c, TPS("Startedleaf"));
 		goto out;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress
  2014-06-11 18:18       ` Paul E. McKenney
@ 2014-06-11 18:34         ` Pranith Kumar
  0 siblings, 0 replies; 6+ messages in thread
From: Pranith Kumar @ 2014-06-11 18:34 UTC (permalink / raw)
  To: paulmck; +Cc: Dipankar Sarma, open list:READ-COPY UPDATE...

On 06/11/2014 02:18 PM, Paul E. McKenney wrote:
> On Tue, Jun 10, 2014 at 09:42:42PM -0700, Paul E. McKenney wrote:
>> On Wed, Jun 11, 2014 at 12:23:57AM -0400, Pranith Kumar wrote:
>>> Hi Paul,
>>>
>>> On Wed, Jun 11, 2014 at 12:12 AM, Paul E. McKenney
>>> <paulmck@linux.vnet.ibm.com> wrote:
>>>>>       if (rnp->gpnum != rnp->completed ||
>>>>> -         ACCESS_ONCE(rnp->gpnum) != ACCESS_ONCE(rnp->completed)) {
>>>>> +         ACCESS_ONCE(rnp_root->gpnum) != ACCESS_ONCE(rnp_root->completed)) {
>>>>
>>>> At this point in the code, we are checking the current rcu_node structure,
>>>> which might or might not be the root.  If it is not the root, we absolutely
>>>> cannot compare against the root because we don't yet hold the root's lock.
>>>>
>>>
>>> I was a bit thrown by the double checking which is being done
>>> (rnp->gpnum != rnp->complete) in that if condition. Once without
>>> ACCESS_ONCE and one with. Is there any particular reason for this?
>>>
>>> I now understand that we are comparing ->gpnum and ->completed of the
>>> root node which might change from under us if we don't hold the root's
>>> lock. I will keep looking :)
>>
>> Hmmm...  Now that you mention it, that does look a bit strange.
> 
> And it turns out that you were right to begin with!  I queue your change,
> but with a full explanation in the commit log and with some additions to
> the comment.  Please see below.
> 

Awesome! A few more patches on your way :)

--
Pranith


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-06-11 18:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-11  3:20 [RFC PATCH 1/1] kernel/rcu/tree.c: correct a check for grace period in progress Pranith Kumar
2014-06-11  4:12 ` Paul E. McKenney
2014-06-11  4:23   ` Pranith Kumar
2014-06-11  4:42     ` Paul E. McKenney
2014-06-11 18:18       ` Paul E. McKenney
2014-06-11 18:34         ` Pranith Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox