Date: Fri, 1 May 2015 13:10:05 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Rik van Riel
Cc: Linux Kernel Mailing List
Subject: Re: RCU recursion? (code inspection)
Message-ID: <20150501201005.GA15557@linux.vnet.ibm.com>
In-Reply-To: <20150501194102.GH5381@linux.vnet.ibm.com>

On Fri, May 01, 2015 at 12:41:02PM -0700, Paul E. McKenney wrote:
> On Fri, May 01, 2015 at 03:18:28PM -0400, Rik van Riel wrote:
> > Hi Paul,
> >
> > While looking at synchronize_rcu(), I noticed that
> > synchronize_rcu_expedited() calls synchronize_sched_expedited(),
> > which can call synchronize_sched() when it is worried about
> > the counter wrapping, which can call synchronize_sched_expedited().
> >
> > The code is sufficiently convoluted that I am unsure whether this
> > recursion can actually happen in practice, but I also did not spot
> > anything that would stop it.
>
> Hmmm...  Sounds like I should take a look!

And good catch!  The following patch should fix this.  Bad one on me,
given that all the other places in synchronize_sched_expedited() that
you would expect to invoke synchronize_sched() instead invoke
wait_rcu_gp(call_rcu_sched)...
							Thanx, Paul

------------------------------------------------------------------------

rcu: Make synchronize_sched_expedited() call wait_rcu_gp()

Currently, synchronize_sched_expedited() will call synchronize_sched()
if there is danger of counter wrap.  But if configuration says to
always do expedited grace periods, synchronize_sched() will just call
synchronize_sched_expedited() right back again.  In theory, the old
expedited operations will complete, the counters will get back in
synch, and the recursion will end.  But we could easily run out of
stack long before that time.

This commit therefore makes synchronize_sched_expedited() invoke the
underlying wait_rcu_gp(call_rcu_sched) instead of synchronize_sched(),
the same as all the other calls out from synchronize_sched_expedited().

This bug was introduced by commit 1924bcb02597 (Avoid counter wrap in
synchronize_sched_expedited()).

Reported-by: Rik van Riel
Signed-off-by: Paul E. McKenney

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bcc59437fc93..4e6902005228 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3310,7 +3310,7 @@ void synchronize_sched_expedited(void)
 	if (ULONG_CMP_GE((ulong)atomic_long_read(&rsp->expedited_start),
 			 (ulong)atomic_long_read(&rsp->expedited_done) +
 			 ULONG_MAX / 8)) {
-		synchronize_sched();
+		wait_rcu_gp(call_rcu_sched);
 		atomic_long_inc(&rsp->expedited_wrap);
 		return;
 	}