All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, joel@joelfernandes.org
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline
Date: Tue, 26 Jun 2018 13:26:15 -0700	[thread overview]
Message-ID: <20180626202615.GA32162@linux.vnet.ibm.com> (raw)
In-Reply-To: <20180626182950.GH3593@linux.vnet.ibm.com>

On Tue, Jun 26, 2018 at 11:29:50AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 26, 2018 at 07:51:19PM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 26, 2018 at 10:10:39AM -0700, Paul E. McKenney wrote:
> > > Without special fail-safe quiescent-state-propagation checks, grace-period
> > > hangs can result from the following scenario:
> > > 
> > > 1.	CPU 1 goes offline.
> > > 
> > > 2.	Because CPU 1 is the only CPU in the system blocking the current
> > > 	grace period, as soon as rcu_cleanup_dying_idle_cpu()'s call to
> > > 	rcu_report_qs_rnp() returns.
> > > 
> > > 3.	At this point, the leaf rcu_node structure's ->lock is no longer
> > > 	held: rcu_report_qs_rnp() has released it, as it must in order
> > > 	to awaken the RCU grace-period kthread.
> > > 
> > > 4.	At this point, that same leaf rcu_node structure's ->qsmaskinitnext
> > > 	field still records CPU 1 as being online.  This is absolutely
> > > 	necessary because the scheduler uses RCU, and ->qsmaskinitnext
> > 
> > Can you expand a bit on this, where does the scheduler care about the
> > online state of the CPU that's about to call into arch_cpu_idle_dead()?
> 
> Because the CPU does a context switch between the time that the CPU gets
> marked offline from the viewpoint of cpu_offline() and the time that
> the CPU finally makes it to arch_cpu_idle_dead().  Plus reporting the
> quiescent state (rcu_report_qs_rnp()) can result in waking up RCU's
> grace-period kthread.  During that context switch and that wakeup,
> the scheduler needs RCU to continue paying attention to the outgoing
> CPU, right?

And is the following a reasonable expansion?

							Thanx, Paul

------------------------------------------------------------------------

commit 2e5b2ff4047b138d6b56e4e3ba91bc47503cdebe
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Fri May 25 19:23:09 2018 -0700

    rcu: Fix grace-period hangs due to race with CPU offline
    
    Without special fail-safe quiescent-state-propagation checks, grace-period
    hangs can result from the following scenario:
    
    1.      CPU 1 goes offline.
    
    2.      Because CPU 1 is the only CPU in the system blocking the current
            grace period, the grace period ends as soon as
            rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
            returns.
    
    3.      At this point, the leaf rcu_node structure's ->lock is no longer
            held: rcu_report_qs_rnp() has released it, as it must in order
            to awaken the RCU grace-period kthread.
    
    4.      At this point, that same leaf rcu_node structure's ->qsmaskinitnext
            field still records CPU 1 as being online.  This is absolutely
            necessary because the scheduler uses RCU (in this case on the
            wake-up path while awakening RCU's grace-period kthread), and
            ->qsmaskinitnext contains RCU's idea as to which CPUs are online.
            Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
            bit from ->qsmaskinitnext would result in a lockdep-RCU splat
            due to RCU being used from an offline CPU.
    
    5.      RCU's grace-period kthread awakens, sees that the old grace period
            has completed and that a new one is needed.  It therefore starts
            a new grace period, but because CPU 1's leaf rcu_node structure's
            ->qsmaskinitnext field still shows CPU 1 as being online, this new
            grace period is initialized to wait for a quiescent state from the
            now-offline CPU 1.
    
    6.      Without the fail-safe force-quiescent-state checks, there would
            be no quiescent state from the now-offline CPU 1, which would
            eventually result in RCU CPU stall warnings and memory exhaustion.
    
    It would be good to get rid of the special fail-safe quiescent-state
    propagation checks, and thus it would be good to fix things so that
    he above scenario cannot happen.  This commit therefore adds a new
    ->ofl_lock to the rcu_state structure.  This lock is held by rcu_gp_init()
    across the applying of buffered online and offline operations to the
    rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
    when buffering a new offline operation.  This prevents rcu_gp_init()
    from acquiring the leaf rcu_node structure's lock during the interval
    between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(),
    which releases ->lock and the re-acquisition of that same lock.
    This in turn prevents the failure scenario outlined above, and will
    hopefully eventually allow removal of the offline-CPU checks from the
    force-quiescent-state code path.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2cfd5d3da4f8..bb8f45c0fa68 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -101,6 +101,7 @@ struct rcu_state sname##_state = { \
 	.abbr = sabbr, \
 	.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
 	.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
+	.ofl_lock = __SPIN_LOCK_UNLOCKED(sname##_state.ofl_lock), \
 }
 
 RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
@@ -1900,11 +1901,13 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 	 */
 	rcu_for_each_leaf_node(rsp, rnp) {
 		rcu_gp_slow(rsp, gp_preinit_delay);
+		spin_lock(&rsp->ofl_lock);
 		raw_spin_lock_irq_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
 		    !rnp->wait_blkd_tasks) {
 			/* Nothing to do on this leaf rcu_node structure. */
 			raw_spin_unlock_irq_rcu_node(rnp);
+			spin_unlock(&rsp->ofl_lock);
 			continue;
 		}
 
@@ -1940,6 +1943,7 @@ static bool rcu_gp_init(struct rcu_state *rsp)
 		}
 
 		raw_spin_unlock_irq_rcu_node(rnp);
+		spin_unlock(&rsp->ofl_lock);
 	}
 
 	/*
@@ -3747,6 +3751,7 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
+	spin_lock(&rsp->ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
 		/* Report quiescent state -before- changing ->qsmaskinitnext! */
@@ -3755,6 +3760,7 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
 	}
 	rnp->qsmaskinitnext &= ~mask;
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+	spin_unlock(&rsp->ofl_lock);
 }
 
 /*
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 3def94fc9c74..6683da6e4ecc 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -363,6 +363,10 @@ struct rcu_state {
 	const char *name;			/* Name of structure. */
 	char abbr;				/* Abbreviated name. */
 	struct list_head flavors;		/* List of RCU flavors. */
+
+	spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+						/* Synchronize offline with */
+						/*  GP pre-initialization. */
 };
 
 /* Values for rcu_state structure's gp_flags field. */


  parent reply	other threads:[~2018-06-26 20:24 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-26  0:20 [PATCH tip/core/rcu 0/22] Grace-period fixes for v4.19 Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 01/22] rcu: Clean up handling of tasks blocked across full-rcu_node offline Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 02/22] rcu: Fix an obsolete ->qsmaskinit comment Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 03/22] rcu: Make rcu_init_new_rnp() stop upon already-set bit Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 04/22] rcu: Make rcu_report_unblock_qs_rnp() warn on violated preconditions Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 05/22] rcu: Fix typo and add additional debug Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 06/22] rcu: Replace smp_wmb() with smp_store_release() for stall check Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 07/22] rcu: Prevent useless FQS scan after all CPUs have checked in Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 08/22] rcu: Suppress false-positive offline-CPU lockdep-RCU splat Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 09/22] rcu: Suppress false-positive preempted-task splats Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 10/22] rcu: Suppress more involved " Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 11/22] rcu: Suppress false-positive splats from mid-init task resume Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 12/22] rcu: Fix grace-period hangs " Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline Paul E. McKenney
2018-06-26 17:44   ` Peter Zijlstra
2018-06-26 18:19     ` Paul E. McKenney
2018-06-26 19:48       ` Peter Zijlstra
2018-06-26 20:40         ` Paul E. McKenney
2018-06-26 17:51   ` Peter Zijlstra
2018-06-26 18:29     ` Paul E. McKenney
2018-06-26 20:07       ` Peter Zijlstra
2018-06-26 20:26       ` Paul E. McKenney [this message]
2018-06-26 20:32         ` Peter Zijlstra
2018-06-26 23:40           ` Paul E. McKenney
2018-06-27  8:33             ` Peter Zijlstra
2018-06-27 15:43               ` Paul E. McKenney
2018-06-27  9:11             ` Peter Zijlstra
2018-06-27  9:46               ` Peter Zijlstra
2018-06-27 15:57                 ` Paul E. McKenney
2018-06-27 17:51                   ` Peter Zijlstra
2018-06-28  5:13                     ` Paul E. McKenney
2018-06-28  8:26                       ` Peter Zijlstra
2018-06-28 12:38                         ` Paul E. McKenney
2018-06-28 13:06                           ` Peter Zijlstra
2018-06-29  4:30                             ` Paul E. McKenney
2018-06-27 15:53               ` Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 14/22] rcu: Add RCU-preempt check for waiting on newly onlined CPU Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 15/22] rcu: Move grace-period pre-init delay after pre-init Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 16/22] rcu: Remove failsafe check for lost quiescent state Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 17/22] rcutorture: Change units of onoff_interval to jiffies Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 18/22] rcu: Remove CPU-hotplug failsafe from force-quiescent-state code path Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 19/22] rcu: Add up-tree information to dump_blkd_tasks() diagnostics Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 20/22] rcu: Add CPU online/offline state to dump_blkd_tasks() Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 21/22] rcu: Record ->gp_state for both phases of grace-period initialization Paul E. McKenney
2018-06-26 17:10 ` [PATCH tip/core/rcu 22/22] rcu: Add diagnostics for offline CPUs failing to report QS Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180626202615.GA32162@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=edumazet@google.com \
    --cc=fweisbec@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.