From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, rostedt@goodmis.org,
	dhowells@redhat.com, edumazet@google.com, fweisbec@gmail.com,
	oleg@redhat.com, joel@joelfernandes.org
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to race with CPU offline
Date: Wed, 27 Jun 2018 08:57:21 -0700
Message-ID: <20180627155721.GZ3593@linux.vnet.ibm.com>
In-Reply-To: <20180627094633.GG2512@hirez.programming.kicks-ass.net>

On Wed, Jun 27, 2018 at 11:46:33AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 27, 2018 at 11:11:06AM +0200, Peter Zijlstra wrote:
> > On Tue, Jun 26, 2018 at 04:40:04PM -0700, Paul E. McKenney wrote:
> > > The options I have considered are as follows:
> > 
> > > 2.	Stick with the no-failsafe approach, but rely on RCU's grace-period
> > > 	kthread to wake up later due to its timed wait during the
> > > 	force-quiescent-state process.  This would be a bit obnoxious,
> > > 	as it requires passing a don't-wake flag (or some such) up the
> > > 	quiescent-state reporting mechanism.  It would also needlessly
> > > 	delay grace-period ends, especially on large systems (RCU scales
> > > 	up the FQS delay on larger systems to maintain limited CPU
> > > 	consumption per unit time).
> > > 
> > > 3.	Stick with the no-failsafe approach, but have the quiescent-state
> > > 	reporting code hand back a value indicating that a wakeup is needed.
> > > 	Also a bit obnoxious, as this value would need to be threaded up
> > > 	the reporting code's return path.  Simple in theory, but a bit
> > > 	of an ugly change, especially for the many places in the code that
> > > 	currently expect quiescent-state reporting to be an unconditional
> > > 	fire-and-forget operation.
> > 
> > Here's a variant on 2+3: instead of propagating the state back, we
> > completely ignore if we needed a wakeup or not, and then unconditionally
> > wake the GP kthread on the managing CPU's rcutree_migrate_callbacks()
> > invocation.
> > 
> > Hotplug is rare (or should damn well be), so doing a spurious wake of the
> > GP thread shouldn't matter here.
> 
> Another variant, which simply skips the wakeup whenever it runs on an offline
> CPU, relying on the wakeup from rcutree_migrate_callbacks() right after
> the CPU really is dead.

Cute!  ;-)

And a much smaller change.

However, this means that if someone indirectly and erroneously causes
rcu_report_qs_rsp() to be invoked from an offline CPU, the result is an
intermittent and difficult-to-debug grace-period hang.  A lockdep splat
whose stack trace directly implicates the culprit is much better.

But your point about CPU hotplug being rare is a good one.  I should
at the very least move ->ofl_lock to sit right beside ->lock, ditching
the ____cacheline_internodealigned_in_smp.

							Thanx, Paul

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 7832dd556490..417496a03259 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -104,7 +104,6 @@ struct rcu_state sname##_state = { \
>  	.abbr = sabbr, \
>  	.exp_mutex = __MUTEX_INITIALIZER(sname##_state.exp_mutex), \
>  	.exp_wake_mutex = __MUTEX_INITIALIZER(sname##_state.exp_wake_mutex), \
> -	.ofl_lock = __SPIN_LOCK_UNLOCKED(sname##_state.ofl_lock), \
>  }
> 
>  RCU_STATE_INITIALIZER(rcu_sched, 's', call_rcu_sched);
> @@ -1928,13 +1927,11 @@ static bool rcu_gp_init(struct rcu_state *rsp)
>  	 */
>  	rsp->gp_state = RCU_GP_ONOFF;
>  	rcu_for_each_leaf_node(rsp, rnp) {
> -		spin_lock(&rsp->ofl_lock);
>  		raw_spin_lock_irq_rcu_node(rnp);
>  		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
>  		    !rnp->wait_blkd_tasks) {
>  			/* Nothing to do on this leaf rcu_node structure. */
>  			raw_spin_unlock_irq_rcu_node(rnp);
> -			spin_unlock(&rsp->ofl_lock);
>  			continue;
>  		}
> 
> @@ -1970,7 +1967,6 @@ static bool rcu_gp_init(struct rcu_state *rsp)
>  		}
> 
>  		raw_spin_unlock_irq_rcu_node(rnp);
> -		spin_unlock(&rsp->ofl_lock);
>  	}
>  	rcu_gp_slow(rsp, gp_preinit_delay); /* Races with CPU hotplug. */
> 
> @@ -2250,11 +2246,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
>  static void rcu_report_qs_rsp(struct rcu_state *rsp, unsigned long flags)
>  	__releases(rcu_get_root(rsp)->lock)
>  {
> +	int cpu = smp_processor_id();
> +
>  	raw_lockdep_assert_held_rcu_node(rcu_get_root(rsp));
>  	WARN_ON_ONCE(!rcu_gp_in_progress(rsp));
>  	WRITE_ONCE(rsp->gp_flags, READ_ONCE(rsp->gp_flags) | RCU_GP_FLAG_FQS);
>  	raw_spin_unlock_irqrestore_rcu_node(rcu_get_root(rsp), flags);
> -	rcu_gp_kthread_wake(rsp);
> +
> +	/*
> +	 * When our @cpu is offline, we'll get a wakeup from
> +	 * rcutree_migrate_callbacks.
> +	 */
> +	if (cpu_online(cpu))
> +		rcu_gp_kthread_wake(rsp);
>  }
> 
>  /*
> @@ -3768,18 +3772,15 @@ static void rcu_cleanup_dying_idle_cpu(int cpu, struct rcu_state *rsp)
> 
>  	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
>  	mask = rdp->grpmask;
> -	spin_lock(&rsp->ofl_lock);
>  	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
>  	rdp->rcu_ofl_gp_seq = READ_ONCE(rsp->gp_seq);
>  	rdp->rcu_ofl_gp_flags = READ_ONCE(rsp->gp_flags);
> +	rnp->qsmaskinitnext &= ~mask;
>  	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
> -		/* Report quiescent state -before- changing ->qsmaskinitnext! */
>  		rcu_report_qs_rnp(mask, rsp, rnp, rnp->gp_seq, flags);
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
>  	}
> -	rnp->qsmaskinitnext &= ~mask;
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> -	spin_unlock(&rsp->ofl_lock);
>  }
> 
>  /*
> @@ -3849,6 +3850,13 @@ void rcutree_migrate_callbacks(int cpu)
>  {
>  	struct rcu_state *rsp;
> 
> +	/*
> +	 * Just in case the outgoing CPU needed to wake the GP kthread,
> +	 * do so here for each RCU flavor.
> +	 */
> +	for_each_rcu_flavor(rsp)
> +		rcu_gp_kthread_wake(rsp);
> +
>  	for_each_rcu_flavor(rsp)
>  		rcu_migrate_callbacks(cpu, rsp);
>  }
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 4e74df768c57..8dab71838141 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -367,10 +367,6 @@ struct rcu_state {
>  	const char *name;			/* Name of structure. */
>  	char abbr;				/* Abbreviated name. */
>  	struct list_head flavors;		/* List of RCU flavors. */
> -
> -	spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
> -						/* Synchronize offline with */
> -						/*  GP pre-initialization. */
>  };
> 
>  /* Values for rcu_state structure's gp_flags field. */
> 

