From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@efficios.com,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, edumazet@google.com,
	fweisbec@gmail.com, oleg@redhat.com, joel.opensrc@gmail.com
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Exclude near-simultaneous RCU CPU stall warnings
Date: Thu, 3 May 2018 11:22:13 -0700
Message-ID: <20180503182213.GA1981@linux.vnet.ibm.com>
In-Reply-To: <1524450747-22778-13-git-send-email-paulmck@linux.vnet.ibm.com>

On Sun, Apr 22, 2018 at 07:32:18PM -0700, Paul E. McKenney wrote:
> There is a two-jiffy delay between the time that a CPU will self-report
> an RCU CPU stall warning and the time that some other CPU will report a
> warning on behalf of the first CPU.  This has worked well in the past,
> but on busy systems, it is possible for the two warnings to overlap,
> which makes interpreting them extremely difficult.
> 
> This commit therefore uses a cmpxchg-based timing decision that
> allows only one report in a given one-minute period (assuming default
> stall-warning Kconfig parameters).  This approach will of course fail
> if you are seeing minute-long vCPU preemption, but in that case the
> overlapping RCU CPU stall warnings are the least of your worries.
> 
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

And later testing showed that this commit had the unfortunate side effect
of completely suppressing other-CPU reporting of RCU CPU stalls.  The
updated patch below includes a fix.  The original version has been kicked
out of the queue for the next merge window in favor of the one below.

							Thanx, Paul

------------------------------------------------------------------------

commit ed569311d8d655a72f93310dbf479ca84daa736f
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Apr 9 11:04:46 2018 -0700

    rcu: Exclude near-simultaneous RCU CPU stall warnings
    
    There is a two-jiffy delay between the time that a CPU will self-report
    an RCU CPU stall warning and the time that some other CPU will report a
    warning on behalf of the first CPU.  This has worked well in the past,
    but on busy systems, it is possible for the two warnings to overlap,
    which makes interpreting them extremely difficult.
    
    This commit therefore uses a cmpxchg-based timing decision that
    allows only one report in a given one-minute period (assuming default
    stall-warning Kconfig parameters).  This approach will of course fail
    if you are seeing minute-long vCPU preemption, but in that case the
    overlapping RCU CPU stall warnings are the least of your worries.
    
    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 35efe85c35b4..f066269d5b91 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1368,7 +1368,6 @@ static inline void panic_on_rcu_stall(void)
 static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 {
 	int cpu;
-	long delta;
 	unsigned long flags;
 	unsigned long gpa;
 	unsigned long j;
@@ -1381,18 +1380,6 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 	if (rcu_cpu_stall_suppress)
 		return;
 
-	/* Only let one CPU complain about others per time interval. */
-
-	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	delta = jiffies - READ_ONCE(rsp->jiffies_stall);
-	if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp)) {
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-		return;
-	}
-	WRITE_ONCE(rsp->jiffies_stall,
-		   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
-	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-
 	/*
 	 * OK, time to rat on our buddy...
 	 * See Documentation/RCU/stallwarn.txt for info on how to debug
@@ -1441,6 +1428,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 			sched_show_task(current);
 		}
 	}
+	/* Rewrite if needed in case of slow consoles. */
+	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
+		WRITE_ONCE(rsp->jiffies_stall,
+			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 
 	rcu_check_gp_kthread_starvation(rsp);
 
@@ -1485,6 +1476,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
 	rcu_dump_cpu_stacks(rsp);
 
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	/* Rewrite if needed in case of slow consoles. */
 	if (ULONG_CMP_GE(jiffies, READ_ONCE(rsp->jiffies_stall)))
 		WRITE_ONCE(rsp->jiffies_stall,
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
@@ -1508,6 +1500,7 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	unsigned long gpnum;
 	unsigned long gps;
 	unsigned long j;
+	unsigned long jn;
 	unsigned long js;
 	struct rcu_node *rnp;
 
@@ -1546,14 +1539,17 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
 	    ULONG_CMP_GE(gps, js))
 		return; /* No stall or GP completed since entering function. */
 	rnp = rdp->mynode;
+	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
 	if (rcu_gp_in_progress(rsp) &&
-	    (READ_ONCE(rnp->qsmask) & rdp->grpmask)) {
+	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
+	    cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
 
 		/* We haven't checked in, so go dump stack. */
 		print_cpu_stall(rsp);
 
 	} else if (rcu_gp_in_progress(rsp) &&
-		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) {
+		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
+		   cmpxchg(&rsp->jiffies_stall, js, jn) == js) {
 
 		/* They had a few time units to dump stack, so complain. */
 		print_other_cpu_stall(rsp, gpnum);
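
For readers following along outside the kernel tree, the cmpxchg-based
timing decision described in the commit log can be approximated in
userspace.  This is only a rough sketch under stated assumptions: C11
atomic_compare_exchange_strong() stands in for the kernel's cmpxchg(),
the file-scope jiffies_stall variable stands in for rsp->jiffies_stall,
and claim_stall_report() and its "interval" parameter are hypothetical
names that do not appear in the patch.  ULONG_CMP_GE() is written the
way the kernel defines it.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for free-running unsigned counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical stand-in for rsp->jiffies_stall (next stall deadline). */
static _Atomic unsigned long jiffies_stall;

/*
 * Try to claim the right to print a stall warning.
 *
 * js:       the caller's earlier snapshot of jiffies_stall
 * now:      the current time in jiffies
 * interval: distance from now to the next deadline (assumption: the
 *           caller computes something like 3 * stall timeout + 3)
 *
 * Returns true for at most one caller per deadline: the compare-exchange
 * succeeds only if nobody has advanced the deadline since the snapshot.
 */
static bool claim_stall_report(unsigned long js, unsigned long now,
			       unsigned long interval)
{
	unsigned long jn = now + interval;

	if (!ULONG_CMP_GE(now, js))
		return false;	/* Deadline not yet reached. */
	return atomic_compare_exchange_strong(&jiffies_stall, &js, jn);
}
```

Two callers that both snapshot the same deadline and both see it expire
will race on the compare-exchange, and only the winner gets to report.
The loser finds the deadline already advanced and stays quiet, which is
the property that keeps the two kinds of warnings from overlapping.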
