public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@kernel.org>
To: frederic@kernel.org, rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	rostedt@goodmis.org, "Paul E. McKenney" <paulmck@kernel.org>
Subject: [PATCH v2 rcu 2/3] rcu: Stop stall warning from dumping stacks if grace period ends
Date: Wed, 16 Oct 2024 09:19:30 -0700
Message-ID: <20241016161931.478592-2-paulmck@kernel.org>
In-Reply-To: <92193018-8624-495e-a685-320119f78db1@paulmck-laptop>

Currently, once an RCU CPU stall warning decides to dump the stalling
CPUs' stacks, the rcu_dump_cpu_stacks() function persists until it
has gone through the full list.  Unfortunately, if the stalled grace
period ends midway through, this function will be dumping stacks of
innocent-bystander CPUs that happen to be blocking not the old grace
period, but instead the new one.  This can cause serious confusion.

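This commit therefore stops dumping stacks if and when the stalled grace
period ends.  To that end, rcu_dump_cpu_stacks() now takes the stalled
grace period's gp_seq snapshot and rechecks it against rcu_state.gp_seq
before processing each leaf rcu_node structure.  In addition,
check_cpu_stall() now returns early if no grace period is in progress.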

[ paulmck: Apply Joel Fernandes feedback. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_stall.h | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
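
For readers who want to see the early-exit check in isolation, here is
a minimal userspace sketch.  It is not kernel code: sim_gp_seq stands in
for rcu_state.gp_seq, and NR_LEAVES, sim_gp_ends_after_leaf, and
sim_dump_cpu_stacks() are hypothetical names invented for illustration.
The only thing it is meant to show is that comparing a snapshot against
the live sequence number before each leaf stops the walk once the
stalled grace period has ended.

```c
#include <stdio.h>

#define NR_LEAVES 4	/* Hypothetical number of leaf nodes. */

/* Stand-in for rcu_state.gp_seq (assumption: any change means "GP ended"). */
static unsigned long sim_gp_seq;

/* Hypothetical knob: leaf index after which the GP ends, or -1 for never. */
static int sim_gp_ends_after_leaf = -1;

/*
 * Walk the leaves, "dumping" each one, but stop as soon as the live
 * sequence number no longer matches the caller's snapshot.
 * Returns the number of leaves actually dumped.
 */
static int sim_dump_cpu_stacks(unsigned long gp_seq_snap)
{
	int dumped = 0;

	for (int leaf = 0; leaf < NR_LEAVES; leaf++) {
		if (gp_seq_snap != sim_gp_seq) {
			printf("INFO: Stall ended during stack backtracing.\n");
			return dumped;
		}
		printf("dumping stacks for leaf %d\n", leaf);
		dumped++;
		if (leaf == sim_gp_ends_after_leaf)
			sim_gp_seq++;	/* Simulate the GP ending mid-walk. */
	}
	return dumped;
}
```

With sim_gp_seq left unchanged, a call such as sim_dump_cpu_stacks(sim_gp_seq)
visits every leaf.  Setting sim_gp_ends_after_leaf to 1 makes the walk stop
after the second leaf, which is the behavior this patch is after.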

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index d7cdd535e50b1..b530844becf85 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -335,13 +335,17 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
  * that don't support NMI-based stack dumps.  The NMI-triggered stack
  * traces are more accurate because they are printed by the target CPU.
  */
-static void rcu_dump_cpu_stacks(void)
+static void rcu_dump_cpu_stacks(unsigned long gp_seq)
 {
 	int cpu;
 	unsigned long flags;
 	struct rcu_node *rnp;
 
 	rcu_for_each_leaf_node(rnp) {
+		if (gp_seq != data_race(rcu_state.gp_seq)) {
+			pr_err("INFO: Stall ended during stack backtracing.\n");
+			return;
+		}
 		printk_deferred_enter();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		for_each_leaf_node_possible_cpu(rnp, cpu)
@@ -608,7 +612,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	       (long)rcu_seq_current(&rcu_state.gp_seq), totqlen,
 	       data_race(rcu_state.n_online_cpus)); // Diagnostic read
 	if (ndetected) {
-		rcu_dump_cpu_stacks();
+		rcu_dump_cpu_stacks(gp_seq);
 
 		/* Complain about tasks blocking the grace period. */
 		rcu_for_each_leaf_node(rnp)
@@ -640,7 +644,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	rcu_force_quiescent_state();  /* Kick them all. */
 }
 
-static void print_cpu_stall(unsigned long gps)
+static void print_cpu_stall(unsigned long gp_seq, unsigned long gps)
 {
 	int cpu;
 	unsigned long flags;
@@ -677,7 +681,7 @@ static void print_cpu_stall(unsigned long gps)
 	rcu_check_gp_kthread_expired_fqs_timer();
 	rcu_check_gp_kthread_starvation();
 
-	rcu_dump_cpu_stacks();
+	rcu_dump_cpu_stacks(gp_seq);
 
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	/* Rewrite if needed in case of slow consoles. */
@@ -759,7 +763,8 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	gs2 = READ_ONCE(rcu_state.gp_seq);
 	if (gs1 != gs2 ||
 	    ULONG_CMP_LT(j, js) ||
-	    ULONG_CMP_GE(gps, js))
+	    ULONG_CMP_GE(gps, js) ||
+	    !rcu_seq_state(gs2))
 		return; /* No stall or GP completed since entering function. */
 	rnp = rdp->mynode;
 	jn = jiffies + ULONG_MAX / 2;
@@ -780,7 +785,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 			pr_err("INFO: %s detected stall, but suppressed full report due to a stuck CSD-lock.\n", rcu_state.name);
 		} else if (self_detected) {
 			/* We haven't checked in, so go dump stack. */
-			print_cpu_stall(gps);
+			print_cpu_stall(gs2, gps);
 		} else {
 			/* They had a few time units to dump stack, so complain. */
 			print_other_cpu_stall(gs2, gps);
-- 
2.40.1

