public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] RCU stall detection updates for v6.7
@ 2023-10-19 12:01 Frederic Weisbecker
  2023-10-19 12:01 ` [PATCH 1/6] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Frederic Weisbecker
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:01 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

Hello,

Please find below the RCU stall detection updates for the next merge 
window:

Joel Fernandes (Google) (1):
  rcu/tree: Defer setting of jiffies during stall reset

Paul E. McKenney (2):
  rcu: Add RCU CPU stall notifier
  rcutorture: Add test of RCU CPU stall notifiers

Zhen Lei (3):
  rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  rcu: Don't redump the stalled CPU where RCU GP kthread last ran
  rcu: Eliminate check_cpu_stall() duplicate code

Thanks,
        Frederic.

 include/linux/rcu_notifier.h |  32 +++++++++
 kernel/rcu/rcu.h             |   6 ++
 kernel/rcu/rcutorture.c      |  21 ++++++
 kernel/rcu/tree.c            |  12 ++++
 kernel/rcu/tree.h            |   4 ++
 kernel/rcu/tree_exp.h        |   6 +-
 kernel/rcu/tree_stall.h      | 135 +++++++++++++++++++++++++----------
 7 files changed, 178 insertions(+), 38 deletions(-)
 create mode 100644 include/linux/rcu_notifier.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/6] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation()
  2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
@ 2023-10-19 12:01 ` Frederic Weisbecker
  2023-10-19 12:01 ` [PATCH 2/6] rcu: Don't redump the stalled CPU where RCU GP kthread last ran Frederic Weisbecker
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:01 UTC (permalink / raw)
  To: LKML
  Cc: Zhen Lei, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: Zhen Lei <thunder.leizhen@huawei.com>

The rcu_check_gp_kthread_starvation() function uses task_cpu() to sample
the last CPU that the grace-period kthread ran on, and task_cpu() samples
the thread_info structure's ->cpu field.  But this field will always
contain a number corresponding to a CPU that was online some time in
the past, thus never a negative number.  This invariant is checked by
a WARN_ON_ONCE() in set_task_cpu().

This means that if the grace-period kthread exists, that is, if the "gpk"
local variable is non-NULL, the "cpu" local variable will be non-negative.
This in turn means that the existing check for non-negative "cpu" is
redundant with the enclosing check for non-NULL "gpk".

This commit threefore removes the redundant check of "cpu".

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_stall.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 6f06dc12904a..b5ce0580074e 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -537,13 +537,11 @@ static void rcu_check_gp_kthread_starvation(void)
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
-			if (cpu >= 0) {
-				if (cpu_is_offline(cpu)) {
-					pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-				} else  {
-					pr_err("Stack dump where RCU GP kthread last ran:\n");
-					dump_cpu_task(cpu);
-				}
+			if (cpu_is_offline(cpu)) {
+				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
+			} else  {
+				pr_err("Stack dump where RCU GP kthread last ran:\n");
+				dump_cpu_task(cpu);
 			}
 			wake_up_process(gpk);
 		}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/6] rcu: Don't redump the stalled CPU where RCU GP kthread last ran
  2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
  2023-10-19 12:01 ` [PATCH 1/6] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Frederic Weisbecker
@ 2023-10-19 12:01 ` Frederic Weisbecker
  2023-10-19 12:01 ` [PATCH 3/6] rcu: Eliminate check_cpu_stall() duplicate code Frederic Weisbecker
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:01 UTC (permalink / raw)
  To: LKML
  Cc: Zhen Lei, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: Zhen Lei <thunder.leizhen@huawei.com>

The stacks of all stalled CPUs will be dumped in rcu_dump_cpu_stacks().
If the CPU on where RCU GP kthread last ran is stalled, its stack does
not need to be dumped again. We can search the corresponding backtrace
based on the printed CPU ID.

For example:
[   87.328275] rcu: rcu_sched kthread starved for ... ->cpu=3  <--------|
... ...                                                                 |
[   89.385007] NMI backtrace for cpu 3                         <--------|
[   89.385179] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.0+ #22 <--|
[   89.385188] Hardware name: linux,dummy-virt (DT)
[   89.385196] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[   89.385204] pc : arch_cpu_idle+0x40/0xc0
[   89.385211] lr : arch_cpu_idle+0x2c/0xc0
... ...
[   89.385566] Call trace:
[   89.385574]  arch_cpu_idle+0x40/0xc0
[   89.385581]  default_idle_call+0x100/0x450
[   89.385589]  cpuidle_idle_call+0x2f8/0x460
[   89.385596]  do_idle+0x1dc/0x3d0
[   89.385604]  cpu_startup_entry+0x5c/0xb0
[   89.385613]  secondary_start_kernel+0x35c/0x520

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_stall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b5ce0580074e..fc04a8d7ce96 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -534,12 +534,14 @@ static void rcu_check_gp_kthread_starvation(void)
 		       data_race(READ_ONCE(rcu_state.gp_state)),
 		       gpk ? data_race(READ_ONCE(gpk->__state)) : ~0, cpu);
 		if (gpk) {
+			struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
 			pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name);
 			pr_err("RCU grace-period kthread stack dump:\n");
 			sched_show_task(gpk);
 			if (cpu_is_offline(cpu)) {
 				pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
-			} else  {
+			} else if (!(data_race(READ_ONCE(rdp->mynode->qsmask)) & rdp->grpmask)) {
 				pr_err("Stack dump where RCU GP kthread last ran:\n");
 				dump_cpu_task(cpu);
 			}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/6] rcu: Eliminate check_cpu_stall() duplicate code
  2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
  2023-10-19 12:01 ` [PATCH 1/6] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Frederic Weisbecker
  2023-10-19 12:01 ` [PATCH 2/6] rcu: Don't redump the stalled CPU where RCU GP kthread last ran Frederic Weisbecker
@ 2023-10-19 12:01 ` Frederic Weisbecker
  2023-10-19 12:02 ` [PATCH 4/6] rcu: Add RCU CPU stall notifier Frederic Weisbecker
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:01 UTC (permalink / raw)
  To: LKML
  Cc: Zhen Lei, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: Zhen Lei <thunder.leizhen@huawei.com>

The code and comments of self-detected and other-detected RCU CPU stall
warnings are identical except the output function.  This commit therefore
refactors so as to consolidate the duplicate code.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_stall.h | 42 +++++++++++++++--------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index fc04a8d7ce96..2443d1d4a6dc 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -711,7 +711,7 @@ static void print_cpu_stall(unsigned long gps)
 
 static void check_cpu_stall(struct rcu_data *rdp)
 {
-	bool didstall = false;
+	bool self_detected;
 	unsigned long gs1;
 	unsigned long gs2;
 	unsigned long gps;
@@ -758,10 +758,10 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		return; /* No stall or GP completed since entering function. */
 	rnp = rdp->mynode;
 	jn = jiffies + ULONG_MAX / 2;
+	self_detected = READ_ONCE(rnp->qsmask) & rdp->grpmask;
 	if (rcu_gp_in_progress() &&
-	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
+	    (self_detected || ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY)) &&
 	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
-
 		/*
 		 * If a virtual machine is stopped by the host it can look to
 		 * the watchdog like an RCU stall. Check to see if the host
@@ -770,33 +770,21 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		if (kvm_check_and_clear_guest_paused())
 			return;
 
-		/* We haven't checked in, so go dump stack. */
-		print_cpu_stall(gps);
-		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
-			rcu_ftrace_dump(DUMP_ALL);
-		didstall = true;
-
-	} else if (rcu_gp_in_progress() &&
-		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
-		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
+		if (self_detected) {
+			/* We haven't checked in, so go dump stack. */
+			print_cpu_stall(gps);
+		} else {
+			/* They had a few time units to dump stack, so complain. */
+			print_other_cpu_stall(gs2, gps);
+		}
 
-		/*
-		 * If a virtual machine is stopped by the host it can look to
-		 * the watchdog like an RCU stall. Check to see if the host
-		 * stopped the vm.
-		 */
-		if (kvm_check_and_clear_guest_paused())
-			return;
-
-		/* They had a few time units to dump stack, so complain. */
-		print_other_cpu_stall(gs2, gps);
 		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
 			rcu_ftrace_dump(DUMP_ALL);
-		didstall = true;
-	}
-	if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) {
-		jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
-		WRITE_ONCE(rcu_state.jiffies_stall, jn);
+
+		if (READ_ONCE(rcu_state.jiffies_stall) == jn) {
+			jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
+			WRITE_ONCE(rcu_state.jiffies_stall, jn);
+		}
 	}
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/6] rcu: Add RCU CPU stall notifier
  2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-10-19 12:01 ` [PATCH 3/6] rcu: Eliminate check_cpu_stall() duplicate code Frederic Weisbecker
@ 2023-10-19 12:02 ` Frederic Weisbecker
  2023-10-19 12:02 ` [PATCH 5/6] rcutorture: Add test of RCU CPU stall notifiers Frederic Weisbecker
  2023-10-19 12:02 ` [PATCH 6/6] rcu/tree: Defer setting of jiffies during stall reset Frederic Weisbecker
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:02 UTC (permalink / raw)
  To: LKML
  Cc: Paul E. McKenney, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Steven Rostedt,
	Uladzislau Rezki, rcu, Frederic Weisbecker

From: "Paul E. McKenney" <paulmck@kernel.org>

It is sometimes helpful to have a way for the subsystem causing
the stall to dump its state when an RCU CPU stall occurs.  This
commit therefore bases rcu_stall_chain_notifier_register() and
rcu_stall_chain_notifier_unregister() on atomic notifiers in order to
provide this functionality.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/rcu_notifier.h | 32 +++++++++++++++++++
 kernel/rcu/rcu.h             |  6 ++++
 kernel/rcu/tree_exp.h        |  6 +++-
 kernel/rcu/tree_stall.h      | 59 +++++++++++++++++++++++++++++++++++-
 4 files changed, 101 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/rcu_notifier.h

diff --git a/include/linux/rcu_notifier.h b/include/linux/rcu_notifier.h
new file mode 100644
index 000000000000..ebf371364581
--- /dev/null
+++ b/include/linux/rcu_notifier.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * Read-Copy Update notifiers, initially RCU CPU stall notifier.
+ * Separate from rcupdate.h to avoid #include loops.
+ *
+ * Copyright (C) 2023 Paul E. McKenney.
+ */
+
+#ifndef __LINUX_RCU_NOTIFIER_H
+#define __LINUX_RCU_NOTIFIER_H
+
+// Actions for RCU CPU stall notifier calls.
+#define RCU_STALL_NOTIFY_NORM	1
+#define RCU_STALL_NOTIFY_EXP	2
+
+#ifdef CONFIG_RCU_STALL_COMMON
+
+#include <linux/notifier.h>
+#include <linux/types.h>
+
+int rcu_stall_chain_notifier_register(struct notifier_block *n);
+int rcu_stall_chain_notifier_unregister(struct notifier_block *n);
+
+#else // #ifdef CONFIG_RCU_STALL_COMMON
+
+// No RCU CPU stall warnings in Tiny RCU.
+static inline int rcu_stall_chain_notifier_register(struct notifier_block *n) { return -EEXIST; }
+static inline int rcu_stall_chain_notifier_unregister(struct notifier_block *n) { return -ENOENT; }
+
+#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON
+
+#endif /* __LINUX_RCU_NOTIFIER_H */
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 98e13be411af..ef3bab977407 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -654,4 +654,10 @@ static inline bool rcu_cpu_beenfullyonline(int cpu) { return true; }
 bool rcu_cpu_beenfullyonline(int cpu);
 #endif
 
+#ifdef CONFIG_RCU_STALL_COMMON
+int rcu_stall_notifier_call_chain(unsigned long val, void *v);
+#else // #ifdef CONFIG_RCU_STALL_COMMON
+static inline int rcu_stall_notifier_call_chain(unsigned long val, void *v) { return NOTIFY_DONE; }
+#endif // #else // #ifdef CONFIG_RCU_STALL_COMMON
+
 #endif /* __LINUX_RCU_H */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8239b39d945b..6d7cea5d591f 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -621,10 +621,14 @@ static void synchronize_rcu_expedited_wait(void)
 	}
 
 	for (;;) {
+		unsigned long j;
+
 		if (synchronize_rcu_expedited_wait_once(jiffies_stall))
 			return;
 		if (rcu_stall_is_suppressed())
 			continue;
+		j = jiffies;
+		rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_EXP, (void *)(j - jiffies_start));
 		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
 		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
 		       rcu_state.name);
@@ -647,7 +651,7 @@ static void synchronize_rcu_expedited_wait(void)
 			}
 		}
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
-			jiffies - jiffies_start, rcu_state.expedited_sequence,
+			j - jiffies_start, rcu_state.expedited_sequence,
 			data_race(rnp_root->expmask),
 			".T"[!!data_race(rnp_root->exp_tasks)]);
 		if (ndetected) {
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 2443d1d4a6dc..49544f932279 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -8,6 +8,7 @@
  */
 
 #include <linux/kvm_para.h>
+#include <linux/rcu_notifier.h>
 
 //////////////////////////////////////////////////////////////////////////////
 //
@@ -770,6 +771,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 		if (kvm_check_and_clear_guest_paused())
 			return;
 
+		rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_NORM, (void *)j - gps);
 		if (self_detected) {
 			/* We haven't checked in, so go dump stack. */
 			print_cpu_stall(gps);
@@ -790,7 +792,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
 
 //////////////////////////////////////////////////////////////////////////////
 //
-// RCU forward-progress mechanisms, including of callback invocation.
+// RCU forward-progress mechanisms, including for callback invocation.
 
 
 /*
@@ -1042,3 +1044,58 @@ static int __init rcu_sysrq_init(void)
 	return 0;
 }
 early_initcall(rcu_sysrq_init);
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// RCU CPU stall-warning notifiers
+
+static ATOMIC_NOTIFIER_HEAD(rcu_cpu_stall_notifier_list);
+
+/**
+ * rcu_stall_chain_notifier_register - Add an RCU CPU stall notifier
+ * @n: Entry to add.
+ *
+ * Adds an RCU CPU stall notifier to an atomic notifier chain.
+ * The @action passed to a notifier will be @RCU_STALL_NOTIFY_NORM or
+ * friends.  The @data will be the duration of the stalled grace period,
+ * in jiffies, coerced to a void* pointer.
+ *
+ * Returns 0 on success, %-EEXIST on error.
+ */
+int rcu_stall_chain_notifier_register(struct notifier_block *n)
+{
+	return atomic_notifier_chain_register(&rcu_cpu_stall_notifier_list, n);
+}
+EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_register);
+
+/**
+ * rcu_stall_chain_notifier_unregister - Remove an RCU CPU stall notifier
+ * @n: Entry to add.
+ *
+ * Removes an RCU CPU stall notifier from an atomic notifier chain.
+ *
+ * Returns zero on success, %-ENOENT on failure.
+ */
+int rcu_stall_chain_notifier_unregister(struct notifier_block *n)
+{
+	return atomic_notifier_chain_unregister(&rcu_cpu_stall_notifier_list, n);
+}
+EXPORT_SYMBOL_GPL(rcu_stall_chain_notifier_unregister);
+
+/*
+ * rcu_stall_notifier_call_chain - Call functions in an RCU CPU stall notifier chain
+ * @val: Value passed unmodified to notifier function
+ * @v: Pointer passed unmodified to notifier function
+ *
+ * Calls each function in the RCU CPU stall notifier chain in turn, which
+ * is an atomic call chain.  See atomic_notifier_call_chain() for more
+ * information.
+ *
+ * This is for use within RCU, hence the omission of the extra asterisk
+ * to indicate a non-kerneldoc format header comment.
+ */
+int rcu_stall_notifier_call_chain(unsigned long val, void *v)
+{
+	return atomic_notifier_call_chain(&rcu_cpu_stall_notifier_list, val, v);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 5/6] rcutorture: Add test of RCU CPU stall notifiers
  2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2023-10-19 12:02 ` [PATCH 4/6] rcu: Add RCU CPU stall notifier Frederic Weisbecker
@ 2023-10-19 12:02 ` Frederic Weisbecker
  2023-10-19 12:02 ` [PATCH 6/6] rcu/tree: Defer setting of jiffies during stall reset Frederic Weisbecker
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:02 UTC (permalink / raw)
  To: LKML
  Cc: Paul E. McKenney, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Steven Rostedt,
	Uladzislau Rezki, rcu, Frederic Weisbecker

From: "Paul E. McKenney" <paulmck@kernel.org>

This commit registers an RCU CPU stall notifier when testing RCU CPU
stalls.  The notifier logs a message similar to the following:

	rcu_torture_stall_nf: v=1, duration=21001.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcutorture.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index ade42d6a9d9b..1830cacac01d 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -21,6 +21,7 @@
 #include <linux/spinlock.h>
 #include <linux/smp.h>
 #include <linux/rcupdate_wait.h>
+#include <linux/rcu_notifier.h>
 #include <linux/interrupt.h>
 #include <linux/sched/signal.h>
 #include <uapi/linux/sched/types.h>
@@ -2428,6 +2429,16 @@ static int rcutorture_booster_init(unsigned int cpu)
 	return 0;
 }
 
+static int rcu_torture_stall_nf(struct notifier_block *nb, unsigned long v, void *ptr)
+{
+	pr_info("%s: v=%lu, duration=%lu.\n", __func__, v, (unsigned long)ptr);
+	return NOTIFY_OK;
+}
+
+static struct notifier_block rcu_torture_stall_block = {
+	.notifier_call = rcu_torture_stall_nf,
+};
+
 /*
  * CPU-stall kthread.  It waits as specified by stall_cpu_holdoff, then
  * induces a CPU stall for the time specified by stall_cpu.
@@ -2435,9 +2446,14 @@ static int rcutorture_booster_init(unsigned int cpu)
 static int rcu_torture_stall(void *args)
 {
 	int idx;
+	int ret;
 	unsigned long stop_at;
 
 	VERBOSE_TOROUT_STRING("rcu_torture_stall task started");
+	ret = rcu_stall_chain_notifier_register(&rcu_torture_stall_block);
+	if (ret)
+		pr_info("%s: rcu_stall_chain_notifier_register() returned %d, %sexpected.\n",
+			__func__, ret, !IS_ENABLED(CONFIG_RCU_STALL_COMMON) ? "un" : "");
 	if (stall_cpu_holdoff > 0) {
 		VERBOSE_TOROUT_STRING("rcu_torture_stall begin holdoff");
 		schedule_timeout_interruptible(stall_cpu_holdoff * HZ);
@@ -2481,6 +2497,11 @@ static int rcu_torture_stall(void *args)
 		cur_ops->readunlock(idx);
 	}
 	pr_alert("%s end.\n", __func__);
+	if (!ret) {
+		ret = rcu_stall_chain_notifier_unregister(&rcu_torture_stall_block);
+		if (ret)
+			pr_info("%s: rcu_stall_chain_notifier_unregister() returned %d.\n", __func__, ret);
+	}
 	torture_shutdown_absorb("rcu_torture_stall");
 	while (!kthread_should_stop())
 		schedule_timeout_interruptible(10 * HZ);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 6/6] rcu/tree: Defer setting of jiffies during stall reset
  2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2023-10-19 12:02 ` [PATCH 5/6] rcutorture: Add test of RCU CPU stall notifiers Frederic Weisbecker
@ 2023-10-19 12:02 ` Frederic Weisbecker
  5 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-10-19 12:02 UTC (permalink / raw)
  To: LKML
  Cc: Joel Fernandes (Google), Boqun Feng, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Huacai Chen, Binbin Zhou,
	Sergey Senozhatsky, Thomas Gleixner, stable, Frederic Weisbecker

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

There are instances where rcu_cpu_stall_reset() is called when jiffies
did not get a chance to update for a long time. Before jiffies is
updated, the CPU stall detector can go off triggering false-positives
where a just-started grace period appears to be ages old. In the past,
we disabled stall detection in rcu_cpu_stall_reset() however this got
changed [1]. This is resulting in false-positives in KGDB usecase [2].

Fix this by deferring the update of jiffies to the third run of the FQS
loop. This is more robust, as, even if rcu_cpu_stall_reset() is called
just before jiffies is read, we would end up pushing out the jiffies
read by 3 more FQS loops. Meanwhile the CPU stall detection will be
delayed and we will not get any false positives.

[1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.org/
[2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/

Tested with rcutorture.cpu_stall option as well to verify stall behavior
with/without patch.

Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
Suggested-by: Paul  McKenney <paulmck@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: a80be428fbc1 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c       | 12 ++++++++++++
 kernel/rcu/tree.h       |  4 ++++
 kernel/rcu/tree_stall.h | 20 ++++++++++++++++++--
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb1caefa8bd0..d85779f67aea 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1556,10 +1556,22 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
  */
 static void rcu_gp_fqs(bool first_time)
 {
+	int nr_fqs = READ_ONCE(rcu_state.nr_fqs_jiffies_stall);
 	struct rcu_node *rnp = rcu_get_root();
 
 	WRITE_ONCE(rcu_state.gp_activity, jiffies);
 	WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1);
+
+	WARN_ON_ONCE(nr_fqs > 3);
+	/* Only countdown nr_fqs for stall purposes if jiffies moves. */
+	if (nr_fqs) {
+		if (nr_fqs == 1) {
+			WRITE_ONCE(rcu_state.jiffies_stall,
+				   jiffies + rcu_jiffies_till_stall_check());
+		}
+		WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, --nr_fqs);
+	}
+
 	if (first_time) {
 		/* Collect dyntick-idle snapshots. */
 		force_qs_rnp(dyntick_save_progress_counter);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 192536916f9a..e9821a8422db 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -386,6 +386,10 @@ struct rcu_state {
 						/*  in jiffies. */
 	unsigned long jiffies_stall;		/* Time at which to check */
 						/*  for CPU stalls. */
+	int nr_fqs_jiffies_stall;		/* Number of fqs loops after
+						 * which read jiffies and set
+						 * jiffies_stall. Stall
+						 * warnings disabled if !0. */
 	unsigned long jiffies_resched;		/* Time at which to resched */
 						/*  a reluctant CPU. */
 	unsigned long n_force_qs_gpstart;	/* Snapshot of n_force_qs at */
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 49544f932279..ac8e86babe44 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -150,12 +150,17 @@ static void panic_on_rcu_stall(void)
 /**
  * rcu_cpu_stall_reset - restart stall-warning timeout for current grace period
  *
+ * To perform the reset request from the caller, disable stall detection until
+ * 3 fqs loops have passed. This is required to ensure a fresh jiffies is
+ * loaded.  It should be safe to do from the fqs loop as enough timer
+ * interrupts and context switches should have passed.
+ *
  * The caller must disable hard irqs.
  */
 void rcu_cpu_stall_reset(void)
 {
-	WRITE_ONCE(rcu_state.jiffies_stall,
-		   jiffies + rcu_jiffies_till_stall_check());
+	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 3);
+	WRITE_ONCE(rcu_state.jiffies_stall, ULONG_MAX);
 }
 
 //////////////////////////////////////////////////////////////////////////////
@@ -171,6 +176,7 @@ static void record_gp_stall_check_time(void)
 	WRITE_ONCE(rcu_state.gp_start, j);
 	j1 = rcu_jiffies_till_stall_check();
 	smp_mb(); // ->gp_start before ->jiffies_stall and caller's ->gp_seq.
+	WRITE_ONCE(rcu_state.nr_fqs_jiffies_stall, 0);
 	WRITE_ONCE(rcu_state.jiffies_stall, j + j1);
 	rcu_state.jiffies_resched = j + j1 / 2;
 	rcu_state.n_force_qs_gpstart = READ_ONCE(rcu_state.n_force_qs);
@@ -726,6 +732,16 @@ static void check_cpu_stall(struct rcu_data *rdp)
 	    !rcu_gp_in_progress())
 		return;
 	rcu_stall_kick_kthreads();
+
+	/*
+	 * Check if it was requested (via rcu_cpu_stall_reset()) that the FQS
+	 * loop has to set jiffies to ensure a non-stale jiffies value. This
+	 * is required to have good jiffies value after coming out of long
+	 * breaks of jiffies updates. Not doing so can cause false positives.
+	 */
+	if (READ_ONCE(rcu_state.nr_fqs_jiffies_stall) > 0)
+		return;
+
 	j = jiffies;
 
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-10-19 12:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-19 12:01 [PATCH 0/6] RCU stall detection updates for v6.7 Frederic Weisbecker
2023-10-19 12:01 ` [PATCH 1/6] rcu: Delete a redundant check in rcu_check_gp_kthread_starvation() Frederic Weisbecker
2023-10-19 12:01 ` [PATCH 2/6] rcu: Don't redump the stalled CPU where RCU GP kthread last ran Frederic Weisbecker
2023-10-19 12:01 ` [PATCH 3/6] rcu: Eliminate check_cpu_stall() duplicate code Frederic Weisbecker
2023-10-19 12:02 ` [PATCH 4/6] rcu: Add RCU CPU stall notifier Frederic Weisbecker
2023-10-19 12:02 ` [PATCH 5/6] rcutorture: Add test of RCU CPU stall notifiers Frederic Weisbecker
2023-10-19 12:02 ` [PATCH 6/6] rcu/tree: Defer setting of jiffies during stall reset Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox