[PATCH 00/18] RCU fixes for v6.7

* [PATCH 00/18] RCU fixes for v6.7
@ 2023-10-13 11:58 Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 01/18] Revert "checkpatch: Error out if deprecated RCU API used" Frederic Weisbecker
                   ` (17 more replies)
  0 siblings, 18 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

Hello,

Please find below the general (S)RCU fixes:

Catalin Marinas (1):
  rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing
    objects

Denis Arefev (1):
  srcu: Fix srcu_struct node grpmask overflow on 64-bit systems

Frederic Weisbecker (8):
  rcu: Use rcu_segcblist_segempty() instead of open coding it
  rcu: Assume IRQS disabled from rcu_report_dead()
  rcu: Assume rcu_report_dead() is always called locally
  rcu: Conditionally build CPU-hotplug teardown callbacks
  rcu: Standardize explicit CPU-hotplug calls
  rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  srcu: Fix callbacks acceleration mishandling
  srcu: Only accelerate on enqueue time

Joel Fernandes (Google) (3):
  Revert "checkpatch: Error out if deprecated RCU API used"
  srcu: Fix error handling in init_srcu_struct_fields()
  rcu/tree: Remove superfluous return from void call_rcu* functions

Paul E. McKenney (2):
  rcu: Add sysfs to provide throttled access to rcu_barrier()
  rcu: Eliminate rcu_gp_slow_unregister() false positive

Yue Haibing (1):
  rcu: Remove unused function declaration rcu_eqs_special_set()

Zhen Lei (2):
  mm: Remove kmem_valid_obj()
  rcu: Dump memory object info if callback function is invalid

 .../Expedited-Grace-Periods.rst               |   2 +-
 .../Design/Memory-Ordering/TreeRCU-gp-fqs.svg |   4 +-
 .../RCU/Design/Memory-Ordering/TreeRCU-gp.svg |   4 +-
 .../Memory-Ordering/TreeRCU-hotplug.svg       |   4 +-
 .../RCU/Design/Requirements/Requirements.rst  |   4 +-
 .../admin-guide/kernel-parameters.txt         |   7 +
 arch/arm64/kernel/smp.c                       |   4 +-
 arch/powerpc/kernel/smp.c                     |   2 +-
 arch/s390/kernel/smp.c                        |   2 +-
 arch/x86/kernel/smpboot.c                     |   2 +-
 include/linux/interrupt.h                     |   2 +-
 include/linux/rcupdate.h                      |   2 -
 include/linux/rcutiny.h                       |   2 +-
 include/linux/rcutree.h                       |  17 +-
 include/linux/slab.h                          |   5 +-
 kernel/cpu.c                                  |  13 +-
 kernel/rcu/rcu.h                              |   7 +
 kernel/rcu/rcu_segcblist.c                    |   4 +-
 kernel/rcu/srcutiny.c                         |   1 +
 kernel/rcu/srcutree.c                         |  76 ++++--
 kernel/rcu/tasks.h                            |   1 +
 kernel/rcu/tiny.c                             |   1 +
 kernel/rcu/tree.c                             | 230 ++++++++++++------
 mm/slab_common.c                              |  41 +---
 mm/util.c                                     |   4 +-
 scripts/checkpatch.pl                         |   9 -
 26 files changed, 284 insertions(+), 166 deletions(-)

-- 
2.34.1



* [PATCH 01/18] Revert "checkpatch: Error out if deprecated RCU API used"
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 02/18] srcu: Fix error handling in init_srcu_struct_fields() Frederic Weisbecker
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Joel Fernandes (Google), Boqun Feng, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The definition for single-argument kfree_rcu() has been removed,
so that any further attempt to use it will result in a build error.
Because of this build error, there is no longer any need for a special
check in checkpatch.pl.

Therefore, revert commit 1eacac3255495be7502d406e2ba5444fb5c3607c.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 scripts/checkpatch.pl | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 7d16f863edf1..25fdb7fda112 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6427,15 +6427,6 @@ sub process {
 			}
 		}
 
-# check for soon-to-be-deprecated single-argument k[v]free_rcu() API
-		if ($line =~ /\bk[v]?free_rcu\s*\([^(]+\)/) {
-			if ($line =~ /\bk[v]?free_rcu\s*\([^,]+\)/) {
-				ERROR("DEPRECATED_API",
-				      "Single-argument k[v]free_rcu() API is deprecated, please pass rcu_head object or call k[v]free_rcu_mightsleep()." . $herecurr);
-			}
-		}
-
-
 # check for unnecessary "Out of Memory" messages
 		if ($line =~ /^\+.*\b$logFunctions\s*\(/ &&
 		    $prevline =~ /^[ \+]\s*if\s*\(\s*(\!\s*|NULL\s*==\s*)?($Lval)(\s*==\s*NULL\s*)?\s*\)/ &&
-- 
2.34.1



* [PATCH 02/18] srcu: Fix error handling in init_srcu_struct_fields()
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 01/18] Revert "checkpatch: Error out if deprecated RCU API used" Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 03/18] rcu/tree: Remove superfluous return from void call_rcu* functions Frederic Weisbecker
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Joel Fernandes (Google), Boqun Feng, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The current error handling in init_srcu_struct_fields() is a bit
inconsistent.  If init_srcu_struct_nodes() fails, the function either
returns -ENOMEM or 0 depending on whether ssp->sda_is_static is true or
false. This can make init_srcu_struct_fields() return 0 even if memory
allocation failed!

Simplify the error handling by always returning -ENOMEM if either
init_srcu_struct_nodes() or the per-CPU allocation fails. This makes the
control flow easier to follow and avoids the inconsistent return values.

Add goto labels to avoid duplicating the error cleanup code.
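
As a rough userspace illustration of the goto-based unwinding described
above (not the kernel code itself), the sketch below allocates two
resources and releases them in reverse order on failure.  The names
demo_obj, demo_init() and the fail_step knob are hypothetical and exist
only for this example; malloc()/free() stand in for the kernel
allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for srcu_struct; not the kernel type. */
struct demo_obj {
	int *sup;	/* plays the role of ssp->srcu_sup */
	int *sda;	/* plays the role of ssp->sda */
};

/*
 * Returns 0 on success or -ENOMEM on any failure.  fail_step is a
 * hypothetical knob that forces a failure for demonstration:
 * 1 = the "sda" allocation fails, 2 = a later setup step fails.
 */
static int demo_init(struct demo_obj *obj, int fail_step)
{
	obj->sda = NULL;
	obj->sup = malloc(sizeof(*obj->sup));
	if (!obj->sup)
		return -ENOMEM;

	if (fail_step != 1)
		obj->sda = malloc(sizeof(*obj->sda));
	if (!obj->sda)
		goto err_free_sup;

	if (fail_step == 2)
		goto err_free_sda;

	return 0;

	/* Labels fall through, so cleanup runs in reverse order. */
err_free_sda:
	free(obj->sda);
	obj->sda = NULL;
err_free_sup:
	free(obj->sup);
	obj->sup = NULL;
	return -ENOMEM;
}
```

Every failure path returns the same error code and leaves both pointers
NULL, so a caller never sees success after a failed allocation.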

Link: https://lore.kernel.org/r/20230404003508.GA254019@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/srcutree.c | 32 +++++++++++++++++---------------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 20d7a238d675..f1a905200fc2 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -255,29 +255,31 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static)
 	ssp->srcu_sup->sda_is_static = is_static;
 	if (!is_static)
 		ssp->sda = alloc_percpu(struct srcu_data);
-	if (!ssp->sda) {
-		if (!is_static)
-			kfree(ssp->srcu_sup);
-		return -ENOMEM;
-	}
+	if (!ssp->sda)
+		goto err_free_sup;
 	init_srcu_struct_data(ssp);
 	ssp->srcu_sup->srcu_gp_seq_needed_exp = 0;
 	ssp->srcu_sup->srcu_last_gp_end = ktime_get_mono_fast_ns();
 	if (READ_ONCE(ssp->srcu_sup->srcu_size_state) == SRCU_SIZE_SMALL && SRCU_SIZING_IS_INIT()) {
-		if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC)) {
-			if (!ssp->srcu_sup->sda_is_static) {
-				free_percpu(ssp->sda);
-				ssp->sda = NULL;
-				kfree(ssp->srcu_sup);
-				return -ENOMEM;
-			}
-		} else {
-			WRITE_ONCE(ssp->srcu_sup->srcu_size_state, SRCU_SIZE_BIG);
-		}
+		if (!init_srcu_struct_nodes(ssp, GFP_ATOMIC))
+			goto err_free_sda;
+		WRITE_ONCE(ssp->srcu_sup->srcu_size_state, SRCU_SIZE_BIG);
 	}
 	ssp->srcu_sup->srcu_ssp = ssp;
 	smp_store_release(&ssp->srcu_sup->srcu_gp_seq_needed, 0); /* Init done. */
 	return 0;
+
+err_free_sda:
+	if (!is_static) {
+		free_percpu(ssp->sda);
+		ssp->sda = NULL;
+	}
+err_free_sup:
+	if (!is_static) {
+		kfree(ssp->srcu_sup);
+		ssp->srcu_sup = NULL;
+	}
+	return -ENOMEM;
 }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
-- 
2.34.1



* [PATCH 03/18] rcu/tree: Remove superfluous return from void call_rcu* functions
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 01/18] Revert "checkpatch: Error out if deprecated RCU API used" Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 02/18] srcu: Fix error handling in init_srcu_struct_fields() Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 04/18] rcu: Add sysfs to provide throttled access to rcu_barrier() Frederic Weisbecker
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Joel Fernandes (Google), Boqun Feng, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

The return keyword is not needed here.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb1caefa8bd0..7c79480bfaa0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2713,7 +2713,7 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
  */
 void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 {
-	return __call_rcu_common(head, func, false);
+	__call_rcu_common(head, func, false);
 }
 EXPORT_SYMBOL_GPL(call_rcu_hurry);
 #endif
@@ -2764,7 +2764,7 @@ EXPORT_SYMBOL_GPL(call_rcu_hurry);
  */
 void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	return __call_rcu_common(head, func, IS_ENABLED(CONFIG_RCU_LAZY));
+	__call_rcu_common(head, func, IS_ENABLED(CONFIG_RCU_LAZY));
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
-- 
2.34.1



* [PATCH 04/18] rcu: Add sysfs to provide throttled access to rcu_barrier()
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 03/18] rcu/tree: Remove superfluous return from void call_rcu* functions Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 05/18] rcu: Remove unused function declaration rcu_eqs_special_set() Frederic Weisbecker
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Paul E. McKenney, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Steven Rostedt,
	Uladzislau Rezki, rcu, Johannes Weiner, Frederic Weisbecker

From: "Paul E. McKenney" <paulmck@kernel.org>

When running a series of stress tests all making heavy use of RCU,
it is all too possible to OOM the system when the prior test's RCU
callbacks don't get invoked until after the subsequent test starts.
One way of handling this is just a timed wait, but this fails when a
given CPU has so many callbacks queued that they take longer to invoke
than allowed for by that timed wait.

This commit therefore adds an rcutree.do_rcu_barrier module parameter that
is accessible from sysfs.  Writing one of the many synonyms for boolean
"true" will cause an rcu_barrier() to be invoked, but will guarantee that
no more than one rcu_barrier() will be invoked per sixteenth of a second
via this mechanism.  The flip side is that a given request might wait a
second or three longer than absolutely necessary, but only when there are
multiple uses of rcutree.do_rcu_barrier within a one-second time interval.

This commit unnecessarily serializes the rcu_barrier() machinery, given
that serialization is already provided by procfs.  This has the advantage
of allowing throttled rcu_barrier() from other sources within the kernel.
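
The throttling idea can be sketched in userspace C11 roughly as follows.
This is only an approximation under stated assumptions:
throttle_try_claim() is a hypothetical helper, "now" stands in for
jiffies, and C11 atomics stand in for the kernel's READ_ONCE() and
try_cmpxchg().  In the patch, a caller that fails to claim the slot
sleeps and then re-checks whether a barrier has completed in the
meantime; that part is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to claim the right to run the expensive operation at time "now".
 * Returns true if the caller won the slot, false if it must wait
 * (either the last run was too recent or another caller raced us).
 * All names here are hypothetical.
 */
static bool throttle_try_claim(atomic_ulong *last, unsigned long now,
			       unsigned long interval)
{
	unsigned long old = atomic_load(last);

	/* Unsigned subtraction tolerates counter wraparound. */
	if (now - old <= interval)
		return false;

	/* Only one of several racing callers can move "last" forward. */
	return atomic_compare_exchange_strong(last, &old, now);
}
```

Because the timestamp is only updated by a successful compare-and-swap,
two callers that observe the same stale timestamp cannot both win.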

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++
 kernel/rcu/tree.c                             | 76 +++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0a1731a0f0ef..7ec8a406d419 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4769,6 +4769,13 @@
 			Set maximum number of finished RCU callbacks to
 			process in one batch.
 
+	rcutree.do_rcu_barrier=	[KNL]
+			Request a call to rcu_barrier().  This is
+			throttled so that userspace tests can safely
+			hammer on the sysfs variable if they so choose.
+			If triggered before the RCU grace-period machinery
+			is fully active, this will error out with EAGAIN.
+
 	rcutree.dump_tree=	[KNL]
 			Dump the structure of the rcu_node combining tree
 			out at early boot.  This is used for diagnostic
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7c79480bfaa0..3c7281fc25a7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4083,6 +4083,82 @@ void rcu_barrier(void)
 }
 EXPORT_SYMBOL_GPL(rcu_barrier);
 
+static unsigned long rcu_barrier_last_throttle;
+
+/**
+ * rcu_barrier_throttled - Do rcu_barrier(), but limit to one per second
+ *
+ * This can be thought of as guard rails around rcu_barrier() that
+ * permits unrestricted userspace use, at least assuming the hardware's
+ * try_cmpxchg() is robust.  There will be at most one call per second to
+ * rcu_barrier() system-wide from use of this function, which means that
+ * callers might needlessly wait a second or three.
+ *
+ * This is intended for use by test suites to avoid OOM by flushing RCU
+ * callbacks from the previous test before starting the next.  See the
+ * rcutree.do_rcu_barrier module parameter for more information.
+ *
+ * Why not simply make rcu_barrier() more scalable?  That might be
+ * the eventual endpoint, but let's keep it simple for the time being.
+ * Note that the module parameter infrastructure serializes calls to a
+ * given .set() function, but should concurrent .set() invocation ever be
+ * possible, we are ready!
+ */
+static void rcu_barrier_throttled(void)
+{
+	unsigned long j = jiffies;
+	unsigned long old = READ_ONCE(rcu_barrier_last_throttle);
+	unsigned long s = rcu_seq_snap(&rcu_state.barrier_sequence);
+
+	while (time_in_range(j, old, old + HZ / 16) ||
+	       !try_cmpxchg(&rcu_barrier_last_throttle, &old, j)) {
+		schedule_timeout_idle(HZ / 16);
+		if (rcu_seq_done(&rcu_state.barrier_sequence, s)) {
+			smp_mb(); /* caller's subsequent code after above check. */
+			return;
+		}
+		j = jiffies;
+		old = READ_ONCE(rcu_barrier_last_throttle);
+	}
+	rcu_barrier();
+}
+
+/*
+ * Invoke rcu_barrier_throttled() when a rcutree.do_rcu_barrier
+ * request arrives.  We insist on a true value to allow for possible
+ * future expansion.
+ */
+static int param_set_do_rcu_barrier(const char *val, const struct kernel_param *kp)
+{
+	bool b;
+	int ret;
+
+	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING)
+		return -EAGAIN;
+	ret = kstrtobool(val, &b);
+	if (!ret && b) {
+		atomic_inc((atomic_t *)kp->arg);
+		rcu_barrier_throttled();
+		atomic_dec((atomic_t *)kp->arg);
+	}
+	return ret;
+}
+
+/*
+ * Output the number of outstanding rcutree.do_rcu_barrier requests.
+ */
+static int param_get_do_rcu_barrier(char *buffer, const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%d\n", atomic_read((atomic_t *)kp->arg));
+}
+
+static const struct kernel_param_ops do_rcu_barrier_ops = {
+	.set = param_set_do_rcu_barrier,
+	.get = param_get_do_rcu_barrier,
+};
+static atomic_t do_rcu_barrier;
+module_param_cb(do_rcu_barrier, &do_rcu_barrier_ops, &do_rcu_barrier, 0644);
+
 /*
  * Compute the mask of online CPUs for the specified rcu_node structure.
  * This will not be stable unless the rcu_node structure's ->lock is
-- 
2.34.1



* [PATCH 05/18] rcu: Remove unused function declaration rcu_eqs_special_set()
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 04/18] rcu: Add sysfs to provide throttled access to rcu_barrier() Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 06/18] mm: Remove kmem_valid_obj() Frederic Weisbecker
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Yue Haibing, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: Yue Haibing <yuehaibing@huawei.com>

Commit a86baa69c2b7 ("rcu: Remove special bit at the bottom of the ->dynticks counter")
left this declaration behind, so remove it.

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/rcutree.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 126f6b418f6a..153cfc7bbffd 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -37,7 +37,6 @@ void synchronize_rcu_expedited(void);
 void kvfree_call_rcu(struct rcu_head *head, void *ptr);
 
 void rcu_barrier(void);
-bool rcu_eqs_special_set(int cpu);
 void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
 bool rcu_gp_might_be_stalled(void);
-- 
2.34.1



* [PATCH 06/18] mm: Remove kmem_valid_obj()
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 05/18] rcu: Remove unused function declaration rcu_eqs_special_set() Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 07/18] rcu: Dump memory object info if callback function is invalid Frederic Weisbecker
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Zhen Lei, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Matthew Wilcox,
	Vlastimil Babka, Frederic Weisbecker

From: Zhen Lei <thunder.leizhen@huawei.com>

Function kmem_dump_obj() will splat if passed a pointer to a non-slab
object. So nothing calls it directly; callers instead call
kmem_valid_obj() first to determine whether the passed pointer is to a
valid slab object. This means that merging kmem_valid_obj() into
kmem_dump_obj() will make the code more concise. Therefore, convert
kmem_dump_obj() to work the same way as vmalloc_dump_obj(), removing the
need for the kmem_dump_obj() caller to check kmem_valid_obj().  After
this, there are no remaining calls to kmem_valid_obj(), and it can be
safely removed.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/slab.h |  5 +++--
 mm/slab_common.c     | 41 +++++++++++------------------------------
 mm/util.c            |  4 +---
 3 files changed, 15 insertions(+), 35 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 8228d1276a2f..ff56ab804bf6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -245,8 +245,9 @@ DEFINE_FREE(kfree, void *, if (_T) kfree(_T))
 size_t ksize(const void *objp);
 
 #ifdef CONFIG_PRINTK
-bool kmem_valid_obj(void *object);
-void kmem_dump_obj(void *object);
+bool kmem_dump_obj(void *object);
+#else
+static inline bool kmem_dump_obj(void *object) { return false; }
 #endif
 
 /*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index cd71f9581e67..a425bedf2103 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -528,26 +528,6 @@ bool slab_is_available(void)
 }
 
 #ifdef CONFIG_PRINTK
-/**
- * kmem_valid_obj - does the pointer reference a valid slab object?
- * @object: pointer to query.
- *
- * Return: %true if the pointer is to a not-yet-freed object from
- * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
- * is to an already-freed object, and %false otherwise.
- */
-bool kmem_valid_obj(void *object)
-{
-	struct folio *folio;
-
-	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
-	if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
-		return false;
-	folio = virt_to_folio(object);
-	return folio_test_slab(folio);
-}
-EXPORT_SYMBOL_GPL(kmem_valid_obj);
-
 static void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab)
 {
 	if (__kfence_obj_info(kpp, object, slab))
@@ -566,11 +546,11 @@ static void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *
  * and, if available, the slab name, return address, and stack trace from
  * the allocation and last free path of that object.
  *
- * This function will splat if passed a pointer to a non-slab object.
- * If you are not sure what type of object you have, you should instead
- * use mem_dump_obj().
+ * Return: %true if the pointer is to a not-yet-freed object from
+ * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
+ * is to an already-freed object, and %false otherwise.
  */
-void kmem_dump_obj(void *object)
+bool kmem_dump_obj(void *object)
 {
 	char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
 	int i;
@@ -578,13 +558,13 @@ void kmem_dump_obj(void *object)
 	unsigned long ptroffset;
 	struct kmem_obj_info kp = { };
 
-	if (WARN_ON_ONCE(!virt_addr_valid(object)))
-		return;
+	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
+	if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
+		return false;
 	slab = virt_to_slab(object);
-	if (WARN_ON_ONCE(!slab)) {
-		pr_cont(" non-slab memory.\n");
-		return;
-	}
+	if (!slab)
+		return false;
+
 	kmem_obj_info(&kp, object, slab);
 	if (kp.kp_slab_cache)
 		pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name);
@@ -621,6 +601,7 @@ void kmem_dump_obj(void *object)
 		pr_info("    %pS\n", kp.kp_free_stack[i]);
 	}
 
+	return true;
 }
 EXPORT_SYMBOL_GPL(kmem_dump_obj);
 #endif
diff --git a/mm/util.c b/mm/util.c
index 8cbbfd3a3d59..6eddd891198e 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1060,10 +1060,8 @@ void mem_dump_obj(void *object)
 {
 	const char *type;
 
-	if (kmem_valid_obj(object)) {
-		kmem_dump_obj(object);
+	if (kmem_dump_obj(object))
 		return;
-	}
 
 	if (vmalloc_dump_obj(object))
 		return;
-- 
2.34.1



* [PATCH 07/18] rcu: Dump memory object info if callback function is invalid
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 06/18] mm: Remove kmem_valid_obj() Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 08/18] rcu: Eliminate rcu_gp_slow_unregister() false positive Frederic Weisbecker
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Zhen Lei, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Frederic Weisbecker

From: Zhen Lei <thunder.leizhen@huawei.com>

When a structure containing an RCU callback rhp is (incorrectly) freed
and reallocated after rhp is passed to call_rcu(), it is not unusual for
rhp->func to be set to NULL. This defeats the debugging prints used by
__call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y,
which expect to identify the offending code using the identity of this
function.

And in kernels built without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things
are even worse, as can be seen from this splat:

Unable to handle kernel NULL pointer dereference at virtual address 0
... ...
PC is at 0x0
LR is at rcu_do_batch+0x1c0/0x3b8
... ...
 (rcu_do_batch) from (rcu_core+0x1d4/0x284)
 (rcu_core) from (__do_softirq+0x24c/0x344)
 (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
 (__irq_exit_rcu) from (irq_exit+0x8/0x10)
 (irq_exit) from (__handle_domain_irq+0x74/0x9c)
 (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
 (gic_handle_irq) from (__irq_svc+0x5c/0x94)
 (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
 (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
 (default_idle_call) from (do_idle+0xf8/0x150)
 (do_idle) from (cpu_startup_entry+0x18/0x20)
 (cpu_startup_entry) from (0xc01530)

This commit therefore adds calls to kmem_dump_obj(rhp) to output some
information, for example:

  slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256

This provides the rough size of the memory block and the offset of the
rcu_head structure, which provides at least a few clues to help
locate the problem. If the problem is reproducible, additional slab
debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can
provide significantly more information.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/rcu.h      | 7 +++++++
 kernel/rcu/srcutiny.c | 1 +
 kernel/rcu/srcutree.c | 1 +
 kernel/rcu/tasks.h    | 1 +
 kernel/rcu/tiny.c     | 1 +
 kernel/rcu/tree.c     | 1 +
 6 files changed, 12 insertions(+)

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 98e13be411af..d612731feea4 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -10,6 +10,7 @@
 #ifndef __LINUX_RCU_H
 #define __LINUX_RCU_H
 
+#include <linux/slab.h>
 #include <trace/events/rcu.h>
 
 /*
@@ -248,6 +249,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 }
 #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
 
+static inline void debug_rcu_head_callback(struct rcu_head *rhp)
+{
+	if (unlikely(!rhp->func))
+		kmem_dump_obj(rhp);
+}
+
 extern int rcu_cpu_stall_suppress_at_boot;
 
 static inline bool rcu_stall_is_suppressed_at_boot(void)
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 336af24e0fe3..c38e5933a5d6 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -138,6 +138,7 @@ void srcu_drive_gp(struct work_struct *wp)
 	while (lh) {
 		rhp = lh;
 		lh = lh->next;
+		debug_rcu_head_callback(rhp);
 		local_bh_disable();
 		rhp->func(rhp);
 		local_bh_enable();
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index f1a905200fc2..833a8f848a90 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1710,6 +1710,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 	rhp = rcu_cblist_dequeue(&ready_cbs);
 	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
 		debug_rcu_head_unqueue(rhp);
+		debug_rcu_head_callback(rhp);
 		local_bh_disable();
 		rhp->func(rhp);
 		local_bh_enable();
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 8d65f7d576a3..7c845532a50a 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -538,6 +538,7 @@ static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu
 	raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 	len = rcl.len;
 	for (rhp = rcu_cblist_dequeue(&rcl); rhp; rhp = rcu_cblist_dequeue(&rcl)) {
+		debug_rcu_head_callback(rhp);
 		local_bh_disable();
 		rhp->func(rhp);
 		local_bh_enable();
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 42f7589e51e0..fec804b79080 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -97,6 +97,7 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
 
 	trace_rcu_invoke_callback("", head);
 	f = head->func;
+	debug_rcu_head_callback(head);
 	WRITE_ONCE(head->func, (rcu_callback_t)0L);
 	f(head);
 	rcu_lock_release(&rcu_callback_map);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3c7281fc25a7..aae515071ffd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2135,6 +2135,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 		trace_rcu_invoke_callback(rcu_state.name, rhp);
 
 		f = rhp->func;
+		debug_rcu_head_callback(rhp);
 		WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
 		f(rhp);
 
-- 
2.34.1



* [PATCH 08/18] rcu: Eliminate rcu_gp_slow_unregister() false positive
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 07/18] rcu: Dump memory object info if callback function is invalid Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems Frederic Weisbecker
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Paul E. McKenney, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Steven Rostedt,
	Uladzislau Rezki, rcu, Frederic Weisbecker

From: "Paul E. McKenney" <paulmck@kernel.org>

When using rcutorture as a module, there are a number of conditions that
can abort the modprobe operation, for example, when attempting to run
both RCU CPU stall warning tests and forward-progress tests.  This can
cause rcu_torture_cleanup() to be invoked on the unwind path out of
rcu_torture_init(), which will mean that rcu_gp_slow_unregister()
is invoked without a matching rcu_gp_slow_register().  This will cause
a splat because rcu_gp_slow_unregister() is passed rcu_fwd_cb_nodelay,
which does not match a NULL pointer.

This commit therefore forgives a mismatch involving a NULL pointer, thus
avoiding this false-positive splat.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index aae515071ffd..a83ecab77917 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1260,7 +1260,7 @@ EXPORT_SYMBOL_GPL(rcu_gp_slow_register);
 /* Unregister a counter, with NULL for not caring which. */
 void rcu_gp_slow_unregister(atomic_t *rgssp)
 {
-	WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress);
+	WARN_ON_ONCE(rgssp && rgssp != rcu_gp_slow_suppress && rcu_gp_slow_suppress != NULL);

 	WRITE_ONCE(rcu_gp_slow_suppress, NULL);
 }
-- 
2.34.1


* [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (7 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 08/18] rcu: Eliminate rcu_gp_slow_unregister() false positive Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 12:54   ` David Laight
  2023-10-13 11:58 ` [PATCH 10/18] rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects Frederic Weisbecker
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Denis Arefev, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, David Laight,
	Frederic Weisbecker

From: Denis Arefev <arefev@swemel.ru>

The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
is subject to overflow due to a failure to cast operands to a larger
data type before performing the bitwise operation.

The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
maximum shift of 15), but which can be set up as high as 64 (resulting
in a maximum shift of 63).  A value of 31 can result in sign extension,
resulting in 0xffffffff80000000 instead of the desired 0x80000000.
A value of 32 or greater triggers undefined behavior per the C standard.

This bug has not been known to cause issues because almost all kernels
take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as a
given compiler gives a deterministic non-zero result for 1<<N for N>=32,
the code correctly invokes all SRCU callbacks, albeit wasting CPU time
along the way.

This commit therefore substitutes the correct 1UL for the buggy 1.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Denis Arefev <arefev@swemel.ru>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
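
The values quoted in the changelog can be reproduced in userspace. This is
a minimal sketch, assuming uint64_t stands in for the kernel's 64-bit
unsigned long; the demo_* helpers are hypothetical. The sign-extended value
is built from INT32_MIN rather than by evaluating 1 << 31, since that
shift on a 32-bit int is, strictly speaking, not well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fixed form: the constant is 64 bits wide before the shift, so every
 * shift count from 0 to 63 is well defined.
 */
static uint64_t demo_grpmask(int cpu, int grplo)
{
	int shift = cpu - grplo;

	assert(shift >= 0 && shift < 64);
	return UINT64_C(1) << shift;
}

/*
 * What a shift of 31 on a plain int can turn into: a negative 32-bit
 * value that is sign-extended when it is stored in a 64-bit mask.
 */
static uint64_t demo_sign_extended_bit31(void)
{
	return (uint64_t)INT32_MIN;
}
```

With these helpers, demo_grpmask(31, 0) is 0x80000000 while
demo_sign_extended_bit31() is 0xffffffff80000000, matching the two values
named in the changelog.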
 kernel/rcu/srcutree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 833a8f848a90..5602042856b1 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
 				snp->grplo = cpu;
 			snp->grphi = cpu;
 		}
-		sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
+		sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
 	}
 	smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
 	return true;
@@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
 	int cpu;
 
 	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
-		if (!(mask & (1 << (cpu - snp->grplo))))
+		if (!(mask & (1UL << (cpu - snp->grplo))))
 			continue;
 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* RE: [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  2023-10-13 11:58 ` [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems Frederic Weisbecker
@ 2023-10-13 12:54   ` David Laight
  2023-10-13 14:11     ` Frederic Weisbecker
  0 siblings, 1 reply; 21+ messages in thread
From: David Laight @ 2023-10-13 12:54 UTC (permalink / raw)
  To: 'Frederic Weisbecker', LKML
  Cc: Denis Arefev, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

From: Frederic Weisbecker
> Sent: 13 October 2023 12:59
> 
> @@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
>  	int cpu;
> 
>  	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> -		if (!(mask & (1 << (cpu - snp->grplo))))
> +		if (!(mask & (1UL << (cpu - snp->grplo))))
>  			continue;
>  		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
>  	}

That loop is entirely horrid.
The compiler almost certainly has to reload snp->grphi every iteration.
Also it looks as though the bottom bit of mask is checked first.
So how about:
	grphi = snp->grphi;
	for (cpu = snp->grplo; cpu <= grphi; cpu++, mask >>= 1) {
		if (!(mask & 1))
			continue;
		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
	}
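
For comparison, here is a userspace sketch of the two loop shapes. It
assumes a 64-bit mask and records the selected CPUs in an array instead of
calling srcu_schedule_cbs_sdp(); the demo_* names are hypothetical. It
suggests both forms select the same CPUs as long as the group spans at most
64 CPUs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Loop shape from the patch: test bit (cpu - grplo) of an unchanged mask.
 * Stores each selected CPU in out[] and returns how many were selected.
 */
static int demo_visit_indexed(uint64_t mask, int grplo, int grphi, int *out)
{
	int n = 0;

	assert(grphi - grplo < 64);
	for (int cpu = grplo; cpu <= grphi; cpu++) {
		if (!(mask & (UINT64_C(1) << (cpu - grplo))))
			continue;
		out[n++] = cpu;
	}
	return n;
}

/*
 * Suggested shape: shift the mask right once per iteration and test
 * only its bottom bit.
 */
static int demo_visit_shifted(uint64_t mask, int grplo, int grphi, int *out)
{
	int n = 0;

	for (int cpu = grplo; cpu <= grphi; cpu++, mask >>= 1) {
		if (!(mask & 1))
			continue;
		out[n++] = cpu;
	}
	return n;
}
```

For a mask of 0xb over CPUs 4..7, both helpers select CPUs 4, 5 and 7.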

	David		

> --
> 2.34.1



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  2023-10-13 12:54   ` David Laight
@ 2023-10-13 14:11     ` Frederic Weisbecker
  0 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 14:11 UTC (permalink / raw)
  To: David Laight
  Cc: LKML, Denis Arefev, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

On Fri, Oct 13, 2023 at 12:54:32PM +0000, David Laight wrote:
> From: Frederic Weisbecker
> > Sent: 13 October 2023 12:59
> > 
> > @@ -835,7 +835,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> >  	int cpu;
> > 
> >  	for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > -		if (!(mask & (1 << (cpu - snp->grplo))))
> > +		if (!(mask & (1UL << (cpu - snp->grplo))))
> >  			continue;
> >  		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> >  	}
> 
> That loop is entirely horrid.
> The compiler almost certainly has to reload snp->grphi every iteration.
> Also it looks as though the bottom bit of mask is checked first.
> So how about:
> 	grphi = snp->grphi;
> 	for (cpu = snp->grplo; cpu <= grphi; cpu++, mask >>= 1) {
> 		if (!(mask & 1))
> 			continue;
> 		srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> 	}

Well, it's cache-hot and RCU update side is not really a fast-path.
Not sure it's worth optimizing...

Thanks.

> 
> 	David		
> 
> > --
> > 2.34.1
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 10/18] rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (8 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 11/18] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Catalin Marinas, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Christoph Paasch, stable,
	Frederic Weisbecker

From: Catalin Marinas <catalin.marinas@arm.com>

Since the actual slab freeing is deferred when calling kvfree_rcu(), so
is the kmemleak_free() callback informing kmemleak of the object
deletion. From the perspective of the kvfree_rcu() caller, the object is
freed and it may remove any references to it. Since kmemleak does not
scan RCU internal data storing the pointer, it will report such objects
as leaks during the grace period.

Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
that the tiny RCU implementation does not have such an issue since the
objects can be tracked from the rcu_ctrlblk structure.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://lore.kernel.org/all/F903A825-F05F-4B77-A2B5-7356282FBA2C@apple.com/
Cc: <stable@vger.kernel.org>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a83ecab77917..4dd7df30df31 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -31,6 +31,7 @@
 #include <linux/bitops.h>
 #include <linux/export.h>
 #include <linux/completion.h>
+#include <linux/kmemleak.h>
 #include <linux/moduleparam.h>
 #include <linux/panic.h>
 #include <linux/panic_notifier.h>
@@ -3389,6 +3390,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
 		success = true;
 	}
 
+	/*
+	 * The kvfree_rcu() caller considers the pointer freed at this point
+	 * and likely removes any references to it. Since the actual slab
+	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
+	 * this object (no scanning or false positives reporting).
+	 */
+	kmemleak_ignore(ptr);
+
 	// Set timer to drain after KFREE_DRAIN_JIFFIES.
 	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
 		schedule_delayed_monitor_work(krcp);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 11/18] rcu: Use rcu_segcblist_segempty() instead of open coding it
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (9 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 10/18] rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 12/18] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Qiuxu Zhuo

This makes the code more readable.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
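
To illustrate what the helper expresses, here is a simplified userspace
model of a segmented callback list. The demo_* names and the struct layout
are assumptions made for this sketch, not the kernel definitions. For every
segment above the first, the helper is the same tail-pointer comparison
that the patch replaces; the handling of the first segment is an assumption
of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Segment indexes, modeled on the RCU_*_TAIL constants. */
enum {
	DEMO_DONE_TAIL,
	DEMO_WAIT_TAIL,
	DEMO_NEXT_READY_TAIL,
	DEMO_NEXT_TAIL,
	DEMO_NSEGS
};

struct demo_cb {
	struct demo_cb *next;
};

/* One linked list; tails[i] points at the ->next slot ending segment i. */
struct demo_seglist {
	struct demo_cb *head;
	struct demo_cb **tails[DEMO_NSEGS];
};

static void demo_seglist_init(struct demo_seglist *l)
{
	l->head = NULL;
	for (int i = 0; i < DEMO_NSEGS; i++)
		l->tails[i] = &l->head;
}

/* A segment is empty when its tail equals the previous segment's tail. */
static bool demo_seg_empty(struct demo_seglist *l, int seg)
{
	assert(seg >= 0 && seg < DEMO_NSEGS);
	if (seg == DEMO_DONE_TAIL)
		return l->tails[DEMO_DONE_TAIL] == &l->head;
	return l->tails[seg] == l->tails[seg - 1];
}

/* Append a callback to the last segment. */
static void demo_enqueue(struct demo_seglist *l, struct demo_cb *cb)
{
	cb->next = NULL;
	*l->tails[DEMO_NEXT_TAIL] = cb;
	l->tails[DEMO_NEXT_TAIL] = &cb->next;
}
```

After a single demo_enqueue() on a fresh list, only DEMO_NEXT_TAIL reports
non-empty, which is the condition the loops in the diff below search for.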
 kernel/rcu/rcu_segcblist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index f71fac422c8f..1693ea22ef1b 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -368,7 +368,7 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp,
 	smp_mb(); /* Ensure counts are updated before callback is entrained. */
 	rhp->next = NULL;
 	for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--)
-		if (rsclp->tails[i] != rsclp->tails[i - 1])
+		if (!rcu_segcblist_segempty(rsclp, i))
 			break;
 	rcu_segcblist_inc_seglen(rsclp, i);
 	WRITE_ONCE(*rsclp->tails[i], rhp);
@@ -551,7 +551,7 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
 	 * as their ->gp_seq[] grace-period completion sequence number.
 	 */
 	for (i = RCU_NEXT_READY_TAIL; i > RCU_DONE_TAIL; i--)
-		if (rsclp->tails[i] != rsclp->tails[i - 1] &&
+		if (!rcu_segcblist_segempty(rsclp, i) &&
 		    ULONG_CMP_LT(rsclp->gp_seq[i], seq))
 			break;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 12/18] rcu: Assume IRQS disabled from rcu_report_dead()
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (10 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 11/18] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 13/18] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

rcu_report_dead() is the last RCU word from the CPU down through the
hotplug path. It is called in the idle loop right before the CPU shuts
down for good. Because it removes the CPU from the grace period state
machine and reports an ultimate quiescent state if necessary, no further
use of RCU is allowed. Therefore it is expected that IRQs are disabled
upon calling this function and are not to be re-enabled again until the
CPU shuts down.

Remove the IRQ disabling from that function and instead verify that it
is called with IRQs disabled, as expected at that special point in the
idle path.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4dd7df30df31..8c2954502e55 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4562,11 +4562,16 @@ void rcu_cpu_starting(unsigned int cpu)
  */
 void rcu_report_dead(unsigned int cpu)
 {
-	unsigned long flags, seq_flags;
+	unsigned long flags;
 	unsigned long mask;
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
+	/*
+	 * IRQS must be disabled from now on and until the CPU dies, or an interrupt
+	 * may introduce a new READ-side while it is actually off the QS masks.
+	 */
+	lockdep_assert_irqs_disabled();
 	// Do any dangling deferred wakeups.
 	do_nocb_deferred_wakeup(rdp);
 
@@ -4574,7 +4579,6 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
-	local_irq_save(seq_flags);
 	arch_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
@@ -4588,8 +4592,6 @@ void rcu_report_dead(unsigned int cpu)
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	arch_spin_unlock(&rcu_state.ofl_lock);
-	local_irq_restore(seq_flags);
-
 	rdp->cpu_started = false;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 13/18] rcu: Assume rcu_report_dead() is always called locally
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (11 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 12/18] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 14/18] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

rcu_report_dead() has to be called locally by the CPU that is going to
exit the RCU state machine. Passing a cpu argument here is error-prone
and leaves the possibility for a racy remote call.

Use local access instead.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 arch/arm64/kernel/smp.c  | 2 +-
 include/linux/rcupdate.h | 2 +-
 kernel/cpu.c             | 2 +-
 kernel/rcu/tree.c        | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 960b98b43506..8fa646c90c67 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -401,7 +401,7 @@ void __noreturn cpu_die_early(void)
 
 	/* Mark this CPU absent */
 	set_cpu_present(cpu, 0);
-	rcu_report_dead(cpu);
+	rcu_report_dead();
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
 		update_cpu_boot_status(CPU_KILL_ME);
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5e5f920ade90..aa351ddcbe8d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -122,7 +122,7 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 void rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
-void rcu_report_dead(unsigned int cpu);
+void rcu_report_dead(void);
 void rcutree_migrate_callbacks(int cpu);
 
 #ifdef CONFIG_TASKS_RCU_GENERIC
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6de7c6bb74ee..076e75fed8bb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1388,7 +1388,7 @@ void cpuhp_report_idle_dead(void)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 
 	BUG_ON(st->state != CPUHP_AP_OFFLINE);
-	rcu_report_dead(smp_processor_id());
+	rcu_report_dead();
 	st->state = CPUHP_AP_IDLE_DEAD;
 	/*
 	 * We cannot call complete after rcu_report_dead() so we delegate it
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8c2954502e55..2e1e7eadf2cc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4560,11 +4560,11 @@ void rcu_cpu_starting(unsigned int cpu)
  * from the outgoing CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
  */
-void rcu_report_dead(unsigned int cpu)
+void rcu_report_dead(void)
 {
 	unsigned long flags;
 	unsigned long mask;
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
 
 	/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 14/18] rcu: Conditionally build CPU-hotplug teardown callbacks
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (12 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 13/18] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:58 ` [PATCH 15/18] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

Among the three CPU-hotplug teardown RCU callbacks, two exit early if
CONFIG_HOTPLUG_CPU=n and one has no such check. Either way, all three
are still built when CONFIG_HOTPLUG_CPU=n.

Align instead with the common way to deal with CPU-hotplug teardown
callbacks and provide a proper stub when they are not supported.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
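
The stub pattern can be sketched in plain C. The DEMO_HOTPLUG switch and
all demo_* names below are hypothetical stand-ins (for CONFIG_HOTPLUG_CPU
and for a table of hotplug steps). When the feature is off, the teardown
name expands to NULL, so the table entry needs no #ifdef and the caller
skips the missing callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_HOTPLUG_CPU; left undefined. */
/* #define DEMO_HOTPLUG 1 */

typedef int (*demo_step_fn)(unsigned int cpu);

/* Bring-up callback: always built. */
static int demo_startup(unsigned int cpu)
{
	(void)cpu;
	return 0;
}

#ifdef DEMO_HOTPLUG
/* Teardown callback: only built when the feature is enabled. */
static int demo_teardown(unsigned int cpu)
{
	(void)cpu;
	return 1;
}
#else
/* Stub: the name still exists, but as a NULL function pointer. */
#define demo_teardown NULL
#endif

struct demo_step {
	const char *name;
	demo_step_fn startup;
	demo_step_fn teardown;
};

/* The table refers to demo_teardown unconditionally. */
static const struct demo_step demo_steps[] = {
	{ "demo:prepare", demo_startup, demo_teardown },
};

/* A NULL teardown means there is nothing to do for this step. */
static int demo_run_teardown(const struct demo_step *step, unsigned int cpu)
{
	if (!step->teardown)
		return 0;
	return step->teardown(cpu);
}
```

With DEMO_HOTPLUG undefined, demo_steps[0].teardown is NULL and
demo_run_teardown() returns 0 without calling anything.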
 include/linux/rcutree.h |  11 +++-
 kernel/rcu/tree.c       | 114 +++++++++++++++++++---------------------
 2 files changed, 63 insertions(+), 62 deletions(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 153cfc7bbffd..46875c4e9f56 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -110,9 +110,16 @@ void rcu_all_qs(void);
 /* RCUtree hotplug events */
 int rcutree_prepare_cpu(unsigned int cpu);
 int rcutree_online_cpu(unsigned int cpu);
-int rcutree_offline_cpu(unsigned int cpu);
+void rcu_cpu_starting(unsigned int cpu);
+
+#ifdef CONFIG_HOTPLUG_CPU
 int rcutree_dead_cpu(unsigned int cpu);
 int rcutree_dying_cpu(unsigned int cpu);
-void rcu_cpu_starting(unsigned int cpu);
+int rcutree_offline_cpu(unsigned int cpu);
+#else
+#define rcutree_dead_cpu NULL
+#define rcutree_dying_cpu NULL
+#define rcutree_offline_cpu NULL
+#endif
 
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2e1e7eadf2cc..f9c6b2680cbb 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4237,25 +4237,6 @@ static bool rcu_init_invoked(void)
 	return !!rcu_state.n_online_cpus;
 }
 
-/*
- * Near the end of the offline process.  Trace the fact that this CPU
- * is going offline.
- */
-int rcutree_dying_cpu(unsigned int cpu)
-{
-	bool blkd;
-	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-	struct rcu_node *rnp = rdp->mynode;
-
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-		return 0;
-
-	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
-	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
-			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
-	return 0;
-}
-
 /*
  * All CPUs for the specified rcu_node structure have gone offline,
  * and all tasks that were preempted within an RCU read-side critical
@@ -4301,23 +4282,6 @@ static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
 	}
 }
 
-/*
- * The CPU has been completely removed, and some other CPU is reporting
- * this fact from process context.  Do the remainder of the cleanup.
- * There can only be one CPU hotplug operation at a time, so no need for
- * explicit locking.
- */
-int rcutree_dead_cpu(unsigned int cpu)
-{
-	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
-		return 0;
-
-	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
-	// Stop-machine done, so allow nohz_full to disable tick.
-	tick_dep_clear(TICK_DEP_BIT_RCU);
-	return 0;
-}
-
 /*
  * Propagate ->qsinitmask bits up the rcu_node tree to account for the
  * first CPU in a given leaf rcu_node structure coming online.  The caller
@@ -4470,29 +4434,6 @@ int rcutree_online_cpu(unsigned int cpu)
 	return 0;
 }
 
-/*
- * Near the beginning of the process.  The CPU is still very much alive
- * with pretty much all services enabled.
- */
-int rcutree_offline_cpu(unsigned int cpu)
-{
-	unsigned long flags;
-	struct rcu_data *rdp;
-	struct rcu_node *rnp;
-
-	rdp = per_cpu_ptr(&rcu_data, cpu);
-	rnp = rdp->mynode;
-	raw_spin_lock_irqsave_rcu_node(rnp, flags);
-	rnp->ffmask &= ~rdp->grpmask;
-	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-
-	rcutree_affinity_setting(cpu, cpu);
-
-	// nohz_full CPUs need the tick for stop-machine to work quickly
-	tick_dep_set(TICK_DEP_BIT_RCU);
-	return 0;
-}
-
 /*
  * Mark the specified CPU as being online so that subsequent grace periods
  * (both expedited and normal) will wait on it.  Note that this means that
@@ -4646,7 +4587,60 @@ void rcutree_migrate_callbacks(int cpu)
 		  cpu, rcu_segcblist_n_cbs(&rdp->cblist),
 		  rcu_segcblist_first_cb(&rdp->cblist));
 }
-#endif
+
+/*
+ * The CPU has been completely removed, and some other CPU is reporting
+ * this fact from process context.  Do the remainder of the cleanup.
+ * There can only be one CPU hotplug operation at a time, so no need for
+ * explicit locking.
+ */
+int rcutree_dead_cpu(unsigned int cpu)
+{
+	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
+	// Stop-machine done, so allow nohz_full to disable tick.
+	tick_dep_clear(TICK_DEP_BIT_RCU);
+	return 0;
+}
+
+/*
+ * Near the end of the offline process.  Trace the fact that this CPU
+ * is going offline.
+ */
+int rcutree_dying_cpu(unsigned int cpu)
+{
+	bool blkd;
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	struct rcu_node *rnp = rdp->mynode;
+
+	blkd = !!(READ_ONCE(rnp->qsmask) & rdp->grpmask);
+	trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
+			       blkd ? TPS("cpuofl-bgp") : TPS("cpuofl"));
+	return 0;
+}
+
+/*
+ * Near the beginning of the process.  The CPU is still very much alive
+ * with pretty much all services enabled.
+ */
+int rcutree_offline_cpu(unsigned int cpu)
+{
+	unsigned long flags;
+	struct rcu_data *rdp;
+	struct rcu_node *rnp;
+
+	rdp = per_cpu_ptr(&rcu_data, cpu);
+	rnp = rdp->mynode;
+	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	rnp->ffmask &= ~rdp->grpmask;
+	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+
+	rcutree_affinity_setting(cpu, cpu);
+
+	// nohz_full CPUs need the tick for stop-machine to work quickly
+	tick_dep_set(TICK_DEP_BIT_RCU);
+	return 0;
+}
+#endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
 /*
  * On non-huge systems, use expedited RCU grace periods to make suspend
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 15/18] rcu: Standardize explicit CPU-hotplug calls
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (13 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 14/18] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
@ 2023-10-13 11:58 ` Frederic Weisbecker
  2023-10-13 11:59 ` [PATCH 16/18] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:58 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

rcu_report_dead() and rcutree_migrate_callbacks() have their headers in
rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug
functions.

Also rcu_cpu_starting() and rcu_report_dead() have different naming
conventions while they mirror each other's effects.

Fix the headers and propose a naming that relates both functions and
aligns with the prefix of other rcutree CPU-hotplug functions.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 .../Expedited-Grace-Periods.rst                      |  2 +-
 .../RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg    |  4 ++--
 .../RCU/Design/Memory-Ordering/TreeRCU-gp.svg        |  4 ++--
 .../RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg   |  4 ++--
 .../RCU/Design/Requirements/Requirements.rst         |  4 ++--
 arch/arm64/kernel/smp.c                              |  4 ++--
 arch/powerpc/kernel/smp.c                            |  2 +-
 arch/s390/kernel/smp.c                               |  2 +-
 arch/x86/kernel/smpboot.c                            |  2 +-
 include/linux/interrupt.h                            |  2 +-
 include/linux/rcupdate.h                             |  2 --
 include/linux/rcutiny.h                              |  2 +-
 include/linux/rcutree.h                              |  7 ++++++-
 kernel/cpu.c                                         |  6 +++---
 kernel/rcu/tree.c                                    | 12 ++++++++----
 15 files changed, 33 insertions(+), 26 deletions(-)

diff --git a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
index 93d899d53258..414f8a2012d6 100644
--- a/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
+++ b/Documentation/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.rst
@@ -181,7 +181,7 @@ operations is carried out at several levels:
    of this wait (or series of waits, as the case may be) is to permit a
    concurrent CPU-hotplug operation to complete.
 #. In the case of RCU-sched, one of the last acts of an outgoing CPU is
-   to invoke ``rcu_report_dead()``, which reports a quiescent state for
+   to invoke ``rcutree_report_cpu_dead()``, which reports a quiescent state for
    that CPU. However, this is likely paranoia-induced redundancy.
 
 +-----------------------------------------------------------------------+
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
index 7ddc094d7f28..d82a77d03d8c 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp-fqs.svg
@@ -1135,7 +1135,7 @@
        font-weight="bold"
        font-size="192"
        id="text202-7-5-3-27-6-5"
-       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
     <text
        xml:space="preserve"
        x="3745.7725"
@@ -1256,7 +1256,7 @@
        font-style="normal"
        y="3679.27"
        x="-3804.9949"
-       xml:space="preserve">rcu_cpu_starting()</text>
+       xml:space="preserve">rcutree_report_cpu_starting()</text>
     <g
        style="fill:none;stroke-width:0.025in"
        id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
index 069f6f8371c2..6e690a3be161 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-gp.svg
@@ -3274,7 +3274,7 @@
          font-weight="bold"
          font-size="192"
          id="text202-7-5-3-27-6-5"
-         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+         style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
       <text
          xml:space="preserve"
          x="3745.7725"
@@ -3395,7 +3395,7 @@
          font-style="normal"
          y="3679.27"
          x="-3804.9949"
-         xml:space="preserve">rcu_cpu_starting()</text>
+         xml:space="preserve">rcutree_report_cpu_starting()</text>
       <g
          style="fill:none;stroke-width:0.025in"
          id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
index 2c9310ba29ba..4fa7506082bf 100644
--- a/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
+++ b/Documentation/RCU/Design/Memory-Ordering/TreeRCU-hotplug.svg
@@ -607,7 +607,7 @@
        font-weight="bold"
        font-size="192"
        id="text202-7-5-3-27-6"
-       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcu_report_dead()</text>
+       style="font-size:192px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;stroke-width:0.025in;font-family:Courier">rcutree_report_cpu_dead()</text>
     <text
        xml:space="preserve"
        x="3745.7725"
@@ -728,7 +728,7 @@
        font-style="normal"
        y="3679.27"
        x="-3804.9949"
-       xml:space="preserve">rcu_cpu_starting()</text>
+       xml:space="preserve">rcutree_report_cpu_starting()</text>
     <g
        style="fill:none;stroke-width:0.025in"
        id="g3107-7-5-0"
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index f3b605285a87..cccafdaa1f84 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -1955,12 +1955,12 @@ if offline CPUs block an RCU grace period for too long.
 
 An offline CPU's quiescent state will be reported either:
 
-1.  As the CPU goes offline using RCU's hotplug notifier (rcu_report_dead()).
+1.  As the CPU goes offline using RCU's hotplug notifier (rcutree_report_cpu_dead()).
 2.  When grace period initialization (rcu_gp_init()) detects a
     race either with CPU offlining or with a task unblocking on a leaf
     ``rcu_node`` structure whose CPUs are all offline.
 
-The CPU-online path (rcu_cpu_starting()) should never need to report
+The CPU-online path (rcutree_report_cpu_starting()) should never need to report
 a quiescent state for an offline CPU.  However, as a debugging measure,
 it does emit a warning if a quiescent state was not already reported
 for that CPU.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 8fa646c90c67..196533c362e1 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -215,7 +215,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	if (system_uses_irq_prio_masking())
 		init_gic_priority_masking();
 
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	trace_hardirqs_off();
 
 	/*
@@ -401,7 +401,7 @@ void __noreturn cpu_die_early(void)
 
 	/* Mark this CPU absent */
 	set_cpu_present(cpu, 0);
-	rcu_report_dead();
+	rcutree_report_cpu_dead();
 
 	if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
 		update_cpu_boot_status(CPU_KILL_ME);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5826f5108a12..a30d4d93ff0b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1629,7 +1629,7 @@ void start_secondary(void *unused)
 
 	smp_store_cpu_info(cpu);
 	set_dec(tb_ticks_per_jiffy);
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	cpu_callin_map[cpu] = 1;
 
 	if (smp_ops->setup_cpu)
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index a4edb7ea66ea..214a1b67f80a 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -898,7 +898,7 @@ static void smp_start_secondary(void *cpuvoid)
 	S390_lowcore.restart_flags = 0;
 	restore_access_regs(S390_lowcore.access_regs_save_area);
 	cpu_init();
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	init_cpu_timer();
 	vtime_init();
 	vdso_getcpu_init();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4e45ff44aa07..4ccb76f89af8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -288,7 +288,7 @@ static void notrace start_secondary(void *unused)
 
 	cpu_init();
 	fpu__init_cpu();
-	rcu_cpu_starting(raw_smp_processor_id());
+	rcutree_report_cpu_starting(raw_smp_processor_id());
 	x86_cpuinit.early_percpu_clock_init();
 
 	ap_starting();
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce40b04b..d05e1e9a553c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -566,7 +566,7 @@ enum
  *
  * _ RCU:
  * 	1) rcutree_migrate_callbacks() migrates the queue.
- * 	2) rcu_report_dead() reports the final quiescent states.
+ * 	2) rcutree_report_cpu_dead() reports the final quiescent states.
  *
  * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue
  */
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index aa351ddcbe8d..f7206b2623c9 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -122,8 +122,6 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 void rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
-void rcu_report_dead(void);
-void rcutree_migrate_callbacks(int cpu);
 
 #ifdef CONFIG_TASKS_RCU_GENERIC
 void rcu_init_tasks_generic(void);
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index 7b949292908a..d9ac7b136aea 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -171,6 +171,6 @@ static inline void rcu_all_qs(void) { barrier(); }
 #define rcutree_offline_cpu      NULL
 #define rcutree_dead_cpu         NULL
 #define rcutree_dying_cpu        NULL
-static inline void rcu_cpu_starting(unsigned int cpu) { }
+static inline void rcutree_report_cpu_starting(unsigned int cpu) { }
 
 #endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 46875c4e9f56..254244202ea9 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -110,7 +110,7 @@ void rcu_all_qs(void);
 /* RCUtree hotplug events */
 int rcutree_prepare_cpu(unsigned int cpu);
 int rcutree_online_cpu(unsigned int cpu);
-void rcu_cpu_starting(unsigned int cpu);
+void rcutree_report_cpu_starting(unsigned int cpu);
 
 #ifdef CONFIG_HOTPLUG_CPU
 int rcutree_dead_cpu(unsigned int cpu);
@@ -122,4 +122,9 @@ int rcutree_offline_cpu(unsigned int cpu);
 #define rcutree_offline_cpu NULL
 #endif
 
+void rcutree_migrate_callbacks(int cpu);
+
+/* Called from hotplug and also arm64 early secondary boot failure */
+void rcutree_report_cpu_dead(void);
+
 #endif /* __LINUX_RCUTREE_H */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 076e75fed8bb..2491766e1fd5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1388,10 +1388,10 @@ void cpuhp_report_idle_dead(void)
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
 
 	BUG_ON(st->state != CPUHP_AP_OFFLINE);
-	rcu_report_dead();
+	rcutree_report_cpu_dead();
 	st->state = CPUHP_AP_IDLE_DEAD;
 	/*
-	 * We cannot call complete after rcu_report_dead() so we delegate it
+	 * We cannot call complete after rcutree_report_cpu_dead() so we delegate it
 	 * to an online cpu.
 	 */
 	smp_call_function_single(cpumask_first(cpu_online_mask),
@@ -1617,7 +1617,7 @@ void notify_cpu_starting(unsigned int cpu)
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
 
-	rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
+	rcutree_report_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
 
 	/*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f9c6b2680cbb..36d8818eaec1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4216,7 +4216,7 @@ bool rcu_lockdep_current_cpu_online(void)
 	rdp = this_cpu_ptr(&rcu_data);
 	/*
 	 * Strictly, we care here about the case where the current CPU is
-	 * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
+	 * in rcutree_report_cpu_starting() and thus has an excuse for rdp->grpmask
 	 * not being up to date. So arch_spin_is_locked() might have a
 	 * false positive if it's held by some *other* CPU, but that's
 	 * OK because that just means a false *negative* on the warning.
@@ -4445,8 +4445,10 @@ int rcutree_online_cpu(unsigned int cpu)
  * from the incoming CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
  * This incoming CPU must not have enabled interrupts yet.
+ *
+ * This mirrors the effects of rcutree_report_cpu_dead().
  */
-void rcu_cpu_starting(unsigned int cpu)
+void rcutree_report_cpu_starting(unsigned int cpu)
 {
 	unsigned long mask;
 	struct rcu_data *rdp;
@@ -4500,8 +4502,10 @@ void rcu_cpu_starting(unsigned int cpu)
  * Note that this function is special in that it is invoked directly
  * from the outgoing CPU rather than from the cpuhp_step mechanism.
  * This is because this function must be invoked at a precise location.
+ *
+ * This mirrors the effect of rcutree_report_cpu_starting().
  */
-void rcu_report_dead(void)
+void rcutree_report_cpu_dead(void)
 {
 	unsigned long flags;
 	unsigned long mask;
@@ -5072,7 +5076,7 @@ void __init rcu_init(void)
 	pm_notifier(rcu_pm_notify, 0);
 	WARN_ON(num_online_cpus() > 1); // Only one CPU this early in boot.
 	rcutree_prepare_cpu(cpu);
-	rcu_cpu_starting(cpu);
+	rcutree_report_cpu_starting(cpu);
 	rcutree_online_cpu(cpu);
 
 	/* Create workqueue for Tree SRCU and for expedited GPs. */
-- 
2.34.1



* [PATCH 16/18] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (14 preceding siblings ...)
  2023-10-13 11:58 ` [PATCH 15/18] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
@ 2023-10-13 11:59 ` Frederic Weisbecker
  2023-10-13 11:59 ` [PATCH 17/18] srcu: Fix callbacks acceleration mishandling Frederic Weisbecker
  2023-10-13 11:59 ` [PATCH 18/18] srcu: Only accelerate on enqueue time Frederic Weisbecker
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:59 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu

The callbacks migration is performed through an explicit call from
the hotplug control CPU right after the death of the target CPU and
before proceeding with the CPUHP_ teardown functions.

This is unusual but necessary and yet uncommented. Summarize the reason
as explained in the changelog of:

	a58163d8ca2c ("rcu: Migrate callbacks earlier in the CPU-offline timeline")

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/cpu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 2491766e1fd5..3b9d5c7eb4a2 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1372,7 +1372,14 @@ static int takedown_cpu(unsigned int cpu)
 	cpuhp_bp_sync_dead(cpu);
 
 	tick_cleanup_dead_cpu(cpu);
+
+	/*
+	 * Callbacks must be re-integrated right away to the RCU state machine.
+	 * Otherwise an RCU callback could block a further teardown function
+	 * waiting for its completion.
+	 */
 	rcutree_migrate_callbacks(cpu);
+
 	return 0;
 }
 
-- 
2.34.1



* [PATCH 17/18] srcu: Fix callbacks acceleration mishandling
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (15 preceding siblings ...)
  2023-10-13 11:59 ` [PATCH 16/18] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
@ 2023-10-13 11:59 ` Frederic Weisbecker
  2023-10-13 11:59 ` [PATCH 18/18] srcu: Only accelerate on enqueue time Frederic Weisbecker
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:59 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Yong He, Neeraj upadhyay

SRCU callback acceleration might fail if the preceding callback
advance also fails. This can happen through the following steps:

1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the
   RCU_NEXT_READY_TAIL also has callbacks (say for gp_num 12).

2) The grace period for RCU_WAIT_TAIL is observed as started but not yet
   completed so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5.

3) This value is passed to rcu_segcblist_advance() which can't move
   any segment forward and fails.

4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
   But then the call to rcu_seq_snap() observes the grace period for the
   RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one
   for the RCU_NEXT_READY_TAIL segment as started
   (ie: 8 + SRCU_STATE_SCAN1 = 9) so it returns a snapshot of the
   next grace period, which is 16.

5) The value of 16 is passed to rcu_segcblist_accelerate() but the
   freshly enqueued callback in RCU_NEXT_TAIL can't move to
   RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
   period (gp_num = 12). So acceleration fails.

6) Note that in all these steps, srcu_invoke_callbacks() hasn't had a
   chance to run.

A very bad outcome may then follow if:

7) Some other CPU races and starts the grace period number 16 before the
   CPU handling previous steps had a chance. Therefore srcu_gp_start()
   isn't called on the latter sdp to fix the acceleration leak from
   previous steps with a new pair of calls to advance/accelerate.

8) The grace period 16 completes and srcu_invoke_callbacks() is finally
   called. All the callbacks from previous grace periods (8 and 12) are
   correctly advanced and executed but callbacks in RCU_NEXT_READY_TAIL
   still remain. Then rcu_segcblist_accelerate() is called with a
   snapshot of 20.

9) Since nothing started the grace period number 20, callbacks stay
   unhandled.
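
The sequence values quoted in the steps above can be reproduced with a
small userspace model. This is only a sketch: it assumes the usual
layout where the low two bits of the sequence hold the grace-period
state and the remaining bits hold the counter, it leaves out the memory
barriers and READ_ONCE() of the real helpers, and the model_* and MODEL_*
names are hypothetical, not kernel API.

```c
#include <stdbool.h>

/* Assumed layout: low 2 bits hold the grace-period state. */
#define MODEL_SEQ_STATE_MASK	3UL
#define MODEL_SRCU_STATE_SCAN1	1UL

/*
 * Model of rcu_seq_snap(): smallest sequence value at which a full
 * grace period that begins after "now" is known to have completed.
 * If a grace period is already in progress, it does not count, so
 * the snapshot lands one grace period further out.
 */
static unsigned long model_seq_snap(unsigned long gp_seq)
{
	return (gp_seq + 2 * MODEL_SEQ_STATE_MASK + 1) & ~MODEL_SEQ_STATE_MASK;
}

/*
 * Model of the "is this snapshot satisfied yet" check. The kernel uses
 * a wrap-safe comparison; a plain comparison is enough for this sketch.
 */
static bool model_seq_done(unsigned long gp_seq, unsigned long snap)
{
	return gp_seq >= snap;
}
```

With these assumptions, a sequence of 4 + SCAN1 = 5 yields a snapshot of
12, a sequence of 8 + SCAN1 = 9 yields 16 (step 4), and an idle sequence
of 16 yields 20 (step 8), which nothing has started in step 9.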

This has been reported under real load:

	[3144162.608392] INFO: task kworker/136:12:252684 blocked for more
	than 122 seconds.
	[3144162.615986]       Tainted: G           O  K   5.4.203-1-tlinux4-0011.1 #1
	[3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
	disables this message.
	[3144162.631162] kworker/136:12  D    0 252684      2 0x90004000
	[3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
	[3144162.631192] Call Trace:
	[3144162.631202]  __schedule+0x2ee/0x660
	[3144162.631206]  schedule+0x33/0xa0
	[3144162.631209]  schedule_timeout+0x1c4/0x340
	[3144162.631214]  ? update_load_avg+0x82/0x660
	[3144162.631217]  ? raw_spin_rq_lock_nested+0x1f/0x30
	[3144162.631218]  wait_for_completion+0x119/0x180
	[3144162.631220]  ? wake_up_q+0x80/0x80
	[3144162.631224]  __synchronize_srcu.part.19+0x81/0xb0
	[3144162.631226]  ? __bpf_trace_rcu_utilization+0x10/0x10
	[3144162.631227]  synchronize_srcu+0x5f/0xc0
	[3144162.631236]  irqfd_shutdown+0x3c/0xb0 [kvm]
	[3144162.631239]  ? __schedule+0x2f6/0x660
	[3144162.631243]  process_one_work+0x19a/0x3a0
	[3144162.631244]  worker_thread+0x37/0x3a0
	[3144162.631247]  kthread+0x117/0x140
	[3144162.631247]  ? process_one_work+0x3a0/0x3a0
	[3144162.631248]  ? __kthread_cancel_work+0x40/0x40
	[3144162.631250]  ret_from_fork+0x1f/0x30

Fix this by taking the snapshot for acceleration _before_ the read
of the current grace period number.

The only side effect of this solution is that callback advancing then
happens _after_ the full barrier in rcu_seq_snap(). This is not a problem
because that barrier only cares about:

1) Ordering accesses of the update side before call_srcu() so they don't
   bleed.
2) See all the accesses prior to the grace period of the current gp_num

The only things callback advancing needs to be ordered against are
carried by snp locking.
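
The effect of the reordering can be illustrated with a counts-only
userspace model of the segmented callback list. This is a simplified
sketch, not the kernel implementation: segments track only a length and
a grace-period number, comparisons are not wrap-safe, locking is
omitted, and every model_* name is hypothetical. The advance and
accelerate helpers follow the behaviour described in the steps above.

```c
#include <stdbool.h>

/* Hypothetical, counts-only model of a segmented callback list. */
enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, SEG_COUNT };

struct model_cblist {
	unsigned long len[SEG_COUNT];    /* callbacks queued per segment */
	unsigned long gp_seq[SEG_COUNT]; /* grace period each segment waits for */
};

/* Assumed sequence layout: the low 2 bits carry the grace-period state. */
static unsigned long model_seq_snap(unsigned long gp_seq)
{
	return (gp_seq + 7UL) & ~3UL;
}

/* True if every segment after "seg" is empty. */
static bool model_rest_empty(const struct model_cblist *l, int seg)
{
	for (int i = seg + 1; i < SEG_COUNT; i++)
		if (l->len[i])
			return false;
	return true;
}

/* Move segments whose grace period completed to SEG_DONE, then compact. */
static void model_advance(struct model_cblist *l, unsigned long seq)
{
	int i, j;

	for (i = SEG_WAIT; i < SEG_NEXT; i++) {
		if (seq < l->gp_seq[i]) /* the kernel compare is wrap-safe */
			break;
		l->len[SEG_DONE] += l->len[i];
		l->len[i] = 0;
	}
	if (i == SEG_WAIT)
		return; /* nothing completed: the advance "fails" */
	for (j = SEG_WAIT; i < SEG_NEXT; i++, j++) {
		l->len[j] = l->len[i];
		l->gp_seq[j] = l->gp_seq[i];
		l->len[i] = 0;
	}
}

/* Assign grace period "seq" to callbacks that lack an earlier one. */
static bool model_accelerate(struct model_cblist *l, unsigned long seq)
{
	int i, j;

	for (i = SEG_NEXT_READY; i > SEG_DONE; i--)
		if (l->len[i] && l->gp_seq[i] < seq)
			break;
	if (model_rest_empty(l, i) || ++i >= SEG_NEXT)
		return false;
	for (j = i + 1; j < SEG_COUNT; j++) {
		l->len[i] += l->len[j];
		l->len[j] = 0;
	}
	for (; i < SEG_NEXT; i++)
		l->gp_seq[i] = seq;
	return true;
}

/* Pre-fix ordering: advance on the first read, snapshot on the second. */
static bool model_enqueue_old(struct model_cblist *l,
			      unsigned long first_read,
			      unsigned long second_read)
{
	l->len[SEG_NEXT]++;
	model_advance(l, first_read);
	return model_accelerate(l, model_seq_snap(second_read));
}

/* Post-fix ordering: snapshot on the first read, advance on the second. */
static bool model_enqueue_new(struct model_cblist *l,
			      unsigned long first_read,
			      unsigned long second_read)
{
	unsigned long s = model_seq_snap(first_read);

	l->len[SEG_NEXT]++;
	model_advance(l, second_read);
	return model_accelerate(l, s);
}
```

Starting from one callback in SEG_WAIT (gp 8) and one in SEG_NEXT_READY
(gp 12), with the sequence read as 5 and then as 9, the old ordering
leaves the new callback stranded in SEG_NEXT, while the new ordering
accelerates it. The model only shows the ordering argument; it says
nothing about the barrier semantics discussed above.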

Reported-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Yong He <alexyonghe@tencent.com>
Signed-off-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com
Fixes: da915ad5cf25 ("srcu: Parallelize callback handling")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/srcutree.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 5602042856b1..9fab9ac36996 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1244,10 +1244,37 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
 	spin_lock_irqsave_sdp_contention(sdp, &flags);
 	if (rhp)
 		rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
-	rcu_segcblist_advance(&sdp->srcu_cblist,
-			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+	/*
+	 * The snapshot for acceleration must be taken _before_ the read of the
+	 * current gp sequence used for advancing, otherwise advancing may fail
+	 * and acceleration may then fail too.
+	 *
+	 * This could happen if:
+	 *
+	 *  1) The RCU_WAIT_TAIL segment has callbacks (gp_num = X + 4) and the
+	 *     RCU_NEXT_READY_TAIL also has callbacks (gp_num = X + 8).
+	 *
+	 *  2) The grace period for RCU_WAIT_TAIL is seen as started but not
+	 *     completed so rcu_seq_current() returns X + SRCU_STATE_SCAN1.
+	 *
+	 *  3) This value is passed to rcu_segcblist_advance() which can't move
+	 *     any segment forward and fails.
+	 *
+	 *  4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
+	 *     But then the call to rcu_seq_snap() observes the grace period for the
+	 *     RCU_WAIT_TAIL segment as completed and the subsequent one for the
+	 *     RCU_NEXT_READY_TAIL segment as started (ie: X + 4 + SRCU_STATE_SCAN1)
+	 *     so it returns a snapshot of the next grace period, which is X + 12.
+	 *
+	 *  5) The value of X + 12 is passed to rcu_segcblist_accelerate() but the
+	 *     freshly enqueued callback in RCU_NEXT_TAIL can't move to
+	 *     RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
+	 *     period (gp_num = X + 8). So acceleration fails.
+	 */
 	s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
-	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
+	rcu_segcblist_advance(&sdp->srcu_cblist,
+			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+	WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s) && rhp);
 	if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
 		sdp->srcu_gp_seq_needed = s;
 		needgp = true;
-- 
2.34.1


* [PATCH 18/18] srcu: Only accelerate on enqueue time
  2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
                   ` (16 preceding siblings ...)
  2023-10-13 11:59 ` [PATCH 17/18] srcu: Fix callbacks acceleration mishandling Frederic Weisbecker
@ 2023-10-13 11:59 ` Frederic Weisbecker
  17 siblings, 0 replies; 21+ messages in thread
From: Frederic Weisbecker @ 2023-10-13 11:59 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Boqun Feng, Joel Fernandes, Josh Triplett,
	Mathieu Desnoyers, Neeraj Upadhyay, Paul E . McKenney,
	Steven Rostedt, Uladzislau Rezki, rcu, Yong He, Neeraj upadhyay,
	Like Xu

Acceleration in SRCU happens at enqueue time for each new callback. This
operation is expected not to fail and therefore any similar attempt
from other places shouldn't find any remaining callbacks to accelerate.

Moreover, accelerations performed beyond enqueue time are error-prone
because rcu_seq_snap() may then return the snapshot for a new grace
period that is not going to be started.

Remove these dangerous and needless accelerations and instead introduce
assertions that report unaccelerated callbacks leaking beyond enqueue
time.

Co-developed-by: Yong He <alexyonghe@tencent.com>
Signed-off-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Like Xu <likexu@tencent.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/srcutree.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 9fab9ac36996..560e99ec5333 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -784,8 +784,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
 	spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
 	rcu_segcblist_advance(&sdp->srcu_cblist,
 			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
-	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
-				       rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq));
+	WARN_ON_ONCE(!rcu_segcblist_segempty(&sdp->srcu_cblist, RCU_NEXT_TAIL));
 	spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
 	WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies);
 	WRITE_ONCE(ssp->srcu_sup->srcu_n_exp_nodelay, 0);
@@ -1721,6 +1720,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 	ssp = sdp->ssp;
 	rcu_cblist_init(&ready_cbs);
 	spin_lock_irq_rcu_node(sdp);
+	WARN_ON_ONCE(!rcu_segcblist_segempty(&sdp->srcu_cblist, RCU_NEXT_TAIL));
 	rcu_segcblist_advance(&sdp->srcu_cblist,
 			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
 	if (sdp->srcu_cblist_invoking ||
@@ -1750,8 +1750,6 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 	 */
 	spin_lock_irq_rcu_node(sdp);
 	rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
-	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
-				       rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq));
 	sdp->srcu_cblist_invoking = false;
 	more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
 	spin_unlock_irq_rcu_node(sdp);
-- 
2.34.1



Thread overview: 21+ messages
2023-10-13 11:58 [PATCH 00/18] RCU fixes for v6.7 Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 01/18] Revert "checkpatch: Error out if deprecated RCU API used" Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 02/18] srcu: Fix error handling in init_srcu_struct_fields() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 03/18] rcu/tree: Remove superfluous return from void call_rcu* functions Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 04/18] rcu: Add sysfs to provide throttled access to rcu_barrier() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 05/18] rcu: Remove unused function declaration rcu_eqs_special_set() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 06/18] mm: Remove kmem_valid_obj() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 07/18] rcu: Dump memory object info if callback function is invalid Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 08/18] rcu: Eliminate rcu_gp_slow_unregister() false positive Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 09/18] srcu: Fix srcu_struct node grpmask overflow on 64-bit systems Frederic Weisbecker
2023-10-13 12:54   ` David Laight
2023-10-13 14:11     ` Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 10/18] rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 11/18] rcu: Use rcu_segcblist_segempty() instead of open coding it Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 12/18] rcu: Assume IRQS disabled from rcu_report_dead() Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 13/18] rcu: Assume rcu_report_dead() is always called locally Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 14/18] rcu: Conditionally build CPU-hotplug teardown callbacks Frederic Weisbecker
2023-10-13 11:58 ` [PATCH 15/18] rcu: Standardize explicit CPU-hotplug calls Frederic Weisbecker
2023-10-13 11:59 ` [PATCH 16/18] rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP Frederic Weisbecker
2023-10-13 11:59 ` [PATCH 17/18] srcu: Fix callbacks acceleration mishandling Frederic Weisbecker
2023-10-13 11:59 ` [PATCH 18/18] srcu: Only accelerate on enqueue time Frederic Weisbecker
