public inbox for linux-kernel@vger.kernel.org
* [PATCH rcu 01/16] rcu: Simplify rcu_init_nohz() cpumask handling
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 02/16] rcu: Fix late wakeup when flush of bypass cblist happens Paul E. McKenney
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Zhen Lei, Joel Fernandes,
	Frederic Weisbecker, Paul E . McKenney

From: Zhen Lei <thunder.leizhen@huawei.com>

In kernels built with either CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y or
CONFIG_NO_HZ_FULL=y, additional CPUs must be added to rcu_nocb_mask.
Except that kernels booted without the rcu_nocbs= parameter will not have
allocated rcu_nocb_mask.  And the current rcu_init_nohz() function uses
its need_rcu_nocb_mask and offload_all local variables to track the
rcu_nocb and nohz_full state.

But there is a much simpler approach, namely creating a cpumask pointer
to track the default and then using cpumask_available() to check the
rcu_nocb_mask state.  This commit takes this approach, thereby simplifying
and shortening the rcu_init_nohz() function.
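
As a rough illustration of the selection order described above, here is a
user-space sketch.  It is not kernel code: toy_cpumask, toy_pick_default()
and toy_apply_default() are hypothetical names, a 64-bit word stands in for
a real cpumask, and the clamp to possible CPUs is applied unconditionally
for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU. */
typedef uint64_t toy_cpumask;

/*
 * Pick the default mask to OR into the no-CBs mask.  A non-empty
 * nohz_full mask wins; otherwise "default all" applies, but only if
 * rcu_nocbs= has not already set things up.  NULL means "no default".
 */
static const toy_cpumask *toy_pick_default(bool nohz_full_running,
					   const toy_cpumask *nohz_full_mask,
					   bool default_all, bool nocb_is_setup,
					   const toy_cpumask *possible_mask)
{
	const toy_cpumask *cpumask = NULL;

	assert(possible_mask != NULL);
	if (nohz_full_running && nohz_full_mask && *nohz_full_mask != 0)
		cpumask = nohz_full_mask;
	if (default_all && !nocb_is_setup && !cpumask)
		cpumask = possible_mask;
	return cpumask;
}

/* OR in the chosen default (if any), then drop nonexistent CPUs. */
static toy_cpumask toy_apply_default(toy_cpumask nocb_mask,
				     const toy_cpumask *def,
				     toy_cpumask possible)
{
	if (def)
		nocb_mask |= *def;
	return nocb_mask & possible;
}
```

The single pointer plays the role of both of the old flags: non-NULL means
a mask is needed, and the pointee says which CPUs to add.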

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_nocb.h | 34 +++++++++++-----------------------
 1 file changed, 11 insertions(+), 23 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 0a5f0ef414845..ce526cc2791ca 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1210,45 +1210,33 @@ EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
 void __init rcu_init_nohz(void)
 {
 	int cpu;
-	bool need_rcu_nocb_mask = false;
-	bool offload_all = false;
 	struct rcu_data *rdp;
-
-#if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL)
-	if (!rcu_state.nocb_is_setup) {
-		need_rcu_nocb_mask = true;
-		offload_all = true;
-	}
-#endif /* #if defined(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) */
+	const struct cpumask *cpumask = NULL;
 
 #if defined(CONFIG_NO_HZ_FULL)
-	if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask)) {
-		need_rcu_nocb_mask = true;
-		offload_all = false; /* NO_HZ_FULL has its own mask. */
-	}
-#endif /* #if defined(CONFIG_NO_HZ_FULL) */
+	if (tick_nohz_full_running && !cpumask_empty(tick_nohz_full_mask))
+		cpumask = tick_nohz_full_mask;
+#endif
+
+	if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_DEFAULT_ALL) &&
+	    !rcu_state.nocb_is_setup && !cpumask)
+		cpumask = cpu_possible_mask;
 
-	if (need_rcu_nocb_mask) {
+	if (cpumask) {
 		if (!cpumask_available(rcu_nocb_mask)) {
 			if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
 				pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
 				return;
 			}
 		}
+
+		cpumask_or(rcu_nocb_mask, rcu_nocb_mask, cpumask);
 		rcu_state.nocb_is_setup = true;
 	}
 
 	if (!rcu_state.nocb_is_setup)
 		return;
 
-#if defined(CONFIG_NO_HZ_FULL)
-	if (tick_nohz_full_running)
-		cpumask_or(rcu_nocb_mask, rcu_nocb_mask, tick_nohz_full_mask);
-#endif /* #if defined(CONFIG_NO_HZ_FULL) */
-
-	if (offload_all)
-		cpumask_setall(rcu_nocb_mask);
-
 	if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
 		pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
 		cpumask_and(rcu_nocb_mask, cpu_possible_mask,
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 02/16] rcu: Fix late wakeup when flush of bypass cblist happens
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 01/16] rcu: Simplify rcu_init_nohz() cpumask handling Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 03/16] rcu: Fix missing nocb gp wake on rcu_barrier() Paul E. McKenney
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Frederic Weisbecker, Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

When the bypass cblist gets too big or its timeout has occurred, it is
flushed into the main cblist. However, the bypass timer is still running
and the behavior is that it would eventually expire and wake the GP
thread.

Since we are going to use the bypass cblist for lazy CBs, do the wakeup
as soon as the flush for a "too big or too long" bypass list happens.
Otherwise, long delays can happen for callbacks which get promoted from
lazy to non-lazy.

This is a good thing to do anyway (regardless of future lazy patches),
since it makes the behavior consistent with that of other code paths
where flushing into the ->cblist quickly puts the GP kthread into a
non-sleeping state.
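
The intended behavior can be sketched with a toy user-space model.  This
is an assumption-laden simplification, not the kernel's data structures:
struct toy_rdp and toy_flush_bypass_and_wake() are hypothetical, and the
two lists are tracked as separate counters here, whereas the kernel's
->cblist length already accounts for bypass contents.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_rdp {
	long cblist_len;	/* callbacks on the main list */
	long bypass_len;	/* callbacks parked on the bypass list */
	int gp_wakeups;		/* immediate wakeups of the GP thread */
};

/*
 * Flush the bypass list into the main list.  If the main list was
 * empty beforehand and now has work, wake the GP thread right away
 * instead of leaving it to a timer.  Returns true if a wakeup was done.
 */
static bool toy_flush_bypass_and_wake(struct toy_rdp *rdp)
{
	bool was_alldone;

	assert(rdp->cblist_len >= 0 && rdp->bypass_len >= 0);
	/* Sample the main list before the flush, as the patch does. */
	was_alldone = (rdp->cblist_len == 0);

	rdp->cblist_len += rdp->bypass_len;
	rdp->bypass_len = 0;

	if (was_alldone && rdp->cblist_len > 0) {
		rdp->gp_wakeups++;
		return true;
	}
	return false;
}
```

Sampling the "was empty" state before the flush matters: sampled after
it, the main list would never look empty, so no wakeup would be issued.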

[ Frederic Weisbecker: Changes to avoid unnecessary GP-thread wakeups plus
		    comment changes. ]

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_nocb.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index ce526cc2791ca..f77a6d7e13564 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -433,8 +433,9 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
 	    ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
+		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
+
 		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
-			*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 			if (*was_alldone)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 						    TPS("FirstQ"));
@@ -447,7 +448,12 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 			rcu_advance_cbs_nowake(rdp->mynode, rdp);
 			rdp->nocb_gp_adv_time = j;
 		}
-		rcu_nocb_unlock_irqrestore(rdp, flags);
+
+		// The flush succeeded and we moved CBs into the regular list.
+		// Don't wait for the wake up timer as it may be too far ahead.
+		// Wake up the GP thread now instead, if the cblist was empty.
+		__call_rcu_nocb_wake(rdp, *was_alldone, flags);
+
 		return true; // Callback already enqueued.
 	}
 
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 03/16] rcu: Fix missing nocb gp wake on rcu_barrier()
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 01/16] rcu: Simplify rcu_init_nohz() cpumask handling Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 02/16] rcu: Fix late wakeup when flush of bypass cblist happens Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 04/16] rcu: Make call_rcu() lazy to save power Paul E. McKenney
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Frederic Weisbecker,
	Joel Fernandes, Paul E . McKenney

From: Frederic Weisbecker <frederic@kernel.org>

In preparation for RCU lazy changes, wake up the RCU nocb gp thread if
needed after an entrain.  This change prevents the RCU barrier callback
from waiting in the queue for several seconds before the lazy callbacks
in front of it are serviced.

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c      | 11 +++++++++++
 kernel/rcu/tree.h      |  1 +
 kernel/rcu/tree_nocb.h |  5 +++++
 3 files changed, 17 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6bb8e72bc8151..fb7a1b95af71e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3894,6 +3894,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 {
 	unsigned long gseq = READ_ONCE(rcu_state.barrier_sequence);
 	unsigned long lseq = READ_ONCE(rdp->barrier_seq_snap);
+	bool wake_nocb = false;
+	bool was_alldone = false;
 
 	lockdep_assert_held(&rcu_state.barrier_lock);
 	if (rcu_seq_state(lseq) || !rcu_seq_state(gseq) || rcu_seq_ctr(lseq) != rcu_seq_ctr(gseq))
@@ -3902,7 +3904,14 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	rdp->barrier_head.func = rcu_barrier_callback;
 	debug_rcu_head_queue(&rdp->barrier_head);
 	rcu_nocb_lock(rdp);
+	/*
+	 * Flush bypass and wakeup rcuog if we add callbacks to an empty regular
+	 * queue. This way we don't wait for bypass timer that can reach seconds
+	 * if it's fully lazy.
+	 */
+	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
 	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
 	} else {
@@ -3910,6 +3919,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 		rcu_barrier_trace(TPS("IRQNQ"), -1, rcu_state.barrier_sequence);
 	}
 	rcu_nocb_unlock(rdp);
+	if (wake_nocb)
+		wake_nocb_gp(rdp, false);
 	smp_store_release(&rdp->barrier_seq_snap, gseq);
 }
 
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index d4a97e40ea9c3..925dd98f8b23b 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,6 +439,7 @@ static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f77a6d7e13564..094fd454b6c38 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1558,6 +1558,11 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
 
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
+{
+	return false;
+}
+
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j)
 {
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 04/16] rcu: Make call_rcu() lazy to save power
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 03/16] rcu: Fix missing nocb gp wake on rcu_barrier() Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 05/16] rcu: Refactor code a bit in rcu_nocb_do_flush_bypass() Paul E. McKenney
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul McKenney, Frederic Weisbecker

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Implement timer-based RCU callback batching (also known as lazy
callbacks). With this we save about 5-10% of the power consumed due
to RCU requests that happen when the system is lightly loaded or idle.

By default, all async callbacks (queued via call_rcu) are marked
lazy. An alternate API call_rcu_hurry() is provided for the few users,
for example synchronize_rcu(), that need the old behavior.

The batch is flushed whenever a certain amount of time has passed, or
the batch on a particular CPU grows too big. A future patch will also
flush it under memory pressure.
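
The flush decision can be modeled in user space roughly as follows.  This
is a sketch under stated assumptions: toy_time_after() and
toy_should_flush() are hypothetical helpers, and the parameters simply
stand in for the bypass-list length, the lazy-callback count, the current
and first-enqueue jiffies, the flush timeout, and the size limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b", in the style of jiffies comparisons. */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Decide whether the bypass list should be flushed to the main list.
 * A list holding only lazy callbacks waits for the (long) flush
 * timeout; a list holding any non-lazy callback is flushed once the
 * jiffy has changed; any list is flushed once it grows too big.
 */
static bool toy_should_flush(long ncbs, long lazy_len,
			     unsigned long j, unsigned long first,
			     unsigned long till_flush, long qhimark)
{
	bool bypass_is_lazy;

	assert(lazy_len >= 0 && lazy_len <= ncbs);
	bypass_is_lazy = (ncbs == lazy_len);

	if (ncbs && !bypass_is_lazy && j != first)
		return true;	/* non-lazy work has waited a jiffy */
	if (ncbs && bypass_is_lazy &&
	    toy_time_after(j, first + till_flush))
		return true;	/* lazy batch has waited long enough */
	return ncbs >= qhimark;	/* batch grew too big */
}
```

Keeping a separate lazy count is what lets the model tell an all-lazy
batch, which may wait, from a mixed one, which may not.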

To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we re-use bypass lists which were originally introduced to
address lock contention, to handle lazy CBs as well. The bypass list
length has the lazy CB length included in it. A separate lazy CB length
counter is also introduced to keep track of the number of lazy CBs.

[ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
[ paulmck: Apply Zqiang feedback. ]
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Suggested-by: Paul McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/rcupdate.h |   9 +++
 kernel/rcu/Kconfig       |   8 ++
 kernel/rcu/rcu.h         |   8 ++
 kernel/rcu/tiny.c        |   2 +-
 kernel/rcu/tree.c        | 129 ++++++++++++++++++++-----------
 kernel/rcu/tree.h        |  11 ++-
 kernel/rcu/tree_exp.h    |   2 +-
 kernel/rcu/tree_nocb.h   | 159 +++++++++++++++++++++++++++++++--------
 8 files changed, 246 insertions(+), 82 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 08605ce7379d7..611c11383d236 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -108,6 +108,15 @@ static inline int rcu_preempt_depth(void)
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_RCU_LAZY
+void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func);
+#else
+static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
+{
+	call_rcu(head, func);
+}
+#endif
+
 /* Internal to kernel */
 void rcu_init(void);
 extern int rcu_scheduler_active;
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index d471d22a5e21b..d78f6181c8aad 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -311,4 +311,12 @@ config TASKS_TRACE_RCU_READ_MB
 	  Say N here if you hate read-side memory barriers.
 	  Take the default if you are unsure.
 
+config RCU_LAZY
+	bool "RCU callback lazy invocation functionality"
+	depends on RCU_NOCB_CPU
+	default n
+	help
+	  To save power, batch RCU callbacks and flush after delay, memory
+	  pressure, or callback list growing too big.
+
 endmenu # "RCU Subsystem"
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index be5979da07f59..65704cbc9df7b 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -474,6 +474,14 @@ enum rcutorture_type {
 	INVALID_RCU_FLAVOR
 };
 
+#if defined(CONFIG_RCU_LAZY)
+unsigned long rcu_lazy_get_jiffies_till_flush(void);
+void rcu_lazy_set_jiffies_till_flush(unsigned long j);
+#else
+static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
+static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
+#endif
+
 #if defined(CONFIG_TREE_RCU)
 void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 			    unsigned long *gp_seq);
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index a33a8d4942c37..72913ce21258b 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -44,7 +44,7 @@ static struct rcu_ctrlblk rcu_ctrlblk = {
 
 void rcu_barrier(void)
 {
-	wait_rcu_gp(call_rcu);
+	wait_rcu_gp(call_rcu_hurry);
 }
 EXPORT_SYMBOL(rcu_barrier);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index fb7a1b95af71e..4b68e50312d95 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2728,47 +2728,8 @@ static void check_cb_ovld(struct rcu_data *rdp)
 	raw_spin_unlock_rcu_node(rnp);
 }
 
-/**
- * call_rcu() - Queue an RCU callback for invocation after a grace period.
- * @head: structure to be used for queueing the RCU updates.
- * @func: actual callback function to be invoked after the grace period
- *
- * The callback function will be invoked some time after a full grace
- * period elapses, in other words after all pre-existing RCU read-side
- * critical sections have completed.  However, the callback function
- * might well execute concurrently with RCU read-side critical sections
- * that started after call_rcu() was invoked.
- *
- * RCU read-side critical sections are delimited by rcu_read_lock()
- * and rcu_read_unlock(), and may be nested.  In addition, but only in
- * v5.0 and later, regions of code across which interrupts, preemption,
- * or softirqs have been disabled also serve as RCU read-side critical
- * sections.  This includes hardware interrupt handlers, softirq handlers,
- * and NMI handlers.
- *
- * Note that all CPUs must agree that the grace period extended beyond
- * all pre-existing RCU read-side critical section.  On systems with more
- * than one CPU, this means that when "func()" is invoked, each CPU is
- * guaranteed to have executed a full memory barrier since the end of its
- * last RCU read-side critical section whose beginning preceded the call
- * to call_rcu().  It also means that each CPU executing an RCU read-side
- * critical section that continues beyond the start of "func()" must have
- * executed a memory barrier after the call_rcu() but before the beginning
- * of that RCU read-side critical section.  Note that these guarantees
- * include CPUs that are offline, idle, or executing in user mode, as
- * well as CPUs that are executing in the kernel.
- *
- * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
- * resulting RCU callback function "func()", then both CPU A and CPU B are
- * guaranteed to execute a full memory barrier during the time interval
- * between the call to call_rcu() and the invocation of "func()" -- even
- * if CPU A and CPU B are the same CPU (but again only if the system has
- * more than one CPU).
- *
- * Implementation of these memory-ordering guarantees is described here:
- * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
- */
-void call_rcu(struct rcu_head *head, rcu_callback_t func)
+static void
+__call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy)
 {
 	static atomic_t doublefrees;
 	unsigned long flags;
@@ -2809,7 +2770,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 	}
 
 	check_cb_ovld(rdp);
-	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
+	if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
 		return; // Enqueued onto ->nocb_bypass, so just leave.
 	// If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
 	rcu_segcblist_enqueue(&rdp->cblist, head);
@@ -2831,8 +2792,84 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 		local_irq_restore(flags);
 	}
 }
-EXPORT_SYMBOL_GPL(call_rcu);
 
+#ifdef CONFIG_RCU_LAZY
+/**
+ * call_rcu_hurry() - Queue RCU callback for invocation after grace period, and
+ * flush all lazy callbacks (including the new one) to the main ->cblist while
+ * doing so.
+ *
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed.
+ *
+ * Use this API instead of call_rcu() if you don't want the callback to be
+ * invoked after very long periods of time, which can happen on systems without
+ * memory pressure and on systems which are lightly loaded or mostly idle.
+ * This function will cause callbacks to be invoked sooner rather than later,
+ * at the expense of extra power. Other than that, this function is identical
+ * to, and reuses, call_rcu()'s logic. Refer to call_rcu() for more details
+ * about memory ordering and other functionality.
+ */
+void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
+{
+	return __call_rcu_common(head, func, false);
+}
+EXPORT_SYMBOL_GPL(call_rcu_hurry);
+#endif
+
+/**
+ * call_rcu() - Queue an RCU callback for invocation after a grace period.
+ * By default the callbacks are 'lazy' and are kept hidden from the main
+ * ->cblist to prevent starting of grace periods too soon.
+ * If you desire grace periods to start very soon, use call_rcu_hurry().
+ *
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed.  However, the callback function
+ * might well execute concurrently with RCU read-side critical sections
+ * that started after call_rcu() was invoked.
+ *
+ * RCU read-side critical sections are delimited by rcu_read_lock()
+ * and rcu_read_unlock(), and may be nested.  In addition, but only in
+ * v5.0 and later, regions of code across which interrupts, preemption,
+ * or softirqs have been disabled also serve as RCU read-side critical
+ * sections.  This includes hardware interrupt handlers, softirq handlers,
+ * and NMI handlers.
+ *
+ * Note that all CPUs must agree that the grace period extended beyond
+ * all pre-existing RCU read-side critical section.  On systems with more
+ * than one CPU, this means that when "func()" is invoked, each CPU is
+ * guaranteed to have executed a full memory barrier since the end of its
+ * last RCU read-side critical section whose beginning preceded the call
+ * to call_rcu().  It also means that each CPU executing an RCU read-side
+ * critical section that continues beyond the start of "func()" must have
+ * executed a memory barrier after the call_rcu() but before the beginning
+ * of that RCU read-side critical section.  Note that these guarantees
+ * include CPUs that are offline, idle, or executing in user mode, as
+ * well as CPUs that are executing in the kernel.
+ *
+ * Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
+ * resulting RCU callback function "func()", then both CPU A and CPU B are
+ * guaranteed to execute a full memory barrier during the time interval
+ * between the call to call_rcu() and the invocation of "func()" -- even
+ * if CPU A and CPU B are the same CPU (but again only if the system has
+ * more than one CPU).
+ *
+ * Implementation of these memory-ordering guarantees is described here:
+ * Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst.
+ */
+void call_rcu(struct rcu_head *head, rcu_callback_t func)
+{
+	return __call_rcu_common(head, func, IS_ENABLED(CONFIG_RCU_LAZY));
+}
+EXPORT_SYMBOL_GPL(call_rcu);
 
 /* Maximum number of jiffies to wait before draining a batch. */
 #define KFREE_DRAIN_JIFFIES (5 * HZ)
@@ -3507,7 +3544,7 @@ void synchronize_rcu(void)
 		if (rcu_gp_is_expedited())
 			synchronize_rcu_expedited();
 		else
-			wait_rcu_gp(call_rcu);
+			wait_rcu_gp(call_rcu_hurry);
 		return;
 	}
 
@@ -3910,7 +3947,7 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
 	 * if it's fully lazy.
 	 */
 	was_alldone = rcu_rdp_is_offloaded(rdp) && !rcu_segcblist_pend_cbs(&rdp->cblist);
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 	wake_nocb = was_alldone && rcu_segcblist_pend_cbs(&rdp->cblist);
 	if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
 		atomic_inc(&rcu_state.barrier_cpu_count);
@@ -4336,7 +4373,7 @@ void rcutree_migrate_callbacks(int cpu)
 	my_rdp = this_cpu_ptr(&rcu_data);
 	my_rnp = my_rdp->mynode;
 	rcu_nocb_lock(my_rdp); /* irqs already disabled. */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(my_rdp, NULL, jiffies, false));
 	raw_spin_lock_rcu_node(my_rnp); /* irqs already disabled. */
 	/* Leverage recent GPs and set GP for new callbacks. */
 	needwake = rcu_advance_cbs(my_rnp, rdp) ||
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 925dd98f8b23b..fcb5d696eb170 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -263,14 +263,16 @@ struct rcu_data {
 	unsigned long last_fqs_resched;	/* Time of last rcu_resched(). */
 	unsigned long last_sched_clock;	/* Jiffies of last rcu_sched_clock_irq(). */
 
+	long lazy_len;			/* Length of buffered lazy callbacks. */
 	int cpu;
 };
 
 /* Values for nocb_defer_wakeup field in struct rcu_data. */
 #define RCU_NOCB_WAKE_NOT	0
 #define RCU_NOCB_WAKE_BYPASS	1
-#define RCU_NOCB_WAKE		2
-#define RCU_NOCB_WAKE_FORCE	3
+#define RCU_NOCB_WAKE_LAZY	2
+#define RCU_NOCB_WAKE		3
+#define RCU_NOCB_WAKE_FORCE	4
 
 #define RCU_JIFFIES_TILL_FORCE_QS (1 + (HZ > 250) + (HZ > 500))
 					/* For jiffies_till_first_fqs and */
@@ -441,9 +443,10 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j);
+				  unsigned long j, bool lazy);
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags);
+				bool *was_alldone, unsigned long flags,
+				bool lazy);
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 				 unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 18e9b4cd78ef8..ed6c3cce28f23 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -937,7 +937,7 @@ void synchronize_rcu_expedited(void)
 
 	/* If expedited grace periods are prohibited, fall back to normal. */
 	if (rcu_gp_is_normal()) {
-		wait_rcu_gp(call_rcu);
+		wait_rcu_gp(call_rcu_hurry);
 		return;
 	}
 
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 094fd454b6c38..d6e4c076b0515 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -256,6 +256,31 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 	return __wake_nocb_gp(rdp_gp, rdp, force, flags);
 }
 
+/*
+ * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
+ * can elapse before lazy callbacks are flushed. Lazy callbacks
+ * could be flushed much earlier for a number of other reasons;
+ * however, LAZY_FLUSH_JIFFIES will ensure no lazy callbacks are
+ * left unsubmitted to RCU after that many jiffies.
+ */
+#define LAZY_FLUSH_JIFFIES (10 * HZ)
+static unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
+
+#ifdef CONFIG_RCU_LAZY
+// To be called only from test code.
+void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
+{
+	jiffies_till_flush = jif;
+}
+EXPORT_SYMBOL(rcu_lazy_set_jiffies_till_flush);
+
+unsigned long rcu_lazy_get_jiffies_till_flush(void)
+{
+	return jiffies_till_flush;
+}
+EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush);
+#endif
+
 /*
  * Arrange to wake the GP kthread for this NOCB group at some future
  * time when it is safe to do so.
@@ -269,10 +294,14 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 
 	/*
-	 * Bypass wakeup overrides previous deferments. In case
-	 * of callback storm, no need to wake up too early.
+	 * Bypass wakeup overrides previous deferments. In case of
+	 * callback storms, no need to wake up too early.
 	 */
-	if (waketype == RCU_NOCB_WAKE_BYPASS) {
+	if (waketype == RCU_NOCB_WAKE_LAZY &&
+	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
+		mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
+		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
+	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
 		mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 	} else {
@@ -293,10 +322,13 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
  * proves to be initially empty, just return false because the no-CB GP
  * kthread may need to be awakened in this case.
  *
+ * Return true if there was something to be flushed and it succeeded, otherwise
+ * false.
+ *
  * Note that this function always returns true if rhp is NULL.
  */
 static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				     unsigned long j)
+				     unsigned long j, bool lazy)
 {
 	struct rcu_cblist rcl;
 
@@ -310,7 +342,20 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	/* Note: ->cblist.len already accounts for ->nocb_bypass contents. */
 	if (rhp)
 		rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
-	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
+
+	/*
+	 * If the new CB requested was a lazy one, queue it onto the main
+	 * ->cblist so we can take advantage of a sooner grace period.
+	 */
+	if (lazy && rhp) {
+		rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, NULL);
+		rcu_cblist_enqueue(&rcl, rhp);
+		WRITE_ONCE(rdp->lazy_len, 0);
+	} else {
+		rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
+		WRITE_ONCE(rdp->lazy_len, 0);
+	}
+
 	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
 	WRITE_ONCE(rdp->nocb_bypass_first, j);
 	rcu_nocb_bypass_unlock(rdp);
@@ -326,13 +371,13 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
  * Note that this function always returns true if rhp is NULL.
  */
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j)
+				  unsigned long j, bool lazy)
 {
 	if (!rcu_rdp_is_offloaded(rdp))
 		return true;
 	rcu_lockdep_assert_cblist_protected(rdp);
 	rcu_nocb_bypass_lock(rdp);
-	return rcu_nocb_do_flush_bypass(rdp, rhp, j);
+	return rcu_nocb_do_flush_bypass(rdp, rhp, j, lazy);
 }
 
 /*
@@ -345,7 +390,7 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
 	if (!rcu_rdp_is_offloaded(rdp) ||
 	    !rcu_nocb_bypass_trylock(rdp))
 		return;
-	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j));
+	WARN_ON_ONCE(!rcu_nocb_do_flush_bypass(rdp, NULL, j, false));
 }
 
 /*
@@ -367,12 +412,14 @@ static void rcu_nocb_try_flush_bypass(struct rcu_data *rdp, unsigned long j)
  * there is only one CPU in operation.
  */
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags)
+				bool *was_alldone, unsigned long flags,
+				bool lazy)
 {
 	unsigned long c;
 	unsigned long cur_gp_seq;
 	unsigned long j = jiffies;
 	long ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	bool bypass_is_lazy = (ncbs == READ_ONCE(rdp->lazy_len));
 
 	lockdep_assert_irqs_disabled();
 
@@ -417,25 +464,29 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	// If there hasn't yet been all that many ->cblist enqueues
 	// this jiffy, tell the caller to enqueue onto ->cblist.  But flush
 	// ->nocb_bypass first.
-	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy) {
+	// Lazy CBs throttle this back and do immediate bypass queuing.
+	if (rdp->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy && !lazy) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 		if (*was_alldone)
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("FirstQ"));
-		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j));
+
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, j, false));
 		WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 		return false; // Caller must enqueue the callback.
 	}
 
 	// If ->nocb_bypass has been used too long or is too full,
 	// flush ->nocb_bypass to ->cblist.
-	if ((ncbs && j != READ_ONCE(rdp->nocb_bypass_first)) ||
+	if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
+	    (ncbs &&  bypass_is_lazy &&
+	     (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush))) ||
 	    ncbs >= qhimark) {
 		rcu_nocb_lock(rdp);
 		*was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
 
-		if (!rcu_nocb_flush_bypass(rdp, rhp, j)) {
+		if (!rcu_nocb_flush_bypass(rdp, rhp, j, lazy)) {
 			if (*was_alldone)
 				trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 						    TPS("FirstQ"));
@@ -463,13 +514,24 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 	rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
 	rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
+
+	if (lazy)
+		WRITE_ONCE(rdp->lazy_len, rdp->lazy_len + 1);
+
 	if (!ncbs) {
 		WRITE_ONCE(rdp->nocb_bypass_first, j);
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("FirstBQ"));
 	}
 	rcu_nocb_bypass_unlock(rdp);
 	smp_mb(); /* Order enqueue before wake. */
-	if (ncbs) {
+	// A wake up of the grace period kthread or timer adjustment
+	// needs to be done only if:
+	// 1. Bypass list was fully empty before (this is the first
+	//    bypass list entry), or:
+	// 2. Both of these conditions are met:
+	//    a. The bypass list previously had only lazy CBs, and:
+	//    b. The new CB is non-lazy.
+	if (ncbs && (!bypass_is_lazy || lazy)) {
 		local_irq_restore(flags);
 	} else {
 		// No-CBs GP kthread might be indefinitely asleep, if so, wake.
@@ -497,8 +559,10 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 				 unsigned long flags)
 				 __releases(rdp->nocb_lock)
 {
+	long bypass_len;
 	unsigned long cur_gp_seq;
 	unsigned long j;
+	long lazy_len;
 	long len;
 	struct task_struct *t;
 
@@ -512,9 +576,16 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	}
 	// Need to actually to a wakeup.
 	len = rcu_segcblist_n_cbs(&rdp->cblist);
+	bypass_len = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+	lazy_len = READ_ONCE(rdp->lazy_len);
 	if (was_alldone) {
 		rdp->qlen_last_fqs_check = len;
-		if (!irqs_disabled_flags(flags)) {
+		// Only lazy CBs in bypass list
+		if (lazy_len && bypass_len == lazy_len) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
+					   TPS("WakeLazy"));
+		} else if (!irqs_disabled_flags(flags)) {
 			/* ... if queue was empty ... */
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			wake_nocb_gp(rdp, false);
@@ -605,12 +676,12 @@ static void nocb_gp_sleep(struct rcu_data *my_rdp, int cpu)
 static void nocb_gp_wait(struct rcu_data *my_rdp)
 {
 	bool bypass = false;
-	long bypass_ncbs;
 	int __maybe_unused cpu = my_rdp->cpu;
 	unsigned long cur_gp_seq;
 	unsigned long flags;
 	bool gotcbs = false;
 	unsigned long j = jiffies;
+	bool lazy = false;
 	bool needwait_gp = false; // This prevents actual uninitialized use.
 	bool needwake;
 	bool needwake_gp;
@@ -640,24 +711,43 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 * won't be ignored for long.
 	 */
 	list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
+		long bypass_ncbs;
+		bool flush_bypass = false;
+		long lazy_ncbs;
+
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
 		rcu_nocb_lock_irqsave(rdp, flags);
 		lockdep_assert_held(&rdp->nocb_lock);
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
-		if (bypass_ncbs &&
+		lazy_ncbs = READ_ONCE(rdp->lazy_len);
+
+		if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
+		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
+		     bypass_ncbs > 2 * qhimark)) {
+			flush_bypass = true;
+		} else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
 		     bypass_ncbs > 2 * qhimark)) {
-			// Bypass full or old, so flush it.
-			(void)rcu_nocb_try_flush_bypass(rdp, j);
-			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+			flush_bypass = true;
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
 			continue; /* No callbacks here, try next. */
 		}
+
+		if (flush_bypass) {
+			// Bypass full or old, so flush it.
+			(void)rcu_nocb_try_flush_bypass(rdp, j);
+			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
+			lazy_ncbs = READ_ONCE(rdp->lazy_len);
+		}
+
 		if (bypass_ncbs) {
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
-					    TPS("Bypass"));
-			bypass = true;
+					    bypass_ncbs == lazy_ncbs ? TPS("Lazy") : TPS("Bypass"));
+			if (bypass_ncbs == lazy_ncbs)
+				lazy = true;
+			else
+				bypass = true;
 		}
 		rnp = rdp->mynode;
 
@@ -705,12 +795,20 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	my_rdp->nocb_gp_gp = needwait_gp;
 	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
 
-	if (bypass && !rcu_nocb_poll) {
-		// At least one child with non-empty ->nocb_bypass, so set
-		// timer in order to avoid stranding its callbacks.
-		wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
-				   TPS("WakeBypassIsDeferred"));
+	// At least one child with non-empty ->nocb_bypass, so set
+	// timer in order to avoid stranding its callbacks.
+	if (!rcu_nocb_poll) {
+		// If bypass list only has lazy CBs. Add a deferred lazy wake up.
+		if (lazy && !bypass) {
+			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY,
+					TPS("WakeLazyIsDeferred"));
+		// Otherwise add a deferred bypass wake up.
+		} else if (bypass) {
+			wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
+					TPS("WakeBypassIsDeferred"));
+		}
 	}
+
 	if (rcu_nocb_poll) {
 		/* Polling, so trace if first poll in the series. */
 		if (gotcbs)
@@ -1036,7 +1134,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 * return false, which means that future calls to rcu_nocb_try_bypass()
 	 * will refuse to put anything into the bypass.
 	 */
-	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+	WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 	/*
 	 * Start with invoking rcu_core() early. This way if the current thread
 	 * happens to preempt an ongoing call to rcu_core() in the middle,
@@ -1278,6 +1376,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 	raw_spin_lock_init(&rdp->nocb_gp_lock);
 	timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
 	rcu_cblist_init(&rdp->nocb_bypass);
+	WRITE_ONCE(rdp->lazy_len, 0);
 	mutex_init(&rdp->nocb_gp_kthread_mutex);
 }
 
@@ -1564,13 +1663,13 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
 }
 
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				  unsigned long j)
+				  unsigned long j, bool lazy)
 {
 	return true;
 }
 
 static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-				bool *was_alldone, unsigned long flags)
+				bool *was_alldone, unsigned long flags, bool lazy)
 {
 	return false;
 }
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 05/16] rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 04/16] rcu: Make call_rcu() lazy to save power Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 06/16] rcu: Shrinker for lazy rcu Paul E. McKenney
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul E . McKenney, Frederic Weisbecker

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This consolidates the code a bit and makes it cleaner. Functionally it
is the same.
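
The "functionally the same" claim can be checked with a small userspace model. This is only a sketch: struct cb and struct cblist are hypothetical stand-ins for rcu_head and rcu_cblist, the helpers mimic rcu_cblist_enqueue() and rcu_cblist_flush_enqueue() as used in the diff below, and the lazy_len and locking handling is left out.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's rcu_head and rcu_cblist. */
struct cb {
	struct cb *next;
};

struct cblist {
	struct cb *head;
	struct cb **tail;
	long len;
};

static void cblist_init(struct cblist *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->len = 0;
}

/* Append c to the tail of l. */
static void cblist_enqueue(struct cblist *l, struct cb *c)
{
	c->next = NULL;
	*l->tail = c;
	l->tail = &c->next;
	l->len++;
}

/* Move everything from src to dst, then leave src holding only c (if any). */
static void cblist_flush_enqueue(struct cblist *dst, struct cblist *src,
				 struct cb *c)
{
	dst->head = src->head;
	dst->tail = dst->head ? src->tail : &dst->head;
	dst->len = src->len;
	cblist_init(src);
	if (c)
		cblist_enqueue(src, c);
}

/* Pre-patch shape: two branches, each doing its own flush. */
static void flush_old(struct cblist *rcl, struct cblist *bypass,
		      struct cb *rhp, int lazy)
{
	if (lazy && rhp) {
		cblist_flush_enqueue(rcl, bypass, NULL);
		cblist_enqueue(rcl, rhp);
	} else {
		cblist_flush_enqueue(rcl, bypass, rhp);
	}
}

/* Post-patch shape: optionally pre-queue on the bypass, then one flush. */
static void flush_new(struct cblist *rcl, struct cblist *bypass,
		      struct cb *rhp, int lazy)
{
	if (lazy && rhp) {
		cblist_enqueue(bypass, rhp);
		rhp = NULL;
	}
	cblist_flush_enqueue(rcl, bypass, rhp);
}
```

In this model both shapes leave a lazy rhp at the tail of the flushed list with the bypass empty, and both leave a non-lazy rhp as the sole entry of the bypass.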

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_nocb.h | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index d6e4c076b0515..213daf81c057f 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -327,10 +327,11 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
  *
  * Note that this function always returns true if rhp is NULL.
  */
-static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
+static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp_in,
 				     unsigned long j, bool lazy)
 {
 	struct rcu_cblist rcl;
+	struct rcu_head *rhp = rhp_in;
 
 	WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp));
 	rcu_lockdep_assert_cblist_protected(rdp);
@@ -345,16 +346,16 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 
 	/*
 	 * If the new CB requested was a lazy one, queue it onto the main
-	 * ->cblist so we can take advantage of a sooner grade period.
+	 * ->cblist so that we can take advantage of the grace-period that will
+	 * happen regardless. But queue it onto the bypass list first so that
+	 * the lazy CB is ordered with the existing CBs in the bypass list.
 	 */
 	if (lazy && rhp) {
-		rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, NULL);
-		rcu_cblist_enqueue(&rcl, rhp);
-		WRITE_ONCE(rdp->lazy_len, 0);
-	} else {
-		rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
-		WRITE_ONCE(rdp->lazy_len, 0);
+		rcu_cblist_enqueue(&rdp->nocb_bypass, rhp);
+		rhp = NULL;
 	}
+	rcu_cblist_flush_enqueue(&rcl, &rdp->nocb_bypass, rhp);
+	WRITE_ONCE(rdp->lazy_len, 0);
 
 	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
 	WRITE_ONCE(rdp->nocb_bypass_first, j);
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 06/16] rcu: Shrinker for lazy rcu
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (4 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 05/16] rcu: Refactor code a bit in rcu_nocb_do_flush_bypass() Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 07/16] rcuscale: Add laziness and kfree tests Paul E. McKenney
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Vineeth Pillai,
	Joel Fernandes, Paul E . McKenney

From: Vineeth Pillai <vineeth@bitbyteword.org>

The shrinker is used to speed up the freeing of memory potentially held
by lazy RCU callbacks.  RCU kernel-module test cases show this to be
effective.  The test is introduced in a later patch.
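
The count/scan split can be modeled in userspace roughly as follows. This is a sketch under assumptions, not the kernel code: NR_CPUS_MODEL, lazy_len[], and both function names are hypothetical, zero is returned where the patch returns SHRINK_EMPTY or SHRINK_STOP, and the remaining budget is kept in a signed long (a choice of this model) so that the "<= 0" test is meaningful.

```c
#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this model */

/* Stand-in for each CPU's rdp->lazy_len. */
static long lazy_len[NR_CPUS_MODEL];

/* Count phase: snapshot the number of lazy callbacks across all CPUs. */
static unsigned long model_shrink_count(void)
{
	unsigned long count = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		count += lazy_len[cpu];
	return count;
}

/*
 * Scan phase: kick CPUs out of laziness one at a time until the
 * requested budget has been covered.  Returns the number kicked.
 */
static unsigned long model_shrink_scan(long nr_to_scan)
{
	unsigned long kicked = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		long n = lazy_len[cpu];

		if (n == 0)
			continue;
		/* The patch also wakes the no-CBs GP kthread at this point. */
		lazy_len[cpu] = 0;
		nr_to_scan -= n;
		kicked += n;
		if (nr_to_scan <= 0)
			break;
	}
	return kicked;
}
```

Because a whole CPU is kicked at a time, the scan can cover more callbacks than were requested, and CPUs past the stopping point keep their lazy callbacks.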

Signed-off-by: Vineeth Pillai <vineeth@bitbyteword.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree_nocb.h | 52 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 213daf81c057f..9e1c8caec5ceb 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1312,6 +1312,55 @@ int rcu_nocb_cpu_offload(int cpu)
 }
 EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
 
+static unsigned long
+lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+
+		count +=  READ_ONCE(rdp->lazy_len);
+	}
+
+	return count ? count : SHRINK_EMPTY;
+}
+
+static unsigned long
+lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu;
+	unsigned long flags;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+		int _count = READ_ONCE(rdp->lazy_len);
+
+		if (_count == 0)
+			continue;
+		rcu_nocb_lock_irqsave(rdp, flags);
+		WRITE_ONCE(rdp->lazy_len, 0);
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+		wake_nocb_gp(rdp, false);
+		sc->nr_to_scan -= _count;
+		count += _count;
+		if (sc->nr_to_scan <= 0)
+			break;
+	}
+	return count ? count : SHRINK_STOP;
+}
+
+static struct shrinker lazy_rcu_shrinker = {
+	.count_objects = lazy_rcu_shrink_count,
+	.scan_objects = lazy_rcu_shrink_scan,
+	.batch = 0,
+	.seeks = DEFAULT_SEEKS,
+};
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -1342,6 +1391,9 @@ void __init rcu_init_nohz(void)
 	if (!rcu_state.nocb_is_setup)
 		return;
 
+	if (register_shrinker(&lazy_rcu_shrinker, "rcu-lazy"))
+		pr_err("Failed to register lazy_rcu shrinker!\n");
+
 	if (!cpumask_subset(rcu_nocb_mask, cpu_possible_mask)) {
 		pr_info("\tNote: kernel parameter 'rcu_nocbs=', 'nohz_full', or 'isolcpus=' contains nonexistent CPUs.\n");
 		cpumask_and(rcu_nocb_mask, cpu_possible_mask,
-- 
2.31.1.189.g2e36527f23


* [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2
@ 2022-11-30 18:13 Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 01/16] rcu: Simplify rcu_init_nohz() cpumask handling Paul E. McKenney
                   ` (15 more replies)
  0 siblings, 16 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series provides energy efficiency for nearly-idle systems by making
call_rcu() more lazy.  Several NOCB changes come along for the ride:

1.	Simplify rcu_init_nohz() cpumask handling, courtesy of Zhen Lei.

2.	Fix late wakeup when flush of bypass cblist happens, courtesy of
	"Joel Fernandes (Google)".

3.	Fix missing nocb gp wake on rcu_barrier(), courtesy of Frederic
	Weisbecker.

4.	Make call_rcu() lazy to save power, courtesy of "Joel Fernandes
	(Google)".

5.	Refactor code a bit in rcu_nocb_do_flush_bypass(), courtesy of
	"Joel Fernandes (Google)".

6.	Shrinker for lazy rcu, courtesy of Vineeth Pillai.

7.	Add laziness and kfree tests, courtesy of "Joel Fernandes
	(Google)".

8.	Use call_rcu_hurry() instead of call_rcu, courtesy of "Joel
	Fernandes (Google)".

9.	Use call_rcu_hurry() for async reader test, courtesy of "Joel
	Fernandes (Google)".

10.	Use call_rcu_hurry() where needed, courtesy of "Joel Fernandes
	(Google)".

11.	scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu(),
	courtesy of Uladzislau Rezki.

12.	percpu-refcount: Use call_rcu_hurry() for atomic switch, courtesy
	of "Joel Fernandes (Google)".

13.	Make queue_rcu_work() use call_rcu_hurry(), courtesy of Uladzislau
	Rezki.

14.	Use call_rcu_hurry() instead of call_rcu(), courtesy of "Joel
	Fernandes (Google)".

15.	Use call_rcu_hurry() for dst_release(), courtesy of "Joel
	Fernandes (Google)".

16.	devinet: Reduce refcount before grace period, courtesy of Eric
	Dumazet.

Changes since v2:

o	Rename call_rcu_flush() to call_rcu_hurry() to avoid naming
	conflicts in workqueues as suggested by Tejun Heo.

o	Apply acks and reviews.

https://lore.kernel.org/all/20221122010408.GA3799268@paulmck-ThinkPad-P17-Gen-1/

Changes since v1:

o	Add more adjustments to avoid excessive laziness (#15 and
	#16 above).

o	Get appropriate Cc lines onto non-RCU patches.

https://lore.kernel.org/all/20221019225138.GA2499943@paulmck-ThinkPad-P17-Gen-1/

						Thanx, Paul

------------------------------------------------------------------------

 b/drivers/scsi/scsi_error.c |    2 
 b/include/linux/rcupdate.h  |    9 +
 b/kernel/rcu/Kconfig        |    8 +
 b/kernel/rcu/rcu.h          |    8 +
 b/kernel/rcu/rcuscale.c     |   67 +++++++++++-
 b/kernel/rcu/rcutorture.c   |   16 +-
 b/kernel/rcu/sync.c         |    2 
 b/kernel/rcu/tiny.c         |    2 
 b/kernel/rcu/tree.c         |   11 +
 b/kernel/rcu/tree.h         |    1 
 b/kernel/rcu/tree_exp.h     |    2 
 b/kernel/rcu/tree_nocb.h    |   34 +-----
 b/kernel/workqueue.c        |    2 
 b/lib/percpu-refcount.c     |    3 
 b/net/core/dst.c            |    2 
 b/net/ipv4/devinet.c        |   19 +--
 b/net/rxrpc/conn_object.c   |    2 
 kernel/rcu/rcuscale.c       |    2 
 kernel/rcu/tree.c           |  129 +++++++++++++++--------
 kernel/rcu/tree.h           |   11 +
 kernel/rcu/tree_nocb.h      |  243 ++++++++++++++++++++++++++++++++++++--------
 21 files changed, 434 insertions(+), 141 deletions(-)

* [PATCH rcu 07/16] rcuscale: Add laziness and kfree tests
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (5 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 06/16] rcu: Shrinker for lazy rcu Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 08/16] rcu/sync: Use call_rcu_hurry() instead of call_rcu Paul E. McKenney
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

This commit adds two tests to rcuscale.  The first is a startup test that
checks that call_rcu() is neither too lazy nor too hard-working.  The
second makes the kfree_rcu() scale test free its objects through
call_rcu() instead and checks memory pressure.  Testing indicates that
the new call_rcu() keeps memory pressure under control roughly as well
as does kfree_rcu().
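
The startup test's pass/fail window can be sketched as a pure function. The names lazy_verdict and check_laziness() are hypothetical; the 2 * HZ flush timeout and the 3 * HZ upper bound are taken from the diff below, with hz passed in as a parameter so the sketch runs in userspace.

```c
/* Hypothetical verdicts for this sketch. */
enum lazy_verdict {
	NOT_LAZY_ENOUGH = -1,	/* callback ran before the flush timeout */
	LAZY_OK = 0,		/* callback ran within the expected window */
	TOO_LAZY = 1,		/* callback ran well after the flush timeout */
};

/*
 * start:   tick count when the lazy callback was queued
 * cb_time: tick count recorded by the callback when it ran
 * hz:      ticks per second
 */
static enum lazy_verdict check_laziness(unsigned long start,
					unsigned long cb_time,
					unsigned long hz)
{
	/* Unsigned subtraction stays correct across counter wraparound. */
	unsigned long elapsed = cb_time - start;

	if (elapsed < 2 * hz)
		return NOT_LAZY_ENOUGH;
	if (elapsed > 3 * hz)
		return TOO_LAZY;
	return LAZY_OK;
}
```

With the flush timeout set to two seconds, a callback invoked between two and three seconds after queuing passes; anything earlier or later is reported as an error.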

[ paulmck: Apply checkpatch feedback. ]

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcuscale.c | 67 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 3ef02d4a81085..3baded807a616 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -95,6 +95,7 @@ torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
 torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
+torture_param(int, kfree_by_call_rcu, 0, "Use call_rcu() to emulate kfree_rcu()?");
 
 static char *scale_type = "rcu";
 module_param(scale_type, charp, 0444);
@@ -659,6 +660,14 @@ struct kfree_obj {
 	struct rcu_head rh;
 };
 
+/* Used if doing RCU-kfree'ing via call_rcu(). */
+static void kfree_call_rcu(struct rcu_head *rh)
+{
+	struct kfree_obj *obj = container_of(rh, struct kfree_obj, rh);
+
+	kfree(obj);
+}
+
 static int
 kfree_scale_thread(void *arg)
 {
@@ -696,6 +705,11 @@ kfree_scale_thread(void *arg)
 			if (!alloc_ptr)
 				return -ENOMEM;
 
+			if (kfree_by_call_rcu) {
+				call_rcu(&(alloc_ptr->rh), kfree_call_rcu);
+				continue;
+			}
+
 			// By default kfree_rcu_test_single and kfree_rcu_test_double are
 			// initialized to false. If both have the same value (false or true)
 			// both are randomly tested, otherwise only the one with value true
@@ -767,11 +781,58 @@ kfree_scale_shutdown(void *arg)
 	return -EINVAL;
 }
 
+// Used if doing RCU-kfree'ing via call_rcu().
+static unsigned long jiffies_at_lazy_cb;
+static struct rcu_head lazy_test1_rh;
+static int rcu_lazy_test1_cb_called;
+static void call_rcu_lazy_test1(struct rcu_head *rh)
+{
+	jiffies_at_lazy_cb = jiffies;
+	WRITE_ONCE(rcu_lazy_test1_cb_called, 1);
+}
+
 static int __init
 kfree_scale_init(void)
 {
-	long i;
 	int firsterr = 0;
+	long i;
+	unsigned long jif_start;
+	unsigned long orig_jif;
+
+	// Also, do a quick self-test to ensure laziness is as much as
+	// expected.
+	if (kfree_by_call_rcu && !IS_ENABLED(CONFIG_RCU_LAZY)) {
+		pr_alert("CONFIG_RCU_LAZY is disabled, falling back to kfree_rcu() for delayed RCU kfree'ing\n");
+		kfree_by_call_rcu = 0;
+	}
+
+	if (kfree_by_call_rcu) {
+		/* do a test to check the timeout. */
+		orig_jif = rcu_lazy_get_jiffies_till_flush();
+
+		rcu_lazy_set_jiffies_till_flush(2 * HZ);
+		rcu_barrier();
+
+		jif_start = jiffies;
+		jiffies_at_lazy_cb = 0;
+		call_rcu(&lazy_test1_rh, call_rcu_lazy_test1);
+
+		smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
+
+		rcu_lazy_set_jiffies_till_flush(orig_jif);
+
+		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
+			pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
+			WARN_ON_ONCE(1);
+			return -1;
+		}
+
+		if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start > 3 * HZ)) {
+			pr_alert("ERROR: call_rcu() CBs are being too lazy!\n");
+			WARN_ON_ONCE(1);
+			return -1;
+		}
+	}
 
 	kfree_nrealthreads = compute_real(kfree_nthreads);
 	/* Start up the kthreads. */
@@ -784,7 +845,9 @@ kfree_scale_init(void)
 		schedule_timeout_uninterruptible(1);
 	}
 
-	pr_alert("kfree object size=%zu\n", kfree_mult * sizeof(struct kfree_obj));
+	pr_alert("kfree object size=%zu, kfree_by_call_rcu=%d\n",
+			kfree_mult * sizeof(struct kfree_obj),
+			kfree_by_call_rcu);
 
 	kfree_reader_tasks = kcalloc(kfree_nrealthreads, sizeof(kfree_reader_tasks[0]),
 			       GFP_KERNEL);
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 08/16] rcu/sync: Use call_rcu_hurry() instead of call_rcu
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (6 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 07/16] rcuscale: Add laziness and kfree tests Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 09/16] rcu/rcuscale: Use call_rcu_hurry() for async reader test Paul E. McKenney
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

call_rcu() changes to save power will slow down rcu sync. Use the
call_rcu_hurry() API instead, which reverts to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 5cefc702158fe..e550f97779b8d 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -44,7 +44,7 @@ static void rcu_sync_func(struct rcu_head *rhp);
 
 static void rcu_sync_call(struct rcu_sync *rsp)
 {
-	call_rcu(&rsp->cb_head, rcu_sync_func);
+	call_rcu_hurry(&rsp->cb_head, rcu_sync_func);
 }
 
 /**
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 09/16] rcu/rcuscale: Use call_rcu_hurry() for async reader test
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (7 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 08/16] rcu/sync: Use call_rcu_hurry() instead of call_rcu Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 10/16] rcu/rcutorture: Use call_rcu_hurry() where needed Paul E. McKenney
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

rcuscale uses call_rcu() to queue async readers. With recent changes to
save power, the test will have fewer async readers in flight. Use the
call_rcu_hurry() API instead to revert to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcuscale.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 3baded807a616..91fb5905a008f 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -176,7 +176,7 @@ static struct rcu_scale_ops rcu_ops = {
 	.get_gp_seq	= rcu_get_gp_seq,
 	.gp_diff	= rcu_seq_diff,
 	.exp_completed	= rcu_exp_batches_completed,
-	.async		= call_rcu,
+	.async		= call_rcu_hurry,
 	.gp_barrier	= rcu_barrier,
 	.sync		= synchronize_rcu,
 	.exp_sync	= synchronize_rcu_expedited,
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 10/16] rcu/rcutorture: Use call_rcu_hurry() where needed
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (8 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 09/16] rcu/rcuscale: Use call_rcu_hurry() for async reader test Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 11/16] scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

call_rcu() changes to save power will change the behavior of rcutorture
tests. Use the call_rcu_hurry() API instead, which reverts to the old
behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/rcutorture.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 503c2aa845a4a..2226f86f54f78 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -510,7 +510,7 @@ static unsigned long rcu_no_completed(void)
 
 static void rcu_torture_deferred_free(struct rcu_torture *p)
 {
-	call_rcu(&p->rtort_rcu, rcu_torture_cb);
+	call_rcu_hurry(&p->rtort_rcu, rcu_torture_cb);
 }
 
 static void rcu_sync_torture_init(void)
@@ -551,7 +551,7 @@ static struct rcu_torture_ops rcu_ops = {
 	.start_gp_poll_exp_full	= start_poll_synchronize_rcu_expedited_full,
 	.poll_gp_state_exp	= poll_state_synchronize_rcu,
 	.cond_sync_exp		= cond_synchronize_rcu_expedited,
-	.call			= call_rcu,
+	.call			= call_rcu_hurry,
 	.cb_barrier		= rcu_barrier,
 	.fqs			= rcu_force_quiescent_state,
 	.stats			= NULL,
@@ -848,7 +848,7 @@ static void rcu_tasks_torture_deferred_free(struct rcu_torture *p)
 
 static void synchronize_rcu_mult_test(void)
 {
-	synchronize_rcu_mult(call_rcu_tasks, call_rcu);
+	synchronize_rcu_mult(call_rcu_tasks, call_rcu_hurry);
 }
 
 static struct rcu_torture_ops tasks_ops = {
@@ -3388,13 +3388,13 @@ static void rcu_test_debug_objects(void)
 	/* Try to queue the rh2 pair of callbacks for the same grace period. */
 	preempt_disable(); /* Prevent preemption from interrupting test. */
 	rcu_read_lock(); /* Make it impossible to finish a grace period. */
-	call_rcu(&rh1, rcu_torture_leak_cb); /* Start grace period. */
+	call_rcu_hurry(&rh1, rcu_torture_leak_cb); /* Start grace period. */
 	local_irq_disable(); /* Make it harder to start a new grace period. */
-	call_rcu(&rh2, rcu_torture_leak_cb);
-	call_rcu(&rh2, rcu_torture_err_cb); /* Duplicate callback. */
+	call_rcu_hurry(&rh2, rcu_torture_leak_cb);
+	call_rcu_hurry(&rh2, rcu_torture_err_cb); /* Duplicate callback. */
 	if (rhp) {
-		call_rcu(rhp, rcu_torture_leak_cb);
-		call_rcu(rhp, rcu_torture_err_cb); /* Another duplicate callback. */
+		call_rcu_hurry(rhp, rcu_torture_leak_cb);
+		call_rcu_hurry(rhp, rcu_torture_err_cb); /* Another duplicate callback. */
 	}
 	local_irq_enable();
 	rcu_read_unlock();
-- 
2.31.1.189.g2e36527f23


* [PATCH rcu 11/16] scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (9 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 10/16] rcu/rcutorture: Use call_rcu_hurry() where needed Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch Paul E. McKenney
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Uladzislau Rezki,
	Joel Fernandes, James E.J. Bottomley, linux-scsi, Bart Van Assche,
	Martin K . Petersen, Paul E . McKenney

From: Uladzislau Rezki <urezki@gmail.com>

Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime, which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.
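
The "kick" decision on the bypass-enqueue path can be written as a small predicate. This is a sketch modeled on the condition in the rcu_nocb_try_bypass() hunk of patch 04 in this series; the function name and its parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * ncbs:        callbacks already on the bypass list before the new one
 * lazy_len:    how many of those are lazy
 * new_is_lazy: whether the callback being queued is lazy
 *
 * Returns true when a wakeup of the grace-period kthread or a timer
 * adjustment is needed after queuing the new callback.
 */
static bool bypass_enqueue_needs_wake(long ncbs, long lazy_len,
				      bool new_is_lazy)
{
	bool bypass_is_lazy = (ncbs == lazy_len);

	/* Mirrors "if (ncbs && (!bypass_is_lazy || lazy))": no wake needed. */
	return !(ncbs && (!bypass_is_lazy || new_is_lazy));
}
```

In other words, a wake is needed when the bypass list was empty, or when it held only lazy callbacks and the new arrival is non-lazy.  A lazy callback joining an already-populated list, or any callback joining a list that already holds a non-lazy one, needs no extra wake.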

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one in the
scsi_eh_scmd_add() function.  Leaving this instance lazy results in
unacceptably slow boot times.

Therefore, make scsi_eh_scmd_add() use call_rcu_hurry() in order to
revert to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: <linux-scsi@vger.kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 drivers/scsi/scsi_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 6995c89792300..ac5ff0783b4f0 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -312,7 +312,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 	 * Ensure that all tasks observe the host state change before the
 	 * host_failed change.
 	 */
-	call_rcu(&scmd->rcu, scsi_eh_inc_host_failed);
+	call_rcu_hurry(&scmd->rcu, scsi_eh_inc_host_failed);
 }
 
 /**
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (10 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 11/16] scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:19   ` Joel Fernandes
  2022-11-30 19:43   ` Tejun Heo
  2022-11-30 18:13 ` [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry() Paul E. McKenney
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Paul E . McKenney, Dennis Zhou, Tejun Heo, Christoph Lameter,
	linux-mm

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order to
batch callbacks.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime, which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.
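
The batching and "kick" behavior described above can be sketched in user
space.  This is only a toy model under stated assumptions: every name in it
(model_cblist, model_call_lazy(), model_call_hurry(), model_flush(),
model_count_cb()) is hypothetical and is not a kernel API, and the model
invokes callbacks immediately on a flush, whereas real RCU would first wait
for a grace period.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of lazy callback batching.
 * None of these names are kernel APIs.
 */
#define MODEL_MAX_CBS 16

typedef void (*model_cb_t)(void *arg);

struct model_cblist {
	model_cb_t fn[MODEL_MAX_CBS];
	void *arg[MODEL_MAX_CBS];
	size_t len;		/* callbacks waiting for a flush */
	unsigned int flushes;	/* stand-in for grace periods used */
};

/* Invoke and drop everything queued so far, as a single batch. */
static void model_flush(struct model_cblist *q)
{
	size_t i;

	if (!q->len)
		return;
	for (i = 0; i < q->len; i++)
		q->fn[i](q->arg[i]);
	q->len = 0;
	q->flushes++;
}

/* Lazy enqueue: only park the callback.  Returns -1 if the model is full. */
static int model_call_lazy(struct model_cblist *q, model_cb_t fn, void *arg)
{
	assert(q && fn);
	if (q->len == MODEL_MAX_CBS)
		return -1;
	q->fn[q->len] = fn;
	q->arg[q->len] = arg;
	q->len++;
	return 0;
}

/* Hurry enqueue: park the callback, then kick the whole queue. */
static int model_call_hurry(struct model_cblist *q, model_cb_t fn, void *arg)
{
	if (model_call_lazy(q, fn, arg))
		return -1;
	model_flush(q);
	return 0;
}

/* Sample callback: count invocations through an int pointer. */
static void model_count_cb(void *arg)
{
	(*(int *)arg)++;
}
```

In this model, two model_call_lazy() calls followed by one
model_call_hurry() call run all three callbacks in a single flush, which
mirrors the idea that a hurried callback lets the already-queued lazy ones
share its grace period.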

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one on the
percpu refcounter's "per-CPU to atomic switch" code path, which
uses RCU when switching to atomic mode.  The enqueued callback
wakes up waiters waiting in the percpu_ref_switch_waitq.  Allowing
this callback to be lazy would result in unacceptable slowdowns for
users of per-CPU refcounts, such as blk_pre_runtime_suspend().

Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry()
in order to revert to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: <linux-mm@kvack.org>
---
 lib/percpu-refcount.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index e5c5315da2741..668f6aa6a75de 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -230,7 +230,8 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
 		percpu_ref_noop_confirm_switch;
 
 	percpu_ref_get(ref);	/* put after confirmation */
-	call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
+	call_rcu_hurry(&ref->data->rcu,
+		       percpu_ref_switch_to_atomic_rcu);
 }
 
 static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry()
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (11 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:18   ` Joel Fernandes
  2022-11-30 19:43   ` Tejun Heo
  2022-11-30 18:13 ` [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
                   ` (2 subsequent siblings)
  15 siblings, 2 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Uladzislau Rezki,
	Joel Fernandes, Tejun Heo, Lai Jiangshan, Paul E . McKenney

From: Uladzislau Rezki <urezki@gmail.com>

Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime, which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one
in queue_rcu_work(), given that callers to queue_rcu_work() are
not necessarily OK with long delays.

Therefore, make queue_rcu_work() use call_rcu_hurry() in order to revert
to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7cd5f5e7e0a1b..07895deca2711 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1771,7 +1771,7 @@ bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork)
 
 	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
 		rwork->wq = wq;
-		call_rcu(&rwork->rcu, rcu_work_rcufn);
+		call_rcu_hurry(&rwork->rcu, rcu_work_rcufn);
 		return true;
 	}
 
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (12 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry() Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:16   ` Joel Fernandes
  2022-11-30 18:13 ` [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release() Paul E. McKenney
  2022-11-30 18:13 ` [PATCH rcu 16/16] net: devinet: Reduce refcount before grace period Paul E. McKenney
  15 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	David Howells, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-afs, netdev, Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime, which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one
in rxrpc_kill_connection(), which sometimes does a wakeup
that should not be unduly delayed.

Therefore, make rxrpc_kill_connection() use call_rcu_hurry() in order
to revert to the old behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <linux-afs@lists.infradead.org>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 net/rxrpc/conn_object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 22089e37e97f0..9c5fae9ca106c 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
 	 * must carry a ref on the connection to prevent us getting here whilst
 	 * it is queued or running.
 	 */
-	call_rcu(&conn->rcu, rxrpc_destroy_connection);
+	call_rcu_hurry(&conn->rcu, rxrpc_destroy_connection);
 }
 
 /*
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release()
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (13 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  2022-11-30 18:16   ` Joel Fernandes
  2022-11-30 18:13 ` [PATCH rcu 16/16] net: devinet: Reduce refcount before grace period Paul E. McKenney
  15 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	David Ahern, David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI,
	Jakub Kicinski, Paolo Abeni, netdev, Paul E . McKenney

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

On ChromeOS, kernels built with the new CONFIG_RCU_LAZY=y Kconfig option
fail a networking test in the teardown phase.

This failure may be reproduced as follows: ip netns del <name>

The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
in this series for the benefit of certain battery-powered systems.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime, which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

Returning to the test failure, use of ftrace showed that this failure
was caused by the added delays due to this new lazy behavior of
call_rcu() in kernels built with CONFIG_RCU_LAZY=y.

Therefore, make dst_release() use call_rcu_hurry() in order to revert
to the old test-failure-free behavior.

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 net/core/dst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index bc9c9be4e0801..a4e738d321ba2 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
 					     __func__, dst, newrefcnt);
 		if (!newrefcnt)
-			call_rcu(&dst->rcu_head, dst_destroy_rcu);
+			call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
 	}
 }
 EXPORT_SYMBOL(dst_release);
-- 
2.31.1.189.g2e36527f23



* [PATCH rcu 16/16] net: devinet: Reduce refcount before grace period
  2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
                   ` (14 preceding siblings ...)
  2022-11-30 18:13 ` [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release() Paul E. McKenney
@ 2022-11-30 18:13 ` Paul E. McKenney
  15 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 18:13 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, rostedt, Eric Dumazet, Joel Fernandes,
	David Ahern, David S. Miller, Hideaki YOSHIFUJI, Jakub Kicinski,
	Paolo Abeni, netdev, Paul E . McKenney

From: Eric Dumazet <edumazet@google.com>

Currently, the inetdev_destroy() function waits for an RCU grace period
before decrementing the refcount and freeing memory. This causes a delay
with a new RCU configuration that tries to save power, which results in the
network interface disappearing later than expected. The resulting delay
causes test failures on ChromeOS.

Refactor the code such that the refcount is dropped before the grace period
and memory is freed after it. With this change, a ChromeOS network test that
does 'ip netns del' and polls for an interface disappearing now passes.
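
The ordering change can be illustrated with a small user-space model.  This
is a sketch under stated assumptions, not the kernel code: the structures
and functions below (model_netdev, model_indev, model_indev_put(),
model_grace_period_end()) are hypothetical, and the model assumes that the
final put is what releases the device reference that observers are waiting
on.  Only the memory free is deferred to the end of the grace period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; these are not kernel structures. */
#define MODEL_MAX_DEFERRED 8

struct model_netdev {
	int refs;		/* references still pinning the device */
};

struct model_indev {
	struct model_netdev *dev;
	int refs;
};

/* Objects whose memory may only be released after a grace period. */
static struct model_indev *model_deferred[MODEL_MAX_DEFERRED];
static size_t model_ndeferred;

/*
 * Drop one reference.  On the last put, release the device reference
 * right away and defer only the free.
 */
static void model_indev_put(struct model_indev *idev)
{
	if (--idev->refs)
		return;
	idev->dev->refs--;	/* observable immediately */
	assert(model_ndeferred < MODEL_MAX_DEFERRED);
	model_deferred[model_ndeferred++] = idev;
}

/* Stand-in for the end of a grace period: free what was deferred. */
static size_t model_grace_period_end(void)
{
	size_t i, n = model_ndeferred;

	for (i = 0; i < n; i++)
		free(model_deferred[i]);
	model_ndeferred = 0;
	return n;
}
```

In this model the device reference count reaches zero as soon as the last
model_indev_put() returns, regardless of how long the grace period takes,
which is the property the polling test depends on.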

Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 net/ipv4/devinet.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e8b9a9202fecd..b0acf6e19aed3 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -234,13 +234,20 @@ static void inet_free_ifa(struct in_ifaddr *ifa)
 	call_rcu(&ifa->rcu_head, inet_rcu_free_ifa);
 }
 
+static void in_dev_free_rcu(struct rcu_head *head)
+{
+	struct in_device *idev = container_of(head, struct in_device, rcu_head);
+
+	kfree(rcu_dereference_protected(idev->mc_hash, 1));
+	kfree(idev);
+}
+
 void in_dev_finish_destroy(struct in_device *idev)
 {
 	struct net_device *dev = idev->dev;
 
 	WARN_ON(idev->ifa_list);
 	WARN_ON(idev->mc_list);
-	kfree(rcu_dereference_protected(idev->mc_hash, 1));
 #ifdef NET_REFCNT_DEBUG
 	pr_debug("%s: %p=%s\n", __func__, idev, dev ? dev->name : "NIL");
 #endif
@@ -248,7 +255,7 @@ void in_dev_finish_destroy(struct in_device *idev)
 	if (!idev->dead)
 		pr_err("Freeing alive in_device %p\n", idev);
 	else
-		kfree(idev);
+		call_rcu(&idev->rcu_head, in_dev_free_rcu);
 }
 EXPORT_SYMBOL(in_dev_finish_destroy);
 
@@ -298,12 +305,6 @@ static struct in_device *inetdev_init(struct net_device *dev)
 	goto out;
 }
 
-static void in_dev_rcu_put(struct rcu_head *head)
-{
-	struct in_device *idev = container_of(head, struct in_device, rcu_head);
-	in_dev_put(idev);
-}
-
 static void inetdev_destroy(struct in_device *in_dev)
 {
 	struct net_device *dev;
@@ -328,7 +329,7 @@ static void inetdev_destroy(struct in_device *in_dev)
 	neigh_parms_release(&arp_tbl, in_dev->arp_parms);
 	arp_ifdown(dev);
 
-	call_rcu(&in_dev->rcu_head, in_dev_rcu_put);
+	in_dev_put(in_dev);
 }
 
 int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b)
-- 
2.31.1.189.g2e36527f23



* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 18:13 ` [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
@ 2022-11-30 18:16   ` Joel Fernandes
  2022-11-30 18:37     ` Eric Dumazet
  2022-11-30 19:09     ` David Howells
  0 siblings, 2 replies; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 18:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, David Howells,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

Hi Eric,

Could you give your ACK for this patch?

The networking testing passed on ChromeOS and it has been in -next for
some time so has gotten testing there. The CONFIG option is default
disabled.

Thanks a lot,

- Joel

On Wed, Nov 30, 2022 at 6:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>
> Earlier commits in this series allow battery-powered systems to build
> their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> This Kconfig option causes call_rcu() to delay its callbacks in order
> to batch them.  This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime, which
> can be a very good thing.  This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
>
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
>
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory.  If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order.  Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
>
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked.  It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU.  After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
>
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
>
> And another call_rcu() instance that cannot be lazy is the one
> in rxrpc_kill_connection(), which sometimes does a wakeup
> that should not be unduly delayed.
>
> Therefore, make rxrpc_kill_connection() use call_rcu_hurry() in order
> to revert to the old behavior.
>
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Marc Dionne <marc.dionne@auristor.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: <linux-afs@lists.infradead.org>
> Cc: <netdev@vger.kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  net/rxrpc/conn_object.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
> index 22089e37e97f0..9c5fae9ca106c 100644
> --- a/net/rxrpc/conn_object.c
> +++ b/net/rxrpc/conn_object.c
> @@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
>          * must carry a ref on the connection to prevent us getting here whilst
>          * it is queued or running.
>          */
> -       call_rcu(&conn->rcu, rxrpc_destroy_connection);
> +       call_rcu_hurry(&conn->rcu, rxrpc_destroy_connection);
>  }
>
>  /*
> --
> 2.31.1.189.g2e36527f23
>


* Re: [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release()
  2022-11-30 18:13 ` [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release() Paul E. McKenney
@ 2022-11-30 18:16   ` Joel Fernandes
  2022-11-30 18:39     ` Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 18:16 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, David Ahern,
	David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, Jakub Kicinski,
	Paolo Abeni, netdev

Hi Eric,

Could you give your ACK for this one as well? This is the other
networking one.

The networking testing passed on ChromeOS and it has been in -next for
some time so has gotten testing there. The CONFIG option is default
disabled.

Thanks a lot,

- Joel

On Wed, Nov 30, 2022 at 6:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>
> On ChromeOS, kernels built with the new CONFIG_RCU_LAZY=y Kconfig option
> fail a networking test in the teardown phase.
>
> This failure may be reproduced as follows: ip netns del <name>
>
> The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
> in this series for the benefit of certain battery-powered systems.
> This Kconfig option causes call_rcu() to delay its callbacks in order
> to batch them.  This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime, which
> can be a very good thing.  This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
>
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
>
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory.  If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order.  Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
>
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked.  It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU.  After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
>
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
>
> Returning to the test failure, use of ftrace showed that this failure
> was caused by the added delays due to this new lazy behavior of
> call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
>
> Therefore, make dst_release() use call_rcu_hurry() in order to revert
> to the old test-failure-free behavior.
>
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: <netdev@vger.kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  net/core/dst.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index bc9c9be4e0801..a4e738d321ba2 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
>                         net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
>                                              __func__, dst, newrefcnt);
>                 if (!newrefcnt)
> -                       call_rcu(&dst->rcu_head, dst_destroy_rcu);
> +                       call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
>         }
>  }
>  EXPORT_SYMBOL(dst_release);
> --
> 2.31.1.189.g2e36527f23
>


* Re: [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry()
  2022-11-30 18:13 ` [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry() Paul E. McKenney
@ 2022-11-30 18:18   ` Joel Fernandes
  2022-11-30 19:43   ` Tejun Heo
  1 sibling, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 18:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Uladzislau Rezki,
	Tejun Heo, Lai Jiangshan

Hi Tejun,

The API is renamed to call_rcu_hurry() as you and Paul discussed, to
avoid conflicts with the word flush. Could you give your ACK for this
patch, for workqueue?

Thanks a lot,

- Joel

On Wed, Nov 30, 2022 at 6:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> From: Uladzislau Rezki <urezki@gmail.com>
>
> Earlier commits in this series allow battery-powered systems to build
> their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> This Kconfig option causes call_rcu() to delay its callbacks in order
> to batch them.  This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime which
> can be a very good thing.  This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
>
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
>
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory.  If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order.  Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
>
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked.  It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU.  After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
>
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
>
> And another call_rcu() instance that cannot be lazy is the one
> in queue_rcu_work(), given that callers to queue_rcu_work() are
> not necessarily OK with long delays.
>
> Therefore, make queue_rcu_work() use call_rcu_hurry() in order to revert
> to the old behavior.
>
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
>
> Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  kernel/workqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 7cd5f5e7e0a1b..07895deca2711 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1771,7 +1771,7 @@ bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork)
>
>         if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
>                 rwork->wq = wq;
> -               call_rcu(&rwork->rcu, rcu_work_rcufn);
> +               call_rcu_hurry(&rwork->rcu, rcu_work_rcufn);
>                 return true;
>         }
>
> --
> 2.31.1.189.g2e36527f23
>
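
For readers following the lazy-versus-hurry distinction described in the
commit log above, here is a minimal userspace sketch of the idea.  It is
a toy model under stated assumptions, not the kernel implementation:
every name prefixed with toy_ is hypothetical, the queue is a fixed-size
array, and the timeout that eventually flushes lazy callbacks in a real
CONFIG_RCU_LAZY=y kernel is left out.  It only shows that lazily queued
callbacks sit parked until a hurry callback arrives, at which point one
grace period covers all of them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CBS 8	/* hypothetical capacity of the toy queue */

typedef void (*toy_cb_fn)(void *arg);

struct toy_cb {
	toy_cb_fn fn;
	void *arg;
};

/* Toy stand-in for one CPU's callback list (not a kernel structure). */
struct toy_cpu {
	struct toy_cb pending[TOY_MAX_CBS];
	size_t npending;		/* callbacks currently parked */
	unsigned int grace_periods;	/* grace periods modeled so far */
};

/* Model one grace period: invoke and drain everything that is queued. */
static void toy_run_grace_period(struct toy_cpu *cpu)
{
	size_t i;

	cpu->grace_periods++;
	for (i = 0; i < cpu->npending; i++)
		cpu->pending[i].fn(cpu->pending[i].arg);
	cpu->npending = 0;
}

/* Append a callback to the toy queue. */
static void toy_enqueue(struct toy_cpu *cpu, toy_cb_fn fn, void *arg)
{
	assert(cpu->npending < TOY_MAX_CBS);
	cpu->pending[cpu->npending].fn = fn;
	cpu->pending[cpu->npending].arg = arg;
	cpu->npending++;
}

/* Lazy enqueue: park the callback without starting a grace period. */
static void toy_call_rcu(struct toy_cpu *cpu, toy_cb_fn fn, void *arg)
{
	toy_enqueue(cpu, fn, arg);
}

/* Hurry enqueue: queue the callback and kick everything already parked. */
static void toy_call_rcu_hurry(struct toy_cpu *cpu, toy_cb_fn fn, void *arg)
{
	toy_enqueue(cpu, fn, arg);
	toy_run_grace_period(cpu);
}

/* Example callback: bump a counter supplied by the caller. */
static void toy_count_cb(void *arg)
{
	(*(int *)arg)++;
}
```

In this model, two toy_call_rcu() calls followed by a single
toy_call_rcu_hurry() call result in one modeled grace period that
invokes all three callbacks, which is the batching effect the commit
log is after.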


* Re: [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch
  2022-11-30 18:13 ` [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch Paul E. McKenney
@ 2022-11-30 18:19   ` Joel Fernandes
  2022-11-30 19:43   ` Tejun Heo
  1 sibling, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 18:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Dennis Zhou, Tejun Heo,
	Christoph Lameter, linux-mm

Hi Tejun,

Could you give your ACK for this patch, for percpu refcount? The API
is renamed like in the workqueue one, as well.

Thanks a lot,

- Joel


On Wed, Nov 30, 2022 at 6:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>
> Earlier commits in this series allow battery-powered systems to build
> their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> This Kconfig option causes call_rcu() to delay its callbacks in order to
> batch callbacks.  This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime which
> can be a very good thing.  This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
>
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
>
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory.  If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order.  Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
>
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked.  It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU.  After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
>
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
>
> And another call_rcu() instance that cannot be lazy is the one on the
> percpu refcounter's "per-CPU to atomic switch" code path, which
> uses RCU when switching to atomic mode.  The enqueued callback
> wakes up waiters waiting in the percpu_ref_switch_waitq.  Allowing
> this callback to be lazy would result in unacceptable slowdowns for
> users of per-CPU refcounts, such as blk_pre_runtime_suspend().
>
> Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry()
> in order to revert to the old behavior.
>
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: <linux-mm@kvack.org>
> ---
>  lib/percpu-refcount.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index e5c5315da2741..668f6aa6a75de 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -230,7 +230,8 @@ static void __percpu_ref_switch_to_atomic(struct percpu_ref *ref,
>                 percpu_ref_noop_confirm_switch;
>
>         percpu_ref_get(ref);    /* put after confirmation */
> -       call_rcu(&ref->data->rcu, percpu_ref_switch_to_atomic_rcu);
> +       call_rcu_hurry(&ref->data->rcu,
> +                      percpu_ref_switch_to_atomic_rcu);
>  }
>
>  static void __percpu_ref_switch_to_percpu(struct percpu_ref *ref)
> --
> 2.31.1.189.g2e36527f23
>


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 18:16   ` Joel Fernandes
@ 2022-11-30 18:37     ` Eric Dumazet
  2022-11-30 21:45       ` Paul E. McKenney
  2022-11-30 19:09     ` David Howells
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2022-11-30 18:37 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	David Howells, Marc Dionne, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

Ah, I see a slightly better name has been chosen ;)

Reviewed-by: Eric Dumazet <edumazet@google.com>

On Wed, Nov 30, 2022 at 7:16 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hi Eric,
>
> Could you give your ACK for this patch?
>
> The networking testing passed on ChromeOS and it has been in -next for
> some time so has gotten testing there. The CONFIG option is default
> disabled.
>
> Thanks a lot,
>
> - Joel
>
> On Wed, Nov 30, 2022 at 6:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> >
> > Earlier commits in this series allow battery-powered systems to build
> > their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> > This Kconfig option causes call_rcu() to delay its callbacks in order
> > to batch them.  This means that a given RCU grace period covers more
> > callbacks, thus reducing the number of grace periods, in turn reducing
> > the amount of energy consumed, which increases battery lifetime which
> > can be a very good thing.  This is not a subtle effect: In some important
> > use cases, the battery lifetime is increased by more than 10%.
> >
> > This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> > callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> > parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> >
> > Delaying callbacks is normally not a problem because most callbacks do
> > nothing but free memory.  If the system is short on memory, a shrinker
> > will kick all currently queued lazy callbacks out of their laziness,
> > thus freeing their memory in short order.  Similarly, the rcu_barrier()
> > function, which blocks until all currently queued callbacks are invoked,
> > will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> > in a timely manner.
> >
> > However, there are some cases where laziness is not a good option.
> > For example, synchronize_rcu() invokes call_rcu(), and blocks until
> > the newly queued callback is invoked.  It would not be good for
> > synchronize_rcu() to block for ten seconds, even on an idle system.
> > Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> > call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> > given CPU kicks any lazy callbacks that might be already queued on that
> > CPU.  After all, if there is going to be a grace period, all callbacks
> > might as well get full benefit from it.
> >
> > Yes, this could be done the other way around by creating a
> > call_rcu_lazy(), but earlier experience with this approach and
> > feedback at the 2022 Linux Plumbers Conference shifted the approach
> > to call_rcu() being lazy with call_rcu_hurry() for the few places
> > where laziness is inappropriate.
> >
> > And another call_rcu() instance that cannot be lazy is the one
> > in rxrpc_kill_connection(), which sometimes does a wakeup
> > that should not be unduly delayed.
> >
> > Therefore, make rxrpc_kill_connection() use call_rcu_hurry() in order
> > to revert to the old behavior.
> >
> > [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Marc Dionne <marc.dionne@auristor.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: <linux-afs@lists.infradead.org>
> > Cc: <netdev@vger.kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  net/rxrpc/conn_object.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
> > index 22089e37e97f0..9c5fae9ca106c 100644
> > --- a/net/rxrpc/conn_object.c
> > +++ b/net/rxrpc/conn_object.c
> > @@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
> >          * must carry a ref on the connection to prevent us getting here whilst
> >          * it is queued or running.
> >          */
> > -       call_rcu(&conn->rcu, rxrpc_destroy_connection);
> > +       call_rcu_hurry(&conn->rcu, rxrpc_destroy_connection);
> >  }
> >
> >  /*
> > --
> > 2.31.1.189.g2e36527f23
> >


* Re: [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release()
  2022-11-30 18:16   ` Joel Fernandes
@ 2022-11-30 18:39     ` Eric Dumazet
  2022-11-30 18:50       ` Joel Fernandes
  2022-11-30 21:40       ` Paul E. McKenney
  0 siblings, 2 replies; 40+ messages in thread
From: Eric Dumazet @ 2022-11-30 18:39 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	David Ahern, David S. Miller, Hideaki YOSHIFUJI, Jakub Kicinski,
	Paolo Abeni, netdev

Sure, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>

I think we can work later to change how dst are freed/released to
avoid using call_rcu_hurry()

On Wed, Nov 30, 2022 at 7:17 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hi Eric,
>
> Could you give your ACK for this one as well? This is
> the other networking one.
>
> The networking testing passed on ChromeOS and it has been in -next for
> some time so has gotten testing there. The CONFIG option is default
> disabled.
>
> Thanks a lot,
>
> - Joel
>
> On Wed, Nov 30, 2022 at 6:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> >
> > In a networking test on ChromeOS, kernels built with the new
> > CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown
> > phase.
> >
> > This failure may be reproduced as follows: ip netns del <name>
> >
> > The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
> > in this series for the benefit of certain battery-powered systems.
> > This Kconfig option causes call_rcu() to delay its callbacks in order
> > to batch them.  This means that a given RCU grace period covers more
> > callbacks, thus reducing the number of grace periods, in turn reducing
> > the amount of energy consumed, which increases battery lifetime which
> > can be a very good thing.  This is not a subtle effect: In some important
> > use cases, the battery lifetime is increased by more than 10%.
> >
> > This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> > callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> > parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> >
> > Delaying callbacks is normally not a problem because most callbacks do
> > nothing but free memory.  If the system is short on memory, a shrinker
> > will kick all currently queued lazy callbacks out of their laziness,
> > thus freeing their memory in short order.  Similarly, the rcu_barrier()
> > function, which blocks until all currently queued callbacks are invoked,
> > will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> > in a timely manner.
> >
> > However, there are some cases where laziness is not a good option.
> > For example, synchronize_rcu() invokes call_rcu(), and blocks until
> > the newly queued callback is invoked.  It would not be good for
> > synchronize_rcu() to block for ten seconds, even on an idle system.
> > Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> > call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> > given CPU kicks any lazy callbacks that might be already queued on that
> > CPU.  After all, if there is going to be a grace period, all callbacks
> > might as well get full benefit from it.
> >
> > Yes, this could be done the other way around by creating a
> > call_rcu_lazy(), but earlier experience with this approach and
> > feedback at the 2022 Linux Plumbers Conference shifted the approach
> > to call_rcu() being lazy with call_rcu_hurry() for the few places
> > where laziness is inappropriate.
> >
> > Returning to the test failure, use of ftrace showed that this failure
> > was caused by the added delays due to this new lazy behavior of
> > call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
> >
> > Therefore, make dst_release() use call_rcu_hurry() in order to revert
> > to the old test-failure-free behavior.
> >
> > [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Cc: David Ahern <dsahern@kernel.org>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: <netdev@vger.kernel.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > ---
> >  net/core/dst.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/dst.c b/net/core/dst.c
> > index bc9c9be4e0801..a4e738d321ba2 100644
> > --- a/net/core/dst.c
> > +++ b/net/core/dst.c
> > @@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
> >                         net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
> >                                              __func__, dst, newrefcnt);
> >                 if (!newrefcnt)
> > -                       call_rcu(&dst->rcu_head, dst_destroy_rcu);
> > +                       call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> >         }
> >  }
> >  EXPORT_SYMBOL(dst_release);
> > --
> > 2.31.1.189.g2e36527f23
> >


* Re: [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release()
  2022-11-30 18:39     ` Eric Dumazet
@ 2022-11-30 18:50       ` Joel Fernandes
  2022-11-30 21:40       ` Paul E. McKenney
  1 sibling, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 18:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	David Ahern, David S. Miller, Hideaki YOSHIFUJI, Jakub Kicinski,
	Paolo Abeni, netdev

Hi Eric,

On Wed, Nov 30, 2022 at 6:39 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Sure, thanks.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> I think we can work later to change how dst are freed/released to
> avoid using call_rcu_hurry()

That sounds great. If you can give me any high-level guidance (in the
future) on that and what to look for, I can give it a try as well. I
have been wanting to learn more about the networking code :-)

Thanks,

 - Joel


> On Wed, Nov 30, 2022 at 7:17 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > Hi Eric,
> >
> > Could you give your ACK for this one as well? This is
> > the other networking one.
> >
> > The networking testing passed on ChromeOS and it has been in -next for
> > some time so has gotten testing there. The CONFIG option is default
> > disabled.
> >
> > Thanks a lot,
> >
> > - Joel
> >
> > On Wed, Nov 30, 2022 at 6:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > >
> > > In a networking test on ChromeOS, kernels built with the new
> > > CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown
> > > phase.
> > >
> > > This failure may be reproduced as follows: ip netns del <name>
> > >
> > > The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
> > > in this series for the benefit of certain battery-powered systems.
> > > This Kconfig option causes call_rcu() to delay its callbacks in order
> > > to batch them.  This means that a given RCU grace period covers more
> > > callbacks, thus reducing the number of grace periods, in turn reducing
> > > the amount of energy consumed, which increases battery lifetime which
> > > can be a very good thing.  This is not a subtle effect: In some important
> > > use cases, the battery lifetime is increased by more than 10%.
> > >
> > > This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> > > callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> > > parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> > >
> > > Delaying callbacks is normally not a problem because most callbacks do
> > > nothing but free memory.  If the system is short on memory, a shrinker
> > > will kick all currently queued lazy callbacks out of their laziness,
> > > thus freeing their memory in short order.  Similarly, the rcu_barrier()
> > > function, which blocks until all currently queued callbacks are invoked,
> > > will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> > > in a timely manner.
> > >
> > > However, there are some cases where laziness is not a good option.
> > > For example, synchronize_rcu() invokes call_rcu(), and blocks until
> > > the newly queued callback is invoked.  It would not be good for
> > > synchronize_rcu() to block for ten seconds, even on an idle system.
> > > Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> > > call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> > > given CPU kicks any lazy callbacks that might be already queued on that
> > > CPU.  After all, if there is going to be a grace period, all callbacks
> > > might as well get full benefit from it.
> > >
> > > Yes, this could be done the other way around by creating a
> > > call_rcu_lazy(), but earlier experience with this approach and
> > > feedback at the 2022 Linux Plumbers Conference shifted the approach
> > > to call_rcu() being lazy with call_rcu_hurry() for the few places
> > > where laziness is inappropriate.
> > >
> > > Returning to the test failure, use of ftrace showed that this failure
> > > was caused by the added delays due to this new lazy behavior of
> > > call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
> > >
> > > Therefore, make dst_release() use call_rcu_hurry() in order to revert
> > > to the old test-failure-free behavior.
> > >
> > > [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Cc: David Ahern <dsahern@kernel.org>
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > Cc: <netdev@vger.kernel.org>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  net/core/dst.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/dst.c b/net/core/dst.c
> > > index bc9c9be4e0801..a4e738d321ba2 100644
> > > --- a/net/core/dst.c
> > > +++ b/net/core/dst.c
> > > @@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
> > >                         net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
> > >                                              __func__, dst, newrefcnt);
> > >                 if (!newrefcnt)
> > > -                       call_rcu(&dst->rcu_head, dst_destroy_rcu);
> > > +                       call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> > >         }
> > >  }
> > >  EXPORT_SYMBOL(dst_release);
> > > --
> > > 2.31.1.189.g2e36527f23
> > >


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 18:16   ` Joel Fernandes
  2022-11-30 18:37     ` Eric Dumazet
@ 2022-11-30 19:09     ` David Howells
  2022-11-30 19:20       ` Joel Fernandes
                         ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: David Howells @ 2022-11-30 19:09 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: dhowells, Paul E. McKenney, rcu, linux-kernel, kernel-team,
	rostedt, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-afs, netdev

Note that this conflicts with my patch:

	rxrpc: Don't hold a ref for connection workqueue
	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229

which should render it unnecessary.  It's a little ahead of yours in the
net-next queue, if that means anything.

David



* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 19:09     ` David Howells
@ 2022-11-30 19:20       ` Joel Fernandes
  2022-11-30 21:43         ` Paul E. McKenney
  2022-11-30 20:12       ` Paul E. McKenney
  2022-11-30 22:47       ` Joel Fernandes
  2 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 19:20 UTC (permalink / raw)
  To: David Howells
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev



> On Nov 30, 2022, at 2:09 PM, David Howells <dhowells@redhat.com> wrote:
> 
> Note that this conflicts with my patch:

Oh.  I don’t see any review or Ack tags on it. Is it still under review?

Thanks,

- Joel



> 
>    rxrpc: Don't hold a ref for connection workqueue
>    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
> 
> which should render it unnecessary.  It's a little ahead of yours in the
> net-next queue, if that means anything.
> 
> David
> 


* Re: [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry()
  2022-11-30 18:13 ` [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry() Paul E. McKenney
  2022-11-30 18:18   ` Joel Fernandes
@ 2022-11-30 19:43   ` Tejun Heo
  1 sibling, 0 replies; 40+ messages in thread
From: Tejun Heo @ 2022-11-30 19:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Uladzislau Rezki,
	Joel Fernandes, Lai Jiangshan

On Wed, Nov 30, 2022 at 10:13:22AM -0800, Paul E. McKenney wrote:
> From: Uladzislau Rezki <urezki@gmail.com>
> 
> Earlier commits in this series allow battery-powered systems to build
> their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> This Kconfig option causes call_rcu() to delay its callbacks in order
> to batch them.  This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime which
> can be a very good thing.  This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
> 
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> 
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory.  If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order.  Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
> 
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked.  It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU.  After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
> 
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
> 
> And another call_rcu() instance that cannot be lazy is the one
> in queue_rcu_work(), given that callers to queue_rcu_work() are
> not necessarily OK with long delays.
> 
> Therefore, make queue_rcu_work() use call_rcu_hurry() in order to revert
> to the old behavior.
> 
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> 
> Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch
  2022-11-30 18:13 ` [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch Paul E. McKenney
  2022-11-30 18:19   ` Joel Fernandes
@ 2022-11-30 19:43   ` Tejun Heo
  2022-11-30 21:44     ` Paul E. McKenney
  1 sibling, 1 reply; 40+ messages in thread
From: Tejun Heo @ 2022-11-30 19:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Dennis Zhou, Christoph Lameter, linux-mm

On Wed, Nov 30, 2022 at 10:13:21AM -0800, Paul E. McKenney wrote:
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> 
> Earlier commits in this series allow battery-powered systems to build
> their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> This Kconfig option causes call_rcu() to delay its callbacks in order to
> batch callbacks.  This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime which
> can be a very good thing.  This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
> 
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> 
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory.  If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order.  Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
> 
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked.  It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU.  After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
> 
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
> 
> And another call_rcu() instance that cannot be lazy is the one on the
> percpu refcounter's "per-CPU to atomic switch" code path, which
> uses RCU when switching to atomic mode.  The enqueued callback
> wakes up waiters waiting in the percpu_ref_switch_waitq.  Allowing
> this callback to be lazy would result in unacceptable slowdowns for
> users of per-CPU refcounts, such as blk_pre_runtime_suspend().
> 
> Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry()
> in order to revert to the old behavior.
> 
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Dennis Zhou <dennis@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: <linux-mm@kvack.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 19:09     ` David Howells
  2022-11-30 19:20       ` Joel Fernandes
@ 2022-11-30 20:12       ` Paul E. McKenney
  2022-11-30 22:47       ` Joel Fernandes
  2 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 20:12 UTC (permalink / raw)
  To: David Howells
  Cc: Joel Fernandes, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, Nov 30, 2022 at 07:09:04PM +0000, David Howells wrote:
> Note that this conflicts with my patch:
> 
> 	rxrpc: Don't hold a ref for connection workqueue
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
> 
> which should render it unnecessary.  It's a little ahead of yours in the
> net-next queue, if that means anything.

OK, I will drop this patch in favor of yours, thank you!

							Thanx, Paul


* Re: [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release()
  2022-11-30 18:39     ` Eric Dumazet
  2022-11-30 18:50       ` Joel Fernandes
@ 2022-11-30 21:40       ` Paul E. McKenney
  1 sibling, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 21:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Joel Fernandes, rcu, linux-kernel, kernel-team, rostedt,
	David Ahern, David S. Miller, Hideaki YOSHIFUJI, Jakub Kicinski,
	Paolo Abeni, netdev

On Wed, Nov 30, 2022 at 07:39:02PM +0100, Eric Dumazet wrote:
> Sure, thanks.
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied, thank you!!!

> I think we can work later to change how dst are freed/released to
> avoid using call_rcu_hurry()

Thank you for being willing to look into that!

							Thanx, Paul

> On Wed, Nov 30, 2022 at 7:17 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > Hi Eric,
> >
> > Could you give your ACK for this one as well? This is
> > the other networking one.
> >
> > The networking testing passed on ChromeOS and it has been in -next for
> > some time so has gotten testing there. The CONFIG option is default
> > disabled.
> >
> > Thanks a lot,
> >
> > - Joel
> >
> > On Wed, Nov 30, 2022 at 6:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > >
> > > In a networking test on ChromeOS, kernels built with the new
> > > CONFIG_RCU_LAZY=y Kconfig option fail in the teardown phase.
> > >
> > > This failure may be reproduced as follows: ip netns del <name>
> > >
> > > The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
> > > in this series for the benefit of certain battery-powered systems.
> > > This Kconfig option causes call_rcu() to delay its callbacks in order
> > > to batch them.  This means that a given RCU grace period covers more
> > > callbacks, thus reducing the number of grace periods, in turn reducing
> > > the amount of energy consumed, which increases battery lifetime, which
> > > can be a very good thing.  This is not a subtle effect: In some important
> > > use cases, the battery lifetime is increased by more than 10%.
> > >
> > > This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> > > callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> > > parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> > >
> > > Delaying callbacks is normally not a problem because most callbacks do
> > > nothing but free memory.  If the system is short on memory, a shrinker
> > > will kick all currently queued lazy callbacks out of their laziness,
> > > thus freeing their memory in short order.  Similarly, the rcu_barrier()
> > > function, which blocks until all currently queued callbacks are invoked,
> > > will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> > > in a timely manner.
> > >
> > > However, there are some cases where laziness is not a good option.
> > > For example, synchronize_rcu() invokes call_rcu(), and blocks until
> > > the newly queued callback is invoked.  It would not be good for
> > > synchronize_rcu() to block for ten seconds, even on an idle system.
> > > Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> > > call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> > > given CPU kicks any lazy callbacks that might be already queued on that
> > > CPU.  After all, if there is going to be a grace period, all callbacks
> > > might as well get full benefit from it.
> > >
> > > Yes, this could be done the other way around by creating a
> > > call_rcu_lazy(), but earlier experience with this approach and
> > > feedback at the 2022 Linux Plumbers Conference shifted the approach
> > > to call_rcu() being lazy with call_rcu_hurry() for the few places
> > > where laziness is inappropriate.
> > >
> > > Returning to the test failure, use of ftrace showed that this failure
> > > was caused by the added delays due to this new lazy behavior of
> > > call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
> > >
> > > Therefore, make dst_release() use call_rcu_hurry() in order to revert
> > > to the old test-failure-free behavior.
> > >
> > > [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Cc: David Ahern <dsahern@kernel.org>
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > Cc: <netdev@vger.kernel.org>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  net/core/dst.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/dst.c b/net/core/dst.c
> > > index bc9c9be4e0801..a4e738d321ba2 100644
> > > --- a/net/core/dst.c
> > > +++ b/net/core/dst.c
> > > @@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
> > >                         net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
> > >                                              __func__, dst, newrefcnt);
> > >                 if (!newrefcnt)
> > > -                       call_rcu(&dst->rcu_head, dst_destroy_rcu);
> > > +                       call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> > >         }
> > >  }
> > >  EXPORT_SYMBOL(dst_release);
> > > --
> > > 2.31.1.189.g2e36527f23
> > >


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 19:20       ` Joel Fernandes
@ 2022-11-30 21:43         ` Paul E. McKenney
  2022-11-30 22:06           ` Joel Fernandes
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 21:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: David Howells, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, Nov 30, 2022 at 02:20:52PM -0500, Joel Fernandes wrote:
> 
> 
> > On Nov 30, 2022, at 2:09 PM, David Howells <dhowells@redhat.com> wrote:
> > 
> > Note that this conflicts with my patch:
> 
> Oh.  I don’t see any review or Ack tags on it. Is it still under review?

So what I have done is to drop this patch from the series, but to also
preserve it for posterity at -rcu branch lazy-obsolete.2022.11.30a.

It looks like that wakeup is still delayed, but I could easily be
missing something.

Joel, could you please test the effects of having the current lazy branch,
but also David Howells's patch?  That way, if there is an issue, we can
work it sooner rather than later, and if it all works fine, we can stop
worrying about it.  ;-)

							Thanx, Paul

> Thanks,
> 
> - Joel
> 
> 
> 
> > 
> >    rxrpc: Don't hold a ref for connection workqueue
> >    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
> > 
> > which should render it unnecessary.  It's a little ahead of yours in the
> > net-next queue, if that means anything.
> > 
> > David
> > 


* Re: [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch
  2022-11-30 19:43   ` Tejun Heo
@ 2022-11-30 21:44     ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 21:44 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rcu, linux-kernel, kernel-team, rostedt, Joel Fernandes (Google),
	Dennis Zhou, Christoph Lameter, linux-mm

On Wed, Nov 30, 2022 at 09:43:44AM -1000, Tejun Heo wrote:
> On Wed, Nov 30, 2022 at 10:13:21AM -0800, Paul E. McKenney wrote:
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > 
> > Earlier commits in this series allow battery-powered systems to build
> > their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> > This Kconfig option causes call_rcu() to delay its callbacks in order to
> > batch callbacks.  This means that a given RCU grace period covers more
> > callbacks, thus reducing the number of grace periods, in turn reducing
> > the amount of energy consumed, which increases battery lifetime, which
> > can be a very good thing.  This is not a subtle effect: In some important
> > use cases, the battery lifetime is increased by more than 10%.
> > 
> > This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> > callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> > parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> > 
> > Delaying callbacks is normally not a problem because most callbacks do
> > nothing but free memory.  If the system is short on memory, a shrinker
> > will kick all currently queued lazy callbacks out of their laziness,
> > thus freeing their memory in short order.  Similarly, the rcu_barrier()
> > function, which blocks until all currently queued callbacks are invoked,
> > will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> > in a timely manner.
> > 
> > However, there are some cases where laziness is not a good option.
> > For example, synchronize_rcu() invokes call_rcu(), and blocks until
> > the newly queued callback is invoked.  It would not be good for
> > synchronize_rcu() to block for ten seconds, even on an idle system.
> > Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> > call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> > given CPU kicks any lazy callbacks that might be already queued on that
> > CPU.  After all, if there is going to be a grace period, all callbacks
> > might as well get full benefit from it.
> > 
> > Yes, this could be done the other way around by creating a
> > call_rcu_lazy(), but earlier experience with this approach and
> > feedback at the 2022 Linux Plumbers Conference shifted the approach
> > to call_rcu() being lazy with call_rcu_hurry() for the few places
> > where laziness is inappropriate.
> > 
> > And another call_rcu() instance that cannot be lazy is the one on the
> > percpu refcounter's "per-CPU to atomic switch" code path, which
> > uses RCU when switching to atomic mode.  The enqueued callback
> > wakes up waiters waiting in the percpu_ref_switch_waitq.  Allowing
> > this callback to be lazy would result in unacceptable slowdowns for
> > users of per-CPU refcounts, such as blk_pre_runtime_suspend().
> > 
> > Therefore, make __percpu_ref_switch_to_atomic() use call_rcu_hurry()
> > in order to revert to the old behavior.
> > 
> > [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Cc: Dennis Zhou <dennis@kernel.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: <linux-mm@kvack.org>
> 
> Acked-by: Tejun Heo <tj@kernel.org>

I applied both, thank you very much!

							Thanx, Paul


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 18:37     ` Eric Dumazet
@ 2022-11-30 21:45       ` Paul E. McKenney
  2022-11-30 21:49         ` Steven Rostedt
  0 siblings, 1 reply; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 21:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Joel Fernandes, rcu, linux-kernel, kernel-team, rostedt,
	David Howells, Marc Dionne, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, Nov 30, 2022 at 07:37:07PM +0100, Eric Dumazet wrote:
> Ah, I see a slightly better name has been chosen ;)

call_rcu_vite()?  call_rcu_tres_grande_vitesse()?  call_rcu_tgv()?

Sorry, couldn't resist!  ;-)

							Thanx, Paul

> Reviewed-by: Eric Dumazet <edumazet@google.com>
> 
> On Wed, Nov 30, 2022 at 7:16 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > Hi Eric,
> >
> > Could you give your ACK for this patch?
> >
> > The networking testing passed on ChromeOS and it has been in -next for
> > some time so has gotten testing there. The CONFIG option is default
> > disabled.
> >
> > Thanks a lot,
> >
> > - Joel
> >
> > On Wed, Nov 30, 2022 at 6:13 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > >
> > > Earlier commits in this series allow battery-powered systems to build
> > > their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
> > > This Kconfig option causes call_rcu() to delay its callbacks in order
> > > to batch them.  This means that a given RCU grace period covers more
> > > callbacks, thus reducing the number of grace periods, in turn reducing
> > > the amount of energy consumed, which increases battery lifetime, which
> > > can be a very good thing.  This is not a subtle effect: In some important
> > > use cases, the battery lifetime is increased by more than 10%.
> > >
> > > This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> > > callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> > > parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
> > >
> > > Delaying callbacks is normally not a problem because most callbacks do
> > > nothing but free memory.  If the system is short on memory, a shrinker
> > > will kick all currently queued lazy callbacks out of their laziness,
> > > thus freeing their memory in short order.  Similarly, the rcu_barrier()
> > > function, which blocks until all currently queued callbacks are invoked,
> > > will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> > > in a timely manner.
> > >
> > > However, there are some cases where laziness is not a good option.
> > > For example, synchronize_rcu() invokes call_rcu(), and blocks until
> > > the newly queued callback is invoked.  It would not be good for
> > > synchronize_rcu() to block for ten seconds, even on an idle system.
> > > Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> > > call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
> > > given CPU kicks any lazy callbacks that might be already queued on that
> > > CPU.  After all, if there is going to be a grace period, all callbacks
> > > might as well get full benefit from it.
> > >
> > > Yes, this could be done the other way around by creating a
> > > call_rcu_lazy(), but earlier experience with this approach and
> > > feedback at the 2022 Linux Plumbers Conference shifted the approach
> > > to call_rcu() being lazy with call_rcu_hurry() for the few places
> > > where laziness is inappropriate.
> > >
> > > And another call_rcu() instance that cannot be lazy is the one
> > > in rxrpc_kill_connection(), which sometimes does a wakeup
> > > that should not be unduly delayed.
> > >
> > > Therefore, make rxrpc_kill_connection() use call_rcu_hurry() in order
> > > to revert to the old behavior.
> > >
> > > [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Cc: David Howells <dhowells@redhat.com>
> > > Cc: Marc Dionne <marc.dionne@auristor.com>
> > > Cc: "David S. Miller" <davem@davemloft.net>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Jakub Kicinski <kuba@kernel.org>
> > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > Cc: <linux-afs@lists.infradead.org>
> > > Cc: <netdev@vger.kernel.org>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > ---
> > >  net/rxrpc/conn_object.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
> > > index 22089e37e97f0..9c5fae9ca106c 100644
> > > --- a/net/rxrpc/conn_object.c
> > > +++ b/net/rxrpc/conn_object.c
> > > @@ -253,7 +253,7 @@ void rxrpc_kill_connection(struct rxrpc_connection *conn)
> > >          * must carry a ref on the connection to prevent us getting here whilst
> > >          * it is queued or running.
> > >          */
> > > -       call_rcu(&conn->rcu, rxrpc_destroy_connection);
> > > +       call_rcu_hurry(&conn->rcu, rxrpc_destroy_connection);
> > >  }
> > >
> > >  /*
> > > --
> > > 2.31.1.189.g2e36527f23
> > >


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 21:45       ` Paul E. McKenney
@ 2022-11-30 21:49         ` Steven Rostedt
  2022-11-30 22:00           ` Paul E. McKenney
  0 siblings, 1 reply; 40+ messages in thread
From: Steven Rostedt @ 2022-11-30 21:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Joel Fernandes, rcu, linux-kernel, kernel-team,
	David Howells, Marc Dionne, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, 30 Nov 2022 13:45:52 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Wed, Nov 30, 2022 at 07:37:07PM +0100, Eric Dumazet wrote:
> > Ah, I see a slightly better name has been chosen ;)  
> 
> call_rcu_vite()?  call_rcu_tres_grande_vitesse()?  call_rcu_tgv()?
> 
> Sorry, couldn't resist!  ;-)
> 
>

  call_rcu_twitter_2_0()  ?

-- Steve


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 21:49         ` Steven Rostedt
@ 2022-11-30 22:00           ` Paul E. McKenney
  0 siblings, 0 replies; 40+ messages in thread
From: Paul E. McKenney @ 2022-11-30 22:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eric Dumazet, Joel Fernandes, rcu, linux-kernel, kernel-team,
	David Howells, Marc Dionne, David S. Miller, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, Nov 30, 2022 at 04:49:49PM -0500, Steven Rostedt wrote:
> On Wed, 30 Nov 2022 13:45:52 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Wed, Nov 30, 2022 at 07:37:07PM +0100, Eric Dumazet wrote:
> > > Ah, I see a slightly better name has been chosen ;)  
> > 
> > call_rcu_vite()?  call_rcu_tres_grande_vitesse()?  call_rcu_tgv()?
> > 
> > Sorry, couldn't resist!  ;-)
> 
>   call_rcu_twitter_2_0()  ?

call_rcu_grace_period_finishes_before_it_starts() ?

							Thanx, Paul


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 21:43         ` Paul E. McKenney
@ 2022-11-30 22:06           ` Joel Fernandes
  0 siblings, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 22:06 UTC (permalink / raw)
  To: paulmck
  Cc: David Howells, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev



> On Nov 30, 2022, at 4:43 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Wed, Nov 30, 2022 at 02:20:52PM -0500, Joel Fernandes wrote:
>> 
>> 
>>>> On Nov 30, 2022, at 2:09 PM, David Howells <dhowells@redhat.com> wrote:
>>> 
>>> Note that this conflicts with my patch:
>> 
>> Oh.  I don’t see any review or Ack tags on it. Is it still under review?
> 
> So what I have done is to drop this patch from the series, but to also
> preserve it for posterity at -rcu branch lazy-obsolete.2022.11.30a.
> 
> It looks like that wakeup is still delayed, but I could easily be
> missing something.
> 
> Joel, could you please test the effects of having the current lazy branch,
> but also David Howells's patch?  That way, if there is an issue, we can
> work it sooner rather than later, and if it all works fine, we can stop
> worrying about it.  ;-)

Sure, I will kick off the failing test and see if it passes with David's patch. Will let you know.

Thanks,

 - Joel


>                            Thanx, Paul
> 
>> Thanks,
>> 
>> - Joel
>> 
>> 
>> 
>>> 
>>>   rxrpc: Don't hold a ref for connection workqueue
>>>   https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
>>> 
>>> which should render it unnecessary.  It's a little ahead of yours in the
>>> net-next queue, if that means anything.
>>> 
>>> David
>>> 


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 19:09     ` David Howells
  2022-11-30 19:20       ` Joel Fernandes
  2022-11-30 20:12       ` Paul E. McKenney
@ 2022-11-30 22:47       ` Joel Fernandes
  2022-11-30 23:05         ` David Howells
  2 siblings, 1 reply; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 22:47 UTC (permalink / raw)
  To: David Howells
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

Hi David,

On Wed, Nov 30, 2022 at 7:09 PM David Howells <dhowells@redhat.com> wrote:
>
> Note that this conflicts with my patch:
>
>         rxrpc: Don't hold a ref for connection workqueue
>         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
>
> which should render it unnecessary.  It's a little ahead of yours in the
> net-next queue, if that means anything.

Could you clarify why it is unnecessary?

After your patch, you are still doing a wake up in your call_rcu() callback:

- ASSERTCMP(refcount_read(&conn->ref), ==, 0);
+ if (atomic_dec_and_test(&rxnet->nr_conns))
+    wake_up_var(&rxnet->nr_conns);
+}

Are you saying the code can now tolerate delays? What if the RCU
callback is invoked after an arbitrarily long delay, making the
sleeping process wait?

If you agree, you can convert the call_rcu() to call_rcu_hurry() in
your patch itself. Would you be willing to do that? If not, that's
totally OK and I can send a patch later once yours is in (after
further testing).
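
To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of a callback that wakes a waiter once a counter drops to
zero.  Every model_* name is hypothetical and only stands in for the
call_rcu() callback, the nr_conns counter, and the wake_up_var() call
quoted above; the flush stands in for whatever eventually ends the
laziness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every model_* name here is hypothetical. */
#define MODEL_MAX_CBS 8

struct model_net {
	int nr_conns;		/* connections not yet destroyed */
	bool waiter_woken;	/* stands in for wake_up_var() having run */
};

struct model_queue {
	struct model_net *pending[MODEL_MAX_CBS];
	size_t nr_pending;
};

/* Models the RCU callback: drop the count, wake the waiter on zero. */
void model_destroy_conn(struct model_net *net)
{
	if (--net->nr_conns == 0)
		net->waiter_woken = true;
}

/* Models a lazy call_rcu(): park the callback until something flushes. */
void model_call_lazy(struct model_queue *q, struct model_net *net)
{
	assert(q->nr_pending < MODEL_MAX_CBS);
	q->pending[q->nr_pending++] = net;
}

/* Models a flush: run every parked callback, then empty the queue. */
void model_flush(struct model_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr_pending; i++)
		model_destroy_conn(q->pending[i]);
	q->nr_pending = 0;
}
```

In this model the waiter stays asleep for as long as nothing flushes
the queue, which is the delay in question.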

Thanks,

 - Joel


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 22:47       ` Joel Fernandes
@ 2022-11-30 23:05         ` David Howells
  2022-11-30 23:15           ` Joel Fernandes
  2023-03-11 17:46           ` Joel Fernandes
  0 siblings, 2 replies; 40+ messages in thread
From: David Howells @ 2022-11-30 23:05 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: dhowells, Paul E. McKenney, rcu, linux-kernel, kernel-team,
	rostedt, Marc Dionne, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-afs, netdev

Joel Fernandes <joel@joelfernandes.org> wrote:

> > Note that this conflicts with my patch:
> >
> >         rxrpc: Don't hold a ref for connection workqueue
> >         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
> >
> > which should render it unnecessary.  It's a little ahead of yours in the
> > net-next queue, if that means anything.
> 
> Could you clarify why it is unnecessary?

Rather than tearing down parts of the connection it only logs a trace line,
frees the memory and decrements the counter on the namespace.  This is used to
account for all the pieces of memory allocated in that namespace being gone
before the namespace is removed, to check for leaks.  The RCU cleanup used to
use some other stuff (such as the peer hash) in the rxrpc_net struct but no
longer will after the patches I submitted.

> After your patch, you are still doing a wake up in your call_rcu() callback:
>
> - ASSERTCMP(refcount_read(&conn->ref), ==, 0);
> + if (atomic_dec_and_test(&rxnet->nr_conns))
> +    wake_up_var(&rxnet->nr_conns);
> +}
> 
> Are you saying the code can now tolerate delays? What if the RCU
> callback is invoked after an arbitrarily long delay, making the
> sleeping process wait?

True.  But that now only holds up the destruction of a net namespace and the
removal of the rxrpc module.

> If you agree, you can convert the call_rcu() to call_rcu_hurry() in
> your patch itself. Would you be willing to do that? If not, that's
> totally OK and I can send a patch later once yours is in (after
> further testing).

I can add it to part 4 (see my rxrpc-ringless-5 branch) if it is necessary.

David



* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 23:05         ` David Howells
@ 2022-11-30 23:15           ` Joel Fernandes
  2023-03-11 17:46           ` Joel Fernandes
  1 sibling, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2022-11-30 23:15 UTC (permalink / raw)
  To: David Howells
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, Nov 30, 2022 at 11:05 PM David Howells <dhowells@redhat.com> wrote:
>
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > > Note that this conflicts with my patch:
> > >
> > >         rxrpc: Don't hold a ref for connection workqueue
> > >         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=rxrpc-next&id=450b00011290660127c2d76f5c5ed264126eb229
> > >
> > > which should render it unnecessary.  It's a little ahead of yours in the
> > > net-next queue, if that means anything.
> >
> > Could you clarify why it is unnecessary?
>
> Rather than tearing down parts of the connection it only logs a trace line,
> frees the memory and decrements the counter on the namespace.  This is used to
> account for all the pieces of memory allocated in that namespace being gone
> before the namespace is removed, to check for leaks.  The RCU cleanup used to
> use some other stuff (such as the peer hash) in the rxrpc_net struct but no
> longer will after the patches I submitted.
>
> > After your patch, you are still doing a wake up in your call_rcu() callback:
> >
> > - ASSERTCMP(refcount_read(&conn->ref), ==, 0);
> > + if (atomic_dec_and_test(&rxnet->nr_conns))
> > +    wake_up_var(&rxnet->nr_conns);
> > +}
> >
> > Are you saying the code can now tolerate delays? What if the RCU
> > callback is invoked after an arbitrarily long delay, making the
> > sleeping process wait?
>
> True.  But that now only holds up the destruction of a net namespace and the
> removal of the rxrpc module.
>
> > If you agree, you can convert the call_rcu() to call_rcu_hurry() in
> > your patch itself. Would you be willing to do that? If not, that's
> > totally OK and I can send a patch later once yours is in (after
> > further testing).
>
> I can add it to part 4 (see my rxrpc-ringless-5 branch) if it is necessary.

Ok sounds good, on module removal the rcu_barrier() will flush out
pending callbacks so that should not be an issue.
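
A rough userspace model of that reasoning, assuming only what the
commit logs in this series describe: a hurry callback kicks the lazy
callbacks queued on its own CPU, while rcu_barrier() kicks lazy
callbacks everywhere.  The model_* names are made up for illustration,
and callbacks are invoked directly rather than after a real grace
period.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; every model_* name here is hypothetical. */
#define MODEL_NR_CPUS 2
#define MODEL_MAX_CBS 8

typedef void (*model_cb_fn)(void *arg);

struct model_cb {
	model_cb_fn fn;
	void *arg;
};

struct model_cpu {
	struct model_cb queued[MODEL_MAX_CBS];
	size_t nr_queued;
};

struct model_rcu {
	struct model_cpu cpus[MODEL_NR_CPUS];
};

/* Invoke and drop everything queued on one CPU (no real grace period). */
void model_flush_cpu(struct model_cpu *cpu)
{
	size_t i;

	for (i = 0; i < cpu->nr_queued; i++)
		cpu->queued[i].fn(cpu->queued[i].arg);
	cpu->nr_queued = 0;
}

/* Models lazy call_rcu(): the callback just sits on the CPU's queue. */
void model_call_lazy(struct model_rcu *rcu, int cpu, model_cb_fn fn, void *arg)
{
	struct model_cpu *c;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	c = &rcu->cpus[cpu];
	assert(c->nr_queued < MODEL_MAX_CBS);
	c->queued[c->nr_queued].fn = fn;
	c->queued[c->nr_queued].arg = arg;
	c->nr_queued++;
}

/* Models call_rcu_hurry(): queue, then kick that one CPU's queue. */
void model_call_hurry(struct model_rcu *rcu, int cpu, model_cb_fn fn, void *arg)
{
	model_call_lazy(rcu, cpu, fn, arg);
	model_flush_cpu(&rcu->cpus[cpu]);
}

/* Models rcu_barrier(): kick the queues of all CPUs. */
void model_barrier(struct model_rcu *rcu)
{
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_flush_cpu(&rcu->cpus[cpu]);
}

/* Example callback: bump the counter that arg points to. */
void model_count(void *arg)
{
	(*(int *)arg)++;
}
```

In the model, a lazy callback parked on another CPU survives a hurry
callback but not the barrier, which matches the module-removal case.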

Based on your message, I think we can drop this patch then. Since Paul
is already dropping it, no other action is needed.

(I just realized my patch was not fixing a test failure, like the
other net ones did, but rather we found the issue by static analysis
-- i.e. programmatically auditing all callbacks in the kernel doing
wake ups).

thanks,
 - Joel


* Re: [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu()
  2022-11-30 23:05         ` David Howells
  2022-11-30 23:15           ` Joel Fernandes
@ 2023-03-11 17:46           ` Joel Fernandes
  1 sibling, 0 replies; 40+ messages in thread
From: Joel Fernandes @ 2023-03-11 17:46 UTC (permalink / raw)
  To: David Howells
  Cc: Paul E. McKenney, rcu, linux-kernel, kernel-team, rostedt,
	Marc Dionne, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-afs, netdev

On Wed, Nov 30, 2022 at 11:05:22PM +0000, David Howells wrote:
> Joel Fernandes <joel@joelfernandes.org> wrote:
[...] 
> > After your patch, you are still doing a wake up in your call_rcu() callback:
> >
> > - ASSERTCMP(refcount_read(&conn->ref), ==, 0);
> > + if (atomic_dec_and_test(&rxnet->nr_conns))
> > +    wake_up_var(&rxnet->nr_conns);
> > +}
> > 
> > Are you saying the code can now tolerate delays? What if the RCU
> > callback is invoked after an arbitrarily long delay, making the
> > sleeping process wait?
> 
> True.  But that now only holds up the destruction of a net namespace and the
> removal of the rxrpc module.

I am guessing that not destroying the net namespace soon enough is not an
issue.  I do remember (from a different patch) that not tearing down networking
things promptly has a weird side effect on tools that require state to disappear.

> > If you agree, you can convert the call_rcu() to call_rcu_hurry() in
> > your patch itself. Would you be willing to do that? If not, that's
> > totally OK and I can send a patch later once yours is in (after
> > further testing).
> 
> I can add it to part 4 (see my rxrpc-ringless-5 branch) if it is necessary.

I am guessing the conversion to call_rcu_hurry() is still not necessary here;
if it is, then please consider the conversion.

But yeah, feel free to ignore this, I am just pinging here so that it does not
slip through the cracks.

thanks,

 - Joel



end of thread, other threads:[~2023-03-11 17:46 UTC | newest]

Thread overview: 40+ messages
2022-11-30 18:13 [PATCH v3 rcu 0/16] Lazy call_rcu() updates for v6.2 Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 01/16] rcu: Simplify rcu_init_nohz() cpumask handling Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 02/16] rcu: Fix late wakeup when flush of bypass cblist happens Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 03/16] rcu: Fix missing nocb gp wake on rcu_barrier() Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 04/16] rcu: Make call_rcu() lazy to save power Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 05/16] rcu: Refactor code a bit in rcu_nocb_do_flush_bypass() Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 06/16] rcu: Shrinker for lazy rcu Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 07/16] rcuscale: Add laziness and kfree tests Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 08/16] rcu/sync: Use call_rcu_hurry() instead of call_rcu Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 09/16] rcu/rcuscale: Use call_rcu_hurry() for async reader test Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 10/16] rcu/rcutorture: Use call_rcu_hurry() where needed Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 11/16] scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 12/16] percpu-refcount: Use call_rcu_hurry() for atomic switch Paul E. McKenney
2022-11-30 18:19   ` Joel Fernandes
2022-11-30 19:43   ` Tejun Heo
2022-11-30 21:44     ` Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 13/16] workqueue: Make queue_rcu_work() use call_rcu_hurry() Paul E. McKenney
2022-11-30 18:18   ` Joel Fernandes
2022-11-30 19:43   ` Tejun Heo
2022-11-30 18:13 ` [PATCH rcu 14/16] rxrpc: Use call_rcu_hurry() instead of call_rcu() Paul E. McKenney
2022-11-30 18:16   ` Joel Fernandes
2022-11-30 18:37     ` Eric Dumazet
2022-11-30 21:45       ` Paul E. McKenney
2022-11-30 21:49         ` Steven Rostedt
2022-11-30 22:00           ` Paul E. McKenney
2022-11-30 19:09     ` David Howells
2022-11-30 19:20       ` Joel Fernandes
2022-11-30 21:43         ` Paul E. McKenney
2022-11-30 22:06           ` Joel Fernandes
2022-11-30 20:12       ` Paul E. McKenney
2022-11-30 22:47       ` Joel Fernandes
2022-11-30 23:05         ` David Howells
2022-11-30 23:15           ` Joel Fernandes
2023-03-11 17:46           ` Joel Fernandes
2022-11-30 18:13 ` [PATCH rcu 15/16] net: Use call_rcu_hurry() for dst_release() Paul E. McKenney
2022-11-30 18:16   ` Joel Fernandes
2022-11-30 18:39     ` Eric Dumazet
2022-11-30 18:50       ` Joel Fernandes
2022-11-30 21:40       ` Paul E. McKenney
2022-11-30 18:13 ` [PATCH rcu 16/16] net: devinet: Reduce refcount before grace period Paul E. McKenney
