linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] rcu: Fix delayed execution of hurry callbacks
@ 2025-07-17  5:53 Tze-nan Wu
  2025-07-17 13:34 ` Frederic Weisbecker
  0 siblings, 1 reply; 6+ messages in thread
From: Tze-nan Wu @ 2025-07-17  5:53 UTC (permalink / raw)
  To: rcu
  Cc: bobule.chang, wsd_upstream, Tze-nan Wu, Cheng-jui Wang, Lorry.Luo,
	weiyangyang, Paul E. McKenney, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Matthias Brugger,
	AngeloGioacchino Del Regno, Joel Fernandes (Google), linux-kernel,
	linux-arm-kernel, linux-mediatek

We observed a regression in our customer’s environment after enabling
CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl() is
used heavily, we found that callbacks queued via call_rcu_hurry (such as
percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by up to 5
seconds before execution. This occurs because the new grace period does
not start immediately after the previous one completes.

The root cause is that the wake_nocb_gp_defer() function now checks
"rdp->nocb_defer_wakeup" instead of "rdp_gp->nocb_defer_wakeup". On CPUs
that are not rcuog, "rdp->nocb_defer_wakeup" may always be
RCU_NOCB_WAKE_NOT. This can cause "rdp_gp->nocb_defer_wakeup" to be
downgraded and the "rdp_gp->nocb_timer" to be postponed by up to 10
seconds, delaying the execution of hurry RCU callbacks.

The trace log of one scenario we encountered is as follow:
  // previous GP ends at this point
  rcu_preempt   [000] d..1.   137.240210: rcu_grace_period: rcu_preempt 8369 end
  rcu_preempt   [000] .....   137.240212: rcu_grace_period: rcu_preempt 8372 reqwait
  // call_rcu_hurry enqueues "percpu_ref_switch_to_atomic_rcu", the callback waited on by UpdateEngine
  update_engine [002] d..1.   137.301593: __call_rcu_common: wyy: unlikely p_ref = 00000000********. lazy = 0
  // FirstQ on cpu 2 rdp_gp->nocb_timer is set to fire after 1 jiffy (4ms)
  // and the rdp_gp->nocb_defer_wakeup is set to RCU_NOCB_WAKE
  update_engine [002] d..2.   137.301595: rcu_nocb_wake: rcu_preempt 2 FirstQ on cpu2 with rdp_gp (cpu0).
  // FirstBQ event on cpu2 during the 1 jiffy, make the timer postpond 10 seconds later.
  // also, the rdp_gp->nocb_defer_wakeup is overwrite to RCU_NOCB_WAKE_LAZY
  update_engine [002] d..1.   137.301601: rcu_nocb_wake: rcu_preempt 2 WakeEmptyIsDeferred
  ...
  ...
  ...
  // before the 10 seconds timeout, cpu0 received another call_rcu_hurry
  // reset the timer to jiffies+1 and set the waketype = RCU_NOCB_WAKE.
  kworker/u32:0 [000] d..2.   142.557564: rcu_nocb_wake: rcu_preempt 0 FirstQ
  kworker/u32:0 [000] d..1.   142.557576: rcu_nocb_wake: rcu_preempt 0 WakeEmptyIsDeferred
  kworker/u32:0 [000] d..1.   142.558296: rcu_nocb_wake: rcu_preempt 0 WakeNot
  kworker/u32:0 [000] d..1.   142.558562: rcu_nocb_wake: rcu_preempt 0 WakeNot
  // idle(do_nocb_deferred_wakeup) wake rcuog due to waketype == RCU_NOCB_WAKE
  <idle>        [000] d..1.   142.558786: rcu_nocb_wake: rcu_preempt 0 DoWake
  <idle>        [000] dN.1.   142.558839: rcu_nocb_wake: rcu_preempt 0 DeferredWake
  rcuog/0       [000] .....   142.558871: rcu_nocb_wake: rcu_preempt 0 EndSleep
  rcuog/0       [000] .....   142.558877: rcu_nocb_wake: rcu_preempt 0 Check
  // finally rcuog request a new GP at this point (5 seconds after the FirstQ event)
  rcuog/0       [000] d..2.   142.558886: rcu_grace_period: rcu_preempt 8372 newreq
  rcu_preempt   [001] d..1.   142.559458: rcu_grace_period: rcu_preempt 8373 start
  ...
  rcu_preempt   [000] d..1.   142.564258: rcu_grace_period: rcu_preempt 8373 end
  rcuop/2       [000] D..1.   142.566337: rcu_batch_start: rcu_preempt CBs=219 bl=10
  // the hurry CB is invoked at this point
  rcuop/2       [000] b....   142.566352: blk_queue_usage_counter_release: wyy: wakeup. p_ref = 00000000********.

This patch changes the condition to check "rdp_gp->nocb_defer_wakeup" in
the lazy path. This prevents an already scheduled "rdp_gp->nocb_timer"
from being postponed and avoids overwriting "rdp_gp->nocb_defer_wakeup"
when it is not RCU_NOCB_WAKE_NOT.

Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power")
Co-developed-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com>
Co-developed-by: Lorry.Luo@mediatek.com
Signed-off-by: Lorry.Luo@mediatek.com
Tested-by: weiyangyang@vivo.com
Signed-off-by: weiyangyang@vivo.com
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
The regression is first observed by wyy in the Update Engine scenario
with  CONFIG_LAZY_RCU enabled. there is an additional delay of 4–5
seconds during the heavy ioctl API call, waiting for
percpu_ref_switch_to_atomic_rcu (RCU hurry CB) to complete.

Initially, we suspected that the percpu_ref_switch_to_atomic_rcu
function itself was taking too long. However, after enabling some
custome and the following trace events: rcu_do_batch, rcu_nocb_wake, and
rcu_grace_period. we found that the root cause was that rcuog was not
being woken up in time to request a new GP. This led to the delay in
invoking the hurry RCU callback (percpu_ref_switch_to_atomic_rcu).

Environment:
  Android-16, Kernel: 6.12, 8 CPUs (ARM)

Configuration:
  CONFIG_TREE_RCU=y
  CONFIG_PREEMPT_RCU=y
  CONFIG_LAZY_RCU=y
  CONFIG_RCU_NOCB_CPU=y
  CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y
  rcu_nocb_gp_stride = -1 (default is 4 for 8 cores)
  jiffies_lazy_flush = 10 * HZ

Contributions:
Tze-Nan Wu:
Collaborated with Cheng-Jui to discuss which tracepoints needed to be
added, jointly analyzed the trace logs, identified the root cause, and
proposed this upstream change.

Cheng-Jui Wang:
Provided many valuable suggestions during the debugging process,
repeatedly found breakthroughs when we were stuck, and helped identify
the root cause.

Lorry Luo:
Assisted in verifying whether rcu-hurry-callback was executed too long
or deferred, supported with testing, and helped with communication.

Weiyangyang:
Main tester who discovered the regression scenario, confirmed that
enabling CONFIG_LAZY_RCU caused the regression, and verified that this
patch resolves the issue.

Note:
With my limited understanding of lazy RCU, I am not fully confident that
this is a real issue. In my opinion, hurry callbacks should not be
delayed by other events such as firstBQ trace event.
If my understanding is incorrect, I would greatly appreciate any
guidance or clarification from the maintainers.
---
 kernel/rcu/tree_nocb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 08eb9b0e2fab..e6cd56603cad 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -276,7 +276,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 	 * callback storms, no need to wake up too early.
 	 */
 	if (waketype == RCU_NOCB_WAKE_LAZY &&
-	    rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
+	    rdp_gp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
 		mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
 		WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
 	} else if (waketype == RCU_NOCB_WAKE_BYPASS) {
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-07-24  5:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-17  5:53 [PATCH] rcu: Fix delayed execution of hurry callbacks Tze-nan Wu
2025-07-17 13:34 ` Frederic Weisbecker
2025-07-18  9:10   ` Tze-nan Wu (吳澤南)
2025-07-23 14:32     ` Joel Fernandes
2025-07-24  2:49       ` Tze-nan Wu (吳澤南)
2025-07-24  5:27         ` Neeraj Upadhyay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).