public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	rostedt@goodmis.org, "Paul E. McKenney" <paulmck@kernel.org>,
	Leonardo Bras <leobras@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Subject: [PATCH rcu 6/9] rcu: Add rcutree.nocb_patience_delay to reduce nohz_full OS jitter
Date: Tue,  4 Jun 2024 15:23:52 -0700
Message-ID: <20240604222355.2370768-6-paulmck@kernel.org>
In-Reply-To: <657595c8-e86c-4594-a5b1-3c64a8275607@paulmck-laptop>

If a CPU is running either a userspace application or a guest OS in
nohz_full mode, it is possible for a system call to occur just as an
RCU grace period is starting.  If that CPU also has the scheduling-clock
tick enabled for any reason (such as a second runnable task), and if the
system was booted with rcutree.use_softirq=0, then RCU can add insult to
injury by awakening that CPU's rcuc kthread, resulting in yet another
task and yet more OS jitter due to switching to that task, running it,
and switching back.

In addition, in the common case where that system call is not of
excessively long duration, awakening the rcuc task is pointless.
This pointlessness is due to the fact that the CPU will enter an extended
quiescent state upon returning to the userspace application or guest OS.
In this case, the rcuc kthread cannot do anything that the main RCU
grace-period kthread cannot do on its behalf, at least if it is given
a few additional milliseconds (for example, given the time duration
specified by rcutree.jiffies_till_first_fqs, give or take scheduling
delays).

This commit therefore adds an rcutree.nocb_patience_delay kernel boot
parameter that specifies the grace period age (in milliseconds)
before which RCU will refrain from awakening the rcuc kthread.
Preliminary experimentation suggests a value of 1000, that is,
one second.  Increasing rcutree.nocb_patience_delay will increase
grace-period latency and in turn increase memory footprint, so systems
with constrained memory might choose a smaller value.  Systems with
less-aggressive OS-jitter requirements might choose the default value
of zero, which keeps the traditional immediate-wakeup behavior, thus
avoiding increases in grace-period latency.
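
As a rough illustration of the age test described above, here is a
minimal userspace sketch.  It is not the kernel code: the names
time_before_sketch(), gp_within_patience(), and patience_demo() are
made up for this example, and the real check additionally requires
that the CPU be a nohz_full CPU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's time_before(a, b): true when a is
 * earlier than b, even if the counter has wrapped in between.
 */
static bool time_before_sketch(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical helper (not in the patch): true while a grace period is in
 * progress and is still younger than the patience delay, both in jiffies.
 */
static bool gp_within_patience(bool gp_in_progress, unsigned long now,
			       unsigned long gp_start,
			       unsigned long patience_jiffies)
{
	return gp_in_progress &&
	       time_before_sketch(now, gp_start + patience_jiffies);
}

/* Small self-check exercising the helper; call it from main() to try it. */
void patience_demo(void)
{
	/* Young grace period: RCU stays patient. */
	assert(gp_within_patience(true, 1100, 1000, 250));
	/* Grace period has reached the configured age: patience is over. */
	assert(!gp_within_patience(true, 1250, 1000, 250));
	/* A zero delay never defers anything. */
	assert(!gp_within_patience(true, 1000, 1000, 0));
	/* The comparison still works across counter wraparound. */
	assert(gp_within_patience(true, 5, (unsigned long)-11, 250));
}
```

With a delay of zero the helper never reports patience, which matches
the traditional immediate-wakeup behavior of the default.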

[ paulmck: Apply Leonardo Bras feedback.  ]

Link: https://lore.kernel.org/all/20240328171949.743211-1-leobras@redhat.com/

Reported-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Leonardo Bras <leobras@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
---
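
The range handling that the tree_plugin.h hunk below applies at boot
can be summarized by the following userspace sketch.  The names
clamp_patience_ms(), clamp_demo(), and the SKETCH_* macros are
invented for illustration, and the later conversion to jiffies is
left out because it depends on the kernel's HZ setting.

```c
#include <assert.h>

/* Assumed stand-ins for the kernel's MSEC_PER_SEC and the five-second cap. */
#define SKETCH_MSEC_PER_SEC	1000L
#define SKETCH_PATIENCE_MAX_MS	(5 * SKETCH_MSEC_PER_SEC)

/*
 * Hypothetical helper: map a requested delay in milliseconds onto the
 * accepted range, with negative values becoming zero and values above
 * five seconds becoming five seconds.
 */
static int clamp_patience_ms(int requested_ms)
{
	if (requested_ms < 0)
		return 0;
	if (requested_ms > SKETCH_PATIENCE_MAX_MS)
		return (int)SKETCH_PATIENCE_MAX_MS;
	return requested_ms;
}

/* Small self-check exercising the helper; call it from main() to try it. */
void clamp_demo(void)
{
	assert(clamp_patience_ms(-1) == 0);		/* negative: reset to zero */
	assert(clamp_patience_ms(0) == 0);		/* default: unchanged */
	assert(clamp_patience_ms(1000) == 1000);	/* in range: unchanged */
	assert(clamp_patience_ms(60000) == 5000);	/* too large: capped */
}
```
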
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 kernel/rcu/tree.c                               | 10 ++++++++--
 kernel/rcu/tree_plugin.h                        | 10 ++++++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 500cfa7762257..2d4a512cf1fc6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5018,6 +5018,14 @@
 			the ->nocb_bypass queue.  The definition of "too
 			many" is supplied by this kernel boot parameter.
 
+	rcutree.nocb_patience_delay= [KNL]
+			On callback-offloaded (rcu_nocbs) CPUs, avoid
+			disturbing RCU unless the grace period has
+			reached the specified age in milliseconds.
+			Defaults to zero.  Large values will be capped
+			at five seconds.  All values will be rounded down
+			to the nearest value representable by jiffies.
+
 	rcutree.qhimark= [KNL]
 			Set threshold of queued RCU callbacks beyond which
 			batch limiting is disabled.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 35bf4a3736765..408b020c9501f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -176,6 +176,9 @@ static int gp_init_delay;
 module_param(gp_init_delay, int, 0444);
 static int gp_cleanup_delay;
 module_param(gp_cleanup_delay, int, 0444);
+static int nocb_patience_delay;
+module_param(nocb_patience_delay, int, 0444);
+static int nocb_patience_delay_jiffies;
 
 // Add delay to rcu_read_unlock() for strict grace periods.
 static int rcu_unlock_delay;
@@ -4344,11 +4347,14 @@ static int rcu_pending(int user)
 		return 1;
 
 	/* Is this a nohz_full CPU in userspace or idle?  (Ignore RCU if so.) */
-	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
+	gp_in_progress = rcu_gp_in_progress();
+	if ((user || rcu_is_cpu_rrupt_from_idle() ||
+	     (gp_in_progress &&
+	      time_before(jiffies, READ_ONCE(rcu_state.gp_start) + nocb_patience_delay_jiffies))) &&
+	    rcu_nohz_full_cpu())
 		return 0;
 
 	/* Is the RCU core waiting for a quiescent state from this CPU? */
-	gp_in_progress = rcu_gp_in_progress();
 	if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm && gp_in_progress)
 		return 1;
 
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 340bbefe5f652..31c539f09c150 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -93,6 +93,16 @@ static void __init rcu_bootup_announce_oddness(void)
 		pr_info("\tRCU debug GP init slowdown %d jiffies.\n", gp_init_delay);
 	if (gp_cleanup_delay)
 		pr_info("\tRCU debug GP cleanup slowdown %d jiffies.\n", gp_cleanup_delay);
+	if (nocb_patience_delay < 0) {
+		pr_info("\tRCU NOCB CPU patience negative (%d), resetting to zero.\n", nocb_patience_delay);
+		nocb_patience_delay = 0;
+	} else if (nocb_patience_delay > 5 * MSEC_PER_SEC) {
+		pr_info("\tRCU NOCB CPU patience too large (%d), resetting to %ld.\n", nocb_patience_delay, 5 * MSEC_PER_SEC);
+		nocb_patience_delay = 5 * MSEC_PER_SEC;
+	} else if (nocb_patience_delay) {
+		pr_info("\tRCU NOCB CPU patience set to %d milliseconds.\n", nocb_patience_delay);
+	}
+	nocb_patience_delay_jiffies = msecs_to_jiffies(nocb_patience_delay);
 	if (!use_softirq)
 		pr_info("\tRCU_SOFTIRQ processing moved to rcuc kthreads.\n");
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG))
-- 
2.40.1

