From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	rostedt@goodmis.org, Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>,
	"Paul E . McKenney" <paulmck@kernel.org>
Subject: [PATCH rcu 2/9] rcu: Reduce synchronize_rcu() delays when all wait heads are in use
Date: Tue,  4 Jun 2024 15:23:48 -0700
Message-ID: <20240604222355.2370768-2-paulmck@kernel.org>
In-Reply-To: <657595c8-e86c-4594-a5b1-3c64a8275607@paulmck-laptop>

From: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>

When all wait heads are in use, which can happen when
rcu_sr_normal_gp_cleanup_work()'s callback processing
is slow, the processing of any new synchronize_rcu() user's
rcu_synchronize node is deferred to future grace periods.
This can result in a long list of synchronize_rcu() invocations
waiting for full grace-period processing, which can delay
the freeing of memory. Mitigate this problem by using the first
node in the list as the wait tail when all wait heads are in use.
Although methods to speed up callback processing would still be
needed to recover from this situation, allowing new nodes to
complete their grace period helps prevent delays caused by the
fixed number of wait-head nodes.

Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
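Illustration only, not part of the patch: the wait-tail selection
described in the commit message can be modeled in user space roughly
as follows. This is a minimal sketch under stated assumptions. The
names struct node, struct sr_state, WAIT_HEAD_MAX, get_wait_head() and
gp_init_model() are hypothetical stand-ins, the list is a plain
singly linked list rather than the kernel's llist, and all locking,
memory ordering and tracing are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAIT_HEAD_MAX 2 /* hypothetical pool size, kept tiny for the demo */

struct node {
	struct node *next;
	bool in_use; /* only meaningful for pool entries */
};

/* Hypothetical stand-in for the relevant rcu_state fields. */
struct sr_state {
	struct node pool[WAIT_HEAD_MAX]; /* fixed set of dummy wait heads */
	struct node *next_list;          /* newest-first list of waiters */
	struct node *wait_tail;
	struct node *done_tail;
};

/* Return a free dummy wait head, or NULL when every one is in use. */
struct node *get_wait_head(struct sr_state *s)
{
	for (size_t i = 0; i < WAIT_HEAD_MAX; i++) {
		if (!s->pool[i].in_use) {
			s->pool[i].in_use = true;
			return &s->pool[i];
		}
	}
	return NULL;
}

/*
 * Model of the selection step. Returns true when another grace
 * period should be requested.
 */
bool gp_init_model(struct sr_state *s)
{
	struct node *first = s->next_list;
	struct node *wait_head;
	bool start_new_poll = false;

	if (!first)
		return start_new_poll;

	wait_head = get_wait_head(s);
	if (wait_head) {
		/* Normal case: push a dummy node in front of the waiters. */
		wait_head->next = s->next_list;
		s->next_list = wait_head;
	} else {
		/* Pool exhausted: the first waiter needs one more GP. */
		start_new_poll = true;
		if (first == s->done_tail)
			return start_new_poll;
		/* Use the first waiter itself as the wait tail. */
		wait_head = first;
	}

	/* The previous wait tail must already have been consumed. */
	assert(s->wait_tail == NULL);
	s->wait_tail = wait_head;
	return start_new_poll;
}
```

In this model, a waiter that is reused as the wait tail is not
completed by the grace period it anchors; only the nodes queued
behind it are, which is why the function asks for another grace
period in that branch.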
 kernel/rcu/tree.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 28c7031711a3f..6ba36d9c09bde 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1463,14 +1463,11 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
  * for this new grace period. Given that there are a fixed
  * number of wait nodes, if all wait nodes are in use
  * (which can happen when kworker callback processing
- * is delayed) and additional grace period is requested.
- * This means, a system is slow in processing callbacks.
- *
- * TODO: If a slow processing is detected, a first node
- * in the llist should be used as a wait-tail for this
- * grace period, therefore users which should wait due
- * to a slow process are handled by _this_ grace period
- * and not next.
+ * is delayed), first node in the llist is used as wait
+ * tail for this grace period. This means, the first node
+ * has to go through additional grace periods before it is
+ * part of the wait callbacks. This should be ok, as
+ * the system is slow in processing callbacks anyway.
  *
  * Below is an illustration of how the done and wait
  * tail pointers move from one set of rcu_synchronize nodes
@@ -1639,7 +1636,6 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
 	if (!done)
 		return;
 
-	WARN_ON_ONCE(!rcu_sr_is_wait_head(done));
 	head = done->next;
 	done->next = NULL;
 
@@ -1676,13 +1672,21 @@ static void rcu_sr_normal_gp_cleanup(void)
 
 	rcu_state.srs_wait_tail = NULL;
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.srs_wait_tail);
-	WARN_ON_ONCE(!rcu_sr_is_wait_head(wait_tail));
 
 	/*
 	 * Process (a) and (d) cases. See an illustration.
 	 */
 	llist_for_each_safe(rcu, next, wait_tail->next) {
-		if (rcu_sr_is_wait_head(rcu))
+		/*
+		 * The done tail may reference a rcu_synchronize node.
+		 * Stop at done tail, as using rcu_sr_normal_complete()
+		 * from this path can result in use-after-free. This
+		 * may occur if, following the wake-up of the synchronize_rcu()
+		 * wait contexts and freeing up of node memory,
+		 * rcu_sr_normal_gp_cleanup_work() accesses the done tail and
+		 * its subsequent nodes.
+		 */
+		if (wait_tail->next == rcu_state.srs_done_tail)
 			break;
 
 		rcu_sr_normal_complete(rcu);
@@ -1719,15 +1723,17 @@ static bool rcu_sr_normal_gp_init(void)
 		return start_new_poll;
 
 	wait_head = rcu_sr_get_wait_head();
-	if (!wait_head) {
-		// Kick another GP to retry.
+	if (wait_head) {
+		/* Inject a wait-dummy-node. */
+		llist_add(wait_head, &rcu_state.srs_next);
+	} else {
+		// Kick another GP for first node.
 		start_new_poll = true;
-		return start_new_poll;
+		if (first == rcu_state.srs_done_tail)
+			return start_new_poll;
+		wait_head = first;
 	}
 
-	/* Inject a wait-dummy-node. */
-	llist_add(wait_head, &rcu_state.srs_next);
-
 	/*
 	 * A waiting list of rcu_synchronize nodes should be empty on
 	 * this step, since a GP-kthread, rcu_gp_init() -> gp_cleanup(),
-- 
2.40.1
