From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	rostedt@goodmis.org, "Paul E. McKenney" <paulmck@kernel.org>
Subject: [PATCH rcu 08/11] rcuscale: Make rcu_scale_writer() tolerate repeated GFP_KERNEL failure
Date: Thu,  1 Aug 2024 17:43:05 -0700
Message-ID: <20240802004308.4134731-8-paulmck@kernel.org>
In-Reply-To: <917e8cc8-8688-428a-9122-25544c5cc101@paulmck-laptop>

Under some conditions, kmalloc(GFP_KERNEL) allocations have been
observed to fail repeatedly.  When this happens, one of the
rcu_scale_writer() instances can loop indefinitely retrying memory
allocation for an asynchronous grace-period primitive.  The problem is
that if memory is short, all the other instances will allocate all
available memory before the looping task is awakened from its
rcu_barrier*() call.  This in turn results in hangs, so that rcuscale
fails to complete.

This commit therefore removes the tight retry loop, so that when this
condition occurs, the affected task still passes through the full loop
with its full set of termination checks.  A new gp_succeeded flag keeps
the measurement index from advancing on a pass whose allocation failed.
This spreads the risk of indefinite memory-allocation retry failures
across all instances of rcu_scale_writer() tasks, which in turn prevents
the hangs.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
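The control-flow change can be illustrated outside the kernel.  What
follows is only a user-space sketch under stated assumptions, not the
rcuscale code: struct sim, sim_alloc(), sim_must_stop(), and
sim_writer() are hypothetical names, malloc() stands in for
kmalloc(GFP_KERNEL), and the gp_barrier() call and the n_async_inflight
limit are left out.  It shows a failed allocation falling through to
the loop's termination check instead of jumping back to a retry label,
with the measurement count advancing only when gp_succeeded is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the writer's state; not the rcuscale API. */
struct sim {
	int alloc_failures_left;	/* allocations that fail before one succeeds */
	int stop_after;			/* pass on which the stop request is seen */
	int passes;			/* passes through the outer loop so far */
	int measurements;		/* plays the role of i in the patch */
};

/* Simulated allocator: fails until alloc_failures_left reaches zero. */
static void *sim_alloc(struct sim *s)
{
	if (s->alloc_failures_left > 0) {
		s->alloc_failures_left--;
		return NULL;
	}
	return malloc(16);
}

/* Simulated termination check, standing in for torture_must_stop(). */
static bool sim_must_stop(const struct sim *s)
{
	return s->passes >= s->stop_after;
}

static void sim_writer(struct sim *s)
{
	void *rhp = NULL;

	assert(s != NULL);
	do {
		bool gp_succeeded = false;

		s->passes++;
		if (!rhp)
			rhp = sim_alloc(s);
		if (rhp) {
			free(rhp);	/* stands in for handing rhp to the async primitive */
			rhp = NULL;
			gp_succeeded = true;
		}
		/* On failure there is no inner retry: fall through to the check. */
		if (gp_succeeded)
			s->measurements++;
	} while (!sim_must_stop(s));
}
```

With three failing allocations and a stop request seen on the fifth
pass, the sketch makes five passes and records two measurements.  If
the allocator never succeeds, the loop still ends as soon as the stop
request is seen, which is the property the tight retry loop lacked.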
 kernel/rcu/rcuscale.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index a820f11b19444..01d48eb753b41 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -520,6 +520,8 @@ rcu_scale_writer(void *arg)
 
 	jdone = jiffies + minruntime * HZ;
 	do {
+		bool gp_succeeded = false;
+
 		if (writer_holdoff)
 			udelay(writer_holdoff);
 		if (writer_holdoff_jiffies)
@@ -527,23 +529,24 @@ rcu_scale_writer(void *arg)
 		wdp = &wdpp[i];
 		*wdp = ktime_get_mono_fast_ns();
 		if (gp_async && !WARN_ON_ONCE(!cur_ops->async)) {
-retry:
 			if (!rhp)
 				rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
 			if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) {
 				atomic_inc(this_cpu_ptr(&n_async_inflight));
 				cur_ops->async(rhp, rcu_scale_async_cb);
 				rhp = NULL;
+				gp_succeeded = true;
 			} else if (!kthread_should_stop()) {
 				cur_ops->gp_barrier();
-				goto retry;
 			} else {
 				kfree(rhp); /* Because we are stopping. */
 			}
 		} else if (gp_exp) {
 			cur_ops->exp_sync();
+			gp_succeeded = true;
 		} else {
 			cur_ops->sync();
+			gp_succeeded = true;
 		}
 		t = ktime_get_mono_fast_ns();
 		*wdp = t - *wdp;
@@ -599,7 +602,7 @@ rcu_scale_writer(void *arg)
 				__func__, me, started, done, writer_done[me], atomic_read(&n_rcu_scale_writer_finished), i, jiffies - jdone);
 			selfreport = true;
 		}
-		if (started && !alldone && i < MAX_MEAS - 1)
+		if (gp_succeeded && started && !alldone && i < MAX_MEAS - 1)
 			i++;
 		rcu_scale_wait_shutdown();
 	} while (!torture_must_stop());
-- 
2.40.1

