Re: [PATCH RFC v1 1/2] rcu/tree: Add basic support for kfree_rcu batching

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joel Fernandes <joel@joelfernandes.org>
To: Byungchul Park <byungchul.park@lge.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>,
	linux-kernel@vger.kernel.org, Rao Shoaib <rao.shoaib@oracle.com>,
	max.byungchul.park@gmail.com, kernel-team@android.com,
	kernel-team@lge.com, Davidlohr Bueso <dave@stgolabs.net>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH RFC v1 1/2] rcu/tree: Add basic support for kfree_rcu batching
Date: Thu, 8 Aug 2019 19:30:14 -0400	[thread overview]
Message-ID: <20190808233014.GA184373@google.com> (raw)
In-Reply-To: <20190808125607.GB261256@google.com>

On Thu, Aug 08, 2019 at 08:56:07AM -0400, Joel Fernandes wrote:
> On Thu, Aug 08, 2019 at 06:52:32PM +0900, Byungchul Park wrote:
> > On Wed, Aug 07, 2019 at 10:52:15AM -0700, Paul E. McKenney wrote:
> > > > > On Tue, Aug 06, 2019 at 05:20:40PM -0400, Joel Fernandes (Google) wrote:
> > > [ . . . ]
> > > > > > +	for (; head; head = next) {
> > > > > > +		next = head->next;
> > > > > > +		head->next = NULL;
> > > > > > +		__call_rcu(head, head->func, -1, 1);
> > > > > 
> > > > > We need at least a cond_resched() here.  200,000 times through this loop
> > > > > in a PREEMPT=n kernel might not always be pretty.  Except that this is
> > > > > invoked directly from kfree_rcu() which might be invoked with interrupts
> > > > > disabled, which precludes calls to cond_resched().  So the realtime guys
> > > > > are not going to be at all happy with this loop.
> > > > 
> > > > Ok, will add this here.
> > > > 
> > > > > And this loop could be avoided entirely by having a third rcu_head list
> > > > > in the kfree_rcu_cpu structure.  Yes, some of the batches would exceed
> > > > > KFREE_MAX_BATCH, but given that they are invoked from a workqueue, that
> > > > > should be OK, or at least more OK than queuing 200,000 callbacks with
> > > > > interrupts disabled.  (If it turns out not to be OK, an array of rcu_head
> > > > > pointers can be used to reduce the probability of oversized batches.)
> > > > > This would also mean that the equality comparisons with KFREE_MAX_BATCH
> > > > > need to become greater-or-equal comparisons or some such.
> > > > 
> > > > Yes, certainly we can do these kinds of improvements after this patch, and
> > > > then add more tests to validate the improvements.
> > > 
> > > Out of pity for people bisecting, we need this fixed up front.
> > > 
> > > My suggestion is to just allow ->head to grow until ->head_free becomes
> > > available.  That way you are looping with interrupts and preemption
> > > enabled in workqueue context, which is much less damaging than doing so
> > > with interrupts disabled, and possibly even from hard-irq context.
> > 
> > Agree.
> > 
> > Or after introducing another limit like KFREE_MAX_BATCH_FORCE(>=
> > KFREE_MAX_BATCH):
> > 
> > 1. Try to drain it on hitting KFREE_MAX_BATCH as it does.
> > 
> >    On success: Same as now.
> >    On fail: let ->head grow and drain if possible, until reaching to
> >             KFREE_MAX_BATCH_FORCE.
> > 
> > 3. On hitting KFREE_MAX_BATCH_FORCE, give up batching but handle one by
> >    one from now on to prevent too many pending requests from being
> >    queued for batching work.
> 
> I also agree. But this _FORCE thing will still not solve the issue Paul is
> raising which is doing this loop possibly in irq disabled / hardirq context.
> We can't even cond_resched() here. In fact since _FORCE is larger, it will be
> even worse. Consider a real-time system with a lot of memory, in this case
> letting ->head grow large is Ok, but looping for long time in IRQ disabled
> would not be Ok.
> 
> But I could make it something like:
> 1. Letting ->head grow if ->head_free busy
> 2. If head_free is busy, then just queue/requeue the monitor to try again.
> 
> This would even improve performance, but will still risk going out of memory.

It seems I can indeed hit an out of memory condition once I changed it to
"letting list grow" (diff is below which applies on top of this patch) while
at the same time removing the schedule_timeout(2) and replacing it with
cond_resched() in the rcuperf test.  I think the reason is the rcuperf test
starves the worker threads that are executing in workqueue context after a
grace period and those are unable to get enough CPU time to kfree things fast
enough. But I am not fully sure about it and need to test/trace more to
figure out why this is happening.

If I add back the schedule_uninterruptibe_timeout(2) call, the out of memory
situation goes away.

Clearly we need to do more work on this patch.

In the regular kfree_rcu_no_batch() case, I don't hit this issue. I believe
that since the kfree is happening in softirq context in the _no_batch() case,
it fares better. The question then I guess is how do we run the rcu_work in a
higher priority context so it is not starved and runs often enough. I'll
trace more.

Perhaps I can also lower the priority of the rcuperf threads to give the
worker thread some more room to run and see if anything changes. But I am not
sure then if we're preparing the code for the real world with such
modifications.

Any thoughts?

thanks,

 - Joel

---8<-----------------------

From 098d62e5a1b84a11139236c9b1f59e7f32289b40 Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Thu, 8 Aug 2019 16:29:58 -0400
Subject: [PATCH] Let list grow

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/rcuperf.c |  2 +-
 kernel/rcu/tree.c    | 52 +++++++++++++++++++-------------------------
 2 files changed, 23 insertions(+), 31 deletions(-)

diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c
index 34658760da5e..7dc831db89ae 100644
--- a/kernel/rcu/rcuperf.c
+++ b/kernel/rcu/rcuperf.c
@@ -654,7 +654,7 @@ kfree_perf_thread(void *arg)
 			}
 		}
 
-		schedule_timeout_uninterruptible(2);
+		cond_resched();
 	} while (!torture_must_stop() && ++l < kfree_loops);
 
 	kfree(alloc_ptrs);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bdbd483606ce..bab77220d8ac 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2595,7 +2595,7 @@ EXPORT_SYMBOL_GPL(call_rcu);
 
 
 /* Maximum number of jiffies to wait before draining batch */
-#define KFREE_DRAIN_JIFFIES 50
+#define KFREE_DRAIN_JIFFIES (HZ / 20)
 
 /*
  * Maximum number of kfree(s) to batch, if limit is hit
@@ -2684,27 +2684,19 @@ static void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krc,
 {
 	struct rcu_head *head, *next;
 
-	/* It is time to do bulk reclaim after grace period */
-	krc->monitor_todo = false;
+	/* It is time to do bulk reclaim after grace period. */
 	if (queue_kfree_rcu_work(krc)) {
 		spin_unlock_irqrestore(&krc->lock, flags);
 		return;
 	}
 
-	/*
-	 * Use non-batch regular call_rcu for kfree_rcu in case things are too
-	 * busy and batching of kfree_rcu could not be used.
-	 */
-	head = krc->head;
-	krc->head = NULL;
-	krc->kfree_batch_len = 0;
-	spin_unlock_irqrestore(&krc->lock, flags);
-
-	for (; head; head = next) {
-		next = head->next;
-		head->next = NULL;
-		__call_rcu(head, head->func, -1, 1);
+	/* Previous batch did not get free yet, let us try again soon. */
+	if (krc->monitor_todo == false) {
+		schedule_delayed_work_on(smp_processor_id(),
+				&krc->monitor_work,  KFREE_DRAIN_JIFFIES/4);
+		krc->monitor_todo = true;
 	}
+	spin_unlock_irqrestore(&krc->lock, flags);
 }
 
 /*
-- 
2.23.0.rc1.153.gdeed80330f-goog

next prev parent reply	other threads:[~2019-08-08 23:30 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-06 21:20 [PATCH RFC v1 1/2] rcu/tree: Add basic support for kfree_rcu batching Joel Fernandes (Google)
2019-08-06 21:20 ` [PATCH RFC v1 2/2] rcuperf: Add kfree_rcu performance Tests Joel Fernandes (Google)
2019-08-07  0:29   ` Paul E. McKenney
2019-08-07 10:22     ` Joel Fernandes
2019-08-07 17:56       ` Paul E. McKenney
2019-08-09 16:01         ` Joel Fernandes
2019-08-11  2:01     ` Joel Fernandes
2019-08-11 23:42       ` Paul E. McKenney
2019-08-06 23:56 ` [PATCH RFC v1 1/2] rcu/tree: Add basic support for kfree_rcu batching Paul E. McKenney
2019-08-07  9:45   ` Joel Fernandes
2019-08-07 17:52     ` Paul E. McKenney
2019-08-08  9:52       ` Byungchul Park
2019-08-08 12:56         ` Joel Fernandes
2019-08-08 14:23           ` Byungchul Park
2019-08-08 18:09             ` Paul E. McKenney
2019-08-11  8:36               ` Byungchul Park
2019-08-11  8:49                 ` Byungchul Park
2019-08-11 23:49                   ` Paul E. McKenney
2019-08-12 10:10                     ` Byungchul Park
2019-08-12 13:12                       ` Joel Fernandes
2019-08-13  5:29                         ` Byungchul Park
2019-08-13 15:41                           ` Paul E. McKenney
2019-08-14  0:11                             ` Byungchul Park
2019-08-14  2:53                               ` Paul E. McKenney
2019-08-14  3:43                                 ` Byungchul Park
2019-08-14 16:59                                   ` Paul E. McKenney
2019-08-11 10:37                 ` Byungchul Park
2019-08-08 23:30           ` Joel Fernandes [this message]
2019-08-09 15:16             ` Paul E. McKenney
2019-08-09 15:39               ` Joel Fernandes
2019-08-09 16:33                 ` Paul E. McKenney
2019-08-09 20:22                   ` Joel Fernandes
2019-08-09 20:26                     ` Joel Fernandes
2019-08-09 21:25                       ` Joel Fernandes
2019-08-10  3:38                         ` Paul E. McKenney
2019-08-09 20:29                     ` Joel Fernandes
2019-08-09 20:42                     ` Paul E. McKenney
2019-08-09 21:36                       ` Joel Fernandes
2019-08-10  3:40                         ` Paul E. McKenney
2019-08-10  3:52                           ` Joel Fernandes
2019-08-10  2:42       ` Joel Fernandes
2019-08-10  3:38         ` Paul E. McKenney
2019-08-10  4:20           ` Joel Fernandes
2019-08-10 18:24             ` Paul E. McKenney
2019-08-11  2:26               ` Joel Fernandes
2019-08-11 23:35                 ` Paul E. McKenney
2019-08-12 13:13                   ` Joel Fernandes
2019-08-12 14:44                     ` Paul E. McKenney
2019-08-08 10:26     ` Byungchul Park
2019-08-08 18:11       ` Paul E. McKenney
2019-08-08 20:13         ` Joel Fernandes
2019-08-08 20:51           ` Paul E. McKenney
2019-08-08 22:34             ` Joel Fernandes
2019-08-08 22:37               ` Paul E. McKenney

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:34658760da5 dfblob:7dc831db89a dfblob:bdbd483606c
dfblob:bab77220d8a )
 OR (
bs:"Let list grow" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190808233014.GA184373@google.com \
    --to=joel@joelfernandes.org \
    --cc=byungchul.park@lge.com \
    --cc=dave@stgolabs.net \
    --cc=jiangshanlai@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=kernel-team@android.com \
    --cc=kernel-team@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=max.byungchul.park@gmail.com \
    --cc=paulmck@linux.ibm.com \
    --cc=rao.shoaib@oracle.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.