Re: [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Uladzislau Rezki <urezki@gmail.com>
To: "Paul E . McKenney" <paulmck@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Daniel Axtens <dja@axtens.net>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument
Date: Thu, 28 Jan 2021 19:02:37 +0100	[thread overview]
Message-ID: <20210128180237.GA5144@pc638.lan> (raw)
In-Reply-To: <20210128153017.GA2006@pc638.lan>

On Thu, Jan 28, 2021 at 04:30:17PM +0100, Uladzislau Rezki wrote:
> On Thu, Jan 28, 2021 at 04:17:01PM +0100, Michal Hocko wrote:
> > On Thu 28-01-21 16:11:52, Uladzislau Rezki wrote:
> > > On Mon, Jan 25, 2021 at 05:25:59PM +0100, Uladzislau Rezki wrote:
> > > > On Mon, Jan 25, 2021 at 04:39:43PM +0100, Michal Hocko wrote:
> > > > > On Mon 25-01-21 15:31:50, Uladzislau Rezki wrote:
> > > > > > > On Wed 20-01-21 17:21:46, Uladzislau Rezki (Sony) wrote:
> > > > > > > > For a single argument we can directly request a page from a caller
> > > > > > > > context when a "carry page block" is run out of free spots. Instead
> > > > > > > > of hitting a slow path we can request an extra page by demand and
> > > > > > > > proceed with a fast path.
> > > > > > > > 
> > > > > > > > A single-argument kvfree_rcu() must be invoked in sleepable contexts,
> > > > > > > > and that its fallback is the relatively high latency synchronize_rcu().
> > > > > > > > Single-argument kvfree_rcu() therefore uses GFP_KERNEL|__GFP_RETRY_MAYFAIL
> > > > > > > > to allow limited sleeping within the memory allocator.
> > > > > > > 
> > > > > > > __GFP_RETRY_MAYFAIL can be quite heavy. It is effectively the most heavy
> > > > > > > way to allocate without triggering the OOM killer. Is this really what
> > > > > > > you need/want? Is __GFP_NORETRY too weak?
> > > > > > > 
> > > > > > Hm... We agreed to proceed with limited lightwait memory direct reclaim.
> > > > > > Johannes Weiner proposed to go with __GFP_NORETRY flag as a starting
> > > > > > point: https://www.spinics.net/lists/rcu/msg02856.html
> > > > > > 
> > > > > > <snip>
> > > > > >     So I'm inclined to suggest __GFP_NORETRY as a starting point, and make
> > > > > >     further decisions based on instrumentation of the success rates of
> > > > > >     these opportunistic allocations.
> > > > > > <snip>
> > > > > 
> > > > > I completely agree with Johannes here.
> > > > > 
> > > > > > but for some reason, i can't find a tail or head of it, we introduced
> > > > > > __GFP_RETRY_MAYFAIL what is a heavy one from a time consuming point of view.
> > > > > > What we would like to avoid.
> > > > > 
> > > > > Not that I object to this use but I think it would be much better to use
> > > > > it based on actual data. Going along with it right away might become a
> > > > > future burden to make any changes in this aspect later on due to lack of 
> > > > > exact reasoning. General rule of thumb for __GFP_RETRY_MAYFAIL is really
> > > > > try as hard as it can get without being really disruptive (like OOM
> > > > > killing something). And your wording didn't really give me that
> > > > > impression.
> > > > > 
> > > > Initially i proposed just to go with GFP_NOWAIT flag. But later on there
> > > > was a discussion about a fallback path, that uses synchronize_rcu() can be
> > > > slow, thus minimizing its hitting would be great. So, here we go with a
> > > > trade off.
> > > > 
> > > > Doing it hard as __GFP_RETRY_MAYFAIL can do, is not worth(IMHO), but to have some
> > > > light-wait requests would be acceptable. That is why __GFP_NORETRY was proposed.
> > > > 
> > > > There were simple criterias we discussed which we would like to achieve:
> > > > 
> > > > a) minimize a fallback hitting;
> > > > b) avoid of OOM involving;
> > > > c) avoid of dipping into the emergency reserves. See kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu()
> > > > 
> > > One question here. Since the code that triggers a page request can be
> > > directly invoked from reclaim context as well as outside of it. We had
> > > a concern about if any recursion is possible, but what i see it is safe.
> > > The context that does it can not enter it twice:
> > > 
> > > <snip>
> > >     /* Avoid recursion of direct reclaim */
> > >     if (current->flags & PF_MEMALLOC)
> > >         goto nopage;
> > > <snip>
> > 
> > Yes this is a recursion protection.
> > 
> > > What about any deadlocking in regards to below following flags?
> > > 
> > > GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN
> > 
> > and __GFP_NOMEMALLOC will make sure that the allocation will not consume
> > all the memory reserves. The later should be clarified in one of your
> > patches I have acked IIRC.
> >
> Yep, it is clarified and reflected in another patch you ACKed.
> 
> Thanks!
> 

Please find below the V2.

Changelog V1 -> V2:
    - replace the __GFP_RETRY_MAYFAIL by __GFP_NORETRY;
    - add more comments explaining why and which flags are used.

<snip>
From 0bdb8ca1ae62088790e0a452c4acec3821e06989 Mon Sep 17 00:00:00 2001
From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Date: Wed, 20 Jan 2021 17:21:46 +0100
Subject: [PATCH v2 1/1] kvfree_rcu: Directly allocate page for single-argument
 case

Single-argument kvfree_rcu() must be invoked from sleepable contexts,
so we can directly allocate pages.  Furthermmore, the fallback in case
of page-allocation failure is the high-latency synchronize_rcu(), so it
makes sense to do these page allocations from the fastpath, and even to
permit limited sleeping within the allocator.

This commit therefore allocates if needed on the fastpath using
GFP_KERNEL|__GFP_NORETRY.  This also has the beneficial effect
of leaving kvfree_rcu()'s per-CPU caches to the double-argument variant
of kvfree_rcu(), given that the double-argument variant cannot directly
invoke the allocator.

[ paulmck: Add add_ptr_to_bulk_krc_lock header comment per Michal Hocko. ]
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/rcu/tree.c | 51 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e04e336bee42..e450c17a06b2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3465,37 +3465,59 @@ run_page_cache_worker(struct kfree_rcu_cpu *krcp)
 	}
 }
 
+// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
+// state specified by flags.  If can_alloc is true, the caller must
+// be schedulable and not be holding any locks or mutexes that might be
+// acquired by the memory allocator or anything that it might invoke.
+// Returns true if ptr was successfully recorded, else the caller must
+// use a fallback.
 static inline bool
-kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr, bool can_alloc)
 {
 	struct kvfree_rcu_bulk_data *bnode;
 	int idx;
 
-	if (unlikely(!krcp->initialized))
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
 		return false;
 
-	lockdep_assert_held(&krcp->lock);
 	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bkvhead[idx] ||
-			krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
-		bnode = get_cached_bnode(krcp);
-		/* Switch to emergency path. */
+	if (!(*krcp)->bkvhead[idx] ||
+			(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
+		if (!bnode && can_alloc) {
+			krc_this_cpu_unlock(*krcp, *flags);
+
+			// __GFP_NORETRY - allows a light-weight direct reclaim
+			// what is OK from minimizing of fallback hitting point of
+			// view. Apart of that it forbids any OOM invoking what is
+			// also beneficial since we are about to release memory soon.
+			//
+			// __GFP_NOWARN - it is supposed that an allocation can
+			// be failed under low memory or high memory pressure
+			// scenarios.
+			bnode = (struct kvfree_rcu_bulk_data *)
+				__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+			*krcp = krc_this_cpu_lock(flags);
+		}
+
 		if (!bnode)
 			return false;
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bkvhead[idx];
+		bnode->next = (*krcp)->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bkvhead[idx] = bnode;
+		(*krcp)->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bkvhead[idx]->records
-		[krcp->bkvhead[idx]->nr_records++] = ptr;
+	(*krcp)->bkvhead[idx]->records
+		[(*krcp)->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
@@ -3533,8 +3555,6 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		ptr = (unsigned long *) func;
 	}
 
-	krcp = krc_this_cpu_lock(&flags);
-
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
@@ -3542,12 +3562,11 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 			  __func__, head);
 
 		// Mark as success and leave.
-		success = true;
-		goto unlock_return;
+		return;
 	}
 
 	kasan_record_aux_stack(ptr);
-	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
 	if (!success) {
 		run_page_cache_worker(krcp);
 
-- 
2.20.1
<snip>

--
Vlad Rezki

next prev parent reply	other threads:[~2021-01-28 18:17 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-20 16:21 [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument Uladzislau Rezki (Sony)
2021-01-20 16:21 ` [PATCH 2/3] kvfree_rcu: Use __GFP_NOMEMALLOC for single-argument kvfree_rcu() Uladzislau Rezki (Sony)
2021-01-28 18:06   ` Uladzislau Rezki
2021-01-20 16:21 ` [PATCH 3/3] kvfree_rcu: use migrate_disable/enable() Uladzislau Rezki (Sony)
2021-01-20 19:45   ` Sebastian Andrzej Siewior
2021-01-20 21:42     ` Paul E. McKenney
2021-01-23  9:31   ` 回复: " Zhang, Qiang
2021-01-24 21:57     ` Uladzislau Rezki
2021-01-25  1:50       ` 回复: " Zhang, Qiang
2021-01-25  2:18         ` Zhang, Qiang
2021-01-25 13:49           ` Uladzislau Rezki
2021-01-26  9:33             ` 回复: " Zhang, Qiang
2021-01-26 13:43               ` Uladzislau Rezki
2021-01-20 18:40 ` [PATCH 1/3] kvfree_rcu: Allocate a page for a single argument Paul E. McKenney
2021-01-20 19:57 ` Sebastian Andrzej Siewior
2021-01-20 21:54   ` Paul E. McKenney
2021-01-21 13:35     ` Uladzislau Rezki
2021-01-21 15:07       ` Paul E. McKenney
2021-01-21 19:17         ` Uladzislau Rezki
2021-01-22 11:17     ` Sebastian Andrzej Siewior
2021-01-22 15:28       ` Paul E. McKenney
2021-01-21 12:38   ` Uladzislau Rezki
2021-01-22 11:34     ` Sebastian Andrzej Siewior
2021-01-22 14:21       ` Uladzislau Rezki
2021-01-25 13:22 ` Michal Hocko
2021-01-25 14:31   ` Uladzislau Rezki
2021-01-25 15:39     ` Michal Hocko
2021-01-25 16:25       ` Uladzislau Rezki
2021-01-28 15:11         ` Uladzislau Rezki
2021-01-28 15:17           ` Michal Hocko
2021-01-28 15:30             ` Uladzislau Rezki
2021-01-28 18:02               ` Uladzislau Rezki [this message]
     [not found]                 ` <YBPNvbJLg56XU8co@dhcp22.suse.cz>
2021-01-29 16:35                   ` Uladzislau Rezki
2021-02-01 11:47                     ` Michal Hocko
2021-02-01 14:44                       ` Uladzislau Rezki
2021-02-03 19:37                       ` Paul E. McKenney

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e04e336bee4 dfblob:e450c17a06b )
 OR (
bs:"kvfree_rcu: Directly allocate page for single-argument" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210128180237.GA5144@pc638.lan \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=dja@axtens.net \
    --cc=frederic@kernel.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=mpe@ellerman.id.au \
    --cc=neeraju@codeaurora.org \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.