Re: [RFC-PROTOTYPE 1/1] mm: Add __GFP_FAST_TRY flag

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Uladzislau Rezki <urezki@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Joel Fernandes <joel@joelfernandes.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [RFC-PROTOTYPE 1/1] mm: Add __GFP_FAST_TRY flag
Date: Tue, 4 Aug 2020 21:46:25 +0200	[thread overview]
Message-ID: <20200804194625.GA29837@pc636> (raw)
In-Reply-To: <1d50a46a-b97f-96b2-8a5c-21075f022f01@suse.cz>

On Tue, Aug 04, 2020 at 07:02:14PM +0200, Vlastimil Babka wrote:
> On 8/3/20 6:30 PM, Uladzislau Rezki (Sony) wrote:
> > Some background and kfree_rcu()
> > ===============================
> > The pointers to be freed are stored in the per-cpu array to improve
> > performance, to enable an easier-to-use API, to accommodate vmalloc
> > memmory and to support a single argument of the kfree_rcu() when only
> > a pointer is passed. More details are below.
> > 
> > In order to maintain such per-CPU arrays there is a need in dynamic
> > allocation when a current array is fully populated and a new block is
> > required. See below the example:
> > 
> >  0 1 2 3      0 1 2 3
> > |p|p|p|p| -> |p|p|p|p| -> NULL
> > 
> > there are two pointer-blocks, each one can store 4 addresses
> > which will be freed after a grace period is passed. In reality
> > we store PAGE_SIZE / sizeof(void *).
> 
> So what do you actually have without the dynamic allocation, 8 addresses or
> PAGE_SIZE / sizeof(void *) addresses? And how many dynamically allocated pages
> did you observe you might need in practice? Can it be somehow quantified the
> benefit that you are able to allocate up to X pages dynamically from the
> pcplists, vs a fixed number of pages held just for that purpose + fallback?
> 
We have PAGE_SIZE / sizeof(void *). The above ASCI was an example :)

Answering the second question about fixed number of preloaded pages. Please see
some concerns:

- It is hard to achieve because the logic does not stick to certain static test
  case, i.e. it depends on how heavily kfree_rcu(single/double) are used. Based
  on that, "how heavily" - number of pages are formed, until the drain/reclaimer
  thread frees them.

- Preloading pages and keeping them for internal use, IMHO, seems not optimal
  from the point of resources wasting. It is better to have a fast mechanism to
  request a page and release it back for needs of others. As described about
  we do not know how much we will need.

- As for fallback. That is something we would like to avoid(please see the cover letter).
  Just mention here one concern. For single argument it an entrance to synchronize_rcu()
  that can significantly slow down the reclamation process. What actually we would like
  to speed up.

> 
> > A number of pre-fetched elements seems does not depend on amount of the
> > physical memory in a system. In my case it is 63 pages. This step is not
> 
> It may depend, if you tune vm.percpu_pagelist_fraction sysctl. But I wouldn't
> know the exact formulas immediately. See pageset_set_high_and_batch(). In any
> case for your purpose the 'high' value (in e.g. /proc/zoneinfo) is more relevant
> (it means the maximum pages you might find cached) for you than the 'batch' (how
> much is cached in one refill).
> 
Thanks. I will have a look at it :) it is good that we can control it!

> > lock-less. It uses spinlock_t for accessing to the body's zone. This
> > step is fully covered in the rmqueue_bulk() function.
> > 
> > Summarizing. The __GFP_FAST_TRY covers only [1] and can not do step [2],
> > due to the fact that [2] acquires spinlock_t. It implies that it is super
> > fast, but a higher rate of fails is also expected.
> > 
> > Usage: __get_free_page(__GFP_FAST_TRY);
> > 
> > 2) There was a proposal from Matthew Wilcox: https://lkml.org/lkml/2020/7/31/1015
> > 
> > <snip>
> > On non-RT, we could make that lock a raw spinlock.  On RT, we could
> > decline to take the lock.  We'd need to abstract the spin_lock() away
> > behind zone_lock(zone), but that should be OK.
> > <snip>
> > 
> > It would be great to use any existing flag, say GFP_NOWAIT. Suppose we
> > decline to take the lock across the page allocator for RT. But there is
> > at least one path that does it outside of the page allocator. GFP_NOWAIT
> > can wakeup the kswapd, whereas a "wake-up path" uses sleepable lock:
> > 
> > wakeup_kswapd() -> wake_up_interruptible(&pgdat->kswapd_wait).
> > 
> > Probably it can be fixed by the excluding of waking of the kswapd process
> > defining something like below:
> 
> Is something missing here?
> 
I was talking about: how to bypass waking up of the kswapd that uses
sleepable lock. So, __get_free_page(0) will give a trick. But of course
that is not enough. Because we have prefeatchin pcpl-logic also. 

> > what is equal to zero and i am not sure if __get_free_page(0) handles
> > all that correctly, though it allocates and seems working on my test
> > machine! Please note it is related to "if we can reuse existing flags".
> > 
> > In the meantime, please see below for a patch that adds a __GFP_FAST_TRY,
> > which can at least serve as a baseline against which other proposals can
> > be compared. The patch is based on the 5.8.0-rc3.
> > 
> > Please RFC.
> 
> At first glance __GFP_FAST_TRY (more descriptive name? __GFP_NO_LOCKS?) seems
> better than doing weird things with GFP_NOWAIT, but depends on the real benefits
> (hence my first questions).
> 
No, i do not want to break GFP_NOWAIT, as Matthew mentioned later :)
__GFP_NO_LOCKS looks nice. I think, something like "TRY" should be added as well.
For example __GFP_NO_LOCKS_FAST_TRY.

I am glad for the reaction on it :)

Thank you, Vlastimil! 

--
Vlad Rezki

     prev parent reply	other threads:[~2020-08-04 19:46 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-03 16:30 [RFC-PROTOTYPE 1/1] mm: Add __GFP_FAST_TRY flag Uladzislau Rezki (Sony)
2020-08-04 17:02 ` Vlastimil Babka
2020-08-04 17:12   ` Matthew Wilcox
2020-08-04 17:34     ` Vlastimil Babka
2020-08-04 21:04       ` Uladzislau Rezki
2020-08-04 19:47     ` Uladzislau Rezki
2020-08-04 19:46   ` Uladzislau Rezki [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200804194625.GA29837@pc636 \
    --to=urezki@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleksiy.avramchenko@sonymobile.com \
    --cc=paulmck@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.