From: NeilBrown <neilb@suse.de>
To: David Rientjes <rientjes@google.com>, Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 1/2] mm: clarify __GFP_MEMALLOC usage
Date: Sat, 04 Apr 2020 08:23:45 +1100
Message-ID: <87blo8xnz2.fsf@notabene.neil.brown.name>
In-Reply-To: <alpine.DEB.2.21.2004031238571.230548@chino.kir.corp.google.com>

On Fri, Apr 03 2020, David Rientjes wrote:

> On Fri, 3 Apr 2020, Michal Hocko wrote:
>
>> From: Michal Hocko <mhocko@suse.com>
>> 
>> It seems that the existing documentation is not explicit enough about
>> the expected usage and potential risks. While it calls out that users
>> have to free memory when using this flag, it is not really apparent
>> that users have to be careful not to deplete memory reserves and that
>> they should implement some sort of throttling wrt. the freeing
>> process.
>> 
>> This is partly based on Neil's explanation [1].
>> 
>> [1] http://lkml.kernel.org/r/877dz0yxoa.fsf@notabene.neil.brown.name
>> Signed-off-by: Michal Hocko <mhocko@suse.com>
>> ---
>>  include/linux/gfp.h | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index e5b817cb86e7..e3ab1c0d9140 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -110,6 +110,9 @@ struct vm_area_struct;
>>   * the caller guarantees the allocation will allow more memory to be freed
>>   * very shortly e.g. process exiting or swapping. Users either should
>>   * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
>> + * Users of this flag have to be extremely careful to not deplete the reserve
>> + * completely and implement a throttling mechanism which controls the consumption
>> + * of the reserve based on the amount of freed memory.
>>   *
>>   * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
>>   * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
>
> Hmm, any guidance that we can offer to users of this flag that aren't 
> aware of __GFP_MEMALLOC internals?  If I were to read this and not be 
> aware of the implementation, I would ask "how do I know when I'm at risk 
> of depleting this reserve" especially since the amount of reserve is 
> controlled by sysctl.  How do I know when I'm risking a depletion of this 
> shared reserve?

"how do I know when I'm at risk of depleting this reserve" is definitely
the wrong question to be asking.  The questions to ask are:
- how little memory do I need to ensure forward progress?
- how quick will that forward progress be?

In the ideal case a small allocation will be all that is needed in order
for that allocation plus another page to be freed "quickly", in time
governed only by throughput to some device.  In that case you probably
don't need to worry about rate limiting.

The reason I brought up ratelimiting is that RCU is slow.  You can get
quite a lot of memory caught up in the kfree-rcu lists.  That's not much
of a problem for normal memory, but it might be for the more limited
reserves.

The other difficulty with the kfree_rcu case is that we have no idea
how many users there will be, so we cannot realistically model how long
the queue might get.  Compare with NFS swap-out, where the only user is
the VM swapping memory, which (I think?) already tries to pace writeout
with the speed of the device (or is that just writeback...).  I'm
clearly not sure of the details but it is a more constrained environment
so it is more predictable.
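
To make the rate-limiting idea concrete, here is a minimal userspace
sketch of a throttle that caps how much reserve memory one user may have
outstanding, charging on allocation and only uncharging once the memory
has actually been freed.  It is not kernel code: struct reserve_budget
and the budget_* helpers are hypothetical names invented for this
illustration, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical budget for allocations that dip into a shared reserve. */
struct reserve_budget {
	size_t limit;     /* most reserve bytes this user may hold at once */
	size_t in_flight; /* reserve bytes taken and not yet freed */
};

static void budget_init(struct reserve_budget *b, size_t limit)
{
	b->limit = limit;
	b->in_flight = 0;
}

/* Returns true if the caller may take `bytes` from the reserve now. */
static bool budget_try_charge(struct reserve_budget *b, size_t bytes)
{
	if (bytes > b->limit - b->in_flight)
		return false; /* over budget: wait for earlier frees */
	b->in_flight += bytes;
	return true;
}

/* Call only once the memory is really free, e.g. after a grace period. */
static void budget_uncharge(struct reserve_budget *b, size_t bytes)
{
	assert(bytes <= b->in_flight);
	b->in_flight -= bytes;
}
```

The point of the sketch is that consumption is tied to completed frees
rather than to queued ones, so a slow freeing path makes callers back off
instead of draining the reserve.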

In many cases, preallocating a private reserve is better than using
GFP_MEMALLOC.  That is what mempools provide and they are very effective
(though often way over-allocated*).
GFP_MEMALLOC was added because swap-over-NFS requires lots of different
allocations (transmit headers, receive buffers, possible routing changes
etc), many of them in the network layer which is very sensitive
to latency (and mempools require a spinlock to get the reserves).
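
As a rough illustration of the private-reserve idea, here is a toy
userspace model: try the normal allocator first, fall back to a small
preallocated reserve when it fails, and refill the reserve on free
before giving memory back.  This is a sketch under assumptions, not the
kernel's mempool implementation: every toy_* name is hypothetical,
malloc() stands in for the page allocator, there is no locking, and
where a real mempool can wait for an element to be returned this model
simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MAX 8 /* arbitrary cap for this sketch */

struct toy_pool {
	size_t elem_size;
	int min_nr;                  /* size of the private reserve */
	int curr_nr;                 /* elements currently in the reserve */
	void *(*alloc)(size_t);      /* normal allocator, may fail */
	void *elements[TOY_POOL_MAX];
};

/* Stand-in for memory pressure: the normal allocator always fails. */
static void *toy_fail_alloc(size_t size)
{
	(void)size;
	return NULL;
}

static void toy_pool_destroy(struct toy_pool *p)
{
	while (p->curr_nr > 0)
		free(p->elements[--p->curr_nr]);
}

/* Fill the reserve up front, while memory is still plentiful. */
static int toy_pool_init(struct toy_pool *p, int min_nr, size_t elem_size)
{
	if (min_nr < 1 || min_nr > TOY_POOL_MAX)
		return -1;
	p->elem_size = elem_size;
	p->min_nr = min_nr;
	p->curr_nr = 0;
	p->alloc = malloc;
	while (p->curr_nr < min_nr) {
		void *e = malloc(elem_size);
		if (!e) {
			toy_pool_destroy(p);
			return -1;
		}
		p->elements[p->curr_nr++] = e;
	}
	return 0;
}

static void *toy_pool_alloc(struct toy_pool *p)
{
	void *e = p->alloc(p->elem_size);
	if (e)
		return e;                          /* normal path */
	if (p->curr_nr > 0)
		return p->elements[--p->curr_nr];  /* dip into the reserve */
	return NULL;                           /* reserve exhausted */
}

static void toy_pool_free(struct toy_pool *p, void *e)
{
	if (p->curr_nr < p->min_nr)
		p->elements[p->curr_nr++] = e;     /* refill the reserve first */
	else
		free(e);
}
```

With min_nr set to 2, two allocations can still succeed while the normal
allocator is failing, and each free makes one more possible, which is
all that forward progress needs.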

Maybe the documentation should say:
 Don't use this - use a mempool.  Here be dragons.

I'm not sure you can really say anything more useful without writing a
long essay.

NeilBrown

(*) mempool sizes should not exceed 2 without measurements demonstrating
that more provides better throughput. Many are 2 (BIO_POOL_SIZE is 2,
which is perfect), but some aren't.
 #define DRBD_MIN_POOL_PAGES       128
way too big!
 #define MIN_IOS 256
even bigger!
 mempool_create_page_pool(2 * (F2FS_IO_SIZE(sbi) - 1), 0);
This is really wrong.  If the IO size is relevant, then each object in
the pool needs to be that size.  Having that many objects in the pool
doesn't mean anything useful.


