Re: [PATCH 1/2] mm: clarify __GFP_MEMALLOC usage

All of lore.kernel.org
 help / color / mirror / Atom feed

From: John Hubbard <jhubbard@nvidia.com>
To: NeilBrown <neilb@suse.de>, Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	<linux-mm@kvack.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: clarify __GFP_MEMALLOC usage
Date: Mon, 6 Apr 2020 18:21:39 -0700	[thread overview]
Message-ID: <34cb5019-9fab-4dfd-db48-c3f88fe5614c@nvidia.com> (raw)
In-Reply-To: <875zecw1n6.fsf@notabene.neil.brown.name>

On 4/6/20 6:00 PM, NeilBrown wrote:
...
>>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>>> index e5b817cb86e7..9cacef1a3ee0 100644
>>> --- a/include/linux/gfp.h
>>> +++ b/include/linux/gfp.h
>>> @@ -110,6 +110,11 @@ struct vm_area_struct;
>>>     * the caller guarantees the allocation will allow more memory to be freed
>>>     * very shortly e.g. process exiting or swapping. Users either should
>>>     * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
>>> + * Users of this flag have to be extremely careful to not deplete the reserve
>>> + * completely and implement a throttling mechanism which controls the consumption
>>> + * of the reserve based on the amount of freed memory.
>>> + * Usage of a pre-allocated pool (e.g. mempool) should be always considered before
>>> + * using this flag.
> 
> I think this version is pretty good.
> 
>>>     *
>>>     * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves.
>>>     * This takes precedence over the %__GFP_MEMALLOC flag if both are set.
>>>
>>
>> Hi Michal and all,
>>
>> How about using approximately this wording instead? I found Neil's wording to be
>> especially helpful so I mixed it in. (Also fixed a couple of slight 80-col overruns.)
>>
>> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
>> index be2754841369..c247a911d8c7 100644
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -111,6 +111,15 @@ struct vm_area_struct;
>>     * very shortly e.g. process exiting or swapping. Users either should
>>     * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
>>     *
>> + * To be extra clear: users of __GFP_MEMALLOC must be working to free other
>> + * memory, and that other memory needs to be freed "soon"; specifically, before
>> + * the reserve is exhausted. This generally implies a throttling mechanism that
>> + * balances the amount of __GFP_MEMALLOC memory used against the amount that the
>> + * caller is about to free.
> 
> I don't like this change. "balances the amount ... is about to free"
> does say anything about time, so it doesn't seem to be about throttling.
> 
> I think it is hard to write rules because the rules are a bit spongey.
> 
> With mempools, we have a nice clear rule.  When you allocate from a
> mempool you must have a clear path to freeing that allocation which will
> not block on memory allocation except from a subordinate mempool.  This
> implies a partial ordering between mempools.  When you have layered
> block devices the path through the layers from filesystem down to
> hardware defines the order.  It isn't enforced, but it is quite easy to
> reason about.
> 
> GFP_MEMALLOC effectively provides multiple mempools.  So it could
> theoretically deadlock if multiple long dependency chains
> happened. i.e. if 1000 threads each make a GFP_MEMALLOC allocation and
> then need to make another one before the first can be freed - then you
> hit problems.  There is no formal way to guarantee that this doesn't
> happen.  We just say "be gentle" and minimize the users of this flag,
> and keep more memory in reserve than we really need.
> Note that 'threads' here might not be Linux tasks.  If you have an IO
> request that proceed asynchronously, moving from queue to queue and
> being handled by different task, then each one is a "thread" for the
> purpose of understanding mem-alloc dependency.
> 
> So maybe what I really should focus on is not how quickly things happen,
> but how many happen concurrently.  The idea of throttling is to allow
> previous requests to complete before we start too many more.
> 
> With Swap-over-NFS, some of the things that might need to be allocated
> are routing table entries.  These scale with the number of NFS servers
> rather than the number of IO requests, so they are not going to cause
> concurrency problems.
> We also need memory to store replies, but these never exceed the number
> of pending requests, so there is limited concurrency there.
> NFS can send a lot of requests in parallel, but the main limit is the
> RPC "slot table" and while that grows dynamically, it does so with
> GFP_NOFS, so it can block or fail (I wonder if that should explicitly
> disable the use of the reserves).
> 
> So there a limit on concurrency imposed by non-GFP_MEMALLOC allocations
> 
> So ... maybe the documentation should say that boundless concurrency of
> allocations (i.e. one module allocating a boundless number of times
> before previous allocations are freed) must be avoided.
> 

Well, that's a good discussion that you just wrote, above, and I think it
demonstrates that it's hard to describe the situation in just a couple of
sentences. With that in mind, perhaps it's best to take the above notes
as a starting point, adjust them slightly and drop them into
Documentation/core-api/memory-allocation.rst ?

Then the comments here could refer to it.


thanks,
-- 
John Hubbard
NVIDIA

next prev parent reply	other threads:[~2020-04-07  1:21 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-03  8:35 [PATCH 0/2] mm: few refinements to gfp flags documentation Michal Hocko
2020-04-03  8:35 ` [PATCH 1/2] mm: clarify __GFP_MEMALLOC usage Michal Hocko
2020-04-03 19:41   ` David Rientjes
2020-04-03 21:23     ` NeilBrown
2020-04-06  7:01       ` Michal Hocko
2020-04-06 19:02         ` John Hubbard
2020-04-06 23:32           ` David Rientjes
2020-04-06 23:40             ` John Hubbard
2020-04-14  2:15               ` Andrew Morton
2020-04-14  3:56                 ` NeilBrown
2020-04-14 19:05                   ` John Hubbard
2020-04-07  1:00           ` NeilBrown
2020-04-07  1:21             ` John Hubbard [this message]
2020-04-07  7:24             ` Michal Hocko
2020-04-03  8:35 ` [PATCH 2/2] mm: make it clear that gfp reclaim modifiers are valid only for sleepable allocations Michal Hocko
2020-04-03 19:41   ` David Rientjes
2020-04-07  1:38     ` Joel Fernandes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=34cb5019-9fab-4dfd-db48-c3f88fe5614c@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=neilb@suse.de \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.