From: NeilBrown <neilb@suse.com>
To: Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, Mikulas Patocka <mpatocka@redhat.com>,
Ondrej Kozina <okozina@redhat.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
dm-devel@redhat.com
Subject: Re: [RFC PATCH 1/2] mempool: do not consume memory reserves from the reclaim path
Date: Fri, 22 Jul 2016 11:41:34 +1000
Message-ID: <87vazy78kx.fsf@notabene.neil.brown.name>
In-Reply-To: <20160721145309.GR26379@dhcp22.suse.cz>
On Fri, Jul 22 2016, Michal Hocko wrote:
> On Thu 21-07-16 08:13:00, Johannes Weiner wrote:
>> On Thu, Jul 21, 2016 at 10:52:03AM +0200, Michal Hocko wrote:
>> > Look, there are
>> > $ git grep mempool_alloc | wc -l
>> > 304
>> >
>> > many users of this API and we do not want to flip the default behavior
>> > which has been in place for more than 10 years. So far you have been
>> > arguing about potential deadlocks and haven't shown any particular
>> > path with a direct or indirect dependency between mempool and the
>> > normal allocator that wouldn't itself be a bug. As a matter of fact,
>> > the change we are discussing here causes a regression. If you want to
>> > change the semantics of the mempool allocator then you are absolutely
>> > free to do so, but in a separate patch discussed with the IO people
>> > and other users. We _absolutely_ want to fix the regression first and
>> > have a simple fix for the 4.6 and 4.7 backports. At this moment there
>> > are a revert and patch 1 on the table. The latter should make your
>> > backtrace happy and is meant only as a temporary fix until we find
>> > out what is actually misbehaving on your systems. If you are not
>> > interested in pursuing that route, I will simply go with the revert.
>>
>> +1
>>
>> It's very unlikely that decade-old mempool semantics are suddenly a
>> fundamental livelock problem, when all the evidence we have is one
>> hang and vague speculation. Given that the patch causes regressions,
>> and that the bug is most likely elsewhere anyway, a full revert rather
>> than merely-less-invasive mempool changes makes the most sense to me.
>
> OK, fair enough. What do you think about the following then? Mikulas, I
> have dropped your Tested-by and Reviewed-by because the patch is
> different, but unless you have hit the OOM killer the testing results
> should be the same.
> ---
> From d64815758c212643cc1750774e2751721685059a Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Jul 2016 16:40:59 +0200
> Subject: [PATCH] Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are
> free elements"
>
> This reverts commit f9054c70d28bc214b2857cf8db8269f4f45a5e23.
>
> There has been a report about the OOM killer being invoked when swapping
> out to a dm-crypt device. The primary reason seems to be that the
> swapout IO managed to completely deplete the memory reserves. Ondrej was
> able to bisect the issue and explained it by pointing to f9054c70d28b
> ("mm, mempool: only set __GFP_NOMEMALLOC if there are free elements").
>
> The reason is that the swapout path is not throttled properly, because
> the md-raid layer needs to allocate from the generic_make_request path,
> which means it allocates from a PF_MEMALLOC context. The dm layer uses
> mempool_alloc in order to guarantee forward progress, and that call used
> to inhibit access to memory reserves when falling back to the page
> allocator. This changed with f9054c70d28b ("mm, mempool: only set
> __GFP_NOMEMALLOC if there are free elements"), which dropped the
> __GFP_NOMEMALLOC protection when the memory pool is depleted.
>
> If we are running out of memory and the only way to free memory is to
> perform swapout, we then just keep consuming memory reserves, rather
> than throttling the mempool allocations and letting the pending IO
> complete, until memory is depleted completely and there is no way
> forward but to invoke the OOM killer. This is less than optimal.
>
> The original intention of f9054c70d28b was to help with OOM situations
> where the oom victim depends on a mempool allocation to make forward
> progress. David mentioned the following backtrace:
>
> schedule
> schedule_timeout
> io_schedule_timeout
> mempool_alloc
> __split_and_process_bio
> dm_request
> generic_make_request
> submit_bio
> mpage_readpages
> ext4_readpages
> __do_page_cache_readahead
> ra_submit
> filemap_fault
> handle_mm_fault
> __do_page_fault
> do_page_fault
> page_fault
>
> We do not know more about why the mempool is depleted without being
> replenished in time, though. In any case, the dm layer shouldn't depend
> on any allocations outside of its dedicated pools, so forward progress
> should be guaranteed. If that is not the case then dm should be fixed,
> rather than papering over the problem and postponing it by accessing
> more memory reserves.
>
> mempools are a mechanism to maintain dedicated memory reserves that
> guarantee forward progress. Allowing them unbounded access to the page
> allocator's memory reserves goes against the whole purpose of this
> mechanism.
>
> Bisected-by: Ondrej Kozina <okozina@redhat.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/mempool.c | 20 ++++----------------
> 1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/mm/mempool.c b/mm/mempool.c
> index 8f65464da5de..5ba6c8b3b814 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -306,36 +306,25 @@ EXPORT_SYMBOL(mempool_resize);
> * returns NULL. Note that due to preallocation, this function
> * *never* fails when called from process contexts. (it might
> * fail if called from an IRQ context.)
> - * Note: neither __GFP_NOMEMALLOC nor __GFP_ZERO are supported.
> + * Note: using __GFP_ZERO is not supported.
> */
> -void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
> +void * mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
> {
> void *element;
> unsigned long flags;
> wait_queue_t wait;
> gfp_t gfp_temp;
>
> - /* If oom killed, memory reserves are essential to prevent livelock */
> - VM_WARN_ON_ONCE(gfp_mask & __GFP_NOMEMALLOC);
> - /* No element size to zero on allocation */
> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
> -
> might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
>
> + gfp_mask |= __GFP_NOMEMALLOC; /* don't allocate emergency reserves */
> gfp_mask |= __GFP_NORETRY; /* don't loop in __alloc_pages */
> gfp_mask |= __GFP_NOWARN; /* failures are OK */
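
To make the restored semantics concrete, here is a minimal sketch of the
flow the revert reinstates (hypothetical names throughout; this is not
the actual mm/mempool.c code): the caller's gfp mask is unconditionally
widened with __GFP_NOMEMALLOC, so the underlying allocator fails early
instead of dipping into the emergency reserves, and forward progress
then comes from the pool's own preallocated elements.

/*
 * Illustrative sketch only: sketch_mempool and its members are made-up
 * stand-ins for the real mempool_t internals.
 */
typedef unsigned int gfp_t;

#define __GFP_NOMEMALLOC (1u << 0)  /* never touch emergency reserves */
#define __GFP_NORETRY    (1u << 1)  /* fail rather than loop */
#define __GFP_NOWARN     (1u << 2)  /* failures are expected, stay quiet */

struct sketch_mempool {
	void *(*alloc)(gfp_t gfp, void *data);  /* underlying allocator */
	void *(*take_preallocated)(struct sketch_mempool *pool);
	void *pool_data;
};

void *sketch_mempool_alloc(struct sketch_mempool *pool, gfp_t gfp_mask)
{
	void *element;

	/* Unconditional again after the revert: reserves are off limits. */
	gfp_mask |= __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;

	element = pool->alloc(gfp_mask, pool->pool_data);
	if (element)
		return element;

	/*
	 * The allocator refused; forward progress must come from the
	 * dedicated reserve (or from waiting until an element is freed).
	 */
	return pool->take_preallocated(pool);
}
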
As I was reading through this thread I kept thinking "Surely
mempool_alloc() should never ever allocate from emergency reserves.
Ever."
Then I saw this patch. It made me happy.
Thanks.
Acked-by: NeilBrown <neilb@suse.com>
(if you want it)
NeilBrown
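
P.S. for anyone less familiar with the API, the usage contract that
provides the forward-progress guarantee looks roughly like the
hypothetical driver sketch below (io_ctx, io_cache, io_pool and the
pool size are made-up; mempool_create_slab_pool(), mempool_alloc() and
mempool_free() are the real interfaces). Every element taken from the
pool is eventually returned with mempool_free(), and with
__GFP_DIRECT_RECLAIM set a depleted pool makes mempool_alloc() sleep
until that happens, which is precisely the throttling the changelog
wants restored.

#include <linux/errno.h>
#include <linux/mempool.h>
#include <linux/slab.h>

struct io_ctx {
	int dummy;		/* per-request bookkeeping would live here */
};

static struct kmem_cache *io_cache;	/* backing slab cache */
static mempool_t *io_pool;		/* dedicated reserve on top of it */

static int io_pool_init(void)
{
	io_cache = kmem_cache_create("io_ctx", sizeof(struct io_ctx),
				     0, 0, NULL);
	if (!io_cache)
		return -ENOMEM;

	/* Preallocate 16 elements: this is the forward-progress reserve. */
	io_pool = mempool_create_slab_pool(16, io_cache);
	if (!io_pool) {
		kmem_cache_destroy(io_cache);
		return -ENOMEM;
	}
	return 0;
}

static void submit_one_request(void)
{
	/*
	 * GFP_NOIO: may sleep, but must not recurse into the IO path.
	 * Because GFP_NOIO includes __GFP_DIRECT_RECLAIM, this call does
	 * not fail: when both the slab and the pool are empty it blocks
	 * until a completion hands an element back via mempool_free().
	 */
	struct io_ctx *ctx = mempool_alloc(io_pool, GFP_NOIO);

	/* ... drive the request; on completion the element goes back: */
	mempool_free(ctx, io_pool);
}
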