Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
Date: Tue, 10 Dec 2013 07:50:59 +0000	[thread overview]
Message-ID: <20131210075059.GA11295@suse.de> (raw)
In-Reply-To: <alpine.DEB.2.02.1312091402580.11026@chino.kir.corp.google.com>

On Mon, Dec 09, 2013 at 02:03:45PM -0800, David Rientjes wrote:
> If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
> can potentially loop forever in the page allocator.  In this case, it's
> better to give them the ability to access below watermarks so that they
> may allocate similar to the same privilege given to GFP_ATOMIC
> allocations.
> 
> We're careful to ensure this is only done after direct reclaim has had
> the chance to free memory, however.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

The main problem with doing something like this is that it just smacks
into the adjusted watermark if there are a number of __GFP_NOFAIL. Who
was the user of __GFP_NOFAIL that was fixed by this patch?

It appears there are more __GFP_NOFAIL users than I expected and some of
them are silly. md uses it after mempool_alloc fails GFP_ATOMIC and then
immediately calls with __GFP_NOFAIL in a context that can sleep. It could
just have used GFP_NOIO for the mempool alloc which would "never" fail.

btrfs is using __GFP_NOFAIL to call the slab allocator for the extent
cache but also a kmalloc cache which is just dangerous. After this
patch, that thing can push the system below watermarks and then
effectively "leak" them to other !__GFP_NOFAIL users.

Buffer cache uses __GFP_NOFAIL to grow buffers where it expects the page
allocator can loop endlessly but again, allowing it to go below reserves
is just going to hit the same wall a short time later

gfs is using the flag with kmalloc slabs, same as btrfs this can "leak"
the reserves. jbd is the same although jbd2 avoids using the flag in a
manner of speaking.

There are enough bad users of __GFP_NOFAIL that I really question how
good an idea it is to allow emergency reserves to be used when they are
potentially leaked to other !__GFP_NOFAIL users via the slab allocator
shortly afterwards.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mgorman@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
Date: Tue, 10 Dec 2013 07:50:59 +0000	[thread overview]
Message-ID: <20131210075059.GA11295@suse.de> (raw)
In-Reply-To: <alpine.DEB.2.02.1312091402580.11026@chino.kir.corp.google.com>

On Mon, Dec 09, 2013 at 02:03:45PM -0800, David Rientjes wrote:
> If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
> can potentially loop forever in the page allocator.  In this case, it's
> better to give them the ability to access below watermarks so that they
> may allocate similar to the same privilege given to GFP_ATOMIC
> allocations.
> 
> We're careful to ensure this is only done after direct reclaim has had
> the chance to free memory, however.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

The main problem with doing something like this is that it just smacks
into the adjusted watermark if there are a number of __GFP_NOFAIL. Who
was the user of __GFP_NOFAIL that was fixed by this patch?

It appears there are more __GFP_NOFAIL users than I expected and some of
them are silly. md uses it after mempool_alloc fails GFP_ATOMIC and then
immediately calls with __GFP_NOFAIL in a context that can sleep. It could
just have used GFP_NOIO for the mempool alloc which would "never" fail.

btrfs is using __GFP_NOFAIL to call the slab allocator for the extent
cache but also a kmalloc cache which is just dangerous. After this
patch, that thing can push the system below watermarks and then
effectively "leak" them to other !__GFP_NOFAIL users.

Buffer cache uses __GFP_NOFAIL to grow buffers where it expects the page
allocator can loop endlessly but again, allowing it to go below reserves
is just going to hit the same wall a short time later

gfs is using the flag with kmalloc slabs, same as btrfs this can "leak"
the reserves. jbd is the same although jbd2 avoids using the flag in a
manner of speaking.

There are enough bad users of __GFP_NOFAIL that I really question how
good an idea it is to allow emergency reserves to be used when they are
potentially leaked to other !__GFP_NOFAIL users via the slab allocator
shortly afterwards.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2013-12-10  7:51 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-09 22:03 [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim David Rientjes
2013-12-09 22:03 ` David Rientjes
2013-12-10  7:50 ` Mel Gorman [this message]
2013-12-10  7:50   ` Mel Gorman
2013-12-10 23:03   ` David Rientjes
2013-12-10 23:03     ` David Rientjes
2013-12-11  9:26     ` Mel Gorman
2013-12-11  9:26       ` Mel Gorman
2013-12-12  1:10     ` Dave Chinner
2013-12-12  1:10       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131210075059.GA11295@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.