[patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim

* [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
@ 2013-12-09 22:03 David Rientjes
  2013-12-10  7:50 ` Mel Gorman
  0 siblings, 1 reply; 5+ messages in thread
From: David Rientjes @ 2013-12-09 22:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Michal Hocko, linux-mm, linux-kernel

If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
can potentially loop forever in the page allocator.  In this case, it's
better to give them the ability to access below watermarks so that they
may allocate similar to the same privilege given to GFP_ATOMIC
allocations.

We're careful to ensure this is only done after direct reclaim has had
the chance to free memory, however.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2629,6 +2629,11 @@ rebalance:
 						pages_reclaimed)) {
 		/* Wait for some write requests to complete then retry */
 		wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
+
+		/* Allocations that cannot fail must allocate from somewhere */
+		if (gfp_mask & __GFP_NOFAIL)
+			alloc_flags |= ALLOC_HARDER;
+
 		goto rebalance;
 	} else {
 		/*
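
For readers unfamiliar with ALLOC_HARDER, the following is a minimal
userspace sketch of how the flag relaxes the watermark test. It assumes
the one-quarter reduction applied by the allocator's watermark check in
kernels of this era. The name watermark_ok and the flag values are
illustrative, and allocation order, lowmem reserves and CMA accounting
are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; only their distinctness matters here. */
#define ALLOC_HARDER 0x10
#define ALLOC_HIGH   0x20

/*
 * Simplified model of the watermark test for an order-0 request.
 * Returns true when the allocation may proceed.
 */
static bool watermark_ok(long free_pages, long mark, int alloc_flags)
{
	long min = mark;

	assert(mark >= 0);
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;		/* may use half of the reserve */
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;		/* may use a further quarter */

	return free_pages > min;
}
```

With a min watermark of 128 pages and 100 pages free, the plain check
fails while the ALLOC_HARDER check passes, because the effective limit
drops to 96 pages.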

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
  2013-12-09 22:03 [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim David Rientjes
@ 2013-12-10  7:50 ` Mel Gorman
  2013-12-10 23:03   ` David Rientjes
  0 siblings, 1 reply; 5+ messages in thread
From: Mel Gorman @ 2013-12-10  7:50 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Mon, Dec 09, 2013 at 02:03:45PM -0800, David Rientjes wrote:
> If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
> can potentially loop forever in the page allocator.  In this case, it's
> better to give them the ability to access below watermarks so that they
> may allocate similar to the same privilege given to GFP_ATOMIC
> allocations.
> 
> We're careful to ensure this is only done after direct reclaim has had
> the chance to free memory, however.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>

The main problem with doing something like this is that it just smacks
into the adjusted watermark if there are a number of __GFP_NOFAIL. Who
was the user of __GFP_NOFAIL that was fixed by this patch?

It appears there are more __GFP_NOFAIL users than I expected and some of
them are silly. md uses it after mempool_alloc fails GFP_ATOMIC and then
immediately calls with __GFP_NOFAIL in a context that can sleep. It could
just have used GFP_NOIO for the mempool alloc which would "never" fail.

btrfs is using __GFP_NOFAIL to call the slab allocator for the extent
cache but also a kmalloc cache which is just dangerous. After this
patch, that thing can push the system below watermarks and then
effectively "leak" them to other !__GFP_NOFAIL users.

Buffer cache uses __GFP_NOFAIL to grow buffers where it expects the page
allocator can loop endlessly but again, allowing it to go below reserves
is just going to hit the same wall a short time later.

gfs is using the flag with kmalloc slabs, same as btrfs this can "leak"
the reserves. jbd is the same although jbd2 avoids using the flag in a
manner of speaking.

There are enough bad users of __GFP_NOFAIL that I really question how
good an idea it is to allow emergency reserves to be used when they are
potentially leaked to other !__GFP_NOFAIL users via the slab allocator
shortly afterwards.
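
The "leak" through the slab allocator can be illustrated with a toy
model. This is only a sketch under stated assumptions: toy_cache,
toy_cache_alloc and OBJS_PER_SLAB are hypothetical names, and the real
slab allocators are far more involved. The point it shows is that once
a page has been taken from the reserves to back a cache, the remaining
objects on that page are handed out to any caller without a further
watermark check.

```c
#include <assert.h>
#include <stdbool.h>

#define OBJS_PER_SLAB 8	/* hypothetical number of objects per slab page */

struct toy_cache {
	int free_objs;			/* objects left on the current page */
	bool page_from_reserve;		/* page was taken below the watermark */
	int reserve_objs_to_normal;	/* reserve-backed objects given to
					 * callers without the nofail flag */
};

/* Hand out one object, growing the cache by one page when it is empty. */
static bool toy_cache_alloc(struct toy_cache *c, bool nofail,
			    bool above_watermark)
{
	if (c->free_objs == 0) {
		/* Only a nofail caller may dip below the watermark. */
		if (!above_watermark && !nofail)
			return false;
		c->free_objs = OBJS_PER_SLAB;
		c->page_from_reserve = !above_watermark;
	}

	c->free_objs--;
	if (c->page_from_reserve && !nofail)
		c->reserve_objs_to_normal++;
	return true;
}
```

In this model a single nofail allocation below the watermark lets the
next seven ordinary allocations succeed from reserve memory.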

-- 
Mel Gorman
SUSE Labs

* Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
  2013-12-10  7:50 ` Mel Gorman
@ 2013-12-10 23:03   ` David Rientjes
  2013-12-11  9:26     ` Mel Gorman
  2013-12-12  1:10     ` Dave Chinner
  0 siblings, 2 replies; 5+ messages in thread
From: David Rientjes @ 2013-12-10 23:03 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Tue, 10 Dec 2013, Mel Gorman wrote:

> > If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
> > can potentially loop forever in the page allocator.  In this case, it's
> > better to give them the ability to access below watermarks so that they
> > may allocate similar to the same privilege given to GFP_ATOMIC
> > allocations.
> > 
> > We're careful to ensure this is only done after direct reclaim has had
> > the chance to free memory, however.
> > 
> > Signed-off-by: David Rientjes <rientjes@google.com>
> 
> The main problem with doing something like this is that it just smacks
> into the adjusted watermark if there are a number of __GFP_NOFAIL. Who
> was the user of __GFP_NOFAIL that was fixed by this patch?
> 

Nobody, it comes out of a memcg discussion where __GFP_NOFAIL allocations were
recently given the ability to bypass charges to the root memcg when the 
memcg has hit its limit since we disallow the oom killer to kill a process 
(for the same reason that the vast majority of __GFP_NOFAIL users, those 
that do GFP_NOFS | __GFP_NOFAIL, disallow the oom killer in the page 
allocator).

Without some other thread freeing memory, these allocations simply loop 
forever.  We probably don't want to reconsider the choice that prevents 
calling the oom killer in !__GFP_FS contexts since it will allow 
unnecessary oom killing when memory can actually be freed by another 
thread.
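
The behaviour described above can be summarised in a small decision
function. This is a rough sketch, not the allocator's actual slow path:
the TOY_GFP_* bits, the next_step enum and after_failed_reclaim are
hypothetical names, and every other retry or failure rule is collapsed
into STEP_OTHER.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define TOY_GFP_FS	0x1u
#define TOY_GFP_NOFAIL	0x2u

enum next_step {
	STEP_OOM_KILL,	/* invoke the oom killer to free memory */
	STEP_RETRY,	/* loop back and try the allocation again */
	STEP_OTHER	/* remaining retry/fail rules, not modelled here */
};

/* Decide what happens once direct reclaim has made no progress. */
static enum next_step after_failed_reclaim(unsigned int gfp)
{
	/* Only contexts that may enter the filesystem may oom kill. */
	if (gfp & TOY_GFP_FS)
		return STEP_OOM_KILL;
	/* A nofail request without the FS bit can only keep retrying. */
	if (gfp & TOY_GFP_NOFAIL)
		return STEP_RETRY;
	return STEP_OTHER;
}
```

A request modelled as TOY_GFP_NOFAIL without TOY_GFP_FS always yields
STEP_RETRY, which is the endless loop when no other thread frees memory.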

Since there are comments in both gfp.h and page_alloc.c that say no new 
users will be added, it seems legitimate to ensure that the allocation 
will at least have a chance of succeeding, but not to the point of depleting
memory reserves entirely.

> There are enough bad users of __GFP_NOFAIL that I really question how
> good an idea it is to allow emergency reserves to be used when they are
> potentially leaked to other !__GFP_NOFAIL users via the slab allocator
> shortly afterwards.
> 

You could make the same argument for GFP_ATOMIC which can also allow 
access to memory reserves.

* Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
  2013-12-10 23:03   ` David Rientjes
@ 2013-12-11  9:26     ` Mel Gorman
  2013-12-12  1:10     ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Mel Gorman @ 2013-12-11  9:26 UTC (permalink / raw)
  To: David Rientjes; +Cc: Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Tue, Dec 10, 2013 at 03:03:39PM -0800, David Rientjes wrote:
> On Tue, 10 Dec 2013, Mel Gorman wrote:
> 
> > > If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
> > > can potentially loop forever in the page allocator.  In this case, it's
> > > better to give them the ability to access below watermarks so that they
> > > may allocate similar to the same privilege given to GFP_ATOMIC
> > > allocations.
> > > 
> > > We're careful to ensure this is only done after direct reclaim has had
> > > the chance to free memory, however.
> > > 
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > 
> > The main problem with doing something like this is that it just smacks
> > into the adjusted watermark if there are a number of __GFP_NOFAIL. Who
> > was the user of __GFP_NOFAIL that was fixed by this patch?
> > 
> 
> Nobody, it comes out of a memcg discussion where __GFP_NOFAIL allocations were
> recently given the ability to bypass charges to the root memcg when the 
> memcg has hit its limit since we disallow the oom killer to kill a process 
> (for the same reason that the vast majority of __GFP_NOFAIL users, those 
> that do GFP_NOFS | __GFP_NOFAIL, disallow the oom killer in the page 
> allocator).
> 
> Without some other thread freeing memory, these allocations simply loop 
> forever.  We probably don't want to reconsider the choice that prevents 
> calling the oom killer in !__GFP_FS contexts since it will allow 
> unnecessary oom killing when memory can actually be freed by another 
> thread.
> 
> Since there are comments in both gfp.h and page_alloc.c that say no new 
> users will be added, it seems legitimate to ensure that the allocation 
> will at least have a chance of succeeding, but not to the point of depleting
> memory reserves entirely.
> 

Which __GFP_NOFAIL on its own does not guarantee if they just smack into
that barrier and cannot do anything. It changes the timing; it does not fix
the problem.

> > There are enough bad users of __GFP_NOFAIL that I really question how
> > good an idea it is to allow emergency reserves to be used when they are
> > potentially leaked to other !__GFP_NOFAIL users via the slab allocator
> > shortly afterwards.
> > 
> 
> You could make the same argument for GFP_ATOMIC which can also allow 
> access to memory reserves.

The critical difference being that GFP_ATOMIC callers typically can handle
NULL being returned to them. GFP_ATOMIC storms may starve !GFP_ATOMIC
requests but it does not cause the same types of problems that
__GFP_NOFAIL using reserves would.
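
The difference between the two kinds of caller can be sketched in
userspace. Everything here is hypothetical (toy_pool, toy_alloc,
send_or_drop, must_succeed); in particular the nofail-style loop is
given an artificial bound so that the example terminates, whereas the
loop under discussion has none.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocator with a fixed page budget. */
struct toy_pool {
	int pages_left;
};

/* Returns NULL when the pool is empty. */
static void *toy_alloc(struct toy_pool *p)
{
	static char page[1];

	if (p->pages_left == 0)
		return NULL;
	p->pages_left--;
	return page;
}

/* Atomic-style caller: a NULL return is handled by giving up. */
static int send_or_drop(struct toy_pool *p)
{
	void *buf = toy_alloc(p);

	if (!buf)
		return -1;	/* e.g. drop the work and let it be retried */
	return 0;
}

/*
 * Nofail-style caller: retries until the allocation succeeds.
 * Returns the number of tries, or -1 once max_tries is exhausted.
 */
static int must_succeed(struct toy_pool *p, int max_tries)
{
	int tries = 0;

	while (tries < max_tries) {
		tries++;
		if (toy_alloc(p))
			return tries;
	}
	return -1;
}
```

With an empty pool, send_or_drop() returns at once with an error, while
must_succeed() spins for as long as it is allowed to.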

-- 
Mel Gorman
SUSE Labs

* Re: [patch] mm, page_alloc: allow __GFP_NOFAIL to allocate below watermarks after reclaim
  2013-12-10 23:03   ` David Rientjes
  2013-12-11  9:26     ` Mel Gorman
@ 2013-12-12  1:10     ` Dave Chinner
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2013-12-12  1:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: Mel Gorman, Andrew Morton, Michal Hocko, linux-mm, linux-kernel

On Tue, Dec 10, 2013 at 03:03:39PM -0800, David Rientjes wrote:
> On Tue, 10 Dec 2013, Mel Gorman wrote:
> 
> > > If direct reclaim has failed to free memory, __GFP_NOFAIL allocations
> > > can potentially loop forever in the page allocator.  In this case, it's
> > > better to give them the ability to access below watermarks so that they
> > > may allocate similar to the same privilege given to GFP_ATOMIC
> > > allocations.
> > > 
> > > We're careful to ensure this is only done after direct reclaim has had
> > > the chance to free memory, however.
> > > 
> > > Signed-off-by: David Rientjes <rientjes@google.com>
> > 
> > The main problem with doing something like this is that it just smacks
> > into the adjusted watermark if there are a number of __GFP_NOFAIL. Who
> > was the user of __GFP_NOFAIL that was fixed by this patch?
> > 
> 
> Nobody, it comes out of a memcg discussion where __GFP_NOFAIL allocations were
> recently given the ability to bypass charges to the root memcg when the 
> memcg has hit its limit since we disallow the oom killer to kill a process 
> (for the same reason that the vast majority of __GFP_NOFAIL users, those 
> that do GFP_NOFS | __GFP_NOFAIL, disallow the oom killer in the page 
> allocator).
> 
> Without some other thread freeing memory, these allocations simply loop 
> forever.

So what is kswapd doing in this situation?

> Since there are comments in both gfp.h and page_alloc.c that say no new 
> users will be added, it seems legitimate to ensure that the allocation 
> will at least have a chance of succeeding, but not to the point of depleting
> memory reserves entirely.

As was said before, the filesystem will then simply keep allocating
memory until it hits the next limit, and then you're back in the
same situation. Moving the limit at which it fails does not solve
the problem at all.
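
A small simulation makes the "moving the limit" point concrete. It is
a sketch under simplifying assumptions: order-0 allocations only,
nothing is ever freed, and the lowered limit is modelled as the
watermark minus one quarter. The names allocs_until_stall and
harder_limit are made up for this example.

```c
#include <assert.h>

/*
 * Count how many single-page allocations succeed before the
 * watermark check starts failing, given that nothing is freed.
 */
static long allocs_until_stall(long free_pages, long min_pages)
{
	long n = 0;

	assert(min_pages >= 0);
	while (free_pages > min_pages) {
		free_pages--;
		n++;
	}
	return n;
}

/* Limit after a one-quarter reduction of the watermark. */
static long harder_limit(long mark)
{
	return mark - mark / 4;
}
```

Starting from 100 free pages with a watermark of 128, the unmodified
check stalls immediately and the lowered limit of 96 allows exactly
four more allocations before stalling in the same way.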

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
