linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [patch] mm, page_alloc: make __GFP_NOFAIL really not fail
@ 2013-12-09 21:56 David Rientjes
  2013-12-09 23:22 ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: David Rientjes @ 2013-12-09 21:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Michal Hocko, linux-kernel, linux-mm

__GFP_NOFAIL specifies that the page allocator cannot fail to return
memory.  Allocators that call it may not even check for NULL upon
returning.

It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
actually return NULL.  More interestingly, processes that are doing
direct reclaim and have PF_MEMALLOC set may also return NULL for any
__GFP_NOFAIL allocation.

This patch fixes it so that the page allocator never actually returns
NULL as expected for __GFP_NOFAIL.  It turns out that no code actually
does anything as crazy as GFP_ATOMIC | __GFP_NOFAIL currently, so this
is more for correctness than a bug fix for that issue.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2535,17 +2535,19 @@ rebalance:
 		}
 	}
 
-	/* Atomic allocations - we can't balance anything */
-	if (!wait)
-		goto nopage;
-
-	/* Avoid recursion of direct reclaim */
-	if (current->flags & PF_MEMALLOC)
-		goto nopage;
-
-	/* Avoid allocations with no watermarks from looping endlessly */
-	if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
-		goto nopage;
+	if (likely(!(gfp_mask & __GFP_NOFAIL))) {
+		/* Atomic allocations - we can't balance anything */
+		if (!wait)
+			goto nopage;
+
+		/* Avoid recursion of direct reclaim */
+		if (current->flags & PF_MEMALLOC)
+			goto nopage;
+
+		/* Avoid allocations with no watermarks from looping forever */
+		if (test_thread_flag(TIF_MEMDIE))
+			goto nopage;
+	}
 
 	/*
 	 * Try direct compaction. The first pass is asynchronous. Subsequent

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail
  2013-12-09 21:56 [patch] mm, page_alloc: make __GFP_NOFAIL really not fail David Rientjes
@ 2013-12-09 23:22 ` Andrew Morton
  2013-12-10 23:20   ` David Rientjes
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2013-12-09 23:22 UTC (permalink / raw)
  To: David Rientjes; +Cc: Mel Gorman, Michal Hocko, linux-kernel, linux-mm

On Mon, 9 Dec 2013 13:56:37 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> __GFP_NOFAIL specifies that the page allocator cannot fail to return
> memory.  Allocators that call it may not even check for NULL upon
> returning.
> 
> It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
> actually return NULL.  More interestingly, processes that are doing
> direct reclaim and have PF_MEMALLOC set may also return NULL for any
> __GFP_NOFAIL allocation.

__GFP_NOFAIL is a nasty thing and making it pretend to work even better
is heading in the wrong direction, surely?  It would be saner to just
disallow these even-sillier combinations.  Can we fix up the current
callers then stick a WARN_ON() in there?

> This patch fixes it so that the page allocator never actually returns
> NULL as expected for __GFP_NOFAIL.  It turns out that no code actually
> does anything as crazy as GFP_ATOMIC | __GFP_NOFAIL currently, so this
> is more for correctness than a bug fix for that issue.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail
  2013-12-09 23:22 ` Andrew Morton
@ 2013-12-10 23:20   ` David Rientjes
  2013-12-10 23:39     ` Andrew Morton
  2013-12-11  0:19     ` [patch alternative] mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure David Rientjes
  0 siblings, 2 replies; 9+ messages in thread
From: David Rientjes @ 2013-12-10 23:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Michal Hocko, linux-kernel, linux-mm

On Mon, 9 Dec 2013, Andrew Morton wrote:

> > __GFP_NOFAIL specifies that the page allocator cannot fail to return
> > memory.  Allocators that call it may not even check for NULL upon
> > returning.
> > 
> > It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
> > actually return NULL.  More interestingly, processes that are doing
> > direct reclaim and have PF_MEMALLOC set may also return NULL for any
> > __GFP_NOFAIL allocation.
> 
> __GFP_NOFAIL is a nasty thing and making it pretend to work even better
> is heading in the wrong direction, surely?  It would be saner to just
> disallow these even-sillier combinations.  Can we fix up the current
> callers then stick a WARN_ON() in there?
> 

Heh, it's difficult to remove __GFP_NOFAIL when new users get added: 
84235de394d9 ("fs: buffer: move allocation failure loop into the 
allocator") added a new user and a bypass of memcg limits in oom 
conditions so __GFP_NOFAIL just essentially became 
__GFP_BYPASS_MEMCG_LIMIT_ON_OOM.

We can probably ignore the PF_MEMALLOC behavior since it allows full 
access to memory reserves and the only time we would see a __GFP_NOFAIL 
allocation fail in such a context is if every zone's free memory was 0.  
We have bigger problems if memory reserves are completely depleted like 
that, so it's probably sufficient not to address it.

I'd be concerned about new users of __GFP_NOFAIL that are added for 
GFP_NOWAIT or GFP_ATOMIC and never actually trigger such a warning because 
in testing they never trigger the slowpath, but the conditional is 
probably better placed outside of the fastpath:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2536,8 +2536,15 @@ rebalance:
 	}
 
 	/* Atomic allocations - we can't balance anything */
-	if (!wait)
+	if (!wait) {
+		/*
+		 * All existing users of the deprecated __GFP_NOFAIL are
+		 * blockable, so warn of any new users that actually allow this
+		 * type of allocation to fail.
+		 */
+		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
 		goto nopage;
+	}
 
 	/* Avoid recursion of direct reclaim */
 	if (current->flags & PF_MEMALLOC)

But perhaps the best way to do this in a preventative way is to add a 
warning to checkpatch.pl that actually warns about adding new users.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail
  2013-12-10 23:20   ` David Rientjes
@ 2013-12-10 23:39     ` Andrew Morton
  2013-12-11  0:11       ` David Rientjes
  2013-12-12  1:07       ` Dave Chinner
  2013-12-11  0:19     ` [patch alternative] mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure David Rientjes
  1 sibling, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2013-12-10 23:39 UTC (permalink / raw)
  To: David Rientjes; +Cc: Mel Gorman, Michal Hocko, linux-kernel, linux-mm

On Tue, 10 Dec 2013 15:20:17 -0800 (PST) David Rientjes <rientjes@google.com> wrote:

> On Mon, 9 Dec 2013, Andrew Morton wrote:
> 
> > > __GFP_NOFAIL specifies that the page allocator cannot fail to return
> > > memory.  Allocators that call it may not even check for NULL upon
> > > returning.
> > > 
> > > It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
> > > actually return NULL.  More interestingly, processes that are doing
> > > direct reclaim and have PF_MEMALLOC set may also return NULL for any
> > > __GFP_NOFAIL allocation.
> > 
> > __GFP_NOFAIL is a nasty thing and making it pretend to work even better
> > is heading in the wrong direction, surely?  It would be saner to just
> > disallow these even-sillier combinations.  Can we fix up the current
> > callers then stick a WARN_ON() in there?
> > 
> 
> Heh, it's difficult to remove __GFP_NOFAIL when new users get added: 
> 84235de394d9 ("fs: buffer: move allocation failure loop into the 
> allocator") added a new user

That wasn't reeeeealy a new user - it was "convert an existing
open-coded retry-for-ever loop".  Which is what __GFP_NOFAIL is for.

I don't think I've ever seen anyone actually fix one of these things
(by teaching the caller to handle ENOMEM), so it obviously isn't
working...

> and a bypass of memcg limits in oom 
> conditions so __GFP_NOFAIL just essentially became 
> __GFP_BYPASS_MEMCG_LIMIT_ON_OOM.
> 
> We can probably ignore the PF_MEMALLOC behavior since it allows full 
> access to memory reserves and the only time we would see a __GFP_NOFAIL 
> allocation fail in such a context is if every zone's free memory was 0.  
> We have bigger problems if memory reserves are completely depleted like 
> that, so it's probably sufficient not to address it.
> 
> I'd be concerned about new users of __GFP_NOFAIL that are added for 
> GFP_NOWAIT or GFP_ATOMIC and never actually trigger such a warning because 
> in testing they never trigger the slowpath, but the conditional is 
> probably better placed outside of the fastpath:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2536,8 +2536,15 @@ rebalance:
>  	}
>  
>  	/* Atomic allocations - we can't balance anything */
> -	if (!wait)
> +	if (!wait) {
> +		/*
> +		 * All existing users of the deprecated __GFP_NOFAIL are
> +		 * blockable, so warn of any new users that actually allow this
> +		 * type of allocation to fail.
> +		 */
> +		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
>  		goto nopage;
> +	}

Seems sensible.

> But perhaps the best way to do this in a preventative way is to add a 
> warning to checkpatch.pl that actually warns about adding new users.

yup.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail
  2013-12-10 23:39     ` Andrew Morton
@ 2013-12-11  0:11       ` David Rientjes
  2013-12-12  1:07       ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: David Rientjes @ 2013-12-11  0:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Michal Hocko, linux-kernel, linux-mm

On Tue, 10 Dec 2013, Andrew Morton wrote:

> > Heh, it's difficult to remove __GFP_NOFAIL when new users get added: 
> > 84235de394d9 ("fs: buffer: move allocation failure loop into the 
> > allocator") added a new user
> 
> That wasn't reeeeealy a new user - it was "convert an existing
> open-coded retry-for-ever loop".  Which is what __GFP_NOFAIL is for.
> 

No, it just looks like that's what it did.  find_or_create_page() in that 
function does an order-0 allocation which always implicitly __GFP_NOFAIL 
because of the should_alloc_retry() behavior.  So why does it need to add 
__GFP_NOFAIL there now?  Because it is now allowed to bypass memcg limits 
to the root memcg, which is new behavior with the patch.  It adds 
additional memcg powers that can't be duplicated in the caller, so now 
it's really become __GFP_BYPASS_MEMCG_LIMIT_ON_OOM for everything that was 
doing order-3 or smaller allocations, which should be all existing 
__GFP_NOFAIL users.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch alternative] mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
  2013-12-10 23:20   ` David Rientjes
  2013-12-10 23:39     ` Andrew Morton
@ 2013-12-11  0:19     ` David Rientjes
  2013-12-11  0:26       ` [patch] checkpatch: add warning of future __GFP_NOFAIL use David Rientjes
  1 sibling, 1 reply; 9+ messages in thread
From: David Rientjes @ 2013-12-11  0:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mel Gorman, Michal Hocko, linux-kernel, linux-mm

__GFP_NOFAIL may return NULL when coupled with GFP_NOWAIT or GFP_ATOMIC.

Luckily, nothing currently does such craziness.  So instead of causing
such allocations to loop (potentially forever), we maintain the current
behavior and also warn about the new users of the deprecated flag.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/page_alloc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2536,8 +2536,15 @@ rebalance:
 	}
 
 	/* Atomic allocations - we can't balance anything */
-	if (!wait)
+	if (!wait) {
+		/*
+		 * All existing users of the deprecated __GFP_NOFAIL are
+		 * blockable, so warn of any new users that actually allow this
+		 * type of allocation to fail.
+		 */
+		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
 		goto nopage;
+	}
 
 	/* Avoid recursion of direct reclaim */
 	if (current->flags & PF_MEMALLOC)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch] checkpatch: add warning of future __GFP_NOFAIL use
  2013-12-11  0:19     ` [patch alternative] mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure David Rientjes
@ 2013-12-11  0:26       ` David Rientjes
  2013-12-11  1:35         ` Joe Perches
  0 siblings, 1 reply; 9+ messages in thread
From: David Rientjes @ 2013-12-11  0:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Whitcroft, Joe Perches, Mel Gorman, Michal Hocko,
	linux-kernel, linux-mm

gfp.h and page_alloc.c already specify that __GFP_NOFAIL is deprecated and 
no new users should be added.

Add a warning to checkpatch to catch this.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 scripts/checkpatch.pl | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 9c98100..6667689 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -4114,6 +4114,12 @@ sub process {
 			     "$1 uses number as first arg, sizeof is generally wrong\n" . $herecurr);
 		}
 
+# check for GFP_NOWAIT use
+		if ($line =~ /\b__GFP_NOFAIL\b/) {
+			WARN("__GFP_NOFAIL",
+			     "Use of __GFP_NOFAIL is deprecated, no new users should be added\n" . $herecurr);
+		}
+
 # check for multiple semicolons
 		if ($line =~ /;\s*;\s*$/) {
 			if (WARN("ONE_SEMICOLON",

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [patch] checkpatch: add warning of future __GFP_NOFAIL use
  2013-12-11  0:26       ` [patch] checkpatch: add warning of future __GFP_NOFAIL use David Rientjes
@ 2013-12-11  1:35         ` Joe Perches
  0 siblings, 0 replies; 9+ messages in thread
From: Joe Perches @ 2013-12-11  1:35 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Andy Whitcroft, Mel Gorman, Michal Hocko,
	linux-kernel, linux-mm

On Tue, 2013-12-10 at 16:26 -0800, David Rientjes wrote:
> gfp.h and page_alloc.c already specify that __GFP_NOFAIL is deprecated and 
> no new users should be added.
> 
> Add a warning to checkpatch to catch this.

Fine by me.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch] mm, page_alloc: make __GFP_NOFAIL really not fail
  2013-12-10 23:39     ` Andrew Morton
  2013-12-11  0:11       ` David Rientjes
@ 2013-12-12  1:07       ` Dave Chinner
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2013-12-12  1:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Mel Gorman, Michal Hocko, linux-kernel, linux-mm

On Tue, Dec 10, 2013 at 03:39:09PM -0800, Andrew Morton wrote:
> On Tue, 10 Dec 2013 15:20:17 -0800 (PST) David Rientjes <rientjes@google.com> wrote:
> 
> > On Mon, 9 Dec 2013, Andrew Morton wrote:
> > 
> > > > __GFP_NOFAIL specifies that the page allocator cannot fail to return
> > > > memory.  Allocators that call it may not even check for NULL upon
> > > > returning.
> > > > 
> > > > It turns out GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL can
> > > > actually return NULL.  More interestingly, processes that are doing
> > > > direct reclaim and have PF_MEMALLOC set may also return NULL for any
> > > > __GFP_NOFAIL allocation.
> > > 
> > > __GFP_NOFAIL is a nasty thing and making it pretend to work even better
> > > is heading in the wrong direction, surely?  It would be saner to just
> > > disallow these even-sillier combinations.  Can we fix up the current
> > > callers then stick a WARN_ON() in there?
> > > 
> > 
> > Heh, it's difficult to remove __GFP_NOFAIL when new users get added: 
> > 84235de394d9 ("fs: buffer: move allocation failure loop into the 
> > allocator") added a new user
> 
> That wasn't reeeeealy a new user - it was "convert an existing
> open-coded retry-for-ever loop".  Which is what __GFP_NOFAIL is for.
> 
> I don't think I've ever seen anyone actually fix one of these things
> (by teaching the caller to handle ENOMEM), so it obviously isn't
> working...

Right, because most of the loops are deep within filesystem
transaction code where the only thing to do with a memory allocation
failure is to abort the transaction, shutdown the filesystem and
deny user access (i.e. DOS the system) because the filesystem is
inconsistent in memory and the only way it can be recovered is
toosing everything in memory away and recovering the last valid
on disk state from the journal. i.e. umount, mount.

IOWs, the "fix" is far worse than current behaviour and so there is
absolutely no motivation for the people who own these __GFP_NOFAIL
allocations to fix them. Indeed, when you consider that the amount of
work to fix the filesystems to robustly handle ENOMEM is a *massive*
undertaking that adds significant overhead and complexity to each
filesystem, the cost/benefit analysis comes down so far on the side
of "just use __GFP_NOFAIL" that doing anything else is sheer lunacy.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-12-12  1:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-09 21:56 [patch] mm, page_alloc: make __GFP_NOFAIL really not fail David Rientjes
2013-12-09 23:22 ` Andrew Morton
2013-12-10 23:20   ` David Rientjes
2013-12-10 23:39     ` Andrew Morton
2013-12-11  0:11       ` David Rientjes
2013-12-12  1:07       ` Dave Chinner
2013-12-11  0:19     ` [patch alternative] mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure David Rientjes
2013-12-11  0:26       ` [patch] checkpatch: add warning of future __GFP_NOFAIL use David Rientjes
2013-12-11  1:35         ` Joe Perches

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).