Re: upcoming kerneloops.org item: get_page_from_freelist


From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	penberg@cs.helsinki.fi, arjan@infradead.org,
	linux-kernel@vger.kernel.org, cl@linux-foundation.org,
	npiggin@suse.de, David Rientjes <rientjes@google.com>
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist
Date: Mon, 29 Jun 2009 16:30:07 +0100
Message-ID: <20090629153007.GD5065@csn.ul.ie>
In-Reply-To: <20090624145615.2ff9e56e.akpm@linux-foundation.org>

On Wed, Jun 24, 2009 at 02:56:15PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 13:13:48 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > On Wed, 24 Jun 2009, Andrew Morton wrote:
> > > 
> > > If the caller gets oom-killed, the allocation attempt fails.  Callers need
> > > to handle that.
> > 
> > I actually disagree. I think we should just admit that we can always free 
> > up enough space to get a few pages, in order to then oom-kill things.
> 
> I'm unclear on precisely what you're proposing here?
> 

As order <= PAGE_ALLOC_COSTLY_ORDER implies __GFP_NOFAIL, perhaps it
makes sense to change the check to the following?

	WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER)

The temptation might be there to remove __GFP_NOFAIL for smaller orders,
but it makes sense to keep it available in case CONFIG_FAULT_INJECTION_DEBUG_FS
is set and is randomly failing allocations whose failure has serious
consequences even if handled.
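A user-space sketch of the proposed warning condition, assuming (as the thread implies) that the check is only evaluated for __GFP_NOFAIL requests. MODEL_GFP_NOFAIL, model_implicitly_nofail() and model_nofail_would_warn() are hypothetical names; only the threshold of 3 mirrors PAGE_ALLOC_COSTLY_ORDER in mainline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for illustration; not the kernel's bit layout. */
#define MODEL_GFP_NOFAIL    0x2u
/* Mirrors PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline. */
#define MODEL_COSTLY_ORDER  3u

/* True when the allocator already retries this order indefinitely,
 * so __GFP_NOFAIL adds no extra behaviour. */
static bool model_implicitly_nofail(unsigned int order)
{
	return order <= MODEL_COSTLY_ORDER;
}

/* True when the proposed WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER)
 * would fire for this request (assumed to apply to __GFP_NOFAIL only). */
static bool model_nofail_would_warn(unsigned int gfp_mask, unsigned int order)
{
	if (!(gfp_mask & MODEL_GFP_NOFAIL))
		return false;
	return !model_implicitly_nofail(order);
}
```

With this shape, order-0 to order-3 __GFP_NOFAIL callers stay silent, while an order-4 __GFP_NOFAIL request would still warn once.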

> > This is not a new concept. oom has never been "immediately kill".
> 
> Well, it has been immediate for a long time.  A couple of reasons which
> I can recall:
> 
> - A page-allocating process will oom-kill another process in the
>   expectation that the killing will free up some memory.  If the
>   oom-killed process remains stuck in the page allocator, that doesn't
>   work.
> 
> - The oom-killed process might be holding locks (typically fs locks).
>   This can cause an arbitrary number of other processes to be blocked.
>   So to get the system unstuck we need the oom-killed process to
>   immediately exit the page allocator, to handle the NULL return and to
>   drop those locks.
> 
> There may be other reasons - it was all a long time ago, and I've never
> personally hacked on the oom-killer much and I never get oom-killed. 
> But given the amount of development work which goes on in there, some
> people must be getting massacred.
> 
> 
> A long time ago, the Suse kernel shipped with a largely (or
> completely?) disabled oom-killer.  It removed the
> retry-small-allocations-for-ever logic and simply returned NULL to the
> caller.  I never really understood what problem/thinking led Andrea to
> do that.
> 
> 
> But it's all a bit moot at present, as we seem to have removed the
> return-NULL-if-TIF_MEMDIE logic in Mel's post-2.6.30 merges.  I think
> that was an accident:
> 
> -	/* This allocation should allow future memory freeing. */
> -
>  rebalance:
> -	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
> -			&& !in_interrupt()) {
> -		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> -nofail_alloc:
> -			/* go through the zonelist yet again, ignoring mins */
> -			page = get_page_from_freelist(gfp_mask, nodemask, order,
> -				zonelist, high_zoneidx, ALLOC_NO_WATERMARKS);
> -			if (page)
> -				goto got_pg;
> -			if (gfp_mask & __GFP_NOFAIL) {
> -				congestion_wait(WRITE, HZ/50);
> -				goto nofail_alloc;
> -			}
> -		}
> -		goto nopage;
> +	/* Allocate without watermarks if the context allows */
> +	if (alloc_flags & ALLOC_NO_WATERMARKS) {
> +		page = __alloc_pages_high_priority(gfp_mask, order,
> +				zonelist, high_zoneidx, nodemask,
> +				preferred_zone, migratetype);
> +		if (page)
> +			goto got_pg;
>  	}
> 
> Offending commit 341ce06 handled the PF_MEMALLOC case but forgot about
> the TIF_MEMDIE case.
> 
> Mel is having a bit of downtime at present.

I'm getting back online now and playing catch-up. You're right that
returning NULL for TIF_MEMDIE has been broken, and in theory an OOM-killed
process can now loop forever. But perhaps a TIF_MEMDIE process looping
forever is expected when __GFP_NOFAIL is specified. Fixing this so that an
OOM-killed process can exit means that callers using __GFP_NOFAIL must
still handle NULL being returned, which may be very unexpected for the caller.
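To make the caller-side consequence concrete, here is a user-space model of a __GFP_NOFAIL-style caller that still checks for NULL and drops its lock, as Andrew's second point requires. struct model_task, model_alloc_nofail() and model_do_work() are hypothetical; the oom_killed field stands in for TIF_MEMDIE and holds_lock for something like a filesystem lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical task state for the model. */
struct model_task {
	bool oom_killed;   /* stands in for TIF_MEMDIE */
	bool holds_lock;   /* stands in for e.g. a filesystem lock */
};

/* Models the patched behaviour: even a "no-fail" request returns
 * NULL once the task has been OOM-killed. */
static void *model_alloc_nofail(struct model_task *t, size_t size)
{
	if (t->oom_killed)
		return NULL;
	return malloc(size);
}

/* A caller that asked for no-fail semantics but still handles NULL,
 * releasing its lock so other tasks are not left blocked. */
static int model_do_work(struct model_task *t)
{
	void *buf;

	assert(!t->holds_lock);     /* precondition: lock not yet taken */
	t->holds_lock = true;

	buf = model_alloc_nofail(t, 64);
	if (!buf) {
		t->holds_lock = false;  /* drop the lock before bailing out */
		return -ENOMEM;
	}

	/* ... use buf while holding the lock ... */
	free(buf);
	t->holds_lock = false;
	return 0;
}
```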

In 2.6.30, TIF_MEMDIE could cause a request to exit without ever entering
direct reclaim, but chances are this didn't happen in practice, as the
process would have been looping in the allocator during the OOM kill. To
duplicate that behaviour, a check for TIF_MEMDIE would go after

	/* Avoid recursion of direct reclaim */
	if (p->flags & PF_MEMALLOC)
		goto nopage;
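The difference between the two placements can be modelled in user space. model_slowpath(), struct model_ctx and the outcome enum are hypothetical; the model assumes direct reclaim and OOM killing fail to produce a page, and early_check = true approximates the 2.6.30-style check placed right after the PF_MEMALLOC test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of one pass through the slow path. */
enum model_outcome {
	MODEL_NOPAGE_BEFORE_RECLAIM,  /* gave up without trying reclaim */
	MODEL_NOPAGE_AFTER_RECLAIM,   /* tried reclaim/OOM first, then gave up */
	MODEL_RETRY,                  /* would loop back and try again */
};

struct model_ctx {
	bool pf_memalloc;  /* stands in for p->flags & PF_MEMALLOC */
	bool tif_memdie;   /* stands in for test_thread_flag(TIF_MEMDIE) */
};

static enum model_outcome model_slowpath(const struct model_ctx *c,
					 bool early_check)
{
	/* Avoid recursion of direct reclaim */
	if (c->pf_memalloc)
		return MODEL_NOPAGE_BEFORE_RECLAIM;

	/* 2.6.30-style placement: bail out before reclaim is tried */
	if (early_check && c->tif_memdie)
		return MODEL_NOPAGE_BEFORE_RECLAIM;

	/* ... direct reclaim and OOM killing would be attempted here,
	 * and are assumed to fail in this model ... */

	/* Later placement: only give up once reclaim/OOM were considered */
	if (c->tif_memdie)
		return MODEL_NOPAGE_AFTER_RECLAIM;

	return MODEL_RETRY;
}
```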

But as failing a __GFP_NOFAIL allocation is potentially serious, even for a
process that has been OOM-killed, I think it makes more sense to check for
TIF_MEMDIE only after direct reclaim and OOM killing have been considered,
with a patch such as the following?

==== CUT HERE ====
page-allocator: Ensure that processes that have been OOM killed exit the page allocator

Processes that have been OOM killed set the thread flag TIF_MEMDIE. Such a
process is expected to exit the page allocator, but if it happens to have
set __GFP_NOFAIL, it can potentially loop forever.

This patch checks TIF_MEMDIE when deciding whether to loop again in the
page allocator. Such a process will now return NULL after direct reclaim
and OOM killing have both been considered as options. The potential
problem is that a __GFP_NOFAIL allocation can now fail, so callers must
still handle a NULL return.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 mm/page_alloc.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..8449cf9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1539,6 +1539,10 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_NORETRY)
 		return 0;
 
+	/* Do not loop if this process has been OOM-killed */
+	if (test_thread_flag(TIF_MEMDIE))
+		return 0;
+
 	/*
 	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
 	 * means __GFP_NOFAIL, but that may not be true in other
@@ -1823,6 +1827,10 @@ rebalance:
 						!(gfp_mask & __GFP_NOFAIL))
 				goto nopage;
 
+			/* Do not loop if this process has been OOM-killed */
+			if (test_thread_flag(TIF_MEMDIE))
+				goto nopage;
+
 			goto restart;
 		}
 	}
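The first hunk's effect on the retry decision can be sketched as a simplified user-space model of should_alloc_retry(). model_should_retry() and the MODEL_* flag values are hypothetical, tif_memdie stands in for test_thread_flag(TIF_MEMDIE), and the real function's __GFP_REPEAT handling is deliberately omitted; the second hunk applies the same test before `goto restart`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; not the kernel's real bit layout. */
#define MODEL_GFP_NORETRY   0x1u
#define MODEL_GFP_NOFAIL    0x2u
/* Mirrors PAGE_ALLOC_COSTLY_ORDER, which is 3 in mainline. */
#define MODEL_COSTLY_ORDER  3u

static bool model_should_retry(unsigned int gfp_mask, unsigned int order,
			       bool tif_memdie)
{
	/* Do not loop if specifically requested */
	if (gfp_mask & MODEL_GFP_NORETRY)
		return false;

	/* New check from the patch: an OOM-killed task leaves the allocator */
	if (tif_memdie)
		return false;

	/* Small orders are treated as implicitly __GFP_NOFAIL */
	if (order <= MODEL_COSTLY_ORDER)
		return true;

	/* Larger orders only loop when the caller insists */
	return (gfp_mask & MODEL_GFP_NOFAIL) != 0;
}
```

The ordering matters: because the TIF_MEMDIE test sits before both the small-order and __GFP_NOFAIL rules, neither can keep an OOM-killed task looping.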
