public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	penberg@cs.helsinki.fi, arjan@infradead.org,
	linux-kernel@vger.kernel.org, cl@linux-foundation.org,
	npiggin@suse.de, David Rientjes <rientjes@google.com>
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist
Date: Mon, 29 Jun 2009 16:30:07 +0100	[thread overview]
Message-ID: <20090629153007.GD5065@csn.ul.ie> (raw)
In-Reply-To: <20090624145615.2ff9e56e.akpm@linux-foundation.org>

On Wed, Jun 24, 2009 at 02:56:15PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 13:13:48 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > On Wed, 24 Jun 2009, Andrew Morton wrote:
> > > 
> > > If the caller gets oom-killed, the allocation attempt fails.  Callers need
> > > to handle that.
> > 
> > I actually disagree. I think we should just admit that we can always free 
> > up enough space to get a few pages, in order to then oom-kill things.
> 
> I'm unclear on precisely what you're proposing here?
> 

As order <= PAGE_ALLOC_COSTLY_ORDER implies __GFP_NOFAIL, prehaps it
makes sense to change the check to

WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER)

? The temptation might be there to remove __GFP_NOFAIL for smaller orders but
it makes sense to have it available in case CONFIG_FAULT_INJECTION_DEBUG_FS
is set and randomly failing allocations that have serious consequences even
if handled.

> > This is not a new concept. oom has never been "immediately kill".
> 
> Well, it has been immediate for a long time.  A couple of reasons which
> I can recall:
> 
> - A page-allocating process will oom-kill another process in the
>   expectation that the killing will free up some memory.  If the
>   oom-killed process remains stuck in the page allocator, that doesn't
>   work.
> 
> - The oom-killed process might be holding locks (typically fs locks).
>   This can cause an arbitrary number of other processes to be blocked.
>   So to get the system unstuck we need the oom-killed process to
>   immediately exit the page allocator, to handle the NULL return and to
>   drop those locks.
> 
> There may be other reasons - it was all a long time ago, and I've never
> personally hacked on the oom-killer much and I never get oom-killed. 
> But given the amount of development work which goes on in there, some
> people must be getting massacred.
> 
> 
> A long time ago, the Suse kernel shipped with a largely (or
> completely?) disabled oom-killer.  It removed the
> retry-small-allocations-for-ever logic and simply returned NULL to the
> caller.  I never really understood what problem/thinking led Andrea to
> do that.
> 
> 
> But it's all a bit moot at present, as we seem to have removed the
> return-NULL-if-TIF_MEMDIE logic in Mel's post-2.6.30 merges.  I think
> that was an accident:
> 
> -	/* This allocation should allow future memory freeing. */
> -
>  rebalance:
> -	if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
> -			&& !in_interrupt()) {
> -		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> -nofail_alloc:
> -			/* go through the zonelist yet again, ignoring mins */
> -			page = get_page_from_freelist(gfp_mask, nodemask, order,
> -				zonelist, high_zoneidx, ALLOC_NO_WATERMARKS);
> -			if (page)
> -				goto got_pg;
> -			if (gfp_mask & __GFP_NOFAIL) {
> -				congestion_wait(WRITE, HZ/50);
> -				goto nofail_alloc;
> -			}
> -		}
> -		goto nopage;
> +	/* Allocate without watermarks if the context allows */
> +	if (alloc_flags & ALLOC_NO_WATERMARKS) {
> +		page = __alloc_pages_high_priority(gfp_mask, order,
> +				zonelist, high_zoneidx, nodemask,
> +				preferred_zone, migratetype);
> +		if (page)
> +			goto got_pg;
>  	}
> 
> Offending commit 341ce06 handled the PF_MEMALLOC case but forgot about
> the TIF_MEMDIE case.
> 
> Mel is having a bit of downtime at present.

I'm getting back online now and playing catch-up. You're right in that
TIF_MEMDIE returning NULL has been broken and it's possible in theory for an
OOM-killed process to loop forever. But maybe TIF_MEMDIE looping potentially
forever is expected in the case __GFP_NOFAIL is specified. Fixing this to allow
an OOM-killed process to exit does mean that callers using __GFP_NOFAIL must
still handle NULL being returned which might be very unexpected to the caller.

In 2.6.30, TIF_MEMDIE could cause a request to exit without ever entering
direct reclaim but chances are this didn't happen as it would have been
looping during an OOM-kill. To duplicate this, a check for TIF_MEMDIE would
happen after

	/* Avoid recursion of direct reclaim */
        if (p->flags & PF_MEMALLOC)
                goto nopage;

But as failing __GFP_NOFAIL is potentially serious, even for processes that
have been OOM killed, I think it makes more sense to check for TIF_MEMDIE
after direct reclaim and OOM killing have already been considered as options
with a patch such as the following?

==== CUT HERE ====
page-allocator: Ensure that processes that have been OOM killed exit the page allocator

Processes that have been OOM killed set the thread flag TIF_MEMDIE. A
process such as this is expected to exit the page allocator but in the
event it happens to have set __GFP_NOFAIL, it potentially loops forever.

This patch checks TIF_MEMDIE when deciding whether to loop again in the
page allocator. Such a process will now return NULL after direct reclaim
and OOM killing have both been considered as options. The potential
problem is that a __GFP_NOFAIL allocation can still return failure so
callers must still handle getting returned NULL.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 mm/page_alloc.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..8449cf9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1539,6 +1539,10 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_NORETRY)
 		return 0;
 
+	/* Do not loop if this process has been OOM-killed */
+	if (test_thread_flag(TIF_MEMDIE))
+		return 0;
+
 	/*
 	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
 	 * means __GFP_NOFAIL, but that may not be true in other
@@ -1823,6 +1827,10 @@ rebalance:
 						!(gfp_mask & __GFP_NOFAIL))
 				goto nopage;
 
+			/* Do not loop if this process has been OOM-killed */
+			if (test_thread_flag(TIF_MEMDIE))
+				goto nopage;
+
 			goto restart;
 		}
 	}

  parent reply	other threads:[~2009-06-29 15:30 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-24 15:07 upcoming kerneloops.org item: get_page_from_freelist Arjan van de Ven
2009-06-24 16:46 ` Andrew Morton
2009-06-24 16:52   ` Linus Torvalds
2009-06-24 16:55   ` Pekka Enberg
2009-06-24 16:56     ` Pekka Enberg
2009-06-24 17:00       ` Pekka Enberg
2009-06-24 17:55     ` Andrew Morton
2009-06-24 17:53       ` Pekka Enberg
2009-06-24 18:30         ` Andrew Morton
2009-06-24 18:42           ` Linus Torvalds
2009-06-24 18:44             ` Pekka Enberg
2009-06-24 18:50               ` Linus Torvalds
2009-06-24 19:12                 ` Pekka J Enberg
2009-06-24 19:21                   ` Linus Torvalds
2009-06-24 19:06             ` Andrew Morton
2009-06-24 19:16               ` Linus Torvalds
2009-06-24 19:36                 ` Andrew Morton
2009-06-24 19:46                   ` Linus Torvalds
2009-06-24 19:47                     ` Linus Torvalds
2009-06-24 20:01                     ` Andrew Morton
2009-06-24 20:13                       ` Linus Torvalds
2009-06-24 20:40                         ` Linus Torvalds
2009-06-24 22:07                           ` Andrew Morton
2009-06-25  4:05                             ` Nick Piggin
2009-06-25 13:25                             ` Theodore Tso
2009-06-25 18:51                               ` David Rientjes
2009-06-25 19:38                                 ` Theodore Tso
2009-06-25 19:44                                   ` Theodore Tso
2009-06-25 19:55                                     ` Andrew Morton
2009-06-25 20:11                                     ` Linus Torvalds
2009-06-25 20:22                                       ` Linus Torvalds
2009-06-25 20:36                                         ` David Rientjes
2009-06-25 20:51                                           ` Linus Torvalds
2009-06-25 22:25                                             ` David Rientjes
2009-06-26  8:51                                         ` Nick Piggin
2009-06-25 20:18                                     ` David Rientjes
2009-06-25 20:37                                       ` Theodore Tso
2009-06-25 21:05                                         ` Joel Becker
2009-06-25 21:26                                         ` Andreas Dilger
2009-06-25 22:05                                           ` Theodore Tso
2009-06-25 22:11                                             ` Eric Sandeen
2009-06-26  1:11                                               ` Theodore Tso
2009-06-26  5:16                                                 ` Pekka J Enberg
2009-06-26  8:56                                                   ` Nick Piggin
2009-06-26  8:58                                                     ` Pekka Enberg
2009-06-26  9:07                                                       ` Nick Piggin
2009-06-29 21:06                                                       ` Christoph Lameter
2009-06-30  7:59                                                         ` Nick Piggin
2009-06-26 14:41                                                   ` Eric Sandeen
2009-06-29 21:15                                                     ` Christoph Lameter
2009-06-29 21:20                                                       ` Eric Sandeen
2009-06-29 22:35                                                         ` Christoph Lameter
2009-06-25 19:55                             ` Jens Axboe
2009-06-25 20:08                               ` Jens Axboe
2009-06-24 21:56                         ` Andrew Morton
2009-06-25  4:14                           ` Nick Piggin
2009-06-25  8:21                           ` David Rientjes
2009-06-29 15:30                           ` Mel Gorman [this message]
2009-06-29 19:20                             ` Andrew Morton
2009-06-30 11:00                               ` Mel Gorman
2009-06-30 19:35                                 ` David Rientjes
2009-06-30 20:32                                   ` Mel Gorman
2009-06-30 20:51                                     ` David Rientjes
2009-07-01 10:22                                       ` Mel Gorman
2009-06-29 23:35                             ` David Rientjes
2009-06-30  7:47                               ` Nick Piggin
2009-06-30  8:13                                 ` David Rientjes
2009-06-30  8:24                                   ` Nick Piggin
2009-06-30  8:41                                     ` David Rientjes
2009-06-30  9:09                                       ` Nick Piggin
2009-06-30 19:47                                         ` David Rientjes
2009-06-30  6:27                           ` Pavel Machek
2009-06-28 10:16                     ` Pavel Machek
2009-06-28 18:01                       ` Linus Torvalds
2009-06-28 18:27                         ` Arjan van de Ven
2009-06-28 18:36                           ` Linus Torvalds
2009-06-30  7:35                         ` Pavel Machek
2009-06-24 18:43           ` Pekka Enberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090629153007.GD5065@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=cl@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=rientjes@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox