From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
penberg@cs.helsinki.fi, arjan@infradead.org,
linux-kernel@vger.kernel.org, cl@linux-foundation.org,
npiggin@suse.de, David Rientjes <rientjes@google.com>
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist
Date: Mon, 29 Jun 2009 16:30:07 +0100
Message-ID: <20090629153007.GD5065@csn.ul.ie>
In-Reply-To: <20090624145615.2ff9e56e.akpm@linux-foundation.org>
On Wed, Jun 24, 2009 at 02:56:15PM -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 13:13:48 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > On Wed, 24 Jun 2009, Andrew Morton wrote:
> > >
> > > If the caller gets oom-killed, the allocation attempt fails. Callers need
> > > to handle that.
> >
> > I actually disagree. I think we should just admit that we can always free
> > up enough space to get a few pages, in order to then oom-kill things.
>
> I'm unclear on precisely what you're proposing here?
>
As order <= PAGE_ALLOC_COSTLY_ORDER implies __GFP_NOFAIL, perhaps it
makes sense to change the check to
	WARN_ON_ONCE(order > PAGE_ALLOC_COSTLY_ORDER)
? The temptation might be there to remove __GFP_NOFAIL for smaller orders, but
it makes sense to keep it available in case CONFIG_FAULT_INJECTION_DEBUG_FS
is set and is randomly failing allocations whose failure has serious
consequences even if handled.
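
To make the suggestion above concrete, here is a rough user-space sketch of
such a check. It assumes the warning sits on the __GFP_NOFAIL path, which is
not shown in this mail. MODEL_GFP_NOFAIL, MODEL_COSTLY_ORDER,
warn_once_model() and nofail_order_is_suspect() are invented names, the flag
value is not the kernel's, and warn_once_model() only imitates the
report-once behaviour of WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_GFP_NOFAIL   0x4u	/* illustrative bit, not the kernel's value */
#define MODEL_COSTLY_ORDER 3u	/* stands in for PAGE_ALLOC_COSTLY_ORDER */

/* Imitates WARN_ON_ONCE(): report the first hit only, always return cond. */
static bool warn_once_model(bool cond, const char *what)
{
	static bool warned;

	assert(what != NULL);
	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical helper: true when a no-fail request is for an order above
 * the one the allocator already retries indefinitely.
 */
static bool nofail_order_is_suspect(unsigned int gfp_flags, unsigned int order)
{
	if (!(gfp_flags & MODEL_GFP_NOFAIL))
		return false;

	return warn_once_model(order > MODEL_COSTLY_ORDER,
			       "__GFP_NOFAIL with order > PAGE_ALLOC_COSTLY_ORDER");
}
```

With this model, a no-fail request of order 3 or below is silent, and an
order-4 request is flagged every time but only printed once.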
> > This is not a new concept. oom has never been "immediately kill".
>
> Well, it has been immediate for a long time. A couple of reasons which
> I can recall:
>
> - A page-allocating process will oom-kill another process in the
> expectation that the killing will free up some memory. If the
> oom-killed process remains stuck in the page allocator, that doesn't
> work.
>
> - The oom-killed process might be holding locks (typically fs locks).
> This can cause an arbitrary number of other processes to be blocked.
> So to get the system unstuck we need the oom-killed process to
> immediately exit the page allocator, to handle the NULL return and to
> drop those locks.
>
> There may be other reasons - it was all a long time ago, and I've never
> personally hacked on the oom-killer much and I never get oom-killed.
> But given the amount of development work which goes on in there, some
> people must be getting massacred.
>
>
> A long time ago, the Suse kernel shipped with a largely (or
> completely?) disabled oom-killer. It removed the
> retry-small-allocations-for-ever logic and simply returned NULL to the
> caller. I never really understood what problem/thinking led Andrea to
> do that.
>
>
> But it's all a bit moot at present, as we seem to have removed the
> return-NULL-if-TIF_MEMDIE logic in Mel's post-2.6.30 merges. I think
> that was an accident:
>
> - /* This allocation should allow future memory freeing. */
> -
> rebalance:
> - if (((p->flags & PF_MEMALLOC) || unlikely(test_thread_flag(TIF_MEMDIE)))
> - && !in_interrupt()) {
> - if (!(gfp_mask & __GFP_NOMEMALLOC)) {
> -nofail_alloc:
> - /* go through the zonelist yet again, ignoring mins */
> - page = get_page_from_freelist(gfp_mask, nodemask, order,
> - zonelist, high_zoneidx, ALLOC_NO_WATERMARKS);
> - if (page)
> - goto got_pg;
> - if (gfp_mask & __GFP_NOFAIL) {
> - congestion_wait(WRITE, HZ/50);
> - goto nofail_alloc;
> - }
> - }
> - goto nopage;
> + /* Allocate without watermarks if the context allows */
> + if (alloc_flags & ALLOC_NO_WATERMARKS) {
> + page = __alloc_pages_high_priority(gfp_mask, order,
> + zonelist, high_zoneidx, nodemask,
> + preferred_zone, migratetype);
> + if (page)
> + goto got_pg;
> }
>
> Offending commit 341ce06 handled the PF_MEMALLOC case but forgot about
> the TIF_MEMDIE case.
>
> Mel is having a bit of downtime at present.
I'm getting back online now and playing catch-up. You're right that
returning NULL for TIF_MEMDIE has been broken, and it's possible in theory for
an OOM-killed process to loop forever. But maybe TIF_MEMDIE looping potentially
forever is expected when __GFP_NOFAIL is specified. Fixing this so that
an OOM-killed process can exit does mean that callers using __GFP_NOFAIL must
still handle a NULL return, which might be very unexpected to the caller.
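
What that means for a caller can be sketched in user space. Everything here
is a stand-in and none of it is kernel code: alloc_model() imitates an
allocator that gives up once the task is marked OOM-killed, even for a
no-fail request, and setup_buffer() is a hypothetical caller that passes the
no-fail flag but still checks the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_GFP_NOFAIL 0x4u	/* illustrative bit, not the kernel's value */

/* Model allocator: an OOM-killed task gets NULL regardless of the flags. */
static void *alloc_model(size_t size, unsigned int gfp_flags, bool oom_killed)
{
	(void)gfp_flags;	/* no-fail does not override oom_killed here */
	if (oom_killed)
		return NULL;
	return malloc(size);
}

/* Hypothetical caller: requests no-fail but still handles a NULL return. */
static int setup_buffer(void **out, size_t size, bool oom_killed)
{
	void *buf;

	assert(out != NULL);
	buf = alloc_model(size, MODEL_GFP_NOFAIL, oom_killed);
	if (!buf)
		return -ENOMEM;

	*out = buf;
	return 0;
}
```

The point of the sketch is only the NULL check in setup_buffer(): a caller
that skipped it because it had asked for no-fail would dereference NULL once
it had been OOM-killed.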
In 2.6.30, TIF_MEMDIE could cause a request to exit without ever entering
direct reclaim, but chances are this didn't happen as it would have been
looping during an OOM-kill. To duplicate this, a check for TIF_MEMDIE would
go after

	/* Avoid recursion of direct reclaim */
	if (p->flags & PF_MEMALLOC)
		goto nopage;

But as failing __GFP_NOFAIL is potentially serious, even for processes that
have been OOM killed, I think it makes more sense to check for TIF_MEMDIE
after direct reclaim and OOM killing have already been considered as options
with a patch such as the following?
==== CUT HERE ====
page-allocator: Ensure that processes that have been OOM killed exit the page allocator
Processes that have been OOM killed set the thread flag TIF_MEMDIE. A
process such as this is expected to exit the page allocator but in the
event it happens to have set __GFP_NOFAIL, it potentially loops forever.
This patch checks TIF_MEMDIE when deciding whether to loop again in the
page allocator. Such a process will now return NULL after direct reclaim
and OOM killing have both been considered as options. The potential
problem is that a __GFP_NOFAIL allocation can still return failure so
callers must still handle getting returned NULL.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d714f8..8449cf9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1539,6 +1539,10 @@ should_alloc_retry(gfp_t gfp_mask, unsigned int order,
 	if (gfp_mask & __GFP_NORETRY)
 		return 0;
 
+	/* Do not loop if this process has been OOM-killed */
+	if (test_thread_flag(TIF_MEMDIE))
+		return 0;
+
 	/*
 	 * In this implementation, order <= PAGE_ALLOC_COSTLY_ORDER
 	 * means __GFP_NOFAIL, but that may not be true in other
@@ -1823,6 +1827,10 @@ rebalance:
 						!(gfp_mask & __GFP_NOFAIL))
 				goto nopage;
 
+			/* Do not loop if this process has been OOM-killed */
+			if (test_thread_flag(TIF_MEMDIE))
+				goto nopage;
+
 			goto restart;
 		}
 	}