From: Nick Piggin <npiggin@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
penberg@cs.helsinki.fi, arjan@infradead.org,
linux-kernel@vger.kernel.org, cl@linux-foundation.org
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist
Date: Tue, 30 Jun 2009 10:24:15 +0200 [thread overview]
Message-ID: <20090630082415.GC11980@wotan.suse.de> (raw)
In-Reply-To: <alpine.DEB.2.00.0906300050530.11018@chino.kir.corp.google.com>
On Tue, Jun 30, 2009 at 01:13:08AM -0700, David Rientjes wrote:
> On Tue, 30 Jun 2009, Nick Piggin wrote:
>
> > > That's not the expected behavior for TIF_MEMDIE, although your patch
> > > certainly changes that.
> > >
> > > Your patch is simply doing
> > >
> > > if (test_thread_flag(TIF_MEMDIE))
> > > gfp_mask |= __GFP_NORETRY;
> > >
> > > in the slowpath.
> > >
> > > TIF_MEMDIE is supposed to allow allocations to succeed, not automatically
> > > fail, so that it can quickly handle its SIGKILL without getting blocked in
> > > the exit path seeking more memory.
> >
> > Yes, it need to just ignore all watermarks, do not reclaim (we've
> > already decided reclaim will not work at this point), and return a
> > page if we have one otherwise NULL (unless GFP_NOFAIL is set).
> >
>
> Right, there's no sense in looping endlessly for ~__GFP_NOFAIL if
> allocations continue to fail for a thread with TIF_MEMDIE set.
>
> TIF_MEMDIE doesn't check any watermarks as opposed to GFP_ATOMIC, which
> only reduces the min watermark by half, so we can access more memory
> reserves with TIF_MEMDIE. Instead of immediately failing an oom killed
> task's allocation as in Mel's patch, there is a higher liklihood that it
> will succeed on the next attempt.
Yes. This is how it should have worked prior to Mel's patches, so we
should aim to restore that.
> I'd agree with Mel's added check for TIF_MEMDIE upon returning from the
> oom killer, but only for __GFP_NOMEMALLOC.
NOMEMALLOC indeed should always be kept away from memalloc/memdie
reserves. That's how it should have worked when I added it (but
I may have forgotten TIF_MEMDIE, I can't remember).
> > > All __GFP_NOFAIL allocations should ensure that alloc_pages() never
> > > returns NULL. Although it's unfortunate, that's the requirement that
> > > callers have been guaranteed and until they are fixed, the page allocator
> > > should respect it.
> >
> > Yes.
> >
> > Interesting thing is what to do when we have 0 pages left, we are
> > TIF_MEMDIE, and GFP_NOFAIL is set. Looping will most likely just
> > deadlock the system. Returning NULL will probably oops caller with
> > various locks held and then deadlock the system. It really needs to
> > punt back to the OOM killer so it can select another task. Until
> > then, maybe a simple panic would be reasonable? (it's *never* going
> > to hit anyone in practice I'd say, but if it does then a panic
> > would be better than lockup at least we know what the problem was).
> >
>
> The oom killer currently is a no-op if any eligible task has TIF_MEMDIE,
> so this would require adding an oom killer timeout so that if a task fails
> to exit after a predefined period, TIF_MEMDIE is cleared and the task is
> marked to no longer be selected (which would require an addition to
> task_struct) although it may have already completely depleted memory
> reserves.
It wouldn't have to be a timeout, it could be a call back to the
oom killer.
> The best alternative is just to increase min_free_kbytes to ensure that
> adequate memory reserves (and its partial exemptions allowed by
> GFP_ATOMIC, ALLOC_HARDER, and PF_MEMALLOC) are sustained for an oom killed
> task to exit and that we try hard to avoid getting stuck in
> TASK_UNINTERRUPTIBLE.
Well we're discussing what to do when reserves run out and NOFAIL
is set. So increasing min_free_kbytes is not a valid alternative :)
My vote is a simple panic with a clear message.
> > > I disagree with this change because it unconditionally fails allocations
> > > when a task has been oom killed, a scenario which should be the _highest_
> > > priority for allocations to succeed since it leads to future memory
> >
> > That's another interesting point. I do agree with you because that
> > would restore previous behaviour which got broken. But I wonder if
> > possibly it would be a better idea to fail all allocations? That
> > would a) protect reserves more, and b) probably quite likely to
> > exit the syscall *sooner* than if we try to satisfy all allocations.
> >
>
> You could only fail the single allocation where you triggered the oom
> killer and you were the task chosen to die, which is what Mel's patch
> implemented in the first half. I agree that would protect the memory
> reserves more.
next prev parent reply other threads:[~2009-06-30 8:24 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-24 15:07 upcoming kerneloops.org item: get_page_from_freelist Arjan van de Ven
2009-06-24 16:46 ` Andrew Morton
2009-06-24 16:52 ` Linus Torvalds
2009-06-24 16:55 ` Pekka Enberg
2009-06-24 16:56 ` Pekka Enberg
2009-06-24 17:00 ` Pekka Enberg
2009-06-24 17:55 ` Andrew Morton
2009-06-24 17:53 ` Pekka Enberg
2009-06-24 18:30 ` Andrew Morton
2009-06-24 18:42 ` Linus Torvalds
2009-06-24 18:44 ` Pekka Enberg
2009-06-24 18:50 ` Linus Torvalds
2009-06-24 19:12 ` Pekka J Enberg
2009-06-24 19:21 ` Linus Torvalds
2009-06-24 19:06 ` Andrew Morton
2009-06-24 19:16 ` Linus Torvalds
2009-06-24 19:36 ` Andrew Morton
2009-06-24 19:46 ` Linus Torvalds
2009-06-24 19:47 ` Linus Torvalds
2009-06-24 20:01 ` Andrew Morton
2009-06-24 20:13 ` Linus Torvalds
2009-06-24 20:40 ` Linus Torvalds
2009-06-24 22:07 ` Andrew Morton
2009-06-25 4:05 ` Nick Piggin
2009-06-25 13:25 ` Theodore Tso
2009-06-25 18:51 ` David Rientjes
2009-06-25 19:38 ` Theodore Tso
2009-06-25 19:44 ` Theodore Tso
2009-06-25 19:55 ` Andrew Morton
2009-06-25 20:11 ` Linus Torvalds
2009-06-25 20:22 ` Linus Torvalds
2009-06-25 20:36 ` David Rientjes
2009-06-25 20:51 ` Linus Torvalds
2009-06-25 22:25 ` David Rientjes
2009-06-26 8:51 ` Nick Piggin
2009-06-25 20:18 ` David Rientjes
2009-06-25 20:37 ` Theodore Tso
2009-06-25 21:05 ` Joel Becker
2009-06-25 21:26 ` Andreas Dilger
2009-06-25 22:05 ` Theodore Tso
2009-06-25 22:11 ` Eric Sandeen
2009-06-26 1:11 ` Theodore Tso
2009-06-26 5:16 ` Pekka J Enberg
2009-06-26 8:56 ` Nick Piggin
2009-06-26 8:58 ` Pekka Enberg
2009-06-26 9:07 ` Nick Piggin
2009-06-29 21:06 ` Christoph Lameter
2009-06-30 7:59 ` Nick Piggin
2009-06-26 14:41 ` Eric Sandeen
2009-06-29 21:15 ` Christoph Lameter
2009-06-29 21:20 ` Eric Sandeen
2009-06-29 22:35 ` Christoph Lameter
2009-06-25 19:55 ` Jens Axboe
2009-06-25 20:08 ` Jens Axboe
2009-06-24 21:56 ` Andrew Morton
2009-06-25 4:14 ` Nick Piggin
2009-06-25 8:21 ` David Rientjes
2009-06-29 15:30 ` Mel Gorman
2009-06-29 19:20 ` Andrew Morton
2009-06-30 11:00 ` Mel Gorman
2009-06-30 19:35 ` David Rientjes
2009-06-30 20:32 ` Mel Gorman
2009-06-30 20:51 ` David Rientjes
2009-07-01 10:22 ` Mel Gorman
2009-06-29 23:35 ` David Rientjes
2009-06-30 7:47 ` Nick Piggin
2009-06-30 8:13 ` David Rientjes
2009-06-30 8:24 ` Nick Piggin [this message]
2009-06-30 8:41 ` David Rientjes
2009-06-30 9:09 ` Nick Piggin
2009-06-30 19:47 ` David Rientjes
2009-06-30 6:27 ` Pavel Machek
2009-06-28 10:16 ` Pavel Machek
2009-06-28 18:01 ` Linus Torvalds
2009-06-28 18:27 ` Arjan van de Ven
2009-06-28 18:36 ` Linus Torvalds
2009-06-30 7:35 ` Pavel Machek
2009-06-24 18:43 ` Pekka Enberg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090630082415.GC11980@wotan.suse.de \
--to=npiggin@suse.de \
--cc=akpm@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=cl@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=penberg@cs.helsinki.fi \
--cc=rientjes@google.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox