From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Lubos Lunak <l.lunak@suse.cz>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch -mm 8/9 v2] oom: avoid oom killer for lowmem allocations
Date: Tue, 23 Feb 2010 16:54:31 +0530
Message-ID: <20100223112431.GA8871@balbir.in.ibm.com>
In-Reply-To: <alpine.DEB.2.00.1002161609200.11952@chino.kir.corp.google.com>
* David Rientjes <rientjes@google.com> [2010-02-16 16:21:11]:
> On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
>
> > > On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
> > >
> > > > > > > I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask &
> > > > > > > __GFP_NOFAIL) path since we're all content with endlessly looping.
> > > > > >
> > > > > > Thanks. Yes endlessly looping is far preferable to randomly oopsing
> > > > > > or corrupting memory.
> > > > > >
> > > > >
> > > > > Here's the new patch for your consideration.
> > > > >
> > > >
> > > > > Then, can we take kdump in this endlessly looping situation ?
> > > >
> > > > panic_on_oom=always + kdump can do that.
> > > >
> > >
> > > The endless loop is only helpful if something is going to free memory
> > > external to the current page allocation: either another task with
> > > __GFP_WAIT | __GFP_FS that invokes the oom killer, a task that frees
> > > memory, or a task that exits.
> > >
> > > The most notable endless loop in the page allocator is the one when a task
> > > has been oom killed, gets access to memory reserves, and then cannot find
> > > a page for a __GFP_NOFAIL allocation:
> > >
> > > > 	do {
> > > > 		page = get_page_from_freelist(gfp_mask, nodemask, order,
> > > > 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
> > > > 			preferred_zone, migratetype);
> > > >
> > > > 		if (!page && gfp_mask & __GFP_NOFAIL)
> > > > 			congestion_wait(BLK_RW_ASYNC, HZ/50);
> > > > 	} while (!page && (gfp_mask & __GFP_NOFAIL));
> > >
> > > We don't expect any such allocations to happen during the exit path, but
> > > we could probably find some in the fs layer.
> > >
> > > I don't want to check sysctl_panic_on_oom in the page allocator because it
> > > would start panicking the machine unnecessarily for the integrity
> > > metadata GFP_NOIO | __GFP_NOFAIL allocation, for any
> > > order > PAGE_ALLOC_COSTLY_ORDER, or for users who can't lock the zonelist
> > > for oom kill that wouldn't have panicked before.
> > >
> >
> > > Then, why don't you check high_zoneidx in oom_kill.c?
> >
>
> out_of_memory() doesn't return a value to specify whether the page
> allocator should retry the allocation or just return NULL, all that policy
> is kept in mm/page_alloc.c. For highzone_idx < ZONE_NORMAL, we want to
> fail the allocation when !(gfp_mask & __GFP_NOFAIL) and call the oom
> killer when it's __GFP_NOFAIL.
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1696,6 +1696,9 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		/* The OOM killer will not help higher order allocs */
>  		if (order > PAGE_ALLOC_COSTLY_ORDER)
>  			goto out;
> +		/* The OOM killer does not needlessly kill tasks for lowmem */
> +		if (high_zoneidx < ZONE_NORMAL)
> +			goto out;
I am not sure this is a good idea. ZONE_DMA could have a lot of
memory on some architectures. IIUC, with this change we return NULL
for allocations from ZONE_DMA instead of calling the OOM killer?
What is the reason for the heuristic?
>  		/*
>  		 * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
>  		 * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
> @@ -1924,15 +1927,23 @@ rebalance:
>  			if (page)
>  				goto got_pg;
>
> -			/*
> -			 * The OOM killer does not trigger for high-order
> -			 * ~__GFP_NOFAIL allocations so if no progress is being
> -			 * made, there are no other options and retrying is
> -			 * unlikely to help.
> -			 */
> -			if (order > PAGE_ALLOC_COSTLY_ORDER &&
> -						!(gfp_mask & __GFP_NOFAIL))
> -				goto nopage;
> +			if (!(gfp_mask & __GFP_NOFAIL)) {
> +				/*
> +				 * The oom killer is not called for high-order
> +				 * allocations that may fail, so if no progress
> +				 * is being made, there are no other options and
> +				 * retrying is unlikely to help.
> +				 */
> +				if (order > PAGE_ALLOC_COSTLY_ORDER)
> +					goto nopage;
> +				/*
> +				 * The oom killer is not called for lowmem
> +				 * allocations to prevent needlessly killing
> +				 * innocent tasks.
> +				 */
> +				if (high_zoneidx < ZONE_NORMAL)
> +					goto nopage;
> +			}
>
>  			goto restart;
>  		}
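
To check my reading of the hunk above, here is a small userspace model of
the decision it makes once reclaim has made no progress. It is only a
sketch, not the kernel code: every sk_* name is made up for illustration,
the NOFAIL bit value is arbitrary, and the zone order assumes a config with
all optional zones built in. It also leaves out the __GFP_FS and
__GFP_NORETRY gate, oom_killer_disabled, the __GFP_THISNODE check and the
last allocation attempt made before out_of_memory().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-ins for kernel definitions (assumptions for this sketch). The zone
 * order mirrors enum zone_type with every optional zone configured in.
 */
enum sk_zone {
	SK_ZONE_DMA,
	SK_ZONE_DMA32,
	SK_ZONE_NORMAL,
	SK_ZONE_HIGHMEM,
	SK_ZONE_MOVABLE
};

#define SK_GFP_NOFAIL			0x1u	/* hypothetical bit, not the kernel's value */
#define SK_PAGE_ALLOC_COSTLY_ORDER	3	/* same value as PAGE_ALLOC_COSTLY_ORDER */

enum sk_action {
	SK_FAIL_ALLOCATION,	/* models "goto nopage": the caller gets NULL */
	SK_OOM_KILL_AND_RETRY	/* models out_of_memory() plus "goto restart" */
};

/* Models the patched policy for an allocation that made no reclaim progress. */
enum sk_action sk_no_progress_action(unsigned int gfp_mask, unsigned int order,
				     enum sk_zone high_zoneidx)
{
	/* __GFP_NOFAIL may never fail, so the oom killer is still called. */
	if (gfp_mask & SK_GFP_NOFAIL)
		return SK_OOM_KILL_AND_RETRY;

	/* Existing rule: the oom killer does not help costly orders. */
	if (order > SK_PAGE_ALLOC_COSTLY_ORDER)
		return SK_FAIL_ALLOCATION;

	/* New rule from the patch: lowmem allocations fail instead. */
	if (high_zoneidx < SK_ZONE_NORMAL)
		return SK_FAIL_ALLOCATION;

	return SK_OOM_KILL_AND_RETRY;
}

/* True for every zone index that the new check treats as lowmem. */
bool sk_is_lowmem(enum sk_zone high_zoneidx)
{
	return high_zoneidx < SK_ZONE_NORMAL;
}

/* A few cases written out as assertions. */
void sk_self_check(void)
{
	assert(sk_no_progress_action(0, 0, SK_ZONE_DMA) == SK_FAIL_ALLOCATION);
	assert(sk_no_progress_action(0, 0, SK_ZONE_NORMAL) == SK_OOM_KILL_AND_RETRY);
	assert(sk_is_lowmem(SK_ZONE_DMA32));
}
```

Note that in this model the check also covers ZONE_DMA32 whenever that zone
is configured, since it sorts below ZONE_NORMAL. That is part of why I am
asking about the heuristic.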
--
Three Cheers,
Balbir