From: Peter Zijlstra <peterz@infradead.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andi Kleen <ak@suse.de>,
Christoph Lameter <clameter@engr.sgi.com>,
Paul Jackson <pj@sgi.com>,
linux-kernel@vger.kernel.org
Subject: Re: [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally
Date: Wed, 06 Jun 2007 09:09:13 +0200 [thread overview]
Message-ID: <1181113753.7348.164.camel@twins> (raw)
In-Reply-To: <alpine.DEB.0.99.0706052334280.30783@chino.kir.corp.google.com>
On Tue, 2007-06-05 at 23:42 -0700, David Rientjes wrote:
> On Wed, 6 Jun 2007, Peter Zijlstra wrote:
>
> > > diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> > > --- a/kernel/cpuset.c
> > > +++ b/kernel/cpuset.c
> > > @@ -2431,12 +2431,6 @@ int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask)
> > > might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
> > > if (node_isset(node, current->mems_allowed))
> > > return 1;
> > > - /*
> > > - * Allow tasks that have access to memory reserves because they have
> > > - * been OOM killed to get memory anywhere.
> > > - */
> > > - if (unlikely(test_thread_flag(TIF_MEMDIE)))
> > > - return 1;
> > > if (gfp_mask & __GFP_HARDWALL) /* If hardwall request, stop here */
> > > return 0;
> > >
> >
> > This seems a little pointless, since cpuset_zone_allowed_softwall() is
> > only effective with ALLOC_CPUSET, and the ALLOC_NO_WATERMARKS
> > allocations opened up by TIF_MEMDIE don't use that.
> >
>
> That's the change. Memory reserves on a per-zone level would now be used
> with respect to ALLOC_NO_WATERMARKS because TIF_MEMDIE tasks no longer
> have an explicit bypass for them. If the node is not in the task's
> mems_allowed and it's a __GFP_HARDWALL allocation or if it's neither
> PF_EXITING or in the nearest_exclusive_ancestor() of that task's cpuset,
> then we move on to the next zone in get_page_from_freelist().
>
> The problem the patch addresses is that, as you mentioned, tasks with
> TIF_MEMDIE have an explicit bypass over using any memory reserves and can
> thus, depending on zonelist ordering, allocate first on nodes outside its
> mems_allowed even in the case of an exclusive cpuset before exhausting its
> own memory. That behavior is wrong.
Right, I see your point; however considering that its a system
allocation, and all these constraints get violated by interrupts anyway,
its more of an application container than a strict allocation container.
Are you actually seeing the described behaviour, or just being pedantic
(nothing wrong with that per-se)?
I would actually be bold and make it worse by proposing something like
this, which has the benefit of making the reserve threshold system wide.
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1eef614..870a791 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1630,6 +1630,21 @@ rebalance:
&& !in_interrupt()) {
if (!(gfp_mask & __GFP_NOMEMALLOC)) {
nofail_alloc:
+ /*
+ * break out of mempolicy boundaries
+ */
+ zonelist = NODE_DATA(numa_node_id())->node_zonelists +
+ gfp_zone(gfp_mask);
+
+ /*
+ * Before going bare metal, try to get a page above the
+ * critical threshold - ignoring CPU sets.
+ */
+ page = get_page_from_freelist(gfp_mask, order, zonelist,
+ ALLOC_MIN|ALLOC_HIGH|ALLOC_HARDER);
+ if (page)
+ goto got_pg;
+
/* go through the zonelist yet again, ignoring mins */
page = get_page_from_freelist(gfp_mask, order,
zonelist, ALLOC_NO_WATERMARKS);
next prev parent reply other threads:[~2007-06-06 7:09 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-05 22:39 [patch] cpusets: do not allow TIF_MEMDIE tasks to allocate globally David Rientjes
2007-06-05 22:40 ` Christoph Lameter
2007-06-05 22:42 ` David Rientjes
2007-06-05 23:01 ` Paul Jackson
2007-06-05 23:16 ` David Rientjes
2007-06-05 23:19 ` Paul Jackson
2007-06-05 23:20 ` Christoph Lameter
2007-06-05 23:25 ` David Rientjes
2007-06-05 23:32 ` Christoph Lameter
2007-06-05 23:44 ` David Rientjes
2007-06-05 23:55 ` Paul Jackson
2007-06-06 1:17 ` David Rientjes
2007-06-06 1:20 ` Paul Jackson
2007-06-05 23:57 ` Christoph Lameter
2007-06-06 1:23 ` David Rientjes
2007-06-06 1:32 ` Christoph Lameter
2007-06-06 1:40 ` David Rientjes
2007-06-06 1:54 ` Christoph Lameter
2007-06-06 3:29 ` David Rientjes
2007-06-06 6:20 ` Peter Zijlstra
2007-06-06 6:42 ` David Rientjes
2007-06-06 7:09 ` Peter Zijlstra [this message]
2007-06-06 7:18 ` David Rientjes
2007-06-06 7:34 ` Paul Jackson
2007-06-06 7:39 ` Andrew Morton
2007-06-06 7:48 ` David Rientjes
2007-06-06 7:56 ` Paul Jackson
2007-06-06 8:00 ` Andrew Morton
2007-06-06 8:03 ` Peter Zijlstra
2007-06-06 7:56 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1181113753.7348.164.camel@twins \
--to=peterz@infradead.org \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@engr.sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pj@sgi.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.