From: Miao Xie <miaox@cn.fujitsu.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>,
menage@google.com, cl@linux-foundation.org,
penberg@cs.helsinki.fi, mpm@selenic.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] cpuset,mm: fix allocating page cache/slab object on the unallowed node when memory spread is set
Date: Wed, 31 Dec 2008 11:59:26 +0800
Message-ID: <495AEE1E.9050303@cn.fujitsu.com>
In-Reply-To: <200812311413.45127.nickpiggin@yahoo.com.au>
On 2008-12-31 11:13, Nick Piggin wrote:
> On Wednesday 31 December 2008 09:28:05 Andrew Morton wrote:
>> On Fri, 26 Dec 2008 14:37:07 +0800 Miao Xie <miaox@cn.fujitsu.com> wrote:
>>> With 'memory_spread_page' set, a task kept allocating page cache on the
>>> old node after its cpuset's mems had been modified. This is caused by the
>>> task's stale mem_allowed_list. Slab has the same problem.
>> ok...
>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index f3e5f89..d978983 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -517,6 +517,9 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
>>>  #ifdef CONFIG_NUMA
>>>  struct page *__page_cache_alloc(gfp_t gfp)
>>>  {
>>> +	if ((gfp & __GFP_WAIT) && !in_interrupt())
>>> +		cpuset_update_task_memory_state();
>>> +
>>>  	if (cpuset_do_page_mem_spread()) {
>>>  		int n = cpuset_mem_spread_node();
>>>  		return alloc_pages_node(n, gfp, 0);
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index 0918751..3b6e3d7 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -3460,6 +3460,9 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
>>>  	if (should_failslab(cachep, flags))
>>>  		return NULL;
>>>
>>> +	if ((flags & __GFP_WAIT) && !in_interrupt())
>>> +		cpuset_update_task_memory_state();
>>> +
>
> These paths are pretty performance critical. Why doesn't the cpusets code do
> this work in the slowpath where the cpuset's mems_allowed gets changed, rather
> than putting these calls all over the place with apparently no real rhyme or
> reason :( (this is not against your patch, but just this part of the cpusets
> design)
I see. I will do it.
>>>  	cache_alloc_debugcheck_before(cachep, flags);
>>>  	local_irq_save(save_flags);
>>>  	objp = __do_cache_alloc(cachep, flags);
>> Problems.
>>
>> a) There's no need to test in_interrupt(). Any caller who passed us
>> __GFP_WAIT from interrupt context is horridly buggy and needs to be
>> fixed.
>
> Right. There are existing sites that do the same check, which is probably
> where it is copied from.
I will clean this up in the next patch.
Thanks!
>
>> b) Even if the caller _did_ set __GFP_WAIT, there's no guarantee
>> that we're deadlock safe here. Does anyone ever do a __GFP_WAIT
>> allocation while holding callback_mutex? If so, it'll deadlock.
>
> It's static to cpuset.c, so I'd hope not.
>
>
>> c) These are two of the kernel's hottest code paths. We really
>> really really really don't want to be polling for some dopey
>> userspace admin change on each call to __cache_alloc()!
>
> Yeah, right. Let's try to fix cpuset.c instead...
>
>> d) How does slub handle this problem?
>
> SLUB seems to do a "sloppy" kind of memory policy allocation, where it just
> relies on the page allocator to hand us the correct page and AFAIKS does not
> exactly obey this stuff all the time.
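
To make the direction suggested above a bit more concrete, here is a rough
userspace sketch of the "do the work in the slowpath" idea: when the cpuset's
mems change, the new mask is pushed to every attached task once, so the
allocation hot path only reads the task's own copy and never polls the cpuset.
Everything here (struct model_cpuset, struct model_task, model_set_mems(),
model_spread_node()) is a hypothetical model made up for illustration, not the
kernel's data structures or API, and it ignores the locking a real
implementation would need against concurrent allocations.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical userspace model; these are not kernel types. */
struct model_task {
	unsigned long mems_allowed;	/* per-task copy read by the hot path */
	int spread_rotor;		/* last node handed out when spreading */
};

struct model_cpuset {
	unsigned long mems_allowed;		/* bitmask of allowed nodes */
	struct model_task *tasks[MAX_TASKS];	/* tasks attached to this cpuset */
	size_t nr_tasks;
};

/* Attach a task: it starts with a copy of the cpuset's current mask. */
static void model_attach(struct model_cpuset *cs, struct model_task *t)
{
	assert(cs->nr_tasks < MAX_TASKS);
	t->mems_allowed = cs->mems_allowed;
	t->spread_rotor = -1;
	cs->tasks[cs->nr_tasks++] = t;
}

/*
 * Slow path: the admin changes the cpuset's mems. The new mask is
 * pushed to every attached task here, once, instead of having each
 * allocation check whether something changed.
 */
static void model_set_mems(struct model_cpuset *cs, unsigned long mask)
{
	assert(mask != 0);
	cs->mems_allowed = mask;
	for (size_t i = 0; i < cs->nr_tasks; i++)
		cs->tasks[i]->mems_allowed = mask;
}

/*
 * Hot path: pick the next allowed node round-robin, using only the
 * task's own mask. Returns -1 if the mask is empty.
 */
static int model_spread_node(struct model_task *t)
{
	int nbits = (int)(sizeof(unsigned long) * 8);

	for (int step = 1; step <= nbits; step++) {
		int node = (t->spread_rotor + step) % nbits;

		if (t->mems_allowed & (1UL << node)) {
			t->spread_rotor = node;
			return node;
		}
	}
	return -1;
}
```

In this model, a task attached to a cpuset with nodes 0-1 that is then
switched to nodes 2-3 via model_set_mems() gets only nodes 2 and 3 from the
next model_spread_node() calls, without any extra check in the hot path.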