From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>,
Nick Piggin <npiggin@suse.de>, Paul Menage <menage@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
Date: Tue, 04 May 2010 18:53:45 +0800 [thread overview]
Message-ID: <4BDFFCB9.5010402@cn.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1004291054010.24062@chino.kir.corp.google.com>
on 2010-4-30 2:03, David Rientjes wrote:
> On Thu, 29 Apr 2010, Miao Xie wrote:
>
>>> That's been the behavior for at least three years so changing it from
>>> under the applications isn't acceptable, see
>>> Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and
>>> the two flags that are defined that can be used to adjust the behavior.
>>
>> Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES?
>> But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
>> The documentation doesn't say what we should do if either of these two flags is not set.
>>
>
> MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES allow you to adjust the
> behavior of the rebind: the former requires specific nodes to be assigned
> to the mempolicy and could suppress the rebind completely, if necessary;
> the latter ensures the mempolicy nodemask has a certain weight as nodes
> are assigned in a round-robin manner. The behavior that you're referring
> to is provided via MPOL_F_RELATIVE_NODES, which guarantees whatever weight
> is passed via set_mempolicy() will be preserved when mems are added to a
> cpuset.
>
> Regardless of whether the behavior is documented when either flag is
> passed, we can't change the long-standing default behavior that people use
> when their cpuset mems are rebound: we can only extend the functionality
> and the behavior you're seeking is already available with a
> MPOL_F_RELATIVE_NODES flag modifier.
>
>> Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
>> and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
>> shrink the set of nodes lazily(clean disallowed nodes).
>
> That's a cpuset implementation choice, not a mempolicy one; mempolicies
> have nothing to do with an empty current->mems_allowed.
>
>> But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
>> fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
>> Like this:
>>
>> static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
>> {
>> nodemask_t tmp;
>> ...
>> /* expand the set of nodes */
>> if (!mpol_store_user_nodemask(tsk->mempolicy)) {
>> nodes_remap(tmp, ...);
>> nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
>> }
>> ...
>>
>> /* shrink the set of nodes */
>> if (!mpol_store_user_nodemask(tsk->mempolicy))
>> tsk->mempolicy->v.nodes = tmp;
>> }
>>
>
> I don't see why this is even necessary, the mempolicy code could simply
> return numa_node_id() when nodes_empty(current->mempolicy->v.nodes) to
> close the race.
>
> [ Your pseudo-code is also lacking task_lock(tsk), which is required to
> safely dereference tsk->mempolicy, and this is only available so far in
> -mm since the oom killer rewrite. ]
I updated it and remade a new patchset, could you review it for me?
Thanks
Miao
WARNING: multiple messages have this Message-ID (diff)
From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>,
Nick Piggin <npiggin@suse.de>, Paul Menage <menage@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
Date: Tue, 04 May 2010 18:53:45 +0800 [thread overview]
Message-ID: <4BDFFCB9.5010402@cn.fujitsu.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1004291054010.24062@chino.kir.corp.google.com>
on 2010-4-30 2:03, David Rientjes wrote:
> On Thu, 29 Apr 2010, Miao Xie wrote:
>
>>> That's been the behavior for at least three years so changing it from
>>> under the applications isn't acceptable, see
>>> Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and
>>> the two flags that are defined that can be used to adjust the behavior.
>>
>> Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES?
>> But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
>> The documentation doesn't say what we should do if either of these two flags is not set.
>>
>
> MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES allow you to adjust the
> behavior of the rebind: the former requires specific nodes to be assigned
> to the mempolicy and could suppress the rebind completely, if necessary;
> the latter ensures the mempolicy nodemask has a certain weight as nodes
> are assigned in a round-robin manner. The behavior that you're referring
> to is provided via MPOL_F_RELATIVE_NODES, which guarantees whatever weight
> is passed via set_mempolicy() will be preserved when mems are added to a
> cpuset.
>
> Regardless of whether the behavior is documented when either flag is
> passed, we can't change the long-standing default behavior that people use
> when their cpuset mems are rebound: we can only extend the functionality
> and the behavior you're seeking is already available with a
> MPOL_F_RELATIVE_NODES flag modifier.
>
>> Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
>> and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
>> shrink the set of nodes lazily(clean disallowed nodes).
>
> That's a cpuset implementation choice, not a mempolicy one; mempolicies
> have nothing to do with an empty current->mems_allowed.
>
>> But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
>> fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
>> Like this:
>>
>> static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
>> {
>> nodemask_t tmp;
>> ...
>> /* expand the set of nodes */
>> if (!mpol_store_user_nodemask(tsk->mempolicy)) {
>> nodes_remap(tmp, ...);
>> nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
>> }
>> ...
>>
>> /* shrink the set of nodes */
>> if (!mpol_store_user_nodemask(tsk->mempolicy))
>> tsk->mempolicy->v.nodes = tmp;
>> }
>>
>
> I don't see why this is even necessary, the mempolicy code could simply
> return numa_node_id() when nodes_empty(current->mempolicy->v.nodes) to
> close the race.
>
> [ Your pseudo-code is also lacking task_lock(tsk), which is required to
> safely dereference tsk->mempolicy, and this is only available so far in
> -mm since the oom killer rewrite. ]
I updated it and remade a new patchset, could you review it for me?
Thanks
Miao
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-05-04 10:53 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-22 14:11 [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask() Miao Xie
2010-04-22 14:11 ` Miao Xie
2010-04-22 21:20 ` David Rientjes
2010-04-22 21:20 ` David Rientjes
2010-04-23 1:27 ` Miao Xie
2010-04-23 1:27 ` Miao Xie
2010-04-23 8:45 ` David Rientjes
2010-04-23 8:45 ` David Rientjes
2010-04-29 4:03 ` Miao Xie
2010-04-29 18:03 ` David Rientjes
2010-04-29 18:03 ` David Rientjes
2010-05-04 10:53 ` Miao Xie [this message]
2010-05-04 10:53 ` Miao Xie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BDFFCB9.5010402@cn.fujitsu.com \
--to=miaox@cn.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=lee.schermerhorn@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=menage@google.com \
--cc=npiggin@suse.de \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.