* [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
@ 2010-04-22 14:11 Miao Xie
2010-04-22 21:20 ` David Rientjes
0 siblings, 1 reply; 7+ messages in thread
From: Miao Xie @ 2010-04-22 14:11 UTC (permalink / raw)
To: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage
Cc: Andrew Morton, Linux-Kernel, Linux-MM
- local variable might be an empty nodemask, so must be checked before setting
pol->v.nodes to it.
- nodes_remap() may cause the weight of pol->v.nodes being monotonic decreasing.
and never become large even we pass a nodemask with large weight after
->v.nodes become little.
this patch fixes these two problem.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
mm/mempolicy.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 08f40a2..03ba9fc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -291,12 +291,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,
else if (pol->flags & MPOL_F_RELATIVE_NODES)
mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
else {
- nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
- *nodes);
+ tmp = *nodes;
pol->w.cpuset_mems_allowed = *nodes;
}
- pol->v.nodes = tmp;
+ if (nodes_empty(tmp))
+ pol->v.nodes = *nodes;
+ else
+ pol->v.nodes = tmp;
+
if (!node_isset(current->il_next, tmp)) {
current->il_next = next_node(current->il_next, tmp);
if (current->il_next >= MAX_NUMNODES)
--
1.6.5.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
2010-04-22 14:11 [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask() Miao Xie
@ 2010-04-22 21:20 ` David Rientjes
2010-04-23 1:27 ` Miao Xie
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2010-04-22 21:20 UTC (permalink / raw)
To: Miao Xie
Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
Linux-Kernel, Linux-MM
On Thu, 22 Apr 2010, Miao Xie wrote:
> - local variable might be an empty nodemask, so must be checked before setting
> pol->v.nodes to it.
>
> - nodes_remap() may cause the weight of pol->v.nodes being monotonic decreasing.
> and never become large even we pass a nodemask with large weight after
> ->v.nodes become little.
>
That's always been the intention of rebinding a mempolicy nodemask: we
remap the current mempolicy nodes over the new nodemask given the set of
allowed nodes. The nodes_remap() shouldn't be removed.
> this patch fixes these two problem.
>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> mm/mempolicy.c | 9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 08f40a2..03ba9fc 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -291,12 +291,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,
> else if (pol->flags & MPOL_F_RELATIVE_NODES)
> mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
> else {
> - nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
> - *nodes);
> + tmp = *nodes;
> pol->w.cpuset_mems_allowed = *nodes;
> }
>
> - pol->v.nodes = tmp;
> + if (nodes_empty(tmp))
> + pol->v.nodes = *nodes;
> + else
> + pol->v.nodes = tmp;
> +
> if (!node_isset(current->il_next, tmp)) {
> current->il_next = next_node(current->il_next, tmp);
> if (current->il_next >= MAX_NUMNODES)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
2010-04-22 21:20 ` David Rientjes
@ 2010-04-23 1:27 ` Miao Xie
2010-04-23 8:45 ` David Rientjes
0 siblings, 1 reply; 7+ messages in thread
From: Miao Xie @ 2010-04-23 1:27 UTC (permalink / raw)
To: David Rientjes
Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
Linux-Kernel, Linux-MM
on 2010-4-23 5:20, David Rientjes wrote:
> On Thu, 22 Apr 2010, Miao Xie wrote:
>
>> - local variable might be an empty nodemask, so must be checked before setting
>> pol->v.nodes to it.
>>
>> - nodes_remap() may cause the weight of pol->v.nodes being monotonic decreasing.
>> and never become large even we pass a nodemask with large weight after
>> ->v.nodes become little.
>>
>
> That's always been the intention of rebinding a mempolicy nodemask: we
> remap the current mempolicy nodes over the new nodemask given the set of
> allowed nodes. The nodes_remap() shouldn't be removed.
Suppose the current mempolicy nodes is 0-2, we can remap it from 0-2 to 2,
then we can remap it from 2 to 1, but we can't remap it from 2 to 0-2.
that is to say it can't be remaped to a large set of allowed nodes, and the task
just can use the small set of nodes for ever, even the large set of nodes is allowed,
I think it is unreasonable.
Thanks
Miao
>
>> this patch fixes these two problem.
>>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>> mm/mempolicy.c | 9 ++++++---
>> 1 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 08f40a2..03ba9fc 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -291,12 +291,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,
>> else if (pol->flags & MPOL_F_RELATIVE_NODES)
>> mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
>> else {
>> - nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
>> - *nodes);
>> + tmp = *nodes;
>> pol->w.cpuset_mems_allowed = *nodes;
>> }
>>
>> - pol->v.nodes = tmp;
>> + if (nodes_empty(tmp))
>> + pol->v.nodes = *nodes;
>> + else
>> + pol->v.nodes = tmp;
>> +
>> if (!node_isset(current->il_next, tmp)) {
>> current->il_next = next_node(current->il_next, tmp);
>> if (current->il_next >= MAX_NUMNODES)
>
>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
2010-04-23 1:27 ` Miao Xie
@ 2010-04-23 8:45 ` David Rientjes
2010-04-29 4:03 ` Miao Xie
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2010-04-23 8:45 UTC (permalink / raw)
To: Miao Xie
Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
Linux-Kernel, Linux-MM
On Fri, 23 Apr 2010, Miao Xie wrote:
> Suppose the current mempolicy nodes is 0-2, we can remap it from 0-2 to 2,
> then we can remap it from 2 to 1, but we can't remap it from 2 to 0-2.
>
> that is to say it can't be remaped to a large set of allowed nodes, and the task
> just can use the small set of nodes for ever, even the large set of nodes is allowed,
> I think it is unreasonable.
>
That's been the behavior for at least three years so changing it from
under the applications isn't acceptable, see
Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and
the two flags that are defined that can be used to adjust the behavior.
The pol->v.nodes = nodes_empty(tmp) ? *nodes : tmp fix is welcome,
however, as a standalone patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
2010-04-23 8:45 ` David Rientjes
@ 2010-04-29 4:03 ` Miao Xie
2010-04-29 18:03 ` David Rientjes
0 siblings, 1 reply; 7+ messages in thread
From: Miao Xie @ 2010-04-29 4:03 UTC (permalink / raw)
To: David Rientjes
Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
Linux-Kernel, Linux-MM
on 2010-4-23 16:45, David Rientjes wrote:
> On Fri, 23 Apr 2010, Miao Xie wrote:
>
>> Suppose the current mempolicy nodes is 0-2, we can remap it from 0-2 to 2,
>> then we can remap it from 2 to 1, but we can't remap it from 2 to 0-2.
>>
>> that is to say it can't be remaped to a large set of allowed nodes, and the task
>> just can use the small set of nodes for ever, even the large set of nodes is allowed,
>> I think it is unreasonable.
>>
>
> That's been the behavior for at least three years so changing it from
> under the applications isn't acceptable, see
> Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and
> the two flags that are defined that can be used to adjust the behavior.
Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES?
But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
The documentation doesn't say what we should do if either of these two flags is not set.
Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
shrink the set of nodes lazily(clean disallowed nodes).
But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
Like this:
static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
{
nodemask_t tmp;
...
/* expand the set of nodes */
if (!mpol_store_user_nodemask(tsk->mempolicy)) {
nodes_remap(tmp, ...);
nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
}
...
/* shrink the set of nodes */
if (!mpol_store_user_nodemask(tsk->mempolicy))
tsk->mempolicy->v.nodes = tmp;
}
Thanks
Miao
>
> The pol->v.nodes = nodes_empty(tmp) ? *nodes : tmp fix is welcome,
> however, as a standalone patch.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
2010-04-29 4:03 ` Miao Xie
@ 2010-04-29 18:03 ` David Rientjes
2010-05-04 10:53 ` Miao Xie
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2010-04-29 18:03 UTC (permalink / raw)
To: Miao Xie
Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
Linux-Kernel, Linux-MM
On Thu, 29 Apr 2010, Miao Xie wrote:
> > That's been the behavior for at least three years so changing it from
> > under the applications isn't acceptable, see
> > Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and
> > the two flags that are defined that can be used to adjust the behavior.
>
> Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES?
> But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
> The documentation doesn't say what we should do if either of these two flags is not set.
>
MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES allow you to adjust the
behavior of the rebind: the former requires specific nodes to be assigned
to the mempolicy and could suppress the rebind completely, if necessary;
the latter ensures the mempolicy nodemask has a certain weight as nodes
are assigned in a round-robin manner. The behavior that you're referring
to is provided via MPOL_F_RELATIVE_NODES, which guarantees whatever weight
is passed via set_mempolicy() will be preserved when mems are added to a
cpuset.
Regardless of whether the behavior is documented when either flag is
passed, we can't change the long-standing default behavior that people use
when their cpuset mems are rebound: we can only extend the functionality
and the behavior you're seeking is already available with a
MPOL_F_RELATIVE_NODES flag modifier.
> Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
> and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
> shrink the set of nodes lazily(clean disallowed nodes).
That's a cpuset implementation choice, not a mempolicy one; mempolicies
have nothing to do with an empty current->mems_allowed.
> But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
> fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
> Like this:
>
> static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
> {
> nodemask_t tmp;
> ...
> /* expand the set of nodes */
> if (!mpol_store_user_nodemask(tsk->mempolicy)) {
> nodes_remap(tmp, ...);
> nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
> }
> ...
>
> /* shrink the set of nodes */
> if (!mpol_store_user_nodemask(tsk->mempolicy))
> tsk->mempolicy->v.nodes = tmp;
> }
>
I don't see why this is even necessary, the mempolicy code could simply
return numa_node_id() when nodes_empty(current->mempolicy->v.nodes) to
close the race.
[ Your pseudo-code is also lacking task_lock(tsk), which is required to
safely dereference tsk->mempolicy, and this is only available so far in
-mm since the oom killer rewrite. ]
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
2010-04-29 18:03 ` David Rientjes
@ 2010-05-04 10:53 ` Miao Xie
0 siblings, 0 replies; 7+ messages in thread
From: Miao Xie @ 2010-05-04 10:53 UTC (permalink / raw)
To: David Rientjes
Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
Linux-Kernel, Linux-MM
on 2010-4-30 2:03, David Rientjes wrote:
> On Thu, 29 Apr 2010, Miao Xie wrote:
>
>>> That's been the behavior for at least three years so changing it from
>>> under the applications isn't acceptable, see
>>> Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and
>>> the two flags that are defined that can be used to adjust the behavior.
>>
>> Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES?
>> But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
>> The documentation doesn't say what we should do if either of these two flags is not set.
>>
>
> MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES allow you to adjust the
> behavior of the rebind: the former requires specific nodes to be assigned
> to the mempolicy and could suppress the rebind completely, if necessary;
> the latter ensures the mempolicy nodemask has a certain weight as nodes
> are assigned in a round-robin manner. The behavior that you're referring
> to is provided via MPOL_F_RELATIVE_NODES, which guarantees whatever weight
> is passed via set_mempolicy() will be preserved when mems are added to a
> cpuset.
>
> Regardless of whether the behavior is documented when either flag is
> passed, we can't change the long-standing default behavior that people use
> when their cpuset mems are rebound: we can only extend the functionality
> and the behavior you're seeking is already available with a
> MPOL_F_RELATIVE_NODES flag modifier.
>
>> Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
>> and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
>> shrink the set of nodes lazily(clean disallowed nodes).
>
> That's a cpuset implementation choice, not a mempolicy one; mempolicies
> have nothing to do with an empty current->mems_allowed.
>
>> But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
>> fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
>> Like this:
>>
>> static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
>> {
>> nodemask_t tmp;
>> ...
>> /* expand the set of nodes */
>> if (!mpol_store_user_nodemask(tsk->mempolicy)) {
>> nodes_remap(tmp, ...);
>> nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
>> }
>> ...
>>
>> /* shrink the set of nodes */
>> if (!mpol_store_user_nodemask(tsk->mempolicy))
>> tsk->mempolicy->v.nodes = tmp;
>> }
>>
>
> I don't see why this is even necessary, the mempolicy code could simply
> return numa_node_id() when nodes_empty(current->mempolicy->v.nodes) to
> close the race.
>
> [ Your pseudo-code is also lacking task_lock(tsk), which is required to
> safely dereference tsk->mempolicy, and this is only available so far in
> -mm since the oom killer rewrite. ]
I updated it and remade a new patchset, could you review it for me?
Thanks
Miao
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-05-04 10:53 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-22 14:11 [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask() Miao Xie
2010-04-22 21:20 ` David Rientjes
2010-04-23 1:27 ` Miao Xie
2010-04-23 8:45 ` David Rientjes
2010-04-29 4:03 ` Miao Xie
2010-04-29 18:03 ` David Rientjes
2010-05-04 10:53 ` Miao Xie
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).