public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
@ 2010-04-22 14:11 Miao Xie
  2010-04-22 21:20 ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: Miao Xie @ 2010-04-22 14:11 UTC (permalink / raw)
  To: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage
  Cc: Andrew Morton, Linux-Kernel, Linux-MM

- local variable might be an empty nodemask, so must be checked before setting
  pol->v.nodes to it.

- nodes_remap() may cause the weight of pol->v.nodes being monotonic decreasing.
  and never become large even we pass a nodemask with large weight after
  ->v.nodes become little.

this patch fixes these two problem.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 mm/mempolicy.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 08f40a2..03ba9fc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -291,12 +291,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,
 	else if (pol->flags & MPOL_F_RELATIVE_NODES)
 		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
 	else {
-		nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
-			    *nodes);
+		tmp = *nodes;
 		pol->w.cpuset_mems_allowed = *nodes;
 	}
 
-	pol->v.nodes = tmp;
+	if (nodes_empty(tmp))
+		pol->v.nodes = *nodes;
+	else
+		pol->v.nodes = tmp;
+
 	if (!node_isset(current->il_next, tmp)) {
 		current->il_next = next_node(current->il_next, tmp);
 		if (current->il_next >= MAX_NUMNODES)
-- 
1.6.5.2



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
  2010-04-22 14:11 [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask() Miao Xie
@ 2010-04-22 21:20 ` David Rientjes
  2010-04-23  1:27   ` Miao Xie
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2010-04-22 21:20 UTC (permalink / raw)
  To: Miao Xie
  Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
	Linux-Kernel, Linux-MM

On Thu, 22 Apr 2010, Miao Xie wrote:

> - local variable might be an empty nodemask, so must be checked before setting
>   pol->v.nodes to it.
> 
> - nodes_remap() may cause the weight of pol->v.nodes being monotonic decreasing.
>   and never become large even we pass a nodemask with large weight after
>   ->v.nodes become little.
> 

That's always been the intention of rebinding a mempolicy nodemask: we 
remap the current mempolicy nodes over the new nodemask given the set of 
allowed nodes.  The nodes_remap() shouldn't be removed.

> this patch fixes these two problem.
> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>  mm/mempolicy.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 08f40a2..03ba9fc 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -291,12 +291,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,
>  	else if (pol->flags & MPOL_F_RELATIVE_NODES)
>  		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
>  	else {
> -		nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
> -			    *nodes);
> +		tmp = *nodes;
>  		pol->w.cpuset_mems_allowed = *nodes;
>  	}
>  
> -	pol->v.nodes = tmp;
> +	if (nodes_empty(tmp))
> +		pol->v.nodes = *nodes;
> +	else
> +		pol->v.nodes = tmp;
> +
>  	if (!node_isset(current->il_next, tmp)) {
>  		current->il_next = next_node(current->il_next, tmp);
>  		if (current->il_next >= MAX_NUMNODES)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
  2010-04-22 21:20 ` David Rientjes
@ 2010-04-23  1:27   ` Miao Xie
  2010-04-23  8:45     ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: Miao Xie @ 2010-04-23  1:27 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
	Linux-Kernel, Linux-MM

on 2010-4-23 5:20, David Rientjes wrote:
> On Thu, 22 Apr 2010, Miao Xie wrote:
> 
>> - local variable might be an empty nodemask, so must be checked before setting
>>   pol->v.nodes to it.
>>
>> - nodes_remap() may cause the weight of pol->v.nodes being monotonic decreasing.
>>   and never become large even we pass a nodemask with large weight after
>>   ->v.nodes become little.
>>
> 
> That's always been the intention of rebinding a mempolicy nodemask: we 
> remap the current mempolicy nodes over the new nodemask given the set of 
> allowed nodes.  The nodes_remap() shouldn't be removed.

Suppose the current mempolicy nodes is 0-2, we can remap it from 0-2 to 2,
then we can remap it from 2 to 1, but we can't remap it from 2 to 0-2.

that is to say it can't be remaped to a large set of allowed nodes, and the task
just can use the small set of nodes for ever, even the large set of nodes is allowed,
I think it is unreasonable.

Thanks
Miao

> 
>> this patch fixes these two problem.
>>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>>  mm/mempolicy.c |    9 ++++++---
>>  1 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 08f40a2..03ba9fc 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -291,12 +291,15 @@ static void mpol_rebind_nodemask(struct mempolicy *pol,
>>  	else if (pol->flags & MPOL_F_RELATIVE_NODES)
>>  		mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
>>  	else {
>> -		nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
>> -			    *nodes);
>> +		tmp = *nodes;
>>  		pol->w.cpuset_mems_allowed = *nodes;
>>  	}
>>  
>> -	pol->v.nodes = tmp;
>> +	if (nodes_empty(tmp))
>> +		pol->v.nodes = *nodes;
>> +	else
>> +		pol->v.nodes = tmp;
>> +
>>  	if (!node_isset(current->il_next, tmp)) {
>>  		current->il_next = next_node(current->il_next, tmp);
>>  		if (current->il_next >= MAX_NUMNODES)
> 
> 
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
  2010-04-23  1:27   ` Miao Xie
@ 2010-04-23  8:45     ` David Rientjes
       [not found]       ` <4BD90529.3090401@cn.fujitsu.com>
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2010-04-23  8:45 UTC (permalink / raw)
  To: Miao Xie
  Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
	Linux-Kernel, Linux-MM

On Fri, 23 Apr 2010, Miao Xie wrote:

> Suppose the current mempolicy nodes is 0-2, we can remap it from 0-2 to 2,
> then we can remap it from 2 to 1, but we can't remap it from 2 to 0-2.
> 
> that is to say it can't be remaped to a large set of allowed nodes, and the task
> just can use the small set of nodes for ever, even the large set of nodes is allowed,
> I think it is unreasonable.
> 

That's been the behavior for at least three years so changing it from 
under the applications isn't acceptable, see 
Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and 
the two flags that are defined that can be used to adjust the behavior.

The pol->v.nodes = nodes_empty(tmp) ? *nodes : tmp fix is welcome, 
however, as a standalone patch.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
       [not found]       ` <4BD90529.3090401@cn.fujitsu.com>
@ 2010-04-29 18:03         ` David Rientjes
  2010-05-04 10:53           ` Miao Xie
  0 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2010-04-29 18:03 UTC (permalink / raw)
  To: Miao Xie
  Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
	Linux-Kernel, Linux-MM

On Thu, 29 Apr 2010, Miao Xie wrote:

> > That's been the behavior for at least three years so changing it from 
> > under the applications isn't acceptable, see 
> > Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and 
> > the two flags that are defined that can be used to adjust the behavior.
> 
> Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES? 
> But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
> The documentation doesn't say what we should do if either of these two flags is not set. 
> 

MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES allow you to adjust the 
behavior of the rebind: the former requires specific nodes to be assigned 
to the mempolicy and could suppress the rebind completely, if necessary; 
the latter ensures the mempolicy nodemask has a certain weight as nodes 
are assigned in a round-robin manner.  The behavior that you're referring 
to is provided via MPOL_F_RELATIVE_NODES, which guarantees whatever weight 
is passed via set_mempolicy() will be preserved when mems are added to a 
cpuset.

Regardless of whether the behavior is documented when either flag is 
passed, we can't change the long-standing default behavior that people use 
when their cpuset mems are rebound: we can only extend the functionality 
and the behavior you're seeking is already available with a 
MPOL_F_RELATIVE_NODES flag modifier.

> Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
> and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
> shrink the set of nodes lazily(clean disallowed nodes).

That's a cpuset implementation choice, not a mempolicy one; mempolicies 
have nothing to do with an empty current->mems_allowed.

> But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
> fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
> Like this:
> 
> static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
> {
> 	nodemask_t tmp;
> 	...
> 	/* expand the set of nodes */
> 	if (!mpol_store_user_nodemask(tsk->mempolicy)) {
> 		nodes_remap(tmp, ...);
> 		nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
> 	}
> 	...
> 
> 	/* shrink the set of nodes */
> 	if (!mpol_store_user_nodemask(tsk->mempolicy))
> 		tsk->mempolicy->v.nodes = tmp;
> }
> 

I don't see why this is even necessary, the mempolicy code could simply 
return numa_node_id() when nodes_empty(current->mempolicy->v.nodes) to 
close the race.

 [ Your pseudo-code is also lacking task_lock(tsk), which is required to 
   safely dereference tsk->mempolicy, and this is only available so far in 
   -mm since the oom killer rewrite. ]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask()
  2010-04-29 18:03         ` David Rientjes
@ 2010-05-04 10:53           ` Miao Xie
  0 siblings, 0 replies; 6+ messages in thread
From: Miao Xie @ 2010-05-04 10:53 UTC (permalink / raw)
  To: David Rientjes
  Cc: Lee Schermerhorn, Nick Piggin, Paul Menage, Andrew Morton,
	Linux-Kernel, Linux-MM

on 2010-4-30 2:03, David Rientjes wrote:
> On Thu, 29 Apr 2010, Miao Xie wrote:
> 
>>> That's been the behavior for at least three years so changing it from 
>>> under the applications isn't acceptable, see 
>>> Documentation/vm/numa_memory_policy.txt regarding mempolicy rebinds and 
>>> the two flags that are defined that can be used to adjust the behavior.
>>
>> Is the flags what you said MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES? 
>> But the codes that I changed isn't under MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES.
>> The documentation doesn't say what we should do if either of these two flags is not set. 
>>
> 
> MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES allow you to adjust the 
> behavior of the rebind: the former requires specific nodes to be assigned 
> to the mempolicy and could suppress the rebind completely, if necessary; 
> the latter ensures the mempolicy nodemask has a certain weight as nodes 
> are assigned in a round-robin manner.  The behavior that you're referring 
> to is provided via MPOL_F_RELATIVE_NODES, which guarantees whatever weight 
> is passed via set_mempolicy() will be preserved when mems are added to a 
> cpuset.
> 
> Regardless of whether the behavior is documented when either flag is 
> passed, we can't change the long-standing default behavior that people use 
> when their cpuset mems are rebound: we can only extend the functionality 
> and the behavior you're seeking is already available with a 
> MPOL_F_RELATIVE_NODES flag modifier.
> 
>> Furthermore, in order to fix no node to alloc memory, when we want to update mempolicy
>> and mems_allowed, we expand the set of nodes first (set all the newly nodes) and
>> shrink the set of nodes lazily(clean disallowed nodes).
> 
> That's a cpuset implementation choice, not a mempolicy one; mempolicies 
> have nothing to do with an empty current->mems_allowed.
> 
>> But remap() breaks the expanding, so if we don't remove remap(), the problem can't be
>> fixed. Otherwise, cpuset has to do the rebinding by itself and the code is ugly.
>> Like this:
>>
>> static void cpuset_change_task_nodemask(struct task_struct *tsk, nodemask_t *newmems)
>> {
>> 	nodemask_t tmp;
>> 	...
>> 	/* expand the set of nodes */
>> 	if (!mpol_store_user_nodemask(tsk->mempolicy)) {
>> 		nodes_remap(tmp, ...);
>> 		nodes_or(tsk->mempolicy->v.nodes, tsk->mempolicy->v.nodes, tmp);
>> 	}
>> 	...
>>
>> 	/* shrink the set of nodes */
>> 	if (!mpol_store_user_nodemask(tsk->mempolicy))
>> 		tsk->mempolicy->v.nodes = tmp;
>> }
>>
> 
> I don't see why this is even necessary, the mempolicy code could simply 
> return numa_node_id() when nodes_empty(current->mempolicy->v.nodes) to 
> close the race.
> 
>  [ Your pseudo-code is also lacking task_lock(tsk), which is required to 
>    safely dereference tsk->mempolicy, and this is only available so far in 
>    -mm since the oom killer rewrite. ]

I updated it and remade a new patchset, could you review it for me?

Thanks
Miao


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-05-04 10:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-22 14:11 [PATCH 1/2] mm: fix bugs of mpol_rebind_nodemask() Miao Xie
2010-04-22 21:20 ` David Rientjes
2010-04-23  1:27   ` Miao Xie
2010-04-23  8:45     ` David Rientjes
     [not found]       ` <4BD90529.3090401@cn.fujitsu.com>
2010-04-29 18:03         ` David Rientjes
2010-05-04 10:53           ` Miao Xie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox