* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 9:26 ` [2.6.24 regression][BUGFIX] " KOSAKI Motohiro
@ 2008-02-05 21:57 ` Lee Schermerhorn
2008-02-05 22:12 ` Christoph Lameter
` (3 more replies)
2008-02-06 17:38 ` Lee Schermerhorn
2008-02-08 19:45 ` [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3 Lee Schermerhorn
2 siblings, 4 replies; 43+ messages in thread
From: Lee Schermerhorn @ 2008-02-05 21:57 UTC (permalink / raw)
To: KOSAKI Motohiro, linux-kernel, Andrew Morton
Cc: Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
torvalds, Eric Whitney
Here's a patch that addresses the problem w/o requiring change to
numactl or libnuma. It DOES have side affects, discussed in the
description.
Tested with memoryless nodes and restricted cpusets using the numactl
installed with RHEL5.1.
Altho' nominally against 24-mm1, applies cleanly to 2.6.24. Should be
suitable for 'stable' if everyone agrees.
Lee
----------------------------------
[PATCH] 2.6.24-mm1 - mempolicy: silently restrict to allowed nodes
Kosaki-san noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that
problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully
populated [out to MAXNUMNODES] nodemask. set_mempolicy()
[in do_set_mempolicy()] calls contextualize_policy() which
requires that the nodemask be a subset of the current task's
mems_allowed; else EINVAL will be returned. A task's
mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
i.e., nodes with memory. So, a fully populated nodemask will
be declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the
nodemask that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy()
will complain [again, resulting in EINVAL] if the nodemask contains
any memoryless nodes. This is somewhat redundant as mpol_new()
will remove memoryless nodes for interleave policy, as will
bind_zonelist()--called by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy to just remove the non-allowed
nodes, as is currently done in-line for mbind(). This
guarantees that the resulting mask includes only nodes with
memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be silently
ignored, rather than returning an error.
Another, perhaps less benign, change in behavior:
MPOL_PREFERRED policy that specifies only memoryless nodes
or nodes that are disallowed in the cpuset will be interpreted
as "local allocation" as the nodemask will be empty after
the masking in contextualize_policy(). With a bit of
additional hackery I can make this return EINVAL.
Comments?
2) modify mbind() to use contextualize_policy(), like set_mempolicy(),
instead of masking nodes in-line.
3) remove the now redundant check for memoryless nodes from
mpol_check_policy().
4) remove the masking of policy nodes for interleave policy from
mpol_new().
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
mm/mempolicy.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2008-02-05 11:25:17.000000000 -0500
+++ Linux/mm/mempolicy.c 2008-02-05 16:03:11.000000000 -0500
@@ -131,7 +131,7 @@ static int mpol_check_policy(int mode, n
return -EINVAL;
break;
}
- return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
+ return 0;
}
/* Generate a custom zonelist for the BIND policy. */
@@ -188,8 +188,6 @@ static struct mempolicy *mpol_new(int mo
switch (mode) {
case MPOL_INTERLEAVE:
policy->v.nodes = *nodes;
- nodes_and(policy->v.nodes, policy->v.nodes,
- node_states[N_HIGH_MEMORY]);
if (nodes_weight(policy->v.nodes) == 0) {
kmem_cache_free(policy_cache, policy);
return ERR_PTR(-EINVAL);
@@ -426,9 +424,13 @@ static int contextualize_policy(int mode
if (!nodes)
return 0;
+ /*
+ * Restrict the nodes to the allowed nodes in the cpuset.
+ * This is guaranteed to be a subset of nodes with memory.
+ */
cpuset_update_task_memory_state();
- if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
- return -EINVAL;
+ nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
+
return mpol_check_policy(mode, nodes);
}
@@ -797,7 +799,7 @@ static long do_mbind(unsigned long start
if (end == start)
return 0;
- if (mpol_check_policy(mode, nmask))
+ if (contextualize_policy(mode, nmask))
return -EINVAL;
new = mpol_new(mode, nmask);
@@ -915,10 +917,6 @@ asmlinkage long sys_mbind(unsigned long
err = get_nodes(&nodes, nmask, maxnode);
if (err)
return err;
-#ifdef CONFIG_CPUSETS
- /* Restrict the nodes to the allowed nodes in the cpuset */
- nodes_and(nodes, nodes, current->mems_allowed);
-#endif
return do_mbind(start, len, mode, &nodes, flags);
}
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 21:57 ` Lee Schermerhorn
@ 2008-02-05 22:12 ` Christoph Lameter
2008-02-06 16:00 ` Lee Schermerhorn
2008-02-05 22:15 ` Paul Jackson
` (2 subsequent siblings)
3 siblings, 1 reply; 43+ messages in thread
From: Christoph Lameter @ 2008-02-05 22:12 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: KOSAKI Motohiro, linux-kernel, Andrew Morton, Paul Jackson,
David Rientjes, Mel Gorman, torvalds, Eric Whitney
On Tue, 5 Feb 2008, Lee Schermerhorn wrote:
> mbind(2), on the other hand, just masks off any nodes in the
> nodemask that are not included in the caller's mems_allowed.
Ok so we temporarily adopt these semantics for set_mempolicy.
> 1) modify contextualize_policy to just remove the non-allowed
> nodes, as is currently done in-line for mbind(). This
> guarantees that the resulting mask includes only nodes with
> memory.
Right make ssense. we already contextualize for cpusets.
> Index: Linux/mm/mempolicy.c
> ===================================================================
> --- Linux.orig/mm/mempolicy.c 2008-02-05 11:25:17.000000000 -0500
> +++ Linux/mm/mempolicy.c 2008-02-05 16:03:11.000000000 -0500
> @@ -131,7 +131,7 @@ static int mpol_check_policy(int mode, n
> return -EINVAL;
> break;
> }
> - return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
> + return 0;
> }
Hmmm... That is a pretty drastic change.
> @@ -188,8 +188,6 @@ static struct mempolicy *mpol_new(int mo
> switch (mode) {
> case MPOL_INTERLEAVE:
> policy->v.nodes = *nodes;
> - nodes_and(policy->v.nodes, policy->v.nodes,
> - node_states[N_HIGH_MEMORY]);
> if (nodes_weight(policy->v.nodes) == 0) {
> kmem_cache_free(policy_cache, policy);
> return ERR_PTR(-EINVAL);
Do we really need to remove these lines if we change set_mempolicy?
> @@ -426,9 +424,13 @@ static int contextualize_policy(int mode
> if (!nodes)
> return 0;
>
> + /*
> + * Restrict the nodes to the allowed nodes in the cpuset.
> + * This is guaranteed to be a subset of nodes with memory.
> + */
> cpuset_update_task_memory_state();
> - if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
> - return -EINVAL;
> + nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
> +
> return mpol_check_policy(mode, nodes);
> }
>
Ditto?
> @@ -797,7 +799,7 @@ static long do_mbind(unsigned long start
> if (end == start)
> return 0;
>
> - if (mpol_check_policy(mode, nmask))
> + if (contextualize_policy(mode, nmask))
> return -EINVAL;
>
> new = mpol_new(mode, nmask);
> @@ -915,10 +917,6 @@ asmlinkage long sys_mbind(unsigned long
> err = get_nodes(&nodes, nmask, maxnode);
> if (err)
> return err;
> -#ifdef CONFIG_CPUSETS
> - /* Restrict the nodes to the allowed nodes in the cpuset */
> - nodes_and(nodes, nodes, current->mems_allowed);
> -#endif
Would just removing #ifdef CONFIG_CPUSETS work? mems_allowed falls back to
node_possible_map.... Shouldnt that be node_online_map?
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 22:12 ` Christoph Lameter
@ 2008-02-06 16:00 ` Lee Schermerhorn
0 siblings, 0 replies; 43+ messages in thread
From: Lee Schermerhorn @ 2008-02-06 16:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: KOSAKI Motohiro, linux-kernel, Andrew Morton, Paul Jackson,
David Rientjes, Mel Gorman, torvalds, Eric Whitney
On Tue, 2008-02-05 at 14:12 -0800, Christoph Lameter wrote:
> On Tue, 5 Feb 2008, Lee Schermerhorn wrote:
>
> > mbind(2), on the other hand, just masks off any nodes in the
> > nodemask that are not included in the caller's mems_allowed.
>
> Ok so we temporarily adopt these semantics for set_mempolicy.
>
> > 1) modify contextualize_policy to just remove the non-allowed
> > nodes, as is currently done in-line for mbind(). This
> > guarantees that the resulting mask includes only nodes with
> > memory.
>
> Right make ssense. we already contextualize for cpusets.
Only for mbind(). set_mempolicy(), via contextualize_policy() was just
returning EINVAL for invalid nodes in the mask. I don't know if it
always worked like this, or if we did this in the memoryless nodes
series...
>
> > Index: Linux/mm/mempolicy.c
> > ===================================================================
> > --- Linux.orig/mm/mempolicy.c 2008-02-05 11:25:17.000000000 -0500
> > +++ Linux/mm/mempolicy.c 2008-02-05 16:03:11.000000000 -0500
> > @@ -131,7 +131,7 @@ static int mpol_check_policy(int mode, n
> > return -EINVAL;
> > break;
> > }
> > - return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
> > + return 0;
> > }
>
> Hmmm... That is a pretty drastic change.
the nodes_subset() would always return true, once we mask it with
cpuset_current_mems_allowed(), right? mems_allowed can now only contain
nodes with memory and if cpusets are not configured,
cpuset_current_mems_allowed() just returns node_states[N_HIGH_MEMORY].
So, I think this is a no-op.
>
> > @@ -188,8 +188,6 @@ static struct mempolicy *mpol_new(int mo
> > switch (mode) {
> > case MPOL_INTERLEAVE:
> > policy->v.nodes = *nodes;
> > - nodes_and(policy->v.nodes, policy->v.nodes,
> > - node_states[N_HIGH_MEMORY]);
> > if (nodes_weight(policy->v.nodes) == 0) {
> > kmem_cache_free(policy_cache, policy);
> > return ERR_PTR(-EINVAL);
>
> Do we really need to remove these lines if we change set_mempolicy?
Again, with the change to contextualize_policy(), the nodemask is
guaranteed to only contain nodes with memory, so this was redundant.
>
> > @@ -426,9 +424,13 @@ static int contextualize_policy(int mode
> > if (!nodes)
> > return 0;
> >
> > + /*
> > + * Restrict the nodes to the allowed nodes in the cpuset.
> > + * This is guaranteed to be a subset of nodes with memory.
> > + */
> > cpuset_update_task_memory_state();
> > - if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
> > - return -EINVAL;
> > + nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
> > +
> > return mpol_check_policy(mode, nodes);
> > }
> >
>
> Ditto?
This is the main change in the patch: masking off the invalid nodes
[like sys_mbind() did inline] rather than complaining about them.
However, after I finish testing, I will post an update to this patch
which restores some of the error checks that this change lost.
>
> > @@ -797,7 +799,7 @@ static long do_mbind(unsigned long start
> > if (end == start)
> > return 0;
> >
> > - if (mpol_check_policy(mode, nmask))
> > + if (contextualize_policy(mode, nmask))
> > return -EINVAL;
> >
> > new = mpol_new(mode, nmask);
> > @@ -915,10 +917,6 @@ asmlinkage long sys_mbind(unsigned long
> > err = get_nodes(&nodes, nmask, maxnode);
> > if (err)
> > return err;
> > -#ifdef CONFIG_CPUSETS
> > - /* Restrict the nodes to the allowed nodes in the cpuset */
> > - nodes_and(nodes, nodes, current->mems_allowed);
> > -#endif
>
> Would just removing #ifdef CONFIG_CPUSETS work? mems_allowed falls back to
> node_possible_map.... Shouldnt that be node_online_map?
I removed this because I changed do_mbind() to call the revised
contextualize_policy() that does exactly this masking. I didn't see any
reason to leave the duplicate code there.
I think that mems_allowed now falls back to nodes with memory. Or it
should in the current code. When Paul adds his new magic, that might
change.
Lee
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 21:57 ` Lee Schermerhorn
2008-02-05 22:12 ` Christoph Lameter
@ 2008-02-05 22:15 ` Paul Jackson
2008-02-06 2:17 ` David Rientjes
2008-02-06 6:49 ` KOSAKI Motohiro
3 siblings, 0 replies; 43+ messages in thread
From: Paul Jackson @ 2008-02-05 22:15 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: kosaki.motohiro, linux-kernel, akpm, clameter, rientjes, mel,
torvalds, eric.whitney
Lee wrote:
> [PATCH] 2.6.24-mm1 - mempolicy: silently restrict to allowed nodes
At first glance, I like it. Thanks.
The changes in the exact behaviour of set_mempolicy (and mbind?) seem
to me to be changes for the better -- subtle improvements in the
consistency of handling corner cases.
However I don't have code that depends all that elaborately on the
fine details of these system calls, so I'm easy. If others know of
an existing or likely usage pattern that this patch would break,
that would be interesting input.
Thanks, Lee.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.940.382.4214
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 21:57 ` Lee Schermerhorn
2008-02-05 22:12 ` Christoph Lameter
2008-02-05 22:15 ` Paul Jackson
@ 2008-02-06 2:17 ` David Rientjes
2008-02-06 16:11 ` Lee Schermerhorn
2008-02-06 6:49 ` KOSAKI Motohiro
3 siblings, 1 reply; 43+ messages in thread
From: David Rientjes @ 2008-02-06 2:17 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: KOSAKI Motohiro, linux-kernel, Andrew Morton, Christoph Lameter,
Paul Jackson, Mel Gorman, torvalds, Eric Whitney
On Tue, 5 Feb 2008, Lee Schermerhorn wrote:
> Index: Linux/mm/mempolicy.c
> ===================================================================
> --- Linux.orig/mm/mempolicy.c 2008-02-05 11:25:17.000000000 -0500
> +++ Linux/mm/mempolicy.c 2008-02-05 16:03:11.000000000 -0500
> @@ -131,7 +131,7 @@ static int mpol_check_policy(int mode, n
> return -EINVAL;
> break;
> }
> - return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
> + return 0;
> }
>
> /* Generate a custom zonelist for the BIND policy. */
This change will be necessary when the nodemask passed from the syscall is
saved in the struct mempolicy as the intent of the application as well.
> @@ -188,8 +188,6 @@ static struct mempolicy *mpol_new(int mo
> switch (mode) {
> case MPOL_INTERLEAVE:
> policy->v.nodes = *nodes;
> - nodes_and(policy->v.nodes, policy->v.nodes,
> - node_states[N_HIGH_MEMORY]);
> if (nodes_weight(policy->v.nodes) == 0) {
> kmem_cache_free(policy_cache, policy);
> return ERR_PTR(-EINVAL);
> @@ -426,9 +424,13 @@ static int contextualize_policy(int mode
> if (!nodes)
> return 0;
>
> + /*
> + * Restrict the nodes to the allowed nodes in the cpuset.
> + * This is guaranteed to be a subset of nodes with memory.
> + */
> cpuset_update_task_memory_state();
> - if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
> - return -EINVAL;
> + nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
> +
> return mpol_check_policy(mode, nodes);
> }
>
I would defer the intersection until later because contextualize_policy()
is called before mpol_new() so we have no struct mempolicy to save the
intent in. It doesn't matter for the sake of this change, I know, but you
could move this intersection to mpol_new() and give us an opportunity to
store the user's nodemask in the mempolicy with a one-line change and get
the same desired result.
You can now remove cpuset_nodes_subset_current_mems_allowed() from
linux/cpuset.h.
> @@ -797,7 +799,7 @@ static long do_mbind(unsigned long start
> if (end == start)
> return 0;
>
> - if (mpol_check_policy(mode, nmask))
> + if (contextualize_policy(mode, nmask))
> return -EINVAL;
>
> new = mpol_new(mode, nmask);
> @@ -915,10 +917,6 @@ asmlinkage long sys_mbind(unsigned long
> err = get_nodes(&nodes, nmask, maxnode);
> if (err)
> return err;
> -#ifdef CONFIG_CPUSETS
> - /* Restrict the nodes to the allowed nodes in the cpuset */
> - nodes_and(nodes, nodes, current->mems_allowed);
> -#endif
> return do_mbind(start, len, mode, &nodes, flags);
> }
>
Looks good, thanks for doing this.
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-06 2:17 ` David Rientjes
@ 2008-02-06 16:11 ` Lee Schermerhorn
0 siblings, 0 replies; 43+ messages in thread
From: Lee Schermerhorn @ 2008-02-06 16:11 UTC (permalink / raw)
To: David Rientjes
Cc: KOSAKI Motohiro, linux-kernel, Andrew Morton, Christoph Lameter,
Paul Jackson, Mel Gorman, torvalds, Eric Whitney
On Tue, 2008-02-05 at 18:17 -0800, David Rientjes wrote:
> On Tue, 5 Feb 2008, Lee Schermerhorn wrote:
>
> > Index: Linux/mm/mempolicy.c
> > ===================================================================
> > --- Linux.orig/mm/mempolicy.c 2008-02-05 11:25:17.000000000 -0500
> > +++ Linux/mm/mempolicy.c 2008-02-05 16:03:11.000000000 -0500
> > @@ -131,7 +131,7 @@ static int mpol_check_policy(int mode, n
> > return -EINVAL;
> > break;
> > }
> > - return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
> > + return 0;
> > }
> >
> > /* Generate a custom zonelist for the BIND policy. */
>
> This change will be necessary when the nodemask passed from the syscall is
> saved in the struct mempolicy as the intent of the application as well.
>
> > @@ -188,8 +188,6 @@ static struct mempolicy *mpol_new(int mo
> > switch (mode) {
> > case MPOL_INTERLEAVE:
> > policy->v.nodes = *nodes;
> > - nodes_and(policy->v.nodes, policy->v.nodes,
> > - node_states[N_HIGH_MEMORY]);
> > if (nodes_weight(policy->v.nodes) == 0) {
> > kmem_cache_free(policy_cache, policy);
> > return ERR_PTR(-EINVAL);
> > @@ -426,9 +424,13 @@ static int contextualize_policy(int mode
> > if (!nodes)
> > return 0;
> >
> > + /*
> > + * Restrict the nodes to the allowed nodes in the cpuset.
> > + * This is guaranteed to be a subset of nodes with memory.
> > + */
> > cpuset_update_task_memory_state();
> > - if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
> > - return -EINVAL;
> > + nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
> > +
> > return mpol_check_policy(mode, nodes);
> > }
> >
>
> I would defer the intersection until later because contextualize_policy()
> is called before mpol_new() so we have no struct mempolicy to save the
> intent in. It doesn't matter for the sake of this change, I know, but you
> could move this intersection to mpol_new() and give us an opportunity to
> store the user's nodemask in the mempolicy with a one-line change and get
> the same desired result.
Hi, David:
I wanted to avoid a major restructuring of the code for this patch.
However, now that both do_mbind() and do_set_mempolicy() both call
contextualize_policy() [which calls mpol_check_policy()] immediately
before calling mpol_new(), I agree we can push this "contextualization"
down there. I would like to defer this to another patch--perhaps as
part of Paul's rework of mempolicy and cpusets.
Note that there is another caller of mpol_new() --
mpol_shared_policy_init(). We'll need to decide whether that call needs
to be contextualized, as it constructs a policy from the tmpfs or
hugetlbfs superblock, as specified on the mount command [or kernel
command line?]. As this is a privileged operation, one could argue that
it should be exempt from cpuset constraints.
>
> You can now remove cpuset_nodes_subset_current_mems_allowed() from
> linux/cpuset.h.
>
> > @@ -797,7 +799,7 @@ static long do_mbind(unsigned long start
> > if (end == start)
> > return 0;
> >
> > - if (mpol_check_policy(mode, nmask))
> > + if (contextualize_policy(mode, nmask))
> > return -EINVAL;
> >
> > new = mpol_new(mode, nmask);
> > @@ -915,10 +917,6 @@ asmlinkage long sys_mbind(unsigned long
> > err = get_nodes(&nodes, nmask, maxnode);
> > if (err)
> > return err;
> > -#ifdef CONFIG_CPUSETS
> > - /* Restrict the nodes to the allowed nodes in the cpuset */
> > - nodes_and(nodes, nodes, current->mems_allowed);
> > -#endif
> > return do_mbind(start, len, mode, &nodes, flags);
> > }
> >
>
> Looks good, thanks for doing this.
As I mentioned to Christoph, I'll post a new version that I think
handles the error conditions better.
Later,
Lee
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 21:57 ` Lee Schermerhorn
` (2 preceding siblings ...)
2008-02-06 2:17 ` David Rientjes
@ 2008-02-06 6:49 ` KOSAKI Motohiro
3 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-02-06 6:49 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: kosaki.motohiro, linux-kernel, Andrew Morton, Christoph Lameter,
Paul Jackson, David Rientjes, Mel Gorman, torvalds, Eric Whitney
Hi Lee-san
> Here's a patch that addresses the problem w/o requiring change to
> numactl or libnuma. It DOES have side affects, discussed in the
> description.
Thank you!
but unfortunately, My machine is broken phisically today ;-)
I will test it tommorow or later.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-05 9:26 ` [2.6.24 regression][BUGFIX] " KOSAKI Motohiro
2008-02-05 21:57 ` Lee Schermerhorn
@ 2008-02-06 17:38 ` Lee Schermerhorn
2008-02-07 8:31 ` KOSAKI Motohiro
2008-02-08 19:45 ` [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3 Lee Schermerhorn
2 siblings, 1 reply; 43+ messages in thread
From: Lee Schermerhorn @ 2008-02-06 17:38 UTC (permalink / raw)
To: KOSAKI Motohiro, linux-kernel, Andrew Morton
Cc: Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
torvalds, Eric Whitney
I've updated the patch to restore some error checking that my previous
version and the memoryless-nodes series lost.
Again, tested with "numactl --interleave=all" and memtoy on ia64 using
mem= command line argument to simulate memoryless node.
Lee
----------------------------------
[PATCH] 2.6.24-mm1 - mempolicy: silently restrict to allowed nodes
V1 -> V2:
+ Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
+ As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully
populated [out to MAXNUMNODES] nodemask. set_mempolicy()
[in do_set_mempolicy()] calls contextualize_policy() which
requires that the nodemask be a subset of the current task's
mems_allowed; else EINVAL will be returned. A task's
mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
i.e., nodes with memory. So, a fully populated nodemask will
be declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the
nodemask that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy()
will complain [again, resulting in EINVAL] if the nodemask contains
any memoryless nodes. This is somewhat redundant as mpol_new()
will remove memoryless nodes for interleave policy, as will
bind_zonelist()--called by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) pass the "was_empty" state of the incoming mask to
mpol_check_policy() for vetting the user specified
nodemask for MPOL_DEFAULT and MPOL_PREFERRED
2) In mpol_check_policy():
a) MPOL_DEFAULT: require that in coming mask "was_empty"
a) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
b) remove the now redundant check for memoryless nodes
3) modify mbind() to use contextualize_policy(), like set_mempolicy(),
instead of masking nodes in-line. This preserves the current
behavior of mbind() and restores some error checking that the
memoryless-nodes series lost when restricting node masks to
allowed_nodes [== subset of nodes with memory].
4) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/cpuset.h | 3 --
mm/mempolicy.c | 50 ++++++++++++++++++++++++++++++++-----------------
2 files changed, 33 insertions(+), 20 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2008-02-05 16:51:19.000000000 -0500
+++ Linux/mm/mempolicy.c 2008-02-06 10:58:57.000000000 -0500
@@ -114,24 +114,37 @@ static void mpol_rebind_policy(struct me
const nodemask_t *newmask);
/* Do sanity checking on a policy */
-static int mpol_check_policy(int mode, nodemask_t *nodes)
+static int mpol_check_policy(int mode, nodemask_t *nodes, int was_empty)
{
- int empty = nodes_empty(*nodes);
+ int is_empty = nodes_empty(*nodes);
switch (mode) {
case MPOL_DEFAULT:
- if (!empty)
+ /*
+ * require caller to specify an empty nodemask
+ * before "contextualization"
+ */
+ if (!was_empty)
return -EINVAL;
break;
case MPOL_BIND:
case MPOL_INTERLEAVE:
- /* Preferred will only use the first bit, but allow
- more for now. */
- if (empty)
+ /*
+ * require at least 1 valid node after "contextualization"
+ */
+ if (is_empty)
+ return -EINVAL;
+ break;
+ case MPOL_PREFERRED:
+ /*
+ * Did caller specify invalid nodes?
+ * Don't silently accept this as "local allocation".
+ */
+ if (!was_empty && is_empty)
return -EINVAL;
break;
}
- return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
+ return 0;
}
/* Generate a custom zonelist for the BIND policy. */
@@ -188,8 +201,6 @@ static struct mempolicy *mpol_new(int mo
switch (mode) {
case MPOL_INTERLEAVE:
policy->v.nodes = *nodes;
- nodes_and(policy->v.nodes, policy->v.nodes,
- node_states[N_HIGH_MEMORY]);
if (nodes_weight(policy->v.nodes) == 0) {
kmem_cache_free(policy_cache, policy);
return ERR_PTR(-EINVAL);
@@ -423,13 +434,22 @@ static int mbind_range(struct vm_area_st
static int contextualize_policy(int mode, nodemask_t *nodes)
{
+ int was_empty;
+
if (!nodes)
return 0;
+ /*
+ * Remember whether in coming nodemask was empty, If not,
+ * restrict the nodes to the allowed nodes in the cpuset.
+ * This is guaranteed to be a subset of nodes with memory.
+ */
cpuset_update_task_memory_state();
- if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
- return -EINVAL;
- return mpol_check_policy(mode, nodes);
+ was_empty = nodes_empty(*nodes);
+ if (!was_empty)
+ nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
+
+ return mpol_check_policy(mode, nodes, was_empty);
}
@@ -797,7 +817,7 @@ static long do_mbind(unsigned long start
if (end == start)
return 0;
- if (mpol_check_policy(mode, nmask))
+ if (contextualize_policy(mode, nmask))
return -EINVAL;
new = mpol_new(mode, nmask);
@@ -915,10 +935,6 @@ asmlinkage long sys_mbind(unsigned long
err = get_nodes(&nodes, nmask, maxnode);
if (err)
return err;
-#ifdef CONFIG_CPUSETS
- /* Restrict the nodes to the allowed nodes in the cpuset */
- nodes_and(nodes, nodes, current->mems_allowed);
-#endif
return do_mbind(start, len, mode, &nodes, flags);
}
Index: Linux/include/linux/cpuset.h
===================================================================
--- Linux.orig/include/linux/cpuset.h 2008-02-05 16:05:15.000000000 -0500
+++ Linux/include/linux/cpuset.h 2008-02-06 10:47:48.000000000 -0500
@@ -26,8 +26,6 @@ extern nodemask_t cpuset_mems_allowed(st
#define cpuset_current_mems_allowed (current->mems_allowed)
void cpuset_init_current_mems_allowed(void);
void cpuset_update_task_memory_state(void);
-#define cpuset_nodes_subset_current_mems_allowed(nodes) \
- nodes_subset((nodes), current->mems_allowed)
int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
extern int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask);
@@ -103,7 +101,6 @@ static inline nodemask_t cpuset_mems_all
#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
static inline void cpuset_init_current_mems_allowed(void) {}
static inline void cpuset_update_task_memory_state(void) {}
-#define cpuset_nodes_subset_current_mems_allowed(nodes) (1)
static inline int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
{
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't works on memoryless node.
2008-02-06 17:38 ` Lee Schermerhorn
@ 2008-02-07 8:31 ` KOSAKI Motohiro
0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-02-07 8:31 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: kosaki.motohiro, linux-kernel, Andrew Morton, Christoph Lameter,
Paul Jackson, David Rientjes, Mel Gorman, torvalds, Eric Whitney
Hi Lee-san
Unfortunately, 2.6.24-mm1 can't boot on fujitsu machine.
(hmm, origin.patch cause regression to pci initialization ;-)
instead, I tested 2.6.24 + your patch.
it seem work good :)
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
and, I have a bit comment.
> /* Do sanity checking on a policy */
> -static int mpol_check_policy(int mode, nodemask_t *nodes)
> +static int mpol_check_policy(int mode, nodemask_t *nodes, int was_empty)
was_empty argument is a bit ugly.
Could we unify mpol_check_policy and contextualize_policy?
mpol_check_policy only called from contextualize_policy.
> - return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
> + return 0;
Could we N_POSSIBLE check?
I attached the patch for my idea explain.
on my test environment, your patch and mine works good both.
- kosaki
---
mm/mempolicy.c | 47 +++++++++++++++++++++--------------------------
1 file changed, 21 insertions(+), 26 deletions(-)
Index: b/mm/mempolicy.c
===================================================================
--- a/mm/mempolicy.c 2008-02-07 17:19:09.000000000 +0900
+++ b/mm/mempolicy.c 2008-02-07 17:24:28.000000000 +0900
@@ -114,9 +114,25 @@ static void mpol_rebind_policy(struct me
const nodemask_t *newmask);
/* Do sanity checking on a policy */
-static int mpol_check_policy(int mode, nodemask_t *nodes, int was_empty)
+static int mpol_check_policy(int mode, nodemask_t *nodes)
{
- int is_empty = nodes_empty(*nodes);
+ int was_empty;
+ int is_empty;
+
+ if (!nodes)
+ return 0;
+
+ /*
+ * Remember whether in coming nodemask was empty, If not,
+ * restrict the nodes to the allowed nodes in the cpuset.
+ * This is guaranteed to be a subset of nodes with memory.
+ */
+ cpuset_update_task_memory_state();
+ was_empty = nodes_empty(*nodes);
+ if (!was_empty)
+ nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
+
+ is_empty = nodes_empty(*nodes);
switch (mode) {
case MPOL_DEFAULT:
@@ -144,7 +160,7 @@ static int mpol_check_policy(int mode, n
return -EINVAL;
break;
}
- return 0;
+ return nodes_subset(*nodes, node_states[N_POSSIBLE]) ? 0 : -EINVAL;
}
/* Generate a custom zonelist for the BIND policy. */
@@ -432,27 +448,6 @@ static int mbind_range(struct vm_area_st
return err;
}
-static int contextualize_policy(int mode, nodemask_t *nodes)
-{
- int was_empty;
-
- if (!nodes)
- return 0;
-
- /*
- * Remember whether in coming nodemask was empty, If not,
- * restrict the nodes to the allowed nodes in the cpuset.
- * This is guaranteed to be a subset of nodes with memory.
- */
- cpuset_update_task_memory_state();
- was_empty = nodes_empty(*nodes);
- if (!was_empty)
- nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
-
- return mpol_check_policy(mode, nodes, was_empty);
-}
-
-
/*
* Update task->flags PF_MEMPOLICY bit: set iff non-default
* mempolicy. Allows more rapid checking of this (combined perhaps
@@ -488,7 +483,7 @@ static long do_set_mempolicy(int mode, n
{
struct mempolicy *new;
- if (contextualize_policy(mode, nodes))
+ if (mpol_check_policy(mode, nodes))
return -EINVAL;
new = mpol_new(mode, nodes);
if (IS_ERR(new))
@@ -817,7 +812,7 @@ static long do_mbind(unsigned long start
if (end == start)
return 0;
- if (contextualize_policy(mode, nmask))
+ if (mpol_check_policy(mode, nmask))
return -EINVAL;
new = mpol_new(mode, nmask);
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-05 9:26 ` [2.6.24 regression][BUGFIX] " KOSAKI Motohiro
2008-02-05 21:57 ` Lee Schermerhorn
2008-02-06 17:38 ` Lee Schermerhorn
@ 2008-02-08 19:45 ` Lee Schermerhorn
2008-02-09 18:11 ` KOSAKI Motohiro
2008-02-10 5:29 ` KOSAKI Motohiro
2 siblings, 2 replies; 43+ messages in thread
From: Lee Schermerhorn @ 2008-02-08 19:45 UTC (permalink / raw)
To: KOSAKI Motohiro, linux-kernel, Andrew Morton
Cc: Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
linux-mm, torvalds, Eric Whitney
Was "Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't
works on memoryless node."
[Aside: I noticed there were two slightly different distributions for
this topic. I've unified the distribution lists w/o dropping anyone, I
think. Apologies if you'd rather have been dropped...]
Here's V3 of the patch, accomodating Kosaki Motohiro's suggestion for
folding contextualize_policy() into mpol_check_policy() [because my
"was_empty" argument "was ugly" ;-)]. It does seem to clean up the
code.
I'm still deferring David Rientjes' suggestion to fold
mpol_check_policy() into mpol_new(). We need to sort out whether
mempolicies specified for tmpfs and hugetlbfs mounts always need the
same "contextualization" as user/application installed policies. I
don't want to hold up this bug fix for that discussion. This is
something Paul J will need to address with his cpuset/mempolicy rework,
so we can sort it out in that context.
Again, tested with "numactl --interleave=all" and memtoy on ia64 using
mem= command line argument to simulate memoryless node.
Lee
============================
[PATCH] 2.6.24-mm1 - mempolicy: silently restrict nodemask to allowed nodes
V2 -> V3:
+ As suggested by Kosaki Motohito, fold the "contextualization"
of policy nodemask into mpol_check_policy(). Looks a little
cleaner.
V1 -> V2:
+ Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
+ As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully
populated [out to MAXNUMNODES] nodemask. set_mempolicy()
[in do_set_mempolicy()] calls contextualize_policy() which
requires that the nodemask be a subset of the current task's
mems_allowed; else EINVAL will be returned. A task's
mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
i.e., nodes with memory. So, a fully populated nodemask will
be declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the
nodemask that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy()
will complain [again, resulting in EINVAL] if the nodemask contains
any memoryless nodes. This is somewhat redundant as mpol_new()
will remove memoryless nodes for interleave policy, as will
bind_zonelist()--called by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy logic to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) fold this code into mpol_check_policy(), replace 2 calls to
contextualize_policy() to call mpol_check_policy() directly
and remove contextualize_policy().
2) In existing mpol_check_policy() logic, after "contextualization":
a) MPOL_DEFAULT: require that in coming mask "was_empty"
b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
contains at least one node.
c) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
c) remove the now redundant check for memoryless nodes
3) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
4) Now that mpol_check_policy() contextualizes the nodemask, remove
the in-line nodes_and() from sys_mbind(). I believe that this
restores mbind() to the behavior before the memoryless-nodes
patch series. E.g., we'll no longer treat an invalid nodemask
with MPOL_PREFERRED as local allocation.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/cpuset.h | 3 --
mm/mempolicy.c | 61 ++++++++++++++++++++++++++++---------------------
2 files changed, 36 insertions(+), 28 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2008-02-08 11:11:34.000000000 -0500
+++ Linux/mm/mempolicy.c 2008-02-08 13:40:40.000000000 -0500
@@ -116,22 +116,51 @@ static void mpol_rebind_policy(struct me
/* Do sanity checking on a policy */
static int mpol_check_policy(int mode, nodemask_t *nodes)
{
- int empty = nodes_empty(*nodes);
+ int was_empty, is_empty;
+
+ if (!nodes)
+ return 0;
+
+ /*
+ * "Contextualize" the in-coming nodemast for cpusets:
+ * Remember whether in-coming nodemask was empty, If not,
+ * restrict the nodes to the allowed nodes in the cpuset.
+ * This is guaranteed to be a subset of nodes with memory.
+ */
+ cpuset_update_task_memory_state();
+ is_empty = was_empty = nodes_empty(*nodes);
+ if (!was_empty) {
+ nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
+ is_empty = nodes_empty(*nodes); /* after "contextualization" */
+ }
switch (mode) {
case MPOL_DEFAULT:
- if (!empty)
+ /*
+ * require caller to specify an empty nodemask
+ * before "contextualization"
+ */
+ if (!was_empty)
return -EINVAL;
break;
case MPOL_BIND:
case MPOL_INTERLEAVE:
- /* Preferred will only use the first bit, but allow
- more for now. */
- if (empty)
+ /*
+ * require at least 1 valid node after "contextualization"
+ */
+ if (is_empty)
+ return -EINVAL;
+ break;
+ case MPOL_PREFERRED:
+ /*
+ * Did caller specify invalid nodes?
+ * Don't silently accept this as "local allocation".
+ */
+ if (!was_empty && is_empty)
return -EINVAL;
break;
}
- return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
+ return 0;
}
/* Generate a custom zonelist for the BIND policy. */
@@ -188,8 +217,6 @@ static struct mempolicy *mpol_new(int mo
switch (mode) {
case MPOL_INTERLEAVE:
policy->v.nodes = *nodes;
- nodes_and(policy->v.nodes, policy->v.nodes,
- node_states[N_HIGH_MEMORY]);
if (nodes_weight(policy->v.nodes) == 0) {
kmem_cache_free(policy_cache, policy);
return ERR_PTR(-EINVAL);
@@ -421,18 +448,6 @@ static int mbind_range(struct vm_area_st
return err;
}
-static int contextualize_policy(int mode, nodemask_t *nodes)
-{
- if (!nodes)
- return 0;
-
- cpuset_update_task_memory_state();
- if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
- return -EINVAL;
- return mpol_check_policy(mode, nodes);
-}
-
-
/*
* Update task->flags PF_MEMPOLICY bit: set iff non-default
* mempolicy. Allows more rapid checking of this (combined perhaps
@@ -468,7 +483,7 @@ static long do_set_mempolicy(int mode, n
{
struct mempolicy *new;
- if (contextualize_policy(mode, nodes))
+ if (mpol_check_policy(mode, nodes))
return -EINVAL;
new = mpol_new(mode, nodes);
if (IS_ERR(new))
@@ -915,10 +930,6 @@ asmlinkage long sys_mbind(unsigned long
err = get_nodes(&nodes, nmask, maxnode);
if (err)
return err;
-#ifdef CONFIG_CPUSETS
- /* Restrict the nodes to the allowed nodes in the cpuset */
- nodes_and(nodes, nodes, current->mems_allowed);
-#endif
return do_mbind(start, len, mode, &nodes, flags);
}
Index: Linux/include/linux/cpuset.h
===================================================================
--- Linux.orig/include/linux/cpuset.h 2008-02-08 11:11:34.000000000 -0500
+++ Linux/include/linux/cpuset.h 2008-02-08 11:12:43.000000000 -0500
@@ -26,8 +26,6 @@ extern nodemask_t cpuset_mems_allowed(st
#define cpuset_current_mems_allowed (current->mems_allowed)
void cpuset_init_current_mems_allowed(void);
void cpuset_update_task_memory_state(void);
-#define cpuset_nodes_subset_current_mems_allowed(nodes) \
- nodes_subset((nodes), current->mems_allowed)
int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
extern int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask);
@@ -103,7 +101,6 @@ static inline nodemask_t cpuset_mems_all
#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
static inline void cpuset_init_current_mems_allowed(void) {}
static inline void cpuset_update_task_memory_state(void) {}
-#define cpuset_nodes_subset_current_mems_allowed(nodes) (1)
static inline int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
{
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-08 19:45 ` [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3 Lee Schermerhorn
@ 2008-02-09 18:11 ` KOSAKI Motohiro
2008-02-10 5:29 ` KOSAKI Motohiro
1 sibling, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-02-09 18:11 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: linux-kernel, Andrew Morton, Christoph Lameter, Paul Jackson,
David Rientjes, Mel Gorman, linux-mm, torvalds, Eric Whitney
Hi Lee-san
looks good for me.
I'll test about the head of week and report it by another mail.
Thanks!
> Was "Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't
> works on memoryless node."
>
> [Aside: I noticed there were two slightly different distributions for
> this topic. I've unified the distribution lists w/o dropping anyone, I
> think. Apologies if you'd rather have been dropped...]
>
> Here's V3 of the patch, accomodating Kosaki Motohiro's suggestion for
> folding contextualize_policy() into mpol_check_policy() [because my
> "was_empty" argument "was ugly" ;-)]. It does seem to clean up the
> code.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-08 19:45 ` [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3 Lee Schermerhorn
2008-02-09 18:11 ` KOSAKI Motohiro
@ 2008-02-10 5:29 ` KOSAKI Motohiro
2008-02-10 5:49 ` Greg KH
1 sibling, 1 reply; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-02-10 5:29 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: kosaki.motohiro, linux-kernel, Andrew Morton, Christoph Lameter,
Paul Jackson, David Rientjes, Mel Gorman, linux-mm, torvalds,
Eric Whitney, Greg KH
CC'd Greg KH <greg@kroah.com>
I tested this patch on fujitsu memoryless node.
(2.6.24 + silently-restrict-nodemask-to-allowed-nodes-V3 insted 2.6.24-mm1)
it seems works good.
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Greg, I hope this patch merge to 2.6.24.x stable tree because
this patch is regression fixed patch.
Please tell me what do i doing for it.
[intentional full quote]
> Was "Re: [2.6.24 regression][BUGFIX] numactl --interleave=all doesn't
> works on memoryless node."
>
> [Aside: I noticed there were two slightly different distributions for
> this topic. I've unified the distribution lists w/o dropping anyone, I
> think. Apologies if you'd rather have been dropped...]
>
> Here's V3 of the patch, accomodating Kosaki Motohiro's suggestion for
> folding contextualize_policy() into mpol_check_policy() [because my
> "was_empty" argument "was ugly" ;-)]. It does seem to clean up the
> code.
>
> I'm still deferring David Rientjes' suggestion to fold
> mpol_check_policy() into mpol_new(). We need to sort out whether
> mempolicies specified for tmpfs and hugetlbfs mounts always need the
> same "contextualization" as user/application installed policies. I
> don't want to hold up this bug fix for that discussion. This is
> something Paul J will need to address with his cpuset/mempolicy rework,
> so we can sort it out in that context.
>
> Again, tested with "numactl --interleave=all" and memtoy on ia64 using
> mem= command line argument to simulate memoryless node.
>
>
> Lee
>
> ============================
> [PATCH] 2.6.24-mm1 - mempolicy: silently restrict nodemask to allowed nodes
>
> V2 -> V3:
> + As suggested by Kosaki Motohito, fold the "contextualization"
> of policy nodemask into mpol_check_policy(). Looks a little
> cleaner.
>
> V1 -> V2:
> + Communicate whether or not incoming node mask was empty to
> mpol_check_policy() for better error checking.
> + As suggested by David Rientjes, remove the now unused
> cpuset_nodes_subset_current_mems_allowed() from cpuset.h
>
> Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
> presence of memoryless nodes. This patch attempts to fix that problem.
>
> Some background:
>
> numactl --interleave=all calls set_mempolicy(2) with a fully
> populated [out to MAXNUMNODES] nodemask. set_mempolicy()
> [in do_set_mempolicy()] calls contextualize_policy() which
> requires that the nodemask be a subset of the current task's
> mems_allowed; else EINVAL will be returned. A task's
> mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
> i.e., nodes with memory. So, a fully populated nodemask will
> be declared invalid if it includes memoryless nodes.
>
> NOTE: the same thing will occur when running in a cpuset
> with restricted mem_allowed--for the same reason:
> node mask contains dis-allowed nodes.
>
> mbind(2), on the other hand, just masks off any nodes in the
> nodemask that are not included in the caller's mems_allowed.
>
> In each case [mbind() and set_mempolicy()], mpol_check_policy()
> will complain [again, resulting in EINVAL] if the nodemask contains
> any memoryless nodes. This is somewhat redundant as mpol_new()
> will remove memoryless nodes for interleave policy, as will
> bind_zonelist()--called by mpol_new() for BIND policy.
>
> Proposed fix:
>
> 1) modify contextualize_policy logic to:
> a) remember whether the incoming node mask is empty.
> b) if not, restrict the nodemask to allowed nodes, as is
> currently done in-line for mbind(). This guarantees
> that the resulting mask includes only nodes with memory.
>
> NOTE: this is a [benign, IMO] change in behavior for
> set_mempolicy(). Dis-allowed nodes will be
> silently ignored, rather than returning an error.
>
> c) fold this code into mpol_check_policy(), replace 2 calls to
> contextualize_policy() to call mpol_check_policy() directly
> and remove contextualize_policy().
>
> 2) In existing mpol_check_policy() logic, after "contextualization":
> a) MPOL_DEFAULT: require that in coming mask "was_empty"
> b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
> contains at least one node.
> c) add a case for MPOL_PREFERRED: if in coming was not empty
> and resulting mask IS empty, user specified invalid nodes.
> Return EINVAL.
> c) remove the now redundant check for memoryless nodes
>
> 3) remove the now redundant masking of policy nodes for interleave
> policy from mpol_new().
>
> 4) Now that mpol_check_policy() contextualizes the nodemask, remove
> the in-line nodes_and() from sys_mbind(). I believe that this
> restores mbind() to the behavior before the memoryless-nodes
> patch series. E.g., we'll no longer treat an invalid nodemask
> with MPOL_PREFERRED as local allocation.
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
>
> include/linux/cpuset.h | 3 --
> mm/mempolicy.c | 61 ++++++++++++++++++++++++++++---------------------
> 2 files changed, 36 insertions(+), 28 deletions(-)
>
> Index: Linux/mm/mempolicy.c
> ===================================================================
> --- Linux.orig/mm/mempolicy.c 2008-02-08 11:11:34.000000000 -0500
> +++ Linux/mm/mempolicy.c 2008-02-08 13:40:40.000000000 -0500
> @@ -116,22 +116,51 @@ static void mpol_rebind_policy(struct me
> /* Do sanity checking on a policy */
> static int mpol_check_policy(int mode, nodemask_t *nodes)
> {
> - int empty = nodes_empty(*nodes);
> + int was_empty, is_empty;
> +
> + if (!nodes)
> + return 0;
> +
> + /*
> + * "Contextualize" the in-coming nodemast for cpusets:
> + * Remember whether in-coming nodemask was empty, If not,
> + * restrict the nodes to the allowed nodes in the cpuset.
> + * This is guaranteed to be a subset of nodes with memory.
> + */
> + cpuset_update_task_memory_state();
> + is_empty = was_empty = nodes_empty(*nodes);
> + if (!was_empty) {
> + nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
> + is_empty = nodes_empty(*nodes); /* after "contextualization" */
> + }
>
> switch (mode) {
> case MPOL_DEFAULT:
> - if (!empty)
> + /*
> + * require caller to specify an empty nodemask
> + * before "contextualization"
> + */
> + if (!was_empty)
> return -EINVAL;
> break;
> case MPOL_BIND:
> case MPOL_INTERLEAVE:
> - /* Preferred will only use the first bit, but allow
> - more for now. */
> - if (empty)
> + /*
> + * require at least 1 valid node after "contextualization"
> + */
> + if (is_empty)
> + return -EINVAL;
> + break;
> + case MPOL_PREFERRED:
> + /*
> + * Did caller specify invalid nodes?
> + * Don't silently accept this as "local allocation".
> + */
> + if (!was_empty && is_empty)
> return -EINVAL;
> break;
> }
> - return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
> + return 0;
> }
>
> /* Generate a custom zonelist for the BIND policy. */
> @@ -188,8 +217,6 @@ static struct mempolicy *mpol_new(int mo
> switch (mode) {
> case MPOL_INTERLEAVE:
> policy->v.nodes = *nodes;
> - nodes_and(policy->v.nodes, policy->v.nodes,
> - node_states[N_HIGH_MEMORY]);
> if (nodes_weight(policy->v.nodes) == 0) {
> kmem_cache_free(policy_cache, policy);
> return ERR_PTR(-EINVAL);
> @@ -421,18 +448,6 @@ static int mbind_range(struct vm_area_st
> return err;
> }
>
> -static int contextualize_policy(int mode, nodemask_t *nodes)
> -{
> - if (!nodes)
> - return 0;
> -
> - cpuset_update_task_memory_state();
> - if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
> - return -EINVAL;
> - return mpol_check_policy(mode, nodes);
> -}
> -
> -
> /*
> * Update task->flags PF_MEMPOLICY bit: set iff non-default
> * mempolicy. Allows more rapid checking of this (combined perhaps
> @@ -468,7 +483,7 @@ static long do_set_mempolicy(int mode, n
> {
> struct mempolicy *new;
>
> - if (contextualize_policy(mode, nodes))
> + if (mpol_check_policy(mode, nodes))
> return -EINVAL;
> new = mpol_new(mode, nodes);
> if (IS_ERR(new))
> @@ -915,10 +930,6 @@ asmlinkage long sys_mbind(unsigned long
> err = get_nodes(&nodes, nmask, maxnode);
> if (err)
> return err;
> -#ifdef CONFIG_CPUSETS
> - /* Restrict the nodes to the allowed nodes in the cpuset */
> - nodes_and(nodes, nodes, current->mems_allowed);
> -#endif
> return do_mbind(start, len, mode, &nodes, flags);
> }
>
> Index: Linux/include/linux/cpuset.h
> ===================================================================
> --- Linux.orig/include/linux/cpuset.h 2008-02-08 11:11:34.000000000 -0500
> +++ Linux/include/linux/cpuset.h 2008-02-08 11:12:43.000000000 -0500
> @@ -26,8 +26,6 @@ extern nodemask_t cpuset_mems_allowed(st
> #define cpuset_current_mems_allowed (current->mems_allowed)
> void cpuset_init_current_mems_allowed(void);
> void cpuset_update_task_memory_state(void);
> -#define cpuset_nodes_subset_current_mems_allowed(nodes) \
> - nodes_subset((nodes), current->mems_allowed)
> int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
>
> extern int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask);
> @@ -103,7 +101,6 @@ static inline nodemask_t cpuset_mems_all
> #define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
> static inline void cpuset_init_current_mems_allowed(void) {}
> static inline void cpuset_update_task_memory_state(void) {}
> -#define cpuset_nodes_subset_current_mems_allowed(nodes) (1)
>
> static inline int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
> {
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-10 5:29 ` KOSAKI Motohiro
@ 2008-02-10 5:49 ` Greg KH
2008-02-10 7:42 ` Linus Torvalds
0 siblings, 1 reply; 43+ messages in thread
From: Greg KH @ 2008-02-10 5:49 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Lee Schermerhorn, linux-kernel, Andrew Morton, Christoph Lameter,
Paul Jackson, David Rientjes, Mel Gorman, linux-mm, torvalds,
Eric Whitney
On Sun, Feb 10, 2008 at 02:29:24PM +0900, KOSAKI Motohiro wrote:
> CC'd Greg KH <greg@kroah.com>
>
> I tested this patch on fujitsu memoryless node.
> (2.6.24 + silently-restrict-nodemask-to-allowed-nodes-V3 insted 2.6.24-mm1)
> it seems works good.
>
> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
>
> Greg, I hope this patch merge to 2.6.24.x stable tree because
> this patch is regression fixed patch.
> Please tell me what do i doing for it.
Once the patch goes into Linus's tree, feel free to send it to the
stable@kernel.org address so that we can include it in the 2.6.24.x
tree.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-10 5:49 ` Greg KH
@ 2008-02-10 7:42 ` Linus Torvalds
2008-02-10 10:31 ` Andrew Morton
2008-02-11 16:47 ` Lee Schermerhorn
0 siblings, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2008-02-10 7:42 UTC (permalink / raw)
To: Greg KH
Cc: KOSAKI Motohiro, Lee Schermerhorn, linux-kernel, Andrew Morton,
Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
linux-mm, Eric Whitney
On Sat, 9 Feb 2008, Greg KH wrote:
>
> Once the patch goes into Linus's tree, feel free to send it to the
> stable@kernel.org address so that we can include it in the 2.6.24.x
> tree.
I've been ignoring the patches because they say "PATCH 2.6.24-mm1", and so
I simply don't know whether it's supposed to go into *my* kernel or just
-mm.
There's also been several versions and discussions, so I'd really like to
have somebody send me a final patch with all the acks etc.. One that is
clearly for me, not for -mm.
Linus
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-10 7:42 ` Linus Torvalds
@ 2008-02-10 10:31 ` Andrew Morton
2008-02-11 16:47 ` Lee Schermerhorn
1 sibling, 0 replies; 43+ messages in thread
From: Andrew Morton @ 2008-02-10 10:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Greg KH, KOSAKI Motohiro, Lee Schermerhorn, linux-kernel,
Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
linux-mm, Eric Whitney
On Sat, 9 Feb 2008 23:42:21 -0800 (PST) Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Sat, 9 Feb 2008, Greg KH wrote:
> >
> > Once the patch goes into Linus's tree, feel free to send it to the
> > stable@kernel.org address so that we can include it in the 2.6.24.x
> > tree.
>
> I've been ignoring the patches because they say "PATCH 2.6.24-mm1", and so
> I simply don't know whether it's supposed to go into *my* kernel or just
> -mm.
>
> There's also been several versions and discussions, so I'd really like to
> have somebody send me a final patch with all the acks etc.. One that is
> clearly for me, not for -mm.
>
fyi, I won't be able to do much patch-wrangling until Tuesday or Wednesday.
All the big machines are disconnected and mothballed due to domestic
s/carpet/hardwood/g.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 2.6.24-mm1] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-10 7:42 ` Linus Torvalds
2008-02-10 10:31 ` Andrew Morton
@ 2008-02-11 16:47 ` Lee Schermerhorn
2008-02-12 4:30 ` [PATCH for 2.6.24][regression fix] " KOSAKI Motohiro
1 sibling, 1 reply; 43+ messages in thread
From: Lee Schermerhorn @ 2008-02-11 16:47 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Linus Torvalds, Greg KH, linux-kernel, Andrew Morton,
Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
linux-mm, Eric Whitney
On Sat, 2008-02-09 at 23:42 -0800, Linus Torvalds wrote:
>
> On Sat, 9 Feb 2008, Greg KH wrote:
> >
> > Once the patch goes into Linus's tree, feel free to send it to the
> > stable@kernel.org address so that we can include it in the 2.6.24.x
> > tree.
>
> I've been ignoring the patches because they say "PATCH 2.6.24-mm1", and so
> I simply don't know whether it's supposed to go into *my* kernel or just
> -mm.
>
> There's also been several versions and discussions, so I'd really like to
> have somebody send me a final patch with all the acks etc.. One that is
> clearly for me, not for -mm.
>
Kosaki-san: You've tested V3 on '.24. Do you want to repost the patch
refreshed against .24, adding your "Tested-by:" [and "Signed-off-by:",
as the folding of the contextualization into mpol_check_policy() is
based on your code--apologies for not adding it myself]? I'm tied up
with something else for most of this week and won't get to it until
Friday, earliest.
Regards,
Lee
P.S., As Andrew pointed out, I forgot to run checkpatch and the patch
does include a violation thereof.
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH for 2.6.24][regression fix] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-11 16:47 ` Lee Schermerhorn
@ 2008-02-12 4:30 ` KOSAKI Motohiro
2008-02-12 5:06 ` David Rientjes
2008-02-12 5:07 ` Andrew Morton
0 siblings, 2 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-02-12 4:30 UTC (permalink / raw)
To: Lee Schermerhorn, Andrew Morton
Cc: kosaki.motohiro, Linus Torvalds, Greg KH, linux-kernel,
Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
linux-mm, Eric Whitney
Hi Andrew
# this is second post of the same patch.
this is backport from -mm to mainline.
original patch is http://marc.info/?l=linux-kernel&m=120250000001182&w=2
my change is only line number change and remove extra space.
please ack.
============================
[PATCH] 2.6.24 - mempolicy: silently restrict nodemask to allowed nodes
V2 -> V3:
+ As suggested by Kosaki Motohito, fold the "contextualization"
of policy nodemask into mpol_check_policy(). Looks a little
cleaner.
V1 -> V2:
+ Communicate whether or not incoming node mask was empty to
mpol_check_policy() for better error checking.
+ As suggested by David Rientjes, remove the now unused
cpuset_nodes_subset_current_mems_allowed() from cpuset.h
Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
presence of memoryless nodes. This patch attempts to fix that problem.
Some background:
numactl --interleave=all calls set_mempolicy(2) with a fully
populated [out to MAXNUMNODES] nodemask. set_mempolicy()
[in do_set_mempolicy()] calls contextualize_policy() which
requires that the nodemask be a subset of the current task's
mems_allowed; else EINVAL will be returned. A task's
mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]--
i.e., nodes with memory. So, a fully populated nodemask will
be declared invalid if it includes memoryless nodes.
NOTE: the same thing will occur when running in a cpuset
with restricted mem_allowed--for the same reason:
node mask contains dis-allowed nodes.
mbind(2), on the other hand, just masks off any nodes in the
nodemask that are not included in the caller's mems_allowed.
In each case [mbind() and set_mempolicy()], mpol_check_policy()
will complain [again, resulting in EINVAL] if the nodemask contains
any memoryless nodes. This is somewhat redundant as mpol_new()
will remove memoryless nodes for interleave policy, as will
bind_zonelist()--called by mpol_new() for BIND policy.
Proposed fix:
1) modify contextualize_policy logic to:
a) remember whether the incoming node mask is empty.
b) if not, restrict the nodemask to allowed nodes, as is
currently done in-line for mbind(). This guarantees
that the resulting mask includes only nodes with memory.
NOTE: this is a [benign, IMO] change in behavior for
set_mempolicy(). Dis-allowed nodes will be
silently ignored, rather than returning an error.
c) fold this code into mpol_check_policy(), replace 2 calls to
contextualize_policy() to call mpol_check_policy() directly
and remove contextualize_policy().
2) In existing mpol_check_policy() logic, after "contextualization":
a) MPOL_DEFAULT: require that in coming mask "was_empty"
b) MPOL_{BIND|INTERLEAVE}: require that contextualized nodemask
contains at least one node.
c) add a case for MPOL_PREFERRED: if in coming was not empty
and resulting mask IS empty, user specified invalid nodes.
Return EINVAL.
c) remove the now redundant check for memoryless nodes
3) remove the now redundant masking of policy nodes for interleave
policy from mpol_new().
4) Now that mpol_check_policy() contextualizes the nodemask, remove
the in-line nodes_and() from sys_mbind(). I believe that this
restores mbind() to the behavior before the memoryless-nodes
patch series. E.g., we'll no longer treat an invalid nodemask
with MPOL_PREFERRED as local allocation.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
include/linux/cpuset.h | 3 --
mm/mempolicy.c | 61 ++++++++++++++++++++++++++++---------------------
2 files changed, 36 insertions(+), 28 deletions(-)
Index: b/mm/mempolicy.c
===================================================================
--- a/mm/mempolicy.c 2008-02-10 14:27:58.000000000 +0900
+++ b/mm/mempolicy.c 2008-02-12 09:37:35.000000000 +0900
@@ -116,22 +116,51 @@ static void mpol_rebind_policy(struct me
/* Do sanity checking on a policy */
static int mpol_check_policy(int mode, nodemask_t *nodes)
{
- int empty = nodes_empty(*nodes);
+ int was_empty, is_empty;
+
+ if (!nodes)
+ return 0;
+
+ /*
+ * "Contextualize" the in-coming nodemast for cpusets:
+ * Remember whether in-coming nodemask was empty, If not,
+ * restrict the nodes to the allowed nodes in the cpuset.
+ * This is guaranteed to be a subset of nodes with memory.
+ */
+ cpuset_update_task_memory_state();
+ is_empty = was_empty = nodes_empty(*nodes);
+ if (!was_empty) {
+ nodes_and(*nodes, *nodes, cpuset_current_mems_allowed);
+ is_empty = nodes_empty(*nodes); /* after "contextualization" */
+ }
switch (mode) {
case MPOL_DEFAULT:
- if (!empty)
+ /*
+ * require caller to specify an empty nodemask
+ * before "contextualization"
+ */
+ if (!was_empty)
return -EINVAL;
break;
case MPOL_BIND:
case MPOL_INTERLEAVE:
- /* Preferred will only use the first bit, but allow
- more for now. */
- if (empty)
+ /*
+ * require at least 1 valid node after "contextualization"
+ */
+ if (is_empty)
+ return -EINVAL;
+ break;
+ case MPOL_PREFERRED:
+ /*
+ * Did caller specify invalid nodes?
+ * Don't silently accept this as "local allocation".
+ */
+ if (!was_empty && is_empty)
return -EINVAL;
break;
}
- return nodes_subset(*nodes, node_states[N_HIGH_MEMORY]) ? 0 : -EINVAL;
+ return 0;
}
/* Generate a custom zonelist for the BIND policy. */
@@ -188,8 +217,6 @@ static struct mempolicy *mpol_new(int mo
switch (mode) {
case MPOL_INTERLEAVE:
policy->v.nodes = *nodes;
- nodes_and(policy->v.nodes, policy->v.nodes,
- node_states[N_HIGH_MEMORY]);
if (nodes_weight(policy->v.nodes) == 0) {
kmem_cache_free(policy_cache, policy);
return ERR_PTR(-EINVAL);
@@ -421,18 +448,6 @@ static int mbind_range(struct vm_area_st
return err;
}
-static int contextualize_policy(int mode, nodemask_t *nodes)
-{
- if (!nodes)
- return 0;
-
- cpuset_update_task_memory_state();
- if (!cpuset_nodes_subset_current_mems_allowed(*nodes))
- return -EINVAL;
- return mpol_check_policy(mode, nodes);
-}
-
-
/*
* Update task->flags PF_MEMPOLICY bit: set iff non-default
* mempolicy. Allows more rapid checking of this (combined perhaps
@@ -468,7 +483,7 @@ static long do_set_mempolicy(int mode, n
{
struct mempolicy *new;
- if (contextualize_policy(mode, nodes))
+ if (mpol_check_policy(mode, nodes))
return -EINVAL;
new = mpol_new(mode, nodes);
if (IS_ERR(new))
@@ -915,10 +930,6 @@ asmlinkage long sys_mbind(unsigned long
err = get_nodes(&nodes, nmask, maxnode);
if (err)
return err;
-#ifdef CONFIG_CPUSETS
- /* Restrict the nodes to the allowed nodes in the cpuset */
- nodes_and(nodes, nodes, current->mems_allowed);
-#endif
return do_mbind(start, len, mode, &nodes, flags);
}
Index: b/include/linux/cpuset.h
===================================================================
--- a/include/linux/cpuset.h 2008-02-10 14:27:58.000000000 +0900
+++ b/include/linux/cpuset.h 2008-02-10 14:33:40.000000000 +0900
@@ -26,8 +26,6 @@ extern nodemask_t cpuset_mems_allowed(st
#define cpuset_current_mems_allowed (current->mems_allowed)
void cpuset_init_current_mems_allowed(void);
void cpuset_update_task_memory_state(void);
-#define cpuset_nodes_subset_current_mems_allowed(nodes) \
- nodes_subset((nodes), current->mems_allowed)
int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
extern int __cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask);
@@ -101,7 +99,6 @@ static inline nodemask_t cpuset_mems_all
#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
static inline void cpuset_init_current_mems_allowed(void) {}
static inline void cpuset_update_task_memory_state(void) {}
-#define cpuset_nodes_subset_current_mems_allowed(nodes) (1)
static inline int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl)
{
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH for 2.6.24][regression fix] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-12 4:30 ` [PATCH for 2.6.24][regression fix] " KOSAKI Motohiro
@ 2008-02-12 5:06 ` David Rientjes
2008-02-12 5:07 ` Andrew Morton
1 sibling, 0 replies; 43+ messages in thread
From: David Rientjes @ 2008-02-12 5:06 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Lee Schermerhorn, Andrew Morton, Linus Torvalds, Greg KH,
linux-kernel, Christoph Lameter, Paul Jackson, Mel Gorman,
linux-mm, Eric Whitney
On Tue, 12 Feb 2008, KOSAKI Motohiro wrote:
> [PATCH] 2.6.24 - mempolicy: silently restrict nodemask to allowed nodes
>
Linus has already merged this patch into his tree, but next time you
pass along a contribution to a maintainer the first line should read:
From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
so the person who actually wrote the patch is listed as the author in the
git commit.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for 2.6.24][regression fix] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-12 4:30 ` [PATCH for 2.6.24][regression fix] " KOSAKI Motohiro
2008-02-12 5:06 ` David Rientjes
@ 2008-02-12 5:07 ` Andrew Morton
2008-02-12 13:18 ` KOSAKI Motohiro
1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2008-02-12 5:07 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Lee Schermerhorn, Linus Torvalds, Greg KH, linux-kernel,
Christoph Lameter, Paul Jackson, David Rientjes, Mel Gorman,
linux-mm, Eric Whitney
On Tue, 12 Feb 2008 13:30:22 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> Hi Andrew
>
> # this is second post of the same patch.
>
> this is backport from -mm to mainline.
> original patch is http://marc.info/?l=linux-kernel&m=120250000001182&w=2
>
> my change is only line number change and remove extra space.
This is identical to what I have now.
> please ack.
As it's now post -rc1 and not a 100% obvious thing, I tend to hang onto
such patches for a week or so before sending up to Linus
Should this be backported to 2.6.24.x? If so, the reasons for such a
relatively stern step should be spelled out in the changelog for the
-stable maintiners to evaluate.
Thanks.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH for 2.6.24][regression fix] Mempolicy: silently restrict nodemask to allowed nodes V3
2008-02-12 5:07 ` Andrew Morton
@ 2008-02-12 13:18 ` KOSAKI Motohiro
0 siblings, 0 replies; 43+ messages in thread
From: KOSAKI Motohiro @ 2008-02-12 13:18 UTC (permalink / raw)
To: Andrew Morton
Cc: kosaki.motohiro, Lee Schermerhorn, Linus Torvalds, Greg KH,
linux-kernel, Christoph Lameter, Paul Jackson, David Rientjes,
Mel Gorman, linux-mm, Eric Whitney
Hi
> > please ack.
>
> As it's now post -rc1 and not a 100% obvious thing, I tend to hang onto
> such patches for a week or so before sending up to Linus
Thanks, really thanks.
> Should this be backported to 2.6.24.x? If so, the reasons for such a
> relatively stern step should be spelled out in the changelog for the
> -stable maintiners to evaluate.
Oh,
you think below reason is not enough, really?
1. it is regression.
2. it is very easy reprodusable on memoryless node machine.
if so, i back down on my backport reclaim.
I don't hope increase your headache ;-)
thanks.
-kosaki
^ permalink raw reply [flat|nested] 43+ messages in thread