linux-mm.kvack.org archive mirror
* cpusets vs. mempolicy and how to get interleaving
@ 2007-08-18  0:22 Ethan Solomita
  2007-08-18  0:29 ` Ethan Solomita
  2007-08-18  1:07 ` Christoph Lameter
  0 siblings, 2 replies; 16+ messages in thread
From: Ethan Solomita @ 2007-08-18  0:22 UTC (permalink / raw)
  To: Paul Jackson, Christoph Lameter, linux-mm

	I'm concerned that there isn't an adequate mechanism available for 
an application to request NUMA interleaving in the face of cpusets and 
modifications to mems_allowed. I'm hoping for some advice.

What we want:

	Ideally, we want a task to express its preference for interleaved 
memory allocations without having to provide a list of nodes. The kernel 
will automatically round-robin amongst the task's mems_allowed.

The problem:

	At least in our environment, an independent "cpuset manager" process 
may choose to rewrite a cpuset's mems file at any time, possibly 
increasing or decreasing the number of available nodes. If 
weight(mems_allowed) is decreased, the task's MPOL_INTERLEAVE policy's 
nodemask will be shrunk to fit the new mems_allowed. If 
weight(mems_allowed) is grown, the policy's nodemask will not gain new 
nodes.

	What we want is for the task to "set it and forget it," i.e. to express 
a preference for interleaving and then never worry about NUMA again. If 
the nodemask sent via set_mempolicy(MPOL_INTERLEAVE, ...) served as a mask 
against mems_allowed, then we would specify an all-1s nodemask.

	I realize that this can't be done while preserving backwards 
compatibility, so I'm looking for advice. A new policy 
MPOL_INTERLEAVE_ALL that doesn't take a 
nodemask argument and interleaves within mems_allowed? Any better 
suggestions?

	Thanks!
	-- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-18  0:22 cpusets vs. mempolicy and how to get interleaving Ethan Solomita
@ 2007-08-18  0:29 ` Ethan Solomita
  2007-08-19 10:18   ` David Rientjes
  2007-08-18  1:07 ` Christoph Lameter
  1 sibling, 1 reply; 16+ messages in thread
From: Ethan Solomita @ 2007-08-18  0:29 UTC (permalink / raw)
  To: Paul Jackson, Christoph Lameter, linux-mm

[Resend -- the original email seems to have lost its first line]

     I'm concerned that there isn't an adequate mechanism available for 
an application to request NUMA interleaving in the face of cpusets and 
modifications to mems_allowed. I'm hoping for some advice.

What we want:

     Ideally, we want a task to express its preference for interleaved 
memory allocations without having to provide a list of nodes. The kernel 
will automatically round-robin amongst the task's mems_allowed.

The problem:

     At least in our environment, an independent "cpuset manager" 
process may choose to rewrite a cpuset's mems file at any time, possibly 
increasing or decreasing the number of available nodes. If 
weight(mems_allowed) is decreased, the task's MPOL_INTERLEAVE policy's 
nodemask will be shrunk to fit the new mems_allowed. If 
weight(mems_allowed) is grown, the policy's nodemask will not gain new 
nodes.

     What we want is for the task to "set it and forget it," i.e. to 
express a preference for interleaving and then never worry about NUMA 
again. If the nodemask sent via set_mempolicy(MPOL_INTERLEAVE, ...) served as 
a mask against mems_allowed, then we would specify an all-1s nodemask.

     I realize that this can't be done while preserving backwards 
compatibility, so I'm looking for advice. A new policy 
MPOL_INTERLEAVE_ALL that doesn't 
take a nodemask argument and interleaves within mems_allowed? Any better 
suggestions?

     Thanks!
     -- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-18  0:22 cpusets vs. mempolicy and how to get interleaving Ethan Solomita
  2007-08-18  0:29 ` Ethan Solomita
@ 2007-08-18  1:07 ` Christoph Lameter
  2007-08-18  1:51   ` Ethan Solomita
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2007-08-18  1:07 UTC (permalink / raw)
  To: Ethan Solomita; +Cc: Paul Jackson, linux-mm

On Fri, 17 Aug 2007, Ethan Solomita wrote:

> 	Ideally, we want a task to express its preference for interleaved
> memory allocations without having to provide a list of nodes. The kernel will
> automatically round-robin amongst the task's mems_allowed.

You can do that by writing 1 to /dev/cpuset/<cpuset>/memory_spread_page

> 	I realize that this doesn't work with backwards compatibility so I'm
> looking for advice. A new policy MPOL_INTERLEAVE_ALL that doesn't take a
> nodemask argument and interleaves within mems_allowed? Any better suggestions?

No need for a policy. Just use what I suggested above.
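
For illustration, a minimal program that sets the flag (assuming the
cpuset filesystem is mounted at /dev/cpuset and a cpuset named
"mygroup"; both are placeholders):

	#include <stdio.h>

	int main(void)
	{
		/* "mygroup" is a placeholder cpuset name; writing "1"
		 * enables page-cache spreading for every task in it. */
		FILE *f = fopen("/dev/cpuset/mygroup/memory_spread_page", "w");

		if (!f) {
			perror("fopen");
			return 1;
		}
		fputs("1\n", f);
		fclose(f);
		return 0;
	}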


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-18  1:07 ` Christoph Lameter
@ 2007-08-18  1:51   ` Ethan Solomita
  0 siblings, 0 replies; 16+ messages in thread
From: Ethan Solomita @ 2007-08-18  1:51 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Paul Jackson, linux-mm

Christoph Lameter wrote:
> On Fri, 17 Aug 2007, Ethan Solomita wrote:
> 
>> 	Ideally, we want a task to express its preference for interleaved
>> memory allocations without having to provide a list of nodes. The kernel will
>> automatically round-robin amongst the task's mems_allowed.
> 
> You can do that by writing 1 to /dev/cpuset/<cpuset>/memory_spread_page

	Sorry, I also noticed that the above doesn't affect anonymous pages, just 
page cache, and we'd want interleaved anonymous pages.
	-- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-18  0:29 ` Ethan Solomita
@ 2007-08-19 10:18   ` David Rientjes
  2007-08-20  0:53     ` Ethan Solomita
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2007-08-19 10:18 UTC (permalink / raw)
  To: Ethan Solomita; +Cc: Paul Jackson, Christoph Lameter, linux-mm

On Fri, 17 Aug 2007, Ethan Solomita wrote:

>     Ideally, we want a task to express its preference for interleaved memory
> allocations without having to provide a list of nodes. The kernel will
> automatically round-robin amongst the task's mems_allowed.
> 

Just pass the result of

	grep Mems_allowed /proc/pid/status | awk '{ print $2 }' | sed 's/,//g'

to set_mempolicy() with MPOL_INTERLEAVE.
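
In C, a rough sketch of the same idea for a task setting its own
policy (this assumes the 2007-era /proc format of comma-separated
32-bit hex words, most significant word first, and trims error
handling; MAX_NODES is an arbitrary ceiling):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <numaif.h>	/* set_mempolicy(); link with -lnuma */

	#define MAX_NODES 1024

	int main(void)
	{
		unsigned long mask[MAX_NODES / (8 * sizeof(unsigned long))] = { 0 };
		char line[512], hex[512];
		FILE *f = fopen("/proc/self/status", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f)) {
			char *p, *q = hex;
			int len, bit = 0;

			if (strncmp(line, "Mems_allowed:", 13))
				continue;
			/* Strip commas and whitespace; what remains is
			 * hex, most significant 32-bit word first. */
			for (p = line + 13; *p; p++)
				if (*p != ',' && *p != ' ' && *p != '\t' && *p != '\n')
					*q++ = *p;
			*q = '\0';
			/* Walk 8 hex digits (32 bits) at a time from the
			 * right, setting the matching nodemask bits. */
			for (len = strlen(hex); len > 0; bit += 32) {
				int start = len > 8 ? len - 8 : 0;
				unsigned long w = strtoul(hex + start, NULL, 16);
				int b;

				for (b = 0; b < 32; b++)
					if ((w & (1UL << b)) && bit + b < MAX_NODES)
						mask[(bit + b) / (8 * sizeof(mask[0]))] |=
							1UL << ((bit + b) % (8 * sizeof(mask[0])));
				hex[start] = '\0';
				len = start;
			}
		}
		fclose(f);
		if (set_mempolicy(MPOL_INTERLEAVE, mask, MAX_NODES) < 0) {
			perror("set_mempolicy");
			return 1;
		}
		return 0;
	}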

>     At least in our environment, an independent "cpuset manager" process may
> choose to rewrite a cpuset's mems file at any time, possibly increasing or
> decreasing the number of available nodes. If weight(mems_allowed) is
> decreased, the task's MPOL_INTERLEAVE policy's nodemask will be shrunk to fit
> the new mems_allowed. If weight(mems_allowed) is grown, the policy's nodemask
> will not gain new nodes.
> 

This is not unlike the traditional use of cpusets; a cpuset's mems_allowed 
may be freely changed at any time.

If the weight of a task's mems_allowed decreases, you would want a simple 
remap from the old nodemask to the new nodemask.  node_remap() provides 
this functionality already.
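
For illustration, a userspace mimic of the node_remap() semantics (a
sketch, not the kernel implementation): a node's ordinal position in
the old mask, modulo the weight of the new mask, selects the node at
that ordinal in the new mask.

	#include <stdio.h>

	/* Userspace mimic of the node_remap() idea (semantics sketch,
	 * not the kernel code); masks are plain words here. */
	static int remap_node(int node, unsigned long old, unsigned long new)
	{
		int nbits = 8 * sizeof(unsigned long);
		int i, ord = 0, w = 0, n = 0;

		for (i = 0; i < node; i++)	/* ordinal of node in old */
			if (old & (1UL << i))
				ord++;
		for (i = 0; i < nbits; i++)	/* weight of new */
			if (new & (1UL << i))
				w++;
		if (!w)
			return -1;
		ord %= w;
		for (i = 0; ; i++)		/* ord-th set bit of new */
			if ((new & (1UL << i)) && n++ == ord)
				return i;
	}

	int main(void)
	{
		/* Shrink {0,1} -> {0}: node 1 remaps to node 0. */
		printf("%d\n", remap_node(1, 0x3UL, 0x1UL));
		/* Grow back {0} -> {0,1}: a one-node set still maps to 0. */
		printf("%d\n", remap_node(0, 0x1UL, 0x3UL));
		return 0;
	}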

>     What we want is for the task to "set it and forget it," i.e. to express a
> preference for interleaving and then never worry about NUMA again. If the
> nodemask sent via set_mempolicy(MPOL_INTERLEAVE, ...) served as a mask against
> mems_allowed, then we would specify an all-1s nodemask.
> 

It already does exactly what you want.

cpuset_update_task_memory_state() is invoked whenever an allocation with 
__GFP_WAIT is requested via alloc_pages_current() in process context, as 
well as via alloc_page_vma() for any userspace-mapped pages.

If a task's mems_allowed has changed in its cpuset behind the task's back, 
mpol_rebind_policy() is called for that task's mempolicy with a pointer to 
the cpuset's mems_allowed.  This will be considered the new mems_allowed 
for the task and is stored in its task_struct.

mpol_rebind_policy() will rebind MPOL_INTERLEAVE policies by remapping the 
old mems_allowed nodemask with the new nodemask from the cpuset and, at 
the same time, update task->il_next to specify the next node to allocate 
from as reflected by the new nodemask.
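
Schematically (a sketch of the call chain just described, not the
exact code):

	/*
	 * alloc_pages_current() or alloc_page_vma()
	 *   -> cpuset_update_task_memory_state()
	 *        notices the cpuset's mems_allowed changed behind
	 *        the task's back
	 *     -> mpol_rebind_policy(current->mempolicy, &new_mems_allowed)
	 *          -> nodes_remap() old interleave set onto the new mask
	 *          -> current->il_next = next node in the new set
	 */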

		David


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-19 10:18   ` David Rientjes
@ 2007-08-20  0:53     ` Ethan Solomita
  2007-08-20  2:34       ` Paul Jackson
  0 siblings, 1 reply; 16+ messages in thread
From: Ethan Solomita @ 2007-08-20  0:53 UTC (permalink / raw)
  To: David Rientjes; +Cc: Paul Jackson, Christoph Lameter, linux-mm

David Rientjes wrote:
> On Fri, 17 Aug 2007, Ethan Solomita wrote:
> 
>>     Ideally, we want a task to express its preference for interleaved memory
>> allocations without having to provide a list of nodes. The kernel will
>> automatically round-robin amongst the task's mems_allowed.
>>
>>     At least in our environment, an independent "cpuset manager" process may
>> choose to rewrite a cpuset's mems file at any time, possibly increasing or
>> decreasing the number of available nodes. If weight(mems_allowed) is
>> decreased, the task's MPOL_INTERLEAVE policy's nodemask will be shrunk to fit
>> the new mems_allowed. If weight(mems_allowed) is grown, the policy's nodemask
>> will not gain new nodes.
>>
> 
> This is not unlike the traditional use of cpusets; a cpuset's mems_allowed 
> may be freely changed at any time.
> 
> If the weight of a task's mems_allowed decreases, you would want a simple 
> remap from the old nodemask to the new nodemask.  node_remap() provides 
> this functionality already.

	And what happens when the weight then goes back up? e.g. at first the 
mems_allowed specifies nodes 0 and 1, and the user sets a 
MPOL_INTERLEAVE policy across nodes 0 and 1. At some point the "cpuset 
manager" shrinks the number of nodes to just node 0, then later it adds 
back node 1. What nodes are in my MPOL_INTERLEAVE policy?

	As I read the code, I'll only have one node in the mempolicy. If that's 
true, this doesn't do what I want.
	-- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  0:53     ` Ethan Solomita
@ 2007-08-20  2:34       ` Paul Jackson
  2007-08-20  5:47         ` Ethan Solomita
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Jackson @ 2007-08-20  2:34 UTC (permalink / raw)
  To: Ethan Solomita; +Cc: rientjes, clameter, linux-mm

Ethan wrote:
> 	And what happens when the weight then goes back up? e.g. at first the 
> mems_allowed specifies nodes 0 and 1, and the user sets a 
> MPOL_INTERLEAVE policy across nodes 0 and 1. At some point the "cpuset 
> manager" shrinks the number of nodes to just node 0, then later it adds 
> back node 1. What nodes are in my MPOL_INTERLEAVE policy?
> 
> 	As I read the code, I'll only have one node in the mempolicy. If that's 
> true, this doesn't do what I want.

I read the code the same way.

Sounds to me like you want a new and different MPOL_* mempolicy, that
interleaves over whatever nodes are available (allowed) to the task.

The existing MPOL_INTERLEAVE mempolicy interleaves over some specified
nodemask, so we do the best we can to remap that set when it changes.

You want a mempolicy that interleaves over all available nodes, not over
some specified subset of them.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  2:34       ` Paul Jackson
@ 2007-08-20  5:47         ` Ethan Solomita
  2007-08-20  5:53           ` Paul Jackson
  2007-08-20 19:07           ` Christoph Lameter
  0 siblings, 2 replies; 16+ messages in thread
From: Ethan Solomita @ 2007-08-20  5:47 UTC (permalink / raw)
  To: Paul Jackson; +Cc: rientjes, clameter, linux-mm

Paul Jackson wrote:
> Ethan wrote:
>> 	And what happens when the weight then goes back up? e.g. at first the 
>> mems_allowed specifies nodes 0 and 1, and the user sets a 
>> MPOL_INTERLEAVE policy across nodes 0 and 1. At some point the "cpuset 
>> manager" shrinks the number of nodes to just node 0, then later it adds 
>> back node 1. What nodes are in my MPOL_INTERLEAVE policy?
>>
>> 	As I read the code, I'll only have one node in the mempolicy. If that's 
>> true, this doesn't do what I want.
> 
> I read the code the same way.
> 
> Sounds to me like you want a new and different MPOL_* mempolicy, that
> interleaves over whatever nodes are available (allowed) to the task.
> 
> The existing MPOL_INTERLEAVE mempolicy interleaves over some specified
> nodemask, so we do the best we can to remap that set when it changes.
> 
> You want a mempolicy that interleaves over all available nodes, not over
> some specified subset of them.

	OK, then I'll proceed with a new MPOL. Do you believe that this will be 
of general interest? i.e. worth placing in linux-mm?

	BTW, a slightly different MPOL_INTERLEAVE implementation would help, 
wherein we save the nodemask originally specified by the user and do the 
remap from the original nodemask rather than the current nodemask. This 
would also let the user specify an all-ones nodemask which would then be 
remapped onto mems_allowed. But I'm guessing that these changes would be 
impossible due to breaking compatibility?
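
For concreteness, a sketch of the rebind I have in mind
(user_nodemask is a hypothetical field; struct mempolicy doesn't
store the original mask today):

	/*
	 * Sketch only: user_nodemask is a hypothetical field holding
	 * the mask originally passed to set_mempolicy().  Rebinding
	 * always starts from that original mask, so nodes added back
	 * to mems_allowed reappear in the interleave set.
	 */
	static void interleave_rebind_from_original(struct mempolicy *pol,
						    const nodemask_t *newmask)
	{
		nodemask_t tmp;

		nodes_remap(tmp, pol->user_nodemask, pol->user_nodemask,
			    *newmask);
		pol->v.nodes = tmp;
	}
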
	-- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  5:47         ` Ethan Solomita
@ 2007-08-20  5:53           ` Paul Jackson
  2007-08-20  8:10             ` David Rientjes
  2007-08-20 19:07           ` Christoph Lameter
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Jackson @ 2007-08-20  5:53 UTC (permalink / raw)
  To: Ethan Solomita; +Cc: rientjes, clameter, linux-mm

Ethan wrote:
> 	OK, then I'll proceed with a new MPOL. Do you believe that this will be 
> of general interest? i.e. worth placing in linux-mm?

I've no idea if it is of general interest or not.  I'm not interested ;).
But I'm just one person.

> 	BTW, a slightly different MPOL_INTERLEAVE implementation would help, 
> wherein we save the nodemask originally specified by the user and do the 
> remap from the original nodemask rather than the current nodemask.

I kinda like this idea; though keep in mind that since I don't use
mempolicy mechanisms, I am not losing any sleep over minor(?)
compatibility breakages.  It would take someone familiar with the
actual users or usages of MPOL_INTERLEAVE to know if or how much
this would bite actual users/usages.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  5:53           ` Paul Jackson
@ 2007-08-20  8:10             ` David Rientjes
  2007-08-20 18:25               ` Paul Jackson
  2007-08-20 18:28               ` Ethan Solomita
  0 siblings, 2 replies; 16+ messages in thread
From: David Rientjes @ 2007-08-20  8:10 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Ethan Solomita, clameter, linux-mm

On Sun, 19 Aug 2007, Paul Jackson wrote:

> > 	BTW, a slightly different MPOL_INTERLEAVE implementation would help, 
> > wherein we save the nodemask originally specified by the user and do the 
> > remap from the original nodemask rather than the current nodemask.
> 
> I kinda like this idea; though keep in mind that since I don't use
> mempolicy mechanisms, I am not losing any sleep over minor(?)
> compatibility breakages.  It would take someone familiar with the
> actual users or usages of MPOL_INTERLEAVE to know if or how much
> this would bite actual users/usages.
> 

Like I've already said, there is absolutely no reason to add a new MPOL 
variant for this case.  As Christoph already mentioned, PF_SPREAD_PAGE 
gets similar results.  So just modify mpol_rebind_policy() so that if 
/dev/cpuset/<cpuset>/memory_spread_page is true, you rebind the 
interleaved nodemask to all nodes in the new nodemask.  That's the 
well-defined cpuset interface for getting an interleaved behavior already.

Let's not create new memory policies that work only for one specific, 
configurable case when the basic mechanism underlying that policy is 
already present in the cpuset interface, namely PF_SPREAD_PAGE.

		David


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  8:10             ` David Rientjes
@ 2007-08-20 18:25               ` Paul Jackson
  2007-08-20 18:28               ` Ethan Solomita
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Jackson @ 2007-08-20 18:25 UTC (permalink / raw)
  To: David Rientjes; +Cc: solo, clameter, linux-mm

David wrote:
> Like I've already said, there is absolutely no reason to add a new MPOL 
> variant for this case.  As Christoph already mentioned, PF_SPREAD_PAGE 
> gets similar results.  So just modify mpol_rebind_policy() so that if 
> /dev/cpuset/<cpuset>/memory_spread_page is true, you rebind the 
> interleaved nodemask to all nodes in the new nodemask.  That's the 
> well-defined cpuset interface for getting an interleaved behavior already.

Hmm ... nice.

As David likely guesses, I didn't read his earlier suggestion of this.

Thanks for repeating it.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  8:10             ` David Rientjes
  2007-08-20 18:25               ` Paul Jackson
@ 2007-08-20 18:28               ` Ethan Solomita
  2007-08-20 18:40                 ` David Rientjes
  1 sibling, 1 reply; 16+ messages in thread
From: Ethan Solomita @ 2007-08-20 18:28 UTC (permalink / raw)
  To: David Rientjes; +Cc: Paul Jackson, clameter, linux-mm

David Rientjes wrote:
> 
> Like I've already said, there is absolutely no reason to add a new MPOL 
> variant for this case.  As Christoph already mentioned, PF_SPREAD_PAGE 
> gets similar results.  So just modify mpol_rebind_policy() so that if 
> /dev/cpuset/<cpuset>/memory_spread_page is true, you rebind the 
> interleaved nodemask to all nodes in the new nodemask.  That's the 
> well-defined cpuset interface for getting an interleaved behavior already.

	memory_spread_page is only for file-backed pages, not anon pages.
	-- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20 18:28               ` Ethan Solomita
@ 2007-08-20 18:40                 ` David Rientjes
  2007-08-20 19:50                   ` Ethan Solomita
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2007-08-20 18:40 UTC (permalink / raw)
  To: Ethan Solomita; +Cc: Paul Jackson, clameter, linux-mm

On Mon, 20 Aug 2007, Ethan Solomita wrote:

> > Like I've already said, there is absolutely no reason to add a new MPOL
> > variant for this case.  As Christoph already mentioned, PF_SPREAD_PAGE gets
> > similar results.  So just modify mpol_rebind_policy() so that if
> > /dev/cpuset/<cpuset>/memory_spread_page is true, you rebind the interleaved
> > nodemask to all nodes in the new nodemask.  That's the well-defined cpuset
> > interface for getting an interleaved behavior already.
> 
> 	memory_spread_page is only for file-backed pages, not anon pages.

Please read what I said above: all you have to do is modify 
mpol_rebind_policy() so that if /dev/cpuset/<cpuset>/memory_spread_page is 
true, you rebind the interleaved nodemask to all nodes in the new 
nodemask.

This only happens for the MPOL_INTERLEAVE case because the application has 
made it quite clear through set_mempolicy(MPOL_INTERLEAVE, ...) that it 
wants this behavior.

	int cpuset_is_spread_page(struct task_struct *task)
	{
		int ret;

		/* task_lock() pins task->cpuset while we test the flag. */
		task_lock(task);
		ret = is_spread_page(task->cpuset);
		task_unlock(task);
		return ret;
	}

	void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask)
	{
		...
		case MPOL_INTERLEAVE:
			if (cpuset_is_spread_page(current)) {
				/* Spreading requested: interleave over
				 * the entire new mems_allowed. */
				pol->v.nodes = *newmask;
			} else {
				/* Existing behavior: remap the old
				 * interleave set onto the new mask. */
				nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
				pol->v.nodes = tmp;
			}
			...
		...
	}


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20  5:47         ` Ethan Solomita
  2007-08-20  5:53           ` Paul Jackson
@ 2007-08-20 19:07           ` Christoph Lameter
  2007-08-21 14:14             ` Lee Schermerhorn
  1 sibling, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2007-08-20 19:07 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: Paul Jackson, Ethan Solomita, rientjes, linux-mm

On Sun, 19 Aug 2007, Ethan Solomita wrote:

> 	OK, then I'll proceed with a new MPOL. Do you believe that this will
> be of general interest? i.e. worth placing in linux-mm?

Ummmm... Let's first get Lee onto this. AFAIK he already has an 
implementation for such a thing.

Lee: Would you respond to these emails?


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20 18:40                 ` David Rientjes
@ 2007-08-20 19:50                   ` Ethan Solomita
  0 siblings, 0 replies; 16+ messages in thread
From: Ethan Solomita @ 2007-08-20 19:50 UTC (permalink / raw)
  To: David Rientjes; +Cc: Paul Jackson, clameter, linux-mm

David Rientjes wrote:
> On Mon, 20 Aug 2007, Ethan Solomita wrote:
> 
>>> Like I've already said, there is absolutely no reason to add a new MPOL
>>> variant for this case.  As Christoph already mentioned, PF_SPREAD_PAGE gets
>>> similar results.  So just modify mpol_rebind_policy() so that if
>>> /dev/cpuset/<cpuset>/memory_spread_page is true, you rebind the interleaved
>>> nodemask to all nodes in the new nodemask.  That's the well-defined cpuset
>>> interface for getting an interleaved behavior already.
>> 	memory_spread_page is only for file-backed pages, not anon pages.
> 
> Please read what I said above, all you have to do is modify 
> mpol_rebind_policy() so that if /dev/cpuset/<cpuset>/memory_spread_page is 
> true, you rebind the interleaved nodemask to all nodes in the new 
> nodemask.
> 
> This only happens for the MPOL_INTERLEAVE case because the application has 
> made it quite clear through set_mempolicy(MPOL_INTERLEAVE, ...) that it 
> wants this behavior.

	I understood what you said; I just have API concerns. If the definition 
of memory_spread_page is that file-backed pages get spread, it seems 
counterintuitive that it could affect all memory allocations.

	Also MPOL_INTERLEAVE doesn't necessarily make it clear that this 
behavior is desired. One use (not the one I'm interested in, though) of 
this MPOL is to let the user specify specific nodes of interest, and 
they might not want a rebind to revert them to interleaving amongst all 
nodes.

	If MPOL_INTERLEAVE's nodemask were really a mask, applied on top of 
mems_allowed, this would be much easier, but it's not.
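
For illustration, the semantics I'd want, as a sketch (hypothetical;
not what the kernel does today):

	/*
	 * Hypothetical semantics, not current kernel behavior: treat
	 * the MPOL_INTERLEAVE nodemask as a filter over mems_allowed
	 * and recompute the effective set whenever mems_allowed
	 * changes.
	 */
	nodemask_t effective_interleave(nodemask_t policy, nodemask_t allowed)
	{
		nodemask_t effective;

		/* An all-1s policy mask simply tracks mems_allowed. */
		nodes_and(effective, policy, allowed);
		return effective;
	}
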
	-- Ethan


* Re: cpusets vs. mempolicy and how to get interleaving
  2007-08-20 19:07           ` Christoph Lameter
@ 2007-08-21 14:14             ` Lee Schermerhorn
  0 siblings, 0 replies; 16+ messages in thread
From: Lee Schermerhorn @ 2007-08-21 14:14 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul Jackson, Ethan Solomita, rientjes, linux-mm, Eric Whitney

On Mon, 2007-08-20 at 12:07 -0700, Christoph Lameter wrote:
> On Sun, 19 Aug 2007, Ethan Solomita wrote:
> 
> > 	OK, then I'll proceed with a new MPOL. Do you believe that this will
> > be of general interest? i.e. worth placing in linux-mm?
> 
> Ummmm... Let's first get Lee onto this. AFAIK he already has an 
> implementation for such a thing.
> 
> Lee: Would you respond to these emails?
> 

Here's the post for cpuset-independent interleave [a.k.a. "contextual
interleave"].  

	http://marc.info/?l=linux-mm&m=118608528417158&w=4

I'm maintaining this with half a dozen other mempolicy cleanups and 
enhancements.  I'll try to post the entire series later this week.

Meanwhile, the patch linked above should apply to current mm tree with
little conflict.

Lee


Thread overview: 16+ messages (newest: 2007-08-21 14:14 UTC)
2007-08-18  0:22 cpusets vs. mempolicy and how to get interleaving Ethan Solomita
2007-08-18  0:29 ` Ethan Solomita
2007-08-19 10:18   ` David Rientjes
2007-08-20  0:53     ` Ethan Solomita
2007-08-20  2:34       ` Paul Jackson
2007-08-20  5:47         ` Ethan Solomita
2007-08-20  5:53           ` Paul Jackson
2007-08-20  8:10             ` David Rientjes
2007-08-20 18:25               ` Paul Jackson
2007-08-20 18:28               ` Ethan Solomita
2007-08-20 18:40                 ` David Rientjes
2007-08-20 19:50                   ` Ethan Solomita
2007-08-20 19:07           ` Christoph Lameter
2007-08-21 14:14             ` Lee Schermerhorn
2007-08-18  1:07 ` Christoph Lameter
2007-08-18  1:51   ` Ethan Solomita
