linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
  2010-05-12  7:20 [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2 Miao Xie
@ 2010-05-12  4:32 ` Andrew Morton
  2010-05-12  9:00   ` Miao Xie
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-05-12  4:32 UTC (permalink / raw)
  To: miaox
  Cc: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage,
	Linux-Kernel, Linux-MM

On Wed, 12 May 2010 15:20:51 +0800 Miao Xie <miaox@cn.fujitsu.com> wrote:

> @@ -985,6 +984,7 @@ repeat:
>  	 * for the read-side.
>  	 */
>  	while (ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
> +		task_unlock(tsk);
>  		if (!task_curr(tsk))
>  			yield();
>  		goto repeat;

Oh, I meant to mention that.  No yield()s, please.  Their duration is
highly unpredictable.  Can we do something more deterministic here?

Did you consider doing all this with locking?  get_mems_allowed() does
mutex_lock(current->lock)?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
@ 2010-05-12  7:20 Miao Xie
  2010-05-12  4:32 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Miao Xie @ 2010-05-12  7:20 UTC (permalink / raw)
  To: Andrew Morton, David Rientjes, Lee Schermerhorn, Nick Piggin,
	Paul Menage
  Cc: Linux-Kernel, Linux-MM

- cleanup unnecessary header file
- fix the race between set_mempolicy() and cpuset_change_task_nodemask()

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 kernel/cpuset.c |    3 +--
 kernel/exit.c   |    1 -
 kernel/fork.c   |    1 -
 3 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 30cb9a2..d243a22 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -963,7 +963,6 @@ repeat:
 	task_lock(tsk);
 	nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
 	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP1);
-	task_unlock(tsk);
 
 
 	/*
@@ -985,6 +984,7 @@ repeat:
 	 * for the read-side.
 	 */
 	while (ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
+		task_unlock(tsk);
 		if (!task_curr(tsk))
 			yield();
 		goto repeat;
@@ -999,7 +999,6 @@ repeat:
 	 */
 	smp_mb();
 
-	task_lock(tsk);
 	mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP2);
 	tsk->mems_allowed = *newmems;
 	task_unlock(tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 41bc5b2..0ecb17b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -16,7 +16,6 @@
 #include <linux/key.h>
 #include <linux/security.h>
 #include <linux/cpu.h>
-#include <linux/cpuset.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
 #include <linux/file.h>
diff --git a/kernel/fork.c b/kernel/fork.c
index 6e87c95..f4f0951 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -31,7 +31,6 @@
 #include <linux/nsproxy.h>
 #include <linux/capability.h>
 #include <linux/cpu.h>
-#include <linux/cpuset.h>
 #include <linux/cgroup.h>
 #include <linux/security.h>
 #include <linux/hugetlb.h>
-- 
1.6.5.2


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
  2010-05-12  4:32 ` Andrew Morton
@ 2010-05-12  9:00   ` Miao Xie
  2010-05-12 17:48     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Miao Xie @ 2010-05-12  9:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage,
	Linux-Kernel, Linux-MM

on 2010-5-12 12:32, Andrew Morton wrote:
> On Wed, 12 May 2010 15:20:51 +0800 Miao Xie <miaox@cn.fujitsu.com> wrote:
> 
>> @@ -985,6 +984,7 @@ repeat:
>>  	 * for the read-side.
>>  	 */
>>  	while (ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
>> +		task_unlock(tsk);
>>  		if (!task_curr(tsk))
>>  			yield();
>>  		goto repeat;
> 
> Oh, I meant to mention that.  No yield()s, please.  Their duration is
> highly unpredictable.  Can we do something more deterministic here?

Maybe we can use wait_for_completion().

> 
> Did you consider doing all this with locking?  get_mems_allowed() does
> mutex_lock(current->lock)?

do you means using a real lock(such as: mutex) to protect mempolicy and mem_allowed?

It may cause the performance regression, so I do my best to abstain from using a real
lock.

Thanks
Miao

> 
> 
> 
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
  2010-05-12  9:00   ` Miao Xie
@ 2010-05-12 17:48     ` Andrew Morton
  2010-05-13  6:16       ` Miao Xie
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-05-12 17:48 UTC (permalink / raw)
  To: miaox
  Cc: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage,
	Linux-Kernel, Linux-MM

On Wed, 12 May 2010 17:00:45 +0800
Miao Xie <miaox@cn.fujitsu.com> wrote:

> on 2010-5-12 12:32, Andrew Morton wrote:
> > On Wed, 12 May 2010 15:20:51 +0800 Miao Xie <miaox@cn.fujitsu.com> wrote:
> > 
> >> @@ -985,6 +984,7 @@ repeat:
> >>  	 * for the read-side.
> >>  	 */
> >>  	while (ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
> >> +		task_unlock(tsk);
> >>  		if (!task_curr(tsk))
> >>  			yield();
> >>  		goto repeat;
> > 
> > Oh, I meant to mention that.  No yield()s, please.  Their duration is
> > highly unpredictable.  Can we do something more deterministic here?
> 
> Maybe we can use wait_for_completion().

That would work.

> > 
> > Did you consider doing all this with locking?  get_mems_allowed() does
> > mutex_lock(current->lock)?
> 
> do you means using a real lock(such as: mutex) to protect mempolicy and mem_allowed?

yes.

> It may cause the performance regression, so I do my best to abstain from using a real
> lock.

Well, the code as-is is pretty exotic with lots of open-coded tricky
barriers - it's best to avoid inventing new primitives if possible. 
For example, there's no lockdep support for this new "lock".

mutex_lock() is pretty quick - basically a simgle atomic op.  How
frequently do these operations occur?

The code you have at present is fairly similar to sequence locks.  I
wonder if there's some way of (ab)using sequence locks for this. 
seqlocks don't have lockdep support either...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
  2010-05-12 17:48     ` Andrew Morton
@ 2010-05-13  6:16       ` Miao Xie
  2010-05-13 19:11         ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Miao Xie @ 2010-05-13  6:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage,
	Linux-Kernel, Linux-MM

on 2010-5-13 1:48, Andrew Morton wrote:
>> It may cause the performance regression, so I do my best to abstain from using a real
>> lock.
> 
> Well, the code as-is is pretty exotic with lots of open-coded tricky
> barriers - it's best to avoid inventing new primitives if possible. 
> For example, there's no lockdep support for this new "lock".

I didn't find an existing lock that could fix the problem well till now, so
I had to design this new "lock" to protect the task's mempolicy and mems_allowed.

> 
> mutex_lock() is pretty quick - basically a simgle atomic op.  How
> frequently do these operations occur?

There is another problem that I forgot to mention.
besides the performance problem, the read-side may call it in the context
in which the task can't sleep. so we can't use mutex.

> 
> The code you have at present is fairly similar to sequence locks.  I
> wonder if there's some way of (ab)using sequence locks for this. 
> seqlocks don't have lockdep support either...
> 

We can't use sequence locks here, because the read-side may read the data
in changing, but it can't put off cleaning the old bits.

Thanks
Miao

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
  2010-05-13  6:16       ` Miao Xie
@ 2010-05-13 19:11         ` Andrew Morton
  2010-05-17  4:01           ` Miao Xie
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2010-05-13 19:11 UTC (permalink / raw)
  To: miaox
  Cc: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage,
	Linux-Kernel, Linux-MM

On Thu, 13 May 2010 14:16:33 +0800
Miao Xie <miaox@cn.fujitsu.com> wrote:

> > 
> > The code you have at present is fairly similar to sequence locks.  I
> > wonder if there's some way of (ab)using sequence locks for this. 
> > seqlocks don't have lockdep support either...
> > 
> 
> We can't use sequence locks here, because the read-side may read the data
> in changing, but it can't put off cleaning the old bits.

I don't understand that sentence.  Can you expand on it please?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2
  2010-05-13 19:11         ` Andrew Morton
@ 2010-05-17  4:01           ` Miao Xie
  0 siblings, 0 replies; 7+ messages in thread
From: Miao Xie @ 2010-05-17  4:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Lee Schermerhorn, Nick Piggin, Paul Menage,
	Linux-Kernel, Linux-MM

on 2010-5-14 3:11, Andrew Morton wrote:
> On Thu, 13 May 2010 14:16:33 +0800
> Miao Xie <miaox@cn.fujitsu.com> wrote:
> 
>>>
>>> The code you have at present is fairly similar to sequence locks.  I
>>> wonder if there's some way of (ab)using sequence locks for this. 
>>> seqlocks don't have lockdep support either...
>>>
>>
>> We can't use sequence locks here, because the read-side may read the data
>> in changing, but it can't put off cleaning the old bits.
> 
> I don't understand that sentence.  Can you expand on it please?
> 

the mempolicy and mems_allowed tell the task that it should allocates the memory
space on the specified node. so when allocating the memory space, the memory
allocation functions that the task invokes must accesses the mempolicy and
mems_allowed to find a node on which it can do memory allocation.

But those memory allocation functions can be used in both the context that the
task can sleep and the context that the task can't sleep(etc. disable irq). so
the real lock is not suitable.

And it is not a problem that the task allocates the memory space on the old
allowed node when the mempolicy and mems_allowed is in changing, because the
mempolicy and mems_allowed is not mandatory. So I think we needn't use a real
lock to protect the mempolicy and mems_allowed in the read-side, and just use a
real lock in the write-side. But there is a serious problem, that is the read
-side may find no node to allocate memory and oom occurs, just like the
following case(mentioned in the patch's changelog):
(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				0		rebind task1's mpol
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom

In order to fix this problem, I got an idea that we set the newly allowed nodes
first, and then clean the disallowed nodes, But there is still a problem.
(mpol: mempolicy)
	task1			task1's mpol	task2
	alloc page		1
	  alloc on node0? NO	1
				1		change mems from 1 to 0
				1		rebind task1's mpol
				0-1		  set new bits
				0	  	  clear disallowed bits
	  alloc on node1? NO	0
	  ...
	can't alloc page
	  goto oom
	  
It is because we cleanup disallowed nodes early, so I use a variable to tell the
write-side that the task is accessing the mempolicy and mems_allowed now, the
write-side must cleanup disallowed nodes soon after.

And the seq read lock can't provide this function. And besides that, the read-side
will goto oom and not go back if it find no node to allcate memory, so it won't
check the seq number of lock to find whether the mempolicy and mems_allowed have
been changed. so the seq lock is also not suitable, I think.

Thanks
Miao

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-05-17  3:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-12  7:20 [PATCH -mm] cpuset,mm: fix no node to alloc memory when changing cpuset's mems - fix2 Miao Xie
2010-05-12  4:32 ` Andrew Morton
2010-05-12  9:00   ` Miao Xie
2010-05-12 17:48     ` Andrew Morton
2010-05-13  6:16       ` Miao Xie
2010-05-13 19:11         ` Andrew Morton
2010-05-17  4:01           ` Miao Xie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).