Re: [patch for-3.2-rc3] cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask

From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Paul Menage <paul@paulmenage.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch for-3.2-rc3] cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
Date: Wed, 23 Nov 2011 10:51:52 +0800
Message-ID: <4ECC5FC8.9070500@cn.fujitsu.com>
In-Reply-To: <alpine.DEB.2.00.1111181545170.24487@chino.kir.corp.google.com>

On Fri, 18 Nov 2011 15:49:22 -0800 (pst), David Rientjes wrote:
> On Fri, 18 Nov 2011, Miao Xie wrote:
> 
>>>> I find there is another problem; please take the following case into account:
>>>>
>>>>   2-3 -> 1-2 -> 0-1
>>>>
>>>> If the user changes mems_allowed twice in succession, the task may see an
>>>> empty mems_allowed.
>>>>
>>>> So, it is still dangerous.
>>>>
>>>
>>> With this patch, we're protected by task_lock(tsk) to determine whether we 
>>> want to take the exception, i.e. whether need_loop is false, and the 
>>> setting of tsk->mems_allowed.  So this would see the nodemask change at 
>>> the individual steps from 2-3 -> 1-2 -> 0-1, not some inconsistent state 
>>> in between or directly from 2-3 -> 0-1.  The only time we don't hold 
>>> task_lock(tsk) to change tsk->mems_allowed is when tsk == current and in 
>>> that case we're not concerned about intermediate reads to its own nodemask 
>>> while storing to a mask where MAX_NUMNODES > BITS_PER_LONG.
>>>
>>> Thus, there's no problem here with regard to such behavior if we exclude 
>>> mempolicies, which this patch does.
>>>
>>
>> No.
>> When the task does memory allocation, it accesses its mems_allowed without
>> task_lock(tsk), and it may be blocked after it checks bits 0-1. Then the
>> user changes mems_allowed twice in succession (2-3 (initial state) -> 1-2 -> 0-1).
>> After that, the task is woken up and sees an empty mems_allowed.
>>
> 
> I'm confused, you're concerned on a kernel where 
> MAX_NUMNODES > BITS_PER_LONG about thread A reading a partial 
> tsk->mems_allowed, being preempted, meanwhile thread B changes 
> tsk->mems_allowed by taking cgroup_mutex, taking task_lock(tsk), setting 
> the intersecting nodemask, releasing both, taking them again, changing the 
> nodemask again to be disjoint, then the thread A waking up and finishing 
> its read and seeing an intersecting nodemask because it is now disjoint 
> from the first read?
> 

(I am sorry for the late reply, I was on leave for the past few days.)

Yes, what you said is right.
But in fact, on a kernel where MAX_NUMNODES <= BITS_PER_LONG, the same problem
can also occur:
	task1			task1's mems	task2
	alloc page		2-3
	  alloc on node1? NO	2-3
				2-3		change mems from 2-3 to 1-2
				1-2		rebind task1's mpol
				1-2		  set new bits
				1-2		change mems from 1-2 to 0-1
				1-2		rebind task1's mpol
				0-1		  set new bits
	  alloc on node2? NO	0-1
	  ...
	can't alloc page
	  goto oom
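
The interleaving above can be replayed deterministically in user space. This is
only a sketch for illustration, not kernel code: nodemask_sim_t, mask_range(),
node_allowed() and scan_with_updates() are hypothetical names, the mask is a
single unsigned long (the MAX_NUMNODES <= BITS_PER_LONG case), and the
"preemption" is simulated by applying the updates at a chosen point in the scan.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for tsk->mems_allowed: bit n set means node n is allowed. */
typedef unsigned long nodemask_sim_t;

#define NR_NODES_SIM 4

/* Build a mask with nodes lo..hi (inclusive) set, e.g. mask_range(2, 3) == 0xC. */
static nodemask_sim_t mask_range(int lo, int hi)
{
	return ((1UL << (hi - lo + 1)) - 1UL) << lo;
}

/* One step of the allocator's scan: test a single node against the mask. */
static bool node_allowed(nodemask_sim_t mask, int node)
{
	return (mask >> node) & 1UL;
}

/*
 * Simulated allocation: walk nodes 0..NR_NODES_SIM-1 and re-read *mask for
 * every node, as an allocator that does not hold task_lock() would.
 * Right after the check of node preempt_after_node fails, the updater runs
 * and applies each entry of updates[] in order (each store is whole and
 * consistent, as if done under the lock).
 * Returns the first allowed node seen, or -1 if the scan found none.
 */
static int scan_with_updates(nodemask_sim_t *mask, int preempt_after_node,
			     const nodemask_sim_t *updates, size_t nr_updates)
{
	for (int node = 0; node < NR_NODES_SIM; node++) {
		if (node_allowed(*mask, node))
			return node;
		if (node == preempt_after_node) {
			for (size_t i = 0; i < nr_updates; i++)
				*mask = updates[i];
		}
	}
	return -1;
}
```

Starting from mask_range(2, 3) with the updates { mask_range(1, 2),
mask_range(0, 1) } applied after node 1 has been checked, the scan returns -1
even though no individual store ever produced an empty mask. With no
preemption point (preempt_after_node = -1) the same scan returns node 2.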

Thanks
