From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>,
	Xishi Qiu <qiuxishi@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Laura Abbott <lauraa@codeaurora.org>,
	zhuhui@xiaomi.com, wangxq10@lzu.edu.cn,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: fix invalid node in alloc_migrate_target()
Date: Tue, 29 Mar 2016 15:06:16 +0200
Message-ID: <56FA7DC8.4000902@suse.cz>
In-Reply-To: <20160325122237.4ca4e0dbca215ccbf4f49922@linux-foundation.org>

On 03/25/2016 08:22 PM, Andrew Morton wrote:
> On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote:
>
>> It is incorrect to use next_node to find a target node, it will
>> return MAX_NUMNODES or invalid node. This will lead to crash in
>> buddy system allocation.
>>
>> ...
>>
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>   	 * now as a simple work-around, we use the next node for destination.
>>   	 */
>>   	if (PageHuge(page)) {
>> -		nodemask_t src = nodemask_of_node(page_to_nid(page));
>> -		nodemask_t dst;
>> -		nodes_complement(dst, src);
>> +		int node = next_online_node(page_to_nid(page));
>> +		if (node == MAX_NUMNODES)
>> +			node = first_online_node;
>>   		return alloc_huge_page_node(page_hstate(compound_head(page)),
>> -					    next_node(page_to_nid(page), dst));
>> +					    node);
>>   	}
>>
>>   	if (PageHighMem(page))
>
> Indeed.  Can you tell us more about the circumstances under which the
> kernel will crash?  I need to decide which kernel version(s) need the
> patch, but the changelog doesn't contain the info needed to make this
> decision (it should).
>
>
>
> next_node() isn't a very useful interface, really.  Just about every
> caller does this:
>
>
> 	node = next_node(node, XXX);
> 	if (node == MAX_NUMNODES)
> 		node = first_node(XXX);
>
> so how about we write a function which does that, and stop open-coding
> the same thing everywhere?

Good idea.
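
For reference, here is a minimal userspace sketch of the wrap-around
behaviour such a helper would have. It is not kernel code: the nodemask
is modelled as a plain bitmask, MAX_NUMNODES is set to 8 purely for
illustration, and every model_* name is made up for this example.

```c
#include <assert.h>

/* Illustration only: a small fixed node count, not the kernel's value. */
#define MAX_NUMNODES 8

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the mask. */
typedef unsigned int model_nodemask_t;

/* Lowest set node at or after 'start', or MAX_NUMNODES if there is none. */
static int model_find_from(model_nodemask_t mask, int start)
{
	for (int n = start; n < MAX_NUMNODES; n++)
		if (mask & (1u << n))
			return n;
	return MAX_NUMNODES;
}

static int model_first_node(model_nodemask_t mask)
{
	return model_find_from(mask, 0);
}

/* Models the non-wrapping lookup: returns MAX_NUMNODES past the last set node. */
static int model_next_node(int node, model_nodemask_t mask)
{
	assert(node >= -1 && node < MAX_NUMNODES);
	return model_find_from(mask, node + 1);
}

/* The open-coded pattern folded into one function: wrap to the first node. */
static int model_next_node_wrap(int node, model_nodemask_t mask)
{
	int ret = model_next_node(node, mask);

	if (ret == MAX_NUMNODES)
		ret = model_first_node(mask);
	return ret;
}
```

With nodes 0, 1 and 3 set, model_next_node(3, mask) returns MAX_NUMNODES
while model_next_node_wrap(3, mask) returns 0. For an empty mask even
the wrapping variant returns MAX_NUMNODES in this model, so callers
would still have to handle that case.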

> And I think your fix could then use such a function:
>
> 	int node = that_new_function(page_to_nid(page), node_online_map);
>
>
>
> Also, mm/mempolicy.c:offset_il_node() worries me:
>
> 	do {
> 		nid = next_node(nid, pol->v.nodes);
> 		c++;
> 	} while (c <= target);
>
> Can't `nid' hit MAX_NUMNODES?

AFAICS it can. interleave_nid() uses this, and the resulting nid is then
passed to e.g. node_zonelist(), which uses it for NODE_DATA(nid). That's
quite scary. The code also predates git. Why don't we see crashes, or
KASAN reports, from this?
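
To make the concern concrete, here is a minimal userspace model of just
the quoted loop. It is a sketch under assumptions, not the kernel
function: the nodemask is a plain bitmask, MAX_NUMNODES is 8 for
illustration, and the starting nid and target are parameters because
the quoted snippet doesn't show how they are initialised. All model_*
names are hypothetical.

```c
#include <assert.h>

/* Illustration only: a small fixed node count, not the kernel's value. */
#define MAX_NUMNODES 8

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the mask. */
typedef unsigned int model_nodemask_t;

/* Non-wrapping lookup: next set node after 'node', else MAX_NUMNODES. */
static int model_next_node(int node, model_nodemask_t mask)
{
	for (int n = node + 1; n < MAX_NUMNODES; n++)
		if (mask & (1u << n))
			return n;
	return MAX_NUMNODES;
}

/*
 * Same shape as the quoted do/while: advance (target + 1) times.
 * Nothing here wraps, so once nid reaches MAX_NUMNODES it stays there.
 */
static int model_walk(int start_nid, model_nodemask_t nodes, unsigned int target)
{
	int nid = start_nid;
	unsigned int c = 0;

	assert(start_nid >= -1 && start_nid < MAX_NUMNODES);
	do {
		nid = model_next_node(nid, nodes);
		c++;
	} while (c <= target);
	return nid;
}
```

In this model, with nodes 1 and 4 set and a start of -1, targets 0 and
1 yield nodes 1 and 4, while any target of 2 or more yields
MAX_NUMNODES. So the loop is only safe if the target is always smaller
than the number of set nodes and the mask doesn't change during the
walk. Whether the real caller guarantees that is not visible from the
snippet quoted above.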

>
> And can someone please explain mem_cgroup_select_victim_node() to me?
> How can we hit the "node = numa_node_id()" path?  Only if
> memcg->scan_nodes is empty?  is that even valid?  The comment seems to
> have not much to do with the code?

As I read it, the comment says that an empty scan_nodes is valid, and it
lists the reasons why that can happen (in somewhat broken language).
Note that I didn't verify these reasons:
- we call this when hitting the memcg limit, not when adding pages to
the LRU, as adding to the LRU would mean scan_nodes contains that LRU's
node
- pages added to the unevictable LRU are not added to scan_nodes
(probably because scanning the unevictable LRU would be useless)
- for other reasons (which?) the memcg might have pages that are not on
any LRU, and be so small that it has no other pages that would be on an
LRU

> mpol_rebind_nodemask() is similar.
>
>
>
> Something like this?
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: include/linux/nodemask.h: create next_node_in() helper
>
> Lots of code does
>
> 	node = next_node(node, XXX);
> 	if (node == MAX_NUMNODES)
> 		node = first_node(XXX);
>
> so create next_node_in() to do this and use it in various places.
>
> Cc: Xishi Qiu <qiuxishi@huawei.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

The patch doesn't touch offset_il_node(), which is good: if that
function is indeed buggy, the bug is serious and deserves its own
non-cleanup patch.

