From: Vlastimil Babka <vbabka@suse.cz>
To: Xishi Qiu <qiuxishi@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <js1304@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Laura Abbott <lauraa@codeaurora.org>,
	zhuhui@xiaomi.com, wangxq10@lzu.edu.cn,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@intel.com>
Subject: Re: [PATCH] mm: fix invalid node in alloc_migrate_target()
Date: Tue, 29 Mar 2016 12:06:10 +0200
Message-ID: <56FA5392.1030509@suse.cz>
In-Reply-To: <56FA5062.2020103@suse.cz>

On 03/29/2016 11:52 AM, Vlastimil Babka wrote:
> On 03/26/2016 06:31 AM, Xishi Qiu wrote:
>> On 2016/3/26 3:22, Andrew Morton wrote:
>>
>>> On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>
>>>> It is incorrect to use next_node to find a target node, it will
>>>> return MAX_NUMNODES or invalid node. This will lead to crash in
>>>> buddy system allocation.
>>>>
>>>> ...
>>>>
>>>> --- a/mm/page_isolation.c
>>>> +++ b/mm/page_isolation.c
>>>> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>>>    	 * now as a simple work-around, we use the next node for destination.
>>>>    	 */
>>>>    	if (PageHuge(page)) {
>>>> -		nodemask_t src = nodemask_of_node(page_to_nid(page));
>>>> -		nodemask_t dst;
>>>> -		nodes_complement(dst, src);
>>>> +		int node = next_online_node(page_to_nid(page));
>>>> +		if (node == MAX_NUMNODES)
>>>> +			node = first_online_node;
>>>>    		return alloc_huge_page_node(page_hstate(compound_head(page)),
>>>> -					    next_node(page_to_nid(page), dst));
>>>> +					    node);
>>>>    	}
>>>>
>>>>    	if (PageHighMem(page))
>>>
>>> Indeed.  Can you tell us more about this circumstances under which the
>>> kernel will crash?  I need to decide which kernel version(s) need the
>>> patch, but the changelog doesn't contain the info needed to make this
>>> decision (it should).
>>>
>>
>> Hi Andrew,
>>
>> I read the code v4.4, and find the following path maybe trigger the bug.
>>
>> alloc_migrate_target()
>> 	alloc_huge_page_node()  // the node may be offline or MAX_NUMNODES
>> 		__alloc_buddy_huge_page_no_mpol()
>> 			__alloc_buddy_huge_page()
>> 				__hugetlb_alloc_buddy_huge_page()
>
> The code in this functions seems to come from 099730d67417d ("mm,
> hugetlb: use memory policy when available") by Dave Hansen (adding to
> CC), which was indeed merged in 4.4-rc1.
>
> However, alloc_pages_node() is only called in the block guarded by:
>
> if (!IS_ENABLED(CONFIG_NUMA) || !vma) {
>
> The rather weird "!IS_ENABLED(CONFIG_NUMA)" part comes from immediate
> followup commit e0ec90ee7e6f ("mm, hugetlbfs: optimize when NUMA=n")
>
> So I doubt the code path here can actually happen. But it's fragile and
> confusing nevertheless.

Ah, so there's actually a dangerous path:
alloc_huge_page_node()
     dequeue_huge_page_node()
         list_for_each_entry(page, &h->hugepage_freelists[nid], lru)

hugepage_freelists has MAX_NUMNODES entries, so when nid is MAX_NUMNODES
we index one element past the end of the array.

However, look closer at how nid is obtained in alloc_migrate_target():

nodemask_t src = nodemask_of_node(page_to_nid(page));
nodemask_t dst;
nodes_complement(dst, src);

nid = next_node(page_to_nid(page), dst);

For nid to be MAX_NUMNODES, the original page has to be on node
MAX_NUMNODES-1. Otherwise, because dst is the complement of the source
node, the very next bit is always set and next_node() returns it.

It's actually a rather obfuscated way of doing:

nid = page_to_nid(page) + 1;

In that case the problem was introduced by commit c8721bbbdd36 ("mm:
memory-hotplug: enable memory hotplug to handle hugepage") in 3.12, and
will likely affect only people who tune down MAX_NUMNODES to match their
machine.
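
To make the "page_to_nid(page) + 1" observation concrete, here is a
minimal userspace sketch in C. It is not kernel code: the model_* names,
the 32-bit mask type and MODEL_MAX_NUMNODES = 4 are assumptions made for
illustration, standing in for nodemask_t, next_node() and a tuned-down
MAX_NUMNODES. The second helper only mimics the wrap-around idea of the
proposed patch, with the online mask passed in by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_MAX_NUMNODES 4	/* assumed small, as if tuned down */
#define MODEL_VALID_MASK ((1u << MODEL_MAX_NUMNODES) - 1u)

typedef uint32_t model_nodemask_t;	/* bit i set => node i is in the mask */

static model_nodemask_t model_nodemask_of_node(int nid)
{
	return 1u << nid;
}

static model_nodemask_t model_nodes_complement(model_nodemask_t src)
{
	return ~src & MODEL_VALID_MASK;
}

/* First set bit strictly after nid, or MODEL_MAX_NUMNODES if there is none. */
static int model_next_node(int nid, model_nodemask_t mask)
{
	for (int i = nid + 1; i < MODEL_MAX_NUMNODES; i++)
		if (mask & (1u << i))
			return i;
	return MODEL_MAX_NUMNODES;
}

/* Models the pre-patch target selection: next node in the complement mask. */
static int model_old_target_nid(int page_nid)
{
	model_nodemask_t src = model_nodemask_of_node(page_nid);
	model_nodemask_t dst = model_nodes_complement(src);

	return model_next_node(page_nid, dst);
}

/* Models the patched idea: next online node, wrapping to the first one. */
static int model_patched_target_nid(int page_nid, model_nodemask_t online)
{
	int node = model_next_node(page_nid, online);

	if (node == MODEL_MAX_NUMNODES)
		node = model_next_node(-1, online);	/* first online node */
	return node;
}

/* Small self-check showing both behaviours side by side. */
static void model_self_check(void)
{
	/* Old code behaves like nid + 1 for every node. */
	for (int nid = 0; nid < MODEL_MAX_NUMNODES; nid++)
		assert(model_old_target_nid(nid) == nid + 1);

	/* So the last node yields the out-of-range value. */
	assert(model_old_target_nid(MODEL_MAX_NUMNODES - 1) == MODEL_MAX_NUMNODES);

	/* Patched variant wraps around and skips offline nodes. */
	assert(model_patched_target_nid(3, 0xFu) == 0);
	assert(model_patched_target_nid(1, 0xBu) == 3);	/* node 2 offline */
}
```

Calling model_self_check() from a small main() exercises both cases. In
this model the old computation only goes out of range for the last node,
which matches the reasoning above about MAX_NUMNODES-1.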

>> 					alloc_pages_node()
>> 						__alloc_pages_node()
>> 							VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
>> 							VM_WARN_ON(!node_online(nid));
>>
>> Thanks,
>> Xishi Qiu
>>
>

