From: Xishi Qiu <qiuxishi@huawei.com>
To: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
izumi.taku@jp.fujitsu.com, Tang Chen <tangchen@cn.fujitsu.com>,
Xiexiuqi <xiexiuqi@huawei.com>, Mel Gorman <mgorman@suse.de>,
David Rientjes <rientjes@google.com>,
Linux MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
Date: Tue, 21 Apr 2015 09:31:37 +0800 [thread overview]
Message-ID: <5535A879.8050302@huawei.com> (raw)
In-Reply-To: <55354434.2902ec0a.14ab.fffffff6@mx.google.com>
On 2015/4/21 2:23, Yasuaki Ishimatsu wrote:
>
> On Mon, 20 Apr 2015 11:42:10 +0800
> Xishi Qiu <qiuxishi@huawei.com> wrote:
>
>> On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:
>>
>>>
>>> On Mon, 20 Apr 2015 10:45:45 +0800
>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>
>>>> On 2015/4/20 9:42, Gu Zheng wrote:
>>>>
>>>>> Hi Xishi,
>>>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>>>
>>>>>>
>>>>>> Your patches will fix your issue.
>>>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>>>> not be initialized.
>>>>>>
>>>>>> Memory hot add flows are as follows:
>>>>>>
>>>>>> add_memory
>>>>>> ...
>>>>>> -> hotadd_new_pgdat()
>>>>>> ...
>>>>>> -> node_set_online(nid)
>>>>>>
>>>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>>>> offline because node_set_online() is not called yet. So if applying
>>>>>> your patches, the pgdat is not initialized in this case.
>>>>>
>>>>> Ishimtasu's worry is reasonable. And I am afraid the fix here is a bit
>>>>> over-kill.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yasuaki Ishimatsu
>>>>>>
>>>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>>>>
>>>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>>>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>>>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>>>>>
>>>>> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
>>>>> then the chaos occurred when init the zones. And this only happens to the re-hotadd
>>>>> node, so how about using the saved *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>>>
>>>>> Thanks,
>>>>> Gu
>>>>>
>>>>
>>>> Hi Gu,
>>>>
>>>> I first considered this method, but if the hot added node's start and size are different
>>>> from before, it makes the chaos.
>>>>
>>>
>>>> e.g.
>>>> nodeXX (8-16G)
>>>> remove nodeXX
>>>> BIOS report cpu first and online it
>>>> hotadd nodeXX
>>>> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
>>>> BIOS report mem(10-12G)
>>>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>>>> the start is still 8G, not 10G, this is chaos!
>>>
>>> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
>>> pr_info()'s message.
>>>
>>> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>>> unsigned long node_start_pfn, unsigned long *zholes_size)
>>> {
>>> ...
>>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>>> (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
>>> #endif
>>> }
>>>
>>> Is the memory range of the message "8G - 16G"?
>>> If so, the reason is that memblk is not deleted at memory hot remove.
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>
>> Hi Yasuaki,
>>
>
>> By reading the code, I find memblk is not deleted at memory hot remove.
>> I am not sure whether we should remove it. If remove it, we should also reset
>> "arch_zone_lowest_possible_pfn", right? It seems a little complicated.
>
> I think memblk should be added/removed by hot adding/removing memory.
> But, arch_zone_lowest_possible_pfn should not be changed.
>
Ok, thanks for your suggestion.
> Thanks,
> Yasuaki Ishimatsu
>
>>
>> Thanks,
>> Xishi Qiu
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Xishi Qiu
>>>>
>>>
>>> .
>>>
>>
>>
>>
>
> .
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Xishi Qiu <qiuxishi@huawei.com>
To: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
<izumi.taku@jp.fujitsu.com>, Tang Chen <tangchen@cn.fujitsu.com>,
Xiexiuqi <xiexiuqi@huawei.com>, Mel Gorman <mgorman@suse.de>,
David Rientjes <rientjes@google.com>,
Linux MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages()
Date: Tue, 21 Apr 2015 09:31:37 +0800 [thread overview]
Message-ID: <5535A879.8050302@huawei.com> (raw)
In-Reply-To: <55354434.2902ec0a.14ab.fffffff6@mx.google.com>
On 2015/4/21 2:23, Yasuaki Ishimatsu wrote:
>
> On Mon, 20 Apr 2015 11:42:10 +0800
> Xishi Qiu <qiuxishi@huawei.com> wrote:
>
>> On 2015/4/20 11:29, Yasuaki Ishimatsu wrote:
>>
>>>
>>> On Mon, 20 Apr 2015 10:45:45 +0800
>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>
>>>> On 2015/4/20 9:42, Gu Zheng wrote:
>>>>
>>>>> Hi Xishi,
>>>>> On 04/18/2015 04:05 AM, Yasuaki Ishimatsu wrote:
>>>>>
>>>>>>
>>>>>> Your patches will fix your issue.
>>>>>> But, if BIOS reports memory first at node hot add, pgdat can
>>>>>> not be initialized.
>>>>>>
>>>>>> Memory hot add flows are as follows:
>>>>>>
>>>>>> add_memory
>>>>>> ...
>>>>>> -> hotadd_new_pgdat()
>>>>>> ...
>>>>>> -> node_set_online(nid)
>>>>>>
>>>>>> When calling hotadd_new_pgdat() for a hot added node, the node is
>>>>>> offline because node_set_online() is not called yet. So if applying
>>>>>> your patches, the pgdat is not initialized in this case.
>>>>>
>>>>> Ishimtasu's worry is reasonable. And I am afraid the fix here is a bit
>>>>> over-kill.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yasuaki Ishimatsu
>>>>>>
>>>>>> On Fri, 17 Apr 2015 18:50:32 +0800
>>>>>> Xishi Qiu <qiuxishi@huawei.com> wrote:
>>>>>>
>>>>>>> Hot remove nodeXX, then hot add nodeXX. If BIOS report cpu first, it will call
>>>>>>> hotadd_new_pgdat(nid, 0), this will set pgdat->node_start_pfn to 0. As nodeXX
>>>>>>> exists at boot time, so pgdat->node_spanned_pages is the same as original. Then
>>>>>>> free_area_init_core()->memmap_init() will pass a wrong start and a nonzero size.
>>>>>
>>>>> As your analysis said the root cause here is passing a *0* as the node_start_pfn,
>>>>> then the chaos occurred when init the zones. And this only happens to the re-hotadd
>>>>> node, so how about using the saved *node_start_pfn* (via get_pfn_range_for_nid(nid, &start_pfn, &end_pfn))
>>>>> instead if we find "pgdat->node_start_pfn == 0 && !node_online(XXX)"?
>>>>>
>>>>> Thanks,
>>>>> Gu
>>>>>
>>>>
>>>> Hi Gu,
>>>>
>>>> I first considered this method, but if the hot added node's start and size are different
>>>> from before, it makes the chaos.
>>>>
>>>
>>>> e.g.
>>>> nodeXX (8-16G)
>>>> remove nodeXX
>>>> BIOS report cpu first and online it
>>>> hotadd nodeXX
>>>> use the original value, so pgdat->node_start_pfn is set to 8G, and size is 8G
>>>> BIOS report mem(10-12G)
>>>> call add_memory()->__add_zone()->grow_zone_span()/grow_pgdat_span()
>>>> the start is still 8G, not 10G, this is chaos!
>>>
>>> If you set CONFIG_HAVE_MEMBLOCK_NODE_MAP, kernel shows the following
>>> pr_info()'s message.
>>>
>>> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>>> unsigned long node_start_pfn, unsigned long *zholes_size)
>>> {
>>> ...
>>> #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
>>> get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
>>> pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
>>> (u64)start_pfn << PAGE_SHIFT, ((u64)end_pfn << PAGE_SHIFT) - 1);
>>> #endif
>>> }
>>>
>>> Is the memory range of the message "8G - 16G"?
>>> If so, the reason is that memblk is not deleted at memory hot remove.
>>>
>>> Thanks,
>>> Yasuaki Ishimatsu
>>>
>>
>> Hi Yasuaki,
>>
>
>> By reading the code, I find memblk is not deleted at memory hot remove.
>> I am not sure whether we should remove it. If remove it, we should also reset
>> "arch_zone_lowest_possible_pfn", right? It seems a little complicated.
>
> I think memblk should be added/removed by hot adding/removing memory.
> But, arch_zone_lowest_possible_pfn should not be changed.
>
Ok, thanks for your suggestion.
> Thanks,
> Yasuaki Ishimatsu
>
>>
>> Thanks,
>> Xishi Qiu
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Xishi Qiu
>>>>
>>>
>>> .
>>>
>>
>>
>>
>
> .
>
next prev parent reply other threads:[~2015-04-21 2:51 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-17 10:50 [PATCH 1/2 V2] memory-hotplug: fix BUG_ON in move_freepages() Xishi Qiu
2015-04-17 10:50 ` Xishi Qiu
2015-04-17 20:05 ` Yasuaki Ishimatsu
2015-04-17 20:05 ` Yasuaki Ishimatsu
2015-04-20 1:33 ` Xishi Qiu
2015-04-20 1:33 ` Xishi Qiu
2015-04-20 1:56 ` Yasuaki Ishimatsu
2015-04-20 1:56 ` Yasuaki Ishimatsu
2015-04-20 2:11 ` Yasuaki Ishimatsu
2015-04-20 2:11 ` Yasuaki Ishimatsu
2015-04-20 2:08 ` Gu Zheng
2015-04-20 2:08 ` Gu Zheng
2015-04-20 2:09 ` Gu Zheng
2015-04-20 2:09 ` Gu Zheng
2015-04-20 2:59 ` Xishi Qiu
2015-04-20 2:59 ` Xishi Qiu
2015-04-20 3:15 ` Yasuaki Ishimatsu
2015-04-20 3:15 ` Yasuaki Ishimatsu
2015-04-20 3:34 ` Xishi Qiu
2015-04-20 3:34 ` Xishi Qiu
2015-04-20 2:23 ` Yasuaki Ishimatsu
2015-04-20 2:23 ` Yasuaki Ishimatsu
2015-04-20 1:42 ` Gu Zheng
2015-04-20 1:42 ` Gu Zheng
2015-04-20 2:45 ` Xishi Qiu
2015-04-20 2:45 ` Xishi Qiu
2015-04-20 3:29 ` Yasuaki Ishimatsu
2015-04-20 3:29 ` Yasuaki Ishimatsu
2015-04-20 3:42 ` Xishi Qiu
2015-04-20 3:42 ` Xishi Qiu
2015-04-20 18:23 ` Yasuaki Ishimatsu
2015-04-20 18:23 ` Yasuaki Ishimatsu
2015-04-21 1:31 ` Xishi Qiu [this message]
2015-04-21 1:31 ` Xishi Qiu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5535A879.8050302@huawei.com \
--to=qiuxishi@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=guz.fnst@cn.fujitsu.com \
--cc=isimatu.yasuaki@jp.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=rientjes@google.com \
--cc=tangchen@cn.fujitsu.com \
--cc=xiexiuqi@huawei.com \
--cc=yasu.isimatu@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.