From: Xishi Qiu <qiuxishi@huawei.com>
To: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	Yinghai Lu <yinghai@kernel.org>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Toshi Kani <toshi.kani@hp.com>, Mel Gorman <mgorman@suse.de>,
	Tejun Heo <tj@kernel.org>, Xiexiuqi <xiexiuqi@huawei.com>,
	Hanjun Guo <guohanjun@huawei.com>
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
Date: Wed, 4 Mar 2015 15:01:38 +0800
Message-ID: <54F6ADD2.3080403@huawei.com>
In-Reply-To: <54F681A7.4050203@cn.fujitsu.com>

On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
> 
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
> 
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-removing a NUMA node, we clear its pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:			offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>> 				offline cpu and memory, then try_offline_node()
>>>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we stress-tested node hot-add/hot-remove and hit the following
>> call trace several times.
> 
> Thanks.
> 
>>
>> 	next_online_pgdat()
>> 		int nid = next_online_node(pgdat->node_id);  // it's here, pgdat is NULL
> 
> 	memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the contents of pgdat to 0; it does not free pgdat, so
> "pgdat is NULL" is strange here.
> But anyway, the bug is real, and we must fix it.

next_zone()
	pg_data_t *pgdat = zone->zone_pgdat;  // I think this pgdat is NULL, and NODE_DATA() is not NULL.
	...
	pgdat = next_online_pgdat(pgdat);
		int nid = next_online_node(pgdat->node_id);  // so here is the null pointer access

Thanks for your new patch, I'll test it.

Thanks,
Xishi Qiu
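
The distinction drawn above (NODE_DATA() stays non-NULL while
zone->zone_pgdat reads back as NULL) can be reproduced in a small
userspace sketch. It assumes, as in the kernel, that each struct zone is
embedded in its pgdat's node_zones[] array and carries a zone_pgdat
back-pointer. The trimmed-down structs and the mock_* helpers are
hypothetical stand-ins for illustration only, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NR_ZONES 3	/* illustrative value, not the kernel's config */

struct pglist_data;

/* Trimmed-down mock: only the back-pointer matters here. */
struct zone {
	struct pglist_data *zone_pgdat;	/* owning node */
};

/* Trimmed-down mock: the zones live inside the pgdat itself. */
struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
	int node_id;
};

/* Hypothetical helper: set up the node and its zone back-pointers. */
static void mock_pgdat_init(struct pglist_data *pgdat, int nid)
{
	int i;

	assert(pgdat != NULL);
	pgdat->node_id = nid;
	for (i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
}

/* Hypothetical helper: the same clearing step quoted in the thread. */
static void mock_offline_clear(struct pglist_data *pgdat)
{
	memset(pgdat, 0, sizeof(*pgdat));
}
```

After mock_offline_clear(), the pgdat pointer itself is unchanged, but a
zone pointer obtained earlier now has zone_pgdat == NULL, so a walker
that does pgdat = zone->zone_pgdat and then reads pgdat->node_id
dereferences NULL.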


