From: Xishi Qiu <qiuxishi@huawei.com>
To: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tang Chen <tangchen@cn.fujitsu.com>,
	Yinghai Lu <yinghai@kernel.org>, Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Toshi Kani <toshi.kani@hp.com>, Mel Gorman <mgorman@suse.de>,
	Tejun Heo <tj@kernel.org>, Xiexiuqi <xiexiuqi@huawei.com>,
	Hanjun Guo <guohanjun@huawei.com>
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
Date: Wed, 4 Mar 2015 10:22:10 +0800
Message-ID: <54F66C52.4070600@huawei.com>
In-Reply-To: <54F58AE3.50101@cn.fujitsu.com>

On 2015/3/3 18:20, Gu Zheng wrote:

> Hi Xishi,
> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
> 
>> When hot-remove a numa node, we will clear pgdat,
>> but is memset 0 safe in try_offline_node()?
> 
> It is not safe here. In fact, this is a temporary solution here.
> As you know, pgdat is accessed lock-less now, so protection
> mechanism (RCU?) is needed to make it completely safe here,
> but it seems a bit over-kill.
> 
>>
>> process A:			offline node XX:
>> for_each_populated_zone()
>> find online node XX
>> cond_resched()
>> 				offline cpu and memory, then try_offline_node()
>> 				node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>> access node XX's pgdat
>> NULL pointer access error
> 
> It's possible, but I did not meet this condition, did you?
> 

Yes. We stress-tested node hot-add/hot-remove and hit the following
call trace several times.

	next_online_pgdat()
		int nid = next_online_node(pgdat->node_id);  // faults here: pgdat is NULL

I added some printk calls, and they show that the pgdat above is exactly the offlined node's pgdat.
The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
stop_machine() would not resolve it either, because the loop body may call cond_resched().
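
To make the suspected sequence concrete, here is a minimal userspace sketch.
It is not kernel code: the structures are stripped-down stand-ins for
struct zone and pg_data_t, MAX_NR_ZONES is an arbitrary illustrative value,
and init_pgdat(), walker_sees_null_pgdat() and simulate_offline_race() are
hypothetical helpers. It replays the interleaving sequentially rather than
with real concurrency, and it assumes the walker keeps a zone pointer across
cond_resched() and reloads zone->zone_pgdat afterwards.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NR_ZONES 3	/* illustrative; the real value depends on the kernel config */

struct pgdat;

/* Stripped-down stand-ins for the kernel's struct zone and pg_data_t. */
struct zone {
	struct pgdat *zone_pgdat;	/* back-pointer to the owning node */
};

struct pgdat {
	struct zone node_zones[MAX_NR_ZONES];
	int node_id;
};

/* Hypothetical helper: make the node look like an online node. */
static void init_pgdat(struct pgdat *pgdat, int nid)
{
	int i;

	pgdat->node_id = nid;
	for (i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
}

/*
 * Hypothetical check: would a walker resuming from this zone load a NULL
 * pgdat? The lock-less iterators do not perform such a check.
 */
static int walker_sees_null_pgdat(const struct zone *zone)
{
	return zone->zone_pgdat == NULL;
}

/* Replays the interleaving in order; returns 1 if the walker would fault. */
int simulate_offline_race(void)
{
	struct pgdat node;
	struct zone *cursor;

	init_pgdat(&node, 1);

	/* Process A: the iterator stops on the last zone of the node ... */
	cursor = &node.node_zones[MAX_NR_ZONES - 1];
	if (walker_sees_null_pgdat(cursor))
		return -1;	/* unexpected: the node is still intact here */

	/* ... and sleeps in cond_resched(). Meanwhile the node goes offline. */
	memset(&node, 0, sizeof(node));

	/* Process A resumes and reloads zone->zone_pgdat from the stale zone. */
	return walker_sees_null_pgdat(cursor);
}
```

In this sketch simulate_offline_race() reports that the stale zone now has a
NULL zone_pgdat, which would match the oops below if that NULL is then passed
on to next_online_pgdat(): RDI is 0 and the faulting address is a small
offset from NULL.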

[ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
[ 1422.011086] IP: [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
[ 1422.011178] PGD 0 
[ 1422.011180] Oops: 0000 [#1] SMP 
[ 1422.011409] ACPI: Device does not support D3cold
[ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
[ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G           O 3.10.15-5885-euler0302 #1
[ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
[ 1422.012065] Workqueue: events vmstat_update
[ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
[ 1422.012165] RIP: 0010:[<ffffffff81126b91>]  [<ffffffff81126b91>] next_online_pgdat+0x1/0x50
[ 1422.012205] RSP: 0018:ffffa800d32afce8  EFLAGS: 00010286
[ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
[ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
[ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
[ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
[ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
[ 1422.012292] FS:  0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
[ 1422.012314] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
[ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1422.012328] Stack:
[ 1422.012328]  ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
[ 1422.012328]  ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
[ 1422.012328]  ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
[ 1422.012328] Call Trace:
[ 1422.012328]  [<ffffffff81126ca5>] ? next_zone+0xc5/0x150
[ 1422.012328]  [<ffffffff814b4314>] ? __schedule+0x544/0x780
[ 1422.012328]  [<ffffffff81128220>] refresh_cpu_vm_stats+0xd0/0x140
[ 1422.012328]  [<ffffffff811282a1>] vmstat_update+0x11/0x50
[ 1422.012328]  [<ffffffff81064c24>] process_one_work+0x194/0x3d0
[ 1422.012328]  [<ffffffff810660bb>] worker_thread+0x12b/0x410
[ 1422.012328]  [<ffffffff81065f90>] ? manage_workers+0x1a0/0x1a0
[ 1422.012328]  [<ffffffff8106ba66>] kthread+0xc6/0xd0
[ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70
[ 1422.012328]  [<ffffffff814be0ac>] ret_from_fork+0x7c/0xb0
[ 1422.012328]  [<ffffffff8106b9a0>] ? kthread_freezable_should_stop+0x70/0x70

Thanks,
Xishi Qiu

> Regards,
> Gu
> 
>>
>> Thanks,
>> Xishi Qiu
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
> 
> 
> 
> .
> 


