From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f169.google.com (mail-pd0-f169.google.com [209.85.192.169]) by kanga.kvack.org (Postfix) with ESMTP id 0B5746B0038 for ; Mon, 2 Mar 2015 22:30:52 -0500 (EST)
Received: by pdno5 with SMTP id o5so44806256pdn.8 for ; Mon, 02 Mar 2015 19:30:51 -0800 (PST)
Received: from szxga01-in.huawei.com ([119.145.14.64]) by mx.google.com with ESMTPS id sa3si7640323pac.27.2015.03.02.19.30.49 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 02 Mar 2015 19:30:51 -0800 (PST)
Message-ID: <54F52ACF.4030103@huawei.com>
Date: Tue, 3 Mar 2015 11:30:23 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: node-hotplug: is memset 0 safe in try_offline_node()?
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu
Cc: Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo

When hot-remove a numa node, we will clear pgdat,
but is memset 0 safe in try_offline_node()?

process A:                      offline node XX:
for_each_populated_zone()
find online node XX
cond_resched()
                                offline cpu and memory, then try_offline_node()
                                node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
access node XX's pgdat
NULL pointer access error

Thanks,
Xishi Qiu

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f171.google.com (mail-pd0-f171.google.com [209.85.192.171]) by kanga.kvack.org (Postfix) with ESMTP id 0F2E36B0038 for ; Tue, 3 Mar 2015 05:37:39 -0500 (EST)
Received: by pdbnh10 with SMTP id nh10so23843330pdb.3 for ; Tue, 03 Mar 2015 02:37:38 -0800 (PST)
Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id fn7si392415pdb.157.2015.03.03.02.37.37 for ; Tue, 03 Mar 2015 02:37:38 -0800 (PST)
Message-ID: <54F58AE3.50101@cn.fujitsu.com>
Date: Tue, 3 Mar 2015 18:20:19 +0800
From: Gu Zheng
MIME-Version: 1.0
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
References: <54F52ACF.4030103@huawei.com>
In-Reply-To: <54F52ACF.4030103@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Xishi Qiu
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo

Hi Xishi,
On 03/03/2015 11:30 AM, Xishi Qiu wrote:

> When hot-remove a numa node, we will clear pgdat,
> but is memset 0 safe in try_offline_node()?

It is not safe here. In fact, this is a temporary solution here.
As you know, pgdat is accessed lock-less now, so protection
mechanism (RCU？) is needed to make it completely safe here,
but it seems a bit over-kill.

>
> process A:                      offline node XX:
> for_each_populated_zone()
> find online node XX
> cond_resched()
>                                 offline cpu and memory, then try_offline_node()
>                                 node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
> access node XX's pgdat
> NULL pointer access error

It's possible, but I did not meet this condition, did you?

Regards,
Gu

>
> Thanks,
> Xishi Qiu
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ig0-f182.google.com (mail-ig0-f182.google.com [209.85.213.182]) by kanga.kvack.org (Postfix) with ESMTP id 1B49B6B0038 for ; Tue, 3 Mar 2015 21:26:03 -0500 (EST)
Received: by igjz20 with SMTP id z20so33395123igj.4 for ; Tue, 03 Mar 2015 18:26:02 -0800 (PST)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. [119.145.14.64]) by mx.google.com with ESMTPS id y137si3366670iod.20.2015.03.03.18.26.00 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 03 Mar 2015 18:26:02 -0800 (PST)
Message-ID: <54F66C52.4070600@huawei.com>
Date: Wed, 4 Mar 2015 10:22:10 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com>
In-Reply-To: <54F58AE3.50101@cn.fujitsu.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Gu Zheng
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo

On 2015/3/3 18:20, Gu Zheng wrote:

> Hi Xishi,
> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>
>> When hot-remove a numa node, we will clear pgdat,
>> but is memset 0 safe in try_offline_node()?
>
> It is not safe here. In fact, this is a temporary solution here.
> As you know, pgdat is accessed lock-less now, so protection
> mechanism (RCU？) is needed to make it completely safe here,
> but it seems a bit over-kill.
> >> >> process A: offline node XX: >> for_each_populated_zone() >> find online node XX >> cond_resched() >> offline cpu and memory, then try_offline_node() >> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) >> access node XX's pgdat >> NULL pointer access error > > It's possible, but I did not meet this condition, did you? > Yes, we test hot-add/hot-remove node with stress, and meet the following call trace several times. next_online_pgdat() int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL I add some printk, it shows the above pgdat is just the offline node's pgdat. The reason may be that for_each_zone() and for_each_populated_zone() are lock-less. And stop machine could not resolve it, because cond_resched() maybe in cyclical code. [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60 [ 1422.011086] IP: [] next_online_pgdat+0x1/0x50 [ 1422.011178] PGD 0 [ 1422.011180] Oops: 0000 [#1] SMP [ 1422.011409] ACPI: Device does not support D3cold [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. 
Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 [ 1422.012065] Workqueue: events vmstat_update [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 [ 1422.012165] RIP: 0010:[] [] next_online_pgdat+0x1/0x50 [ 1422.012205] RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 [ 1422.012292] FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 [ 1422.012314] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1422.012328] Stack: [ 1422.012328] ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314 [ 1422.012328] ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440 [ 1422.012328] ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038 [ 1422.012328] Call Trace: [ 1422.012328] [] ? next_zone+0xc5/0x150 [ 1422.012328] [] ? __schedule+0x544/0x780 [ 1422.012328] [] refresh_cpu_vm_stats+0xd0/0x140 [ 1422.012328] [] vmstat_update+0x11/0x50 [ 1422.012328] [] process_one_work+0x194/0x3d0 [ 1422.012328] [] worker_thread+0x12b/0x410 [ 1422.012328] [] ? manage_workers+0x1a0/0x1a0 [ 1422.012328] [] kthread+0xc6/0xd0 [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70 [ 1422.012328] [] ret_from_fork+0x7c/0xb0 [ 1422.012328] [] ? 
kthread_freezable_should_stop+0x70/0x70 Thanks, Xishi Qiu > Regards, > Gu > >> >> Thanks, >> Xishi Qiu >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org >> > > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f172.google.com (mail-pd0-f172.google.com [209.85.192.172]) by kanga.kvack.org (Postfix) with ESMTP id 890A16B0038 for ; Tue, 3 Mar 2015 21:53:17 -0500 (EST) Received: by pdno5 with SMTP id o5so53529285pdn.8 for ; Tue, 03 Mar 2015 18:53:17 -0800 (PST) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66]) by mx.google.com with ESMTPS id cx1si3184009pad.152.2015.03.03.18.53.15 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 03 Mar 2015 18:53:16 -0800 (PST) Message-ID: <54F67376.8050001@huawei.com> Date: Wed, 4 Mar 2015 10:52:38 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> In-Reply-To: <54F66C52.4070600@huawei.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Gu Zheng Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan On 2015/3/4 10:22, Xishi Qiu wrote: > On 2015/3/3 18:20, Gu Zheng wrote: > >> Hi Xishi, >> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >> >>> When hot-remove a numa node, we will clear pgdat, >>> but is memset 0 safe in try_offline_node()? >> >> It is not safe here. In fact, this is a temporary solution here. 
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCU？) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>

Hi Gu,

Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
I find this will be fine in the stress test except the warning
when hot-add memory.

Thanks,
Xishi Qiu

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f176.google.com (mail-pd0-f176.google.com [209.85.192.176]) by kanga.kvack.org (Postfix) with ESMTP id BA8326B0038 for ; Tue, 3 Mar 2015 23:10:33 -0500 (EST)
Received: by pdbnh10 with SMTP id nh10so30390540pdb.3 for ; Tue, 03 Mar 2015 20:10:33 -0800 (PST)
Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id gs2si3425349pac.121.2015.03.03.20.10.31 for ; Tue, 03 Mar 2015 20:10:32 -0800 (PST)
Message-ID: <54F681A7.4050203@cn.fujitsu.com>
Date: Wed, 4 Mar 2015 11:53:11 +0800
From: Gu Zheng
MIME-Version: 1.0
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com>
In-Reply-To: <54F66C52.4070600@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Xishi Qiu
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo

Hi Xishi,

On 03/04/2015 10:22 AM, Xishi Qiu wrote:

> On 2015/3/3 18:20, Gu Zheng wrote:
>
>> Hi Xishi,
>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>
>>> When hot-remove a numa node, we will clear pgdat,
>>> but is memset 0 safe in try_offline_node()?
>>
>> It is not safe here. In fact, this is a temporary solution here.
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCU？) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>
>>>
>>> process A:                      offline node XX:
>>> for_each_populated_zone()
>>> find online node XX
>>> cond_resched()
>>>                                 offline cpu and memory, then try_offline_node()
>>>                                 node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>> access node XX's pgdat
>>> NULL pointer access error
>>
>> It's possible, but I did not meet this condition, did you?
>>
>
> Yes, we test hot-add/hot-remove node with stress, and meet the following
> call trace several times.

Thanks.

>
> next_online_pgdat()
> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL

memset(pgdat, 0, sizeof(*pgdat));
This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
NULL* is strange here.
But anyway, the bug is real, we must fix it.

Regards,
Gu

>
> I add some printk, it shows the above pgdat is just the offline node's pgdat.
> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
> And stop machine could not resolve it, because cond_resched() maybe in cyclical code.
>
> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
> [ 1422.011086] IP: [] next_online_pgdat+0x1/0x50
> [ 1422.011178] PGD 0
> [ 1422.011180] Oops: 0000 [#1] SMP
> [ 1422.011409] ACPI: Device does not support D3cold
> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1
> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. Huawei N1/Huawei N1, BIOS V100R001 03/02/2015
> [ 1422.012065] Workqueue: events vmstat_update
> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000
> [ 1422.012165] RIP: 0010:[] [] next_online_pgdat+0x1/0x50
> [ 1422.012205] RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286
> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082
> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000
> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96
> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440
> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800
> [ 1422.012292] FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000
> [ 1422.012314] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1422.012328] Stack:
> [ 1422.012328] ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314
> [ 1422.012328] ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440
> [ 1422.012328] ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038
> [ 1422.012328] Call Trace:
> [ 1422.012328] [] ? next_zone+0xc5/0x150
> [ 1422.012328] [] ? __schedule+0x544/0x780
> [ 1422.012328] [] refresh_cpu_vm_stats+0xd0/0x140
> [ 1422.012328] [] vmstat_update+0x11/0x50
> [ 1422.012328] [] process_one_work+0x194/0x3d0
> [ 1422.012328] [] worker_thread+0x12b/0x410
> [ 1422.012328] [] ? manage_workers+0x1a0/0x1a0
> [ 1422.012328] [] kthread+0xc6/0xd0
> [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70
> [ 1422.012328] [] ret_from_fork+0x7c/0xb0
> [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70
>
> Thanks,
> Xishi Qiu
>
>> Regards,
>> Gu
>>
>>>
>>> Thanks,
>>> Xishi Qiu
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org. For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: email@kvack.org
>>>
>>
>>
>>
>> .
>>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> .
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f48.google.com (mail-pa0-f48.google.com [209.85.220.48]) by kanga.kvack.org (Postfix) with ESMTP id 05DBE6B0038 for ; Tue, 3 Mar 2015 23:13:52 -0500 (EST)
Received: by padet14 with SMTP id et14so33432513pad.0 for ; Tue, 03 Mar 2015 20:13:51 -0800 (PST)
Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id sp8si3425306pac.126.2015.03.03.20.13.50 for ; Tue, 03 Mar 2015 20:13:51 -0800 (PST)
Message-ID: <54F68270.5000203@cn.fujitsu.com>
Date: Wed, 4 Mar 2015 11:56:32 +0800
From: Gu Zheng
MIME-Version: 1.0
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com>
In-Reply-To: <54F67376.8050001@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Xishi Qiu
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan

Hi Xishi,
On 03/04/2015 10:52 AM, Xishi Qiu wrote:

> On 2015/3/4 10:22, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU？) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>
> Hi Gu,
>
> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
> I find this will be fine in the stress test except the warning
> when hot-add memory.

As you see, it will trigger the warning in free_area_init_node().
Could you try the following patch? It will reset the pgdat before reuse it.

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1778628..0717649 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 			return NULL;
 
 		arch_refresh_nodedata(nid, pgdat);
+	} else {
+		/* Reset the pgdat to reuse */
+		memset(pgdat, 0, sizeof(*pgdat));
 	}
 
 	/* we can use NODE_DATA(nid) from here */
@@ -2021,15 +2024,6 @@ void try_offline_node(int nid)
 
 	/* notify that the node is down */
 	call_node_notify(NODE_DOWN, (void *)(long)nid);
-
-	/*
-	 * Since there is no way to guarentee the address of pgdat/zone is not
-	 * on stack of any kernel threads or used by other kernel objects
-	 * without reference counting or other symchronizing method, do not
-	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
-	 * the memory when the node is online again.
-	 */
-	memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);

>
> Thanks,
> Xishi Qiu
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> .
>

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f49.google.com (mail-oi0-f49.google.com [209.85.218.49]) by kanga.kvack.org (Postfix) with ESMTP id 34C866B0038 for ; Wed, 4 Mar 2015 02:02:40 -0500 (EST)
Received: by oifu20 with SMTP id u20so4057245oif.11 for ; Tue, 03 Mar 2015 23:02:40 -0800 (PST)
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com.
 [119.145.14.65]) by mx.google.com with ESMTPS id e7si1647219obo.17.2015.03.03.23.02.31 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 03 Mar 2015 23:02:39 -0800 (PST)
Message-ID: <54F6ADD2.3080403@huawei.com>
Date: Wed, 4 Mar 2015 15:01:38 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F681A7.4050203@cn.fujitsu.com>
In-Reply-To: <54F681A7.4050203@cn.fujitsu.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Gu Zheng
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo

On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
>
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU？) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:                      offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>>                                 offline cpu and memory, then try_offline_node()
>>>>                                 node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
>
> Thanks.
> >> >> next_online_pgdat() >> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL > > memset(pgdat, 0, sizeof(*pgdat)); > This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is > NULL* is strange here. > But anyway, the bug is real, we must fix it. next_zone() pg_data_t *pgdat = zone->zone_pgdat; // I think this pgdat is NULL, and NODE_DATA() is not NULL. ... pgdat = next_online_pgdat(pgdat); int nid = next_online_node(pgdat->node_id); // so here is the null pointer access Thanks for your new patch, I'll test it. Thanks, Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f173.google.com (mail-ob0-f173.google.com [209.85.214.173]) by kanga.kvack.org (Postfix) with ESMTP id 3515A6B0038 for ; Wed, 4 Mar 2015 03:08:40 -0500 (EST) Received: by obcwp18 with SMTP id wp18so5276691obc.8 for ; Wed, 04 Mar 2015 00:08:40 -0800 (PST) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. [58.251.152.64]) by mx.google.com with ESMTPS id os8si1668937oeb.103.2015.03.04.00.08.08 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 04 Mar 2015 00:08:39 -0800 (PST) Message-ID: <54F6BC43.3000509@huawei.com> Date: Wed, 4 Mar 2015 16:03:15 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com> <54F68270.5000203@cn.fujitsu.com>
In-Reply-To: <54F68270.5000203@cn.fujitsu.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Gu Zheng
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan

On 2015/3/4 11:56, Gu Zheng wrote:

> Hi Xishi,
> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
>
>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>
>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>
>>>> Hi Xishi,
>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>
>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>> but is memset 0 safe in try_offline_node()?
>>>>
>>>> It is not safe here. In fact, this is a temporary solution here.
>>>> As you know, pgdat is accessed lock-less now, so protection
>>>> mechanism (RCU？) is needed to make it completely safe here,
>>>> but it seems a bit over-kill.
>>>>
>>
>> Hi Gu,
>>
>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>> I find this will be fine in the stress test except the warning
>> when hot-add memory.
>
> As you see, it will trigger the warning in free_area_init_node().
> Could you try the following patch? It will reset the pgdat before reuse it.
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1778628..0717649 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>  			return NULL;
>
>  		arch_refresh_nodedata(nid, pgdat);
> +	} else {
> +		/* Reset the pgdat to reuse */
> +		memset(pgdat, 0, sizeof(*pgdat));
>  	}

Hi Gu,

If schedule last a long time, next_zone may be still access the pgdat here,
so it is not safe enough, right?
Thanks Xishi Qiu > > /* we can use NODE_DATA(nid) from here */ > @@ -2021,15 +2024,6 @@ void try_offline_node(int nid) > > /* notify that the node is down */ > call_node_notify(NODE_DOWN, (void *)(long)nid); > - > - /* > - * Since there is no way to guarentee the address of pgdat/zone is not > - * on stack of any kernel threads or used by other kernel objects > - * without reference counting or other symchronizing method, do not > - * reset node_data and free pgdat here. Just reset it to 0 and reuse > - * the memory when the node is online again. > - */ > - memset(pgdat, 0, sizeof(*pgdat)); > } > EXPORT_SYMBOL(try_offline_node); > > >> >> Thanks, >> Xishi Qiu >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> . >> > > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f178.google.com (mail-ob0-f178.google.com [209.85.214.178]) by kanga.kvack.org (Postfix) with ESMTP id 4B50F6B0038 for ; Wed, 4 Mar 2015 03:32:21 -0500 (EST) Received: by obcuz6 with SMTP id uz6so5365329obc.9 for ; Wed, 04 Mar 2015 00:32:21 -0800 (PST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com. [119.145.14.65]) by mx.google.com with ESMTPS id e7si1748441obf.19.2015.03.04.00.32.13 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 04 Mar 2015 00:32:20 -0800 (PST) Message-ID: <54F6C2E6.4030500@huawei.com> Date: Wed, 4 Mar 2015 16:31:34 +0800 From: Xie XiuQi MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F681A7.4050203@cn.fujitsu.com>
In-Reply-To: <54F681A7.4050203@cn.fujitsu.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Gu Zheng , Xishi Qiu
Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Hanjun Guo

On 2015/3/4 11:53, Gu Zheng wrote:

> Hi Xishi,
>
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU？) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:                      offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>>                                 offline cpu and memory, then try_offline_node()
>>>>                                 node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
>
> Thanks.
>
>>
>> next_online_pgdat()
>> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL
>
> memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.

Hi Gu,

This pgdat isn't 0, but pgdat->zone[i]->zone_pgdat is 0.
So pgdat is 0 in next_zone().
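The chain described above can be replayed in userspace. What follows is only a minimal sketch under stated assumptions: struct pgdat and struct zone are cut-down stand-ins for the kernel's pg_data_t and struct zone, the MAX_NR_ZONES value is arbitrary, and the helper names (init_node, walker_load_pgdat, walker_sees_null_after_offline) are hypothetical and do not exist in the kernel. It shows that zeroing the pgdat also zeroes the zone_pgdat back-pointer embedded in each of its zones, so a walker that still holds a pointer to one of those zones loads NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NR_ZONES 4	/* illustrative; the kernel value depends on config */

struct pgdat;

/* Cut-down stand-in for the kernel's struct zone. */
struct zone {
	struct pgdat *zone_pgdat;	/* back-pointer to the owning node */
};

/* Cut-down stand-in for pg_data_t: the zones are embedded in the node. */
struct pgdat {
	struct zone node_zones[MAX_NR_ZONES];
	int node_id;
};

/* Set up the back-pointers the way an online node would have them. */
static void init_node(struct pgdat *pgdat, int nid)
{
	assert(pgdat != NULL);
	pgdat->node_id = nid;
	for (int i = 0; i < MAX_NR_ZONES; i++)
		pgdat->node_zones[i].zone_pgdat = pgdat;
}

/*
 * First step of next_zone(): load the back-pointer from the zone the
 * walker is currently on. This sketch returns it instead of using it,
 * so the caller can inspect the value.
 */
static struct pgdat *walker_load_pgdat(const struct zone *zone)
{
	return zone->zone_pgdat;
}

/*
 * Replay the interleaving: the walker stops on the node's last zone,
 * the node is "offlined" by zeroing its pgdat, then the walker resumes.
 * Returns 1 if the resumed walker sees a NULL pgdat, 0 otherwise.
 */
int walker_sees_null_after_offline(void)
{
	static struct pgdat node;
	struct zone *cursor;

	init_node(&node, 1);
	cursor = &node.node_zones[MAX_NR_ZONES - 1];	/* walker sleeps here */

	/* Stand-in for the memset() in try_offline_node(). */
	memset(&node, 0, sizeof(node));

	/* All-bits-zero compares equal to NULL on mainstream platforms. */
	return walker_load_pgdat(cursor) == NULL;
}
```

In this model the pgdat memory itself stays valid, which matches the observation that NODE_DATA() is not NULL; only the back-pointer read through the stale zone is lost.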
-- /* * next_zone - helper magic for for_each_zone() */ struct zone *next_zone(struct zone *zone) { pg_data_t *pgdat = zone->zone_pgdat; if (zone < pgdat->node_zones + MAX_NR_ZONES - 1) zone++; else { pgdat = next_online_pgdat(pgdat); if (pgdat) zone = pgdat->node_zones; else zone = NULL; } return zone; } > But anyway, the bug is real, we must fix it. > > Regards, > Gu > >> >> I add some printk, it shows the above pgdat is just the offline node's pgdat. >> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less. >> And stop machine could not resolve it, because cond_resched() maybe in cyclical code. >> >> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60 >> [ 1422.011086] IP: [] next_online_pgdat+0x1/0x50 >> [ 1422.011178] PGD 0 >> [ 1422.011180] Oops: 0000 [#1] SMP >> [ 1422.011409] ACPI: Device does not support D3cold >> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] >> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 >> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. 
Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 >> [ 1422.012065] Workqueue: events vmstat_update >> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 >> [ 1422.012165] RIP: 0010:[] [] next_online_pgdat+0x1/0x50 >> [ 1422.012205] RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 >> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 >> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 >> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 >> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 >> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 >> [ 1422.012292] FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 >> [ 1422.012314] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 >> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 1422.012328] Stack: >> [ 1422.012328] ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314 >> [ 1422.012328] ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440 >> [ 1422.012328] ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038 >> [ 1422.012328] Call Trace: >> [ 1422.012328] [] ? next_zone+0xc5/0x150 >> [ 1422.012328] [] ? __schedule+0x544/0x780 >> [ 1422.012328] [] refresh_cpu_vm_stats+0xd0/0x140 >> [ 1422.012328] [] vmstat_update+0x11/0x50 >> [ 1422.012328] [] process_one_work+0x194/0x3d0 >> [ 1422.012328] [] worker_thread+0x12b/0x410 >> [ 1422.012328] [] ? manage_workers+0x1a0/0x1a0 >> [ 1422.012328] [] kthread+0xc6/0xd0 >> [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70 >> [ 1422.012328] [] ret_from_fork+0x7c/0xb0 >> [ 1422.012328] [] ? 
kthread_freezable_should_stop+0x70/0x70 >> >> Thanks, >> Xishi Qiu >> >>> Regards, >>> Gu >>> >>>> >>>> Thanks, >>>> Xishi Qiu >>>> >>>> -- >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>> the body to majordomo@kvack.org. For more info on Linux MM, >>>> see: http://www.linux-mm.org/ . >>>> Don't email: email@kvack.org >>>> >>> >>> >>> >>> . >>> >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> . >> > > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f180.google.com (mail-pd0-f180.google.com [209.85.192.180]) by kanga.kvack.org (Postfix) with ESMTP id 885856B0038 for ; Wed, 4 Mar 2015 04:04:05 -0500 (EST) Received: by pdbnh10 with SMTP id nh10so32221106pdb.3 for ; Wed, 04 Mar 2015 01:04:05 -0800 (PST) Received: from mgwkm02.jp.fujitsu.com (mgwkm02.jp.fujitsu.com. [202.219.69.169]) by mx.google.com with ESMTPS id hn4si4139348pbb.173.2015.03.04.01.04.03 for (version=TLSv1.2 cipher=AES128-GCM-SHA256 bits=128/128); Wed, 04 Mar 2015 01:04:04 -0800 (PST) Received: from g01jpfmpwkw03.exch.g01.fujitsu.local (g01jpfmpwkw03.exch.g01.fujitsu.local [10.0.193.57]) by kw-mxq.gw.nic.fujitsu.com (Postfix) with ESMTP id 482FDAC0A2A for ; Wed, 4 Mar 2015 18:04:01 +0900 (JST) Message-ID: <54F6C809.1080709@jp.fujitsu.com> Date: Wed, 4 Mar 2015 17:53:29 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com> <54F68270.5000203@cn.fujitsu.com> <54F6BC43.3000509@huawei.com> In-Reply-To: <54F6BC43.3000509@huawei.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu , Gu Zheng Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan On 2015/03/04 17:03, Xishi Qiu wrote: > On 2015/3/4 11:56, Gu Zheng wrote: > >> Hi Xishi, >> On 03/04/2015 10:52 AM, Xishi Qiu wrote: >> >>> On 2015/3/4 10:22, Xishi Qiu wrote: >>> >>>> On 2015/3/3 18:20, Gu Zheng wrote: >>>> >>>>> Hi Xishi, >>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>>>> >>>>>> When hot-remove a numa node, we will clear pgdat, >>>>>> but is memset 0 safe in try_offline_node()? >>>>> >>>>> It is not safe here. In fact, this is a temporary solution here. >>>>> As you know, pgdat is accessed lock-less now, so protection >>>>> mechanism (RCU?) is needed to make it completely safe here, >>>>> but it seems a bit over-kill. >>>>> >>> >>> Hi Gu, >>> >>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ? >>> I find this will be fine in the stress test except the warning >>> when hot-add memory. >> >> As you see, it will trigger the warning in free_area_init_node(). >> Could you try the following patch? It will reset the pgdat before reuse it.
>> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 1778628..0717649 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >> return NULL; >> >> arch_refresh_nodedata(nid, pgdat); >> + } else { >> + /* Reset the pgdat to reuse */ >> + memset(pgdat, 0, sizeof(*pgdat)); >> } > > Hi Gu, > > If schedule last a long time, next_zone may be still access the pgdat here, > so it is not safe enough, right? > How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than memset() ? It seems breaking pointer information in pgdat is not a choice. Just proper "values" should be reset. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f174.google.com (mail-pd0-f174.google.com [209.85.192.174]) by kanga.kvack.org (Postfix) with ESMTP id 6B9B56B0038 for ; Wed, 4 Mar 2015 05:11:21 -0500 (EST) Received: by pdjy10 with SMTP id y10so56314917pdj.6 for ; Wed, 04 Mar 2015 02:11:21 -0800 (PST) Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id y4si4567610pdl.50.2015.03.04.02.11.17 for ; Wed, 04 Mar 2015 02:11:20 -0800 (PST) Message-ID: <54F6D637.6040705@cn.fujitsu.com> Date: Wed, 4 Mar 2015 17:53:59 +0800 From: Gu Zheng MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com> <54F68270.5000203@cn.fujitsu.com> <54F6BC43.3000509@huawei.com> <54F6C809.1080709@jp.fujitsu.com> In-Reply-To: <54F6C809.1080709@jp.fujitsu.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki , Xishi Qiu Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan , Taku Izumi On 03/04/2015 04:53 PM, Kamezawa Hiroyuki wrote: > On 2015/03/04 17:03, Xishi Qiu wrote: >> On 2015/3/4 11:56, Gu Zheng wrote: >> >>> Hi Xishi, >>> On 03/04/2015 10:52 AM, Xishi Qiu wrote: >>> >>>> On 2015/3/4 10:22, Xishi Qiu wrote: >>>> >>>>> On 2015/3/3 18:20, Gu Zheng wrote: >>>>> >>>>>> Hi Xishi, >>>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>>>>> >>>>>>> When hot-remove a numa node, we will clear pgdat, >>>>>>> but is memset 0 safe in try_offline_node()? >>>>>> >>>>>> It is not safe here. In fact, this is a temporary solution here. >>>>>> As you know, pgdat is accessed lock-less now, so protection >>>>>> mechanism (RCU?) is needed to make it completely safe here, >>>>>> but it seems a bit over-kill. >>>>>> >>>> >>>> Hi Gu, >>>> >>>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ? >>>> I find this will be fine in the stress test except the warning >>>> when hot-add memory. >>> >>> As you see, it will trigger the warning in free_area_init_node(). >>> Could you try the following patch? It will reset the pgdat before reuse it.
>>> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >>> index 1778628..0717649 100644 >>> --- a/mm/memory_hotplug.c >>> +++ b/mm/memory_hotplug.c >>> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >>> return NULL; >>> >>> arch_refresh_nodedata(nid, pgdat); >>> + } else { >>> + /* Reset the pgdat to reuse */ >>> + memset(pgdat, 0, sizeof(*pgdat)); >>> } >> >> Hi Gu, >> >> If schedule last a long time, next_zone may be still access the pgdat here, >> so it is not safe enough, right? Hi Xishi, IMO, the scheduled time is rather short if compares with the time gap between hot remove and hot re-add a node, so we can say it is safe here. >> > > How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than > memset() ? > > It seems breaking pointer information in pgdat is not a choice. > Just proper "values" should be reset. Anyway, sounds reasonable. Best regards, Gu > > Thanks, > -Kame > > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f171.google.com (mail-pd0-f171.google.com [209.85.192.171]) by kanga.kvack.org (Postfix) with ESMTP id 31C496B006E for ; Thu, 5 Mar 2015 03:43:29 -0500 (EST) Received: by pdev10 with SMTP id v10so4220492pde.13 for ; Thu, 05 Mar 2015 00:43:28 -0800 (PST) Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id f3si8545010pdd.80.2015.03.05.00.43.27 for ; Thu, 05 Mar 2015 00:43:28 -0800 (PST) Message-ID: <54F81322.8010202@cn.fujitsu.com> Date: Thu, 5 Mar 2015 16:26:10 +0800 From: Gu Zheng MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()?
References: <54F52ACF.4030103@huawei.com> In-Reply-To: <54F52ACF.4030103@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo Hi Xishi, Could you please try the following one? It postpones the reset of obsolete pgdat from try_offline_node() to hotadd_new_pgdat(), and just resetting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the whole reset by memset() as Kame suggested. Regards, Gu --- mm/memory_hotplug.c | 13 ++++--------- 1 files changed, 4 insertions(+), 9 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 1778628..c17eebf 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) return NULL; arch_refresh_nodedata(nid, pgdat); + } else { + /* Reset the nr_zones and classzone_idx to 0 before reuse */ + pgdat->nr_zones = 0; + pgdat->classzone_idx = 0; } /* we can use NODE_DATA(nid) from here */ @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) /* notify that the node is down */ call_node_notify(NODE_DOWN, (void *)(long)nid); - - /* - * Since there is no way to guarentee the address of pgdat/zone is not - * on stack of any kernel threads or used by other kernel objects - * without reference counting or other symchronizing method, do not - * reset node_data and free pgdat here. Just reset it to 0 and reuse - * the memory when the node is online again. - */ - memset(pgdat, 0, sizeof(*pgdat)); } EXPORT_SYMBOL(try_offline_node); -- 1.7.7 On 03/03/2015 11:30 AM, Xishi Qiu wrote: > When hot-remove a numa node, we will clear pgdat, > but is memset 0 safe in try_offline_node()? 
> > process A: offline node XX: > for_each_populated_zone() > find online node XX > cond_resched() > offline cpu and memory, then try_offline_node() > node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) > access node XX's pgdat > NULL pointer access error > > Thanks, > Xishi Qiu > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f178.google.com (mail-ob0-f178.google.com [209.85.214.178]) by kanga.kvack.org (Postfix) with ESMTP id 8FBAD6B0038 for ; Thu, 5 Mar 2015 04:40:25 -0500 (EST) Received: by obbgq1 with SMTP id gq1so12734907obb.2 for ; Thu, 05 Mar 2015 01:40:25 -0800 (PST) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. [58.251.152.64]) by mx.google.com with ESMTPS id th1si3667392obc.63.2015.03.05.01.39.59 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 05 Mar 2015 01:40:24 -0800 (PST) Message-ID: <54F8243D.7020809@huawei.com> Date: Thu, 5 Mar 2015 17:39:09 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> In-Reply-To: <54F81322.8010202@cn.fujitsu.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Gu Zheng Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki On 2015/3/5 16:26, Gu Zheng wrote: > Hi Xishi, > Could you please try the following one? 
> It postpones the reset of obsolete pgdat from try_offline_node() to > hotadd_new_pgdat(), and just resetting pgdat->nr_zones and > pgdat->classzone_idx to be 0 rather than the whole reset by memset() > as Kame suggested. > > Regards, > Gu > > --- > mm/memory_hotplug.c | 13 ++++--------- > 1 files changed, 4 insertions(+), 9 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 1778628..c17eebf 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > return NULL; > > arch_refresh_nodedata(nid, pgdat); > + } else { > + /* Reset the nr_zones and classzone_idx to 0 before reuse */ > + pgdat->nr_zones = 0; > + pgdat->classzone_idx = 0; Hi Gu, This is just to avoid the warning, I think it's no meaning. Here is the changlog from the original patch: commit 88fdf75d1bb51d85ba00c466391770056d44bc03 ... Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero when it is allocated. Arch code and memory hotplug already initiailize pg_data_t. So this warning should never happen. I select fields *randomly* near the beginning, middle and end of pg_data_t for checking. ... Thanks, Xishi Qiu > } > > /* we can use NODE_DATA(nid) from here */ > @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) > > /* notify that the node is down */ > call_node_notify(NODE_DOWN, (void *)(long)nid); > - > - /* > - * Since there is no way to guarentee the address of pgdat/zone is not > - * on stack of any kernel threads or used by other kernel objects > - * without reference counting or other symchronizing method, do not > - * reset node_data and free pgdat here. Just reset it to 0 and reuse > - * the memory when the node is online again. > - */ > - memset(pgdat, 0, sizeof(*pgdat)); > } > EXPORT_SYMBOL(try_offline_node); > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f46.google.com (mail-pa0-f46.google.com [209.85.220.46]) by kanga.kvack.org (Postfix) with ESMTP id DAAAF6B0038 for ; Thu, 5 Mar 2015 05:03:08 -0500 (EST) Received: by padfa1 with SMTP id fa1so41303400pad.9 for ; Thu, 05 Mar 2015 02:03:08 -0800 (PST) Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id de6si1548627pdb.184.2015.03.05.02.03.06 for ; Thu, 05 Mar 2015 02:03:08 -0800 (PST) Message-ID: <54F825CB.8040402@cn.fujitsu.com> Date: Thu, 5 Mar 2015 17:45:47 +0800 From: Gu Zheng MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> <54F8243D.7020809@huawei.com> In-Reply-To: <54F8243D.7020809@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki Hi Xishi, On 03/05/2015 05:39 PM, Xishi Qiu wrote: > On 2015/3/5 16:26, Gu Zheng wrote: > >> Hi Xishi, >> Could you please try the following one? >> It postpones the reset of obsolete pgdat from try_offline_node() to >> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and >> pgdat->classzone_idx to be 0 rather than the whole reset by memset() >> as Kame suggested. 
>> >> Regards, >> Gu >> >> --- >> mm/memory_hotplug.c | 13 ++++--------- >> 1 files changed, 4 insertions(+), 9 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 1778628..c17eebf 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >> return NULL; >> >> arch_refresh_nodedata(nid, pgdat); >> + } else { >> + /* Reset the nr_zones and classzone_idx to 0 before reuse */ >> + pgdat->nr_zones = 0; >> + pgdat->classzone_idx = 0; > > Hi Gu, > > This is just to avoid the warning, I think it's no meaning. Can not agree. The key point here is postponing the reset of obsolete pgdat to the time we want to reuse it to avoid the effect(Oops: 0000 as you mentioned), and avoiding warning is the minor benefit, though it is also important. > Here is the changlog from the original patch: > > commit 88fdf75d1bb51d85ba00c466391770056d44bc03 > ... > Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero > when it is allocated. Arch code and memory hotplug already initiailize > pg_data_t. So this warning should never happen. I select fields *randomly* > near the beginning, middle and end of pg_data_t for checking. > ... There was not hot remove node that time, so it seems did not consider the *reuse* case, but anyway, we should not break it here. Regards, Gu > > Thanks, > Xishi Qiu > >> } >> >> /* we can use NODE_DATA(nid) from here */ >> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) >> >> /* notify that the node is down */ >> call_node_notify(NODE_DOWN, (void *)(long)nid); >> - >> - /* >> - * Since there is no way to guarentee the address of pgdat/zone is not >> - * on stack of any kernel threads or used by other kernel objects >> - * without reference counting or other symchronizing method, do not >> - * reset node_data and free pgdat here. Just reset it to 0 and reuse >> - * the memory when the node is online again. 
>> - */ >> - memset(pgdat, 0, sizeof(*pgdat)); >> } >> EXPORT_SYMBOL(try_offline_node); >> > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f178.google.com (mail-pd0-f178.google.com [209.85.192.178]) by kanga.kvack.org (Postfix) with ESMTP id 9FEC790002E for ; Tue, 10 Mar 2015 21:29:32 -0400 (EDT) Received: by pdbfl12 with SMTP id fl12so6723757pdb.9 for ; Tue, 10 Mar 2015 18:29:32 -0700 (PDT) Received: from heian.cn.fujitsu.com ([59.151.112.132]) by mx.google.com with ESMTP id fn4si511062pab.203.2015.03.10.18.29.30 for ; Tue, 10 Mar 2015 18:29:31 -0700 (PDT) Message-ID: <54FF9662.8080303@cn.fujitsu.com> Date: Wed, 11 Mar 2015 09:12:02 +0800 From: Gu Zheng MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> <54F8243D.7020809@huawei.com> In-Reply-To: <54F8243D.7020809@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki Hi Xishi, What is the condition of this problem now? Regards, Gu On 03/05/2015 05:39 PM, Xishi Qiu wrote: > On 2015/3/5 16:26, Gu Zheng wrote: > >> Hi Xishi, >> Could you please try the following one? 
>> It postpones the reset of obsolete pgdat from try_offline_node() to >> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and >> pgdat->classzone_idx to be 0 rather than the whole reset by memset() >> as Kame suggested. >> >> Regards, >> Gu >> >> --- >> mm/memory_hotplug.c | 13 ++++--------- >> 1 files changed, 4 insertions(+), 9 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 1778628..c17eebf 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >> return NULL; >> >> arch_refresh_nodedata(nid, pgdat); >> + } else { >> + /* Reset the nr_zones and classzone_idx to 0 before reuse */ >> + pgdat->nr_zones = 0; >> + pgdat->classzone_idx = 0; > > Hi Gu, > > This is just to avoid the warning, I think it's no meaning. > Here is the changlog from the original patch: > > commit 88fdf75d1bb51d85ba00c466391770056d44bc03 > ... > Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero > when it is allocated. Arch code and memory hotplug already initiailize > pg_data_t. So this warning should never happen. I select fields *randomly* > near the beginning, middle and end of pg_data_t for checking. > ... > > Thanks, > Xishi Qiu > >> } >> >> /* we can use NODE_DATA(nid) from here */ >> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) >> >> /* notify that the node is down */ >> call_node_notify(NODE_DOWN, (void *)(long)nid); >> - >> - /* >> - * Since there is no way to guarentee the address of pgdat/zone is not >> - * on stack of any kernel threads or used by other kernel objects >> - * without reference counting or other symchronizing method, do not >> - * reset node_data and free pgdat here. Just reset it to 0 and reuse >> - * the memory when the node is online again. 
>> - */ >> - memset(pgdat, 0, sizeof(*pgdat)); >> } >> EXPORT_SYMBOL(try_offline_node); >> > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f46.google.com (mail-oi0-f46.google.com [209.85.218.46]) by kanga.kvack.org (Postfix) with ESMTP id 92AFA90002E for ; Tue, 10 Mar 2015 22:53:58 -0400 (EDT) Received: by oifu20 with SMTP id u20so5375336oif.11 for ; Tue, 10 Mar 2015 19:53:58 -0700 (PDT) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66]) by mx.google.com with ESMTPS id m6si979832oel.34.2015.03.10.19.53.55 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 10 Mar 2015 19:53:57 -0700 (PDT) Message-ID: <54FFADB6.60604@huawei.com> Date: Wed, 11 Mar 2015 10:51:34 +0800 From: Xie XiuQi MIME-Version: 1.0 Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> <54F8243D.7020809@huawei.com> <54FF9662.8080303@cn.fujitsu.com> In-Reply-To: <54FF9662.8080303@cn.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Gu Zheng , Xishi Qiu Cc: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki On 2015/3/11 9:12, Gu Zheng wrote: > Hi Xishi, > > What is the condition of this problem now? Hi Gu, I have no machine to do this test now. But I've tested the patch "just remove memset 0" more than 20 hours last week, it's OK. 
Thanks, Xie XiuQi > > Regards, > Gu > On 03/05/2015 05:39 PM, Xishi Qiu wrote: > >> On 2015/3/5 16:26, Gu Zheng wrote: >> >>> Hi Xishi, >>> Could you please try the following one? >>> It postpones the reset of obsolete pgdat from try_offline_node() to >>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and >>> pgdat->classzone_idx to be 0 rather than the whole reset by memset() >>> as Kame suggested. >>> >>> Regards, >>> Gu >>> >>> --- >>> mm/memory_hotplug.c | 13 ++++--------- >>> 1 files changed, 4 insertions(+), 9 deletions(-) >>> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >>> index 1778628..c17eebf 100644 >>> --- a/mm/memory_hotplug.c >>> +++ b/mm/memory_hotplug.c >>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >>> return NULL; >>> >>> arch_refresh_nodedata(nid, pgdat); >>> + } else { >>> + /* Reset the nr_zones and classzone_idx to 0 before reuse */ >>> + pgdat->nr_zones = 0; >>> + pgdat->classzone_idx = 0; >> >> Hi Gu, >> >> This is just to avoid the warning, I think it's no meaning. >> Here is the changlog from the original patch: >> >> commit 88fdf75d1bb51d85ba00c466391770056d44bc03 >> ... >> Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero >> when it is allocated. Arch code and memory hotplug already initiailize >> pg_data_t. So this warning should never happen. I select fields *randomly* >> near the beginning, middle and end of pg_data_t for checking. >> ... >> >> Thanks, >> Xishi Qiu >> >>> } >>> >>> /* we can use NODE_DATA(nid) from here */ >>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) >>> >>> /* notify that the node is down */ >>> call_node_notify(NODE_DOWN, (void *)(long)nid); >>> - >>> - /* >>> - * Since there is no way to guarentee the address of pgdat/zone is not >>> - * on stack of any kernel threads or used by other kernel objects >>> - * without reference counting or other symchronizing method, do not >>> - * reset node_data and free pgdat here. 
Just reset it to 0 and reuse >>> - * the memory when the node is online again. >>> - */ >>> - memset(pgdat, 0, sizeof(*pgdat)); >>> } >>> EXPORT_SYMBOL(try_offline_node); >>> >> >> >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org >> . >> > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756375AbbCCDan (ORCPT ); Mon, 2 Mar 2015 22:30:43 -0500 Received: from szxga01-in.huawei.com ([119.145.14.64]:2217 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756349AbbCCDak (ORCPT ); Mon, 2 Mar 2015 22:30:40 -0500 Message-ID: <54F52ACF.4030103@huawei.com> Date: Tue, 3 Mar 2015 11:30:23 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu CC: Linux MM , LKML , "Toshi Kani" , Mel Gorman , Tejun Heo Subject: node-hotplug: is memset 0 safe in try_offline_node()? Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When hot-remove a numa node, we will clear pgdat, but is memset 0 safe in try_offline_node()? 
process A: offline node XX: for_each_populated_zone() find online node XX cond_resched() offline cpu and memory, then try_offline_node() node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) access node XX's pgdat NULL pointer access error Thanks, Xishi Qiu From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756220AbbCCKhl (ORCPT ); Tue, 3 Mar 2015 05:37:41 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:50670 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1755440AbbCCKhi convert rfc822-to-8bit (ORCPT ); Tue, 3 Mar 2015 05:37:38 -0500 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="61952002" Message-ID: <54F58AE3.50101@cn.fujitsu.com> Date: Tue, 3 Mar 2015 18:20:19 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> In-Reply-To: <54F52ACF.4030103@huawei.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT X-Originating-IP: [10.167.226.100] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Xishi, On 03/03/2015 11:30 AM, Xishi Qiu wrote: > When hot-remove a numa node, we will clear pgdat, > but is memset 0 safe in try_offline_node()? It is not safe here. In fact, this is a temporary solution here. As you know, pgdat is accessed lock-less now, so protection mechanism (RCU?) is needed to make it completely safe here, but it seems a bit over-kill. 
> > process A: offline node XX: > for_each_populated_zone() > find online node XX > cond_resched() > offline cpu and memory, then try_offline_node() > node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) > access node XX's pgdat > NULL pointer access error It's possible, but I did not meet this condition, did you? Regards, Gu > > Thanks, > Xishi Qiu > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756760AbbCDCWn (ORCPT ); Tue, 3 Mar 2015 21:22:43 -0500 Received: from szxga01-in.huawei.com ([119.145.14.64]:54582 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754488AbbCDCWl (ORCPT ); Tue, 3 Mar 2015 21:22:41 -0500 Message-ID: <54F66C52.4070600@huawei.com> Date: Wed, 4 Mar 2015 10:22:10 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Gu Zheng CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> In-Reply-To: <54F58AE3.50101@cn.fujitsu.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/3/3 18:20, Gu Zheng wrote: > Hi Xishi, > On 03/03/2015 11:30 AM, Xishi Qiu wrote: > >> When hot-remove a numa node, we will clear pgdat, >> but is memset 0 safe in try_offline_node()? > > It is not safe here. In fact, this is a temporary solution here. 
> As you know, pgdat is accessed lock-less now, so protection > mechanism (RCU?) is needed to make it completely safe here, > but it seems a bit over-kill. > >> >> process A: offline node XX: >> for_each_populated_zone() >> find online node XX >> cond_resched() >> offline cpu and memory, then try_offline_node() >> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) >> access node XX's pgdat >> NULL pointer access error > > It's possible, but I did not meet this condition, did you? > Yes, we test hot-add/hot-remove node with stress, and meet the following call trace several times. next_online_pgdat() int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL I add some printk, it shows the above pgdat is just the offline node's pgdat. The reason may be that for_each_zone() and for_each_populated_zone() are lock-less. And stop machine could not resolve it, because cond_resched() maybe in cyclical code. [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60 [ 1422.011086] IP: [] next_online_pgdat+0x1/0x50 [ 1422.011178] PGD 0 [ 1422.011180] Oops: 0000 [#1] SMP [ 1422.011409] ACPI: Device does not support D3cold [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. 
Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 [ 1422.012065] Workqueue: events vmstat_update [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 [ 1422.012165] RIP: 0010:[] [] next_online_pgdat+0x1/0x50 [ 1422.012205] RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 [ 1422.012292] FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 [ 1422.012314] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1422.012328] Stack: [ 1422.012328] ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314 [ 1422.012328] ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440 [ 1422.012328] ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038 [ 1422.012328] Call Trace: [ 1422.012328] [] ? next_zone+0xc5/0x150 [ 1422.012328] [] ? __schedule+0x544/0x780 [ 1422.012328] [] refresh_cpu_vm_stats+0xd0/0x140 [ 1422.012328] [] vmstat_update+0x11/0x50 [ 1422.012328] [] process_one_work+0x194/0x3d0 [ 1422.012328] [] worker_thread+0x12b/0x410 [ 1422.012328] [] ? manage_workers+0x1a0/0x1a0 [ 1422.012328] [] kthread+0xc6/0xd0 [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70 [ 1422.012328] [] ret_from_fork+0x7c/0xb0 [ 1422.012328] [] ? 
kthread_freezable_should_stop+0x70/0x70 Thanks, Xishi Qiu > Regards, > Gu > >> >> Thanks, >> Xishi Qiu >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org >> > > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758054AbbCDCxL (ORCPT ); Tue, 3 Mar 2015 21:53:11 -0500 Received: from szxga03-in.huawei.com ([119.145.14.66]:39889 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756036AbbCDCxK (ORCPT ); Tue, 3 Mar 2015 21:53:10 -0500 Message-ID: <54F67376.8050001@huawei.com> Date: Wed, 4 Mar 2015 10:52:38 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Gu Zheng CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com>
In-Reply-To: <54F66C52.4070600@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.177.25.179]
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020203.54F67388.0025,ss=1,re=0.001,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: 855b6d3c02475b5bac27360141d803c2
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/3/4 10:22, Xishi Qiu wrote:
> On 2015/3/3 18:20, Gu Zheng wrote:
>
>> Hi Xishi,
>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>
>>> When hot-remove a numa node, we will clear pgdat,
>>> but is memset 0 safe in try_offline_node()?
>>
>> It is not safe here. In fact, this is a temporary solution here.
>> As you know, pgdat is accessed lock-less now, so protection
>> mechanism (RCU?) is needed to make it completely safe here,
>> but it seems a bit over-kill.
>>

Hi Gu,

Can we just remove "memset(pgdat, 0, sizeof(*pgdat));"?
With it removed, the stress test runs fine, apart from the warning
when hot-adding memory.

Thanks, Xishi Qiu From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758073AbbCDEKh (ORCPT ); Tue, 3 Mar 2015 23:10:37 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:38768 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1753212AbbCDEKg convert rfc822-to-8bit (ORCPT ); Tue, 3 Mar 2015 23:10:36 -0500 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="62785637" Message-ID: <54F681A7.4050203@cn.fujitsu.com> Date: Wed, 4 Mar 2015 11:53:11 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> In-Reply-To: <54F66C52.4070600@huawei.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT X-Originating-IP: [10.167.226.100] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Xishi, On 03/04/2015 10:22 AM, Xishi Qiu wrote: > On 2015/3/3 18:20, Gu Zheng wrote: > >> Hi Xishi, >> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >> >>> When hot-remove a numa node, we will clear pgdat, >>> but is memset 0 safe in try_offline_node()? >> >> It is not safe here. In fact, this is a temporary solution here. >> As you know, pgdat is accessed lock-less now, so protection >> mechanism (RCU?) is needed to make it completely safe here, >> but it seems a bit over-kill. 
>> >>> >>> process A: offline node XX: >>> for_each_populated_zone() >>> find online node XX >>> cond_resched() >>> offline cpu and memory, then try_offline_node() >>> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) >>> access node XX's pgdat >>> NULL pointer access error >> >> It's possible, but I did not meet this condition, did you? >> > > Yes, we test hot-add/hot-remove node with stress, and meet the following > call trace several times. Thanks. > > next_online_pgdat() > int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL memset(pgdat, 0, sizeof(*pgdat)); This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is NULL* is strange here. But anyway, the bug is real, we must fix it. Regards, Gu > > I add some printk, it shows the above pgdat is just the offline node's pgdat. > The reason may be that for_each_zone() and for_each_populated_zone() are lock-less. > And stop machine could not resolve it, because cond_resched() maybe in cyclical code. > > [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60 > [ 1422.011086] IP: [] next_online_pgdat+0x1/0x50 > [ 1422.011178] PGD 0 > [ 1422.011180] Oops: 0000 [#1] SMP > [ 1422.011409] ACPI: Device does not support D3cold > [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf] > [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1 > [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD. 
Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 > [ 1422.012065] Workqueue: events vmstat_update > [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 > [ 1422.012165] RIP: 0010:[] [] next_online_pgdat+0x1/0x50 > [ 1422.012205] RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 > [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 > [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 > [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 > [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 > [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 > [ 1422.012292] FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 > [ 1422.012314] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 > [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 1422.012328] Stack: > [ 1422.012328] ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314 > [ 1422.012328] ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440 > [ 1422.012328] ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038 > [ 1422.012328] Call Trace: > [ 1422.012328] [] ? next_zone+0xc5/0x150 > [ 1422.012328] [] ? __schedule+0x544/0x780 > [ 1422.012328] [] refresh_cpu_vm_stats+0xd0/0x140 > [ 1422.012328] [] vmstat_update+0x11/0x50 > [ 1422.012328] [] process_one_work+0x194/0x3d0 > [ 1422.012328] [] worker_thread+0x12b/0x410 > [ 1422.012328] [] ? manage_workers+0x1a0/0x1a0 > [ 1422.012328] [] kthread+0xc6/0xd0 > [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70 > [ 1422.012328] [] ret_from_fork+0x7c/0xb0 > [ 1422.012328] [] ? 
kthread_freezable_should_stop+0x70/0x70 > > Thanks, > Xishi Qiu > >> Regards, >> Gu >> >>> >>> Thanks, >>> Xishi Qiu >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org. For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: email@kvack.org >>> >> >> >> >> . >> > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758151AbbCDENv (ORCPT ); Tue, 3 Mar 2015 23:13:51 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:11051 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1757999AbbCDENu convert rfc822-to-8bit (ORCPT ); Tue, 3 Mar 2015 23:13:50 -0500 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="62786871" Message-ID: <54F68270.5000203@cn.fujitsu.com> Date: Wed, 4 Mar 2015 11:56:32 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com>
In-Reply-To: <54F67376.8050001@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8BIT
X-Originating-IP: [10.167.226.100]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Xishi,
On 03/04/2015 10:52 AM, Xishi Qiu wrote:

> On 2015/3/4 10:22, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>
> Hi Gu,
>
> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
> I find this will be fine in the stress test except the warning
> when hot-add memory.

As you saw, removing it triggers the warning in free_area_init_node().
Could you try the following patch? It resets the pgdat right before it is reused.

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1778628..0717649 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
 			return NULL;
 
 		arch_refresh_nodedata(nid, pgdat);
+	} else {
+		/* Reset the pgdat to reuse */
+		memset(pgdat, 0, sizeof(*pgdat));
 	}
 
 	/* we can use NODE_DATA(nid) from here */
@@ -2021,15 +2024,6 @@ void try_offline_node(int nid)
 
 	/* notify that the node is down */
 	call_node_notify(NODE_DOWN, (void *)(long)nid);
-
-	/*
-	 * Since there is no way to guarentee the address of pgdat/zone is not
-	 * on stack of any kernel threads or used by other kernel objects
-	 * without reference counting or other symchronizing method, do not
-	 * reset node_data and free pgdat here. Just reset it to 0 and reuse
-	 * the memory when the node is online again.
-	 */
-	memset(pgdat, 0, sizeof(*pgdat));
 }
 EXPORT_SYMBOL(try_offline_node);
 

>
> Thanks,
> Xishi Qiu
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
> .
> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934762AbbCDHCN (ORCPT ); Wed, 4 Mar 2015 02:02:13 -0500 Received: from szxga02-in.huawei.com ([119.145.14.65]:3960 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933252AbbCDHCI (ORCPT ); Wed, 4 Mar 2015 02:02:08 -0500 Message-ID: <54F6ADD2.3080403@huawei.com> Date: Wed, 4 Mar 2015 15:01:38 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Gu Zheng CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F681A7.4050203@cn.fujitsu.com> In-Reply-To: <54F681A7.4050203@cn.fujitsu.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/3/4 11:53, Gu Zheng wrote: > Hi Xishi, > > On 03/04/2015 10:22 AM, Xishi Qiu wrote: > >> On 2015/3/3 18:20, Gu Zheng wrote: >> >>> Hi Xishi, >>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>> >>>> When hot-remove a numa node, we will clear pgdat, >>>> but is memset 0 safe in try_offline_node()? >>> >>> It is not safe here. In fact, this is a temporary solution here. >>> As you know, pgdat is accessed lock-less now, so protection >>> mechanism (RCU?) is needed to make it completely safe here, >>> but it seems a bit over-kill. 
>>> >>>> >>>> process A: offline node XX: >>>> for_each_populated_zone() >>>> find online node XX >>>> cond_resched() >>>> offline cpu and memory, then try_offline_node() >>>> node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) >>>> access node XX's pgdat >>>> NULL pointer access error >>> >>> It's possible, but I did not meet this condition, did you? >>> >> >> Yes, we test hot-add/hot-remove node with stress, and meet the following >> call trace several times. > > Thanks. > >> >> next_online_pgdat() >> int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL > > memset(pgdat, 0, sizeof(*pgdat)); > This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is > NULL* is strange here. > But anyway, the bug is real, we must fix it. next_zone() pg_data_t *pgdat = zone->zone_pgdat; // I think this pgdat is NULL, and NODE_DATA() is not NULL. ... pgdat = next_online_pgdat(pgdat); int nid = next_online_node(pgdat->node_id); // so here is the null pointer access Thanks for your new patch, I'll test it. Thanks, Xishi Qiu From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S936086AbbCDIJp (ORCPT ); Wed, 4 Mar 2015 03:09:45 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:46319 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933632AbbCDIJn (ORCPT ); Wed, 4 Mar 2015 03:09:43 -0500 X-Greylist: delayed 311 seconds by postgrey-1.27 at vger.kernel.org; Wed, 04 Mar 2015 03:09:43 EST Message-ID: <54F6BC43.3000509@huawei.com> Date: Wed, 4 Mar 2015 16:03:15 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Gu Zheng CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com> <54F68270.5000203@cn.fujitsu.com>
In-Reply-To: <54F68270.5000203@cn.fujitsu.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.177.25.179]
X-CFilter-Loop: Reflected
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/3/4 11:56, Gu Zheng wrote:
> Hi Xishi,
> On 03/04/2015 10:52 AM, Xishi Qiu wrote:
>
>> On 2015/3/4 10:22, Xishi Qiu wrote:
>>
>>> On 2015/3/3 18:20, Gu Zheng wrote:
>>>
>>>> Hi Xishi,
>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>>
>>>>> When hot-remove a numa node, we will clear pgdat,
>>>>> but is memset 0 safe in try_offline_node()?
>>>>
>>>> It is not safe here. In fact, this is a temporary solution here.
>>>> As you know, pgdat is accessed lock-less now, so protection
>>>> mechanism (RCU?) is needed to make it completely safe here,
>>>> but it seems a bit over-kill.
>>>>
>>
>> Hi Gu,
>>
>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ?
>> I find this will be fine in the stress test except the warning
>> when hot-add memory.
>
> As you see, it will trigger the warning in free_area_init_node().
> Could you try the following patch? It will reset the pgdat before reuse it.
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1778628..0717649 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start)
>  			return NULL;
>
>  		arch_refresh_nodedata(nid, pgdat);
> +	} else {
> +		/* Reset the pgdat to reuse */
> +		memset(pgdat, 0, sizeof(*pgdat));
>  	}

Hi Gu,

If the task is scheduled out for a long time, next_zone() may still be
accessing the pgdat at this point, so this is not safe enough, right?

Thanks Xishi Qiu > > /* we can use NODE_DATA(nid) from here */ > @@ -2021,15 +2024,6 @@ void try_offline_node(int nid) > > /* notify that the node is down */ > call_node_notify(NODE_DOWN, (void *)(long)nid); > - > - /* > - * Since there is no way to guarentee the address of pgdat/zone is not > - * on stack of any kernel threads or used by other kernel objects > - * without reference counting or other symchronizing method, do not > - * reset node_data and free pgdat here. Just reset it to 0 and reuse > - * the memory when the node is online again. > - */ > - memset(pgdat, 0, sizeof(*pgdat)); > } > EXPORT_SYMBOL(try_offline_node); > > >> >> Thanks, >> Xishi Qiu >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> . >> > > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S936233AbbCDIcg (ORCPT ); Wed, 4 Mar 2015 03:32:36 -0500 Received: from szxga02-in.huawei.com ([119.145.14.65]:63894 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758217AbbCDIcd (ORCPT ); Wed, 4 Mar 2015 03:32:33 -0500 Message-ID: <54F6C2E6.4030500@huawei.com> Date: Wed, 4 Mar 2015 16:31:34 +0800 From: Xie XiuQi User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Gu Zheng , Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Hanjun Guo Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F681A7.4050203@cn.fujitsu.com>
In-Reply-To: <54F681A7.4050203@cn.fujitsu.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.177.17.191]
X-CFilter-Loop: Reflected
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2015/3/4 11:53, Gu Zheng wrote:
> Hi Xishi,
>
> On 03/04/2015 10:22 AM, Xishi Qiu wrote:
>
>> On 2015/3/3 18:20, Gu Zheng wrote:
>>
>>> Hi Xishi,
>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote:
>>>
>>>> When hot-remove a numa node, we will clear pgdat,
>>>> but is memset 0 safe in try_offline_node()?
>>>
>>> It is not safe here. In fact, this is a temporary solution here.
>>> As you know, pgdat is accessed lock-less now, so protection
>>> mechanism (RCU?) is needed to make it completely safe here,
>>> but it seems a bit over-kill.
>>>
>>>>
>>>> process A:                      offline node XX:
>>>> for_each_populated_zone()
>>>> find online node XX
>>>> cond_resched()
>>>>                                 offline cpu and memory, then try_offline_node()
>>>>                                 node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat))
>>>> access node XX's pgdat
>>>> NULL pointer access error
>>>
>>> It's possible, but I did not meet this condition, did you?
>>>
>>
>> Yes, we test hot-add/hot-remove node with stress, and meet the following
>> call trace several times.
>
> Thanks.
>
>>
>> next_online_pgdat()
>> 	int nid = next_online_node(pgdat->node_id); // it's here, pgdat is NULL
>
> memset(pgdat, 0, sizeof(*pgdat));
> This memset just sets the context of pgdat to 0, but it will not free pgdat, so the *pgdat is
> NULL* is strange here.

Hi Gu,

The pgdat pointer itself isn't 0, but after the memset
pgdat->node_zones[i].zone_pgdat is 0. So the pgdat that next_zone()
loads from zone->zone_pgdat is 0.

--
/*
 * next_zone - helper magic for for_each_zone()
 */
struct zone *next_zone(struct zone *zone)
{
	pg_data_t *pgdat = zone->zone_pgdat;

	if (zone < pgdat->node_zones + MAX_NR_ZONES - 1)
		zone++;
	else {
		pgdat = next_online_pgdat(pgdat);
		if (pgdat)
			zone = pgdat->node_zones;
		else
			zone = NULL;
	}
	return zone;
}

> But anyway, the bug is real, we must fix it.
>
> Regards,
> Gu
>
>>
>> I add some printk, it shows the above pgdat is just the offline node's pgdat.
>> The reason may be that for_each_zone() and for_each_populated_zone() are lock-less.
>> And stop machine could not resolve it, because cond_resched() maybe in cyclical code.
>>
>> [ 1422.011064] BUG: unable to handle kernel paging request at 0000000000025f60
>> [ 1422.011086] IP: [] next_online_pgdat+0x1/0x50
>> [ 1422.011178] PGD 0
>> [ 1422.011180] Oops: 0000 [#1] SMP
>> [ 1422.011409] ACPI: Device does not support D3cold
>> [ 1422.011961] Modules linked in: fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod coretemp mperf crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 pcspkr microcode igb dca i2c_algo_bit ipv6 megaraid_sas iTCO_wdt i2c_i801 i2c_core iTCO_vendor_support tg3 sg hwmon ptp lpc_ich pps_core mfd_core acpi_pad rtc_cmos button ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh ahci libahci libata scsi_mod [last unloaded: rasf]
>> [ 1422.012006] CPU: 23 PID: 238 Comm: kworker/23:1 Tainted: G O 3.10.15-5885-euler0302 #1
>> [ 1422.012024] Hardware name: HUAWEI TECHNOLOGIES CO.,LTD.
Huawei N1/Huawei N1, BIOS V100R001 03/02/2015 >> [ 1422.012065] Workqueue: events vmstat_update >> [ 1422.012084] task: ffffa800d32c0000 ti: ffffa800d32ae000 task.ti: ffffa800d32ae000 >> [ 1422.012165] RIP: 0010:[] [] next_online_pgdat+0x1/0x50 >> [ 1422.012205] RSP: 0018:ffffa800d32afce8 EFLAGS: 00010286 >> [ 1422.012225] RAX: 0000000000001440 RBX: ffffffff81da53b8 RCX: 0000000000000082 >> [ 1422.012226] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000000 >> [ 1422.012254] RBP: ffffa800d32afd28 R08: ffffffff81c93bfc R09: ffffffff81cbdc96 >> [ 1422.012272] R10: 00000000000040ec R11: 00000000000000a0 R12: ffffa800fffb3440 >> [ 1422.012290] R13: ffffa800d32afd38 R14: 0000000000000017 R15: ffffa800e6616800 >> [ 1422.012292] FS: 0000000000000000(0000) GS:ffffa800e6600000(0000) knlGS:0000000000000000 >> [ 1422.012314] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 1422.012328] CR2: 0000000000025f60 CR3: 0000000001a0b000 CR4: 00000000001407e0 >> [ 1422.012328] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 1422.012328] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 1422.012328] Stack: >> [ 1422.012328] ffffa800d32afd28 ffffffff81126ca5 ffffa800ffffffff ffffffff814b4314 >> [ 1422.012328] ffffa800d32ae010 0000000000000000 ffffa800e6616180 ffffa800fffb3440 >> [ 1422.012328] ffffa800d32afde8 ffffffff81128220 ffffffff00000013 0000000000000038 >> [ 1422.012328] Call Trace: >> [ 1422.012328] [] ? next_zone+0xc5/0x150 >> [ 1422.012328] [] ? __schedule+0x544/0x780 >> [ 1422.012328] [] refresh_cpu_vm_stats+0xd0/0x140 >> [ 1422.012328] [] vmstat_update+0x11/0x50 >> [ 1422.012328] [] process_one_work+0x194/0x3d0 >> [ 1422.012328] [] worker_thread+0x12b/0x410 >> [ 1422.012328] [] ? manage_workers+0x1a0/0x1a0 >> [ 1422.012328] [] kthread+0xc6/0xd0 >> [ 1422.012328] [] ? kthread_freezable_should_stop+0x70/0x70 >> [ 1422.012328] [] ret_from_fork+0x7c/0xb0 >> [ 1422.012328] [] ? 
kthread_freezable_should_stop+0x70/0x70 >> >> Thanks, >> Xishi Qiu >> >>> Regards, >>> Gu >>> >>>> >>>> Thanks, >>>> Xishi Qiu >>>> >>>> -- >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>> the body to majordomo@kvack.org. For more info on Linux MM, >>>> see: http://www.linux-mm.org/ . >>>> Don't email: email@kvack.org >>>> >>> >>> >>> >>> . >>> >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/ >> . >> > > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760614AbbCDJEI (ORCPT ); Wed, 4 Mar 2015 04:04:08 -0500 Received: from mgwkm02.jp.fujitsu.com ([202.219.69.169]:23057 "EHLO mgwkm02.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753860AbbCDJEE (ORCPT ); Wed, 4 Mar 2015 04:04:04 -0500 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.2.3 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20140219-2 Message-ID: <54F6C809.1080709@jp.fujitsu.com> Date: Wed, 4 Mar 2015 17:53:29 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: Xishi Qiu , Gu Zheng CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com> <54F68270.5000203@cn.fujitsu.com> <54F6BC43.3000509@huawei.com> In-Reply-To: <54F6BC43.3000509@huawei.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-SecurityPolicyCheck-GC: OK by FENCE-Mail X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/03/04 17:03, Xishi Qiu wrote: > On 2015/3/4 11:56, Gu Zheng wrote: > >> Hi Xishi, >> On 03/04/2015 10:52 AM, Xishi Qiu wrote: >> >>> On 2015/3/4 10:22, Xishi Qiu wrote: >>> >>>> On 2015/3/3 18:20, Gu Zheng wrote: >>>> >>>>> Hi Xishi, >>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>>>> >>>>>> When hot-remove a numa node, we will clear pgdat, >>>>>> but is memset 0 safe in try_offline_node()? >>>>> >>>>> It is not safe here. In fact, this is a temporary solution here. >>>>> As you know, pgdat is accessed lock-less now, so protection >>>>> mechanism (RCU?) is needed to make it completely safe here, >>>>> but it seems a bit over-kill. >>>>> >>> >>> Hi Gu, >>> >>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ? >>> I find this will be fine in the stress test except the warning >>> when hot-add memory. >> >> As you see, it will trigger the warning in free_area_init_node(). >> Could you try the following patch? It will reset the pgdat before reuse it. >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 1778628..0717649 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >> return NULL; >> >> arch_refresh_nodedata(nid, pgdat); >> + } else { >> + /* Reset the pgdat to reuse */ >> + memset(pgdat, 0, sizeof(*pgdat)); >> } > > Hi Gu, > > If schedule last a long time, next_zone may be still access the pgdat here, > so it is not safe enough, right? 
> How about just resetting pgdat->nr_zones and pgdat->classzone_idx to 0 rather than calling memset()? Breaking the pointer information in pgdat does not seem to be an option. Only the plain "values" should be reset. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759461AbbCDKLW (ORCPT ); Wed, 4 Mar 2015 05:11:22 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:21035 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1758901AbbCDKLS convert rfc822-to-8bit (ORCPT ); Wed, 4 Mar 2015 05:11:18 -0500 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="63139934" Message-ID: <54F6D637.6040705@cn.fujitsu.com> Date: Wed, 4 Mar 2015 17:53:59 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Kamezawa Hiroyuki , Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Xiexiuqi , Hanjun Guo , Li Zefan , Taku Izumi Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> <54F58AE3.50101@cn.fujitsu.com> <54F66C52.4070600@huawei.com> <54F67376.8050001@huawei.com> <54F68270.5000203@cn.fujitsu.com> <54F6BC43.3000509@huawei.com> <54F6C809.1080709@jp.fujitsu.com> In-Reply-To: <54F6C809.1080709@jp.fujitsu.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8BIT X-Originating-IP: [10.167.226.100] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/04/2015 04:53 PM, Kamezawa Hiroyuki wrote: > On 2015/03/04 17:03, Xishi Qiu wrote: >> On 2015/3/4 11:56, Gu Zheng wrote: >> >>> Hi Xishi, >>> On 03/04/2015 10:52 AM, Xishi Qiu wrote: >>> >>>> On 2015/3/4 10:22, Xishi Qiu wrote: >>>> >>>>> On 2015/3/3 18:20, Gu Zheng wrote: >>>>> >>>>>> Hi Xishi, >>>>>> On 03/03/2015 11:30 AM, Xishi Qiu wrote: >>>>>> >>>>>>> When hot-remove a numa node, we will clear pgdat, >>>>>>> but is memset 0 safe in try_offline_node()? >>>>>> >>>>>> It is not safe here. In fact, this is a temporary solution here. >>>>>> As you know, pgdat is accessed lock-less now, so protection >>>>>> mechanism (RCU?) is needed to make it completely safe here, >>>>>> but it seems a bit over-kill. >>>>>> >>>> >>>> Hi Gu, >>>> >>>> Can we just remove "memset(pgdat, 0, sizeof(*pgdat));" ? >>>> I find this will be fine in the stress test except the warning >>>> when hot-add memory. >>> >>> As you see, it will trigger the warning in free_area_init_node(). >>> Could you try the following patch? It will reset the pgdat before reuse it. 
>>> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >>> index 1778628..0717649 100644 >>> --- a/mm/memory_hotplug.c >>> +++ b/mm/memory_hotplug.c >>> @@ -1092,6 +1092,9 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >>> return NULL; >>> >>> arch_refresh_nodedata(nid, pgdat); >>> + } else { >>> + /* Reset the pgdat to reuse */ >>> + memset(pgdat, 0, sizeof(*pgdat)); >>> } >> >> Hi Gu, >> >> If schedule last a long time, next_zone may be still access the pgdat here, >> so it is not safe enough, right? Hi Xishi, IMO, the time a task spends scheduled out is rather short compared with the gap between hot-removing a node and hot re-adding it, so we can say it is safe here. >> > > How about just reseting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than > memset() ? > > It seems breaking pointer information in pgdat is not a choice. > Just proper "values" should be reset. Anyway, sounds reasonable. Best regards, Gu > > Thanks, > -Kame > > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754672AbbCEIn3 (ORCPT ); Thu, 5 Mar 2015 03:43:29 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:23563 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751170AbbCEIn2 (ORCPT ); Thu, 5 Mar 2015 03:43:28 -0500 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="64868335" Message-ID: <54F81322.8010202@cn.fujitsu.com> Date: Thu, 5 Mar 2015 16:26:10 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? 
References: <54F52ACF.4030103@huawei.com> In-Reply-To: <54F52ACF.4030103@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.167.226.100] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Xishi, Could you please try the following one? It postpones the reset of obsolete pgdat from try_offline_node() to hotadd_new_pgdat(), and just resetting pgdat->nr_zones and pgdat->classzone_idx to be 0 rather than the whole reset by memset() as Kame suggested. Regards, Gu --- mm/memory_hotplug.c | 13 ++++--------- 1 files changed, 4 insertions(+), 9 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 1778628..c17eebf 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) return NULL; arch_refresh_nodedata(nid, pgdat); + } else { + /* Reset the nr_zones and classzone_idx to 0 before reuse */ + pgdat->nr_zones = 0; + pgdat->classzone_idx = 0; } /* we can use NODE_DATA(nid) from here */ @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) /* notify that the node is down */ call_node_notify(NODE_DOWN, (void *)(long)nid); - - /* - * Since there is no way to guarentee the address of pgdat/zone is not - * on stack of any kernel threads or used by other kernel objects - * without reference counting or other symchronizing method, do not - * reset node_data and free pgdat here. Just reset it to 0 and reuse - * the memory when the node is online again. - */ - memset(pgdat, 0, sizeof(*pgdat)); } EXPORT_SYMBOL(try_offline_node); -- 1.7.7 On 03/03/2015 11:30 AM, Xishi Qiu wrote: > When hot-remove a numa node, we will clear pgdat, > but is memset 0 safe in try_offline_node()? 
> > process A: offline node XX: > for_each_populated_zone() > find online node XX > cond_resched() > offline cpu and memory, then try_offline_node() > node_set_offline(nid), and memset(pgdat, 0, sizeof(*pgdat)) > access node XX's pgdat > NULL pointer access error > > Thanks, > Xishi Qiu From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932209AbbCEJjg (ORCPT ); Thu, 5 Mar 2015 04:39:36 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:27935 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932171AbbCEJjd (ORCPT ); Thu, 5 Mar 2015 04:39:33 -0500 Message-ID: <54F8243D.7020809@huawei.com> Date: Thu, 5 Mar 2015 17:39:09 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Gu Zheng CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> In-Reply-To: <54F81322.8010202@cn.fujitsu.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/3/5 16:26, Gu Zheng wrote: > Hi Xishi, > Could you please try the following one? > It postpones the reset of obsolete pgdat from try_offline_node() to > hotadd_new_pgdat(), and just resetting pgdat->nr_zones and > pgdat->classzone_idx to be 0 rather than the whole reset by memset() > as Kame suggested. 
> > Regards, > Gu > > --- > mm/memory_hotplug.c | 13 ++++--------- > 1 files changed, 4 insertions(+), 9 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 1778628..c17eebf 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) > return NULL; > > arch_refresh_nodedata(nid, pgdat); > + } else { > + /* Reset the nr_zones and classzone_idx to 0 before reuse */ > + pgdat->nr_zones = 0; > + pgdat->classzone_idx = 0; Hi Gu, This only avoids the warning; I do not think it is meaningful. Here is the changelog from the original patch: commit 88fdf75d1bb51d85ba00c466391770056d44bc03 ... Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero when it is allocated. Arch code and memory hotplug already initiailize pg_data_t. So this warning should never happen. I select fields *randomly* near the beginning, middle and end of pg_data_t for checking. ... Thanks, Xishi Qiu > } > > /* we can use NODE_DATA(nid) from here */ > @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) > > /* notify that the node is down */ > call_node_notify(NODE_DOWN, (void *)(long)nid); > - > - /* > - * Since there is no way to guarentee the address of pgdat/zone is not > - * on stack of any kernel threads or used by other kernel objects > - * without reference counting or other symchronizing method, do not > - * reset node_data and free pgdat here. Just reset it to 0 and reuse > - * the memory when the node is online again. 
> - */ > - memset(pgdat, 0, sizeof(*pgdat)); > } > EXPORT_SYMBOL(try_offline_node); > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932459AbbCEKDK (ORCPT ); Thu, 5 Mar 2015 05:03:10 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:23076 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S932346AbbCEKDG (ORCPT ); Thu, 5 Mar 2015 05:03:06 -0500 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="64987723" Message-ID: <54F825CB.8040402@cn.fujitsu.com> Date: Thu, 5 Mar 2015 17:45:47 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> <54F8243D.7020809@huawei.com> In-Reply-To: <54F8243D.7020809@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.167.226.100] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Xishi, On 03/05/2015 05:39 PM, Xishi Qiu wrote: > On 2015/3/5 16:26, Gu Zheng wrote: > >> Hi Xishi, >> Could you please try the following one? >> It postpones the reset of obsolete pgdat from try_offline_node() to >> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and >> pgdat->classzone_idx to be 0 rather than the whole reset by memset() >> as Kame suggested. 
>> >> Regards, >> Gu >> >> --- >> mm/memory_hotplug.c | 13 ++++--------- >> 1 files changed, 4 insertions(+), 9 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 1778628..c17eebf 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >> return NULL; >> >> arch_refresh_nodedata(nid, pgdat); >> + } else { >> + /* Reset the nr_zones and classzone_idx to 0 before reuse */ >> + pgdat->nr_zones = 0; >> + pgdat->classzone_idx = 0; > > Hi Gu, > > This is just to avoid the warning, I think it's no meaning. I cannot agree. The key point here is to postpone the reset of the obsolete pgdat until the time we want to reuse it, which avoids the failure (the "Oops: 0000" you mentioned). Avoiding the warning is a minor benefit, though it is also important. > Here is the changlog from the original patch: > > commit 88fdf75d1bb51d85ba00c466391770056d44bc03 > ... > Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero > when it is allocated. Arch code and memory hotplug already initiailize > pg_data_t. So this warning should never happen. I select fields *randomly* > near the beginning, middle and end of pg_data_t for checking. > ... Node hot-remove did not exist at that time, so that patch does not seem to have considered the *reuse* case, but in any case we should not break it here. Regards, Gu > > Thanks, > Xishi Qiu > >> } >> >> /* we can use NODE_DATA(nid) from here */ >> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) >> >> /* notify that the node is down */ >> call_node_notify(NODE_DOWN, (void *)(long)nid); >> - >> - /* >> - * Since there is no way to guarentee the address of pgdat/zone is not >> - * on stack of any kernel threads or used by other kernel objects >> - * without reference counting or other symchronizing method, do not >> - * reset node_data and free pgdat here. Just reset it to 0 and reuse >> - * the memory when the node is online again. 
>> - */ >> - memset(pgdat, 0, sizeof(*pgdat)); >> } >> EXPORT_SYMBOL(try_offline_node); >> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752834AbbCKB3f (ORCPT ); Tue, 10 Mar 2015 21:29:35 -0400 Received: from cn.fujitsu.com ([59.151.112.132]:49749 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751532AbbCKB3b (ORCPT ); Tue, 10 Mar 2015 21:29:31 -0400 X-IronPort-AV: E=Sophos;i="5.04,848,1406563200"; d="scan'208";a="72934138" Message-ID: <54FF9662.8080303@cn.fujitsu.com> Date: Wed, 11 Mar 2015 09:12:02 +0800 From: Gu Zheng User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20110930 Thunderbird/7.0.1 MIME-Version: 1.0 To: Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> <54F8243D.7020809@huawei.com> In-Reply-To: <54F8243D.7020809@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.167.226.100] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Xishi, What is the status of this problem now? Regards, Gu On 03/05/2015 05:39 PM, Xishi Qiu wrote: > On 2015/3/5 16:26, Gu Zheng wrote: > >> Hi Xishi, >> Could you please try the following one? >> It postpones the reset of obsolete pgdat from try_offline_node() to >> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and >> pgdat->classzone_idx to be 0 rather than the whole reset by memset() >> as Kame suggested. 
>> >> Regards, >> Gu >> >> --- >> mm/memory_hotplug.c | 13 ++++--------- >> 1 files changed, 4 insertions(+), 9 deletions(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 1778628..c17eebf 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >> return NULL; >> >> arch_refresh_nodedata(nid, pgdat); >> + } else { >> + /* Reset the nr_zones and classzone_idx to 0 before reuse */ >> + pgdat->nr_zones = 0; >> + pgdat->classzone_idx = 0; > > Hi Gu, > > This is just to avoid the warning, I think it's no meaning. > Here is the changlog from the original patch: > > commit 88fdf75d1bb51d85ba00c466391770056d44bc03 > ... > Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero > when it is allocated. Arch code and memory hotplug already initiailize > pg_data_t. So this warning should never happen. I select fields *randomly* > near the beginning, middle and end of pg_data_t for checking. > ... > > Thanks, > Xishi Qiu > >> } >> >> /* we can use NODE_DATA(nid) from here */ >> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) >> >> /* notify that the node is down */ >> call_node_notify(NODE_DOWN, (void *)(long)nid); >> - >> - /* >> - * Since there is no way to guarentee the address of pgdat/zone is not >> - * on stack of any kernel threads or used by other kernel objects >> - * without reference counting or other symchronizing method, do not >> - * reset node_data and free pgdat here. Just reset it to 0 and reuse >> - * the memory when the node is online again. >> - */ >> - memset(pgdat, 0, sizeof(*pgdat)); >> } >> EXPORT_SYMBOL(try_offline_node); >> > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > . 
> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751931AbbCKCyM (ORCPT ); Tue, 10 Mar 2015 22:54:12 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:42896 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750914AbbCKCyI (ORCPT ); Tue, 10 Mar 2015 22:54:08 -0400 Message-ID: <54FFADB6.60604@huawei.com> Date: Wed, 11 Mar 2015 10:51:34 +0800 From: Xie XiuQi User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Gu Zheng , Xishi Qiu CC: Yasuaki Ishimatsu , Andrew Morton , Tang Chen , Yinghai Lu , Linux MM , LKML , Toshi Kani , Mel Gorman , Tejun Heo , Kamezawa Hiroyuki Subject: Re: node-hotplug: is memset 0 safe in try_offline_node()? References: <54F52ACF.4030103@huawei.com> <54F81322.8010202@cn.fujitsu.com> <54F8243D.7020809@huawei.com> <54FF9662.8080303@cn.fujitsu.com> In-Reply-To: <54FF9662.8080303@cn.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.17.191] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020206.54FFADC7.00C0,ss=1,re=0.001,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 855b6d3c02475b5bac27360141d803c2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/3/11 9:12, Gu Zheng wrote: > Hi Xishi, > > What is the condition of this problem now? Hi Gu, I have no machine available to run this test now, but last week I tested the "just remove memset 0" patch for more than 20 hours, and it was OK. Thanks, Xie XiuQi > > Regards, > Gu > On 03/05/2015 05:39 PM, Xishi Qiu wrote: > >> On 2015/3/5 16:26, Gu Zheng wrote: >> >>> Hi Xishi, >>> Could you please try the following one? 
>>> It postpones the reset of obsolete pgdat from try_offline_node() to >>> hotadd_new_pgdat(), and just resetting pgdat->nr_zones and >>> pgdat->classzone_idx to be 0 rather than the whole reset by memset() >>> as Kame suggested. >>> >>> Regards, >>> Gu >>> >>> --- >>> mm/memory_hotplug.c | 13 ++++--------- >>> 1 files changed, 4 insertions(+), 9 deletions(-) >>> >>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >>> index 1778628..c17eebf 100644 >>> --- a/mm/memory_hotplug.c >>> +++ b/mm/memory_hotplug.c >>> @@ -1092,6 +1092,10 @@ static pg_data_t __ref *hotadd_new_pgdat(int nid, u64 start) >>> return NULL; >>> >>> arch_refresh_nodedata(nid, pgdat); >>> + } else { >>> + /* Reset the nr_zones and classzone_idx to 0 before reuse */ >>> + pgdat->nr_zones = 0; >>> + pgdat->classzone_idx = 0; >> >> Hi Gu, >> >> This is just to avoid the warning, I think it's no meaning. >> Here is the changlog from the original patch: >> >> commit 88fdf75d1bb51d85ba00c466391770056d44bc03 >> ... >> Warn if memory-hotplug/boot code doesn't initialize pg_data_t with zero >> when it is allocated. Arch code and memory hotplug already initiailize >> pg_data_t. So this warning should never happen. I select fields *randomly* >> near the beginning, middle and end of pg_data_t for checking. >> ... >> >> Thanks, >> Xishi Qiu >> >>> } >>> >>> /* we can use NODE_DATA(nid) from here */ >>> @@ -2021,15 +2025,6 @@ void try_offline_node(int nid) >>> >>> /* notify that the node is down */ >>> call_node_notify(NODE_DOWN, (void *)(long)nid); >>> - >>> - /* >>> - * Since there is no way to guarentee the address of pgdat/zone is not >>> - * on stack of any kernel threads or used by other kernel objects >>> - * without reference counting or other symchronizing method, do not >>> - * reset node_data and free pgdat here. Just reset it to 0 and reuse >>> - * the memory when the node is online again. 
>>> - */ >>> - memset(pgdat, 0, sizeof(*pgdat)); >>> } >>> EXPORT_SYMBOL(try_offline_node); >>>
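The idea debated in this thread (leave the pgdat untouched when the node goes offline, and reset only plain value fields when the node is re-added) can be illustrated with a small userspace C sketch. This is a simplified model under stated assumptions, not kernel code: struct pgdat_model, its zonelist member, and every function name below are hypothetical stand-ins, and the sketch only compares the state a late reader would observe; it does not reproduce the actual race or any locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_data_t (NOT the kernel structure). */
struct pgdat_model {
	int nr_zones;
	int classzone_idx;
	const char *zonelist; /* hypothetical pointer a lockless reader may follow */
};

/* Behaviour questioned in the thread: wipe everything at offline time. */
static void offline_with_memset(struct pgdat_model *pgdat)
{
	memset(pgdat, 0, sizeof(*pgdat));
}

/* Behaviour of the proposed patch at offline time: touch nothing. */
static void offline_without_reset(struct pgdat_model *pgdat)
{
	(void)pgdat; /* intentionally leaves every field as it was */
}

/* Behaviour of the proposed patch at re-add time: reset plain values only. */
static void hotadd_reuse(struct pgdat_model *pgdat)
{
	pgdat->nr_zones = 0;
	pgdat->classzone_idx = 0;
}

/* What a reader resuming after cond_resched() would depend on. */
static int reader_pointer_is_valid(const struct pgdat_model *pgdat)
{
	return pgdat->zonelist != NULL;
}

/* Walks both variants and returns 0 when the expectations hold. */
static int demo(void)
{
	struct pgdat_model old_way = { 3, 2, "zonelist" };
	struct pgdat_model new_way = { 3, 2, "zonelist" };

	/* memset() zeroes the pointer too (all-bits-zero is NULL on
	 * mainstream platforms), so a late reader sees NULL. */
	offline_with_memset(&old_way);
	assert(!reader_pointer_is_valid(&old_way));

	/* Without the reset, a late reader still sees a usable pointer. */
	offline_without_reset(&new_way);
	assert(reader_pointer_is_valid(&new_way));

	/* Re-adding the node clears the counters but keeps the pointer. */
	hotadd_reuse(&new_way);
	assert(new_way.nr_zones == 0 && new_way.classzone_idx == 0);
	assert(reader_pointer_is_valid(&new_way));
	return 0;
}
```

Calling demo() from a main() exercises both paths. The model says nothing about whether a reader could still observe stale zone counts between offline and re-add; that question is left open in the thread itself.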