From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: Re: [Patch v4 0/8] bugfix for memory hotplug Date: Wed, 31 Oct 2012 19:32:09 +0800 Message-ID: <50910C39.70305@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan List-Id: linux-acpi@vger.kernel.org At 10/31/2012 07:23 PM, Wen Congyang Wrote: > The last version is here: > https://lkml.org/lkml/2012/10/19/56 > > Note: patch 1-3 are in -mm tree and I don't touch them. The other patches > except patch6 are also in mm tree. Patch 6 is not touched. > > Changes from v3 to v4: > Patch4: use dynamically allocated memory instead of static array. > Patch5: merge [patchv3 2-3] into a single patch, and update it as we use > dynamically allocated memory > Patch7: merge [patchv3 5-6] into a single patch > Patch8: merge [patchv3 9] and its fix into a patch Note: The patch from Michal Hocko is not merged into patch8 Thanks Wen Congyang > > Changes from v2 to v3: > Merge the bug fix from ishimatsu to this patchset(Patch 1-3) > Patch 3: split it from patch as it fixes another bug. > Patch 4: new patch, and fix bad-page state when hotadding a memory > device after hotremoving it. I forgot to post this patch in v2. > Patch 6: update it according to Dave Hansen's comment. > > Changes from v1 to v2: > Patch 1: updated according to kosaki's suggestion > > Patch 2: new patch, and update mce_bad_pages when removing memory. > > Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug > cause oom in my test. > > Patch 5: new patch, and fix a new bug. 
When repeating to online/offline > pages, the free pages will continue to decrease. > > Wen Congyang (6): > memory-hotplug: auto offline page_cgroup when onlining memory block > failed > memory-hotplug: fix NR_FREE_PAGES mismatch > numa: convert static memory to dynamically allocated memory for per > node device > clear the memory to store struct page > memory-hotplug: current hwpoison doesn't support memory offline > memory-hotplug: allocate zone's pcp before onlining pages > > Yasuaki Ishimatsu (2): > memory hotplug: suppress "Device memoryX does not have a release() > function" warning > suppress "Device nodeX does not have a release() function" warning > > arch/powerpc/kernel/sysfs.c | 4 +-- > drivers/base/memory.c | 9 ++++++- > drivers/base/node.c | 56 ++++++++++++++++++++++++++++++------------ > include/linux/node.h | 2 +- > include/linux/page-isolation.h | 10 +++++--- > mm/hugetlb.c | 4 +-- > mm/memory-failure.c | 2 +- > mm/memory_hotplug.c | 13 +++++++--- > mm/page_alloc.c | 37 +++++++++++++++++++++------- > mm/page_cgroup.c | 3 +++ > mm/page_isolation.c | 27 ++++++++++++++------ > mm/sparse.c | 25 ++++++++++++++++++- > 12 files changed, 144 insertions(+), 48 deletions(-) > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wen Congyang
Subject: [Patch v4 0/8] bugfix for memory hotplug
Date: Wed, 31 Oct 2012 19:23:06 +0800
Message-ID: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com>
Return-path:
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org
Cc: Jiang Liu, Len Brown, Andrew Morton, KOSAKI Motohiro, Yasuaki Ishimatsu, rjw@sisk.pl, Lai Jiangshan, Wen Congyang
List-Id: linux-acpi@vger.kernel.org

The last version is here:
    https://lkml.org/lkml/2012/10/19/56

Note: patches 1-3 are in the -mm tree and I have not touched them. The other
patches except patch 6 are also in the -mm tree. Patch 6 is not touched.

Changes from v3 to v4:
    Patch 4: use dynamically allocated memory instead of a static array.
    Patch 5: merge [patchv3 2-3] into a single patch, and update it as we use
             dynamically allocated memory
    Patch 7: merge [patchv3 5-6] into a single patch
    Patch 8: merge [patchv3 9] and its fix into a patch

Changes from v2 to v3:
    Merge the bug fix from ishimatsu into this patchset (Patch 1-3)
    Patch 3: split it from patch as it fixes another bug.
    Patch 4: new patch, fixes the bad-page state when hot-adding a memory
             device after hot-removing it. I forgot to post this patch in v2.
    Patch 6: update it according to Dave Hansen's comment.

Changes from v1 to v2:
    Patch 1: updated according to kosaki's suggestion

    Patch 2: new patch, updates mce_bad_pages when removing memory.

    Patch 4: new patch, fixes an NR_FREE_PAGES mismatch; this bug caused
             an OOM in my test.

    Patch 5: new patch, fixes a new bug. When pages are repeatedly
             onlined/offlined, the number of free pages keeps decreasing.

Wen Congyang (6): memory-hotplug: auto offline page_cgroup when onlining memory block failed memory-hotplug: fix NR_FREE_PAGES mismatch numa: convert static memory to dynamically allocated memory for per node device clear the memory to store struct page memory-hotplug: current hwpoison doesn't support memory offline memory-hotplug: allocate zone's pcp before onlining pages Yasuaki Ishimatsu (2): memory hotplug: suppress "Device memoryX does not have a release() function" warning suppress "Device nodeX does not have a release() function" warning arch/powerpc/kernel/sysfs.c | 4 +-- drivers/base/memory.c | 9 ++++++- drivers/base/node.c | 56 ++++++++++++++++++++++++++++++------------ include/linux/node.h | 2 +- include/linux/page-isolation.h | 10 +++++--- mm/hugetlb.c | 4 +-- mm/memory-failure.c | 2 +- mm/memory_hotplug.c | 13 +++++++--- mm/page_alloc.c | 37 +++++++++++++++++++++------- mm/page_cgroup.c | 3 +++ mm/page_isolation.c | 27 ++++++++++++++------ mm/sparse.c | 25 ++++++++++++++++++- 12 files changed, 144 insertions(+), 48 deletions(-) -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [Patch v4 6/8] clear the memory to store struct page Date: Wed, 31 Oct 2012 19:23:12 +0800 Message-ID: <1351682594-17347-7-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Return-path: In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Minchan Kim List-Id: linux-acpi@vger.kernel.org If sparse memory vmemmap is enabled, we can't free the memory to store struct page when a memory device is hotremoved, because we may store struct page in the memory to manage the memory which doesn't belong to this memory device. When we hotadded this memory device again, we will reuse this memory to store struct page, and struct page may contain some obsolete information, and we will get bad-page state: [ 59.611278] init_memory_mapping: [mem 0x80000000-0x9fffffff] [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. 
Total pages: 547617 [ 59.638739] Policy zone: Normal [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc [ 59.651124] page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd [ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock) [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12 [ 59.657172] Call Trace: [ 59.657275] [] ? bad_page+0xb0/0x100 [ 59.657434] [] ? free_pages_prepare+0xb3/0x100 [ 59.657610] [] ? free_hot_cold_page+0x48/0x1a0 [ 59.657787] [] ? online_pages_range+0x68/0xa0 [ 59.657961] [] ? __online_page_increment_counters+0x10/0x10 [ 59.658162] [] ? walk_system_ram_range+0x101/0x110 [ 59.658346] [] ? online_pages+0x1a5/0x2b0 [ 59.658515] [] ? __memory_block_change_state+0x20d/0x270 [ 59.658710] [] ? store_mem_state+0xb6/0xf0 [ 59.658878] [] ? sysfs_write_file+0xd2/0x160 [ 59.659052] [] ? vfs_write+0xaa/0x160 [ 59.659212] [] ? sys_write+0x47/0x90 [ 59.659371] [] ? async_page_fault+0x25/0x30 [ 59.659543] [] ? system_call_fastpath+0x16/0x1b [ 59.659720] Disabling lock debugging due to kernel taint This patch clears the memory to store struct page to avoid unexpected error. 
CC: David Rientjes CC: Jiang Liu Cc: Minchan Kim CC: Andrew Morton Acked-by: KOSAKI Motohiro CC: Yasuaki Ishimatsu Reported-by: Vasilis Liaskovitis Signed-off-by: Wen Congyang --- mm/sparse.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/sparse.c b/mm/sparse.c index fac95f2..0021265 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -638,7 +638,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages) got_map_page: ret = (struct page *)pfn_to_kaddr(page_to_pfn(page)); got_map_ptr: - memset(ret, 0, memmap_size); return ret; } @@ -760,6 +759,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn, goto out; } + memset(memmap, 0, sizeof(struct page) * nr_pages); + ms->section_mem_map |= SECTION_MARKED_PRESENT; ret = sparse_init_one_section(ms, section_nr, memmap, usemap); -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wen Congyang
Subject: [Patch v4 1/8] memory hotplug: suppress "Device memoryX does not have a release() function" warning
Date: Wed, 31 Oct 2012 19:23:07 +0800
Message-ID: <1351682594-17347-2-git-send-email-wency@cn.fujitsu.com>
References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com>
Return-path:
Received: from cn.fujitsu.com ([222.73.24.84]:55903 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1756883Ab2JaLsX (ORCPT ); Wed, 31 Oct 2012 07:48:23 -0400
In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com>
Sender: linux-acpi-owner@vger.kernel.org
List-Id: linux-acpi@vger.kernel.org
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org
Cc: Jiang Liu, Len Brown, Andrew Morton, KOSAKI Motohiro, Yasuaki Ishimatsu, rjw@sisk.pl, Lai Jiangshan, Minchan Kim, Wen Congyang, Greg KH

From: Yasuaki Ishimatsu

When remove_memory_block() is called, device_release() prints the following
message:

    "Device 'memory528' does not have a release() function, it is broken and
    must be fixed."

The reason is that memory_block's device struct does not have a release()
function.

So the patch registers memory_block_release() as the device's release()
function to suppress the warning message. Additionally, the patch moves
kfree(mem) into the release function, since the release function is the
intended means of freeing a memory_block struct.

Signed-off-by: Yasuaki Ishimatsu Acked-by: David Rientjes Cc: Jiang Liu Cc: Minchan Kim Acked-by: KOSAKI Motohiro Cc: Wen Congyang Cc: Greg KH Signed-off-by: Andrew Morton --- drivers/base/memory.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 86c8821..7eb1211 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -70,6 +70,13 @@ void unregister_memory_isolate_notifier(struct notifier_block *nb) } EXPORT_SYMBOL(unregister_memory_isolate_notifier); +static void memory_block_release(struct device *dev) +{ + struct memory_block *mem = container_of(dev, struct memory_block, dev); + + kfree(mem); +} + /* * register_memory - Setup a sysfs device for a memory block */ @@ -80,6 +87,7 @@ int register_memory(struct memory_block *memory) memory->dev.bus = &memory_subsys; memory->dev.id = memory->start_section_nr / sections_per_block; + memory->dev.release = memory_block_release; error = device_register(&memory->dev); return error; @@ -635,7 +643,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section, mem_remove_simple_file(mem, phys_device); mem_remove_simple_file(mem, removable); unregister_memory(mem); - kfree(mem); } else kobject_put(&mem->dev.kobj); -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [Patch v4 5/8] suppress "Device nodeX does not have a release() function" warning Date: Wed, 31 Oct 2012 19:23:11 +0800 Message-ID: <1351682594-17347-6-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Return-path: In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , David Rientjes , Minchan Kim , Wen Congyang List-Id: linux-acpi@vger.kernel.org 
From: Yasuaki Ishimatsu When calling unregister_node(), the function shows following message at device_release(). "Device 'node2' does not have a release() function, it is broken and must be fixed." The reason is node's device struct does not have a release() function. So the patch registers node_device_release() to the device's release() function for suppressing the warning message. Additionally, the patch adds memset() to initialize a node struct into register_node(). Because the node struct is part of node_devices[] array and it cannot be freed by node_device_release(). So if system reuses the node struct, it has a garbage. CC: David Rientjes CC: Jiang Liu Cc: Minchan Kim CC: Andrew Morton CC: KOSAKI Motohiro Signed-off-by: Yasuaki Ishimatsu Signed-off-by: Wen Congyang --- drivers/base/node.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 28216ce..4282e82 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -252,6 +252,24 @@ static inline void hugetlb_register_node(struct node *node) {} static inline void hugetlb_unregister_node(struct node *node) {} #endif +static void node_device_release(struct device *dev) +{ + struct node *node = to_node(dev); + +#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) + /* + * We schedule the work only when a memory section is + * onlined/offlined on this node. When we come here, + * all the memory on this node has been offlined, + * so we won't enqueue new work to this work. + * + * The work is using node->node_work, so we should + * flush work before freeing the memory. + */ + flush_work(&node->node_work); +#endif + kfree(node); +} /* * register_node - Setup a sysfs device for a node. 
@@ -265,6 +283,7 @@ int register_node(struct node *node, int num, struct node *parent) node->dev.id = num; node->dev.bus = &node_subsys; + node->dev.release = node_device_release; error = device_register(&node->dev); if (!error){ @@ -586,7 +605,6 @@ int register_one_node(int nid) void unregister_one_node(int nid) { unregister_node(node_devices[nid]); - kfree(node_devices[nid]); node_devices[nid] = NULL; } -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [Patch v4 4/8] numa: convert static memory to dynamically allocated memory for per node device Date: Wed, 31 Oct 2012 19:23:10 +0800 Message-ID: <1351682594-17347-5-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Return-path: In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Minchan Kim List-Id: linux-acpi@vger.kernel.org We use a static array to store struct node. In many cases, we don't have too many nodes, and some memory will be unused. Convert it to per-device dynamically allocated memory. 
CC: David Rientjes CC: Jiang Liu Cc: Minchan Kim CC: Andrew Morton CC: KOSAKI Motohiro CC: Yasuaki Ishimatsu Signed-off-by: Wen Congyang --- arch/powerpc/kernel/sysfs.c | 4 ++-- drivers/base/node.c | 38 ++++++++++++++++++++++---------------- include/linux/node.h | 2 +- mm/hugetlb.c | 4 ++-- 4 files changed, 27 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index cf357a0..3ce1f86 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -607,7 +607,7 @@ static void register_nodes(void) int sysfs_add_device_to_node(struct device *dev, int nid) { - struct node *node = &node_devices[nid]; + struct node *node = node_devices[nid]; return sysfs_create_link(&node->dev.kobj, &dev->kobj, kobject_name(&dev->kobj)); } @@ -615,7 +615,7 @@ EXPORT_SYMBOL_GPL(sysfs_add_device_to_node); void sysfs_remove_device_from_node(struct device *dev, int nid) { - struct node *node = &node_devices[nid]; + struct node *node = node_devices[nid]; sysfs_remove_link(&node->dev.kobj, kobject_name(&dev->kobj)); } EXPORT_SYMBOL_GPL(sysfs_remove_device_from_node); diff --git a/drivers/base/node.c b/drivers/base/node.c index af1a177..28216ce 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -306,7 +306,7 @@ void unregister_node(struct node *node) device_unregister(&node->dev); } -struct node node_devices[MAX_NUMNODES]; +struct node *node_devices[MAX_NUMNODES]; /* * register cpu under node @@ -323,15 +323,15 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid) if (!obj) return 0; - ret = sysfs_create_link(&node_devices[nid].dev.kobj, + ret = sysfs_create_link(&node_devices[nid]->dev.kobj, &obj->kobj, kobject_name(&obj->kobj)); if (ret) return ret; return sysfs_create_link(&obj->kobj, - &node_devices[nid].dev.kobj, - kobject_name(&node_devices[nid].dev.kobj)); + &node_devices[nid]->dev.kobj, + kobject_name(&node_devices[nid]->dev.kobj)); } int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) 
@@ -345,10 +345,10 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) if (!obj) return 0; - sysfs_remove_link(&node_devices[nid].dev.kobj, + sysfs_remove_link(&node_devices[nid]->dev.kobj, kobject_name(&obj->kobj)); sysfs_remove_link(&obj->kobj, - kobject_name(&node_devices[nid].dev.kobj)); + kobject_name(&node_devices[nid]->dev.kobj)); return 0; } @@ -390,15 +390,15 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid) continue; if (page_nid != nid) continue; - ret = sysfs_create_link_nowarn(&node_devices[nid].dev.kobj, + ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj, &mem_blk->dev.kobj, kobject_name(&mem_blk->dev.kobj)); if (ret) return ret; return sysfs_create_link_nowarn(&mem_blk->dev.kobj, - &node_devices[nid].dev.kobj, - kobject_name(&node_devices[nid].dev.kobj)); + &node_devices[nid]->dev.kobj, + kobject_name(&node_devices[nid]->dev.kobj)); } /* mem section does not span the specified node */ return 0; @@ -431,10 +431,10 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk, continue; if (node_test_and_set(nid, *unlinked_nodes)) continue; - sysfs_remove_link(&node_devices[nid].dev.kobj, + sysfs_remove_link(&node_devices[nid]->dev.kobj, kobject_name(&mem_blk->dev.kobj)); sysfs_remove_link(&mem_blk->dev.kobj, - kobject_name(&node_devices[nid].dev.kobj)); + kobject_name(&node_devices[nid]->dev.kobj)); } NODEMASK_FREE(unlinked_nodes); return 0; @@ -500,7 +500,7 @@ static void node_hugetlb_work(struct work_struct *work) static void init_node_hugetlb_work(int nid) { - INIT_WORK(&node_devices[nid].node_work, node_hugetlb_work); + INIT_WORK(&node_devices[nid]->node_work, node_hugetlb_work); } static int node_memory_callback(struct notifier_block *self, @@ -517,7 +517,7 @@ static int node_memory_callback(struct notifier_block *self, * when transitioning to/from memoryless state. 
*/ if (nid != NUMA_NO_NODE) - schedule_work(&node_devices[nid].node_work); + schedule_work(&node_devices[nid]->node_work); break; case MEM_GOING_ONLINE: @@ -558,9 +558,13 @@ int register_one_node(int nid) struct node *parent = NULL; if (p_node != nid) - parent = &node_devices[p_node]; + parent = node_devices[p_node]; - error = register_node(&node_devices[nid], nid, parent); + node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL); + if (!node_devices[nid]) + return -ENOMEM; + + error = register_node(node_devices[nid], nid, parent); /* link cpu under this node */ for_each_present_cpu(cpu) { @@ -581,7 +585,9 @@ int register_one_node(int nid) void unregister_one_node(int nid) { - unregister_node(&node_devices[nid]); + unregister_node(node_devices[nid]); + kfree(node_devices[nid]); + node_devices[nid] = NULL; } /* diff --git a/include/linux/node.h b/include/linux/node.h index 624e53c..10316f1 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -27,7 +27,7 @@ struct node { }; struct memory_block; -extern struct node node_devices[]; +extern struct node *node_devices[]; typedef void (*node_registration_func_t)(struct node *); extern int register_node(struct node *, int, struct node *); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 59a0059..1ef2cd4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1800,7 +1800,7 @@ static void hugetlb_unregister_all_nodes(void) * remove hstate attributes from any nodes that have them. */ for (nid = 0; nid < nr_node_ids; nid++) - hugetlb_unregister_node(&node_devices[nid]); + hugetlb_unregister_node(node_devices[nid]); } /* @@ -1845,7 +1845,7 @@ static void hugetlb_register_all_nodes(void) int nid; for_each_node_state(nid, N_HIGH_MEMORY) { - struct node *node = &node_devices[nid]; + struct node *node = node_devices[nid]; if (node->dev.id == nid) hugetlb_register_node(node); } -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wen Congyang
Subject: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch
Date: Wed, 31 Oct 2012 19:23:09 +0800
Message-ID: <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com>
References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com>
Return-path:
In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com>
Sender: owner-linux-mm@kvack.org
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org
Cc: Jiang Liu, Len Brown, Andrew Morton, KOSAKI Motohiro, Yasuaki Ishimatsu, rjw@sisk.pl, Lai Jiangshan, Wen Congyang, David Rientjes, Benjamin Herrenschmidt, Paul Mackerras, Christoph Lameter, Minchan Kim, Dave Hansen, Mel Gorman
List-Id: linux-acpi@vger.kernel.org

NR_FREE_PAGES will be wrong after offlining pages. We add/dec NR_FREE_PAGES
like this now:
1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
2. don't add NR_FREE_PAGES when a page is freed and the migratetype is
   MIGRATE_ISOLATE
3. dec NR_FREE_PAGES when offlining isolated pages.
4. add NR_FREE_PAGES when undoing isolate pages.

When we come to step 3, all pages are in the MIGRATE_ISOLATE list, and
NR_FREE_PAGES is right.

When we come to step 4, none of the pages are in the buddy system, so we
don't change NR_FREE_PAGES in this step, but we have already changed
NR_FREE_PAGES in step 3. So NR_FREE_PAGES is wrong after offlining pages.
So there is no need to change NR_FREE_PAGES in step 3.

This patch also fixes a problem in step 2: if the migratetype is
MIGRATE_ISOLATE, we should not add NR_FREE_PAGES when we remove pages
from pcppages.

Signed-off-by: Wen Congyang Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Cc: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Andrew Morton --- mm/page_alloc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5b74de6..a7cd2d1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count, /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ __free_one_page(page, zone, 0, mt); trace_mm_page_pcpu_drain(page, 0, mt); - if (is_migrate_cma(mt)) - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); + if (likely(mt != MIGRATE_ISOLATE)) { + __mod_zone_page_state(zone, NR_FREE_PAGES, 1); + if (is_migrate_cma(mt)) + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); + } } while (--to_free && --batch_free && !list_empty(list)); } - __mod_zone_page_state(zone, NR_FREE_PAGES, count); spin_unlock(&zone->lock); } @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) list_del(&page->lru); rmv_page_order(page); zone->free_area[order].nr_free--; - __mod_zone_page_state(zone, NR_FREE_PAGES, - - (1UL << order)); for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order); -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [Patch v4 8/8] memory-hotplug: allocate zone's pcp before onlining pages Date: Wed, 31 Oct 2012 19:23:14 +0800 Message-ID: <1351682594-17347-9-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Return-path: In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman List-Id: linux-acpi@vger.kernel.org We use __free_page() to put a page to buddy system when onlining pages. __free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we should allocate zone's pcp before onlining pages, otherwise we will lose some free pages. Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Cc: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Wen Congyang --- mm/memory_hotplug.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 72f4fef..63ea7df 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -505,12 +505,16 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages) * So, zonelist must be updated after online. 
*/ mutex_lock(&zonelists_mutex); - if (!populated_zone(zone)) + if (!populated_zone(zone)) { need_zonelists_rebuild = 1; + build_all_zonelists(NULL, zone); + } ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages, online_pages_range); if (ret) { + if (need_zonelists_rebuild) + zone_pcp_reset(zone); mutex_unlock(&zonelists_mutex); printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n", (unsigned long long) pfn << PAGE_SHIFT, @@ -526,7 +530,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages) if (onlined_pages) { node_set_state(zone_to_nid(zone), N_HIGH_MEMORY); if (need_zonelists_rebuild) - build_all_zonelists(NULL, zone); + build_all_zonelists(NULL, NULL); else zone_pcp_update(zone); } -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [Patch v4 7/8] memory-hotplug: current hwpoison doesn't support memory offline Date: Wed, 31 Oct 2012 19:23:13 +0800 Message-ID: <1351682594-17347-8-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Return-path: In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Andi Kleen , Dave Hansen , Mel Gorman List-Id: linux-acpi@vger.kernel.org hwpoisoned may be set when we offline a page by the sysfs interface /sys/devices/system/memory/soft_offline_page or /sys/devices/system/memory/hard_offline_page. 
If a page is hwpoisoned, we may hit the following problems when
offlining/removing the memory:
1. the pages can't be offlined.
   If a page is hwpoisoned, it can't be freed while it is online, and it
   will not be in the free list. So we can't offline these pages again.
   So we should skip such pages when offlining pages.

2. mce_bad_pages is wrong after removing a memory.
   When we hotremove a memory device, we free the memory that stores
   struct page. If the page is hwpoisoned, we should decrease mce_bad_pages.

Cc: David Rientjes
Cc: Jiang Liu
Cc: Len Brown
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Christoph Lameter
Cc: Minchan Kim
Cc: KOSAKI Motohiro
Cc: Yasuaki Ishimatsu
Cc: Andi Kleen
Cc: Dave Hansen
Cc: Mel Gorman
Signed-off-by: Wen Congyang
---
 include/linux/page-isolation.h | 10 ++++++----
 mm/memory-failure.c            |  2 +-
 mm/memory_hotplug.c            |  5 +++--
 mm/page_alloc.c                | 27 +++++++++++++++++++++++----
 mm/page_isolation.c            | 27 ++++++++++++++++++++-------
 mm/sparse.c                    | 22 ++++++++++++++++++++++
 6 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 76a9539..a92061e 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,7 +2,8 @@
 #define __LINUX_PAGEISOLATION_H
 
 
-bool has_unmovable_pages(struct zone *zone, struct page *page, int count);
+bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
+				bool skip_hwpoisoned_pages);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
 				int migratetype);
@@ -21,7 +22,7 @@ int move_freepages(struct zone *zone,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype);
+			 unsigned migratetype, bool skip_hwpoisoned_pages);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
@@ -34,12 +35,13 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, /* * Test all pages in [start_pfn, end_pfn) are isolated or not. */ -int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, + bool skip_hwpoisoned_pages); /* * Internal functions. Changes pageblock's migrate type. */ -int set_migratetype_isolate(struct page *page); +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages); void unset_migratetype_isolate(struct page *page, unsigned migratetype); struct page *alloc_migrate_target(struct page *page, unsigned long private, int **resultp); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 6c5899b..1abffee 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1385,7 +1385,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags) * Isolate the page, so that it doesn't get reallocated if it * was free. */ - set_migratetype_isolate(p); + set_migratetype_isolate(p, true); /* * When the target page is a free hugepage, just remove it * from free hugepage list. 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 56b758a..72f4fef 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -854,7 +854,7 @@ check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages, { int ret; long offlined = *(long *)data; - ret = test_pages_isolated(start_pfn, start_pfn + nr_pages); + ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true); offlined = nr_pages; if (!ret) *(long *)data += offlined; @@ -901,7 +901,8 @@ static int __ref __offline_pages(unsigned long start_pfn, nr_pages = end_pfn - start_pfn; /* set above range as isolated */ - ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + ret = start_isolate_page_range(start_pfn, end_pfn, + MIGRATE_MOVABLE, true); if (ret) goto out; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a7cd2d1..027afd0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5577,7 +5577,8 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags, * MIGRATE_MOVABLE block might include unmovable pages. It means you can't * expect this function should be exact. */ -bool has_unmovable_pages(struct zone *zone, struct page *page, int count) +bool has_unmovable_pages(struct zone *zone, struct page *page, int count, + bool skip_hwpoisoned_pages) { unsigned long pfn, iter, found; int mt; @@ -5612,6 +5613,13 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count) continue; } + /* + * The HWPoisoned page may be not in buddy system, and + * page_count() is not 0. 
+ */ + if (skip_hwpoisoned_pages && PageHWPoison(page)) + continue; + if (!PageLRU(page)) found++; /* @@ -5654,7 +5662,7 @@ bool is_pageblock_removable_nolock(struct page *page) zone->zone_start_pfn + zone->spanned_pages <= pfn) return false; - return !has_unmovable_pages(zone, page, 0); + return !has_unmovable_pages(zone, page, 0, true); } #ifdef CONFIG_CMA @@ -5825,7 +5833,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, */ ret = start_isolate_page_range(pfn_max_align_down(start), - pfn_max_align_up(end), migratetype); + pfn_max_align_up(end), migratetype, + false); if (ret) return ret; @@ -5864,7 +5873,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, } /* Make sure the range is really isolated. */ - if (test_pages_isolated(outer_start, end)) { + if (test_pages_isolated(outer_start, end, false)) { pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n", outer_start, end); ret = -EBUSY; @@ -5979,6 +5988,16 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) continue; } page = pfn_to_page(pfn); + /* + * The HWPoisoned page may be not in buddy system, and + * page_count() is not 0. + */ + if (unlikely(!PageBuddy(page) && PageHWPoison(page))) { + pfn++; + SetPageReserved(page); + continue; + } + BUG_ON(page_count(page)); BUG_ON(!PageBuddy(page)); order = page_order(page); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index f2f5b48..9d2264e 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -30,7 +30,7 @@ static void restore_pageblock_isolate(struct page *page, int migratetype) zone->nr_pageblock_isolate--; } -int set_migratetype_isolate(struct page *page) +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages) { struct zone *zone; unsigned long flags, pfn; @@ -66,7 +66,8 @@ int set_migratetype_isolate(struct page *page) * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. * We just check MOVABLE pages. 
*/ - if (!has_unmovable_pages(zone, page, arg.pages_found)) + if (!has_unmovable_pages(zone, page, arg.pages_found, + skip_hwpoisoned_pages)) ret = 0; /* @@ -134,7 +135,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * Returns 0 on success and -EBUSY if any part of range cannot be isolated. */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype) + unsigned migratetype, bool skip_hwpoisoned_pages) { unsigned long pfn; unsigned long undo_pfn; @@ -147,7 +148,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, pfn < end_pfn; pfn += pageblock_nr_pages) { page = __first_valid_page(pfn, pageblock_nr_pages); - if (page && set_migratetype_isolate(page)) { + if (page && + set_migratetype_isolate(page, skip_hwpoisoned_pages)) { undo_pfn = pfn; goto undo; } @@ -190,7 +192,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, * Returns 1 if all pages in the range are isolated. */ static int -__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) +__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn, + bool skip_hwpoisoned_pages) { struct page *page; @@ -220,6 +223,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) else if (page_count(page) == 0 && get_freepage_migratetype(page) == MIGRATE_ISOLATE) pfn += 1; + else if (skip_hwpoisoned_pages && PageHWPoison(page)) { + /* + * The HWPoisoned page may be not in buddy + * system, and page_count() is not 0. 
+ */ + pfn++; + continue; + } else break; } @@ -228,7 +239,8 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) return 1; } -int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, + bool skip_hwpoisoned_pages) { unsigned long pfn, flags; struct page *page; @@ -251,7 +263,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) /* Check all pages are free or Marked as ISOLATED */ zone = page_zone(page); spin_lock_irqsave(&zone->lock, flags); - ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn); + ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn, + skip_hwpoisoned_pages); spin_unlock_irqrestore(&zone->lock, flags); return ret ? 0 : -EBUSY; } diff --git a/mm/sparse.c b/mm/sparse.c index 0021265..b2d37c6 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -774,6 +774,27 @@ out: return ret; } +#ifdef CONFIG_MEMORY_FAILURE +static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) +{ + int i; + + if (!memmap) + return; + + for (i = 0; i < PAGES_PER_SECTION; i++) { + if (PageHWPoison(&memmap[i])) { + atomic_long_sub(1, &mce_bad_pages); + ClearPageHWPoison(&memmap[i]); + } + } +} +#else +static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) +{ +} +#endif + void sparse_remove_one_section(struct zone *zone, struct mem_section *ms) { struct page *memmap = NULL; @@ -787,6 +808,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms) ms->pageblock_flags = NULL; } + clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION); free_section_usemap(memmap, usemap); } #endif -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
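A minimal user-space sketch of the two problems described in the changelog above, under loud assumptions: this is not kernel code, `struct hw_page` and every `hw_` function are hypothetical stand-ins invented for illustration, and only the skip flag and the bad-page counter mirror names from the patch. It shows (1) an isolation test that tolerates a hwpoisoned page only when asked to, and (2) dropping poisoned pages from a bad-page counter before the memmap goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the state this patch looks at. */
struct hw_page {
	bool free_isolated;	/* free and on the MIGRATE_ISOLATE list */
	bool hwpoison;		/* models PageHWPoison() */
};

/*
 * Models the isolation test: the range counts as isolated if every page is
 * free-and-isolated, or is hwpoisoned while the caller asked to skip
 * hwpoisoned pages.
 */
static bool hw_range_isolated(const struct hw_page *pages, size_t n,
			      bool skip_hwpoisoned_pages)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].free_isolated)
			continue;
		if (skip_hwpoisoned_pages && pages[i].hwpoison)
			continue;
		return false;
	}
	return true;
}

/*
 * Models clearing poison before the memmap is freed: every poisoned page is
 * removed from the bad-page counter. Returns the number of pages cleared.
 */
static long hw_clear_poisoned(struct hw_page *memmap, size_t n, long *bad_pages)
{
	long cleared = 0;

	if (!memmap)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (memmap[i].hwpoison) {
			memmap[i].hwpoison = false;
			(*bad_pages)--;
			cleared++;
		}
	}
	return cleared;
}

/* Scenario: 4 pages, page 2 is hwpoisoned and therefore not on the free list. */
static bool hw_offline_succeeds(bool skip_hwpoisoned_pages)
{
	struct hw_page pages[4] = {
		{ true, false }, { true, false }, { false, true }, { true, false },
	};

	return hw_range_isolated(pages, 4, skip_hwpoisoned_pages);
}

/* Scenario: one poisoned page is accounted, then its section is removed. */
static long hw_bad_pages_after_remove(void)
{
	struct hw_page memmap[4] = {
		{ true, false }, { false, true }, { true, false }, { true, false },
	};
	long bad_pages = 1;

	hw_clear_poisoned(memmap, 4, &bad_pages);
	return bad_pages;
}
```

In this toy model, offlining fails without the skip flag and succeeds with it, and the counter returns to zero once the section is removed; the CMA path in the patch passes `false` for the flag, which corresponds to the strict variant here.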
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [Patch v4 2/8] memory-hotplug: auto offline page_cgroup when onlining memory block failed Date: Wed, 31 Oct 2012 19:23:08 +0800 Message-ID: <1351682594-17347-3-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Return-path: Received: from cn.fujitsu.com ([222.73.24.84]:7755 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S935473Ab2JaMAG (ORCPT ); Wed, 31 Oct 2012 08:00:06 -0400 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: linux-acpi-owner@vger.kernel.org List-Id: linux-acpi@vger.kernel.org To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman

When a memory block is onlined, we try to allocate memory on that node to store page_cgroup. If onlining the memory block fails, we don't offline the page cgroup, and we have no chance to offline it unless the memory block is later onlined successfully. As a result, the memory device on that node can't be hot-removed, because some of its memory is still used to store page cgroup.

If onlining the memory block fails, there is no need to store page cgroup for this memory, so automatically offline page_cgroup in that case.
Signed-off-by: Wen Congyang Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Acked-by: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Andrew Morton --- mm/page_cgroup.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c index 5ddad0c..44db00e 100644 --- a/mm/page_cgroup.c +++ b/mm/page_cgroup.c @@ -251,6 +251,9 @@ static int __meminit page_cgroup_callback(struct notifier_block *self, mn->nr_pages, mn->status_change_nid); break; case MEM_CANCEL_ONLINE: + offline_page_cgroup(mn->start_pfn, + mn->nr_pages, mn->status_change_nid); + break; case MEM_GOING_OFFLINE: break; case MEM_ONLINE: -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jianguo Wu Subject: Re: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch Date: Wed, 31 Oct 2012 21:41:25 +0800 Message-ID: <50912A85.5090808@gmail.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: Received: from mail-pa0-f46.google.com ([209.85.220.46]:41235 "EHLO mail-pa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756605Ab2JaNlh (ORCPT ); Wed, 31 Oct 2012 09:41:37 -0400 In-Reply-To: <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com> Sender: linux-acpi-owner@vger.kernel.org List-Id: linux-acpi@vger.kernel.org To: Wen Congyang Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman On 2012/10/31 19:23, Wen Congyang wrote: > NR_FREE_PAGES will be wrong after offlining pages. 
We add/dec > NR_FREE_PAGES like this now: > > 1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES > > 2. don't add NR_FREE_PAGES when it is freed and the migratetype is > MIGRATE_ISOLATE > > 3. dec NR_FREE_PAGES when offlining isolated pages. > > 4. add NR_FREE_PAGES when undoing isolate pages. > > When we come to step 3, all pages are in MIGRATE_ISOLATE list, and > NR_FREE_PAGES is right. When we come to step4, all pages are not in > buddy system, so we don't change NR_FREE_PAGES in this step, but we change > NR_FREE_PAGES in step3. So NR_FREE_PAGES is wrong after offlining pages. > So there is no need to change NR_FREE_PAGES in step3. > > This patch also fixes a problem in step2: if the migratetype is > MIGRATE_ISOLATE, we should not add NR_FREE_PAGES when we remove pages from > pcppages. > > Signed-off-by: Wen Congyang > Cc: David Rientjes > Cc: Jiang Liu > Cc: Len Brown > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Christoph Lameter > Cc: Minchan Kim > Cc: KOSAKI Motohiro > Cc: Yasuaki Ishimatsu > Cc: Dave Hansen > Cc: Mel Gorman > Signed-off-by: Andrew Morton > --- > mm/page_alloc.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 5b74de6..a7cd2d1 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count, > /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ > __free_one_page(page, zone, 0, mt); > trace_mm_page_pcpu_drain(page, 0, mt); > - if (is_migrate_cma(mt)) > - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); > + if (likely(mt != MIGRATE_ISOLATE)) {

Hi Congyang,

I think mt != MIGRATE_ISOLATE is always true here, because the migratetype of a page taken from the PCP list is < MIGRATE_PCPTYPES. When isolating a page, we change the pageblock's migratetype to MIGRATE_ISOLATE, but set_freepage_migratetype() isn't called. Maybe we can use mt = get_pageblock_migratetype() here?

Thanks,
Jianguo Wu.
> + __mod_zone_page_state(zone, NR_FREE_PAGES, 1); > + if (is_migrate_cma(mt)) > + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); > + } > } while (--to_free && --batch_free && !list_empty(list)); > } > - __mod_zone_page_state(zone, NR_FREE_PAGES, count); > spin_unlock(&zone->lock); > } > > @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > list_del(&page->lru); > rmv_page_order(page); > zone->free_area[order].nr_free--; > - __mod_zone_page_state(zone, NR_FREE_PAGES, > - - (1UL << order)); > for (i = 0; i < (1 << order); i++) > SetPageReserved((page+i)); > pfn += (1 << order); > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: [PATCH] memory-hotplug: fix NR_FREE_PAGES mismatch's fix Date: Thu, 01 Nov 2012 10:55:01 +0800 Message-ID: <5091E485.7090409@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: Received: from cn.fujitsu.com ([222.73.24.84]:6642 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1755019Ab2KACtS (ORCPT ); Wed, 31 Oct 2012 22:49:18 -0400 In-Reply-To: <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com> Sender: linux-acpi-owner@vger.kernel.org List-Id: linux-acpi@vger.kernel.org To: Andrew Morton Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Jiang Liu , Len Brown , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman , Jianguo wu When a page is freed and put into pcp list, get_freepage_migratetype() doesn't return MIGRATE_ISOLATE even if this pageblock is isolated. So we should use get_pageblock_migratetype() instead of mt to check whether it is isolated. 
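The reasoning above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: `struct pcp_page`, `enum pcp_mt` and all `pcp_` functions are hypothetical names made up here. The cached field plays the role of get_freepage_migratetype(), the pointer plays the role of get_pageblock_migratetype(), and the model counts how many drained pages would be added to NR_FREE_PAGES depending on which of the two is compared against the isolate type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified migrate types; the values are illustrative only. */
enum pcp_mt { PCP_UNMOVABLE, PCP_MOVABLE, PCP_ISOLATE };

struct pcp_page {
	enum pcp_mt freepage_mt;		/* cached when the page entered the pcp list */
	const enum pcp_mt *pageblock_mt;	/* current type of the containing pageblock */
};

/*
 * Models the NR_FREE_PAGES update while draining a pcp list: returns how many
 * pages would be counted as free. With use_pageblock_mt == false the check
 * uses the cached type, which was recorded before the block was isolated.
 */
static long pcp_drain_free_count(const struct pcp_page *pages, int n,
				 bool use_pageblock_mt)
{
	long nr_free = 0;

	for (int i = 0; i < n; i++) {
		enum pcp_mt mt = use_pageblock_mt ? *pages[i].pageblock_mt
						  : pages[i].freepage_mt;

		if (mt != PCP_ISOLATE)
			nr_free++;
	}
	return nr_free;
}

/* Scenario: up to 8 pages go to the pcp list, then their pageblock is isolated. */
static long pcp_isolated_block_free_count(int n, bool use_pageblock_mt)
{
	enum pcp_mt block = PCP_MOVABLE;
	struct pcp_page pages[8];

	if (n > 8)
		n = 8;
	for (int i = 0; i < n; i++) {
		pages[i].freepage_mt = block;	/* cached at free time */
		pages[i].pageblock_mt = &block;
	}
	block = PCP_ISOLATE;			/* isolation does not update the cache */
	return pcp_drain_free_count(pages, n, use_pageblock_mt);
}
```

In this model the cached type never equals the isolate type, so all drained pages are counted as free; consulting the pageblock type is what lets the drain skip pages in an isolated block.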
Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Cc: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Dave Hansen Cc: Mel Gorman Cc: Jianguo Wu Signed-off-by: Wen Congyang --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 027afd0..e9c19d2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -667,7 +667,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ __free_one_page(page, zone, 0, mt); trace_mm_page_pcpu_drain(page, 0, mt); - if (likely(mt != MIGRATE_ISOLATE)) { + if (likely(mt != get_pageblock_migratetype(page))) { __mod_zone_page_state(zone, NR_FREE_PAGES, 1); if (is_migrate_cma(mt)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wen Congyang Subject: Re: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch Date: Thu, 01 Nov 2012 11:00:07 +0800 Message-ID: <5091E5B7.80308@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com> <50912A85.5090808@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <50912A85.5090808@gmail.com> Sender: linux-kernel-owner@vger.kernel.org To: Jianguo Wu Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman List-Id: linux-acpi@vger.kernel.org At 10/31/2012 09:41 PM, Jianguo Wu Wrote: > On 2012/10/31 19:23, Wen Congyang wrote: >> NR_FREE_PAGES will be wrong after offlining pages. We add/dec >> NR_FREE_PAGES like this now: >> >> 1. 
move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES >> >> 2. don't add NR_FREE_PAGES when it is freed and the migratetype is >> MIGRATE_ISOLATE >> >> 3. dec NR_FREE_PAGES when offlining isolated pages. >> >> 4. add NR_FREE_PAGES when undoing isolate pages. >> >> When we come to step 3, all pages are in MIGRATE_ISOLATE list, and >> NR_FREE_PAGES is right. When we come to step4, all pages are not in >> buddy system, so we don't change NR_FREE_PAGES in this step, but we change >> NR_FREE_PAGES in step3. So NR_FREE_PAGES is wrong after offlining pages. >> So there is no need to change NR_FREE_PAGES in step3. >> >> This patch also fixes a problem in step2: if the migratetype is >> MIGRATE_ISOLATE, we should not add NR_FREE_PAGES when we remove pages from >> pcppages. >> >> Signed-off-by: Wen Congyang >> Cc: David Rientjes >> Cc: Jiang Liu >> Cc: Len Brown >> Cc: Benjamin Herrenschmidt >> Cc: Paul Mackerras >> Cc: Christoph Lameter >> Cc: Minchan Kim >> Cc: KOSAKI Motohiro >> Cc: Yasuaki Ishimatsu >> Cc: Dave Hansen >> Cc: Mel Gorman >> Signed-off-by: Andrew Morton >> --- >> mm/page_alloc.c | 10 +++++----- >> 1 file changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 5b74de6..a7cd2d1 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count, >> /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ >> __free_one_page(page, zone, 0, mt); >> trace_mm_page_pcpu_drain(page, 0, mt); >> - if (is_migrate_cma(mt)) >> - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); >> + if (likely(mt != MIGRATE_ISOLATE)) { > > Hi Congyang, > I think mt != MIGRATE_ISOLATE is always true here, > page from PCP's migratetype < MIGRATE_PCPTYPES. > When isolate page, we change pageblock's migratetype to MIGRATE_ISOLATE, > but set_freepage_migratetype() isn't called. > Maybe we can use mt = get_pageblock_migratetype() here ?

Yes, you are right.
I have sent a fix patch. Thanks for pointing it out. Wen Congyang > > Thanks, > Jianguo Wu. > >> + __mod_zone_page_state(zone, NR_FREE_PAGES, 1); >> + if (is_migrate_cma(mt)) >> + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); >> + } >> } while (--to_free && --batch_free && !list_empty(list)); >> } >> - __mod_zone_page_state(zone, NR_FREE_PAGES, count); >> spin_unlock(&zone->lock); >> } >> >> @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) >> list_del(&page->lru); >> rmv_page_order(page); >> zone->free_area[order].nr_free--; >> - __mod_zone_page_state(zone, NR_FREE_PAGES, >> - - (1UL << order)); >> for (i = 0; i < (1 << order); i++) >> SetPageReserved((page+i)); >> pfn += (1 << order); >> > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx136.postini.com [74.125.245.136]) by kanga.kvack.org (Postfix) with SMTP id 4FC1D6B006C for ; Wed, 31 Oct 2012 07:48:22 -0400 (EDT) From: Wen Congyang Subject: [Patch v4 1/8] memory hotplug: suppress "Device memoryX does not have a release() function" warning Date: Wed, 31 Oct 2012 19:23:07 +0800 Message-Id: <1351682594-17347-2-git-send-email-wency@cn.fujitsu.com> In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Minchan Kim , Wen Congyang , Greg KH From: Yasuaki Ishimatsu When calling remove_memory_block(), the function shows following message at device_release(). "Device 'memory528' does not have a release() function, it is broken and must be fixed." The reason is memory_block's device struct does not have a release() function. 
So the patch registers memory_block_release() to the device's release() function for suppressing the warning message. Additionally, the patch moves kfree(mem) into the release function since the release function is prepared as a means to free a memory_block struct. Signed-off-by: Yasuaki Ishimatsu Acked-by: David Rientjes Cc: Jiang Liu Cc: Minchan Kim Acked-by: KOSAKI Motohiro Cc: Wen Congyang Cc: Greg KH Signed-off-by: Andrew Morton --- drivers/base/memory.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/base/memory.c b/drivers/base/memory.c index 86c8821..7eb1211 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -70,6 +70,13 @@ void unregister_memory_isolate_notifier(struct notifier_block *nb) } EXPORT_SYMBOL(unregister_memory_isolate_notifier); +static void memory_block_release(struct device *dev) +{ + struct memory_block *mem = container_of(dev, struct memory_block, dev); + + kfree(mem); +} + /* * register_memory - Setup a sysfs device for a memory block */ @@ -80,6 +87,7 @@ int register_memory(struct memory_block *memory) memory->dev.bus = &memory_subsys; memory->dev.id = memory->start_section_nr / sections_per_block; + memory->dev.release = memory_block_release; error = device_register(&memory->dev); return error; @@ -635,7 +643,6 @@ int remove_memory_block(unsigned long node_id, struct mem_section *section, mem_remove_simple_file(mem, phys_device); mem_remove_simple_file(mem, removable); unregister_memory(mem); - kfree(mem); } else kobject_put(&mem->dev.kobj); -- 1.8.0 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
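A minimal user-space sketch of the release-callback pattern this patch relies on, assuming a much simplified refcounted device. All `toy_` names are hypothetical and exist only for this example; the real struct device, device_register() and kobject reference counting are considerably more involved. The point it illustrates is that the containing object is freed from the release hook once the last reference is dropped, rather than by the code that unregisters it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of struct device: a refcount plus a release hook. */
struct toy_device {
	int refcount;
	void (*release)(struct toy_device *dev);
};

struct toy_memory_block {
	unsigned long start_section_nr;
	struct toy_device dev;		/* embedded, as in struct memory_block */
};

/* Same idea as the kernel's container_of(): member pointer -> outer struct. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int toy_blocks_freed;		/* bookkeeping for the example only */

static void toy_memory_block_release(struct toy_device *dev)
{
	struct toy_memory_block *mem =
		toy_container_of(dev, struct toy_memory_block, dev);

	free(mem);
	toy_blocks_freed++;
}

/* Drop one reference; the object is freed only by its release hook. */
static void toy_put_device(struct toy_device *dev)
{
	if (--dev->refcount == 0 && dev->release)
		dev->release(dev);
}

/* Register a block, take and drop an extra reference, then unregister it. */
static int toy_register_and_remove(void)
{
	struct toy_memory_block *mem = calloc(1, sizeof(*mem));

	if (!mem)
		return -1;
	mem->dev.refcount = 1;
	mem->dev.release = toy_memory_block_release;

	mem->dev.refcount++;		/* e.g. another user still holds the device */
	toy_put_device(&mem->dev);	/* "unregister": not freed yet */
	int freed_early = toy_blocks_freed;
	toy_put_device(&mem->dev);	/* last reference: release() frees it */
	return toy_blocks_freed - freed_early;
}
```

Freeing from the hook means a block that is still referenced when it is unregistered stays valid until the final put, which a direct kfree() after unregistering could not guarantee.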
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965413Ab2JaL0c (ORCPT ); Wed, 31 Oct 2012 07:26:32 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:14146 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S935399Ab2JaL0Z (ORCPT ); Wed, 31 Oct 2012 07:26:25 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108772" Message-ID: <50910C39.70305@cn.fujitsu.com> Date: Wed, 31 Oct 2012 19:32:09 +0800 From: Wen Congyang User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100413 Fedora/3.0.4-2.fc13 Thunderbird/3.0.4 MIME-Version: 1.0 CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan Subject: Re: [Patch v4 0/8] bugfix for memory hotplug References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September
15, 2011) at 2012/10/31 19:25:38, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:25:39, Serialize complete at 2012/10/31 19:25:39 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=ISO-8859-1 To: unlisted-recipients:; (no To-header on input) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org At 10/31/2012 07:23 PM, Wen Congyang Wrote: > The last version is here: > https://lkml.org/lkml/2012/10/19/56 > > Note: patch 1-3 are in -mm tree and I don't touch them. The other patches > except patch6 are also in mm tree. Patch 6 is not touched. > > Changes from v3 to v4: > Patch4: use dynamically allocated memory instead of static array. > Patch5: merge [patchv3 2-3] into a single patch, and update it as we use > dynamically allocated memory > Patch7: merge [patchv3 5-6] into a single patch > Patch8: merge [patchv3 9] and its fix into a patch Note: The patch from Michal Hocko is not merged into patch8 Thanks Wen Congyang > > Changes from v2 to v3: > Merge the bug fix from ishimatsu to this patchset(Patch 1-3) > Patch 3: split it from patch as it fixes another bug. > Patch 4: new patch, and fix bad-page state when hotadding a memory > device after hotremoving it. I forgot to post this patch in v2. > Patch 6: update it according to Dave Hansen's comment. > > Changes from v1 to v2: > Patch 1: updated according to kosaki's suggestion > > Patch 2: new patch, and update mce_bad_pages when removing memory. > > Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug > cause oom in my test. > > Patch 5: new patch, and fix a new bug. When repeating to online/offline > pages, the free pages will continue to decrease. 
> > Wen Congyang (6): > memory-hotplug: auto offline page_cgroup when onlining memory block > failed > memory-hotplug: fix NR_FREE_PAGES mismatch > numa: convert static memory to dynamically allocated memory for per > node device > clear the memory to store struct page > memory-hotplug: current hwpoison doesn't support memory offline > memory-hotplug: allocate zone's pcp before onlining pages > > Yasuaki Ishimatsu (2): > memory hotplug: suppress "Device memoryX does not have a release() > function" warning > suppress "Device nodeX does not have a release() function" warning > > arch/powerpc/kernel/sysfs.c | 4 +-- > drivers/base/memory.c | 9 ++++++- > drivers/base/node.c | 56 ++++++++++++++++++++++++++++++------------ > include/linux/node.h | 2 +- > include/linux/page-isolation.h | 10 +++++--- > mm/hugetlb.c | 4 +-- > mm/memory-failure.c | 2 +- > mm/memory_hotplug.c | 13 +++++++--- > mm/page_alloc.c | 37 +++++++++++++++++++++------- > mm/page_cgroup.c | 3 +++ > mm/page_isolation.c | 27 ++++++++++++++------ > mm/sparse.c | 25 ++++++++++++++++++- > 12 files changed, 144 insertions(+), 48 deletions(-) > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935469Ab2JaLsi (ORCPT ); Wed, 31 Oct 2012 07:48:38 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:51732 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1757006Ab2JaLs1 (ORCPT ); Wed, 31 Oct 2012 07:48:27 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108900" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , David Rientjes , Minchan Kim , Wen Congyang Subject: [Patch v4 5/8] suppress "Device nodeX does not have a release() function" warning Date: Wed, 31 Oct 2012 19:23:11 +0800 Message-Id: 
<1351682594-17347-6-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:45, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:51, Serialize complete at 2012/10/31 19:16:51 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Yasuaki Ishimatsu When calling unregister_node(), the function shows following message at device_release(). "Device 'node2' does not have a release() function, it is broken and must be fixed." The reason is node's device struct does not have a release() function. So the patch registers node_device_release() to the device's release() function for suppressing the warning message. Additionally, the patch adds memset() to initialize a node struct into register_node(). Because the node struct is part of node_devices[] array and it cannot be freed by node_device_release(). So if system reuses the node struct, it has a garbage. 
CC: David Rientjes CC: Jiang Liu Cc: Minchan Kim CC: Andrew Morton CC: KOSAKI Motohiro Signed-off-by: Yasuaki Ishimatsu Signed-off-by: Wen Congyang --- drivers/base/node.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index 28216ce..4282e82 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -252,6 +252,24 @@ static inline void hugetlb_register_node(struct node *node) {} static inline void hugetlb_unregister_node(struct node *node) {} #endif +static void node_device_release(struct device *dev) +{ + struct node *node = to_node(dev); + +#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS) + /* + * We schedule the work only when a memory section is + * onlined/offlined on this node. When we come here, + * all the memory on this node has been offlined, + * so we won't enqueue new work to this work. + * + * The work is using node->node_work, so we should + * flush work before freeing the memory. + */ + flush_work(&node->node_work); +#endif + kfree(node); +} /* * register_node - Setup a sysfs device for a node. 
@@ -265,6 +283,7 @@ int register_node(struct node *node, int num, struct node *parent) node->dev.id = num; node->dev.bus = &node_subsys; + node->dev.release = node_device_release; error = device_register(&node->dev); if (!error){ @@ -586,7 +605,6 @@ int register_one_node(int nid) void unregister_one_node(int nid) { unregister_node(node_devices[nid]); - kfree(node_devices[nid]); node_devices[nid] = NULL; } -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757135Ab2JaLsk (ORCPT ); Wed, 31 Oct 2012 07:48:40 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:55903 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S932711Ab2JaLs1 (ORCPT ); Wed, 31 Oct 2012 07:48:27 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108901" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Minchan Kim Subject: [Patch v4 6/8] clear the memory to store struct page Date: Wed, 31 Oct 2012 19:23:12 +0800 Message-Id: <1351682594-17347-7-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:45, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:51, Serialize complete at 2012/10/31 19:16:51 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org If sparse memory vmemmap is enabled, we can't free the memory to store struct page when a memory device is hotremoved, because we may store struct page in the memory to manage the 
memory which doesn't belong to this memory device. When we hotadded this memory device again, we will reuse this memory to store struct page, and struct page may contain some obsolete information, and we will get bad-page state: [ 59.611278] init_memory_mapping: [mem 0x80000000-0x9fffffff] [ 59.637836] Built 2 zonelists in Node order, mobility grouping on. Total pages: 547617 [ 59.638739] Policy zone: Normal [ 59.650840] BUG: Bad page state in process bash pfn:9b6dc [ 59.651124] page:ffffea0002200020 count:0 mapcount:0 mapping: (null) index:0xfdfdfdfdfdfdfdfd [ 59.651494] page flags: 0x2fdfdfdfd5df9fd(locked|referenced|uptodate|dirty|lru|active|slab|owner_priv_1|private|private_2|writeback|head|tail|swapcache|reclaim|swapbacked|unevictable|uncached|compound_lock) [ 59.653604] Modules linked in: netconsole acpiphp pci_hotplug acpi_memhotplug loop kvm_amd kvm microcode tpm_tis tpm tpm_bios evdev psmouse serio_raw i2c_piix4 i2c_core parport_pc parport processor button thermal_sys ext3 jbd mbcache sg sr_mod cdrom ata_generic virtio_net ata_piix virtio_blk libata virtio_pci virtio_ring virtio scsi_mod [ 59.656998] Pid: 988, comm: bash Not tainted 3.6.0-rc7-guest #12 [ 59.657172] Call Trace: [ 59.657275] [] ? bad_page+0xb0/0x100 [ 59.657434] [] ? free_pages_prepare+0xb3/0x100 [ 59.657610] [] ? free_hot_cold_page+0x48/0x1a0 [ 59.657787] [] ? online_pages_range+0x68/0xa0 [ 59.657961] [] ? __online_page_increment_counters+0x10/0x10 [ 59.658162] [] ? walk_system_ram_range+0x101/0x110 [ 59.658346] [] ? online_pages+0x1a5/0x2b0 [ 59.658515] [] ? __memory_block_change_state+0x20d/0x270 [ 59.658710] [] ? store_mem_state+0xb6/0xf0 [ 59.658878] [] ? sysfs_write_file+0xd2/0x160 [ 59.659052] [] ? vfs_write+0xaa/0x160 [ 59.659212] [] ? sys_write+0x47/0x90 [ 59.659371] [] ? async_page_fault+0x25/0x30 [ 59.659543] [] ? system_call_fastpath+0x16/0x1b [ 59.659720] Disabling lock debugging due to kernel taint This patch clears the memory to store struct page to avoid unexpected error. 
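As an illustration only (not part of the patch): the failure mode described above can be modelled in userspace with a buffer that is reused without being cleared. This is a minimal sketch under stated assumptions. `struct toy_page` and the `toy_*` helpers are hypothetical names invented here, not kernel APIs, and the 0xfd fill byte merely mimics the stale pattern visible in the log above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; only two fields are modelled. */
struct toy_page {
	unsigned long flags;
	unsigned long index;
};

/* Model hot-remove: the memmap storage is kept and left with stale bytes. */
void toy_leave_stale(struct toy_page *memmap, size_t nr_pages)
{
	memset(memmap, 0xfd, sizeof(struct toy_page) * nr_pages);
}

/*
 * Model hot-add of a section that reuses the same storage.
 * With clear == true the memmap is zeroed first, which is the
 * behaviour this patch adds; with clear == false it is reused as-is.
 */
void toy_add_section(struct toy_page *memmap, size_t nr_pages, bool clear)
{
	if (clear)
		memset(memmap, 0, sizeof(struct toy_page) * nr_pages);
}

/* Count entries whose flags are non-zero, i.e. would look like bad pages. */
size_t toy_count_bad_pages(const struct toy_page *memmap, size_t nr_pages)
{
	size_t i, bad = 0;

	for (i = 0; i < nr_pages; i++)
		if (memmap[i].flags != 0)
			bad++;
	return bad;
}
```

In this model, re-adding without clearing reports every entry as bad, while clearing first reports none.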
CC: David Rientjes CC: Jiang Liu Cc: Minchan Kim CC: Andrew Morton Acked-by: KOSAKI Motohiro CC: Yasuaki Ishimatsu Reported-by: Vasilis Liaskovitis Signed-off-by: Wen Congyang --- mm/sparse.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/sparse.c b/mm/sparse.c index fac95f2..0021265 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -638,7 +638,6 @@ static struct page *__kmalloc_section_memmap(unsigned long nr_pages) got_map_page: ret = (struct page *)pfn_to_kaddr(page_to_pfn(page)); got_map_ptr: - memset(ret, 0, memmap_size); return ret; } @@ -760,6 +759,8 @@ int __meminit sparse_add_one_section(struct zone *zone, unsigned long start_pfn, goto out; } + memset(memmap, 0, sizeof(struct page) * nr_pages); + ms->section_mem_map |= SECTION_MARKED_PRESENT; ret = sparse_init_one_section(ms, section_nr, memmap, usemap); -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935476Ab2JaLso (ORCPT ); Wed, 31 Oct 2012 07:48:44 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:55903 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S932760Ab2JaLsa (ORCPT ); Wed, 31 Oct 2012 07:48:30 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108904" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman Subject: [Patch v4 3/8] memory-hotplug: fix NR_FREE_PAGES mismatch Date: Wed, 31 Oct 2012 19:23:09 +0800 Message-Id: <1351682594-17347-4-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> 
X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:44, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:54, Serialize complete at 2012/10/31 19:16:54 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org NR_FREE_PAGES will be wrong after offlining pages. We add/dec NR_FREE_PAGES like this now: 1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES 2. don't add NR_FREE_PAGES when a page is freed and the migratetype is MIGRATE_ISOLATE 3. dec NR_FREE_PAGES when offlining isolated pages. 4. add NR_FREE_PAGES when undoing isolate pages. When we come to step 3, all pages are in the MIGRATE_ISOLATE list, and NR_FREE_PAGES is right. When we come to step 4, all pages are not in buddy system, so we don't change NR_FREE_PAGES in this step, but we change NR_FREE_PAGES in step 3. So NR_FREE_PAGES is wrong after offlining pages. So there is no need to change NR_FREE_PAGES in step 3. This patch also fixes a problem in step 2: if the migratetype is MIGRATE_ISOLATE, we should not add NR_FREE_PAGES when we remove pages from pcppages. 
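As an illustration only (not part of the patch): the accounting argument above can be checked with a small userspace model. This is a sketch under stated assumptions. `struct toy_zone` and the `toy_*` functions are hypothetical names invented here, and the model tracks only the counter, not the real buddy allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one zone; not kernel code. */
struct toy_zone {
	long nr_free_pages;	/* stands in for the NR_FREE_PAGES counter */
	long nr_isolated;	/* free pages on the MIGRATE_ISOLATE list */
};

/* Step 1: move free pages to the isolate list and subtract them once. */
void toy_isolate(struct toy_zone *z, long nr)
{
	assert(nr <= z->nr_free_pages);
	z->nr_free_pages -= nr;
	z->nr_isolated += nr;
}

/* Step 2: a page freed into an isolated pageblock is not counted as free. */
void toy_free_page(struct toy_zone *z, bool block_isolated)
{
	if (block_isolated)
		z->nr_isolated += 1;
	else
		z->nr_free_pages += 1;
}

/*
 * Step 3: offline the isolated pages.
 * double_dec == true models the old behaviour, which subtracts the
 * isolated pages a second time; false models the behaviour after the patch.
 */
void toy_offline(struct toy_zone *z, bool double_dec)
{
	if (double_dec)
		z->nr_free_pages -= z->nr_isolated;
	z->nr_isolated = 0;
}

/* Counter value after isolating and offlining nr_offline of nr_total pages. */
long toy_free_after_offline(long nr_total, long nr_offline, bool double_dec)
{
	struct toy_zone z = { .nr_free_pages = nr_total, .nr_isolated = 0 };

	toy_isolate(&z, nr_offline);
	toy_offline(&z, double_dec);
	return z.nr_free_pages;
}
```

With 1000 free pages and 256 of them offlined, the model yields 744 when step 3 leaves the counter alone and 488 when step 3 subtracts again, which is the mismatch described above.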
Signed-off-by: Wen Congyang Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Cc: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Andrew Morton --- mm/page_alloc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5b74de6..a7cd2d1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -667,11 +667,13 @@ static void free_pcppages_bulk(struct zone *zone, int count, /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */ __free_one_page(page, zone, 0, mt); trace_mm_page_pcpu_drain(page, 0, mt); - if (is_migrate_cma(mt)) - __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); + if (likely(mt != MIGRATE_ISOLATE)) { + __mod_zone_page_state(zone, NR_FREE_PAGES, 1); + if (is_migrate_cma(mt)) + __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1); + } } while (--to_free && --batch_free && !list_empty(list)); } - __mod_zone_page_state(zone, NR_FREE_PAGES, count); spin_unlock(&zone->lock); } @@ -5987,8 +5989,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) list_del(&page->lru); rmv_page_order(page); zone->free_area[order].nr_free--; - __mod_zone_page_state(zone, NR_FREE_PAGES, - - (1UL << order)); for (i = 0; i < (1 << order); i++) SetPageReserved((page+i)); pfn += (1 << order); -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935458Ab2JaLsd (ORCPT ); Wed, 31 Oct 2012 07:48:33 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:51732 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1755605Ab2JaLsW (ORCPT ); Wed, 31 Oct 2012 07:48:22 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108897" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton 
, KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang Subject: [Patch v4 0/8] bugfix for memory hotplug Date: Wed, 31 Oct 2012 19:23:06 +0800 Message-Id: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:44, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:50, Serialize complete at 2012/10/31 19:16:50 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The last version is here: https://lkml.org/lkml/2012/10/19/56 Note: patch 1-3 are in -mm tree and I don't touch them. The other patches except patch6 are also in mm tree. Patch 6 is not touched. Changes from v3 to v4: Patch4: use dynamically allocated memory instead of static array. Patch5: merge [patchv3 2-3] into a single patch, and update it as we use dynamically allocated memory Patch7: merge [patchv3 5-6] into a single patch Patch8: merge [patchv3 9] and its fix into a patch Changes from v2 to v3: Merge the bug fix from ishimatsu to this patchset(Patch 1-3) Patch 3: split it from patch as it fixes another bug. Patch 4: new patch, and fix bad-page state when hotadding a memory device after hotremoving it. I forgot to post this patch in v2. Patch 6: update it according to Dave Hansen's comment. Changes from v1 to v2: Patch 1: updated according to kosaki's suggestion Patch 2: new patch, and update mce_bad_pages when removing memory. Patch 4: new patch, and fix a NR_FREE_PAGES mismatch, and this bug cause oom in my test. Patch 5: new patch, and fix a new bug. When repeating to online/offline pages, the free pages will continue to decrease. 
Wen Congyang (6): memory-hotplug: auto offline page_cgroup when onlining memory block failed memory-hotplug: fix NR_FREE_PAGES mismatch numa: convert static memory to dynamically allocated memory for per node device clear the memory to store struct page memory-hotplug: current hwpoison doesn't support memory offline memory-hotplug: allocate zone's pcp before onlining pages Yasuaki Ishimatsu (2): memory hotplug: suppress "Device memoryX does not have a release() function" warning suppress "Device nodeX does not have a release() function" warning arch/powerpc/kernel/sysfs.c | 4 +-- drivers/base/memory.c | 9 ++++++- drivers/base/node.c | 56 ++++++++++++++++++++++++++++++------------ include/linux/node.h | 2 +- include/linux/page-isolation.h | 10 +++++--- mm/hugetlb.c | 4 +-- mm/memory-failure.c | 2 +- mm/memory_hotplug.c | 13 +++++++--- mm/page_alloc.c | 37 +++++++++++++++++++++------- mm/page_cgroup.c | 3 +++ mm/page_isolation.c | 27 ++++++++++++++------ mm/sparse.c | 25 ++++++++++++++++++- 12 files changed, 144 insertions(+), 48 deletions(-) -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935467Ab2JaLtc (ORCPT ); Wed, 31 Oct 2012 07:49:32 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:24144 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S932765Ab2JaLsb (ORCPT ); Wed, 31 Oct 2012 07:48:31 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108905" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Andi Kleen , Dave Hansen , Mel Gorman Subject: [Patch v4 7/8] memory-hotplug: current hwpoison doesn't support memory offline Date: Wed, 31 Oct 2012 19:23:13 
+0800 Message-Id: <1351682594-17347-8-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:45, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:57, Serialize complete at 2012/10/31 19:16:57 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org hwpoisoned may be set when we offline a page by the sysfs interface /sys/devices/system/memory/soft_offline_page or /sys/devices/system/memory/hard_offline_page. If a page is hwpoisoned, we may meet the following problems when offlining/removing the memory: 1. the pages can't be offlined. If a page is hwpoisoned, it can't be freed when it is onlined, and it will not be in the free list. So we can't offline these pages again, and we should skip such pages when offlining pages. 2. mce_bad_pages is wrong after removing memory. When we hotremove a memory device, we free the memory that stores struct page. If a page is hwpoisoned, we should decrease mce_bad_pages. 
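As an illustration only (not part of the patch): the two problems above can be modelled in userspace as a range check that optionally skips hwpoisoned pages, and a cleanup pass that decrements a bad-page counter. This is a sketch under stated assumptions. `struct toy_page` and the `toy_*` helpers are hypothetical names invented here, and `bad_pages` only stands in for mce_bad_pages.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only two states are modelled. */
struct toy_page {
	bool free;	/* page is in the free list */
	bool hwpoison;	/* page is marked hwpoisoned */
};

/*
 * Problem 1: a hwpoisoned page is never in the free list, so a strict
 * check fails for the whole range. With skip_hwpoisoned set, such
 * pages are ignored and the rest of the range decides the result.
 */
bool toy_range_isolated(const struct toy_page *p, size_t n,
			bool skip_hwpoisoned)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (p[i].free)
			continue;
		if (skip_hwpoisoned && p[i].hwpoison)
			continue;
		return false;
	}
	return true;
}

/*
 * Problem 2: when the memory holding the page structs goes away,
 * clear the hwpoison marks and decrement the bad-page counter so it
 * matches the pages that still exist. Returns the number cleared.
 */
size_t toy_clear_hwpoisoned(struct toy_page *p, size_t n, long *bad_pages)
{
	size_t i, cleared = 0;

	for (i = 0; i < n; i++) {
		if (p[i].hwpoison) {
			p[i].hwpoison = false;
			(*bad_pages)--;
			cleared++;
		}
	}
	return cleared;
}
```

In this model a range with one hwpoisoned page fails the strict check, passes when hwpoisoned pages are skipped, and leaves the counter at zero after cleanup.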
Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Cc: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Andi Kleen Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Wen Congyang --- include/linux/page-isolation.h | 10 ++++++---- mm/memory-failure.c | 2 +- mm/memory_hotplug.c | 5 +++-- mm/page_alloc.c | 27 +++++++++++++++++++++++---- mm/page_isolation.c | 27 ++++++++++++++++++++------- mm/sparse.c | 22 ++++++++++++++++++++++ 6 files changed, 75 insertions(+), 18 deletions(-) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 76a9539..a92061e 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -2,7 +2,8 @@ #define __LINUX_PAGEISOLATION_H -bool has_unmovable_pages(struct zone *zone, struct page *page, int count); +bool has_unmovable_pages(struct zone *zone, struct page *page, int count, + bool skip_hwpoisoned_pages); void set_pageblock_migratetype(struct page *page, int migratetype); int move_freepages_block(struct zone *zone, struct page *page, int migratetype); @@ -21,7 +22,7 @@ int move_freepages(struct zone *zone, */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype); + unsigned migratetype, bool skip_hwpoisoned_pages); /* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. @@ -34,12 +35,13 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, /* * Test all pages in [start_pfn, end_pfn) are isolated or not. */ -int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, + bool skip_hwpoisoned_pages); /* * Internal functions. Changes pageblock's migrate type. 
*/ -int set_migratetype_isolate(struct page *page); +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages); void unset_migratetype_isolate(struct page *page, unsigned migratetype); struct page *alloc_migrate_target(struct page *page, unsigned long private, int **resultp); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 6c5899b..1abffee 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1385,7 +1385,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags) * Isolate the page, so that it doesn't get reallocated if it * was free. */ - set_migratetype_isolate(p); + set_migratetype_isolate(p, true); /* * When the target page is a free hugepage, just remove it * from free hugepage list. diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 56b758a..72f4fef 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -854,7 +854,7 @@ check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages, { int ret; long offlined = *(long *)data; - ret = test_pages_isolated(start_pfn, start_pfn + nr_pages); + ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true); offlined = nr_pages; if (!ret) *(long *)data += offlined; @@ -901,7 +901,8 @@ static int __ref __offline_pages(unsigned long start_pfn, nr_pages = end_pfn - start_pfn; /* set above range as isolated */ - ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); + ret = start_isolate_page_range(start_pfn, end_pfn, + MIGRATE_MOVABLE, true); if (ret) goto out; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a7cd2d1..027afd0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5577,7 +5577,8 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags, * MIGRATE_MOVABLE block might include unmovable pages. It means you can't * expect this function should be exact. 
*/ -bool has_unmovable_pages(struct zone *zone, struct page *page, int count) +bool has_unmovable_pages(struct zone *zone, struct page *page, int count, + bool skip_hwpoisoned_pages) { unsigned long pfn, iter, found; int mt; @@ -5612,6 +5613,13 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count) continue; } + /* + * The HWPoisoned page may be not in buddy system, and + * page_count() is not 0. + */ + if (skip_hwpoisoned_pages && PageHWPoison(page)) + continue; + if (!PageLRU(page)) found++; /* @@ -5654,7 +5662,7 @@ bool is_pageblock_removable_nolock(struct page *page) zone->zone_start_pfn + zone->spanned_pages <= pfn) return false; - return !has_unmovable_pages(zone, page, 0); + return !has_unmovable_pages(zone, page, 0, true); } #ifdef CONFIG_CMA @@ -5825,7 +5833,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, */ ret = start_isolate_page_range(pfn_max_align_down(start), - pfn_max_align_up(end), migratetype); + pfn_max_align_up(end), migratetype, + false); if (ret) return ret; @@ -5864,7 +5873,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, } /* Make sure the range is really isolated. */ - if (test_pages_isolated(outer_start, end)) { + if (test_pages_isolated(outer_start, end, false)) { pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n", outer_start, end); ret = -EBUSY; @@ -5979,6 +5988,16 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) continue; } page = pfn_to_page(pfn); + /* + * The HWPoisoned page may be not in buddy system, and + * page_count() is not 0. 
+ */ + if (unlikely(!PageBuddy(page) && PageHWPoison(page))) { + pfn++; + SetPageReserved(page); + continue; + } + BUG_ON(page_count(page)); BUG_ON(!PageBuddy(page)); order = page_order(page); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index f2f5b48..9d2264e 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -30,7 +30,7 @@ static void restore_pageblock_isolate(struct page *page, int migratetype) zone->nr_pageblock_isolate--; } -int set_migratetype_isolate(struct page *page) +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages) { struct zone *zone; unsigned long flags, pfn; @@ -66,7 +66,8 @@ int set_migratetype_isolate(struct page *page) * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself. * We just check MOVABLE pages. */ - if (!has_unmovable_pages(zone, page, arg.pages_found)) + if (!has_unmovable_pages(zone, page, arg.pages_found, + skip_hwpoisoned_pages)) ret = 0; /* @@ -134,7 +135,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * Returns 0 on success and -EBUSY if any part of range cannot be isolated. */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype) + unsigned migratetype, bool skip_hwpoisoned_pages) { unsigned long pfn; unsigned long undo_pfn; @@ -147,7 +148,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, pfn < end_pfn; pfn += pageblock_nr_pages) { page = __first_valid_page(pfn, pageblock_nr_pages); - if (page && set_migratetype_isolate(page)) { + if (page && + set_migratetype_isolate(page, skip_hwpoisoned_pages)) { undo_pfn = pfn; goto undo; } @@ -190,7 +192,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, * Returns 1 if all pages in the range are isolated. 
*/ static int -__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) +__test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn, + bool skip_hwpoisoned_pages) { struct page *page; @@ -220,6 +223,14 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) else if (page_count(page) == 0 && get_freepage_migratetype(page) == MIGRATE_ISOLATE) pfn += 1; + else if (skip_hwpoisoned_pages && PageHWPoison(page)) { + /* + * The HWPoisoned page may be not in buddy + * system, and page_count() is not 0. + */ + pfn++; + continue; + } else break; } @@ -228,7 +239,8 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn) return 1; } -int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn, + bool skip_hwpoisoned_pages) { unsigned long pfn, flags; struct page *page; @@ -251,7 +263,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) /* Check all pages are free or Marked as ISOLATED */ zone = page_zone(page); spin_lock_irqsave(&zone->lock, flags); - ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn); + ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn, + skip_hwpoisoned_pages); spin_unlock_irqrestore(&zone->lock, flags); return ret ? 
0 : -EBUSY; } diff --git a/mm/sparse.c b/mm/sparse.c index 0021265..b2d37c6 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -774,6 +774,27 @@ out: return ret; } +#ifdef CONFIG_MEMORY_FAILURE +static void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) +{ + int i; + + if (!memmap) + return; + + for (i = 0; i < nr_pages; i++) { + if (PageHWPoison(&memmap[i])) { + atomic_long_sub(1, &mce_bad_pages); + ClearPageHWPoison(&memmap[i]); + } + } +} +#else +static inline void clear_hwpoisoned_pages(struct page *memmap, int nr_pages) +{ +} +#endif + void sparse_remove_one_section(struct zone *zone, struct mem_section *ms) { struct page *memmap = NULL; @@ -787,6 +808,7 @@ void sparse_remove_one_section(struct zone *zone, struct mem_section *ms) ms->pageblock_flags = NULL; } + clear_hwpoisoned_pages(memmap, PAGES_PER_SECTION); free_section_usemap(memmap, usemap); } #endif -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932765Ab2JaLt6 (ORCPT ); Wed, 31 Oct 2012 07:49:58 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:51732 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1756742Ab2JaLs3 (ORCPT ); Wed, 31 Oct 2012 07:48:29 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108903" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Benjamin Herrenschmidt , Paul Mackerras , Christoph Lameter , Minchan Kim , Dave Hansen , Mel Gorman Subject: [Patch v4 8/8] memory-hotplug: allocate zone's pcp before onlining pages Date: Wed, 31 Oct 2012 19:23:14 +0800 Message-Id: <1351682594-17347-9-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: 
<1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:45, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:55, Serialize complete at 2012/10/31 19:16:55 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We use __free_page() to put pages into the buddy system when onlining pages. __free_page() records the NR_FREE_PAGES change in the zone's pcp.vm_stat_diff, so the zone's pcp must be allocated before the pages are onlined; otherwise some free pages are lost. Cc: David Rientjes Cc: Jiang Liu Cc: Len Brown Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Christoph Lameter Cc: Minchan Kim Cc: KOSAKI Motohiro Cc: Yasuaki Ishimatsu Cc: Dave Hansen Cc: Mel Gorman Signed-off-by: Wen Congyang --- mm/memory_hotplug.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 72f4fef..63ea7df 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -505,12 +505,16 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages) * So, zonelist must be updated after online. 
*/ mutex_lock(&zonelists_mutex); - if (!populated_zone(zone)) + if (!populated_zone(zone)) { need_zonelists_rebuild = 1; + build_all_zonelists(NULL, zone); + } ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages, online_pages_range); if (ret) { + if (need_zonelists_rebuild) + zone_pcp_reset(zone); mutex_unlock(&zonelists_mutex); printk(KERN_DEBUG "online_pages [mem %#010llx-%#010llx] failed\n", (unsigned long long) pfn << PAGE_SHIFT, @@ -526,7 +530,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages) if (onlined_pages) { node_set_state(zone_to_nid(zone), N_HIGH_MEMORY); if (need_zonelists_rebuild) - build_all_zonelists(NULL, zone); + build_all_zonelists(NULL, NULL); else zone_pcp_update(zone); } -- 1.8.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757088Ab2JaLuW (ORCPT ); Wed, 31 Oct 2012 07:50:22 -0400 Received: from cn.fujitsu.com ([222.73.24.84]:24144 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S932727Ab2JaLs2 (ORCPT ); Wed, 31 Oct 2012 07:48:28 -0400 X-IronPort-AV: E=Sophos;i="4.80,687,1344182400"; d="scan'208";a="6108902" From: Wen Congyang To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org Cc: Jiang Liu , Len Brown , Andrew Morton , KOSAKI Motohiro , Yasuaki Ishimatsu , rjw@sisk.pl, Lai Jiangshan , Wen Congyang , David Rientjes , Minchan Kim Subject: [Patch v4 4/8] numa: convert static memory to dynamically allocated memory for per node device Date: Wed, 31 Oct 2012 19:23:10 +0800 Message-Id: <1351682594-17347-5-git-send-email-wency@cn.fujitsu.com> X-Mailer: git-send-email 1.8.0 In-Reply-To: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> References: <1351682594-17347-1-git-send-email-wency@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2012/10/31 19:16:44, Serialize by Router on mailserver/fnst(Release 
8.5.3|September 15, 2011) at 2012/10/31 19:16:50, Serialize complete at 2012/10/31 19:16:50 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We use a static array to store struct node. Most systems have far fewer than MAX_NUMNODES nodes, so much of that array is never used. Convert it to an array of pointers and allocate each node's struct node dynamically when the node is registered. CC: David Rientjes CC: Jiang Liu Cc: Minchan Kim CC: Andrew Morton CC: KOSAKI Motohiro CC: Yasuaki Ishimatsu Signed-off-by: Wen Congyang --- arch/powerpc/kernel/sysfs.c | 4 ++-- drivers/base/node.c | 38 ++++++++++++++++++++++---------------- include/linux/node.h | 2 +- mm/hugetlb.c | 4 ++-- 4 files changed, 27 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index cf357a0..3ce1f86 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -607,7 +607,7 @@ static void register_nodes(void) int sysfs_add_device_to_node(struct device *dev, int nid) { - struct node *node = &node_devices[nid]; + struct node *node = node_devices[nid]; return sysfs_create_link(&node->dev.kobj, &dev->kobj, kobject_name(&dev->kobj)); } @@ -615,7 +615,7 @@ EXPORT_SYMBOL_GPL(sysfs_add_device_to_node); void sysfs_remove_device_from_node(struct device *dev, int nid) { - struct node *node = &node_devices[nid]; + struct node *node = node_devices[nid]; sysfs_remove_link(&node->dev.kobj, kobject_name(&dev->kobj)); } EXPORT_SYMBOL_GPL(sysfs_remove_device_from_node); diff --git a/drivers/base/node.c b/drivers/base/node.c index af1a177..28216ce 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -306,7 +306,7 @@ void unregister_node(struct node *node) device_unregister(&node->dev); } -struct node node_devices[MAX_NUMNODES]; +struct node *node_devices[MAX_NUMNODES]; /* * register cpu under node @@ -323,15 +323,15 @@ int register_cpu_under_node(unsigned int cpu, unsigned int nid) if (!obj) return 0; - ret = sysfs_create_link(&node_devices[nid].dev.kobj, + ret = 
sysfs_create_link(&node_devices[nid]->dev.kobj, &obj->kobj, kobject_name(&obj->kobj)); if (ret) return ret; return sysfs_create_link(&obj->kobj, - &node_devices[nid].dev.kobj, - kobject_name(&node_devices[nid].dev.kobj)); + &node_devices[nid]->dev.kobj, + kobject_name(&node_devices[nid]->dev.kobj)); } int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) @@ -345,10 +345,10 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) if (!obj) return 0; - sysfs_remove_link(&node_devices[nid].dev.kobj, + sysfs_remove_link(&node_devices[nid]->dev.kobj, kobject_name(&obj->kobj)); sysfs_remove_link(&obj->kobj, - kobject_name(&node_devices[nid].dev.kobj)); + kobject_name(&node_devices[nid]->dev.kobj)); return 0; } @@ -390,15 +390,15 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid) continue; if (page_nid != nid) continue; - ret = sysfs_create_link_nowarn(&node_devices[nid].dev.kobj, + ret = sysfs_create_link_nowarn(&node_devices[nid]->dev.kobj, &mem_blk->dev.kobj, kobject_name(&mem_blk->dev.kobj)); if (ret) return ret; return sysfs_create_link_nowarn(&mem_blk->dev.kobj, - &node_devices[nid].dev.kobj, - kobject_name(&node_devices[nid].dev.kobj)); + &node_devices[nid]->dev.kobj, + kobject_name(&node_devices[nid]->dev.kobj)); } /* mem section does not span the specified node */ return 0; @@ -431,10 +431,10 @@ int unregister_mem_sect_under_nodes(struct memory_block *mem_blk, continue; if (node_test_and_set(nid, *unlinked_nodes)) continue; - sysfs_remove_link(&node_devices[nid].dev.kobj, + sysfs_remove_link(&node_devices[nid]->dev.kobj, kobject_name(&mem_blk->dev.kobj)); sysfs_remove_link(&mem_blk->dev.kobj, - kobject_name(&node_devices[nid].dev.kobj)); + kobject_name(&node_devices[nid]->dev.kobj)); } NODEMASK_FREE(unlinked_nodes); return 0; @@ -500,7 +500,7 @@ static void node_hugetlb_work(struct work_struct *work) static void init_node_hugetlb_work(int nid) { - INIT_WORK(&node_devices[nid].node_work, node_hugetlb_work); + 
INIT_WORK(&node_devices[nid]->node_work, node_hugetlb_work); } static int node_memory_callback(struct notifier_block *self, @@ -517,7 +517,7 @@ static int node_memory_callback(struct notifier_block *self, * when transitioning to/from memoryless state. */ if (nid != NUMA_NO_NODE) - schedule_work(&node_devices[nid].node_work); + schedule_work(&node_devices[nid]->node_work); break; case MEM_GOING_ONLINE: @@ -558,9 +558,13 @@ int register_one_node(int nid) struct node *parent = NULL; if (p_node != nid) - parent = &node_devices[p_node]; + parent = node_devices[p_node]; - error = register_node(&node_devices[nid], nid, parent); + node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL); + if (!node_devices[nid]) + return -ENOMEM; + + error = register_node(node_devices[nid], nid, parent); /* link cpu under this node */ for_each_present_cpu(cpu) { @@ -581,7 +585,9 @@ int register_one_node(int nid) void unregister_one_node(int nid) { - unregister_node(&node_devices[nid]); + unregister_node(node_devices[nid]); + kfree(node_devices[nid]); + node_devices[nid] = NULL; } /* diff --git a/include/linux/node.h b/include/linux/node.h index 624e53c..10316f1 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -27,7 +27,7 @@ struct node { }; struct memory_block; -extern struct node node_devices[]; +extern struct node *node_devices[]; typedef void (*node_registration_func_t)(struct node *); extern int register_node(struct node *, int, struct node *); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 59a0059..1ef2cd4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1800,7 +1800,7 @@ static void hugetlb_unregister_all_nodes(void) * remove hstate attributes from any nodes that have them. 
*/ for (nid = 0; nid < nr_node_ids; nid++) - hugetlb_unregister_node(&node_devices[nid]); + hugetlb_unregister_node(node_devices[nid]); } /* @@ -1845,7 +1845,7 @@ static void hugetlb_register_all_nodes(void) int nid; for_each_node_state(nid, N_HIGH_MEMORY) { - struct node *node = &node_devices[nid]; + struct node *node = node_devices[nid]; if (node->dev.id == nid) hugetlb_register_node(node); } -- 1.8.0
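For readers following patch 4/8 outside the kernel tree, the static-array to pointer-array conversion can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: struct node is reduced to a stub, MAX_NODES, register_one() and unregister_one() are hypothetical stand-ins for MAX_NUMNODES, register_one_node() and unregister_one_node(), calloc()/free() stand in for kzalloc()/kfree(), and the -EEXIST check is an addition of this sketch that the patch does not have.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_NODES 8	/* hypothetical stand-in for MAX_NUMNODES */

/* Stub standing in for the kernel's struct node (assumption). */
struct node {
	int id;
};

/* Array of pointers: an unregistered node costs only one NULL slot. */
static struct node *node_devices[MAX_NODES];

/* Hypothetical analogue of register_one_node(): allocate on demand. */
static int register_one(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);

	/* Not in the patch: refuse to overwrite a live entry. */
	if (node_devices[nid])
		return -EEXIST;

	/* calloc() zeroes the memory, as kzalloc() does in the patch. */
	node_devices[nid] = calloc(1, sizeof(*node_devices[nid]));
	if (!node_devices[nid])
		return -ENOMEM;

	node_devices[nid]->id = nid;
	return 0;
}

/* Hypothetical analogue of unregister_one_node(): free and clear the slot. */
static void unregister_one(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);

	free(node_devices[nid]);	/* free(NULL) is a no-op */
	node_devices[nid] = NULL;
}
```

Once entries are allocated lazily like this, any code that walks the whole array has to tolerate NULL slots for nodes that were never registered or have been unregistered.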