From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-vn0-f50.google.com (mail-vn0-f50.google.com [209.85.216.50])
	by kanga.kvack.org (Postfix) with ESMTP id A2E516B0038
	for ; Fri, 26 Jun 2015 22:28:50 -0400 (EDT)
Received: by vnbg129 with SMTP id g129so17961464vnb.2
	for ; Fri, 26 Jun 2015 19:28:50 -0700 (PDT)
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com. [119.145.14.65])
	by mx.google.com with ESMTPS id g20si5198279vdu.74.2015.06.26.19.28.47
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:28:49 -0700 (PDT)
Message-ID: <558E084A.60900@huawei.com>
Date: Sat, 27 Jun 2015 10:19:54 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

Platforms based on the Intel Xeon processor E7 v3 product family introduce
support for partial memory mirroring, called 'Address Range Mirroring'. This
feature allows the BIOS to specify a subset of the total available memory to
be mirrored (and optionally whether to mirror the 0-4 GB range). Users can
thus trade off non-mirrored against mirrored memory, keeping as much total
memory available as possible while still having a highly reliable memory
range for mission-critical workloads and/or kernel space.

Tony has already sent a patchset to support this feature at boot time.
https://lkml.org/lkml/2015/5/8/521
This patchset is based on Tony's and extends the support to after boot time:
mirrored memory is used for all kernel allocations.

TBD:
- Add compatibility with memory online/offline, memory compaction, CMA...
- Need to discuss the implementation ideas: add a new zone, a new
  migratetype, or something else.

V2:
- Use memblock regions marked MEMBLOCK_MIRROR to find mirrored memory
  instead of mirror_info.
- Remove __GFP_MIRROR and /proc/sys/vm/mirrorable.
- Use mirrored memory for all kernel allocations.

Xishi Qiu (8):
  mm: add a new config to manage the code
  mm: introduce MIGRATE_MIRROR to manage the mirrored pages
  mm: find mirrored memory in memblock
  mm: add mirrored memory to buddy system
  mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
  mm: add free mirrored pages info
  mm: add the buddy system interface
  mm: add the PCP interface

 drivers/base/node.c      |  17 ++++---
 fs/proc/meminfo.c        |   6 +++
 include/linux/memblock.h |  29 ++++++++++--
 include/linux/mmzone.h   |  10 ++++
 include/linux/vmstat.h   |   2 +
 mm/Kconfig               |   8 ++++
 mm/memblock.c            |  33 +++++++++++--
 mm/nobootmem.c           |   3 ++
 mm/page_alloc.c          | 117 ++++++++++++++++++++++++++++++++++++-----------
 mm/vmstat.c              |   4 ++
 10 files changed, 190 insertions(+), 39 deletions(-)

--
2.0.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ob0-f170.google.com (mail-ob0-f170.google.com [209.85.214.170])
	by kanga.kvack.org (Postfix) with ESMTP id BD6706B0038
	for ; Fri, 26 Jun 2015 22:30:22 -0400 (EDT)
Received: by obbkm3 with SMTP id km3so77262408obb.1
	for ; Fri, 26 Jun 2015 19:30:22 -0700 (PDT)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com.
	[119.145.14.66])
	by mx.google.com with ESMTPS id wx3si23584839oeb.11.2015.06.26.19.30.20
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:30:21 -0700 (PDT)
Message-ID: <558E09A1.2090102@huawei.com>
Date: Sat, 27 Jun 2015 10:25:37 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 4/8] mm: add mirrored memory to buddy system
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

Before freeing bootmem, set the migratetype of mirrored pageblocks to
MIGRATE_MIRROR so that their pages are freed to the buddy system's
MIGRATE_MIRROR list. When setting up the migrate reserve, skip mirrored
pageblocks.

Signed-off-by: Xishi Qiu
---
 include/linux/memblock.h |  3 +++
 mm/memblock.c            | 21 +++++++++++++++++++++
 mm/nobootmem.c           |  3 +++
 mm/page_alloc.c          |  3 +++
 4 files changed, 30 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 97f71ca..53be030 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -81,6 +81,9 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
 ulong choose_memblock_flags(void);
+#ifdef CONFIG_MEMORY_MIRROR
+void memblock_mark_migratemirror(void);
+#endif
 
 /* Low level functions */
 int memblock_add_range(struct memblock_type *type,
diff --git a/mm/memblock.c b/mm/memblock.c
index 7612876..0d0b210 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -818,6 +819,26 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
 	return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
 }
 
+#ifdef CONFIG_MEMORY_MIRROR
+void __init_memblock memblock_mark_migratemirror(void)
+{
+	unsigned long start_pfn, end_pfn, pfn;
+	int i, node;
+	struct page *page;
+
+	printk(KERN_DEBUG "Mirrored memory:\n");
+	for_each_mirror_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn,
+				&node) {
+		printk(KERN_DEBUG " node %3d: [mem %#010llx-%#010llx]\n",
+			node, PFN_PHYS(start_pfn), PFN_PHYS(end_pfn) - 1);
+		for (pfn = start_pfn; pfn < end_pfn;
+				pfn += pageblock_nr_pages) {
+			page = pfn_to_page(pfn);
+			set_pageblock_migratetype(page, MIGRATE_MIRROR);
+		}
+	}
+}
+#endif
 
 /**
  * __next__mem_range - next function for for_each_free_mem_range() etc.
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 5258386..31aa6d4 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -129,6 +129,9 @@ static unsigned long __init free_low_memory_core_early(void)
 	u64 i;
 
 	memblock_clear_hotplug(0, -1);
+#ifdef CONFIG_MEMORY_MIRROR
+	memblock_mark_migratemirror();
+#endif
 
 	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
 				NULL)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e4d79f..aea78a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4118,6 +4118,9 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 
 		block_migratetype = get_pageblock_migratetype(page);
 
+		if (is_migrate_mirror(block_migratetype))
+			continue;
+
 		/* Only test what is necessary when the reserves are not met */
 		if (reserve > 0) {
 			/*
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f49.google.com (mail-oi0-f49.google.com [209.85.218.49])
	by kanga.kvack.org (Postfix) with ESMTP id 1739F6B006C
	for ; Fri, 26 Jun 2015 22:30:58 -0400 (EDT)
Received: by oiax193 with SMTP id x193so87186554oia.2
	for ; Fri, 26 Jun 2015 19:30:56 -0700 (PDT)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66])
	by mx.google.com with ESMTPS id jy9si23574057oeb.77.2015.06.26.19.30.54
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:30:56 -0700 (PDT)
Message-ID: <558E09CA.7020909@huawei.com>
Date: Sat, 27 Jun 2015 10:26:18 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 5/8] mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

This patch introduces a new zone_stat_item, "NR_FREE_MIRROR_PAGES", which
stores the count of free mirrored pages.
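
For illustration only, the accounting this counter adds can be modelled in
user space. The sketch below is not kernel code: toy_zone_stats,
toy_stat_item and toy_mod_freepage_state() are hypothetical names that mimic
the shape of __mod_zone_freepage_state(), and the migratetype test is reduced
to two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the relevant zone_stat_item entries. */
enum toy_stat_item {
	TOY_NR_FREE_PAGES,
	TOY_NR_FREE_CMA_PAGES,
	TOY_NR_FREE_MIRROR_PAGES,	/* models the new counter */
	TOY_NR_STAT_ITEMS
};

/* Hypothetical per-zone counter array. */
struct toy_zone_stats {
	long vm_stat[TOY_NR_STAT_ITEMS];
};

/*
 * Every change to the free page count is also applied to the CMA or
 * mirror counter when the pages belong to that migratetype. nr_pages
 * is negative when pages leave the free lists.
 */
static void toy_mod_freepage_state(struct toy_zone_stats *z, long nr_pages,
				   bool is_cma, bool is_mirror)
{
	assert(z != NULL);

	z->vm_stat[TOY_NR_FREE_PAGES] += nr_pages;
	if (is_cma)
		z->vm_stat[TOY_NR_FREE_CMA_PAGES] += nr_pages;
	if (is_mirror)
		z->vm_stat[TOY_NR_FREE_MIRROR_PAGES] += nr_pages;
}
```

The point of the model is that the mirror counter always moves in step with
the total free counter, so it never needs a separate update path.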

Signed-off-by: Xishi Qiu
---
 include/linux/mmzone.h | 1 +
 include/linux/vmstat.h | 2 ++
 mm/vmstat.c            | 1 +
 3 files changed, 4 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 54e891a..7cc0a29 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -166,6 +166,7 @@ enum zone_stat_item {
 	WORKINGSET_NODERECLAIM,
 	NR_ANON_TRANSPARENT_HUGEPAGES,
 	NR_FREE_CMA_PAGES,
+	NR_FREE_MIRROR_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };
 
 /*
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..d0a7268 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -283,6 +283,8 @@ static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
 	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
 	if (is_migrate_cma(migratetype))
 		__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
+	if (is_migrate_mirror(migratetype))
+		__mod_zone_page_state(zone, NR_FREE_MIRROR_PAGES, nr_pages);
 }
 
 extern const char * const vmstat_text[];
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d0323e0..7ee11ca 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -739,6 +739,7 @@ const char * const vmstat_text[] = {
 	"workingset_nodereclaim",
 	"nr_anon_transparent_hugepages",
 	"nr_free_cma",
+	"nr_free_mirror",
 
 	/* enum writeback_stat_item counters */
 	"nr_dirty_threshold",
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ie0-f171.google.com (mail-ie0-f171.google.com [209.85.223.171])
	by kanga.kvack.org (Postfix) with ESMTP id E90CA6B0038
	for ; Fri, 26 Jun 2015 22:32:21 -0400 (EDT)
Received: by iecvh10 with SMTP id vh10so86484448iec.3
	for ; Fri, 26 Jun 2015 19:32:21 -0700 (PDT)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com.
	[58.251.152.64])
	by mx.google.com with ESMTPS id q142si29283268ioe.75.2015.06.26.19.32.19
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:32:21 -0700 (PDT)
Message-ID: <558E0948.2010104@huawei.com>
Date: Sat, 27 Jun 2015 10:24:08 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

This patch introduces a new migratetype, "MIGRATE_MIRROR", which is used to
allocate mirrored pages. The count of free mirrored blocks is shown in
/proc/pagetypeinfo.

Signed-off-by: Xishi Qiu
---
 include/linux/mmzone.h | 9 +++++++++
 mm/page_alloc.c        | 3 +++
 mm/vmstat.c            | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 54d74f6..54e891a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -39,6 +39,9 @@ enum {
 	MIGRATE_UNMOVABLE,
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_MOVABLE,
+#ifdef CONFIG_MEMORY_MIRROR
+	MIGRATE_MIRROR,
+#endif
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
 #ifdef CONFIG_CMA
@@ -69,6 +72,12 @@ enum {
 # define is_migrate_cma(migratetype) false
 #endif
 
+#ifdef CONFIG_MEMORY_MIRROR
+# define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR)
+#else
+# define is_migrate_mirror(migratetype) false
+#endif
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ebffa0e..6e4d79f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3216,6 +3216,9 @@ static void show_migration_types(unsigned char type)
 		[MIGRATE_UNMOVABLE] = 'U',
 		[MIGRATE_RECLAIMABLE] = 'E',
 		[MIGRATE_MOVABLE] = 'M',
+#ifdef CONFIG_MEMORY_MIRROR
+		[MIGRATE_MIRROR] = 'O',
+#endif
 		[MIGRATE_RESERVE] = 'R',
 #ifdef CONFIG_CMA
 		[MIGRATE_CMA] = 'C',
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4f5cd97..d0323e0 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -901,6 +901,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
 	"Unmovable",
 	"Reclaimable",
 	"Movable",
+#ifdef CONFIG_MEMORY_MIRROR
+	"Mirror",
+#endif
 	"Reserve",
 #ifdef CONFIG_CMA
 	"CMA",
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ig0-f178.google.com (mail-ig0-f178.google.com [209.85.213.178])
	by kanga.kvack.org (Postfix) with ESMTP id 0DB4A6B006C
	for ; Fri, 26 Jun 2015 22:32:52 -0400 (EDT)
Received: by igcsj18 with SMTP id sj18so44481627igc.1
	for ; Fri, 26 Jun 2015 19:32:51 -0700 (PDT)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. [58.251.152.64])
	by mx.google.com with ESMTPS id m5si671967igx.2.2015.06.26.19.32.49
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:32:51 -0700 (PDT)
Message-ID: <558E0974.6060206@huawei.com>
Date: Sat, 27 Jun 2015 10:24:52 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 3/8] mm: find mirrored memory in memblock
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

Add a macro, for_each_mirror_pfn_range(), to find mirrored memory in
memblock.

This patch is based on Tony's patchset "Find mirrored memory, use for boot
time allocations".

Signed-off-by: Xishi Qiu
---
 include/linux/memblock.h | 25 ++++++++++++++++++++++---
 mm/memblock.c            |  6 +++++-
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 0215ffd..97f71ca 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -171,7 +171,8 @@ static inline bool memblock_is_mirror(struct memblock_region *m)
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
 			    unsigned long *end_pfn);
-void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
+void __next_mem_pfn_range(int *idx, int nid, ulong flags,
+			  unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
 
 /**
@@ -185,8 +186,26 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
  * Walks over configured memory ranges.
  */
 #define for_each_mem_pfn_range(i, nid, p_start, p_end, p_nid) \
-	for (i = -1, __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid); \
-	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
+	for (i = -1, __next_mem_pfn_range(&i, nid, MEMBLOCK_NONE, \
+		p_start, p_end, p_nid); \
+	     i >= 0; __next_mem_pfn_range(&i, nid, MEMBLOCK_NONE, \
+		p_start, p_end, p_nid))
+
+/**
+ * for_each_mirror_pfn_range - early mirrored memory pfn range iterator
+ * @i: an integer used as loop variable
+ * @nid: node selector, %MAX_NUMNODES for all nodes
+ * @p_start: ptr to ulong for start pfn of the range, can be %NULL
+ * @p_end: ptr to ulong for end pfn of the range, can be %NULL
+ * @p_nid: ptr to int for nid of the range, can be %NULL
+ *
+ * Walks over configured mirrored memory ranges.
+ */
+#define for_each_mirror_pfn_range(i, nid, p_start, p_end, p_nid) \
+	for (i = -1, __next_mem_pfn_range(&i, nid, MEMBLOCK_MIRROR, \
+		p_start, p_end, p_nid); \
+	     i >= 0; __next_mem_pfn_range(&i, nid, MEMBLOCK_MIRROR, \
+		p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 /**
diff --git a/mm/memblock.c b/mm/memblock.c
index 1b444c7..7612876 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1040,7 +1040,7 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid, ulong flags,
 /*
  * Common iterator interface used to define for_each_mem_range().
  */
-void __init_memblock __next_mem_pfn_range(int *idx, int nid,
+void __init_memblock __next_mem_pfn_range(int *idx, int nid, ulong flags,
 				unsigned long *out_start_pfn,
 				unsigned long *out_end_pfn, int *out_nid)
 {
@@ -1050,6 +1050,10 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 	while (++*idx < type->cnt) {
 		r = &type->regions[*idx];
 
+		/* if we want mirror memory skip non-mirror memory regions */
+		if ((flags & MEMBLOCK_MIRROR) && !memblock_is_mirror(r))
+			continue;
+
 		if (PFN_UP(r->base) >= PFN_DOWN(r->base + r->size))
 			continue;
 		if (nid == MAX_NUMNODES || nid == r->nid)
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-oi0-f51.google.com (mail-oi0-f51.google.com [209.85.218.51])
	by kanga.kvack.org (Postfix) with ESMTP id C8EA86B006E
	for ; Fri, 26 Jun 2015 22:33:18 -0400 (EDT)
Received: by oiyy130 with SMTP id y130so87160885oiy.0
	for ; Fri, 26 Jun 2015 19:33:18 -0700 (PDT)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66])
	by mx.google.com with ESMTPS id j8si23630121oia.52.2015.06.26.19.33.15
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:33:18 -0700 (PDT)
Message-ID: <558E0A51.1040807@huawei.com>
Date: Sat, 27 Jun 2015 10:28:33 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 8/8] mm: add the PCP interface
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

Factor the PCP allocation code out into __rmqueue_pcp(), and do not fall
back to other migratetypes in rmqueue_bulk() when the migratetype is
MIGRATE_MIRROR.

Signed-off-by: Xishi Qiu
---
 mm/page_alloc.c | 85 +++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 61 insertions(+), 24 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8a6125e..bb44463 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1337,11 +1337,20 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
 			int migratetype, bool cold)
 {
-	int i;
+	int i, mt;
+	struct page *page;
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
-		struct page *page = __rmqueue(zone, order, migratetype);
+		/*
+		 * If there is no mirrored memory left, just keep the list
+		 * empty, because we can not mix other types pages into the
+		 * mirror list.
+		 */
+		if (is_migrate_mirror(migratetype))
+			page = __rmqueue_smallest(zone, order, migratetype);
+		else
+			page = __rmqueue(zone, order, migratetype);
 		if (unlikely(page == NULL))
 			break;
 
@@ -1359,15 +1368,61 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		else
 			list_add_tail(&page->lru, list);
 		list = &page->lru;
-		if (is_migrate_cma(get_freepage_migratetype(page)))
+
+		mt = get_freepage_migratetype(page);
+		if (is_migrate_cma(mt))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
+		if (is_migrate_mirror(mt))
+			__mod_zone_page_state(zone, NR_FREE_MIRROR_PAGES,
+					      -(1 << order));
 	}
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
 	spin_unlock(&zone->lock);
 	return i;
 }
 
+static struct page *__rmqueue_pcp(struct zone *zone, unsigned int order,
+					gfp_t gfp_flags, int migratetype)
+{
+	struct page *page;
+	struct per_cpu_pages *pcp;
+	struct list_head *list;
+	bool cold;
+
+	cold = ((gfp_flags & __GFP_COLD) != 0);
+	pcp = &this_cpu_ptr(zone->pageset)->pcp;
+
+retry:
+	list = &pcp->lists[migratetype];
+	if (list_empty(list)) {
+		pcp->count += rmqueue_bulk(zone, 0,
+				pcp->batch, list,
+				migratetype, cold);
+		if (unlikely(list_empty(list))) {
+			/*
+			 * If there is no mirrored memory left, alloc other
+			 * types PCP, use MIGRATE_RECLAIMABLE to retry
+			 */
+			if (is_migrate_mirror(migratetype)) {
+				migratetype = MIGRATE_RECLAIMABLE;
+				goto retry;
+			} else
+				return NULL;
+		}
+	}
+
+	if (cold)
+		page = list_entry(list->prev, struct page, lru);
+	else
+		page = list_entry(list->next, struct page, lru);
+
+	list_del(&page->lru);
+	pcp->count--;
+
+	return page;
+}
+
 #ifdef CONFIG_NUMA
 /*
  * Called from the vmstat counter updater to drain pagesets of this
@@ -1713,30 +1768,12 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 {
 	unsigned long flags;
 	struct page *page;
-	bool cold = ((gfp_flags & __GFP_COLD) != 0);
 
 	if (likely(order == 0)) {
-		struct per_cpu_pages *pcp;
-		struct list_head *list;
-
 		local_irq_save(flags);
-		pcp = &this_cpu_ptr(zone->pageset)->pcp;
-		list = &pcp->lists[migratetype];
-		if (list_empty(list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
-					migratetype, cold);
-			if (unlikely(list_empty(list)))
-				goto failed;
-		}
-
-		if (cold)
-			page = list_entry(list->prev, struct page, lru);
-		else
-			page = list_entry(list->next, struct page, lru);
-
-		list_del(&page->lru);
-		pcp->count--;
+		page = __rmqueue_pcp(zone, order, gfp_flags, migratetype);
+		if (!page)
+			goto failed;
 	} else {
 		if (unlikely(gfp_flags & __GFP_NOFAIL)) {
 			/*
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f173.google.com (mail-pd0-f173.google.com [209.85.192.173])
	by kanga.kvack.org (Postfix) with ESMTP id F13C96B0038
	for ; Fri, 26 Jun 2015 22:38:09 -0400 (EDT)
Received: by pdcu2 with SMTP id u2so84617725pdc.3
	for ; Fri, 26 Jun 2015 19:38:09 -0700 (PDT)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. [58.251.152.64])
	by mx.google.com with ESMTPS id pn8si53070104pbb.126.2015.06.26.19.38.06
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:38:09 -0700 (PDT)
Message-ID: <558E09F4.70908@huawei.com>
Date: Sat, 27 Jun 2015 10:27:00 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 6/8] mm: add free mirrored pages info
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

Add the count of free mirrored pages to the following files:
  /proc/meminfo
  /proc/zoneinfo
  /sys/devices/system/node/nodeXX/meminfo
  /sys/devices/system/node/nodeXX/vmstat

Signed-off-by: Xishi Qiu
---
 drivers/base/node.c | 17 +++++++++++------
 fs/proc/meminfo.c   |  6 ++++++
 mm/page_alloc.c     |  7 +++++--
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index a2aa65b..d1a3556 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -114,6 +114,9 @@ static ssize_t node_read_meminfo(struct device *dev,
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       "Node %d AnonHugePages: %8lu kB\n"
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+		       "Node %d MirrorFree: %8lu kB\n"
+#endif
 			,
 		       nid, K(node_page_state(nid, NR_FILE_DIRTY)),
 		       nid, K(node_page_state(nid, NR_WRITEBACK)),
@@ -130,14 +133,16 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) +
 				node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
 		       nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)),
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))
-			, nid,
-			K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
-			HPAGE_PMD_NR));
-#else
-		       nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		       , nid, K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
+			HPAGE_PMD_NR)
+#endif
+#ifdef CONFIG_MEMORY_MIRROR
+		       , nid, K(node_page_state(nid, NR_FREE_MIRROR_PAGES))
 #endif
+		       );
+
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
 }
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index d3ebf2e..d1ebb20 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -145,6 +145,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		"CmaTotal: %8lu kB\n"
 		"CmaFree: %8lu kB\n"
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+		"MirrorFree: %8lu kB\n"
+#endif
 		,
 		K(i.totalram),
 		K(i.freeram),
@@ -204,6 +207,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		, K(totalcma_pages)
 		, K(global_page_state(NR_FREE_CMA_PAGES))
 #endif
+#ifdef CONFIG_MEMORY_MIRROR
+		, K(global_page_state(NR_FREE_MIRROR_PAGES))
+#endif
 		);
 
 	hugetlb_report_meminfo(m);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aea78a5..4c5bc50 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3268,7 +3268,7 @@ void show_free_areas(unsigned int filter)
 		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
 		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" free:%lu free_pcp:%lu free_cma:%lu free_mirror:%lu\n",
 		global_page_state(NR_ACTIVE_ANON),
 		global_page_state(NR_INACTIVE_ANON),
 		global_page_state(NR_ISOLATED_ANON),
@@ -3287,7 +3287,8 @@ void show_free_areas(unsigned int filter)
 		global_page_state(NR_BOUNCE),
 		global_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_page_state(NR_FREE_CMA_PAGES));
+		global_page_state(NR_FREE_CMA_PAGES),
+		global_page_state(NR_FREE_MIRROR_PAGES));
 
 	for_each_populated_zone(zone) {
 		int i;
@@ -3328,6 +3329,7 @@ void show_free_areas(unsigned int filter)
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
 			" free_cma:%lukB"
+			" free_mirror:%lukB"
 			" writeback_tmp:%lukB"
 			" pages_scanned:%lu"
 			" all_unreclaimable? %s"
@@ -3361,6 +3363,7 @@ void show_free_areas(unsigned int filter)
 			K(free_pcp),
 			K(this_cpu_read(zone->pageset->pcp.count)),
 			K(zone_page_state(zone, NR_FREE_CMA_PAGES)),
+			K(zone_page_state(zone, NR_FREE_MIRROR_PAGES)),
 			K(zone_page_state(zone, NR_WRITEBACK_TEMP)),
 			K(zone_page_state(zone, NR_PAGES_SCANNED)),
 			(!zone_reclaimable(zone) ? "yes" : "no")
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pa0-f44.google.com (mail-pa0-f44.google.com [209.85.220.44])
	by kanga.kvack.org (Postfix) with ESMTP id 36BF76B006C
	for ; Fri, 26 Jun 2015 22:38:43 -0400 (EDT)
Received: by pactm7 with SMTP id tm7so76896188pac.2
	for ; Fri, 26 Jun 2015 19:38:42 -0700 (PDT)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66])
	by mx.google.com with ESMTPS id f10si4671450pdp.225.2015.06.26.19.38.40
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:38:42 -0700 (PDT)
Message-ID: <558E0913.7020501@huawei.com>
Date: Sat, 27 Jun 2015 10:23:15 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 1/8] mm: add a new config to manage the code
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

This patch introduces a new config option, "CONFIG_MEMORY_MIRROR", which is
off by default.

Signed-off-by: Xishi Qiu
---
 mm/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 390214d..c40bb8b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
+config MEMORY_MIRROR
+	bool "Address range mirroring support"
+	depends on X86 && MEMORY_FAILURE
+	default n
+	help
+	  This feature depends on hardware and firmware support.
+	  ACPI or EFI records the mirror info.
+
 #
 # If we have space for more page flags then we can enable additional
 # optimizations and functionality.
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-yk0-f180.google.com (mail-yk0-f180.google.com [209.85.160.180])
	by kanga.kvack.org (Postfix) with ESMTP id C2DF36B0038
	for ; Fri, 26 Jun 2015 22:39:52 -0400 (EDT)
Received: by ykdt186 with SMTP id t186so74161174ykd.0
	for ; Fri, 26 Jun 2015 19:39:52 -0700 (PDT)
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com. [119.145.14.65])
	by mx.google.com with ESMTPS id o11si3669853ykb.45.2015.06.26.19.39.05
	for (version=TLSv1 cipher=RC4-SHA bits=128/128);
	Fri, 26 Jun 2015 19:39:51 -0700 (PDT)
Message-ID: <558E0A28.6060607@huawei.com>
Date: Sat, 27 Jun 2015 10:27:52 +0800
From: Xishi Qiu
MIME-Version: 1.0
Subject: [RFC v2 PATCH 7/8] mm: add the buddy system interface
References: <558E084A.60900@huawei.com>
In-Reply-To: <558E084A.60900@huawei.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman
Cc: Xishi Qiu , Linux MM , LKML

Add the buddy system interface for the address range mirroring feature.
Use mirrored memory for all kernel allocations. If there are no mirrored
pages left, fall back to pages of other types.
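
As a rough illustration of this policy, here is a toy user-space model. It is
not kernel code and makes simplifying assumptions: toy_zone,
toy_pick_migratetype(), toy_rmqueue() and TOY_GFP_MOVABLE are hypothetical
names, free lists are reduced to per-type page counts, and the fallback stops
after a single retry with the reclaimable type, whereas the real retry goes
through __rmqueue() and may fall back further.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel migratetypes. */
enum toy_migratetype {
	TOY_UNMOVABLE,
	TOY_RECLAIMABLE,
	TOY_MOVABLE,
	TOY_MIRROR,
	TOY_NR_TYPES
};

/* Hypothetical flag that models __GFP_MOVABLE. */
#define TOY_GFP_MOVABLE 0x1u

/* Free lists reduced to a free page count per migratetype. */
struct toy_zone {
	unsigned long nr_free[TOY_NR_TYPES];
};

/*
 * Non-movable (kernel) requests are redirected to the mirror type when
 * the system has mirrored memory; movable requests keep their type.
 */
static enum toy_migratetype toy_pick_migratetype(bool has_mirror,
						 unsigned int gfp,
						 enum toy_migratetype requested)
{
	if (has_mirror && !(gfp & TOY_GFP_MOVABLE))
		return TOY_MIRROR;
	return requested;
}

/*
 * Take one page. Returns the migratetype that served the request, or -1
 * if nothing was available. An empty mirror list is retried once as
 * TOY_RECLAIMABLE, so no pageblock ever changes to or from mirror.
 */
static int toy_rmqueue(struct toy_zone *zone, enum toy_migratetype mt)
{
	assert(zone != NULL);
	assert(mt < TOY_NR_TYPES);

	if (mt == TOY_MIRROR && zone->nr_free[mt] == 0)
		mt = TOY_RECLAIMABLE;
	if (zone->nr_free[mt] == 0)
		return -1;

	zone->nr_free[mt]--;
	return (int)mt;
}
```

In this model a kernel request first calls toy_pick_migratetype() and then
toy_rmqueue(); once the mirror count reaches zero, later kernel requests are
served from the reclaimable count instead of failing.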

Signed-off-by: Xishi Qiu
---
 include/linux/memblock.h |  1 +
 mm/memblock.c            |  6 +++---
 mm/page_alloc.c          | 19 +++++++++++++++++++
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 53be030..8c33ac0 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -81,6 +81,7 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
 ulong choose_memblock_flags(void);
+extern struct static_key system_has_mirror;
 #ifdef CONFIG_MEMORY_MIRROR
 void memblock_mark_migratemirror(void);
 #endif
diff --git a/mm/memblock.c b/mm/memblock.c
index 0d0b210..430ad87 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -55,14 +55,14 @@ int memblock_debug __initdata_memblock;
 #ifdef CONFIG_MOVABLE_NODE
 bool movable_node_enabled __initdata_memblock = false;
 #endif
-static bool system_has_some_mirror __initdata_memblock = false;
+struct static_key system_has_mirror = STATIC_KEY_INIT;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;
 static int memblock_reserved_in_slab __initdata_memblock = 0;
 
 ulong __init_memblock choose_memblock_flags(void)
 {
-	return system_has_some_mirror ? MEMBLOCK_MIRROR : MEMBLOCK_NONE;
+	return static_key_false(&system_has_mirror) ? MEMBLOCK_MIRROR : MEMBLOCK_NONE;
 }
 
 /* inline so we don't get a warning when pr_debug is compiled out */
@@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
  */
 int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
 {
-	system_has_some_mirror = true;
+	static_key_slow_inc(&system_has_mirror);
 
 	return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4c5bc50..8a6125e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1033,6 +1033,9 @@ static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
 	[MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
+#ifdef CONFIG_MEMORY_MIRROR
+	[MIGRATE_MIRROR] = { MIGRATE_RESERVE }, /* Never used */
+#endif
 #ifdef CONFIG_CMA
 	[MIGRATE_CMA] = { MIGRATE_RESERVE }, /* Never used */
 #endif
@@ -1295,6 +1298,15 @@ retry_reserve:
 	page = __rmqueue_smallest(zone, order, migratetype);
 
 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
+		/*
+		 * If there is no mirrored memory left, alloc other types
+		 * memory. But we should not change the pageblock's
+		 * migratetype between mirror and others, so just use
+		 * MIGRATE_RECLAIMABLE to retry
+		 */
+		if (is_migrate_mirror(migratetype))
+			return __rmqueue(zone, order, MIGRATE_RECLAIMABLE);
+
 		if (migratetype == MIGRATE_MOVABLE)
 			page = __rmqueue_cma_fallback(zone, order);
 
@@ -2872,6 +2884,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE)
 		alloc_flags |= ALLOC_CMA;
 
+#ifdef CONFIG_MEMORY_MIRROR
+	/* Alloc mirrored memory for kernel */
+	if (static_key_false(&system_has_mirror)
+			&& !(gfp_mask & __GFP_MOVABLE))
+		ac.migratetype = MIGRATE_MIRROR;
+#endif
+
 retry_cpuset:
 	cpuset_mems_cookie = read_mems_allowed_begin();
 
--
2.0.0

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pd0-f176.google.com (mail-pd0-f176.google.com [209.85.192.176])
	by kanga.kvack.org (Postfix) with ESMTP id 0F0836B0070
	for ; Mon, 29 Jun 2015 02:53:05 -0400 (EDT)
Received: by pdbep18 with SMTP id ep18so89652041pdb.1
	for ; Sun, 28 Jun 2015 23:53:04 -0700 (PDT)
Received: from mgwkm01.jp.fujitsu.com (mgwkm01.jp.fujitsu.com.
[202.219.69.168]) by mx.google.com with ESMTPS id qm10si62972049pdb.138.2015.06.28.23.53.03 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 28 Jun 2015 23:53:04 -0700 (PDT) Received: from m3051.s.css.fujitsu.com (m3051.s.css.fujitsu.com [10.134.21.209]) by kw-mxauth.gw.nic.fujitsu.com (Postfix) with ESMTP id 64226AC037F for ; Mon, 29 Jun 2015 15:52:59 +0900 (JST) Message-ID: <5590EAA9.5090104@jp.fujitsu.com> Date: Mon, 29 Jun 2015 15:50:17 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 1/8] mm: add a new config to manage the code References: <558E084A.60900@huawei.com> <558E0913.7020501@huawei.com> In-Reply-To: <558E0913.7020501@huawei.com> Content-Type: multipart/mixed; boundary="------------030208070301040603070806" Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman Cc: Linux MM , LKML This is a multi-part message in MIME format. --------------030208070301040603070806 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit On 2015/06/27 11:23, Xishi Qiu wrote: > This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", set it CONFIG_MEMORY_MIRROR > off by default. > > Signed-off-by: Xishi Qiu > --- > mm/Kconfig | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/Kconfig b/mm/Kconfig > index 390214d..c40bb8b 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE > depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE > depends on MIGRATION > > +config MEMORY_MIRROR In following patches, you use CONFIG_MEMORY_MIRROR. I think the name is too generic besides it's depends on ACPI. But I'm not sure address based memory mirror is planned in other platform. So, hmm. How about dividing the config into 2 parts like attached ? 
(just an example) Thanks, -Kame --------------030208070301040603070806 Content-Type: text/plain; charset=Shift_JIS; name="0001-add-a-new-config-option-for-memory-mirror.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-add-a-new-config-option-for-memory-mirror.patch" --------------030208070301040603070806-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f181.google.com (mail-pd0-f181.google.com [209.85.192.181]) by kanga.kvack.org (Postfix) with ESMTP id 23DAD6B0032 for ; Mon, 29 Jun 2015 03:34:35 -0400 (EDT) Received: by pdbep18 with SMTP id ep18so90311062pdb.1 for ; Mon, 29 Jun 2015 00:34:34 -0700 (PDT) Received: from mgwkm01.jp.fujitsu.com (mgwkm01.jp.fujitsu.com. [202.219.69.168]) by mx.google.com with ESMTPS id h7si63086091pat.180.2015.06.29.00.34.32 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Jun 2015 00:34:33 -0700 (PDT) Received: from m3051.s.css.fujitsu.com (m3051.s.css.fujitsu.com [10.134.21.209]) by kw-mxoi2.gw.nic.fujitsu.com (Postfix) with ESMTP id 72AA0AC03FC for ; Mon, 29 Jun 2015 16:34:30 +0900 (JST) Message-ID: <5590F4A7.4030606@jp.fujitsu.com> Date: Mon, 29 Jun 2015 16:32:55 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> In-Reply-To: <558E0948.2010104@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman Cc: Linux MM , LKML On 2015/06/27 11:24, Xishi Qiu wrote: > This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to > allocate mirrored pages. 
> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. > > Signed-off-by: Xishi Qiu My fear about this approach is that this may break something existing. Now, when we add the MIGRATE_MIRROR type, we'll hide pageblock attributes such as MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. Logically, the MIRROR attribute is independent of page mobility, and this overwrite will lose some information. Then, > --- > include/linux/mmzone.h | 9 +++++++++ > mm/page_alloc.c | 3 +++ > mm/vmstat.c | 3 +++ > 3 files changed, 15 insertions(+) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 54d74f6..54e891a 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -39,6 +39,9 @@ enum { > MIGRATE_UNMOVABLE, > MIGRATE_RECLAIMABLE, > MIGRATE_MOVABLE, > +#ifdef CONFIG_MEMORY_MIRROR > + MIGRATE_MIRROR, > +#endif I think MIGRATE_MIRROR_UNMOVABLE, MIGRATE_MIRROR_RECLAIMABLE, MIGRATE_MIRROR_MOVABLE, <== adding this may need discussion. MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. should be added with the following fallback list. /* * MIRROR page range is defined by firmware at boot. The range is limited * and is used only for kernel memory mirroring. */ [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE} [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE} Then, we'll not lose the original information of "Reclaimable Pages". One problem here is whether we should have MIGRATE_RESERVE_MIRROR. If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR. But it seems to require much more code change to do that. Creating a zone or adding an attribute to zones is another design choice. Anyway, your patch doesn't take care of the reserved memory calculation at this point. Please check setup_zone_migrate_reserve(). That will be a problem.
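The per-mobility fallback list suggested above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: the `MT_*` names, the `MAX_FALLBACKS` constant, the `free_pages` array and the `pick_migratetype()` helper are all hypothetical, and the real allocator works on per-order free lists rather than simple counters. The point it tries to show is that a mirrored request only falls back to the other mirrored type or to the reserve, so the mobility information is kept.

```c
#include <stddef.h>

/* Hypothetical, simplified migratetype set modelled on the quoted proposal. */
enum migratetype {
	MT_UNMOVABLE,
	MT_RECLAIMABLE,
	MT_MOVABLE,
	MT_UNMOVABLE_MIRROR,
	MT_RECLAIMABLE_MIRROR,
	MT_RESERVE,
	MT_TYPES
};

#define MAX_FALLBACKS 3

/*
 * Fallback order per type. Rows are padded with MT_RESERVE explicitly,
 * because unspecified entries would default to 0 (MT_UNMOVABLE).
 */
static const int fallbacks[MT_TYPES][MAX_FALLBACKS] = {
	[MT_UNMOVABLE]          = { MT_RECLAIMABLE, MT_MOVABLE, MT_RESERVE },
	[MT_RECLAIMABLE]        = { MT_UNMOVABLE, MT_MOVABLE, MT_RESERVE },
	[MT_MOVABLE]            = { MT_RECLAIMABLE, MT_UNMOVABLE, MT_RESERVE },
	[MT_UNMOVABLE_MIRROR]   = { MT_RECLAIMABLE_MIRROR, MT_RESERVE, MT_RESERVE },
	[MT_RECLAIMABLE_MIRROR] = { MT_UNMOVABLE_MIRROR, MT_RESERVE, MT_RESERVE },
	[MT_RESERVE]            = { MT_RESERVE, MT_RESERVE, MT_RESERVE },
};

/*
 * Return the migratetype that would serve a request for 'want', given a
 * count of free pages per type, or -1 if nothing is available.
 */
int pick_migratetype(const unsigned long free_pages[MT_TYPES], int want)
{
	if (free_pages[want] > 0)
		return want;

	for (size_t i = 0; i < MAX_FALLBACKS; i++) {
		int mt = fallbacks[want][i];

		if (mt == MT_RESERVE)
			break;	/* the reserve is the last resort, tried below */
		if (free_pages[mt] > 0)
			return mt;
	}

	return free_pages[MT_RESERVE] > 0 ? MT_RESERVE : -1;
}
```

In this model a request for `MT_UNMOVABLE_MIRROR` never lands on a non-mirrored movable block, even when that is the only type with free pages; whether a separate mirrored reserve is also needed is left open, as in the mail above.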
Thanks, -Kame > MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ > MIGRATE_RESERVE = MIGRATE_PCPTYPES, > #ifdef CONFIG_CMA > @@ -69,6 +72,12 @@ enum { > # define is_migrate_cma(migratetype) false > #endif > > +#ifdef CONFIG_MEMORY_MIRROR > +# define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR) > +#else > +# define is_migrate_mirror(migratetype) false > +#endif > + > #define for_each_migratetype_order(order, type) \ > for (order = 0; order < MAX_ORDER; order++) \ > for (type = 0; type < MIGRATE_TYPES; type++) > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index ebffa0e..6e4d79f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3216,6 +3216,9 @@ static void show_migration_types(unsigned char type) > [MIGRATE_UNMOVABLE] = 'U', > [MIGRATE_RECLAIMABLE] = 'E', > [MIGRATE_MOVABLE] = 'M', > +#ifdef CONFIG_MEMORY_MIRROR > + [MIGRATE_MIRROR] = 'O', > +#endif > [MIGRATE_RESERVE] = 'R', > #ifdef CONFIG_CMA > [MIGRATE_CMA] = 'C', > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 4f5cd97..d0323e0 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -901,6 +901,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = { > "Unmovable", > "Reclaimable", > "Movable", > +#ifdef CONFIG_MEMORY_MIRROR > + "Mirror", > +#endif > "Reserve", > #ifdef CONFIG_CMA > "CMA", > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by kanga.kvack.org (Postfix) with ESMTP id A084B6B0032 for ; Mon, 29 Jun 2015 03:40:34 -0400 (EDT) Received: by paceq1 with SMTP id eq1so100934513pac.3 for ; Mon, 29 Jun 2015 00:40:34 -0700 (PDT) Received: from mgwym04.jp.fujitsu.com (mgwym04.jp.fujitsu.com. 
[211.128.242.43]) by mx.google.com with ESMTPS id qz9si63130864pab.204.2015.06.29.00.40.32 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Jun 2015 00:40:33 -0700 (PDT) Received: from m3050.s.css.fujitsu.com (msm.b.css.fujitsu.com [10.134.21.208]) by yt-mxoi1.gw.nic.fujitsu.com (Postfix) with ESMTP id 13AE0AC01B7 for ; Mon, 29 Jun 2015 16:40:29 +0900 (JST) Message-ID: <5590F648.2080808@jp.fujitsu.com> Date: Mon, 29 Jun 2015 16:39:52 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 4/8] mm: add mirrored memory to buddy system References: <558E084A.60900@huawei.com> <558E09A1.2090102@huawei.com> In-Reply-To: <558E09A1.2090102@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman Cc: Linux MM , LKML On 2015/06/27 11:25, Xishi Qiu wrote: > Before free bootmem, set mirrored pageblock's migratetype to MIGRATE_MIRROR, so > they could free to buddy system's MIGRATE_MIRROR list. > When set reserved memory, skip the mirrored memory. 
> > Signed-off-by: Xishi Qiu > --- > include/linux/memblock.h | 3 +++ > mm/memblock.c | 21 +++++++++++++++++++++ > mm/nobootmem.c | 3 +++ > mm/page_alloc.c | 3 +++ > 4 files changed, 30 insertions(+) > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h > index 97f71ca..53be030 100644 > --- a/include/linux/memblock.h > +++ b/include/linux/memblock.h > @@ -81,6 +81,9 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size); > int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size); > int memblock_mark_mirror(phys_addr_t base, phys_addr_t size); > ulong choose_memblock_flags(void); > +#ifdef CONFIG_MEMORY_MIRROR > +void memblock_mark_migratemirror(void); > +#endif > > /* Low level functions */ > int memblock_add_range(struct memblock_type *type, > diff --git a/mm/memblock.c b/mm/memblock.c > index 7612876..0d0b210 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -818,6 +819,26 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) > return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); > } > > +#ifdef CONFIG_MEMORY_MIRROR > +void __init_memblock memblock_mark_migratemirror(void) > +{ > + unsigned long start_pfn, end_pfn, pfn; > + int i, node; > + struct page *page; > + > + printk(KERN_DEBUG "Mirrored memory:\n"); > + for_each_mirror_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, > + &node) { > + printk(KERN_DEBUG " node %3d: [mem %#010llx-%#010llx]\n", > + node, PFN_PHYS(start_pfn), PFN_PHYS(end_pfn) - 1); > + for (pfn = start_pfn; pfn < end_pfn; > + pfn += pageblock_nr_pages) { > + page = pfn_to_page(pfn); > + set_pageblock_migratetype(page, MIGRATE_MIRROR); > + } > + } > +} > +#endif > > /** > * __next__mem_range - next function for for_each_free_mem_range() etc. 
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c > index 5258386..31aa6d4 100644 > --- a/mm/nobootmem.c > +++ b/mm/nobootmem.c > @@ -129,6 +129,9 @@ static unsigned long __init free_low_memory_core_early(void) > u64 i; > > memblock_clear_hotplug(0, -1); > +#ifdef CONFIG_MEMORY_MIRROR > + memblock_mark_migratemirror(); > +#endif > > for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, > NULL) > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 6e4d79f..aea78a5 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4118,6 +4118,9 @@ static void setup_zone_migrate_reserve(struct zone *zone) > > block_migratetype = get_pageblock_migratetype(page); > > + if (is_migrate_mirror(block_migratetype)) > + continue; > + If mirrored area will not have reserved memory, this should break the page allocator's logic. I think both of mirrored and unmirrored range should have reserved area. Thanks, -Kame > /* Only test what is necessary when the reserves are not met */ > if (reserve > 0) { > /* > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) by kanga.kvack.org (Postfix) with ESMTP id 0E59C6B0032 for ; Mon, 29 Jun 2015 11:19:14 -0400 (EDT) Received: by pabvl15 with SMTP id vl15so106933854pab.1 for ; Mon, 29 Jun 2015 08:19:13 -0700 (PDT) Received: from mga03.intel.com (mga03.intel.com. 
[134.134.136.65]) by mx.google.com with ESMTP id qu16si64909286pab.222.2015.06.29.08.19.11 for ; Mon, 29 Jun 2015 08:19:13 -0700 (PDT) Message-ID: <559161EF.7050405@intel.com> Date: Mon, 29 Jun 2015 08:19:11 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman Cc: Linux MM , LKML On 06/26/2015 07:19 PM, Xishi Qiu wrote: > drivers/base/node.c | 17 ++++--- > fs/proc/meminfo.c | 6 +++ > include/linux/memblock.h | 29 ++++++++++-- > include/linux/mmzone.h | 10 ++++ > include/linux/vmstat.h | 2 + > mm/Kconfig | 8 ++++ > mm/memblock.c | 33 +++++++++++-- > mm/nobootmem.c | 3 ++ > mm/page_alloc.c | 117 ++++++++++++++++++++++++++++++++++++----------- > mm/vmstat.c | 4 ++ > 10 files changed, 190 insertions(+), 39 deletions(-) Has there been any performance analysis done on this code? I'm always nervous when I see page_alloc.c churn. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f181.google.com (mail-pd0-f181.google.com [209.85.192.181]) by kanga.kvack.org (Postfix) with ESMTP id 56E556B0032 for ; Mon, 29 Jun 2015 19:12:02 -0400 (EDT) Received: by pdbep18 with SMTP id ep18so102541650pdb.1 for ; Mon, 29 Jun 2015 16:12:01 -0700 (PDT) Received: from mga03.intel.com (mga03.intel.com. 
[134.134.136.65]) by mx.google.com with ESMTP id ty5si66710962pac.54.2015.06.29.16.12.00 for ; Mon, 29 Jun 2015 16:12:01 -0700 (PDT) From: "Luck, Tony" Subject: RE: [RFC v2 PATCH 7/8] mm: add the buddy system interface Date: Mon, 29 Jun 2015 23:11:30 +0000 Message-ID: <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> In-Reply-To: <558E0A28.6060607@huawei.com> Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , Kamezawa Hiroyuki , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman Cc: Linux MM , LKML > @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) > */ > int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) > { > - system_has_some_mirror = true; > + static_key_slow_inc(&system_has_mirror); > > return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); > } This generates some WARN_ON noise when called from efi_find_mirror(): [ 0.000000] e820: last_pfn = 0x7b800 max_arch_pfn = 0x400000000 [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:61 static_key_slow_inc+0x57/0xc0() [ 0.000000] static_key_slow_inc used before call to jump_label_init [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0 #4 [ 0.000000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015 [ 0.000000] 0000000000000000 ee366a8dff38f745 ffffffff81997d68 ffffffff816683b4 [ 0.000000] 0000000000000000 ffffffff81997dc0 ffffffff81997da8 ffffffff8107b0aa [ 0.000000] ffffffff81d48822 ffffffff81f281a0 0000000040000000 0000001fcb7a4000 [
0.000000] Call Trace: [ 0.000000] [] dump_stack+0x45/0x57 [ 0.000000] [] warn_slowpath_common+0x8a/0xc0 [ 0.000000] [] warn_slowpath_fmt+0x55/0x70 [ 0.000000] [] ? memblock_add_range+0x175/0x19e [ 0.000000] [] static_key_slow_inc+0x57/0xc0 [ 0.000000] [] memblock_mark_mirror+0x19/0x33 [ 0.000000] [] efi_find_mirror+0x59/0xdd [ 0.000000] [] setup_arch+0x642/0xccf [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] ? printk+0x55/0x6b [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] start_kernel+0xe8/0x4eb [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [] x86_64_start_kernel+0x14c/0x16f [ 0.000000] ---[ end trace baa7fa0514e3bc58 ]--- [ 0.000000] ------------[ cut here ]------------ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f171.google.com (mail-pd0-f171.google.com [209.85.192.171]) by kanga.kvack.org (Postfix) with ESMTP id 9BB5A6B0032 for ; Mon, 29 Jun 2015 21:01:30 -0400 (EDT) Received: by pdbci14 with SMTP id ci14so125623312pdb.2 for ; Mon, 29 Jun 2015 18:01:30 -0700 (PDT) Received: from mgwkm01.jp.fujitsu.com (mgwkm01.jp.fujitsu.com. 
[202.219.69.168]) by mx.google.com with ESMTPS id ff2si67116214pab.12.2015.06.29.18.01.28 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Jun 2015 18:01:29 -0700 (PDT) Received: from m3051.s.css.fujitsu.com (m3051.s.css.fujitsu.com [10.134.21.209]) by kw-mxauth.gw.nic.fujitsu.com (Postfix) with ESMTP id 8AF0DAC04B2 for ; Tue, 30 Jun 2015 10:01:24 +0900 (JST) Message-ID: <5591EA50.1000000@jp.fujitsu.com> Date: Tue, 30 Jun 2015 10:01:04 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "Luck, Tony" , Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman Cc: Linux MM , LKML On 2015/06/30 8:11, Luck, Tony wrote: >> @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) >> */ >> int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) >> { >> - system_has_some_mirror = true; >> + static_key_slow_inc(&system_has_mirror); >> >> return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); >> } > > This generates some WARN_ON noise when called from efi_find_mirror(): > It seems jump_label_init() is called after memory initialization (see init/main.c::start_kernel()). So it may be difficult to use the static_key functions for our purpose, because kernel memory allocation may occur before jump_label is ready.
Thanks, -Kame > [ 0.000000] e820: last_pfn = 0x7b800 max_arch_pfn = 0x400000000 > [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:61 static_key_slow_inc+0x57/0xc0() > [ 0.000000] static_key_slow_inc used before call to jump_label_init > [ 0.000000] Modules linked in: > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0 #4 > [ 0.000000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015 > [ 0.000000] 0000000000000000 ee366a8dff38f745 ffffffff81997d68 ffffffff816683b4 > [ 0.000000] 0000000000000000 ffffffff81997dc0 ffffffff81997da8 ffffffff8107b0aa > [ 0.000000] ffffffff81d48822 ffffffff81f281a0 0000000040000000 0000001fcb7a4000 > [ 0.000000] Call Trace: > [ 0.000000] [] dump_stack+0x45/0x57 > [ 0.000000] [] warn_slowpath_common+0x8a/0xc0 > [ 0.000000] [] warn_slowpath_fmt+0x55/0x70 > [ 0.000000] [] ? memblock_add_range+0x175/0x19e > [ 0.000000] [] static_key_slow_inc+0x57/0xc0 > [ 0.000000] [] memblock_mark_mirror+0x19/0x33 > [ 0.000000] [] efi_find_mirror+0x59/0xdd > [ 0.000000] [] setup_arch+0x642/0xccf > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] ? printk+0x55/0x6b > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] start_kernel+0xe8/0x4eb > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > [ 0.000000] [] x86_64_start_kernel+0x14c/0x16f > [ 0.000000] ---[ end trace baa7fa0514e3bc58 ]--- > [ 0.000000] ------------[ cut here ]------------ > > > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qg0-f54.google.com (mail-qg0-f54.google.com [209.85.192.54]) by kanga.kvack.org (Postfix) with ESMTP id B44076B006E for ; Mon, 29 Jun 2015 21:34:35 -0400 (EDT) Received: by qgii30 with SMTP id i30so12858125qgi.1 for ; Mon, 29 Jun 2015 18:34:35 -0700 (PDT) Received: from szxga03-in.huawei.com (szxga03-in.huawei.com. [119.145.14.66]) by mx.google.com with ESMTPS id i206si43400702qhc.13.2015.06.29.18.34.33 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 29 Jun 2015 18:34:35 -0700 (PDT) Message-ID: <5591F042.1020304@huawei.com> Date: Tue, 30 Jun 2015 09:26:26 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <559161EF.7050405@intel.com> In-Reply-To: <559161EF.7050405@intel.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/6/29 23:19, Dave Hansen wrote: > On 06/26/2015 07:19 PM, Xishi Qiu wrote: >> drivers/base/node.c | 17 ++++--- >> fs/proc/meminfo.c | 6 +++ >> include/linux/memblock.h | 29 ++++++++++-- >> include/linux/mmzone.h | 10 ++++ >> include/linux/vmstat.h | 2 + >> mm/Kconfig | 8 ++++ >> mm/memblock.c | 33 +++++++++++-- >> mm/nobootmem.c | 3 ++ >> mm/page_alloc.c | 117 ++++++++++++++++++++++++++++++++++++----------- >> mm/vmstat.c | 4 ++ >> 10 files changed, 190 insertions(+), 39 deletions(-) > > Has there been any performance analysis done on this code? I'm always > nervous when I see page_alloc.c churn. > Not yet, which benchmark do you suggest? Thanks, Xishi Qiu > > > . 
> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f172.google.com (mail-ob0-f172.google.com [209.85.214.172]) by kanga.kvack.org (Postfix) with ESMTP id D9E3D6B0032 for ; Mon, 29 Jun 2015 21:41:55 -0400 (EDT) Received: by obbop1 with SMTP id op1so116588976obb.2 for ; Mon, 29 Jun 2015 18:41:55 -0700 (PDT) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com. [119.145.14.65]) by mx.google.com with ESMTPS id k132si30467299oia.73.2015.06.29.18.41.51 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 29 Jun 2015 18:41:55 -0700 (PDT) Message-ID: <5591F18E.3060504@huawei.com> Date: Tue, 30 Jun 2015 09:31:58 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> <5591EA50.1000000@jp.fujitsu.com> In-Reply-To: <5591EA50.1000000@jp.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: "Luck, Tony" , Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/6/30 9:01, Kamezawa Hiroyuki wrote: > On 2015/06/30 8:11, Luck, Tony wrote: >>> @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) >>> */ >>> int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) >>> { >>> - system_has_some_mirror = true; >>> + static_key_slow_inc(&system_has_mirror); >>> >>> return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); >>> } >> >> This generates some WARN_ON noise when called from efi_find_mirror(): >> > > It seems jump_label_init() is called after memory initialization. (init/main.c::start_kernel()) > So, it may be difficut to use static_key function for our purpose because > kernel memory allocation may occur before jump_label is ready. > > Thanks, > -Kame > Hi Kame, How about this? Use a static bool in bootmem, and use a jump label in the buddy system. This means we use two variables to do it.
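The two-variable idea above can be sketched as a tiny userspace model. This is a minimal illustration under stated assumptions, not the patch itself: every name here (`early_has_mirror`, `mirror_key_count`, `model_jump_label_init()`, `promote_mirror_key()` and the two `*_wants_mirror()` helpers) is hypothetical, and an integer counter stands in for a real `struct static_key`. The early boot path only touches the plain bool, and the key is enabled later, once the jump label machinery has been initialised.

```c
#include <stdbool.h>

/* Stage 1: plain flag, safe to set while firmware memory ranges are parsed. */
static bool early_has_mirror;

/* Stage 2: stand-in for a static key, usable only after jump label init. */
static bool jump_label_ready;
static int mirror_key_count;

/* Early boot (memblock time): record that some mirrored range exists. */
void mark_mirror_early(void)
{
	early_has_mirror = true;
}

/* Models jump_label_init(): from here on the key may be modified. */
void model_jump_label_init(void)
{
	jump_label_ready = true;
}

/*
 * Late init: copy the boot-time flag into the key.
 * Returns false when called too early, instead of touching the key.
 */
bool promote_mirror_key(void)
{
	if (!jump_label_ready)
		return false;
	if (early_has_mirror && mirror_key_count == 0)
		mirror_key_count++;
	return true;
}

/* The memblock side consults the plain bool. */
bool memblock_wants_mirror(void)
{
	return early_has_mirror;
}

/* The buddy allocator side consults the key. */
bool buddy_wants_mirror(void)
{
	return mirror_key_count > 0;
}
```

With this split, nothing modifies the key before the jump label setup has run, which is the condition the warning in the trace complains about; the cost is keeping the two variables consistent, so the promotion step would need to run exactly once at a well-defined point.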
Thanks, Xishi Qiu >> [ 0.000000] e820: last_pfn = 0x7b800 max_arch_pfn = 0x400000000 >> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:61 static_key_slow_inc+0x57/0xc0() >> [ 0.000000] static_key_slow_inc used before call to jump_label_init >> [ 0.000000] Modules linked in: >> >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0 #4 >> [ 0.000000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015 >> [ 0.000000] 0000000000000000 ee366a8dff38f745 ffffffff81997d68 ffffffff816683b4 >> [ 0.000000] 0000000000000000 ffffffff81997dc0 ffffffff81997da8 ffffffff8107b0aa >> [ 0.000000] ffffffff81d48822 ffffffff81f281a0 0000000040000000 0000001fcb7a4000 >> [ 0.000000] Call Trace: >> [ 0.000000] [] dump_stack+0x45/0x57 >> [ 0.000000] [] warn_slowpath_common+0x8a/0xc0 >> [ 0.000000] [] warn_slowpath_fmt+0x55/0x70 >> [ 0.000000] [] ? memblock_add_range+0x175/0x19e >> [ 0.000000] [] static_key_slow_inc+0x57/0xc0 >> [ 0.000000] [] memblock_mark_mirror+0x19/0x33 >> [ 0.000000] [] efi_find_mirror+0x59/0xdd >> [ 0.000000] [] setup_arch+0x642/0xccf >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] ? printk+0x55/0x6b >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] start_kernel+0xe8/0x4eb >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c >> [ 0.000000] [] x86_64_start_kernel+0x14c/0x16f >> [ 0.000000] ---[ end trace baa7fa0514e3bc58 ]--- >> [ 0.000000] ------------[ cut here ]------------ >> >> >> >> >> > > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f170.google.com (mail-pd0-f170.google.com [209.85.192.170]) by kanga.kvack.org (Postfix) with ESMTP id 717686B0032 for ; Mon, 29 Jun 2015 21:52:13 -0400 (EDT) Received: by pdbci14 with SMTP id ci14so126346236pdb.2 for ; Mon, 29 Jun 2015 18:52:13 -0700 (PDT) Received: from mga14.intel.com (mga14.intel.com. [192.55.52.115]) by mx.google.com with ESMTP id fh4si67307884pdb.61.2015.06.29.18.52.12 for ; Mon, 29 Jun 2015 18:52:12 -0700 (PDT) Message-ID: <5591F64A.3040108@intel.com> Date: Mon, 29 Jun 2015 18:52:10 -0700 From: Dave Hansen MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <559161EF.7050405@intel.com> <5591F042.1020304@huawei.com> In-Reply-To: <5591F042.1020304@huawei.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 06/29/2015 06:26 PM, Xishi Qiu wrote: >> > Has there been any performance analysis done on this code? I'm always >> > nervous when I see page_alloc.c churn. >> > > Not yet, which benchmark do you suggest? mmtests is always a good place to start. aim9. I'm partial to will-it-scale. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f173.google.com (mail-pd0-f173.google.com [209.85.192.173]) by kanga.kvack.org (Postfix) with ESMTP id E683A6B0032 for ; Mon, 29 Jun 2015 22:01:29 -0400 (EDT) Received: by pdbci14 with SMTP id ci14so126479766pdb.2 for ; Mon, 29 Jun 2015 19:01:29 -0700 (PDT) Received: from mgwym01.jp.fujitsu.com (mgwym01.jp.fujitsu.com. [211.128.242.40]) by mx.google.com with ESMTPS id z1si67346118pda.165.2015.06.29.19.01.28 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Jun 2015 19:01:29 -0700 (PDT) Received: from m3051.s.css.fujitsu.com (m3051.s.css.fujitsu.com [10.134.21.209]) by yt-mxoi2.gw.nic.fujitsu.com (Postfix) with ESMTP id 4DC79AC0219 for ; Tue, 30 Jun 2015 11:01:25 +0900 (JST) Message-ID: <5591F862.7030706@jp.fujitsu.com> Date: Tue, 30 Jun 2015 11:01:06 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> <5591EA50.1000000@jp.fujitsu.com> <5591F18E.3060504@huawei.com> In-Reply-To: <5591F18E.3060504@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: "Luck, Tony" , Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML

On 2015/06/30 10:31, Xishi Qiu wrote:
> On 2015/6/30 9:01, Kamezawa Hiroyuki wrote:
>
>> On 2015/06/30 8:11, Luck, Tony wrote:
>>>> @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
>>>>   */
>>>>  int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>>>>  {
>>>> -	system_has_some_mirror = true;
>>>> +	static_key_slow_inc(&system_has_mirror);
>>>>
>>>>  	return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
>>>>  }
>>>
>>> This generates some WARN_ON noise when called from efi_find_mirror():
>>>
>>
>> It seems jump_label_init() is called after memory initialization. (init/main.c::start_kernel())
>> So, it may be difficult to use static_key function for our purpose because
>> kernel memory allocation may occur before jump_label is ready.
>>
>> Thanks,
>> -Kame
>>
>
> Hi Kame,
>
> How about like this? Use static bool in bootmem, and use jump label in buddy system.
> This means we use two variables to do it.
>

I think it can be done, but it should be done in a separate patch with enough comments and changelog.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f182.google.com (mail-ig0-f182.google.com [209.85.213.182]) by kanga.kvack.org (Postfix) with ESMTP id D41616B0032 for ; Mon, 29 Jun 2015 22:53:26 -0400 (EDT) Received: by igrv9 with SMTP id v9so3565314igr.1 for ; Mon, 29 Jun 2015 19:53:26 -0700 (PDT) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com.
[119.145.14.65]) by mx.google.com with ESMTPS id a63si3957526ioe.50.2015.06.29.19.53.25 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 29 Jun 2015 19:53:26 -0700 (PDT) Message-ID: <55920384.7030301@huawei.com> Date: Tue, 30 Jun 2015 10:48:36 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <559161EF.7050405@intel.com> <5591F042.1020304@huawei.com> <5591F64A.3040108@intel.com> In-Reply-To: <5591F64A.3040108@intel.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Dave Hansen Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/6/30 9:52, Dave Hansen wrote: > On 06/29/2015 06:26 PM, Xishi Qiu wrote: >>>> Has there been any performance analysis done on this code? I'm always >>>> nervous when I see page_alloc.c churn. >>>> >> Not yet, which benchmark do you suggest? > > mmtests is always a good place to start. aim9. I'm partial to > will-it-scale. > I see, thank you. > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f48.google.com (mail-pa0-f48.google.com [209.85.220.48]) by kanga.kvack.org (Postfix) with ESMTP id 0B23B6B006C for ; Mon, 29 Jun 2015 22:54:27 -0400 (EDT) Received: by pacws9 with SMTP id ws9so1077442pac.0 for ; Mon, 29 Jun 2015 19:54:26 -0700 (PDT) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. 
[58.251.152.64]) by mx.google.com with ESMTPS id nx9si33950250pbb.73.2015.06.29.19.54.22 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 29 Jun 2015 19:54:26 -0700 (PDT) Message-ID: <559202E2.8060609@huawei.com> Date: Tue, 30 Jun 2015 10:45:54 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> <5590F4A7.4030606@jp.fujitsu.com> In-Reply-To: <5590F4A7.4030606@jp.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: > On 2015/06/27 11:24, Xishi Qiu wrote: >> This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to >> allocate mirrored pages. >> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. >> >> Signed-off-by: Xishi Qiu > > My fear about this approarch is that this may break something existing. > > Now, when we add MIGRATE_MIRROR type, we'll hide attributes of pageblocks as > MIGRATE_UNMOVABOLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. > > Logically, MIRROR attribute is independent from page mobility and this overwrites > will make some information lost. 
>
> Then,
>
>> ---
>>  include/linux/mmzone.h | 9 +++++++++
>>  mm/page_alloc.c        | 3 +++
>>  mm/vmstat.c            | 3 +++
>>  3 files changed, 15 insertions(+)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 54d74f6..54e891a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -39,6 +39,9 @@ enum {
>>  	MIGRATE_UNMOVABLE,
>>  	MIGRATE_RECLAIMABLE,
>>  	MIGRATE_MOVABLE,
>> +#ifdef CONFIG_MEMORY_MIRROR
>> +	MIGRATE_MIRROR,
>> +#endif
>
> I think
> 	MIGRATE_MIRROR_UNMOVABLE,
> 	MIGRATE_MIRROR_RECLAIMABLE,
> 	MIGRATE_MIRROR_MOVABLE,  <== adding this may need discussion.
> 	MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored.
>

Hi Kame,

You mean adding 3 or 4 new migratetypes?

> should be added with the following fallback list.
>
> /*
>  * MIRROR page range is defined by firmware at boot. The range is limited
>  * and is used only for kernel memory mirroring.
>  */
> [MIGRATE_UNMOVABLE_MIRROR]   = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE}
> [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE}
>

Why not like this:
{MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_MIRROR_RESERVED, MIGRATE_RESERVE}

> Then, we'll not lose the original information of "Reclaimable Pages".
>
> One problem here is whether we should have MIGRATE_RESERVE_MIRROR.
>
> If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR.
> But it seems to require much more code change to do that.
>
> Creating a zone or adding an attribute to zones is another design choice.
>

If we add a new zone, mirror_zone will span the other zones; I'm worried this may cause problems.

Thanks,
Xishi Qiu

> Anyway, your patch doesn't take care of reserved memory calculation at this point.
> Please check setup_zone_migrate_reserve() That will be a problem.
>
> Thanks,
> -Kame
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
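
Kame's objection quoted in the message above is that mirroring is independent of page mobility, so a single MIGRATE_MIRROR value overwrites the mobility class of a pageblock. The following is only a userspace sketch of that point, not code from the series: the names `enum mobility`, `pack_type()`, `type_mobility()` and `type_is_mirrored()` are hypothetical, and the layout (mirrored variants placed directly after the plain ones) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mobility classes, named after the three entries in the quoted enum. */
enum mobility { UNMOVABLE, RECLAIMABLE, MOVABLE, NR_MOBILITY };

/* Hypothetical combined index space: plain types first, mirrored after. */
enum { NR_COMBINED = 2 * NR_MOBILITY };

/* Pack (mobility, mirrored) into a single migratetype-like index. */
static int pack_type(enum mobility m, bool mirrored)
{
	assert(m < NR_MOBILITY);
	return (mirrored ? NR_MOBILITY : 0) + (int)m;
}

/* Recover the mobility class; it survives because it was never overwritten. */
static enum mobility type_mobility(int type)
{
	assert(type >= 0 && type < NR_COMBINED);
	return (enum mobility)(type % NR_MOBILITY);
}

/* Recover the mirror attribute independently of mobility. */
static bool type_is_mirrored(int type)
{
	assert(type >= 0 && type < NR_COMBINED);
	return type >= NR_MOBILITY;
}
```

With one value per (mobility, mirrored) pair, a mirrored reclaimable block can still be told apart from a mirrored unmovable one, which is the information a lone MIGRATE_MIRROR value would lose. The cost is a larger number of types, which is what the rest of the thread goes on to discuss.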
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f48.google.com (mail-oi0-f48.google.com [209.85.218.48]) by kanga.kvack.org (Postfix) with ESMTP id D649E6B0032 for ; Mon, 29 Jun 2015 23:00:01 -0400 (EDT) Received: by oiax193 with SMTP id x193so131993427oia.2 for ; Mon, 29 Jun 2015 20:00:01 -0700 (PDT) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com. [119.145.14.65]) by mx.google.com with ESMTPS id m2si30553388oey.25.2015.06.29.19.59.59 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 29 Jun 2015 20:00:01 -0700 (PDT) Message-ID: <55920450.703@huawei.com> Date: Tue, 30 Jun 2015 10:52:00 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 1/8] mm: add a new config to manage the code References: <558E084A.60900@huawei.com> <558E0913.7020501@huawei.com> <5590EAA9.5090104@jp.fujitsu.com> In-Reply-To: <5590EAA9.5090104@jp.fujitsu.com> Content-Type: text/plain; charset="Shift_JIS" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/6/29 14:50, Kamezawa Hiroyuki wrote: > On 2015/06/27 11:23, Xishi Qiu wrote: >> This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", set it > CONFIG_MEMORY_MIRROR >> off by default. >> >> Signed-off-by: Xishi Qiu >> --- >> mm/Kconfig | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/mm/Kconfig b/mm/Kconfig >> index 390214d..c40bb8b 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE >> depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE >> depends on MIGRATION >> >> +config MEMORY_MIRROR > > In following patches, you use CONFIG_MEMORY_MIRROR. > > I think the name is too generic besides it's depends on ACPI. 
> But I'm not sure address based memory mirror is planned in other platform. > > So, hmm. How about dividing the config into 2 parts like attached ? (just an example) > Seems like a good idea, thank you. > Thanks, > -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f182.google.com (mail-pd0-f182.google.com [209.85.192.182]) by kanga.kvack.org (Postfix) with ESMTP id 0D5086B0032 for ; Tue, 30 Jun 2015 03:53:54 -0400 (EDT) Received: by pdcu2 with SMTP id u2so1668689pdc.3 for ; Tue, 30 Jun 2015 00:53:53 -0700 (PDT) Received: from mgwym03.jp.fujitsu.com (mgwym03.jp.fujitsu.com. [211.128.242.42]) by mx.google.com with ESMTPS id uj2si68766592pab.146.2015.06.30.00.53.52 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Jun 2015 00:53:53 -0700 (PDT) Received: from m3050.s.css.fujitsu.com (msm.b.css.fujitsu.com [10.134.21.208]) by yt-mxauth.gw.nic.fujitsu.com (Postfix) with ESMTP id 325D4AC0634 for ; Tue, 30 Jun 2015 16:53:49 +0900 (JST) Message-ID: <55924AEF.4050107@jp.fujitsu.com> Date: Tue, 30 Jun 2015 16:53:19 +0900 From: Kamezawa Hiroyuki MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> <5590F4A7.4030606@jp.fujitsu.com> <559202E2.8060609@huawei.com> In-Reply-To: <559202E2.8060609@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/06/30 11:45, Xishi Qiu wrote: > On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: > >> On 2015/06/27 11:24, Xishi Qiu wrote: >>> This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to >>> allocate mirrored pages. >>> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. >>> >>> Signed-off-by: Xishi Qiu >> >> My fear about this approarch is that this may break something existing. >> >> Now, when we add MIGRATE_MIRROR type, we'll hide attributes of pageblocks as >> MIGRATE_UNMOVABOLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. >> >> Logically, MIRROR attribute is independent from page mobility and this overwrites >> will make some information lost. >> >> Then, >> >>> --- >>> include/linux/mmzone.h | 9 +++++++++ >>> mm/page_alloc.c | 3 +++ >>> mm/vmstat.c | 3 +++ >>> 3 files changed, 15 insertions(+) >>> >>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>> index 54d74f6..54e891a 100644 >>> --- a/include/linux/mmzone.h >>> +++ b/include/linux/mmzone.h >>> @@ -39,6 +39,9 @@ enum { >>> MIGRATE_UNMOVABLE, >>> MIGRATE_RECLAIMABLE, >>> MIGRATE_MOVABLE, >>> +#ifdef CONFIG_MEMORY_MIRROR >>> + MIGRATE_MIRROR, >>> +#endif >> >> I think >> MIGRATE_MIRROR_UNMOVABLE, >> MIGRATE_MIRROR_RECLAIMABLE, >> MIGRATE_MIRROR_MOVABLE, <== adding this may need discuss. >> MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. >> > > Hi Kame, > > You mean add 3 or 4 new migratetype? > yes. But please check how NR_MIGRATETYPE_BITS will be. I think this will not have big impact in x86-64 . >> should be added with the following fallback list. >> >> /* >> * MIRROR page range is defined by firmware at boot. The range is limited >> * and is used only for kernel memory mirroring. 
>>  */
>> [MIGRATE_UNMOVABLE_MIRROR]   = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE}
>> [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE}
>>
>
> Why not like this:
> {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_MIRROR_RESERVED, MIGRATE_RESERVE}
>

My mistake.

[MIGRATE_UNMOVABLE_MIRROR]   = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE_MIRROR}
[MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE_MIRROR}

was my intention. This means mirrored memory and unmirrored memory are separated completely.

But this would affect kswapd and other memory reclaim logic.

For example, kswapd stops once free pages exceed the high watermark.
But the cases where mirrored or unmirrored pages are exhausted are not handled in this series.
You need some extra checks in the memory reclaim logic if you go with migratetypes.

>> Then, we'll not lose the original information of "Reclaimable Pages".
>>
>> One problem here is whether we should have MIGRATE_RESERVE_MIRROR.
>>
>> If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR.
>> But it seems to require much more code change to do that.
>>
>> Creating a zone or adding an attribute to zones is another design choice.
>>
>
> If we add a new zone, mirror_zone will span the other zones; I'm worried this
> may cause problems.

Yes, that's a problem. And zoneid bits are a very limited resource.
(... But the memory reclaim logic can stay unchanged.)

Anyway, I'd like to see your solution with the above changes first rather than adding zones.

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
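
The corrected fallback list in the message above can be sketched in userspace C to check two things Kame raises: that no mirrored type ever falls back to an unmirrored one, and how many bits the enlarged set of types needs. This is a rough illustration under stated assumptions, not kernel code: the enum below is a reduced stand-in for the kernel's, the rows for the unmirrored types are filled in only to make the table complete, and `END`, `is_mirror_type()`, `fallbacks_stay_on_same_side()` and `bits_for()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for the migratetype enum; the *_MIRROR names follow the mail. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_UNMOVABLE_MIRROR,
	MIGRATE_RECLAIMABLE_MIRROR,
	MIGRATE_RESERVE_MIRROR,
	MIGRATE_TYPES
};

#define MAX_FALLBACKS 3
#define END (-1)	/* hypothetical terminator for short rows */

static const int fallbacks[MIGRATE_TYPES][MAX_FALLBACKS] = {
	/* Unmirrored rows: assumed here for completeness only. */
	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
	[MIGRATE_RESERVE]     = { END, END, END },
	/* Mirrored rows: the two lists from the mail, ending in the mirrored reserve. */
	[MIGRATE_UNMOVABLE_MIRROR]   = { MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE_MIRROR, END },
	[MIGRATE_RECLAIMABLE_MIRROR] = { MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE_MIRROR, END },
	[MIGRATE_RESERVE_MIRROR]     = { END, END, END },
};

static bool is_mirror_type(int mt)
{
	return mt >= MIGRATE_UNMOVABLE_MIRROR && mt < MIGRATE_TYPES;
}

/* True if no fallback of 'start' crosses the mirrored/unmirrored boundary. */
static bool fallbacks_stay_on_same_side(int start)
{
	assert(start >= 0 && start < MIGRATE_TYPES);
	for (int i = 0; i < MAX_FALLBACKS && fallbacks[start][i] != END; i++)
		if (is_mirror_type(fallbacks[start][i]) != is_mirror_type(start))
			return false;
	return true;
}

/* Number of bits needed to store any value in [0, count). */
static unsigned int bits_for(unsigned int count)
{
	unsigned int bits = 0;

	while ((1u << bits) < count)
		bits++;
	return bits;
}
```

In this reduced enum the seven types fit in three bits, but a configuration that already has more types would need one more bit once the mirrored variants are added, which is presumably why Kame asks for the per-pageblock bit count to be checked. The strict separation shown by `fallbacks_stay_on_same_side()` is also what makes the reclaim concern real: when one side is exhausted, nothing in the table lets an allocation spill to the other.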
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f41.google.com (mail-oi0-f41.google.com [209.85.218.41]) by kanga.kvack.org (Postfix) with ESMTP id 743406B0032 for ; Tue, 30 Jun 2015 05:35:48 -0400 (EDT) Received: by oiax193 with SMTP id x193so3078835oia.2 for ; Tue, 30 Jun 2015 02:35:48 -0700 (PDT) Received: from szxga03-in.huawei.com ([119.145.14.66]) by mx.google.com with ESMTPS id m132si26000853oig.33.2015.06.30.02.35.46 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 30 Jun 2015 02:35:47 -0700 (PDT) Message-ID: <55925FD5.7030205@huawei.com> Date: Tue, 30 Jun 2015 17:22:29 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> <5590F4A7.4030606@jp.fujitsu.com> <559202E2.8060609@huawei.com> <55924AEF.4050107@jp.fujitsu.com> In-Reply-To: <55924AEF.4050107@jp.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Kamezawa Hiroyuki Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML On 2015/6/30 15:53, Kamezawa Hiroyuki wrote: > On 2015/06/30 11:45, Xishi Qiu wrote: >> On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: >> >>> On 2015/06/27 11:24, Xishi Qiu wrote: >>>> This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to >>>> allocate mirrored pages. >>>> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. >>>> >>>> Signed-off-by: Xishi Qiu >>> >>> My fear about this approarch is that this may break something existing. >>> >>> Now, when we add MIGRATE_MIRROR type, we'll hide attributes of pageblocks as >>> MIGRATE_UNMOVABOLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. 
>>> >>> Logically, MIRROR attribute is independent from page mobility and this overwrites >>> will make some information lost. >>> >>> Then, >>> >>>> --- >>>> include/linux/mmzone.h | 9 +++++++++ >>>> mm/page_alloc.c | 3 +++ >>>> mm/vmstat.c | 3 +++ >>>> 3 files changed, 15 insertions(+) >>>> >>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>>> index 54d74f6..54e891a 100644 >>>> --- a/include/linux/mmzone.h >>>> +++ b/include/linux/mmzone.h >>>> @@ -39,6 +39,9 @@ enum { >>>> MIGRATE_UNMOVABLE, >>>> MIGRATE_RECLAIMABLE, >>>> MIGRATE_MOVABLE, >>>> +#ifdef CONFIG_MEMORY_MIRROR >>>> + MIGRATE_MIRROR, >>>> +#endif >>> >>> I think >>> MIGRATE_MIRROR_UNMOVABLE, >>> MIGRATE_MIRROR_RECLAIMABLE, >>> MIGRATE_MIRROR_MOVABLE, <== adding this may need discuss. >>> MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. >>> >> >> Hi Kame, >> >> You mean add 3 or 4 new migratetype? >> > > yes. But please check how NR_MIGRATETYPE_BITS will be. > I think this will not have big impact in x86-64 . > >>> should be added with the following fallback list. >>> >>> /* >>> * MIRROR page range is defined by firmware at boot. The range is limited >>> * and is used only for kernel memory mirroring. >>> */ >>> [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE} >>> [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE} >>> >> >> Why not like this: >> {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_MIRROR_RESERVED, MIGRATE_RESERVE} >> > > My mistake. > [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE_MIRROR} > [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE_MIRROR} > > was my intention. This means mirrored memory and unmirrored memory is separated completely. > > But this should affect kswapd or other memory reclaim logic. > > for example, kswapd stops free pages are more than hi watermark. 
> But mirrored/unmirrored pages exhausted cases are not handled in this series. > You need some extra check in memory reclaim logic if you go with migration_type. > OK, I understand. Thank you for your suggestion. Thanks, Xishi Qiu > > >>> Then, we'll not lose the original information of "Reclaiable Pages". >>> >>> One problem here is whteher we should have MIGRATE_RESERVE_MIRROR. >>> >>> If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR. >>> But it seems to require much more code change to do that. >>> >>> Creating a zone or adding an attribues to zones are another design choice. >>> >> >> If we add a new zone, mirror_zone will span others, I'm worry about this >> maybe have problems. > > Yes. that's problem. And zoneid bit is very limited resource. > (....But memory reclaim logic can be unchanged.) > > Anyway, I'd like to see your solution with above changes 1st rather than adding zones. > > Thanks, > -Kame > > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by kanga.kvack.org (Postfix) with ESMTP id 9683B6B006C for ; Tue, 30 Jun 2015 05:42:00 -0400 (EDT) Received: by wicgi11 with SMTP id gi11so11023403wic.0 for ; Tue, 30 Jun 2015 02:42:00 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id bl10si18407070wib.9.2015.06.30.02.41.58 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 30 Jun 2015 02:41:59 -0700 (PDT) Date: Tue, 30 Jun 2015 10:41:50 +0100 From: Mel Gorman Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations Message-ID: <20150630094149.GA6812@suse.de> References: <558E084A.60900@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <558E084A.60900@huawei.com> Sender: owner-linux-mm@kvack.org List-ID: To: Xishi Qiu Cc: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML On Sat, Jun 27, 2015 at 10:19:54AM +0800, Xishi Qiu wrote: > Intel Xeon processor E7 v3 product family-based platforms introduces support > for partial memory mirroring called as 'Address Range Mirroring'. This feature > allows BIOS to specify a subset of total available memory to be mirrored (and > optionally also specify whether to mirror the range 0-4 GB). This capability > allows user to make an appropriate tradeoff between non-mirrored memory range > and mirrored memory range thus optimizing total available memory and still > achieving highly reliable memory range for mission critical workloads and/or > kernel space. > > Tony has already send a patchset to supprot this feature at boot time. > https://lkml.org/lkml/2015/5/8/521 > This patchset is based on Tony's, it can support the feature after boot time. > Use mirrored memory for all kernel allocations. > This is my first time glancing through the series so I'm not aware of any past discussion. Hopefully there are no repeats. Broadly speaking though I'm not comfortable with the series. First and foremost, there is uncontrolled access to the memory because it's any kernel request. 
This includes even short-lived ones that do not need mirroring such as network buffers or caches. Network traffic can be retried, caches can be reconstructed from disk etc. Kernel page tables, struct page corruption etc. are much harder to recover from.

Who are the expected users of this memory and how are they meant to be prioritised? What happens if they fail to be mirrored? What happens if the mirrored memory is all used up and a high priority request arrives? Is there any prioritisation of one subsystem over another? What about boot-memory allocations, should they ever use mirrored memory?

The expected users are important and this series does not address them. Callers do not specify the flag, you just assume that kernel allocations must be mirrored. If the allocation request fails, then you assume it was MIGRATE_RECLAIMABLE later in the series. This is wrong as it'll break fragmentation avoidance on machines with mirrored memory. Even if you were to use migrate types to handle mirrored memory, you need to treat mirrored memory as a type of reserve or else as a first preference for allocations requested.

The fact that this will be used by very few machines but affects the memory footprint of the page allocator is a general concern. When active, it affects the fast paths for all users whether they care about mirroring or not.

If all free memory is in MIGRATE_MIRROR then all user-space requests will be rejected but reclaim will not make any progress if the zone is balanced. The system may go prematurely OOM as no progress is made. Getting around this is tricky and affects a few fast paths.

Generally, the easiest approach would be zone-based but I recognise that it has problems of its own.

Basically, overall I feel this series is the wrong approach but not knowing who the users are makes it much harder to judge. I strongly suspect that if mirrored memory is to be properly used then it needs to be available before the page allocator is even active.
Once active, there needs to be controlled access for allocation requests that are really critical to mirror and not just all kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be alterations to the bootmem allocator and access to an explicit reserve that is not accounted for as "free memory" and accessed via an explicit GFP flag. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) by kanga.kvack.org (Postfix) with ESMTP id AB1306B0032 for ; Tue, 30 Jun 2015 06:47:02 -0400 (EDT) Received: by wiar9 with SMTP id r9so31774609wia.1 for ; Tue, 30 Jun 2015 03:47:02 -0700 (PDT) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com. [2a00:1450:400c:c05::22e]) by mx.google.com with ESMTPS id bn1si18668348wib.38.2015.06.30.03.47.00 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Jun 2015 03:47:01 -0700 (PDT) Received: by wiar9 with SMTP id r9so31773805wia.1 for ; Tue, 30 Jun 2015 03:47:00 -0700 (PDT) Date: Tue, 30 Jun 2015 12:46:54 +0200 From: Ingo Molnar Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations Message-ID: <20150630104654.GA24932@gmail.com> References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150630094149.GA6812@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML * Mel Gorman wrote: > [...] 
> > Basically, overall I feel this series is the wrong approach but not knowing who > the users are making is much harder to judge. I strongly suspect that if > mirrored memory is to be properly used then it needs to be available before the > page allocator is even active. Once active, there needs to be controlled access > for allocation requests that are really critical to mirror and not just all > kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be > alterations to the bootmem allocator and access to an explicit reserve that is > not accounted for as "free memory" and accessed via an explicit GFP flag. So I think the main goal is to avoid kernel crashes when a #MC memory fault arrives on a piece of memory that is owned by the kernel. In that sense 'protecting' all kernel allocations is natural: we don't know how to recover from faults that affect kernel memory. We do know how to recover from faults that affect user-space memory alone. So if a mechanism is in place that prioritizes 3 groups of allocators: - non-recoverable memory (kernel allocations mostly) - high priority user memory (critical apps that must never fail) - recoverable user memory (non-dirty caches that can simply be dropped, non-critical apps, etc.) then we can make use of this hardware feature. I suspect this series tries to move in that direction. Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by kanga.kvack.org (Postfix) with ESMTP id 47DBD6B0032 for ; Tue, 30 Jun 2015 07:54:04 -0400 (EDT) Received: by wiwl6 with SMTP id l6so129384109wiw.0 for ; Tue, 30 Jun 2015 04:54:03 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id ce9si18991821wib.4.2015.06.30.04.54.01 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 30 Jun 2015 04:54:02 -0700 (PDT) Date: Tue, 30 Jun 2015 12:53:53 +0100 From: Mel Gorman Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations Message-ID: <20150630115353.GB6812@suse.de> References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20150630104654.GA24932@gmail.com> Sender: owner-linux-mm@kvack.org List-ID: To: Ingo Molnar Cc: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote: > > * Mel Gorman wrote: > > > [...] > > > > Basically, overall I feel this series is the wrong approach but not knowing who > > the users are making is much harder to judge. I strongly suspect that if > > mirrored memory is to be properly used then it needs to be available before the > > page allocator is even active. Once active, there needs to be controlled access > > for allocation requests that are really critical to mirror and not just all > > kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be > > alterations to the bootmem allocator and access to an explicit reserve that is > > not accounted for as "free memory" and accessed via an explicit GFP flag. > > So I think the main goal is to avoid kernel crashes when a #MC memory fault > arrives on a piece of memory that is owned by the kernel. > Sounds logical. In that case, bootmem awareness would be crucial. Enabling support in just the page allocator is too late. 
> In that sense 'protecting' all kernel allocations is natural: we don't know how to
> recover from faults that affect kernel memory.
>

It potentially uses all mirrored memory on memory that does not need that sort of guarantee. For example, if there was a MC on memory backing the inode cache then potentially that is recoverable as long as the inodes were not dirty. That's a minor detail as the kernel could later protect only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal MCs in kernel space could be distinguished from non-fatal checks.

Bootmem awareness is much more important either way. If that was addressed then potentially a MIGRATE_UNMOVABLE_MIRROR type could be created that is only used for MIGRATE_UNMOVABLE allocations and never for user-space. That misses MIGRATE_RECLAIMABLE so if that is required then we need something else that both preserves fragmentation avoidance and avoids introducing loads of new migratetypes. Reclaim-related issues could be partially avoided by forbidding use from userspace and accounting for the size of MIGRATE_UNMOVABLE_MIRROR during watermark checks.

> We do know how to recover from faults that affect user-space memory alone.
>
> So if a mechanism is in place that prioritizes 3 groups of allocators:
>
> - non-recoverable memory (kernel allocations mostly)
>

So bootmem at the very least, followed by MIGRATE_UNMOVABLE requests whether they are accounted for by zones or MIGRATE_TYPES.

> - high priority user memory (critical apps that must never fail)
>

This one is problematic with a MIGRATE_TYPE-based approach such as the one in this series. If a high priority application requires memory and MIGRATE_MIRROR is full then some of it must be reclaimed. With a MIGRATE_TYPE approach, the kernel may reclaim a lot of unnecessary memory trying to free some MIGRATE_MIRROR memory with no guarantee of success. It'll look like unnecessary thrashing from userspace but will be difficult to diagnose as reclaim stats are per-zone based.
Dealing with this needs either a zone-based approach or a lot of surgery to reclaim (similar to what the node-based LRU series does actually when it skips pages when the caller requires lowmem pages). -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f45.google.com (mail-pa0-f45.google.com [209.85.220.45]) by kanga.kvack.org (Postfix) with ESMTP id 445BE6B0032 for ; Tue, 30 Jun 2015 14:12:42 -0400 (EDT) Received: by paceq1 with SMTP id eq1so9073014pac.3 for ; Tue, 30 Jun 2015 11:12:41 -0700 (PDT) Received: from mga11.intel.com (mga11.intel.com. [192.55.52.93]) by mx.google.com with ESMTP id hu9si71343445pdb.252.2015.06.30.11.12.40 for ; Tue, 30 Jun 2015 11:12:40 -0700 (PDT) From: "Luck, Tony" Subject: RE: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations Date: Tue, 30 Jun 2015 18:12:35 +0000 Message-ID: <3908561D78D1C84285E8C5FCA982C28F32AA1974@ORSMSX114.amr.corp.intel.com> References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com> <20150630115353.GB6812@suse.de> In-Reply-To: <20150630115353.GB6812@suse.de> Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman , Ingo Molnar Cc: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , Kamezawa Hiroyuki , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML > Sounds logical. In that case, bootmem awareness would be crucial. > Enabling support in just the page allocator is too late. 
Andrew already applied some patches from me that I think covered bootmem mirror allocations:

commit fc6daaf93151877748f8096af6b3fddb147f22d6
    mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
commit a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95
    mm/memblock: allocate boot time data structures from mirrored memory
commit b05b9f5f9dcf593a0e9327676b78e6c17b4218e8
    x86, mirror: x86 enabling - find mirrored memory ranges

If I missed something, please let me know.

>> In that sense 'protecting' all kernel allocations is natural: we don't know how to
>> recover from faults that affect kernel memory.
>>
>
> It potentially uses all mirrored memory on memory that does not need that
> sort of guarantee. For example, if there was a MC on memory backing the
> inode cache then potentially that is recoverable as long as the inodes
> were not dirty.

Right now this is hard to do. On Intel we get a broadcast machine check that may catch bystander cpus holding locks that we might need to look at kernel structures to make decisions on what we just lost. That may get easier with local machine check (only the logical cpu that tried to consume the corrupt data gets the machine check ... patches for Linux are in for basic support of this ... waiting for h/w that does it).

> That's a minor detail as the kernel could later protect
> only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
> MC in kernel space could be distinguished from non-fatal checks.

So the immediate use case is large memory servers (hundred+ Gbytes to TBytes) running some applications that use most of memory in user mode (like a database). We mirror enough memory to cover *all* the kernel allocations so that a bad memory access will be fixed from the mirror for kernel, or result in SIGBUS to a process for user page ... either way we don't crash the system.
Perhaps in the future we might find some places in the kernel where we can cover a lot of memory without too many code changes ... e.g. things like pagecopy(). At that time we'd have to think about allocation priorities. -Tony From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by kanga.kvack.org (Postfix) with ESMTP id 84ECA6B0253 for ; Mon, 13 Jul 2015 01:08:16 -0400 (EDT) Received: by pachj5 with SMTP id hj5so25470773pac.3 for ; Sun, 12 Jul 2015 22:08:16 -0700 (PDT) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com. [119.145.14.65]) by mx.google.com with ESMTPS id z1si26430403pda.165.2015.07.12.22.08.13 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Sun, 12 Jul 2015 22:08:15 -0700 (PDT) Message-ID: <55A3450E.6050707@huawei.com> Date: Mon, 13 Jul 2015 12:56:46 +0800 From: Xishi Qiu MIME-Version: 1.0 Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com> <20150630115353.GB6812@suse.de> In-Reply-To: <20150630115353.GB6812@suse.de> Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: Ingo Molnar , Andrew Morton , "H. Peter Anvin" , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML On 2015/6/30 19:53, Mel Gorman wrote: > On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote: >> >> * Mel Gorman wrote: >> >>> [...] >>> >>> Basically, overall I feel this series is the wrong approach but not knowing who >>> the users are making is much harder to judge.
I strongly suspect that if >>> mirrored memory is to be properly used then it needs to be available before the >>> page allocator is even active. Once active, there needs to be controlled access >>> for allocation requests that are really critical to mirror and not just all >>> kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be >>> alterations to the bootmem allocator and access to an explicit reserve that is >>> not accounted for as "free memory" and accessed via an explicit GFP flag. >> >> So I think the main goal is to avoid kernel crashes when a #MC memory fault >> arrives on a piece of memory that is owned by the kernel. >> > > Sounds logical. In that case, bootmem awareness would be crucial. > Enabling support in just the page allocator is too late. > >> In that sense 'protecting' all kernel allocations is natural: we don't know how to >> recover from faults that affect kernel memory. >> > > It potentially uses all mirrored memory on memory that does not need that > sort of guarantee. For example, if there was a MC on memory backing the > inode cache then potentially that is recoverable as long as the inodes > were not dirty. That's a minor detail as the kernel could later protect > only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal > MC in kernel space could be distinguished from non-fatal checks. > > Bootmem awareness is much more important either way. If that was addressed > then potentially a MIGRATE_UNMOVABLE_MIRROR type could be created that > is only used for MIGRATE_UNMOVABLE allocations and never for user-space. > That misses MIGRATE_RECLAIMABLE so if that is required then we need > something else that both preserves fragmentation avoidance and avoid > introducing loads of new migratetypes. > > Reclaim-related issues could be partially avoided by forbidding use from > userspace and accounting for the size of MIGRATE_UNMOVABLE_MIRROR during > watermark checks. 
>
>> We do know how to recover from faults that affect user-space memory alone.
>>
>> So if a mechanism is in place that prioritizes 3 groups of allocators:
>>
>> - non-recoverable memory (kernel allocations mostly)
>>
>
> So bootmem at the very least followed by MIGRATE_UNMOVABLE requests whether
> they are accounted for by zones of MIGRATE_TYPES.
>
>> - high priority user memory (critical apps that must never fail)
>>
>
> This one is problematic with a MIGRATE_TYPE-based approach such as the one in
> this series. If a high priority requires memory and MIGRATE_MIRROR is full
> then some of it must be reclaimed. With a MIGRATE_TYPE approach, the kernel
> may reclaim a lot of unnecessary memory trying to free some MIGRATE_MIRROR
> memory with no guarantee of success. It'll look like unnecessary thrashing
> from userspace but difficult to diagnose as reclaim stats are per-zone based.
> Dealing with this needs either a zone-based approach or a lot of surgery
> to reclaim (similar to what the node-based LRU series does actually when
> it skips pages when the caller requires lowmem pages).
>

Hi Mel,

Thank you for your comments. Sorry for the late reply; some of this is not entirely clear to me.

If fatal memory faults in kernel space could be distinguished from non-fatal ones, we could use only MIGRATE_UNMOVABLE_MIRROR; if they can't, we would use two types, one for MIGRATE_RECLAIMABLE and one for MIGRATE_UNMOVABLE, right?

The reclaim-related issues are similar to how CMA is handled in zone_watermark_ok(), right?

If we protect high priority user memory, using a new mirrored zone may be better, right?

How about using a flag (e.g. GFP_MIRROR) for kernel space allocations? Could we use it to sort kernel space allocations? It could also be requested from user space via madvise and mmap.

Thanks,
Xishi Qiu

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752544AbbF0CXx (ORCPT ); Fri, 26 Jun 2015 22:23:53 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:50846 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750993AbbF0CXp (ORCPT ); Fri, 26 Jun 2015 22:23:45 -0400 Message-ID: <558E0913.7020501@huawei.com> Date: Sat, 27 Jun 2015 10:23:15 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H.
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 1/8] mm: add a new config to manage the code References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020204.558E0923.0088,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 9f710278f29bff016574e6577133c379 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

This patch introduces a new config option, CONFIG_MEMORY_MIRROR, which is off by default.

Signed-off-by: Xishi Qiu
---
 mm/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 390214d..c40bb8b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
+config MEMORY_MIRROR
+	bool "Address range mirroring support"
+	depends on X86 && MEMORY_FAILURE
+	default n
+	help
+	  This feature depends on hardware and firmware support.
+	  ACPI or EFI records the mirror info.
+
 #
 # If we have space for more page flags then we can enable additional
 # optimizations and functionality.
-- 2.0.0
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753053AbbF0CYp (ORCPT ); Fri, 26 Jun 2015 22:24:45 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:53480 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751762AbbF0CYj (ORCPT ); Fri, 26 Jun 2015 22:24:39 -0400 Message-ID: <558E0948.2010104@huawei.com> Date: Sat, 27 Jun 2015 10:24:08 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

This patch introduces a new migratetype called "MIGRATE_MIRROR", which is used to allocate mirrored pages. Running cat /proc/pagetypeinfo shows the count of free mirrored blocks.
Signed-off-by: Xishi Qiu --- include/linux/mmzone.h | 9 +++++++++ mm/page_alloc.c | 3 +++ mm/vmstat.c | 3 +++ 3 files changed, 15 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 54d74f6..54e891a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -39,6 +39,9 @@ enum { MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, +#ifdef CONFIG_MEMORY_MIRROR + MIGRATE_MIRROR, +#endif MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ MIGRATE_RESERVE = MIGRATE_PCPTYPES, #ifdef CONFIG_CMA @@ -69,6 +72,12 @@ enum { # define is_migrate_cma(migratetype) false #endif +#ifdef CONFIG_MEMORY_MIRROR +# define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR) +#else +# define is_migrate_mirror(migratetype) false +#endif + #define for_each_migratetype_order(order, type) \ for (order = 0; order < MAX_ORDER; order++) \ for (type = 0; type < MIGRATE_TYPES; type++) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ebffa0e..6e4d79f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3216,6 +3216,9 @@ static void show_migration_types(unsigned char type) [MIGRATE_UNMOVABLE] = 'U', [MIGRATE_RECLAIMABLE] = 'E', [MIGRATE_MOVABLE] = 'M', +#ifdef CONFIG_MEMORY_MIRROR + [MIGRATE_MIRROR] = 'O', +#endif [MIGRATE_RESERVE] = 'R', #ifdef CONFIG_CMA [MIGRATE_CMA] = 'C', diff --git a/mm/vmstat.c b/mm/vmstat.c index 4f5cd97..d0323e0 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -901,6 +901,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = { "Unmovable", "Reclaimable", "Movable", +#ifdef CONFIG_MEMORY_MIRROR + "Mirror", +#endif "Reserve", #ifdef CONFIG_CMA "CMA", -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753065AbbF0CZV (ORCPT ); Fri, 26 Jun 2015 22:25:21 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:53774 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1751762AbbF0CZP (ORCPT ); Fri, 26 Jun 2015 22:25:15 -0400 Message-ID: <558E0974.6060206@huawei.com> Date: Sat, 27 Jun 2015 10:24:52 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 3/8] mm: find mirrored memory in memblock References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add a macro for_each_mirror_pfn_range() to find mirrored memory in memblock. This patch is based on Tony's patchset "Find mirrored memory, use for boot time allocations" Signed-off-by: Xishi Qiu --- include/linux/memblock.h | 25 ++++++++++++++++++++++--- mm/memblock.c | 6 +++++- 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 0215ffd..97f71ca 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -171,7 +171,8 @@ static inline bool memblock_is_mirror(struct memblock_region *m) #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn, unsigned long *end_pfn); -void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn, +void __next_mem_pfn_range(int *idx, int nid, ulong flags, + unsigned long *out_start_pfn, unsigned long *out_end_pfn, int *out_nid); /** @@ -185,8 +186,26 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn, * Walks over configured memory ranges. 
*/ #define for_each_mem_pfn_range(i, nid, p_start, p_end, p_nid) \ - for (i = -1, __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid); \ - i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid)) + for (i = -1, __next_mem_pfn_range(&i, nid, MEMBLOCK_NONE, \ + p_start, p_end, p_nid); \ + i >= 0; __next_mem_pfn_range(&i, nid, MEMBLOCK_NONE, \ + p_start, p_end, p_nid)) + +/** + * for_each_mirror_pfn_range - early mirrored memory pfn range iterator + * @i: an integer used as loop variable + * @nid: node selector, %MAX_NUMNODES for all nodes + * @p_start: ptr to ulong for start pfn of the range, can be %NULL + * @p_end: ptr to ulong for end pfn of the range, can be %NULL + * @p_nid: ptr to int for nid of the range, can be %NULL + * + * Walks over configured mirrored memory ranges. + */ +#define for_each_mirror_pfn_range(i, nid, p_start, p_end, p_nid) \ + for (i = -1, __next_mem_pfn_range(&i, nid, MEMBLOCK_MIRROR, \ + p_start, p_end, p_nid); \ + i >= 0; __next_mem_pfn_range(&i, nid, MEMBLOCK_MIRROR, \ + p_start, p_end, p_nid)) #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ /** diff --git a/mm/memblock.c b/mm/memblock.c index 1b444c7..7612876 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1040,7 +1040,7 @@ void __init_memblock __next_mem_range_rev(u64 *idx, int nid, ulong flags, /* * Common iterator interface used to define for_each_mem_range(). 
*/ -void __init_memblock __next_mem_pfn_range(int *idx, int nid, +void __init_memblock __next_mem_pfn_range(int *idx, int nid, ulong flags, unsigned long *out_start_pfn, unsigned long *out_end_pfn, int *out_nid) { @@ -1050,6 +1050,10 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid, while (++*idx < type->cnt) { r = &type->regions[*idx]; + /* if we want mirror memory skip non-mirror memory regions */ + if ((flags & MEMBLOCK_MIRROR) && !memblock_is_mirror(r)) + continue; + if (PFN_UP(r->base) >= PFN_DOWN(r->base + r->size)) continue; if (nid == MAX_NUMNODES || nid == r->nid) -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751764AbbF0C0J (ORCPT ); Fri, 26 Jun 2015 22:26:09 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:51880 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750993AbbF0C0B (ORCPT ); Fri, 26 Jun 2015 22:26:01 -0400 Message-ID: <558E09A1.2090102@huawei.com> Date: Sat, 27 Jun 2015 10:25:37 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 4/8] mm: add mirrored memory to buddy system References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020202.558E09AE.0064,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: ae2f36c1db268057da3e543895e19805 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Before freeing bootmem, set the migratetype of mirrored pageblocks to MIGRATE_MIRROR so that they are freed to the buddy system's MIGRATE_MIRROR list. When setting up the reserved memory, skip the mirrored memory.
Signed-off-by: Xishi Qiu --- include/linux/memblock.h | 3 +++ mm/memblock.c | 21 +++++++++++++++++++++ mm/nobootmem.c | 3 +++ mm/page_alloc.c | 3 +++ 4 files changed, 30 insertions(+) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 97f71ca..53be030 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -81,6 +81,9 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size); int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size); int memblock_mark_mirror(phys_addr_t base, phys_addr_t size); ulong choose_memblock_flags(void); +#ifdef CONFIG_MEMORY_MIRROR +void memblock_mark_migratemirror(void); +#endif /* Low level functions */ int memblock_add_range(struct memblock_type *type, diff --git a/mm/memblock.c b/mm/memblock.c index 7612876..0d0b210 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include @@ -818,6 +819,26 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); } +#ifdef CONFIG_MEMORY_MIRROR +void __init_memblock memblock_mark_migratemirror(void) +{ + unsigned long start_pfn, end_pfn, pfn; + int i, node; + struct page *page; + + printk(KERN_DEBUG "Mirrored memory:\n"); + for_each_mirror_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, + &node) { + printk(KERN_DEBUG " node %3d: [mem %#010llx-%#010llx]\n", + node, PFN_PHYS(start_pfn), PFN_PHYS(end_pfn) - 1); + for (pfn = start_pfn; pfn < end_pfn; + pfn += pageblock_nr_pages) { + page = pfn_to_page(pfn); + set_pageblock_migratetype(page, MIGRATE_MIRROR); + } + } +} +#endif /** * __next__mem_range - next function for for_each_free_mem_range() etc. 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c index 5258386..31aa6d4 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -129,6 +129,9 @@ static unsigned long __init free_low_memory_core_early(void) u64 i; memblock_clear_hotplug(0, -1); +#ifdef CONFIG_MEMORY_MIRROR + memblock_mark_migratemirror(); +#endif for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6e4d79f..aea78a5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4118,6 +4118,9 @@ static void setup_zone_migrate_reserve(struct zone *zone) block_migratetype = get_pageblock_migratetype(page); + if (is_migrate_mirror(block_migratetype)) + continue; + /* Only test what is necessary when the reserves are not met */ if (reserve > 0) { /* -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752293AbbF0C0n (ORCPT ); Fri, 26 Jun 2015 22:26:43 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:52130 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753538AbbF0C0f (ORCPT ); Fri, 26 Jun 2015 22:26:35 -0400 Message-ID: <558E09CA.7020909@huawei.com> Date: Sat, 27 Jun 2015 10:26:18 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 5/8] mm: introduce a new zone_stat_item NR_FREE_MIRROR_PAGES References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020203.558E09D5.0097,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: a643d2fcad1bf48410835b080a484d87 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch introduces a new zone_stat_item called "NR_FREE_MIRROR_PAGES", which is used to store the count of free mirrored pages. Signed-off-by: Xishi Qiu --- include/linux/mmzone.h | 1 + include/linux/vmstat.h | 2 ++ mm/vmstat.c | 1 + 3 files changed, 4 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 54e891a..7cc0a29 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -166,6 +166,7 @@ enum zone_stat_item { WORKINGSET_NODERECLAIM, NR_ANON_TRANSPARENT_HUGEPAGES, NR_FREE_CMA_PAGES, + NR_FREE_MIRROR_PAGES, NR_VM_ZONE_STAT_ITEMS }; /* diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 82e7db7..d0a7268 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -283,6 +283,8 @@ static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages, __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages); if (is_migrate_cma(migratetype)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages); + if (is_migrate_mirror(migratetype)) + __mod_zone_page_state(zone, NR_FREE_MIRROR_PAGES, nr_pages); } extern const char * const vmstat_text[]; diff --git a/mm/vmstat.c b/mm/vmstat.c index
d0323e0..7ee11ca 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -739,6 +739,7 @@ const char * const vmstat_text[] = { "workingset_nodereclaim", "nr_anon_transparent_hugepages", "nr_free_cma", + "nr_free_mirror", /* enum writeback_stat_item counters */ "nr_dirty_threshold", -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752933AbbF0Cag (ORCPT ); Fri, 26 Jun 2015 22:30:36 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:55568 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751788AbbF0Caa (ORCPT ); Fri, 26 Jun 2015 22:30:30 -0400 Message-ID: <558E09F4.70908@huawei.com> Date: Sat, 27 Jun 2015 10:27:00 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 6/8] mm: add free mirrored pages info References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add the count of free mirrored pages in the following paths: /proc/meminfo /proc/zoneinfo /sys/devices/system/node/node XX/meminfo /sys/devices/system/node/node XX/vmstat Signed-off-by: Xishi Qiu --- drivers/base/node.c | 17 +++++++++++------ fs/proc/meminfo.c | 6 ++++++ mm/page_alloc.c | 7 +++++-- 3 files changed, 22 insertions(+), 8 deletions(-) diff --git a/drivers/base/node.c b/drivers/base/node.c index a2aa65b..d1a3556 100644 --- a/drivers/base/node.c +++ b/drivers/base/node.c @@ -114,6 +114,9 @@ static ssize_t 
node_read_meminfo(struct device *dev, #ifdef CONFIG_TRANSPARENT_HUGEPAGE "Node %d AnonHugePages: %8lu kB\n" #endif +#ifdef CONFIG_MEMORY_MIRROR + "Node %d MirrorFree: %8lu kB\n" +#endif , nid, K(node_page_state(nid, NR_FILE_DIRTY)), nid, K(node_page_state(nid, NR_WRITEBACK)), @@ -130,14 +133,16 @@ static ssize_t node_read_meminfo(struct device *dev, nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) + node_page_state(nid, NR_SLAB_UNRECLAIMABLE)), nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)), -#ifdef CONFIG_TRANSPARENT_HUGEPAGE nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)) - , nid, - K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) * - HPAGE_PMD_NR)); -#else - nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))); +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + , nid, K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) * + HPAGE_PMD_NR) +#endif +#ifdef CONFIG_MEMORY_MIRROR + , nid, K(node_page_state(nid, NR_FREE_MIRROR_PAGES)) #endif + ); + n += hugetlb_report_node_meminfo(nid, buf + n); return n; } diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index d3ebf2e..d1ebb20 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -145,6 +145,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v) "CmaTotal: %8lu kB\n" "CmaFree: %8lu kB\n" #endif +#ifdef CONFIG_MEMORY_MIRROR + "MirrorFree: %8lu kB\n" +#endif , K(i.totalram), K(i.freeram), @@ -204,6 +207,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v) , K(totalcma_pages) , K(global_page_state(NR_FREE_CMA_PAGES)) #endif +#ifdef CONFIG_MEMORY_MIRROR + , K(global_page_state(NR_FREE_MIRROR_PAGES)) +#endif ); hugetlb_report_meminfo(m); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index aea78a5..4c5bc50 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3268,7 +3268,7 @@ void show_free_areas(unsigned int filter) " unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n" " slab_reclaimable:%lu slab_unreclaimable:%lu\n" " mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n" - " free:%lu 
free_pcp:%lu free_cma:%lu\n", + " free:%lu free_pcp:%lu free_cma:%lu free_mirror:%lu\n", global_page_state(NR_ACTIVE_ANON), global_page_state(NR_INACTIVE_ANON), global_page_state(NR_ISOLATED_ANON), @@ -3287,7 +3287,8 @@ void show_free_areas(unsigned int filter) global_page_state(NR_BOUNCE), global_page_state(NR_FREE_PAGES), free_pcp, - global_page_state(NR_FREE_CMA_PAGES)); + global_page_state(NR_FREE_CMA_PAGES), + global_page_state(NR_FREE_MIRROR_PAGES)); for_each_populated_zone(zone) { int i; @@ -3328,6 +3329,7 @@ void show_free_areas(unsigned int filter) " free_pcp:%lukB" " local_pcp:%ukB" " free_cma:%lukB" + " free_mirror:%lukB" " writeback_tmp:%lukB" " pages_scanned:%lu" " all_unreclaimable? %s" @@ -3361,6 +3363,7 @@ void show_free_areas(unsigned int filter) K(free_pcp), K(this_cpu_read(zone->pageset->pcp.count)), K(zone_page_state(zone, NR_FREE_CMA_PAGES)), + K(zone_page_state(zone, NR_FREE_MIRROR_PAGES)), K(zone_page_state(zone, NR_WRITEBACK_TEMP)), K(zone_page_state(zone, NR_PAGES_SCANNED)), (!zone_reclaimable(zone) ? "yes" : "no") -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752997AbbF0Cbf (ORCPT ); Fri, 26 Jun 2015 22:31:35 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:33649 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752183AbbF0Cb1 (ORCPT ); Fri, 26 Jun 2015 22:31:27 -0400 Message-ID: <558E0A28.6060607@huawei.com> Date: Sat, 27 Jun 2015 10:27:52 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add the buddy system interface for address range mirroring feature. Use mirrored memory for all kernel allocations. If there is no mirrored pages left, try to use other types pages. Signed-off-by: Xishi Qiu --- include/linux/memblock.h | 1 + mm/memblock.c | 6 +++--- mm/page_alloc.c | 19 +++++++++++++++++++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 53be030..8c33ac0 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -81,6 +81,7 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size); int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size); int memblock_mark_mirror(phys_addr_t base, phys_addr_t size); ulong choose_memblock_flags(void); +extern struct static_key system_has_mirror; #ifdef CONFIG_MEMORY_MIRROR void memblock_mark_migratemirror(void); #endif diff --git a/mm/memblock.c b/mm/memblock.c index 0d0b210..430ad87 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -55,14 +55,14 @@ int memblock_debug __initdata_memblock; #ifdef CONFIG_MOVABLE_NODE bool movable_node_enabled __initdata_memblock = false; #endif -static bool system_has_some_mirror __initdata_memblock = false; +struct static_key system_has_mirror = STATIC_KEY_INIT; static int memblock_can_resize __initdata_memblock; static int memblock_memory_in_slab __initdata_memblock = 0; static int memblock_reserved_in_slab __initdata_memblock = 0; 
ulong __init_memblock choose_memblock_flags(void) { - return system_has_some_mirror ? MEMBLOCK_MIRROR : MEMBLOCK_NONE; + return static_key_false(&system_has_mirror) ? MEMBLOCK_MIRROR : MEMBLOCK_NONE; } /* inline so we don't get a warning when pr_debug is compiled out */ @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) */ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) { - system_has_some_mirror = true; + static_key_slow_inc(&system_has_mirror); return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4c5bc50..8a6125e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1033,6 +1033,9 @@ static int fallbacks[MIGRATE_TYPES][4] = { [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, +#ifdef CONFIG_MEMORY_MIRROR + [MIGRATE_MIRROR] = { MIGRATE_RESERVE }, /* Never used */ +#endif #ifdef CONFIG_CMA [MIGRATE_CMA] = { MIGRATE_RESERVE }, /* Never used */ #endif @@ -1295,6 +1298,15 @@ retry_reserve: page = __rmqueue_smallest(zone, order, migratetype); if (unlikely(!page) && migratetype != MIGRATE_RESERVE) { + /* + * If there is no mirrored memory left, alloc other types + * memory. 
But we should not change the pageblock's + * migratetype between mirror and others, so just use + * MIGRATE_RECLAIMABLE to retry + */ + if (is_migrate_mirror(migratetype)) + return __rmqueue(zone, order, MIGRATE_RECLAIMABLE); + if (migratetype == MIGRATE_MOVABLE) page = __rmqueue_cma_fallback(zone, order); @@ -2872,6 +2884,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (IS_ENABLED(CONFIG_CMA) && ac.migratetype == MIGRATE_MOVABLE) alloc_flags |= ALLOC_CMA; +#ifdef CONFIG_MEMORY_MIRROR + /* Alloc mirrored memory for kernel */ + if (static_key_false(&system_has_mirror) + && !(gfp_mask & __GFP_MOVABLE)) + ac.migratetype = MIGRATE_MIRROR; +#endif + retry_cpuset: cpuset_mems_cookie = read_mems_allowed_begin(); -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755237AbbF0C3D (ORCPT ); Fri, 26 Jun 2015 22:29:03 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:52882 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752997AbbF0C2y (ORCPT ); Fri, 26 Jun 2015 22:28:54 -0400 Message-ID: <558E0A51.1040807@huawei.com> Date: Sat, 27 Jun 2015 10:28:33 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Xishi Qiu , Linux MM , LKML Subject: [RFC v2 PATCH 8/8] mm: add the PCP interface References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020201.558E0A5D.0040,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 6a5d4aefa04965c47efb4b21990d29d3 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Abstract the PCP code in __rmqueue_pcp(), and do not call fallback in rmqueue_bulk() when the migratetype is mirror. Signed-off-by: Xishi Qiu --- mm/page_alloc.c | 85 +++++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 61 insertions(+), 24 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8a6125e..bb44463 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1337,11 +1337,20 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype, bool cold) { - int i; + int i, mt; + struct page *page; spin_lock(&zone->lock); for (i = 0; i < count; ++i) { - struct page *page = __rmqueue(zone, order, migratetype); + /* + * If there is no mirrored memory left, just keep the list + * empty, because we can not mix other types pages into the + * mirror list. 
+ */ + if (is_migrate_mirror(migratetype)) + page = __rmqueue_smallest(zone, order, migratetype); + else + page = __rmqueue(zone, order, migratetype); if (unlikely(page == NULL)) break; @@ -1359,15 +1368,61 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, else list_add_tail(&page->lru, list); list = &page->lru; - if (is_migrate_cma(get_freepage_migratetype(page))) + + mt = get_freepage_migratetype(page); + if (is_migrate_cma(mt)) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); + if (is_migrate_mirror(mt)) + __mod_zone_page_state(zone, NR_FREE_MIRROR_PAGES, + -(1 << order)); } __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); spin_unlock(&zone->lock); return i; } +static struct page *__rmqueue_pcp(struct zone *zone, unsigned int order, + gfp_t gfp_flags, int migratetype) +{ + struct page *page; + struct per_cpu_pages *pcp; + struct list_head *list; + bool cold; + + cold = ((gfp_flags & __GFP_COLD) != 0); + pcp = &this_cpu_ptr(zone->pageset)->pcp; + +retry: + list = &pcp->lists[migratetype]; + if (list_empty(list)) { + pcp->count += rmqueue_bulk(zone, 0, + pcp->batch, list, + migratetype, cold); + if (unlikely(list_empty(list))) { + /* + * If there is no mirrored memory left, alloc other + * types PCP, use MIGRATE_RECLAIMABLE to retry + */ + if (is_migrate_mirror(migratetype)) { + migratetype = MIGRATE_RECLAIMABLE; + goto retry; + } else + return NULL; + } + } + + if (cold) + page = list_entry(list->prev, struct page, lru); + else + page = list_entry(list->next, struct page, lru); + + list_del(&page->lru); + pcp->count--; + + return page; +} + #ifdef CONFIG_NUMA /* * Called from the vmstat counter updater to drain pagesets of this @@ -1713,30 +1768,12 @@ struct page *buffered_rmqueue(struct zone *preferred_zone, { unsigned long flags; struct page *page; - bool cold = ((gfp_flags & __GFP_COLD) != 0); if (likely(order == 0)) { - struct per_cpu_pages *pcp; - struct list_head *list; - local_irq_save(flags); - pcp = 
&this_cpu_ptr(zone->pageset)->pcp; - list = &pcp->lists[migratetype]; - if (list_empty(list)) { - pcp->count += rmqueue_bulk(zone, 0, - pcp->batch, list, - migratetype, cold); - if (unlikely(list_empty(list))) - goto failed; - } - - if (cold) - page = list_entry(list->prev, struct page, lru); - else - page = list_entry(list->next, struct page, lru); - - list_del(&page->lru); - pcp->count--; + page = __rmqueue_pcp(zone, order, gfp_flags, migratetype); + if (!page) + goto failed; } else { if (unlikely(gfp_flags & __GFP_NOFAIL)) { /* -- 2.0.0 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752434AbbF2GxK (ORCPT ); Mon, 29 Jun 2015 02:53:10 -0400 Received: from mgwkm04.jp.fujitsu.com ([202.219.69.171]:31385 "EHLO mgwkm04.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751108AbbF2GxD (ORCPT ); Mon, 29 Jun 2015 02:53:03 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: 874258868317421bab268ae9914ec0bc Message-ID: <5590EAA9.5090104@jp.fujitsu.com> Date: Mon, 29 Jun 2015 15:50:17 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Linux MM , LKML Subject: Re: [RFC v2 PATCH 1/8] mm: add a new config to manage the code References: <558E084A.60900@huawei.com> <558E0913.7020501@huawei.com> In-Reply-To: <558E0913.7020501@huawei.com> Content-Type: multipart/mixed; boundary="------------030208070301040603070806" X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. 
--------------030208070301040603070806 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit On 2015/06/27 11:23, Xishi Qiu wrote: > This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", set it CONFIG_MEMORY_MIRROR > off by default. > > Signed-off-by: Xishi Qiu > --- > mm/Kconfig | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/Kconfig b/mm/Kconfig > index 390214d..c40bb8b 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE > depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE > depends on MIGRATION > > +config MEMORY_MIRROR In the following patches, you use CONFIG_MEMORY_MIRROR. I think the name is too generic; besides, it depends on ACPI. But I'm not sure whether address-based memory mirroring is planned on other platforms. So, hmm. How about dividing the config into 2 parts, like the attached patch? (just an example) Thanks, -Kame --------------030208070301040603070806 Content-Type: text/plain; charset=Shift_JIS; name="0001-add-a-new-config-option-for-memory-mirror.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-add-a-new-config-option-for-memory-mirror.patch" >>From 88213b0f76e2f603c5a38690cbd85a4df1e646ba Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Mon, 29 Jun 2015 15:35:47 +0900 Subject: [PATCH] add a new config option for memory mirror Add a new config option "CONFIG_MEMORY_MIRROR" for kernel assisted memory mirroring. The UEFI 2.5 spec defines address-based memory mirroring, which allows the system to create a partial memory mirror. This feature protects important (kernel) memory by placing it in the address-based mirrored range. For now this depends on the CPU architecture: Haswell? Broadwell?
--- arch/x86/Kconfig | 6 ++++++ mm/Kconfig | 9 +++++++++ 2 files changed, 15 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e33e01b..56f17df 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -596,6 +596,12 @@ config X86_SUPPORTS_MEMORY_FAILURE depends on X86_64 || !SPARSEMEM select ARCH_SUPPORTS_MEMORY_FAILURE +config X86_SUPPORTS_MEMORY_MIRROR + def_bool y + # UEFI 2.5spec. address based memory mirror, supported only after XXX + depends on X86_64 && ARCH_SUPPORTS_MEMORY_FAILURE + select ARCH_MEMORY_MIRROR + config STA2X11 bool "STA2X11 Companion Chip Support" depends on X86_32_NON_STANDARD && PCI diff --git a/mm/Kconfig b/mm/Kconfig index b3a60ee..e14dc2d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -200,6 +200,15 @@ config MEMORY_HOTREMOVE depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE depends on MIGRATION +config MEMORY_MIRROR + bool "Address range mirroring support" + depends on ARCH_MEMORY_MIRROR + default n + help + This feature allows the kernel to assist address based memory + mirror supported by architecture/firmware. And place some types + of memory (especially, kernel memory) placed into mirrored range. + # # If we have space for more page flags then we can enable additional # optimizations and functionality. 
-- 1.9.3 --------------030208070301040603070806-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752597AbbF2Hem (ORCPT ); Mon, 29 Jun 2015 03:34:42 -0400 Received: from mgwym03.jp.fujitsu.com ([211.128.242.42]:45519 "EHLO mgwym03.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751833AbbF2Hed (ORCPT ); Mon, 29 Jun 2015 03:34:33 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: 001582a71d084b0d8fd6d4f4c758937e Message-ID: <5590F4A7.4030606@jp.fujitsu.com> Date: Mon, 29 Jun 2015 16:32:55 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Linux MM , LKML Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> In-Reply-To: <558E0948.2010104@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/06/27 11:24, Xishi Qiu wrote: > This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to > allocate mirrored pages. > When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. > > Signed-off-by: Xishi Qiu My fear about this approach is that it may break something existing. When we add the MIGRATE_MIRROR type, we'll hide the pageblock attributes MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and MIGRATE_MOVABLE.
Logically, the MIRROR attribute is independent of page mobility, and this overwrite will lose some information. Then, > --- > include/linux/mmzone.h | 9 +++++++++ > mm/page_alloc.c | 3 +++ > mm/vmstat.c | 3 +++ > 3 files changed, 15 insertions(+) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 54d74f6..54e891a 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -39,6 +39,9 @@ enum { > MIGRATE_UNMOVABLE, > MIGRATE_RECLAIMABLE, > MIGRATE_MOVABLE, > +#ifdef CONFIG_MEMORY_MIRROR > + MIGRATE_MIRROR, > +#endif I think MIGRATE_MIRROR_UNMOVABLE, MIGRATE_MIRROR_RECLAIMABLE, MIGRATE_MIRROR_MOVABLE, <== adding this may need discussion. MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. should be added with the following fallback list. /* * MIRROR page range is defined by firmware at boot. The range is limited * and is used only for kernel memory mirroring. */ [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE} [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE} Then, we'll not lose the original information of "Reclaimable Pages". One problem here is whether we should have MIGRATE_RESERVE_MIRROR. If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR. But it seems to require much more code change to do that. Creating a zone or adding an attribute to zones is another design choice. Anyway, your patch doesn't take care of the reserved memory calculation at this point. Please check setup_zone_migrate_reserve(). That will be a problem.
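The fallback list above can be pictured with a small user-space model. This is only a sketch, not kernel code: the enum layout, `MAX_FALLBACKS`, and `pick_fallback()` are assumptions made for illustration, and `MIGRATE_RESERVE` is assumed to terminate every fallback row.

```c
#include <assert.h>

/* Hypothetical migratetypes; the *_MIRROR names follow the list above. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_UNMOVABLE_MIRROR,
	MIGRATE_RECLAIMABLE_MIRROR,
	MIGRATE_RESERVE,
	MIGRATE_TYPES
};

#define MAX_FALLBACKS 4	/* sketch-only row width */

/* Every row ends with MIGRATE_RESERVE, which acts as the terminator. */
static const int fallbacks[MIGRATE_TYPES][MAX_FALLBACKS] = {
	[MIGRATE_UNMOVABLE]          = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE]        = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE },
	[MIGRATE_MOVABLE]            = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
	[MIGRATE_UNMOVABLE_MIRROR]   = { MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE },
	[MIGRATE_RECLAIMABLE_MIRROR] = { MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE },
	[MIGRATE_RESERVE]            = { MIGRATE_RESERVE },
};

/*
 * Return the first fallback type for 'start' that still has free pages,
 * or MIGRATE_RESERVE when the row is exhausted.
 */
static int pick_fallback(int start, const unsigned long nr_free[MIGRATE_TYPES])
{
	assert(start >= 0 && start < MIGRATE_TYPES);

	for (int i = 0; i < MAX_FALLBACKS; i++) {
		int mt = fallbacks[start][i];

		if (mt == MIGRATE_RESERVE)
			break;
		if (nr_free[mt] > 0)
			return mt;
	}
	return MIGRATE_RESERVE;
}
```

In this model a MIGRATE_UNMOVABLE_MIRROR request can only fall back to MIGRATE_RECLAIMABLE_MIRROR, never to an unmirrored list, so the mobility information and the mirror attribute stay separate.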
Thanks, -Kame > MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ > MIGRATE_RESERVE = MIGRATE_PCPTYPES, > #ifdef CONFIG_CMA > @@ -69,6 +72,12 @@ enum { > # define is_migrate_cma(migratetype) false > #endif > > +#ifdef CONFIG_MEMORY_MIRROR > +# define is_migrate_mirror(migratetype) unlikely((migratetype) == MIGRATE_MIRROR) > +#else > +# define is_migrate_mirror(migratetype) false > +#endif > + > #define for_each_migratetype_order(order, type) \ > for (order = 0; order < MAX_ORDER; order++) \ > for (type = 0; type < MIGRATE_TYPES; type++) > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index ebffa0e..6e4d79f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3216,6 +3216,9 @@ static void show_migration_types(unsigned char type) > [MIGRATE_UNMOVABLE] = 'U', > [MIGRATE_RECLAIMABLE] = 'E', > [MIGRATE_MOVABLE] = 'M', > +#ifdef CONFIG_MEMORY_MIRROR > + [MIGRATE_MIRROR] = 'O', > +#endif > [MIGRATE_RESERVE] = 'R', > #ifdef CONFIG_CMA > [MIGRATE_CMA] = 'C', > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 4f5cd97..d0323e0 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -901,6 +901,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = { > "Unmovable", > "Reclaimable", > "Movable", > +#ifdef CONFIG_MEMORY_MIRROR > + "Mirror", > +#endif > "Reserve", > #ifdef CONFIG_CMA > "CMA", > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752134AbbF2Hki (ORCPT ); Mon, 29 Jun 2015 03:40:38 -0400 Received: from mgwkm04.jp.fujitsu.com ([202.219.69.171]:45339 "EHLO mgwkm04.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751034AbbF2Hkb (ORCPT ); Mon, 29 Jun 2015 03:40:31 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: 3913398d0a954711bd599f5eb7ecfb78 Message-ID: <5590F648.2080808@jp.fujitsu.com> Date: Mon, 29 Jun 2015 16:39:52 +0900 From: Kamezawa Hiroyuki 
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Linux MM , LKML Subject: Re: [RFC v2 PATCH 4/8] mm: add mirrored memory to buddy system References: <558E084A.60900@huawei.com> <558E09A1.2090102@huawei.com> In-Reply-To: <558E09A1.2090102@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/06/27 11:25, Xishi Qiu wrote: > Before free bootmem, set mirrored pageblock's migratetype to MIGRATE_MIRROR, so > they could free to buddy system's MIGRATE_MIRROR list. > When set reserved memory, skip the mirrored memory. > > Signed-off-by: Xishi Qiu > --- > include/linux/memblock.h | 3 +++ > mm/memblock.c | 21 +++++++++++++++++++++ > mm/nobootmem.c | 3 +++ > mm/page_alloc.c | 3 +++ > 4 files changed, 30 insertions(+) > > diff --git a/include/linux/memblock.h b/include/linux/memblock.h > index 97f71ca..53be030 100644 > --- a/include/linux/memblock.h > +++ b/include/linux/memblock.h > @@ -81,6 +81,9 @@ int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size); > int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size); > int memblock_mark_mirror(phys_addr_t base, phys_addr_t size); > ulong choose_memblock_flags(void); > +#ifdef CONFIG_MEMORY_MIRROR > +void memblock_mark_migratemirror(void); > +#endif > > /* Low level functions */ > int memblock_add_range(struct memblock_type *type, > diff --git a/mm/memblock.c b/mm/memblock.c > index 7612876..0d0b210 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -19,6 +19,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -818,6 +819,26 @@ int __init_memblock 
memblock_mark_mirror(phys_addr_t base, phys_addr_t size) > return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); > } > > +#ifdef CONFIG_MEMORY_MIRROR > +void __init_memblock memblock_mark_migratemirror(void) > +{ > + unsigned long start_pfn, end_pfn, pfn; > + int i, node; > + struct page *page; > + > + printk(KERN_DEBUG "Mirrored memory:\n"); > + for_each_mirror_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, > + &node) { > + printk(KERN_DEBUG " node %3d: [mem %#010llx-%#010llx]\n", > + node, PFN_PHYS(start_pfn), PFN_PHYS(end_pfn) - 1); > + for (pfn = start_pfn; pfn < end_pfn; > + pfn += pageblock_nr_pages) { > + page = pfn_to_page(pfn); > + set_pageblock_migratetype(page, MIGRATE_MIRROR); > + } > + } > +} > +#endif > > /** > * __next__mem_range - next function for for_each_free_mem_range() etc. > diff --git a/mm/nobootmem.c b/mm/nobootmem.c > index 5258386..31aa6d4 100644 > --- a/mm/nobootmem.c > +++ b/mm/nobootmem.c > @@ -129,6 +129,9 @@ static unsigned long __init free_low_memory_core_early(void) > u64 i; > > memblock_clear_hotplug(0, -1); > +#ifdef CONFIG_MEMORY_MIRROR > + memblock_mark_migratemirror(); > +#endif > > for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, > NULL) > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 6e4d79f..aea78a5 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -4118,6 +4118,9 @@ static void setup_zone_migrate_reserve(struct zone *zone) > > block_migratetype = get_pageblock_migratetype(page); > > + if (is_migrate_mirror(block_migratetype)) > + continue; > + If the mirrored area has no reserved memory, this will break the page allocator's logic. I think both the mirrored and unmirrored ranges should have a reserved area.
Thanks, -Kame > /* Only test what is necessary when the reserves are not met */ > if (reserve > 0) { > /* > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753068AbbF2PTN (ORCPT ); Mon, 29 Jun 2015 11:19:13 -0400 Received: from mga11.intel.com ([192.55.52.93]:9159 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753026AbbF2PTM (ORCPT ); Mon, 29 Jun 2015 11:19:12 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,699,1427785200"; d="scan'208";a="719666141" Message-ID: <559161EF.7050405@intel.com> Date: Mon, 29 Jun 2015 08:19:11 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Linux MM , LKML Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> In-Reply-To: <558E084A.60900@huawei.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/26/2015 07:19 PM, Xishi Qiu wrote: > drivers/base/node.c | 17 ++++--- > fs/proc/meminfo.c | 6 +++ > include/linux/memblock.h | 29 ++++++++++-- > include/linux/mmzone.h | 10 ++++ > include/linux/vmstat.h | 2 + > mm/Kconfig | 8 ++++ > mm/memblock.c | 33 +++++++++++-- > mm/nobootmem.c | 3 ++ > mm/page_alloc.c | 117 ++++++++++++++++++++++++++++++++++++----------- > mm/vmstat.c | 4 ++ > 10 files changed, 190 insertions(+), 39 deletions(-) Has there been any performance analysis done on this code? I'm always nervous when I see page_alloc.c churn. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753191AbbF2XLh (ORCPT ); Mon, 29 Jun 2015 19:11:37 -0400 Received: from mga11.intel.com ([192.55.52.93]:35415 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751080AbbF2XLb convert rfc822-to-8bit (ORCPT ); Mon, 29 Jun 2015 19:11:31 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.15,373,1432623600"; d="scan'208";a="752829923" From: "Luck, Tony" To: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , Kamezawa Hiroyuki , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Linux MM , LKML Subject: RE: [RFC v2 PATCH 7/8] mm: add the buddy system interface Thread-Topic: [RFC v2 PATCH 7/8] mm: add the buddy system interface Thread-Index: AQHQsIDtglWikYYiAUujGjxd5awj4J3EH3qg Date: Mon, 29 Jun 2015 23:11:30 +0000 Message-ID: <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> In-Reply-To: <558E0A28.6060607@huawei.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.22.254.139] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) > */ > int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) > { > - system_has_some_mirror = true; > + static_key_slow_inc(&system_has_mirror); > > return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); > } This generates some WARN_ON noise when called from efi_find_mirror(): [ 0.000000] e820: last_pfn = 0x7b800 max_arch_pfn = 0x400000000 [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 
0 PID: 0 at kernel/jump_label.c:61 static_key_slow_inc+0x57/0xc0() [ 0.000000] static_key_slow_inc used before call to jump_label_init [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0 #4 [ 0.000000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015 [ 0.000000] 0000000000000000 ee366a8dff38f745 ffffffff81997d68 ffffffff816683b4 [ 0.000000] 0000000000000000 ffffffff81997dc0 ffffffff81997da8 ffffffff8107b0aa [ 0.000000] ffffffff81d48822 ffffffff81f281a0 0000000040000000 0000001fcb7a4000 [ 0.000000] Call Trace: [ 0.000000] [] dump_stack+0x45/0x57 [ 0.000000] [] warn_slowpath_common+0x8a/0xc0 [ 0.000000] [] warn_slowpath_fmt+0x55/0x70 [ 0.000000] [] ? memblock_add_range+0x175/0x19e [ 0.000000] [] static_key_slow_inc+0x57/0xc0 [ 0.000000] [] memblock_mark_mirror+0x19/0x33 [ 0.000000] [] efi_find_mirror+0x59/0xdd [ 0.000000] [] setup_arch+0x642/0xccf [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] ? printk+0x55/0x6b [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] start_kernel+0xe8/0x4eb [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 [ 0.000000] [] ? 
early_idt_handler_array+0x120/0x120 [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [] x86_64_start_kernel+0x14c/0x16f [ 0.000000] ---[ end trace baa7fa0514e3bc58 ]--- [ 0.000000] ------------[ cut here ]------------ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753560AbbF3BBe (ORCPT ); Mon, 29 Jun 2015 21:01:34 -0400 Received: from mgwkm04.jp.fujitsu.com ([202.219.69.171]:29596 "EHLO mgwkm04.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752080AbbF3BB1 (ORCPT ); Mon, 29 Jun 2015 21:01:27 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: d1fb647ca5574787bf69fa96827e3b10 Message-ID: <5591EA50.1000000@jp.fujitsu.com> Date: Tue, 30 Jun 2015 10:01:04 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: "Luck, Tony" , Xishi Qiu , Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman CC: Linux MM , LKML Subject: Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/06/30 8:11, Luck, Tony wrote: >> @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) >> */ >> int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) >> { >> - system_has_some_mirror = true; >> + static_key_slow_inc(&system_has_mirror); >> >> return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); >> } > > This generates some WARN_ON noise when called from efi_find_mirror(): > It seems jump_label_init() is called after memory initialization. (init/main.c::start_kernel()) So, it may be difficult to use the static_key functions for our purpose because kernel memory allocation may occur before jump_label is ready.
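The ordering constraint described here can be modelled in user space. The sketch below is an illustration only and does not reproduce the kernel's jump-label code: every name in it (`jump_label_ready`, `key_count`, `early_mirror_seen`, and the `model_*` helpers) is hypothetical. It assumes one possible workaround, where early boot code records the fact in a plain variable and the key is only touched once jump labels are initialized.

```c
#include <assert.h>
#include <stdbool.h>

static bool jump_label_ready;   /* models: jump_label_init() has run */
static int  key_count;          /* models: a static key's enable count */
static bool early_mirror_seen;  /* plain flag, usable at any boot stage */

/*
 * Models the check behind the warning in the trace: a key update before
 * jump-label initialization is rejected. (The kernel emits a WARN at this
 * point; this model simply refuses the update.)
 */
static bool model_key_inc(void)
{
	if (!jump_label_ready)
		return false;
	key_count++;
	return true;
}

/* Early boot path: only record the fact in a plain variable. */
static void model_mark_mirror_early(void)
{
	early_mirror_seen = true;
}

/* Later boot path: mark jump labels ready, then promote the early flag. */
static bool model_late_init(void)
{
	assert(!jump_label_ready);	/* late init is expected to run once */
	jump_label_ready = true;
	return early_mirror_seen ? model_key_inc() : false;
}
```

Calling `model_key_inc()` before `model_late_init()` fails in this model, while recording the flag early and promoting it later succeeds. Any allocation made in between would still see the key as disabled, which is the concern raised above.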
Thanks, -Kame > [ 0.000000] e820: last_pfn = 0x7b800 max_arch_pfn = 0x400000000 > [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:61 static_key_slow_inc+0x57/0xc0() > [ 0.000000] static_key_slow_inc used before call to jump_label_init > [ 0.000000] Modules linked in: > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0 #4 > [ 0.000000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015 > [ 0.000000] 0000000000000000 ee366a8dff38f745 ffffffff81997d68 ffffffff816683b4 > [ 0.000000] 0000000000000000 ffffffff81997dc0 ffffffff81997da8 ffffffff8107b0aa > [ 0.000000] ffffffff81d48822 ffffffff81f281a0 0000000040000000 0000001fcb7a4000 > [ 0.000000] Call Trace: > [ 0.000000] [] dump_stack+0x45/0x57 > [ 0.000000] [] warn_slowpath_common+0x8a/0xc0 > [ 0.000000] [] warn_slowpath_fmt+0x55/0x70 > [ 0.000000] [] ? memblock_add_range+0x175/0x19e > [ 0.000000] [] static_key_slow_inc+0x57/0xc0 > [ 0.000000] [] memblock_mark_mirror+0x19/0x33 > [ 0.000000] [] efi_find_mirror+0x59/0xdd > [ 0.000000] [] setup_arch+0x642/0xccf > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] ? printk+0x55/0x6b > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] start_kernel+0xe8/0x4eb > [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 > [ 0.000000] [] ? 
early_idt_handler_array+0x120/0x120 > [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c > [ 0.000000] [] x86_64_start_kernel+0x14c/0x16f > [ 0.000000] ---[ end trace baa7fa0514e3bc58 ]--- > [ 0.000000] ------------[ cut here ]------------ > > > > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753183AbbF3BaN (ORCPT ); Mon, 29 Jun 2015 21:30:13 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:18967 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752718AbbF3BaJ (ORCPT ); Mon, 29 Jun 2015 21:30:09 -0400 Message-ID: <5591F042.1020304@huawei.com> Date: Tue, 30 Jun 2015 09:26:26 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Dave Hansen CC: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <559161EF.7050405@intel.com> In-Reply-To: <559161EF.7050405@intel.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020204.5591F056.0068,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: ca4a8c30d9d8a4a1f052d909a055ac20 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/6/29 23:19, Dave Hansen wrote: > On 06/26/2015 07:19 PM, Xishi Qiu wrote: >> drivers/base/node.c | 17 ++++--- >> fs/proc/meminfo.c | 6 +++ >> include/linux/memblock.h | 29 ++++++++++-- >> include/linux/mmzone.h | 10 ++++ >> include/linux/vmstat.h | 2 + >> mm/Kconfig | 
8 ++++ >> mm/memblock.c | 33 +++++++++++-- >> mm/nobootmem.c | 3 ++ >> mm/page_alloc.c | 117 ++++++++++++++++++++++++++++++++++++----------- >> mm/vmstat.c | 4 ++ >> 10 files changed, 190 insertions(+), 39 deletions(-) > > Has there been any performance analysis done on this code? I'm always > nervous when I see page_alloc.c churn. > Not yet, which benchmark do you suggest? Thanks, Xishi Qiu > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753845AbbF3BeQ (ORCPT ); Mon, 29 Jun 2015 21:34:16 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:39058 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752787AbbF3BeK (ORCPT ); Mon, 29 Jun 2015 21:34:10 -0400 Message-ID: <5591F18E.3060504@huawei.com> Date: Tue, 30 Jun 2015 09:31:58 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Kamezawa Hiroyuki CC: "Luck, Tony" , Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> <5591EA50.1000000@jp.fujitsu.com> In-Reply-To: <5591EA50.1000000@jp.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/6/30 9:01, Kamezawa Hiroyuki wrote: > On 2015/06/30 8:11, Luck, Tony wrote: >>> @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) >>> */ >>> int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) >>> { >>> - system_has_some_mirror = true; >>> + static_key_slow_inc(&system_has_mirror); >>> >>> return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); >>> } >> >> This generates some WARN_ON noise when called from efi_find_mirror(): >> > > It seems jump_label_init() is called after memory initialization. (init/main.c::start_kernel()) > So, it may be difficut to use static_key function for our purpose because > kernel memory allocation may occur before jump_label is ready. > > Thanks, > -Kame > Hi Kame, How about this? Use a static bool in bootmem, and use a jump label in the buddy system. This means we use two variables to do it.
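A rough userspace sketch of the two-variable idea, for illustration only. This is not kernel code and not taken from the patchset; every name below is hypothetical, and the static key is modelled as a plain counter because a real static key cannot be changed before jump_label_init() has run.

```c
/*
 * Userspace model of the two-variable idea; NOT kernel code and not
 * part of the patchset. All names are hypothetical. The "key" is
 * modelled as a plain counter, since real static keys need
 * jump_label_init() to have run before they can be changed.
 */
#include <assert.h>
#include <stdbool.h>

static bool early_has_mirror;   /* plain flag: usable during bootmem setup */
static bool jump_label_ready;   /* models "jump_label_init() has run" */
static int  mirror_key_count;   /* models the static key's enable count */

/* Early boot (memblock stage): only touch the plain flag. */
static void model_mark_mirror(void)
{
	early_has_mirror = true;
}

/* Models jump_label_init() completing later in start_kernel(). */
static void model_jump_label_init(void)
{
	jump_label_ready = true;
}

/* Models enabling the key; the kernel warns if this happens too early. */
static void model_key_inc(void)
{
	assert(jump_label_ready);
	mirror_key_count++;
}

/*
 * Late init: copy the early flag into the key once it is safe.
 * Returns 0 on success, -1 if called before the key machinery is ready.
 */
static int model_promote_mirror_flag(void)
{
	if (!jump_label_ready)
		return -1;
	if (early_has_mirror && mirror_key_count == 0)
		model_key_inc();
	return 0;
}

/* What the buddy-allocator paths would test. */
static bool model_buddy_has_mirror(void)
{
	return mirror_key_count > 0;
}
```

In this model the call order is model_mark_mirror() during early setup, then model_jump_label_init(), then model_promote_mirror_flag(); only after that does model_buddy_has_mirror() return true, and the early path never touches the key.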
Thanks, Xishi Qiu >> [ 0.000000] e820: last_pfn = 0x7b800 max_arch_pfn = 0x400000000 >> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:61 static_key_slow_inc+0x57/0xc0() >> [ 0.000000] static_key_slow_inc used before call to jump_label_init >> [ 0.000000] Modules linked in: >> >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.1.0 #4 >> [ 0.000000] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015 >> [ 0.000000] 0000000000000000 ee366a8dff38f745 ffffffff81997d68 ffffffff816683b4 >> [ 0.000000] 0000000000000000 ffffffff81997dc0 ffffffff81997da8 ffffffff8107b0aa >> [ 0.000000] ffffffff81d48822 ffffffff81f281a0 0000000040000000 0000001fcb7a4000 >> [ 0.000000] Call Trace: >> [ 0.000000] [] dump_stack+0x45/0x57 >> [ 0.000000] [] warn_slowpath_common+0x8a/0xc0 >> [ 0.000000] [] warn_slowpath_fmt+0x55/0x70 >> [ 0.000000] [] ? memblock_add_range+0x175/0x19e >> [ 0.000000] [] static_key_slow_inc+0x57/0xc0 >> [ 0.000000] [] memblock_mark_mirror+0x19/0x33 >> [ 0.000000] [] efi_find_mirror+0x59/0xdd >> [ 0.000000] [] setup_arch+0x642/0xccf >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] ? printk+0x55/0x6b >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] start_kernel+0xe8/0x4eb >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] ? early_idt_handler_array+0x120/0x120 >> [ 0.000000] [] x86_64_start_reservations+0x2a/0x2c >> [ 0.000000] [] x86_64_start_kernel+0x14c/0x16f >> [ 0.000000] ---[ end trace baa7fa0514e3bc58 ]--- >> [ 0.000000] ------------[ cut here ]------------ >> >> >> >> >> > > > > . 
> From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752755AbbF3BwW (ORCPT ); Mon, 29 Jun 2015 21:52:22 -0400 Received: from mga09.intel.com ([134.134.136.24]:40213 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751754AbbF3BwN (ORCPT ); Mon, 29 Jun 2015 21:52:13 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.15,374,1432623600"; d="scan'208";a="737319861" Message-ID: <5591F64A.3040108@intel.com> Date: Mon, 29 Jun 2015 18:52:10 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu CC: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <559161EF.7050405@intel.com> <5591F042.1020304@huawei.com> In-Reply-To: <5591F042.1020304@huawei.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/29/2015 06:26 PM, Xishi Qiu wrote: >> > Has there been any performance analysis done on this code? I'm always >> > nervous when I see page_alloc.c churn. >> > > Not yet, which benchmark do you suggest? mmtests is always a good place to start. aim9. I'm partial to will-it-scale. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753421AbbF3CBg (ORCPT ); Mon, 29 Jun 2015 22:01:36 -0400 Received: from mgwkm03.jp.fujitsu.com ([202.219.69.170]:53863 "EHLO mgwkm03.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751754AbbF3CB2 (ORCPT ); Mon, 29 Jun 2015 22:01:28 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: c0ab22e01ca94bde8fc9845d3e2b3111 Message-ID: <5591F862.7030706@jp.fujitsu.com> Date: Tue, 30 Jun 2015 11:01:06 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu CC: "Luck, Tony" , Andrew Morton , "H. Peter Anvin" , Ingo Molnar , Hanjun Guo , Xiexiuqi , "leon@leon.nu" , "Hansen, Dave" , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface References: <558E084A.60900@huawei.com> <558E0A28.6060607@huawei.com> <3908561D78D1C84285E8C5FCA982C28F32AA124A@ORSMSX114.amr.corp.intel.com> <5591EA50.1000000@jp.fujitsu.com> <5591F18E.3060504@huawei.com> In-Reply-To: <5591F18E.3060504@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/06/30 10:31, Xishi Qiu wrote: > On 2015/6/30 9:01, Kamezawa Hiroyuki wrote: > >> On 2015/06/30 8:11, Luck, Tony wrote: >>>> @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) >>>> */ >>>> int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) >>>> { >>>> - system_has_some_mirror = true; >>>> + static_key_slow_inc(&system_has_mirror); >>>> >>>> return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR); >>>> 
} >>> >>> This generates some WARN_ON noise when called from efi_find_mirror(): >>> >> >> It seems jump_label_init() is called after memory initialization. (init/main.c::start_kernel()) >> So, it may be difficut to use static_key function for our purpose because >> kernel memory allocation may occur before jump_label is ready. >> >> Thanks, >> -Kame >> > > Hi Kame, > > How about like this? Use static bool in bootmem, and use jump label in buddy system. > This means we use two variable to do it. > I think it can be done, but it should be done in a separate patch with enough comments and changelog. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753529AbbF3Cqf (ORCPT ); Mon, 29 Jun 2015 22:46:35 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:48330 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752102AbbF3Cq1 (ORCPT ); Mon, 29 Jun 2015 22:46:27 -0400 Message-ID: <559202E2.8060609@huawei.com> Date: Tue, 30 Jun 2015 10:45:54 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Kamezawa Hiroyuki CC: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> <5590F4A7.4030606@jp.fujitsu.com> In-Reply-To: <5590F4A7.4030606@jp.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: > On 2015/06/27 11:24, Xishi Qiu wrote: >> This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to >> allocate mirrored pages. >> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. >> >> Signed-off-by: Xishi Qiu > > My fear about this approarch is that this may break something existing. > > Now, when we add MIGRATE_MIRROR type, we'll hide attributes of pageblocks as > MIGRATE_UNMOVABOLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. > > Logically, MIRROR attribute is independent from page mobility and this overwrites > will make some information lost. > > Then, > >> --- >> include/linux/mmzone.h | 9 +++++++++ >> mm/page_alloc.c | 3 +++ >> mm/vmstat.c | 3 +++ >> 3 files changed, 15 insertions(+) >> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >> index 54d74f6..54e891a 100644 >> --- a/include/linux/mmzone.h >> +++ b/include/linux/mmzone.h >> @@ -39,6 +39,9 @@ enum { >> MIGRATE_UNMOVABLE, >> MIGRATE_RECLAIMABLE, >> MIGRATE_MOVABLE, >> +#ifdef CONFIG_MEMORY_MIRROR >> + MIGRATE_MIRROR, >> +#endif > > I think > MIGRATE_MIRROR_UNMOVABLE, > MIGRATE_MIRROR_RECLAIMABLE, > MIGRATE_MIRROR_MOVABLE, <== adding this may need discuss. > MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. 
> Hi Kame, Do you mean adding 3 or 4 new migratetypes? > should be added with the following fallback list. > > /* > * MIRROR page range is defined by firmware at boot. The range is limited > * and is used only for kernel memory mirroring. > */ > [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE} > [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE} > Why not like this: {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_MIRROR_RESERVED, MIGRATE_RESERVE} > Then, we'll not lose the original information of "Reclaiable Pages". > > One problem here is whteher we should have MIGRATE_RESERVE_MIRROR. > > If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR. > But it seems to require much more code change to do that. > > Creating a zone or adding an attribues to zones are another design choice. > If we add a new zone, mirror_zone will span other zones, and I'm worried this may cause problems. Thanks, Xishi Qiu > Anyway, your patch doesn't takes care of reserved memory calculation at this point. > Please check setup_zone_migrate_reserve() That will be a problem. > > Thanks, > -Kame > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753734AbbF3CtJ (ORCPT ); Mon, 29 Jun 2015 22:49:09 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:24707 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753635AbbF3CtG (ORCPT ); Mon, 29 Jun 2015 22:49:06 -0400 Message-ID: <55920384.7030301@huawei.com> Date: Tue, 30 Jun 2015 10:48:36 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Dave Hansen CC: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Kamezawa Hiroyuki , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations References: <558E084A.60900@huawei.com> <559161EF.7050405@intel.com> <5591F042.1020304@huawei.com> <5591F64A.3040108@intel.com> In-Reply-To: <5591F64A.3040108@intel.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/6/30 9:52, Dave Hansen wrote: > On 06/29/2015 06:26 PM, Xishi Qiu wrote: >>>> Has there been any performance analysis done on this code? I'm always >>>> nervous when I see page_alloc.c churn. >>>> >> Not yet, which benchmark do you suggest? > > mmtests is always a good place to start. aim9. I'm partial to > will-it-scale. > I see, thank you. > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753772AbbF3Cw3 (ORCPT ); Mon, 29 Jun 2015 22:52:29 -0400 Received: from szxga02-in.huawei.com ([119.145.14.65]:26902 "EHLO szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751027AbbF3CwX (ORCPT ); Mon, 29 Jun 2015 22:52:23 -0400 Message-ID: <55920450.703@huawei.com> Date: Tue, 30 Jun 2015 10:52:00 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Kamezawa Hiroyuki CC: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 1/8] mm: add a new config to manage the code References: <558E084A.60900@huawei.com> <558E0913.7020501@huawei.com> <5590EAA9.5090104@jp.fujitsu.com> In-Reply-To: <5590EAA9.5090104@jp.fujitsu.com> Content-Type: text/plain; charset="Shift_JIS" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/6/29 14:50, Kamezawa Hiroyuki wrote: > On 2015/06/27 11:23, Xishi Qiu wrote: >> This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", set it > CONFIG_MEMORY_MIRROR >> off by default. >> >> Signed-off-by: Xishi Qiu >> --- >> mm/Kconfig | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/mm/Kconfig b/mm/Kconfig >> index 390214d..c40bb8b 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -200,6 +200,14 @@ config MEMORY_HOTREMOVE >> depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE >> depends on MIGRATION >> >> +config MEMORY_MIRROR > > In following patches, you use CONFIG_MEMORY_MIRROR. > > I think the name is too generic besides it's depends on ACPI. > But I'm not sure address based memory mirror is planned in other platform. > > So, hmm. How about dividing the config into 2 parts like attached ? (just an example) > Seems like a good idea, thank you. 
> Thanks, > -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750821AbbF3HyC (ORCPT ); Tue, 30 Jun 2015 03:54:02 -0400 Received: from mgwkm04.jp.fujitsu.com ([202.219.69.171]:55127 "EHLO mgwkm04.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750800AbbF3Hxx (ORCPT ); Tue, 30 Jun 2015 03:53:53 -0400 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: a050b766c97d47e7a204c4f6f9c54bbd Message-ID: <55924AEF.4050107@jp.fujitsu.com> Date: Tue, 30 Jun 2015 16:53:19 +0900 From: Kamezawa Hiroyuki User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Xishi Qiu CC: Andrew Morton , "H. Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> <5590F4A7.4030606@jp.fujitsu.com> <559202E2.8060609@huawei.com> In-Reply-To: <559202E2.8060609@huawei.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/06/30 11:45, Xishi Qiu wrote: > On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: > >> On 2015/06/27 11:24, Xishi Qiu wrote: >>> This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to >>> allocate mirrored pages. >>> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. >>> >>> Signed-off-by: Xishi Qiu >> >> My fear about this approarch is that this may break something existing. 
>> >> Now, when we add MIGRATE_MIRROR type, we'll hide attributes of pageblocks as >> MIGRATE_UNMOVABOLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. >> >> Logically, MIRROR attribute is independent from page mobility and this overwrites >> will make some information lost. >> >> Then, >> >>> --- >>> include/linux/mmzone.h | 9 +++++++++ >>> mm/page_alloc.c | 3 +++ >>> mm/vmstat.c | 3 +++ >>> 3 files changed, 15 insertions(+) >>> >>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>> index 54d74f6..54e891a 100644 >>> --- a/include/linux/mmzone.h >>> +++ b/include/linux/mmzone.h >>> @@ -39,6 +39,9 @@ enum { >>> MIGRATE_UNMOVABLE, >>> MIGRATE_RECLAIMABLE, >>> MIGRATE_MOVABLE, >>> +#ifdef CONFIG_MEMORY_MIRROR >>> + MIGRATE_MIRROR, >>> +#endif >> >> I think >> MIGRATE_MIRROR_UNMOVABLE, >> MIGRATE_MIRROR_RECLAIMABLE, >> MIGRATE_MIRROR_MOVABLE, <== adding this may need discuss. >> MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. >> > > Hi Kame, > > You mean add 3 or 4 new migratetype? > Yes. But please check what NR_MIGRATETYPE_BITS will become. I think this will not have a big impact on x86-64. >> should be added with the following fallback list. >> >> /* >> * MIRROR page range is defined by firmware at boot. The range is limited >> * and is used only for kernel memory mirroring. >> */ >> [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE} >> [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE} >> > > Why not like this: > {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_MIRROR_RESERVED, MIGRATE_RESERVE} > My mistake. [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE_MIRROR} [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE_MIRROR} was my intention. This means mirrored memory and unmirrored memory are separated completely. But this should affect kswapd or other memory reclaim logic. For example, kswapd stops once free pages exceed the high watermark. 
But the cases where mirrored or unmirrored pages are exhausted are not handled in this series. You need some extra checks in the memory reclaim logic if you go with migration_type. >> Then, we'll not lose the original information of "Reclaiable Pages". >> >> One problem here is whteher we should have MIGRATE_RESERVE_MIRROR. >> >> If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR. >> But it seems to require much more code change to do that. >> >> Creating a zone or adding an attribues to zones are another design choice. >> > > If we add a new zone, mirror_zone will span others, I'm worry about this > maybe have problems. Yes, that's a problem. And zoneid bits are a very limited resource. (....But memory reclaim logic can be unchanged.) Anyway, I'd like to see your solution with the above changes first rather than adding zones. Thanks, -Kame From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752840AbbF3Jbb (ORCPT ); Tue, 30 Jun 2015 05:31:31 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:7493 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752350AbbF3JbW (ORCPT ); Tue, 30 Jun 2015 05:31:22 -0400 Message-ID: <55925FD5.7030205@huawei.com> Date: Tue, 30 Jun 2015 17:22:29 +0800 From: Xishi Qiu User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Kamezawa Hiroyuki CC: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Mel Gorman , Linux MM , LKML Subject: Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages References: <558E084A.60900@huawei.com> <558E0948.2010104@huawei.com> <5590F4A7.4030606@jp.fujitsu.com> <559202E2.8060609@huawei.com> <55924AEF.4050107@jp.fujitsu.com> In-Reply-To: <55924AEF.4050107@jp.fujitsu.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.25.179] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020205.55926063.00FF,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 79e4f9795ca5068081bfa40a0f3a93ab Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/6/30 15:53, Kamezawa Hiroyuki wrote: > On 2015/06/30 11:45, Xishi Qiu wrote: >> On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: >> >>> On 2015/06/27 11:24, Xishi Qiu wrote: >>>> This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to >>>> allocate mirrored pages. >>>> When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. >>>> >>>> Signed-off-by: Xishi Qiu >>> >>> My fear about this approarch is that this may break something existing. >>> >>> Now, when we add MIGRATE_MIRROR type, we'll hide attributes of pageblocks as >>> MIGRATE_UNMOVABOLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE. >>> >>> Logically, MIRROR attribute is independent from page mobility and this overwrites >>> will make some information lost. 
>>> >>> Then, >>> >>>> --- >>>> include/linux/mmzone.h | 9 +++++++++ >>>> mm/page_alloc.c | 3 +++ >>>> mm/vmstat.c | 3 +++ >>>> 3 files changed, 15 insertions(+) >>>> >>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>>> index 54d74f6..54e891a 100644 >>>> --- a/include/linux/mmzone.h >>>> +++ b/include/linux/mmzone.h >>>> @@ -39,6 +39,9 @@ enum { >>>> MIGRATE_UNMOVABLE, >>>> MIGRATE_RECLAIMABLE, >>>> MIGRATE_MOVABLE, >>>> +#ifdef CONFIG_MEMORY_MIRROR >>>> + MIGRATE_MIRROR, >>>> +#endif >>> >>> I think >>> MIGRATE_MIRROR_UNMOVABLE, >>> MIGRATE_MIRROR_RECLAIMABLE, >>> MIGRATE_MIRROR_MOVABLE, <== adding this may need discuss. >>> MIGRATE_MIRROR_RESERVED, <== reserved pages should be maintained per mirrored/unmirrored. >>> >> >> Hi Kame, >> >> You mean add 3 or 4 new migratetype? >> > > yes. But please check how NR_MIGRATETYPE_BITS will be. > I think this will not have big impact in x86-64 . > >>> should be added with the following fallback list. >>> >>> /* >>> * MIRROR page range is defined by firmware at boot. The range is limited >>> * and is used only for kernel memory mirroring. >>> */ >>> [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE} >>> [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE} >>> >> >> Why not like this: >> {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_MIRROR_RESERVED, MIGRATE_RESERVE} >> > > My mistake. > [MIGRATE_UNMOVABLE_MIRROR] = {MIGRATE_RECLAIMABLE_MIRROR, MIGRATE_RESERVE_MIRROR} > [MIGRATE_RECLAIMABLE_MIRROR] = {MIGRATE_UNMOVABLE_MIRROR, MIGRATE_RESERVE_MIRROR} > > was my intention. This means mirrored memory and unmirrored memory is separated completely. > > But this should affect kswapd or other memory reclaim logic. > > for example, kswapd stops free pages are more than hi watermark. > But mirrored/unmirrored pages exhausted cases are not handled in this series. > You need some extra check in memory reclaim logic if you go with migration_type. > OK, I understand. 
Thank you for your suggestion. Thanks, Xishi Qiu > > >>> Then, we'll not lose the original information of "Reclaiable Pages". >>> >>> One problem here is whteher we should have MIGRATE_RESERVE_MIRROR. >>> >>> If we never allow users to allocate mirrored memory, we should have MIGRATE_RESERVE_MIRROR. >>> But it seems to require much more code change to do that. >>> >>> Creating a zone or adding an attribues to zones are another design choice. >>> >> >> If we add a new zone, mirror_zone will span others, I'm worry about this >> maybe have problems. > > Yes. that's problem. And zoneid bit is very limited resource. > (....But memory reclaim logic can be unchanged.) > > Anyway, I'd like to see your solution with above changes 1st rather than adding zones. > > Thanks, > -Kame > > > > . > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752567AbbF3JmE (ORCPT ); Tue, 30 Jun 2015 05:42:04 -0400 Received: from cantor2.suse.de ([195.135.220.15]:39460 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751128AbbF3Jl7 (ORCPT ); Tue, 30 Jun 2015 05:41:59 -0400 Date: Tue, 30 Jun 2015 10:41:50 +0100 From: Mel Gorman To: Xishi Qiu Cc: Andrew Morton , "H. 
Peter Anvin" , Ingo Molnar , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations Message-ID: <20150630094149.GA6812@suse.de> References: <558E084A.60900@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <558E084A.60900@huawei.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Jun 27, 2015 at 10:19:54AM +0800, Xishi Qiu wrote: > Intel Xeon processor E7 v3 product family-based platforms introduces support > for partial memory mirroring called as 'Address Range Mirroring'. This feature > allows BIOS to specify a subset of total available memory to be mirrored (and > optionally also specify whether to mirror the range 0-4 GB). This capability > allows user to make an appropriate tradeoff between non-mirrored memory range > and mirrored memory range thus optimizing total available memory and still > achieving highly reliable memory range for mission critical workloads and/or > kernel space. > > Tony has already send a patchset to supprot this feature at boot time. > https://lkml.org/lkml/2015/5/8/521 > This patchset is based on Tony's, it can support the feature after boot time. > Use mirrored memory for all kernel allocations. > This is my first time glancing through the series so I'm not aware of any past discussion. Hopefully there are no repeats. Broadly speaking though I'm not comfortable with the series. First and foremost, there is uncontrolled access to the memory because it's any kernel request. This includes even short-lived ones that do not need mirroring such as network buffers or caches. Network traffic can be retried, caches can be reconstructed from disk etc. 
Kernel page tables, struct page corruption etc are much harder to recover from. Who are the expected users of this memory and how are they meant to be prioritised? What happens if they fail to be mirrored? What happens if the mirrored memory is all used up and a high priority request arrives? Is there any prioritisation of one subsystem over another? What about boot-memory allocations, should they ever use mirrored memory? The expected users are important and this series does not address it. Callers do not specify the flag, you just assume that kernel allocations must be mirrored. If the allocation request fails, then you assume it was MIGRATE_RECLAIMABLE later in the series. This is wrong as it'll break fragmentation avoidance on machines with mirrored memory. Even if you were to use migrate types to handle mirrored memory, you need to treat mirrored memory as a type of reserve or else as a first preference for allocations requested. The fact that this will be used by very few machines but affects the memory footprint of the page allocator is a general concern. When active, it affects the fast paths for all users whether they care about mirroring or not. If all free memory is in the MIGRATE_MIRROR then all user-space requests will be rejected but reclaim will not make any progress if the zone is balanced. The system may go prematurely OOM as no progress is made. Getting around this is tricky and affects a few fast paths. Generally, the easiest approach would be zone-based but I recognise that it has problems of its own. Basically, overall I feel this series is the wrong approach but not knowing who the users are makes it much harder to judge. I strongly suspect that if mirrored memory is to be properly used then it needs to be available before the page allocator is even active. Once active, there needs to be controlled access for allocation requests that are really critical to mirror and not just all kernel allocations. 
None of that would use a MIGRATE_TYPE approach. It would be alterations to the bootmem allocator and access to an explicit reserve that is not accounted for as "free memory" and accessed via an explicit GFP flag. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752855AbbF3KrI (ORCPT ); Tue, 30 Jun 2015 06:47:08 -0400 Received: from mail-wg0-f42.google.com ([74.125.82.42]:35792 "EHLO mail-wg0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752495AbbF3KrB (ORCPT ); Tue, 30 Jun 2015 06:47:01 -0400 Date: Tue, 30 Jun 2015 12:46:54 +0200 From: Ingo Molnar To: Mel Gorman Cc: Xishi Qiu , Andrew Morton , "H. Peter Anvin" , "Luck, Tony" , Hanjun Guo , Xiexiuqi , leon@leon.nu, Kamezawa Hiroyuki , Dave Hansen , Naoya Horiguchi , Vlastimil Babka , Linux MM , LKML Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations Message-ID: <20150630104654.GA24932@gmail.com> References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150630094149.GA6812@suse.de> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Mel Gorman wrote: > [...] > > Basically, overall I feel this series is the wrong approach but not knowing who > the users are making is much harder to judge. I strongly suspect that if > mirrored memory is to be properly used then it needs to be available before the > page allocator is even active. Once active, there needs to be controlled access > for allocation requests that are really critical to mirror and not just all > kernel allocations. None of that would use a MIGRATE_TYPE approach. 
> It would be alterations to the bootmem allocator and access to an explicit reserve that is
> not accounted for as "free memory" and accessed via an explicit GFP flag.

So I think the main goal is to avoid kernel crashes when a #MC memory fault arrives on a piece of memory that is owned by the kernel.

In that sense 'protecting' all kernel allocations is natural: we don't know how to recover from faults that affect kernel memory.

We do know how to recover from faults that affect user-space memory alone.

So if a mechanism is in place that prioritizes 3 groups of allocators:

 - non-recoverable memory (kernel allocations mostly)

 - high priority user memory (critical apps that must never fail)

 - recoverable user memory (non-dirty caches that can simply be dropped, non-critical apps, etc.)

then we can make use of this hardware feature. I suspect this series tries to move in that direction.

Thanks,

	Ingo

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 30 Jun 2015 12:53:53 +0100
From: Mel Gorman
To: Ingo Molnar
Cc: Xishi Qiu, Andrew Morton, "H.
Peter Anvin", "Luck, Tony", Hanjun Guo, Xiexiuqi, leon@leon.nu, Kamezawa Hiroyuki, Dave Hansen, Naoya Horiguchi, Vlastimil Babka, Linux MM, LKML
Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
Message-ID: <20150630115353.GB6812@suse.de>
References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com>
In-Reply-To: <20150630104654.GA24932@gmail.com>

On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote:
>
> * Mel Gorman wrote:
>
> > [...]
> >
> > Basically, overall I feel this series is the wrong approach but not knowing who
> > the users are makes it much harder to judge. I strongly suspect that if
> > mirrored memory is to be properly used then it needs to be available before the
> > page allocator is even active. Once active, there needs to be controlled access
> > for allocation requests that are really critical to mirror and not just all
> > kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be
> > alterations to the bootmem allocator and access to an explicit reserve that is
> > not accounted for as "free memory" and accessed via an explicit GFP flag.
>
> So I think the main goal is to avoid kernel crashes when a #MC memory fault
> arrives on a piece of memory that is owned by the kernel.
>

Sounds logical. In that case, bootmem awareness would be crucial. Enabling support in just the page allocator is too late.

> In that sense 'protecting' all kernel allocations is natural: we don't know how to
> recover from faults that affect kernel memory.
>

It potentially uses all mirrored memory on memory that does not need that sort of guarantee.
For example, if there was a MC on memory backing the inode cache then potentially that is recoverable as long as the inodes were not dirty. That's a minor detail as the kernel could later protect only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal MC in kernel space could be distinguished from non-fatal checks.

Bootmem awareness is much more important either way. If that was addressed then potentially a MIGRATE_UNMOVABLE_MIRROR type could be created that is only used for MIGRATE_UNMOVABLE allocations and never for user-space. That misses MIGRATE_RECLAIMABLE so if that is required then we need something else that both preserves fragmentation avoidance and avoids introducing loads of new migratetypes.

Reclaim-related issues could be partially avoided by forbidding use from userspace and accounting for the size of MIGRATE_UNMOVABLE_MIRROR during watermark checks.

> We do know how to recover from faults that affect user-space memory alone.
>
> So if a mechanism is in place that prioritizes 3 groups of allocators:
>
>  - non-recoverable memory (kernel allocations mostly)
>

So bootmem at the very least followed by MIGRATE_UNMOVABLE requests whether they are accounted for by zones or MIGRATE_TYPES.

>  - high priority user memory (critical apps that must never fail)
>

This one is problematic with a MIGRATE_TYPE-based approach such as the one in this series. If a high priority user requires memory and MIGRATE_MIRROR is full then some of it must be reclaimed. With a MIGRATE_TYPE approach, the kernel may reclaim a lot of unnecessary memory trying to free some MIGRATE_MIRROR memory with no guarantee of success. It'll look like unnecessary thrashing from userspace but be difficult to diagnose as reclaim stats are per-zone based. Dealing with this needs either a zone-based approach or a lot of surgery to reclaim (similar to what the node-based LRU series does actually when it skips pages when the caller requires lowmem pages).
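
[Illustrative sketch, not part of the original message.] The idea of accounting for the mirrored pool during watermark checks can be shown with a small user-space model. Everything in it is an assumption made for illustration: toy_zone, toy_watermark_ok() and the may_use_mirror argument are hypothetical names, not kernel interfaces, and the real zone_watermark_ok() is considerably more involved. The point is only that a request which is forbidden from using mirrored memory should not count mirrored free pages, so the zone does not look balanced to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel interfaces. */
struct toy_zone {
	unsigned long free_pages;        /* all free pages, mirrored ones included */
	unsigned long free_mirror_pages; /* free pages sitting in the mirrored pool */
	unsigned long watermark;         /* pages that must stay free after the request */
};

/*
 * Return true if a request for 2^order pages leaves the zone at or above
 * its watermark. Requests that may not use mirrored memory do not get to
 * count mirrored free pages towards the check.
 */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order,
			     bool may_use_mirror)
{
	unsigned long usable = z->free_pages;
	unsigned long want = 1UL << order;

	/* The mirrored pool is a subset of the free pages in this model. */
	assert(z->free_mirror_pages <= z->free_pages);

	if (!may_use_mirror)
		usable -= z->free_mirror_pages;

	return usable >= z->watermark + want;
}
```

With 100 free pages of which 80 are mirrored and a watermark of 30, an order-0 request that may use the mirror passes the check, while one that may not use it fails and would have to trigger reclaim rather than being rejected against a zone that appears balanced.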

--
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Luck, Tony"
To: Mel Gorman, Ingo Molnar
CC: Xishi Qiu, Andrew Morton, "H. Peter Anvin", Hanjun Guo, Xiexiuqi, "leon@leon.nu", Kamezawa Hiroyuki, "Hansen, Dave", Naoya Horiguchi, Vlastimil Babka, Linux MM, LKML
Subject: RE: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
Date: Tue, 30 Jun 2015 18:12:35 +0000
Message-ID: <3908561D78D1C84285E8C5FCA982C28F32AA1974@ORSMSX114.amr.corp.intel.com>
References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com> <20150630115353.GB6812@suse.de>
In-Reply-To: <20150630115353.GB6812@suse.de>

> Sounds logical. In that case, bootmem awareness would be crucial.
> Enabling support in just the page allocator is too late.

Andrew already applied some patches from me that I think covered bootmem mirror allocations:

commit fc6daaf93151877748f8096af6b3fddb147f22d6
    mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
commit a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95
    mm/memblock: allocate boot time data structures from mirrored memory
commit b05b9f5f9dcf593a0e9327676b78e6c17b4218e8
    x86, mirror: x86 enabling - find mirrored memory ranges

If I missed something, please let me know.

>> In that sense 'protecting' all kernel allocations is natural: we don't know how to
>> recover from faults that affect kernel memory.
>>
>
> It potentially uses all mirrored memory on memory that does not need that
> sort of guarantee. For example, if there was a MC on memory backing the
> inode cache then potentially that is recoverable as long as the inodes
> were not dirty.

Right now this is hard to do. On Intel we get a broadcast machine check that may catch bystander cpus holding locks that we might need in order to look at kernel structures and decide what we just lost. That may get easier with local machine check (only the logical cpu that tried to consume the corrupt data gets the machine check ... patches for Linux are in for basic support of this ... waiting for h/w that does it).

> That's a minor detail as the kernel could later protect
> only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
> MC in kernel space could be distinguished from non-fatal checks.

So the immediate use case is large memory servers (hundred+ Gbytes to TBytes) running some applications that use most of memory in user mode (like a database). We mirror enough memory to cover *all* the kernel allocations so that a bad memory access will be fixed from the mirror for kernel, or result in SIGBUS to a process for user page ... either way we don't crash the system.

Perhaps in the future we might find some places in the kernel where we can cover a lot of memory without too many code changes ... e.g. things like pagecopy(). At that time we'd have to think about allocation priorities.

-Tony

From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <55A3450E.6050707@huawei.com>
Date: Mon, 13 Jul 2015 12:56:46 +0800
From: Xishi Qiu
To: Mel Gorman
CC: Ingo Molnar, Andrew Morton, "H. Peter Anvin", "Luck, Tony", Hanjun Guo, Xiexiuqi, Kamezawa Hiroyuki, "Dave Hansen", Naoya Horiguchi, Vlastimil Babka, Linux MM, LKML
Subject: Re: [RFC v2 PATCH 0/8] mm: mirrored memory support for page buddy allocations
References: <558E084A.60900@huawei.com> <20150630094149.GA6812@suse.de> <20150630104654.GA24932@gmail.com> <20150630115353.GB6812@suse.de>
In-Reply-To: <20150630115353.GB6812@suse.de>

On 2015/6/30 19:53, Mel Gorman wrote:
> On Tue, Jun 30, 2015 at 12:46:54PM +0200, Ingo Molnar wrote:
>>
>> * Mel Gorman wrote:
>>
>>> [...]
>>>
>>> Basically, overall I feel this series is the wrong approach but not knowing who
>>> the users are makes it much harder to judge. I strongly suspect that if
>>> mirrored memory is to be properly used then it needs to be available before the
>>> page allocator is even active.
>>> Once active, there needs to be controlled access
>>> for allocation requests that are really critical to mirror and not just all
>>> kernel allocations. None of that would use a MIGRATE_TYPE approach. It would be
>>> alterations to the bootmem allocator and access to an explicit reserve that is
>>> not accounted for as "free memory" and accessed via an explicit GFP flag.
>>
>> So I think the main goal is to avoid kernel crashes when a #MC memory fault
>> arrives on a piece of memory that is owned by the kernel.
>>
>
> Sounds logical. In that case, bootmem awareness would be crucial.
> Enabling support in just the page allocator is too late.
>
>> In that sense 'protecting' all kernel allocations is natural: we don't know how to
>> recover from faults that affect kernel memory.
>>
>
> It potentially uses all mirrored memory on memory that does not need that
> sort of guarantee. For example, if there was a MC on memory backing the
> inode cache then potentially that is recoverable as long as the inodes
> were not dirty. That's a minor detail as the kernel could later protect
> only MIGRATE_UNMOVABLE requests instead of all kernel allocations if fatal
> MC in kernel space could be distinguished from non-fatal checks.
>
> Bootmem awareness is much more important either way. If that was addressed
> then potentially a MIGRATE_UNMOVABLE_MIRROR type could be created that
> is only used for MIGRATE_UNMOVABLE allocations and never for user-space.
> That misses MIGRATE_RECLAIMABLE so if that is required then we need
> something else that both preserves fragmentation avoidance and avoids
> introducing loads of new migratetypes.
>
> Reclaim-related issues could be partially avoided by forbidding use from
> userspace and accounting for the size of MIGRATE_UNMOVABLE_MIRROR during
> watermark checks.
>
>> We do know how to recover from faults that affect user-space memory alone.
>>
>> So if a mechanism is in place that prioritizes 3 groups of allocators:
>>
>>  - non-recoverable memory (kernel allocations mostly)
>>
>
> So bootmem at the very least followed by MIGRATE_UNMOVABLE requests whether
> they are accounted for by zones or MIGRATE_TYPES.
>
>>  - high priority user memory (critical apps that must never fail)
>>
>
> This one is problematic with a MIGRATE_TYPE-based approach such as the one in
> this series. If a high priority user requires memory and MIGRATE_MIRROR is full
> then some of it must be reclaimed. With a MIGRATE_TYPE approach, the kernel
> may reclaim a lot of unnecessary memory trying to free some MIGRATE_MIRROR
> memory with no guarantee of success. It'll look like unnecessary thrashing
> from userspace but be difficult to diagnose as reclaim stats are per-zone based.
> Dealing with this needs either a zone-based approach or a lot of surgery
> to reclaim (similar to what the node-based LRU series does actually when
> it skips pages when the caller requires lowmem pages).
>

Hi Mel,

Thank you for your comments. Sorry for the late reply; I did not fully understand some of them.

If fatal memory faults in kernel space could be distinguished from non-fatal ones, we could use only MIGRATE_UNMOVABLE_MIRROR; if they cannot, we would need two types, one for MIGRATE_RECLAIMABLE and one for MIGRATE_UNMOVABLE, right?

The reclaim-related issues are similar to the CMA handling in zone_watermark_ok(), right?

If we want to protect high priority user memory, using a new mirrored zone may be better, right?

How about using a flag (e.g. GFP_MIRROR) for kernel space allocations? Could we use it to classify kernel space allocations? It could also be requested from user space via madvise and mmap.

Thanks,
Xishi Qiu
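
[Illustrative sketch, not part of the thread.] The flag-based access to a mirrored reserve discussed above can be modelled in a few lines of user-space C. All names are hypothetical: TOY_GFP_MIRROR stands in for the GFP_MIRROR idea, and toy_mem and toy_alloc_page() are invented for this example. Falling back to normal memory when the reserve is empty is an assumption; whether such a request should fall back, reclaim or fail is exactly the policy question the thread leaves open.

```c
#include <assert.h>

#define TOY_GFP_MIRROR 0x1u /* hypothetical flag, modelled on the GFP_MIRROR idea */

enum toy_pool { TOY_POOL_NONE, TOY_POOL_NORMAL, TOY_POOL_MIRROR };

struct toy_mem {
	unsigned long normal_free; /* ordinary free pages */
	unsigned long mirror_free; /* explicit mirrored reserve, kept apart from free memory */
};

/*
 * Allocate one page and report which pool it came from. Only callers that
 * pass TOY_GFP_MIRROR may take from the mirrored reserve; when the reserve
 * is empty they fall back to normal memory (an assumed policy). All other
 * callers are served from normal memory only.
 */
static enum toy_pool toy_alloc_page(struct toy_mem *m, unsigned int flags)
{
	assert(m);

	if ((flags & TOY_GFP_MIRROR) && m->mirror_free > 0) {
		m->mirror_free--;
		return TOY_POOL_MIRROR;
	}
	if (m->normal_free > 0) {
		m->normal_free--;
		return TOY_POOL_NORMAL;
	}
	return TOY_POOL_NONE;
}
```

Because the reserve is a separate counter, a request without the flag can fail even while mirrored pages remain, which is the "controlled access" property; the cost is that the caller, not the allocator, has to decide which allocations are critical enough to carry the flag.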