linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table
  2018-05-07 14:50 [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone Huaisheng Ye
@ 2018-05-07 14:50 ` Huaisheng Ye
  0 siblings, 0 replies; 4+ messages in thread
From: Huaisheng Ye @ 2018-05-07 14:50 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: mhocko, willy, vbabka, mgorman, pasha.tatashin, alexander.levin,
	hannes, penguin-kernel, colyli, chengnt, linux-kernel,
	Huaisheng Ye, Ocean He

During e820__memblock_setup memblock gets entries with type
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN and E820_TYPE_PMEM from
e820_table, then marks NVDIMM regions with flag MEMBLOCK_NVDIMM.

Create function as e820__end_of_nvm_pfn to calculate max_pfn with
NVDIMM region, while zone_sizes_init needs max_pfn to get
arch_zone_lowest/highest_possible_pfn. During free_area_init_nodes,
the possible pfns need to be recalculated for ZONE_NVM.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Ocean He <hehy1@lenovo.com>
---
 arch/x86/include/asm/e820/api.h |  3 +++
 arch/x86/kernel/e820.c          | 20 +++++++++++++++++++-
 arch/x86/kernel/setup.c         |  8 ++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 62be73b..b8006c3 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -22,6 +22,9 @@
 extern void e820__update_table_print(void);
 
 extern unsigned long e820__end_of_ram_pfn(void);
+#ifdef CONFIG_ZONE_NVM
+extern unsigned long e820__end_of_nvm_pfn(void);
+#endif
 extern unsigned long e820__end_of_low_ram_pfn(void);
 
 extern u64  e820__memblock_alloc_reserved(u64 size, u64 align);
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 6a2cb14..1bf7876 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -840,6 +840,13 @@ unsigned long __init e820__end_of_ram_pfn(void)
 	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_RAM);
 }
 
+#ifdef CONFIG_ZONE_NVM
+unsigned long __init e820__end_of_nvm_pfn(void)
+{
+	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_PMEM);
+}
+#endif
+
 unsigned long __init e820__end_of_low_ram_pfn(void)
 {
 	return e820_end_pfn(1UL << (32 - PAGE_SHIFT), E820_TYPE_RAM);
@@ -1264,11 +1271,22 @@ void __init e820__memblock_setup(void)
 		end = entry->addr + entry->size;
 		if (end != (resource_size_t)end)
 			continue;
-
+#ifdef CONFIG_ZONE_NVM
+		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN &&
+								entry->type != E820_TYPE_PMEM)
+#else
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
+#endif
 			continue;
 
 		memblock_add(entry->addr, entry->size);
+
+#ifdef CONFIG_ZONE_NVM
+		if (entry->type == E820_TYPE_PMEM) {
+			/* Mark this region with PMEM flags */
+			memblock_mark_nvdimm(entry->addr, entry->size);
+		}
+#endif
 	}
 
 	/* Throw away partial pages: */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6285697..305975b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1031,7 +1031,15 @@ void __init setup_arch(char **cmdline_p)
 	 * partially used pages are not usable - thus
 	 * we are rounding upwards:
 	 */
+#ifdef CONFIG_ZONE_NVM
+	max_pfn = e820__end_of_nvm_pfn();
+	if (!max_pfn) {
+		printk(KERN_INFO "No physical NVDIMM has been found\n");
+		max_pfn = e820__end_of_ram_pfn();
+	}
+#else
 	max_pfn = e820__end_of_ram_pfn();
+#endif
 
 	/* update e820 for memory not covered by WB MTRRs */
 	mtrr_bp_init();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone
@ 2018-05-08  2:00 Huaisheng Ye
  2018-05-08  2:00 ` [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table Huaisheng Ye
  0 siblings, 1 reply; 4+ messages in thread
From: Huaisheng Ye @ 2018-05-08  2:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: mhocko, willy, vbabka, mgorman, pasha.tatashin, alexander.levin,
	hannes, penguin-kernel, colyli, chengnt, hehy1, linux-kernel,
	linux-nvdimm, Huaisheng Ye

Traditionally, NVDIMMs are treated by mm(memory management) subsystem as
DEVICE zone, which is a virtual zone and both its start and end of pfn 
are equal to 0, mm wouldna??t manage NVDIMM directly as DRAM, kernel uses
corresponding drivers, which locate at \drivers\nvdimm\ and 
\drivers\acpi\nfit and fs, to realize NVDIMM memory alloc and free with
memory hot plug implementation.

With current kernel, many mma??s classical features like the buddy
system, swap mechanism and page cache couldna??t be supported to NVDIMM.
What we are doing is to expand kernel mma??s capacity to make it to handle
NVDIMM like DRAM. Furthermore we make mm could treat DRAM and NVDIMM
separately, that means mm can only put the critical pages to NVDIMM
zone, here we created a new zone type as NVM zone. That is to say for 
traditional(or normal) pages which would be stored at DRAM scope like
Normal, DMA32 and DMA zones. But for the critical pages, which we hope
them could be recovered from power fail or system crash, we make them
to be persistent by storing them to NVM zone.

We installed two NVDIMMs to Lenovo Thinksystem product as development
platform, which has 125GB storage capacity respectively. With these 
patches below, mm can create NVM zones for NVDIMMs.

Here is dmesg info,
 Initmem setup node 0 [mem 0x0000000000001000-0x000000237fffffff]
 On node 0 totalpages: 36879666
   DMA zone: 64 pages used for memmap
   DMA zone: 23 pages reserved
   DMA zone: 3999 pages, LIFO batch:0
 mminit::memmap_init Initialising map node 0 zone 0 pfns 1 -> 4096 
   DMA32 zone: 10935 pages used for memmap
   DMA32 zone: 699795 pages, LIFO batch:31
 mminit::memmap_init Initialising map node 0 zone 1 pfns 4096 -> 1048576
   Normal zone: 53248 pages used for memmap
   Normal zone: 3407872 pages, LIFO batch:31
 mminit::memmap_init Initialising map node 0 zone 2 pfns 1048576 -> 4456448
   NVM zone: 512000 pages used for memmap
   NVM zone: 32768000 pages, LIFO batch:31
 mminit::memmap_init Initialising map node 0 zone 3 pfns 4456448 -> 37224448
 Initmem setup node 1 [mem 0x0000002380000000-0x00000046bfffffff]
 On node 1 totalpages: 36962304
   Normal zone: 65536 pages used for memmap
   Normal zone: 4194304 pages, LIFO batch:31
 mminit::memmap_init Initialising map node 1 zone 2 pfns 37224448 -> 41418752
   NVM zone: 512000 pages used for memmap
   NVM zone: 32768000 pages, LIFO batch:31
 mminit::memmap_init Initialising map node 1 zone 3 pfns 41418752 -> 74186752

This comes /proc/zoneinfo
Node 0, zone      NVM
  pages free     32768000
        min      15244
        low      48012
        high     80780
        spanned  32768000
        present  32768000
        managed  32768000
        protection: (0, 0, 0, 0, 0, 0)
        nr_free_pages 32768000
Node 1, zone      NVM
  pages free     32768000
        min      15244
        low      48012
        high     80780
        spanned  32768000
        present  32768000
        managed  32768000

Huaisheng Ye (6):
  mm/memblock: Expand definition of flags to support NVDIMM
  mm/page_alloc.c: get pfn range with flags of memblock
  mm, zone_type: create ZONE_NVM and fill into GFP_ZONE_TABLE
  arch/x86/kernel: mark NVDIMM regions from e820_table
  mm: get zone spanned pages separately for DRAM and NVDIMM
  arch/x86/mm: create page table mapping for DRAM and NVDIMM both

 arch/x86/include/asm/e820/api.h |  3 +++
 arch/x86/kernel/e820.c          | 20 +++++++++++++-
 arch/x86/kernel/setup.c         |  8 ++++++
 arch/x86/mm/init_64.c           | 16 +++++++++++
 include/linux/gfp.h             | 57 ++++++++++++++++++++++++++++++++++++---
 include/linux/memblock.h        | 19 +++++++++++++
 include/linux/mm.h              |  4 +++
 include/linux/mmzone.h          |  3 +++
 mm/Kconfig                      | 16 +++++++++++
 mm/memblock.c                   | 46 +++++++++++++++++++++++++++----
 mm/nobootmem.c                  |  5 ++--
 mm/page_alloc.c                 | 60 ++++++++++++++++++++++++++++++++++++++++-
 12 files changed, 245 insertions(+), 12 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table
  2018-05-08  2:00 [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone Huaisheng Ye
@ 2018-05-08  2:00 ` Huaisheng Ye
  0 siblings, 0 replies; 4+ messages in thread
From: Huaisheng Ye @ 2018-05-08  2:00 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: mhocko, willy, vbabka, mgorman, pasha.tatashin, alexander.levin,
	hannes, penguin-kernel, colyli, chengnt, hehy1, linux-kernel,
	linux-nvdimm, Huaisheng Ye

During e820__memblock_setup memblock gets entries with type
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN and E820_TYPE_PMEM from
e820_table, then marks NVDIMM regions with flag MEMBLOCK_NVDIMM.

Create function as e820__end_of_nvm_pfn to calculate max_pfn with
NVDIMM region, while zone_sizes_init needs max_pfn to get
arch_zone_lowest/highest_possible_pfn. During free_area_init_nodes,
the possible pfns need to be recalculated for ZONE_NVM.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Ocean He <hehy1@lenovo.com>
---
 arch/x86/include/asm/e820/api.h |  3 +++
 arch/x86/kernel/e820.c          | 20 +++++++++++++++++++-
 arch/x86/kernel/setup.c         |  8 ++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 62be73b..b8006c3 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -22,6 +22,9 @@
 extern void e820__update_table_print(void);
 
 extern unsigned long e820__end_of_ram_pfn(void);
+#ifdef CONFIG_ZONE_NVM
+extern unsigned long e820__end_of_nvm_pfn(void);
+#endif
 extern unsigned long e820__end_of_low_ram_pfn(void);
 
 extern u64  e820__memblock_alloc_reserved(u64 size, u64 align);
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 6a2cb14..1bf7876 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -840,6 +840,13 @@ unsigned long __init e820__end_of_ram_pfn(void)
 	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_RAM);
 }
 
+#ifdef CONFIG_ZONE_NVM
+unsigned long __init e820__end_of_nvm_pfn(void)
+{
+	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_PMEM);
+}
+#endif
+
 unsigned long __init e820__end_of_low_ram_pfn(void)
 {
 	return e820_end_pfn(1UL << (32 - PAGE_SHIFT), E820_TYPE_RAM);
@@ -1264,11 +1271,22 @@ void __init e820__memblock_setup(void)
 		end = entry->addr + entry->size;
 		if (end != (resource_size_t)end)
 			continue;
-
+#ifdef CONFIG_ZONE_NVM
+		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN &&
+								entry->type != E820_TYPE_PMEM)
+#else
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
+#endif
 			continue;
 
 		memblock_add(entry->addr, entry->size);
+
+#ifdef CONFIG_ZONE_NVM
+		if (entry->type == E820_TYPE_PMEM) {
+			/* Mark this region with PMEM flags */
+			memblock_mark_nvdimm(entry->addr, entry->size);
+		}
+#endif
 	}
 
 	/* Throw away partial pages: */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6285697..305975b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1031,7 +1031,15 @@ void __init setup_arch(char **cmdline_p)
 	 * partially used pages are not usable - thus
 	 * we are rounding upwards:
 	 */
+#ifdef CONFIG_ZONE_NVM
+	max_pfn = e820__end_of_nvm_pfn();
+	if (!max_pfn) {
+		printk(KERN_INFO "No physical NVDIMM has been found\n");
+		max_pfn = e820__end_of_ram_pfn();
+	}
+#else
 	max_pfn = e820__end_of_ram_pfn();
+#endif
 
 	/* update e820 for memory not covered by WB MTRRs */
 	mtrr_bp_init();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table
  2018-05-08  2:30 [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone Huaisheng Ye
@ 2018-05-08  2:30 ` Huaisheng Ye
  0 siblings, 0 replies; 4+ messages in thread
From: Huaisheng Ye @ 2018-05-08  2:30 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: mhocko, willy, vbabka, mgorman, pasha.tatashin, alexander.levin,
	hannes, penguin-kernel, colyli, chengnt, hehy1, linux-kernel,
	linux-nvdimm, Huaisheng Ye

During e820__memblock_setup memblock gets entries with type
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN and E820_TYPE_PMEM from
e820_table, then marks NVDIMM regions with flag MEMBLOCK_NVDIMM.

Create function as e820__end_of_nvm_pfn to calculate max_pfn with
NVDIMM region, while zone_sizes_init needs max_pfn to get
arch_zone_lowest/highest_possible_pfn. During free_area_init_nodes,
the possible pfns need to be recalculated for ZONE_NVM.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Ocean He <hehy1@lenovo.com>
---
 arch/x86/include/asm/e820/api.h |  3 +++
 arch/x86/kernel/e820.c          | 20 +++++++++++++++++++-
 arch/x86/kernel/setup.c         |  8 ++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 62be73b..b8006c3 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -22,6 +22,9 @@
 extern void e820__update_table_print(void);
 
 extern unsigned long e820__end_of_ram_pfn(void);
+#ifdef CONFIG_ZONE_NVM
+extern unsigned long e820__end_of_nvm_pfn(void);
+#endif
 extern unsigned long e820__end_of_low_ram_pfn(void);
 
 extern u64  e820__memblock_alloc_reserved(u64 size, u64 align);
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 71c11ad..c1dc1cc 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -840,6 +840,13 @@ unsigned long __init e820__end_of_ram_pfn(void)
 	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_RAM);
 }
 
+#ifdef CONFIG_ZONE_NVM
+unsigned long __init e820__end_of_nvm_pfn(void)
+{
+	return e820_end_pfn(MAX_ARCH_PFN, E820_TYPE_PMEM);
+}
+#endif
+
 unsigned long __init e820__end_of_low_ram_pfn(void)
 {
 	return e820_end_pfn(1UL << (32 - PAGE_SHIFT), E820_TYPE_RAM);
@@ -1246,11 +1253,22 @@ void __init e820__memblock_setup(void)
 		end = entry->addr + entry->size;
 		if (end != (resource_size_t)end)
 			continue;
-
+#ifdef CONFIG_ZONE_NVM
+		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN &&
+								entry->type != E820_TYPE_PMEM)
+#else
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
+#endif
 			continue;
 
 		memblock_add(entry->addr, entry->size);
+
+#ifdef CONFIG_ZONE_NVM
+		if (entry->type == E820_TYPE_PMEM) {
+			/* Mark this region with PMEM flags */
+			memblock_mark_nvdimm(entry->addr, entry->size);
+		}
+#endif
 	}
 
 	/* Throw away partial pages: */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4c616be..84c4ddb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1032,7 +1032,15 @@ void __init setup_arch(char **cmdline_p)
 	 * partially used pages are not usable - thus
 	 * we are rounding upwards:
 	 */
+#ifdef CONFIG_ZONE_NVM
+	max_pfn = e820__end_of_nvm_pfn();
+	if (!max_pfn) {
+		printk(KERN_INFO "No physical NVDIMM has been found\n");
+		max_pfn = e820__end_of_ram_pfn();
+	}
+#else
 	max_pfn = e820__end_of_ram_pfn();
+#endif
 
 	/* update e820 for memory not covered by WB MTRRs */
 	mtrr_bp_init();
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-05-08  2:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-08  2:00 [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone Huaisheng Ye
2018-05-08  2:00 ` [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table Huaisheng Ye
  -- strict thread matches above, loose matches on Subject: below --
2018-05-08  2:30 [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone Huaisheng Ye
2018-05-08  2:30 ` [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table Huaisheng Ye
2018-05-07 14:50 [RFC PATCH v1 0/6] use mm to manage NVDIMM (pmem) zone Huaisheng Ye
2018-05-07 14:50 ` [RFC PATCH v1 4/6] arch/x86/kernel: mark NVDIMM regions from e820_table Huaisheng Ye

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).