linux-arch.vger.kernel.org archive mirror
* [PATCH 0/7] Sparsemem Virtual Memmap V5
@ 2007-07-13 13:34 Andy Whitcroft
  2007-07-13 13:35 ` [PATCH 1/7] sparsemem: clean up spelling error in comments Andy Whitcroft
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:34 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft

Following this email is the current state of the Sparsemem Virtual
Memory Map support (description below).  This version is essentially
unchanged from V4.  The ia64 crowd has decided they want a 4MB
mapping size for the map which will come later, so this set includes
only base page support for them.  This patch set has been tested
pretty heavily over the last few weeks.

The current aim is to bring a common virtually mapped mem_map
to all architectures.  This should facilitate the removal of the
bespoke implementations from the architectures.  This also brings
performance improvements for most architectures, making sparsemem
vmemmap the more desirable memory model.  The ultimate aim of this
work is to expand sparsemem support to encompass all the features
of the other memory models.  This could allow us to drop support
for and remove the other models in the longer term.

Below are some comparative kernbench numbers for various
architectures, comparing the default memory model against SPARSEMEM
VMEMMAP.  All but ia64 show marginal improvement; we expect the ia64
figures to be sorted out when the larger mapping support returns.

x86-64 non-NUMA
             Base   VMEMMAP    % change (-ve good)
User        85.07     84.84    -0.26
System      34.32     33.84    -1.39
Total      119.38    118.68    -0.59

ia64
             Base   VMEMMAP    % change (-ve good)
User      1016.41   1016.93    0.05
System      50.83     51.02    0.36
Total     1067.25   1067.95    0.07

x86-64 NUMA
             Base  VMEMMAP    % change (-ve good)
User       430.77   431.73     0.22
System      45.39    43.98    -3.11
Total      476.17   475.71    -0.10

ppc64
             Base  VMEMMAP    % change (-ve good)
User       488.77   488.35    -0.09
System      56.92    56.37    -0.97
Total      545.69   544.72    -0.18

Below are some AIM benchmarks on IA64 and x86-64 (thanks Bob).  These
seem pretty much flat, as you would expect.


ia64 results 2 cpu non-numa 4Gb SCSI disk

Benchmark	Version	Machine	Run Date
AIM Multiuser Benchmark - Suite VII	"1.1"	extreme	Jun  1 07:17:24 2007

Tasks	Jobs/Min	JTI	Real	CPU	Jobs/sec/task
1	98.9		100	58.9	1.3	1.6482
101	5547.1		95	106.0	79.4	0.9154
201	6377.7		95	183.4	158.3	0.5288
301	6932.2		95	252.7	237.3	0.3838
401	7075.8		93	329.8	316.7	0.2941
501	7235.6		94	403.0	396.2	0.2407
600	7387.5		94	472.7	475.0	0.2052

Benchmark	Version	Machine	Run Date
AIM Multiuser Benchmark - Suite VII	"1.1"	vmemmap	Jun  1 09:59:04 2007

Tasks	Jobs/Min	JTI	Real	CPU	Jobs/sec/task
1	99.1		100	58.8	1.2	1.6509
101	5480.9		95	107.2	79.2	0.9044
201	6490.3		95	180.2	157.8	0.5382
301	6886.6		94	254.4	236.8	0.3813
401	7078.2		94	329.7	316.0	0.2942
501	7250.3		95	402.2	395.4	0.2412
600	7399.1		94	471.9	473.9	0.2055


open power 710 2 cpu, 4 Gb, SCSI and configured physically

Benchmark	Version	Machine	Run Date
AIM Multiuser Benchmark - Suite VII	"1.1"	extreme	May 29 15:42:53 2007

Tasks	Jobs/Min	JTI	Real	CPU	Jobs/sec/task
1	25.7		100	226.3	4.3	0.4286
101	1096.0		97	536.4	199.8	0.1809
201	1236.4		96	946.1	389.1	0.1025
301	1280.5		96	1368.0	582.3	0.0709
401	1270.2		95	1837.4	771.0	0.0528
501	1251.4		96	2330.1	955.9	0.0416
601	1252.6		96	2792.4	1139.2	0.0347
701	1245.2		96	3276.5	1334.6	0.0296
918	1229.5		96	4345.4	1728.7	0.0223

Benchmark	Version	Machine	Run Date
AIM Multiuser Benchmark - Suite VII	"1.1"	vmemmap	May 30 07:28:26 2007

Tasks	Jobs/Min	JTI	Real	CPU	Jobs/sec/task
1	25.6		100	226.9	4.3	0.4275
101	1049.3		97	560.2	198.1	0.1731
201	1199.1		97	975.6	390.7	0.0994
301	1261.7		96	1388.5	591.5	0.0699
401	1256.1		96	1858.1	771.9	0.0522
501	1220.1		96	2389.7	955.3	0.0406
601	1224.6		96	2856.3	1133.4	0.0340
701	1252.0		96	3258.7	1314.1	0.0298
915	1232.8		96	4319.7	1704.0	0.0225


amd64 2 2-core, 4Gb and SATA

Benchmark	Version	Machine	Run Date
AIM Multiuser Benchmark - Suite VII	"1.1"	extreme	Jun  2 03:59:48 2007

Tasks	Jobs/Min	JTI	Real	CPU	Jobs/sec/task
1	13.0		100	446.4	2.1	0.2173
101	533.4		97	1102.0	110.2	0.0880
201	578.3		97	2022.8	220.8	0.0480
301	583.8		97	3000.6	332.3	0.0323
401	580.5		97	4020.1	442.2	0.0241
501	574.8		98	5072.8	558.8	0.0191
600	566.5		98	6163.8	671.0	0.0157

Benchmark	Version	Machine	Run Date
AIM Multiuser Benchmark - Suite VII	"1.1"	vmemmap	Jun  3 04:19:31 2007

Tasks	Jobs/Min	JTI	Real	CPU	Jobs/sec/task
1	13.0		100	447.8	2.0	0.2166
101	536.5		97	1095.6	109.7	0.0885
201	567.7		97	2060.5	219.3	0.0471
301	582.1		96	3009.4	330.2	0.0322
401	578.2		96	4036.4	442.4	0.0240
501	585.1		98	4983.2	555.1	0.0195
600	565.5		98	6175.2	660.6	0.0157

This stack is against v2.6.22-rc6-mm1.  It has been compile and boot
tested on x86_64, ia64 and PPC64.

Andrew, please consider for -mm.

Note that I am away from my keyboard all of next week, but I figured
it was better to get this out for testing.

-apw

===
SPARSEMEM is a pretty nice framework that unifies quite a bit of
code over all the arches. It would be great if it could be the
default so that we can get rid of various forms of DISCONTIG and
other variations on memory maps. So far what has hindered this is
the additional lookups that SPARSEMEM introduces for virt_to_page
and page_address. It goes so far that the code to do this has to
be kept in a separate function and cannot be inlined.

This patch introduces a virtual memmap mode for SPARSEMEM, in which
the memmap is mapped into a virtually contiguous area and only the
active sections are physically backed.  This allows virt_to_page,
page_address and cohorts to become simple shift/add operations.
No page flag fields, no table lookups, nothing involving memory
is required.

The two key operations pfn_to_page and page_to_pfn become:

   #define __pfn_to_page(pfn)      (vmemmap + (pfn))
   #define __page_to_pfn(page)     ((page) - vmemmap)

By having a virtual mapping for the memmap we allow simple access
without wasting physical memory.  As kernel memory is typically
already mapped 1:1 this introduces no additional overhead.
The virtual mapping must be big enough to allow a struct page to
be allocated and mapped for all valid physical pages.  This will
make a virtual memmap difficult to use on 32 bit platforms that
support 36 address bits.

However, if there is enough virtual space available and the arch
already maps its 1-1 kernel space using TLBs (e.g. true of IA64
and x86_64) then this technique makes SPARSEMEM lookups even more
efficient than CONFIG_FLATMEM.  FLATMEM needs to read the contents
of the mem_map variable to get the start of the memmap and then add
the offset to the required entry.  vmemmap is a constant to which
we can simply add the offset.

This patch has the potential to allow us to make SPARSEMEM the default
(and even the only) option for most systems.  It should be optimal
on UP, SMP and NUMA on most platforms.  Then we may even be able
to remove the other memory models: FLATMEM, DISCONTIG etc.

V4->V5
 - IA64 16MB support shelved
 - rebase to current -mm

V3->V4
 - SPARC64 support -- from Dave Miller
 - PPC64 support -- from Andy Whitcroft
 - sparsemem presence/valid split
 - rename Kconfig options into SPARSEMEM configuration name space
 - redundant vmemmap alignment removed
 - split out PMD support to x86_64
 - x86_64 Kconfig dependencies
 - ia64 Kconfig dependencies
 - sparc64 dependencies, cleanup defines
 - cleanup function names _pop_ -> _populate_
 - markup __meminit
 - cleanup style
 - whitespace cleanups

V2->V3
 - Add IA64 16M vmemmap size support (reduces TLB pressure)
 - Add function to test for eventual node/node vmemmap overlaps
 - Upper / Lower boundary fix.

V1->V2
 - Support for PAGE_SIZE vmemmap which allows the general use of
   of virtual memmap on any MMU capable platform (enabled IA64
   support).
 - Fix various issues as suggested by Dave Hansen.
 - Add comments and error handling.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 1/7] sparsemem: clean up spelling error in comments
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
@ 2007-07-13 13:35 ` Andy Whitcroft
  2007-07-13 13:35 ` [PATCH 2/7] sparsemem: record when a section has a valid mem_map Andy Whitcroft
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/mm/sparse.c b/mm/sparse.c
index b2327e0..ec6ead6 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -101,7 +101,7 @@ static inline int sparse_index_init(unsigned long section_nr, int nid)
 
 /*
  * Although written for the SPARSEMEM_EXTREME case, this happens
- * to also work for the flat array case becase
+ * to also work for the flat array case because
  * NR_SECTION_ROOTS==NR_MEM_SECTIONS.
  */
 int __section_nr(struct mem_section* ms)


* [PATCH 2/7] sparsemem: record when a section has a valid mem_map
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
  2007-07-13 13:35 ` [PATCH 1/7] sparsemem: clean up spelling error in comments Andy Whitcroft
@ 2007-07-13 13:35 ` Andy Whitcroft
  2007-07-13 13:36 ` [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM Andy Whitcroft
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:35 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


We have a flag to indicate whether a section actually has a valid
mem_map associated with it.  This flag is never set, and we rely
solely on the present bit to indicate that a section is valid.
By definition a section is not valid if it has no mem_map, and
there is a window during init where the present bit is set but the
mem_map does not yet exist; during this window pfn_valid() will
incorrectly return true.

Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence
of a valid mem_map.  Switch valid_section{,_nr} and pfn_valid()
to this bit.  Add new present_section{,_nr} and pfn_present()
interfaces for those users who care to know that a section is going
to be valid.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 74b9679..f1f0af8 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -239,7 +239,7 @@ store_mem_state(struct sys_device *dev, const char *buf, size_t count)
 	mem = container_of(dev, struct memory_block, sysdev);
 	phys_section_nr = mem->phys_index;
 
-	if (!valid_section_nr(phys_section_nr))
+	if (!present_section_nr(phys_section_nr))
 		goto out;
 
 	if (!strncmp(buf, "online", min((int)count, 6)))
@@ -419,7 +419,7 @@ int register_new_memory(struct mem_section *section)
 
 int unregister_memory_section(struct mem_section *section)
 {
-	if (!valid_section(section))
+	if (!present_section(section))
 		return -EINVAL;
 
 	return remove_memory_block(0, section, 0);
@@ -444,7 +444,7 @@ int __init memory_dev_init(void)
 	 * during boot and have been initialized
 	 */
 	for (i = 0; i < NR_MEM_SECTIONS; i++) {
-		if (!valid_section_nr(i))
+		if (!present_section_nr(i))
 			continue;
 		err = add_memory_block(0, __nr_to_section(i), MEM_ONLINE, 0);
 		if (!ret)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 26341a6..f83317b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -792,12 +792,17 @@ static inline struct page *__section_mem_map_addr(struct mem_section *section)
 	return (struct page *)map;
 }
 
-static inline int valid_section(struct mem_section *section)
+static inline int present_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_MARKED_PRESENT));
 }
 
-static inline int section_has_mem_map(struct mem_section *section)
+static inline int present_section_nr(unsigned long nr)
+{
+	return present_section(__nr_to_section(nr));
+}
+
+static inline int valid_section(struct mem_section *section)
 {
 	return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
 }
@@ -819,6 +824,13 @@ static inline int pfn_valid(unsigned long pfn)
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
 
+static inline int pfn_present(unsigned long pfn)
+{
+        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+                return 0;
+        return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
+}
+
 /*
  * These are _only_ used during initialisation, therefore they
  * can use __initdata ...  They could have names to indicate
diff --git a/mm/sparse.c b/mm/sparse.c
index ec6ead6..d6678ab 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -170,7 +170,7 @@ unsigned long __init node_memmap_size_bytes(int nid, unsigned long start_pfn,
 		if (nid != early_pfn_to_nid(pfn))
 			continue;
 
-		if (pfn_valid(pfn))
+		if (pfn_present(pfn))
 			nr_pages += PAGES_PER_SECTION;
 	}
 
@@ -201,11 +201,12 @@ static int __meminit sparse_init_one_section(struct mem_section *ms,
 		unsigned long pnum, struct page *mem_map,
 		unsigned long *pageblock_bitmap)
 {
-	if (!valid_section(ms))
+	if (!present_section(ms))
 		return -EINVAL;
 
 	ms->section_mem_map &= ~SECTION_MAP_MASK;
-	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum);
+	ms->section_mem_map |= sparse_encode_mem_map(mem_map, pnum) |
+							SECTION_HAS_MEM_MAP;
 	ms->pageblock_flags = pageblock_bitmap;
 
 	return 1;
@@ -282,7 +283,7 @@ void __init sparse_init(void)
 	unsigned long *usemap;
 
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
-		if (!valid_section_nr(pnum))
+		if (!present_section_nr(pnum))
 			continue;
 
 		map = sparse_early_mem_map_alloc(pnum);


* [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
  2007-07-13 13:35 ` [PATCH 1/7] sparsemem: clean up spelling error in comments Andy Whitcroft
  2007-07-13 13:35 ` [PATCH 2/7] sparsemem: record when a section has a valid mem_map Andy Whitcroft
@ 2007-07-13 13:36 ` Andy Whitcroft
  2007-07-13 14:51   ` KAMEZAWA Hiroyuki
  2007-07-14 15:20   ` Christoph Hellwig
  2007-07-13 13:36 ` [PATCH 4/7] x86_64: SPARSEMEM_VMEMMAP 2M page size support Andy Whitcroft
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:36 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


SPARSEMEM is a pretty nice framework that unifies quite a bit of
code over all the arches. It would be great if it could be the
default so that we can get rid of various forms of DISCONTIG and
other variations on memory maps. So far what has hindered this is
the additional lookups that SPARSEMEM introduces for virt_to_page
and page_address. It goes so far that the code to do this has to
be kept in a separate function and cannot be inlined.

This patch introduces a virtual memmap mode for SPARSEMEM, in which
the memmap is mapped into a virtually contiguous area and only the
active sections are physically backed.  This allows virt_to_page,
page_address and cohorts to become simple shift/add operations.
No page flag fields, no table lookups, nothing involving memory
is required.

The two key operations pfn_to_page and page_to_pfn become:

   #define __pfn_to_page(pfn)      (vmemmap + (pfn))
   #define __page_to_pfn(page)     ((page) - vmemmap)

By having a virtual mapping for the memmap we allow simple access
without wasting physical memory.  As kernel memory is typically
already mapped 1:1 this introduces no additional overhead.
The virtual mapping must be big enough to allow a struct page to
be allocated and mapped for all valid physical pages.  This will
make a virtual memmap difficult to use on 32 bit platforms that
support 36 address bits.

However, if there is enough virtual space available and the arch
already maps its 1-1 kernel space using TLBs (e.g. true of IA64
and x86_64) then this technique makes SPARSEMEM lookups even more
efficient than CONFIG_FLATMEM.  FLATMEM needs to read the contents
of the mem_map variable to get the start of the memmap and then add
the offset to the required entry.  vmemmap is a constant to which
we can simply add the offset.

This patch has the potential to allow us to make SPARSEMEM the default
(and even the only) option for most systems.  It should be optimal
on UP, SMP and NUMA on most platforms.  Then we may even be able
to remove the other memory models: FLATMEM, DISCONTIG etc.

[apw@shadowen.org: config cleanups, resplit code etc]
From: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index 30d8d33..52226e1 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -46,6 +46,12 @@
 	 __pgdat->node_start_pfn;					\
 })
 
+#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
+
+/* memmap is virtually contiguous.  */
+#define __pfn_to_page(pfn)	(vmemmap + (pfn))
+#define __page_to_pfn(page)	((page) - vmemmap)
+
 #elif defined(CONFIG_SPARSEMEM)
 /*
  * Note: section's mem_map is encorded to reflect its start_pfn.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 69f4210..e9d8c32 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1261,5 +1261,10 @@ extern int randomize_va_space;
 
 __attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma);
 
+int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
+int vmemmap_populate_pmd(pud_t *, unsigned long, unsigned long, int);
+void *vmemmap_alloc_block(unsigned long size, int node);
+void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/sparse.c b/mm/sparse.c
index d6678ab..5cc6e74 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -9,6 +9,8 @@
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
 #include <asm/dma.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
 
 /*
  * Permanent SPARSEMEM data:
@@ -218,6 +220,192 @@ void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
 	return NULL;
 }
 
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+/*
+ * Virtual Memory Map support
+ *
+ * (C) 2007 sgi. Christoph Lameter <clameter@sgi.com>.
+ *
+ * Virtual memory maps allow VM primitives pfn_to_page, page_to_pfn,
+ * virt_to_page, page_address() to be implemented as a base offset
+ * calculation without memory access.
+ *
+ * However, virtual mappings need a page table and TLBs. Many Linux
+ * architectures already map their physical space using 1-1 mappings
+ * via TLBs. For those arches the virtual memory map is essentially
+ * for free if we use the same page size as the 1-1 mappings. In that
+ * case the overhead consists of a few additional pages that are
+ * allocated to create a view of memory for vmemmap.
+ *
+ * Special Kconfig settings:
+ *
+ * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
+ *
+ * 	The architecture has its own functions to populate the memory
+ * 	map and provides a vmemmap_populate function.
+ *
+ * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
+ *
+ * 	The architecture provides functions to populate the pmd level
+ * 	of the vmemmap mappings.  Allowing mappings using large pages
+ * 	where available.
+ *
+ * 	If neither are set then PAGE_SIZE mappings are generated which
+ * 	require one PTE/TLB per PAGE_SIZE chunk of the virtual memory map.
+ */
+
+/*
+ * Allocate a block of memory to be used to back the virtual memory map
+ * or to back the page tables that are used to create the mapping.
+ * Uses the main allocators if they are available, else bootmem.
+ */
+void * __meminit vmemmap_alloc_block(unsigned long size, int node)
+{
+	/* If the main allocator is up use that, fallback to bootmem. */
+	if (slab_is_available()) {
+		struct page *page = alloc_pages_node(node,
+				GFP_KERNEL | __GFP_ZERO, get_order(size));
+		if (page)
+			return page_address(page);
+		return NULL;
+	} else
+		return __alloc_bootmem_node(NODE_DATA(node), size, size,
+				__pa(MAX_DMA_ADDRESS));
+}
+
+#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
+void __meminit vmemmap_verify(pte_t *pte, int node,
+				unsigned long start, unsigned long end)
+{
+	unsigned long pfn = pte_pfn(*pte);
+	int actual_node = early_pfn_to_nid(pfn);
+
+	if (actual_node != node)
+		printk(KERN_WARNING "[%lx-%lx] potential offnode "
+			"page_structs\n", start, end - 1);
+}
+
+#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
+static int __meminit vmemmap_populate_pte(pmd_t *pmd, unsigned long addr,
+					unsigned long end, int node)
+{
+	pte_t *pte;
+
+	for (pte = pte_offset_map(pmd, addr); addr < end;
+						pte++, addr += PAGE_SIZE)
+		if (pte_none(*pte)) {
+			pte_t entry;
+			void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+			if (!p)
+				return -ENOMEM;
+
+			entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+			set_pte(pte, entry);
+
+			printk(KERN_DEBUG "[%lx-%lx] PTE ->%p on node %d\n",
+				addr, addr + PAGE_SIZE - 1, p, node);
+
+		} else
+			vmemmap_verify(pte, node, addr + PAGE_SIZE, end);
+
+	return 0;
+}
+
+int __meminit vmemmap_populate_pmd(pud_t *pud, unsigned long addr,
+						unsigned long end, int node)
+{
+	pmd_t *pmd;
+	int error = 0;
+
+	for (pmd = pmd_offset(pud, addr); addr < end && !error;
+						pmd++, addr += PMD_SIZE) {
+		if (pmd_none(*pmd)) {
+			void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+			if (!p)
+				return -ENOMEM;
+
+			pmd_populate_kernel(&init_mm, pmd, p);
+		} else
+			vmemmap_verify((pte_t *)pmd, node,
+					pmd_addr_end(addr, end), end);
+
+		error = vmemmap_populate_pte(pmd, addr,
+					pmd_addr_end(addr, end), node);
+	}
+	return error;
+}
+#endif /* CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD */
+
+static int __meminit vmemmap_populate_pud(pgd_t *pgd, unsigned long addr,
+						unsigned long end, int node)
+{
+	pud_t *pud;
+	int error = 0;
+
+	for (pud = pud_offset(pgd, addr); addr < end && !error;
+						pud++, addr += PUD_SIZE) {
+		if (pud_none(*pud)) {
+			void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+			if (!p)
+				return -ENOMEM;
+
+			pud_populate(&init_mm, pud, p);
+		}
+		error = vmemmap_populate_pmd(pud, addr,
+					pud_addr_end(addr, end), node);
+	}
+	return error;
+}
+
+int __meminit vmemmap_populate(struct page *start_page,
+						unsigned long nr, int node)
+{
+	pgd_t *pgd;
+	unsigned long addr = (unsigned long)start_page;
+	unsigned long end = (unsigned long)(start_page + nr);
+	int error = 0;
+
+	printk(KERN_DEBUG "[%lx-%lx] Virtual memory section"
+		" (%ld pages) node %d\n", addr, end - 1, nr, node);
+
+	for (pgd = pgd_offset_k(addr); addr < end && !error;
+					pgd++, addr += PGDIR_SIZE) {
+		if (pgd_none(*pgd)) {
+			void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+			if (!p)
+				return -ENOMEM;
+
+			pgd_populate(&init_mm, pgd, p);
+		}
+		error = vmemmap_populate_pud(pgd, addr,
+					pgd_addr_end(addr, end), node);
+	}
+	return error;
+}
+#endif /* !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP */
+
+static struct page * __init sparse_early_mem_map_alloc(unsigned long pnum)
+{
+	struct page *map;
+	struct mem_section *ms = __nr_to_section(pnum);
+	int nid = sparse_early_nid(ms);
+	int error;
+
+	map = pfn_to_page(pnum * PAGES_PER_SECTION);
+	error = vmemmap_populate(map, PAGES_PER_SECTION, nid);
+	if (error) {
+		printk(KERN_ERR "%s: allocation failed. Error=%d\n",
+							__FUNCTION__, error);
+		printk(KERN_ERR "%s: virtual memory map backing failed "
+			"some memory will not be available.\n", __FUNCTION__);
+		ms->section_mem_map = 0;
+		return NULL;
+	}
+	return map;
+}
+
+#else /* CONFIG_SPARSEMEM_VMEMMAP */
+
 static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
@@ -242,6 +430,7 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 	ms->section_mem_map = 0;
 	return NULL;
 }
+#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
 
 static unsigned long usemap_size(void)
 {


* [PATCH 4/7] x86_64: SPARSEMEM_VMEMMAP 2M page size support
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
                   ` (2 preceding siblings ...)
  2007-07-13 13:36 ` [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM Andy Whitcroft
@ 2007-07-13 13:36 ` Andy Whitcroft
  2007-07-19 23:25   ` Andrew Morton
  2007-07-13 13:37 ` [PATCH 5/7] IA64: SPARSEMEM_VMEMMAP 16K " Andy Whitcroft
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:36 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


x86_64 uses 2M page table entries to map its 1-1 kernel space.
We also implement the virtual memmap using 2M page table entries,
so there is no additional runtime overhead over FLATMEM; only
initialisation is slightly more complex.  As FLATMEM still references
memory to obtain the mem_map pointer and SPARSEMEM_VMEMMAP uses a
compile-time constant, SPARSEMEM_VMEMMAP should be superior.

With this SPARSEMEM becomes the most efficient way of handling
virt_to_page, pfn_to_page and friends for UP, SMP and NUMA on x86_64.

[apw@shadowen.org: code resplit, style fixups]
From: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86_64/mm.txt
index f42798e..b89b6d2 100644
--- a/Documentation/x86_64/mm.txt
+++ b/Documentation/x86_64/mm.txt
@@ -9,6 +9,7 @@ ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
 ffff810000000000 - ffffc0ffffffffff (=46 bits) direct mapping of all phys. memory
 ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
 ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
+ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB)
 ... unused hole ...
 ffffffff80000000 - ffffffff82800000 (=40 MB)   kernel text mapping, from phys 0
 ... unused hole ...
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 9a7a66f..603afa2 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -418,6 +418,14 @@ config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	depends on (NUMA || EXPERIMENTAL)
 
+config SPARSEMEM_VMEMMAP
+	def_bool y
+	depends on SPARSEMEM
+
+config ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
+	def_bool y
+	depends on SPARSEMEM_VMEMMAP
+
 config ARCH_MEMORY_PROBE
 	def_bool y
 	depends on MEMORY_HOTPLUG
diff --git a/arch/x86_64/mm/init.c b/arch/x86_64/mm/init.c
index 955c98e..1b52c76 100644
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@@ -752,3 +752,33 @@ const char *arch_vma_name(struct vm_area_struct *vma)
 		return "[vsyscall]";
 	return NULL;
 }
+
+#ifdef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
+/*
+ * Initialise the sparsemem vmemmap using huge-pages at the PMD level.
+ */
+int __meminit vmemmap_populate_pmd(pud_t *pud, unsigned long addr,
+						unsigned long end, int node)
+{
+	pmd_t *pmd;
+
+	for (pmd = pmd_offset(pud, addr); addr < end;
+						pmd++, addr += PMD_SIZE)
+		if (pmd_none(*pmd)) {
+			pte_t entry;
+			void *p = vmemmap_alloc_block(PMD_SIZE, node);
+			if (!p)
+				return -ENOMEM;
+
+			entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+			mk_pte_huge(entry);
+			set_pmd(pmd, __pmd(pte_val(entry)));
+
+			printk(KERN_DEBUG " [%lx-%lx] PMD ->%p on node %d\n",
+				addr, addr + PMD_SIZE - 1, p, node);
+		} else
+			vmemmap_verify((pte_t *)pmd, node,
+						pmd_addr_end(addr, end), end);
+	return 0;
+}
+#endif
diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index 88adf1a..c3b52bc 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -134,6 +134,7 @@ extern unsigned long __phys_addr(unsigned long);
 	 VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
 #define __HAVE_ARCH_GATE_AREA 1	
+#define vmemmap ((struct page *)VMEMMAP_START)
 
 #include <asm-generic/memory_model.h>
 #include <asm-generic/page.h>
diff --git a/include/asm-x86_64/pgtable.h b/include/asm-x86_64/pgtable.h
index 5674f4a..f7e759f 100644
--- a/include/asm-x86_64/pgtable.h
+++ b/include/asm-x86_64/pgtable.h
@@ -137,6 +137,7 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long
 #define MAXMEM		 _AC(0x3fffffffffff, UL)
 #define VMALLOC_START    _AC(0xffffc20000000000, UL)
 #define VMALLOC_END      _AC(0xffffe1ffffffffff, UL)
+#define VMEMMAP_START	 _AC(0xffffe20000000000, UL)
 #define MODULES_VADDR    _AC(0xffffffff88000000, UL)
 #define MODULES_END      _AC(0xfffffffffff00000, UL)
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)


* [PATCH 5/7] IA64: SPARSEMEM_VMEMMAP 16K page size support
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
                   ` (3 preceding siblings ...)
  2007-07-13 13:36 ` [PATCH 4/7] x86_64: SPARSEMEM_VMEMMAP 2M page size support Andy Whitcroft
@ 2007-07-13 13:37 ` Andy Whitcroft
  2007-07-13 13:37 ` [PATCH 6/7] SPARC64: SPARSEMEM_VMEMMAP support Andy Whitcroft
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


Equip IA64 sparsemem with a virtual memmap. This is similar to the
existing CONFIG_VIRTUAL_MEM_MAP functionality for DISCONTIGMEM.
It uses a PAGE_SIZE mapping.

This is provided as a minimally intrusive solution. We split the
128TB VMALLOC area into two 64TB areas and use one for the virtual
memmap.

This should replace CONFIG_VIRTUAL_MEM_MAP long term.

From: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 7a2bd33..ac91a3f 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -360,6 +360,10 @@ config ARCH_SPARSEMEM_ENABLE
 	def_bool y
 	depends on ARCH_DISCONTIGMEM_ENABLE
 
+config SPARSEMEM_VMEMMAP
+	def_bool y
+	depends on SPARSEMEM
+
 config ARCH_DISCONTIGMEM_DEFAULT
 	def_bool y if (IA64_SGI_SN2 || IA64_GENERIC || IA64_HP_ZX1 || IA64_HP_ZX1_SWIOTLB)
 	depends on ARCH_DISCONTIGMEM_ENABLE
diff --git a/include/asm-ia64/pgtable.h b/include/asm-ia64/pgtable.h
index f923d81..033e21d 100644
--- a/include/asm-ia64/pgtable.h
+++ b/include/asm-ia64/pgtable.h
@@ -236,8 +236,14 @@ ia64_phys_addr_valid (unsigned long addr)
 # define VMALLOC_END		vmalloc_end
   extern unsigned long vmalloc_end;
 #else
+#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_SPARSEMEM_VMEMMAP)
+/* SPARSEMEM_VMEMMAP uses half of vmalloc... */
+# define VMALLOC_END		(RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 10)))
+# define vmemmap		((struct page *)VMALLOC_END)
+#else
 # define VMALLOC_END		(RGN_BASE(RGN_GATE) + (1UL << (4*PAGE_SHIFT - 9)))
 #endif
+#endif
 
 /* fs/proc/kcore.c */
 #define	kc_vaddr_to_offset(v) ((v) - RGN_BASE(RGN_GATE))


* [PATCH 6/7] SPARC64: SPARSEMEM_VMEMMAP support
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
                   ` (4 preceding siblings ...)
  2007-07-13 13:37 ` [PATCH 5/7] IA64: SPARSEMEM_VMEMMAP 16K " Andy Whitcroft
@ 2007-07-13 13:37 ` Andy Whitcroft
  2007-07-13 17:00   ` Christoph Lameter
  2007-07-13 13:38 ` [PATCH 7/7] ppc64: " Andy Whitcroft
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:37 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


Hey Christoph, here is sparc64 support for this stuff.

After implementing this and seeing more and more how it works, I
really like it :-)

Thanks a lot for doing this work Christoph!

[apw@shadowen.org: style fixups]
From: David Miller <davem@davemloft.net>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig
index c73830d..9344dce 100644
--- a/arch/sparc64/Kconfig
+++ b/arch/sparc64/Kconfig
@@ -228,10 +228,17 @@ config ARCH_SPARSEMEM_ENABLE
 
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool y
-	select SPARSEMEM_STATIC
 
 source "mm/Kconfig"
 
+config SPARSEMEM_VMEMMAP
+	def_bool y
+	depends on SPARSEMEM
+
+config ARCH_POPULATES_SPARSEMEM_VMEMMAP
+	def_bool y
+	depends on SPARSEMEM_VMEMMAP
+
 config ISA
 	bool
 	help
diff --git a/arch/sparc64/kernel/ktlb.S b/arch/sparc64/kernel/ktlb.S
index d4024ac..964527d 100644
--- a/arch/sparc64/kernel/ktlb.S
+++ b/arch/sparc64/kernel/ktlb.S
@@ -226,6 +226,15 @@ kvmap_dtlb_load:
 	ba,pt		%xcc, sun4v_dtlb_load
 	 mov		%g5, %g3
 
+kvmap_vmemmap:
+	sub		%g4, %g5, %g5
+	srlx		%g5, 22, %g5
+	sethi		%hi(vmemmap_table), %g1
+	sllx		%g5, 3, %g5
+	or		%g1, %lo(vmemmap_table), %g1
+	ba,pt		%xcc, kvmap_dtlb_load
+	 ldx		[%g1 + %g5], %g5
+
 kvmap_dtlb_nonlinear:
 	/* Catch kernel NULL pointer derefs.  */
 	sethi		%hi(PAGE_SIZE), %g5
@@ -233,6 +242,13 @@ kvmap_dtlb_nonlinear:
 	bleu,pn		%xcc, kvmap_dtlb_longpath
 	 nop
 
+	/* Do not use the TSB for vmemmap.  */
+	mov		(VMEMMAP_BASE >> 24), %g5
+	sllx		%g5, 24, %g5
+	cmp		%g4,%g5
+	bgeu,pn		%xcc, kvmap_vmemmap
+	 nop
+
 	KERN_TSB_LOOKUP_TL1(%g4, %g6, %g5, %g1, %g2, %g3, kvmap_dtlb_load)
 
 kvmap_dtlb_tsbmiss:
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index 3010227..9bb6688 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -1647,6 +1647,58 @@ EXPORT_SYMBOL(_PAGE_E);
 unsigned long _PAGE_CACHE __read_mostly;
 EXPORT_SYMBOL(_PAGE_CACHE);
 
+#ifdef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
+
+#define VMEMMAP_CHUNK_SHIFT	22
+#define VMEMMAP_CHUNK		(1UL << VMEMMAP_CHUNK_SHIFT)
+#define VMEMMAP_CHUNK_MASK	~(VMEMMAP_CHUNK - 1UL)
+#define VMEMMAP_ALIGN(x)	(((x)+VMEMMAP_CHUNK-1UL)&VMEMMAP_CHUNK_MASK)
+
+#define VMEMMAP_SIZE	((((1UL << MAX_PHYSADDR_BITS) >> PAGE_SHIFT) * \
+			  sizeof(struct page *)) >> VMEMMAP_CHUNK_SHIFT)
+unsigned long vmemmap_table[VMEMMAP_SIZE];
+
+int __meminit vmemmap_populate(struct page *start, unsigned long nr, int node)
+{
+	unsigned long vstart = (unsigned long) start;
+	unsigned long vend = (unsigned long) (start + nr);
+	unsigned long phys_start = (vstart - VMEMMAP_BASE);
+	unsigned long phys_end = (vend - VMEMMAP_BASE);
+	unsigned long addr = phys_start & VMEMMAP_CHUNK_MASK;
+	unsigned long end = VMEMMAP_ALIGN(phys_end);
+	unsigned long pte_base;
+
+	pte_base = (_PAGE_VALID | _PAGE_SZ4MB_4U |
+		    _PAGE_CP_4U | _PAGE_CV_4U |
+		    _PAGE_P_4U | _PAGE_W_4U);
+	if (tlb_type == hypervisor)
+		pte_base = (_PAGE_VALID | _PAGE_SZ4MB_4V |
+			    _PAGE_CP_4V | _PAGE_CV_4V |
+			    _PAGE_P_4V | _PAGE_W_4V);
+
+	for (; addr < end; addr += VMEMMAP_CHUNK) {
+		unsigned long *vmem_pp =
+			vmemmap_table + (addr >> VMEMMAP_CHUNK_SHIFT);
+		void *block;
+
+		if (!(*vmem_pp & _PAGE_VALID)) {
+			block = vmemmap_alloc_block(1UL << 22, node);
+			if (!block)
+				return -ENOMEM;
+
+			*vmem_pp = pte_base | __pa(block);
+
+			printk(KERN_INFO "[%p-%p] page_structs=%lu "
+			       "node=%d entry=%lu/%lu\n", start, block, nr,
+			       node,
+			       addr >> VMEMMAP_CHUNK_SHIFT,
+			       VMEMMAP_SIZE >> VMEMMAP_CHUNK_SHIFT);
+		}
+	}
+	return 0;
+}
+#endif /* CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP */
+
 static void prot_init_common(unsigned long page_none,
 			     unsigned long page_shared,
 			     unsigned long page_copy,
diff --git a/include/asm-sparc64/pgtable.h b/include/asm-sparc64/pgtable.h
index 0393380..3167ccf 100644
--- a/include/asm-sparc64/pgtable.h
+++ b/include/asm-sparc64/pgtable.h
@@ -42,6 +42,9 @@
 #define HI_OBP_ADDRESS		_AC(0x0000000100000000,UL)
 #define VMALLOC_START		_AC(0x0000000100000000,UL)
 #define VMALLOC_END		_AC(0x0000000200000000,UL)
+#define VMEMMAP_BASE		_AC(0x0000000200000000,UL)
+
+#define vmemmap			((struct page *)VMEMMAP_BASE)
 
 /* XXX All of this needs to be rethought so we can take advantage
  * XXX cheetah's full 64-bit virtual address space, ie. no more hole

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 7/7] ppc64: SPARSEMEM_VMEMMAP support
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
                   ` (5 preceding siblings ...)
  2007-07-13 13:37 ` [PATCH 6/7] SPARC64: SPARSEMEM_VMEMMAP support Andy Whitcroft
@ 2007-07-13 13:38 ` Andy Whitcroft
  2007-07-13 17:04 ` [PATCH 0/7] Sparsemem Virtual Memmap V5 Christoph Lameter
  2007-07-26  8:05 ` Paul Mundt
  8 siblings, 0 replies; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-13 13:38 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman,
	Andy Whitcroft


Enable virtual memmap support for SPARSEMEM on PPC64 systems.
Slice a 16th off the end of the linear mapping space and use that
to hold the vmemmap.  It uses the same mapping size as is used in
the linear 1:1 kernel mapping.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5c5c487..c1212f0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -277,6 +277,14 @@ config ARCH_POPULATES_NODE_MAP
 
 source "mm/Kconfig"
 
+config SPARSEMEM_VMEMMAP
+	def_bool y
+	depends on SPARSEMEM
+
+config ARCH_POPULATES_SPARSEMEM_VMEMMAP
+	def_bool y
+	depends on SPARSEMEM_VMEMMAP
+
 config ARCH_MEMORY_PROBE
 	def_bool y
 	depends on MEMORY_HOTPLUG
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 1d6edf7..2de3b5d 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -182,3 +182,67 @@ void pgtable_cache_init(void)
 						     NULL);
 	}
 }
+
+#ifdef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
+
+/*
+ * Convert an address within the vmemmap into a pfn.  Note that we have
+ * to do this by hand as the proffered address may not be correctly aligned.
+ * Subtraction of non-aligned pointers produces undefined results.
+ */
+#define VMM_SECTION(addr) \
+		(((((unsigned long)(addr)) - ((unsigned long)(vmemmap))) / \
+		sizeof(struct page)) >> PFN_SECTION_SHIFT)
+#define VMM_SECTION_PAGE(addr)	(VMM_SECTION(addr) << PFN_SECTION_SHIFT)
+
+/*
+ * Check if this vmemmap page is already initialised.  If any section
+ * which overlaps this vmemmap page is initialised then this page is
+ * initialised already.
+ */
+int __meminit vmemmap_populated(unsigned long start, int page_size)
+{
+	unsigned long end = start + page_size;
+
+	for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page)))
+		if (pfn_valid(VMM_SECTION_PAGE(start)))
+			return 1;
+
+	return 0;
+}
+
+int __meminit vmemmap_populate(struct page *start_page,
+					unsigned long nr_pages, int node)
+{
+	unsigned long mode_rw;
+	unsigned long start = (unsigned long)start_page;
+	unsigned long end = (unsigned long)(start_page + nr_pages);
+	unsigned long page_size = 1 << mmu_psize_defs[mmu_linear_psize].shift;
+
+	mode_rw = _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_COHERENT | PP_RWXX;
+
+	/* Align to the page size of the linear mapping. */
+	start = _ALIGN_DOWN(start, page_size);
+
+	for (; start < end; start += page_size) {
+		int mapped;
+		void *p;
+
+		if (vmemmap_populated(start, page_size))
+			continue;
+
+		p = vmemmap_alloc_block(page_size, node);
+		if (!p)
+			return -ENOMEM;
+
+		printk(KERN_WARNING "vmemmap %08lx allocated at %p, "
+					"physical %p.\n", start, p, __pa(p));
+
+		mapped = htab_bolt_mapping(start, start + page_size,
+					__pa(p), mode_rw, mmu_linear_psize);
+		BUG_ON(mapped < 0);
+	}
+
+	return 0;
+}
+#endif
diff --git a/include/asm-powerpc/pgtable-ppc64.h b/include/asm-powerpc/pgtable-ppc64.h
index 7ca8b5c..9577650 100644
--- a/include/asm-powerpc/pgtable-ppc64.h
+++ b/include/asm-powerpc/pgtable-ppc64.h
@@ -68,6 +68,14 @@
 #define USER_REGION_ID		(0UL)
 
 /*
+ * Defines the address of the vmemmap area, in the top 16th of the
+ * kernel region.
+ */
+#define VMEMMAP_BASE (ASM_CONST(CONFIG_KERNEL_START) + \
+					(0xfUL << (REGION_SHIFT - 4)))
+#define vmemmap ((struct page *)VMEMMAP_BASE)
+
+/*
  * Common bits in a linux-style PTE.  These match the bits in the
  * (hardware-defined) PowerPC PTE as closely as possible. Additional
  * bits may be defined in pgtable-*.h

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 13:36 ` [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM Andy Whitcroft
@ 2007-07-13 14:51   ` KAMEZAWA Hiroyuki
  2007-07-13 22:42     ` Christoph Lameter
  2007-07-14 15:20   ` Christoph Hellwig
  1 sibling, 1 reply; 38+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-13 14:51 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-mm, linux-arch, npiggin, clameter, mel

On Fri, 13 Jul 2007 14:36:08 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> SPARSEMEM is a pretty nice framework that unifies quite a bit of
> code over all the arches. It would be great if it could be the
> default so that we can get rid of various forms of DISCONTIG and
> other variations on memory maps. So far what has hindered this are
> the additional lookups that SPARSEMEM introduces for virt_to_page
> and page_address. This goes so far that the code to do this has to
> be kept in a separate function and cannot be used inline.
> 
Maybe it will be our (my or Goto-san's) work to implement MEMORY_HOTADD
support for this. Could you add a !MEMORY_HOTPLUG dependency in Kconfig?
Then we'll write a patch later.
Or, if you add memory hotplug support yourself, that would be great.

-Kame

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 6/7] SPARC64: SPARSEMEM_VMEMMAP support
  2007-07-13 13:37 ` [PATCH 6/7] SPARC64: SPARSEMEM_VMEMMAP support Andy Whitcroft
@ 2007-07-13 17:00   ` Christoph Lameter
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 17:00 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: linux-mm, linux-arch, Nick Piggin, Mel Gorman

Acked-by: Christoph Lameter <clameter@sgi.com>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
                   ` (6 preceding siblings ...)
  2007-07-13 13:38 ` [PATCH 7/7] ppc64: " Andy Whitcroft
@ 2007-07-13 17:04 ` Christoph Lameter
  2007-07-13 17:40   ` Andrew Morton
  2007-07-26  8:05 ` Paul Mundt
  8 siblings, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 17:04 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, linux-arch, Andy Whitcroft, Nick Piggin, Mel Gorman

On Fri, 13 Jul 2007, Andy Whitcroft wrote:

> Andrew, please consider for -mm.
> 
> Note that I am away from my keyboard all of next week, but I figured
> it better to get this out for testing.

Yes grumble. Why does it take so long...

Would it be possible to merge this for 2.6.23 (maybe late?). This has been 
around for 6 months now. It removes the troubling lookups in 
virt_to_page and page_address in sparsemem that have spooked many of us. 

virt_to_page efficiency is a performance issue for kfree and 
kmem_cache_free in the slab allocators. I inserted probes and saw
that the patchset cuts down the cycles spent in virt_to_page by 50%.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 17:04 ` [PATCH 0/7] Sparsemem Virtual Memmap V5 Christoph Lameter
@ 2007-07-13 17:40   ` Andrew Morton
  2007-07-13 18:23     ` Christoph Lameter
                       ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Andrew Morton @ 2007-07-13 17:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, linux-arch, Andy Whitcroft, Nick Piggin, Mel Gorman

On Fri, 13 Jul 2007 10:04:45 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 13 Jul 2007, Andy Whitcroft wrote:
> 
> > Andrew, please consider for -mm.
> > 
> > Note that I am away from my keyboard all of next week, but I figured
> > it better to get this out for testing.
> 
> Yes grumble. Why does it take so long...

gaah, I read linux-arch and linux-mm rather intermittently and I haven't
even seen these yet.

> Would it be possible to merge this for 2.6.23 (maybe late?).

It would be nice to see a bit of spirited reviewing from the affected arch
maintainers and mm people...

There's already an enormous amount of mm stuff banked up and it looks like
I get to hold onto a lot of that until 2.6.24.  We seem to be spending too
much time on the first 90% of new stuff and too little time on the last
10% of existing stuff.


> This has been 
> around for 6 months now. It removes the troubling lookups in 
> virt_to_page and page_address in sparsemem that have spooked many of us. 
> 
> virt_to_page efficiency is a performance issue for kfree and 
> kmem_cache_free in the slab allocators. I inserted probes and saw 
> that the patchset cuts down the cycles spent in virt_to_page by 50%.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 17:40   ` Andrew Morton
@ 2007-07-13 18:23     ` Christoph Lameter
  2007-07-14  8:57       ` Russell King
  2007-07-13 20:08     ` Roman Zippel
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 18:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-arch, Andy Whitcroft, Nick Piggin, Mel Gorman

On Fri, 13 Jul 2007, Andrew Morton wrote:

> It would be nice to see a bit of spirited reviewing from the affected arch
> maintainers and mm people...

That was already done a long time ago. Maybe you do not remember it.

See 
http://marc.info/?l=linux-kernel&m=117556067909158&w=2
http://marc.info/?l=linux-kernel&m=117598342420719&w=2
http://marc.info/?l=linux-kernel&m=117541139915535&w=2
http://marc.info/?l=linux-kernel&m=116556142519461&w=2

> There's already an enormous amount of mm stuff banked up and it looks like
> I get to hold onto a lot of that until 2.6.24.  We seem to be spending too
> little time on the first 90% of new stuff and too little time on the last
> 10% of existing stuff.

Well, without this we cannot clean up the miscellaneous memory models.
The longer this is held up, the longer discontig etc. will stay in the
tree with all the associated #ifdeffery.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 17:40   ` Andrew Morton
  2007-07-13 18:23     ` Christoph Lameter
@ 2007-07-13 20:08     ` Roman Zippel
  2007-07-13 22:02     ` Luck, Tony
  2007-07-13 22:43     ` David Miller
  3 siblings, 0 replies; 38+ messages in thread
From: Roman Zippel @ 2007-07-13 20:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, linux-mm, linux-arch, Andy Whitcroft,
	Nick Piggin, Mel Gorman

Hi,

On Friday 13 July 2007, Andrew Morton wrote:

> > Would it be possible to merge this for 2.6.23 (maybe late?).
>
> It would be nice to see a bit of spirited reviewing from the affected arch
> maintainers and mm people...

As far as m68k is concerned I like it, especially that it gets rid of the 
explicit table lookup. :)

bye, Roman

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 17:40   ` Andrew Morton
  2007-07-13 18:23     ` Christoph Lameter
  2007-07-13 20:08     ` Roman Zippel
@ 2007-07-13 22:02     ` Luck, Tony
  2007-07-13 22:21       ` Christoph Lameter
  2007-07-13 22:43     ` David Miller
  3 siblings, 1 reply; 38+ messages in thread
From: Luck, Tony @ 2007-07-13 22:02 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter
  Cc: linux-mm, linux-arch, Andy Whitcroft, Nick Piggin, Mel Gorman

> It would be nice to see a bit of spirited reviewing from the affected arch
> maintainers and mm people...

I'm 100% in favour of the direction this patch is taking ... eventually
it will allow getting rid of several config options, and thus 2^several
fewer config combinations to test.

On the question of whether it should be squeezed into 2.6.23 ... I have
mixed feelings.  On the negative side:

1) There is a small performance regression for ia64 (which is promised
to go away when bigger pages are used for the mem_map, but I'd like to
see that this really does fix the issue).

2) Fujitsu pointed out that there is work to be done to port HOTPLUG
code to this.

On the positive side:
1) There are few ia64 developers working on -mm ... so progress will
continue to be glacial unless this goes into mainline.

2) The patch appears to co-exist with all the existing CONFIG options,
so it doesn't break anything (well, all my test configs still compile
cleanly ... I haven't actually test booted them all yet).

Finally one gripe with the current version of the patch.  This debug
trace is WAY too verbose during boot!

mm/sparse.c
+			printk(KERN_DEBUG "[%lx-%lx] PTE ->%p on node %d\n",
+				addr, addr + PAGE_SIZE - 1, p, node);

-Tony

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 22:02     ` Luck, Tony
@ 2007-07-13 22:21       ` Christoph Lameter
  2007-07-13 22:37         ` Luck, Tony
  2007-07-14  8:49         ` Nick Piggin
  0 siblings, 2 replies; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 22:21 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andrew Morton, linux-mm, linux-arch, Andy Whitcroft, Nick Piggin,
	Mel Gorman

On Fri, 13 Jul 2007, Luck, Tony wrote:

> 1) There is a small performance regression for ia64 (which is promised
> to go away when bigger pages are used for the mem_map, but I'd like to
> see that this really does fix the issue).

The performance should be better than the existing one since we have even 
less code here than discontig. We do not have to fetch the base anymore or 
check boundaries (discontig was the baseline right?) but we have exactly 
the same method of pfn_to_page and page_to_pfn as discontig/vmemmap.

These types of variation may come about due to the concurrency in memory 
detection / reservations in the PROM on Altix systems which results in 
variances in the placement of key memory areas. Performance often varies 
slightly because of these issues.

If the performance testing was done on an Altix, then the solution is to
redo the tests a couple of times, rebooting the box each time. Or redo
them on an SMP box that does not have these variations.

How many tests were done and on what platform?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 22:21       ` Christoph Lameter
@ 2007-07-13 22:37         ` Luck, Tony
  2007-07-13 22:54           ` Christoph Lameter
  2007-07-14  8:49         ` Nick Piggin
  1 sibling, 1 reply; 38+ messages in thread
From: Luck, Tony @ 2007-07-13 22:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, linux-arch, Andy Whitcroft, Nick Piggin,
	Mel Gorman

> How many tests were done and on what platform?

Andy's part 0/7 post starts off with the performance numbers.  He
didn't say which ia64 platform was used for the tests.

Looking at my logs for the last few kernel builds (some built on a
tiger_defconfig kernel which uses CONFIG_VIRTUAL_MEM_MAP=y, and
some with the new CONFIG_SPARSEMEM_VMEMMAP), I'd have a tough time
saying whether there was a regression or not.

-Tony

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 14:51   ` KAMEZAWA Hiroyuki
@ 2007-07-13 22:42     ` Christoph Lameter
  2007-07-13 23:12       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 22:42 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: Andy Whitcroft, linux-mm, linux-arch, npiggin, mel

On Fri, 13 Jul 2007, KAMEZAWA Hiroyuki wrote:

> On Fri, 13 Jul 2007 14:36:08 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
> > SPARSEMEM is a pretty nice framework that unifies quite a bit of
> > code over all the arches. It would be great if it could be the
> > default so that we can get rid of various forms of DISCONTIG and
> > other variations on memory maps. So far what has hindered this are
> > the additional lookups that SPARSEMEM introduces for virt_to_page
> > and page_address. This goes so far that the code to do this has to
> > be kept in a separate function and cannot be used inline.
> > 
> Maybe it will be our(my or Goto-san's) work to implement MEMORY_HOTADD support
> for this. Could you add !MEMORY_HOTPLUG in Kconfig ? Then, we'll write
> patch later.
> Or..If you'll add memory hotplug support by yourself, It's great, 

Why would hotadd not work as is?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 17:40   ` Andrew Morton
                       ` (2 preceding siblings ...)
  2007-07-13 22:02     ` Luck, Tony
@ 2007-07-13 22:43     ` David Miller
  3 siblings, 0 replies; 38+ messages in thread
From: David Miller @ 2007-07-13 22:43 UTC (permalink / raw)
  To: akpm; +Cc: clameter, linux-mm, linux-arch, apw, npiggin, mel

From: Andrew Morton <akpm@linux-foundation.org>
Date: Fri, 13 Jul 2007 10:40:44 -0700

> On Fri, 13 Jul 2007 10:04:45 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> 
> > Would it be possible to merge this for 2.6.23 (maybe late?).
> 
> It would be nice to see a bit of spirited reviewing from the affected arch
> maintainers and mm people...

I have no objection to this work and would like to see it go
in sooner rather than later.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 22:37         ` Luck, Tony
@ 2007-07-13 22:54           ` Christoph Lameter
  2007-07-13 23:27             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 22:54 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andrew Morton, linux-mm, linux-arch, Andy Whitcroft, Nick Piggin,
	Mel Gorman

On Fri, 13 Jul 2007, Luck, Tony wrote:

> > How many tests were done and on what platform?
> 
> Andy's part 0/7 post starts off with the performance numbers.  He
> didn't say which ia64 platform was used for the tests.
> 
> Looking my logs for the last few kernel builds (some built on a
> tiger_defconfig kernel which uses CONFIG_VIRTUAL_MEM_MAP=y, and
> some with the new CONFIG_SPARSEMEM_VMEMMAP) I'd have a tough time
> saying whether there was a regression or not).

I'd be very surprised if there is any difference because the IA64 code for 
virtual memmap is the source of ideas and implementation for SPARSE_VIRTUAL.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 22:42     ` Christoph Lameter
@ 2007-07-13 23:12       ` KAMEZAWA Hiroyuki
  2007-07-13 23:17         ` Christoph Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-13 23:12 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: apw, linux-mm, linux-arch, npiggin, mel

On Fri, 13 Jul 2007 15:42:30 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 13 Jul 2007, KAMEZAWA Hiroyuki wrote:
> 
> > On Fri, 13 Jul 2007 14:36:08 +0100
> > Andy Whitcroft <apw@shadowen.org> wrote:
> > 
> > > SPARSEMEM is a pretty nice framework that unifies quite a bit of
> > > code over all the arches. It would be great if it could be the
> > > default so that we can get rid of various forms of DISCONTIG and
> > > other variations on memory maps. So far what has hindered this are
> > > the additional lookups that SPARSEMEM introduces for virt_to_page
> > > and page_address. This goes so far that the code to do this has to
> > > be kept in a separate function and cannot be used inline.
> > > 
> > Maybe it will be our(my or Goto-san's) work to implement MEMORY_HOTADD support
> > for this. Could you add !MEMORY_HOTPLUG in Kconfig ? Then, we'll write
> > patch later.
> > Or..If you'll add memory hotplug support by yourself, It's great, 
> 
> Why would hotadd not work as is?
> 
Just because this patch only takes care of the boot path. It is probably a
small problem. Basically, I welcome this patch; I like it.
If we can remove DISCONTIG+VMEMMAP after this is merged, we can say good-bye
to the terrible CONFIG_HOLES_IN_ZONE :)

Note
From the memory hotplug development/enhancement view, my current thinking is:

 1. A memmap's section is *not* aligned to the "big page size". We have to
    take care of this when adding support for memory hotplug/unplug.

 2. With an appropriate patch, we can allocate a new section's memmap from
    itself. This will reduce the possibility of memory hotplug failure
    because of large kmalloc/vmalloc allocations, and it guarantees locality
    of the memmap. But it may need some amount of work to implement this in
    a clean way. This will depend on vmemmap.

 3. Removing memmap code will be necessary for memory unplug. But there is
    no code for removing the memmap in usual SPARSEMEM, so this is not a
    real problem for vmemmap now.

Thanks,
 -Kame

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 23:12       ` KAMEZAWA Hiroyuki
@ 2007-07-13 23:17         ` Christoph Lameter
  2007-07-13 23:25           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 23:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: apw, linux-mm, linux-arch, npiggin, mel

On Sat, 14 Jul 2007, KAMEZAWA Hiroyuki wrote:

> Just because this patch takes care of boot path. Maybe small problem.

Ahh. I just looked at it. Yes we did not modify the hotplug path. It needs
to call the new vmemmap alloc functions.

> Basically, I welcome this patch. I like this.

Yes you proposed the initial version of this last year. Thanks.

> If we can remove DISCONTIG+VMEMMAP after this is merged, we can say good-bye
> to terrible CONFIG_HOLES_IN_ZONE :)

Right. Horrible stuff. Lots of useless cachelines that have to be
referenced in critical paths.

> Note
> From memory hotplug development/enhancement view, I have following thinking now.
>  
>  1. memmap's section is *not* aligned to "big page size". We have to take care
>     of this at adding support for memory_hotplug/unplug.

You can call the functions for virtual memmap allocation directly. They 
are already generic and will call the page allocator instead of the 
bootmem allocator if the system is already up. They will give you the 
properly aligned memory. Perhaps you can just change a few lines 
in sparse_add_one_section to call the vmemmap functions instead?

>  2. With an appropriate patch, we can allocate new section's memmap from
> >     itself. This will reduce possibility of memory hotplug failure because of
>     large size kmalloc/vmalloc. And it guarantees locality of memmap.
>     But maybe need some amount of work for implementing this in clean way.
>     This will depend on vmemmap.

That is a good idea. Maybe do the simple approach first and then the other 
one?

> 
>  3. removing memmap code for memory unplug will be necessary. But there is no code
>     for removing memmap in usual SPARSEMEM. So this is not real problem of vmemmap
>     now. 

Right. It would have to be added later anyways.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 23:17         ` Christoph Lameter
@ 2007-07-13 23:25           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 38+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-13 23:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: apw, linux-mm, linux-arch, npiggin, mel

On Fri, 13 Jul 2007 16:17:32 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > Note
> > From memory hotplug development/enhancement view, I have following thinking now.
> >  
> >  1. memmap's section is *not* aligned to "big page size". We have to take care
> >     of this at adding support for memory_hotplug/unplug.
> 
> You can call the functions for virtual memmap allocation directly. They 
> are already generic and will call the page allocator instead of the 
> bootmem allocator if the system is already. They will give you the 
> properly aligned memory. Perhaps you can just change a few lines 
> in sparse_add_one_section to call the vmemmap functions instead?
> 
Yes, I think so now. But we'll see "section mismatch" warnings,
because this patch includes something like the following.

==
func() {
	if()
		call_generic_func
	else
		call_boot_func.
}
==



> >  2. With an appropriate patch, we can allocate new section's memmap from
> > >     itself. This will reduce possibility of memory hotplug failure because of
> >     large size kmalloc/vmalloc. And it guarantees locality of memmap.
> >     But maybe need some amount of work for implementing this in clean way.
> >     This will depend on vmemmap.
> 
> That is a good idea. Maybe do the simple approach first and then the other 
> one?
Yes, simple first. The above will be an option for people who use a
big section size, like ia64.

-Kame

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 22:54           ` Christoph Lameter
@ 2007-07-13 23:27             ` KAMEZAWA Hiroyuki
  2007-07-13 23:28               ` Christoph Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-07-13 23:27 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: tony.luck, akpm, linux-mm, linux-arch, apw, npiggin, mel

On Fri, 13 Jul 2007 15:54:25 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 13 Jul 2007, Luck, Tony wrote:
> 
> > > How many tests were done and on what platform?
> > 
> > Andy's part 0/7 post starts off with the performance numbers.  He
> > didn't say which ia64 platform was used for the tests.
> > 
> > Looking my logs for the last few kernel builds (some built on a
> > tiger_defconfig kernel which uses CONFIG_VIRTUAL_MEM_MAP=y, and
> > some with the new CONFIG_SPARSEMEM_VMEMMAP) I'd have a tough time
> > saying whether there was a regression or not).
> 
> I'd be very surprised if there is any difference because the IA64 code for 
> virtual memmap is the source of ideas and implementation for SPARSE_VIRTUAL.
> 
Maybe the pfn_valid() implementation is different?

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 23:27             ` KAMEZAWA Hiroyuki
@ 2007-07-13 23:28               ` Christoph Lameter
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Lameter @ 2007-07-13 23:28 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: tony.luck, akpm, linux-mm, linux-arch, apw, npiggin, mel

On Sat, 14 Jul 2007, KAMEZAWA Hiroyuki wrote:

> > I'd be very surprised if there is any difference because the IA64 code for 
> > virtual memmap is the source of ideas and implementation for SPARSE_VIRTUAL.
> > 
> Maybe the pfn_valid() implementation is different?

Right, but that should increase the speed and not decrease it, since we do 
not have the CONFIG HOLES anymore.




* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 22:21       ` Christoph Lameter
  2007-07-13 22:37         ` Luck, Tony
@ 2007-07-14  8:49         ` Nick Piggin
  2007-07-14 15:07           ` Christoph Lameter
  1 sibling, 1 reply; 38+ messages in thread
From: Nick Piggin @ 2007-07-14  8:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Luck, Tony, Andrew Morton, linux-mm, linux-arch, Andy Whitcroft,
	Mel Gorman

On Fri, Jul 13, 2007 at 03:21:43PM -0700, Christoph Lameter wrote:
> On Fri, 13 Jul 2007, Luck, Tony wrote:
> 
> > 1) There is a small performance regression for ia64 (which is promised
> > to go away when bigger pages are used for the mem_map, but I'd like to
> > see that this really does fix the issue).
> 
> The performance should be better than the existing one since we have even 
> less code here than discontig. We do not have to fetch the base anymore or 
> check boundaries (discontig was the baseline, right?), but we have exactly 
> the same method of pfn_to_page and page_to_pfn as discontig/vmemmap.

Isn't it still possible that you could have TLB pressure that would
result in lower performance? I wonder why the large page support for
ia64 was shelved?

FWIW, since I was cc'ed for comments: I really like the patches as well
although much of it is in memory model and arch code which I'm not so
involved with.

It should allow better performance, and unification of most if not all
memory models which will be really nice.


* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 18:23     ` Christoph Lameter
@ 2007-07-14  8:57       ` Russell King
  2007-07-14 15:10         ` Christoph Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: Russell King @ 2007-07-14  8:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, linux-arch, Andy Whitcroft, Nick Piggin,
	Mel Gorman

On Fri, Jul 13, 2007 at 11:23:20AM -0700, Christoph Lameter wrote:
> Well without this we cannot perform the cleanup of the miscellaneous 
> memory models around. The longer this is held up the longer the discontig 
> etc will stay in the tree with all the associated #ifdeffery.

It would also be nice to convert ARM to using sparsemem rather than
discontigmem, but despite having a patch adding the supporting common
infrastructure for the last year and a half or so, no one in the ARM
community is interested in it.

Since I've no machines which use the present discontig support and
have more than a single bank of memory, I've no way to test and
progress sparsemem on ARM - and since no one's interested I'm probably
going to drop the ARM sparsemem git branch soon.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-14  8:49         ` Nick Piggin
@ 2007-07-14 15:07           ` Christoph Lameter
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Lameter @ 2007-07-14 15:07 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Luck, Tony, Andrew Morton, linux-mm, linux-arch, Andy Whitcroft,
	Mel Gorman

On Sat, 14 Jul 2007, Nick Piggin wrote:

> Isn't it still possible that you could have TLB pressure that would
> result in lower performance? I wonder why the large page support for
> ia64 was shelved?

The 16M large memmap support was shelved because 16M is too large a size 
for a vmemmap block. It results in the vmemmap overlapping multiple nodes.

The TLB pressure for the 16k support is the same since it's the same 
algorithm. We are measuring discontig/vmemmap 16k against sparse/vmemmap 
16k here.

We would likely see a difference if we compared plain sparsemem vs. 
sparse/vmemmap. Then there may be a difference in TLB pressure. But 16k 
discontig/vmemmap is the current default on IA64.


* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-14  8:57       ` Russell King
@ 2007-07-14 15:10         ` Christoph Lameter
  2007-07-14 17:16           ` Russell King
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-14 15:10 UTC (permalink / raw)
  To: Russell King
  Cc: Andrew Morton, linux-mm, linux-arch, Andy Whitcroft, Nick Piggin,
	Mel Gorman

On Sat, 14 Jul 2007, Russell King wrote:

> It would also be nice to convert ARM to using sparsemem rather than
> discontigmem, but despite having a patch adding the supporting common
> infrastructure for the last year and a half or so, no one in the ARM
> community is interested in it.

Yeah, I was also not interested for the longest time, because I became 
concerned when I looked at the code generated by sparsemem for 
virt_to_page and page_address. But that is different now.
 
> Since I've no machines which use the present discontig support and
> have more than a single bank of memory, I've no way to test and
> progress sparsemem on ARM - and since no one's interested I'm probably
> going to drop the ARM sparsemem git branch soon.

Could you keep it around for a while longer? sparse_virtual now allows 
code reduction in page_address and virt_to_page even vs. discontig.



* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-13 13:36 ` [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM Andy Whitcroft
  2007-07-13 14:51   ` KAMEZAWA Hiroyuki
@ 2007-07-14 15:20   ` Christoph Hellwig
  2007-07-14 16:06     ` Christoph Lameter
  2007-07-30 14:39     ` Andy Whitcroft
  1 sibling, 2 replies; 38+ messages in thread
From: Christoph Hellwig @ 2007-07-14 15:20 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-mm, linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman

> --- a/include/asm-generic/memory_model.h
> +++ b/include/asm-generic/memory_model.h
> @@ -46,6 +46,12 @@
>  	 __pgdat->node_start_pfn;					\
>  })
>  
> +#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
> +
> +/* memmap is virtually contiguous.  */
> +#define __pfn_to_page(pfn)	(vmemmap + (pfn))
> +#define __page_to_pfn(page)	((page) - vmemmap)
> +
>  #elif defined(CONFIG_SPARSEMEM)

nice ifdef mess you have here.  and an asm-generic file should be something
truly generic instead of a complete ifdef forest.  I think we'd be
much better off duplicating the two lines above in architectures using
it anyway.

> diff --git a/mm/sparse.c b/mm/sparse.c
> index d6678ab..5cc6e74 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -9,6 +9,8 @@
>  #include <linux/spinlock.h>
>  #include <linux/vmalloc.h>
>  #include <asm/dma.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
>  
>  /*
>   * Permanent SPARSEMEM data:
> @@ -218,6 +220,192 @@ void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
>  	return NULL;
>  }
>  
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> +/*
> + * Virtual Memory Map support
> + *
> + * (C) 2007 sgi. Christoph Lameter <clameter@sgi.com>.

When did we start putting copyright lines and large block comments in the
middle of the file?

Please sort this and the ifdef mess out, I suspect a new file for this
code would be best.

> +void * __meminit vmemmap_alloc_block(unsigned long size, int node)

> +#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
> +void __meminit vmemmap_verify(pte_t *pte, int node,
> +				unsigned long start, unsigned long end)
> +{
> +	unsigned long pfn = pte_pfn(*pte);
> +	int actual_node = early_pfn_to_nid(pfn);
> +
> +	if (actual_node != node)
> +		printk(KERN_WARNING "[%lx-%lx] potential offnode "
> +			"page_structs\n", start, end - 1);
> +}

Given that this function is a tiny noop, please just put it into the
arch dir for !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP architectures
and save yourself both the ifdef mess and the config option.



* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-14 15:20   ` Christoph Hellwig
@ 2007-07-14 16:06     ` Christoph Lameter
  2007-07-14 16:33       ` Christoph Hellwig
  2007-07-30 14:39     ` Andy Whitcroft
  1 sibling, 1 reply; 38+ messages in thread
From: Christoph Lameter @ 2007-07-14 16:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andy Whitcroft, linux-mm, linux-arch, Nick Piggin, Mel Gorman

On Sat, 14 Jul 2007, Christoph Hellwig wrote:

> >  #elif defined(CONFIG_SPARSEMEM)
> 
> nice ifdef mess you have here.  and an asm-generic file should be something
> truly generic instead of a complete ifdef forest.  I think we'd be
> much better off duplicating the two lines above in architectures using
> it anyway.

Nope, these all need to be arch-independent, otherwise we cannot consolidate 
the code. True, these statements became very small with SPARSE_VIRTUAL, but 
that is no reason to make an exception just for this new model.

> > +#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
> > +void __meminit vmemmap_verify(pte_t *pte, int node,
> > +				unsigned long start, unsigned long end)
> > +{
> > +	unsigned long pfn = pte_pfn(*pte);
> > +	int actual_node = early_pfn_to_nid(pfn);
> > +
> > +	if (actual_node != node)
> > +		printk(KERN_WARNING "[%lx-%lx] potential offnode "
> > +			"page_structs\n", start, end - 1);
> > +}
> 
> Given that this function is a tiny noop, please just put it into the
> arch dir for !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP architectures
> and save yourself both the ifdef mess and the config option.

Then it's no longer generic. You are ripping the basic framework of 
sparsemem apart.



* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-14 16:06     ` Christoph Lameter
@ 2007-07-14 16:33       ` Christoph Hellwig
  2007-07-23 19:36         ` Christoph Lameter
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2007-07-14 16:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Christoph Hellwig, Andy Whitcroft, linux-mm, linux-arch,
	Nick Piggin, Mel Gorman

On Sat, Jul 14, 2007 at 09:06:58AM -0700, Christoph Lameter wrote:
> > > +#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
> > > +void __meminit vmemmap_verify(pte_t *pte, int node,
> > > +				unsigned long start, unsigned long end)
> > > +{
> > > +	unsigned long pfn = pte_pfn(*pte);
> > > +	int actual_node = early_pfn_to_nid(pfn);
> > > +
> > > +	if (actual_node != node)
> > > +		printk(KERN_WARNING "[%lx-%lx] potential offnode "
> > > +			"page_structs\n", start, end - 1);
> > > +}
> > 
> > Given that this function is a tiny noop, please just put it into the
> > arch dir for !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP architectures
> > and save yourself both the ifdef mess and the config option.
> 
> Then it's no longer generic. You are ripping the basic framework of 
> sparsemem apart.

It's not generic.  Most of it is under a maze of obscure config options.
The patchset in its current form is a complete mess of obscure ifdeffery
and not quite generic code.  And it only adds new memory models without
ripping old stuff out.  So while I really like the basic idea, the patches
need quite a lot more work until they're mergeable.


* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-14 15:10         ` Christoph Lameter
@ 2007-07-14 17:16           ` Russell King
  0 siblings, 0 replies; 38+ messages in thread
From: Russell King @ 2007-07-14 17:16 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, linux-mm, linux-arch, Andy Whitcroft, Nick Piggin,
	Mel Gorman

On Sat, Jul 14, 2007 at 08:10:49AM -0700, Christoph Lameter wrote:
> On Sat, 14 Jul 2007, Russell King wrote:
> 
> > It would also be nice to convert ARM to using sparsemem rather than
> > discontigmem, but despite having a patch adding the supporting common
> > infrastructure for the last year and a half or so, no one in the ARM
> > community is interested in it.
> 
> Yeah, I was also not interested for the longest time, because I became 
> concerned when I looked at the code generated by sparsemem for 
> virt_to_page and page_address. But that is different now.
>  
> > Since I've no machines which use the present discontig support and
> > have more than a single bank of memory, I've no way to test and
> > progress sparsemem on ARM - and since no one's interested I'm probably
> > going to drop the ARM sparsemem git branch soon.
> 
> Could you keep it around for a while longer? sparse_virtual now allows 
> code reduction in page_address and virt_to_page even vs. discontig.

The patch is at:

  http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/arm:sparsemem.diff

it's not much.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:


* Re: [PATCH 4/7] x86_64: SPARSEMEM_VMEMMAP 2M page size support
  2007-07-13 13:36 ` [PATCH 4/7] x86_64: SPARSEMEM_VMEMMAP 2M page size support Andy Whitcroft
@ 2007-07-19 23:25   ` Andrew Morton
  0 siblings, 0 replies; 38+ messages in thread
From: Andrew Morton @ 2007-07-19 23:25 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-mm, linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman

On Fri, 13 Jul 2007 14:36:39 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> x86_64 uses 2M page table entries to map its 1-1 kernel space.
> We also implement the virtual memmap using 2M page table entries.  So
> there is no additional runtime overhead over FLATMEM; initialisation
> is slightly more complex.  As FLATMEM still references memory to
> obtain the mem_map pointer and SPARSEMEM_VMEMMAP uses a compile
> time constant, SPARSEMEM_VMEMMAP should be superior.
> 
> With this SPARSEMEM becomes the most efficient way of handling
> virt_to_page, pfn_to_page and friends for UP, SMP and NUMA on x86_64.
> 
> [apw@shadowen.org: code resplit, style fixups]
> From: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> ---
> diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86_64/mm.txt

Please put the From: attribution right at the top of the changelog.

Please alter your scripts to include diffstat output after the ^---

Thanks.


* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-14 16:33       ` Christoph Hellwig
@ 2007-07-23 19:36         ` Christoph Lameter
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Lameter @ 2007-07-23 19:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andy Whitcroft, linux-mm, linux-arch, Nick Piggin, Mel Gorman

On Sat, 14 Jul 2007 17:33:19 +0100
Christoph Hellwig <hch@infradead.org> wrote:
 
> It's not generic.  Most of it is under a maze of obscure config
> options. The patchset in it's current form is a complete mess of
> obscure ifefery and not quite generic code.  And it only adds new
> memory models without ripping old stuff out.  So while I really like
> the basic idea the patches need quite a lot more work until they're
> mergeable.

It is generic. If you put the components into each arch, you would 
needlessly duplicate code. In order to rip stuff out, we first need 
to have sparsemem contain all the features of discontig.

Then we can start to get rid of discontig and then we will be able to
reduce the number of memory models supported by sparsemem.


* Re: [PATCH 0/7] Sparsemem Virtual Memmap V5
  2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
                   ` (7 preceding siblings ...)
  2007-07-13 17:04 ` [PATCH 0/7] Sparsemem Virtual Memmap V5 Christoph Lameter
@ 2007-07-26  8:05 ` Paul Mundt
  8 siblings, 0 replies; 38+ messages in thread
From: Paul Mundt @ 2007-07-26  8:05 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-mm, linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman

On Fri, Jul 13, 2007 at 02:34:37PM +0100, Andy Whitcroft wrote:
> However, if there is enough virtual space available and the arch
> already maps its 1-1 kernel space using TLBs (f.e. true of IA64
> and x86_64) then this technique makes SPARSEMEM lookups even more
> efficient than CONFIG_FLATMEM.  FLATMEM needs to read the contents
> of the mem_map variable to get the start of the memmap and then add
> the offset to the required entry.  vmemmap is a constant to which
> we can simply add the offset.
> 
This is something I've been debating how to make use of on SH, but I
haven't come to any good conclusions yet, so I think a brain-dump is in
order (MIPS will have the same concerns I suppose).

SH has lowmem (512M) directly accessible, with physical/virtual and
cached/uncached simply being a matter of flipping high bits; there are no
TLBs for this space, as it's not really a translatable space in the
strictest sense of the word (we only end up taking page faults for user
addresses, special memory windows, and various other memory blocks --
include/asm-sh/io.h:__ioremap_mode() might serve as a good example). For
contiguous system memory it would be possible just to wrap the vmemmap
base to the beginning of P1 space and not worry about any of this.
However, for memories that exist outside of this space (whether they be
highmem or other nodes built on memories in different parts of the address
space completely), it's still necessary to map with TLBs. Building a
vmemmap for lowmem would seem to be a waste of space, and it doesn't
really buy us anything that I can see. On the other hand, this is
something that's desirable for the other nodes or anything translatable
(ie, memories outside of the lowmem range), as it gives us the ability to
construct the memmap using large TLBs.

This is something that's fairly trivial to hack up with out-of-line
__page_to_pfn()/__pfn_to_page() as we can simply reference the vmemmap
for memory that is not in the low 512M and do the high bit mangling
otherwise (assuming we've populated it in a similar fashion), but I
wonder if that's the best way to approach this?


* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-14 15:20   ` Christoph Hellwig
  2007-07-14 16:06     ` Christoph Lameter
@ 2007-07-30 14:39     ` Andy Whitcroft
  2007-07-30 18:35       ` Christoph Lameter
  1 sibling, 1 reply; 38+ messages in thread
From: Andy Whitcroft @ 2007-07-30 14:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-arch, Nick Piggin, Christoph Lameter, Mel Gorman

Christoph Hellwig wrote:
>> --- a/include/asm-generic/memory_model.h
>> +++ b/include/asm-generic/memory_model.h
>> @@ -46,6 +46,12 @@
>>  	 __pgdat->node_start_pfn;					\
>>  })
>>  
>> +#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>> +
>> +/* memmap is virtually contiguous.  */
>> +#define __pfn_to_page(pfn)	(vmemmap + (pfn))
>> +#define __page_to_pfn(page)	((page) - vmemmap)
>> +
>>  #elif defined(CONFIG_SPARSEMEM)
> 
> nice ifdef mess you have here.  and an asm-generic file should be something
> truly generic instead of a complete ifdef forest.  I think we'd be
> much better off duplicating the two lines above in architectures using
> it anyway.

The code itself is generic in the sense that it's architecture-neutral.
This is "per memory model" code.  I am wondering, however, why it is in an
asm-anything include file here.  It seems for all the world like it should
be in include/linux/memory_model.h.

>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index d6678ab..5cc6e74 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -9,6 +9,8 @@
>>  #include <linux/spinlock.h>
>>  #include <linux/vmalloc.h>
>>  #include <asm/dma.h>
>> +#include <asm/pgalloc.h>
>> +#include <asm/pgtable.h>
>>  
>>  /*
>>   * Permanent SPARSEMEM data:
>> @@ -218,6 +220,192 @@ void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
>>  	return NULL;
>>  }
>>  
>> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
>> +/*
>> + * Virtual Memory Map support
>> + *
>> + * (C) 2007 sgi. Christoph Lameter <clameter@sgi.com>.
> 
> When did we start putting copyright lines and large block comments in the
> middle of the file?
> 
> Please sort this and the ifdef mess out, I suspect a new file for this
> code would be best.

I will have a look at how this would look pulled out into separate .c files.

>> +void * __meminit vmemmap_alloc_block(unsigned long size, int node)
> 
> void * __meminit vmemmap_alloc_block(unsigned long size, int node)
> 
>> +#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
>> +void __meminit vmemmap_verify(pte_t *pte, int node,
>> +				unsigned long start, unsigned long end)
>> +{
>> +	unsigned long pfn = pte_pfn(*pte);
>> +	int actual_node = early_pfn_to_nid(pfn);
>> +
>> +	if (actual_node != node)
>> +		printk(KERN_WARNING "[%lx-%lx] potential offnode "
>> +			"page_structs\n", start, end - 1);
>> +}
> 
> Given that this function is a tiny noop, please just put it into the
> arch dir for !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP architectures
> and save yourself both the ifdef mess and the config option.
> 

Will also look that over and see how it comes out.

-apw


* Re: [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM
  2007-07-30 14:39     ` Andy Whitcroft
@ 2007-07-30 18:35       ` Christoph Lameter
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Lameter @ 2007-07-30 18:35 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Christoph Hellwig, linux-mm, linux-arch, Nick Piggin, Mel Gorman

On Mon, 30 Jul 2007, Andy Whitcroft wrote:

> The code itself is generic in the sense that it's architecture-neutral.
> This is "per memory model" code.  I am wondering, however, why it is in an
> asm-anything include file here.  It seems for all the world like it should
> be in include/linux/memory_model.h.

Riiight! Or directly in mm.h?



end of thread, other threads:[~2007-07-30 18:35 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-13 13:34 [PATCH 0/7] Sparsemem Virtual Memmap V5 Andy Whitcroft
2007-07-13 13:35 ` [PATCH 1/7] sparsemem: clean up spelling error in comments Andy Whitcroft
2007-07-13 13:35 ` [PATCH 2/7] sparsemem: record when a section has a valid mem_map Andy Whitcroft
2007-07-13 13:36 ` [PATCH 3/7] Generic Virtual Memmap support for SPARSEMEM Andy Whitcroft
2007-07-13 14:51   ` KAMEZAWA Hiroyuki
2007-07-13 22:42     ` Christoph Lameter
2007-07-13 23:12       ` KAMEZAWA Hiroyuki
2007-07-13 23:17         ` Christoph Lameter
2007-07-13 23:25           ` KAMEZAWA Hiroyuki
2007-07-14 15:20   ` Christoph Hellwig
2007-07-14 16:06     ` Christoph Lameter
2007-07-14 16:33       ` Christoph Hellwig
2007-07-23 19:36         ` Christoph Lameter
2007-07-30 14:39     ` Andy Whitcroft
2007-07-30 18:35       ` Christoph Lameter
2007-07-13 13:36 ` [PATCH 4/7] x86_64: SPARSEMEM_VMEMMAP 2M page size support Andy Whitcroft
2007-07-19 23:25   ` Andrew Morton
2007-07-13 13:37 ` [PATCH 5/7] IA64: SPARSEMEM_VMEMMAP 16K " Andy Whitcroft
2007-07-13 13:37 ` [PATCH 6/7] SPARC64: SPARSEMEM_VMEMMAP support Andy Whitcroft
2007-07-13 17:00   ` Christoph Lameter
2007-07-13 13:38 ` [PATCH 7/7] ppc64: " Andy Whitcroft
2007-07-13 17:04 ` [PATCH 0/7] Sparsemem Virtual Memmap V5 Christoph Lameter
2007-07-13 17:40   ` Andrew Morton
2007-07-13 18:23     ` Christoph Lameter
2007-07-14  8:57       ` Russell King
2007-07-14 15:10         ` Christoph Lameter
2007-07-14 17:16           ` Russell King
2007-07-13 20:08     ` Roman Zippel
2007-07-13 22:02     ` Luck, Tony
2007-07-13 22:21       ` Christoph Lameter
2007-07-13 22:37         ` Luck, Tony
2007-07-13 22:54           ` Christoph Lameter
2007-07-13 23:27             ` KAMEZAWA Hiroyuki
2007-07-13 23:28               ` Christoph Lameter
2007-07-14  8:49         ` Nick Piggin
2007-07-14 15:07           ` Christoph Lameter
2007-07-13 22:43     ` David Miller
2007-07-26  8:05 ` Paul Mundt
