* [PATCH 0/4] vmemmap updates to V6
@ 2007-08-02 9:24 Andy Whitcroft
2007-08-02 9:24 ` [PATCH 1/4] vmemmap: remove excess debugging Andy Whitcroft
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-02 9:24 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-arch, Christoph Hellwig, Nick Piggin,
Christoph Lameter, Mel Gorman, Andy Whitcroft
Following this email are four patches which represent the first
batch of feedback on version V5. I have some additional config
simplifications in test at the moment, and we probably need to
move memory_model.h.
vmemmap-remove-excess-debugging -- remove some verbose and mostly
unhelpful debugging.
vmemmap-simplify-initialisation-code-and-reduce-duplication -- clean
up section initialisation to simplify pulling out the vmemmap code.
vmemmap-pull-out-the-vmemmap-code-into-its-own-file -- pull out the
vmemmap code into its own file.
vmemmap-ppc64-convert-VMM_*-macros-to-a-real-function -- replace
some macros with an inline function to improve type safety.
The first three should be considered as fixes to the patch below,
the last against the ppc64 support:
generic-virtual-memmap-support-for-sparsemem
All against 2.6.23-rc1-mm2.
Andrew, please consider for -mm. (I found that merging the patch
below into its parent patch before sliding these into the tree made
the rejects much simpler.)
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2-fix.patch
-apw
* [PATCH 1/4] vmemmap: remove excess debugging
2007-08-02 9:24 [PATCH 0/4] vmemmap updates to V6 Andy Whitcroft
@ 2007-08-02 9:24 ` Andy Whitcroft
2007-08-02 19:18 ` Christoph Lameter
2007-08-02 9:25 ` [PATCH 2/4] vmemmap: simplify initialisation code and reduce duplication Andy Whitcroft
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-02 9:24 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-arch, Christoph Hellwig, Nick Piggin,
Christoph Lameter, Mel Gorman, Andy Whitcroft
Outputting each and every PTE as it is loaded is somewhat overkill;
zap this debug.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
---
mm/sparse.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index 7dcea95..76316d4 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -340,9 +340,6 @@ static int __meminit vmemmap_populate_pte(pmd_t *pmd, unsigned long addr,
entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
set_pte(pte, entry);
- printk(KERN_DEBUG "[%lx-%lx] PTE ->%p on node %d\n",
- addr, addr + PAGE_SIZE - 1, p, node);
-
} else
vmemmap_verify(pte, node, addr + PAGE_SIZE, end);
* [PATCH 2/4] vmemmap: simplify initialisation code and reduce duplication
2007-08-02 9:24 [PATCH 0/4] vmemmap updates to V6 Andy Whitcroft
2007-08-02 9:24 ` [PATCH 1/4] vmemmap: remove excess debugging Andy Whitcroft
@ 2007-08-02 9:25 ` Andy Whitcroft
2007-08-02 9:25 ` [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file Andy Whitcroft
2007-08-02 9:25 ` [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function Andy Whitcroft
3 siblings, 0 replies; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-02 9:25 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-arch, Christoph Hellwig, Nick Piggin,
Christoph Lameter, Mel Gorman, Andy Whitcroft
The vmemmap and non-vmemmap implementations of
sparse_early_mem_map_alloc() share a fair amount of code.
Refactor this into a common wrapper, pulling the differences out
to sparse_early_mem_map_populate(). This reduces dependencies
between SPARSEMEM and SPARSEMEM_VMEMMAP, simplifying separation.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
mm/sparse.c | 41 +++++++++++++++++++++--------------------
1 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/mm/sparse.c b/mm/sparse.c
index 76316d4..1905759 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -421,33 +421,23 @@ int __meminit vmemmap_populate(struct page *start_page,
}
#endif /* !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP */
-static struct page * __init sparse_early_mem_map_alloc(unsigned long pnum)
+static struct page __init *sparse_early_mem_map_populate(unsigned long pnum,
+ int nid)
{
- struct page *map;
- struct mem_section *ms = __nr_to_section(pnum);
- int nid = sparse_early_nid(ms);
- int error;
-
- map = pfn_to_page(pnum * PAGES_PER_SECTION);
- error = vmemmap_populate(map, PAGES_PER_SECTION, nid);
- if (error) {
- printk(KERN_ERR "%s: allocation failed. Error=%d\n",
- __FUNCTION__, error);
- printk(KERN_ERR "%s: virtual memory map backing failed "
- "some memory will not be available.\n", __FUNCTION__);
- ms->section_mem_map = 0;
+ struct page *map = pfn_to_page(pnum * PAGES_PER_SECTION);
+ int error = vmemmap_populate(map, PAGES_PER_SECTION, nid);
+ if (error)
return NULL;
- }
+
return map;
}
#else /* CONFIG_SPARSEMEM_VMEMMAP */
-static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
+static struct page __init *sparse_early_mem_map_populate(unsigned long pnum,
+ int nid)
{
struct page *map;
- struct mem_section *ms = __nr_to_section(pnum);
- int nid = sparse_early_nid(ms);
map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
if (map)
@@ -460,14 +450,25 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
map = alloc_bootmem_node(NODE_DATA(nid),
sizeof(struct page) * PAGES_PER_SECTION);
+ return map;
+}
+#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
+
+struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
+{
+ struct page *map;
+ struct mem_section *ms = __nr_to_section(pnum);
+ int nid = sparse_early_nid(ms);
+
+ map = sparse_early_mem_map_populate(pnum, nid);
if (map)
return map;
- printk(KERN_WARNING "%s: allocation failed\n", __FUNCTION__);
+ printk(KERN_ERR "%s: sparsemem memory map backing failed "
+ "some memory will not be available.\n", __FUNCTION__);
ms->section_mem_map = 0;
return NULL;
}
-#endif /* !CONFIG_SPARSEMEM_VMEMMAP */
/*
* Allocate the accumulated non-linear sections, allocate a mem_map
* [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file
2007-08-02 9:24 [PATCH 0/4] vmemmap updates to V6 Andy Whitcroft
2007-08-02 9:24 ` [PATCH 1/4] vmemmap: remove excess debugging Andy Whitcroft
2007-08-02 9:25 ` [PATCH 2/4] vmemmap: simplify initialisation code and reduce duplication Andy Whitcroft
@ 2007-08-02 9:25 ` Andy Whitcroft
2007-08-02 13:26 ` Christoph Hellwig
2007-08-02 9:25 ` [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function Andy Whitcroft
3 siblings, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-02 9:25 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-arch, Christoph Hellwig, Nick Piggin,
Christoph Lameter, Mel Gorman, Andy Whitcroft
Pull out the SPARSEMEM_VMEMMAP support into its own file.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
include/linux/mm.h | 1 +
mm/Makefile | 1 +
mm/sparse-vmemmap.c | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/sparse.c | 180 +--------------------------------------------------
4 files changed, 185 insertions(+), 178 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 22e9705..9ea07a5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1211,6 +1211,7 @@ extern int randomize_va_space;
const char * arch_vma_name(struct vm_area_struct *vma);
+struct page *sparse_early_mem_map_populate(unsigned long pnum, int nid);
int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
int vmemmap_populate_pmd(pud_t *, unsigned long, unsigned long, int);
void *vmemmap_alloc_block(unsigned long size, int node);
diff --git a/mm/Makefile b/mm/Makefile
index 3dd262a..adb8442 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_SWAP_PREFETCH) += swap_prefetch.o
obj-$(CONFIG_HUGETLBFS) += hugetlb.o
obj-$(CONFIG_NUMA) += mempolicy.o
obj-$(CONFIG_SPARSEMEM) += sparse.o
+obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SHMEM) += shmem.o
obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
obj-$(CONFIG_TINY_SHMEM) += tiny-shmem.o
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
new file mode 100644
index 0000000..7bb7a4b
--- /dev/null
+++ b/mm/sparse-vmemmap.c
@@ -0,0 +1,181 @@
+/*
+ * Virtual Memory Map support
+ *
+ * (C) 2007 sgi. Christoph Lameter <clameter@sgi.com>.
+ *
+ * Virtual memory maps allow VM primitives pfn_to_page, page_to_pfn,
+ * virt_to_page, page_address() to be implemented as a base offset
+ * calculation without memory access.
+ *
+ * However, virtual mappings need a page table and TLBs. Many Linux
+ * architectures already map their physical space using 1-1 mappings
+ * via TLBs. For those arches the virtual memmory map is essentially
+ * for free if we use the same page size as the 1-1 mappings. In that
+ * case the overhead consists of a few additional pages that are
+ * allocated to create a view of memory for vmemmap.
+ *
+ * Special Kconfig settings:
+ *
+ * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
+ *
+ * The architecture has its own functions to populate the memory
+ * map and provides a vmemmap_populate function.
+ *
+ * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
+ *
+ * The architecture provides functions to populate the pmd level
+ * of the vmemmap mappings. Allowing mappings using large pages
+ * where available.
+ *
+ * If neither are set then PAGE_SIZE mappings are generated which
+ * require one PTE/TLB per PAGE_SIZE chunk of the virtual memory map.
+ */
+#include <linux/mm.h>
+#include <linux/mmzone.h>
+#include <linux/bootmem.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/vmalloc.h>
+#include <asm/dma.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+/*
+ * Allocate a block of memory to be used to back the virtual memory map
+ * or to back the page tables that are used to create the mapping.
+ * Uses the main allocators if they are available, else bootmem.
+ */
+void * __meminit vmemmap_alloc_block(unsigned long size, int node)
+{
+ /* If the main allocator is up use that, fallback to bootmem. */
+ if (slab_is_available()) {
+ struct page *page = alloc_pages_node(node,
+ GFP_KERNEL | __GFP_ZERO, get_order(size));
+ if (page)
+ return page_address(page);
+ return NULL;
+ } else
+ return __alloc_bootmem_node(NODE_DATA(node), size, size,
+ __pa(MAX_DMA_ADDRESS));
+}
+
+#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
+void __meminit vmemmap_verify(pte_t *pte, int node,
+ unsigned long start, unsigned long end)
+{
+ unsigned long pfn = pte_pfn(*pte);
+ int actual_node = early_pfn_to_nid(pfn);
+
+ if (actual_node != node)
+ printk(KERN_WARNING "[%lx-%lx] potential offnode "
+ "page_structs\n", start, end - 1);
+}
+
+#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
+static int __meminit vmemmap_populate_pte(pmd_t *pmd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pte_t *pte;
+
+ for (pte = pte_offset_kernel(pmd, addr); addr < end;
+ pte++, addr += PAGE_SIZE)
+ if (pte_none(*pte)) {
+ pte_t entry;
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!p)
+ return -ENOMEM;
+
+ entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
+ set_pte(pte, entry);
+
+ } else
+ vmemmap_verify(pte, node, addr + PAGE_SIZE, end);
+
+ return 0;
+}
+
+int __meminit vmemmap_populate_pmd(pud_t *pud, unsigned long addr,
+ unsigned long end, int node)
+{
+ pmd_t *pmd;
+ int error = 0;
+ unsigned long next;
+
+ for (pmd = pmd_offset(pud, addr); addr < end && !error;
+ pmd++, addr = next) {
+ if (pmd_none(*pmd)) {
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!p)
+ return -ENOMEM;
+
+ pmd_populate_kernel(&init_mm, pmd, p);
+ } else
+ vmemmap_verify((pte_t *)pmd, node,
+ pmd_addr_end(addr, end), end);
+ next = pmd_addr_end(addr, end);
+ error = vmemmap_populate_pte(pmd, addr, next, node);
+ }
+ return error;
+}
+#endif /* CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD */
+
+static int __meminit vmemmap_populate_pud(pgd_t *pgd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pud_t *pud;
+ int error = 0;
+ unsigned long next;
+
+ for (pud = pud_offset(pgd, addr); addr < end && !error;
+ pud++, addr = next) {
+ if (pud_none(*pud)) {
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!p)
+ return -ENOMEM;
+
+ pud_populate(&init_mm, pud, p);
+ }
+ next = pud_addr_end(addr, end);
+ error = vmemmap_populate_pmd(pud, addr, next, node);
+ }
+ return error;
+}
+
+int __meminit vmemmap_populate(struct page *start_page,
+ unsigned long nr, int node)
+{
+ pgd_t *pgd;
+ unsigned long addr = (unsigned long)start_page;
+ unsigned long end = (unsigned long)(start_page + nr);
+ unsigned long next;
+ int error = 0;
+
+ printk(KERN_DEBUG "[%lx-%lx] Virtual memory section"
+ " (%ld pages) node %d\n", addr, end - 1, nr, node);
+
+ for (pgd = pgd_offset_k(addr); addr < end && !error;
+ pgd++, addr = next) {
+ if (pgd_none(*pgd)) {
+ void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+ if (!p)
+ return -ENOMEM;
+
+ pgd_populate(&init_mm, pgd, p);
+ }
+ next = pgd_addr_end(addr,end);
+ error = vmemmap_populate_pud(pgd, addr, next, node);
+ }
+ return error;
+}
+#endif /* !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP */
+
+struct page __init *sparse_early_mem_map_populate(unsigned long pnum, int nid)
+{
+ struct page *map = pfn_to_page(pnum * PAGES_PER_SECTION);
+ int error = vmemmap_populate(map, PAGES_PER_SECTION, nid);
+ if (error)
+ return NULL;
+
+ return map;
+}
diff --git a/mm/sparse.c b/mm/sparse.c
index 1905759..1f4dbb8 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -258,184 +258,8 @@ static unsigned long *sparse_early_usemap_alloc(unsigned long pnum)
return NULL;
}
-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-/*
- * Virtual Memory Map support
- *
- * (C) 2007 sgi. Christoph Lameter <clameter@sgi.com>.
- *
- * Virtual memory maps allow VM primitives pfn_to_page, page_to_pfn,
- * virt_to_page, page_address() to be implemented as a base offset
- * calculation without memory access.
- *
- * However, virtual mappings need a page table and TLBs. Many Linux
- * architectures already map their physical space using 1-1 mappings
- * via TLBs. For those arches the virtual memmory map is essentially
- * for free if we use the same page size as the 1-1 mappings. In that
- * case the overhead consists of a few additional pages that are
- * allocated to create a view of memory for vmemmap.
- *
- * Special Kconfig settings:
- *
- * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
- *
- * The architecture has its own functions to populate the memory
- * map and provides a vmemmap_populate function.
- *
- * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
- *
- * The architecture provides functions to populate the pmd level
- * of the vmemmap mappings. Allowing mappings using large pages
- * where available.
- *
- * If neither are set then PAGE_SIZE mappings are generated which
- * require one PTE/TLB per PAGE_SIZE chunk of the virtual memory map.
- */
-
-/*
- * Allocate a block of memory to be used to back the virtual memory map
- * or to back the page tables that are used to create the mapping.
- * Uses the main allocators if they are available, else bootmem.
- */
-void * __meminit vmemmap_alloc_block(unsigned long size, int node)
-{
- /* If the main allocator is up use that, fallback to bootmem. */
- if (slab_is_available()) {
- struct page *page = alloc_pages_node(node,
- GFP_KERNEL | __GFP_ZERO, get_order(size));
- if (page)
- return page_address(page);
- return NULL;
- } else
- return __alloc_bootmem_node(NODE_DATA(node), size, size,
- __pa(MAX_DMA_ADDRESS));
-}
-
-#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
-void __meminit vmemmap_verify(pte_t *pte, int node,
- unsigned long start, unsigned long end)
-{
- unsigned long pfn = pte_pfn(*pte);
- int actual_node = early_pfn_to_nid(pfn);
-
- if (actual_node != node)
- printk(KERN_WARNING "[%lx-%lx] potential offnode "
- "page_structs\n", start, end - 1);
-}
-
-#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
-static int __meminit vmemmap_populate_pte(pmd_t *pmd, unsigned long addr,
- unsigned long end, int node)
-{
- pte_t *pte;
-
- for (pte = pte_offset_kernel(pmd, addr); addr < end;
- pte++, addr += PAGE_SIZE)
- if (pte_none(*pte)) {
- pte_t entry;
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
- if (!p)
- return -ENOMEM;
-
- entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
- set_pte(pte, entry);
-
- } else
- vmemmap_verify(pte, node, addr + PAGE_SIZE, end);
-
- return 0;
-}
-
-int __meminit vmemmap_populate_pmd(pud_t *pud, unsigned long addr,
- unsigned long end, int node)
-{
- pmd_t *pmd;
- int error = 0;
- unsigned long next;
-
- for (pmd = pmd_offset(pud, addr); addr < end && !error;
- pmd++, addr = next) {
- if (pmd_none(*pmd)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
- if (!p)
- return -ENOMEM;
-
- pmd_populate_kernel(&init_mm, pmd, p);
- } else
- vmemmap_verify((pte_t *)pmd, node,
- pmd_addr_end(addr, end), end);
- next = pmd_addr_end(addr, end);
- error = vmemmap_populate_pte(pmd, addr, next, node);
- }
- return error;
-}
-#endif /* CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD */
-
-static int __meminit vmemmap_populate_pud(pgd_t *pgd, unsigned long addr,
- unsigned long end, int node)
-{
- pud_t *pud;
- int error = 0;
- unsigned long next;
-
- for (pud = pud_offset(pgd, addr); addr < end && !error;
- pud++, addr = next) {
- if (pud_none(*pud)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
- if (!p)
- return -ENOMEM;
-
- pud_populate(&init_mm, pud, p);
- }
- next = pud_addr_end(addr, end);
- error = vmemmap_populate_pmd(pud, addr, next, node);
- }
- return error;
-}
-
-int __meminit vmemmap_populate(struct page *start_page,
- unsigned long nr, int node)
-{
- pgd_t *pgd;
- unsigned long addr = (unsigned long)start_page;
- unsigned long end = (unsigned long)(start_page + nr);
- unsigned long next;
- int error = 0;
-
- printk(KERN_DEBUG "[%lx-%lx] Virtual memory section"
- " (%ld pages) node %d\n", addr, end - 1, nr, node);
-
- for (pgd = pgd_offset_k(addr); addr < end && !error;
- pgd++, addr = next) {
- if (pgd_none(*pgd)) {
- void *p = vmemmap_alloc_block(PAGE_SIZE, node);
- if (!p)
- return -ENOMEM;
-
- pgd_populate(&init_mm, pgd, p);
- }
- next = pgd_addr_end(addr,end);
- error = vmemmap_populate_pud(pgd, addr, next, node);
- }
- return error;
-}
-#endif /* !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP */
-
-static struct page __init *sparse_early_mem_map_populate(unsigned long pnum,
- int nid)
-{
- struct page *map = pfn_to_page(pnum * PAGES_PER_SECTION);
- int error = vmemmap_populate(map, PAGES_PER_SECTION, nid);
- if (error)
- return NULL;
-
- return map;
-}
-
-#else /* CONFIG_SPARSEMEM_VMEMMAP */
-
-static struct page __init *sparse_early_mem_map_populate(unsigned long pnum,
- int nid)
+#ifndef CONFIG_SPARSEMEM_VMEMMAP
+struct page __init *sparse_early_mem_map_populate(unsigned long pnum, int nid)
{
struct page *map;
* [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function
2007-08-02 9:24 [PATCH 0/4] vmemmap updates to V6 Andy Whitcroft
` (2 preceding siblings ...)
2007-08-02 9:25 ` [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file Andy Whitcroft
@ 2007-08-02 9:25 ` Andy Whitcroft
2007-08-02 16:31 ` Dave Hansen
3 siblings, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-02 9:25 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-arch, Christoph Hellwig, Nick Piggin,
Christoph Lameter, Mel Gorman, Andy Whitcroft
The code to convert an address within the vmemmap to the start of the
section is currently implemented using macros. Convert these over to
a new helper function, clarifying the code and gaining type checking.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
arch/powerpc/mm/init_64.c | 19 ++++++++++++-------
1 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 49c9f7c..05c7e93 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -185,14 +185,19 @@ void pgtable_cache_init(void)
#ifdef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
/*
- * Convert an address within the vmemmap into a pfn. Note that we have
- * to do this by hand as the proffered address may not be correctly aligned.
+ * Given an address within the vmemmap, determine the pfn of the page that
+ * represents the start of the section it is within. Note that we have to
+ * do this by hand as the proffered address may not be correctly aligned.
* Subtraction of non-aligned pointers produces undefined results.
*/
-#define VMM_SECTION(addr) \
- (((((unsigned long)(addr)) - ((unsigned long)(vmemmap))) / \
- sizeof(struct page)) >> PFN_SECTION_SHIFT)
-#define VMM_SECTION_PAGE(addr) (VMM_SECTION(addr) << PFN_SECTION_SHIFT)
+unsigned long __meminit vmemmap_section_start(struct page *page)
+{
+ unsigned long offset = ((unsigned long)page) -
+ ((unsigned long)(vmemmap));
+
+ /* Return the pfn of the start of the section. */
+ return (offset / sizeof(struct page)) & PAGE_SECTION_MASK;
+}
/*
* Check if this vmemmap page is already initialised. If any section
@@ -204,7 +209,7 @@ int __meminit vmemmap_populated(unsigned long start, int page_size)
unsigned long end = start + page_size;
for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page)))
- if (pfn_valid(VMM_SECTION_PAGE(start)))
+ if (pfn_valid(vmemmap_section_start(start)))
return 1;
return 0;
* Re: [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file
2007-08-02 9:25 ` [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file Andy Whitcroft
@ 2007-08-02 13:26 ` Christoph Hellwig
2007-08-02 19:28 ` Christoph Lameter
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2007-08-02 13:26 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-arch, Christoph Hellwig,
Nick Piggin, Christoph Lameter, Mel Gorman
On Thu, Aug 02, 2007 at 10:25:35AM +0100, Andy Whitcroft wrote:
> + * Special Kconfig settings:
> + *
> + * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
> + *
> + * The architecture has its own functions to populate the memory
> + * map and provides a vmemmap_populate function.
> + *
> + * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
> + *
> + * The architecture provides functions to populate the pmd level
> + * of the vmemmap mappings. Allowing mappings using large pages
> + * where available.
> + *
> + * If neither are set then PAGE_SIZE mappings are generated which
> + * require one PTE/TLB per PAGE_SIZE chunk of the virtual memory map.
> + */
This is the kind of mess I mean. Which architectures set either of these
and why? This code would be a lot more acceptable if we didn't have three
different variants of the arch interface.
* Re: [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function
2007-08-02 9:25 ` [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function Andy Whitcroft
@ 2007-08-02 16:31 ` Dave Hansen
2007-08-02 17:39 ` Andy Whitcroft
2007-08-02 19:30 ` Christoph Lameter
0 siblings, 2 replies; 14+ messages in thread
From: Dave Hansen @ 2007-08-02 16:31 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-arch, Christoph Hellwig,
Nick Piggin, Christoph Lameter, Mel Gorman
On Thu, 2007-08-02 at 10:25 +0100, Andy Whitcroft wrote:
>
> +unsigned long __meminit vmemmap_section_start(struct page *page)
> +{
> + unsigned long offset = ((unsigned long)page) -
> + ((unsigned long)(vmemmap));
Isn't this basically page_to_pfn()? Can we use it here?
-- Dave
* Re: [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function
2007-08-02 16:31 ` Dave Hansen
@ 2007-08-02 17:39 ` Andy Whitcroft
2007-08-02 18:00 ` Dave Hansen
2007-08-02 19:30 ` Christoph Lameter
1 sibling, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-02 17:39 UTC (permalink / raw)
To: Dave Hansen
Cc: Andrew Morton, linux-mm, linux-arch, Christoph Hellwig,
Nick Piggin, Christoph Lameter, Mel Gorman
Dave Hansen wrote:
> On Thu, 2007-08-02 at 10:25 +0100, Andy Whitcroft wrote:
>> +unsigned long __meminit vmemmap_section_start(struct page *page)
>> +{
>> + unsigned long offset = ((unsigned long)page) -
>> + ((unsigned long)(vmemmap));
>
> Isn't this basically page_to_pfn()? Can we use it here?
No, as that does direct subtraction of the two pointers. Our 'page'
here is not guaranteed to be aligned even to a struct page boundary.
When it is not so aligned the subtraction of the pointers is undefined.
Indeed, when you do subtract them while the 'page' is not aligned you get
complete gibberish back, and blammo results.
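For illustration only, the distinction looks roughly like this (a userspace
stand-in with a made-up struct, not the kernel's real types; the point is
the arithmetic, not the values):

#include <stdio.h>

struct page { unsigned long flags; void *lru[2]; };	/* stand-in only */

int main(void)
{
	struct page map[16];				/* pretend vmemmap */
	/* An address inside the map that is NOT struct page aligned. */
	char *raw = (char *)map + 5 * sizeof(struct page) + 3;
	struct page *page = (struct page *)raw;
	unsigned long offset;

	/*
	 * page - map would be the page_to_pfn() style calculation, but it
	 * is undefined here: C only defines subtraction of pointers to
	 * elements of the same array, and a misaligned pointer does not
	 * point at any element.  The divide-by-sizeof the compiler emits
	 * can return garbage when the offset is not an exact multiple.
	 */

	/* The integer arithmetic is always well defined and rounds down. */
	offset = (unsigned long)page - (unsigned long)map;
	printf("index %lu\n", offset / sizeof(struct page));	/* prints 5 */
	return 0;
}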
-apw
* Re: [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function
2007-08-02 17:39 ` Andy Whitcroft
@ 2007-08-02 18:00 ` Dave Hansen
0 siblings, 0 replies; 14+ messages in thread
From: Dave Hansen @ 2007-08-02 18:00 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-arch, Christoph Hellwig,
Nick Piggin, Christoph Lameter, Mel Gorman
On Thu, 2007-08-02 at 18:39 +0100, Andy Whitcroft wrote:
> Dave Hansen wrote:
> > On Thu, 2007-08-02 at 10:25 +0100, Andy Whitcroft wrote:
> >> +unsigned long __meminit vmemmap_section_start(struct page *page)
> >> +{
> >> + unsigned long offset = ((unsigned long)page) -
> >> + ((unsigned long)(vmemmap));
> >
> > Isn't this basically page_to_pfn()? Can we use it here?
>
> No, as that does direct subtraction of the two pointers. Our 'page'
> here is not guaranteed to be aligned even to a struct page boundary.
Are you saying that it isn't PAGE_MASK (((unsigned long)page) & PAGE_MASK
== page) aligned? Or that it isn't sizeof(struct page) aligned?
> +unsigned long __meminit vmemmap_section_start(struct page *page)
...
> @@ -204,7 +209,7 @@ int __meminit vmemmap_populated(unsigned long start, int page_size)
> unsigned long end = start + page_size;
>
> for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page)))
> - if (pfn_valid(VMM_SECTION_PAGE(start)))
> + if (pfn_valid(vmemmap_section_start(start)))
> return 1;
If "start" is an "unsigned long", why is it being passed into a function
that takes a "struct page"?
I think the types are confusing me a bit.
-- Dave
* Re: [PATCH 1/4] vmemmap: remove excess debugging
2007-08-02 9:24 ` [PATCH 1/4] vmemmap: remove excess debugging Andy Whitcroft
@ 2007-08-02 19:18 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-08-02 19:18 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-arch, Christoph Hellwig,
Nick Piggin, Mel Gorman
Acked-by: Christoph Lameter <clameter@sgi.com>
* Re: [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file
2007-08-02 13:26 ` Christoph Hellwig
@ 2007-08-02 19:28 ` Christoph Lameter
2007-08-03 14:57 ` Andy Whitcroft
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-08-02 19:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andy Whitcroft, Andrew Morton, linux-mm, linux-arch, Nick Piggin,
Mel Gorman
On Thu, 2 Aug 2007, Christoph Hellwig wrote:
> On Thu, Aug 02, 2007 at 10:25:35AM +0100, Andy Whitcroft wrote:
> > + * Special Kconfig settings:
> > + *
> > + * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
> > + *
> > + * The architecture has its own functions to populate the memory
> > + * map and provides a vmemmap_populate function.
> > + *
> > + * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
?? Why was this added? The arch can populate the PMDs on its own already
if CONFIG_ARCH_SPARSEMEM_VMEMMAP is set.
> This is the kind of mess I mean. Which architectures set either of these
> and why? This code would be a lot more acceptable if we didn't have three
> different variants of the arch interface.
Initially at least my scheme was the following:
In general the sparsemem code can take care of a vmemmap that is using a
section of the vmalloc space. In that case no arch code is needed to
populate the vmemmap. Typical use is by arches with large pages (like
IA64). This is the default if no other options are set and can simply be
enabled by defining some constants in the arch code to reserve a section
of the vmalloc space.
Then there is the option of using the PMD to map a higher order page. This
can also be done transparently and is used by arches that have this
capability and a small page size. Those arches also require no additional
code to populate their vmemmap. This is true f.e. for i386 and x86_64.
These have to set CONFIG_ARCH_SUPPORTS_PMD_MAPPING
Then there are arches that have the vmemmap in non standard ways. Memory
may not be taken from the vmalloc space, special flags may have to be set
for the page tables (or one may use a different mechanism for mapping).
Those arches have to set CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP. In that
case the arch must provide its own function to populate the memory map.
* Re: [PATCH 4/4] vmemmap ppc64: convert VMM_* macros to a real function
2007-08-02 16:31 ` Dave Hansen
2007-08-02 17:39 ` Andy Whitcroft
@ 2007-08-02 19:30 ` Christoph Lameter
1 sibling, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-08-02 19:30 UTC (permalink / raw)
To: Dave Hansen
Cc: Andy Whitcroft, Andrew Morton, linux-mm, linux-arch,
Christoph Hellwig, Nick Piggin, Mel Gorman
On Thu, 2 Aug 2007, Dave Hansen wrote:
> On Thu, 2007-08-02 at 10:25 +0100, Andy Whitcroft wrote:
> >
> > +unsigned long __meminit vmemmap_section_start(struct page *page)
> > +{
> > + unsigned long offset = ((unsigned long)page) -
> > + ((unsigned long)(vmemmap));
>
> Isn't this basically page_to_pfn()? Can we use it here?
Nope. He cast page to long.
It's equivalent to
page_to_pfn(page) * sizeof(struct page)
* Re: [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file
2007-08-02 19:28 ` Christoph Lameter
@ 2007-08-03 14:57 ` Andy Whitcroft
2007-08-03 16:58 ` Christoph Lameter
0 siblings, 1 reply; 14+ messages in thread
From: Andy Whitcroft @ 2007-08-03 14:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Lameter, Andrew Morton, linux-mm, linux-arch,
Nick Piggin, Mel Gorman
Christoph Lameter wrote:
> On Thu, 2 Aug 2007, Christoph Hellwig wrote:
>
>> On Thu, Aug 02, 2007 at 10:25:35AM +0100, Andy Whitcroft wrote:
>>> + * Special Kconfig settings:
>>> + *
>>> + * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP
>>> + *
>>> + * The architecture has its own functions to populate the memory
>>> + * map and provides a vmemmap_populate function.
>>> + *
>>> + * CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
>
> ?? Why was this added? The arch can populate the PMDs on its own already
> if CONFIG_ARCH_SPARSEMEM_VMEMMAP is set.
The defines are essentially the ones which were in the V3 version of
VMEMMAP I picked up from you. They had slightly different names:
* CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP
* CONFIG_ARCH_SUPPORTS_PMD_MAPPING
The names were changed based on the PMD support not really being
generic, and to better describe what they did. We still have the
same three options.
>> This is the kind of mess I mean. Which architectures set either of these
>> and why? This code would be a lot more acceptable if we didn't have three
>> different variants of the arch interface.
>
> Initially at least my scheme was the following:
>
> In general the sparsemem code can take care of a vmemmap that is using a
> section of the vmalloc space. In that case no arch code is needed to
> populate the vmemmap. Typical use is by arches with large pages (like
> IA64). This is the default if no other options are set and can simply be
> enabled by defining some constants in the arch code to reserve a section
> of the vmalloc space.
>
> Then there is the option of using the PMD to map a higher order page. This
> can also be done transparently and is used by arches that have this
> capability and a small page size. Those arches also require no additional
> code to populate their vmemmap. This is true f.e. for i386 and x86_64.
> These have to set CONFIG_ARCH_SUPPORTS_PMD_MAPPING
>
> Then there are arches that have the vmemmap in non standard ways. Memory
> may not be taken from the vmalloc space, special flags may have to be set
> for the page tables (or one may use a different mechanism for mapping).
> Those arches have to set CONFIG_ARCH_POPULATES_VIRTUAL_MEMMAP. In that
> case the arch must provide its own function to populate the memory map.
Yes, in my view there are two parts to VMEMMAP. There is the runtime side,
which is common to all architectures, using the single virtually mapped
mem_map with the simple addition accessors; none of these options alter
the post-init behaviour of VMEMMAP. Then there is the initialisation
side; all of the configuration options here pertain to how that
initialisation is done.
There are three basic options:
1) it can be completely generic in that we use base pages mapped using
regular PTEs, or
2) the architecture can supply a PMD level initialiser, or
3) the architecture can supply a vmemmap_populate initialiser which
instantiates the entire map.
As the PMD initialisers are only used by x86_64 we could make it supply
a complete vmemmap_populate level initialiser but that would result in
us duplicating the PUD level initialiser function there, which seems like
a bad idea.
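To tie those three options back to the code, the #ifdef layering in
mm/sparse-vmemmap.c above condenses to roughly the following (declarations
only, lifted from the patch; not a complete file):

#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP

#ifndef CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD
/* Option 1: fully generic; the PTE and PMD levels are built here. */
static int __meminit vmemmap_populate_pte(pmd_t *pmd, unsigned long addr,
						unsigned long end, int node);
int __meminit vmemmap_populate_pmd(pud_t *pud, unsigned long addr,
						unsigned long end, int node);
#endif /* !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP_PMD */

/*
 * Options 1 and 2: the generic PUD/PGD walk and the vmemmap_populate()
 * entry point live here; with option 2 the architecture supplies only
 * vmemmap_populate_pmd() itself.
 */
static int __meminit vmemmap_populate_pud(pgd_t *pgd, unsigned long addr,
						unsigned long end, int node);
int __meminit vmemmap_populate(struct page *start_page,
						unsigned long nr, int node);

#endif /* !CONFIG_ARCH_POPULATES_SPARSEMEM_VMEMMAP */

/* Option 3: the architecture provides vmemmap_populate() directly. */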
I have been looking at a rejig of the configuration options to make them
all positive, so that you only have to assert a single VMEMMAP_* define
to get the correct code. But that does not really get rid of the defines.
The code as it stands would allow for us to pull out the "PTE/PMD"
initialiser from the vmemmap code and allow the architecture to select
it as a helper, with it living in its own file again. But that seems excessive.
-apw
* Re: [PATCH 3/4] vmemmap: pull out the vmemmap code into its own file
2007-08-03 14:57 ` Andy Whitcroft
@ 2007-08-03 16:58 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-08-03 16:58 UTC (permalink / raw)
To: Andy Whitcroft
Cc: Christoph Hellwig, Andrew Morton, linux-mm, linux-arch,
Nick Piggin, Mel Gorman
On Fri, 3 Aug 2007, Andy Whitcroft wrote:
> As the PMD initialisers are only used by x86_64 we could make it supply
> a complete vmemmap_populate level initialiser but that would result in
> us duplicating the PUD level initialiser function there, which seems like
> a bad idea.
Hmmm... at least i386 also uses it. Looked through the other arches but
cannot find evidence of them supporting PMD level huge page stuff.
There are some embedded archs (example FRV) which seem to be i386 knock
offs and those also support the same in hardware. There is some
rudimentary PSE support in FRV. Has mk_pte_huge(). So I would expect that
at least i386, x86_64 and FRV would benefit from a generic implementation.