* [RFC PATCH 01/09] robust VM per_cpu core
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
@ 2006-05-17 9:56 ` Steven Rostedt
2006-05-17 9:17 ` Andi Kleen
2006-05-17 9:56 ` [RFC PATCH 01/09] robust VM per_cpu mm header update Steven Rostedt
` (8 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 9:56 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This is the VM per_cpu core patch. It includes the mm/percpu.c file
that is used to initialize and update per_cpu variables at startup
and module load.
To use this, the arch must define CONFIG_HAS_VM_PERCPU and
__ARCH_HAS_VM_PERCPU.
Also the following must be defined:
PERCPU_START - start of the percpu VM area
PERCPU_SIZE - size of the percpu VM area for each CPU so that the
total size would be PERCPU_SIZE * NR_CPUS
As well as the following three functions:
pud_t *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
int cpu);
pmd_t *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
int cpu);
pte_t *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
int cpu);
The above functions allocate page tables from bootmem, because the
percpu area is initialized right after setup_arch() in init/main.c.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/mm/percpu.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.16-test/mm/percpu.c 2006-05-17 04:39:52.000000000 -0400
@@ -0,0 +1,287 @@
+/*
+ * linux/mm/percpu.c
+ *
+ * Copyright (C) 2006 Steven Rostedt <rostedt@goodmis.org>
+ *
+ * Some of this code was influenced by mm/vmalloc.c
+ *
+ * The percpu variables need to always have the same offset from one CPU to
+ * the next no matter if the percpu variable is defined in the kernel or
+ * inside a module. So to guarantee that the offset is the same for both,
+ * they are mapped into virtual memory.
+ *
+ * Since the percpu variables are used before memory is initialized, the
+ * initial setup must be done with bootmem, and thus vmalloc code cannot be
+ * used.
+ *
+ * Credits:
+ * -------
+ * This goes to lots of people that inspired me on LKML, and responded to
+ * my first (horrible) implementation of robust per_cpu variables.
+ *
+ * Also many thanks to Rusty Russell for his generic per_cpu implementation.
+ */
+
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/highmem.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+
+#include <linux/bootmem.h>
+
+#include <asm/uaccess.h>
+#include <asm/tlbflush.h>
+
+static int __init percpu_boot_alloc(unsigned long addr, unsigned long size,
+ int node);
+
+/*
+ * percpu_allocated keeps track of the actual allocated memory. It
+ * always points to the page after the last page in VM that was allocated.
+ *
+ * Yes this is also a per_cpu variable :)
+ * It gets updated after the copies are made.
+ */
+static DEFINE_PER_CPU(unsigned long, percpu_allocated);
+
+static char * __init per_cpu_allocate_init(unsigned long size, int cpu)
+{
+ unsigned long addr;
+
+ addr = PERCPU_START+(cpu*PERCPU_SIZE);
+ BUG_ON(percpu_boot_alloc(addr, size, cpu));
+
+ return (char*)addr;
+}
+
+/**
+ * setup_per_cpu_areas - initialization of VM per_cpu variables
+ *
+ * Allocate pages in VM for the per_cpu variables
+ * of the kernel.
+ */
+void __init setup_per_cpu_areas(void)
+{
+ unsigned long size, i;
+ char *ptr;
+
+ /* Copy section for each CPU (we discard the original) */
+ size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
+
+ for (i = 0; i < NR_CPUS; i++) {
+ ptr = per_cpu_allocate_init(size, i);
+ memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+ wmb();
+ per_cpu(percpu_allocated, i) = PAGE_ALIGN((unsigned long)ptr + size);
+ }
+}
+
+static __init int percpu_boot_pte_alloc(pmd_t *pmd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pte_t *pte;
+
+ pte = pte_boot_alloc(&init_mm, pmd, addr, node);
+ if (!pte)
+ return -ENOMEM;
+ do {
+ void *page;
+ WARN_ON(!pte_none(*pte));
+ page = alloc_bootmem_pages(PAGE_SIZE);
+ if (!page)
+ return -ENOMEM;
+ set_pte_at(&init_mm, addr, pte, mk_pte(virt_to_page(page),
+ PAGE_KERNEL));
+ } while (pte++, addr += PAGE_SIZE, addr < end);
+ return 0;
+}
+
+static __init int percpu_boot_pmd_alloc(pud_t *pud, unsigned long addr,
+ unsigned long end, int node)
+{
+ pmd_t *pmd;
+ unsigned long next;
+
+ pmd = pmd_boot_alloc(&init_mm, pud, addr, node);
+ if (!pmd)
+ return -ENOMEM;
+ do {
+ next = pmd_addr_end(addr, end);
+ if (percpu_boot_pte_alloc(pmd, addr, next, node))
+ return -ENOMEM;
+ } while (pmd++, addr = next, addr < end);
+ return 0;
+}
+
+static __init int percpu_boot_pud_alloc(pgd_t *pgd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pud_t *pud;
+ unsigned long next;
+
+ pud = pud_boot_alloc(&init_mm, pgd, addr, node);
+ if (!pud)
+ return -ENOMEM;
+ do {
+ next = pud_addr_end(addr, end);
+ if (percpu_boot_pmd_alloc(pud, addr, next, node))
+ return -ENOMEM;
+ } while (pud++, addr = next, addr < end);
+ return 0;
+}
+
+static int __init percpu_boot_alloc(unsigned long addr, unsigned long size,
+ int node)
+{
+ pgd_t *pgd;
+ unsigned long end = addr + size;
+ unsigned long next;
+ int err;
+
+ pgd = pgd_offset_k(addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ err = percpu_boot_pud_alloc(pgd, addr, next, node);
+ if (err)
+ break;
+ } while (pgd++, addr = next, addr < end);
+ return err;
+}
+
+static __init int percpu_pte_alloc(pmd_t *pmd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pte_t *pte;
+
+ pte = pte_alloc_kernel(pmd, addr);
+ if (!pte)
+ return -ENOMEM;
+ do {
+ void *page;
+ if (unlikely(!pte_none(*pte))) {
+ printk("bad pte: %p->%p\n", pte, (void*)pte_val(*pte));
+ BUG();
+ return -EFAULT;
+ }
+ page = (void*)__get_free_page(GFP_KERNEL);
+ if (!page)
+ return -ENOMEM;
+ set_pte_at(&init_mm, addr, pte, mk_pte(virt_to_page(page),
+ PAGE_KERNEL));
+ } while (pte++, addr += PAGE_SIZE, addr < end);
+ __flush_tlb();
+ return 0;
+}
+
+static __init int percpu_pmd_alloc(pud_t *pud, unsigned long addr,
+ unsigned long end, int node)
+{
+ pmd_t *pmd;
+ unsigned long next;
+
+ pmd = pmd_alloc(&init_mm, pud, addr);
+ if (!pmd)
+ return -ENOMEM;
+ do {
+ next = pmd_addr_end(addr, end);
+ if (percpu_pte_alloc(pmd, addr, next, node))
+ return -ENOMEM;
+ } while (pmd++, addr = next, addr < end);
+ return 0;
+}
+
+static __init int percpu_pud_alloc(pgd_t *pgd, unsigned long addr,
+ unsigned long end, int node)
+{
+ pud_t *pud;
+ unsigned long next;
+
+ pud = pud_alloc(&init_mm, pgd, addr);
+ if (!pud)
+ return -ENOMEM;
+ do {
+ next = pud_addr_end(addr, end);
+ if (percpu_pmd_alloc(pud, addr, next, node))
+ return -ENOMEM;
+ } while (pud++, addr = next, addr < end);
+ return 0;
+}
+
+static int percpu_alloc(unsigned long addr, unsigned long size,
+ int node)
+{
+ pgd_t *pgd;
+ unsigned long end = addr + size;
+ unsigned long next;
+ int err;
+
+ pgd = pgd_offset_k(addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ err = percpu_pud_alloc(pgd, addr, next, node);
+ if (err)
+ break;
+ } while (pgd++, addr = next, addr < end);
+ return err;
+}
+
+static int percpu_module_update(void *pcpudst, unsigned long size, int cpu)
+{
+ int err = 0;
+ /*
+ * These two local variables are only used to keep the code
+ * looking simpler. Since this function is only called on
+ * module load, it's not time critical.
+ */
+ unsigned long needed_address = (unsigned long)
+ ((pcpudst) + __PERCPU_OFFSET_ADDRESS(cpu)+size);
+ unsigned long allocated = per_cpu(percpu_allocated, cpu);
+
+ if (allocated < needed_address) {
+ unsigned long alloc = needed_address - allocated;
+ err = percpu_alloc(allocated, alloc, cpu);
+ if (!err)
+ per_cpu(percpu_allocated, cpu) = PAGE_ALIGN(needed_address);
+ }
+ return err;
+}
+
+/**
+ * percpu_modcopy - copy and allocate module VM per_cpu variables
+ *
+ * @pcpudst: Destination of module per_cpu section
+ * @src: Source of module per_cpu data section
+ * @size: Size of module per_cpu data section
+ *
+ * Copy the module's data per_cpu section into each VM per_cpu section
+ * stored in the kernel. If need be, allocate more pages in VM
+ * if they are not yet allocated.
+ *
+ * protected by module_mutex
+ */
+int percpu_modcopy(void *pcpudst, void *src, unsigned long size)
+{
+ unsigned int i;
+ int err = 0;
+
+ for (i = 0; i < NR_CPUS; i++)
+ if (cpu_possible(i)) {
+ err = percpu_module_update(pcpudst, size, i);
+ if (err)
+ break;
+ memcpy((pcpudst)+__PERCPU_OFFSET_ADDRESS(i),
+ (src), (size));
+ }
+ return err;
+}
+
+/*
+ * We use the __per_cpu_start for the indexing of
+ * per_cpu variables, even in modules.
+ */
+EXPORT_SYMBOL(__per_cpu_start);
Index: linux-2.6.16-test/mm/Makefile
===================================================================
--- linux-2.6.16-test.orig/mm/Makefile	2006-05-17 04:32:27.000000000 -0400
+++ linux-2.6.16-test/mm/Makefile 2006-05-17 04:39:52.000000000 -0400
@@ -22,3 +22,4 @@ obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_SLAB) += slab.o
obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
obj-$(CONFIG_FS_XIP) += filemap_xip.o
+obj-$(CONFIG_HAS_VM_PERCPU) += percpu.o
\ No newline at end of file
^ permalink raw reply [flat|nested] 20+ messages in thread

* Re: [RFC PATCH 01/09] robust VM per_cpu core
2006-05-17 9:56 ` [RFC PATCH 01/09] robust VM per_cpu core Steven Rostedt
@ 2006-05-17 9:17 ` Andi Kleen
2006-05-17 10:46 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2006-05-17 9:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Martin Mares,
bjornw, schwidefsky, benedict.gaster, lethal, Chris Zankel,
Marc Gauthier, Joe Taylor, David Mosberger-Tang, rth, spyro,
starvik, tony.luck, linux-ia64, ralf, linux-mips, grundler,
parisc-linux, linuxppc-dev, linux390, davem, arnd, kenneth.w.chen,
sam, clameter, kiran
> As well as the following three functions:
>
> pud_t *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
> int cpu);
> pmd_t *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
> int cpu);
> pte_t *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
> int cpu);
I'm not sure you can just put them like this into generic code. Some
architectures are doing strange things with them.
And we already have boot_ioremap on some architectures. Why is that not
enough?
-Andi
* Re: [RFC PATCH 01/09] robust VM per_cpu core
2006-05-17 9:17 ` Andi Kleen
@ 2006-05-17 10:46 ` Steven Rostedt
2006-05-17 11:08 ` Andi Kleen
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 10:46 UTC (permalink / raw)
To: Andi Kleen
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Martin Mares,
bjornw, schwidefsky, lethal, Chris Zankel, Marc Gauthier,
Joe Taylor, rth, spyro, starvik, tony.luck, linux-ia64, ralf,
linux-mips, grundler, parisc-linux, linuxppc-dev, linux390, davem,
arnd, kenneth.w.chen, sam, clameter, kiran
On Wed, 17 May 2006, Andi Kleen wrote:
>
> > As well as the following three functions:
> >
> > pud_t *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
> > int cpu);
> > pmd_t *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
> > int cpu);
> > pte_t *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
> > int cpu);
>
> I'm not sure you can just put them like this into generic code. Some
> architectures are doing strange things with them.
Hmm, like what?
>
> And we already have boot_ioremap on some architectures. Why is that not
> enough?
I thought about using boot_ioremap, but it seems to be an abuse. Since
I'm not mapping io, but actual memory pages. So the solution to that
seemed more of a hack. I then would need to worry about grabbing pages
that were node specific and getting the physical addresses. It just
looked like a cleaner solution to have an API that was for exactly what it
was meant for.
-- Steve
* Re: [RFC PATCH 01/09] robust VM per_cpu core
2006-05-17 10:46 ` Steven Rostedt
@ 2006-05-17 11:08 ` Andi Kleen
2006-05-17 11:31 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2006-05-17 11:08 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Martin Mares,
bjornw, schwidefsky, lethal, Chris Zankel, Marc Gauthier,
Joe Taylor, rth, spyro, starvik, tony.luck, linux-ia64, ralf,
linux-mips, grundler, parisc-linux, linuxppc-dev, linux390, davem,
arnd, kenneth.w.chen, sam, clameter, kiran
On Wednesday 17 May 2006 12:46, Steven Rostedt wrote:
>
> On Wed, 17 May 2006, Andi Kleen wrote:
>
> >
> > > As well as the following three functions:
> > >
> > > pud_t *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
> > > int cpu);
> > > pmd_t *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
> > > int cpu);
> > > pte_t *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
> > > int cpu);
> >
> > I'm not sure you can just put them like this into generic code. Some
> > architectures are doing strange things with them.
>
> Hmm, like what?
Mostly managing their software TLBs I think.
> >
> > And we already have boot_ioremap on some architectures. Why is that not
> > enough?
>
> I thought about using boot_ioremap, but it seems to be an abuse. Since
> I'm not mapping io, but actual memory pages.
We already use it for memory, e.g. for mapping some BIOS tables.
> So the solution to that
> seemed more of a hack. I then would need to worry about grabbing pages
> that were node specific
alloc_bootmem_node
> and getting the physical addresses.
virt_to_phys()
[ + hacks to handle 32bit NUMA unfortunately ]
-Andi
* Re: [RFC PATCH 01/09] robust VM per_cpu core
2006-05-17 11:08 ` Andi Kleen
@ 2006-05-17 11:31 ` Steven Rostedt
0 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 11:31 UTC (permalink / raw)
To: Andi Kleen
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Martin Mares,
bjornw, schwidefsky, lethal, Chris Zankel, Marc Gauthier,
Joe Taylor, rth, spyro, starvik, tony.luck, linux-ia64, ralf,
linux-mips, grundler, parisc-linux, linuxppc-dev, linux390, davem,
arnd, kenneth.w.chen, sam, clameter, kiran
On Wed, 17 May 2006, Andi Kleen wrote:
> On Wednesday 17 May 2006 12:46, Steven Rostedt wrote:
> >
> > On Wed, 17 May 2006, Andi Kleen wrote:
> >
> > >
> > > > As well as the following three functions:
> > > >
> > > > pud_t *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
> > > > int cpu);
> > > > pmd_t *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
> > > > int cpu);
> > > > pte_t *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
> > > > int cpu);
> > >
> > > I'm not sure you can just put them like this into generic code. Some
> > > architectures are doing strange things with them.
> >
> > Hmm, like what?
>
> Mostly managing their software TLBs I think.
Wait. "into generic code"? Do you mean that we can't use them in generic
code? They are not defined there, but are defined in the arch. They are
just used like the p*_alloc (without boot) equivalents (from vmalloc.c).
>
> > >
> > > And we already have boot_ioremap on some architectures. Why is that not
> > > enough?
> >
> > I thought about using boot_ioremap, but it seems to be an abuse. Since
> > I'm not mapping io, but actual memory pages.
>
> We already use it for memory, e.g. for mapping some BIOS tables.
>
> > So the solution to that
> > seemed more of a hack. I then would need to worry about grabbing pages
> > that were node specific
>
> alloc_bootmem_node
>
> > and getting the physical addresses.
>
> virt_to_phys()
>
> [ + hacks to handle 32bit NUMA unfortunately ]
With the archs defining their own p*_boot_alloc functions, there shouldn't
be any hacks. The arch can figure out what to do. I didn't want any hacks
in the generic code.
-- Steve
* [RFC PATCH 01/09] robust VM per_cpu mm header update
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
2006-05-17 9:56 ` [RFC PATCH 01/09] robust VM per_cpu core Steven Rostedt
@ 2006-05-17 9:56 ` Steven Rostedt
2006-05-17 9:57 ` [RFC PATCH 03/09] robust VM per_cpu generic header Steven Rostedt
` (7 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 9:56 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch declares the three functions needed by the archs to
implement the percpu VM.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/include/linux/mm.h
===================================================================
--- linux-2.6.16-test.orig/include/linux/mm.h	2006-05-17 04:32:27.000000000 -0400
+++ linux-2.6.16-test/include/linux/mm.h 2006-05-17 04:56:52.000000000 -0400
@@ -795,6 +795,15 @@ int __pmd_alloc(struct mm_struct *mm, pu
int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address);
int __pte_alloc_kernel(pmd_t *pmd, unsigned long address);
+#ifdef CONFIG_HAS_VM_PERCPU
+pud_t *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long addr,
+ int cpu);
+pmd_t *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud, unsigned long addr,
+ int cpu);
+pte_t *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
+ int cpu);
+#endif
+
/*
* The following ifdef needed to get the 4level-fixup.h header to work.
* Remove it when 4level-fixup.h has been removed.
* [RFC PATCH 03/09] robust VM per_cpu generic header
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
2006-05-17 9:56 ` [RFC PATCH 01/09] robust VM per_cpu core Steven Rostedt
2006-05-17 9:56 ` [RFC PATCH 01/09] robust VM per_cpu mm header update Steven Rostedt
@ 2006-05-17 9:57 ` Steven Rostedt
2006-05-17 9:58 ` [RFC PATCH 04/09] robust VM per_cpu main startup Steven Rostedt
` (6 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 9:57 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch adds the VM per_cpu to the generic per_cpu.h header.
If __ARCH_HAS_VM_PERCPU is defined, it is expected that the arch
also defines the following:
PERCPU_START - start of the VM area where per_cpu variables will be stored.
PERCPU_SIZE - size of the VM area for each CPU. So the total size
would be PERCPU_SIZE * NR_CPUS
If __ARCH_HAS_VM_PERCPU is not defined, it falls back to the old
percpu hack.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/include/asm-generic/percpu.h
===================================================================
--- linux-2.6.16-test.orig/include/asm-generic/percpu.h	2006-05-17 04:32:27.000000000 -0400
+++ linux-2.6.16-test/include/asm-generic/percpu.h 2006-05-17 04:57:21.000000000 -0400
@@ -5,25 +5,52 @@
#define __GENERIC_PER_CPU
#ifdef CONFIG_SMP
-extern unsigned long __per_cpu_offset[NR_CPUS];
-
/* Separate out the type, so (int[3], foo) works. */
#define DEFINE_PER_CPU(type, name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
+#ifdef __ARCH_HAS_VM_PERCPU
+
+#include <asm/sections.h>
+
+/*
+ * This is included in linux/percpu.h and if PERCPU_ENOUGH_ROOM is already
+ * defined, it won't overwrite it.
+ * This allows kernel/module.c to be the same for both archs with VM
+ * per_cpu and without.
+ */
+#define PERCPU_ENOUGH_ROOM PERCPU_SIZE
+
+#define __PERCPU_OFFSET_ADDRESS(i) ((PERCPU_START+PERCPU_SIZE*(i)) - \
+ (unsigned long)__per_cpu_start)
+
+extern void setup_per_cpu_areas (void);
+extern int percpu_modcopy(void *pcpudst, void *src, unsigned long size);
+
+#else /* !__ARCH_HAS_VM_PERCPU */
+
+extern unsigned long __per_cpu_offset[NR_CPUS];
+
+#define __PERCPU_OFFSET_ADDRESS(i) __per_cpu_offset[i]
+
+/* A macro to avoid #include hell... */
+#define percpu_modcopy(pcpudst, src, size) \
+({ \
+ unsigned int __i; \
+ for (__i = 0; __i < NR_CPUS; __i++) \
+ if (cpu_possible(__i)) \
+ memcpy((pcpudst)+__PERCPU_OFFSET_ADDRESS(__i), \
+ (src), (size)); \
+ 0; \
+})
+
+#endif /* __ARCH_HAS_VM_PERCPU */
+
/* var is in discarded region: offset to particular copy we want */
-#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
+#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, \
+ __PERCPU_OFFSET_ADDRESS(cpu)))
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
-/* A macro to avoid #include hell... */
-#define percpu_modcopy(pcpudst, src, size) \
-do { \
- unsigned int __i; \
- for (__i = 0; __i < NR_CPUS; __i++) \
- if (cpu_possible(__i)) \
- memcpy((pcpudst)+__per_cpu_offset[__i], \
- (src), (size)); \
-} while (0)
#else /* ! SMP */
#define DEFINE_PER_CPU(type, name) \
* [RFC PATCH 04/09] robust VM per_cpu main startup
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (2 preceding siblings ...)
2006-05-17 9:57 ` [RFC PATCH 03/09] robust VM per_cpu generic header Steven Rostedt
@ 2006-05-17 9:58 ` Steven Rostedt
2006-05-17 9:59 ` [RFC PATCH 05/09] robust VM per_cpu module Steven Rostedt
` (5 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 9:58 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch disables the generic setup if __ARCH_HAS_VM_PERCPU is defined.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/init/main.c
===================================================================
--- linux-2.6.16-test.orig/init/main.c	2006-05-17 04:32:28.000000000 -0400
+++ linux-2.6.16-test/init/main.c 2006-05-17 04:57:45.000000000 -0400
@@ -324,7 +324,7 @@ static inline void smp_prepare_cpus(unsi
#else
-#ifdef __GENERIC_PER_CPU
+#if defined(__GENERIC_PER_CPU) && !defined(__ARCH_HAS_VM_PERCPU)
unsigned long __per_cpu_offset[NR_CPUS];
EXPORT_SYMBOL(__per_cpu_offset);
* [RFC PATCH 05/09] robust VM per_cpu module
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (3 preceding siblings ...)
2006-05-17 9:58 ` [RFC PATCH 04/09] robust VM per_cpu main startup Steven Rostedt
@ 2006-05-17 9:59 ` Steven Rostedt
2006-05-17 10:00 ` [RFC PATCH 06/09] robust VM per_cpu i386 bootmem Steven Rostedt
` (4 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 9:59 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch checks the return value of percpu_modcopy on module load.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/kernel/module.c
===================================================================
--- linux-2.6.16-test.orig/kernel/module.c	2006-05-17 04:32:28.000000000 -0400
+++ linux-2.6.16-test/kernel/module.c 2006-05-17 04:57:53.000000000 -0400
@@ -1819,8 +1819,11 @@ static struct module *load_module(void _
sort_extable(extable, extable + mod->num_exentries);
/* Finally, copy percpu area over. */
- percpu_modcopy(mod->percpu, (void *)sechdrs[pcpuindex].sh_addr,
- sechdrs[pcpuindex].sh_size);
+ err = percpu_modcopy(mod->percpu, (void *)sechdrs[pcpuindex].sh_addr,
+ sechdrs[pcpuindex].sh_size);
+
+ if (err < 0)
+ goto cleanup;
add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
* [RFC PATCH 06/09] robust VM per_cpu i386 bootmem
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (4 preceding siblings ...)
2006-05-17 9:59 ` [RFC PATCH 05/09] robust VM per_cpu module Steven Rostedt
@ 2006-05-17 10:00 ` Steven Rostedt
2006-05-17 10:01 ` [RFC PATCH 07/09] robust VM per_cpu i386 VM area Steven Rostedt
` (3 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 10:00 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch is what I used to get my VM percpu working on my laptop. It
still needs work to handle NUMA and other types of x86 architectures.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/arch/i386/mm/init.c
===================================================================
--- linux-2.6.16-test.orig/arch/i386/mm/init.c	2006-05-17 04:32:28.000000000 -0400
+++ linux-2.6.16-test/arch/i386/mm/init.c 2006-05-17 04:58:37.000000000 -0400
@@ -772,3 +772,39 @@ void free_initrd_mem(unsigned long start
}
}
#endif
+
+/*
+ * The following three functions are to implement per_cpu variables
+ * into VM. per_cpu variables are initialized very early on startup
+ * and before memory management. So the per_cpu initialization needs
+ * a way to allocate pages using bootmem.
+ */
+pud_t __init *pud_boot_alloc(struct mm_struct *mm, pgd_t *pgd,
+ unsigned long addr, int cpu)
+{
+ return (pud_t*)pgd;
+}
+
+pmd_t __init *pmd_boot_alloc(struct mm_struct *mm, pud_t *pud,
+ unsigned long addr, int cpu)
+{
+ return pmd_offset(pud, addr);
+}
+
+/* FIXME - handle NUMA handling with the CPU parameter */
+pte_t __init *pte_boot_alloc(struct mm_struct *mm, pmd_t *pmd,
+ unsigned long addr, int cpu)
+{
+ pte_t *pte;
+
+ if (pmd_none(*pmd)) {
+ pte = alloc_bootmem_pages(PAGE_SIZE);
+ if (!pte)
+ return NULL;
+ mm->nr_ptes++;
+ set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+ } else
+ pte = pte_offset_kernel(pmd, addr);
+
+ return pte;
+}
* [RFC PATCH 07/09] robust VM per_cpu i386 VM area
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (5 preceding siblings ...)
2006-05-17 10:00 ` [RFC PATCH 06/09] robust VM per_cpu i386 bootmem Steven Rostedt
@ 2006-05-17 10:01 ` Steven Rostedt
2006-05-17 10:01 ` [RFC PATCH 08/09] robust VM per_cpu i386 header Steven Rostedt
` (2 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 10:01 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch allocates the percpu VM area using the fixmap addresses.
It currently defines 1 MB per CPU.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/include/asm-i386/fixmap.h
===================================================================
--- linux-2.6.16-test.orig/include/asm-i386/fixmap.h	2006-05-17 04:32:27.000000000 -0400
+++ linux-2.6.16-test/include/asm-i386/fixmap.h 2006-05-17 04:59:34.000000000 -0400
@@ -32,6 +32,10 @@
#include <asm/kmap_types.h>
#endif
+/* One meg per cpu of VM space */
+#define PERCPU_PAGES 256
+#define PERCPU_SIZE (PERCPU_PAGES << PAGE_SHIFT)
+
/*
* Here we define all the compile-time 'special' virtual
* addresses. The point is to have a constant address at
@@ -83,6 +87,8 @@ enum fixed_addresses {
#ifdef CONFIG_PCI_MMCONFIG
FIX_PCIE_MCFG,
#endif
+ FIX_PERCPU_BEGIN,
+ FIX_PERCPU_END = FIX_PERCPU_BEGIN+(PERCPU_PAGES*NR_CPUS)-1,
__end_of_permanent_fixed_addresses,
/* temporary boot-time mappings, used before ioremap() is functional */
#define NR_FIX_BTMAPS 16
* [RFC PATCH 08/09] robust VM per_cpu i386 header
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (6 preceding siblings ...)
2006-05-17 10:01 ` [RFC PATCH 07/09] robust VM per_cpu i386 VM area Steven Rostedt
@ 2006-05-17 10:01 ` Steven Rostedt
2006-05-17 10:01 ` [RFC PATCH 09/09] robust VM per_cpu i386 Kconfig update Steven Rostedt
2006-05-17 14:52 ` [RFC PATCH 00/09] robust VM per_cpu variables Christoph Lameter
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 10:01 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch adds the __ARCH_HAS_VM_PERCPU define to i386 and defines
the PERCPU_START macro.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/include/asm-i386/percpu.h
===================================================================
--- linux-2.6.16-test.orig/include/asm-i386/percpu.h	2006-05-17 04:32:27.000000000 -0400
+++ linux-2.6.16-test/include/asm-i386/percpu.h 2006-05-17 05:00:00.000000000 -0400
@@ -1,6 +1,16 @@
#ifndef __ARCH_I386_PERCPU__
#define __ARCH_I386_PERCPU__
+#ifdef CONFIG_HAS_VM_PERCPU
+#define __ARCH_HAS_VM_PERCPU
+#include <asm/fixmap.h>
+
+/*
+ * Virtual address space for the percpu area.
+ */
+#define PERCPU_START (__fix_to_virt(FIX_PERCPU_END))
+#endif /* CONFIG_HAS_VM_PERCPU */
+
#include <asm-generic/percpu.h>
#endif /* __ARCH_I386_PERCPU__ */
* [RFC PATCH 09/09] robust VM per_cpu i386 Kconfig update
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (7 preceding siblings ...)
2006-05-17 10:01 ` [RFC PATCH 08/09] robust VM per_cpu i386 header Steven Rostedt
@ 2006-05-17 10:01 ` Steven Rostedt
2006-05-17 14:52 ` [RFC PATCH 00/09] robust VM per_cpu variables Christoph Lameter
9 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 10:01 UTC (permalink / raw)
To: LKML
Cc: Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, clameter, kiran
This patch forces CONFIG_HAS_VM_PERCPU to be defined for i386.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Index: linux-2.6.16-test/arch/i386/Kconfig
===================================================================
--- linux-2.6.16-test.orig/arch/i386/Kconfig 2006-05-17 04:32:27.000000000 -0400
+++ linux-2.6.16-test/arch/i386/Kconfig 2006-05-17 05:00:10.000000000 -0400
@@ -1116,3 +1116,7 @@ config X86_TRAMPOLINE
config KTIME_SCALAR
bool
default y
+
+config HAS_VM_PERCPU
+ bool
+ default y
\ No newline at end of file
* Re: [RFC PATCH 00/09] robust VM per_cpu variables
2006-05-17 9:54 [RFC PATCH 00/09] robust VM per_cpu variables Steven Rostedt
` (8 preceding siblings ...)
2006-05-17 10:01 ` [RFC PATCH 09/09] robust VM per_cpu i386 Kconfig update Steven Rostedt
@ 2006-05-17 14:52 ` Christoph Lameter
2006-05-17 15:18 ` Steven Rostedt
9 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2006-05-17 14:52 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, kiran
On Wed, 17 May 2006, Steven Rostedt wrote:
> My first attempt to fix this introduced another dereference, to allow
> for modules to allocate their own memory. This was quickly shot down,
> and for good reason, because dereferences kill performance, and don't
> play nice with large SMP systems that depend on per_cpu being fast.
> I now place the per_cpu variables into VM, such that the pages are
> only allocated when needed. All the architecture needs to do is
> supply a VM address range, size for each CPU to use (note this
> implementation expects all the VM CPU areas to be together), and
> three functions to allow for allocating page tables at bootup.
So now instead of an explicit indirection we use an implicit one
through the page tables for this. This happens during early boot which
requires additional page table functions? And it requires the use of an
additional TLB entry? I guess that the additional TLB pressure alone will
result in a performance drop of 3%?
See http://www.gelato.unsw.edu.au/archives/linux-ia64/0602/17311.html
* Re: [RFC PATCH 00/09] robust VM per_cpu variables
2006-05-17 14:52 ` [RFC PATCH 00/09] robust VM per_cpu variables Christoph Lameter
@ 2006-05-17 15:18 ` Steven Rostedt
2006-05-17 15:49 ` Christoph Lameter
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 15:18 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, kiran
Hi Christoph,
Thanks for replying!
On Wed, 17 May 2006, Christoph Lameter wrote:
> On Wed, 17 May 2006, Steven Rostedt wrote:
>
> > My first attempt to fix this introduced another dereference, to allow
> > for modules to allocate their own memory. This was quickly shot down,
> > and for good reason, because dereferences kill performance, and don't
> > play nice with large SMP systems that depend on per_cpu being fast.
>
> > I now place the per_cpu variables into VM, such that the pages are
> > only allocated when needed. All the architecture needs to do is
> > supply a VM address range, size for each CPU to use (note this
> > implementation expects all the VM CPU areas to be together), and
> > three functions to allow for allocating page tables at bootup.
>
> So now instead of an explicit indirection we use an implicit one
> through the page tables for this. This happens during early boot which
> requires additional page table functions? And it requires the use of an
> additional TLB entry? I guess that the additional TLB pressure alone will
> result in a performance drop of 3%?
Ouch!
>
> See http://www.gelato.unsw.edu.au/archives/linux-ia64/0602/17311.html
Thanks for the link.
Hmm, my main goal is still to make the per_cpu more robust, so that the
generic code is truly that, and the hacks are better managed. Would the
TLB pressure on a normal desktop also cause the drop in performance? I
haven't tried any benchmarks. Have any tests I can run on two kernels?
I'm currently running my machine with the patches and I haven't noticed
a difference. Although I'm not doing database work, I'm still compiling
kernels.
Reason I'm asking, is that I wonder if the whole VM idea is a waste, or is
it only a problem on certain archs?
Perhaps move the whole of percpu_boot_alloc into the arch, and let it do
the allocation as is. Could perhaps use some arch specific register to
calculate the entries.
OK, now I'm just rambling. I don't know, have any other ideas on making
this more robust? Or is this all in vain, and I should spend my evenings
walking around this beautiful town of Karlsruhe ;)
-- Steve
* Re: [RFC PATCH 00/09] robust VM per_cpu variables
2006-05-17 15:18 ` Steven Rostedt
@ 2006-05-17 15:49 ` Christoph Lameter
2006-05-17 15:56 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2006-05-17 15:49 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, kiran
On Wed, 17 May 2006, Steven Rostedt wrote:
> OK, now I'm just rambling. I don't know, have any other ideas on making
> this more robust? Or is this all in vain, and I should spend my evenings
> walking around this beautiful town of Karlsruhe ;)
Well, I'd like to see a comprehensive solution including a fix for the
problems with alloc_percpu() allocations (alloc_percpu has to allocate
memory for potential processors... which could be a lot on
some types of systems, and it's allocated somewhere not on the nodes of the
processors since they may not yet be online).
Wish I could be back home in Germany to take a walk with you. Are you
coming to the OLS?
* Re: [RFC PATCH 00/09] robust VM per_cpu variables
2006-05-17 15:49 ` Christoph Lameter
@ 2006-05-17 15:56 ` Steven Rostedt
2006-05-17 17:40 ` Christoph Lameter
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2006-05-17 15:56 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, kiran
On Wed, 17 May 2006, Christoph Lameter wrote:
> On Wed, 17 May 2006, Steven Rostedt wrote:
>
> > OK, now I'm just rambling. I don't know, have any other ideas on making
> > this more robust? Or is this all in vain, and I should spend my evenings
> > walking around this beautiful town of Karlsruhe ;)
>
> Well, I'd like to see a comprehensive solution including a fix for the
> problems with alloc_percpu() allocations (alloc_percpu has to allocate
> memory for potential processors... which could be a lot on
> some types of systems, and it's allocated somewhere not on the nodes of the
> processors since they may not yet be online).
OK, now you're beyond what I'm working with ;) No hot plug CPUs for me.
Well, at least not yet!
>
> Wish I could be back home in Germany to talk a walk with you. Are you
> coming to the OLS?
I'm just here on business. Will be back home in the States on Saturday.
Yep, I'll be at OLS. Hopefully we can get a group together to do some
brainstorming.
Thanks,
-- Steve
* Re: [RFC PATCH 00/09] robust VM per_cpu variables
2006-05-17 15:56 ` Steven Rostedt
@ 2006-05-17 17:40 ` Christoph Lameter
2006-05-18 7:00 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2006-05-17 17:40 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, kiran
On Wed, 17 May 2006, Steven Rostedt wrote:
> > Well, I'd like to see a comprehensive solution including a fix for the
> > problems with alloc_percpu() allocations (alloc_percpu has to allocate
> > memory for potential processors... which could be a lot on
> > some types of systems, and it's allocated somewhere not on the nodes of the
> > processors since they may not yet be online).
>
> OK, now you're beyond what I'm working with ;) No hot plug CPUs for me.
> Well, at least not yet!
You need to at least consider how this could be handled by the per_cpu
memory management. The VM thingie with dynamic per-cpu memory would allow
a fixup of alloc_percpu.
* Re: [RFC PATCH 00/09] robust VM per_cpu variables
2006-05-17 17:40 ` Christoph Lameter
@ 2006-05-18 7:00 ` Steven Rostedt
0 siblings, 0 replies; 20+ messages in thread
From: Steven Rostedt @ 2006-05-18 7:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: LKML, Rusty Russell, Paul Mackerras, Nick Piggin, Andrew Morton,
Linus Torvalds, Ingo Molnar, Thomas Gleixner, Andi Kleen,
Martin Mares, bjornw, schwidefsky, benedict.gaster, lethal,
Chris Zankel, Marc Gauthier, Joe Taylor, David Mosberger-Tang,
rth, spyro, starvik, tony.luck, linux-ia64, ralf, linux-mips,
grundler, parisc-linux, linuxppc-dev, linux390, davem, arnd,
kenneth.w.chen, sam, kiran
On Wed, 17 May 2006, Christoph Lameter wrote:
> On Wed, 17 May 2006, Steven Rostedt wrote:
>
> > > Well I'd like to see a comprehensive solution including a fix for the
> > > problems with allocper_cpu() allocations (allocper_cpu has to allocate
> > > memory for potential processors... which could be a lot on
> > > some types of systems and its allocated somewhere not on the nodes of the
> > > processor since they may not yet be online).
> >
> > OK, now you're beyond what I'm working with ;) No hot plug CPUs for me.
> > Well, at least not yet!
>
> You need to at least consider how this could be handled by the per_cpu
> memory management. The VM thingie with dynamic per-cpu memory would allow
> a fixup of alloc_percpu.
>
Last night, while aimlessly wandering the streets of Karlsruhe, I thought
of some ideas. Maybe not very good ideas, but ideas nevertheless.
How about a hybrid? Have the normal in kernel code use what is there
today, with the indirection. Have modules and add on CPUs use an
allocated vm area.
Here's the thought:
Have the boot-time per_cpu variables (BTPCV) allocated like they are today,
but make sure that they are page aligned.
Store away the initial values of per_cpu variables (PCV), for systems
with hot pluggable CPUs.
For modules, have a dedicated VM area. The assumption is that the
modules will not have more PCVs than the kernel. Hopefully the PCVs
of a module will not strain the TLB too much. As long as the module's
PCVs are separated per CPU the same as the BTPCVs, this will work.
We can even use the extra space that was added in the alignment,
if the module's PCV section is small enough to fit.
Now for allocating PCVs for a hot-plugged CPU. We can dynamically
allocate them when the CPU is brought up, and copy in the saved BTPCV.
The hotplug CPU handling might be hard to work with the module handling,
but both should be simple by themselves. So if we concentrate on just the
hotplug first, then this might actually benefit you.
So, allocate a page-aligned BTPCV for each online CPU and copy the
section into it. Keep the initial section around.
When a CPU comes online, allocate the memory for the PCV in VM and copy
the saved BTPCV into it. Then have the indirect pointer array (or CPU
private register/variable) point to this section. It only puts strain on
the TLB of the newly online CPU, but it gives us the option of placing the
PCV into memory that we want (NUMA friendly).
So I should forget about the modules for now, and get a hotplug PCV
solution working. All the archs would need to do is to give a VM address
where to store these variables. Hmm...
Thoughts?
-- Steve