* [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
@ 2009-05-14 12:49 Tejun Heo
2009-05-14 12:49 ` [PATCH 1/4] x86: prepare setup_pcpu_remap() for pageattr fix Tejun Heo
` (7 more replies)
0 siblings, 8 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 12:49 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Hello,
Upon ack, please pull from the following git tree.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-pageattr
This patchset fixes a subtle bug in pageattr handling when the remap
percpu first chunk allocator is in use, and implements the percpu_alloc
kernel parameter so that the allocator can be selected from the boot
prompt.
This problem was spotted by Jan Beulich.
The remap allocator allocates a PMD page per cpu, returns whatever is
unnecessary to the page allocator and remaps the PMD page into the
vmalloc area to construct the first percpu chunk. This is done to
take advantage of large page mappings. However, this creates active
aliases for the recycled pages. When some user allocates the recycled
pages and tries to change pageattr on them, the remapped PMD alias
might end up with different attributes from the regular page-mapped
addresses, which, according to Andi, can lead to subtle data
corruption.
A similar problem exists for the high mapped area of the x86_64
kernel; it is handled by detecting the alias, splitting up the high
map PMD and applying the same attributes there. This patchset
implements the same workaround for the remapped first chunk.
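The aliasing hazard can be illustrated with a toy model (plain Python,
not kernel code; all names and attribute values here are made up):
page attributes live per virtual mapping, so an aliased physical page
ends up with conflicting attributes unless an attribute change is
propagated to every alias, which is what the fix in this patchset
arranges for the remapped PMD pages.

```python
# Toy model of virtual->physical aliasing. Attributes are tracked per
# *virtual* mapping, so an aliased physical page can end up with
# conflicting attributes unless changes are propagated to all aliases.

class AddressSpace:
    def __init__(self):
        self.mappings = {}  # vaddr -> (pfn, attr)

    def map(self, vaddr, pfn, attr="WB"):
        self.mappings[vaddr] = (pfn, attr)

    def set_attr(self, vaddr, attr, fix_aliases=False):
        pfn, _ = self.mappings[vaddr]
        self.mappings[vaddr] = (pfn, attr)
        if fix_aliases:
            # what the pageattr fix does conceptually: find every
            # alias of pfn and apply the same attributes there
            for va, (p, _) in self.mappings.items():
                if p == pfn:
                    self.mappings[va] = (p, attr)

    def attrs_of_pfn(self, pfn):
        return {a for p, a in self.mappings.values() if p == pfn}

aspace = AddressSpace()
aspace.map(0x1000, pfn=42)          # regular 1:1 mapping
aspace.map(0xF000, pfn=42)          # remapped PMD alias, same pfn
aspace.set_attr(0x1000, "UC")       # naive change: alias goes stale
assert aspace.attrs_of_pfn(42) == {"UC", "WB"}   # conflicting aliases
aspace.set_attr(0x1000, "UC", fix_aliases=True)  # with alias fixup
assert aspace.attrs_of_pfn(42) == {"UC"}         # consistent again
```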
Also, it's still unclear whether the remap allocator is worth the
trouble. Andi thinks it would be better to simply use 4k mappings on
NUMA machines, as PMD TLB entries are a much more precious resource
than PTE TLB entries. To ease testing, this patchset implements the
percpu_alloc kernel parameter, which allows explicitly selecting which
allocator to use.
This patchset contains the following four patches.
0001-x86-prepare-setup_pcpu_remap-for-pageattr-fix.patch
0002-x86-simplify-cpa_process_alias.patch
0003-x86-fix-pageattr-handling-for-remap-percpu-allocato.patch
0004-x86-implement-percpu_alloc-kernel-parameter.patch
0001-0002 are preparations for the pageattr fix. 0003 fixes the
pageattr bug. 0004 implements the percpu_alloc kernel parameter.
This patchset is on top of the current linux-2.6#master
(210af919c949a7d6bd330916ef376cec2907d81e) and contains the following
changes.
Documentation/kernel-parameters.txt | 6 +
arch/x86/include/asm/percpu.h | 9 +
arch/x86/kernel/setup_percpu.c | 186 ++++++++++++++++++++++++++----------
arch/x86/mm/pageattr.c | 59 +++++++----
mm/percpu.c | 13 +-
5 files changed, 197 insertions(+), 76 deletions(-)
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 1/4] x86: prepare setup_pcpu_remap() for pageattr fix
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
@ 2009-05-14 12:49 ` Tejun Heo
2009-05-14 12:49 ` [PATCH 2/4] x86: simplify cpa_process_alias() Tejun Heo
` (6 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 12:49 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Cc: Tejun Heo
Make the following changes in preparation for the coming pageattr
updates.
* Define and use an array of struct pcpur_ent instead of an array of
pointers. The only difference is the ->cpu field, which is set but
not used yet.
* Rename variables according to the above change.
* Rename the local variable vm to pcpur_vm and move it out of the
function.
[ Impact: no functional difference ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/setup_percpu.c | 58 ++++++++++++++++++++++-----------------
1 files changed, 33 insertions(+), 25 deletions(-)
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 3a97a4c..c17059c 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -137,8 +137,14 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
* better than only using 4k mappings while still being NUMA friendly.
*/
#ifdef CONFIG_NEED_MULTIPLE_NODES
+struct pcpur_ent {
+ unsigned int cpu;
+ void *ptr;
+};
+
static size_t pcpur_size __initdata;
-static void **pcpur_ptrs __initdata;
+static struct pcpur_ent *pcpur_map __initdata;
+static struct vm_struct pcpur_vm;
static struct page * __init pcpur_get_page(unsigned int cpu, int pageno)
{
@@ -147,13 +153,12 @@ static struct page * __init pcpur_get_page(unsigned int cpu, int pageno)
if (off >= pcpur_size)
return NULL;
- return virt_to_page(pcpur_ptrs[cpu] + off);
+ return virt_to_page(pcpur_map[cpu].ptr + off);
}
static ssize_t __init setup_pcpu_remap(size_t static_size)
{
- static struct vm_struct vm;
- size_t ptrs_size, dyn_size;
+ size_t map_size, dyn_size;
unsigned int cpu;
ssize_t ret;
@@ -178,12 +183,14 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
dyn_size = pcpur_size - static_size - PERCPU_FIRST_CHUNK_RESERVE;
/* allocate pointer array and alloc large pages */
- ptrs_size = PFN_ALIGN(num_possible_cpus() * sizeof(pcpur_ptrs[0]));
- pcpur_ptrs = alloc_bootmem(ptrs_size);
+ map_size = PFN_ALIGN(num_possible_cpus() * sizeof(pcpur_map[0]));
+ pcpur_map = alloc_bootmem(map_size);
for_each_possible_cpu(cpu) {
- pcpur_ptrs[cpu] = pcpu_alloc_bootmem(cpu, PMD_SIZE, PMD_SIZE);
- if (!pcpur_ptrs[cpu])
+ pcpur_map[cpu].cpu = cpu;
+ pcpur_map[cpu].ptr = pcpu_alloc_bootmem(cpu, PMD_SIZE,
+ PMD_SIZE);
+ if (!pcpur_map[cpu].ptr)
goto enomem;
/*
@@ -194,42 +201,43 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
* not well-specified to have a PAT-incompatible area
* (unmapped RAM, device memory, etc.) in that hole.
*/
- free_bootmem(__pa(pcpur_ptrs[cpu] + pcpur_size),
+ free_bootmem(__pa(pcpur_map[cpu].ptr + pcpur_size),
PMD_SIZE - pcpur_size);
- memcpy(pcpur_ptrs[cpu], __per_cpu_load, static_size);
+ memcpy(pcpur_map[cpu].ptr, __per_cpu_load, static_size);
}
/* allocate address and map */
- vm.flags = VM_ALLOC;
- vm.size = num_possible_cpus() * PMD_SIZE;
- vm_area_register_early(&vm, PMD_SIZE);
+ pcpur_vm.flags = VM_ALLOC;
+ pcpur_vm.size = num_possible_cpus() * PMD_SIZE;
+ vm_area_register_early(&pcpur_vm, PMD_SIZE);
for_each_possible_cpu(cpu) {
- pmd_t *pmd;
+ pmd_t *pmd, pmd_v;
- pmd = populate_extra_pmd((unsigned long)vm.addr
- + cpu * PMD_SIZE);
- set_pmd(pmd, pfn_pmd(page_to_pfn(virt_to_page(pcpur_ptrs[cpu])),
- PAGE_KERNEL_LARGE));
+ pmd = populate_extra_pmd((unsigned long)pcpur_vm.addr +
+ cpu * PMD_SIZE);
+ pmd_v = pfn_pmd(page_to_pfn(virt_to_page(pcpur_map[cpu].ptr)),
+ PAGE_KERNEL_LARGE);
+ set_pmd(pmd, pmd_v);
}
/* we're ready, commit */
pr_info("PERCPU: Remapped at %p with large pages, static data "
- "%zu bytes\n", vm.addr, static_size);
+ "%zu bytes\n", pcpur_vm.addr, static_size);
ret = pcpu_setup_first_chunk(pcpur_get_page, static_size,
PERCPU_FIRST_CHUNK_RESERVE, dyn_size,
- PMD_SIZE, vm.addr, NULL);
- goto out_free_ar;
+ PMD_SIZE, pcpur_vm.addr, NULL);
+ goto out_free_map;
enomem:
for_each_possible_cpu(cpu)
- if (pcpur_ptrs[cpu])
- free_bootmem(__pa(pcpur_ptrs[cpu]), PMD_SIZE);
+ if (pcpur_map[cpu].ptr)
+ free_bootmem(__pa(pcpur_map[cpu].ptr), PMD_SIZE);
ret = -ENOMEM;
-out_free_ar:
- free_bootmem(__pa(pcpur_ptrs), ptrs_size);
+out_free_map:
+ free_bootmem(__pa(pcpur_map), map_size);
return ret;
}
#else
--
1.6.0.2
* [PATCH 2/4] x86: simplify cpa_process_alias()
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
2009-05-14 12:49 ` [PATCH 1/4] x86: prepare setup_pcpu_remap() for pageattr fix Tejun Heo
@ 2009-05-14 12:49 ` Tejun Heo
2009-05-14 14:16 ` Jan Beulich
2009-05-14 12:49 ` [PATCH 3/4] x86: fix pageattr handling for remap percpu allocator Tejun Heo
` (5 subsequent siblings)
7 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 12:49 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Cc: Tejun Heo
The two existing alias conditions in cpa_process_alias() are mutually
exclusive and future ones are likely to be exclusive too. Simplify
control flow to ease adding other alias cases.
The within(vaddr, (unsigned long)_text, _brk_end) test is removed as
it's guaranteed to be false if vaddr is in the page mapped area.
[ Impact: cleanup, no functional difference ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/pageattr.c | 42 ++++++++++++++++++------------------------
1 files changed, 18 insertions(+), 24 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 797f9f1..1097b61 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -686,7 +686,6 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias);
static int cpa_process_alias(struct cpa_data *cpa)
{
struct cpa_data alias_cpa;
- int ret = 0;
unsigned long temp_cpa_vaddr, vaddr;
if (cpa->pfn >= max_pfn_mapped)
@@ -715,38 +714,33 @@ static int cpa_process_alias(struct cpa_data *cpa)
alias_cpa.vaddr = &temp_cpa_vaddr;
alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
-
- ret = __change_page_attr_set_clr(&alias_cpa, 0);
+ return __change_page_attr_set_clr(&alias_cpa, 0);
}
-#ifdef CONFIG_X86_64
- if (ret)
- return ret;
- /*
- * No need to redo, when the primary call touched the high
- * mapping already:
- */
- if (within(vaddr, (unsigned long) _text, _brk_end))
- return 0;
+ /* vaddr is in page mapped area */
+#ifdef CONFIG_X86_64
/*
* If the physical address is inside the kernel map, we need
* to touch the high mapped kernel as well:
*/
- if (!within(cpa->pfn, highmap_start_pfn(), highmap_end_pfn()))
- return 0;
-
- alias_cpa = *cpa;
- temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) + __START_KERNEL_map - phys_base;
- alias_cpa.vaddr = &temp_cpa_vaddr;
- alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
+ if (within(cpa->pfn, highmap_start_pfn(), highmap_end_pfn())) {
+ alias_cpa = *cpa;
+ temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) +
+ __START_KERNEL_map - phys_base;
+ alias_cpa.vaddr = &temp_cpa_vaddr;
+ alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
- /*
- * The high mapping range is imprecise, so ignore the return value.
- */
- __change_page_attr_set_clr(&alias_cpa, 0);
+ /*
+ * The high mapping range is imprecise, so ignore the
+ * return value.
+ */
+ __change_page_attr_set_clr(&alias_cpa, 0);
+ return 0;
+ }
#endif
- return ret;
+
+ return 0;
}
static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
--
1.6.0.2
* [PATCH 3/4] x86: fix pageattr handling for remap percpu allocator
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
2009-05-14 12:49 ` [PATCH 1/4] x86: prepare setup_pcpu_remap() for pageattr fix Tejun Heo
2009-05-14 12:49 ` [PATCH 2/4] x86: simplify cpa_process_alias() Tejun Heo
@ 2009-05-14 12:49 ` Tejun Heo
2009-05-14 16:21 ` [PATCH UPDATED " Tejun Heo
2009-05-14 12:49 ` [PATCH 4/4] x86: implement percpu_alloc kernel parameter Tejun Heo
` (4 subsequent siblings)
7 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 12:49 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Cc: Tejun Heo
The remap allocator aliases a PMD page for each cpu and returns
whatever is unused to the page allocator. When the pageattr of the
recycled pages is changed, the two aliases end up mapping overlapping
regions with different attributes, which isn't allowed and is known to
cause subtle data corruption in certain cases.
This can be handled in a similar manner to the x86_64 highmap alias:
the pageattr code should detect whether the target pages have a PMD
alias and, if so, split the PMD alias and synchronize the attributes.
The pcpur allocator is updated to keep the map of allocated PMD pages
sorted in ascending address order and to provide the
pcpu_pmd_remapped() function, which binary-searches the array to
determine whether a given address is aliased and, if so, to which
address. pageattr is updated to use pcpu_pmd_remapped() to detect the
PMD alias and split it up as necessary from cpa_process_alias().
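The lookup described above can be sketched as follows. This is a
Python model of the idea, not the kernel code; the 2MiB PMD size and
all addresses are illustrative assumptions, and the kernel version
additionally warns if the hit falls inside the first chunk itself.

```python
import bisect

PMD_SIZE = 2 * 1024 * 1024           # assumed 2MiB large pages
PMD_MASK = ~(PMD_SIZE - 1)

# (pmd_base_addr, cpu) entries, kept sorted by address the way
# setup_pcpu_remap() sorts pcpur_map after the first chunk is set up
pcpur_map = sorted([(0x40000000, 1), (0x3FE00000, 0)])
pcpur_vm_addr = 0xE0000000           # assumed base of the remap area

def pcpu_pmd_remapped(kaddr):
    """Return the remapped alias of kaddr, or None if not aliased."""
    pmd_addr = kaddr & PMD_MASK      # PMD-aligned base of the address
    offset = kaddr & ~PMD_MASK       # offset within the PMD page
    bases = [base for base, _cpu in pcpur_map]
    i = bisect.bisect_left(bases, pmd_addr)   # binary search
    if i < len(bases) and bases[i] == pmd_addr:
        cpu = pcpur_map[i][1]
        # same offset, but inside that cpu's slot of the remap area
        return pcpur_vm_addr + cpu * PMD_SIZE + offset
    return None

# an address inside cpu0's PMD page maps into cpu0's vmalloc slot
assert pcpu_pmd_remapped(0x3FE00000 + 0x1234) == 0xE0000000 + 0x1234
# cpu1's slot is one PMD further into the remap area
assert pcpu_pmd_remapped(0x40000000) == 0xE0000000 + PMD_SIZE
# an unrelated address is not aliased
assert pcpu_pmd_remapped(0x12345678) is None
```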
This problem was spotted by Jan Beulich.
[ Impact: fix subtle pageattr bug ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/include/asm/percpu.h | 9 +++++
arch/x86/kernel/setup_percpu.c | 68 ++++++++++++++++++++++++++++++++++++---
arch/x86/mm/pageattr.c | 21 ++++++++++++
3 files changed, 92 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..cad3531 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -155,6 +155,15 @@ do { \
/* We can use this directly for local CPU (faster). */
DECLARE_PER_CPU(unsigned long, this_cpu_off);
+#ifdef CONFIG_NEED_MULTIPLE_NODES
+void *pcpu_pmd_remapped(void *kaddr);
+#else
+static inline void *pcpu_pmd_remapped(void *kaddr)
+{
+ return NULL;
+}
+#endif
+
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index c17059c..dd567a7 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -142,8 +142,8 @@ struct pcpur_ent {
void *ptr;
};
-static size_t pcpur_size __initdata;
-static struct pcpur_ent *pcpur_map __initdata;
+static size_t pcpur_size;
+static struct pcpur_ent *pcpur_map;
static struct vm_struct pcpur_vm;
static struct page * __init pcpur_get_page(unsigned int cpu, int pageno)
@@ -160,6 +160,7 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
{
size_t map_size, dyn_size;
unsigned int cpu;
+ int i, j;
ssize_t ret;
/*
@@ -229,16 +230,71 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
ret = pcpu_setup_first_chunk(pcpur_get_page, static_size,
PERCPU_FIRST_CHUNK_RESERVE, dyn_size,
PMD_SIZE, pcpur_vm.addr, NULL);
- goto out_free_map;
+
+ /* sort pcpur_map array for pcpu_pmd_remapped() */
+ for (i = 0; i < num_possible_cpus() - 1; i++)
+ for (j = i + 1; j < num_possible_cpus(); j++)
+ if (pcpur_map[i].ptr > pcpur_map[j].ptr) {
+ struct pcpur_ent tmp = pcpur_map[i];
+ pcpur_map[i] = pcpur_map[j];
+ pcpur_map[j] = tmp;
+ }
+
+ return ret;
enomem:
for_each_possible_cpu(cpu)
if (pcpur_map[cpu].ptr)
free_bootmem(__pa(pcpur_map[cpu].ptr), PMD_SIZE);
- ret = -ENOMEM;
-out_free_map:
free_bootmem(__pa(pcpur_map), map_size);
- return ret;
+ return -ENOMEM;
+}
+
+/**
+ * pcpu_pmd_remapped - determine whether a kaddr is in pcpur recycled area
+ * @kaddr: the kernel address in question
+ *
+ * Determine whether @kaddr falls in the pcpur recycled area. This is
+ * used by pageattr to detect VM aliases and break up the pcpu PMD
+ * mapping such that the same physical page is not mapped under
+ * different attributes.
+ *
+ * The recycled area is always at the tail of a partially used PMD
+ * page.
+ *
+ * RETURNS:
+ * Address of corresponding remapped pcpu address if match is found;
+ * otherwise, NULL.
+ */
+void *pcpu_pmd_remapped(void *kaddr)
+{
+ void *pmd_addr = (void *)((unsigned long)kaddr & PMD_MASK);
+ unsigned long offset = (unsigned long)kaddr & ~PMD_MASK;
+ int left = 0, right = num_possible_cpus() - 1;
+ int pos;
+
+ /* pcpur in use at all? */
+ if (!pcpur_map)
+ return NULL;
+
+ /* okay, perform binary search */
+ while (left <= right) {
+ pos = (left + right) / 2;
+
+ if (pcpur_map[pos].ptr < pmd_addr)
+ left = pos + 1;
+ else if (pcpur_map[pos].ptr > pmd_addr)
+ right = pos - 1;
+ else {
+ /* it shouldn't be in the area for the first chunk */
+ WARN_ON(offset < pcpur_size);
+
+ return pcpur_vm.addr +
+ pcpur_map[pos].cpu * PMD_SIZE + offset;
+ }
+ }
+
+ return NULL;
}
#else
static ssize_t __init setup_pcpu_remap(size_t static_size)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1097b61..a3d860b 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -11,6 +11,7 @@
#include <linux/interrupt.h>
#include <linux/seq_file.h>
#include <linux/debugfs.h>
+#include <linux/pfn.h>
#include <asm/e820.h>
#include <asm/processor.h>
@@ -687,6 +688,7 @@ static int cpa_process_alias(struct cpa_data *cpa)
{
struct cpa_data alias_cpa;
unsigned long temp_cpa_vaddr, vaddr;
+ void *remapped;
if (cpa->pfn >= max_pfn_mapped)
return 0;
@@ -740,6 +742,25 @@ static int cpa_process_alias(struct cpa_data *cpa)
}
#endif
+ /*
+ * If the PMD page was partially used for per-cpu remapping,
+ * the remapped area needs to be split and modified. Note
+ * that the partial recycling only happens at the tail of a
+ * partially used PMD page, so touching single PMD page is
+ * always enough.
+ */
+ remapped = pcpu_pmd_remapped((void *)vaddr);
+ if (remapped) {
+ int max_pages = PFN_DOWN(PMD_SIZE - (vaddr & ~PMD_MASK));
+
+ alias_cpa = *cpa;
+ temp_cpa_vaddr = (unsigned long)remapped;
+ alias_cpa.vaddr = &temp_cpa_vaddr;
+ alias_cpa.numpages = min(alias_cpa.numpages, max_pages);
+ alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
+ return __change_page_attr_set_clr(&alias_cpa, 0);
+ }
+
return 0;
}
--
1.6.0.2
* [PATCH 4/4] x86: implement percpu_alloc kernel parameter
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
` (2 preceding siblings ...)
2009-05-14 12:49 ` [PATCH 3/4] x86: fix pageattr handling for remap percpu allocator Tejun Heo
@ 2009-05-14 12:49 ` Tejun Heo
2009-05-14 14:28 ` [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Jan Beulich
` (3 subsequent siblings)
7 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 12:49 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Cc: Tejun Heo
According to Andi, it isn't clear whether the remap allocator is
worth the trouble, as there are many processors where PMD TLB entries
are far scarcer than PTE TLB entries. The advantage or disadvantage
probably depends on the actual size of the percpu area and the
specific processor. As performance degradation due to TLB pressure
tends to be highly workload-specific and subtle, it is difficult to
decide which way to go without more data.
This patch implements the percpu_alloc kernel parameter to allow
selecting which first chunk allocator to use, to ease debugging and
testing.
While at it, make sure all the failure paths report why something
failed, to help determine why a certain allocator isn't working.
Also, kill the "Great future plan" comment, which had already been
realized quite some time ago.
[ Impact: allow explicit percpu first chunk allocator selection ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
Documentation/kernel-parameters.txt | 6 +++
arch/x86/kernel/setup_percpu.c | 68 +++++++++++++++++++++++------------
mm/percpu.c | 13 +++++--
3 files changed, 60 insertions(+), 27 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e87bdbf..929bb3a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1837,6 +1837,12 @@ and is between 256 and 4096 characters. It is defined in the file
Format: { 0 | 1 }
See arch/parisc/kernel/pdc_chassis.c
+ percpu_alloc= [X86] Select which percpu first chunk allocator to use.
+ Allowed values are one of "remap", "embed" and "4k".
+ See comments in arch/x86/kernel/setup_percpu.c for
+ details on each allocator. This parameter is primarily
+ for debugging and performance comparison.
+
pf. [PARIDE]
See Documentation/blockdev/paride.txt.
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index dd567a7..db10248 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -163,12 +163,11 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
int i, j;
ssize_t ret;
- /*
- * If large page isn't supported, there's no benefit in doing
- * this. Also, on non-NUMA, embedding is better.
- */
- if (!cpu_has_pse || !pcpu_need_numa())
+ /* need PSE */
+ if (!cpu_has_pse) {
+ pr_warning("PERCPU: remap allocator requires PSE\n");
return -EINVAL;
+ }
/*
* Currently supports only single page. Supporting multiple
@@ -191,8 +190,11 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
pcpur_map[cpu].cpu = cpu;
pcpur_map[cpu].ptr = pcpu_alloc_bootmem(cpu, PMD_SIZE,
PMD_SIZE);
- if (!pcpur_map[cpu].ptr)
+ if (!pcpur_map[cpu].ptr) {
+ pr_warning("PERCPU: failed to allocate large page "
+ "for cpu%u\n", cpu);
goto enomem;
+ }
/*
* Only use pcpur_size bytes and give back the rest.
@@ -315,14 +317,6 @@ static ssize_t __init setup_pcpu_embed(size_t static_size)
{
size_t reserve = PERCPU_MODULE_RESERVE + PERCPU_DYNAMIC_RESERVE;
- /*
- * If large page isn't supported, there's no benefit in doing
- * this. Also, embedding allocation doesn't play well with
- * NUMA.
- */
- if (!cpu_has_pse || pcpu_need_numa())
- return -EINVAL;
-
return pcpu_embed_first_chunk(static_size, PERCPU_FIRST_CHUNK_RESERVE,
reserve - PERCPU_FIRST_CHUNK_RESERVE, -1);
}
@@ -370,8 +364,11 @@ static ssize_t __init setup_pcpu_4k(size_t static_size)
void *ptr;
ptr = pcpu_alloc_bootmem(cpu, PAGE_SIZE, PAGE_SIZE);
- if (!ptr)
+ if (!ptr) {
+ pr_warning("PERCPU: failed to allocate "
+ "4k page for cpu%u\n", cpu);
goto enomem;
+ }
memcpy(ptr, __per_cpu_load + i * PAGE_SIZE, PAGE_SIZE);
pcpu4k_pages[j++] = virt_to_page(ptr);
@@ -395,6 +392,16 @@ out_free_ar:
return ret;
}
+/* for explicit first chunk allocator selection */
+static char pcpu_chosen_alloc[16] __initdata;
+
+static int __init percpu_alloc_setup(char *str)
+{
+ strncpy(pcpu_chosen_alloc, str, sizeof(pcpu_chosen_alloc) - 1);
+ return 0;
+}
+early_param("percpu_alloc", percpu_alloc_setup);
+
static inline void setup_percpu_segment(int cpu)
{
#ifdef CONFIG_X86_32
@@ -408,11 +415,6 @@ static inline void setup_percpu_segment(int cpu)
#endif
}
-/*
- * Great future plan:
- * Declare PDA itself and support (irqstack,tss,pgd) as per cpu data.
- * Always point %gs to its beginning
- */
void __init setup_per_cpu_areas(void)
{
size_t static_size = __per_cpu_end - __per_cpu_start;
@@ -429,9 +431,29 @@ void __init setup_per_cpu_areas(void)
* of large page mappings. Please read comments on top of
* each allocator for details.
*/
- ret = setup_pcpu_remap(static_size);
- if (ret < 0)
- ret = setup_pcpu_embed(static_size);
+ ret = -EINVAL;
+ if (strlen(pcpu_chosen_alloc)) {
+ if (strcmp(pcpu_chosen_alloc, "4k")) {
+ if (!strcmp(pcpu_chosen_alloc, "remap"))
+ ret = setup_pcpu_remap(static_size);
+ else if (!strcmp(pcpu_chosen_alloc, "embed"))
+ ret = setup_pcpu_embed(static_size);
+ else
+ pr_warning("PERCPU: unknown allocator %s "
+ "specified\n", pcpu_chosen_alloc);
+ if (ret < 0)
+ pr_warning("PERCPU: %s allocator failed (%zd), "
+ "falling back to 4k\n",
+ pcpu_chosen_alloc, ret);
+ }
+ } else {
+ if (cpu_has_pse) {
+ if (pcpu_need_numa())
+ ret = setup_pcpu_remap(static_size);
+ else
+ ret = setup_pcpu_embed(static_size);
+ }
+ }
if (ret < 0)
ret = setup_pcpu_4k(static_size);
if (ret < 0)
diff --git a/mm/percpu.c b/mm/percpu.c
index 1aa5d8f..d42f2ce 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1287,6 +1287,7 @@ static struct page * __init pcpue_get_page(unsigned int cpu, int pageno)
ssize_t __init pcpu_embed_first_chunk(size_t static_size, size_t reserved_size,
ssize_t dyn_size, ssize_t unit_size)
{
+ size_t chunk_size;
unsigned int cpu;
/* determine parameters and allocate */
@@ -1301,11 +1302,15 @@ ssize_t __init pcpu_embed_first_chunk(size_t static_size, size_t reserved_size,
} else
pcpue_unit_size = max_t(size_t, pcpue_size, PCPU_MIN_UNIT_SIZE);
- pcpue_ptr = __alloc_bootmem_nopanic(
- num_possible_cpus() * pcpue_unit_size,
- PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
- if (!pcpue_ptr)
+ chunk_size = pcpue_unit_size * num_possible_cpus();
+
+ pcpue_ptr = __alloc_bootmem_nopanic(chunk_size, PAGE_SIZE,
+ __pa(MAX_DMA_ADDRESS));
+ if (!pcpue_ptr) {
+ pr_warning("PERCPU: failed to allocate %zu bytes for "
+ "embedding\n", chunk_size);
return -ENOMEM;
+ }
/* return the leftover and copy */
for_each_possible_cpu(cpu) {
--
1.6.0.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 2/4] x86: simplify cpa_process_alias()
2009-05-14 12:49 ` [PATCH 2/4] x86: simplify cpa_process_alias() Tejun Heo
@ 2009-05-14 14:16 ` Jan Beulich
2009-05-14 15:37 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2009-05-14 14:16 UTC (permalink / raw)
To: Tejun Heo; +Cc: mingo, andi, tglx, linux-kernel, linux-kernel-owner, hpa
>>> Tejun Heo <tj@kernel.org> 14.05.09 14:49 >>>
>The two existing alias conditions in cpa_process_alias() are mutually
>exclusive and future ones are likely to be exclusive too. Simplify
>control flow to ease adding other alias cases.
>
>The within(vaddr, (unsigned long)_text, _brk_end) test is removed as
>it's guaranteed to be false if vaddr is in the page mapped area.
I don't think that's correct - just consider the case where the originally passed
in virtual address is from the vmalloc area: In that case, both the 1:1 mapping
*and* the kernel mapping need to be checked.
Jan
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
` (3 preceding siblings ...)
2009-05-14 12:49 ` [PATCH 4/4] x86: implement percpu_alloc kernel parameter Tejun Heo
@ 2009-05-14 14:28 ` Jan Beulich
2009-05-14 15:55 ` Tejun Heo
2009-05-14 16:22 ` Tejun Heo
` (2 subsequent siblings)
7 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2009-05-14 14:28 UTC (permalink / raw)
To: Tejun Heo; +Cc: mingo, andi, tglx, linux-kernel, linux-kernel-owner, hpa
>>> Tejun Heo <tj@kernel.org> 14.05.09 14:49 >>>
>The remap allocator allocates a PMD page per cpu, returns whatever is
>unnecessary to the page allocator and remaps the PMD page into vmalloc
>area to construct the first percpu chunk. This is to take advantage
>of large page mapping. However this creates active aliases for the
>recycled pages. When some user allocates the recycled pages and tries
>to change pageattr on it, remapped PMD alias might end up having
>different attributes with the regular page mapped addresses which can
>lead to subtle data corruption according to Andi.
In order to reduce the amount of work to do during lookup as well as the
chance of having a collision at all, wouldn't it be reasonable to use as much
of an allocated 2/4M page as possible rather than returning whatever is
left after a single CPU got its per-CPU memory chunk from it? I.e. you'd
return only those (few) pages that either don't fit another CPU's chunk
anymore or that are left after running through all CPUs.
Or is there some hidden requirement that each CPU's per-CPU area must
start on a PMD boundary?
This would additionally address a potential problem on 32-bits - currently,
for a 32-CPU system you consume half of the vmalloc space with PAE (on
non-PAE you'd even exhaust it, but I think it's unreasonable to expect a
system having 32 CPUs to not need PAE).
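Jan's arithmetic checks out under the usual assumption of a roughly
128MiB vmalloc window on 32-bit (the window size is an assumption of
this sketch, not something stated in the thread):

```python
MiB = 1 << 20
vmalloc_32bit = 128 * MiB    # assumed typical 32-bit vmalloc window

cpus = 32
pae_pmd = 2 * MiB            # PAE large page size
nonpae_pmd = 4 * MiB         # non-PAE large page size

pae_use = cpus * pae_pmd     # one PMD page per cpu
nonpae_use = cpus * nonpae_pmd

assert pae_use == 64 * MiB                  # half the vmalloc space
assert pae_use / vmalloc_32bit == 0.5
assert nonpae_use == vmalloc_32bit          # non-PAE would exhaust it
```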
Jan
* Re: [PATCH 2/4] x86: simplify cpa_process_alias()
2009-05-14 14:16 ` Jan Beulich
@ 2009-05-14 15:37 ` Tejun Heo
2009-05-14 16:20 ` [PATCH UPDATED 2/4] x86: reorganize cpa_process_alias() Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 15:37 UTC (permalink / raw)
To: Jan Beulich; +Cc: mingo, andi, tglx, linux-kernel, linux-kernel-owner, hpa
Hello,
Jan Beulich wrote:
>>>> Tejun Heo <tj@kernel.org> 14.05.09 14:49 >>>
>> The two existing alias conditions in cpa_process_alias() are mutually
>> exclusive and future ones are likely to be exclusive too. Simplify
>> control flow to ease adding other alias cases.
>>
>> The within(vaddr, (unsigned long)_text, _brk_end) test is removed as
>> it's guaranteed to be false if vaddr is in the page mapped area.
>
> I don't think that's correct - just consider the case where the
> originally passed in virtual address is from the vmalloc area: In
> that case, both the 1:1 mapping *and* the kernel mapping need to be
> checked.
Yeap, you're right. I was reading the second condition in reverse.
Will update and repost. Thanks for spotting it.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 14:28 ` [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Jan Beulich
@ 2009-05-14 15:55 ` Tejun Heo
2009-05-15 7:47 ` Jan Beulich
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 15:55 UTC (permalink / raw)
To: Jan Beulich; +Cc: mingo, andi, tglx, linux-kernel, linux-kernel-owner, hpa
Hello, Jan.
Jan Beulich wrote:
> In order to reduce the amount of work to do during lookup as well as
> the chance of having a collision at all, wouldn't it be reasonable
> to use as much of an allocated 2/4M page as possible rather than
> returning whatever is left after a single CPU got its per-CPU memory
> chunk from it? I.e. you'd return only those (few) pages that either
> don't fit another CPU's chunk anymore or that are left after running
> through all CPUs.
>
> Or is there some hidden requirement that each CPU's per-CPU area must
> start on a PMD boundary?
The whole point of doing the remapping is giving each CPU its own PMD
mapping for the percpu area, so, yeah, that's the requirement. I
don't think the requirement is hidden tho.
How hot is the cpa path? On my test systems, there were only a few
calls during init and then nothing. Does it become very hot if, for
example, GEM is used? But I really don't think the log2 binary search
overhead would be anything noticeable compared to the TLB shootdown
and all the other stuff going on there.
> This would additionally address a potential problem on 32-bits -
> currently, for a 32-CPU system you consume half of the vmalloc space
> with PAE (on non-PAE you'd even exhaust it, but I think it's
> unreasonable to expect a system having 32 CPUs to not need PAE).
I recall having about the same conversation before. Looking up...
-- QUOTE --
Actually, I've been looking at the numbers and I'm not sure the
concern is valid. On x86_32, the practical maximum number of
processors would be around 16, so it will end up at 32M, which isn't
nice, and it would probably be a good idea to introduce a parameter
to select which allocator to use, but it's still far from consuming
the whole VM area. On x86_64, the vmalloc area is obscenely large at
2^45 bytes, ie. 32 terabytes. Even with 4096 processors, a single
chunk is a measly 0.02%.
If it's a problem for other archs or extreme x86_32 configurations,
we can add some safety measures, but in general I don't think it is
a problem.
-- END OF QUOTE --
So, yeah, if there are 32-bit 32-way NUMA machines out there, it
would be wise to skip the remap allocator on such machines. Maybe we
can implement a heuristic - something like "if vm area consumption
goes over 25%, don't use remap".
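The quoted numbers and the proposed heuristic can be sketched as
follows (the page sizes and the 25% cutoff come from the discussion;
the vmalloc window sizes and the function itself are illustrative
assumptions, not an implementation):

```python
MiB = 1 << 20
TiB = 1 << 40

def use_remap(cpus, pmd_size, vmalloc_size, cutoff=0.25):
    """Proposed heuristic: skip the remap allocator when the first
    chunk would consume more than `cutoff` of the vmalloc area."""
    return cpus * pmd_size / vmalloc_size <= cutoff

# x86_64: 2^45-byte vmalloc area, 4096 cpus, 2MiB PMD pages
vmalloc_64 = 1 << 45
assert vmalloc_64 == 32 * TiB
frac = 4096 * 2 * MiB / vmalloc_64
assert round(frac * 100, 2) == 0.02          # the "measly 0.02%"
assert use_remap(4096, 2 * MiB, vmalloc_64)  # remap is fine here

# hypothetical 32-bit 32-way PAE NUMA box: 64MiB of a ~128MiB window
assert not use_remap(32, 2 * MiB, 128 * MiB) # heuristic says skip
```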
Thanks.
--
tejun
* [PATCH UPDATED 2/4] x86: reorganize cpa_process_alias()
2009-05-14 15:37 ` Tejun Heo
@ 2009-05-14 16:20 ` Tejun Heo
0 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 16:20 UTC (permalink / raw)
To: Jan Beulich; +Cc: mingo, andi, tglx, linux-kernel, linux-kernel-owner, hpa
Reorganize cpa_process_alias() so that new alias conditions can be
added easily.
Jan Beulich spotted a problem in the original cleanup patch, which
incorrectly assumed the two existing conditions were mutually
exclusive.
[ Impact: code reorganization ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mm/pageattr.c | 44 ++++++++++++++++++++------------------------
1 files changed, 20 insertions(+), 24 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 797f9f1..8bc64b0 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -686,8 +686,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias);
static int cpa_process_alias(struct cpa_data *cpa)
{
struct cpa_data alias_cpa;
- int ret = 0;
unsigned long temp_cpa_vaddr, vaddr;
+ int ret;
if (cpa->pfn >= max_pfn_mapped)
return 0;
@@ -715,38 +715,34 @@ static int cpa_process_alias(struct cpa_data *cpa)
alias_cpa.vaddr = &temp_cpa_vaddr;
alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
-
ret = __change_page_attr_set_clr(&alias_cpa, 0);
+ if (ret)
+ return ret;
}
#ifdef CONFIG_X86_64
- if (ret)
- return ret;
/*
- * No need to redo, when the primary call touched the high
- * mapping already:
- */
- if (within(vaddr, (unsigned long) _text, _brk_end))
- return 0;
-
- /*
- * If the physical address is inside the kernel map, we need
+ * If the primary call didn't touch the high mapping already
+ * and the physical address is inside the kernel map, we need
* to touch the high mapped kernel as well:
*/
- if (!within(cpa->pfn, highmap_start_pfn(), highmap_end_pfn()))
- return 0;
-
- alias_cpa = *cpa;
- temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) + __START_KERNEL_map - phys_base;
- alias_cpa.vaddr = &temp_cpa_vaddr;
- alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
+ if (!within(vaddr, (unsigned long)_text, _brk_end) &&
+ within(cpa->pfn, highmap_start_pfn(), highmap_end_pfn())) {
+ alias_cpa = *cpa;
+ temp_cpa_vaddr = (cpa->pfn << PAGE_SHIFT) +
+ __START_KERNEL_map - phys_base;
+ alias_cpa.vaddr = &temp_cpa_vaddr;
+ alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
- /*
- * The high mapping range is imprecise, so ignore the return value.
- */
- __change_page_attr_set_clr(&alias_cpa, 0);
+ /*
+ * The high mapping range is imprecise, so ignore the
+ * return value.
+ */
+ __change_page_attr_set_clr(&alias_cpa, 0);
+ }
#endif
- return ret;
+
+ return 0;
}
static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
--
1.6.0.2
* Re: [PATCH UPDATED 3/4] x86: fix pageattr handling for remap percpu allocator
2009-05-14 12:49 ` [PATCH 3/4] x86: fix pageattr handling for remap percpu allocator Tejun Heo
@ 2009-05-14 16:21 ` Tejun Heo
0 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 16:21 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
The remap allocator aliases a PMD page for each cpu and returns whatever
is unused to the page allocator. When the pageattr of the recycled
pages is changed, the two aliases end up pointing to overlapping
regions with different attributes, which isn't allowed and is known to
cause subtle data corruption in certain cases.
This can be handled in a similar manner to the x86_64 highmap alias:
the pageattr code should detect whether the target pages have a PMD
alias and, if so, split the PMD alias and synchronize the attributes.
The pcpur allocator is updated to keep the array of allocated PMD pages
sorted in ascending address order and to provide a pcpu_pmd_remapped()
function which binary searches the array to determine whether a given
address is aliased and, if so, to which address. pageattr is updated
to use pcpu_pmd_remapped() to detect the PMD alias and split it up as
necessary from cpa_process_alias().
This problem was spotted by Jan Beulich.
[ Impact: fix subtle pageattr bug ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
---
arch/x86/include/asm/percpu.h | 9 +++++
arch/x86/kernel/setup_percpu.c | 68 ++++++++++++++++++++++++++++++++++++---
arch/x86/mm/pageattr.c | 23 +++++++++++++
3 files changed, 94 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index aee103b..cad3531 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -155,6 +155,15 @@ do { \
/* We can use this directly for local CPU (faster). */
DECLARE_PER_CPU(unsigned long, this_cpu_off);
+#ifdef CONFIG_NEED_MULTIPLE_NODES
+void *pcpu_pmd_remapped(void *kaddr);
+#else
+static inline void *pcpu_pmd_remapped(void *kaddr)
+{
+ return NULL;
+}
+#endif
+
#endif /* !__ASSEMBLY__ */
#ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index c17059c..dd567a7 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -142,8 +142,8 @@ struct pcpur_ent {
void *ptr;
};
-static size_t pcpur_size __initdata;
-static struct pcpur_ent *pcpur_map __initdata;
+static size_t pcpur_size;
+static struct pcpur_ent *pcpur_map;
static struct vm_struct pcpur_vm;
static struct page * __init pcpur_get_page(unsigned int cpu, int pageno)
@@ -160,6 +160,7 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
{
size_t map_size, dyn_size;
unsigned int cpu;
+ int i, j;
ssize_t ret;
/*
@@ -229,16 +230,71 @@ static ssize_t __init setup_pcpu_remap(size_t static_size)
ret = pcpu_setup_first_chunk(pcpur_get_page, static_size,
PERCPU_FIRST_CHUNK_RESERVE, dyn_size,
PMD_SIZE, pcpur_vm.addr, NULL);
- goto out_free_map;
+
+ /* sort pcpur_map array for pcpu_pmd_remapped() */
+ for (i = 0; i < num_possible_cpus() - 1; i++)
+ for (j = i + 1; j < num_possible_cpus(); j++)
+ if (pcpur_map[i].ptr > pcpur_map[j].ptr) {
+ struct pcpur_ent tmp = pcpur_map[i];
+ pcpur_map[i] = pcpur_map[j];
+ pcpur_map[j] = tmp;
+ }
+
+ return ret;
enomem:
for_each_possible_cpu(cpu)
if (pcpur_map[cpu].ptr)
free_bootmem(__pa(pcpur_map[cpu].ptr), PMD_SIZE);
- ret = -ENOMEM;
-out_free_map:
free_bootmem(__pa(pcpur_map), map_size);
- return ret;
+ return -ENOMEM;
+}
+
+/**
+ * pcpu_pmd_remapped - determine whether a kaddr is in pcpur recycled area
+ * @kaddr: the kernel address in question
+ *
+ * Determine whether @kaddr falls in the pcpur recycled area. This is
+ * used by pageattr to detect VM aliases and break up the pcpu PMD
+ * mapping such that the same physical page is not mapped under
+ * different attributes.
+ *
+ * The recycled area is always at the tail of a partially used PMD
+ * page.
+ *
+ * RETURNS:
+ * Address of corresponding remapped pcpu address if match is found;
+ * otherwise, NULL.
+ */
+void *pcpu_pmd_remapped(void *kaddr)
+{
+ void *pmd_addr = (void *)((unsigned long)kaddr & PMD_MASK);
+ unsigned long offset = (unsigned long)kaddr & ~PMD_MASK;
+ int left = 0, right = num_possible_cpus() - 1;
+ int pos;
+
+ /* pcpur in use at all? */
+ if (!pcpur_map)
+ return NULL;
+
+ /* okay, perform binary search */
+ while (left <= right) {
+ pos = (left + right) / 2;
+
+ if (pcpur_map[pos].ptr < pmd_addr)
+ left = pos + 1;
+ else if (pcpur_map[pos].ptr > pmd_addr)
+ right = pos - 1;
+ else {
+ /* it shouldn't be in the area for the first chunk */
+ WARN_ON(offset < pcpur_size);
+
+ return pcpur_vm.addr +
+ pcpur_map[pos].cpu * PMD_SIZE + offset;
+ }
+ }
+
+ return NULL;
}
#else
static ssize_t __init setup_pcpu_remap(size_t static_size)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 8bc64b0..a1bcead 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -11,6 +11,7 @@
#include <linux/interrupt.h>
#include <linux/seq_file.h>
#include <linux/debugfs.h>
+#include <linux/pfn.h>
#include <asm/e820.h>
#include <asm/processor.h>
@@ -687,6 +688,7 @@ static int cpa_process_alias(struct cpa_data *cpa)
{
struct cpa_data alias_cpa;
unsigned long temp_cpa_vaddr, vaddr;
+ void *remapped;
int ret;
if (cpa->pfn >= max_pfn_mapped)
@@ -742,6 +744,27 @@ static int cpa_process_alias(struct cpa_data *cpa)
}
#endif
+ /*
+ * If the PMD page was partially used for per-cpu remapping,
+ * the remapped area needs to be split and modified. Note
+ * that the partial recycling only happens at the tail of a
+ * partially used PMD page, so touching single PMD page is
+ * always enough.
+ */
+ remapped = pcpu_pmd_remapped((void *)vaddr);
+ if (remapped) {
+ int max_pages = PFN_DOWN(PMD_SIZE - (vaddr & ~PMD_MASK));
+
+ alias_cpa = *cpa;
+ temp_cpa_vaddr = (unsigned long)remapped;
+ alias_cpa.vaddr = &temp_cpa_vaddr;
+ alias_cpa.numpages = min(alias_cpa.numpages, max_pages);
+ alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
+ ret = __change_page_attr_set_clr(&alias_cpa, 0);
+ if (ret)
+ return ret;
+ }
+
return 0;
}
--
1.6.0.2
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
` (4 preceding siblings ...)
2009-05-14 14:28 ` [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Jan Beulich
@ 2009-05-14 16:22 ` Tejun Heo
2009-05-15 4:00 ` Tejun Heo
2009-05-16 1:17 ` Suresh Siddha
2009-05-19 9:44 ` Tejun Heo
7 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-14 16:22 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Tejun Heo wrote:
> Hello,
>
> Upon ack, please pull from the following git tree.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-pageattr
git tree updated to contain two updated patches. The new commit ID is
42fec16c045bc5259002faacdcd23e4530b6482e.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 16:22 ` Tejun Heo
@ 2009-05-15 4:00 ` Tejun Heo
2009-05-15 4:36 ` David Miller
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-15 4:00 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Tejun Heo wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Upon ack, please pull from the following git tree.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-pageattr
>
> git tree updated to contain two updated patches. The new commit ID is
> 42fec16c045bc5259002faacdcd23e4530b6482e.
Will soon post an updated patchset. Please wait a bit.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-15 4:00 ` Tejun Heo
@ 2009-05-15 4:36 ` David Miller
2009-05-15 4:48 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: David Miller @ 2009-05-15 4:36 UTC (permalink / raw)
To: teheo; +Cc: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
You might want to drop linux-kernel-owner from the CC:, I already
get enough copies :-)
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-15 4:36 ` David Miller
@ 2009-05-15 4:48 ` Tejun Heo
0 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-15 4:48 UTC (permalink / raw)
To: David Miller; +Cc: JBeulich, andi, mingo, hpa, tglx, linux-kernel
David Miller wrote:
> You might want to drop linux-kernel-owner from the CC:, I already
> get enough copies :-)
Hehe.. Where did that come from? Sorry about that. :-)
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 15:55 ` Tejun Heo
@ 2009-05-15 7:47 ` Jan Beulich
2009-05-15 8:11 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2009-05-15 7:47 UTC (permalink / raw)
To: Tejun Heo; +Cc: mingo, andi, tglx, linux-kernel, hpa
>>> Tejun Heo <tj@kernel.org> 14.05.09 17:55 >>>
>> In order to reduce the amount of work to do during lookup as well as
>> the chance of having a collision at all, wouldn't it be reasonable
>> to use as much of an allocated 2/4M page as possible rather than
>> returning whatever is left after a single CPU got its per-CPU memory
>> chunk from it? I.e. you'd return only those (few) pages that either
>> don't fit another CPU's chunk anymore or that are left after running
>> through all CPUs.
>>
>> Or is there some hidden requirement that each CPU's per-CPU area must
>> start on a PMD boundary?
>
>The whole point of doing the remapping is giving each CPU its own PMD
>mapping for the percpu area, so, yeah, that's the requirement. I don't
>think the requirement is hidden tho.
No, from looking at the code the requirement seems to only be that you
get memory allocated from the correct node and mapped by a large page.
There's nothing said why the final virtual address would need to be large
page aligned. I.e., with a slight modification to take the NUMA requirement
into account (I noticed I ignored that aspect after I had already sent that
mail), the previous suggestion would still appear usable to me.
>How hot is the cpa path? On my test systems, there were only a few
>calls during init and then nothing. Does it become very hot if, for
>example, GEM is used? But I really don't think the log2 binary search
>overhead would be anything noticeable compared to TLB shootdown and
>all other stuff going on there.
I would view cutting down on that only as a nice side effect, not a primary
reason to do the change. The primary reason is this:
>> This would additionally address a potential problem on 32-bits -
>> currently, for a 32-CPU system you consume half of the vmalloc space
>> with PAE (on non-PAE you'd even exhaust it, but I think it's
>> unreasonable to expect a system having 32 CPUs to not need PAE).
>
>I recall having about the same conversation before. Looking up...
>
>-- QUOTE --
> Actually, I've been looking at the numbers and I'm not sure if the
> concern is valid. On x86_32, the practical number of maximum
> processors would be around 16 so it will end up 32M, which isn't
> nice and it would probably be a good idea to introduce a parameter to
> select which allocator to use but still it's far from consuming all
> the VM area. On x86_64, the vmalloc area is obscenely large at 2^45,
> ie 32 terabytes. Even with 4096 processors, a single chunk is a measly
> 0.02%.
Just to note - there must be a reason we (SuSE/Novell) build our default
32-bit kernel with support for 128 CPUs, which now is simply broken.
> If it's a problem for other archs or extreme x86_32 configurations,
> we can add some safety measures but in general I don't think it is a
> problem.
>-- END OF QUOTE --
>
>So, yeah, if there are 32bit 32-way NUMA machines out there, it would
>be wise to skip remap allocator on such machines. Maybe we can
>implement a heuristic - something like "if vm area consumption goes
>over 25%, don't use remap".
Possibly, as a secondary consideration on top of the suggested reduction
of virtual address space consumption.
Jan
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-15 7:47 ` Jan Beulich
@ 2009-05-15 8:11 ` Tejun Heo
2009-05-15 8:22 ` Jan Beulich
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-15 8:11 UTC (permalink / raw)
To: Jan Beulich; +Cc: mingo, andi, tglx, linux-kernel, hpa
Hello,
Jan Beulich wrote:
>> The whole point of doing the remapping is giving each CPU its own PMD
>> mapping for the percpu area, so, yeah, that's the requirement. I don't
>> think the requirement is hidden tho.
>
> No, from looking at the code the requirement seems to only be that you
> get memory allocated from the correct node and mapped by a large page.
> There's nothing said why the final virtual address would need to be large
> page aligned. I.e., with a slight modification to take the NUMA requirement
> into account (I noticed I ignored that aspect after I had already sent that
> mail), the previous suggestion would still appear usable to me.
The requirement is having separate PMD mapping per NUMA node. What
has been implemented is the simplest form of that - one mapping per
CPU. Sure it can be further improved with more knowledge of the
topology. If you're interested, please go ahead.
>>> This would additionally address a potential problem on 32-bits -
>>> currently, for a 32-CPU system you consume half of the vmalloc space
>>> with PAE (on non-PAE you'd even exhaust it, but I think it's
>>> unreasonable to expect a system having 32 CPUs to not need PAE).
>> I recall having about the same conversation before. Looking up...
>>
>> -- QUOTE --
>> Actually, I've been looking at the numbers and I'm not sure if the
>> concern is valid. On x86_32, the practical number of maximum
>> processors would be around 16 so it will end up 32M, which isn't
>> nice and it would probably be a good idea to introduce a parameter to
>> select which allocator to use but still it's far from consuming all
>> the VM area. On x86_64, the vmalloc area is obscenely large at 2^45,
>> ie 32 terabytes. Even with 4096 processors, a single chunk is a measly
>> 0.02%.
>
> Just to note - there must be a reason we (SuSE/Novell) build our default
> 32-bit kernel with support for 128 CPUs, which now is simply broken.
It's not broken, it will just fall back to 4k allocator. Also, please
take a look at the refreshed patchset, remap allocator is not used
anymore if it's gonna occupy more than 20% (random number from the top
of my head) of vmalloc area.
>> So, yeah, if there are 32bit 32-way NUMA machines out there, it would
>> be wise to skip remap allocator on such machines. Maybe we can
>> implement a heuristic - something like "if vm area consumption goes
>> over 25%, don't use remap".
>
> Possibly, as a secondary consideration on top of the suggested reduction
> of virtual address space consumption.
Yeah, further improvements welcome. No objection whatsoever there.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-15 8:11 ` Tejun Heo
@ 2009-05-15 8:22 ` Jan Beulich
2009-05-15 8:27 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Jan Beulich @ 2009-05-15 8:22 UTC (permalink / raw)
To: Tejun Heo; +Cc: mingo, andi, tglx, linux-kernel, hpa
>>> Tejun Heo <tj@kernel.org> 15.05.09 10:11 >>>
>>>> This would additionally address a potential problem on 32-bits -
>>>> currently, for a 32-CPU system you consume half of the vmalloc space
>>>> with PAE (on non-PAE you'd even exhaust it, but I think it's
>>>> unreasonable to expect a system having 32 CPUs to not need PAE).
>>> I recall having about the same conversation before. Looking up...
>>>
>>> -- QUOTE --
>>> Actually, I've been looking at the numbers and I'm not sure if the
>>> concern is valid. On x86_32, the practical number of maximum
>>> processors would be around 16 so it will end up 32M, which isn't
>>> nice and it would probably be a good idea to introduce a parameter to
>>> select which allocator to use but still it's far from consuming all
>>> the VM area. On x86_64, the vmalloc area is obscenely large at 2^45,
>>> ie 32 terabytes. Even with 4096 processors, a single chunk is a measly
>>> 0.02%.
>>
>> Just to note - there must be a reason we (SuSE/Novell) build our default
>> 32-bit kernel with support for 128 CPUs, which now is simply broken.
>
>It's not broken, it will just fall back to 4k allocator. Also, please
I'm afraid I have to disagree: There's no check (not even in
vm_area_register_early()) whether the vmalloc area is actually large enough
to fulfill the request.
>take a look at the refreshed patchset, remap allocator is not used
>anymore if it's gonna occupy more than 20% (random number from the top
>of my head) of vmalloc area.
Yeah, I saw this only after following this thread.
Jan
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-15 8:22 ` Jan Beulich
@ 2009-05-15 8:27 ` Tejun Heo
0 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-15 8:27 UTC (permalink / raw)
To: Jan Beulich; +Cc: mingo, andi, tglx, linux-kernel, hpa
Jan Beulich wrote:
>>>> Tejun Heo <tj@kernel.org> 15.05.09 10:11 >>>
>>>>> This would additionally address a potential problem on 32-bits -
>>>>> currently, for a 32-CPU system you consume half of the vmalloc space
>>>>> with PAE (on non-PAE you'd even exhaust it, but I think it's
>>>>> unreasonable to expect a system having 32 CPUs to not need PAE).
>>>> I recall having about the same conversation before. Looking up...
>>>>
>>>> -- QUOTE --
>>>> Actually, I've been looking at the numbers and I'm not sure if the
>>>> concern is valid. On x86_32, the practical number of maximum
>>>> processors would be around 16 so it will end up 32M, which isn't
>>>> nice and it would probably be a good idea to introduce a parameter to
>>>> select which allocator to use but still it's far from consuming all
>>>> the VM area. On x86_64, the vmalloc area is obscenely large at 2^45,
>>>> ie 32 terabytes. Even with 4096 processors, a single chunk is a measly
>>>> 0.02%.
>>> Just to note - there must be a reason we (SuSE/Novell) build our default
>>> 32-bit kernel with support for 128 CPUs, which now is simply broken.
>> It's not broken, it will just fall back to 4k allocator. Also, please
>
> I'm afraid I have to disagree: There's no check (not even in
> vm_area_register_early()) whether the vmalloc area is actually large enough
> to fulfill the request.
Hah... indeed. Well, it's solved now.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
` (5 preceding siblings ...)
2009-05-14 16:22 ` Tejun Heo
@ 2009-05-16 1:17 ` Suresh Siddha
2009-05-16 15:16 ` Tejun Heo
2009-05-19 9:44 ` Tejun Heo
7 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-16 1:17 UTC (permalink / raw)
To: Tejun Heo
Cc: JBeulich@novell.com, andi@firstfloor.org, mingo@elte.hu,
linux-kernel-owner@vger.kernel.org, hpa@zytor.com,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Thu, 2009-05-14 at 05:49 -0700, Tejun Heo wrote:
> Hello,
>
> Upon ack, please pull from the following git tree.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git x86-percpu-pageattr
>
> This patchset fixes subtile bug in pageattr handling when remap percpu
> first chunk allocator is in use and implements percpu_alloc kernel
> parameter so that allocator can be selected from boot prompt.
>
> This problem was spotted by Jan Beulich.
>
> The remap allocator allocates a PMD page per cpu, returns whatever is
> unnecessary to the page allocator and remaps the PMD page into vmalloc
> area to construct the first percpu chunk. This is to take advantage
> of large page mapping.
Tejun, Can you please educate me why we need to map this first percpu
chunk (which is pre-allocated during boot and is physically contiguous)
into vmalloc area? Perhaps even for the other dynamically allocated
secondary chunks? (as far as I can see, all the chunk allocations seem
to be physically contiguous and later mapped into vmalloc area)..
That should simplify these things quite a bit (at least for the first
percpu chunk). I am missing something obvious I guess.
thanks,
suresh
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-16 1:17 ` Suresh Siddha
@ 2009-05-16 15:16 ` Tejun Heo
2009-05-16 19:09 ` Suresh Siddha
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-16 15:16 UTC (permalink / raw)
To: suresh.b.siddha
Cc: JBeulich@novell.com, andi@firstfloor.org, mingo@elte.hu,
linux-kernel-owner@vger.kernel.org, hpa@zytor.com,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Hello, Suresh.
Suresh Siddha wrote:
> Tejun, Can you please educate me why we need to map this first
> percpu chunk (which is pre-allocated during boot and is physically
> contiguous) into vmalloc area?
To make areas for each cpu congruent such that the address offset of a
percpu symbol for CPU N is always the same from the address for CPU 0.
> Perhaps even for the other dynamically allocated secondary chunks?
> (as far as I can see, all the chunk allocations seems to be
> physically contiguous and later mapped into vmalloc area)..
>
> That should simplify these things quite a bit(atleast for first
> percpu chunk). I am missing something obvious I guess.
Hmm... Sorry I don't really follow. Can you please elaborate the
question?
thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-16 15:16 ` Tejun Heo
@ 2009-05-16 19:09 ` Suresh Siddha
2009-05-17 1:23 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-16 19:09 UTC (permalink / raw)
To: Tejun Heo
Cc: JBeulich@novell.com, andi@firstfloor.org, mingo@elte.hu,
linux-kernel-owner@vger.kernel.org, hpa@zytor.com,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Sat, 2009-05-16 at 08:16 -0700, Tejun Heo wrote:
> Hello, Suresh.
>
> Suresh Siddha wrote:
> > Tejun, Can you please educate me why we need to map this first
> > percpu chunk (which is pre-allocated during boot and is physically
> > contiguous) into vmalloc area?
>
> To make areas for each cpu congruent such that the address offset of a
> percpu symbol for CPU N is always the same from the address for CPU 0.
But for the first percpu chunk, isn't it the case that the physical
address allocations for a particular cpu are contiguous (as you are using
one bootmem allocation for the whole PMD_SIZE for any given cpu)? So both
the kernel direct mapping as well as the vmalloc mappings are contiguous
for the first chunk, on any given cpu. Right?
> > Perhaps even for the other dynamically allocated secondary chunks?
> > (as far as I can see, all the chunk allocations seems to be
> > physically contiguous and later mapped into vmalloc area)..
> >
> > That should simplify these things quite a bit(atleast for first
> > percpu chunk). I am missing something obvious I guess.
>
> Hmm... Sorry I don't really follow. Can you please elaborate the
> question?
For the first percpu chunk, we can use the kernel direct mapping and
avoid the vmalloc mapping of PMD_SIZE. And avoid the vmap address
aliasing problem (wrt the free pages that we have given back to -mm) that
we are trying to avoid with this patchset (as the existing cpa code
already takes care of the kernel direct mappings).
thanks,
suresh
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-16 19:09 ` Suresh Siddha
@ 2009-05-17 1:23 ` Tejun Heo
2009-05-18 19:20 ` Suresh Siddha
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-17 1:23 UTC (permalink / raw)
To: suresh.b.siddha
Cc: JBeulich@novell.com, andi@firstfloor.org, mingo@elte.hu,
linux-kernel-owner@vger.kernel.org, hpa@zytor.com,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Hello, Suresh.
Suresh Siddha wrote:
> On Sat, 2009-05-16 at 08:16 -0700, Tejun Heo wrote:
>> Hello, Suresh.
>>
>> Suresh Siddha wrote:
>>> Tejun, Can you please educate me why we need to map this first
>>> percpu chunk (which is pre-allocated during boot and is physically
>>> contiguous) into vmalloc area?
>> To make areas for each cpu congruent such that the address offset of a
>> percpu symbol for CPU N is always the same from the address for CPU 0.
>
> But for the first percpu chunk, isn't it the case that the physical
> address allocations for a particular cpu are contiguous (as you are using
> one bootmem allocation for the whole PMD_SIZE for any given cpu)? So both
> the kernel direct mapping as well as the vmalloc mappings are contiguous
> for the first chunk, on any given cpu. Right?
Hmmm... okay. Percpu areas are composed of multiple chunks. A single
chunk is composed of multiple units, one unit for each CPU. Units in
a single chunk should be contiguous and of the same size such that
unit_addr_for_cpu_N == chunk_addr + N * unit_size, whereas chunks don't
need to have any special address relation to other chunks. Combined,
this means that percpu addresses for CPU N are always offset by N *
unit_size from the percpu addresses for CPU 0, which can be efficiently
determined using some extra resource in the processor (a segment
register on x86, for example).
For the remap first chunk allocator, each unit for each CPU is allocated
separately using the bootmem allocator. Each unit is contiguous but
the units still need to be assembled into a single contiguous area to
be used as the first chunk, which is where the remapping comes in. So,
the extra requirement is that units in the same chunk need to be
contiguous, and NUMA allocation means units will be spread according to
the NUMA configuration, so they need to be put together by remapping them.
>>> Perhaps even for the other dynamically allocated secondary chunks?
>>> (as far as I can see, all the chunk allocations seems to be
>>> physically contiguous and later mapped into vmalloc area)..
>>>
>>> That should simplify these things quite a bit(atleast for first
>>> percpu chunk). I am missing something obvious I guess.
>> Hmm... Sorry I don't really follow. Can you please elaborate the
>> question?
>
> For the first percpu chunk, we can use the kernel direct mapping and
> avoid the vmalloc mapping of PMD_SIZE. And avoid the vmap address
> aliasing problem (wrt to free pages that we have given back to -mm) that
> we are trying to avoid with this patchset (as the existing cpa code
> already takes care of the kernel direct mappings).
Hmmm.... If you can show me how to use the linear mapping directly,
I'll be happy as a clam.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-17 1:23 ` Tejun Heo
@ 2009-05-18 19:20 ` Suresh Siddha
2009-05-18 19:41 ` H. Peter Anvin
0 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-18 19:20 UTC (permalink / raw)
To: Tejun Heo
Cc: JBeulich@novell.com, andi@firstfloor.org, mingo@elte.hu,
linux-kernel-owner@vger.kernel.org, hpa@zytor.com,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Sat, 2009-05-16 at 18:23 -0700, Tejun Heo wrote:
> Units in
> a single chunk should be contiguous and of the same size such that
> unit_addr_for_cpu_N == chunk_addr + N * unit_size whereas chunks don't
> need to have any special address relation to other chunks.
And the size of the unit is the same across chunks.
This is what I was missing :)
Can we not use PERCPU_DYNAMIC_RESERVE in the first chunk, and for
dynamic per_cpu_ptr's use some other offset such as
per_cpu_ptr_offset() or some such thing?
Then we can separate the static and dynamic chunks. And use large page
kernel-direct mappings for fast access for critical common things and
use small page accesses for dynamic and not so common accesses.
Just checking to see if we can reduce the complexity of setting up the
percpu areas (different versions for embed, non-embed etc) and handling
all these aliases with simple code, rather than making it complex.
thanks,
suresh
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-18 19:20 ` Suresh Siddha
@ 2009-05-18 19:41 ` H. Peter Anvin
2009-05-18 21:07 ` Suresh Siddha
0 siblings, 1 reply; 38+ messages in thread
From: H. Peter Anvin @ 2009-05-18 19:41 UTC (permalink / raw)
To: suresh.b.siddha
Cc: Tejun Heo, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Suresh Siddha wrote:
>
> Can we not use PERCPU_DYNAMIC_RESERVE in the first chunk, and for
> dynamic per_cpu_ptr's use some other offset such as
> per_cpu_ptr_offset() or some such thing?
>
> Then we can separate the static and dynamic chunks. And use large page
> kernel-direct mappings for fast access for critical common things and
> use small page accesses for dynamic and not so common accesses.
>
> Just checking to see if we can reduce the complexity of setting up the
> percpu areas (different versions for embed, non-embed etc) and handling
> all these aliases with simple code, rather than making it complex.
>
I'm confused what you're suggesting here. The whole point of the percpu
unification work is that we can use %gs:absolute type references to hit
a variable right away. Although in theory we could use both %fs and %gs
for pointers, setting up both would greatly increase the cost of
entering the kernel, especially on 64 bits.
This means all percpu data has to have a constant (virtual) offset from
the beginning of the static percpu area.
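[Editor's note: a toy user-space model of what the constant-offset property buys. The real kernel addresses the area through the %gs segment base, not an array; all names here are made up for illustration.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS		4
#define PCPU_AREA_SIZE	64

/* One copy of the per-cpu area per CPU; on real hardware the %gs
 * base register would point at percpu_area[cpu]. */
static uint8_t percpu_area[NR_CPUS][PCPU_AREA_SIZE];

/* A static per-cpu variable is a link-time constant offset into the
 * area -- the same offset on every CPU, which is the property
 * described above: one %gs:offset access, no table lookup. */
static const size_t my_counter_off = 16;

static uint8_t *my_counter(unsigned int cpu)	/* models %gs:my_counter_off */
{
	return &percpu_area[cpu][my_counter_off];
}
```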
-hpa
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-18 19:41 ` H. Peter Anvin
@ 2009-05-18 21:07 ` Suresh Siddha
2009-05-19 1:28 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-18 21:07 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Tejun Heo, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Mon, 2009-05-18 at 12:41 -0700, H. Peter Anvin wrote:
> Suresh Siddha wrote:
> >
> > Can we avoid using PERCPU_DYNAMIC_RESERVE in the first chunk and, for
> > dynamic per_cpu_ptr's, use some other offset such as
> > per_cpu_ptr_offset() or some such thing?
> >
> > Then we can separate the static and dynamic chunks. And use large page
> > kernel-direct mappings for fast access for critical common things and
> > use small page accesses for dynamic and not so common accesses.
> >
> > Just checking to see if we can reduce the complexity of setting up the
> > percpu areas (different versions for embed, non-embed etc) and handling
> > all these aliases with simple code, rather than making it complex.
> >
>
> I'm confused what you're suggesting here. The whole point of the percpu
> unification work is that we can use %gs:absolute type references to hit
> a variable right away. Although in theory we could use both %fs and %gs
> for pointers, setting up both would greatly increase the cost of
> entering the kernel, especially on 64 bits.
>
> This means all percpu data has to have a constant (virtual) offset from
> the beginning of the static percpu area.
These %gs:absolute type accesses are for static percpu data.
But what I was referring to is the dynamic percpu data (accessed through
per_cpu_ptr()). Instead of combining some part of the dynamic percpu
data into the static percpu data (the first percpu chunk), we can use
different chunks for dynamic percpu data, governed by a separate
per_cpu_dynamic_offset[NR_CPUS] array.
Then we can use large-page kernel direct mappings for static percpu data
(or %gs:offset) and small-page vmalloc mappings for dynamic percpu data.
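[Editor's note: a minimal sketch of the split proposed above. Hypothetical code; per_cpu_dynamic_offset and dyn_per_cpu_ptr are made-up names standing in for the suggested interface.]

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS	2

/* Dynamic chunks would carry their own offset table, independent of
 * the static %gs-relative area, so dereferencing a dynamic per-cpu
 * pointer becomes one extra table lookup per access. */
static uintptr_t per_cpu_dynamic_offset[NR_CPUS];

static void *dyn_per_cpu_ptr(void *ptr, unsigned int cpu)
{
	return (char *)ptr + per_cpu_dynamic_offset[cpu];
}
```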
thanks,
suresh
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-18 21:07 ` Suresh Siddha
@ 2009-05-19 1:28 ` Tejun Heo
2009-05-20 23:01 ` Suresh Siddha
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-19 1:28 UTC (permalink / raw)
To: suresh.b.siddha
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Hello, Suresh.
Suresh Siddha wrote:
> These %gs:absolute type accesses are for static percpu data.
>
> But what I was referring to is the dynamic percpu data (accessed through
> per_cpu_ptr()). Instead of combining some part of the dynamic percpu
> data into the static percpu data (the first percpu chunk), we can use
> different chunks for dynamic percpu data, governed by a separate
> per_cpu_dynamic_offset[NR_CPUS] array.
>
> Then we can use large-page kernel direct mappings for static percpu data
> (or %gs:offset) and small-page vmalloc mappings for dynamic percpu data.
Hmmm... I can't really follow what you're suggesting. Can you please
explain it in more detailed way?
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
` (6 preceding siblings ...)
2009-05-16 1:17 ` Suresh Siddha
@ 2009-05-19 9:44 ` Tejun Heo
2009-05-20 7:54 ` Ingo Molnar
7 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-19 9:44 UTC (permalink / raw)
To: JBeulich, andi, mingo, linux-kernel-owner, hpa, tglx,
linux-kernel
Ingo, Ping.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-19 9:44 ` Tejun Heo
@ 2009-05-20 7:54 ` Ingo Molnar
2009-05-20 7:57 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2009-05-20 7:54 UTC (permalink / raw)
To: Tejun Heo, H. Peter Anvin
Cc: JBeulich, andi, linux-kernel-owner, tglx, linux-kernel
* Tejun Heo <teheo@novell.com> wrote:
> Ingo, Ping.
hpa is handling this - Peter will reply if he has any updates.
Ingo
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-20 7:54 ` Ingo Molnar
@ 2009-05-20 7:57 ` Tejun Heo
0 siblings, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-20 7:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: H. Peter Anvin, JBeulich, andi, linux-kernel-owner, tglx,
linux-kernel
Ingo Molnar wrote:
> * Tejun Heo <teheo@novell.com> wrote:
>
>> Ingo, Ping.
>
> hpa is handling this - Peter will reply if he has any updates.
Okay, but please ignore this thread. I thought I was replying to the
take#2 thread. At any rate, the git tree is the same and contains
the final version.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-19 1:28 ` Tejun Heo
@ 2009-05-20 23:01 ` Suresh Siddha
2009-05-21 0:08 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-20 23:01 UTC (permalink / raw)
To: Tejun Heo
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Mon, 2009-05-18 at 18:28 -0700, Tejun Heo wrote:
> Hello, Suresh.
>
> Suresh Siddha wrote:
> > These %gs:absolute type accesses are for static percpu data.
> >
> > But what I was referring to is the dynamic percpu data (accessed through
> > per_cpu_ptr()). Instead of combining some part of the dynamic percpu
> > data into the static percpu data (the first percpu chunk), we can use
> > different chunks for dynamic percpu data, governed by a separate
> > per_cpu_dynamic_offset[NR_CPUS] array.
> >
> > Then we can use large-page kernel direct mappings for static percpu data
> > (or %gs:offset) and small-page vmalloc mappings for dynamic percpu data.
>
> Hmmm... I can't really follow what you're suggesting. Can you please
> explain it in more detailed way?
Ok. Before I make another attempt walking that hill :)
I was talking to Peter and it seems there are some requests to change
the first percpu unit allocation, done for each possible cpu using the
bootmem allocator, to allocating the corresponding unit at cpu online time.
Do you have plans to change this? If we do this allocation at the
corresponding cpu's online time and don't end up using big pages, then
we also avoid all these aliasing issues...
thanks,
suresh
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-20 23:01 ` Suresh Siddha
@ 2009-05-21 0:08 ` Tejun Heo
2009-05-21 0:36 ` Suresh Siddha
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-21 0:08 UTC (permalink / raw)
To: suresh.b.siddha
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Hello,
Suresh Siddha wrote:
>> Hmmm... I can't really follow what you're suggesting. Can you please
>> explain it in more detailed way?
>
> Ok. Before I make another attempt walking that hill :)
Heh... the problem is that unless I understand what you're trying to
achieve (and vice-versa), our discussion is likely to be riddled with
confusions, and currently either you're misunderstanding the whole
thing or I'm being slow (not too unusual :-). It would be nice to
determine which way it is.
> I was talking to Peter and it seems there are some requests to change
> the first percpu unit allocation, done for each possible cpu using the
> bootmem allocator, to allocating the corresponding unit at cpu online time.
>
> Do you have plans to change this?
Yeap, once the remaining archs are converted, that's the next stop.
With the recently posted patchset, only three remain - sparc64, powerpc64
and ia64. Davem is working on sparc64. I'm planning on doing
powerpc64. ia64 is a bit tricky as it already remaps percpu areas but
I'm sure we'll find a way around it. After that, yeap, the dynamic
online thing.
> If we do this allocation at the corresponding cpu's online time
> and don't end up using big pages, then we also avoid all these
> aliasing issues...
The dynamic onlining will probably use 4k pages so, yeah, it won't
have the alias issues but that's not the issue here, right? You can
already avoid aliasing that way by simply using 4k allocator from the
get-go.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-21 0:08 ` Tejun Heo
@ 2009-05-21 0:36 ` Suresh Siddha
2009-05-21 1:46 ` Tejun Heo
0 siblings, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-21 0:36 UTC (permalink / raw)
To: Tejun Heo
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Wed, 2009-05-20 at 17:08 -0700, Tejun Heo wrote:
> Heh... the problem is that unless I understand what you're trying to
> achieve (and vice-versa), our discussion is likely to be riddled with
> confusions, and currently either you're misunderstanding the whole
> thing or I'm being slow (not too unusual :-). It would be nice to
> determine which way it is.
At the start of this, I was trying to see a way where we can use large
page mappings for percpu areas and at the same time, avoid the
complexity of cpa() that we are adding in this patchset. So, I was going
down the path of exploiting your bootmem allocation for the first percpu
chunk (as kernel identity mappings are already taken care of by cpa()).
> The dynamic onlining will probably use 4k pages so, yeah, it won't
> have the alias issues but that's not the issue here, right? You can
> already avoid aliasing that way by simply using 4k allocator from the
> get-go.
But now that I have learnt about dynamic online allocation, can we avoid
the complexity brought by this patchset by simply using the 4k allocator
from the get-go?
i.e., can we drop this remap pageattr handling patchset and simply use
4k mapping for now? And move to dynamic allocation at a later point.
This will simplify quite a bit of code.
thanks,
suresh
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-21 0:36 ` Suresh Siddha
@ 2009-05-21 1:46 ` Tejun Heo
2009-05-21 1:48 ` Tejun Heo
2009-05-21 19:10 ` Suresh Siddha
0 siblings, 2 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-21 1:46 UTC (permalink / raw)
To: suresh.b.siddha
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Hello,
Suresh Siddha wrote:
>> The dynamic onlining will probably use 4k pages so, yeah, it won't
>> have the alias issues but that's not the issue here, right? You can
>> already avoid aliasing that way by simply using 4k allocator from the
>> get-go.
>
> But now that I have learnt about dynamic online allocation, can we avoid
> the complexity brought by this patchset by simply using the 4k allocator
> from the get-go?
Sure, we can.
> i.e., can we drop this remap pageattr handling patchset and simply use
> 4k mapping for now? And move to dynamic allocation at a later point.
4k or not, x86 is already on dynamic allocation. The only difference
is how the first chunk is allocated.
> This will simplify quite a bit of code.
Yes it will. The question is which way would be better. Till now,
there hasn't been any actual data on how remap compares to 4k. The
only thing we know is that, on UMA, embed should behave exactly the
same for static percpu variables as it did before the whole dynamic
allocator was introduced.
On NUMA, both remap and 4k add some level of TLB pressure. remap will
waste one more PMD TLB entry (dup) while 4k adds a bunch of 4k ones
(non-dup but what used to be accessed by PMD TLB is now accessed with
PTE TLB). Some say using one more PMD TLB entry is better while others
disagree. So, the best course of action here seems to be to offer both
and an easy way to select between them so that data can be gathered,
which is what this patchset does.
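[Editor's note: rough, illustrative arithmetic behind the trade-off above: one PMD entry maps 2MiB, so covering the same range with 4KiB pages takes 512 PTE TLB entries. Constants are the standard x86 values.]

```c
#include <assert.h>

#define PMD_SIZE	(2UL << 20)	/* 2 MiB large page */
#define PAGE_SIZE	4096UL		/* 4 KiB small page */

/* How many 4k PTE TLB entries it takes to cover what a single PMD
 * TLB entry covers -- the "bunch of 4k ones" mentioned above. */
static unsigned long pte_entries_per_pmd(void)
{
	return PMD_SIZE / PAGE_SIZE;
}
```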
I don't think the added complexity for cpa() justifies dropping remap
without further testing. The added complexity isn't that big. Most
of the confusion in this patchset came from my ignorance on the
subject. cpa() is a fragile thing but we need it anyway, so...
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-21 1:46 ` Tejun Heo
@ 2009-05-21 1:48 ` Tejun Heo
2009-05-21 19:10 ` Suresh Siddha
1 sibling, 0 replies; 38+ messages in thread
From: Tejun Heo @ 2009-05-21 1:48 UTC (permalink / raw)
To: suresh.b.siddha
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Tejun Heo wrote:
> On NUMA, both remap and 4k add some level of TLB pressure. remap will
> waste one more PMD TLB entry (dup) while 4k adds a bunch of 4k ones
> (non-dup but what used to be accessed by PMD TLB is now accessed with
> PTE TLB).
Correction: 4k TLBs are dups too.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-21 1:46 ` Tejun Heo
2009-05-21 1:48 ` Tejun Heo
@ 2009-05-21 19:10 ` Suresh Siddha
2009-05-21 23:18 ` Tejun Heo
1 sibling, 1 reply; 38+ messages in thread
From: Suresh Siddha @ 2009-05-21 19:10 UTC (permalink / raw)
To: Tejun Heo
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Wed, 2009-05-20 at 18:46 -0700, Tejun Heo wrote:
> Yes it will. The question is which way would be better. Till now,
> there hasn't been any actual data on how remap compares to 4k.
I am not sure if we will see any measurable difference. Even if we use
4k entries, it will be only a few entries that the kernel refers to
frequently.
> On NUMA, both remap and 4k add some level of TLB pressure. remap will
> waste one more PMD TLB entry (dup) while 4k adds a bunch of 4k ones
> (non-dup but what used to be accessed by PMD TLB is now accessed with
> PTE TLB). Some say using one more PMD TLB entry is better while others
> disagree. So, the best course of action here seems to be to offer both
> and an easy way to select between them so that data can be gathered,
> which is what this patchset does.
So with the planned future change of percpu unit allocation during cpu
online, you are planning to try large page allocation first and then
fall back to 4k pages if that doesn't succeed. And then populate the new
percpu ptr accordingly and sort it with respect to the other cpu ptr's,
so that we can keep aliases in sync for future (and in-parallel) cpa()'s
that might be happening.
There is nothing wrong with all this. Just the code complexity (and
maintenance) for what we are trying to gain ;)
>
> I don't think the added complexity for cpa() justifies dropping remap
> without further testing. The added complexity isn't that big. Most
> of the confusion in this patchset came from my ignorance on the
> subject. cpa() is a fragile thing but we need it anyway, so...
>
> Thanks.
>
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-21 19:10 ` Suresh Siddha
@ 2009-05-21 23:18 ` Tejun Heo
2009-05-22 0:55 ` Suresh Siddha
0 siblings, 1 reply; 38+ messages in thread
From: Tejun Heo @ 2009-05-21 23:18 UTC (permalink / raw)
To: suresh.b.siddha
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
Hello, Suresh.
Suresh Siddha wrote:
> On Wed, 2009-05-20 at 18:46 -0700, Tejun Heo wrote:
>> Yes it will. The question is which way would be better. Till now,
>> there hasn't been any actual data on how remap compares to 4k.
>
> I am not sure if we will see any measurable difference. Even if we use
> 4k entries, it will be only a few entries that the kernel refers to
> frequently.
Yeah, I hope so too but I *really* want to see some numbers before
taking further actions. Remap is chosen as the default because it
deviates less from the original behavior but I'll be excited to drop
it if 4k works just fine.
>> On NUMA, both remap and 4k add some level of TLB pressure. remap will
>> waste one more PMD TLB entry (dup) while 4k adds a bunch of 4k ones
>> (non-dup but what used to be accessed by PMD TLB is now accessed with
>> PTE TLB). Some say using one more PMD TLB entry is better while others
>> disagree. So, the best course of action here seems to be to offer both
>> and an easy way to select between them so that data can be gathered,
>> which is what this patchset does.
>
> So with the planned future change of percpu unit allocation during cpu
> online, you are planning to try large page allocation first and then
> fall back to 4k pages if that doesn't succeed. And then populate the new
> percpu ptr accordingly and sort it with respect to the other cpu ptr's,
> so that we can keep aliases in sync for future (and in-parallel) cpa()'s
> that might be happening.
>
> There is nothing wrong with all this. Just the code complexity (and
> maintenance) for what we are trying to gain ;)
No, I'll let the first chunk allocation happen the same way for cpus
available at boot and then just do 4k allocations for whatever is
necessary afterward. The needed code change in percpu proper isn't
that big. What would take more effort is auditing all percpu users.
Thanks.
--
tejun
* Re: [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator
2009-05-21 23:18 ` Tejun Heo
@ 2009-05-22 0:55 ` Suresh Siddha
0 siblings, 0 replies; 38+ messages in thread
From: Suresh Siddha @ 2009-05-22 0:55 UTC (permalink / raw)
To: Tejun Heo
Cc: H. Peter Anvin, JBeulich@novell.com, andi@firstfloor.org,
mingo@elte.hu, linux-kernel-owner@vger.kernel.org,
tglx@linutronix.de, linux-kernel@vger.kernel.org
On Thu, 2009-05-21 at 16:18 -0700, Tejun Heo wrote:
> No, I'll let the first chunk allocation happen the same way for cpus
> available at boot and then just do 4k allocations for whatever is
> necessary afterward. The needed code change in percpu proper isn't
> that big.
Ok. It will be cleaner this way.
thanks,
suresh
end of thread, other threads:[~2009-05-22 0:57 UTC | newest]
Thread overview: 38+ messages
-- links below jump to the message on this page --
2009-05-14 12:49 [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Tejun Heo
2009-05-14 12:49 ` [PATCH 1/4] x86: prepare setup_pcpu_remap() for pageattr fix Tejun Heo
2009-05-14 12:49 ` [PATCH 2/4] x86: simplify cpa_process_alias() Tejun Heo
2009-05-14 14:16 ` Jan Beulich
2009-05-14 15:37 ` Tejun Heo
2009-05-14 16:20 ` [PATCH UPDATED 2/4] x86: reorganize cpa_process_alias() Tejun Heo
2009-05-14 12:49 ` [PATCH 3/4] x86: fix pageattr handling for remap percpu allocator Tejun Heo
2009-05-14 16:21 ` [PATCH UPDATED " Tejun Heo
2009-05-14 12:49 ` [PATCH 4/4] x86: implement percpu_alloc kernel parameter Tejun Heo
2009-05-14 14:28 ` [GIT PATCH] x86,percpu: fix pageattr handling with remap allocator Jan Beulich
2009-05-14 15:55 ` Tejun Heo
2009-05-15 7:47 ` Jan Beulich
2009-05-15 8:11 ` Tejun Heo
2009-05-15 8:22 ` Jan Beulich
2009-05-15 8:27 ` Tejun Heo
2009-05-14 16:22 ` Tejun Heo
2009-05-15 4:00 ` Tejun Heo
2009-05-15 4:36 ` David Miller
2009-05-15 4:48 ` Tejun Heo
2009-05-16 1:17 ` Suresh Siddha
2009-05-16 15:16 ` Tejun Heo
2009-05-16 19:09 ` Suresh Siddha
2009-05-17 1:23 ` Tejun Heo
2009-05-18 19:20 ` Suresh Siddha
2009-05-18 19:41 ` H. Peter Anvin
2009-05-18 21:07 ` Suresh Siddha
2009-05-19 1:28 ` Tejun Heo
2009-05-20 23:01 ` Suresh Siddha
2009-05-21 0:08 ` Tejun Heo
2009-05-21 0:36 ` Suresh Siddha
2009-05-21 1:46 ` Tejun Heo
2009-05-21 1:48 ` Tejun Heo
2009-05-21 19:10 ` Suresh Siddha
2009-05-21 23:18 ` Tejun Heo
2009-05-22 0:55 ` Suresh Siddha
2009-05-19 9:44 ` Tejun Heo
2009-05-20 7:54 ` Ingo Molnar
2009-05-20 7:57 ` Tejun Heo