* [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9)
@ 2008-04-14 16:39 Ross Biro
2008-04-14 22:57 ` Andrew Morton
From: Ross Biro @ 2008-04-14 16:39 UTC (permalink / raw)
To: linux-kernel, linux-mm, rossb, akpm
These patches make page tables relocatable for NUMA, memory
defragmentation, and memory hotplug. The potential need to re-walk the
page tables before making any changes causes a 3% performance
degradation in the lmbench page-miss micro-benchmark.
Signed-off-by: Ross Biro <rossb@google.com>
---
These patches are against 2.6.25-rc9. There are no other differences between
this version and the last one.
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/alpha/kernel/smp.c 2.6.25-rc9/arch/alpha/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/alpha/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/alpha/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -845,6 +845,8 @@ flush_tlb_mm(struct mm_struct *mm)
{
preempt_disable();
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if (mm == current->active_mm) {
flush_tlb_current(mm);
if (atomic_read(&mm->mm_users) <= 1) {
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/arm/kernel/smp.c 2.6.25-rc9/arch/arm/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/arm/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/arm/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -738,6 +738,8 @@ void flush_tlb_mm(struct mm_struct *mm)
{
cpumask_t mask = mm->cpu_vm_mask;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, 1, mask);
}
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/avr32/mm/tlb.c 2.6.25-rc9/arch/avr32/mm/tlb.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/avr32/mm/tlb.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/avr32/mm/tlb.c 2008-04-14 09:00:18.000000000 -0700
@@ -249,6 +249,8 @@ void flush_tlb_kernel_range(unsigned lon
void flush_tlb_mm(struct mm_struct *mm)
{
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
/* Invalidate all TLB entries of this process by getting a new ASID */
if (mm->context != NO_CONTEXT) {
unsigned long flags;
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/cris/arch-v10/mm/tlb.c 2.6.25-rc9/arch/cris/arch-v10/mm/tlb.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/cris/arch-v10/mm/tlb.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/cris/arch-v10/mm/tlb.c 2008-04-14 09:00:18.000000000 -0700
@@ -69,6 +69,8 @@ flush_tlb_mm(struct mm_struct *mm)
D(printk("tlb: flush mm context %d (%p)\n", page_id, mm));
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if(page_id == NO_CONTEXT)
return;
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/cris/arch-v32/kernel/smp.c 2.6.25-rc9/arch/cris/arch-v32/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/cris/arch-v32/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/cris/arch-v32/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -252,6 +252,7 @@ void flush_tlb_all(void)
void flush_tlb_mm(struct mm_struct *mm)
{
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
__flush_tlb_mm(mm);
flush_tlb_common(mm, FLUSH_ALL, 0);
/* No more mappings in other CPUs */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/ia64/kernel/smp.c 2.6.25-rc9/arch/ia64/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/ia64/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/ia64/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -325,6 +325,8 @@ smp_flush_tlb_all (void)
void
smp_flush_tlb_mm (struct mm_struct *mm)
{
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
preempt_disable();
/* this happens for the common case of a single-threaded fork(): */
if (likely(mm == current->active_mm && atomic_read(&mm->mm_users) == 1))
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/m32r/kernel/smp.c 2.6.25-rc9/arch/m32r/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/m32r/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/m32r/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -280,6 +280,8 @@ void smp_flush_tlb_mm(struct mm_struct *
unsigned long *mmc;
unsigned long flags;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
preempt_disable();
cpu_id = smp_processor_id();
mmc = &mm->context[cpu_id];
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/mips/kernel/smp.c 2.6.25-rc9/arch/mips/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/mips/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/mips/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -408,6 +408,8 @@ static inline void smp_on_each_tlb(void
void flush_tlb_mm(struct mm_struct *mm)
{
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
preempt_disable();
if ((atomic_read(&mm->mm_users) != 1) || (current->mm != mm)) {
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/powerpc/mm/tlb_32.c 2.6.25-rc9/arch/powerpc/mm/tlb_32.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/powerpc/mm/tlb_32.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/powerpc/mm/tlb_32.c 2008-04-14 09:00:18.000000000 -0700
@@ -144,6 +144,8 @@ void flush_tlb_mm(struct mm_struct *mm)
{
struct vm_area_struct *mp;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if (Hash == 0) {
_tlbia();
return;
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/ppc/mm/tlb.c 2.6.25-rc9/arch/ppc/mm/tlb.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/ppc/mm/tlb.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/ppc/mm/tlb.c 2008-04-14 09:00:18.000000000 -0700
@@ -144,6 +144,8 @@ void flush_tlb_mm(struct mm_struct *mm)
{
struct vm_area_struct *mp;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if (Hash == 0) {
_tlbia();
return;
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/sparc/kernel/smp.c 2.6.25-rc9/arch/sparc/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/sparc/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/sparc/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -163,6 +163,8 @@ void smp_flush_cache_mm(struct mm_struct
void smp_flush_tlb_mm(struct mm_struct *mm)
{
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if(mm->context != NO_CONTEXT) {
cpumask_t cpu_mask = mm->cpu_vm_mask;
cpu_clear(smp_processor_id(), cpu_mask);
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/sparc64/kernel/smp.c 2.6.25-rc9/arch/sparc64/kernel/smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/sparc64/kernel/smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/sparc64/kernel/smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -1119,6 +1119,8 @@ void smp_flush_tlb_mm(struct mm_struct *
u32 ctx = CTX_HWBITS(mm->context);
int cpu = get_cpu();
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if (atomic_read(&mm->mm_users) == 1) {
mm->cpu_vm_mask = cpumask_of_cpu(cpu);
goto local_flush_and_out;
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/um/kernel/tlb.c 2.6.25-rc9/arch/um/kernel/tlb.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/um/kernel/tlb.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/um/kernel/tlb.c 2008-04-14 09:00:18.000000000 -0700
@@ -517,6 +517,8 @@ void flush_tlb_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma = mm->mmap;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
while (vma != NULL) {
fix_range(mm, vma->vm_start, vma->vm_end, 0);
vma = vma->vm_next;
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/x86/kernel/smp_32.c 2.6.25-rc9/arch/x86/kernel/smp_32.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/x86/kernel/smp_32.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/x86/kernel/smp_32.c 2008-04-14 09:00:18.000000000 -0700
@@ -332,6 +332,8 @@ void smp_invalidate_interrupt(struct pt_
if (per_cpu(cpu_tlbstate, cpu).state == TLBSTATE_OK) {
if (flush_va == TLB_FLUSH_ALL)
local_flush_tlb();
+ else if (f->flush_va == TLB_RELOAD_ALL)
+ local_reload_tlb_mm(f->flush_mm);
else
__flush_tlb_one(flush_va);
} else
@@ -408,10 +410,35 @@ void flush_tlb_current_task(void)
preempt_enable();
}
+void reload_tlb_mm(struct mm_struct *mm)
+{
+ cpumask_t cpu_mask;
+
+ clear_bit(MMF_NEED_RELOAD, &mm->flags);
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
+ preempt_disable();
+ cpu_mask = mm->cpu_vm_mask;
+ cpu_clear(smp_processor_id(), cpu_mask);
+
+ if (current->active_mm == mm) {
+ if (current->mm)
+ local_reload_tlb_mm(mm);
+ else
+ leave_mm(smp_processor_id());
+ }
+ if (!cpus_empty(cpu_mask))
+ flush_tlb_others(cpu_mask, mm, TLB_RELOAD_ALL);
+
+ preempt_enable();
+
+}
+
void flush_tlb_mm (struct mm_struct * mm)
{
cpumask_t cpu_mask;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
preempt_disable();
cpu_mask = mm->cpu_vm_mask;
cpu_clear(smp_processor_id(), cpu_mask);
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/x86/kernel/smp_64.c 2.6.25-rc9/arch/x86/kernel/smp_64.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/x86/kernel/smp_64.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/x86/kernel/smp_64.c 2008-04-14 09:00:18.000000000 -0700
@@ -155,6 +155,8 @@ asmlinkage void smp_invalidate_interrupt
if (read_pda(mmu_state) == TLBSTATE_OK) {
if (f->flush_va == TLB_FLUSH_ALL)
local_flush_tlb();
+ else if (f->flush_va == TLB_RELOAD_ALL)
+ local_reload_tlb_mm(f->flush_mm);
else
__flush_tlb_one(f->flush_va);
} else
@@ -228,10 +230,36 @@ void flush_tlb_current_task(void)
preempt_enable();
}
+void reload_tlb_mm(struct mm_struct *mm)
+{
+ cpumask_t cpu_mask;
+
+ clear_bit(MMF_NEED_RELOAD, &mm->flags);
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
+ preempt_disable();
+ cpu_mask = mm->cpu_vm_mask;
+ cpu_clear(smp_processor_id(), cpu_mask);
+
+ if (current->active_mm == mm) {
+ if (current->mm)
+ local_reload_tlb_mm(mm);
+ else
+ leave_mm(smp_processor_id());
+ }
+ if (!cpus_empty(cpu_mask))
+ flush_tlb_others(cpu_mask, mm, TLB_RELOAD_ALL);
+
+ preempt_enable();
+
+}
+
void flush_tlb_mm (struct mm_struct * mm)
{
cpumask_t cpu_mask;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
preempt_disable();
cpu_mask = mm->cpu_vm_mask;
cpu_clear(smp_processor_id(), cpu_mask);
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/x86/mach-voyager/voyager_smp.c 2.6.25-rc9/arch/x86/mach-voyager/voyager_smp.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/x86/mach-voyager/voyager_smp.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/x86/mach-voyager/voyager_smp.c 2008-04-14 09:00:18.000000000 -0700
@@ -909,6 +909,8 @@ void flush_tlb_mm(struct mm_struct *mm)
{
unsigned long cpu_mask;
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
preempt_disable();
cpu_mask = cpus_addr(mm->cpu_vm_mask)[0] & ~(1 << smp_processor_id());
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/arch/xtensa/mm/tlb.c 2.6.25-rc9/arch/xtensa/mm/tlb.c
--- /home/rossb/local/linux-2.6.25-rc9/arch/xtensa/mm/tlb.c 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/arch/xtensa/mm/tlb.c 2008-04-14 09:00:18.000000000 -0700
@@ -63,6 +63,8 @@ void flush_tlb_all (void)
void flush_tlb_mm(struct mm_struct *mm)
{
+ clear_bit(MMF_NEED_FLUSH, &mm->flags);
+
if (mm == current->active_mm) {
int flags;
local_save_flags(flags);
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-alpha/tlbflush.h 2.6.25-rc9/include/asm-alpha/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-alpha/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-alpha/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -148,4 +148,6 @@ static inline void flush_tlb_kernel_rang
flush_tlb_all();
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _ALPHA_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-arm/tlbflush.h 2.6.25-rc9/include/asm-arm/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-arm/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-arm/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -466,5 +466,6 @@ extern void update_mmu_cache(struct vm_a
#endif
#endif /* CONFIG_MMU */
+#include <asm-generic/tlbflush.h>
#endif
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-avr32/tlbflush.h 2.6.25-rc9/include/asm-avr32/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-avr32/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-avr32/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -29,5 +29,6 @@ extern void flush_tlb_page(struct vm_are
extern void __flush_tlb_page(unsigned long asid, unsigned long page);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
+#include <asm-generic/tlbflush.h>
#endif /* __ASM_AVR32_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-blackfin/tlbflush.h 2.6.25-rc9/include/asm-blackfin/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-blackfin/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-blackfin/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -53,4 +53,5 @@ static inline void flush_tlb_kernel_page
BUG();
}
+#include <asm-generic/tlbflush.h>
#endif
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-cris/tlbflush.h 2.6.25-rc9/include/asm-cris/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-cris/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-cris/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -44,5 +44,6 @@ static inline void flush_tlb(void)
}
#define flush_tlb_kernel_range(start, end) flush_tlb_all()
+#include <asm-generic/tlbflush.h>
#endif /* _CRIS_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-frv/tlbflush.h 2.6.25-rc9/include/asm-frv/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-frv/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-frv/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -68,6 +68,7 @@ do { \
#define flush_tlb_kernel_range(start, end) BUG()
#endif
+#include <asm-generic/tlbflush.h>
#endif /* _ASM_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-generic/tlbflush.h 2.6.25-rc9/include/asm-generic/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-generic/tlbflush.h 1969-12-31 16:00:00.000000000 -0800
+++ 2.6.25-rc9/include/asm-generic/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -0,0 +1,102 @@
+/* include/asm-generic/tlbflush.h
+ *
+ * Generic TLB reload code and page table migration code that
+ * depends on it.
+ *
+ * Copyright 2008 Google, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; version 2 of the
+ * License.
+ */
+
+#ifndef _ASM_GENERIC__TLBFLUSH_H
+#define _ASM_GENERIC__TLBFLUSH_H
+
+#include <asm/pgalloc.h>
+#include <asm/mmu_context.h>
+
+/* flush an mm that we messed with earlier, but delayed the flush
+ assuming that we would muck with it a whole lot more. */
+static inline void maybe_flush_tlb_mm(struct mm_struct *mm)
+{
+ if (test_and_clear_bit(MMF_NEED_FLUSH, &mm->flags))
+ flush_tlb_mm(mm);
+}
+
+/* possibly flag an mm as needing to be flushed. */
+static inline int maybe_need_flush_mm(struct mm_struct *mm)
+{
+ if (!cpus_empty(mm->cpu_vm_mask)) {
+ set_bit(MMF_NEED_FLUSH, &mm->flags);
+ return 1;
+ }
+ return 0;
+}
+
+
+
+#ifdef ARCH_HAS_RELOAD_TLB
+static inline void maybe_reload_tlb_mm(struct mm_struct *mm)
+{
+ if (test_and_clear_bit(MMF_NEED_RELOAD, &mm->flags))
+ reload_tlb_mm(mm);
+ else
+ maybe_flush_tlb_mm(mm);
+}
+
+static inline int maybe_need_tlb_reload_mm(struct mm_struct *mm)
+{
+ if (!cpus_empty(mm->cpu_vm_mask)) {
+ set_bit(MMF_NEED_RELOAD, &mm->flags);
+ return 1;
+ }
+ return 0;
+}
+
+static inline int migrate_top_level_page_table(struct mm_struct *mm,
+ struct page *dest,
+ struct list_head *old_pages)
+{
+ unsigned long flags;
+ void *dest_ptr;
+
+ dest_ptr = page_address(dest);
+
+ spin_lock_irqsave(&mm->page_table_lock, flags);
+ memcpy(dest_ptr, mm->pgd, PAGE_SIZE);
+
+ /* Must be done before adding the list to the page to be
+ * freed. Should we take the pgd_lock through this entire
+ * mess, or is it ok for the pgd to be missing from the list
+ * for a bit?
+ */
+ pgd_list_del(mm->pgd);
+
+ list_add_tail(&virt_to_page(mm->pgd)->lru, old_pages);
+
+ mm->pgd = (pgd_t *)dest_ptr;
+
+ maybe_need_tlb_reload_mm(mm);
+
+ spin_unlock_irqrestore(&mm->page_table_lock, flags);
+ return 0;
+}
+#else /* ARCH_HAS_RELOAD_TLB */
+static inline int migrate_top_level_page_table(struct mm_struct *mm,
+ struct page *dest,
+ struct list_head *old_pages) {
+ return 1;
+}
+
+static inline void maybe_reload_tlb_mm(struct mm_struct *mm)
+{
+ maybe_flush_tlb_mm(mm);
+}
+
+
+#endif /* ARCH_HAS_RELOAD_TLB */
+
+
+#endif /* _ASM_GENERIC__TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-h8300/tlbflush.h 2.6.25-rc9/include/asm-h8300/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-h8300/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-h8300/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -52,4 +52,6 @@ static inline void flush_tlb_kernel_page
BUG();
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _H8300_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-ia64/tlbflush.h 2.6.25-rc9/include/asm-ia64/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-ia64/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-ia64/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -98,4 +98,6 @@ static inline void flush_tlb_kernel_rang
flush_tlb_all(); /* XXX fix me */
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _ASM_IA64_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-m32r/tlbflush.h 2.6.25-rc9/include/asm-m32r/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-m32r/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-m32r/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -93,5 +93,6 @@ static __inline__ void __flush_tlb_all(v
}
extern void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t);
+#include <asm-generic/tlbflush.h>
#endif /* _ASM_M32R_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-m68k/tlbflush.h 2.6.25-rc9/include/asm-m68k/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-m68k/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-m68k/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -215,5 +215,6 @@ static inline void flush_tlb_kernel_page
}
#endif
+#include <asm-generic/tlbflush.h>
#endif /* _M68K_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-m68knommu/tlbflush.h 2.6.25-rc9/include/asm-m68knommu/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-m68knommu/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-m68knommu/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -52,4 +52,6 @@ static inline void flush_tlb_kernel_page
BUG();
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _M68KNOMMU_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-mips/tlbflush.h 2.6.25-rc9/include/asm-mips/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-mips/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-mips/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -44,4 +44,6 @@ extern void flush_tlb_one(unsigned long
#endif /* CONFIG_SMP */
+#include <asm-generic/tlbflush.h>
+
#endif /* __ASM_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-parisc/tlbflush.h 2.6.25-rc9/include/asm-parisc/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-parisc/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-parisc/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -76,5 +76,6 @@ void __flush_tlb_range(unsigned long sid
#define flush_tlb_range(vma,start,end) __flush_tlb_range((vma)->vm_mm->context,start,end)
#define flush_tlb_kernel_range(start, end) __flush_tlb_range(0,start,end)
+#include <asm-generic/tlbflush.h>
#endif
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-powerpc/tlbflush.h 2.6.25-rc9/include/asm-powerpc/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-powerpc/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-powerpc/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -173,5 +173,7 @@ extern void __flush_hash_table_range(str
*/
extern void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t);
+#include <asm-generic/tlbflush.h>
+
#endif /*__KERNEL__ */
#endif /* _ASM_POWERPC_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-s390/tlbflush.h 2.6.25-rc9/include/asm-s390/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-s390/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-s390/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -126,4 +126,6 @@ static inline void flush_tlb_kernel_rang
__tlb_flush_mm(&init_mm);
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _S390_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-sh/tlbflush.h 2.6.25-rc9/include/asm-sh/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-sh/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-sh/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -46,4 +46,6 @@ extern void flush_tlb_one(unsigned long
#endif /* CONFIG_SMP */
+#include <asm-generic/tlbflush.h>
+
#endif /* __ASM_SH_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-sparc/tlbflush.h 2.6.25-rc9/include/asm-sparc/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-sparc/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-sparc/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -57,4 +57,6 @@ static inline void flush_tlb_kernel_rang
flush_tlb_all();
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _SPARC_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-sparc64/tlbflush.h 2.6.25-rc9/include/asm-sparc64/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-sparc64/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-sparc64/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -41,4 +41,6 @@ do { flush_tsb_kernel_range(start,end);
#endif /* ! CONFIG_SMP */
+#include <asm-generic/tlbflush.h>
+
#endif /* _SPARC64_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-um/tlbflush.h 2.6.25-rc9/include/asm-um/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-um/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-um/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -28,4 +28,6 @@ extern void flush_tlb_kernel_vm(void);
extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
extern void __flush_tlb_one(unsigned long addr);
+#include <asm-generic/tlbflush.h>
+
#endif
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-v850/tlbflush.h 2.6.25-rc9/include/asm-v850/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-v850/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-v850/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -61,4 +61,6 @@ static inline void flush_tlb_kernel_page
BUG ();
}
+#include <asm-generic/tlbflush.h>
+
#endif /* __V850_TLBFLUSH_H__ */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-x86/tlbflush.h 2.6.25-rc9/include/asm-x86/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-x86/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-x86/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -35,6 +35,13 @@ static inline void __native_flush_tlb_si
__asm__ __volatile__("invlpg (%0)" ::"r" (addr) : "memory");
}
+#define ARCH_HAS_RELOAD_TLB
+static inline void load_cr3(pgd_t *pgd);
+static inline void __reload_tlb_mm(struct mm_struct *mm)
+{
+ load_cr3(mm->pgd);
+}
+
static inline void __flush_tlb_all(void)
{
if (cpu_has_pge)
@@ -53,8 +60,10 @@ static inline void __flush_tlb_one(unsig
#ifdef CONFIG_X86_32
# define TLB_FLUSH_ALL 0xffffffff
+# define TLB_RELOAD_ALL 0xfffffffe
#else
# define TLB_FLUSH_ALL -1ULL
+# define TLB_RELOAD_ALL -2ULL
#endif
/*
@@ -82,6 +91,12 @@ static inline void __flush_tlb_one(unsig
#define flush_tlb_all() __flush_tlb_all()
#define local_flush_tlb() __flush_tlb()
+static inline void reload_tlb_mm(struct mm_struct *mm)
+{
+ if (mm == current->active_mm)
+ __reload_tlb_mm(mm);
+}
+
static inline void flush_tlb_mm(struct mm_struct *mm)
{
if (mm == current->active_mm)
@@ -114,6 +129,10 @@ static inline void native_flush_tlb_othe
#define local_flush_tlb() __flush_tlb()
+#define local_reload_tlb_mm(mm) \
+ __reload_tlb_mm(mm)
+
+extern void reload_tlb_mm(struct mm_struct *mm);
extern void flush_tlb_all(void);
extern void flush_tlb_current_task(void);
extern void flush_tlb_mm(struct mm_struct *);
@@ -155,4 +174,6 @@ static inline void flush_tlb_kernel_rang
flush_tlb_all();
}
+#include <asm-generic/tlbflush.h>
+
#endif /* _ASM_X86_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/asm-xtensa/tlbflush.h 2.6.25-rc9/include/asm-xtensa/tlbflush.h
--- /home/rossb/local/linux-2.6.25-rc9/include/asm-xtensa/tlbflush.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/asm-xtensa/tlbflush.h 2008-04-14 09:00:18.000000000 -0700
@@ -186,6 +186,8 @@ static inline unsigned long read_itlb_tr
return tmp;
}
+#include <asm-generic/tlbflush.h>
+
#endif /* __ASSEMBLY__ */
#endif /* __KERNEL__ */
#endif /* _XTENSA_TLBFLUSH_H */
diff -uprwNBb -X 2.6.25-rc9/Documentation/dontdiff /home/rossb/local/linux-2.6.25-rc9/include/linux/sched.h 2.6.25-rc9/include/linux/sched.h
--- /home/rossb/local/linux-2.6.25-rc9/include/linux/sched.h 2008-04-11 13:32:29.000000000 -0700
+++ 2.6.25-rc9/include/linux/sched.h 2008-04-14 09:00:18.000000000 -0700
@@ -408,6 +408,16 @@ extern int get_dumpable(struct mm_struct
#define MMF_DUMP_FILTER_DEFAULT \
((1 << MMF_DUMP_ANON_PRIVATE) | (1 << MMF_DUMP_ANON_SHARED))
+/* Misc MM flags. */
+#define MMF_NEED_FLUSH 7
+#define MMF_NEED_RELOAD 8 /* Only meaningful on some archs. */
+
+#ifdef CONFIG_RELOCATE_PAGE_TABLES
+#define MMF_NEED_REWALK 9 /* Must rewalk page tables with spin
+ * lock held. */
+#endif /* CONFIG_RELOCATE_PAGE_TABLES */
+
+
struct sighand_struct {
atomic_t count;
struct k_sigaction action[_NSIG];
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
* Re: [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9)
2008-04-14 16:39 [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9) Ross Biro
@ 2008-04-14 22:57 ` Andrew Morton
2008-04-15 12:47 ` Ross Biro
2008-04-16 19:22 ` Christoph Lameter
From: Andrew Morton @ 2008-04-14 22:57 UTC (permalink / raw)
To: Ross Biro; +Cc: linux-kernel, linux-mm
On Mon, 14 Apr 2008 09:39:33 -0700 (PDT)
rossb@google.com (Ross Biro) wrote:
> These patches make page tables relocatable for NUMA, memory
> defragmentation, and memory hotplug. The potential need to re-walk the
> page tables before making any changes causes a 3% performance
> degradation in the lmbench page-miss micro-benchmark.
We're going to need a considerably more detailed description than this,
please.
This is a large patch which is quite intrusive on the core memory
management code. It appears that there has been close to zero interest
from any MM developers apart from a bit of to-and-fro back in October.
Probably because nobody can see why the changes are valuable to them, and
that's probably because you're not telling them!
For starters, what problems does the patchset solve? People can partially
work that out for themselves if they are sufficiently experienced with the
internals of defrag and hotplug, but it does not hurt at all to spell this out.
Secondly, how does the code work? What is the overall design? Any
implementation details or shortcomings or todos which we should know about?
This patchset doesn't apply to the 2.6.26 queue because of the ongoing x86
shell game: the arch/x86/kernel/smp_??.c files were consolidated.
I could fix that up and merge the patches, but I review patches when I
merge them, and these ones would require a lengthy review. That review
would be much less effective than it would be if I had a complete
description of the design and implementation from its designer and
implementor.
The reason for this is that reviewing code for correctness involves a)
understanding (and approving of) the design then b) attempting to identify
places where the implementation incorrectly implements that design. But if
the reviewer has to gain his understanding of the design from the
implementation we get into a circularity problem and mistakes can be made.
Generally, where possible, I do think that it's best if the design and
implementation are conveyed in code comments rather than changelog. That's
more convenient for readers and for reviewers and makes it more likely that
the documentation will remain correct as the code evolves. But this
patchset adds few comments.
Just one example: I have no way of knowing what led you to choose
down_interruptible() in enter_page_table_relocation_mode(). So people who
read the code two years hence will be wondering the same thing.
Minor notes from a quick scan:
- Must ->page_table_relocation_lock be a semaphore? mutexes are
preferred.
- The patch adds a number of largeish inlined functions. There's rarely
a need for this, and it can lead to large icache footprint which will, we
expect, produce slower code.
- The patch adds a lot of macros which look like they could have been
implemented as inlines. Inlines are preferred, please. They look nicer,
they provide typechecking, they avoid accidental
multiple-reference-to-arguments bugs and they help to avoid
unused-variable warnings.
- Doing PAGE_SIZE memcpy under spin_lock_irqsave() might get a bit
expensive from an interrupt-latency POV. It could (I think?) result in
large periods of time where interrupts are almost always disabled, which
might disrupt some device drivers.
- Why is this code doing spin_lock_irqsave() on page_table_lock? The
rest of mm/ doesn't disable IRQs for that lock. This implies that
something somewhere is now taking that lock from interrupt context, which
means that existing code will deadlock. Unless you converted all those
sites as well. Which would be a major change, which would need to be
documented in big blinking lights in the changelog.
- I haven't checked, but if the code is taking KM_USER0 from interrupt
context then that would be a bug. Switching to KM_IRQ0 would fix that.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9)
2008-04-14 22:57 ` Andrew Morton
@ 2008-04-15 12:47 ` Ross Biro
2008-04-16 19:22 ` Christoph Lameter
1 sibling, 0 replies; 6+ messages in thread
From: Ross Biro @ 2008-04-15 12:47 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm
On Mon, Apr 14, 2008 at 6:57 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> This patchset doesn't apply to the 2.6.26 queue because of the ongoing x86
> shell game: the arch/x86/kernel/smp_??.c files were consolidated.
It's probably best to just wait until the smoke clears on 2.6.26 then.
I'll add some comments; however, I usually get in trouble for adding
overly verbose comments, so I've learned to err the other way. If you
prefer comments, though, I'll add them.
> - Must ->page_table_relocation_lock be a semaphore? mutexes are
> preferred.
Not any more. It used to require a semaphore, but I can switch it
back to a mutex now. I could even replace the mutex with an atomic
inc/dec, which might be better still, since it would also work in
interrupt context.
> - The patch adds a number of largeish inlined functions. There's rarely
> a need for this, and it can lead to large icache footprint which will, we
> expect, produce slower code.
If these are the ones I'm thinking of, they are in the fast path on
page faults. So they should be inlined. However, I could easily
change it to a small macro or inline function and a regular function
call that would rarely be taken. This should be a win from the icache
point of view and only a loss in a case we really don't care much
about.
> - The patch adds a lot of macros which look like they could have been
> implemented as inlines. Inlines are preferred, please. They look nicer,
> they provide typechecking, they avoid accidental
> multiple-reference-to-arguments bugs and they help to avoid
> unused-variable warnings.
Here I disagree. The only added function-like #defines I see either
just alias existing functions or expand to nothing. I suppose the
latter could be replaced by inlines to avoid unused-variable warnings.
> - Doing PAGE_SIZE memcpy under spin_lock_irqsave() might get a bit
> expensive from an interrupt-latency POV. It could (I think?) result in
> large periods of time where interrupts are almost always disabled, which
> might disrupt some device drivers.
Here I'm just being stupid. There is no reason to have interrupts
disabled at this point.
>
> - Why is this code doing spin_lock_irqsave() on page_table_lock? The
> rest of mm/ doesn't disable IRQs for that lock. This implies that
Laziness. I didn't feel like figuring out whether the irqsave was
necessary when I started, and I forgot to go back and fix it later.
There is no reason for it.
> - I haven't checked, but if the code is taking KM_USER0 from interrupt
> context then that would be a bug. Switching to KM_IRQ0 would fix that.
KM_USER0 is currently correct. For memory hotplug, we may need to
change this in the future.
Ross
* Re: [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9)
2008-04-14 22:57 ` Andrew Morton
2008-04-15 12:47 ` Ross Biro
@ 2008-04-16 19:22 ` Christoph Lameter
2008-04-29 13:27 ` Ross Biro
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Lameter @ 2008-04-16 19:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ross Biro, linux-kernel, linux-mm, mel, apm
On Mon, 14 Apr 2008, Andrew Morton wrote:
> This is a large patch which is quite intrusive on the core memory
> management code. It appears that there has been close to zero interest
> from any MM developers apart from a bit of to-and-fro back in October.
> Probably because nobody can see why the changes are valuable to them, and
> that's probably because you're not telling them!
The patch is interesting because it would allow the moving of page table
pages into MOVABLE sections and reduce the size of the UNMOVABLE
allocations significantly (Ross: We need some numbers here). This in turn
improves the success of the antifrag methods. May also improve lumpy
reclaim if it can be adapted to move page table pages out of the way.
* Re: [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9)
2008-04-16 19:22 ` Christoph Lameter
@ 2008-04-29 13:27 ` Ross Biro
2008-05-02 9:58 ` Mel Gorman
0 siblings, 1 reply; 6+ messages in thread
From: Ross Biro @ 2008-04-29 13:27 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel, linux-mm, mel, apm
On Wed, Apr 16, 2008 at 3:22 PM, Christoph Lameter <clameter@sgi.com> wrote:
> The patch is interesting because it would allow the moving of page table
> pages into MOVABLE sections and reduce the size of the UNMOVABLE
> allocations significantly (Ross: We need some numbers here). This in turn
Is there a standard test used to evaluate kernel memory fragmentation?
I'm sure I can rig up a test to create huge amounts of fragmentation
with about 1/2 the pages being page tables. However, I doubt that it
would reflect any real loads. Similarly, if I check the memory
fragmentation on my test system right after it's been booted, I won't
see much fragmentation and page tables won't be causing any trouble.
Ross
* Re: [PATCH 1/2] MM: Make page tables relocatable -- conditional flush (rc9)
2008-04-29 13:27 ` Ross Biro
@ 2008-05-02 9:58 ` Mel Gorman
0 siblings, 0 replies; 6+ messages in thread
From: Mel Gorman @ 2008-05-02 9:58 UTC (permalink / raw)
To: Ross Biro; +Cc: Christoph Lameter, Andrew Morton, linux-kernel, linux-mm, apm
On (29/04/08 09:27), Ross Biro didst pronounce:
> On Wed, Apr 16, 2008 at 3:22 PM, Christoph Lameter <clameter@sgi.com> wrote:
> > The patch is interesting because it would allow the moving of page table
> > pages into MOVABLE sections and reduce the size of the UNMOVABLE
> > allocations significantly (Ross: We need some numbers here). This in turn
>
> Is there a standard test used to evaluate kernel memory fragmentation?
Not exactly, but the test I run most frequently for testing
fragmentation-related problems is
1. Kernbench building 2.6.14 - 5 iterations
2. aim9
3. bench-hugepagecapability.sh
4. bench-stresshighalloc.sh
the last two are from vmregress
www.csn.ul.ie/~mel/projects/vmregress/vmregress-0.88-rc7.tar.gz which is a
mess of undocumented tools. bench-stresshighalloc.sh needs to be built
against the current running kernel
./configure --with-linux=PATH_TO_KERNEL_SOURCE
The parameters passed to the last two tests depend on the machine but
generally -k $((PHYS_MEMORY_IN_MB/250)) for the number of kernels and
--mb-per-sec 16 are the most important ones.
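The `-k` arithmetic above can be wrapped in a small helper (script name from this mail; the 250 MB-per-kernel ratio is the rule of thumb given here):

```shell
#!/bin/sh
# Compute the -k argument for bench-stresshighalloc.sh from the
# machine's physical memory in MB (250 MB per concurrent kernel build).
kernels_for() {
	echo $(( $1 / 250 ))
}

# A 2 GB machine would build 8 kernels concurrently:
kernels_for 2048
```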
These tests are not suitable on machines with very large amounts of memory
because too many kernels would be built at the same time and the machine
just grinds. Originally, the tests reflected the most hostile load in terms
of fragmentation, but it's showing its age now as a suitable test.
Particularly from your perspective, the test is not very pagetable-page
intensive. You could run the tests above, then start a long-lived
test like sysbench tuned to consume most of memory, and then trigger a
relocation to see how effective it is. Tuned to consume most of
memory, it should also generate a lot of pagetable allocations.
If you wanted to see what large page allocation was like at any time,
you could use bench-plainhighalloc.sh from vmregress to artificially
allocate huge pages, or attempt to grow the hugepage pool via /proc, as
either would give an indication of the fragmentation state of the
system.
> I'm sure I can rig up a test to create huge amounts of fragmentation
> with about 1/2 the pages being page tables. However, I doubt that it
> would reflect any real loads. Similarly, if I check the memory
> fragmentation on my test system right after it's been booted, I won't
> see much fragmentation and page tables won't be causing any trouble.
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab