* 2.5.53-mm2
From: Andrew Morton @ 2002-12-29 0:52 UTC
To: lkml, linux-mm
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.53/2.5.53-mm2/
Mainly stability work:
. If pte_chain_alloc() fails to allocate GFP_ATOMIC memory, the kernel
oopses. This is a long-standing rmap problem. It is also present in the 2.4
rmap patches and, as far as I know, in production Red Hat kernels.
So it is clearly a very rare problem, but it is not acceptable to have
an unchecked kmalloc in the core of the 2.5 VM.
The approach which I took was to change the page_add_rmap() API to
require the caller to pass in a preallocated pte_chain, and to change
all callers to allocate their pte_chains with GFP_KERNEL.
This change is fairly ugly, but every other hare-brained scheme I
could come up with had holes. This one adds maybe 20 instructions
to pagefaults and works...
The swapoff path has not yet been converted - this can still oops.
The locking isn't quite right yet if shared pagetables are enabled.
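A minimal userspace model of the "preallocate, then lock, then install" calling convention described above may help. It is a sketch under stated assumptions, not the kernel's code: struct chain, struct pg, chain_alloc(), add_rmap() and map_page() are hypothetical names, malloc() stands in for a GFP_KERNEL allocation, a plain flag stands in for the page table lock, and storing the first mapping inline (so the helper can hand an unused preallocation back) is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model types: "struct chain" stands in for a pte_chain. */
struct chain {
	struct chain *next;
	void *ptep;
};

struct pg {
	void *direct;          /* first mapping stored inline (model assumption) */
	struct chain *chains;  /* additional mappings */
	int nr_mappings;
};

static int lock_held;      /* models a spinlock: no sleeping allocation while set */

/* Sleeping allocation (GFP_KERNEL analogue): only legal with no locks held. */
static struct chain *chain_alloc(void)
{
	assert(!lock_held);
	return malloc(sizeof(struct chain));
}

/*
 * Cannot fail: uses the caller's preallocated chain if it needs one and
 * returns it otherwise, so the caller can free the leftover after unlocking.
 */
static struct chain *add_rmap(struct pg *page, void *ptep, struct chain *pc)
{
	assert(lock_held);
	page->nr_mappings++;
	if (!page->direct && !page->chains) {
		page->direct = ptep;
		return pc;              /* not consumed */
	}
	pc->ptep = ptep;
	pc->next = page->chains;
	page->chains = pc;
	return NULL;                    /* consumed */
}

/* Fault-path model: allocate, then lock, then install, then clean up. */
static int map_page(struct pg *page, void *ptep)
{
	struct chain *pc = chain_alloc();

	if (!pc)
		return -1;              /* reported before any lock is taken */
	lock_held = 1;
	pc = add_rmap(page, ptep, pc);
	lock_held = 0;
	free(pc);                       /* free(NULL) is a no-op */
	return 0;
}
```

The only call that can fail runs before the lock is taken, so an allocation failure can be reported cleanly instead of oopsing while the lock is held.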
. If radix_tree_insert() fails to allocate GFP_ATOMIC memory, a system call
will return -ENOMEM, resulting in application failure.
This was fixed by implementing a reservation API within the slab allocator.
Before taking locks, the caller of radix_tree_insert() asks slab to preallocate
sufficient objects in this CPU's slab head array to guarantee that the
allocation of up to seven (on ia32) radix_tree_nodes cannot fail.
This permitted the removal of the radix tree mempool. That's a 130 kbyte
saving (260k on 64-bit).
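The reservation idea can be sketched the same way. Again this is a hypothetical model rather than the slab or radix-tree API: struct head_array, reserve_nodes() and node_alloc() are invented names, a single array stands in for this CPU's slab head array (per-CPU and preemption details are ignored), and the constant 7 is taken from the ia32 figure quoted above.

```c
#include <assert.h>
#include <stdlib.h>

/* "Up to seven (on ia32)" nodes per insertion, taken from the text. */
#define NODES_PER_INSERT 7

struct node {
	void *slots[64];             /* hypothetical node layout */
};

/* Models one CPU's slab head array. */
struct head_array {
	struct node *objs[NODES_PER_INSERT];
	int avail;
};

static int lock_held;            /* models the lock taken after reserving */

/* Called with no locks held: may sleep, may fail. */
static int reserve_nodes(struct head_array *ha, int n)
{
	assert(!lock_held && n <= NODES_PER_INSERT);
	while (ha->avail < n) {
		struct node *nd = calloc(1, sizeof(*nd));

		if (!nd)
			return -1;       /* models -ENOMEM, surfacing before locking */
		ha->objs[ha->avail++] = nd;
	}
	return 0;
}

/* Called under the lock: cannot fail if enough nodes were reserved. */
static struct node *node_alloc(struct head_array *ha)
{
	assert(ha->avail > 0);
	return ha->objs[--ha->avail];
}
```

Once reserve_nodes() has succeeded, every node_alloc() made under the lock is guaranteed to find an object, which is what lets the insertion path drop its emergency pool.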
. Some aggressive pruning of various system-wide memory reserve settings:
- The page reservation limits in the page allocator have been reduced
from ~256 pages per zone to ~4 pages per zone.
- The preallocation levels in the slab head arrays (which were ridiculously
large) have been reduced from 32k-128k to, typically, a single page.
- The per-cpu-pages head arrays in the page allocator have been reduced
from ~64 pages to 2 pages.
The net effect of these changes is to remove almost all of the kernel's
reserved memory buffers. Instead of maintaining several megabytes of
free memory, the kernel will only maintain some tens of kilobytes.
And guess what? Everything still works.
I won't be submitting these changes - they are here for robustness testing.
But it certainly does indicate that the settings of these thresholds need
to be reviewed, and that there don't appear to be any low-on-memory deadlocks
in the VM (with ext2, at least...).
. An updated dcache_rcu patch, which should fix a rename-related race that
Al Viro noted.
Changes since 2.5.53-mm1:
+linus.patch
Latest -BK
+aic-bounce.patch
aic7xxx highmem IO fix
+misc.patch
triviata
+devfs-fix.patch
A partial fix for a CONFIG_DEVFS=y boot problem
+copy_page_range-cleanup.patch
Small cleanups, partly to ease the maintenance of the shared pagetable
diff.
+pte_chain_alloc-fix.patch
Infrastructure for handling pte_chain_alloc() failures.
+page_add_rmap-rework.patch
Handle pte_chain_alloc() failures.
shpte-ng.patch
Lots of changes to handle pte_chain_alloc() failures.
+slab-preallocation.patch
Add an API to slab to reserve objects in the per-CPU head arrays.
+slab-export-tuning.patch
Export the slab head-array tuning functions.
+rat-preallocation.patch
Add a reservation API to the radix_tree code.
+use-rat-preallocation.patch
Use the reservation API to avoid radix_tree allocation failures.
+teeny-mem-limits.patch
Remove most of the page allocator page reserves.
+smaller-head-arrays.patch
Remove most of the slab memory reserves.
+remove-hugetlb-syscalls.patch
Remove the hugetlb system calls. hugetlbfs is a suitable replacement.
All 72 patches:
linus.patch
cset-1.951-to-1.1030.txt.gz
kgdb.patch
aic-bounce.patch
rcf.patch
run-child-first after fork
ga2.patch
don't call console drivers on non-online CPUs
misc.patch
misc fixes
devfs-fix.patch
dio-return-partial-result.patch
aio-direct-io-infrastructure.patch
AIO support for raw/O_DIRECT
deferred-bio-dirtying.patch
bio dirtying infrastructure
aio-direct-io.patch
AIO support for raw/O_DIRECT
aio-dio-debug.patch
dio-reduce-context-switch-rate.patch
Reduced wakeup rate in direct-io code
cputimes_stat.patch
Restore per-cpu time accounting, with a config option
reduce-random-context-switch-rate.patch
Reduce context switch rate due to the random driver
inlines-net.patch
rbtree-iosched.patch
rbtree-based IO scheduler
deadsched-fix.patch
deadline scheduler fix
quota-smp-locks.patch
Subject: [PATCH] Quota SMP locks
copy_page_range-cleanup.patch
copy_page_range: minor cleanup
pte_chain_alloc-fix.patch
page_add_rmap-rework.patch
shpte-ng.patch
pagetable sharing for ia32
slab-preallocation.patch
slab-export-tuning.patch
rat-preallocation.patch
use-rat-preallocation.patch
teeny-mem-limits.patch
smaller-head-arrays.patch
ptrace-flush.patch
Subject: [PATCH] ptrace on 2.5.44
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
pentium-II.patch
Pentium-II support bits
rcu-stats.patch
RCU statistics reporting
auto-unplug.patch
self-unplugging request queues
less-unplugging.patch
Remove most of the blk_run_queues() calls
ext3-fsync-speedup.patch
Clean up ext3_sync_file()
lockless-current_kernel_time.patch
Lockless current_kernel_time()
scheduler-tunables.patch
scheduler tunables
dio-always-kmalloc.patch
direct-io: dynamically allocate struct dio
file-nr-doc-fix.patch
Docs: fix explanation of file-nr
set_page_dirty_lock.patch
fix set_page_dirty vs truncate&free races
remove-memshared.patch
Remove /proc/meminfo:MemShared
bin2bcd.patch
BIN_TO_BCD consolidation
log_buf_size.patch
move LOG_BUF_SIZE to header/config
semtimedop-update.patch
Enable semtimedop for ia64 32-bit emulation.
drain_local_pages.patch
add drain_local_pages() for CONFIG_SOFTWARE_SUSPEND
htlb-2.patch
hugetlb: fix MAP_FIXED handling
kmalloc_percpu.patch
kmalloc_percpu -- stripped down version
config_page_offset.patch
Configurable kernel/user memory split
config_hz.patch
CONFIGurable HZ
dont-aligns-vmas.patch
Don't cacheline-align vm_area_struct
remove-swappable.patch
remove task_struct.swappable
remove-hugetlb-syscalls.patch
Subject: [hugetlb] remove hugetlb syscalls
wli-01_numaq_io.patch
(undescribed patch)
wli-02_do_sak.patch
(undescribed patch)
wli-03_proc_super.patch
(undescribed patch)
wli-06_uml_get_task.patch
(undescribed patch)
wli-07_numaq_mem_map.patch
(undescribed patch)
wli-08_numaq_pgdat.patch
(undescribed patch)
wli-09_has_stopped_jobs.patch
(undescribed patch)
wli-10_inode_wait.patch
(undescribed patch)
wli-11_pgd_ctor.patch
(undescribed patch)
wli-12_pidhash_size.patch
(undescribed patch)
wli-13_rmap_nrpte.patch
(undescribed patch)
dcache_rcu-2.patch
dcache_rcu-2-2.5.51.patch
dcache_rcu-3.patch
dcache_rcu-3-2.5.51.patch
page-walk-api.patch
page-walk-scsi.patch
page-walk-api-update.patch
pagewalk API update
gup-check-valid.patch
valid page test in get_user_pages()
* Re: 2.5.53-mm2
From: William Lee Irwin III @ 2003-01-02 4:53 UTC
To: Andrew Morton; +Cc: lkml, linux-mm
On Sat, Dec 28, 2002 at 04:52:20PM -0800, Andrew Morton wrote:
> wli-11_pgd_ctor.patch
> (undescribed patch)
A moment's reflection on the subject suggests to me it's worthwhile to
generalize pgd_ctor support so it works (without #ifdefs!) on both PAE
and non-PAE. This tiny tweak is actually more noticeably beneficial
on non-PAE systems but only really because pgd_alloc() is more visible;
the most likely reason it's less visible on PAE is "other overhead".
It looks particularly nice since it removes more code than it adds.
Touch tested on NUMA-Q (PAE). OFTC #kn testers testing the non-PAE case.
 arch/i386/mm/init.c               |   36 +++++++++++++----------
 arch/i386/mm/pgtable.c            |   58 ++++++++++++--------------------------
 include/asm-i386/pgtable-3level.h |    2 -
 include/asm-i386/pgtable.h        |   13 +-------
 4 files changed, 41 insertions(+), 68 deletions(-)
diff -urpN mm3-2.5.53-1/arch/i386/mm/init.c mm3-2.5.53-2/arch/i386/mm/init.c
--- mm3-2.5.53-1/arch/i386/mm/init.c	2003-01-01 18:49:19.000000000 -0800
+++ mm3-2.5.53-2/arch/i386/mm/init.c	2003-01-01 18:51:17.000000000 -0800
@@ -504,32 +504,36 @@ void __init mem_init(void)
 #endif
 }
 
-#if CONFIG_X86_PAE
 #include <linux/slab.h>
 
-kmem_cache_t *pae_pmd_cachep;
-kmem_cache_t *pae_pgd_cachep;
+kmem_cache_t *pmd_cache;
+kmem_cache_t *pgd_cache;
 
-void pae_pmd_ctor(void *, kmem_cache_t *, unsigned long);
-void pae_pgd_ctor(void *, kmem_cache_t *, unsigned long);
+void pmd_ctor(void *, kmem_cache_t *, unsigned long);
+void pgd_ctor(void *, kmem_cache_t *, unsigned long);
 
 void __init pgtable_cache_init(void)
 {
+	if (PTRS_PER_PMD > 1) {
+		pmd_cache = kmem_cache_create("pae_pmd",
+					PTRS_PER_PMD*sizeof(pmd_t),
+					0,
+					SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
+					pmd_ctor,
+					NULL);
+
+		if (!pmd_cache)
+			panic("pgtable_cache_init(): cannot create pmd cache");
+	}
+
 	/*
 	 * PAE pgds must be 16-byte aligned:
 	 */
-	pae_pmd_cachep = kmem_cache_create("pae_pmd", 4096, 0,
-		SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, pae_pmd_ctor, NULL);
-
-	if (!pae_pmd_cachep)
-		panic("init_pae(): cannot allocate pae_pmd SLAB cache");
-
-	pae_pgd_cachep = kmem_cache_create("pae_pgd", 32, 0,
-		SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, pae_pgd_ctor, NULL);
-	if (!pae_pgd_cachep)
-		panic("init_pae(): Cannot alloc pae_pgd SLAB cache");
+	pgd_cache = kmem_cache_create("pgd", PTRS_PER_PGD*sizeof(pgd_t), 0,
+		SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN, pgd_ctor, NULL);
+	if (!pgd_cache)
+		panic("pgtable_cache_init(): Cannot create pgd cache");
 }
-#endif
 
 /* Put this after the callers, so that it cannot be inlined */
 static int do_test_wp_bit(void)
diff -urpN mm3-2.5.53-1/arch/i386/mm/pgtable.c mm3-2.5.53-2/arch/i386/mm/pgtable.c
--- mm3-2.5.53-1/arch/i386/mm/pgtable.c	2003-01-01 18:49:19.000000000 -0800
+++ mm3-2.5.53-2/arch/i386/mm/pgtable.c	2003-01-01 18:51:17.000000000 -0800
@@ -166,19 +166,20 @@ struct page *pte_alloc_one(struct mm_str
 	return pte;
 }
 
-#if CONFIG_X86_PAE
+extern kmem_cache_t *pmd_cache;
+extern kmem_cache_t *pgd_cache;
 
-extern kmem_cache_t *pae_pmd_cachep;
-
-void pae_pmd_ctor(void *__pmd, kmem_cache_t *pmd_cache, unsigned long flags)
+void pmd_ctor(void *__pmd, kmem_cache_t *pmd_cache, unsigned long flags)
 {
 	clear_page(__pmd);
 }
 
-void pae_pgd_ctor(void *__pgd, kmem_cache_t *pgd_cache, unsigned long flags)
+void pgd_ctor(void *__pgd, kmem_cache_t *pgd_cache, unsigned long flags)
 {
 	pgd_t *pgd = __pgd;
 
+	if (PTRS_PER_PMD == 1)
+		memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
 	memcpy(pgd + USER_PTRS_PER_PGD,
 		swapper_pg_dir + USER_PTRS_PER_PGD,
 		(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
@@ -187,59 +188,38 @@ void pae_pgd_ctor(void *__pgd, kmem_cach
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	int i;
-	pgd_t *pgd = kmem_cache_alloc(pae_pgd_cachep, SLAB_KERNEL);
+	pgd_t *pgd = kmem_cache_alloc(pgd_cache, SLAB_KERNEL);
 
-	if (!pgd)
+	if (PTRS_PER_PMD == 1)
+		return pgd;
+	else if (!pgd)
 		return NULL;
 
 	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		pmd_t *pmd = kmem_cache_alloc(pae_pmd_cachep, SLAB_KERNEL);
+		pmd_t *pmd = kmem_cache_alloc(pmd_cache, SLAB_KERNEL);
 		if (!pmd)
 			goto out_oom;
-		else if ((unsigned long)pmd & ~PAGE_MASK) {
-			printk("kmem_cache_alloc did wrong! death ensues!\n");
-			goto out_oom;
-		}
 		set_pgd(pgd + i, __pgd(1 + __pa((unsigned long long)((unsigned long)pmd))));
 	}
 	return pgd;
 
 out_oom:
 	for (i--; i >= 0; --i)
-		kmem_cache_free(pae_pmd_cachep, (void *)__va(pgd_val(pgd[i])-1));
-	kmem_cache_free(pae_pgd_cachep, (void *)pgd);
+		kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
+	kmem_cache_free(pgd_cache, (void *)pgd);
 	return NULL;
 }
 
 void pgd_free(pgd_t *pgd)
 {
 	int i;
 
-	for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
-		kmem_cache_free(pae_pmd_cachep, (void *)__va(pgd_val(pgd[i])-1));
-		set_pgd(pgd + i, __pgd(0));
-	}
-	kmem_cache_free(pae_pgd_cachep, (void *)pgd);
-}
-
-#else
-pgd_t *pgd_alloc(struct mm_struct *mm)
-{
-	pgd_t *pgd = (pgd_t *)__get_free_page(GFP_KERNEL);
-
-	if (pgd) {
-		memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
-		memcpy(pgd + USER_PTRS_PER_PGD,
-			swapper_pg_dir + USER_PTRS_PER_PGD,
-			(PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
+	if (PTRS_PER_PMD > 1) {
+		for (i = 0; i < USER_PTRS_PER_PGD; ++i) {
+			kmem_cache_free(pmd_cache, (void *)__va(pgd_val(pgd[i])-1));
+			set_pgd(pgd + i, __pgd(0));
+		}
 	}
-	return pgd;
-}
 
-void pgd_free(pgd_t *pgd)
-{
-	free_page((unsigned long)pgd);
+	kmem_cache_free(pgd_cache, (void *)pgd);
 }
-
-#endif /* CONFIG_X86_PAE */
-
diff -urpN mm3-2.5.53-1/include/asm-i386/pgtable-3level.h mm3-2.5.53-2/include/asm-i386/pgtable-3level.h
--- mm3-2.5.53-1/include/asm-i386/pgtable-3level.h	2002-12-23 21:21:07.000000000 -0800
+++ mm3-2.5.53-2/include/asm-i386/pgtable-3level.h	2003-01-01 18:51:17.000000000 -0800
@@ -106,6 +106,4 @@ static inline pmd_t pfn_pmd(unsigned lon
 	return __pmd(((unsigned long long)page_nr << PAGE_SHIFT) | pgprot_val(pgprot));
 }
 
-extern struct kmem_cache_s *pae_pgd_cachep;
-
 #endif /* _I386_PGTABLE_3LEVEL_H */
diff -urpN mm3-2.5.53-1/include/asm-i386/pgtable.h mm3-2.5.53-2/include/asm-i386/pgtable.h
--- mm3-2.5.53-1/include/asm-i386/pgtable.h	2003-01-01 18:49:21.000000000 -0800
+++ mm3-2.5.53-2/include/asm-i386/pgtable.h	2003-01-01 18:51:17.000000000 -0800
@@ -41,22 +41,13 @@ extern unsigned long empty_zero_page[102
 #ifndef __ASSEMBLY__
 #if CONFIG_X86_PAE
 # include <asm/pgtable-3level.h>
-
-/*
- * Need to initialise the X86 PAE caches
- */
-extern void pgtable_cache_init(void);
-
 #else
 # include <asm/pgtable-2level.h>
+#endif
 
-/*
- * No page table caches to initialise
- */
-#define pgtable_cache_init() do { } while (0)
+void pgtable_cache_init(void);
 
 #endif
-#endif
 
 #define __beep() asm("movb $0x3,%al; outb %al,$0x61")
 
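As a quick arithmetic check of the object sizes the generalized kmem_cache_create() calls produce, here is a small sketch. The per-mode constants are the usual i386 values (1024 four-byte pgd entries without PAE; 4 pgd entries and 512 pmd entries of eight bytes each with PAE), the 0xC0000000 user/kernel boundary assumes the default 3G/1G split, and the struct and function names are hypothetical.

```c
/* Hypothetical description of one i386 paging mode. */
struct paging_mode {
	unsigned long ptrs_per_pgd;
	unsigned long ptrs_per_pmd;
	unsigned long entry_size;   /* sizeof(pgd_t) == sizeof(pmd_t) in both modes */
	unsigned int pgdir_shift;
};

/* Assumes the default 3G/1G user/kernel split. */
static const unsigned long long task_size = 0xC0000000ULL;

static const struct paging_mode non_pae = { 1024, 1, 4, 22 };
static const struct paging_mode pae = { 4, 512, 8, 30 };

/* Mirrors PTRS_PER_PGD*sizeof(pgd_t) for the "pgd" cache. */
static unsigned long pgd_obj_size(const struct paging_mode *m)
{
	return m->ptrs_per_pgd * m->entry_size;
}

/* Mirrors the PTRS_PER_PMD > 1 test: no pmd cache in 2-level mode. */
static unsigned long pmd_obj_size(const struct paging_mode *m)
{
	return m->ptrs_per_pmd > 1 ? m->ptrs_per_pmd * m->entry_size : 0;
}

/* USER_PTRS_PER_PGD: how many pgd slots cover user space. */
static unsigned long user_ptrs_per_pgd(const struct paging_mode *m)
{
	return (unsigned long)(task_size >> m->pgdir_shift);
}
```

On PAE this reproduces the 4096 and 32 that the removed code hard-coded, and on non-PAE the pgd object is exactly one page, matching the old __get_free_page() path.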
* Re: 2.5.53-mm2
From: William Lee Irwin III @ 2003-01-02 5:25 UTC
To: Andrew Morton, lkml, linux-mm
On Sat, Dec 28, 2002 at 04:52:20PM -0800, Andrew Morton wrote:
>> wli-11_pgd_ctor.patch
On Wed, Jan 01, 2003 at 08:53:27PM -0800, William Lee Irwin III wrote:
> A moment's reflection on the subject suggests to me it's worthwhile to
> generalize pgd_ctor support so it works (without #ifdefs!) on both PAE
> and non-PAE. This tiny tweak is actually more noticeably beneficial
> on non-PAE systems but only really because pgd_alloc() is more visible;
> the most likely reason it's less visible on PAE is "other overhead".
> It looks particularly nice since it removes more code than it adds.
> Touch tested on NUMA-Q (PAE). OFTC #kn testers testing the non-PAE case.
For those needing more interpretation, this is essentially a reinstatement
of the 2.4.x-style pgd/pmd cache optimization in a leak-free and accounted
(in /proc/slabinfo) manner. The point of the optimizations is that these
initializations are large cache hits to take in a single shot, and in the
PAE case, amount to a full L1 cache flush as they traverse almost an
entire 16K.
No rigorous benchmarking has been done yet.
Bill
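The benefit described here comes from the slab constructor running when an object is first created rather than on every allocation. A minimal model of that effect follows; struct ctor_cache, cache_alloc(), cache_free() and model_pgd_ctor() are hypothetical, the fixed-size free list is a simplification of a real slab cache, and the model assumes objects are handed back in their constructed state.

```c
#include <stdlib.h>
#include <string.h>

#define NR_ENTRIES   1024   /* entries per pgd (non-PAE i386 value) */
#define USER_ENTRIES 768    /* assumes the default 3G/1G split */
#define FREE_MAX     16     /* hypothetical free-list capacity */

/* Stands in for swapper_pg_dir: the kernel part every new pgd must copy. */
static unsigned long kernel_template[NR_ENTRIES];

/* Expensive one-time initialization, modelled on the non-PAE pgd_ctor(). */
static void model_pgd_ctor(void *obj)
{
	unsigned long *pgd = obj;

	memset(pgd, 0, USER_ENTRIES * sizeof(*pgd));
	memcpy(pgd + USER_ENTRIES, kernel_template + USER_ENTRIES,
	       (NR_ENTRIES - USER_ENTRIES) * sizeof(*pgd));
}

struct ctor_cache {
	size_t size;
	void (*ctor)(void *);
	void *free_objs[FREE_MAX];  /* objects kept in constructed state */
	int nr_free;
	int ctor_calls;             /* how often the expensive init ran */
};

static void *cache_alloc(struct ctor_cache *c)
{
	void *p;

	if (c->nr_free > 0)
		return c->free_objs[--c->nr_free];  /* no re-initialization */
	p = malloc(c->size);
	if (p) {
		c->ctor(p);
		c->ctor_calls++;
	}
	return p;
}

/* The caller must return the object in its constructed state. */
static void cache_free(struct ctor_cache *c, void *p)
{
	if (c->nr_free < FREE_MAX)
		c->free_objs[c->nr_free++] = p;
	else
		free(p);
}
```

In a loop of allocate/free cycles, the page-sized initialization runs only for the first allocation; later allocations reuse an already-constructed object.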