* [PATCH 0/2] mm: measuring resource demand
From: Peter Zijlstra @ 2006-07-11 18:29 UTC
To: linux-mm, linux-kernel; +Cc: Peter Zijlstra, Rik van Riel

Hi,

This patch set implements a refault histogram. This can be used to
effectively measure resource demand, as outlined in Rik's OLS paper
"Measuring Resource Demand on Linux", available at:

  http://people.redhat.com/~riel/riel-OLS2006.pdf

This posting is meant to start a discussion on the topic, with the
ultimate goal of getting something like this into mainline.

Peter
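One way to read such a histogram, sketched here under stated assumptions: if eviction order is close to LRU, a page that refaults at distance d (roughly, d other pages were evicted after it) would still have been resident had the system had about d more pages of memory. A running sum over the distance buckets then estimates how many refaults a given amount of extra memory would have avoided. The helper below is hypothetical and not part of the patch set; `hits[i]` is assumed to count refaults whose distance fell in bucket i, with buckets in increasing order of distance.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch set). hits[i] counts refaults
 * whose distance fell in [i * width, (i + 1) * width). On return,
 * avoided[i] holds the refaults that roughly (i + 1) * width extra pages
 * of memory would have turned into cache hits, assuming near-LRU eviction.
 */
static void refaults_avoided(const unsigned long *hits, size_t nbuckets,
			     unsigned long *avoided)
{
	unsigned long running = 0;

	assert(hits != NULL && avoided != NULL);
	for (size_t i = 0; i < nbuckets; i++) {
		running += hits[i];	/* refaults with distance in buckets 0..i */
		avoided[i] = running;
	}
}
```

The final "New/Beyond" bucket printed by the patch (first-time faults and pages that are no longer remembered) has no finite distance, so it is presumably best left out of such a sum.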
* [PATCH 1/2] mm: nonresident page tracking
From: Peter Zijlstra @ 2006-07-11 18:29 UTC
To: linux-mm, linux-kernel; +Cc: Peter Zijlstra, Rik van Riel

From: Rik van Riel <riel@redhat.com>

Track non-resident pages through a simple hashing scheme. This way the
space overhead is limited to 1 u32 per page, or 0.1% space overhead,
and each lookup costs one cache miss.

Aside from seeing whether or not a page was recently evicted, we can
also take a reasonable guess at how many other pages were evicted since
this page was evicted.

NOTE: the bucket index also contributes bits to the total size of the
hash. This way even 64-bit machines with more than 2^32 pages get a
fair chance.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

 include/linux/nonresident.h |   35 +++++++++
 init/main.c                 |    2
 mm/Kconfig                  |    4 +
 mm/Makefile                 |    1
 mm/nonresident.c            |  167 ++++++++++++++++++++++++++++++++++++++++++++
 mm/swap.c                   |    3
 mm/vmscan.c                 |    3
 7 files changed, 215 insertions(+)

Index: linux-2.6/mm/nonresident.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/nonresident.c	2006-07-10 19:51:24.000000000 +0200
@@ -0,0 +1,171 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling nonresident_put(mapping, index) and can look it up in
+ * the cache by calling nonresident_get() with the same arguments.
+ *
+ * Note that there is no way to invalidate pages after e.g. truncate
+ * or exit; we let the pages fall out of the non-resident set through
+ * normal replacement.
+ */
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+/* Number of non-resident pages per hash bucket. Never smaller than 15. */
+#if (L1_CACHE_BYTES < 64)
+#define NR_BUCKET_BYTES 64
+#else
+#define NR_BUCKET_BYTES L1_CACHE_BYTES
+#endif
+#define NUM_NR ((NR_BUCKET_BYTES - sizeof(atomic_t))/sizeof(u32))
+
+struct nr_bucket
+{
+	atomic_t hand;
+	u32 page[NUM_NR];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket * nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+static struct nr_bucket * nr_hash(void * mapping, unsigned long index)
+{
+	unsigned long bucket;
+	unsigned long hash;
+
+	hash = hash_ptr(mapping, BITS_PER_LONG);
+	hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+	bucket = hash & nonres_mask;
+
+	return nonres_table + bucket;
+}
+
+static u32 nr_cookie(struct address_space * mapping, unsigned long index)
+{
+	/*
+	 * Different hash magic from bucket selection to ensure
+	 * the combined bits extend hash-space.
+	 */
+	unsigned long cookie = hash_long(index, BITS_PER_LONG);
+	cookie = 51 * cookie + hash_ptr(mapping, BITS_PER_LONG);
+
+	if (mapping && mapping->host) {
+		cookie = 37 * cookie + hash_long(mapping->host->i_ino, BITS_PER_LONG);
+	}
+
+	return (u32)(cookie >> (BITS_PER_LONG - 32));
+}
+
+unsigned long nonresident_get(struct address_space * mapping, unsigned long index)
+{
+	struct nr_bucket * nr_bucket;
+	int distance;
+	u32 wanted;
+	int i;
+
+	/* page_mapping() is NULL for anonymous pages; don't dereference it. */
+	if (mapping)
+		prefetch(mapping->host);
+	nr_bucket = nr_hash(mapping, index);
+
+	prefetch(nr_bucket);
+	wanted = nr_cookie(mapping, index);
+
+	for (i = 0; i < NUM_NR; i++) {
+		if (nr_bucket->page[i] == wanted) {
+			nr_bucket->page[i] = 0;
+			/* Return the distance between entry and clock hand. */
+			distance = atomic_read(&nr_bucket->hand) + NUM_NR - i;
+			distance %= NUM_NR;
+			return (distance << nonres_shift) + (nr_bucket - nonres_table);
+		}
+	}
+
+	return ~0UL;
+}
+
+u32 nonresident_put(struct address_space * mapping, unsigned long index)
+{
+	struct nr_bucket * nr_bucket;
+	u32 nrpage;
+	int i;
+
+	prefetch(mapping->host);
+	nr_bucket = nr_hash(mapping, index);
+
+	prefetchw(nr_bucket);
+	nrpage = nr_cookie(mapping, index);
+
+	/* Atomically find the next array index. */
+	preempt_disable();
+retry:
+	i = atomic_inc_return(&nr_bucket->hand);
+	if (unlikely(i >= NUM_NR)) {
+		if (i == NUM_NR)
+			atomic_set(&nr_bucket->hand, -1);
+		goto retry;
+	}
+	preempt_enable();
+
+	/* Statistics may want to know whether the entry was in use. */
+	return xchg(&nr_bucket->page[i], nrpage);
+}
+
+unsigned long nonresident_total(void)
+{
+	return NUM_NR << nonres_shift;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 1;
+void __init nonresident_init(void)
+{
+	int target;
+	int i;
+
+	/*
+	 * Calculate the non-resident hash bucket target. Use a power of
+	 * two for the division because alloc_large_system_hash rounds up.
+	 */
+	target = nr_all_pages * nonresident_factor;
+	target /= (sizeof(struct nr_bucket) / sizeof(u32));
+
+	nonres_table = alloc_large_system_hash("Non-resident page tracking",
+					sizeof(struct nr_bucket),
+					target,
+					0,
+					HASH_EARLY | HASH_HIGHMEM,
+					&nonres_shift,
+					&nonres_mask,
+					0);
+
+	for (i = 0; i < (1 << nonres_shift); i++)
+		atomic_set(&nonres_table[i].hand, 0);
+}
+
+static int __init set_nonresident_factor(char * str)
+{
+	if (!str)
+		return 0;
+	nonresident_factor = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
Index: linux-2.6/include/linux/nonresident.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/include/linux/nonresident.h	2006-07-10 19:51:27.000000000 +0200
@@ -0,0 +1,35 @@
+#ifndef _LINUX_NONRESIDENT_H_
+#define _LINUX_NONRESIDENT_H_
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_MM_NONRESIDENT
+
+extern void nonresident_init(void);
+extern unsigned long nonresident_get(struct address_space *, unsigned long);
+extern u32 nonresident_put(struct address_space *, unsigned long);
+extern unsigned long nonresident_total(void);
+
+#else /* CONFIG_MM_NONRESIDENT */
+
+static inline void nonresident_init(void) { }
+static inline
+unsigned long nonresident_get(struct address_space *mapping, unsigned long index)
+{
+	return 0;
+}
+
+static inline u32 nonresident_put(struct address_space *mapping, unsigned long index)
+{
+	return 0;
+}
+
+static inline unsigned long nonresident_total(void)
+{
+	return 0;
+}
+
+#endif /* CONFIG_MM_NONRESIDENT */
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_NONRESIDENT_H_ */
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c	2006-07-10 19:49:02.000000000 +0200
+++ linux-2.6/init/main.c	2006-07-10 19:49:52.000000000 +0200
@@ -49,6 +49,7 @@
 #include <linux/buffer_head.h>
 #include <linux/debug_locks.h>
 #include <linux/lockdep.h>
+#include <linux/nonresident.h>

 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -544,6 +545,7 @@ asmlinkage void __init start_kernel(void
 #endif
 	vfs_caches_init_early();
 	cpuset_init_early();
+	nonresident_init();
 	mem_init();
 	kmem_cache_init();
 	setup_per_cpu_pageset();
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile	2006-07-10 19:49:02.000000000 +0200
+++ linux-2.6/mm/Makefile	2006-07-10 19:49:52.000000000 +0200
@@ -13,6 +13,7 @@ obj-y			:= bootmem.o filemap.o mempool.o
 			   prio_tree.o util.o mmzone.o vmstat.o $(mmu-y)

 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_MM_NONRESIDENT)	+= nonresident.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)	+= sparse.o
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c	2006-07-10 19:49:02.000000000 +0200
+++ linux-2.6/mm/swap.c	2006-07-10 19:49:52.000000000 +0200
@@ -30,6 +30,7 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/init.h>
+#include <linux/nonresident.h>

 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
@@ -346,6 +347,7 @@ void __pagevec_lru_add(struct pagevec *p
 		}
 		BUG_ON(PageLRU(page));
 		SetPageLRU(page);
+		nonresident_get(page_mapping(page), page_index(page));
 		add_page_to_inactive_list(zone, page);
 	}
 	if (zone)
@@ -373,6 +375,7 @@ void __pagevec_lru_add_active(struct pag
 		}
 		BUG_ON(PageLRU(page));
 		SetPageLRU(page);
+		nonresident_get(page_mapping(page), page_index(page));
 		BUG_ON(PageActive(page));
 		SetPageActive(page);
 		add_page_to_active_list(zone, page);
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c	2006-07-10 19:49:02.000000000 +0200
+++ linux-2.6/mm/vmscan.c	2006-07-10 19:49:52.000000000 +0200
@@ -35,6 +35,7 @@
 #include <linux/rwsem.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
+#include <linux/nonresident.h>

 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -395,6 +396,7 @@ int remove_mapping(struct address_space

 	if (PageSwapCache(page)) {
 		swp_entry_t swap = { .val = page_private(page) };
+		nonresident_put(mapping, page_index(page));
 		__delete_from_swap_cache(page);
 		write_unlock_irq(&mapping->tree_lock);
 		swap_free(swap);
@@ -402,6 +404,7 @@ int remove_mapping(struct address_space
 		return 1;
 	}

+	nonresident_put(mapping, page_index(page));
 	__remove_from_page_cache(page);
 	write_unlock_irq(&mapping->tree_lock);
 	__put_page(page);
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2006-07-10 19:49:02.000000000 +0200
+++ linux-2.6/mm/Kconfig	2006-07-10 19:51:24.000000000 +0200
@@ -152,3 +152,7 @@ config RESOURCES_64BIT
 	default 64BIT
 	help
 	  This option allows memory and IO resources to be 64 bit.
+
+config MM_NONRESIDENT
+	bool "Track nonresident pages"
+	def_bool y
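To make the clock-hand mechanics of `nonresident_put()` and `nonresident_get()` concrete, here is a single-threaded userspace model of one bucket. It is a sketch, not the patch's code: a plain `int` replaces `atomic_t`, a modulo replaces the atomic wrap-around retry loop (it visits slots in the same order, 1, 2, ..., 14, 0, ...), and all `demo_*` names are hypothetical. Cookies are assumed to be non-zero, because 0 marks an empty slot. The last helper mirrors how the patch turns a per-bucket distance into a table-wide value: shift by `nonres_shift` (log2 of the bucket count) and add the bucket index, which approximates a system-wide eviction count if the hash spreads evictions evenly.

```c
#include <assert.h>
#include <stdint.h>

/* Slots per bucket, assuming a 64-byte bucket and a 4-byte hand:
 * (64 - 4) / 4 = 15, the patch's minimum NUM_NR. */
#define NUM_NR 15

/* Hypothetical userspace stand-in for struct nr_bucket. */
struct demo_bucket {
	int hand;               /* slot written by the most recent put */
	uint32_t page[NUM_NR];  /* cookies of evicted pages; 0 = empty */
};

/* Remember an evicted page: advance the clock hand and overwrite the
 * slot it now points at. Returns the cookie pushed out, or 0. */
static uint32_t demo_put(struct demo_bucket *b, uint32_t cookie)
{
	uint32_t old;

	assert(cookie != 0);    /* 0 is reserved for empty slots */
	b->hand = (b->hand + 1) % NUM_NR;
	old = b->page[b->hand];
	b->page[b->hand] = cookie;
	return old;
}

/* Look up a refaulting page. Returns how many later evictions hit this
 * bucket (0 = it was the most recent), or -1 if it is not remembered.
 * Like nonresident_get(), a hit clears the slot. */
static int demo_get(struct demo_bucket *b, uint32_t cookie)
{
	for (int i = 0; i < NUM_NR; i++) {
		if (b->page[i] == cookie) {
			b->page[i] = 0;
			return (b->hand + NUM_NR - i) % NUM_NR;
		}
	}
	return -1;
}

/* Scale a per-bucket distance the way the patch does: shift by
 * log2(number of buckets), then add the bucket index. */
static unsigned long demo_global_distance(int local, unsigned int shift,
					  unsigned long bucket)
{
	return ((unsigned long)local << shift) + bucket;
}
```

Because slots are reused in ring order, the distance from the hand counts the evictions that hit this bucket after the page left; after `NUM_NR` further evictions the entry is overwritten and the page is forgotten.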
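The "0.1% space overhead" figure follows from the sizing in `nonresident_init()`: it requests `nr_all_pages * nonresident_factor / (bucket_bytes / 4)` buckets of `bucket_bytes` each, which works out to 4 bytes of table per page of RAM per unit of `nonresident_factor`, or 4/4096, about 0.1%, with 4 KiB pages. The helpers below (hypothetical names) redo that arithmetic in userspace, assuming a 4-byte `atomic_t` and ignoring the power-of-two rounding that `alloc_large_system_hash()` applies to the bucket count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cookie slots per bucket, mirroring NUM_NR with a 4-byte hand. */
static size_t slots_per_bucket(size_t bucket_bytes)
{
	assert(bucket_bytes >= 64);	/* NR_BUCKET_BYTES is never below 64 */
	return (bucket_bytes - sizeof(int32_t)) / sizeof(uint32_t);
}

/* Bucket count requested by nonresident_init(), before rounding. */
static unsigned long target_buckets(unsigned long nr_pages,
				    unsigned long factor, size_t bucket_bytes)
{
	return nr_pages * factor / (bucket_bytes / sizeof(uint32_t));
}

/* Size of the bucket table as a percentage of RAM. */
static double overhead_percent(unsigned long nr_pages, size_t page_bytes,
			       unsigned long buckets, size_t bucket_bytes)
{
	return 100.0 * ((double)buckets * bucket_bytes) /
	       ((double)nr_pages * page_bytes);
}
```

For example, 1 GiB of RAM in 4 KiB pages is 262144 pages; with 64-byte buckets and the default factor of 1 this asks for 16384 buckets, a 1 MiB table (about 0.098% of RAM) that remembers 16384 × 15 = 245760 evicted pages.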
* Re: [PATCH 1/2] mm: nonresident page tracking
From: Peter Zijlstra @ 2006-07-14 14:19 UTC
To: Feng Jin; +Cc: linux-mm, linux-kernel, Rik van Riel

On Fri, 2006-07-14 at 16:55 +0800, Feng Jin wrote:
> Hi,
>
> I have applied the patch on 2.6.18-rc1-mm1, and when I boot my system,
> a kernel panic occurred :(
> I have tried to debug it with kdb, but the panic occurred at startup;
> although I added kdb=early, I still could not debug it.
> Attached is my config file.

From the fact that the patch doesn't apply cleanly to .18-rc1-mm1, and
that when I fix up the rejects it does boot, I can reach no other
conclusion than that you botched it somehow. This patch was against
mainline from the day of the post.

As for your suggestion of putting #ifdef CONFIG_MM_NONRESIDENT all over
the place; have you seen how the nonresident.h file declares empty stubs
for the functions?

Peter
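The pattern referred to above keeps callers free of #ifdefs: the header declares the real functions when the option is enabled and otherwise provides static inline stubs that the compiler can optimize away. A minimal standalone sketch follows; `CONFIG_DEMO_TRACKING`, `tracker_get()` and `lru_add_hook()` are hypothetical names, not kernel symbols. The stub must match the real prototype exactly, and, before C23, a C function definition must name its parameters, so the stub names them and casts them to void to silence unused-parameter warnings.

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_MM_NONRESIDENT; it is left
 * undefined here, so the stub below is used. */
/* #define CONFIG_DEMO_TRACKING 1 */

struct address_space;	/* opaque: only pointers are passed around */

#ifdef CONFIG_DEMO_TRACKING
/* The real implementation lives in a file built only when enabled. */
extern unsigned long tracker_get(struct address_space *mapping,
				 unsigned long index);
#else
/* Same signature as the real function; an inlined no-op the compiler
 * can drop entirely. */
static inline unsigned long tracker_get(struct address_space *mapping,
					unsigned long index)
{
	(void)mapping;
	(void)index;
	return 0;
}
#endif

/* A caller needs no #ifdef of its own, much like __pagevec_lru_add(). */
static unsigned long lru_add_hook(struct address_space *mapping,
				  unsigned long index)
{
	return tracker_get(mapping, index);
}
```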
* [PATCH 2/2] mm: refault histogram
From: Peter Zijlstra @ 2006-07-11 18:29 UTC
To: linux-mm, linux-kernel; +Cc: Peter Zijlstra, Rik van Riel

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Adds a refault histogram on top of the nonresident code.
Based on ideas and code from Rik van Riel.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

 fs/proc/proc_misc.c |   23 ++++++++++
 mm/Kconfig          |    5 ++
 mm/Makefile         |    1
 mm/nonresident.c    |   15 +++++-
 mm/refault.c        |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 155 insertions(+), 3 deletions(-)

Index: linux-2.6/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.orig/fs/proc/proc_misc.c	2006-07-11 18:06:58.000000000 +0200
+++ linux-2.6/fs/proc/proc_misc.c	2006-07-11 18:07:03.000000000 +0200
@@ -224,6 +224,26 @@ static struct file_operations fragmentat
 	.release	= seq_release,
 };

+#ifdef CONFIG_MM_REFAULT
+extern struct seq_operations refault_op;
+static int refault_open(struct inode *inode, struct file *file)
+{
+	(void)inode;
+	return seq_open(file, &refault_op);
+}
+
+extern ssize_t refault_write(struct file *, const char __user *buf,
+			     size_t count, loff_t *);
+
+static struct file_operations refault_file_operations = {
+	.open		= refault_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.write		= refault_write,
+};
+#endif
+
 extern struct seq_operations zoneinfo_op;
 static int zoneinfo_open(struct inode *inode, struct file *file)
 {
@@ -696,6 +716,9 @@ void __init proc_misc_init(void)
 #endif
 #endif
 	create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations);
+#ifdef CONFIG_MM_REFAULT
+	create_seq_entry("refault",S_IRUGO, &refault_file_operations);
+#endif
 	create_seq_entry("vmstat",S_IRUGO, &proc_vmstat_file_operations);
 	create_seq_entry("zoneinfo",S_IRUGO, &proc_zoneinfo_file_operations);
 	create_seq_entry("diskstats", 0, &proc_diskstats_operations);
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig	2006-07-11 18:07:03.000000000 +0200
+++ linux-2.6/mm/Kconfig	2006-07-11 18:07:03.000000000 +0200
@@ -156,3 +156,8 @@ config RESOURCES_64BIT
 config MM_NONRESIDENT
 	bool "Track nonresident pages"
 	def_bool y
+
+config MM_REFAULT
+	bool "Refault histogram"
+	def_bool y
+	depends on MM_NONRESIDENT
Index: linux-2.6/mm/nonresident.c
===================================================================
--- linux-2.6.orig/mm/nonresident.c	2006-07-11 18:07:03.000000000 +0200
+++ linux-2.6/mm/nonresident.c	2006-07-11 18:07:03.000000000 +0200
@@ -90,12 +90,21 @@ unsigned long nonresident_get(struct add
 			nr_bucket->page[i] = 0;
 			/* Return the distance between entry and clock hand. */
 			distance = atomic_read(&nr_bucket->hand) + NUM_NR - i;
-			distance %= NUM_NR;
-			return (distance << nonres_shift) + (nr_bucket - nonres_table);
+			distance = (distance % NUM_NR) << nonres_shift;
+			distance += (nr_bucket - nonres_table);
+			goto out;
 		}
 	}

-	return ~0UL;
+	distance = ~0UL;
+out:
+#ifdef CONFIG_MM_REFAULT
+	{
+		extern void nonresident_refault(unsigned long);
+		nonresident_refault(distance);
+	}
+#endif /* CONFIG_MM_REFAULT */
+	return distance;
 }

 u32 nonresident_put(struct address_space * mapping, unsigned long index)
Index: linux-2.6/mm/refault.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/mm/refault.c	2006-07-11 18:07:03.000000000 +0200
@@ -0,0 +1,114 @@
+#include <linux/config.h>
+#include <linux/percpu.h>
+#include <linux/seq_file.h>
+#include <asm/uaccess.h>
+
+#define BUCKETS 64
+
+DEFINE_PER_CPU(unsigned long[BUCKETS+1], refault_histogram);
+
+extern unsigned long nonresident_total(void);
+
+void nonresident_refault(unsigned long distance)
+{
+	unsigned long nonres_bucket = nonresident_total() / BUCKETS;
+	unsigned long bucket_id = distance / nonres_bucket;
+
+	if (bucket_id > BUCKETS)
+		bucket_id = BUCKETS;
+
+	__get_cpu_var(refault_histogram)[bucket_id]++;
+}
+
+#ifdef CONFIG_PROC_FS
+
+#include <linux/seq_file.h>
+
+static void *frag_start(struct seq_file *m, loff_t *pos)
+{
+	if (*pos < 0 || *pos > BUCKETS)
+		return NULL;
+
+	m->private = (void *)(unsigned long)*pos;
+
+	return pos;
+}
+
+static void *frag_next(struct seq_file *m, void *arg, loff_t *pos)
+{
+	if (*pos < BUCKETS) {
+		(*pos)++;
+		/* A cast is not an lvalue; increment via a temporary. */
+		m->private = (void *)((unsigned long)m->private + 1);
+		return pos;
+	}
+	return NULL;
+}
+
+static void frag_stop(struct seq_file *m, void *arg)
+{
+}
+
+unsigned long get_refault_stat(unsigned long index)
+{
+	unsigned long total = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		total += per_cpu(refault_histogram, cpu)[index];
+	}
+	return total;
+}
+
+static int frag_show(struct seq_file *m, void *arg)
+{
+	unsigned long index = (unsigned long)m->private;
+	unsigned long nonres_bucket = nonresident_total() / BUCKETS;
+	unsigned long upper = ((unsigned long)index + 1) * nonres_bucket;
+	unsigned long lower = (unsigned long)index * nonres_bucket;
+	unsigned long hits = get_refault_stat(index);
+
+	if (index == 0)
+		seq_printf(m, " Refault distance Hits\n");
+
+	if (index < BUCKETS)
+		seq_printf(m, "%9lu - %9lu %9lu\n", lower, upper, hits);
+	else
+		seq_printf(m, " New/Beyond %9lu %9lu\n", lower, hits);
+
+	return 0;
+}
+
+struct seq_operations refault_op = {
+	.start	= frag_start,
+	.next	= frag_next,
+	.stop	= frag_stop,
+	.show	= frag_show,
+};
+
+static void refault_reset(void)
+{
+	int cpu;
+	int bucket_id;
+
+	for_each_possible_cpu(cpu) {
+		for (bucket_id = 0; bucket_id <= BUCKETS; ++bucket_id)
+			per_cpu(refault_histogram, cpu)[bucket_id] = 0;
+	}
+}
+
+ssize_t refault_write(struct file *file, const char __user *buf,
+		      size_t count, loff_t *ppos)
+{
+	if (count) {
+		char c;
+
+		if (get_user(c, buf))
+			return -EFAULT;
+		if (c == '0')
+			refault_reset();
+	}
+	return count;
+}
+
+#endif /* CONFIG_PROC_FS */
+
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile	2006-07-11 18:07:03.000000000 +0200
+++ linux-2.6/mm/Makefile	2006-07-11 18:07:03.000000000 +0200
@@ -14,6 +14,7 @@ obj-y			:= bootmem.o filemap.o mempool.o

 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
 obj-$(CONFIG_MM_NONRESIDENT)	+= nonresident.o
+obj-$(CONFIG_MM_REFAULT)	+= refault.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
 obj-$(CONFIG_NUMA) 	+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)	+= sparse.o
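The bucketing in `nonresident_refault()` divides the tracked range (`nonresident_total()` entries) into `BUCKETS` equal slices and sends everything past it, including the `~0UL` "not found" value, to the extra New/Beyond slot; `get_refault_stat()` then sums one slot across all CPUs. The userspace model below reproduces that arithmetic with a fixed, hypothetical CPU count and a plain 2-D array standing in for the per-CPU variable; it is illustrative only.

```c
#include <assert.h>

#define BUCKETS 64
#define DEMO_NR_CPUS 4	/* hypothetical CPU count */

/* One row per CPU, like DEFINE_PER_CPU(unsigned long[BUCKETS+1], ...). */
static unsigned long demo_hist[DEMO_NR_CPUS][BUCKETS + 1];

/* Same slot selection as nonresident_refault(); total is the number of
 * remembered pages (nonresident_total()), assumed to be >= BUCKETS. */
static unsigned long refault_bucket(unsigned long distance, unsigned long total)
{
	unsigned long per_bucket = total / BUCKETS;
	unsigned long id;

	assert(per_bucket > 0);	/* avoid dividing by zero */
	id = distance / per_bucket;
	return id > BUCKETS ? BUCKETS : id;
}

/* Record a refault in one CPU's row. */
static void demo_refault(int cpu, unsigned long distance, unsigned long total)
{
	demo_hist[cpu][refault_bucket(distance, total)]++;
}

/* Sum one slot over all CPUs, as get_refault_stat() does. */
static unsigned long demo_stat(unsigned long index)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += demo_hist[cpu][index];
	return sum;
}
```

Since `nonresident_total()` is `NUM_NR << nonres_shift`, the slice width is usually not a round number, and the bucket edges printed in /proc/refault are multiples of that width.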
linux-2.6/mm/Makefile =================================================================== --- linux-2.6.orig/mm/Makefile 2006-07-11 18:07:03.000000000 +0200 +++ linux-2.6/mm/Makefile 2006-07-11 18:07:03.000000000 +0200 @@ -14,6 +14,7 @@ obj-y := bootmem.o filemap.o mempool.o obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o obj-$(CONFIG_MM_NONRESIDENT) += nonresident.o +obj-$(CONFIG_MM_REFAULT) += refault.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o obj-$(CONFIG_NUMA) += mempolicy.o obj-$(CONFIG_SPARSEMEM) += sparse.o -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] refaults
@ 2007-07-26 17:25 Peter Zijlstra
  2007-07-26 17:25 ` Peter Zijlstra, Rik van Riel
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2007-07-26 17:25 UTC
  To: linux-kernel, linux-mm, containers; +Cc: akpm, balbir, riel, a.p.zijlstra

Hi,

This is a brush-up of the refault patches, as presented by Rik at last
year's OLS:

  http://people.redhat.com/riel/riel-OLS2006.pdf

When talking to people at OLS this year, there was a renewed interest
in the concept.
* [PATCH 1/2] mm: nonresident page tracking
  2007-07-26 17:25 [PATCH 0/2] refaults Peter Zijlstra
@ 2007-07-26 17:25 ` Peter Zijlstra, Rik van Riel
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2007-07-26 17:25 UTC
  To: linux-kernel, linux-mm, containers; +Cc: akpm, balbir, riel, a.p.zijlstra

[-- Attachment #1: mm-nonresident.patch --]
[-- Type: text/plain, Size: 10665 bytes --]

From: Rik van Riel <riel@redhat.com>

Track non-resident pages through a simple hashing scheme. This way
the space overhead is limited to 1 u32 per page, or 0.1% space
overhead, and lookups are one cache miss.

Aside from seeing whether or not a page was recently evicted, we can
also take a reasonable guess at how many other pages were evicted
since this page was evicted.

TODO: make the entries unsigned long; currently we're limited to
2^32*NUM_NR*PAGE_SIZE bytes of memory. Even though this would end up
being 1008 TB of memory, I suspect the hash function will break down
at around 4 to 16 TB.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 Documentation/kernel-parameters.txt |    4 
 include/linux/nonresident.h         |   35 +++++++
 init/main.c                         |    2 
 mm/Kconfig                          |    4 
 mm/Makefile                         |    1 
 mm/nonresident.c                    |  173 ++++++++++++++++++++++++++++++++++++
 mm/swap.c                           |    3 
 mm/vmscan.c                         |    3 
 8 files changed, 225 insertions(+)

Index: linux-2.6/mm/nonresident.c
===================================================================
--- /dev/null
+++ linux-2.6/mm/nonresident.c
@@ -0,0 +1,173 @@
+/*
+ * mm/nonresident.c
+ * (C) 2004,2005 Red Hat, Inc
+ * Written by Rik van Riel <riel@redhat.com>
+ * Released under the GPL, see the file COPYING for details.
+ *
+ * Keeps track of whether a non-resident page was recently evicted
+ * and should be immediately promoted to the active list. This also
+ * helps automatically tune the inactive target.
+ *
+ * The pageout code stores a recently evicted page in this cache
+ * by calling nonresident_put(mapping, index) and can look it up
+ * in the cache by calling nonresident_get() with the same
+ * arguments.
+ *
+ * Note that there is no way to invalidate pages after e.g. truncate
+ * or exit, we let the pages fall out of the non-resident set through
+ * normal replacement.
+ */
+#include <linux/mm.h>
+#include <linux/cache.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/hash.h>
+#include <linux/prefetch.h>
+#include <linux/kernel.h>
+
+/* Number of non-resident pages per hash bucket. Never smaller than 15. */
+#if (L1_CACHE_BYTES < 64)
+#define NR_BUCKET_BYTES 64
+#else
+#define NR_BUCKET_BYTES L1_CACHE_BYTES
+#endif
+#define NUM_NR ((NR_BUCKET_BYTES - sizeof(atomic_t))/sizeof(atomic_t))
+
+struct nr_bucket
+{
+	atomic_t hand;
+	atomic_t cookie[NUM_NR];
+} ____cacheline_aligned;
+
+/* The non-resident page hash table. */
+static struct nr_bucket *nonres_table;
+static unsigned int nonres_shift;
+static unsigned int nonres_mask;
+
+static struct nr_bucket *nr_hash(void *mapping, unsigned long index)
+{
+	unsigned long bucket;
+	unsigned long hash;
+
+	hash = hash_ptr(mapping, BITS_PER_LONG);
+	hash = 37 * hash + hash_long(index, BITS_PER_LONG);
+	bucket = hash & nonres_mask;
+
+	return nonres_table + bucket;
+}
+
+static u32 nr_cookie(struct address_space *mapping, unsigned long index)
+{
+	unsigned long cookie;
+
+	cookie = hash_ptr(mapping, BITS_PER_LONG);
+	cookie = 37 * cookie + hash_long(index, BITS_PER_LONG);
+
+	if (mapping && mapping->host) {
+		cookie *= 37;
+		cookie += hash_long(mapping->host->i_ino, BITS_PER_LONG);
+	}
+
+	return (u32)(cookie >> (BITS_PER_LONG - 32));
+}
+
+unsigned long
+nonresident_get(struct address_space *mapping, unsigned long index)
+{
+	struct nr_bucket *bucket;
+	unsigned long distance = ~0UL;
+	u32 wanted;
+	int i;
+
+	if (!mapping)
+		return distance;
+
+	prefetch(mapping->host);
+	bucket = nr_hash(mapping, index);
+
+	prefetch(bucket);
+	wanted = nr_cookie(mapping, index);
+
+	for (i = 0; i < NUM_NR; i++) {
+		if ((u32)atomic_cmpxchg(&bucket->cookie[i], wanted, 0) == wanted) {
+			/* Return the distance between entry and clock hand. */
+			distance = atomic_read(&bucket->hand) + NUM_NR - i;
+			distance = (distance % NUM_NR) << nonres_shift;
+			distance += (bucket - nonres_table);
+			goto out;
+		}
+	}
+
+out:
+	return distance;
+}
+
+u32 nonresident_put(struct address_space *mapping, unsigned long index)
+{
+	struct nr_bucket *bucket;
+	u32 cookie;
+	int cur_hand;
+	int hand;
+
+	prefetch(mapping->host);
+	bucket = nr_hash(mapping, index);
+
+	prefetchw(bucket);
+	cookie = nr_cookie(mapping, index);
+
+	/* Atomically find the next array index. */
+	do {
+		cur_hand = atomic_read(&bucket->hand);
+		hand = cur_hand + 1;
+		if (unlikely(hand == NUM_NR))
+			hand = 0;
+	} while (atomic_cmpxchg(&bucket->hand, cur_hand, hand) != cur_hand);
+
+	/* Statistics may want to know whether the entry was in use. */
+	return atomic_xchg(&bucket->cookie[hand], cookie);
+}
+
+unsigned long nonresident_total(void)
+{
+	return NUM_NR << nonres_shift;
+}
+
+/*
+ * For interactive workloads, we remember about as many non-resident pages
+ * as we have actual memory pages. For server workloads with large inter-
+ * reference distances we could benefit from remembering more.
+ */
+static __initdata unsigned long nonresident_factor = 100;
+void __init nonresident_init(void)
+{
+	int target;
+	int i;
+
+	/*
+	 * Calculate the non-resident hash bucket target. Use a power of
+	 * two for the division because alloc_large_system_hash rounds up.
+	 */
+	target = (nr_all_pages * nonresident_factor) / 100;
+	target /= (sizeof(struct nr_bucket) / sizeof(atomic_t));
+
+	nonres_table = alloc_large_system_hash("Non-resident page tracking",
+			sizeof(struct nr_bucket),
+			target,
+			0,
+			HASH_EARLY,
+			&nonres_shift,
+			&nonres_mask,
+			0);
+
+	for (i = 0; i < (1 << nonres_shift); i++)
+		atomic_set(&nonres_table[i].hand, 0);
+}
+
+static int __init set_nonresident_factor(char *str)
+{
+	if (!str)
+		return 0;
+	nonresident_factor = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("nonresident_factor=", set_nonresident_factor);
Index: linux-2.6/include/linux/nonresident.h
===================================================================
--- /dev/null
+++ linux-2.6/include/linux/nonresident.h
@@ -0,0 +1,35 @@
+#ifndef _LINUX_NONRESIDENT_H_
+#define _LINUX_NONRESIDENT_H_
+
+#ifdef __KERNEL__
+
+#ifdef CONFIG_MM_NONRESIDENT
+
+extern void nonresident_init(void);
+extern unsigned long nonresident_get(struct address_space *, unsigned long);
+extern u32 nonresident_put(struct address_space *, unsigned long);
+extern unsigned long nonresident_total(void);
+
+#else /* CONFIG_MM_NONRESIDENT */
+
+static inline void nonresident_init(void) { }
+static inline
+unsigned long nonresident_get(struct address_space *mapping, unsigned long index)
+{
+	return 0;
+}
+
+static inline u32 nonresident_put(struct address_space *mapping, unsigned long index)
+{
+	return 0;
+}
+
+static inline unsigned long nonresident_total(void)
+{
+	return 0;
+}
+
+#endif /* CONFIG_MM_NONRESIDENT */
+
+#endif /* __KERNEL__ */
+#endif /* _LINUX_NONRESIDENT_H_ */
Index: linux-2.6/init/main.c
===================================================================
--- linux-2.6.orig/init/main.c
+++ linux-2.6/init/main.c
@@ -55,6 +55,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/nonresident.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -611,6 +612,7 @@ asmlinkage void __init start_kernel(void
 #endif
 	vfs_caches_init_early();
 	cpuset_init_early();
+	nonresident_init();
 	mem_init();
 	kmem_cache_init();
 	setup_per_cpu_pageset();
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile
+++ linux-2.6/mm/Makefile
@@ -15,6 +15,7 @@ obj-y			:= bootmem.o filemap.o mempool.o
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
+obj-$(CONFIG_MM_NONRESIDENT)	+= nonresident.o
 obj-$(CONFIG_HUGETLBFS)	+= hugetlb.o
 obj-$(CONFIG_NUMA)	+= mempolicy.o
 obj-$(CONFIG_SPARSEMEM)	+= sparse.o
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c
+++ linux-2.6/mm/swap.c
@@ -30,6 +30,7 @@
 #include <linux/cpu.h>
 #include <linux/notifier.h>
 #include <linux/init.h>
+#include <linux/nonresident.h>
 
 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
@@ -365,6 +366,7 @@ void __pagevec_lru_add(struct pagevec *p
 		}
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
+		nonresident_get(page_mapping(page), page_index(page));
 		add_page_to_inactive_list(zone, page);
 	}
 	if (zone)
@@ -392,6 +394,7 @@ void __pagevec_lru_add_active(struct pag
 		}
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
+		nonresident_get(page_mapping(page), page_index(page));
 		VM_BUG_ON(PageActive(page));
 		SetPageActive(page);
 		add_page_to_active_list(zone, page);
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -37,6 +37,7 @@
 #include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/freezer.h>
+#include <linux/nonresident.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -402,6 +403,7 @@ int remove_mapping(struct address_space 
 
 	if (PageSwapCache(page)) {
 		swp_entry_t swap = { .val = page_private(page) };
+		nonresident_put(mapping, page_index(page));
 		__delete_from_swap_cache(page);
 		write_unlock_irq(&mapping->tree_lock);
 		swap_free(swap);
@@ -409,6 +411,7 @@ int remove_mapping(struct address_space 
 		return 1;
 	}
 
+	nonresident_put(mapping, page_index(page));
 	__remove_from_page_cache(page);
 	write_unlock_irq(&mapping->tree_lock);
 	__put_page(page);
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -158,6 +158,10 @@ config RESOURCES_64BIT
 	help
 	  This option allows memory and IO resources to be 64 bit.
 
+config MM_NONRESIDENT
+	bool "Track nonresident pages"
+	def_bool y
+
 config ZONE_DMA_FLAG
 	int
 	default "0" if !ZONE_DMA
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -1167,6 +1167,10 @@ and is between 256 and 4096 characters. 
 
 	nomce		[IA-32] Machine Check Exception
 
+	nonresident_factor=	[KNL] Scale the size of the nonresident history.
+			The default is 100, which equals the total
+			memory size.
+
 	noreplace-paravirt	[IA-32,PV_OPS] Don't patch paravirt_ops
 
 	noreplace-smp	[IA-32,SMP] Don't replace SMP instructions
Thread overview: 10+ messages
2006-07-11 18:29 [PATCH 0/2] mm: measuring resource demand Peter Zijlstra
2006-07-11 18:29 ` Peter Zijlstra
2006-07-11 18:29 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra
2006-07-11 18:29 ` Peter Zijlstra
[not found] ` <215036450607140155w67df26fan5b2342ead686ce8b@mail.gmail.com>
2006-07-14 14:19 ` Peter Zijlstra
2006-07-14 14:19 ` Peter Zijlstra
2006-07-11 18:29 ` [PATCH 2/2] mm: refault histogram Peter Zijlstra
2006-07-11 18:29 ` Peter Zijlstra
-- strict thread matches above, loose matches on Subject: below --
2007-07-26 17:25 [PATCH 0/2] refaults Peter Zijlstra
2007-07-26 17:25 ` [PATCH 1/2] mm: nonresident page tracking Peter Zijlstra
2007-07-26 17:25 ` Peter Zijlstra, Rik van Riel