* Re: [PATCH -mm 23/25] Noreclaim LRU scan sysctl
       [not found] <484DEA23.50707@cn.fujitsu.com>
@ 2008-06-10  2:52 ` Li Zefan
  0 siblings, 0 replies; 3+ messages in thread
From: Li Zefan @ 2008-06-10  2:52 UTC (permalink / raw)
  To: Rik Van Riel; +Cc: Andrew Morton, Lee Schermerhorn, KOSAKI Motohiro, LKML

> +static void show_page_path(struct page *page)
> +{
> +	char buf[256];
> +	if (page_file_cache(page)) {
> +		struct address_space *mapping = page->mapping;
> +		struct dentry *dentry;
> +		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> +
> +		spin_lock(&mapping->i_mmap_lock);
> +		dentry = d_find_alias(mapping->host);
> +		printk(KERN_INFO "rescued: %s %lu\n",
> +		       dentry_path(dentry, buf, 256), pgoff);
> +		spin_unlock(&mapping->i_mmap_lock);
> +	} else {
> +		struct anon_vma *anon_vma;
> +		struct vm_area_struct *vma;
> +
> +		anon_vma = page_lock_anon_vma(page);
> +		if (!anon_vma)
> +			return;
> +
> +		list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> +			printk(KERN_INFO "rescued: anon %s\n",
> +			       vma->vm_mm->owner->comm);

This would cause compile failure if !CONFIG_MM_OWNER.

> +			break;
> +		}
> +		page_unlock_anon_vma(anon_vma);
> +	}
> +}
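One possible way to address the point above is to touch the owner field only when the option that provides it is enabled. The following is a user-space sketch, not kernel code and not the fix that was actually applied: MODEL_MM_OWNER stands in for CONFIG_MM_OWNER, and struct model_mm, struct model_task and model_mm_owner_comm() are hypothetical names made up for illustration.

```c
/*
 * User-space model only. MODEL_MM_OWNER stands in for CONFIG_MM_OWNER;
 * the structs are hypothetical stand-ins, not the kernel's types.
 */
struct model_task {
	char comm[16];
};

struct model_mm {
#ifdef MODEL_MM_OWNER
	struct model_task *owner;	/* only exists when the option is on */
#endif
	int unused;			/* keeps the struct non-empty */
};

/*
 * Return a printable owner name. The ->owner access is compiled only
 * when the field exists, so the build works with the option on or off.
 */
static const char *model_mm_owner_comm(const struct model_mm *mm)
{
#ifdef MODEL_MM_OWNER
	if (mm && mm->owner)
		return mm->owner->comm;
#else
	(void)mm;			/* field absent: nothing to look up */
#endif
	return "(unknown)";
}
```

Built without -DMODEL_MM_OWNER the helper always returns the placeholder string; built with it, the owner's name is returned when one is set.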
* [PATCH -mm 00/25] VM pageout scalability improvements (V10)
@ 2008-06-06 20:28 Rik van Riel, Rik van Riel
  2008-06-06 20:29 ` Rik van Riel, Rik van Riel, Lee Schermerhorn
  0 siblings, 1 reply; 3+ messages in thread
From: Rik van Riel, Rik van Riel @ 2008-06-06 20:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro

On large memory systems, the VM can spend way too much time scanning
through pages that it cannot (or should not) evict from memory. Not
only does it use up CPU time, but it also provokes lock contention
and can leave large systems under memory pressure in a catatonic state.

Against 2.6.26-rc2-mm1

This patch series improves VM scalability by:

1) putting filesystem backed, swap backed and non-reclaimable pages
   onto their own LRUs, so the system only scans the pages that it
   can/should evict from memory

2) switching to two handed clock replacement for the anonymous LRUs,
   so the number of pages that need to be scanned when the system
   starts swapping is bound to a reasonable number

3) keeping non-reclaimable pages off the LRU completely, so the
   VM does not waste CPU time scanning them. Currently only
   ramfs and SHM_LOCKED pages are kept on the noreclaim list,
   mlock()ed VMAs will be added later

More info on the overall design can be found at:

	http://linux-mm.org/PageReplacementDesign

An all-in-one patch can be found at:

	http://people.redhat.com/riel/splitvm/

Changelog:
- merge in all of Lee's mlock changes
- fix shrink_list memcgroup balancing (KOSAKI Motohiro)
- fix balancing stats in shrink_active_list (Daisuke Nishimura)
- make sure previously active pagecache pages get reactivated on
  the first access (Rik van Riel)
- compile fix when !CONFIG_SWAP (MinChan Kim)
- clean up page-flags.h defines when !CONFIG_NORECLAIM_LRU
  (Lee Schermerhorn)
- fix some race conditions around moving pages to and from the
  noreclaim list (Lee Schermerhorn, KOSAKI Motohiro)
- use putback_lru_page() for page migration (Lee Schermerhorn)
- fix potential SHM_UNLOCK race in scan_mapping_noreclaim_pages()
  (Lee Schermerhorn, KOSAKI Motohiro)
- improve swap space freeing to deal with COW shared space
  (Lee Schermerhorn, Daisuke Nishimura & Minchan Kim)
- clean up PG_swapbacked setting in swapin path (Minchan Kim)
- properly invoke shrink_active_list for background aging (Minchan Kim)
- add authorship info to all patches (Rik van Riel)
- clean up (or move below ---) the comments for the commit logs (Rik van Riel)
- after some tests, reduce default swappiness to 20 for now (Rik van Riel)
- several code cleanups (minchan Kim)
- noreclaim patch refactoring and improvements (Lee Schermerhorn)
- several PROT_NONE and vma merging fixes (KOSAKI Motohiro)
- SMP bugfixes and efficiency improvements (Rik van Riel, Lee Schermerhorn)
- fix NUMA node stats printing (Lee Schermerhorn)
- remove the mlocked-VMA-noreclaim code for now, it still has
  bugs on IA64 and is holding up the merge (Rik van Riel)
- make page_alloc.c compile without CONFIG_NORECLAIM_MLOCK (minchan Kim)
- BUG() does not take an argument (minchan Kim)
- clean up is_active_lru and is_file_lru (Andy Whitcroft)
- clean up shrink_active_list temp list names (KOSAKI Motohiro)
- add total active & inactive memory totals for vmstat -a (KOSAKI Motohiro)
- only try global anon page aging on global lru scans (KOSAKI Motohiro)
- make function descriptions follow the kernel-doc format (Rik van Riel)
- simplify mlock_vma_pages_range and munlock_vma_pages_range (Lee Schermerhorn)
- remove some more arguments, rename to mlock_vma_pages_all (Lee Schermerhorn)
- many code cleanups (Lee Schermerhorn)
- pass correct vma arg to mlock_vma_pages_range from do_brk (Rik van Riel)
- port to 2.6.25-rc3-mm1
- pull the memcontrol lru arrayification earlier into the patch series
- use a pagevec array similar to the lru array
- clean up the code in various places
- improved pageout balancing and reduced pageout cpu use
- fix compilation on PPC and without memcontrol
- make page_is_pagecache more readable
- replace get_scan_ratio with correct version
- merge memcontroller split LRU code into the main split LRU patch,
  since it is not functionally different (it was split up only to help
  people who had seen the last version of the patch series review it)
- drop the page_file_cache debugging patch, since it never triggered
- reintroduce code to not scan anon list if swap is full
- add code to scan anon list if page cache is very small already
- use lumpy reclaim more aggressively for smaller order > 1 allocations

-- 
All Rights Reversed
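As an illustration of point 1) in the cover letter above, here is a small user-space sketch of how a page might be routed to one of several separate lists, so that a scan of one list never visits pages belonging to another. It is a simplified model under stated assumptions: the enum values, struct model_page and both helpers are hypothetical names, loosely inspired by the LRU_* identifiers visible in patch 23/25, and do not reproduce the kernel's data structures.

```c
#include <stdbool.h>

/* Hypothetical list indices for the model; not the kernel's enum lru_list. */
enum model_lru {
	MODEL_INACTIVE_ANON,
	MODEL_ACTIVE_ANON,
	MODEL_INACTIVE_FILE,
	MODEL_ACTIVE_FILE,
	MODEL_NORECLAIM,
	MODEL_NR_LRU
};

/* Minimal page description: just the properties that pick a list. */
struct model_page {
	bool file_backed;	/* filesystem backed vs. swap backed */
	bool active;		/* recently referenced */
	bool reclaimable;	/* false for e.g. ramfs or SHM_LOCKED pages */
};

/* Pick the list a page belongs on. */
static enum model_lru model_page_lru(const struct model_page *p)
{
	if (!p->reclaimable)
		return MODEL_NORECLAIM;
	if (p->file_backed)
		return p->active ? MODEL_ACTIVE_FILE : MODEL_INACTIVE_FILE;
	return p->active ? MODEL_ACTIVE_ANON : MODEL_INACTIVE_ANON;
}

/* Count the pages that a scan of one particular list would have to visit. */
static unsigned long model_count_on_list(const struct model_page *pages,
					 unsigned long n, enum model_lru lru)
{
	unsigned long i, count = 0;

	for (i = 0; i < n; i++)
		if (model_page_lru(&pages[i]) == lru)
			count++;
	return count;
}
```

In this model, a scan of the inactive file list skips anonymous and non-reclaimable pages entirely, which is the effect the series description is after.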
* [PATCH -mm 23/25] Noreclaim LRU scan sysctl
  2008-06-06 20:28 [PATCH -mm 00/25] VM pageout scalability improvements (V10) Rik van Riel, Rik van Riel
@ 2008-06-06 20:29 ` Rik van Riel, Rik van Riel, Lee Schermerhorn
  0 siblings, 0 replies; 3+ messages in thread
From: Rik van Riel, Rik van Riel @ 2008-06-06 20:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, Lee Schermerhorn, Kosaki Motohiro, linux-mm, Eric Whitney

[-- Attachment #1: rvr-23-lts-noreclaim-lru-scan-sysctl.patch --]
[-- Type: text/plain, Size: 11463 bytes --]

From: Lee Schermerhorn <lee.schermerhorn@hp.com>

Against: 2.6.26-rc2-mm1

V6:
+ moved to end of series as optional debug patch

V2 -> V3:
+ rebase to 23-mm1 atop RvR's split LRU series

New in V2

This patch adds a function to scan individual or all zones' noreclaim
lists and move any pages that have become reclaimable onto the respective
zone's inactive list, where shrink_inactive_list() will deal with them.

Adds a sysctl to scan all nodes, and per-node attributes to scan
individual nodes' zones.

Kosaki:
If a reclaimable page is found on the noreclaim LRU when
/proc/sys/vm/scan_noreclaim_pages is written, print the filename and
file offset of that page.
TODO: DEBUGGING ONLY: NOT FOR UPSTREAM MERGE

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

 drivers/base/node.c  |    5 +
 include/linux/rmap.h |    3
 include/linux/swap.h |   15 ++++
 kernel/sysctl.c      |   10 +++
 mm/rmap.c            |    4 -
 mm/vmscan.c          |  161 +++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 196 insertions(+), 2 deletions(-)

Index: linux-2.6.26-rc2-mm1/include/linux/swap.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/swap.h	2008-06-06 16:06:44.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/swap.h	2008-06-06 16:06:52.000000000 -0400
@@ -7,6 +7,7 @@
 #include <linux/list.h>
 #include <linux/memcontrol.h>
 #include <linux/sched.h>
+#include <linux/node.h>
 
 #include <asm/atomic.h>
 #include <asm/page.h>
@@ -235,15 +236,29 @@ static inline int zone_reclaim(struct zo
 #ifdef CONFIG_NORECLAIM_LRU
 extern int page_reclaimable(struct page *page, struct vm_area_struct *vma);
 extern void scan_mapping_noreclaim_pages(struct address_space *);
+
+extern unsigned long scan_noreclaim_pages;
+extern int scan_noreclaim_handler(struct ctl_table *, int, struct file *,
+					void __user *, size_t *, loff_t *);
+extern int scan_noreclaim_register_node(struct node *node);
+extern void scan_noreclaim_unregister_node(struct node *node);
 #else
 static inline int page_reclaimable(struct page *page,
 						struct vm_area_struct *vma)
 {
 	return 1;
 }
+
 static inline void scan_mapping_noreclaim_pages(struct address_space *mapping)
 {
 }
+
+static inline int scan_noreclaim_register_node(struct node *node)
+{
+	return 0;
+}
+
+static inline void scan_noreclaim_unregister_node(struct node *node) { }
 #endif
 
 extern int kswapd_run(int nid);
Index: linux-2.6.26-rc2-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/vmscan.c	2008-06-06 16:06:48.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/vmscan.c	2008-06-06 16:06:52.000000000 -0400
@@ -39,6 +39,7 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 #include <linux/memcontrol.h>
+#include <linux/sysctl.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -2355,6 +2356,37 @@ int page_reclaimable(struct page *page,
 	return 1;
 }
 
+static void show_page_path(struct page *page)
+{
+	char buf[256];
+	if (page_file_cache(page)) {
+		struct address_space *mapping = page->mapping;
+		struct dentry *dentry;
+		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+
+		spin_lock(&mapping->i_mmap_lock);
+		dentry = d_find_alias(mapping->host);
+		printk(KERN_INFO "rescued: %s %lu\n",
+		       dentry_path(dentry, buf, 256), pgoff);
+		spin_unlock(&mapping->i_mmap_lock);
+	} else {
+		struct anon_vma *anon_vma;
+		struct vm_area_struct *vma;
+
+		anon_vma = page_lock_anon_vma(page);
+		if (!anon_vma)
+			return;
+
+		list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+			printk(KERN_INFO "rescued: anon %s\n",
+			       vma->vm_mm->owner->comm);
+			break;
+		}
+		page_unlock_anon_vma(anon_vma);
+	}
+}
+
+
 /**
  * check_move_noreclaim_page - check page for reclaimability and move to appropriate lru list
  * @page: page to check reclaimability and move to appropriate lru list
@@ -2372,6 +2404,9 @@ static void check_move_noreclaim_page(st
 	ClearPageNoreclaim(page); /* for page_reclaimable() */
 	if (page_reclaimable(page, NULL)) {
 		enum lru_list l = LRU_INACTIVE_ANON + page_file_cache(page);
+
+		show_page_path(page);
+
 		__dec_zone_state(zone, NR_NORECLAIM);
 		list_move(&page->lru, &zone->list[l]);
 		__inc_zone_state(zone, NR_INACTIVE_ANON + l);
@@ -2452,4 +2487,130 @@ void scan_mapping_noreclaim_pages(struct
 	}
 }
 
+
+/**
+ * scan_zone_noreclaim_pages - check noreclaim list for reclaimable pages
+ * @zone - zone of which to scan the noreclaim list
+ *
+ * Scan @zone's noreclaim LRU lists to check for pages that have become
+ * reclaimable. Move those that have to @zone's inactive list where they
+ * become candidates for reclaim, unless shrink_inactive_zone() decides
+ * to reactivate them. Pages that are still non-reclaimable are rotated
+ * back onto @zone's noreclaim list.
+ */
+#define SCAN_NORECLAIM_BATCH_SIZE 16UL /* arbitrary lock hold batch size */
+void scan_zone_noreclaim_pages(struct zone *zone)
+{
+	struct list_head *l_noreclaim = &zone->list[LRU_NORECLAIM];
+	unsigned long scan;
+	unsigned long nr_to_scan = zone_page_state(zone, NR_NORECLAIM);
+
+	while (nr_to_scan > 0) {
+		unsigned long batch_size = min(nr_to_scan,
+						SCAN_NORECLAIM_BATCH_SIZE);
+
+		spin_lock_irq(&zone->lru_lock);
+		for (scan = 0; scan < batch_size; scan++) {
+			struct page *page = lru_to_page(l_noreclaim);
+
+			if (TestSetPageLocked(page))
+				continue;
+
+			prefetchw_prev_lru_page(page, l_noreclaim, flags);
+
+			if (likely(PageLRU(page) && PageNoreclaim(page)))
+				check_move_noreclaim_page(page, zone);
+
+			unlock_page(page);
+		}
+		spin_unlock_irq(&zone->lru_lock);
+
+		nr_to_scan -= batch_size;
+	}
+}
+
+
+/**
+ * scan_all_zones_noreclaim_pages - scan all noreclaim lists for reclaimable pages
+ *
+ * A really big hammer: scan all zones' noreclaim LRU lists to check for
+ * pages that have become reclaimable. Move those back to the zones'
+ * inactive list where they become candidates for reclaim.
+ * This occurs when, e.g., we have unswappable pages on the noreclaim lists,
+ * and we add swap to the system. As such, it runs in the context of a task
+ * that has possibly/probably made some previously non-reclaimable pages
+ * reclaimable.
+ */
+void scan_all_zones_noreclaim_pages(void)
+{
+	struct zone *zone;
+
+	for_each_zone(zone) {
+		scan_zone_noreclaim_pages(zone);
+	}
+}
+
+/*
+ * scan_noreclaim_pages [vm] sysctl handler. On demand re-scan of
+ * all nodes' noreclaim lists for reclaimable pages
+ */
+unsigned long scan_noreclaim_pages;
+
+int scan_noreclaim_handler(struct ctl_table *table, int write,
+			   struct file *file, void __user *buffer,
+			   size_t *length, loff_t *ppos)
+{
+	proc_doulongvec_minmax(table, write, file, buffer, length, ppos);
+
+	if (write && *(unsigned long *)table->data)
+		scan_all_zones_noreclaim_pages();
+
+	scan_noreclaim_pages = 0;
+	return 0;
+}
+
+/*
+ * per node 'scan_noreclaim_pages' attribute. On demand re-scan of
+ * a specified node's per zone noreclaim lists for reclaimable pages.
+ */
+
+static ssize_t read_scan_noreclaim_node(struct sys_device *dev, char *buf)
+{
+	return sprintf(buf, "0\n"); /* always zero; should fit... */
+}
+
+static ssize_t write_scan_noreclaim_node(struct sys_device *dev,
+					const char *buf, size_t count)
+{
+	struct zone *node_zones = NODE_DATA(dev->id)->node_zones;
+	struct zone *zone;
+	unsigned long res;
+	unsigned long req = strict_strtoul(buf, 10, &res);
+
+	if (!req)
+		return 1; /* zero is no-op */
+
+	for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
+		if (!populated_zone(zone))
+			continue;
+		scan_zone_noreclaim_pages(zone);
+	}
+	return 1;
+}
+
+
+static SYSDEV_ATTR(scan_noreclaim_pages, S_IRUGO | S_IWUSR,
+			read_scan_noreclaim_node,
+			write_scan_noreclaim_node);
+
+int scan_noreclaim_register_node(struct node *node)
+{
+	return sysdev_create_file(&node->sysdev, &attr_scan_noreclaim_pages);
+}
+
+void scan_noreclaim_unregister_node(struct node *node)
+{
+	sysdev_remove_file(&node->sysdev, &attr_scan_noreclaim_pages);
+}
+
 #endif
Index: linux-2.6.26-rc2-mm1/kernel/sysctl.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/kernel/sysctl.c	2008-05-15 11:21:11.000000000 -0400
+++ linux-2.6.26-rc2-mm1/kernel/sysctl.c	2008-06-06 16:06:52.000000000 -0400
@@ -1151,6 +1151,16 @@ static struct ctl_table vm_table[] = {
 		.extra2 = &one,
 	},
 #endif
+#ifdef CONFIG_NORECLAIM_LRU
+	{
+		.ctl_name = CTL_UNNUMBERED,
+		.procname = "scan_noreclaim_pages",
+		.data = &scan_noreclaim_pages,
+		.maxlen = sizeof(scan_noreclaim_pages),
+		.mode = 0644,
+		.proc_handler = &scan_noreclaim_handler,
+	},
+#endif
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt
Index: linux-2.6.26-rc2-mm1/drivers/base/node.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/drivers/base/node.c	2008-06-06 16:06:38.000000000 -0400
+++ linux-2.6.26-rc2-mm1/drivers/base/node.c	2008-06-06 16:06:52.000000000 -0400
@@ -13,6 +13,7 @@
 #include <linux/nodemask.h>
 #include <linux/cpu.h>
 #include <linux/device.h>
+#include <linux/swap.h>
 
 static struct sysdev_class node_class = {
 	.name = "node",
@@ -190,6 +191,8 @@ int register_node(struct node *node, int
 		sysdev_create_file(&node->sysdev, &attr_meminfo);
 		sysdev_create_file(&node->sysdev, &attr_numastat);
 		sysdev_create_file(&node->sysdev, &attr_distance);
+
+		scan_noreclaim_register_node(node);
 	}
 	return error;
 }
@@ -209,6 +212,8 @@ void unregister_node(struct node *node)
 	sysdev_remove_file(&node->sysdev, &attr_numastat);
 	sysdev_remove_file(&node->sysdev, &attr_distance);
 
+	scan_noreclaim_unregister_node(node);
+
 	sysdev_unregister(&node->sysdev);
 }
 
Index: linux-2.6.26-rc2-mm1/include/linux/rmap.h
===================================================================
--- linux-2.6.26-rc2-mm1.orig/include/linux/rmap.h	2008-06-06 16:06:28.000000000 -0400
+++ linux-2.6.26-rc2-mm1/include/linux/rmap.h	2008-06-06 16:06:52.000000000 -0400
@@ -55,6 +55,9 @@ void anon_vma_unlink(struct vm_area_stru
 void anon_vma_link(struct vm_area_struct *);
 void __anon_vma_link(struct vm_area_struct *);
 
+extern struct anon_vma *page_lock_anon_vma(struct page *page);
+extern void page_unlock_anon_vma(struct anon_vma *anon_vma);
+
 /*
  * rmap interfaces called when adding or removing pte of page
  */
Index: linux-2.6.26-rc2-mm1/mm/rmap.c
===================================================================
--- linux-2.6.26-rc2-mm1.orig/mm/rmap.c	2008-06-06 16:06:28.000000000 -0400
+++ linux-2.6.26-rc2-mm1/mm/rmap.c	2008-06-06 16:06:52.000000000 -0400
@@ -168,7 +168,7 @@ void __init anon_vma_init(void)
  * Getting a lock on a stable anon_vma from a page off the LRU is
  * tricky: page_lock_anon_vma rely on RCU to guard against the races.
  */
-static struct anon_vma *page_lock_anon_vma(struct page *page)
+struct anon_vma *page_lock_anon_vma(struct page *page)
 {
 	struct anon_vma *anon_vma;
 	unsigned long anon_mapping;
@@ -188,7 +188,7 @@ out:
 	return NULL;
 }
 
-static void page_unlock_anon_vma(struct anon_vma *anon_vma)
+void page_unlock_anon_vma(struct anon_vma *anon_vma)
 {
 	spin_unlock(&anon_vma->lock);
 	rcu_read_unlock();

-- 
All Rights Reversed
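The patch description says a scan is requested by writing to /proc/sys/vm/scan_noreclaim_pages, and the handler in the diff only acts on a non-zero value. A minimal user-space sketch of such a write follows. The helper name request_noreclaim_scan() is hypothetical, the path parameter is there so the helper can be pointed at any file, and the /proc entry only exists on a kernel carrying this debug patch with CONFIG_NORECLAIM_LRU enabled.

```c
#include <stdio.h>

/* Path quoted in the patch description; absent on kernels without the patch. */
#define SCAN_NORECLAIM_SYSCTL "/proc/sys/vm/scan_noreclaim_pages"

/*
 * Hypothetical helper: write a non-zero value to 'path' to request a
 * rescan of the noreclaim lists. Returns 0 on success, -1 on failure.
 */
static int request_noreclaim_scan(const char *path)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	/* Any non-zero value should do; the handler resets it to 0 afterwards. */
	ok = (fputs("1\n", f) >= 0);

	/* fclose() flushes the buffered write, so its result matters too. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling request_noreclaim_scan(SCAN_NORECLAIM_SYSCTL) typically needs root, since the table entry uses mode 0644. Going by the printk() calls in show_page_path(), pages moved off the noreclaim list would show up in the kernel log with a "rescued:" prefix.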