* [PATCH 0/3] Series short description
From: Balbir Singh @ 2010-11-30 10:14 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

The following series implements page cache control. This is a split-out
version of patch 1 of version 3 of the page cache optimization patches
posted earlier at
http://www.mail-archive.com/kvm@vger.kernel.org/msg43654.html
Christoph Lameter recommended splitting out patch 1, which is what this
series does.

Detailed Description
====================

This patch implements unmapped page cache control via preferred page
cache reclaim. The current patch hooks into kswapd and reclaims page
cache if the user has requested unmapped page control. This is useful
in the following scenario:

- In a virtualized environment with cache=writethrough, we see double
  caching (one copy in the host and one in the guest). As we try to
  scale guests, cache usage across the system grows. The goal of this
  patch is to reclaim page cache when Linux is running as a guest and
  get the host to hold the page cache and manage it. There might be
  temporary duplication, but in the long run, memory in the guests
  would be used for mapped pages.

- The option is controlled via a boot option and the administrator can
  selectively turn it on, on a need-to-use basis.

A lot of the code is borrowed from the zone_reclaim_mode logic for
__zone_reclaim(). One might argue that with ballooning and KSM this
feature is not very useful, but even with ballooning we need extra
logic to balloon multiple VMs, and it is hard to figure out the correct
amount of memory to balloon. With these patches applied, each guest has
a sufficient amount of free memory available that can be easily seen
and reclaimed by the balloon driver. The additional memory in the guest
can be reused for additional applications or used to start additional
guests/balance memory in the host. KSM currently does not de-duplicate
host and guest page cache.

The goal of this patch is to help automatically balance unmapped page
cache when instructed to do so. There are some magic numbers in use in
the code: UNMAPPED_PAGE_RATIO and the number of pages to reclaim when
the unmapped_page_control argument is supplied. These numbers were
chosen to avoid reaping page cache too aggressively or too frequently,
while still providing control. The sysctl for min_unmapped_ratio
provides further control from within the guest on the amount of
unmapped pages to reclaim.

For a single VM, running kernbench:

Enabled
Optimal load -j 8 run number 1...
Optimal load -j 8 run number 2...
Optimal load -j 8 run number 3...
Optimal load -j 8 run number 4...
Optimal load -j 8 run number 5...
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 273.726 (1.2683)
User Time 190.014 (0.589941)
System Time 298.758 (1.72574)
Percent CPU 178 (0)
Context Switches 119953 (865.74)
Sleeps 38758 (795.074)

Disabled
Optimal load -j 8 run number 1...
Optimal load -j 8 run number 2...
Optimal load -j 8 run number 3...
Optimal load -j 8 run number 4...
Optimal load -j 8 run number 5...
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 272.672 (0.453178)
User Time 189.7 (0.718157)
System Time 296.77 (0.845606)
Percent CPU 178 (0)
Context Switches 118822 (277.434)
Sleeps 37542.8 (545.922)

More data on the test results with the earlier patch is at
http://www.mail-archive.com/kvm@vger.kernel.org/msg43655.html

---

Balbir Singh (3):
      Move zone_reclaim() outside of CONFIG_NUMA
      Refactor zone_reclaim, move reusable functionality outside
      Provide control over unmapped pages


 include/linux/mmzone.h |    4 +-
 include/linux/swap.h   |    5 +-
 mm/page_alloc.c        |    7 ++-
 mm/vmscan.c            |  109 +++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 104 insertions(+), 21 deletions(-)

-- 
	Balbir
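To make the tunables above concrete, here is a small standalone C sketch
of the arithmetic the series uses. The constant and field names mirror
patch 3 later in this thread; the sample numbers are invented:

#include <stdio.h>

#define UNMAPPED_PAGE_RATIO 16	/* kswapd wakeup ratio, from patch 3 */

/* stand-ins for the per-zone state the patches consult */
struct zone_sample {
	unsigned long unmapped_file_pages;	/* unmapped page cache in the zone */
	unsigned long min_unmapped_pages;	/* realsize * min_unmapped_ratio / 100 */
};

/* wake kswapd only well above the floor, as should_balance_unmapped_pages() does */
static int should_balance(const struct zone_sample *z)
{
	return z->unmapped_file_pages >
	       UNMAPPED_PAGE_RATIO * z->min_unmapped_pages;
}

/* reclaim one eighth of the excess, as balance_unmapped_pages() does */
static unsigned long reclaim_target(const struct zone_sample *z)
{
	return (z->unmapped_file_pages - z->min_unmapped_pages) >> 3;
}

int main(void)
{
	/* a zone of ~1M pages with min_unmapped_ratio = 1 (sample values) */
	struct zone_sample z = {
		.unmapped_file_pages = 400000,
		.min_unmapped_pages  = 1048576 / 100,
	};

	printf("wake kswapd: %d, target: %lu pages\n",
	       should_balance(&z),
	       should_balance(&z) ? reclaim_target(&z) : 0);
	return 0;
}

The two thresholds differ on purpose: reclaim starts once a zone is over
min_unmapped_pages, but kswapd is only woken once the excess is large
(the ratio), so the allocator fast path does not trigger reclaim for
small overshoots.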
* [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Balbir Singh @ 2010-11-30 10:15 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

This patch moves zone_reclaim and associated helpers outside
CONFIG_NUMA. This infrastructure is reused in the patches for page
cache control that follow.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 include/linux/mmzone.h |    4 ++--
 mm/vmscan.c            |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4890662..aeede91 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -302,12 +302,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
-#ifdef CONFIG_NUMA
-	int node;
 	/*
 	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
 	unsigned long		min_unmapped_pages;
+#ifdef CONFIG_NUMA
+	int node;
 	unsigned long		min_slab_pages;
 #endif
 	struct per_cpu_pageset __percpu *pageset;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8cc90d5..325443a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2644,7 +2644,6 @@ static int __init kswapd_init(void)
 
 module_init(kswapd_init)
 
-#ifdef CONFIG_NUMA
 /*
  * Zone reclaim mode
  *
@@ -2854,7 +2853,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	return ret;
 }
-#endif
 
 /*
  * page_evictable - test whether a page is evictable
* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Christoph Lameter @ 2010-11-30 19:18 UTC
To: Balbir Singh; +Cc: linux-mm, akpm, linux-kernel, kvm

Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Andrew Morton @ 2010-11-30 22:23 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

On Tue, 30 Nov 2010 15:45:12 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> This patch moves zone_reclaim and associated helpers
> outside CONFIG_NUMA. This infrastructure is reused
> in the patches for page cache control that follow.

Thereby adding a nice dollop of bloat to everyone's kernel.  I don't
think that is justifiable given that the audience for this feature is
about eight people :(

How's about CONFIG_UNMAPPED_PAGECACHE_CONTROL?

Also this patch instantiates sysctl_min_unmapped_ratio and
sysctl_min_slab_ratio on non-NUMA builds but fails to make those
tunables actually tunable in procfs.  Changes to sysctl.c are needed.

> Reviewed-by: Christoph Lameter <cl@linux.com>

More careful reviewers, please.
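Concretely, the missing piece Andrew points at would be vm_table entries
in kernel/sysctl.c moved out from under #ifdef CONFIG_NUMA, modeled on
the existing NUMA entry of this era (the handler name and bounds below
are the mainline ones; treat the exact placement as an assumption):

	{
		.procname	= "min_unmapped_ratio",
		.data		= &sysctl_min_unmapped_ratio,
		.maxlen		= sizeof(sysctl_min_unmapped_ratio),
		.mode		= 0644,
		.proc_handler	= sysctl_min_unmapped_ratio_sysctl_handler,
		.extra1		= &zero,
		.extra2		= &one_hundred,
	},

Without such an entry, /proc/sys/vm/min_unmapped_ratio simply does not
exist on !NUMA builds even though the variable it would set does.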
* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Balbir Singh @ 2010-12-01 5:21 UTC
To: Andrew Morton; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:04:08]:

> * Andrew Morton <akpm@linux-foundation.org> [2010-11-30 14:23:38]:
>
> > On Tue, 30 Nov 2010 15:45:12 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > This patch moves zone_reclaim and associated helpers
> > > outside CONFIG_NUMA. This infrastructure is reused
> > > in the patches for page cache control that follow.
> > >
> >
> > Thereby adding a nice dollop of bloat to everyone's kernel.  I don't
> > think that is justifiable given that the audience for this feature is
> > about eight people :(
> >
> > How's about CONFIG_UNMAPPED_PAGECACHE_CONTROL?
> >
>
> OK, I'll add the config option. This code is enabled under CONFIG_NUMA
> today, so I agree the bloat is more of an issue for non-NUMA users.
> I'll make CONFIG_UNMAPPED_PAGECACHE_CONTROL default if CONFIG_NUMA is
> set.
>
> > Also this patch instantiates sysctl_min_unmapped_ratio and
> > sysctl_min_slab_ratio on non-NUMA builds but fails to make those
> > tunables actually tunable in procfs.  Changes to sysctl.c are
> > needed.
> >
>
> Oh! yeah.. I missed it while refactoring, my fault.
>
> > > Reviewed-by: Christoph Lameter <cl@linux.com>
>

My local MTA failed to deliver the message, trying again.

-- 
	Three Cheers,
	Balbir
* [PATCH 2/3] Refactor zone_reclaim
From: Balbir Singh @ 2010-11-30 10:15 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

Refactor zone_reclaim, move reusable functionality outside
of zone_reclaim. Make zone_reclaim_unmapped_pages modular.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 mm/vmscan.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 325443a..0ac444f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2719,6 +2719,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
 }
 
 /*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
+				unsigned long nr_pages)
+{
+	int priority;
+	/*
+	 * Free memory by calling shrink zone with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	priority = ZONE_RECLAIM_PRIORITY;
+	do {
+		shrink_zone(priority, zone, sc);
+		priority--;
+	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
  * Try to free up some pages from this zone through reclaim.
  */
 static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
@@ -2727,7 +2748,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
-	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2751,17 +2771,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
-		/*
-		 * Free memory by calling shrink zone with increasing
-		 * priorities until we have enough memory freed.
-		 */
-		priority = ZONE_RECLAIM_PRIORITY;
-		do {
-			shrink_zone(priority, zone, &sc);
-			priority--;
-		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
-	}
+	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages)
+		zone_reclaim_unmapped_pages(zone, &sc, nr_pages);
 
 	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
 	if (nr_slab_pages0 > zone->min_slab_pages) {
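The helper's shape, retrying at numerically lower "priority" (which in
vmscan means scanning a larger fraction of the zone) until the target is
met, can be seen in isolation in this standalone sketch.
ZONE_RECLAIM_PRIORITY is 4 in mainline of this era; the per-pass
reclaim counts here are simulated, not real:

#include <stdio.h>

#define ZONE_RECLAIM_PRIORITY 4	/* starting priority, as in mm/vmscan.c */

/* stand-in for shrink_zone(): lower priority => larger scan => more freed */
static unsigned long fake_shrink(int priority)
{
	return 32UL << (ZONE_RECLAIM_PRIORITY - priority);
}

int main(void)
{
	unsigned long nr_reclaimed = 0, nr_pages = 128;	/* sample target */
	int priority = ZONE_RECLAIM_PRIORITY;

	do {
		nr_reclaimed += fake_shrink(priority);
		printf("priority %d: reclaimed %lu/%lu\n",
		       priority, nr_reclaimed, nr_pages);
		priority--;
	} while (priority >= 0 && nr_reclaimed < nr_pages);
	return 0;
}

The loop structure is identical to the extracted helper: start gently,
escalate only while the target has not been met, and give up after
priority 0 (a full scan) regardless.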
* Re: [PATCH 2/3] Refactor zone_reclaim
From: Christoph Lameter @ 2010-11-30 19:19 UTC
To: Balbir Singh; +Cc: linux-mm, akpm, linux-kernel, kvm

Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 2/3] Refactor zone_reclaim
From: KAMEZAWA Hiroyuki @ 2010-12-01 1:23 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Tue, 30 Nov 2010 15:45:55 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Refactor zone_reclaim, move reusable functionality outside
> of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Why is this min_unmapped_pages based on zone (IOW, per-zone)?

Thanks,
-Kame
* Re: [PATCH 2/3] Refactor zone_reclaim
From: Balbir Singh @ 2010-12-01 5:22 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:16:34]:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 10:23:29]:
>
> > On Tue, 30 Nov 2010 15:45:55 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > Refactor zone_reclaim, move reusable functionality outside
> > > of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
> > >
> > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > Why is this min_unmapped_pages based on zone (IOW, per-zone)?
> >
>
> Kamezawa-San, this has been the design before the refactoring (it is
> based on zone_reclaim_mode and reclaim built on top of that). I am
> reusing bits of existing technology. The advantage of it being
> per-zone is that it integrates well with kswapd.
>

My local MTA failed to deliver the message, trying again.

-- 
	Three Cheers,
	Balbir
* Re: [PATCH 2/3] Refactor zone_reclaim
From: KAMEZAWA Hiroyuki @ 2010-12-01 8:59 UTC
To: balbir; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Wed, 1 Dec 2010 10:52:18 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:16:34]:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 10:23:29]:
> >
> > > On Tue, 30 Nov 2010 15:45:55 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > >
> > > > Refactor zone_reclaim, move reusable functionality outside
> > > > of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
> > > >
> > > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > >
> > > Why is this min_unmapped_pages based on zone (IOW, per-zone)?
> > >
> >
> > Kamezawa-San, this has been the design before the refactoring (it is
> > based on zone_reclaim_mode and reclaim built on top of that). I am
> > reusing bits of existing technology. The advantage of it being
> > per-zone is that it integrates well with kswapd.
> >

Sorry, what I wanted to hear was: why min_unmapped_pages per zone? Why
don't you add a "limit_for_unmapped_page_cache_size" for the whole
system? I guess what you really want is
"limit_for_unmapped_page_cache_size". Then, you have to use this kind
of mysterious code:

==
	(zone_unmapped_file_pages(zone) >
		UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
==

Thanks,
-Kame
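What Kamezawa is suggesting would look roughly like the kernel-side
sketch below: one global limit instead of per-zone thresholds. The
sysctl name is his hypothetical; global_page_state() and the NR_FILE_*
counters are the mainline ones of this era:

/* hypothetical: a single system-wide cap on unmapped page cache */
unsigned long sysctl_unmapped_page_cache_limit;	/* in pages, 0 = off */

static bool over_unmapped_page_cache_limit(void)
{
	unsigned long unmapped = global_page_state(NR_FILE_PAGES) -
				 global_page_state(NR_FILE_MAPPED);

	return sysctl_unmapped_page_cache_limit &&
	       unmapped > sysctl_unmapped_page_cache_limit;
}

The trade-off both sides are circling: a global limit is easier to
explain to administrators, while per-zone thresholds plug directly into
the existing per-zone kswapd machinery.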
* Re: [PATCH 2/3] Refactor zone_reclaim
From: Minchan Kim @ 2010-12-01 7:54 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

Hi Balbir,

On Tue, Nov 30, 2010 at 7:15 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Refactor zone_reclaim, move reusable functionality outside
> of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>  mm/vmscan.c |   35 +++++++++++++++++++++++------------
>  1 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 325443a..0ac444f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2719,6 +2719,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
>  }
>
>  /*
> + * Helper function to reclaim unmapped pages, we might add something
> + * similar to this for slab cache as well. Currently this function
> + * is shared with __zone_reclaim()
> + */
> +static inline void
> +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> +				unsigned long nr_pages)
> +{
> +	int priority;
> +	/*
> +	 * Free memory by calling shrink zone with increasing
> +	 * priorities until we have enough memory freed.
> +	 */
> +	priority = ZONE_RECLAIM_PRIORITY;
> +	do {
> +		shrink_zone(priority, zone, sc);
> +		priority--;
> +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> +}
> +

I don't see anything specific to unmapped pages in the logic of
"zone_reclaim_unmapped_pages"; the callers of this function have to
handle sc->may_unmap themselves, so the function's name is misleading.
As I read your logic, the function would be better named just
"zone_reclaim_pages".

If you want to keep the name zone_reclaim_unmapped_pages, it would be
better to set sc->may_unmap in this function itself, as sketched below.

-- 
Kind regards,
Minchan Kim
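Acting on that second suggestion would look roughly like this sketch;
only the sc->may_unmap assignment is new relative to patch 2. Note it
would change the behaviour of __zone_reclaim(), which currently derives
may_unmap from zone_reclaim_mode, which is exactly the caller/callee
tension Minchan is pointing at:

static inline void
zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
				unsigned long nr_pages)
{
	int priority = ZONE_RECLAIM_PRIORITY;

	sc->may_unmap = 0;	/* skip mapped pages, so the name is accurate */
	do {
		shrink_zone(priority, zone, sc);
		priority--;
	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
}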
* [PATCH 3/3] Provide control over unmapped pages
From: Balbir Singh @ 2010-11-30 10:16 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 include/linux/swap.h |    5 ++-
 mm/page_alloc.c      |    7 +++--
 mm/vmscan.c          |   72 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index eba53e7..78b0830 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -252,11 +252,12 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
 
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+extern bool should_balance_unmapped_pages(struct zone *zone);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
 #else
 #define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62b7280..4228da3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1662,6 +1662,9 @@ zonelist_scan:
 		unsigned long mark;
 		int ret;
 
+		if (should_balance_unmapped_pages(zone))
+			wakeup_kswapd(zone, order);
+
 		mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 		if (zone_watermark_ok(zone, order, mark,
 			    classzone_idx, alloc_flags))
@@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
 						/ 100;
+#ifdef CONFIG_NUMA
+		zone->node = nid;
 		zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
 #endif
 		zone->name = zone_names[j];
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0ac444f..98950f4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
 #define scanning_global_lru(sc)	(1)
 #endif
 
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc);
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+	unmapped_page_control = 1;
+	/*
+	 * XXX: Should we tweak swappiness here?
+	 */
+	return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
+
 static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 						  struct scan_control *sc)
 {
@@ -2223,6 +2238,12 @@ loop_again:
 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
 							&sc, priority, 0);
 
+			/*
+			 * We do unmapped page balancing once here and once
+			 * below, so that we don't lose out
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
+
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
@@ -2258,6 +2279,11 @@ loop_again:
 				continue;
 
 			sc.nr_scanned = 0;
+			/*
+			 * Balance unmapped pages upfront, this should be
+			 * really cheap
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
 
 			/*
 			 * Call soft limit reclaim before calling shrink_zone.
@@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
 	pgdat->kswapd_max_order = order;
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
-	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
+	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
+	    !should_balance_unmapped_pages(zone))
 		return;
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
@@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
 }
 
 /*
+ * Routine to balance unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called balance cached pages
+ */
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+		struct scan_control nsc;
+		unsigned long nr_pages;
+
+		nsc = *sc;
+
+		nsc.swappiness = 0;
+		nsc.may_writepage = 0;
+		nsc.may_unmap = 0;
+		nsc.nr_reclaimed = 0;
+
+		nr_pages = zone_unmapped_file_pages(zone) -
+				zone->min_unmapped_pages;
+		/* Magically try to reclaim eighth the unmapped cache pages */
+		nr_pages >>= 3;
+
+		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
+		return nsc.nr_reclaimed;
+	}
+	return 0;
+}
+
+#define UNMAPPED_PAGE_RATIO 16
+bool should_balance_unmapped_pages(struct zone *zone)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) >
+			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+		return true;
+	return false;
+}
+
+/*
  * Try to free up some pages from this zone through reclaim.
  */
 static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
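As posted, the mechanism stays dormant until the new parameter appears
on the kernel command line; the __setup() hook above only flips
unmapped_page_control to 1. A boot entry would look something like this
(the kernel path and version here are placeholders, not from the
thread):

    kernel /boot/vmlinuz-2.6.37-rc4 root=/dev/sda1 ro unmapped_page_control

With the feature enabled, vm.min_unmapped_ratio then sets each zone's
min_unmapped_pages floor (1% of zone pages by default in mainline),
below which unmapped page cache is never reclaimed by this path.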
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Christoph Lameter @ 2010-11-30 19:21 UTC
To: Balbir Singh; +Cc: linux-mm, akpm, linux-kernel, kvm

Looks good.

Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Andrew Morton @ 2010-11-30 22:25 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

On Tue, 30 Nov 2010 15:46:31 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>  include/linux/swap.h |    5 ++-
>  mm/page_alloc.c      |    7 +++--
>  mm/vmscan.c          |   72 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 79 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..78b0830 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -252,11 +252,12 @@ extern int vm_swappiness;
>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>  extern long vm_total_pages;
>
> -#ifdef CONFIG_NUMA
> -extern int zone_reclaim_mode;
>  extern int sysctl_min_unmapped_ratio;
>  extern int sysctl_min_slab_ratio;

This change will need to be moved into the first patch.

>  extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> +extern bool should_balance_unmapped_pages(struct zone *zone);
> +#ifdef CONFIG_NUMA
> +extern int zone_reclaim_mode;
>  #else
>  #define zone_reclaim_mode 0
>  static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
>  		unsigned long mark;
>  		int ret;
>
> +		if (should_balance_unmapped_pages(zone))
> +			wakeup_kswapd(zone, order);

gack, this is on the page allocator fastpath, isn't it?  So
99.99999999% of the world's machines end up doing a pointless call to a
pointless function which pointlessly tests a pointless global and
pointlessly returns?  All because of some whacky KSM thing?

The speed and space overhead of this code should be *zero* if
!CONFIG_UNMAPPED_PAGECACHE_CONTROL and should be minimal if
CONFIG_UNMAPPED_PAGECACHE_CONTROL=y.  The way to do the latter is to
inline the test of unmapped_page_control into callers and only if it is
true (and use unlikely(), please) do we call into the KSM gunk.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
>  #define scanning_global_lru(sc)	(1)
>  #endif
>
> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> +						struct scan_control *sc);
> +static int unmapped_page_control __read_mostly;
> +
> +static int __init unmapped_page_control_parm(char *str)
> +{
> +	unmapped_page_control = 1;
> +	/*
> +	 * XXX: Should we tweak swappiness here?
> +	 */
> +	return 1;
> +}
> +__setup("unmapped_page_control", unmapped_page_control_parm);

aw c'mon guys, everybody knows that when you add a kernel parameter you
document it in Documentation/kernel-parameters.txt.

>  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
>  						  struct scan_control *sc)
>  {
> @@ -2223,6 +2238,12 @@ loop_again:
>  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
>  							&sc, priority, 0);
>
> +			/*
> +			 * We do unmapped page balancing once here and once
> +			 * below, so that we don't lose out
> +			 */
> +			balance_unmapped_pages(priority, zone, &sc);
> +
>  			if (!zone_watermark_ok_safe(zone, order,
>  					high_wmark_pages(zone), 0, 0)) {
>  				end_zone = i;
> @@ -2258,6 +2279,11 @@ loop_again:
>  				continue;
>
>  			sc.nr_scanned = 0;
> +			/*
> +			 * Balance unmapped pages upfront, this should be
> +			 * really cheap
> +			 */
> +			balance_unmapped_pages(priority, zone, &sc);

More unjustifiable overhead on a commonly-executed codepath.

>  			/*
>  			 * Call soft limit reclaim before calling shrink_zone.
> @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
>  	pgdat->kswapd_max_order = order;
>  	if (!waitqueue_active(&pgdat->kswapd_wait))
>  		return;
> -	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> +	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> +	    !should_balance_unmapped_pages(zone))
>  		return;
>
>  	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
>  }
>
>  /*
> + * Routine to balance unmapped pages, inspired from the code under
> + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> + * pages, slab control will come in soon, at which point this routine
> + * should be called balance cached pages
> + */

The problem I have with this comment is that it uses the term "balance"
without ever defining it.  Plus "balance" is already a term which is used
in memory reclaim.

So if you can think up a unique noun then that's good but whether or
not that is done, please describe with great care what that term
actually means in this context.

> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> +						struct scan_control *sc)
> +{
> +	if (unmapped_page_control &&
> +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> +		struct scan_control nsc;
> +		unsigned long nr_pages;
> +
> +		nsc = *sc;
> +
> +		nsc.swappiness = 0;
> +		nsc.may_writepage = 0;
> +		nsc.may_unmap = 0;
> +		nsc.nr_reclaimed = 0;

Doing a clone-and-own of a scan_control is novel.  What's going on here?

> +		nr_pages = zone_unmapped_file_pages(zone) -
> +				zone->min_unmapped_pages;
> +		/* Magically try to reclaim eighth the unmapped cache pages */
> +		nr_pages >>= 3;
> +
> +		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> +		return nsc.nr_reclaimed;
> +	}
> +	return 0;
> +}
> +
> +#define UNMAPPED_PAGE_RATIO 16

Well.  Giving 16 a name didn't really clarify anything.  Attentive
readers will want to know what this does, why 16 was chosen and what
the effects of changing it will be.

> +bool should_balance_unmapped_pages(struct zone *zone)
> +{
> +	if (unmapped_page_control &&
> +		(zone_unmapped_file_pages(zone) >
> +			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> +		return true;
> +	return false;
> +}

> Reviewed-by: Christoph Lameter <cl@linux.com>

So you're OK with shoving all this flotsam into 100,000,000 cellphones?
This was a pretty outrageous patchset!
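The structure Andrew is asking for would be roughly the following
sketch. CONFIG_UNMAPPED_PAGECACHE_CONTROL is his proposed name and none
of this is posted code; only unmapped_page_control and
should_balance_unmapped_pages() come from the patch:

/* in a header, so the flag test inlines into the fastpath callers */
#ifdef CONFIG_UNMAPPED_PAGECACHE_CONTROL
extern int unmapped_page_control;
#define unmapped_pagecache_control_enabled() \
	unlikely(unmapped_page_control)
#else
#define unmapped_pagecache_control_enabled()	0
#endif

/* caller side, e.g. in get_page_from_freelist(): the common case costs
 * one well-predicted branch, or nothing at all when compiled out */
	if (unmapped_pagecache_control_enabled() &&
	    should_balance_unmapped_pages(zone))
		wakeup_kswapd(zone, order);

With the flag test inlined and the CONFIG off, the compiler drops the
whole block, which is the "zero overhead" Andrew demands.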
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Balbir Singh @ 2010-12-01 5:22 UTC
To: Andrew Morton; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:24:21]:

> * Andrew Morton <akpm@linux-foundation.org> [2010-11-30 14:25:09]:
>
> > On Tue, 30 Nov 2010 15:46:31 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > Provide control using zone_reclaim() and a boot parameter. The
> > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > pages and reclaim them as a priority, ahead of other mapped pages.
> > >
> > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > ---
> > >  include/linux/swap.h |    5 ++-
> > >  mm/page_alloc.c      |    7 +++--
> > >  mm/vmscan.c          |   72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  3 files changed, 79 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index eba53e7..78b0830 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > >  extern int remove_mapping(struct address_space *mapping, struct page *page);
> > >  extern long vm_total_pages;
> > >
> > > -#ifdef CONFIG_NUMA
> > > -extern int zone_reclaim_mode;
> > >  extern int sysctl_min_unmapped_ratio;
> > >  extern int sysctl_min_slab_ratio;
> >
> > This change will need to be moved into the first patch.
> >
>
> OK, will do, thanks for pointing it out.
>
> > >  extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > +#ifdef CONFIG_NUMA
> > > +extern int zone_reclaim_mode;
> > >  #else
> > >  #define zone_reclaim_mode 0
> > >  static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 62b7280..4228da3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > >  		unsigned long mark;
> > >  		int ret;
> > >
> > > +		if (should_balance_unmapped_pages(zone))
> > > +			wakeup_kswapd(zone, order);
> >
> > gack, this is on the page allocator fastpath, isn't it?  So
> > 99.99999999% of the world's machines end up doing a pointless call to a
> > pointless function which pointlessly tests a pointless global and
> > pointlessly returns?  All because of some whacky KSM thing?
> >
> > The speed and space overhead of this code should be *zero* if
> > !CONFIG_UNMAPPED_PAGECACHE_CONTROL and should be minimal if
> > CONFIG_UNMAPPED_PAGECACHE_CONTROL=y.  The way to do the latter is to
> > inline the test of unmapped_page_control into callers and only if it is
> > true (and use unlikely(), please) do we call into the KSM gunk.
>
> Will do; should_balance_unmapped_pages() will be made a no-op in the
> absence of CONFIG_UNMAPPED_PAGECACHE_CONTROL.
>
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> > >  #define scanning_global_lru(sc)	(1)
> > >  #endif
> > >
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > +						struct scan_control *sc);
> > > +static int unmapped_page_control __read_mostly;
> > > +
> > > +static int __init unmapped_page_control_parm(char *str)
> > > +{
> > > +	unmapped_page_control = 1;
> > > +	/*
> > > +	 * XXX: Should we tweak swappiness here?
> > > +	 */
> > > +	return 1;
> > > +}
> > > +__setup("unmapped_page_control", unmapped_page_control_parm);
> >
> > aw c'mon guys, everybody knows that when you add a kernel parameter you
> > document it in Documentation/kernel-parameters.txt.
>
> Will do - feeling silly for missing it; that is where reviews help.
>
> > >  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
> > >  						  struct scan_control *sc)
> > >  {
> > > @@ -2223,6 +2238,12 @@ loop_again:
> > >  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
> > >  							&sc, priority, 0);
> > >
> > > +			/*
> > > +			 * We do unmapped page balancing once here and once
> > > +			 * below, so that we don't lose out
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> > > +
> > >  			if (!zone_watermark_ok_safe(zone, order,
> > >  					high_wmark_pages(zone), 0, 0)) {
> > >  				end_zone = i;
> > > @@ -2258,6 +2279,11 @@ loop_again:
> > >  				continue;
> > >
> > >  			sc.nr_scanned = 0;
> > > +			/*
> > > +			 * Balance unmapped pages upfront, this should be
> > > +			 * really cheap
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> >
> > More unjustifiable overhead on a commonly-executed codepath.
> >
>
> Will refactor with the CONFIG suggested above.
>
> > >  			/*
> > >  			 * Call soft limit reclaim before calling shrink_zone.
> > > @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
> > >  	pgdat->kswapd_max_order = order;
> > >  	if (!waitqueue_active(&pgdat->kswapd_wait))
> > >  		return;
> > > -	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> > > +	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> > > +	    !should_balance_unmapped_pages(zone))
> > >  		return;
> > >
> > >  	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> > > @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> > >  }
> > >
> > >  /*
> > > + * Routine to balance unmapped pages, inspired from the code under
> > > + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> > > + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> > > + * pages, slab control will come in soon, at which point this routine
> > > + * should be called balance cached pages
> > > + */
> >
> > The problem I have with this comment is that it uses the term "balance"
> > without ever defining it.  Plus "balance" is already a term which is used
> > in memory reclaim.
> >
> > So if you can think up a unique noun then that's good but whether or
> > not that is done, please describe with great care what that term
> > actually means in this context.
>
> I used "balance" to mean not a 1:1 balance, but balancing the
> proportion of unmapped page cache based on a sysctl/tunable.
>
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > +						struct scan_control *sc)
> > > +{
> > > +	if (unmapped_page_control &&
> > > +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> > > +		struct scan_control nsc;
> > > +		unsigned long nr_pages;
> > > +
> > > +		nsc = *sc;
> > > +
> > > +		nsc.swappiness = 0;
> > > +		nsc.may_writepage = 0;
> > > +		nsc.may_unmap = 0;
> > > +		nsc.nr_reclaimed = 0;
> >
> > Doing a clone-and-own of a scan_control is novel.  What's going on here?
>
> This code overwrites the swappiness, may_* and nr_reclaimed fields for
> correct stats. The idea is to vary the reclaim behaviour/bias it.
>
> > > +		nr_pages = zone_unmapped_file_pages(zone) -
> > > +				zone->min_unmapped_pages;
> > > +		/* Magically try to reclaim eighth the unmapped cache pages */
> > > +		nr_pages >>= 3;
> > > +
> > > +		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> > > +		return nsc.nr_reclaimed;
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Well.  Giving 16 a name didn't really clarify anything.  Attentive
> > readers will want to know what this does, why 16 was chosen and what
> > the effects of changing it will be.
>
> Sorry, I documented that in the changelog of the first patchset; I'll
> document it here as well. The reason for choosing 16 is based on
> heuristics and testing, the tradeoff being overenthusiastic reclaim
> versus size of cache/performance.
>
> > > +bool should_balance_unmapped_pages(struct zone *zone)
> > > +{
> > > +	if (unmapped_page_control &&
> > > +		(zone_unmapped_file_pages(zone) >
> > > +			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> > > +		return true;
> > > +	return false;
> > > +}
> >
> > > Reviewed-by: Christoph Lameter <cl@linux.com>
> >
> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > This was a pretty outrageous patchset!
>
> I'll do a better one. BTW, a lot of embedded folks are interested in
> page cache control outside of cgroup behaviour.
>
> Thanks for the detailed review!

My local MTA failed to deliver the message, trying again.

-- 
	Three Cheers,
	Balbir
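The no-op Balbir promises would presumably be a header stub along these
lines (a hypothetical sketch of the unposted follow-up, complementing
the runtime flag test above with a compile-time one):

#ifdef CONFIG_UNMAPPED_PAGECACHE_CONTROL
extern bool should_balance_unmapped_pages(struct zone *zone);
#else
static inline bool should_balance_unmapped_pages(struct zone *zone)
{
	return false;	/* constant-folds away in every caller */
}
#endif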
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Minchan Kim @ 2010-12-01 8:18 UTC
To: balbir; +Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, kvm

On Wed, Dec 1, 2010 at 2:22 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:24:21]:
>
>> * Andrew Morton <akpm@linux-foundation.org> [2010-11-30 14:25:09]:
>> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
>> > This was a pretty outrageous patchset!
>>
>> I'll do a better one. BTW, a lot of embedded folks are interested in
>> page cache control outside of cgroup behaviour.

Yes. Embedded people (at least, me) want it. That's because they don't
have any swap device, so they can reclaim only page cache pages. And
many page cache pages are mapped into application address spaces (for
example, Android uses the Java model, so many pages are mapped into
applications' address spaces). That means it's hard to reclaim them
without lagging. So I have an interest in this patch.

-- 
Kind regards,
Minchan Kim
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Christoph Lameter @ 2010-12-01 15:01 UTC
To: Andrew Morton; +Cc: Balbir Singh, linux-mm, linux-kernel, kvm

On Tue, 30 Nov 2010, Andrew Morton wrote:

> > +#define UNMAPPED_PAGE_RATIO 16
>
> Well.  Giving 16 a name didn't really clarify anything.  Attentive
> readers will want to know what this does, why 16 was chosen and what
> the effects of changing it will be.

The meaning is analogous to the other zone reclaim ratio. But yes, it
should be justified and defined.

> > Reviewed-by: Christoph Lameter <cl@linux.com>
>
> So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> This was a pretty outrageous patchset!

This is a feature that has been requested over and over for years. Using
/proc/sys/vm/drop_caches for fixing situations where one simply has too
many page cache pages is not so much fun in the long run.
* Re: [PATCH 3/3] Provide control over unmapped pages
From: KOSAKI Motohiro @ 2010-12-02 1:22 UTC
To: Christoph Lameter; +Cc: kosaki.motohiro, Andrew Morton, Balbir Singh, linux-mm, linux-kernel, kvm

> On Tue, 30 Nov 2010, Andrew Morton wrote:
>
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Well.  Giving 16 a name didn't really clarify anything.  Attentive
> > readers will want to know what this does, why 16 was chosen and what
> > the effects of changing it will be.
>
> The meaning is analogous to the other zone reclaim ratio. But yes, it
> should be justified and defined.
>
> > > Reviewed-by: Christoph Lameter <cl@linux.com>
> >
> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > This was a pretty outrageous patchset!
>
> This is a feature that has been requested over and over for years. Using
> /proc/sys/vm/drop_caches for fixing situations where one simply has too
> many page cache pages is not so much fun in the long run.

I'm not against a page cache limitation feature at all. But this is
too ugly and too destructive to the fast path. I hope this patch can
reduce its negative impact.
* Re: [PATCH 3/3] Provide control over unmapped pages
From: KAMEZAWA Hiroyuki @ 2010-12-02 2:50 UTC
To: KOSAKI Motohiro; +Cc: Christoph Lameter, Andrew Morton, Balbir Singh, linux-mm, linux-kernel, kvm

On Thu, 2 Dec 2010 10:22:16 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > On Tue, 30 Nov 2010, Andrew Morton wrote:
> >
> > > > +#define UNMAPPED_PAGE_RATIO 16
> > >
> > > Well.  Giving 16 a name didn't really clarify anything.  Attentive
> > > readers will want to know what this does, why 16 was chosen and what
> > > the effects of changing it will be.
> >
> > The meaning is analogous to the other zone reclaim ratio. But yes, it
> > should be justified and defined.
> >
> > > > Reviewed-by: Christoph Lameter <cl@linux.com>
> > >
> > > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > > This was a pretty outrageous patchset!
> >
> > This is a feature that has been requested over and over for years. Using
> > /proc/sys/vm/drop_caches for fixing situations where one simply has too
> > many page cache pages is not so much fun in the long run.
>
> I'm not against a page cache limitation feature at all. But this is
> too ugly and too destructive to the fast path. I hope this patch can
> reduce its negative impact.

And I think min_unmapped_pages is an ugly name. It should be
"unmapped_pagecache_limit" or something similar, because it is a
limitation feature.

Thanks,
-Kame
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Balbir Singh @ 2010-12-02 7:01 UTC
To: KAMEZAWA Hiroyuki; +Cc: KOSAKI Motohiro, Christoph Lameter, Andrew Morton, linux-mm, linux-kernel, kvm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-02 11:50:36]:

> > I'm not against a page cache limitation feature at all. But this is
> > too ugly and too destructive to the fast path. I hope this patch can
> > reduce its negative impact.
>
> And I think min_unmapped_pages is an ugly name. It should be
> "unmapped_pagecache_limit" or something similar, because it is a
> limitation feature.

The feature will now be enabled with a CONFIG option and a boot
parameter. I find that changing the naming convention now, when it is
already in use and well known, is not a good idea. The name of the boot
parameter can be changed, of course.

-- 
	Three Cheers,
	Balbir
* Re: [PATCH 3/3] Provide control over unmapped pages 2010-11-30 10:16 ` [PATCH 3/3] Provide control over unmapped pages Balbir Singh 2010-11-30 19:21 ` Christoph Lameter 2010-11-30 22:25 ` Andrew Morton @ 2010-12-01 0:14 ` KOSAKI Motohiro [not found] ` <20101201051632.GH2746@balbir.in.ibm.com> 2010-12-01 1:32 ` KAMEZAWA Hiroyuki 3 siblings, 1 reply; 27+ messages in thread From: KOSAKI Motohiro @ 2010-12-01 0:14 UTC (permalink / raw) To: Balbir Singh Cc: kosaki.motohiro, linux-mm, Christoph Lameter, akpm, linux-kernel, kvm > Provide control using zone_reclaim() and a boot parameter. The > code reuses functionality from zone_reclaim() to isolate unmapped > pages and reclaim them as a priority, ahead of other mapped pages. > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > include/linux/swap.h | 5 ++- > mm/page_alloc.c | 7 +++-- > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 79 insertions(+), 5 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index eba53e7..78b0830 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -252,11 +252,12 @@ extern int vm_swappiness; > extern int remove_mapping(struct address_space *mapping, struct page *page); > extern long vm_total_pages; > > -#ifdef CONFIG_NUMA > -extern int zone_reclaim_mode; > extern int sysctl_min_unmapped_ratio; > extern int sysctl_min_slab_ratio; > extern int zone_reclaim(struct zone *, gfp_t, unsigned int); > +extern bool should_balance_unmapped_pages(struct zone *zone); > +#ifdef CONFIG_NUMA > +extern int zone_reclaim_mode; > #else > #define zone_reclaim_mode 0 > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order) > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 62b7280..4228da3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1662,6 +1662,9 @@ zonelist_scan: > unsigned long mark; > int ret; > > + if (should_balance_unmapped_pages(zone)) > + wakeup_kswapd(zone, order); > + You don't have to add extra branch into fast path. > mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; > if (zone_watermark_ok(zone, order, mark, > classzone_idx, alloc_flags)) > @@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > > zone->spanned_pages = size; > zone->present_pages = realsize; > -#ifdef CONFIG_NUMA > - zone->node = nid; > zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio) > / 100; > +#ifdef CONFIG_NUMA > + zone->node = nid; > zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100; > #endif > zone->name = zone_names[j]; > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 0ac444f..98950f4 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem); > #define scanning_global_lru(sc) (1) > #endif > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone, > + struct scan_control *sc); > +static int unmapped_page_control __read_mostly; > + > +static int __init unmapped_page_control_parm(char *str) > +{ > + unmapped_page_control = 1; > + /* > + * XXX: Should we tweak swappiness here? 
> + */ > + return 1; > +} > +__setup("unmapped_page_control", unmapped_page_control_parm); > + > + > static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone, > struct scan_control *sc) > { > @@ -2223,6 +2238,12 @@ loop_again: > shrink_active_list(SWAP_CLUSTER_MAX, zone, > &sc, priority, 0); > > + /* > + * We do unmapped page balancing once here and once > + * below, so that we don't lose out > + */ > + balance_unmapped_pages(priority, zone, &sc); You can't invoke any reclaim from here. It is in zone balancing detection phase. It mean your code reclaim pages from zones which has lots free pages too. > + > if (!zone_watermark_ok_safe(zone, order, > high_wmark_pages(zone), 0, 0)) { > end_zone = i; > @@ -2258,6 +2279,11 @@ loop_again: > continue; > > sc.nr_scanned = 0; > + /* > + * Balance unmapped pages upfront, this should be > + * really cheap > + */ > + balance_unmapped_pages(priority, zone, &sc); This code break page-cache/slab balancing logic. And this is conflict against Nick's per-zone slab effort. Plus, high-order + priority=5 reclaim Simon's case. (see "Free memory never fully used, swapping" threads) > > /* > * Call soft limit reclaim before calling shrink_zone. > @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order) > pgdat->kswapd_max_order = order; > if (!waitqueue_active(&pgdat->kswapd_wait)) > return; > - if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0)) > + if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) && > + !should_balance_unmapped_pages(zone)) > return; > > trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order); > @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc, > } > > /* > + * Routine to balance unmapped pages, inspired from the code under > + * CONFIG_NUMA that does unmapped page and slab page control by keeping > + * min_unmapped_pages in the zone. We currently reclaim just unmapped > + * pages, slab control will come in soon, at which point this routine > + * should be called balance cached pages > + */ > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone, > + struct scan_control *sc) > +{ > + if (unmapped_page_control && > + (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) { > + struct scan_control nsc; > + unsigned long nr_pages; > + > + nsc = *sc; > + > + nsc.swappiness = 0; > + nsc.may_writepage = 0; > + nsc.may_unmap = 0; > + nsc.nr_reclaimed = 0; Don't you need to fill nsc.nr_to_reclaim field? > + > + nr_pages = zone_unmapped_file_pages(zone) - > + zone->min_unmapped_pages; > + /* Magically try to reclaim eighth the unmapped cache pages */ > + nr_pages >>= 3; Please don't make magic. > + > + zone_reclaim_unmapped_pages(zone, &nsc, nr_pages); > + return nsc.nr_reclaimed; > + } > + return 0; > +} > + > +#define UNMAPPED_PAGE_RATIO 16 Please don't make magic ratio. > +bool should_balance_unmapped_pages(struct zone *zone) > +{ > + if (unmapped_page_control && > + (zone_unmapped_file_pages(zone) > > + UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages)) > + return true; > + return false; > +} > + > +/* > * Try to free up some pages from this zone through reclaim. > */ > static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) Hmm.... As far as I reviewed, I can't find any reason why this patch works as expected. So, I think cleancache looks promising more than this idea. Have you seen Dan's patch? I would suggested discuss him. Thanks. 
[parent not found: <20101201051632.GH2746@balbir.in.ibm.com>]
* Re: [PATCH 3/3] Provide control over unmapped pages
  [not found] ` <20101201051632.GH2746@balbir.in.ibm.com>
@ 2010-12-01  5:22   ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2010-12-01  5:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm, Nick Piggin

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:46:32]:

> * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2010-12-01 09:14:13]:
>
> [snip]
>
> > > +		if (should_balance_unmapped_pages(zone))
> > > +			wakeup_kswapd(zone, order);
> > > +
> >
> > You don't need to add an extra branch to the allocator fast path.
> >
> > [snip]
> >
> > > +			/*
> > > +			 * We do unmapped page balancing once here and once
> > > +			 * below, so that we don't lose out
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> >
> > You can't invoke any reclaim from here. This is the zone-balancing
> > detection phase, which means your code reclaims pages even from zones
> > that have plenty of free pages.
>
> The goal is to check not only for zone_watermark_ok, but also to see
> if unmapped pages are way higher than expected values.
>
> > > +			sc.nr_scanned = 0;
> > > +			/*
> > > +			 * Balance unmapped pages upfront, this should be
> > > +			 * really cheap
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> >
> > This code breaks the page-cache/slab balancing logic, and it conflicts
> > with Nick's per-zone slab effort.
>
> OK, cc'ing Nick for comments.
>
> > Plus, high-order + priority=5 reclaim reintroduces Simon's case (see
> > the "Free memory never fully used, swapping" threads).
>
> OK, this path should not add to swapping activity, if that is your
> concern.
>
> > > +		nsc.swappiness = 0;
> > > +		nsc.may_writepage = 0;
> > > +		nsc.may_unmap = 0;
> > > +		nsc.nr_reclaimed = 0;
> >
> > Don't you need to fill the nsc.nr_to_reclaim field?
>
> Yes, since the reclaim code looks at nr_reclaimed, it needs to be 0 at
> every iteration - did I miss something?
>
> > > +		nr_pages = zone_unmapped_file_pages(zone) -
> > > +				zone->min_unmapped_pages;
> > > +		/* Magically try to reclaim an eighth of the unmapped cache pages */
> > > +		nr_pages >>= 3;
> >
> > Please don't use magic numbers.
>
> OK, it is a heuristic, how do I use it?
>
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Please don't use a magic ratio.
>
> OK, it is a heuristic; how should I expose heuristics - via a sysctl?
>
> > As far as I can tell from reviewing it, there is no reason why this
> > patch should work as expected. I think cleancache looks more promising
> > than this idea. Have you seen Dan's patches? I would suggest discussing
> > this with him.
>
> Please try the patch, I've been using it and it works exactly as
> expected for me. kswapd does the balancing and works well. I've posted
> some data as well.

My local MTA failed to deliver the message, trying again.

--
	Three Cheers,
	Balbir
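On the sysctl question above: the series already routes
min_unmapped_ratio through /proc/sys/vm, and the two heuristics could
follow the same pattern. A sketch of a hypothetical vm_table entry - the
name unmapped_page_ratio, its default, and its bounds are all invented
here for illustration:

int sysctl_unmapped_page_ratio = 16;	/* would replace UNMAPPED_PAGE_RATIO */

static int ratio_min = 1;
static int ratio_max = 100;

/* Hypothetical entry for vm_table[] in kernel/sysctl.c */
{
	.procname	= "unmapped_page_ratio",
	.data		= &sysctl_unmapped_page_ratio,
	.maxlen		= sizeof(sysctl_unmapped_page_ratio),
	.mode		= 0644,
	.proc_handler	= proc_dointvec_minmax,
	.extra1		= &ratio_min,
	.extra2		= &ratio_max,
},

should_balance_unmapped_pages() would then compare against
sysctl_unmapped_page_ratio * zone->min_unmapped_pages, so an admin could
tune the trigger point at runtime instead of recompiling.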
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-11-30 10:16 ` [PATCH 3/3] Provide control over unmapped pages Balbir Singh
                     ` (2 preceding siblings ...)
  2010-12-01  0:14   ` KOSAKI Motohiro
@ 2010-12-01  1:32   ` KAMEZAWA Hiroyuki
  [not found]       ` <20101201051816.GI2746@balbir.in.ibm.com>
  3 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-01  1:32 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Tue, 30 Nov 2010 15:46:31 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> [snip]
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
>  		unsigned long mark;
>  		int ret;
>
> +		if (should_balance_unmapped_pages(zone))
> +			wakeup_kswapd(zone, order);
> +

Hm, I'm not sure what the final vision of this feature is. Can't this
reclaiming be invoked directly by the balloon driver, just before it
calls alloc_page()? Do you need to keep the page cache small even when
there is free memory on the host?

Thanks,
-Kame
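To make the balloon alternative concrete: a rough sketch of what a
balloon-side trigger might look like, reusing the helpers from the patch
above. balloon_alloc_page() is an invented wrapper, not part of any
posted driver:

/*
 * Hypothetical balloon-side hook: before inflating, ask kswapd to
 * trim unmapped page cache so that alloc_page() finds free pages
 * without pushing out mapped memory.  Purely illustrative.
 */
static struct page *balloon_alloc_page(gfp_t gfp_mask)
{
	struct zone *zone;

	/* nudge kswapd in every populated zone that is over its ratio */
	for_each_populated_zone(zone)
		if (should_balance_unmapped_pages(zone))
			wakeup_kswapd(zone, 0);

	return alloc_page(gfp_mask);
}

The difference from the posted patch is the trigger: reclaim would start
when the host (via the balloon) actually asks for memory, rather than
continuously from the guest's allocator path.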
[parent not found: <20101201051816.GI2746@balbir.in.ibm.com>]
* Re: [PATCH 3/3] Provide control over unmapped pages
  [not found] ` <20101201051816.GI2746@balbir.in.ibm.com>
@ 2010-12-01  5:22   ` Balbir Singh
  2010-12-01  5:35     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2010-12-01  5:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:48:16]:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 10:32:54]:
>
> [snip]
>
> > > +		if (should_balance_unmapped_pages(zone))
> > > +			wakeup_kswapd(zone, order);
> > > +
> >
> > Hm, I'm not sure what the final vision of this feature is. Can't this
> > reclaiming be invoked directly by the balloon driver, just before it
> > calls alloc_page()?
>
> That is a separate patch; this one takes the boot-parameter-based
> control approach.
>
> > Do you need to keep the page cache small even when there is free
> > memory on the host?
>
> The goal is to avoid duplication; as you know, the page cache fills
> itself to consume as much memory as possible. The host generally does
> not have a lot of free memory in a consolidated environment.

My local MTA failed to deliver the message, trying again.

--
	Three Cheers,
	Balbir
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-12-01  5:22   ` Balbir Singh
@ 2010-12-01  5:35     ` KAMEZAWA Hiroyuki
  2010-12-01  6:40       ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-01  5:35 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Wed, 1 Dec 2010 10:52:59 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> [snip]
>
> > Do you need to keep the page cache small even when there is free
> > memory on the host?
>
> The goal is to avoid duplication; as you know, the page cache fills
> itself to consume as much memory as possible. The host generally does
> not have a lot of free memory in a consolidated environment.

That's a fair point. But then why does the guest have to do _extra_
work for the host even when the host says nothing? I think triggering
this from the guests themselves is not very good.

Thanks,
-Kame
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-12-01  5:35     ` KAMEZAWA Hiroyuki
@ 2010-12-01  6:40       ` Balbir Singh
  2010-12-01  7:24         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2010-12-01  6:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 14:35:50]:

> [snip]
>
> That's a fair point. But then why does the guest have to do _extra_
> work for the host even when the host says nothing? I think triggering
> this from the guests themselves is not very good.

As I've mentioned before, it helps for the guest to keep memory free
without taking a large performance hit: the balloon driver can quickly
retrieve this memory if required, or the guest can use it for some
other application or task. The cached data is mostly already present
in the host's page cache.

--
	Three Cheers,
	Balbir
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-12-01  6:40       ` Balbir Singh
@ 2010-12-01  7:24         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-01  7:24 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Wed, 1 Dec 2010 12:10:43 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> > That's a fair point. But then why does the guest have to do _extra_
> > work for the host even when the host says nothing? I think triggering
> > this from the guests themselves is not very good.
>
> As I've mentioned before, it helps for the guest to keep memory free
> without taking a large performance hit: the balloon driver can quickly
> retrieve this memory if required, or the guest can use it for some
> other application or task. The cached data is mostly already present
> in the host's page cache.

Why? Are there parameters or stats which show this is _true_? How can
we guarantee it, or show it to users? Please add an interface that
shows the rate of sharing between guest and host. Without it, no admin
will turn this on, because "file cache status on the host" is a black
box for guest admins. I think this patch skips some important steps.

The second point, I suppose, is about reducing total host memory usage
and increasing the number of guests on a host. For that, this feature
is useful only when all guests on a host are friendly and devoted to
the health of the host's memory management, because all of the
configuration must be done in the guest (even though it could just as
well be passed in via qemu's command line). And there is _no_ benefit
for a guest that gives up its resources to help host management,
because there is no guarantee that the dropped caches are in host
memory.

So, for both claims, I want to see an interface that shows the number
of pages shared between host and guests, rather than having to imagine
it.

BTW, I don't like this kind of "please give us your victim, please
please please" logic. The host should be able to "steal" what it wants
by force. Then there should be no visible on/off interface; the VM
firmware should turn this on if the administrator of the host wants
it.

BTW2, please test with some other benchmarks (ones that read file
caches). I don't think a kernel make is a good test for this.

Thanks,
-Kame
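As a concrete example of the cache-reading benchmark BTW2 asks for, a
self-contained sketch (illustrative only, not something posted in this
thread): time repeated sequential reads of the same file, so that later
passes are served from the page cache unless something reclaims it
between passes.

#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static double now(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
	char buf[1 << 16];
	int pass;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	/* pass 0 is cold; later passes are warm unless the cache is reclaimed */
	for (pass = 0; pass < 5; pass++) {
		double t0 = now();
		int fd = open(argv[1], O_RDONLY);
		ssize_t n;

		if (fd < 0) {
			perror("open");
			return 1;
		}
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		close(fd);
		printf("pass %d: %.3f seconds\n", pass, now() - t0);
	}
	return 0;
}

If unmapped_page_control reclaims the cache between passes, the later
passes should degrade back toward disk speed, which would make the cost
of the feature directly visible in a way kernbench does not.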
end of thread, other threads: [~2010-12-05  5:14 UTC | newest]

Thread overview: 27+ messages (links below jump to the message on this page):
2010-11-30 10:14 [PATCH 0/3] Series short description Balbir Singh
2010-11-30 10:15 ` [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA Balbir Singh
2010-11-30 19:18   ` Christoph Lameter
2010-11-30 22:23   ` Andrew Morton
[not found]   ` <20101201043408.GE2746@balbir.in.ibm.com>
2010-12-01  5:21     ` Balbir Singh
2010-11-30 10:15 ` [PATCH 2/3] Refactor zone_reclaim Balbir Singh
2010-11-30 19:19   ` Christoph Lameter
2010-12-01  1:23   ` KAMEZAWA Hiroyuki
[not found]   ` <20101201044634.GF2746@balbir.in.ibm.com>
2010-12-01  5:22     ` Balbir Singh
2010-12-01  8:59       ` KAMEZAWA Hiroyuki
2010-12-01  7:54   ` Minchan Kim
2010-11-30 10:16 ` [PATCH 3/3] Provide control over unmapped pages Balbir Singh
2010-11-30 19:21   ` Christoph Lameter
2010-11-30 22:25   ` Andrew Morton
[not found]     ` <20101201045421.GG2746@balbir.in.ibm.com>
2010-12-01  5:22       ` Balbir Singh
2010-12-01  8:18         ` Minchan Kim
2010-12-01 15:01           ` Christoph Lameter
2010-12-02  1:22           ` KOSAKI Motohiro
2010-12-02  2:50             ` KAMEZAWA Hiroyuki
2010-12-02  7:01           ` Balbir Singh
2010-12-01  0:14   ` KOSAKI Motohiro
[not found]     ` <20101201051632.GH2746@balbir.in.ibm.com>
2010-12-01  5:22       ` Balbir Singh
2010-12-01  1:32   ` KAMEZAWA Hiroyuki
[not found]     ` <20101201051816.GI2746@balbir.in.ibm.com>
2010-12-01  5:22       ` Balbir Singh
2010-12-01  5:35         ` KAMEZAWA Hiroyuki
2010-12-01  6:40           ` Balbir Singh
2010-12-01  7:24             ` KAMEZAWA Hiroyuki