* [PATCH 0/3] Series short description
From: Balbir Singh @ 2010-11-30 10:14 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

The following series implements page cache control. This is a split-out
version of patch 1 of version 3 of the page cache optimization patches
posted earlier at
http://www.mail-archive.com/kvm@vger.kernel.org/msg43654.html
Christoph Lameter recommended splitting out patch 1, which is what this
series does.

Detailed Description
====================

This patch implements unmapped page cache control via preferred page
cache reclaim. The current patch hooks into kswapd and reclaims page
cache if the user has requested unmapped page control. This is useful
in the following scenario:

- In a virtualized environment with cache=writethrough, we see double
  caching (one copy in the host and one in the guest). As we try to
  scale guests, cache usage across the system grows. The goal of this
  patch is to reclaim page cache when Linux is running as a guest and
  get the host to hold the page cache and manage it. There might be
  temporary duplication, but in the long run, memory in the guests
  would be used for mapped pages.

- The option is controlled via a boot option and the administrator can
  selectively turn it on, on a need-to-use basis.

A lot of the code is borrowed from the zone_reclaim_mode logic for
__zone_reclaim(). One might argue that with ballooning and KSM this
feature is not very useful, but even with ballooning we need extra
logic to balloon multiple VMs, and it is hard to figure out the correct
amount of memory to balloon. With these patches applied, each guest has
a sufficient amount of free memory available that can be easily seen
and reclaimed by the balloon driver. The additional memory in the guest
can be reused for additional applications or used to start additional
guests/balance memory in the host. KSM currently does not de-duplicate
host and guest page cache.

The goal of this patch is to help automatically balance unmapped page
cache when instructed to do so. There are some magic numbers in use in
the code: UNMAPPED_PAGE_RATIO and the number of pages to reclaim when
the unmapped_page_control argument is supplied. These numbers were
chosen to avoid reaping page cache too aggressively or too frequently,
while still providing control. The sysctl for min_unmapped_ratio
provides further control from within the guest on the amount of
unmapped pages to reclaim.

For a single VM, running kernbench:

Enabled
Optimal load -j 8 run number 1...
Optimal load -j 8 run number 2...
Optimal load -j 8 run number 3...
Optimal load -j 8 run number 4...
Optimal load -j 8 run number 5...
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 273.726 (1.2683)
User Time 190.014 (0.589941)
System Time 298.758 (1.72574)
Percent CPU 178 (0)
Context Switches 119953 (865.74)
Sleeps 38758 (795.074)

Disabled
Optimal load -j 8 run number 1...
Optimal load -j 8 run number 2...
Optimal load -j 8 run number 3...
Optimal load -j 8 run number 4...
Optimal load -j 8 run number 5...
Average Optimal load -j 8 Run (std deviation):
Elapsed Time 272.672 (0.453178)
User Time 189.7 (0.718157)
System Time 296.77 (0.845606)
Percent CPU 178 (0)
Context Switches 118822 (277.434)
Sleeps 37542.8 (545.922)

More data on the test results with the earlier patch is at
http://www.mail-archive.com/kvm@vger.kernel.org/msg43655.html

---

Balbir Singh (3):
      Move zone_reclaim() outside of CONFIG_NUMA
      Refactor zone_reclaim, move reusable functionality outside
      Provide control over unmapped pages


 include/linux/mmzone.h |    4 +-
 include/linux/swap.h   |    5 +-
 mm/page_alloc.c        |    7 ++-
 mm/vmscan.c            |  109 +++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 104 insertions(+), 21 deletions(-)

-- 
	Balbir
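To make the tunables above concrete, here is a small standalone C sketch
of the arithmetic the series uses. The constant and field names mirror
patch 3 later in this thread; the sample numbers are invented:

#include <stdio.h>

#define UNMAPPED_PAGE_RATIO 16	/* kswapd wakeup ratio, from patch 3 */

/* stand-ins for the per-zone state the patches consult */
struct zone_sample {
	unsigned long unmapped_file_pages;	/* unmapped page cache in the zone */
	unsigned long min_unmapped_pages;	/* realsize * min_unmapped_ratio / 100 */
};

/* wake kswapd only well above the floor, as should_balance_unmapped_pages() does */
static int should_balance(const struct zone_sample *z)
{
	return z->unmapped_file_pages >
	       UNMAPPED_PAGE_RATIO * z->min_unmapped_pages;
}

/* reclaim one eighth of the excess, as balance_unmapped_pages() does */
static unsigned long reclaim_target(const struct zone_sample *z)
{
	return (z->unmapped_file_pages - z->min_unmapped_pages) >> 3;
}

int main(void)
{
	/* a zone of ~1M pages with min_unmapped_ratio = 1 (sample values) */
	struct zone_sample z = {
		.unmapped_file_pages = 400000,
		.min_unmapped_pages  = 1048576 / 100,
	};

	printf("wake kswapd: %d, target: %lu pages\n",
	       should_balance(&z),
	       should_balance(&z) ? reclaim_target(&z) : 0);
	return 0;
}

The two thresholds differ on purpose: reclaim starts once a zone is over
min_unmapped_pages, but kswapd is only woken once the excess is large
(the ratio), so the allocator fast path does not trigger reclaim for
small overshoots.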
* [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Balbir Singh @ 2010-11-30 10:15 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

This patch moves zone_reclaim and associated helpers outside
CONFIG_NUMA. This infrastructure is reused in the patches for page
cache control that follow.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 include/linux/mmzone.h |    4 ++--
 mm/vmscan.c            |    2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4890662..aeede91 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -302,12 +302,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
-#ifdef CONFIG_NUMA
-	int node;
 	/*
 	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
 	unsigned long		min_unmapped_pages;
+#ifdef CONFIG_NUMA
+	int node;
 	unsigned long		min_slab_pages;
 #endif
 	struct per_cpu_pageset __percpu *pageset;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8cc90d5..325443a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2644,7 +2644,6 @@ static int __init kswapd_init(void)
 
 module_init(kswapd_init)
 
-#ifdef CONFIG_NUMA
 /*
  * Zone reclaim mode
  *
@@ -2854,7 +2853,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	return ret;
 }
-#endif
 
 /*
  * page_evictable - test whether a page is evictable
* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Christoph Lameter @ 2010-11-30 19:18 UTC
To: Balbir Singh; +Cc: linux-mm, akpm, linux-kernel, kvm

Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Andrew Morton @ 2010-11-30 22:23 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

On Tue, 30 Nov 2010 15:45:12 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> This patch moves zone_reclaim and associated helpers
> outside CONFIG_NUMA. This infrastructure is reused
> in the patches for page cache control that follow.

Thereby adding a nice dollop of bloat to everyone's kernel.  I don't
think that is justifiable given that the audience for this feature is
about eight people :(

How's about CONFIG_UNMAPPED_PAGECACHE_CONTROL?

Also this patch instantiates sysctl_min_unmapped_ratio and
sysctl_min_slab_ratio on non-NUMA builds but fails to make those
tunables actually tunable in procfs.  Changes to sysctl.c are needed.

> Reviewed-by: Christoph Lameter <cl@linux.com>

More careful reviewers, please.
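Concretely, the missing piece Andrew points at would be vm_table entries
in kernel/sysctl.c moved out from under #ifdef CONFIG_NUMA, modeled on
the existing NUMA entry of this era (the handler name and bounds below
are the mainline ones; treat the exact placement as an assumption):

	{
		.procname	= "min_unmapped_ratio",
		.data		= &sysctl_min_unmapped_ratio,
		.maxlen		= sizeof(sysctl_min_unmapped_ratio),
		.mode		= 0644,
		.proc_handler	= sysctl_min_unmapped_ratio_sysctl_handler,
		.extra1		= &zero,
		.extra2		= &one_hundred,
	},

Without such an entry, /proc/sys/vm/min_unmapped_ratio simply does not
exist on !NUMA builds even though the variable it would set does.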
* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA
From: Balbir Singh @ 2010-12-01 5:21 UTC
To: Andrew Morton; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:04:08]:

> * Andrew Morton <akpm@linux-foundation.org> [2010-11-30 14:23:38]:
>
> > On Tue, 30 Nov 2010 15:45:12 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > This patch moves zone_reclaim and associated helpers
> > > outside CONFIG_NUMA. This infrastructure is reused
> > > in the patches for page cache control that follow.
> > >
> >
> > Thereby adding a nice dollop of bloat to everyone's kernel.  I don't
> > think that is justifiable given that the audience for this feature is
> > about eight people :(
> >
> > How's about CONFIG_UNMAPPED_PAGECACHE_CONTROL?
> >
>
> OK, I'll add the config option. This code is enabled under CONFIG_NUMA
> today, so I agree the bloat is more of an issue for non-NUMA users.
> I'll make CONFIG_UNMAPPED_PAGECACHE_CONTROL default if CONFIG_NUMA is
> set.
>
> > Also this patch instantiates sysctl_min_unmapped_ratio and
> > sysctl_min_slab_ratio on non-NUMA builds but fails to make those
> > tunables actually tunable in procfs.  Changes to sysctl.c are
> > needed.
> >
>
> Oh! yeah.. I missed it while refactoring, my fault.
>
> > > Reviewed-by: Christoph Lameter <cl@linux.com>
>

My local MTA failed to deliver the message, trying again.

-- 
	Three Cheers,
	Balbir
* [PATCH 2/3] Refactor zone_reclaim
From: Balbir Singh @ 2010-11-30 10:15 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

Refactor zone_reclaim, move reusable functionality outside
of zone_reclaim. Make zone_reclaim_unmapped_pages modular.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 mm/vmscan.c |   35 +++++++++++++++++++++++------------
 1 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 325443a..0ac444f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2719,6 +2719,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
 }
 
 /*
+ * Helper function to reclaim unmapped pages, we might add something
+ * similar to this for slab cache as well. Currently this function
+ * is shared with __zone_reclaim()
+ */
+static inline void
+zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
+				unsigned long nr_pages)
+{
+	int priority;
+	/*
+	 * Free memory by calling shrink zone with increasing
+	 * priorities until we have enough memory freed.
+	 */
+	priority = ZONE_RECLAIM_PRIORITY;
+	do {
+		shrink_zone(priority, zone, sc);
+		priority--;
+	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
+}
+
+/*
  * Try to free up some pages from this zone through reclaim.
  */
 static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
@@ -2727,7 +2748,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	const unsigned long nr_pages = 1 << order;
 	struct task_struct *p = current;
 	struct reclaim_state reclaim_state;
-	int priority;
 	struct scan_control sc = {
 		.may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
@@ -2751,17 +2771,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 	reclaim_state.reclaimed_slab = 0;
 	p->reclaim_state = &reclaim_state;
 
-	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
-		/*
-		 * Free memory by calling shrink zone with increasing
-		 * priorities until we have enough memory freed.
-		 */
-		priority = ZONE_RECLAIM_PRIORITY;
-		do {
-			shrink_zone(priority, zone, &sc);
-			priority--;
-		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
-	}
+	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages)
+		zone_reclaim_unmapped_pages(zone, &sc, nr_pages);
 
 	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
 	if (nr_slab_pages0 > zone->min_slab_pages) {
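The helper's shape, retrying at numerically lower "priority" (which in
vmscan means scanning a larger fraction of the zone) until the target is
met, can be seen in isolation in this standalone sketch.
ZONE_RECLAIM_PRIORITY is 4 in mainline of this era; the per-pass
reclaim counts here are simulated, not real:

#include <stdio.h>

#define ZONE_RECLAIM_PRIORITY 4	/* starting priority, as in mm/vmscan.c */

/* stand-in for shrink_zone(): lower priority => larger scan => more freed */
static unsigned long fake_shrink(int priority)
{
	return 32UL << (ZONE_RECLAIM_PRIORITY - priority);
}

int main(void)
{
	unsigned long nr_reclaimed = 0, nr_pages = 128;	/* sample target */
	int priority = ZONE_RECLAIM_PRIORITY;

	do {
		nr_reclaimed += fake_shrink(priority);
		printf("priority %d: reclaimed %lu/%lu\n",
		       priority, nr_reclaimed, nr_pages);
		priority--;
	} while (priority >= 0 && nr_reclaimed < nr_pages);
	return 0;
}

The loop structure is identical to the extracted helper: start gently,
escalate only while the target has not been met, and give up after
priority 0 (a full scan) regardless.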
* Re: [PATCH 2/3] Refactor zone_reclaim
From: Christoph Lameter @ 2010-11-30 19:19 UTC
To: Balbir Singh; +Cc: linux-mm, akpm, linux-kernel, kvm

Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 2/3] Refactor zone_reclaim
From: KAMEZAWA Hiroyuki @ 2010-12-01 1:23 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Tue, 30 Nov 2010 15:45:55 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Refactor zone_reclaim, move reusable functionality outside
> of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Why is this min_unmapped_pages based on zone (IOW, per-zone)?

Thanks,
-Kame
* Re: [PATCH 2/3] Refactor zone_reclaim
From: Balbir Singh @ 2010-12-01 5:22 UTC
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:16:34]:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 10:23:29]:
>
> > On Tue, 30 Nov 2010 15:45:55 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > Refactor zone_reclaim, move reusable functionality outside
> > > of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
> > >
> > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > Why is this min_unmapped_pages based on zone (IOW, per-zone)?
> >
>
> Kamezawa-San, this has been the design before the refactoring (it is
> based on zone_reclaim_mode and reclaim built on top of that). I am
> reusing bits of existing technology. The advantage of it being
> per-zone is that it integrates well with kswapd.
>

My local MTA failed to deliver the message, trying again.

-- 
	Three Cheers,
	Balbir
* Re: [PATCH 2/3] Refactor zone_reclaim
From: KAMEZAWA Hiroyuki @ 2010-12-01 8:59 UTC
To: balbir; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Wed, 1 Dec 2010 10:52:18 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:16:34]:
>
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 10:23:29]:
> >
> > > On Tue, 30 Nov 2010 15:45:55 +0530
> > > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > >
> > > > Refactor zone_reclaim, move reusable functionality outside
> > > > of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
> > > >
> > > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > >
> > > Why is this min_unmapped_pages based on zone (IOW, per-zone)?
> > >
> >
> > Kamezawa-San, this has been the design before the refactoring (it is
> > based on zone_reclaim_mode and reclaim built on top of that). I am
> > reusing bits of existing technology. The advantage of it being
> > per-zone is that it integrates well with kswapd.
> >

Sorry, what I wanted to hear was: why min_unmapped_pages per zone? Why
don't you add a "limit_for_unmapped_page_cache_size" for the whole
system? I guess what you really want is
"limit_for_unmapped_page_cache_size". Then, you have to use this kind
of mysterious code:

==
	(zone_unmapped_file_pages(zone) >
		UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
==

Thanks,
-Kame
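What Kamezawa is suggesting would look roughly like the kernel-side
sketch below: one global limit instead of per-zone thresholds. The
sysctl name is his hypothetical; global_page_state() and the NR_FILE_*
counters are the mainline ones of this era:

/* hypothetical: a single system-wide cap on unmapped page cache */
unsigned long sysctl_unmapped_page_cache_limit;	/* in pages, 0 = off */

static bool over_unmapped_page_cache_limit(void)
{
	unsigned long unmapped = global_page_state(NR_FILE_PAGES) -
				 global_page_state(NR_FILE_MAPPED);

	return sysctl_unmapped_page_cache_limit &&
	       unmapped > sysctl_unmapped_page_cache_limit;
}

The trade-off both sides are circling: a global limit is easier to
explain to administrators, while per-zone thresholds plug directly into
the existing per-zone kswapd machinery.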
* Re: [PATCH 2/3] Refactor zone_reclaim
From: Minchan Kim @ 2010-12-01 7:54 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

Hi Balbir,

On Tue, Nov 30, 2010 at 7:15 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Refactor zone_reclaim, move reusable functionality outside
> of zone_reclaim. Make zone_reclaim_unmapped_pages modular.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>  mm/vmscan.c |   35 +++++++++++++++++++++++------------
>  1 files changed, 23 insertions(+), 12 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 325443a..0ac444f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2719,6 +2719,27 @@ static long zone_pagecache_reclaimable(struct zone *zone)
>  }
>
>  /*
> + * Helper function to reclaim unmapped pages, we might add something
> + * similar to this for slab cache as well. Currently this function
> + * is shared with __zone_reclaim()
> + */
> +static inline void
> +zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> +				unsigned long nr_pages)
> +{
> +	int priority;
> +	/*
> +	 * Free memory by calling shrink zone with increasing
> +	 * priorities until we have enough memory freed.
> +	 */
> +	priority = ZONE_RECLAIM_PRIORITY;
> +	do {
> +		shrink_zone(priority, zone, sc);
> +		priority--;
> +	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
> +}
> +

I don't see anything specific to unmapped pages in the logic of
"zone_reclaim_unmapped_pages"; the callers of this function have to
handle sc->may_unmap themselves, so the function's name is misleading.
As I read your logic, the function would be better named just
"zone_reclaim_pages".

If you want to keep the name zone_reclaim_unmapped_pages, it would be
better to set sc->may_unmap in this function itself, as sketched below.

-- 
Kind regards,
Minchan Kim
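Acting on that second suggestion would look roughly like this sketch;
only the sc->may_unmap assignment is new relative to patch 2. Note it
would change the behaviour of __zone_reclaim(), which currently derives
may_unmap from zone_reclaim_mode, which is exactly the caller/callee
tension Minchan is pointing at:

static inline void
zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
				unsigned long nr_pages)
{
	int priority = ZONE_RECLAIM_PRIORITY;

	sc->may_unmap = 0;	/* skip mapped pages, so the name is accurate */
	do {
		shrink_zone(priority, zone, sc);
		priority--;
	} while (priority >= 0 && sc->nr_reclaimed < nr_pages);
}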
* [PATCH 3/3] Provide control over unmapped pages
From: Balbir Singh @ 2010-11-30 10:16 UTC
To: linux-mm, Christoph Lameter; +Cc: akpm, linux-kernel, kvm

Provide control using zone_reclaim() and a boot parameter. The
code reuses functionality from zone_reclaim() to isolate unmapped
pages and reclaim them as a priority, ahead of other mapped pages.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 include/linux/swap.h |    5 ++-
 mm/page_alloc.c      |    7 +++--
 mm/vmscan.c          |   72 +++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index eba53e7..78b0830 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -252,11 +252,12 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
 
-#ifdef CONFIG_NUMA
-extern int zone_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
 extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
+extern bool should_balance_unmapped_pages(struct zone *zone);
+#ifdef CONFIG_NUMA
+extern int zone_reclaim_mode;
 #else
 #define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62b7280..4228da3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1662,6 +1662,9 @@ zonelist_scan:
 		unsigned long mark;
 		int ret;
 
+		if (should_balance_unmapped_pages(zone))
+			wakeup_kswapd(zone, order);
+
 		mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK];
 		if (zone_watermark_ok(zone, order, mark,
 			    classzone_idx, alloc_flags))
@@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
 						/ 100;
+#ifdef CONFIG_NUMA
+		zone->node = nid;
 		zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
 #endif
 		zone->name = zone_names[j];
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0ac444f..98950f4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
 #define scanning_global_lru(sc)	(1)
 #endif
 
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc);
+static int unmapped_page_control __read_mostly;
+
+static int __init unmapped_page_control_parm(char *str)
+{
+	unmapped_page_control = 1;
+	/*
+	 * XXX: Should we tweak swappiness here?
+	 */
+	return 1;
+}
+__setup("unmapped_page_control", unmapped_page_control_parm);
+
+
 static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
 						  struct scan_control *sc)
 {
@@ -2223,6 +2238,12 @@ loop_again:
 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
 							&sc, priority, 0);
 
+			/*
+			 * We do unmapped page balancing once here and once
+			 * below, so that we don't lose out
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
+
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
@@ -2258,6 +2279,11 @@ loop_again:
 				continue;
 
 			sc.nr_scanned = 0;
+			/*
+			 * Balance unmapped pages upfront, this should be
+			 * really cheap
+			 */
+			balance_unmapped_pages(priority, zone, &sc);
 
 			/*
 			 * Call soft limit reclaim before calling shrink_zone.
@@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
 	pgdat->kswapd_max_order = order;
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
-	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
+	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
+	    !should_balance_unmapped_pages(zone))
 		return;
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
@@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
 }
 
 /*
+ * Routine to balance unmapped pages, inspired from the code under
+ * CONFIG_NUMA that does unmapped page and slab page control by keeping
+ * min_unmapped_pages in the zone. We currently reclaim just unmapped
+ * pages, slab control will come in soon, at which point this routine
+ * should be called balance cached pages
+ */
+static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
+						struct scan_control *sc)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
+		struct scan_control nsc;
+		unsigned long nr_pages;
+
+		nsc = *sc;
+
+		nsc.swappiness = 0;
+		nsc.may_writepage = 0;
+		nsc.may_unmap = 0;
+		nsc.nr_reclaimed = 0;
+
+		nr_pages = zone_unmapped_file_pages(zone) -
+				zone->min_unmapped_pages;
+		/* Magically try to reclaim eighth the unmapped cache pages */
+		nr_pages >>= 3;
+
+		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
+		return nsc.nr_reclaimed;
+	}
+	return 0;
+}
+
+#define UNMAPPED_PAGE_RATIO 16
+bool should_balance_unmapped_pages(struct zone *zone)
+{
+	if (unmapped_page_control &&
+		(zone_unmapped_file_pages(zone) >
+			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
+		return true;
+	return false;
+}
+
+/*
  * Try to free up some pages from this zone through reclaim.
  */
 static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
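As posted, the mechanism stays dormant until the new parameter appears
on the kernel command line; the __setup() hook above only flips
unmapped_page_control to 1. A boot entry would look something like this
(the kernel path and version here are placeholders, not from the
thread):

    kernel /boot/vmlinuz-2.6.37-rc4 root=/dev/sda1 ro unmapped_page_control

With the feature enabled, vm.min_unmapped_ratio then sets each zone's
min_unmapped_pages floor (1% of zone pages by default in mainline),
below which unmapped page cache is never reclaimed by this path.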
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Christoph Lameter @ 2010-11-30 19:21 UTC
To: Balbir Singh; +Cc: linux-mm, akpm, linux-kernel, kvm

Looks good.

Reviewed-by: Christoph Lameter <cl@linux.com>
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Andrew Morton @ 2010-11-30 22:25 UTC
To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

On Tue, 30 Nov 2010 15:46:31 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>  include/linux/swap.h |    5 ++-
>  mm/page_alloc.c      |    7 +++--
>  mm/vmscan.c          |   72 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 79 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..78b0830 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -252,11 +252,12 @@ extern int vm_swappiness;
>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>  extern long vm_total_pages;
>
> -#ifdef CONFIG_NUMA
> -extern int zone_reclaim_mode;
>  extern int sysctl_min_unmapped_ratio;
>  extern int sysctl_min_slab_ratio;

This change will need to be moved into the first patch.

>  extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> +extern bool should_balance_unmapped_pages(struct zone *zone);
> +#ifdef CONFIG_NUMA
> +extern int zone_reclaim_mode;
>  #else
>  #define zone_reclaim_mode 0
>  static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
>  		unsigned long mark;
>  		int ret;
>
> +		if (should_balance_unmapped_pages(zone))
> +			wakeup_kswapd(zone, order);

gack, this is on the page allocator fastpath, isn't it?  So
99.99999999% of the world's machines end up doing a pointless call to a
pointless function which pointlessly tests a pointless global and
pointlessly returns?  All because of some whacky KSM thing?

The speed and space overhead of this code should be *zero* if
!CONFIG_UNMAPPED_PAGECACHE_CONTROL and should be minimal if
CONFIG_UNMAPPED_PAGECACHE_CONTROL=y.  The way to do the latter is to
inline the test of unmapped_page_control into callers and only if it is
true (and use unlikely(), please) do we call into the KSM gunk.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
>  #define scanning_global_lru(sc)	(1)
>  #endif
>
> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> +						struct scan_control *sc);
> +static int unmapped_page_control __read_mostly;
> +
> +static int __init unmapped_page_control_parm(char *str)
> +{
> +	unmapped_page_control = 1;
> +	/*
> +	 * XXX: Should we tweak swappiness here?
> +	 */
> +	return 1;
> +}
> +__setup("unmapped_page_control", unmapped_page_control_parm);

aw c'mon guys, everybody knows that when you add a kernel parameter you
document it in Documentation/kernel-parameters.txt.

>  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
>  						  struct scan_control *sc)
>  {
> @@ -2223,6 +2238,12 @@ loop_again:
>  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
>  							&sc, priority, 0);
>
> +			/*
> +			 * We do unmapped page balancing once here and once
> +			 * below, so that we don't lose out
> +			 */
> +			balance_unmapped_pages(priority, zone, &sc);
> +
>  			if (!zone_watermark_ok_safe(zone, order,
>  					high_wmark_pages(zone), 0, 0)) {
>  				end_zone = i;
> @@ -2258,6 +2279,11 @@ loop_again:
>  				continue;
>
>  			sc.nr_scanned = 0;
> +			/*
> +			 * Balance unmapped pages upfront, this should be
> +			 * really cheap
> +			 */
> +			balance_unmapped_pages(priority, zone, &sc);

More unjustifiable overhead on a commonly-executed codepath.

>  			/*
>  			 * Call soft limit reclaim before calling shrink_zone.
> @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
>  	pgdat->kswapd_max_order = order;
>  	if (!waitqueue_active(&pgdat->kswapd_wait))
>  		return;
> -	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> +	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> +	    !should_balance_unmapped_pages(zone))
>  		return;
>
>  	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
>  }
>
>  /*
> + * Routine to balance unmapped pages, inspired from the code under
> + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> + * pages, slab control will come in soon, at which point this routine
> + * should be called balance cached pages
> + */

The problem I have with this comment is that it uses the term "balance"
without ever defining it.  Plus "balance" is already a term which is used
in memory reclaim.

So if you can think up a unique noun then that's good but whether or
not that is done, please describe with great care what that term
actually means in this context.

> +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> +						struct scan_control *sc)
> +{
> +	if (unmapped_page_control &&
> +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> +		struct scan_control nsc;
> +		unsigned long nr_pages;
> +
> +		nsc = *sc;
> +
> +		nsc.swappiness = 0;
> +		nsc.may_writepage = 0;
> +		nsc.may_unmap = 0;
> +		nsc.nr_reclaimed = 0;

Doing a clone-and-own of a scan_control is novel.  What's going on here?

> +		nr_pages = zone_unmapped_file_pages(zone) -
> +				zone->min_unmapped_pages;
> +		/* Magically try to reclaim eighth the unmapped cache pages */
> +		nr_pages >>= 3;
> +
> +		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> +		return nsc.nr_reclaimed;
> +	}
> +	return 0;
> +}
> +
> +#define UNMAPPED_PAGE_RATIO 16

Well.  Giving 16 a name didn't really clarify anything.  Attentive
readers will want to know what this does, why 16 was chosen and what
the effects of changing it will be.

> +bool should_balance_unmapped_pages(struct zone *zone)
> +{
> +	if (unmapped_page_control &&
> +		(zone_unmapped_file_pages(zone) >
> +			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> +		return true;
> +	return false;
> +}

> Reviewed-by: Christoph Lameter <cl@linux.com>

So you're OK with shoving all this flotsam into 100,000,000 cellphones?
This was a pretty outrageous patchset!
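The structure Andrew is asking for would be roughly the following
sketch. CONFIG_UNMAPPED_PAGECACHE_CONTROL is his proposed name and none
of this is posted code; only unmapped_page_control and
should_balance_unmapped_pages() come from the patch:

/* in a header, so the flag test inlines into the fastpath callers */
#ifdef CONFIG_UNMAPPED_PAGECACHE_CONTROL
extern int unmapped_page_control;
#define unmapped_pagecache_control_enabled() \
	unlikely(unmapped_page_control)
#else
#define unmapped_pagecache_control_enabled()	0
#endif

/* caller side, e.g. in get_page_from_freelist(): the common case costs
 * one well-predicted branch, or nothing at all when compiled out */
	if (unmapped_pagecache_control_enabled() &&
	    should_balance_unmapped_pages(zone))
		wakeup_kswapd(zone, order);

With the flag test inlined and the CONFIG off, the compiler drops the
whole block, which is the "zero overhead" Andrew demands.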
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Balbir Singh @ 2010-12-01 5:22 UTC
To: Andrew Morton; +Cc: linux-mm, Christoph Lameter, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:24:21]:

> * Andrew Morton <akpm@linux-foundation.org> [2010-11-30 14:25:09]:
>
> > On Tue, 30 Nov 2010 15:46:31 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >
> > > Provide control using zone_reclaim() and a boot parameter. The
> > > code reuses functionality from zone_reclaim() to isolate unmapped
> > > pages and reclaim them as a priority, ahead of other mapped pages.
> > >
> > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > ---
> > >  include/linux/swap.h |    5 ++-
> > >  mm/page_alloc.c      |    7 +++--
> > >  mm/vmscan.c          |   72 +++++++++++++++++++++++++++++++++++++++++++++++++-
> > >  3 files changed, 79 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index eba53e7..78b0830 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -252,11 +252,12 @@ extern int vm_swappiness;
> > >  extern int remove_mapping(struct address_space *mapping, struct page *page);
> > >  extern long vm_total_pages;
> > >
> > > -#ifdef CONFIG_NUMA
> > > -extern int zone_reclaim_mode;
> > >  extern int sysctl_min_unmapped_ratio;
> > >  extern int sysctl_min_slab_ratio;
> >
> > This change will need to be moved into the first patch.
> >
>
> OK, will do, thanks for pointing it out.
>
> > >  extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
> > > +extern bool should_balance_unmapped_pages(struct zone *zone);
> > > +#ifdef CONFIG_NUMA
> > > +extern int zone_reclaim_mode;
> > >  #else
> > >  #define zone_reclaim_mode 0
> > >  static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 62b7280..4228da3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1662,6 +1662,9 @@ zonelist_scan:
> > >  		unsigned long mark;
> > >  		int ret;
> > >
> > > +		if (should_balance_unmapped_pages(zone))
> > > +			wakeup_kswapd(zone, order);
> >
> > gack, this is on the page allocator fastpath, isn't it?  So
> > 99.99999999% of the world's machines end up doing a pointless call to a
> > pointless function which pointlessly tests a pointless global and
> > pointlessly returns?  All because of some whacky KSM thing?
> >
> > The speed and space overhead of this code should be *zero* if
> > !CONFIG_UNMAPPED_PAGECACHE_CONTROL and should be minimal if
> > CONFIG_UNMAPPED_PAGECACHE_CONTROL=y.  The way to do the latter is to
> > inline the test of unmapped_page_control into callers and only if it is
> > true (and use unlikely(), please) do we call into the KSM gunk.
>
> Will do; should_balance_unmapped_pages() will be made a no-op in the
> absence of CONFIG_UNMAPPED_PAGECACHE_CONTROL.
>
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem);
> > >  #define scanning_global_lru(sc)	(1)
> > >  #endif
> > >
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > +						struct scan_control *sc);
> > > +static int unmapped_page_control __read_mostly;
> > > +
> > > +static int __init unmapped_page_control_parm(char *str)
> > > +{
> > > +	unmapped_page_control = 1;
> > > +	/*
> > > +	 * XXX: Should we tweak swappiness here?
> > > +	 */
> > > +	return 1;
> > > +}
> > > +__setup("unmapped_page_control", unmapped_page_control_parm);
> >
> > aw c'mon guys, everybody knows that when you add a kernel parameter you
> > document it in Documentation/kernel-parameters.txt.
>
> Will do - feeling silly for missing it; that is where reviews help.
>
> > >  static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone,
> > >  						  struct scan_control *sc)
> > >  {
> > > @@ -2223,6 +2238,12 @@ loop_again:
> > >  				shrink_active_list(SWAP_CLUSTER_MAX, zone,
> > >  							&sc, priority, 0);
> > >
> > > +			/*
> > > +			 * We do unmapped page balancing once here and once
> > > +			 * below, so that we don't lose out
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> > > +
> > >  			if (!zone_watermark_ok_safe(zone, order,
> > >  					high_wmark_pages(zone), 0, 0)) {
> > >  				end_zone = i;
> > > @@ -2258,6 +2279,11 @@ loop_again:
> > >  				continue;
> > >
> > >  			sc.nr_scanned = 0;
> > > +			/*
> > > +			 * Balance unmapped pages upfront, this should be
> > > +			 * really cheap
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> >
> > More unjustifiable overhead on a commonly-executed codepath.
> >
>
> Will refactor with the CONFIG suggested above.
>
> > >  			/*
> > >  			 * Call soft limit reclaim before calling shrink_zone.
> > > @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order)
> > >  	pgdat->kswapd_max_order = order;
> > >  	if (!waitqueue_active(&pgdat->kswapd_wait))
> > >  		return;
> > > -	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
> > > +	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) &&
> > > +	    !should_balance_unmapped_pages(zone))
> > >  		return;
> > >
> > >  	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
> > > @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc,
> > >  }
> > >
> > >  /*
> > > + * Routine to balance unmapped pages, inspired from the code under
> > > + * CONFIG_NUMA that does unmapped page and slab page control by keeping
> > > + * min_unmapped_pages in the zone. We currently reclaim just unmapped
> > > + * pages, slab control will come in soon, at which point this routine
> > > + * should be called balance cached pages
> > > + */
> >
> > The problem I have with this comment is that it uses the term "balance"
> > without ever defining it.  Plus "balance" is already a term which is used
> > in memory reclaim.
> >
> > So if you can think up a unique noun then that's good but whether or
> > not that is done, please describe with great care what that term
> > actually means in this context.
>
> I used "balance" to mean not a 1:1 balance, but balancing the
> proportion of unmapped page cache based on a sysctl/tunable.
>
> > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone,
> > > +						struct scan_control *sc)
> > > +{
> > > +	if (unmapped_page_control &&
> > > +		(zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) {
> > > +		struct scan_control nsc;
> > > +		unsigned long nr_pages;
> > > +
> > > +		nsc = *sc;
> > > +
> > > +		nsc.swappiness = 0;
> > > +		nsc.may_writepage = 0;
> > > +		nsc.may_unmap = 0;
> > > +		nsc.nr_reclaimed = 0;
> >
> > Doing a clone-and-own of a scan_control is novel.  What's going on here?
>
> This code overwrites the swappiness, may_* and nr_reclaimed fields for
> correct stats. The idea is to vary the reclaim behaviour/bias it.
>
> > > +		nr_pages = zone_unmapped_file_pages(zone) -
> > > +				zone->min_unmapped_pages;
> > > +		/* Magically try to reclaim eighth the unmapped cache pages */
> > > +		nr_pages >>= 3;
> > > +
> > > +		zone_reclaim_unmapped_pages(zone, &nsc, nr_pages);
> > > +		return nsc.nr_reclaimed;
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Well.  Giving 16 a name didn't really clarify anything.  Attentive
> > readers will want to know what this does, why 16 was chosen and what
> > the effects of changing it will be.
>
> Sorry, I documented that in the changelog of the first patchset; I'll
> document it here as well. The reason for choosing 16 is based on
> heuristics and testing, the tradeoff being overenthusiastic reclaim
> versus size of cache/performance.
>
> > > +bool should_balance_unmapped_pages(struct zone *zone)
> > > +{
> > > +	if (unmapped_page_control &&
> > > +		(zone_unmapped_file_pages(zone) >
> > > +			UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages))
> > > +		return true;
> > > +	return false;
> > > +}
> >
> > > Reviewed-by: Christoph Lameter <cl@linux.com>
> >
> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > This was a pretty outrageous patchset!
>
> I'll do a better one. BTW, a lot of embedded folks are interested in
> page cache control outside of cgroup behaviour.
>
> Thanks for the detailed review!

My local MTA failed to deliver the message, trying again.

-- 
	Three Cheers,
	Balbir
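The no-op Balbir promises would presumably be a header stub along these
lines (a hypothetical sketch of the unposted follow-up, complementing
the runtime flag test above with a compile-time one):

#ifdef CONFIG_UNMAPPED_PAGECACHE_CONTROL
extern bool should_balance_unmapped_pages(struct zone *zone);
#else
static inline bool should_balance_unmapped_pages(struct zone *zone)
{
	return false;	/* constant-folds away in every caller */
}
#endif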
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Minchan Kim @ 2010-12-01 8:18 UTC
To: balbir; +Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, kvm

On Wed, Dec 1, 2010 at 2:22 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:24:21]:
>
>> * Andrew Morton <akpm@linux-foundation.org> [2010-11-30 14:25:09]:
>> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
>> > This was a pretty outrageous patchset!
>>
>> I'll do a better one. BTW, a lot of embedded folks are interested in
>> page cache control outside of cgroup behaviour.

Yes. Embedded people (at least, me) want it. That's because they don't
have any swap device, so they can reclaim only page cache pages. And
many page cache pages are mapped into application address spaces (for
example, Android uses the Java model, so many pages are mapped into
applications' address spaces). That means it's hard to reclaim them
without lagging. So I have an interest in this patch.

-- 
Kind regards,
Minchan Kim
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Christoph Lameter @ 2010-12-01 15:01 UTC
To: Andrew Morton; +Cc: Balbir Singh, linux-mm, linux-kernel, kvm

On Tue, 30 Nov 2010, Andrew Morton wrote:

> > +#define UNMAPPED_PAGE_RATIO 16
>
> Well.  Giving 16 a name didn't really clarify anything.  Attentive
> readers will want to know what this does, why 16 was chosen and what
> the effects of changing it will be.

The meaning is analogous to the other zone reclaim ratio. But yes, it
should be justified and defined.

> > Reviewed-by: Christoph Lameter <cl@linux.com>
>
> So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> This was a pretty outrageous patchset!

This is a feature that has been requested over and over for years. Using
/proc/sys/vm/drop_caches for fixing situations where one simply has too
many page cache pages is not so much fun in the long run.
* Re: [PATCH 3/3] Provide control over unmapped pages
From: KOSAKI Motohiro @ 2010-12-02 1:22 UTC
To: Christoph Lameter; +Cc: kosaki.motohiro, Andrew Morton, Balbir Singh, linux-mm, linux-kernel, kvm

> On Tue, 30 Nov 2010, Andrew Morton wrote:
>
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Well.  Giving 16 a name didn't really clarify anything.  Attentive
> > readers will want to know what this does, why 16 was chosen and what
> > the effects of changing it will be.
>
> The meaning is analogous to the other zone reclaim ratio. But yes, it
> should be justified and defined.
>
> > > Reviewed-by: Christoph Lameter <cl@linux.com>
> >
> > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > This was a pretty outrageous patchset!
>
> This is a feature that has been requested over and over for years. Using
> /proc/sys/vm/drop_caches for fixing situations where one simply has too
> many page cache pages is not so much fun in the long run.

I'm not against a page cache limitation feature at all. But this is
too ugly and too destructive to the fast path. I hope this patch can
reduce its negative impact.
* Re: [PATCH 3/3] Provide control over unmapped pages
From: KAMEZAWA Hiroyuki @ 2010-12-02 2:50 UTC
To: KOSAKI Motohiro; +Cc: Christoph Lameter, Andrew Morton, Balbir Singh, linux-mm, linux-kernel, kvm

On Thu, 2 Dec 2010 10:22:16 +0900 (JST)
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> > On Tue, 30 Nov 2010, Andrew Morton wrote:
> >
> > > > +#define UNMAPPED_PAGE_RATIO 16
> > >
> > > Well.  Giving 16 a name didn't really clarify anything.  Attentive
> > > readers will want to know what this does, why 16 was chosen and what
> > > the effects of changing it will be.
> >
> > The meaning is analogous to the other zone reclaim ratio. But yes, it
> > should be justified and defined.
> >
> > > > Reviewed-by: Christoph Lameter <cl@linux.com>
> > >
> > > So you're OK with shoving all this flotsam into 100,000,000 cellphones?
> > > This was a pretty outrageous patchset!
> >
> > This is a feature that has been requested over and over for years. Using
> > /proc/sys/vm/drop_caches for fixing situations where one simply has too
> > many page cache pages is not so much fun in the long run.
>
> I'm not against a page cache limitation feature at all. But this is
> too ugly and too destructive to the fast path. I hope this patch can
> reduce its negative impact.

And I think min_unmapped_pages is an ugly name. It should be
"unmapped_pagecache_limit" or something similar, because it is a
limitation feature.

Thanks,
-Kame
* Re: [PATCH 3/3] Provide control over unmapped pages
From: Balbir Singh @ 2010-12-02 7:01 UTC
To: KAMEZAWA Hiroyuki; +Cc: KOSAKI Motohiro, Christoph Lameter, Andrew Morton, linux-mm, linux-kernel, kvm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-02 11:50:36]:

> > I'm not against a page cache limitation feature at all. But this is
> > too ugly and too destructive to the fast path. I hope this patch can
> > reduce its negative impact.
>
> And I think min_unmapped_pages is an ugly name. It should be
> "unmapped_pagecache_limit" or something similar, because it is a
> limitation feature.

The feature will now be enabled with a CONFIG option and a boot
parameter. I find that changing the naming convention now, when it is
already in use and well known, is not a good idea. The name of the boot
parameter can be changed, of course.

-- 
	Three Cheers,
	Balbir
* Re: [PATCH 3/3] Provide control over unmapped pages 2010-11-30 10:16 ` [PATCH 3/3] Provide control over unmapped pages Balbir Singh 2010-11-30 19:21 ` Christoph Lameter 2010-11-30 22:25 ` Andrew Morton @ 2010-12-01 0:14 ` KOSAKI Motohiro [not found] ` <20101201051632.GH2746@balbir.in.ibm.com> 2010-12-01 1:32 ` KAMEZAWA Hiroyuki 3 siblings, 1 reply; 27+ messages in thread From: KOSAKI Motohiro @ 2010-12-01 0:14 UTC (permalink / raw) To: Balbir Singh Cc: kosaki.motohiro, linux-mm, Christoph Lameter, akpm, linux-kernel, kvm > Provide control using zone_reclaim() and a boot parameter. The > code reuses functionality from zone_reclaim() to isolate unmapped > pages and reclaim them as a priority, ahead of other mapped pages. > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> > --- > include/linux/swap.h | 5 ++- > mm/page_alloc.c | 7 +++-- > mm/vmscan.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 79 insertions(+), 5 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index eba53e7..78b0830 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -252,11 +252,12 @@ extern int vm_swappiness; > extern int remove_mapping(struct address_space *mapping, struct page *page); > extern long vm_total_pages; > > -#ifdef CONFIG_NUMA > -extern int zone_reclaim_mode; > extern int sysctl_min_unmapped_ratio; > extern int sysctl_min_slab_ratio; > extern int zone_reclaim(struct zone *, gfp_t, unsigned int); > +extern bool should_balance_unmapped_pages(struct zone *zone); > +#ifdef CONFIG_NUMA > +extern int zone_reclaim_mode; > #else > #define zone_reclaim_mode 0 > static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order) > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 62b7280..4228da3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1662,6 +1662,9 @@ zonelist_scan: > unsigned long mark; > int ret; > > + if (should_balance_unmapped_pages(zone)) > + wakeup_kswapd(zone, order); > + You don't have to add extra branch into fast path. > mark = zone->watermark[alloc_flags & ALLOC_WMARK_MASK]; > if (zone_watermark_ok(zone, order, mark, > classzone_idx, alloc_flags)) > @@ -4136,10 +4139,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat, > > zone->spanned_pages = size; > zone->present_pages = realsize; > -#ifdef CONFIG_NUMA > - zone->node = nid; > zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio) > / 100; > +#ifdef CONFIG_NUMA > + zone->node = nid; > zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100; > #endif > zone->name = zone_names[j]; > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 0ac444f..98950f4 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -145,6 +145,21 @@ static DECLARE_RWSEM(shrinker_rwsem); > #define scanning_global_lru(sc) (1) > #endif > > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone, > + struct scan_control *sc); > +static int unmapped_page_control __read_mostly; > + > +static int __init unmapped_page_control_parm(char *str) > +{ > + unmapped_page_control = 1; > + /* > + * XXX: Should we tweak swappiness here? 
> + */ > + return 1; > +} > +__setup("unmapped_page_control", unmapped_page_control_parm); > + > + > static struct zone_reclaim_stat *get_reclaim_stat(struct zone *zone, > struct scan_control *sc) > { > @@ -2223,6 +2238,12 @@ loop_again: > shrink_active_list(SWAP_CLUSTER_MAX, zone, > &sc, priority, 0); > > + /* > + * We do unmapped page balancing once here and once > + * below, so that we don't lose out > + */ > + balance_unmapped_pages(priority, zone, &sc); You can't invoke any reclaim from here. It is in zone balancing detection phase. It mean your code reclaim pages from zones which has lots free pages too. > + > if (!zone_watermark_ok_safe(zone, order, > high_wmark_pages(zone), 0, 0)) { > end_zone = i; > @@ -2258,6 +2279,11 @@ loop_again: > continue; > > sc.nr_scanned = 0; > + /* > + * Balance unmapped pages upfront, this should be > + * really cheap > + */ > + balance_unmapped_pages(priority, zone, &sc); This code break page-cache/slab balancing logic. And this is conflict against Nick's per-zone slab effort. Plus, high-order + priority=5 reclaim Simon's case. (see "Free memory never fully used, swapping" threads) > > /* > * Call soft limit reclaim before calling shrink_zone. > @@ -2491,7 +2517,8 @@ void wakeup_kswapd(struct zone *zone, int order) > pgdat->kswapd_max_order = order; > if (!waitqueue_active(&pgdat->kswapd_wait)) > return; > - if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0)) > + if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0) && > + !should_balance_unmapped_pages(zone)) > return; > > trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order); > @@ -2740,6 +2767,49 @@ zone_reclaim_unmapped_pages(struct zone *zone, struct scan_control *sc, > } > > /* > + * Routine to balance unmapped pages, inspired from the code under > + * CONFIG_NUMA that does unmapped page and slab page control by keeping > + * min_unmapped_pages in the zone. We currently reclaim just unmapped > + * pages, slab control will come in soon, at which point this routine > + * should be called balance cached pages > + */ > +static unsigned long balance_unmapped_pages(int priority, struct zone *zone, > + struct scan_control *sc) > +{ > + if (unmapped_page_control && > + (zone_unmapped_file_pages(zone) > zone->min_unmapped_pages)) { > + struct scan_control nsc; > + unsigned long nr_pages; > + > + nsc = *sc; > + > + nsc.swappiness = 0; > + nsc.may_writepage = 0; > + nsc.may_unmap = 0; > + nsc.nr_reclaimed = 0; Don't you need to fill nsc.nr_to_reclaim field? > + > + nr_pages = zone_unmapped_file_pages(zone) - > + zone->min_unmapped_pages; > + /* Magically try to reclaim eighth the unmapped cache pages */ > + nr_pages >>= 3; Please don't make magic. > + > + zone_reclaim_unmapped_pages(zone, &nsc, nr_pages); > + return nsc.nr_reclaimed; > + } > + return 0; > +} > + > +#define UNMAPPED_PAGE_RATIO 16 Please don't make magic ratio. > +bool should_balance_unmapped_pages(struct zone *zone) > +{ > + if (unmapped_page_control && > + (zone_unmapped_file_pages(zone) > > + UNMAPPED_PAGE_RATIO * zone->min_unmapped_pages)) > + return true; > + return false; > +} > + > +/* > * Try to free up some pages from this zone through reclaim. > */ > static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order) Hmm.... As far as I reviewed, I can't find any reason why this patch works as expected. So, I think cleancache looks promising more than this idea. Have you seen Dan's patch? I would suggested discuss him. Thanks. 
[parent not found: <20101201051632.GH2746@balbir.in.ibm.com>]
* Re: [PATCH 3/3] Provide control over unmapped pages
  [not found] ` <20101201051632.GH2746@balbir.in.ibm.com>
@ 2010-12-01  5:22   ` Balbir Singh
  0 siblings, 0 replies; 27+ messages in thread
From: Balbir Singh @ 2010-12-01  5:22 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm, Nick Piggin

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:46:32]:

> * KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [2010-12-01 09:14:13]:
>
> [snip]
>
> > > +		if (should_balance_unmapped_pages(zone))
> > > +			wakeup_kswapd(zone, order);
> > > +
> >
> > You don't need to add an extra branch to the allocator fast path.
> >
> > [snip]
> >
> > > +			/*
> > > +			 * We do unmapped page balancing once here and once
> > > +			 * below, so that we don't lose out
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> >
> > You can't invoke any reclaim from here. This is the zone-balancing
> > detection phase, which means your code reclaims pages even from zones
> > that have plenty of free pages.
>
> The goal is to check not only for zone_watermark_ok, but also to see
> if unmapped pages are way higher than expected values.
>
> > > +			sc.nr_scanned = 0;
> > > +			/*
> > > +			 * Balance unmapped pages upfront, this should be
> > > +			 * really cheap
> > > +			 */
> > > +			balance_unmapped_pages(priority, zone, &sc);
> >
> > This code breaks the page-cache/slab balancing logic, and it conflicts
> > with Nick's per-zone slab effort.
>
> OK, cc'ing Nick for comments.
>
> > Plus, high-order + priority=5 reclaim reintroduces Simon's case (see
> > the "Free memory never fully used, swapping" threads).
>
> OK, this path should not add to swapping activity, if that is your
> concern.
>
> > > +		nsc.swappiness = 0;
> > > +		nsc.may_writepage = 0;
> > > +		nsc.may_unmap = 0;
> > > +		nsc.nr_reclaimed = 0;
> >
> > Don't you need to fill the nsc.nr_to_reclaim field?
>
> Yes, since the reclaim code looks at nr_reclaimed, it needs to be 0 at
> every iteration - did I miss something?
>
> > > +		nr_pages = zone_unmapped_file_pages(zone) -
> > > +				zone->min_unmapped_pages;
> > > +		/* Magically try to reclaim an eighth of the unmapped cache pages */
> > > +		nr_pages >>= 3;
> >
> > Please don't use magic numbers.
>
> OK, it is a heuristic, how do I use it?
>
> > > +#define UNMAPPED_PAGE_RATIO 16
> >
> > Please don't use a magic ratio.
>
> OK, it is a heuristic; how should I expose heuristics - via a sysctl?
>
> > As far as I can tell from reviewing it, there is no reason why this
> > patch should work as expected. I think cleancache looks more promising
> > than this idea. Have you seen Dan's patches? I would suggest discussing
> > this with him.
>
> Please try the patch, I've been using it and it works exactly as
> expected for me. kswapd does the balancing and works well. I've posted
> some data as well.

My local MTA failed to deliver the message, trying again.

--
	Three Cheers,
	Balbir
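On the sysctl question above: the series already routes
min_unmapped_ratio through /proc/sys/vm, and the two heuristics could
follow the same pattern. A sketch of a hypothetical vm_table entry - the
name unmapped_page_ratio, its default, and its bounds are all invented
here for illustration:

int sysctl_unmapped_page_ratio = 16;	/* would replace UNMAPPED_PAGE_RATIO */

static int ratio_min = 1;
static int ratio_max = 100;

/* Hypothetical entry for vm_table[] in kernel/sysctl.c */
{
	.procname	= "unmapped_page_ratio",
	.data		= &sysctl_unmapped_page_ratio,
	.maxlen		= sizeof(sysctl_unmapped_page_ratio),
	.mode		= 0644,
	.proc_handler	= proc_dointvec_minmax,
	.extra1		= &ratio_min,
	.extra2		= &ratio_max,
},

should_balance_unmapped_pages() would then compare against
sysctl_unmapped_page_ratio * zone->min_unmapped_pages, so an admin could
tune the trigger point at runtime instead of recompiling.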
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-11-30 10:16 ` [PATCH 3/3] Provide control over unmapped pages Balbir Singh
                     ` (2 preceding siblings ...)
  2010-12-01  0:14   ` KOSAKI Motohiro
@ 2010-12-01  1:32   ` KAMEZAWA Hiroyuki
  [not found]       ` <20101201051816.GI2746@balbir.in.ibm.com>
  3 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-01  1:32 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Tue, 30 Nov 2010 15:46:31 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Provide control using zone_reclaim() and a boot parameter. The
> code reuses functionality from zone_reclaim() to isolate unmapped
> pages and reclaim them as a priority, ahead of other mapped pages.
>
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> [snip]
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 62b7280..4228da3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1662,6 +1662,9 @@ zonelist_scan:
>  		unsigned long mark;
>  		int ret;
>
> +		if (should_balance_unmapped_pages(zone))
> +			wakeup_kswapd(zone, order);
> +

Hm, I'm not sure what the final vision of this feature is. Can't this
reclaiming be invoked directly by the balloon driver, just before it
calls alloc_page()? Do you need to keep the page cache small even when
there is free memory on the host?

Thanks,
-Kame
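To make the balloon alternative concrete: a rough sketch of what a
balloon-side trigger might look like, reusing the helpers from the patch
above. balloon_alloc_page() is an invented wrapper, not part of any
posted driver:

/*
 * Hypothetical balloon-side hook: before inflating, ask kswapd to
 * trim unmapped page cache so that alloc_page() finds free pages
 * without pushing out mapped memory.  Purely illustrative.
 */
static struct page *balloon_alloc_page(gfp_t gfp_mask)
{
	struct zone *zone;

	/* nudge kswapd in every populated zone that is over its ratio */
	for_each_populated_zone(zone)
		if (should_balance_unmapped_pages(zone))
			wakeup_kswapd(zone, 0);

	return alloc_page(gfp_mask);
}

The difference from the posted patch is the trigger: reclaim would start
when the host (via the balloon) actually asks for memory, rather than
continuously from the guest's allocator path.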
[parent not found: <20101201051816.GI2746@balbir.in.ibm.com>]
* Re: [PATCH 3/3] Provide control over unmapped pages
  [not found] ` <20101201051816.GI2746@balbir.in.ibm.com>
@ 2010-12-01  5:22   ` Balbir Singh
  2010-12-01  5:35     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2010-12-01  5:22 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

* Balbir Singh <balbir@linux.vnet.ibm.com> [2010-12-01 10:48:16]:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 10:32:54]:
>
> [snip]
>
> > > +		if (should_balance_unmapped_pages(zone))
> > > +			wakeup_kswapd(zone, order);
> > > +
> >
> > Hm, I'm not sure what the final vision of this feature is. Can't this
> > reclaiming be invoked directly by the balloon driver, just before it
> > calls alloc_page()?
>
> That is a separate patch; this one takes the boot-parameter-based
> control approach.
>
> > Do you need to keep the page cache small even when there is free
> > memory on the host?
>
> The goal is to avoid duplication; as you know, the page cache fills
> itself to consume as much memory as possible. The host generally does
> not have a lot of free memory in a consolidated environment.

My local MTA failed to deliver the message, trying again.

--
	Three Cheers,
	Balbir
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-12-01  5:22   ` Balbir Singh
@ 2010-12-01  5:35     ` KAMEZAWA Hiroyuki
  2010-12-01  6:40       ` Balbir Singh
  0 siblings, 1 reply; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-01  5:35 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Wed, 1 Dec 2010 10:52:59 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> [snip]
>
> > Do you need to keep the page cache small even when there is free
> > memory on the host?
>
> The goal is to avoid duplication; as you know, the page cache fills
> itself to consume as much memory as possible. The host generally does
> not have a lot of free memory in a consolidated environment.

That's a fair point. But then why does the guest have to do _extra_
work for the host even when the host says nothing? I think triggering
this from the guests themselves is not very good.

Thanks,
-Kame
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-12-01  5:35     ` KAMEZAWA Hiroyuki
@ 2010-12-01  6:40       ` Balbir Singh
  2010-12-01  7:24         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 27+ messages in thread
From: Balbir Singh @ 2010-12-01  6:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2010-12-01 14:35:50]:

> [snip]
>
> That's a fair point. But then why does the guest have to do _extra_
> work for the host even when the host says nothing? I think triggering
> this from the guests themselves is not very good.

As I've mentioned before, it helps for the guest to keep memory free
without taking a large performance hit: the balloon driver can quickly
retrieve this memory if required, or the guest can use it for some
other application or task. The cached data is mostly already present
in the host's page cache.

--
	Three Cheers,
	Balbir
* Re: [PATCH 3/3] Provide control over unmapped pages
  2010-12-01  6:40       ` Balbir Singh
@ 2010-12-01  7:24         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 27+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-01  7:24 UTC (permalink / raw)
  To: balbir; +Cc: linux-mm, Christoph Lameter, akpm, linux-kernel, kvm

On Wed, 1 Dec 2010 12:10:43 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> > That's a fair point. But then why does the guest have to do _extra_
> > work for the host even when the host says nothing? I think triggering
> > this from the guests themselves is not very good.
>
> As I've mentioned before, it helps for the guest to keep memory free
> without taking a large performance hit: the balloon driver can quickly
> retrieve this memory if required, or the guest can use it for some
> other application or task. The cached data is mostly already present
> in the host's page cache.

Why? Are there parameters or stats which show this is _true_? How can
we guarantee it, or show it to users? Please add an interface that
shows the rate of sharing between guest and host. Without it, no admin
will turn this on, because "file cache status on the host" is a black
box for guest admins. I think this patch skips some important steps.

The second point, I suppose, is about reducing total host memory usage
and increasing the number of guests on a host. For that, this feature
is useful only when all guests on a host are friendly and devoted to
the health of the host's memory management, because all of the
configuration must be done in the guest (even though it could just as
well be passed in via qemu's command line). And there is _no_ benefit
for a guest that gives up its resources to help host management,
because there is no guarantee that the dropped caches are in host
memory.

So, for both claims, I want to see an interface that shows the number
of pages shared between host and guests, rather than having to imagine
it.

BTW, I don't like this kind of "please give us your victim, please
please please" logic. The host should be able to "steal" what it wants
by force. Then there should be no visible on/off interface; the VM
firmware should turn this on if the administrator of the host wants
it.

BTW2, please test with some other benchmarks (ones that read file
caches). I don't think a kernel make is a good test for this.

Thanks,
-Kame
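As a concrete example of the cache-reading benchmark BTW2 asks for, a
self-contained sketch (illustrative only, not something posted in this
thread): time repeated sequential reads of the same file, so that later
passes are served from the page cache unless something reclaims it
between passes.

#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static double now(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
	char buf[1 << 16];
	int pass;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	/* pass 0 is cold; later passes are warm unless the cache is reclaimed */
	for (pass = 0; pass < 5; pass++) {
		double t0 = now();
		int fd = open(argv[1], O_RDONLY);
		ssize_t n;

		if (fd < 0) {
			perror("open");
			return 1;
		}
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		close(fd);
		printf("pass %d: %.3f seconds\n", pass, now() - t0);
	}
	return 0;
}

If unmapped_page_control reclaims the cache between passes, the later
passes should degrade back toward disk speed, which would make the cost
of the feature directly visible in a way kernbench does not.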
end of thread, other threads: [~2010-12-05  5:14 UTC | newest]

Thread overview: 27+ messages (links below jump to the message on this page):
2010-11-30 10:14 [PATCH 0/3] Series short description Balbir Singh
2010-11-30 10:15 ` [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA Balbir Singh
2010-11-30 19:18   ` Christoph Lameter
2010-11-30 22:23   ` Andrew Morton
[not found]   ` <20101201043408.GE2746@balbir.in.ibm.com>
2010-12-01  5:21     ` Balbir Singh
2010-11-30 10:15 ` [PATCH 2/3] Refactor zone_reclaim Balbir Singh
2010-11-30 19:19   ` Christoph Lameter
2010-12-01  1:23   ` KAMEZAWA Hiroyuki
[not found]   ` <20101201044634.GF2746@balbir.in.ibm.com>
2010-12-01  5:22     ` Balbir Singh
2010-12-01  8:59       ` KAMEZAWA Hiroyuki
2010-12-01  7:54   ` Minchan Kim
2010-11-30 10:16 ` [PATCH 3/3] Provide control over unmapped pages Balbir Singh
2010-11-30 19:21   ` Christoph Lameter
2010-11-30 22:25   ` Andrew Morton
[not found]     ` <20101201045421.GG2746@balbir.in.ibm.com>
2010-12-01  5:22       ` Balbir Singh
2010-12-01  8:18         ` Minchan Kim
2010-12-01 15:01           ` Christoph Lameter
2010-12-02  1:22           ` KOSAKI Motohiro
2010-12-02  2:50             ` KAMEZAWA Hiroyuki
2010-12-02  7:01           ` Balbir Singh
2010-12-01  0:14   ` KOSAKI Motohiro
[not found]     ` <20101201051632.GH2746@balbir.in.ibm.com>
2010-12-01  5:22       ` Balbir Singh
2010-12-01  1:32   ` KAMEZAWA Hiroyuki
[not found]     ` <20101201051816.GI2746@balbir.in.ibm.com>
2010-12-01  5:22       ` Balbir Singh
2010-12-01  5:35         ` KAMEZAWA Hiroyuki
2010-12-01  6:40           ` Balbir Singh
2010-12-01  7:24             ` KAMEZAWA Hiroyuki