linux-mm.kvack.org archive mirror
* [PATCH] fix get_scan_count for working well with small targets
@ 2011-04-26  9:17 KAMEZAWA Hiroyuki
  2011-04-26 17:36 ` Ying Han
  2011-04-26 20:59 ` Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-26  9:17 UTC (permalink / raw)
  To: linux-mm@kvack.org
  Cc: linux-kernel@vger.kernel.org, nishimura@mxp.nes.nec.co.jp,
	kosaki.motohiro@jp.fujitsu.com, akpm@linux-foundation.org,
	minchan.kim@gmail.com, mgorman@suse.de, Ying Han

At memory reclaim, we determine the number of pages to be scanned
per zone as
	(anon + file) >> priority.
Assume
	scan = (anon + file) >> priority.

If scan < SWAP_CLUSTER_MAX, shrink_list will be skipped for this
priority and the result is no scan.  This has some problems.

  1. This increases priority by 1 without doing any scan.
     To always scan at DEF_PRIORITY, the amount of pages should be larger
     than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
     and the scan will be batched later. (But we lose 1 priority.)
     And if the amount of pages is smaller than 16M, no scan happens at
     priority==0 forever.

  2. If zone->all_unreclaimable==true, the zone is scanned only when
     priority==0. So, x86's ZONE_DMA will never be recovered until the
     user of its pages frees memory by itself.

  3. With memcg, the limit of memory can be small. A small memcg reaches
     priority < DEF_PRIORITY-2 very easily and then needs to call
     wait_iff_congested().
     To do a scan before priority=9, 64MB of memory would be needed.

This patch tries to scan SWAP_CLUSTER_MAX pages by force when

  1. the target is small enough.
  2. it's kswapd or memcg reclaim.

Then we can avoid a rapid priority drop and may be able to recover
all_unreclaimable in small zones.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmscan.c |   31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

Index: memcg/mm/vmscan.c
===================================================================
--- memcg.orig/mm/vmscan.c
+++ memcg/mm/vmscan.c
@@ -1737,6 +1737,16 @@ static void get_scan_count(struct zone *
 	u64 fraction[2], denominator;
 	enum lru_list l;
 	int noswap = 0;
+	int may_noscan = 0;
+
+
+	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
+		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
+	file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
+		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+
+	if (((anon + file) >> priority) < SWAP_CLUSTER_MAX)
+		may_noscan = 1;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
 	if (!sc->may_swap || (nr_swap_pages <= 0)) {
@@ -1747,11 +1757,6 @@ static void get_scan_count(struct zone *
 		goto out;
 	}
 
-	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
-		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
-	file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
-		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
-
 	if (scanning_global_lru(sc)) {
 		free  = zone_page_state(zone, NR_FREE_PAGES);
 		/* If we have very few page cache pages,
@@ -1814,10 +1819,26 @@ out:
 		unsigned long scan;
 
 		scan = zone_nr_lru_pages(zone, sc, l);
+
 		if (priority || noswap) {
 			scan >>= priority;
 			scan = div64_u64(scan * fraction[file], denominator);
 		}
+
+		if (!scan &&
+		    may_noscan &&
+		    (current_is_kswapd() || !scanning_global_lru(sc))) {
+			/*
+			 * If we do a target scan, the whole amount of memory
+			 * can be too small to scan at a low priority value.
+			 * This raises priority rapidly without any scan.
+			 * Avoid that and do some scan.
+			 */
+			if (file)
+				scan = SWAP_CLUSTER_MAX;
+			else if (!noswap && (fraction[anon] > fraction[file]*16))
+				scan = SWAP_CLUSTER_MAX;
+		}
 		nr[l] = nr_scan_try_batch(scan,
 					  &reclaim_stat->nr_saved_scan[l]);
 	}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] fix get_scan_count for working well with small targets
  2011-04-26  9:17 [PATCH] fix get_scan_count for working well with small targets KAMEZAWA Hiroyuki
@ 2011-04-26 17:36 ` Ying Han
  2011-04-26 23:58   ` KAMEZAWA Hiroyuki
  2011-04-26 20:59 ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Ying Han @ 2011-04-26 17:36 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	akpm@linux-foundation.org, minchan.kim@gmail.com, mgorman@suse.de


On Tue, Apr 26, 2011 at 2:17 AM, KAMEZAWA Hiroyuki <
kamezawa.hiroyu@jp.fujitsu.com> wrote:

> At memory reclaim, we determine the number of pages to be scanned
> per zone as
>        (anon + file) >> priority.
> Assume
>        scan = (anon + file) >> priority.
>
> If scan < SWAP_CLUSTER_MAX, shrink_list will be skipped for this
> priority and the result is no scan.  This has some problems.
>
>  1. This increases priority by 1 without doing any scan.
>     To always scan at DEF_PRIORITY, the amount of pages should be larger
>     than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
>     and the scan will be batched later. (But we lose 1 priority.)
>     And if the amount of pages is smaller than 16M, no scan happens at
>     priority==0 forever.
>
>  2. If zone->all_unreclaimable==true, the zone is scanned only when
>     priority==0. So, x86's ZONE_DMA will never be recovered until the
>     user of its pages frees memory by itself.
>
>  3. With memcg, the limit of memory can be small. A small memcg reaches
>     priority < DEF_PRIORITY-2 very easily and then needs to call
>     wait_iff_congested().
>     To do a scan before priority=9, 64MB of memory would be needed.
>
> This patch tries to scan SWAP_CLUSTER_MAX pages by force when
>
>  1. the target is small enough.
>  2. it's kswapd or memcg reclaim.
>
> Then we can avoid a rapid priority drop and may be able to recover
> all_unreclaimable in small zones.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> ---
>  mm/vmscan.c |   31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> Index: memcg/mm/vmscan.c
> ===================================================================
> --- memcg.orig/mm/vmscan.c
> +++ memcg/mm/vmscan.c
> @@ -1737,6 +1737,16 @@ static void get_scan_count(struct zone *
>        u64 fraction[2], denominator;
>        enum lru_list l;
>        int noswap = 0;
> +       int may_noscan = 0;
> +
> +
>
extra line?


> +       anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
> +               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
> +       file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
> +               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
> +
> +       if (((anon + file) >> priority) < SWAP_CLUSTER_MAX)
> +               may_noscan = 1;
>
>        /* If we have no swap space, do not bother scanning anon pages. */
>        if (!sc->may_swap || (nr_swap_pages <= 0)) {
> @@ -1747,11 +1757,6 @@ static void get_scan_count(struct zone *
>                goto out;
>        }
>
> -       anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
> -               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
> -       file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
> -               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
> -
>        if (scanning_global_lru(sc)) {
>                free  = zone_page_state(zone, NR_FREE_PAGES);
>                /* If we have very few page cache pages,
> @@ -1814,10 +1819,26 @@ out:
>                unsigned long scan;
>
>                scan = zone_nr_lru_pages(zone, sc, l);
> +
>
extra line?

>                if (priority || noswap) {
>                        scan >>= priority;
>                        scan = div64_u64(scan * fraction[file],
> denominator);
>                }
> +
> +               if (!scan &&
> +                   may_noscan &&
> +                   (current_is_kswapd() || !scanning_global_lru(sc))) {
> +                       /*
> +                        * If we do a target scan, the whole amount of memory
> +                        * can be too small to scan at a low priority value.
> +                        * This raises priority rapidly without any scan.
> +                        * Avoid that and do some scan.
> +                        */
> +                       if (file)
> +                               scan = SWAP_CLUSTER_MAX;
> +                       else if (!noswap && (fraction[anon] >
> fraction[file]*16))
> +                               scan = SWAP_CLUSTER_MAX;
> +               }
>
Ok, so we are changing the global kswapd, and the per-memcg bg and direct
reclaim, both. Just to be clear here.
Also, how did we calculate the "16" used as the anon vs file fraction threshold?

               nr[l] = nr_scan_try_batch(scan,
>                                          &reclaim_stat->nr_saved_scan[l]);
>        }
>
> Thank you

--Ying



* Re: [PATCH] fix get_scan_count for working well with small targets
  2011-04-26  9:17 [PATCH] fix get_scan_count for working well with small targets KAMEZAWA Hiroyuki
  2011-04-26 17:36 ` Ying Han
@ 2011-04-26 20:59 ` Andrew Morton
  2011-04-26 23:46   ` KAMEZAWA Hiroyuki
  2011-04-27  1:50   ` [PATCH v2] " KAMEZAWA Hiroyuki
  1 sibling, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2011-04-26 20:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	minchan.kim@gmail.com, mgorman@suse.de, Ying Han

On Tue, 26 Apr 2011 18:17:24 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> At memory reclaim, we determine the number of pages to be scanned
> per zone as
> 	(anon + file) >> priority.
> Assume 
> 	scan = (anon + file) >> priority.
> 
> If scan < SWAP_CLUSTER_MAX, shrink_list will be skipped for this
> priority and the result is no scan.  This has some problems.
> 
>   1. This increases priority by 1 without doing any scan.
>      To always scan at DEF_PRIORITY, the amount of pages should be larger
>      than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
>      and the scan will be batched later. (But we lose 1 priority.)
>      And if the amount of pages is smaller than 16M, no scan happens at
>      priority==0 forever.
> 
>   2. If zone->all_unreclaimable==true, the zone is scanned only when
>      priority==0. So, x86's ZONE_DMA will never be recovered until the
>      user of its pages frees memory by itself.
> 
>   3. With memcg, the limit of memory can be small. A small memcg reaches
>      priority < DEF_PRIORITY-2 very easily and then needs to call
>      wait_iff_congested().
>      To do a scan before priority=9, 64MB of memory would be needed.
> 
>   This patch tries to scan SWAP_CLUSTER_MAX pages by force when
> 
>   1. the target is small enough.
>   2. it's kswapd or memcg reclaim.
> 
> Then we can avoid a rapid priority drop and may be able to recover
> all_unreclaimable in small zones.

What about simply removing the nr_saved_scan logic and permitting small
scans?  That simplifies the code and I bet it makes no measurable
performance difference.

(A good thing to do here would be to instrument the code and determine
the frequency with which we perform short scans, as well as their
shortness.  ie: a histogram).



* Re: [PATCH] fix get_scan_count for working well with small targets
  2011-04-26 20:59 ` Andrew Morton
@ 2011-04-26 23:46   ` KAMEZAWA Hiroyuki
  2011-04-27  1:50   ` [PATCH v2] " KAMEZAWA Hiroyuki
  1 sibling, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-26 23:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	minchan.kim@gmail.com, mgorman@suse.de, Ying Han

On Tue, 26 Apr 2011 13:59:34 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 26 Apr 2011 18:17:24 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > At memory reclaim, we determine the number of pages to be scanned
> > per zone as
> > 	(anon + file) >> priority.
> > Assume 
> > 	scan = (anon + file) >> priority.
> > 
> > If scan < SWAP_CLUSTER_MAX, shrink_list will be skipped for this
> > priority and the result is no scan.  This has some problems.
> > 
> >   1. This increases priority by 1 without doing any scan.
> >      To always scan at DEF_PRIORITY, the amount of pages should be larger
> >      than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
> >      and the scan will be batched later. (But we lose 1 priority.)
> >      And if the amount of pages is smaller than 16M, no scan happens at
> >      priority==0 forever.
> > 
> >   2. If zone->all_unreclaimable==true, the zone is scanned only when
> >      priority==0. So, x86's ZONE_DMA will never be recovered until the
> >      user of its pages frees memory by itself.
> > 
> >   3. With memcg, the limit of memory can be small. A small memcg reaches
> >      priority < DEF_PRIORITY-2 very easily and then needs to call
> >      wait_iff_congested().
> >      To do a scan before priority=9, 64MB of memory would be needed.
> > 
> > This patch tries to scan SWAP_CLUSTER_MAX pages by force when
> > 
> >   1. the target is small enough.
> >   2. it's kswapd or memcg reclaim.
> > 
> > Then we can avoid a rapid priority drop and may be able to recover
> > all_unreclaimable in small zones.
> 
> What about simply removing the nr_saved_scan logic and permitting small
> scans?  That simplifies the code and I bet it makes no measurable
> performance difference.
> 

When I considered memcg, I thought of that. But I noticed ZONE_DMA will not
be scanned even if we do so (and zone->all_unreclaimable will not be recovered
until someone frees its pages by himself.)

> (A good thing to do here would be to instrument the code and determine
> the frequency with which we perform short scans, as well as their
> shortness.  ie: a histogram).
> 

With memcg, I hope we can scan SWAP_CLUSTER_MAX pages always, at least.
Considering a bad case such as
  - the memory cgroup is small, the system is swapless, and the file cache is small,
always doing a SWAP_CLUSTER_MAX file cache scan seems to make sense to me.

Thanks,
-Kame


* Re: [PATCH] fix get_scan_count for working well with small targets
  2011-04-26 17:36 ` Ying Han
@ 2011-04-26 23:58   ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-26 23:58 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	akpm@linux-foundation.org, minchan.kim@gmail.com, mgorman@suse.de

On Tue, 26 Apr 2011 10:36:51 -0700
Ying Han <yinghan@google.com> wrote:

> On Tue, Apr 26, 2011 at 2:17 AM, KAMEZAWA Hiroyuki <
> kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > At memory reclaim, we determine the number of pages to be scanned
> > per zone as
> >        (anon + file) >> priority.
> > Assume
> >        scan = (anon + file) >> priority.
> >
> > If scan < SWAP_CLUSTER_MAX, shrink_list will be skipped for this
> > priority and the result is no scan.  This has some problems.
> >
> >  1. This increases priority by 1 without doing any scan.
> >     To always scan at DEF_PRIORITY, the amount of pages should be larger
> >     than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
> >     and the scan will be batched later. (But we lose 1 priority.)
> >     And if the amount of pages is smaller than 16M, no scan happens at
> >     priority==0 forever.
> >
> >  2. If zone->all_unreclaimable==true, the zone is scanned only when
> >     priority==0. So, x86's ZONE_DMA will never be recovered until the
> >     user of its pages frees memory by itself.
> >
> >  3. With memcg, the limit of memory can be small. A small memcg reaches
> >     priority < DEF_PRIORITY-2 very easily and then needs to call
> >     wait_iff_congested().
> >     To do a scan before priority=9, 64MB of memory would be needed.
> >
> > This patch tries to scan SWAP_CLUSTER_MAX pages by force when
> >
> >  1. the target is small enough.
> >  2. it's kswapd or memcg reclaim.
> >
> > Then we can avoid a rapid priority drop and may be able to recover
> > all_unreclaimable in small zones.
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> > ---
> >  mm/vmscan.c |   31 ++++++++++++++++++++++++++-----
> >  1 file changed, 26 insertions(+), 5 deletions(-)
> >
> > Index: memcg/mm/vmscan.c
> > ===================================================================
> > --- memcg.orig/mm/vmscan.c
> > +++ memcg/mm/vmscan.c
> > @@ -1737,6 +1737,16 @@ static void get_scan_count(struct zone *
> >        u64 fraction[2], denominator;
> >        enum lru_list l;
> >        int noswap = 0;
> > +       int may_noscan = 0;
> > +
> > +
> >
> extra line?
> 
will fix.

> 
> > +       anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
> > +               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
> > +       file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
> > +               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
> > +
> > +       if (((anon + file) >> priority) < SWAP_CLUSTER_MAX)
> > +               may_noscan = 1;
> >
> >        /* If we have no swap space, do not bother scanning anon pages. */
> >        if (!sc->may_swap || (nr_swap_pages <= 0)) {
> > @@ -1747,11 +1757,6 @@ static void get_scan_count(struct zone *
> >                goto out;
> >        }
> >
> > -       anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
> > -               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
> > -       file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
> > -               zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
> > -
> >        if (scanning_global_lru(sc)) {
> >                free  = zone_page_state(zone, NR_FREE_PAGES);
> >                /* If we have very few page cache pages,
> > @@ -1814,10 +1819,26 @@ out:
> >                unsigned long scan;
> >
> >                scan = zone_nr_lru_pages(zone, sc, l);
> > +
> >
> extra line?
> 
will fix.

> >                if (priority || noswap) {
> >                        scan >>= priority;
> >                        scan = div64_u64(scan * fraction[file],
> > denominator);
> >                }
> > +
> > +               if (!scan &&
> > +                   may_noscan &&
> > +                   (current_is_kswapd() || !scanning_global_lru(sc))) {
> > +                       /*
> > +                        * If we do a target scan, the whole amount of memory
> > +                        * can be too small to scan at a low priority value.
> > +                        * This raises priority rapidly without any scan.
> > +                        * Avoid that and do some scan.
> > +                        */
> > +                       if (file)
> > +                               scan = SWAP_CLUSTER_MAX;
> > +                       else if (!noswap && (fraction[anon] >
> > fraction[file]*16))
> > +                               scan = SWAP_CLUSTER_MAX;
> > +               }
> >
> Ok, so we are changing the global kswapd, and the per-memcg bg and direct
> reclaim, both. Just to be clear here.

and softlimit reclaim.

> Also, how did we calculate the "16" used as the anon vs file fraction threshold?
> 
I intended it to mean that the file cache is lower than 5-6% of the scan target.

With the current implementation, which has been in use for a long time, we made
no swapouts because we did no scan. After this change, we may do swapouts that
were unseen before... I felt that would be a regression. This check is only for
very small zones or small memcgs. So, I thought it was ok to limit scanning of
anon to only when we need it.

Thanks,
-Kame


* [PATCH v2] fix get_scan_count for working well with small targets
  2011-04-26 20:59 ` Andrew Morton
  2011-04-26 23:46   ` KAMEZAWA Hiroyuki
@ 2011-04-27  1:50   ` KAMEZAWA Hiroyuki
  2011-04-27  3:09     ` KAMEZAWA Hiroyuki
  2011-04-27  5:08     ` Minchan Kim
  1 sibling, 2 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-27  1:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	minchan.kim@gmail.com, mgorman@suse.de, Ying Han

On Tue, 26 Apr 2011 13:59:34 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> What about simply removing the nr_saved_scan logic and permitting small
> scans?  That simplifies the code and I bet it makes no measurable
> performance difference.
> 

ok, v2 here. How does this look?
For memcg, I think I should add select_victim_node() for direct reclaim;
then, we'll be able to tune the case of a big memcg using a small amount of memory on a zone.

==
At memory reclaim, we determine the number of pages to be scanned
per zone as
	(anon + file) >> priority.
Assume
	scan = (anon + file) >> priority.

If scan < SWAP_CLUSTER_MAX, the scan is skipped at this priority
and the priority is raised. This has some problems.

  1. This increases priority by 1 without doing any scan.
     To do a scan at this priority, the amount of pages should be larger
     than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
     and the scan will be batched later. (But we lose 1 priority.)
     And if the amount of pages is smaller than 16M, no scan happens at
     priority==0 forever.

  2. If zone->all_unreclaimable==true, the zone is scanned only when
     priority==0. So, x86's ZONE_DMA will never be recovered until the
     user of its pages frees memory by itself.

  3. With memcg, the limit of memory can be small. A small memcg reaches
     priority < DEF_PRIORITY-2 very easily and then needs to call
     wait_iff_congested().
     To do a scan before priority=9, 64MB of memory would be needed.

Then, this patch tries to scan SWAP_CLUSTER_MAX pages by force when

  1. the target is small enough.
  2. it's kswapd or memcg reclaim.

Then we can avoid a rapid priority drop and may be able to recover
all_unreclaimable in small zones.

Changelog v1->v2:
 - removed nr_scan_try_batch
 - scan anon and file if the target memory is very small.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 mm/vmscan.c |   60 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 27 deletions(-)

Index: memcg/mm/vmscan.c
===================================================================
--- memcg.orig/mm/vmscan.c
+++ memcg/mm/vmscan.c
@@ -1700,26 +1700,6 @@ static unsigned long shrink_list(enum lr
 }
 
 /*
- * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
- * until we collected @swap_cluster_max pages to scan.
- */
-static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
-				       unsigned long *nr_saved_scan)
-{
-	unsigned long nr;
-
-	*nr_saved_scan += nr_to_scan;
-	nr = *nr_saved_scan;
-
-	if (nr >= SWAP_CLUSTER_MAX)
-		*nr_saved_scan = 0;
-	else
-		nr = 0;
-
-	return nr;
-}
-
-/*
  * Determine how aggressively the anon and file LRU lists should be
  * scanned.  The relative value of each set of LRU lists is determined
  * by looking at the fraction of the pages scanned we did rotate back
@@ -1737,6 +1717,22 @@ static void get_scan_count(struct zone *
 	u64 fraction[2], denominator;
 	enum lru_list l;
 	int noswap = 0;
+	int force_scan = 0;
+
+
+	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
+		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
+	file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
+		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
+
+	if (((anon + file) >> priority) < SWAP_CLUSTER_MAX) {
+		/* kswapd does zone balancing and needs to scan this zone */
+		if (scanning_global_lru(sc) && current_is_kswapd())
+			force_scan = 1;
+		/* memcg may have a small limit and needs to avoid priority drops */
+		if (!scanning_global_lru(sc))
+			force_scan = 1;
+	}
 
 	/* If we have no swap space, do not bother scanning anon pages. */
 	if (!sc->may_swap || (nr_swap_pages <= 0)) {
@@ -1747,11 +1743,6 @@ static void get_scan_count(struct zone *
 		goto out;
 	}
 
-	anon  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
-		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
-	file  = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_FILE) +
-		zone_nr_lru_pages(zone, sc, LRU_INACTIVE_FILE);
-
 	if (scanning_global_lru(sc)) {
 		free  = zone_page_state(zone, NR_FREE_PAGES);
 		/* If we have very few page cache pages,
@@ -1818,8 +1809,23 @@ out:
 			scan >>= priority;
 			scan = div64_u64(scan * fraction[file], denominator);
 		}
-		nr[l] = nr_scan_try_batch(scan,
-					  &reclaim_stat->nr_saved_scan[l]);
+
+		/*
+		 * If the zone or the memcg is small, nr[l] can be 0.
+		 * This results in no scan at this priority and a priority drop.
+		 * Global direct reclaim can visit the next zone and tends
+		 * not to have problems. Global kswapd does zone balancing
+		 * and needs to scan even small amounts. With memcg, a
+		 * priority drop can cause big latency, so it's better to
+		 * scan a small amount. See force_scan above.
+		 */
+		if (!scan && force_scan) {
+			if (file)
+				scan = SWAP_CLUSTER_MAX;
+			else if (!noswap)
+				scan = SWAP_CLUSTER_MAX;
+		}
+		nr[l] = scan;
 	}
 }
 



* Re: [PATCH v2] fix get_scan_count for working well with small targets
  2011-04-27  1:50   ` [PATCH v2] " KAMEZAWA Hiroyuki
@ 2011-04-27  3:09     ` KAMEZAWA Hiroyuki
  2011-04-27  5:08     ` Minchan Kim
  1 sibling, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-27  3:09 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	minchan.kim@gmail.com, mgorman@suse.de, Ying Han

On Wed, 27 Apr 2011 10:50:31 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 26 Apr 2011 13:59:34 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > What about simply removing the nr_saved_scan logic and permitting small
> > scans?  That simplifies the code and I bet it makes no measurable
> > performance difference.
> > 
> 
> ok, v2 here. How does this look?
> For memcg, I think I should add select_victim_node() for direct reclaim;
> then, we'll be able to tune the case of a big memcg using a small amount of memory on a zone.
> 


Ah, sorry, this v2 doesn't remove nr_saved_scan from reclaim_stat...
I will send a v3.



* Re: [PATCH v2] fix get_scan_count for working well with small targets
  2011-04-27  1:50   ` [PATCH v2] " KAMEZAWA Hiroyuki
  2011-04-27  3:09     ` KAMEZAWA Hiroyuki
@ 2011-04-27  5:08     ` Minchan Kim
  2011-04-27  5:31       ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2011-04-27  5:08 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	mgorman@suse.de, Ying Han

Hi Kame,

On Wed, Apr 27, 2011 at 10:50 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 26 Apr 2011 13:59:34 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> What about simply removing the nr_saved_scan logic and permitting small
>> scans?  That simplifies the code and I bet it makes no measurable
>> performance difference.
>>
>
> ok, v2 here. How does this look?
> For memcg, I think I should add select_victim_node() for direct reclaim;
> then, we'll be able to tune the case of a big memcg using a small amount of memory on a zone.
>
> ==
> At memory reclaim, we determine the number of pages to be scanned
> per zone as
>        (anon + file) >> priority.
> Assume
>        scan = (anon + file) >> priority.
>
> If scan < SWAP_CLUSTER_MAX, the scan is skipped at this priority
> and the priority is raised. This has some problems.
>
>  1. This increases priority by 1 without doing any scan.
>     To do a scan at this priority, the amount of pages should be larger
>     than 512M. If pages>>priority < SWAP_CLUSTER_MAX, the count is recorded
>     and the scan will be batched later. (But we lose 1 priority.)

Nice catch!  It looks to be a big enhancement.

>     And if the amount of pages is smaller than 16M, no scan happens at
>     priority==0 forever.

Before reviewing the code, I have a question about this.
Now, in case of (priority == 0), we don't do a shift operation with priority.
So nr_saved_scan would be the number of lru list pages, ie, 16M.
Why does no scan happen in the case of (priority == 0 and 16M of lru pages)?
What am I missing?

>
>  2. If zone->all_unreclaimable==true, the zone is scanned only when
>     priority==0. So, x86's ZONE_DMA will never be recovered until the
>     user of its pages frees memory by itself.
>
>  3. With memcg, the limit of memory can be small. A small memcg reaches
>     priority < DEF_PRIORITY-2 very easily and then needs to call
>     wait_iff_congested().
>     To do a scan before priority=9, 64MB of memory would be needed.

It makes sense.



-- 
Kind regards,
Minchan Kim



* Re: [PATCH v2] fix get_scan_count for working well with small targets
  2011-04-27  5:08     ` Minchan Kim
@ 2011-04-27  5:31       ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 9+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-04-27  5:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	nishimura@mxp.nes.nec.co.jp, kosaki.motohiro@jp.fujitsu.com,
	mgorman@suse.de, Ying Han

On Wed, 27 Apr 2011 14:08:18 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Hi Kame,
> 
> On Wed, Apr 27, 2011 at 10:50 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 26 Apr 2011 13:59:34 -0700
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> >> What about simply removing the nr_saved_scan logic and permitting small
> >> scans?  That simplifies the code and I bet it makes no measurable
> >> performance difference.
> >>
> >
> > ok, v2 here. How does this look?
> > For memcg, I think I should add select_victim_node() for direct reclaim;
> > then, we'll be able to tune the case of a big memcg using small memory on a zone.
> >
> > ==
> > At memory reclaim, we determine the number of pages to be scanned
> > per zone as
> >         (anon + file) >> priority.
> > Assume
> >         scan = (anon + file) >> priority.
> >
> > If scan < SWAP_CLUSTER_MAX, the scan will be skipped for this time
> > and priority gets higher. This has some problems.
> >
> >  1. This increases priority by 1 without any scan.
> >     To do a scan at this priority, the amount of pages should be larger than 512M.
> >     If pages>>priority < SWAP_CLUSTER_MAX, it's recorded and the scan will be
> >     batched later. (But we lose 1 priority.)
> 
> Nice catch!  It looks to be a nice enhancement.
> 
> >     But if the amount of pages is smaller than 16M, no scan at priority==0
> >     forever.
> 


> Before reviewing the code, I have a question about this.
> Now, in case of (priority = 0), we don't do shift operation with priority.
> So nr_saved_scan would be the number of lru list pages, i.e., 16M.
> Why no-scan happens in case of (priority == 0 and 16M lru pages)?
> What am I missing now?
> 
Ah, sorry. My comment is wrong: no scan at priority == DEF_PRIORITY.
I'll fix the description.

But....
Now, in direct reclaim path
==
static void shrink_zones(int priority, struct zonelist *zonelist,
                                        struct scan_control *sc)
{
....
                if (scanning_global_lru(sc)) {
                        if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
                                continue;
                        if (zone->all_unreclaimable && priority != DEF_PRIORITY)
                                continue;       /* Let kswapd poll it */
                }
==

And in kswapd path
==
                /*
                 * Scan in the highmem->dma direction for the highest
                 * zone which needs scanning
                 */
                for (i = pgdat->nr_zones - 1; i >= 0; i--) {
                        struct zone *zone = pgdat->node_zones + i;

                        if (!populated_zone(zone))
                                continue;

                        if (zone->all_unreclaimable && priority != DEF_PRIORITY)
                                continue;
....
               for (i = 0; i <= end_zone; i++) {
                        if (zone->all_unreclaimable && priority != DEF_PRIORITY)
                                continue;

==

So, all_unreclaimable zones are only scanned when priority==DEF_PRIORITY.
But at DEF_PRIORITY, the scan count is always zero because of the priority shift.
So, yes, such a zone is never scanned, even after all_unreclaimable is set to true.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-04-27  5:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-26  9:17 [PATCH] fix get_scan_count for working well with small targets KAMEZAWA Hiroyuki
2011-04-26 17:36 ` Ying Han
2011-04-26 23:58   ` KAMEZAWA Hiroyuki
2011-04-26 20:59 ` Andrew Morton
2011-04-26 23:46   ` KAMEZAWA Hiroyuki
2011-04-27  1:50   ` [PATCH v2] " KAMEZAWA Hiroyuki
2011-04-27  3:09     ` KAMEZAWA Hiroyuki
2011-04-27  5:08     ` Minchan Kim
2011-04-27  5:31       ` KAMEZAWA Hiroyuki
