Re: [PATCH 2/2] vmscan: kill shrink_all_zones()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>, "linux-mm" <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] vmscan: kill shrink_all_zones()
Date: Sun, 11 Oct 2009 23:01:29 +0200	[thread overview]
Message-ID: <200910112301.29237.rjw@sisk.pl> (raw)
In-Reply-To: <20091009175559.12B8.A69D9226@jp.fujitsu.com>

On Friday 09 October 2009, KOSAKI Motohiro wrote:
> shrink_all_zone() was introduced by commit d6277db4ab (swsusp: rework
> memory shrinker) for hibernate performance improvement. and sc.swap_cluster_max
> was introduced by commit a06fe4d307 (Speed freeing memory for suspend).
> 
> commit a06fe4d307 said
> 
>     Without the patch:
>     Freed  14600 pages in  1749 jiffies = 32.61 MB/s (Anomolous!)
>     Freed  88563 pages in 14719 jiffies = 23.50 MB/s
>     Freed 205734 pages in 32389 jiffies = 24.81 MB/s
> 
>     With the patch:
>     Freed  68252 pages in   496 jiffies = 537.52 MB/s
>     Freed 116464 pages in   569 jiffies = 798.54 MB/s
>     Freed 209699 pages in   705 jiffies = 1161.89 MB/s
> 
> At that time, their patch was pretty worth. However, Modern Hardware
> trend and recent VM improvement broke its worth. From several reason,
> I think we should remove shrink_all_zones() at all.
> 
> detail:
> 
> 1) Old days, shrink_zone()'s slowness was mainly caused by stupid congestion_wait()
>    at no i/o congestion.
>    but current shrink_zone() is sane, not slow.
> 
> 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
>    fine on numa system.
>    example)
>      System has 4GB memory and each node have 2GB. and hibernate need 1GB.
> 
>      optimal)
> 	steal 500MB from each node.
>      shrink_all_zones)
> 	steal 1GB from node-0.
> 
>    Oh, Cache balancing was broke ;)
>    Unfortunately, Desktop system moved ahead NUMA.
>    (Side note, if hibernate require 2GB, shrink_all_zones() never success)
> 
> 3) if the node has several I/O flighting pages, shrink_all_zones() makes
>    pretty bad result.
> 
>    schenario) hibernate need 1GB
> 
>    1) shrink_all_zones() try to reclaim 1GB from Node-0
>    2) but it only reclaimed 990MB
>    3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1
>    4) it reclaimed 990MB
> 
>    Oh, well. it reclaimed twice much than required.
>    In the other hand, current shrink_zone() has sane baling out logic.
>    then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk.
> 
> 4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only
>    shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal.
> 
>   Throuput comparision
>   ==============================================
>   old		2192.10 MB/s
>   new		2222.22 MB/s
> 
>   ok, it's almost same throuput.
> 
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

I have no objections to any of the two patches, but I think we may want to drop
shrink_all_memory() altogether.  Everything should work without it and the
reason I didn't remove it was because I saw a performance regression on one
system without it.  It may not be worth keeping it, though.

Have you done any tests with shrink_all_memory() removed?

Rafael


> ---
>  mm/vmscan.c |   75 +++++++++++++---------------------------------------------
>  1 files changed, 17 insertions(+), 58 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 80e727d..9f28166 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2130,51 +2130,6 @@ unsigned long global_lru_pages(void)
>  
>  #ifdef CONFIG_HIBERNATION
>  /*
> - * Helper function for shrink_all_memory().  Tries to reclaim 'nr_pages' pages
> - * from LRU lists system-wide, for given pass and priority.
> - *
> - * For pass > 3 we also try to shrink the LRU lists that contain a few pages
> - */
> -static void shrink_all_zones(unsigned long nr_pages, int prio,
> -				      int pass, struct scan_control *sc)
> -{
> -	struct zone *zone;
> -	unsigned long nr_reclaimed = 0;
> -
> -	for_each_populated_zone(zone) {
> -		enum lru_list l;
> -
> -		if (zone_is_all_unreclaimable(zone) && prio != DEF_PRIORITY)
> -			continue;
> -
> -		for_each_evictable_lru(l) {
> -			enum zone_stat_item ls = NR_LRU_BASE + l;
> -			unsigned long lru_pages = zone_page_state(zone, ls);
> -
> -			/* For pass = 0, we don't shrink the active list */
> -			if (pass == 0 && (l == LRU_ACTIVE_ANON ||
> -						l == LRU_ACTIVE_FILE))
> -				continue;
> -
> -			zone->lru[l].nr_saved_scan += (lru_pages >> prio) + 1;
> -			if (zone->lru[l].nr_saved_scan >= nr_pages || pass > 3) {
> -				unsigned long nr_to_scan;
> -
> -				zone->lru[l].nr_saved_scan = 0;
> -				nr_to_scan = min(nr_pages, lru_pages);
> -				nr_reclaimed += shrink_list(l, nr_to_scan, zone,
> -								sc, prio);
> -				if (nr_reclaimed >= nr_pages) {
> -					sc->nr_reclaimed += nr_reclaimed;
> -					return;
> -				}
> -			}
> -		}
> -	}
> -	sc->nr_reclaimed += nr_reclaimed;
> -}
> -
> -/*
>   * Try to free `nr_pages' of memory, system-wide, and return the number of
>   * freed pages.
>   *
> @@ -2188,12 +2143,18 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  	int pass;
>  	struct reclaim_state reclaim_state;
>  	struct scan_control sc = {
> -		.gfp_mask = GFP_KERNEL,
> +		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> +		.may_swap = 0,
>  		.may_unmap = 0,
>  		.may_writepage = 1,
> +		.swap_cluster_max = SWAP_CLUSTER_MAX,
> +		.nr_to_reclaim = nr_pages,
> +		.swappiness = vm_swappiness,
> +		.order = 0,
>  		.isolate_pages = isolate_pages_global,
> -		.nr_reclaimed = 0,
>  	};
> +	struct zonelist * zonelist = node_zonelist(first_online_node,
> +						   GFP_HIGHUSER_MOVABLE);
>  
>  	current->reclaim_state = &reclaim_state;
>  
> @@ -2215,9 +2176,9 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  
>  	/*
>  	 * We try to shrink LRUs in 5 passes:
> -	 * 0 = Reclaim from inactive_list only
> -	 * 1 = Reclaim from active list but don't reclaim mapped
> -	 * 2 = 2nd pass of type 1
> +	 * 0 = Reclaim unmapped pages
> +	 * 1 = 2nd pass of type 0
> +	 * 2 = 3rd pass of type 0
>  	 * 3 = Reclaim mapped (normal reclaim)
>  	 * 4 = 2nd pass of type 3
>  	 */
> @@ -2225,15 +2186,15 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  		int prio;
>  
>  		/* Force reclaiming mapped pages in the passes #3 and #4 */
> -		if (pass > 2)
> +		if (pass > 2) {
>  			sc.may_unmap = 1;
> +			sc.may_swap = 1;
> +		}
>  
>  		for (prio = DEF_PRIORITY; prio >= 0; prio--) {
> -			unsigned long nr_to_scan = nr_pages - sc.nr_reclaimed;
> -
>  			sc.nr_scanned = 0;
> -			sc.swap_cluster_max = nr_to_scan;
> -			shrink_all_zones(nr_to_scan, prio, pass, &sc);
> +
> +			shrink_zones(prio, zonelist, &sc);
>  			if (sc.nr_reclaimed >= nr_pages)
>  				goto out;
>  
> @@ -2243,10 +2204,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  			sc.nr_reclaimed += reclaim_state.reclaimed_slab;
>  			if (sc.nr_reclaimed >= nr_pages)
>  				goto out;
> -
> -			if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
> -				congestion_wait(BLK_RW_ASYNC, HZ / 10);
>  		}
> +		congestion_wait(BLK_RW_ASYNC, HZ / 10);
>  	}
>  
>  	/*
>

WARNING: multiple messages have this Message-ID (diff)

From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>, linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] vmscan: kill shrink_all_zones()
Date: Sun, 11 Oct 2009 23:01:29 +0200	[thread overview]
Message-ID: <200910112301.29237.rjw@sisk.pl> (raw)
In-Reply-To: <20091009175559.12B8.A69D9226@jp.fujitsu.com>

On Friday 09 October 2009, KOSAKI Motohiro wrote:
> shrink_all_zone() was introduced by commit d6277db4ab (swsusp: rework
> memory shrinker) for hibernate performance improvement. and sc.swap_cluster_max
> was introduced by commit a06fe4d307 (Speed freeing memory for suspend).
> 
> commit a06fe4d307 said
> 
>     Without the patch:
>     Freed  14600 pages in  1749 jiffies = 32.61 MB/s (Anomolous!)
>     Freed  88563 pages in 14719 jiffies = 23.50 MB/s
>     Freed 205734 pages in 32389 jiffies = 24.81 MB/s
> 
>     With the patch:
>     Freed  68252 pages in   496 jiffies = 537.52 MB/s
>     Freed 116464 pages in   569 jiffies = 798.54 MB/s
>     Freed 209699 pages in   705 jiffies = 1161.89 MB/s
> 
> At that time, their patch was pretty worth. However, Modern Hardware
> trend and recent VM improvement broke its worth. From several reason,
> I think we should remove shrink_all_zones() at all.
> 
> detail:
> 
> 1) Old days, shrink_zone()'s slowness was mainly caused by stupid congestion_wait()
>    at no i/o congestion.
>    but current shrink_zone() is sane, not slow.
> 
> 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works
>    fine on numa system.
>    example)
>      System has 4GB memory and each node have 2GB. and hibernate need 1GB.
> 
>      optimal)
> 	steal 500MB from each node.
>      shrink_all_zones)
> 	steal 1GB from node-0.
> 
>    Oh, Cache balancing was broke ;)
>    Unfortunately, Desktop system moved ahead NUMA.
>    (Side note, if hibernate require 2GB, shrink_all_zones() never success)
> 
> 3) if the node has several I/O flighting pages, shrink_all_zones() makes
>    pretty bad result.
> 
>    schenario) hibernate need 1GB
> 
>    1) shrink_all_zones() try to reclaim 1GB from Node-0
>    2) but it only reclaimed 990MB
>    3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1
>    4) it reclaimed 990MB
> 
>    Oh, well. it reclaimed twice much than required.
>    In the other hand, current shrink_zone() has sane baling out logic.
>    then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk.
> 
> 4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only
>    shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal.
> 
>   Throuput comparision
>   ==============================================
>   old		2192.10 MB/s
>   new		2222.22 MB/s
> 
>   ok, it's almost same throuput.
> 
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

I have no objections to any of the two patches, but I think we may want to drop
shrink_all_memory() altogether.  Everything should work without it and the
reason I didn't remove it was because I saw a performance regression on one
system without it.  It may not be worth keeping it, though.

Have you done any tests with shrink_all_memory() removed?

Rafael


> ---
>  mm/vmscan.c |   75 +++++++++++++---------------------------------------------
>  1 files changed, 17 insertions(+), 58 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 80e727d..9f28166 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2130,51 +2130,6 @@ unsigned long global_lru_pages(void)
>  
>  #ifdef CONFIG_HIBERNATION
>  /*
> - * Helper function for shrink_all_memory().  Tries to reclaim 'nr_pages' pages
> - * from LRU lists system-wide, for given pass and priority.
> - *
> - * For pass > 3 we also try to shrink the LRU lists that contain a few pages
> - */
> -static void shrink_all_zones(unsigned long nr_pages, int prio,
> -				      int pass, struct scan_control *sc)
> -{
> -	struct zone *zone;
> -	unsigned long nr_reclaimed = 0;
> -
> -	for_each_populated_zone(zone) {
> -		enum lru_list l;
> -
> -		if (zone_is_all_unreclaimable(zone) && prio != DEF_PRIORITY)
> -			continue;
> -
> -		for_each_evictable_lru(l) {
> -			enum zone_stat_item ls = NR_LRU_BASE + l;
> -			unsigned long lru_pages = zone_page_state(zone, ls);
> -
> -			/* For pass = 0, we don't shrink the active list */
> -			if (pass == 0 && (l == LRU_ACTIVE_ANON ||
> -						l == LRU_ACTIVE_FILE))
> -				continue;
> -
> -			zone->lru[l].nr_saved_scan += (lru_pages >> prio) + 1;
> -			if (zone->lru[l].nr_saved_scan >= nr_pages || pass > 3) {
> -				unsigned long nr_to_scan;
> -
> -				zone->lru[l].nr_saved_scan = 0;
> -				nr_to_scan = min(nr_pages, lru_pages);
> -				nr_reclaimed += shrink_list(l, nr_to_scan, zone,
> -								sc, prio);
> -				if (nr_reclaimed >= nr_pages) {
> -					sc->nr_reclaimed += nr_reclaimed;
> -					return;
> -				}
> -			}
> -		}
> -	}
> -	sc->nr_reclaimed += nr_reclaimed;
> -}
> -
> -/*
>   * Try to free `nr_pages' of memory, system-wide, and return the number of
>   * freed pages.
>   *
> @@ -2188,12 +2143,18 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  	int pass;
>  	struct reclaim_state reclaim_state;
>  	struct scan_control sc = {
> -		.gfp_mask = GFP_KERNEL,
> +		.gfp_mask = GFP_HIGHUSER_MOVABLE,
> +		.may_swap = 0,
>  		.may_unmap = 0,
>  		.may_writepage = 1,
> +		.swap_cluster_max = SWAP_CLUSTER_MAX,
> +		.nr_to_reclaim = nr_pages,
> +		.swappiness = vm_swappiness,
> +		.order = 0,
>  		.isolate_pages = isolate_pages_global,
> -		.nr_reclaimed = 0,
>  	};
> +	struct zonelist * zonelist = node_zonelist(first_online_node,
> +						   GFP_HIGHUSER_MOVABLE);
>  
>  	current->reclaim_state = &reclaim_state;
>  
> @@ -2215,9 +2176,9 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  
>  	/*
>  	 * We try to shrink LRUs in 5 passes:
> -	 * 0 = Reclaim from inactive_list only
> -	 * 1 = Reclaim from active list but don't reclaim mapped
> -	 * 2 = 2nd pass of type 1
> +	 * 0 = Reclaim unmapped pages
> +	 * 1 = 2nd pass of type 0
> +	 * 2 = 3rd pass of type 0
>  	 * 3 = Reclaim mapped (normal reclaim)
>  	 * 4 = 2nd pass of type 3
>  	 */
> @@ -2225,15 +2186,15 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  		int prio;
>  
>  		/* Force reclaiming mapped pages in the passes #3 and #4 */
> -		if (pass > 2)
> +		if (pass > 2) {
>  			sc.may_unmap = 1;
> +			sc.may_swap = 1;
> +		}
>  
>  		for (prio = DEF_PRIORITY; prio >= 0; prio--) {
> -			unsigned long nr_to_scan = nr_pages - sc.nr_reclaimed;
> -
>  			sc.nr_scanned = 0;
> -			sc.swap_cluster_max = nr_to_scan;
> -			shrink_all_zones(nr_to_scan, prio, pass, &sc);
> +
> +			shrink_zones(prio, zonelist, &sc);
>  			if (sc.nr_reclaimed >= nr_pages)
>  				goto out;
>  
> @@ -2243,10 +2204,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>  			sc.nr_reclaimed += reclaim_state.reclaimed_slab;
>  			if (sc.nr_reclaimed >= nr_pages)
>  				goto out;
> -
> -			if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
> -				congestion_wait(BLK_RW_ASYNC, HZ / 10);
>  		}
> +		congestion_wait(BLK_RW_ASYNC, HZ / 10);
>  	}
>  
>  	/*
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-10-11 21:00 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-09  8:55 [PATCH 1/2] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim KOSAKI Motohiro
2009-10-09  8:55 ` KOSAKI Motohiro
2009-10-09  8:56 ` [PATCH 2/2] vmscan: kill shrink_all_zones() KOSAKI Motohiro
2009-10-09  8:56   ` KOSAKI Motohiro
2009-10-09 20:34   ` Rik van Riel
2009-10-09 20:34     ` Rik van Riel
2009-10-11 21:01   ` Rafael J. Wysocki [this message]
2009-10-11 21:01     ` Rafael J. Wysocki
2009-10-12  8:12     ` KOSAKI Motohiro
2009-10-12  8:12       ` KOSAKI Motohiro
2009-10-09 20:33 ` [PATCH 1/2] vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Rik van Riel
2009-10-09 20:33   ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200910112301.29237.rjw@sisk.pl \
    --to=rjw@sisk.pl \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.