From: Wu Fengguang <fengguang.wu@intel.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	"Zhang, Yanmin" <yanmin.zhang@intel.com>,
	"linuxram@us.ibm.com" <linuxram@us.ibm.com>,
	linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/4] Properly account for the number of page cache pages zone_reclaim() can reclaim
Date: Wed, 10 Jun 2009 09:19:39 +0800
Message-ID: <20090610011939.GA5603@localhost>
In-Reply-To: <1244566904-31470-2-git-send-email-mel@csn.ul.ie>

On Wed, Jun 10, 2009 at 01:01:41AM +0800, Mel Gorman wrote:
> On NUMA machines, the administrator can configure zone_reclaim_mode, which
> is a more targeted form of direct reclaim. On machines with large NUMA
> distances, for example, zone_reclaim_mode defaults to 1, meaning that clean
> unmapped pages will be reclaimed if the zone watermarks are not being met.
> 
> There is a heuristic that determines if the scan is worthwhile but the
> problem is that the heuristic is not being properly applied and is basically
> assuming zone_reclaim_mode is 1 if it is enabled.
> 
> Historically, once enabled it was depending on NR_FILE_PAGES which may
> include swapcache pages that the reclaim_mode cannot deal with.  Patch
> vmscan-change-the-number-of-the-unmapped-files-in-zone-reclaim.patch by
> Kosaki Motohiro noted that zone_page_state(zone, NR_FILE_PAGES) included
> pages that were not file-backed such as swapcache and made a calculation
> based on the inactive, active and mapped files. This is far superior
> when zone_reclaim==1 but if RECLAIM_SWAP is set, then NR_FILE_PAGES is a
> reasonable starting figure.
> 
> This patch alters how zone_reclaim() works out how many pages it might be
> able to reclaim given the current reclaim_mode. If RECLAIM_SWAP is set
> in the reclaim_mode it will either consider NR_FILE_PAGES as potential
> candidates or else use NR_{IN}ACTIVE}_PAGES-NR_FILE_MAPPED to discount
> swapcache and other non-file-backed pages.  If RECLAIM_WRITE is not set,
> then NR_FILE_DIRTY number of pages are not candidates. If RECLAIM_SWAP is
> not set, then NR_FILE_MAPPED are not.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Acked-by: Christoph Lameter <cl@linux-foundation.org>
> ---
>  mm/vmscan.c |   52 ++++++++++++++++++++++++++++++++++++++--------------
>  1 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2ddcfc8..2bfc76e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2333,6 +2333,41 @@ int sysctl_min_unmapped_ratio = 1;
>   */
>  int sysctl_min_slab_ratio = 5;
>  
> +static inline unsigned long zone_unmapped_file_pages(struct zone *zone)
> +{
> +	return zone_page_state(zone, NR_INACTIVE_FILE) +
> +		zone_page_state(zone, NR_ACTIVE_FILE) -
> +		zone_page_state(zone, NR_FILE_MAPPED);

This may underflow if too many tmpfs pages are mapped.
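
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. It assumes that mapped tmpfs pages are counted in NR_FILE_MAPPED
without appearing in NR_INACTIVE_FILE or NR_ACTIVE_FILE, so the subtraction
can wrap around. The struct and function names are hypothetical stand-ins
for the zone counters.

```c
/* Hypothetical stand-in for the three per-zone counters involved. */
struct zone_counts {
	unsigned long inactive_file;	/* models NR_INACTIVE_FILE */
	unsigned long active_file;	/* models NR_ACTIVE_FILE */
	unsigned long file_mapped;	/* models NR_FILE_MAPPED */
};

/* Same arithmetic as the quoted helper: wraps if mapped > file LRU pages. */
static unsigned long unmapped_file_pages_raw(const struct zone_counts *z)
{
	return z->inactive_file + z->active_file - z->file_mapped;
}

/* Clamped variant (an assumption, not part of the patch): never below 0. */
static unsigned long unmapped_file_pages_clamped(const struct zone_counts *z)
{
	unsigned long file_lru = z->inactive_file + z->active_file;

	return file_lru > z->file_mapped ? file_lru - z->file_mapped : 0;
}
```

With 150 pages on the file LRU lists and 200 mapped pages, the raw version
returns a huge unsigned value, while the clamped one returns 0.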

> +}
> +
> +/* Work out how many page cache pages we can reclaim in this reclaim_mode */
> +static inline long zone_pagecache_reclaimable(struct zone *zone)
> +{
> +	long nr_pagecache_reclaimable;
> +	long delta = 0;
> +
> +	/*
> +	 * If RECLAIM_SWAP is set, then all file pages are considered
> +	 * potentially reclaimable. Otherwise, we have to worry about
> +	 * pages like swapcache and zone_unmapped_file_pages() provides
> +	 * a better estimate
> +	 */
> +	if (zone_reclaim_mode & RECLAIM_SWAP)
> +		nr_pagecache_reclaimable = zone_page_state(zone, NR_FILE_PAGES);
> +	else
> +		nr_pagecache_reclaimable = zone_unmapped_file_pages(zone);
> +
> +	/* If we can't clean pages, remove dirty pages from consideration */
> +	if (!(zone_reclaim_mode & RECLAIM_WRITE))
> +		delta += zone_page_state(zone, NR_FILE_DIRTY);
> +
> +	/* Beware of double accounting */

The double accounting happens for NR_FILE_MAPPED but not for
NR_FILE_DIRTY (dirty tmpfs pages won't be accounted), so this comment
is more suitable for zone_unmapped_file_pages(). But the double
accounting does affect this abstraction. So a more reasonable
sequence could be to first subtract NR_FILE_DIRTY and then
conditionally subtract NR_FILE_MAPPED?

Or would it be better to introduce a new counter NR_TMPFS_MAPPED to fix
this mess?
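
One possible reading of the first suggestion, as a userspace sketch under
stated assumptions rather than a proposed patch: pick the starting figure
from the reclaim mode, subtract dirty pages first, then subtract mapped
pages only when RECLAIM_SWAP is clear. The MODEL_* flags use the bit values
documented for the zone_reclaim_mode sysctl; the struct and the helper
names are hypothetical.

```c
/* Bit values as documented for the zone_reclaim_mode sysctl. */
#define MODEL_RECLAIM_ZONE	(1 << 0)
#define MODEL_RECLAIM_WRITE	(1 << 1)
#define MODEL_RECLAIM_SWAP	(1 << 2)

/* Hypothetical stand-in for the per-zone counters used by the estimate. */
struct zone_model {
	unsigned long file_pages;	/* models NR_FILE_PAGES */
	unsigned long file_lru;		/* NR_INACTIVE_FILE + NR_ACTIVE_FILE */
	unsigned long file_dirty;	/* models NR_FILE_DIRTY */
	unsigned long file_mapped;	/* models NR_FILE_MAPPED */
};

/* Subtract b from a, stopping at zero instead of wrapping. */
static unsigned long sub_clamped(unsigned long a, unsigned long b)
{
	return a > b ? a - b : 0;
}

static unsigned long pagecache_reclaimable_model(const struct zone_model *z,
						 int mode)
{
	unsigned long nr;

	/* Starting figure depends on whether mapped/swap pages are fair game. */
	nr = (mode & MODEL_RECLAIM_SWAP) ? z->file_pages : z->file_lru;

	/* First: dirty pages are not candidates if we cannot write them. */
	if (!(mode & MODEL_RECLAIM_WRITE))
		nr = sub_clamped(nr, z->file_dirty);

	/* Then, conditionally: mapped pages are not candidates without swap. */
	if (!(mode & MODEL_RECLAIM_SWAP))
		nr = sub_clamped(nr, z->file_mapped);

	return nr;
}
```

Note that each step clamps at zero, whereas the quoted
"if (delta < nr_pagecache_reclaimable)" check leaves the estimate unchanged
when delta is not smaller. Whether clamping is the right behaviour here is
a design choice of this sketch.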

Thanks,
Fengguang

> +	if (delta < nr_pagecache_reclaimable)
> +		nr_pagecache_reclaimable -= delta;
> +
> +	return nr_pagecache_reclaimable;
> +}
> +
>  /*
>   * Try to free up some pages from this zone through reclaim.
>   */
> @@ -2355,7 +2390,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  		.isolate_pages = isolate_pages_global,
>  	};
>  	unsigned long slab_reclaimable;
> -	long nr_unmapped_file_pages;
>  
>  	disable_swap_token();
>  	cond_resched();
> @@ -2368,11 +2402,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	reclaim_state.reclaimed_slab = 0;
>  	p->reclaim_state = &reclaim_state;
>  
> -	nr_unmapped_file_pages = zone_page_state(zone, NR_INACTIVE_FILE) +
> -				 zone_page_state(zone, NR_ACTIVE_FILE) -
> -				 zone_page_state(zone, NR_FILE_MAPPED);
> -
> -	if (nr_unmapped_file_pages > zone->min_unmapped_pages) {
> +	if (zone_pagecache_reclaimable(zone) > zone->min_unmapped_pages) {
>  		/*
>  		 * Free memory by calling shrink zone with increasing
>  		 * priorities until we have enough memory freed.
> @@ -2419,8 +2449,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  {
>  	int node_id;
>  	int ret;
> -	long nr_unmapped_file_pages;
> -	long nr_slab_reclaimable;
>  
>  	/*
>  	 * Zone reclaim reclaims unmapped file backed pages and
> @@ -2432,12 +2460,8 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	 * if less than a specified percentage of the zone is used by
>  	 * unmapped file backed pages.
>  	 */
> -	nr_unmapped_file_pages = zone_page_state(zone, NR_INACTIVE_FILE) +
> -				 zone_page_state(zone, NR_ACTIVE_FILE) -
> -				 zone_page_state(zone, NR_FILE_MAPPED);
> -	nr_slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
> -	if (nr_unmapped_file_pages <= zone->min_unmapped_pages &&
> -	    nr_slab_reclaimable <= zone->min_slab_pages)
> +	if (zone_pagecache_reclaimable(zone) <= zone->min_unmapped_pages &&
> +	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
>  		return 0;
>  
>  	if (zone_is_all_unreclaimable(zone))
> -- 
> 1.5.6.5
