linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Richard Davies <richard@arachsys.com>,
	Shaohua Li <shli@kernel.org>, Rafael Aquini <aquini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hush Bensen <hush.bensen@gmail.com>
Subject: Re: [PATCH 9/9] mm: zone_reclaim: compaction: add compaction to zone_reclaim_mode
Date: Wed, 7 Aug 2013 17:18:37 +0100	[thread overview]
Message-ID: <20130807161837.GW2296@suse.de> (raw)
In-Reply-To: <20130804165526.GG27921@redhat.com>

On Sun, Aug 04, 2013 at 06:55:26PM +0200, Andrea Arcangeli wrote:
> On Fri, Aug 02, 2013 at 06:06:36PM +0200, Andrea Arcangeli wrote:
> > +		need_compaction = false;
> 
> This should be changed to "*need_compaction = false". It's actually a
> cleanup because it's a nooperational change at runtime.
> need_compaction was initialized to false by the only caller so it
> couldn't harm. But it's better to fix it to avoid
> confusion. Alternatively the above line can be dropped entirely but I
> thought it was cleaner to have a defined value as result of the
> function.
> 
> Found by Fengguang kbuild robot.
> 
> A new replacement patch 9/9 is appended below:
> 
> ===
> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: [PATCH] mm: zone_reclaim: compaction: add compaction to
>  zone_reclaim_mode
> 
> This adds compaction to zone_reclaim so THP enabled won't decrease the
> NUMA locality with /proc/sys/vm/zone_reclaim_mode > 0.
> 

That is a light explanation.

> It is important to boot with numa_zonelist_order=n (n means nodes) to
> get more accurate NUMA locality if there are multiple zones per node.
> 

This appears to be an unrelated observation.

> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  include/linux/swap.h |   8 +++-
>  mm/page_alloc.c      |   4 +-
>  mm/vmscan.c          | 111 ++++++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 102 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f2ada36..fedb246 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
>
> <SNIP>
>
> @@ -3549,27 +3567,35 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	return sc.nr_reclaimed >= nr_pages;
>  }
>  
> -int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> +static int zone_reclaim_compact(struct zone *preferred_zone,
> +				struct zone *zone, gfp_t gfp_mask,
> +				unsigned int order,
> +				bool sync_compaction,
> +				bool *need_compaction)
>  {
> -	int node_id;
> -	int ret;
> +	bool contended;
>  
> -	/*
> -	 * Zone reclaim reclaims unmapped file backed pages and
> -	 * slab pages if we are over the defined limits.
> -	 *
> -	 * A small portion of unmapped file backed pages is needed for
> -	 * file I/O otherwise pages read by file I/O will be immediately
> -	 * thrown out if the zone is overallocated. So we do not reclaim
> -	 * if less than a specified percentage of the zone is used by
> -	 * unmapped file backed pages.
> -	 */
> -	if (zone_pagecache_reclaimable(zone) <= zone->min_unmapped_pages &&
> -	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
> -		return ZONE_RECLAIM_FULL;
> +	if (compaction_deferred(preferred_zone, order) ||
> +	    !order ||
> +	    (gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> +		*need_compaction = false;
> +		return COMPACT_SKIPPED;
> +	}
>  
> -	if (zone->all_unreclaimable)
> -		return ZONE_RECLAIM_FULL;
> +	*need_compaction = true;
> +	return compact_zone_order(zone, order,
> +				  gfp_mask,
> +				  sync_compaction,
> +				  &contended);
> +}
> +
> +int zone_reclaim(struct zone *preferred_zone, struct zone *zone,
> +		 gfp_t gfp_mask, unsigned int order,
> +		 unsigned long mark, int classzone_idx, int alloc_flags)
> +{
> +	int node_id;
> +	int ret, c_ret;
> +	bool sync_compaction = false, need_compaction = false;
>  
>  	/*
>  	 * Do not scan if the allocation should not be delayed.
> @@ -3587,7 +3613,56 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	if (node_state(node_id, N_CPU) && node_id != numa_node_id())
>  		return ZONE_RECLAIM_NOSCAN;
>  
> +repeat_compaction:
> +	/*
> +	 * If this allocation may be satisfied by memory compaction,
> +	 * run compaction before reclaim.
> +	 */
> +	c_ret = zone_reclaim_compact(preferred_zone,
> +				     zone, gfp_mask, order,
> +				     sync_compaction,
> +				     &need_compaction);
> +	if (need_compaction &&
> +	    c_ret != COMPACT_SKIPPED &&

need_compaction records whether compaction was attempted or not. Why
not just check for COMPACT_SKIPPED and have compact_zone_order return
COMPACT_SKIPPED if !CONFIG_COMPACTION?

> +	    zone_watermark_ok(zone, order, mark,
> +			      classzone_idx,
> +			      alloc_flags)) {
> +#ifdef CONFIG_COMPACTION
> +		zone->compact_considered = 0;
> +		zone->compact_defer_shift = 0;
> +#endif
> +		return ZONE_RECLAIM_SUCCESS;
> +	}
> +
> +	/*
> +	 * reclaim if compaction failed because not enough memory was
> +	 * available or if compaction didn't run (order 0) or didn't
> +	 * succeed.
> +	 */
>  	ret = __zone_reclaim(zone, gfp_mask, order);
> +	if (ret == ZONE_RECLAIM_SUCCESS) {
> +		if (zone_watermark_ok(zone, order, mark,
> +				      classzone_idx,
> +				      alloc_flags))
> +			return ZONE_RECLAIM_SUCCESS;
> +
> +		/*
> +		 * If compaction run but it was skipped and reclaim was
> +		 * successful keep going.
> +		 */
> +		if (need_compaction && c_ret == COMPACT_SKIPPED) {

And I recognise that you use need_compaction to see if it had been possible
to attempt compaction before but the way this is organise it appears that
it is possible to loop until __zone_reclaim fails which could be a lot
of reclaiming. If compaction always makes an attempt but always fails
(pinned/locked pages, fragmentation etc) then potentially we reclaim the
entire zone. I recognise that a single __zone_reclaim pass may not reclaim
enough pages to allow compaction to succeed so a single pass is not the
answer either but I worry that this will create a variation of the bug
where an excessive amount of memory is reclaimed to satisfy a THP
allocation.

> +			/*
> +			 * If it's ok to wait for I/O we can as well run sync
> +			 * compaction
> +			 */
> +			sync_compaction = !!(zone_reclaim_mode &
> +					     (RECLAIM_WRITE|RECLAIM_SWAP));
> +			cond_resched();
> +			goto repeat_compaction;
> +		}
> +	}
> +	if (need_compaction)
> +		defer_compaction(preferred_zone, order);
>  
>  	if (!ret)
>  		count_vm_event(PGSCAN_ZONE_RECLAIM_FAILED);

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-08-07 16:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-02 16:06 [PATCH 0/9] adding compaction to zone_reclaim_mode v3 Andrea Arcangeli
2013-08-02 16:06 ` [PATCH 1/9] mm: zone_reclaim: remove ZONE_RECLAIM_LOCKED Andrea Arcangeli
2013-08-05 18:30   ` Johannes Weiner
2013-08-07 15:13   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 2/9] mm: zone_reclaim: compaction: scan all memory with /proc/sys/vm/compact_memory Andrea Arcangeli
2013-08-05 18:45   ` Johannes Weiner
2013-08-06  5:50     ` Andrea Arcangeli
2013-08-02 16:06 ` [PATCH 3/9] mm: zone_reclaim: compaction: don't depend on kswapd to invoke reset_isolation_suitable Andrea Arcangeli
2013-08-05 19:32   ` Johannes Weiner
2013-08-02 16:06 ` [PATCH 4/9] mm: zone_reclaim: compaction: reset before initializing the scan cursors Andrea Arcangeli
2013-08-05 19:44   ` Johannes Weiner
2013-08-02 16:06 ` [PATCH 5/9] mm: compaction: don't require high order pages below min wmark Andrea Arcangeli
2013-08-05 18:35   ` Rik van Riel
2013-08-06 13:23   ` Johannes Weiner
2013-08-07 15:42   ` Mel Gorman
2013-08-07 16:14     ` Andrea Arcangeli
2013-08-07 16:47       ` Mel Gorman
2013-08-08  0:59         ` Andrea Arcangeli
2013-08-02 16:06 ` [PATCH 6/9] mm: zone_reclaim: compaction: increase the high order pages in the watermarks Andrea Arcangeli
2013-08-05 18:36   ` Rik van Riel
2013-08-06 16:08   ` Johannes Weiner
2013-08-06 18:52     ` Johannes Weiner
2013-08-07 13:18     ` Andrea Arcangeli
2013-08-07 15:43   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 7/9] mm: zone_reclaim: compaction: export compact_zone_order() Andrea Arcangeli
2013-08-05 18:37   ` Rik van Riel
2013-08-07 15:48   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 8/9] mm: zone_reclaim: after a successful zone_reclaim check the min watermark Andrea Arcangeli
2013-08-05 18:38   ` Rik van Riel
2013-08-07 15:53   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 9/9] mm: zone_reclaim: compaction: add compaction to zone_reclaim_mode Andrea Arcangeli
2013-08-04 16:55   ` Andrea Arcangeli
2013-08-05 18:38     ` Rik van Riel
2013-08-07 16:18     ` Mel Gorman [this message]
2013-08-07 23:48       ` Andrea Arcangeli
2013-08-08  8:22         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130807161837.GW2296@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=hughd@google.com \
    --cc=hush.bensen@gmail.com \
    --cc=jweiner@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=richard@arachsys.com \
    --cc=riel@redhat.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).