All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
	Richard Davies <richard@arachsys.com>,
	Shaohua Li <shli@kernel.org>, Rafael Aquini <aquini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hush Bensen <hush.bensen@gmail.com>
Subject: Re: [PATCH 9/9] mm: zone_reclaim: compaction: add compaction to zone_reclaim_mode
Date: Wed, 7 Aug 2013 17:18:37 +0100	[thread overview]
Message-ID: <20130807161837.GW2296@suse.de> (raw)
In-Reply-To: <20130804165526.GG27921@redhat.com>

On Sun, Aug 04, 2013 at 06:55:26PM +0200, Andrea Arcangeli wrote:
> On Fri, Aug 02, 2013 at 06:06:36PM +0200, Andrea Arcangeli wrote:
> > +		need_compaction = false;
> 
> This should be changed to "*need_compaction = false". It's actually a
> cleanup because it's a nooperational change at runtime.
> need_compaction was initialized to false by the only caller so it
> couldn't harm. But it's better to fix it to avoid
> confusion. Alternatively the above line can be dropped entirely but I
> thought it was cleaner to have a defined value as result of the
> function.
> 
> Found by Fengguang kbuild robot.
> 
> A new replacement patch 9/9 is appended below:
> 
> ===
> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: [PATCH] mm: zone_reclaim: compaction: add compaction to
>  zone_reclaim_mode
> 
> This adds compaction to zone_reclaim so THP enabled won't decrease the
> NUMA locality with /proc/sys/vm/zone_reclaim_mode > 0.
> 

That is a light explanation.

> It is important to boot with numa_zonelist_order=n (n means nodes) to
> get more accurate NUMA locality if there are multiple zones per node.
> 

This appears to be an unrelated observation.

> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  include/linux/swap.h |   8 +++-
>  mm/page_alloc.c      |   4 +-
>  mm/vmscan.c          | 111 ++++++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 102 insertions(+), 21 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f2ada36..fedb246 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
>
> <SNIP>
>
> @@ -3549,27 +3567,35 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	return sc.nr_reclaimed >= nr_pages;
>  }
>  
> -int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> +static int zone_reclaim_compact(struct zone *preferred_zone,
> +				struct zone *zone, gfp_t gfp_mask,
> +				unsigned int order,
> +				bool sync_compaction,
> +				bool *need_compaction)
>  {
> -	int node_id;
> -	int ret;
> +	bool contended;
>  
> -	/*
> -	 * Zone reclaim reclaims unmapped file backed pages and
> -	 * slab pages if we are over the defined limits.
> -	 *
> -	 * A small portion of unmapped file backed pages is needed for
> -	 * file I/O otherwise pages read by file I/O will be immediately
> -	 * thrown out if the zone is overallocated. So we do not reclaim
> -	 * if less than a specified percentage of the zone is used by
> -	 * unmapped file backed pages.
> -	 */
> -	if (zone_pagecache_reclaimable(zone) <= zone->min_unmapped_pages &&
> -	    zone_page_state(zone, NR_SLAB_RECLAIMABLE) <= zone->min_slab_pages)
> -		return ZONE_RECLAIM_FULL;
> +	if (compaction_deferred(preferred_zone, order) ||
> +	    !order ||
> +	    (gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> +		*need_compaction = false;
> +		return COMPACT_SKIPPED;
> +	}
>  
> -	if (zone->all_unreclaimable)
> -		return ZONE_RECLAIM_FULL;
> +	*need_compaction = true;
> +	return compact_zone_order(zone, order,
> +				  gfp_mask,
> +				  sync_compaction,
> +				  &contended);
> +}
> +
> +int zone_reclaim(struct zone *preferred_zone, struct zone *zone,
> +		 gfp_t gfp_mask, unsigned int order,
> +		 unsigned long mark, int classzone_idx, int alloc_flags)
> +{
> +	int node_id;
> +	int ret, c_ret;
> +	bool sync_compaction = false, need_compaction = false;
>  
>  	/*
>  	 * Do not scan if the allocation should not be delayed.
> @@ -3587,7 +3613,56 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
>  	if (node_state(node_id, N_CPU) && node_id != numa_node_id())
>  		return ZONE_RECLAIM_NOSCAN;
>  
> +repeat_compaction:
> +	/*
> +	 * If this allocation may be satisfied by memory compaction,
> +	 * run compaction before reclaim.
> +	 */
> +	c_ret = zone_reclaim_compact(preferred_zone,
> +				     zone, gfp_mask, order,
> +				     sync_compaction,
> +				     &need_compaction);
> +	if (need_compaction &&
> +	    c_ret != COMPACT_SKIPPED &&

need_compaction records whether compaction was attempted or not. Why
not just check for COMPACT_SKIPPED and have compact_zone_order return
COMPACT_SKIPPED if !CONFIG_COMPACTION?

> +	    zone_watermark_ok(zone, order, mark,
> +			      classzone_idx,
> +			      alloc_flags)) {
> +#ifdef CONFIG_COMPACTION
> +		zone->compact_considered = 0;
> +		zone->compact_defer_shift = 0;
> +#endif
> +		return ZONE_RECLAIM_SUCCESS;
> +	}
> +
> +	/*
> +	 * reclaim if compaction failed because not enough memory was
> +	 * available or if compaction didn't run (order 0) or didn't
> +	 * succeed.
> +	 */
>  	ret = __zone_reclaim(zone, gfp_mask, order);
> +	if (ret == ZONE_RECLAIM_SUCCESS) {
> +		if (zone_watermark_ok(zone, order, mark,
> +				      classzone_idx,
> +				      alloc_flags))
> +			return ZONE_RECLAIM_SUCCESS;
> +
> +		/*
> +		 * If compaction run but it was skipped and reclaim was
> +		 * successful keep going.
> +		 */
> +		if (need_compaction && c_ret == COMPACT_SKIPPED) {

And I recognise that you use need_compaction to see if it had been possible
to attempt compaction before but the way this is organise it appears that
it is possible to loop until __zone_reclaim fails which could be a lot
of reclaiming. If compaction always makes an attempt but always fails
(pinned/locked pages, fragmentation etc) then potentially we reclaim the
entire zone. I recognise that a single __zone_reclaim pass may not reclaim
enough pages to allow compaction to succeed so a single pass is not the
answer either but I worry that this will create a variation of the bug
where an excessive amount of memory is reclaimed to satisfy a THP
allocation.

> +			/*
> +			 * If it's ok to wait for I/O we can as well run sync
> +			 * compaction
> +			 */
> +			sync_compaction = !!(zone_reclaim_mode &
> +					     (RECLAIM_WRITE|RECLAIM_SWAP));
> +			cond_resched();
> +			goto repeat_compaction;
> +		}
> +	}
> +	if (need_compaction)
> +		defer_compaction(preferred_zone, order);
>  
>  	if (!ret)
>  		count_vm_event(PGSCAN_ZONE_RECLAIM_FAILED);

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2013-08-07 16:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-02 16:06 [PATCH 0/9] adding compaction to zone_reclaim_mode v3 Andrea Arcangeli
2013-08-02 16:06 ` [PATCH 1/9] mm: zone_reclaim: remove ZONE_RECLAIM_LOCKED Andrea Arcangeli
2013-08-05 18:30   ` Johannes Weiner
2013-08-07 15:13   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 2/9] mm: zone_reclaim: compaction: scan all memory with /proc/sys/vm/compact_memory Andrea Arcangeli
2013-08-05 18:45   ` Johannes Weiner
2013-08-06  5:50     ` Andrea Arcangeli
2013-08-02 16:06 ` [PATCH 3/9] mm: zone_reclaim: compaction: don't depend on kswapd to invoke reset_isolation_suitable Andrea Arcangeli
2013-08-05 19:32   ` Johannes Weiner
2013-08-02 16:06 ` [PATCH 4/9] mm: zone_reclaim: compaction: reset before initializing the scan cursors Andrea Arcangeli
2013-08-05 19:44   ` Johannes Weiner
2013-08-02 16:06 ` [PATCH 5/9] mm: compaction: don't require high order pages below min wmark Andrea Arcangeli
2013-08-05 18:35   ` Rik van Riel
2013-08-06 13:23   ` Johannes Weiner
2013-08-07 15:42   ` Mel Gorman
2013-08-07 16:14     ` Andrea Arcangeli
2013-08-07 16:47       ` Mel Gorman
2013-08-08  0:59         ` Andrea Arcangeli
2013-08-02 16:06 ` [PATCH 6/9] mm: zone_reclaim: compaction: increase the high order pages in the watermarks Andrea Arcangeli
2013-08-05 18:36   ` Rik van Riel
2013-08-06 16:08   ` Johannes Weiner
2013-08-06 18:52     ` Johannes Weiner
2013-08-07 13:18     ` Andrea Arcangeli
2013-08-07 15:43   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 7/9] mm: zone_reclaim: compaction: export compact_zone_order() Andrea Arcangeli
2013-08-05 18:37   ` Rik van Riel
2013-08-07 15:48   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 8/9] mm: zone_reclaim: after a successful zone_reclaim check the min watermark Andrea Arcangeli
2013-08-05 18:38   ` Rik van Riel
2013-08-07 15:53   ` Mel Gorman
2013-08-02 16:06 ` [PATCH 9/9] mm: zone_reclaim: compaction: add compaction to zone_reclaim_mode Andrea Arcangeli
2013-08-04 16:55   ` Andrea Arcangeli
2013-08-05 18:38     ` Rik van Riel
2013-08-07 16:18     ` Mel Gorman [this message]
2013-08-07 23:48       ` Andrea Arcangeli
2013-08-08  8:22         ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130807161837.GW2296@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aquini@redhat.com \
    --cc=hughd@google.com \
    --cc=hush.bensen@gmail.com \
    --cc=jweiner@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=richard@arachsys.com \
    --cc=riel@redhat.com \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.