From: Minchan Kim <minchan@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Mel Gorman <mgorman@suse.de>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Michal Nazarewicz <mina86@mina86.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
Zhang Yanfei <zhangyanfei@cn.fujitsu.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 02/13] mm, compaction: defer each zone individually instead of preferred zone
Date: Mon, 23 Jun 2014 11:24:43 +0900 [thread overview]
Message-ID: <20140623022443.GB12413@bbox> (raw)
In-Reply-To: <1403279383-5862-3-git-send-email-vbabka@suse.cz>
On Fri, Jun 20, 2014 at 05:49:32PM +0200, Vlastimil Babka wrote:
> When direct sync compaction is often unsuccessful, it may become deferred for
> some time to avoid further useless attempts, both sync and async. Successful
> high-order allocations un-defer compaction, while further unsuccessful
> compaction attempts prolong the copmaction deferred period.
>
> Currently the checking and setting deferred status is performed only on the
> preferred zone of the allocation that invoked direct compaction. But compaction
> itself is attempted on all eligible zones in the zonelist, so the behavior is
> suboptimal and may lead both to scenarios where 1) compaction is attempted
> uselessly, or 2) where it's not attempted despite good chances of succeeding,
> as shown on the examples below:
>
> 1) A direct compaction with Normal preferred zone failed and set deferred
> compaction for the Normal zone. Another unrelated direct compaction with
> DMA32 as preferred zone will attempt to compact DMA32 zone even though
> the first compaction attempt also included DMA32 zone.
>
> In another scenario, compaction with Normal preferred zone failed to compact
> Normal zone, but succeeded in the DMA32 zone, so it will not defer
> compaction. In the next attempt, it will try Normal zone which will fail
> again, instead of skipping Normal zone and trying DMA32 directly.
>
> 2) Kswapd will balance DMA32 zone and reset defer status based on watermarks
> looking good. A direct compaction with preferred Normal zone will skip
> compaction of all zones including DMA32 because Normal was still deferred.
> The allocation might have succeeded in DMA32, but won't.
>
> This patch makes compaction deferring work on individual zone basis instead of
> preferred zone. For each zone, it checks compaction_deferred() to decide if the
> zone should be skipped. If watermarks fail after compacting the zone,
> defer_compaction() is called. The zone where watermarks passed can still be
> deferred when the allocation attempt is unsuccessful. When allocation is
> successful, compaction_defer_reset() is called for the zone containing the
> allocated page. This approach should approximate calling defer_compaction()
> only on zones where compaction was attempted and did not yield allocated page.
> There might be corner cases but that is inevitable as long as the decision
> to stop compacting dues not guarantee that a page will be allocated.
>
> During testing on a two-node machine with a single very small Normal zone on
> node 1, this patch has improved success rates in stress-highalloc mmtests
> benchmark. The success here were previously made worse by commit 3a025760fc
> ("mm: page_alloc: spill to remote nodes before waking kswapd") as kswapd was
> no longer resetting often enough the deferred compaction for the Normal zone,
> and DMA32 zones on both nodes were thus not considered for compaction.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Nice job!
Acked-by: Minchan Kim <minchan@kernel.org>
Below is just nitpick.
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> include/linux/compaction.h | 6 ++++--
> mm/compaction.c | 29 ++++++++++++++++++++++++-----
> mm/page_alloc.c | 33 ++++++++++++++++++---------------
> 3 files changed, 46 insertions(+), 22 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 01e3132..76f9beb 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -22,7 +22,8 @@ extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
> extern int fragmentation_index(struct zone *zone, unsigned int order);
> extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
> int order, gfp_t gfp_mask, nodemask_t *mask,
> - enum migrate_mode mode, bool *contended);
> + enum migrate_mode mode, bool *contended, bool *deferred,
> + struct zone **candidate_zone);
> extern void compact_pgdat(pg_data_t *pgdat, int order);
> extern void reset_isolation_suitable(pg_data_t *pgdat);
> extern unsigned long compaction_suitable(struct zone *zone, int order);
> @@ -91,7 +92,8 @@ static inline bool compaction_restarting(struct zone *zone, int order)
> #else
> static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
> int order, gfp_t gfp_mask, nodemask_t *nodemask,
> - enum migrate_mode mode, bool *contended)
> + enum migrate_mode mode, bool *contended, bool *deferred,
> + struct zone **candidate_zone)
> {
> return COMPACT_CONTINUE;
> }
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5175019..7c491d0 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1122,13 +1122,15 @@ int sysctl_extfrag_threshold = 500;
> * @nodemask: The allowed nodes to allocate from
> * @mode: The migration mode for async, sync light, or sync migration
> * @contended: Return value that is true if compaction was aborted due to lock contention
> - * @page: Optionally capture a free page of the requested order during compaction
> + * @deferred: Return value that is true if compaction was deferred in all zones
> + * @candidate_zone: Return the zone where we think allocation should succeed
> *
> * This is the main entry point for direct page compaction.
> */
> unsigned long try_to_compact_pages(struct zonelist *zonelist,
> int order, gfp_t gfp_mask, nodemask_t *nodemask,
> - enum migrate_mode mode, bool *contended)
> + enum migrate_mode mode, bool *contended, bool *deferred,
> + struct zone **candidate_zone)
> {
> enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> int may_enter_fs = gfp_mask & __GFP_FS;
> @@ -1142,8 +1144,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
> if (!order || !may_enter_fs || !may_perform_io)
> return rc;
>
> - count_compact_event(COMPACTSTALL);
> -
> + *deferred = true;
> #ifdef CONFIG_CMA
> if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> alloc_flags |= ALLOC_CMA;
> @@ -1153,16 +1154,34 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
> nodemask) {
> int status;
>
> + if (compaction_deferred(zone, order))
> + continue;
> +
> + *deferred = false;
> +
> status = compact_zone_order(zone, order, gfp_mask, mode,
> contended);
> rc = max(status, rc);
>
> /* If a normal allocation would succeed, stop compacting */
> if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0,
> - alloc_flags))
> + alloc_flags)) {
> + *candidate_zone = zone;
> break;
> + } else if (mode != MIGRATE_ASYNC) {
> + /*
> + * We think that allocation won't succeed in this zone
> + * so we defer compaction there. If it ends up
> + * succeeding after all, it will be reset.
> + */
> + defer_compaction(zone, order);
> + }
> }
>
> + /* If at least one zone wasn't deferred, we count a compaction stall */
I like positive sentence.
/* Once we tried compaction a zone at least, let's count a compaction stall */
> + if (!*deferred)
> + count_compact_event(COMPACTSTALL);
> +
> return rc;
> }
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee92384..6593f79 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2238,18 +2238,17 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> bool *contended_compaction, bool *deferred_compaction,
> unsigned long *did_some_progress)
> {
> - if (!order)
> - return NULL;
> + struct zone *last_compact_zone = NULL;
>
> - if (compaction_deferred(preferred_zone, order)) {
> - *deferred_compaction = true;
> + if (!order)
> return NULL;
> - }
>
> current->flags |= PF_MEMALLOC;
> *did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
> nodemask, mode,
> - contended_compaction);
> + contended_compaction,
> + deferred_compaction,
> + &last_compact_zone);
> current->flags &= ~PF_MEMALLOC;
>
> if (*did_some_progress != COMPACT_SKIPPED) {
> @@ -2263,27 +2262,31 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> order, zonelist, high_zoneidx,
> alloc_flags & ~ALLOC_NO_WATERMARKS,
> preferred_zone, classzone_idx, migratetype);
> +
> if (page) {
> - preferred_zone->compact_blockskip_flush = false;
> - compaction_defer_reset(preferred_zone, order, true);
> + struct zone *zone = page_zone(page);
> +
> + zone->compact_blockskip_flush = false;
> + compaction_defer_reset(zone, order, true);
> count_vm_event(COMPACTSUCCESS);
> return page;
> }
>
> /*
> + * last_compact_zone is where try_to_compact_pages thought
> + * allocation should succeed, so it did not defer compaction.
> + * But now we know that it didn't succeed, so we do the defer.
> + */
> + if (last_compact_zone && mode != MIGRATE_ASYNC)
> + defer_compaction(last_compact_zone, order);
> +
> + /*
> * It's bad if compaction run occurs and fails.
> * The most likely reason is that pages exist,
> * but not enough to satisfy watermarks.
> */
> count_vm_event(COMPACTFAIL);
>
> - /*
> - * As async compaction considers a subset of pageblocks, only
> - * defer if the failure was a sync compaction failure.
> - */
> - if (mode != MIGRATE_ASYNC)
> - defer_compaction(preferred_zone, order);
> -
> cond_resched();
> }
>
> --
> 1.8.4.5
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Kind regards,
Minchan Kim
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-06-23 2:23 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-20 15:49 [PATCH v3 00/13] compaction: balancing overhead and success rates Vlastimil Babka
2014-06-20 15:49 ` [PATCH v3 01/13] mm, THP: don't hold mmap_sem in khugepaged when allocating THP Vlastimil Babka
2014-06-20 17:45 ` Kirill A. Shutemov
2014-06-23 5:39 ` Zhang Yanfei
2014-06-23 9:52 ` Vlastimil Babka
2014-06-23 10:40 ` Zhang Yanfei
2014-06-20 15:49 ` [PATCH v3 02/13] mm, compaction: defer each zone individually instead of preferred zone Vlastimil Babka
2014-06-23 2:24 ` Minchan Kim [this message]
2014-06-23 6:26 ` Zhang Yanfei
2014-06-24 8:23 ` Joonsoo Kim
2014-06-24 15:29 ` Vlastimil Babka
2014-06-25 1:02 ` Joonsoo Kim
2014-06-20 15:49 ` [PATCH v3 03/13] mm, compaction: do not recheck suitable_migration_target under lock Vlastimil Babka
2014-06-20 15:49 ` [PATCH v3 04/13] mm, compaction: move pageblock checks up from isolate_migratepages_range() Vlastimil Babka
2014-06-23 6:57 ` Zhang Yanfei
2014-06-24 4:52 ` Naoya Horiguchi
2014-06-24 15:34 ` Vlastimil Babka
2014-06-24 16:58 ` Naoya Horiguchi
2014-06-25 8:50 ` Vlastimil Babka
2014-06-25 15:46 ` Naoya Horiguchi
2014-06-24 8:33 ` Joonsoo Kim
2014-06-24 15:42 ` Vlastimil Babka
2014-06-25 0:53 ` Joonsoo Kim
2014-06-25 8:59 ` Vlastimil Babka
2014-06-27 5:57 ` Joonsoo Kim
2014-06-20 15:49 ` [PATCH v3 05/13] mm, compaction: report compaction as contended only due to lock contention Vlastimil Babka
2014-06-23 1:39 ` Minchan Kim
2014-06-23 8:55 ` Zhang Yanfei
2014-06-23 23:35 ` Minchan Kim
2014-06-24 1:07 ` Zhang Yanfei
2014-07-11 8:28 ` Vlastimil Babka
2014-07-11 9:38 ` Vlastimil Babka
2014-06-20 15:49 ` [PATCH v3 06/13] mm, compaction: periodically drop lock and restore IRQs in scanners Vlastimil Babka
2014-06-23 2:53 ` Minchan Kim
2014-07-11 12:03 ` Vlastimil Babka
2014-06-23 9:13 ` Zhang Yanfei
2014-06-24 15:39 ` Naoya Horiguchi
2014-06-24 15:44 ` Vlastimil Babka
2014-06-20 15:49 ` [PATCH v3 07/13] mm, compaction: skip rechecks when lock was already held Vlastimil Babka
2014-06-23 9:16 ` Zhang Yanfei
2014-06-24 18:55 ` Naoya Horiguchi
2014-06-20 15:49 ` [PATCH v3 08/13] mm, compaction: remember position within pageblock in free pages scanner Vlastimil Babka
2014-06-23 3:04 ` Minchan Kim
2014-06-23 9:17 ` Zhang Yanfei
2014-06-24 19:09 ` Naoya Horiguchi
2014-06-20 15:49 ` [PATCH v3 09/13] mm, compaction: skip buddy pages by their order in the migrate scanner Vlastimil Babka
2014-06-23 3:05 ` Minchan Kim
2014-06-23 9:29 ` Zhang Yanfei
2014-06-20 15:49 ` [PATCH v3 10/13] mm: rename allocflags_to_migratetype for clarity Vlastimil Babka
2014-06-24 20:34 ` Naoya Horiguchi
2014-06-20 15:49 ` [PATCH v3 11/13] mm, compaction: pass gfp mask to compact_control Vlastimil Babka
2014-06-23 3:06 ` Minchan Kim
2014-06-23 9:31 ` Zhang Yanfei
2014-06-20 15:49 ` [PATCH v3 12/13] mm, compaction: try to capture the just-created high-order freepage Vlastimil Babka
2014-06-25 1:57 ` Naoya Horiguchi
2014-06-25 8:57 ` Vlastimil Babka
2014-06-20 15:49 ` [RFC PATCH v3 13/13] mm, compaction: do not migrate pages when that cannot satisfy page fault allocation Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140623022443.GB12413@bbox \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=cl@linux.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mina86@mina86.com \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=vbabka@suse.cz \
--cc=zhangyanfei@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).