From: Vlastimil Babka <vbabka@suse.cz>
To: Joonsoo Kim <js1304@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
David Rientjes <rientjes@google.com>,
Minchan Kim <minchan@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH v3 6/7] mm/compaction: introduce migration scan limit
Date: Mon, 14 Dec 2015 10:34:50 +0100 [thread overview]
Message-ID: <566E8D3A.5030804@suse.cz> (raw)
In-Reply-To: <1449126681-19647-7-git-send-email-iamjoonsoo.kim@lge.com>
On 12/03/2015 08:11 AM, Joonsoo Kim wrote:
> This is preparation step to replace compaction deferring with compaction
> limit. Whole reason why we need to replace it will be mentioned in
> the following patch.
>
> In this patch, migration_scan_limit is assigned and accounted, but, not
> checked to finish. So, there is no functional change.
>
> Currently, amount of migration_scan_limit is chosen to imitate compaction
> deferring logic. We can tune it easily if overhead looks insane, but,
> it would be further work.
> Also, amount of migration_scan_limit is adapted by compact_defer_shift.
> More fails increase compact_defer_shift and this will limit compaction
> more.
>
> There are two interesting changes. One is that cached pfn is always
> updated while limit is activated. Otherwise, we would scan same range
> over and over. Second one is that async compaction is skipped while
> limit is activated, for algorithm correctness. Until now, even if
> failure case, sync compaction continue to work when both scanner is met
> so COMPACT_COMPLETE usually happens in sync compaction. But, limit is
> applied, sync compaction is finished if limit is exhausted so
> COMPACT_COMPLETE usually happens in async compaction. Because we don't
> consider async COMPACT_COMPLETE as actual fail while we reset cached
> scanner pfn
I don't see where compaction being sync/async applies to "reset cached
scanner pfn". I assume you actually meant the call to defer_compaction()
in try_to_compact_pages, which only happens for async compaction?
> defer mechanism doesn't work well. And, async compaction
> would not be easy to succeed in this case so skipping async compaction
> doesn't result in much difference.
So, the alternative to avoiding async compaction would be to call
defer_compaction() also when async compaction completes, right? Which
doesn't sound as scary when deferring isn't an on/off thing, but applies
a limit.
This would also help the issue with THP fault compactions being only
async and thus never deferring anything, which I think showed itself in
Aaron's reports. This current patch wouldn't help there I think, as
without sync compaction the system would never start to apply the limit
in the first place, and would be stuck with the ill-defined contended
compaction detection based on need_resched etc.
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/compaction.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--------
> mm/internal.h | 1 +
> 2 files changed, 78 insertions(+), 11 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1a75a6e..b23f6d9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -116,6 +116,67 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>
> #ifdef CONFIG_COMPACTION
>
> +/*
> + * order == -1 is expected when compacting via
> + * /proc/sys/vm/compact_memory
> + */
> +static inline bool is_via_compact_memory(int order)
> +{
> + return order == -1;
> +}
> +
> +#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
> +
> +static bool excess_migration_scan_limit(struct compact_control *cc)
> +{
> + /* Disable scan limit for now */
> + return false;
> +}
> +
> +static void set_migration_scan_limit(struct compact_control *cc)
> +{
> + struct zone *zone = cc->zone;
> + int order = cc->order;
> + unsigned long limit = zone->managed_pages;
> +
> + cc->migration_scan_limit = LONG_MAX;
> + if (is_via_compact_memory(order))
> + return;
> +
> + if (order < zone->compact_order_failed)
> + return;
> +
> + if (!zone->compact_defer_shift)
> + return;
> +
> + /*
> + * Do not allow async compaction during limit work. In this case,
> + * async compaction would not be easy to succeed and we need to
> + * ensure that COMPACT_COMPLETE occurs by sync compaction for
> + * algorithm correctness and prevention of async compaction will
> + * lead it.
> + */
> + if (cc->mode == MIGRATE_ASYNC) {
> + cc->migration_scan_limit = -1;
> + return;
> + }
> +
> + /* Migration scanner usually scans less than 1/4 pages */
> + limit >>= 2;
> +
> + /*
> + * Deferred compaction restart compaction every 64 compaction
> + * attempts and it rescans whole zone range. To imitate it,
> + * we set limit to 1/64 of scannable range.
> + */
> + limit >>= 6;
> +
> + /* Degradation scan limit according to defer shift */
> + limit >>= zone->compact_defer_shift;
> +
> + cc->migration_scan_limit = max(limit, COMPACT_MIN_SCAN_LIMIT);
> +}
> +
> /* Do not skip compaction more than 64 times */
> #define COMPACT_MAX_DEFER_SHIFT 6
>
> @@ -263,10 +324,15 @@ static void update_pageblock_skip(struct compact_control *cc,
> if (!page)
> return;
>
> - if (nr_isolated)
> + /*
> + * Always update cached_pfn if compaction has scan_limit,
> + * otherwise we would scan same range over and over.
> + */
> + if (cc->migration_scan_limit == LONG_MAX && nr_isolated)
> return;
>
> - set_pageblock_skip(page);
> + if (!nr_isolated)
> + set_pageblock_skip(page);
>
> /* Update where async and sync compaction should restart */
> if (migrate_scanner) {
> @@ -822,6 +888,8 @@ isolate_success:
> if (locked)
> spin_unlock_irqrestore(&zone->lru_lock, flags);
>
> + cc->migration_scan_limit -= nr_scanned;
> +
> trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
> nr_scanned, nr_isolated);
>
> @@ -1186,15 +1254,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
>
> -/*
> - * order == -1 is expected when compacting via
> - * /proc/sys/vm/compact_memory
> - */
> -static inline bool is_via_compact_memory(int order)
> -{
> - return order == -1;
> -}
> -
> static int __compact_finished(struct zone *zone, struct compact_control *cc,
> const int migratetype)
> {
> @@ -1224,6 +1283,9 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> if (is_via_compact_memory(cc->order))
> return COMPACT_CONTINUE;
>
> + if (excess_migration_scan_limit(cc))
> + return COMPACT_PARTIAL;
> +
> /* Compaction run is not finished if the watermark is not met */
> watermark = low_wmark_pages(zone);
>
> @@ -1382,6 +1444,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> }
> cc->last_migrated_pfn = 0;
>
> + set_migration_scan_limit(cc);
> + if (excess_migration_scan_limit(cc))
> + return COMPACT_SKIPPED;
> +
> trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync);
>
> diff --git a/mm/internal.h b/mm/internal.h
> index dbe0436..bb8225c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -164,6 +164,7 @@ struct compact_control {
> unsigned long free_pfn; /* isolate_freepages search base */
> unsigned long migrate_pfn; /* isolate_migratepages search base */
> unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
> + long migration_scan_limit; /* Limit migration scanner activity */
> enum migrate_mode mode; /* Async or sync migration mode */
> bool ignore_skip_hint; /* Scan blocks even if marked skip */
> int order; /* order a direct compactor needs */
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
WARNING: multiple messages have this Message-ID (diff)
From: Vlastimil Babka <vbabka@suse.cz>
To: Joonsoo Kim <js1304@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
David Rientjes <rientjes@google.com>,
Minchan Kim <minchan@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH v3 6/7] mm/compaction: introduce migration scan limit
Date: Mon, 14 Dec 2015 10:34:50 +0100 [thread overview]
Message-ID: <566E8D3A.5030804@suse.cz> (raw)
In-Reply-To: <1449126681-19647-7-git-send-email-iamjoonsoo.kim@lge.com>
On 12/03/2015 08:11 AM, Joonsoo Kim wrote:
> This is preparation step to replace compaction deferring with compaction
> limit. Whole reason why we need to replace it will be mentioned in
> the following patch.
>
> In this patch, migration_scan_limit is assigned and accounted, but, not
> checked to finish. So, there is no functional change.
>
> Currently, amount of migration_scan_limit is chosen to imitate compaction
> deferring logic. We can tune it easily if overhead looks insane, but,
> it would be further work.
> Also, amount of migration_scan_limit is adapted by compact_defer_shift.
> More fails increase compact_defer_shift and this will limit compaction
> more.
>
> There are two interesting changes. One is that cached pfn is always
> updated while limit is activated. Otherwise, we would scan same range
> over and over. Second one is that async compaction is skipped while
> limit is activated, for algorithm correctness. Until now, even if
> failure case, sync compaction continue to work when both scanner is met
> so COMPACT_COMPLETE usually happens in sync compaction. But, limit is
> applied, sync compaction is finished if limit is exhausted so
> COMPACT_COMPLETE usually happens in async compaction. Because we don't
> consider async COMPACT_COMPLETE as actual fail while we reset cached
> scanner pfn
I don't see where compaction being sync/async applies to "reset cached
scanner pfn". I assume you actually meant the call to defer_compaction()
in try_to_compact_pages, which only happens for async compaction?
> defer mechanism doesn't work well. And, async compaction
> would not be easy to succeed in this case so skipping async compaction
> doesn't result in much difference.
So, the alternative to avoiding async compaction would be to call
defer_compaction() also when async compaction completes, right? Which
doesn't sound as scary when deferring isn't an on/off thing, but applies
a limit.
This would also help the issue with THP fault compactions being only
async and thus never deferring anything, which I think showed itself in
Aaron's reports. This current patch wouldn't help there I think, as
without sync compaction the system would never start to apply the limit
in the first place, and would be stuck with the ill-defined contended
compaction detection based on need_resched etc.
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/compaction.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++--------
> mm/internal.h | 1 +
> 2 files changed, 78 insertions(+), 11 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 1a75a6e..b23f6d9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -116,6 +116,67 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>
> #ifdef CONFIG_COMPACTION
>
> +/*
> + * order == -1 is expected when compacting via
> + * /proc/sys/vm/compact_memory
> + */
> +static inline bool is_via_compact_memory(int order)
> +{
> + return order == -1;
> +}
> +
> +#define COMPACT_MIN_SCAN_LIMIT (pageblock_nr_pages)
> +
> +static bool excess_migration_scan_limit(struct compact_control *cc)
> +{
> + /* Disable scan limit for now */
> + return false;
> +}
> +
> +static void set_migration_scan_limit(struct compact_control *cc)
> +{
> + struct zone *zone = cc->zone;
> + int order = cc->order;
> + unsigned long limit = zone->managed_pages;
> +
> + cc->migration_scan_limit = LONG_MAX;
> + if (is_via_compact_memory(order))
> + return;
> +
> + if (order < zone->compact_order_failed)
> + return;
> +
> + if (!zone->compact_defer_shift)
> + return;
> +
> + /*
> + * Do not allow async compaction during limit work. In this case,
> + * async compaction would not be easy to succeed and we need to
> + * ensure that COMPACT_COMPLETE occurs by sync compaction for
> + * algorithm correctness and prevention of async compaction will
> + * lead it.
> + */
> + if (cc->mode == MIGRATE_ASYNC) {
> + cc->migration_scan_limit = -1;
> + return;
> + }
> +
> + /* Migration scanner usually scans less than 1/4 pages */
> + limit >>= 2;
> +
> + /*
> + * Deferred compaction restart compaction every 64 compaction
> + * attempts and it rescans whole zone range. To imitate it,
> + * we set limit to 1/64 of scannable range.
> + */
> + limit >>= 6;
> +
> + /* Degradation scan limit according to defer shift */
> + limit >>= zone->compact_defer_shift;
> +
> + cc->migration_scan_limit = max(limit, COMPACT_MIN_SCAN_LIMIT);
> +}
> +
> /* Do not skip compaction more than 64 times */
> #define COMPACT_MAX_DEFER_SHIFT 6
>
> @@ -263,10 +324,15 @@ static void update_pageblock_skip(struct compact_control *cc,
> if (!page)
> return;
>
> - if (nr_isolated)
> + /*
> + * Always update cached_pfn if compaction has scan_limit,
> + * otherwise we would scan same range over and over.
> + */
> + if (cc->migration_scan_limit == LONG_MAX && nr_isolated)
> return;
>
> - set_pageblock_skip(page);
> + if (!nr_isolated)
> + set_pageblock_skip(page);
>
> /* Update where async and sync compaction should restart */
> if (migrate_scanner) {
> @@ -822,6 +888,8 @@ isolate_success:
> if (locked)
> spin_unlock_irqrestore(&zone->lru_lock, flags);
>
> + cc->migration_scan_limit -= nr_scanned;
> +
> trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
> nr_scanned, nr_isolated);
>
> @@ -1186,15 +1254,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
> return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
> }
>
> -/*
> - * order == -1 is expected when compacting via
> - * /proc/sys/vm/compact_memory
> - */
> -static inline bool is_via_compact_memory(int order)
> -{
> - return order == -1;
> -}
> -
> static int __compact_finished(struct zone *zone, struct compact_control *cc,
> const int migratetype)
> {
> @@ -1224,6 +1283,9 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
> if (is_via_compact_memory(cc->order))
> return COMPACT_CONTINUE;
>
> + if (excess_migration_scan_limit(cc))
> + return COMPACT_PARTIAL;
> +
> /* Compaction run is not finished if the watermark is not met */
> watermark = low_wmark_pages(zone);
>
> @@ -1382,6 +1444,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> }
> cc->last_migrated_pfn = 0;
>
> + set_migration_scan_limit(cc);
> + if (excess_migration_scan_limit(cc))
> + return COMPACT_SKIPPED;
> +
> trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
> cc->free_pfn, end_pfn, sync);
>
> diff --git a/mm/internal.h b/mm/internal.h
> index dbe0436..bb8225c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -164,6 +164,7 @@ struct compact_control {
> unsigned long free_pfn; /* isolate_freepages search base */
> unsigned long migrate_pfn; /* isolate_migratepages search base */
> unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
> + long migration_scan_limit; /* Limit migration scanner activity */
> enum migrate_mode mode; /* Async or sync migration mode */
> bool ignore_skip_hint; /* Scan blocks even if marked skip */
> int order; /* order a direct compactor needs */
>
next prev parent reply other threads:[~2015-12-14 9:34 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-03 7:11 [PATCH v3 0/7] mm/compaction: redesign compaction: part1 Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-03 7:11 ` [PATCH v3 1/7] mm/compaction: skip useless pfn when updating cached pfn Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-03 10:36 ` Vlastimil Babka
2015-12-03 10:36 ` Vlastimil Babka
2015-12-07 7:37 ` Joonsoo Kim
2015-12-07 7:37 ` Joonsoo Kim
2015-12-03 7:11 ` [PATCH v3 2/7] mm/compaction: remove unused defer_compaction() in compaction.h Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-04 15:29 ` Vlastimil Babka
2015-12-04 15:29 ` Vlastimil Babka
2015-12-03 7:11 ` [PATCH v3 3/7] mm/compaction: initialize compact_order_failed to MAX_ORDER Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-04 15:36 ` Vlastimil Babka
2015-12-04 15:36 ` Vlastimil Babka
2015-12-03 7:11 ` [PATCH v3 4/7] mm/compaction: update defer counter when allocation is expected to succeed Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-04 16:52 ` Vlastimil Babka
2015-12-04 16:52 ` Vlastimil Babka
2015-12-07 8:03 ` Joonsoo Kim
2015-12-07 8:03 ` Joonsoo Kim
2015-12-03 7:11 ` [PATCH v3 5/7] mm/compaction: respect compaction order when updating defer counter Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-04 17:15 ` Vlastimil Babka
2015-12-04 17:15 ` Vlastimil Babka
2015-12-07 8:04 ` Joonsoo Kim
2015-12-07 8:04 ` Joonsoo Kim
2015-12-03 7:11 ` [PATCH v3 6/7] mm/compaction: introduce migration scan limit Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-14 9:34 ` Vlastimil Babka [this message]
2015-12-14 9:34 ` Vlastimil Babka
2015-12-16 5:39 ` Joonsoo Kim
2015-12-16 5:39 ` Joonsoo Kim
2015-12-03 7:11 ` [PATCH v3 7/7] mm/compaction: replace compaction deferring with compaction limit Joonsoo Kim
2015-12-03 7:11 ` Joonsoo Kim
2015-12-14 9:55 ` Vlastimil Babka
2015-12-14 9:55 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=566E8D3A.5030804@suse.cz \
--to=vbabka@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=js1304@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=minchan@kernel.org \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.