* [PATCH 1/5] mm: compaction: encapsulate defer reset logic
2013-11-25 14:26 [RFC PATCH 0/5] Memory compaction efficiency improvements Vlastimil Babka
@ 2013-11-25 14:26 ` Vlastimil Babka
2013-11-25 22:08 ` Rik van Riel
2013-11-26 10:16 ` Mel Gorman
2013-11-25 14:26 ` [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them Vlastimil Babka
` (4 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Vlastimil Babka @ 2013-11-25 14:26 UTC (permalink / raw)
To: linux-mm; +Cc: Vlastimil Babka, linux-kernel, Mel Gorman, Rik van Riel
Currently there are several functions to manipulate the deferred compaction
state variables. The remaining case where the variables are touched directly
is when a successful allocation occurs in direct compaction, or is expected
to be successful in the future by kswapd. Here, the lowest order that is
expected to fail is updated, and in the case of direct compaction, the deferred
status is reset completely.
Create a new function compaction_defer_reset() to encapsulate this
functionality and make it easier to understand the code. No functional change.
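For context, the existing helpers that the new function complements look
roughly like this (a paraphrased sketch based on include/linux/compaction.h
of this series' baseline; details may differ). The new compaction_defer_reset()
is effectively the inverse of defer_compaction():

/* A failed compaction doubles the number of future attempts to be skipped */
static inline void defer_compaction(struct zone *zone, int order)
{
        zone->compact_considered = 0;
        zone->compact_defer_shift++;

        if (order < zone->compact_order_failed)
                zone->compact_order_failed = order;

        if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
                zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
}

/* Returns true if compaction should be skipped this time */
static inline bool compaction_deferred(struct zone *zone, int order)
{
        unsigned long defer_limit = 1UL << zone->compact_defer_shift;

        if (order < zone->compact_order_failed)
                return false;

        /* Avoid possible overflow */
        if (++zone->compact_considered > defer_limit)
                zone->compact_considered = defer_limit;

        return zone->compact_considered < defer_limit;
}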
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/linux/compaction.h | 12 ++++++++++++
mm/compaction.c | 9 ++++-----
mm/page_alloc.c | 5 +----
3 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 091d72e..da39978 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -62,6 +62,18 @@ static inline bool compaction_deferred(struct zone *zone, int order)
return zone->compact_considered < defer_limit;
}
+/* Update defer tracking counters after successful allocation of given order */
+static inline void compaction_defer_reset(struct zone *zone, int order,
+ bool reset_shift)
+{
+ if (reset_shift) {
+ zone->compact_considered = 0;
+ zone->compact_defer_shift = 0;
+ }
+ if (order >= zone->compact_order_failed)
+ zone->compact_order_failed = order + 1;
+}
+
/* Returns true if restarting compaction after many failures */
static inline bool compaction_restarting(struct zone *zone, int order)
{
diff --git a/mm/compaction.c b/mm/compaction.c
index 805165b..7c0073e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1116,12 +1116,11 @@ static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
compact_zone(zone, cc);
if (cc->order > 0) {
- int ok = zone_watermark_ok(zone, cc->order,
- low_wmark_pages(zone), 0, 0);
- if (ok && cc->order >= zone->compact_order_failed)
- zone->compact_order_failed = cc->order + 1;
+ if (zone_watermark_ok(zone, cc->order,
+ low_wmark_pages(zone), 0, 0))
+ compaction_defer_reset(zone, cc->order, false);
/* Currently async compaction is never deferred. */
- else if (!ok && cc->sync)
+ else if (cc->sync)
defer_compaction(zone, cc->order);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 580a5f0..50c7f67 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2243,10 +2243,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
preferred_zone, migratetype);
if (page) {
preferred_zone->compact_blockskip_flush = false;
- preferred_zone->compact_considered = 0;
- preferred_zone->compact_defer_shift = 0;
- if (order >= preferred_zone->compact_order_failed)
- preferred_zone->compact_order_failed = order + 1;
+ compaction_defer_reset(preferred_zone, order, true);
count_vm_event(COMPACTSUCCESS);
return page;
}
--
1.8.1.4
* Re: [PATCH 1/5] mm: compaction: encapsulate defer reset logic
2013-11-25 14:26 ` [PATCH 1/5] mm: compaction: encapsulate defer reset logic Vlastimil Babka
@ 2013-11-25 22:08 ` Rik van Riel
2013-11-26 10:16 ` Mel Gorman
1 sibling, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2013-11-25 22:08 UTC (permalink / raw)
To: Vlastimil Babka, linux-mm; +Cc: linux-kernel, Mel Gorman
On 11/25/2013 09:26 AM, Vlastimil Babka wrote:
> Currently there are several functions to manipulate the deferred compaction
> state variables. The remaining case where the variables are touched directly
> is when a successful allocation occurs in direct compaction, or is expected
> to be successful in the future by kswapd. Here, the lowest order that is
> expected to fail is updated, and in the case of direct compaction, the deferred
> status is reset completely.
>
> Create a new function compaction_defer_reset() to encapsulate this
> functionality and make it easier to understand the code. No functional change.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
* Re: [PATCH 1/5] mm: compaction: encapsulate defer reset logic
2013-11-25 14:26 ` [PATCH 1/5] mm: compaction: encapsulate defer reset logic Vlastimil Babka
2013-11-25 22:08 ` Rik van Riel
@ 2013-11-26 10:16 ` Mel Gorman
1 sibling, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2013-11-26 10:16 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: linux-mm, linux-kernel, Rik van Riel
On Mon, Nov 25, 2013 at 03:26:06PM +0100, Vlastimil Babka wrote:
> Currently there are several functions to manipulate the deferred compaction
> state variables. The remaining case where the variables are touched directly
> is when a successful allocation occurs in direct compaction, or is expected
> to be successful in the future by kswapd. Here, the lowest order that is
> expected to fail is updated, and in the case of direct compaction, the deferred
> status is reset completely.
>
> Create a new function compaction_defer_reset() to encapsulate this
> functionality and make it easier to understand the code. No functional change.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/compaction.h | 12 ++++++++++++
> mm/compaction.c | 9 ++++-----
> mm/page_alloc.c | 5 +----
> 3 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 091d72e..da39978 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -62,6 +62,18 @@ static inline bool compaction_deferred(struct zone *zone, int order)
> return zone->compact_considered < defer_limit;
> }
>
> +/* Update defer tracking counters after successful allocation of given order */
> +static inline void compaction_defer_reset(struct zone *zone, int order,
> + bool reset_shift)
> +{
> + if (reset_shift) {
> + zone->compact_considered = 0;
> + zone->compact_defer_shift = 0;
> + }
> + if (order >= zone->compact_order_failed)
> + zone->compact_order_failed = order + 1;
> +}
> +
Nitpick:
The comment says this is called after a successful allocation, but that
is only true in one case. s/allocation/compaction/ ?
reset_shift says what it does but not why, and exposes an unnecessary
implementation detail. If this sees a second revision, maybe consider
renaming it to something like alloc_success?
With or without changes;
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them
2013-11-25 14:26 [RFC PATCH 0/5] Memory compaction efficiency improvements Vlastimil Babka
2013-11-25 14:26 ` [PATCH 1/5] mm: compaction: encapsulate defer reset logic Vlastimil Babka
@ 2013-11-25 14:26 ` Vlastimil Babka
2013-11-26 10:23 ` Mel Gorman
2013-11-26 13:16 ` Rik van Riel
2013-11-25 14:26 ` [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages Vlastimil Babka
` (3 subsequent siblings)
5 siblings, 2 replies; 20+ messages in thread
From: Vlastimil Babka @ 2013-11-25 14:26 UTC (permalink / raw)
To: linux-mm; +Cc: Vlastimil Babka, linux-kernel, Mel Gorman, Rik van Riel
Compaction caches pfn's for its migrate and free scanners to avoid scanning
the whole zone each time. In compact_zone(), the cached values are read to
set up initial values for the scanners. There are several situations when
these cached pfn's are reset to the first and last pfn of the zone,
respectively. One of these situations is when a compaction has been deferred
for a zone and is now being restarted during a direct compaction, which is also
done in compact_zone().
However, compact_zone() currently reads the cached pfn's *before* resetting
them. This means the reset doesn't affect the compaction that performs it, and
with a good chance also subsequent compactions, as update_pageblock_skip() is
likely to be called and update the cached pfn's to those being processed.
Another chance for a successful reset is when a direct compaction detects that
the migration and free scanners meet (which has its own problems addressed by
another patch) and sets the compact_blockskip_flush flag, which kswapd acts
upon to do the reset when it goes to sleep.
This is clearly a bug that results in non-deterministic behavior, so this patch
moves the cached pfn reset to be performed *before* the values are read.
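For context, the reset in question amounts to the following (a paraphrased
sketch of __reset_isolation_suitable(); the real function additionally walks
the zone and clears the PG_migrate_skip bit of every pageblock):

static void __reset_isolation_suitable(struct zone *zone)
{
        unsigned long start_pfn = zone->zone_start_pfn;
        unsigned long end_pfn = zone_end_pfn(zone);

        /* The values that compact_zone() must read only *after* the reset */
        zone->compact_cached_migrate_pfn = start_pfn;
        zone->compact_cached_free_pfn = end_pfn;
        zone->compact_blockskip_flush = false;

        /* ... per-pageblock clearing of the skip information follows ... */
}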
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/compaction.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7c0073e..6a2f0c2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -943,6 +943,14 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
}
/*
+ * Clear pageblock skip if there were failures recently and compaction
+ * is about to be retried after being deferred. kswapd does not do
+ * this reset as it'll reset the cached information when going to sleep.
+ */
+ if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
+ __reset_isolation_suitable(zone);
+
+ /*
* Setup to move all movable pages to the end of the zone. Used cached
* information on where the scanners should start but check that it
* is initialised by ensuring the values are within zone boundaries.
@@ -958,14 +966,6 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
zone->compact_cached_migrate_pfn = cc->migrate_pfn;
}
- /*
- * Clear pageblock skip if there were failures recently and compaction
- * is about to be retried after being deferred. kswapd does not do
- * this reset as it'll reset the cached information when going to sleep.
- */
- if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
- __reset_isolation_suitable(zone);
-
migrate_prep_local();
while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
--
1.8.1.4
* Re: [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them
2013-11-25 14:26 ` [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them Vlastimil Babka
@ 2013-11-26 10:23 ` Mel Gorman
2013-11-26 13:16 ` Rik van Riel
1 sibling, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2013-11-26 10:23 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: linux-mm, linux-kernel, Rik van Riel
On Mon, Nov 25, 2013 at 03:26:07PM +0100, Vlastimil Babka wrote:
> Compaction caches pfn's for its migrate and free scanners to avoid scanning
> the whole zone each time. In compact_zone(), the cached values are read to
> set up initial values for the scanners. There are several situations when
> these cached pfn's are reset to the first and last pfn of the zone,
> respectively. One of these situations is when a compaction has been deferred
> for a zone and is now being restarted during a direct compaction, which is also
> done in compact_zone().
>
> However, compact_zone() currently reads the cached pfn's *before* resetting
> them. This means the reset doesn't affect the compaction that performs it, and
> with a good chance also subsequent compactions, as update_pageblock_skip() is
> likely to be called and update the cached pfn's to those being processed.
> Another chance for a successful reset is when a direct compaction detects that
> the migration and free scanners meet (which has its own problems addressed by
> another patch) and sets the compact_blockskip_flush flag, which kswapd acts
> upon to do the reset when it goes to sleep.
>
> This is clearly a bug that results in non-deterministic behavior, so this patch
> moves the cached pfn reset to be performed *before* the values are read.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* Re: [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them
2013-11-25 14:26 ` [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them Vlastimil Babka
2013-11-26 10:23 ` Mel Gorman
@ 2013-11-26 13:16 ` Rik van Riel
1 sibling, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2013-11-26 13:16 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: linux-mm, linux-kernel, Mel Gorman
On 11/25/2013 09:26 AM, Vlastimil Babka wrote:
> Compaction caches pfn's for its migrate and free scanners to avoid scanning
> the whole zone each time. In compact_zone(), the cached values are read to
> set up initial values for the scanners. There are several situations when
> these cached pfn's are reset to the first and last pfn of the zone,
> respectively. One of these situations is when a compaction has been deferred
> for a zone and is now being restarted during a direct compaction, which is also
> done in compact_zone().
>
> However, compact_zone() currently reads the cached pfn's *before* resetting
> them. This means the reset doesn't affect the compaction that performs it, and
> with a good chance also subsequent compactions, as update_pageblock_skip() is
> likely to be called and update the cached pfn's to those being processed.
> Another chance for a successful reset is when a direct compaction detects that
> the migration and free scanners meet (which has its own problems addressed by
> another patch) and sets the compact_blockskip_flush flag, which kswapd acts
> upon to do the reset when it goes to sleep.
>
> This is clearly a bug that results in non-deterministic behavior, so this patch
> moves the cached pfn reset to be performed *before* the values are read.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
--
All rights reversed
* [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages
2013-11-25 14:26 [RFC PATCH 0/5] Memory compaction efficiency improvements Vlastimil Babka
2013-11-25 14:26 ` [PATCH 1/5] mm: compaction: encapsulate defer reset logic Vlastimil Babka
2013-11-25 14:26 ` [PATCH 2/5] mm: compaction: reset cached scanner pfn's before reading them Vlastimil Babka
@ 2013-11-25 14:26 ` Vlastimil Babka
2013-11-26 10:45 ` Mel Gorman
2013-11-25 14:26 ` [PATCH 4/5] mm: compaction: do not mark unmovable pageblocks as skipped in async compaction Vlastimil Babka
` (2 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2013-11-25 14:26 UTC (permalink / raw)
To: linux-mm; +Cc: Vlastimil Babka, linux-kernel, Mel Gorman, Rik van Riel
Compaction of a zone is finished when the migrate scanner (which begins at the
zone's lowest pfn) meets the free page scanner (which begins at the zone's
highest pfn). This is detected in compact_zone() and in the case of direct
compaction, the compact_blockskip_flush flag is set so that kswapd later resets
the cached scanner pfn's, and a new compaction may again start at the zone's
borders.
The meeting of the scanners can happen during either scanner's activity.
However, it may currently fail to be detected when it occurs in the free page
scanner, due to two problems. First, isolate_freepages() keeps free_pfn at the
highest block where it isolated pages from, for the purpose of not missing the
pages that are returned back to the allocator when migration fails. Second,
failing to isolate enough free pages due to scanners meeting results in -ENOMEM
being returned by migrate_pages(), which makes compact_zone() bail out
immediately without calling compact_finished() that would detect scanners
meeting.
This failure to detect scanners meeting might result in repeated attempts at
compaction of a zone that keep starting from the cached pfn's close to the
meeting point, and quickly failing through the -ENOMEM path, without the cached
pfns being reset, over and over. This has been observed (through additional
tracepoints) in the third phase of the mmtests stress-highalloc benchmark, where
the allocator runs on an otherwise idle system. The problem was observed in the
DMA32 zone, which was used as a fallback to the preferred Normal zone, but on
the 4GB system it was actually the largest zone. The problem is even amplified
for such a fallback zone - the deferred compaction logic, which could (after
being fixed by a previous patch) reset the cached scanner pfn's, is only
applied to the preferred zone and not to the fallbacks.
The problem in the third phase of the benchmark was further amplified by commit
81c0a2bb ("mm: page_alloc: fair zone allocator policy") which resulted in a
non-deterministic regression of the allocation success rate from ~85% to ~65%.
This occurs in about half of benchmark runs, making bisection problematic.
It is unlikely that the commit itself is buggy, but it should put more pressure
on the DMA32 zone during phases 1 and 2, which may leave it more fragmented in
phase 3 and expose the bugs that this patch fixes.
The fix is to make scanners meeting in isolate_freepages() stay that way, and
to check in compact_zone() for scanners meeting when migrate_pages() returns
-ENOMEM. The result is that compact_finished() also detects scanners meeting
and sets the compact_blockskip_flush flag to make kswapd reset the scanner
pfn's.
The results in stress-highalloc benchmark show that the "regression" by commit
81c0a2bb in phase 3 no longer occurs, and phase 1 and 2 allocation success rates
are significantly improved.
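For reference, the detection in compact_finished() that this patch makes
reachable again is roughly the following (paraphrased from mm/compaction.c
of this baseline):

        /* Compaction run completes if the migrate and free scanner meet */
        if (cc->free_pfn <= cc->migrate_pfn) {
                /*
                 * Mark that the PG_migrate_skip information should be
                 * cleared by kswapd when it goes to sleep, so that new
                 * compactions may start at the zone boundaries again.
                 */
                if (!current_is_kswapd())
                        zone->compact_blockskip_flush = true;

                return COMPACT_COMPLETE;
        }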
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/compaction.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 6a2f0c2..0702bdf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -656,7 +656,7 @@ static void isolate_freepages(struct zone *zone,
* is the end of the pageblock the migration scanner is using.
*/
pfn = cc->free_pfn;
- low_pfn = cc->migrate_pfn + pageblock_nr_pages;
+ low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
/*
* Take care that if the migration scanner is at the end of the zone
@@ -672,7 +672,7 @@ static void isolate_freepages(struct zone *zone,
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+ for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
pfn -= pageblock_nr_pages) {
unsigned long isolated;
@@ -734,7 +734,14 @@ static void isolate_freepages(struct zone *zone,
/* split_free_page does not map the pages */
map_pages(freelist);
- cc->free_pfn = high_pfn;
+ /*
+ * If we crossed the migrate scanner, we want to keep it that way
+ * so that compact_finished() may detect this
+ */
+ if (pfn < low_pfn)
+ cc->free_pfn = max(pfn, zone->zone_start_pfn);
+ else
+ cc->free_pfn = high_pfn;
cc->nr_freepages = nr_freepages;
}
@@ -999,7 +1006,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (err) {
putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
- if (err == -ENOMEM) {
+ /*
+ * migrate_pages() may return -ENOMEM when scanners meet
+ * and we want compact_finished() to detect it
+ */
+ if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
ret = COMPACT_PARTIAL;
goto out;
}
--
1.8.1.4
* Re: [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages
2013-11-25 14:26 ` [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages Vlastimil Babka
@ 2013-11-26 10:45 ` Mel Gorman
2013-11-26 16:44 ` Vlastimil Babka
0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2013-11-26 10:45 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: linux-mm, linux-kernel, Rik van Riel
On Mon, Nov 25, 2013 at 03:26:08PM +0100, Vlastimil Babka wrote:
> Compaction of a zone is finished when the migrate scanner (which begins at the
> zone's lowest pfn) meets the free page scanner (which begins at the zone's
> highest pfn). This is detected in compact_zone() and in the case of direct
> compaction, the compact_blockskip_flush flag is set so that kswapd later resets
> the cached scanner pfn's, and a new compaction may again start at the zone's
> borders.
>
> The meeting of the scanners can happen during either scanner's activity.
> However, it may currently fail to be detected when it occurs in the free page
> scanner, due to two problems. First, isolate_freepages() keeps free_pfn at the
> highest block where it isolated pages from, for the purposes of not missing the
> pages that are returned back to allocator when migration fails. Second, failing
> to isolate enough free pages due to scanners meeting results in -ENOMEM being
> returned by migrate_pages(), which makes compact_zone() bail out immediately
> without calling compact_finished() that would detect scanners meeting.
>
> This failure to detect scanners meeting might result in repeated attempts at
> compaction of a zone that keep starting from the cached pfn's close to the
> meeting point, and quickly failing through the -ENOMEM path, without the cached
> pfns being reset, over and over. This has been observed (through additional
> tracepoints) in the third phase of the mmtests stress-highalloc benchmark, where
> the allocator runs on an otherwise idle system. The problem was observed in the
> DMA32 zone, which was used as a fallback to the preferred Normal zone, but on
> the 4GB system it was actually the largest zone. The problem is even amplified
> for such fallback zone - the deferred compaction logic, which could (after
> being fixed by a previous patch) reset the cached scanner pfn's, is only
> applied to the preferred zone and not for the fallbacks.
>
> The problem in the third phase of the benchmark was further amplified by commit
> 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which resulted in a
> non-deterministic regression of the allocation success rate from ~85% to ~65%.
> This occurs in about half of benchmark runs, making bisection problematic.
> It is unlikely that the commit itself is buggy, but it should put more pressure
> on the DMA32 zone during phases 1 and 2, which may leave it more fragmented in
> phase 3 and expose the bugs that this patch fixes.
>
> The fix is to make scanners meeting in isolate_freepage() stay that way, and
> to check in compact_zone() for scanners meeting when migrate_pages() returns
> -ENOMEM. The result is that compact_finished() also detects scanners meeting
> and sets the compact_blockskip_flush flag to make kswapd reset the scanner
> pfn's.
>
> The results in stress-highalloc benchmark show that the "regression" by commit
> 81c0a2bb in phase 3 no longer occurs, and phase 1 and 2 allocation success rates
> are significantly improved.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/compaction.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 6a2f0c2..0702bdf 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -656,7 +656,7 @@ static void isolate_freepages(struct zone *zone,
> * is the end of the pageblock the migration scanner is using.
> */
> pfn = cc->free_pfn;
> - low_pfn = cc->migrate_pfn + pageblock_nr_pages;
> + low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
>
> /*
> * Take care that if the migration scanner is at the end of the zone
> @@ -672,7 +672,7 @@ static void isolate_freepages(struct zone *zone,
> * pages on cc->migratepages. We stop searching if the migrate
> * and free page scanners meet or enough free pages are isolated.
> */
> - for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
> + for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
> pfn -= pageblock_nr_pages) {
> unsigned long isolated;
>
> @@ -734,7 +734,14 @@ static void isolate_freepages(struct zone *zone,
> /* split_free_page does not map the pages */
> map_pages(freelist);
>
> - cc->free_pfn = high_pfn;
> + /*
> + * If we crossed the migrate scanner, we want to keep it that way
> + * so that compact_finished() may detect this
> + */
Whitespace damage.
> + if (pfn < low_pfn)
> + cc->free_pfn = max(pfn, zone->zone_start_pfn);
Is it even possible for this condition to occur? low_pfn bound is
cc->migrate_pfn and the free scanner should only start after some pages
have already been isolated for migration.
> + else
> + cc->free_pfn = high_pfn;
> cc->nr_freepages = nr_freepages;
> }
>
> @@ -999,7 +1006,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> if (err) {
> putback_movable_pages(&cc->migratepages);
> cc->nr_migratepages = 0;
> - if (err == -ENOMEM) {
> + /*
> + * migrate_pages() may return -ENOMEM when scanners meet
> + * and we want compact_finished() to detect it
> + */
> + if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
> ret = COMPACT_PARTIAL;
> goto out;
> }
> --
> 1.8.1.4
>
--
Mel Gorman
SUSE Labs
* Re: [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages
2013-11-26 10:45 ` Mel Gorman
@ 2013-11-26 16:44 ` Vlastimil Babka
0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2013-11-26 16:44 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, linux-kernel, Rik van Riel
On 11/26/2013 11:45 AM, Mel Gorman wrote:
> On Mon, Nov 25, 2013 at 03:26:08PM +0100, Vlastimil Babka wrote:
>> Compaction of a zone is finished when the migrate scanner (which begins at the
>> zone's lowest pfn) meets the free page scanner (which begins at the zone's
>> highest pfn). This is detected in compact_zone() and in the case of direct
>> compaction, the compact_blockskip_flush flag is set so that kswapd later resets
>> the cached scanner pfn's, and a new compaction may again start at the zone's
>> borders.
>>
>> The meeting of the scanners can happen during either scanner's activity.
>> However, it may currently fail to be detected when it occurs in the free page
>> scanner, due to two problems. First, isolate_freepages() keeps free_pfn at the
>> highest block where it isolated pages from, for the purposes of not missing the
>> pages that are returned back to allocator when migration fails. Second, failing
>> to isolate enough free pages due to scanners meeting results in -ENOMEM being
>> returned by migrate_pages(), which makes compact_zone() bail out immediately
>> without calling compact_finished() that would detect scanners meeting.
>>
>> This failure to detect scanners meeting might result in repeated attempts at
>> compaction of a zone that keep starting from the cached pfn's close to the
>> meeting point, and quickly failing through the -ENOMEM path, without the cached
>> pfns being reset, over and over. This has been observed (through additional
>> tracepoints) in the third phase of the mmtests stress-highalloc benchmark, where
>> the allocator runs on an otherwise idle system. The problem was observed in the
>> DMA32 zone, which was used as a fallback to the preferred Normal zone, but on
>> the 4GB system it was actually the largest zone. The problem is even amplified
>> for such fallback zone - the deferred compaction logic, which could (after
>> being fixed by a previous patch) reset the cached scanner pfn's, is only
>> applied to the preferred zone and not for the fallbacks.
>>
>> The problem in the third phase of the benchmark was further amplified by commit
>> 81c0a2bb ("mm: page_alloc: fair zone allocator policy") which resulted in a
>> non-deterministic regression of the allocation success rate from ~85% to ~65%.
>> This occurs in about half of benchmark runs, making bisection problematic.
>> It is unlikely that the commit itself is buggy, but it should put more pressure
>> on the DMA32 zone during phases 1 and 2, which may leave it more fragmented in
>> phase 3 and expose the bugs that this patch fixes.
>>
>> The fix is to make scanners meeting in isolate_freepage() stay that way, and
>> to check in compact_zone() for scanners meeting when migrate_pages() returns
>> -ENOMEM. The result is that compact_finished() also detects scanners meeting
>> and sets the compact_blockskip_flush flag to make kswapd reset the scanner
>> pfn's.
>>
>> The results in stress-highalloc benchmark show that the "regression" by commit
>> 81c0a2bb in phase 3 no longer occurs, and phase 1 and 2 allocation success rates
>> are significantly improved.
>>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Rik van Riel <riel@redhat.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>> mm/compaction.c | 19 +++++++++++++++----
>> 1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 6a2f0c2..0702bdf 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -656,7 +656,7 @@ static void isolate_freepages(struct zone *zone,
>> * is the end of the pageblock the migration scanner is using.
>> */
>> pfn = cc->free_pfn;
>> - low_pfn = cc->migrate_pfn + pageblock_nr_pages;
>> + low_pfn = ALIGN(cc->migrate_pfn + 1, pageblock_nr_pages);
>>
>> /*
>> * Take care that if the migration scanner is at the end of the zone
>> @@ -672,7 +672,7 @@ static void isolate_freepages(struct zone *zone,
>> * pages on cc->migratepages. We stop searching if the migrate
>> * and free page scanners meet or enough free pages are isolated.
>> */
>> - for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
>> + for (; pfn >= low_pfn && cc->nr_migratepages > nr_freepages;
>> pfn -= pageblock_nr_pages) {
>> unsigned long isolated;
>>
>> @@ -734,7 +734,14 @@ static void isolate_freepages(struct zone *zone,
>> /* split_free_page does not map the pages */
>> map_pages(freelist);
>>
>> - cc->free_pfn = high_pfn;
>> + /*
>> + * If we crossed the migrate scanner, we want to keep it that way
>> + * so that compact_finished() may detect this
>> + */
>
> Whitespace damage.
Thanks, will fix.
>> + if (pfn < low_pfn)
>> + cc->free_pfn = max(pfn, zone->zone_start_pfn);
>
> Is it even possible for this condition to occur? low_pfn bound is
> cc->migrate_pfn and the free scanner should only start after some pages
> have already been isolated for migration.
If a zone starts in the middle of a pageblock and the migrate scanner
isolates enough pages early to stay within that pageblock, low_pfn will be
at the end of that pageblock, and after the for loop in this function ends,
pfn might be at the beginning of that pageblock. It might not be an actual
problem (this compaction will finish at this point, and anyone racing with
us will probably check the boundaries themselves), but I played it safe.
>> + else
>> + cc->free_pfn = high_pfn;
>> cc->nr_freepages = nr_freepages;
>> }
>>
>> @@ -999,7 +1006,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>> if (err) {
>> putback_movable_pages(&cc->migratepages);
>> cc->nr_migratepages = 0;
>> - if (err == -ENOMEM) {
>> + /*
>> + * migrate_pages() may return -ENOMEM when scanners meet
>> + * and we want compact_finished() to detect it
>> + */
>> + if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
>> ret = COMPACT_PARTIAL;
>> goto out;
>> }
>> --
>> 1.8.1.4
>>
>
* [PATCH 4/5] mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
2013-11-25 14:26 [RFC PATCH 0/5] Memory compaction efficiency improvements Vlastimil Babka
` (2 preceding siblings ...)
2013-11-25 14:26 ` [PATCH 3/5] mm: compaction: detect when scanners meet in isolate_freepages Vlastimil Babka
@ 2013-11-25 14:26 ` Vlastimil Babka
2013-11-26 10:58 ` Mel Gorman
2013-11-25 14:26 ` [PATCH 5/5] mm: compaction: reset scanner positions immediately when they meet Vlastimil Babka
2013-12-04 14:30 ` [PATCH] mm: compaction: Trace compaction begin and end Mel Gorman
5 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2013-11-25 14:26 UTC (permalink / raw)
To: linux-mm; +Cc: Vlastimil Babka, linux-kernel, Mel Gorman, Rik van Riel
Compaction temporarily marks pageblocks where it fails to isolate pages as
to-be-skipped in further compactions, in order to improve efficiency. One of
the reasons to fail isolating pages is that isolation is not attempted in
pageblocks that are not of MIGRATE_MOVABLE (or CMA) type.
The problem is that blocks skipped due to not being MIGRATE_MOVABLE in async
compaction become skipped due to the temporary mark also in future sync
compaction. Moreover, this may follow quite soon during __alloc_page_slowpath,
without much time for kswapd to clear the pageblock skip marks. This goes
against the idea that sync compaction should try to scan these blocks more
thoroughly than the async compaction.
The fix is to ensure in async compaction that these !MIGRATE_MOVABLE blocks are
not marked to be skipped. Note this should not affect performance or locking
impact of further async compactions, as skipping a block due to being
!MIGRATE_MOVABLE is done soon after skipping a block marked to be skipped, both
without locking.
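For reference, the per-pageblock suitability check that async compaction
applies is roughly the following (paraphrased from mm/compaction.c):

/* Returns true if the pageblock should be scanned for pages to isolate */
static inline bool migrate_async_suitable(int migratetype)
{
        return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
}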
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/compaction.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 0702bdf..f481193 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -455,6 +455,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
unsigned long flags;
bool locked = false;
struct page *page = NULL, *valid_page = NULL;
+ bool skipped_unmovable = false;
+
/*
* Ensure that there are not too many pages isolated from the LRU
@@ -530,6 +532,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
if (!cc->sync && last_pageblock_nr != pageblock_nr &&
!migrate_async_suitable(get_pageblock_migratetype(page))) {
cc->finished_update_migrate = true;
+ skipped_unmovable = true;
goto next_pageblock;
}
@@ -624,7 +627,7 @@ next_pageblock:
spin_unlock_irqrestore(&zone->lru_lock, flags);
/* Update the pageblock-skip if the whole pageblock was scanned */
- if (low_pfn == end_pfn)
+ if (low_pfn == end_pfn && !skipped_unmovable)
update_pageblock_skip(cc, valid_page, nr_isolated, true);
trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
--
1.8.1.4
* Re: [PATCH 4/5] mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
2013-11-25 14:26 ` [PATCH 4/5] mm: compaction: do not mark unmovable pageblocks as skipped in async compaction Vlastimil Babka
@ 2013-11-26 10:58 ` Mel Gorman
0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2013-11-26 10:58 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: linux-mm, linux-kernel, Rik van Riel
On Mon, Nov 25, 2013 at 03:26:09PM +0100, Vlastimil Babka wrote:
> Compaction temporarily marks pageblocks where it fails to isolate pages as
> to-be-skipped in further compactions, in order to improve efficiency. One of
> the reasons to fail isolating pages is that isolation is not attempted in
> pageblocks that are not of MIGRATE_MOVABLE (or CMA) type.
>
> The problem is that blocks skipped due to not being MIGRATE_MOVABLE in async
> compaction become skipped due to the temporary mark also in future sync
> compaction. Moreover, this may follow quite soon during __alloc_page_slowpath,
> without much time for kswapd to clear the pageblock skip marks. This goes
> against the idea that sync compaction should try to scan these blocks more
> thoroughly than the async compaction.
>
> The fix is to ensure in async compaction that these !MIGRATE_MOVABLE blocks are
> not marked to be skipped. Note this should not affect performance or locking
> impact of further async compactions, as skipping a block due to being
> !MIGRATE_MOVABLE is done soon after skipping a block marked to be skipped, both
> without locking.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/compaction.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 0702bdf..f481193 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -455,6 +455,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> unsigned long flags;
> bool locked = false;
> struct page *page = NULL, *valid_page = NULL;
> + bool skipped_unmovable = false;
> +
>
whitespace damage.
> /*
> * Ensure that there are not too many pages isolated from the LRU
> @@ -530,6 +532,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
> if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> !migrate_async_suitable(get_pageblock_migratetype(page))) {
> cc->finished_update_migrate = true;
> + skipped_unmovable = true;
> goto next_pageblock;
> }
>
Minor nitpick, but it's also reclaimable and isolate blocks that are
being skipped here. If you do another revision, consider rephrasing
s/unmovable/unsuitable/ where appropriate. It's fairly obvious from
context so if you decide not to, that's fine too.
> @@ -624,7 +627,7 @@ next_pageblock:
> spin_unlock_irqrestore(&zone->lru_lock, flags);
>
> /* Update the pageblock-skip if the whole pageblock was scanned */
> - if (low_pfn == end_pfn)
> + if (low_pfn == end_pfn && !skipped_unmovable)
> update_pageblock_skip(cc, valid_page, nr_isolated, true);
>
This comment is now out of date. If the comment gets updated then feel
free to add
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* [PATCH 5/5] mm: compaction: reset scanner positions immediately when they meet
2013-11-25 14:26 [RFC PATCH 0/5] Memory compaction efficiency improvements Vlastimil Babka
` (3 preceding siblings ...)
2013-11-25 14:26 ` [PATCH 4/5] mm: compaction: do not mark unmovable pageblocks as skipped in async compaction Vlastimil Babka
@ 2013-11-25 14:26 ` Vlastimil Babka
2013-11-26 11:03 ` Mel Gorman
2013-12-04 14:30 ` [PATCH] mm: compaction: Trace compaction begin and end Mel Gorman
5 siblings, 1 reply; 20+ messages in thread
From: Vlastimil Babka @ 2013-11-25 14:26 UTC (permalink / raw)
To: linux-mm; +Cc: Vlastimil Babka, linux-kernel, Mel Gorman, Rik van Riel
Compaction used to start its migrate and free page scanners at the zone's lowest
and highest pfn, respectively. Later, caching was introduced to remember the
scanners' progress across compaction attempts so that pageblocks are not
re-scanned uselessly. Additionally, pageblocks where isolation failed are
marked to be quickly skipped when encountered again in future compactions.
Currently, both the reset of cached pfn's and clearing of the pageblock skip
information for a zone is done in __reset_isolation_suitable(). This function
gets called when:
- compaction is restarting after being deferred
- compact_blockskip_flush flag is set in compact_finished() when the scanners
meet (and not again cleared when direct compaction succeeds in allocation)
and kswapd acts upon this flag before going to sleep
This behavior is suboptimal for several reasons:
- when direct sync compaction is called after async compaction fails (in the
allocation slowpath), it will effectively do nothing, unless kswapd
happens to process the compact_blockskip_flush flag meanwhile. This is racy
and goes against the purpose of sync compaction to more thoroughly retry
the compaction of a zone where async compaction has failed.
The restart-after-deferring path cannot help here as deferring happens only
after the sync compaction fails. It is also done only for the preferred
zone, while the compaction might be done for a fallback zone.
- the mechanism of marking pageblocks to be skipped has little value since the
cached pfn's are reset only together with the pageblock skip flags. This
effectively limits pageblock skip usage to parallel compactions.
This patch changes compact_finished() so that cached pfn's are reset
immediately when the scanners meet. Clearing the pageblock skip flags is
unchanged, as are the other situations where cached pfn's are reset. This
allows the sync-after-async compaction to retry pageblocks not marked as
skipped, such as the !MIGRATE_MOVABLE blocks that async compaction now skips
without marking them.
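For context, the kswapd-driven reset mentioned above (kswapd acting on the
compact_blockskip_flush flag before going to sleep) amounts to the following
(a paraphrased sketch):

void reset_isolation_suitable(pg_data_t *pgdat)
{
        int zoneid;

        for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
                struct zone *zone = &pgdat->node_zones[zoneid];

                if (!populated_zone(zone))
                        continue;

                /* Only flush if a full compaction finished recently */
                if (zone->compact_blockskip_flush)
                        __reset_isolation_suitable(zone);
        }
}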
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/compaction.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index f481193..2c2cc4a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -843,6 +843,10 @@ static int compact_finished(struct zone *zone,
/* Compaction run completes if the migrate and free scanner meet */
if (cc->free_pfn <= cc->migrate_pfn) {
+ /* Let the next compaction start anew. */
+ zone->compact_cached_migrate_pfn = zone->zone_start_pfn;
+ zone->compact_cached_free_pfn = zone_end_pfn(zone);
+
/*
* Mark that the PG_migrate_skip information should be cleared
* by kswapd when it goes to sleep. kswapd does not set the
--
1.8.1.4
* Re: [PATCH 5/5] mm: compaction: reset scanner positions immediately when they meet
2013-11-25 14:26 ` [PATCH 5/5] mm: compaction: reset scanner positions immediately when they meet Vlastimil Babka
@ 2013-11-26 11:03 ` Mel Gorman
0 siblings, 0 replies; 20+ messages in thread
From: Mel Gorman @ 2013-11-26 11:03 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: linux-mm, linux-kernel, Rik van Riel
On Mon, Nov 25, 2013 at 03:26:10PM +0100, Vlastimil Babka wrote:
> Compaction used to start its migrate and free page scaners at the zone's lowest
> and highest pfn, respectively. Later, caching was introduced to remember the
> scanners' progress across compaction attempts so that pageblocks are not
> re-scanned uselessly. Additionally, pageblocks where isolation failed are
> marked to be quickly skipped when encountered again in future compactions.
>
> Currently, both the reset of cached pfn's and clearing of the pageblock skip
> information for a zone is done in __reset_isolation_suitable(). This function
> gets called when:
> - compaction is restarting after being deferred
> - compact_blockskip_flush flag is set in compact_finished() when the scanners
> meet (and not again cleared when direct compaction succeeds in allocation)
> and kswapd acts upon this flag before going to sleep
>
> This behavior is suboptimal for several reasons:
> - when direct sync compaction is called after async compaction fails (in the
> allocation slowpath), it will effectively do nothing, unless kswapd
> happens to process the compact_blockskip_flush flag meanwhile. This is racy
> and goes against the purpose of sync compaction to more thoroughly retry
> the compaction of a zone where async compaction has failed.
> The restart-after-deferring path cannot help here as deferring happens only
> after the sync compaction fails. It is also done only for the preferred
> zone, while the compaction might be done for a fallback zone.
> - the mechanism of marking pageblock to be skipped has little value since the
> cached pfn's are reset only together with the pageblock skip flags. This
> effectively limits pageblock skip usage to parallel compactions.
>
> This patch changes compact_finished() so that cached pfn's are reset
> immediately when the scanners meet. Clearing pageblock skip flags is unchanged,
> as well as the other situations where cached pfn's are reset. This allows the
> sync-after-async compaction to retry pageblocks not marked as skipped, such as
> blocks !MIGRATE_MOVABLE blocks that async compactions now skips without
> marking them.
>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rik van Riel <riel@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* [PATCH] mm: compaction: Trace compaction begin and end
2013-11-25 14:26 [RFC PATCH 0/5] Memory compaction efficiency improvements Vlastimil Babka
` (4 preceding siblings ...)
2013-11-25 14:26 ` [PATCH 5/5] mm: compaction: reset scanner positions immediately when they meet Vlastimil Babka
@ 2013-12-04 14:30 ` Mel Gorman
2013-12-04 14:51 ` Vlastimil Babka
5 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2013-12-04 14:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, Rik van Riel, Vlastimil Babka
This patch adds two tracepoints for compaction begin and end of a zone. Using
this it is possible to calculate how much time a workload is spending
within compaction and potentially debug problems related to cached pfns
for scanning. In combination with the direct reclaim and slab trace points
it should be possible to estimate most allocation-related overhead for
a workload.
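As a usage sketch (assuming tracing is mounted at /sys/kernel/debug/tracing;
the exact path may differ), the new events can be enabled like any other
tracepoints, and the time spent in compaction is then the delta between
paired begin/end timestamps for a task:

echo 1 > /sys/kernel/debug/tracing/events/compaction/mm_compaction_begin/enable
echo 1 > /sys/kernel/debug/tracing/events/compaction/mm_compaction_end/enable
cat /sys/kernel/debug/tracing/trace_pipe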
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/trace/events/compaction.h | 42 +++++++++++++++++++++++++++++++++++++++
mm/compaction.c | 4 ++++
2 files changed, 46 insertions(+)
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index fde1b3e..f4e115a 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -67,6 +67,48 @@ TRACE_EVENT(mm_compaction_migratepages,
__entry->nr_failed)
);
+TRACE_EVENT(mm_compaction_begin,
+ TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
+ unsigned long zone_end, unsigned long free_start),
+
+ TP_ARGS(zone_start, migrate_start, zone_end, free_start),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, zone_start)
+ __field(unsigned long, migrate_start)
+ __field(unsigned long, zone_end)
+ __field(unsigned long, free_start)
+ ),
+
+ TP_fast_assign(
+ __entry->zone_start = zone_start;
+ __entry->migrate_start = migrate_start;
+ __entry->zone_end = zone_end;
+ __entry->free_start = free_start;
+ ),
+
+ TP_printk("zone_start=%lu migrate_start=%lu zone_end=%lu free_start=%lu",
+ __entry->zone_start,
+ __entry->migrate_start,
+ __entry->zone_end,
+ __entry->free_start)
+);
+
+TRACE_EVENT(mm_compaction_end,
+ TP_PROTO(int status),
+
+ TP_ARGS(status),
+
+ TP_STRUCT__entry(
+ __field(int, status)
+ ),
+
+ TP_fast_assign(
+ __entry->status = status;
+ ),
+
+ TP_printk("status=%d", __entry->status)
+);
#endif /* _TRACE_COMPACTION_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index c437893..78ff866 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -960,6 +960,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
__reset_isolation_suitable(zone);
+ trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, end_pfn, cc->free_pfn);
+
migrate_prep_local();
while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
@@ -1005,6 +1007,8 @@ out:
cc->nr_freepages -= release_freepages(&cc->freepages);
VM_BUG_ON(cc->nr_freepages != 0);
+ trace_mm_compaction_end(ret);
+
return ret;
}
--
Mel Gorman
SUSE Labs
* Re: [PATCH] mm: compaction: Trace compaction begin and end
2013-12-04 14:30 ` [PATCH] mm: compaction: Trace compaction begin and end Mel Gorman
@ 2013-12-04 14:51 ` Vlastimil Babka
2013-12-05 9:05 ` Mel Gorman
2013-12-05 9:07 ` [PATCH] mm: compaction: Trace compaction begin and end v2 Mel Gorman
0 siblings, 2 replies; 20+ messages in thread
From: Vlastimil Babka @ 2013-12-04 14:51 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: linux-mm, linux-kernel, Rik van Riel
On 12/04/2013 03:30 PM, Mel Gorman wrote:
> This patch adds two tracepoints for compaction begin and end of a zone. Using
> this it is possible to calculate how much time a workload is spending
> within compaction and potentially debug problems related to cached pfns
> for scanning.
I guess for debugging pfns it would also be useful to print their values
in mm_compaction_end.
> In combination with the direct reclaim and slab trace points
> it should be possible to estimate most allocation-related overhead for
> a workload.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
> include/trace/events/compaction.h | 42 +++++++++++++++++++++++++++++++++++++++
> mm/compaction.c | 4 ++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> index fde1b3e..f4e115a 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -67,6 +67,48 @@ TRACE_EVENT(mm_compaction_migratepages,
> __entry->nr_failed)
> );
>
> +TRACE_EVENT(mm_compaction_begin,
> + TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
> + unsigned long zone_end, unsigned long free_start),
> +
> + TP_ARGS(zone_start, migrate_start, zone_end, free_start),
IMHO a better order would be:
zone_start, migrate_start, free_start, zone_end
(well especially in the TP_printk part anyway).
> +
> + TP_STRUCT__entry(
> + __field(unsigned long, zone_start)
> + __field(unsigned long, migrate_start)
> + __field(unsigned long, zone_end)
> + __field(unsigned long, free_start)
> + ),
> +
> + TP_fast_assign(
> + __entry->zone_start = zone_start;
> + __entry->migrate_start = migrate_start;
> + __entry->zone_end = zone_end;
> + __entry->free_start = free_start;
> + ),
> +
> + TP_printk("zone_start=%lu migrate_start=%lu zone_end=%lu free_start=%lu",
> + __entry->zone_start,
> + __entry->migrate_start,
> + __entry->zone_end,
> + __entry->free_start)
> +);
> +
> +TRACE_EVENT(mm_compaction_end,
> + TP_PROTO(int status),
> +
> + TP_ARGS(status),
> +
> + TP_STRUCT__entry(
> + __field(int, status)
> + ),
> +
> + TP_fast_assign(
> + __entry->status = status;
> + ),
> +
> + TP_printk("status=%d", __entry->status)
> +);
>
> #endif /* _TRACE_COMPACTION_H */
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c437893..78ff866 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -960,6 +960,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
> __reset_isolation_suitable(zone);
>
> + trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, end_pfn, cc->free_pfn);
> +
> migrate_prep_local();
>
> while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
> @@ -1005,6 +1007,8 @@ out:
> cc->nr_freepages -= release_freepages(&cc->freepages);
> VM_BUG_ON(cc->nr_freepages != 0);
>
> + trace_mm_compaction_end(ret);
> +
> return ret;
> }
>
>
* Re: [PATCH] mm: compaction: Trace compaction begin and end
2013-12-04 14:51 ` Vlastimil Babka
@ 2013-12-05 9:05 ` Mel Gorman
2013-12-06 9:50 ` Vlastimil Babka
2013-12-05 9:07 ` [PATCH] mm: compaction: Trace compaction begin and end v2 Mel Gorman
1 sibling, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2013-12-05 9:05 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel
On Wed, Dec 04, 2013 at 03:51:57PM +0100, Vlastimil Babka wrote:
> On 12/04/2013 03:30 PM, Mel Gorman wrote:
> >This patch adds two tracepoints for compaction begin and end of a zone. Using
> >this it is possible to calculate how much time a workload is spending
> >within compaction and potentially debug problems related to cached pfns
> >for scanning.
>
> I guess for debugging pfns it would be also useful to print their
> values also in mm_compaction_end.
>
What additional information would we get from it and what new
conclusions could we draw? We could guess how much work the
scanners did but the trace_mm_compaction_isolate_freepages and
trace_mm_compaction_isolate_migratepages tracepoints already accurately
tell us that. The scanner PFNs alone do not tell us whether the cached pfns
were updated, and even if they were, the information can be changed by
parallel resets, so it would be hard to draw reasonable conclusions from
it. We could guess where compaction hotspots might be but
without the skip information, we could not detect it accurately. If we
wanted to detect that accurately, the mm_compaction_isolate* tracepoints
would be the one to update.
I was primarily concerned about compaction time so I might be looking
at this the wrong way but it feels like having the PFNs at the end of a
compaction cycle would be of marginal benefit.
> >In combination with the direct reclaim and slab trace points
> >it should be possible to estimate most allocation-related overhead for
> >a workload.
> >
> >Signed-off-by: Mel Gorman <mgorman@suse.de>
> >---
> > include/trace/events/compaction.h | 42 +++++++++++++++++++++++++++++++++++++++
> > mm/compaction.c | 4 ++++
> > 2 files changed, 46 insertions(+)
> >
> >diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> >index fde1b3e..f4e115a 100644
> >--- a/include/trace/events/compaction.h
> >+++ b/include/trace/events/compaction.h
> >@@ -67,6 +67,48 @@ TRACE_EVENT(mm_compaction_migratepages,
> > __entry->nr_failed)
> > );
> >
> >+TRACE_EVENT(mm_compaction_begin,
> >+ TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
> >+ unsigned long zone_end, unsigned long free_start),
> >+
> >+ TP_ARGS(zone_start, migrate_start, zone_end, free_start),
>
> IMHO a better order would be:
> zone_start, migrate_start, free_start, zone_end
> (well especially in the TP_printk part anyway).
>
Ok, that would put them in PFN order, which may be easier to visualise.
I'll post a V2 with that change at least.
--
Mel Gorman
SUSE Labs
* Re: [PATCH] mm: compaction: Trace compaction begin and end
2013-12-05 9:05 ` Mel Gorman
@ 2013-12-06 9:50 ` Vlastimil Babka
0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2013-12-06 9:50 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-mm, linux-kernel, Rik van Riel
On 12/05/2013 10:05 AM, Mel Gorman wrote:
> On Wed, Dec 04, 2013 at 03:51:57PM +0100, Vlastimil Babka wrote:
>> On 12/04/2013 03:30 PM, Mel Gorman wrote:
>>> This patch adds two tracepoints, for the beginning and end of compaction of
>>> a zone. Using these it is possible to calculate how much time a workload is
>>> spending within compaction and potentially debug problems related to the
>>> cached pfns used for scanning.
>>
>> I guess for debugging pfns it would also be useful to print their
>> values in mm_compaction_end.
>>
>
> What additional information would we get from it and what new
> conclusions could we draw? We could guess how much work the
> scanners did, but the trace_mm_compaction_isolate_freepages and
> trace_mm_compaction_isolate_migratepages tracepoints already accurately
> tell us that. The scanner PFNs alone do not tell us whether the cached pfns
> were updated, and even if they did, the information can be changed by
> parallel resets, so it would be hard to draw reasonable conclusions from
> it. We could guess where compaction hotspots might be, but
> without the skip information we could not detect them accurately. If we
> wanted to detect that accurately, the mm_compaction_isolate* tracepoints
> would be the ones to update.
OK, I agree. I guess multiple mm_compaction_begin events would hint at
the scanners being stuck anyway.
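
If that ever needs confirming, a trivial (again untested) Python sketch
over the same trace output could flag it, by counting consecutive
mm_compaction_begin events whose migrate_start does not move for a pid:

import re
import sys

# Untested sketch: warn when a pid keeps emitting mm_compaction_begin
# with the same migrate_start, which would suggest the migration scanner
# restarting from a stuck cached pfn. Field names are those printed by
# the tracepoint's TP_printk.
BEGIN = re.compile(r'-(?P<pid>\d+)\s+\[\d+\].*mm_compaction_begin:'
                   r'.*migrate_start=(?P<pfn>\d+)')

last = {}   # pid -> (last migrate_start, consecutive repeats)

for line in sys.stdin:
    m = BEGIN.search(line)
    if not m:
        continue
    pid, pfn = m.group('pid'), int(m.group('pfn'))
    prev, repeats = last.get(pid, (None, 0))
    repeats = repeats + 1 if pfn == prev else 1
    last[pid] = (pfn, repeats)
    if repeats >= 3:
        print('pid %s: migrate scanner began at pfn %d again (%d times)'
              % (pid, pfn, repeats))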
> I was primarily concerned about compaction time, so I might be looking
> at this the wrong way, but it feels like having the PFNs at the end of a
> compaction cycle would be of marginal benefit.
>
>>> In combination with the direct reclaim and slab tracepoints
>>> it should be possible to estimate most allocation-related overhead for
>>> a workload.
>>>
>>> Signed-off-by: Mel Gorman <mgorman@suse.de>
>>> ---
>>> include/trace/events/compaction.h | 42 +++++++++++++++++++++++++++++++++++++++
>>> mm/compaction.c | 4 ++++
>>> 2 files changed, 46 insertions(+)
>>>
>>> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
>>> index fde1b3e..f4e115a 100644
>>> --- a/include/trace/events/compaction.h
>>> +++ b/include/trace/events/compaction.h
>>> @@ -67,6 +67,48 @@ TRACE_EVENT(mm_compaction_migratepages,
>>> __entry->nr_failed)
>>> );
>>>
>>> +TRACE_EVENT(mm_compaction_begin,
>>> + TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
>>> + unsigned long zone_end, unsigned long free_start),
>>> +
>>> + TP_ARGS(zone_start, migrate_start, zone_end, free_start),
>>
>> IMHO a better order would be:
>> zone_start, migrate_start, free_start, zone_end
>> (well especially in the TP_printk part anyway).
>>
>
> Ok, that would put them in PFN order, which may be easier to visualise.
> I'll post a V2 with that change at least.
>
* [PATCH] mm: compaction: Trace compaction begin and end v2
2013-12-04 14:51 ` Vlastimil Babka
2013-12-05 9:05 ` Mel Gorman
@ 2013-12-05 9:07 ` Mel Gorman
2013-12-06 9:50 ` Vlastimil Babka
1 sibling, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2013-12-05 9:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Vlastimil Babka, linux-mm, linux-kernel, Rik van Riel
Changelog since V1
o Print output parameters in pfn order (vbabka)
This patch adds two tracepoints, for the beginning and end of compaction of
a zone. Using these it is possible to calculate how much time a workload is
spending within compaction and potentially debug problems related to the
cached pfns used for scanning. In combination with the direct reclaim and
slab tracepoints it should be possible to estimate most allocation-related
overhead for a workload.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/trace/events/compaction.h | 42 +++++++++++++++++++++++++++++++++++++++
mm/compaction.c | 4 ++++
2 files changed, 46 insertions(+)
diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
index fde1b3e..06f544e 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -67,6 +67,48 @@ TRACE_EVENT(mm_compaction_migratepages,
__entry->nr_failed)
);
+TRACE_EVENT(mm_compaction_begin,
+ TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
+ unsigned long free_start, unsigned long zone_end),
+
+ TP_ARGS(zone_start, migrate_start, free_start, zone_end),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, zone_start)
+ __field(unsigned long, migrate_start)
+ __field(unsigned long, free_start)
+ __field(unsigned long, zone_end)
+ ),
+
+ TP_fast_assign(
+ __entry->zone_start = zone_start;
+ __entry->migrate_start = migrate_start;
+ __entry->free_start = free_start;
+ __entry->zone_end = zone_end;
+ ),
+
+ TP_printk("zone_start=%lu migrate_start=%lu free_start=%lu zone_end=%lu",
+ __entry->zone_start,
+ __entry->migrate_start,
+ __entry->free_start,
+ __entry->zone_end)
+);
+
+TRACE_EVENT(mm_compaction_end,
+ TP_PROTO(int status),
+
+ TP_ARGS(status),
+
+ TP_STRUCT__entry(
+ __field(int, status)
+ ),
+
+ TP_fast_assign(
+ __entry->status = status;
+ ),
+
+ TP_printk("status=%d", __entry->status)
+);
#endif /* _TRACE_COMPACTION_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index 805165b..bb50fd3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -966,6 +966,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
__reset_isolation_suitable(zone);
+ trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);
+
migrate_prep_local();
while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
@@ -1011,6 +1013,8 @@ out:
cc->nr_freepages -= release_freepages(&cc->freepages);
VM_BUG_ON(cc->nr_freepages != 0);
+ trace_mm_compaction_end(ret);
+
return ret;
}
* Re: [PATCH] mm: compaction: Trace compaction begin and end v2
2013-12-05 9:07 ` [PATCH] mm: compaction: Trace compaction begin and end v2 Mel Gorman
@ 2013-12-06 9:50 ` Vlastimil Babka
0 siblings, 0 replies; 20+ messages in thread
From: Vlastimil Babka @ 2013-12-06 9:50 UTC (permalink / raw)
To: Mel Gorman, Andrew Morton; +Cc: linux-mm, linux-kernel, Rik van Riel
On 12/05/2013 10:07 AM, Mel Gorman wrote:
> Changelog since V1
> o Print output parameters in pfn order (vbabka)
>
> This patch adds two tracepoints, for the beginning and end of compaction of
> a zone. Using these it is possible to calculate how much time a workload is
> spending within compaction and potentially debug problems related to the
> cached pfns used for scanning. In combination with the direct reclaim and
> slab tracepoints it should be possible to estimate most allocation-related
> overhead for a workload.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/trace/events/compaction.h | 42 +++++++++++++++++++++++++++++++++++++++
> mm/compaction.c | 4 ++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/include/trace/events/compaction.h b/include/trace/events/compaction.h
> index fde1b3e..06f544e 100644
> --- a/include/trace/events/compaction.h
> +++ b/include/trace/events/compaction.h
> @@ -67,6 +67,48 @@ TRACE_EVENT(mm_compaction_migratepages,
> __entry->nr_failed)
> );
>
> +TRACE_EVENT(mm_compaction_begin,
> + TP_PROTO(unsigned long zone_start, unsigned long migrate_start,
> + unsigned long free_start, unsigned long zone_end),
> +
> + TP_ARGS(zone_start, migrate_start, free_start, zone_end),
> +
> + TP_STRUCT__entry(
> + __field(unsigned long, zone_start)
> + __field(unsigned long, migrate_start)
> + __field(unsigned long, free_start)
> + __field(unsigned long, zone_end)
> + ),
> +
> + TP_fast_assign(
> + __entry->zone_start = zone_start;
> + __entry->migrate_start = migrate_start;
> + __entry->free_start = free_start;
> + __entry->zone_end = zone_end;
> + ),
> +
> + TP_printk("zone_start=%lu migrate_start=%lu free_start=%lu zone_end=%lu",
> + __entry->zone_start,
> + __entry->migrate_start,
> + __entry->free_start,
> + __entry->zone_end)
> +);
> +
> +TRACE_EVENT(mm_compaction_end,
> + TP_PROTO(int status),
> +
> + TP_ARGS(status),
> +
> + TP_STRUCT__entry(
> + __field(int, status)
> + ),
> +
> + TP_fast_assign(
> + __entry->status = status;
> + ),
> +
> + TP_printk("status=%d", __entry->status)
> +);
>
> #endif /* _TRACE_COMPACTION_H */
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 805165b..bb50fd3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -966,6 +966,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> if (compaction_restarting(zone, cc->order) && !current_is_kswapd())
> __reset_isolation_suitable(zone);
>
> + trace_mm_compaction_begin(start_pfn, cc->migrate_pfn, cc->free_pfn, end_pfn);
> +
> migrate_prep_local();
>
> while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
> @@ -1011,6 +1013,8 @@ out:
> cc->nr_freepages -= release_freepages(&cc->freepages);
> VM_BUG_ON(cc->nr_freepages != 0);
>
> + trace_mm_compaction_end(ret);
> +
> return ret;
> }
>
>