* [PATCH v2 1/5] mm, compaction: more robust check for scanners meeting
From: Vlastimil Babka @ 2015-07-31 15:28 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Naoya Horiguchi,
Christoph Lameter, David Rientjes, Joonsoo Kim, Mel Gorman,
Michal Nazarewicz, Rik van Riel
Compaction should finish when the migration and free scanner meet, i.e. they
reach the same pageblock. Currently however, the test in compact_finished()
simply compares the exact pfns, which may yield a false negative when the
free scanner position is in the middle of a pageblock and the migration scanner
reaches the beginning of the same pageblock.
This hasn't been a problem until commit e14c720efdd7 ("mm, compaction: remember
position within pageblock in free pages scanner") allowed the free scanner
position to be in the middle of a pageblock between invocations. The hot-fix
1d5bfe1ffb5b ("mm, compaction: prevent infinite loop in compact_zone")
prevented the issue by adding a special check in the migration scanner to
satisfy the current detection of scanners meeting.
However, the proper fix is to make the detection more robust. This patch
introduces the compact_scanners_met() function that returns true when the free
scanner position is in the same or lower pageblock than the migration scanner.
The special case in isolate_migratepages() introduced by 1d5bfe1ffb5b is
removed.
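To illustrate the false negative, here is a minimal standalone userspace sketch
(not part of the patch; it assumes pageblock_order == 9, i.e. 512-page
pageblocks, and uses simplified names):

#include <stdbool.h>
#include <stdio.h>

#define PAGEBLOCK_ORDER	9	/* assumed value, for illustration only */

/* Exact-pfn test used before this patch. */
static bool met_exact(unsigned long free_pfn, unsigned long migrate_pfn)
{
	return free_pfn <= migrate_pfn;
}

/* Pageblock-granularity test corresponding to compact_scanners_met(). */
static bool met_pageblock(unsigned long free_pfn, unsigned long migrate_pfn)
{
	return (free_pfn >> PAGEBLOCK_ORDER) <= (migrate_pfn >> PAGEBLOCK_ORDER);
}

int main(void)
{
	/* Free scanner restarts mid-pageblock; migration scanner has just
	 * reached the start of the same pageblock (pfns 1024..1535). */
	unsigned long free_pfn = 1300, migrate_pfn = 1024;

	printf("exact: %d, pageblock: %d\n",
	       met_exact(free_pfn, migrate_pfn),	/* 0: "not met" */
	       met_pageblock(free_pfn, migrate_pfn));	/* 1: compaction finishes */
	return 0;
}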
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 16e1b57..d46aaeb 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -902,6 +902,16 @@ static bool suitable_migration_target(struct page *page)
}
/*
+ * Test whether the free scanner has reached the same or lower pageblock than
+ * the migration scanner, and compaction should thus terminate.
+ */
+static inline bool compact_scanners_met(struct compact_control *cc)
+{
+ return (cc->free_pfn >> pageblock_order)
+ <= (cc->migrate_pfn >> pageblock_order);
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -1131,12 +1141,8 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
}
acct_isolated(zone, cc);
- /*
- * Record where migration scanner will be restarted. If we end up in
- * the same pageblock as the free scanner, make the scanners fully
- * meet so that compact_finished() terminates compaction.
- */
- cc->migrate_pfn = (end_pfn <= cc->free_pfn) ? low_pfn : cc->free_pfn;
+ /* Record where migration scanner will be restarted. */
+ cc->migrate_pfn = low_pfn;
return cc->nr_migratepages ? ISOLATE_SUCCESS : ISOLATE_NONE;
}
@@ -1151,7 +1157,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
return COMPACT_PARTIAL;
/* Compaction run completes if the migrate and free scanner meet */
- if (cc->free_pfn <= cc->migrate_pfn) {
+ if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
@@ -1380,7 +1386,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
* migrate_pages() may return -ENOMEM when scanners meet
* and we want compact_finished() to detect it
*/
- if (err == -ENOMEM && cc->free_pfn > cc->migrate_pfn) {
+ if (err == -ENOMEM && !compact_scanners_met(cc)) {
ret = COMPACT_PARTIAL;
goto out;
}
--
2.4.6
* [PATCH v2 2/5] mm, compaction: simplify handling restart position in free pages scanner
From: Vlastimil Babka @ 2015-07-31 15:28 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Michal Nazarewicz,
Naoya Horiguchi, Christoph Lameter, Rik van Riel, David Rientjes,
Joonsoo Kim, Mel Gorman
Handling the position where compaction free scanner should restart (stored in
cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
remember position within pageblock in free pages scanner"). Currently the
position is updated in each loop iteration of isolate_freepages(), although it
should be enough to update it only when breaking from the loop. There's also
an extra check outside the loop that updates the position in case we have met
the migration scanner.
This can be simplified if we move the test for having isolated enough from the
for-loop header next to the test for contention, and determine the restart
position only in these cases. We can reuse the isolate_start_pfn variable for
this instead of setting cc->free_pfn directly. Outside the loop, we can simply
set cc->free_pfn to the current value of isolate_start_pfn without any extra check.
Also add a VM_BUG_ON to catch a possible mistake in the future, in case we later
add a new condition that terminates isolate_freepages_block() prematurely
without also considering the condition in isolate_freepages().
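For orientation, a condensed pseudo-C sketch of the resulting loop in
isolate_freepages() (not the literal kernel code; locking and the other details
of the loop body are omitted):

	for (; block_start_pfn >= low_pfn;
	     block_end_pfn = block_start_pfn,
	     block_start_pfn -= pageblock_nr_pages,
	     isolate_start_pfn = block_start_pfn) {

		/* advances isolate_start_pfn to where it stopped in the block */
		isolate_freepages_block(cc, &isolate_start_pfn, block_end_pfn,
					freelist, false);

		if (cc->nr_freepages >= cc->nr_migratepages || cc->contended) {
			/* Whole block scanned: restart at the previous block. */
			if (isolate_start_pfn >= block_end_pfn)
				isolate_start_pfn =
					block_start_pfn - pageblock_nr_pages;
			break;
		}
		/* Otherwise the block must have been fully scanned. */
		VM_BUG_ON(isolate_start_pfn < block_end_pfn);
	}
	/* Either exit path leaves isolate_start_pfn as the restart position. */
	cc->free_pfn = isolate_start_pfn;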
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index d46aaeb..7e0a814 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -947,8 +947,7 @@ static void isolate_freepages(struct compact_control *cc)
* pages on cc->migratepages. We stop searching if the migrate
* and free page scanners meet or enough free pages are isolated.
*/
- for (; block_start_pfn >= low_pfn &&
- cc->nr_migratepages > cc->nr_freepages;
+ for (; block_start_pfn >= low_pfn;
block_end_pfn = block_start_pfn,
block_start_pfn -= pageblock_nr_pages,
isolate_start_pfn = block_start_pfn) {
@@ -980,6 +979,8 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn, freelist, false);
/*
+ * If we isolated enough freepages, or aborted due to async
+ * compaction being contended, terminate the loop.
* Remember where the free scanner should restart next time,
* which is where isolate_freepages_block() left off.
* But if it scanned the whole pageblock, isolate_start_pfn
@@ -988,27 +989,31 @@ static void isolate_freepages(struct compact_control *cc)
* In that case we will however want to restart at the start
* of the previous pageblock.
*/
- cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
- isolate_start_pfn :
- block_start_pfn - pageblock_nr_pages;
-
- /*
- * isolate_freepages_block() might have aborted due to async
- * compaction being contended
- */
- if (cc->contended)
+ if ((cc->nr_freepages >= cc->nr_migratepages)
+ || cc->contended) {
+ if (isolate_start_pfn >= block_end_pfn)
+ isolate_start_pfn =
+ block_start_pfn - pageblock_nr_pages;
break;
+ } else {
+ /*
+ * isolate_freepages_block() should not terminate
+ * prematurely unless contended, or isolated enough
+ */
+ VM_BUG_ON(isolate_start_pfn < block_end_pfn);
+ }
}
/* split_free_page does not map the pages */
map_pages(freelist);
/*
- * If we crossed the migrate scanner, we want to keep it that way
- * so that compact_finished() may detect this
+ * Record where the free scanner will restart next time. Either we
+ * broke from the loop and set isolate_start_pfn based on the last
+ * call to isolate_freepages_block(), or we met the migration scanner
+ * and the loop terminated due to isolate_start_pfn < low_pfn
*/
- if (block_start_pfn < low_pfn)
- cc->free_pfn = cc->migrate_pfn;
+ cc->free_pfn = isolate_start_pfn;
}
/*
--
2.4.6
* Re: [PATCH v2 2/5] mm, compaction: simplify handling restart position in free pages scanner
From: Michal Nazarewicz @ 2015-08-03 17:00 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, linux-mm
Cc: linux-kernel, Minchan Kim, Naoya Horiguchi, Christoph Lameter,
Rik van Riel, David Rientjes, Joonsoo Kim, Mel Gorman
On Fri, Jul 31 2015, Vlastimil Babka wrote:
> Handling the position where compaction free scanner should restart (stored in
> cc->free_pfn) got more complex with commit e14c720efdd7 ("mm, compaction:
> remember position within pageblock in free pages scanner"). Currently the
> position is updated in each loop iteration of isolate_freepages(), although it
> should be enough to update it only when breaking from the loop. There's also
> an extra check outside the loop updates the position in case we have met the
> migration scanner.
>
> This can be simplified if we move the test for having isolated enough from the
> for-loop header next to the test for contention, and determining the restart
> position only in these cases. We can reuse the isolate_start_pfn variable for
> this instead of setting cc->free_pfn directly. Outside the loop, we can simply
> set cc->free_pfn to current value of isolate_start_pfn without any extra check.
>
> Also add a VM_BUG_ON to catch possible mistake in the future, in case we later
> add a new condition that terminates isolate_freepages_block() prematurely
> without also considering the condition in isolate_freepages().
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Minchan Kim <minchan@kernel.org>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: David Rientjes <rientjes@google.com>
> ---
> mm/compaction.c | 35 ++++++++++++++++++++---------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d46aaeb..7e0a814 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -947,8 +947,7 @@ static void isolate_freepages(struct compact_control *cc)
> * pages on cc->migratepages. We stop searching if the migrate
> * and free page scanners meet or enough free pages are isolated.
> */
> - for (; block_start_pfn >= low_pfn &&
> - cc->nr_migratepages > cc->nr_freepages;
> + for (; block_start_pfn >= low_pfn;
> block_end_pfn = block_start_pfn,
> block_start_pfn -= pageblock_nr_pages,
> isolate_start_pfn = block_start_pfn) {
> @@ -980,6 +979,8 @@ static void isolate_freepages(struct compact_control *cc)
> block_end_pfn, freelist, false);
>
> /*
> + * If we isolated enough freepages, or aborted due to async
> + * compaction being contended, terminate the loop.
> * Remember where the free scanner should restart next time,
> * which is where isolate_freepages_block() left off.
> * But if it scanned the whole pageblock, isolate_start_pfn
> @@ -988,27 +989,31 @@ static void isolate_freepages(struct compact_control *cc)
> * In that case we will however want to restart at the start
> * of the previous pageblock.
> */
> - cc->free_pfn = (isolate_start_pfn < block_end_pfn) ?
> - isolate_start_pfn :
> - block_start_pfn - pageblock_nr_pages;
> -
> - /*
> - * isolate_freepages_block() might have aborted due to async
> - * compaction being contended
> - */
> - if (cc->contended)
> + if ((cc->nr_freepages >= cc->nr_migratepages)
> + || cc->contended) {
> + if (isolate_start_pfn >= block_end_pfn)
> + isolate_start_pfn =
> + block_start_pfn - pageblock_nr_pages;
> break;
> + } else {
> + /*
> + * isolate_freepages_block() should not terminate
> + * prematurely unless contended, or isolated enough
> + */
> + VM_BUG_ON(isolate_start_pfn < block_end_pfn);
> + }
> }
>
> /* split_free_page does not map the pages */
> map_pages(freelist);
>
> /*
> - * If we crossed the migrate scanner, we want to keep it that way
> - * so that compact_finished() may detect this
> + * Record where the free scanner will restart next time. Either we
> + * broke from the loop and set isolate_start_pfn based on the last
> + * call to isolate_freepages_block(), or we met the migration scanner
> + * and the loop terminated due to isolate_start_pfn < low_pfn
> */
> - if (block_start_pfn < low_pfn)
> - cc->free_pfn = cc->migrate_pfn;
> + cc->free_pfn = isolate_start_pfn;
> }
>
> /*
> --
> 2.4.6
>
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--
* [PATCH v2 3/5] mm, compaction: encapsulate resetting cached scanner positions
From: Vlastimil Babka @ 2015-07-31 15:28 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
Joonsoo Kim, Naoya Horiguchi, Christoph Lameter, Rik van Riel,
David Rientjes, Michal Nazarewicz
Resetting the cached compaction scanner positions is now open-coded in
__reset_isolation_suitable() and compact_finished(). Encapsulate the
functionality in a new function reset_cached_positions().
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7e0a814..07b6104 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -207,6 +207,13 @@ static inline bool isolation_suitable(struct compact_control *cc,
return !get_pageblock_skip(page);
}
+static void reset_cached_positions(struct zone *zone)
+{
+ zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
+ zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
+ zone->compact_cached_free_pfn = zone_end_pfn(zone);
+}
+
/*
* This function is called to clear all cached information on pageblocks that
* should be skipped for page isolation when the migrate and free page scanner
@@ -218,9 +225,6 @@ static void __reset_isolation_suitable(struct zone *zone)
unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
- zone->compact_cached_migrate_pfn[0] = start_pfn;
- zone->compact_cached_migrate_pfn[1] = start_pfn;
- zone->compact_cached_free_pfn = end_pfn;
zone->compact_blockskip_flush = false;
/* Walk the zone and mark every pageblock as suitable for isolation */
@@ -238,6 +242,8 @@ static void __reset_isolation_suitable(struct zone *zone)
clear_pageblock_skip(page);
}
+
+ reset_cached_positions(zone);
}
void reset_isolation_suitable(pg_data_t *pgdat)
@@ -1164,9 +1170,7 @@ static int __compact_finished(struct zone *zone, struct compact_control *cc,
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
- zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
- zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;
- zone->compact_cached_free_pfn = zone_end_pfn(zone);
+ reset_cached_positions(zone);
/*
* Mark that the PG_migrate_skip information should be cleared
--
2.4.6
* [PATCH v2 4/5] mm, compaction: always skip compound pages by order in migrate scanner
From: Vlastimil Babka @ 2015-07-31 15:28 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Naoya Horiguchi,
Christoph Lameter, Rik van Riel, David Rientjes, Joonsoo Kim,
Mel Gorman, Michal Nazarewicz
The compaction migrate scanner tries to skip compound pages by their order, to
reduce the number of iterations for pages it cannot isolate. The check is only done
if PageLRU() is true, which means it applies to THP pages, but not e.g.
hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
by base pages.
This limitation comes from the assumption that it's only safe to read
compound_order() when we have the zone's lru_lock and THP cannot be split under
us. But the only danger (after filtering out order values that are not below
MAX_ORDER, to prevent overflows) is that we skip too much or too little after
reading a bogus compound_order() due to a rare race. This is the same reasoning
as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
migrate scanner") introduced for unsafely reading PageBuddy() order.
After this patch, all pages are tested for PageCompound() and we skip them by
compound_order(). The test is done after the test for balloon_page_movable()
as we don't want to assume whether balloon pages (or other pages with their own
isolation and migration implementation, if a generic API gets implemented) are
compound or not.
When tested with stress-highalloc from mmtests on a 4GB system with 1GB hugetlbfs
pages, the vmstat compact_migrate_scanned count decreased by 15%.
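As a standalone illustration of the skip arithmetic (not part of the patch;
MAX_ORDER = 11 is an assumed default here):

#include <stdio.h>

#define MAX_ORDER	11	/* assumed default value */

/* Mimics the scanner's skip: advance past a compound page of the given
 * (possibly racily read) order, ignoring clearly bogus values. */
static unsigned long skip_by_order(unsigned long low_pfn, unsigned int comp_order)
{
	if (comp_order < MAX_ORDER)
		low_pfn += (1UL << comp_order) - 1;
	return low_pfn;	/* the loop's low_pfn++ then steps past the last page */
}

int main(void)
{
	printf("order 9 (2MB THP): +%lu pfns\n", skip_by_order(0, 9));	/* 511 */
	printf("bogus order 42:    +%lu pfns\n", skip_by_order(0, 42));	/* 0 */
	return 0;
}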
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 36 +++++++++++++++++-------------------
1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 07b6104..70b0776 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -680,6 +680,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/* Time to isolate some pages for migration */
for (; low_pfn < end_pfn; low_pfn++) {
+ bool is_lru;
+
/*
* Periodically drop the lock (if held) regardless of its
* contention, to give chance to IRQs. Abort async compaction
@@ -723,39 +725,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
* It's possible to migrate LRU pages and balloon pages
* Skip any other type of page
*/
- if (!PageLRU(page)) {
+ is_lru = PageLRU(page);
+ if (!is_lru) {
if (unlikely(balloon_page_movable(page))) {
if (balloon_page_isolate(page)) {
/* Successfully isolated */
goto isolate_success;
}
}
- continue;
}
/*
- * PageLRU is set. lru_lock normally excludes isolation
- * splitting and collapsing (collapsing has already happened
- * if PageLRU is set) but the lock is not necessarily taken
- * here and it is wasteful to take it just to check transhuge.
- * Check PageCompound without lock and skip the whole pageblock
- * if it's a transhuge page, as calling compound_order()
- * without preventing THP from splitting the page underneath us
- * may return surprising results.
- * If we happen to check a THP tail page, compound_order()
- * returns 0. It should be rare enough to not bother with
- * using compound_head() in that case.
+ * Regardless of being on LRU, compound pages such as THP and
+ * hugetlbfs are not to be compacted. We can potentially save
+ * a lot of iterations if we skip them at once. The check is
+ * racy, but we can consider only valid values and the only
+ * danger is skipping too much.
*/
if (PageCompound(page)) {
- int nr;
- if (locked)
- nr = 1 << compound_order(page);
- else
- nr = pageblock_nr_pages;
- low_pfn += nr - 1;
+ unsigned int comp_order = compound_order(page);
+
+ if (likely(comp_order < MAX_ORDER))
+ low_pfn += (1UL << comp_order) - 1;
+
continue;
}
+ if (!is_lru)
+ continue;
+
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
--
2.4.6
* Re: [PATCH v2 4/5] mm, compaction: always skip compound pages by order in migrate scanner
From: Vlastimil Babka @ 2015-08-07 9:18 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Minchan Kim, Naoya Horiguchi, Christoph Lameter,
Rik van Riel, David Rientjes, Joonsoo Kim, Mel Gorman,
Michal Nazarewicz
As requested, here's a rebased version that doesn't depend on the page-flags
changes from Kirill. This obsoletes the following patches from that series:
page-flags-define-behavior-of-lru-related-flags-on-compound-pages-fix.patch
page-flags-define-behavior-of-lru-related-flags-on-compound-pages-fix-fix.patch
They were effectively squashed into this patch. Other than that, I've changed
the changelog a bit to reflect that, and added a comment to the recheck-after-locking
part, along with minor code tweaks there.
------8<------
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 7 Aug 2015 11:12:33 +0200
Subject: [PATCH] mm, compaction: always skip all compound pages by order in
migrate scanner
The compaction migrate scanner tries to skip THP pages by their order, to
reduce the number of iterations for pages it cannot isolate. The check is only done
if PageLRU() is true, which means it applies to THP pages, but not e.g.
hugetlbfs pages or any other non-LRU compound pages, which we have to iterate
by base pages.
This limitation comes from the assumption that it's only safe to read
compound_order() when we have the zone's lru_lock and THP cannot be split under
us. But the only danger (after filtering out order values that are not below
MAX_ORDER, to prevent overflows) is that we skip too much or too little after
reading a bogus compound_order() due to a rare race. This is the same reasoning
as patch 99c0fd5e51c4 ("mm, compaction: skip buddy pages by their order in the
migrate scanner") introduced for unsafely reading PageBuddy() order.
After this patch, all pages are tested for PageCompound() and we skip them by
compound_order(). The test is done after the test for balloon_page_movable()
as we don't want to assume whether balloon pages (or other pages with their own
isolation and migration implementation, if a generic API gets implemented) are
compound or not.
When tested with stress-highalloc from mmtests on a 4GB system with 1GB hugetlbfs
pages, the vmstat compact_migrate_scanned count decreased by 15%.
[Patch from Kirill A. Shutemov <kirill.shutemov@linux.intel.com> that changed
PageTransHuge checks to PageCompound for a different series was squashed here]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 47 +++++++++++++++++++++++++++--------------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index e51b56a..8f64d35 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -705,6 +705,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
/* Time to isolate some pages for migration */
for (; low_pfn < end_pfn; low_pfn++) {
+ bool is_lru;
+
/*
* Periodically drop the lock (if held) regardless of its
* contention, to give chance to IRQs. Abort async compaction
@@ -748,36 +750,35 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
* It's possible to migrate LRU pages and balloon pages
* Skip any other type of page
*/
- if (!PageLRU(page)) {
+ is_lru = PageLRU(page);
+ if (!is_lru) {
if (unlikely(balloon_page_movable(page))) {
if (balloon_page_isolate(page)) {
/* Successfully isolated */
goto isolate_success;
}
}
- continue;
}
/*
- * PageLRU is set. lru_lock normally excludes isolation
- * splitting and collapsing (collapsing has already happened
- * if PageLRU is set) but the lock is not necessarily taken
- * here and it is wasteful to take it just to check transhuge.
- * Check TransHuge without lock and skip the whole pageblock if
- * it's either a transhuge or hugetlbfs page, as calling
- * compound_order() without preventing THP from splitting the
- * page underneath us may return surprising results.
+ * Regardless of being on LRU, compound pages such as THP and
+ * hugetlbfs are not to be compacted. We can potentially save
+ * a lot of iterations if we skip them at once. The check is
+ * racy, but we can consider only valid values and the only
+ * danger is skipping too much.
*/
- if (PageTransHuge(page)) {
- if (!locked)
- low_pfn = ALIGN(low_pfn + 1,
- pageblock_nr_pages) - 1;
- else
- low_pfn += (1 << compound_order(page)) - 1;
+ if (PageCompound(page)) {
+ unsigned int comp_order = compound_order(page);
+
+ if (likely(comp_order < MAX_ORDER))
+ low_pfn += (1UL << comp_order) - 1;
continue;
}
+ if (!is_lru)
+ continue;
+
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
@@ -794,11 +795,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (!locked)
break;
- /* Recheck PageLRU and PageTransHuge under lock */
+ /* Recheck PageLRU and PageCompound under lock */
if (!PageLRU(page))
continue;
- if (PageTransHuge(page)) {
- low_pfn += (1 << compound_order(page)) - 1;
+
+ /*
+ * Page became compound since the non-locked check,
+ * and it's on LRU. It can only be a THP so the order
+ * is safe to read and it's 0 for tail pages.
+ */
+ if (unlikely(PageCompound(page))) {
+ low_pfn += (1UL << compound_order(page)) - 1;
continue;
}
}
@@ -809,7 +816,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (__isolate_lru_page(page, isolate_mode) != 0)
continue;
- VM_BUG_ON_PAGE(PageTransCompound(page), page);
+ VM_BUG_ON_PAGE(PageCompound(page), page);
/* Successfully isolated */
del_page_from_lru_list(page, lruvec, page_lru(page));
--
2.4.6
* [PATCH v2 5/5] mm, compaction: skip compound pages by order in free scanner
From: Vlastimil Babka @ 2015-07-31 15:28 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Vlastimil Babka, Minchan Kim, Mel Gorman,
Naoya Horiguchi, Christoph Lameter, Rik van Riel, David Rientjes,
Joonsoo Kim, Michal Nazarewicz
The compaction free scanner looks for PageBuddy() pages and skips all others.
For large compound pages such as THP or hugetlbfs, we can save a lot
of iterations if we skip them at once using their compound_order(). This is
generally unsafe and we can read a bogus value of order due to a race, but if
we are careful, the only danger is skipping too much.
When tested with stress-highalloc from mmtests on a 4GB system with 1GB hugetlbfs
pages, the vmstat compact_free_scanned count decreased by at least 15%.
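The "careful" part above is mainly the clamp back to the pageblock end after a
possibly bogus order was used; a small standalone sketch (not part of the patch;
pageblock_order = 9 and MAX_ORDER = 11 are assumed values):

#include <stdio.h>

#define MAX_ORDER		11	/* assumed default */
#define PAGEBLOCK_NR_PAGES	512	/* assumed: pageblock_order = 9 */

int main(void)
{
	unsigned long start_pfn = 1024;
	unsigned long end_pfn = start_pfn + PAGEBLOCK_NR_PAGES;
	unsigned long blockpfn = start_pfn;

	/* A racy compound_order() read returns a plausible but wrong order. */
	unsigned int comp_order = 10;

	if (comp_order < MAX_ORDER)
		blockpfn += (1UL << comp_order) - 1;	/* overshoots the block */

	/* The check added at the end of isolate_freepages_block() keeps the
	 * scanner within the pageblock even after such an overshoot. */
	if (blockpfn > end_pfn)
		blockpfn = end_pfn;

	printf("blockpfn = %lu (end_pfn = %lu)\n", blockpfn, end_pfn);
	return 0;
}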
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
---
mm/compaction.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 70b0776..b978693 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -437,6 +437,24 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
if (!valid_page)
valid_page = page;
+
+ /*
+ * For compound pages such as THP and hugetlbfs, we can save
+ * potentially a lot of iterations if we skip them at once.
+ * The check is racy, but we can consider only valid values
+ * and the only danger is skipping too much.
+ */
+ if (PageCompound(page)) {
+ unsigned int comp_order = compound_order(page);
+
+ if (likely(comp_order < MAX_ORDER)) {
+ blockpfn += (1UL << comp_order) - 1;
+ cursor += (1UL << comp_order) - 1;
+ }
+
+ goto isolate_fail;
+ }
+
if (!PageBuddy(page))
goto isolate_fail;
@@ -496,6 +514,13 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
}
+ /*
+ * There is a tiny chance that we have read bogus compound_order(),
+ * so be careful to not go outside of the pageblock.
+ */
+ if (unlikely(blockpfn > end_pfn))
+ blockpfn = end_pfn;
+
trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
nr_scanned, total_isolated);
--
2.4.6
* Re: [PATCH v2 0/5] Assorted compaction cleanups and optimizations
From: Vlastimil Babka @ 2015-07-31 15:30 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: linux-kernel, Christoph Lameter, David Rientjes, Joonsoo Kim,
Mel Gorman, Michal Nazarewicz, Minchan Kim, Naoya Horiguchi,
Rik van Riel
On 07/31/2015 05:28 PM, Vlastimil Babka wrote:
> v2 changes:
> - dropped Patch 6 as adjusting to Joonsoo's objection would be too
> complicated and the results didn't justify it
> - don't check for compound order > 0 in patches 4 and 5 as suggested by
> Michal Nazarewicz. Negative values are handled by converting to unsigned
> int, the pfn calculations work fine with 0 and it's unlikely to see 0
> due to a race when we just checked PageCompound().
Ah, also: patch 3 now calls reset_cached_positions() from
__reset_isolation_suitable(), as v1 missed a callsite by doing it
separately (spotted by Joonsoo).