[PATCH v9] mm: compaction: handle incorrect MIGRATE

* [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-04 13:43 ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 31+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-04 13:43 UTC
  To: linux-mm, linux-kernel
  Cc: Minchan Kim, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf


Dave, could you please test this version?

From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks

When MIGRATE_UNMOVABLE pages are freed from a MIGRATE_UNMOVABLE
type pageblock (and some MIGRATE_MOVABLE pages are left in it),
waiting until an allocation takes ownership of the block may
take too long.  The type of the pageblock remains unchanged,
so the pageblock cannot be used as a migration target during
compaction.

Fix it by:

* Adding enum compact_mode (COMPACT_ASYNC_[MOVABLE,UNMOVABLE],
  and COMPACT_SYNC) and then converting sync field in struct
  compact_control to use it.

* Adding nr_pageblocks_skipped field to struct compact_control
  and tracking how many destination pageblocks were of
  MIGRATE_UNMOVABLE type.  If COMPACT_ASYNC_MOVABLE mode compaction
  ran fully in try_to_compact_pages() (COMPACT_COMPLETE), it implies
  that there is no suitable page for the allocation.  In this case,
  check whether any MIGRATE_UNMOVABLE pageblocks were skipped and,
  if so, try a second pass in COMPACT_ASYNC_UNMOVABLE mode.

* Scanning the MIGRATE_UNMOVABLE pageblocks (during COMPACT_SYNC
  and COMPACT_ASYNC_UNMOVABLE compaction modes) and checking each
  page for PageBuddy, page_count(page) == 0 or PageLRU.  If all
  pages within the MIGRATE_UNMOVABLE pageblock are in one of those
  three sets, change the whole pageblock type to MIGRATE_MOVABLE.
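
The three steps above can be modelled in userspace as a minimal sketch.
Everything here is hypothetical and simplified: struct pageblock,
enum page_state, can_rescue(), scan_targets() and find_targets() are
illustration-only names and not kernel interfaces, the block size is a
toy value, and the retry condition (no usable target found) only stands
in for the COMPACT_COMPLETE check done in try_to_compact_pages().

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enum added by the patch. */
enum compact_mode {
	COMPACT_ASYNC_MOVABLE,
	COMPACT_ASYNC_UNMOVABLE,
	COMPACT_SYNC,
};

/* Hypothetical stand-ins for pageblock migratetype and page state. */
enum block_type { BLOCK_MOVABLE, BLOCK_UNMOVABLE };
enum page_state { PAGE_BUDDY, PAGE_UNREFERENCED, PAGE_LRU, PAGE_PINNED };

#define PAGES_PER_BLOCK 8	/* toy value, not pageblock_nr_pages */

struct pageblock {
	enum block_type type;
	enum page_state pages[PAGES_PER_BLOCK];
};

/* A block can be rescued only if every page is buddy, unreferenced or LRU. */
bool can_rescue(const struct pageblock *b)
{
	size_t i;

	for (i = 0; i < PAGES_PER_BLOCK; i++)
		if (b->pages[i] == PAGE_PINNED)
			return false;
	return true;
}

/*
 * One pass over candidate target blocks.  Returns the number of usable
 * targets and stores the number of skipped unmovable blocks in *skipped.
 */
unsigned int scan_targets(struct pageblock *blocks, size_t n,
			  enum compact_mode mode, unsigned int *skipped)
{
	unsigned int usable = 0;
	size_t i;

	*skipped = 0;
	for (i = 0; i < n; i++) {
		struct pageblock *b = &blocks[i];

		if (b->type == BLOCK_MOVABLE) {
			/* The second async pass looks at unmovable blocks only. */
			if (mode != COMPACT_ASYNC_UNMOVABLE)
				usable++;
			continue;
		}
		if (mode == COMPACT_ASYNC_MOVABLE) {
			(*skipped)++;	/* remember it for a possible second pass */
			continue;
		}
		if (can_rescue(b)) {
			b->type = BLOCK_MOVABLE;	/* convert the whole block */
			usable++;
		}
	}
	return usable;
}

/* First pass in movable-only mode, second pass only if blocks were skipped. */
unsigned int find_targets(struct pageblock *blocks, size_t n)
{
	unsigned int skipped;
	unsigned int usable;

	usable = scan_targets(blocks, n, COMPACT_ASYNC_MOVABLE, &skipped);
	if (usable == 0 && skipped > 0)
		usable = scan_targets(blocks, n, COMPACT_ASYNC_UNMOVABLE,
				      &skipped);
	return usable;
}
```

In this model a block holding a single pinned page is never converted,
which matches the all-or-nothing rule described in the last step.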

My particular test case (on an ARM EXYNOS4 device with 512 MiB,
which means 131072 standard 4 KiB pages in the 'Normal' zone) is to:
- allocate 95000 pages for kernel's usage
- free every second page (47500 pages) of memory just allocated
- allocate and use 60000 pages from user space
- free remaining 60000 pages of kernel memory
(now we have fragmented memory occupied mostly by user space pages)
- try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
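
The sizes quoted in this test case follow from the 4 KiB page size.  A
small sketch of the arithmetic is given below; the helper names are
hypothetical and exist only for this illustration.

```c
/* Page size assumed by the test case description. */
#define PAGE_SIZE_KIB 4UL

/* Number of 4 KiB pages in a memory size given in MiB. */
unsigned long pages_in_mib(unsigned long mib)
{
	return mib * 1024UL / PAGE_SIZE_KIB;
}

/* Number of base pages in one allocation of the given order. */
unsigned long order_to_pages(unsigned int order)
{
	return 1UL << order;
}

/* Size in KiB of one allocation of the given order. */
unsigned long order_to_kib(unsigned int order)
{
	return order_to_pages(order) * PAGE_SIZE_KIB;
}
```

With these helpers, pages_in_mib(512) gives the 131072 pages mentioned
above, and order_to_kib(9) gives 2048 KiB, i.e. 512 contiguous pages per
order-9 allocation.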

The results:
- with compaction disabled I get 10 successful allocations
- with compaction enabled - 11 successful allocations
- with this patch I'm able to get 25 successful allocations

NOTE: If we can make kswapd aware of order-0 requests during
compaction, we can enhance kswapd by changing the mode to
COMPACT_ASYNC_FULL (COMPACT_ASYNC_MOVABLE + COMPACT_ASYNC_UNMOVABLE).
Please see the following thread:

	http://marc.info/?l=linux-mm&m=133552069417068&w=2

[minchan@kernel.org: minor cleanups]
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
v2:
- redo the patch based on review from Mel Gorman
  (http://marc.info/?l=linux-mm&m=133519311025444&w=2)
v3:
- apply review comments from Minchan Kim
  (http://marc.info/?l=linux-mm&m=133531540308862&w=2)
v4:
- more review comments from Mel
  (http://marc.info/?l=linux-mm&m=133545110625042&w=2)
v5:
- even more comments from Mel
  (http://marc.info/?l=linux-mm&m=133577669023492&w=2)
- fix patch description
v6: (based on comments from Minchan Kim and Mel Gorman)
- add note about kswapd
- rename nr_pageblocks to nr_pageblocks_scanned and nr_skipped
  to nr_pageblocks_skipped
- fix pageblocks counting in suitable_migration_target()
- fix try_to_compact_pages() to do COMPACT_ASYNC_UNMOVABLE per zone 
v7:
- minor cleanups from Minchan Kim
- cleanup try_to_compact_pages()
v8:
- document rescue_unmovable_pageblock()
- enum result_smt -> enum smt_result
- fix suitable_migration_target() documentation
- add comment about zeroing cc->nr_pageblocks_skipped
- fix FAIL_UNMOVABLE_TARGET handling in isolate_freepages()
v9:
- use right page for pageblock conversion in rescue_unmovable_pageblock()
- split rescue_unmovable_pageblock() into can_rescue_unmovable_pageblock()
  and __rescue_unmovable_pageblock()
- add missing locking
- modify test-case slightly

 include/linux/compaction.h |   19 +++++
 mm/compaction.c            |  166 ++++++++++++++++++++++++++++++++++++++-------
 mm/internal.h              |    9 ++
 mm/page_alloc.c            |    8 +-
 4 files changed, 174 insertions(+), 28 deletions(-)

Index: b/include/linux/compaction.h
===================================================================
--- a/include/linux/compaction.h	2012-06-04 15:01:40.957552983 +0200
+++ b/include/linux/compaction.h	2012-06-04 15:16:30.396467898 +0200
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,23 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * compaction supports three modes
+ *
+ * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
+ * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources.
+ *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
+ *    targets and converted to MIGRATE_MOVABLE if possible
+ * COMPACT_SYNC uses synchronous migration and scans all pageblocks
+ */
+enum compact_mode {
+	COMPACT_ASYNC_MOVABLE,
+	COMPACT_ASYNC_UNMOVABLE,
+	COMPACT_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-06-04 15:16:11.884467919 +0200
+++ b/mm/compaction.c	2012-06-04 15:18:34.220467910 +0200
@@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACT_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		lruvec = mem_cgroup_page_lruvec(page, zone);
@@ -360,27 +361,110 @@ isolate_migratepages_range(struct zone *
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+/*
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
+ * converted to MIGRATE_MOVABLE type, false otherwise.
+ */
+static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
+{
+	unsigned long pfn, start_pfn, end_pfn;
+	struct page *start_page, *end_page, *cursor_page;
+
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
+
+	start_page = pfn_to_page(start_pfn);
+	end_page = pfn_to_page(end_pfn);
+
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
+		struct zone *zone = page_zone(start_page);
+		unsigned long flags;
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			return false;
+
+		if (!locked)
+			spin_lock_irqsave(&zone->lock, flags);
+
+		if (PageBuddy(cursor_page)) {
+			int order = page_order(cursor_page);
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+			pfn += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
+			continue;
+		} else if (page_count(cursor_page) == 0 ||
+			   PageLRU(cursor_page)) {
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
+			continue;
+		}
+
+		if (!locked)
+			spin_unlock_irqrestore(&zone->lock, flags);
+
+		return false;
+	}
+
+	return true;
+}
+
+void __rescue_unmovable_pageblock(struct page *page)
+{
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+}
+
+enum smt_result {
+	GOOD_AS_MIGRATION_TARGET,
+	GOOD_CAN_RESCUE_UNMOVABLE_TARGET,
+	FAIL_UNMOVABLE_TARGET,
+	FAIL_BAD_TARGET,
+};
+
+/*
+ * Returns GOOD_AS_MIGRATION_TARGET if the page is within a block
+ * suitable for migration to, FAIL_UNMOVABLE_TARGET if the page
+ * is within a MIGRATE_UNMOVABLE block, FAIL_BAD_TARGET otherwise.
+ */
+static enum smt_result suitable_migration_target(struct page *page,
+				      struct compact_control *cc, bool locked)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
+		return FAIL_BAD_TARGET;
 
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
+		return GOOD_AS_MIGRATION_TARGET;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
-		return true;
+	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
+	    migrate_async_suitable(migratetype))
+		return GOOD_AS_MIGRATION_TARGET;
+
+	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE)
+		return FAIL_UNMOVABLE_TARGET;
+
+	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    can_rescue_unmovable_pageblock(page, locked))
+		return GOOD_CAN_RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
-	return false;
+	return FAIL_BAD_TARGET;
 }
 
 /*
@@ -414,6 +498,13 @@ static void isolate_freepages(struct zon
 	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
 	/*
+	 * isolate_freepages() may be called more than once during
+	 * compact_zone_order() run and we want only the most recent
+	 * count.
+	 */
+	cc->nr_pageblocks_skipped = 0;
+
+	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -421,6 +512,7 @@ static void isolate_freepages(struct zon
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
+		enum smt_result ret;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -437,9 +529,13 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != GOOD_AS_MIGRATION_TARGET &&
+		    ret != GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
+			if (ret == FAIL_UNMOVABLE_TARGET)
+				cc->nr_pageblocks_skipped++;
 			continue;
-
+		}
 		/*
 		 * Found a block suitable for isolating free pages from. Now
 		 * we disabled interrupts, double check things are ok and
@@ -448,12 +544,17 @@ static void isolate_freepages(struct zon
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == GOOD_AS_MIGRATION_TARGET ||
+		    ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
+			if (ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET)
+				__rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
-		}
+		} else if (ret == FAIL_UNMOVABLE_TARGET)
+			cc->nr_pageblocks_skipped++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
@@ -685,8 +786,9 @@ static int compact_zone(struct zone *zon
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)&cc->freepages, false,
+			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -715,7 +817,8 @@ out:
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compact_mode mode,
+				 unsigned long *nr_pageblocks_skipped)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -723,12 +826,17 @@ static unsigned long compact_zone_order(
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
+	unsigned long rc;
+
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	rc = compact_zone(zone, &cc);
+	*nr_pageblocks_skipped = cc.nr_pageblocks_skipped;
+
+	return rc;
 }
 
 int sysctl_extfrag_threshold = 500;
@@ -753,6 +861,8 @@ unsigned long try_to_compact_pages(struc
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
+	unsigned long nr_pageblocks_skipped;
+	enum compact_mode mode;
 
 	/*
 	 * Check whether it is worth even starting compaction. The order check is
@@ -769,12 +879,22 @@ unsigned long try_to_compact_pages(struc
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
+retry:
+		status = compact_zone_order(zone, order, gfp_mask, mode,
+						&nr_pageblocks_skipped);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
+
+		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
+			if (nr_pageblocks_skipped) {
+				mode = COMPACT_ASYNC_UNMOVABLE;
+				goto retry;
+			}
+		}
 	}
 
 	return rc;
@@ -808,7 +928,7 @@ static int __compact_pgdat(pg_data_t *pg
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACT_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -823,7 +943,7 @@ int compact_pgdat(pg_data_t *pgdat, int 
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACT_ASYNC_MOVABLE,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -833,7 +953,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h	2012-06-04 15:16:11.908467919 +0200
+++ b/mm/internal.h	2012-06-04 15:16:30.396467898 +0200
@@ -94,6 +94,9 @@ extern void putback_lru_page(struct page
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct pa
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +123,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_pageblocks_skipped;
 };
 
 unsigned long
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2012-06-04 15:16:27.356467917 +0200
+++ b/mm/page_alloc.c	2012-06-04 15:16:30.396467898 +0200
@@ -241,7 +241,7 @@ static char *migratetype_to_str(int migr
 	}
 }
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 	struct zone *zone = page_zone(page);
 
@@ -982,8 +982,8 @@ static int move_freepages(struct zone *z
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5684,7 +5684,7 @@ static int __alloc_contig_migrate_range(
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 


@@ -833,7 +953,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h	2012-06-04 15:16:11.908467919 +0200
+++ b/mm/internal.h	2012-06-04 15:16:30.396467898 +0200
@@ -94,6 +94,9 @@ extern void putback_lru_page(struct page
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct pa
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +123,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_pageblocks_skipped;
 };
 
 unsigned long
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c	2012-06-04 15:16:27.356467917 +0200
+++ b/mm/page_alloc.c	2012-06-04 15:16:30.396467898 +0200
@@ -241,7 +241,7 @@ static char *migratetype_to_str(int migr
 	}
 }
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 	struct zone *zone = page_zone(page);
 
@@ -982,8 +982,8 @@ static int move_freepages(struct zone *z
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5684,7 +5684,7 @@ static int __alloc_contig_migrate_range(
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-04 13:43 ` Bartlomiej Zolnierkiewicz
@ 2012-06-04 14:22   ` Michal Nazarewicz
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Nazarewicz @ 2012-06-04 14:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, Bartlomiej Zolnierkiewicz
  Cc: Minchan Kim, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

On Mon, 04 Jun 2012 15:43:56 +0200, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:
> +/*
> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
> + * converted to MIGRATE_MOVABLE type, false otherwise.
> + */
> +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
> +{
> +	unsigned long pfn, start_pfn, end_pfn;
> +	struct page *start_page, *end_page, *cursor_page;
> +
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> +	start_page = pfn_to_page(start_pfn);
> +	end_page = pfn_to_page(end_pfn);
> +
> +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
> +		pfn++, cursor_page++) {
> +		struct zone *zone = page_zone(start_page);
> +		unsigned long flags;
> +
> +		if (!pfn_valid_within(pfn))
> +			continue;
> +
> +		/* Do not deal with pageblocks that overlap zones */
> +		if (page_zone(cursor_page) != zone)
> +			return false;
> +
> +		if (!locked)
> +			spin_lock_irqsave(&zone->lock, flags);
> +
> +		if (PageBuddy(cursor_page)) {
> +			int order = page_order(cursor_page);
>-/* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +			pfn += (1 << order) - 1;
> +			cursor_page += (1 << order) - 1;
> +
> +			if (!locked)
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +			continue;
> +		} else if (page_count(cursor_page) == 0 ||
> +			   PageLRU(cursor_page)) {
> +			if (!locked)
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +			continue;
> +		}
> +
> +		if (!locked)
> +			spin_unlock_irqrestore(&zone->lock, flags);

spin_unlock in three places is ugly.  How about adding a flag that holds the
result of the function, which you use as the for loop condition and set to
false inside an additional else clause?  E.g.:

	bool result = true;
	for (...; result && cursor_page <= end_page; ...) {
		...
		if (!pfn_valid_within(pfn)) continue;
		if (page_zone(cursor_page) != zone) return false;
		if (!locked) spin_lock_irqsave(...);
		
		if (PageBuddy(...)) {
			...
		} else if (page_count(cursor_page) == 0 ||
			   PageLRU(cursor_page)) {
			...
		} else {
			result = false;
		}
		if (!locked) spin_unlock_irqrestore(...);
	}
	return result;
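
A minimal user-space sketch of the loop shape suggested above, with the
lock dropped in exactly one place.  The toy_page and toy_lock types and
their helpers are hypothetical stand-ins for struct page and zone->lock,
not kernel APIs; only the control flow is meant to carry over:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT kernel types: just enough state to model the scan. */
struct toy_page {
	bool buddy;		/* models PageBuddy() */
	unsigned int order;	/* models page_order(), used only if buddy */
	int count;		/* models page_count() */
	bool lru;		/* models PageLRU() */
};

struct toy_lock {
	int depth;		/* 1 while held, 0 otherwise */
	int acquisitions;	/* how many times the lock was taken */
};

static void toy_lock_acquire(struct toy_lock *lock)
{
	lock->depth++;
	lock->acquisitions++;
}

static void toy_lock_release(struct toy_lock *lock)
{
	lock->depth--;
}

/* Returns true if every page is a buddy page, has count 0 or is on the LRU. */
static bool toy_can_rescue(const struct toy_page *pages, size_t nr,
			   struct toy_lock *lock, bool locked)
{
	bool result = true;
	size_t i;

	assert(pages != NULL && lock != NULL);

	for (i = 0; result && i < nr; i++) {
		if (!locked)
			toy_lock_acquire(lock);

		if (pages[i].buddy) {
			/* Skip the remaining pages of the free chunk. */
			i += ((size_t)1 << pages[i].order) - 1;
		} else if (pages[i].count == 0 || pages[i].lru) {
			/* Acceptable page, keep scanning. */
		} else {
			/* Anything else ends the loop via the flag. */
			result = false;
		}

		/* The single unlock site, reached on every path. */
		if (!locked)
			toy_lock_release(lock);
	}

	return result;
}
```

With this shape a failing page only has to clear the flag; the loop
condition then ends the scan after the lock has already been released.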

> +		return false;
> +	}
> +
> +	return true;
> +}

How do you make sure that a page is not allocated while this runs?  Or you just
don't care?  Note that even with the zone lock held, a page may be allocated
from a pcp list on (another) CPU.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-04 14:22   ` Michal Nazarewicz
@ 2012-06-06 12:55     ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 31+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-06 12:55 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: linux-mm, linux-kernel, Minchan Kim, Hugh Dickins, Linus Torvalds,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, Markus Trippelsdorf

On Monday 04 June 2012 16:22:51 Michal Nazarewicz wrote:
> On Mon, 04 Jun 2012 15:43:56 +0200, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:
> > +/*
> > + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
> > + * converted to MIGRATE_MOVABLE type, false otherwise.
> > + */
> > +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
> > +{
> > +	unsigned long pfn, start_pfn, end_pfn;
> > +	struct page *start_page, *end_page, *cursor_page;
> > +
> > +	pfn = page_to_pfn(page);
> > +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> > +	end_pfn = start_pfn + pageblock_nr_pages - 1;
> > +
> > +	start_page = pfn_to_page(start_pfn);
> > +	end_page = pfn_to_page(end_pfn);
> > +
> > +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
> > +		pfn++, cursor_page++) {
> > +		struct zone *zone = page_zone(start_page);
> > +		unsigned long flags;
> > +
> > +		if (!pfn_valid_within(pfn))
> > +			continue;
> > +
> > +		/* Do not deal with pageblocks that overlap zones */
> > +		if (page_zone(cursor_page) != zone)
> > +			return false;
> > +
> > +		if (!locked)
> > +			spin_lock_irqsave(&zone->lock, flags);
> > +
> > +		if (PageBuddy(cursor_page)) {
> > +			int order = page_order(cursor_page);
> >-/* Returns true if the page is within a block suitable for migration to */
> > -static bool suitable_migration_target(struct page *page)
> > +			pfn += (1 << order) - 1;
> > +			cursor_page += (1 << order) - 1;
> > +
> > +			if (!locked)
> > +				spin_unlock_irqrestore(&zone->lock, flags);
> > +			continue;
> > +		} else if (page_count(cursor_page) == 0 ||
> > +			   PageLRU(cursor_page)) {
> > +			if (!locked)
> > +				spin_unlock_irqrestore(&zone->lock, flags);
> > +			continue;
> > +		}
> > +
> > +		if (!locked)
> > +			spin_unlock_irqrestore(&zone->lock, flags);
> 
> spin_unlock in three places is ugly.  How about adding a flag that holds the
> result of the function, which you use as the for loop condition and set to
> false inside an additional else clause?  E.g.:
> 
> 	bool result = true;
> 	for (...; result && cursor_page <= end_page; ...) {
> 		...
> 		if (!pfn_valid_within(pfn)) continue;
> 		if (page_zone(cursor_page) != zone) return false;
> 		if (!locked) spin_lock_irqsave(...);
> 		
> 		if (PageBuddy(...)) {
> 			...
> 		} else if (page_count(cursor_page) == 0 ||
> 			   PageLRU(cursor_page)) {
> 			...
> 		} else {
> 			result = false;
> 		}
> 		if (!locked) spin_unlock_irqrestore(...);
> 	}
> 	return result;

Thanks, I'll use the hint (if still applicable) in the next patch version.

> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> 
> How do you make sure that a page is not allocated while this runs?  Or you just
> don't care?  Note that even with the zone lock held, a page may be allocated
> from a pcp list on (another) CPU.

Ok, I see the issue: a pcp page can be obtained through rmqueue_bulk() in
buffered_rmqueue(), and its page count is only increased in prep_new_page()
a bit later, with the zone lock dropped.  So while we may not see the page
as a "bad" one in can_rescue_unmovable_pageblock(), it may end up as an
unmovable one in a pageblock that was just changed to MIGRATE_MOVABLE type.

It is basically the same kind of problem as the page allocation vs
alloc_contig_range() races present in CMA [*], so we may deal with it in
a similar manner as CMA: isolate the pageblock so no new allocations are
allowed from it, check if we can do the pageblock transition to
MIGRATE_MOVABLE type and do it if so, drain the pcp lists, check if the
transition was successful, and if some pages slipped through just revert
the operation.
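
The sequence above (isolate, check, convert, drain, verify, revert) can
be sketched in user space as follows.  This is only a toy model under
stated assumptions: toy_block, toy_page, the on_pcp flag and all toy_*
helpers are hypothetical and do not correspond to kernel interfaces, and
the racing allocator is simulated by a callback run between the
conversion and the drain:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLOCK_PAGES 8

enum toy_type { TOY_UNMOVABLE, TOY_MOVABLE };

struct toy_page {
	int count;	/* models page_count() */
	bool lru;	/* models PageLRU() */
	bool on_pcp;	/* page currently sits on a per-cpu list */
};

struct toy_block {
	enum toy_type type;
	bool isolated;	/* no new allocations while set */
	struct toy_page pages[TOY_BLOCK_PAGES];
};

/* Hook used to simulate an allocator racing with the rescue. */
typedef void (*toy_race_fn)(struct toy_block *block);

/* A block is clean if every page has count 0 or is on the LRU. */
static bool toy_block_is_clean(const struct toy_block *block)
{
	size_t i;

	for (i = 0; i < TOY_BLOCK_PAGES; i++) {
		const struct toy_page *p = &block->pages[i];

		if (p->count != 0 && !p->lru)
			return false;
	}
	return true;
}

/* Return all still-unused pcp pages of this block to the free pool. */
static void toy_drain_pcp(struct toy_block *block)
{
	size_t i;

	for (i = 0; i < TOY_BLOCK_PAGES; i++)
		block->pages[i].on_pcp = false;
}

/* Simulated race: the first pcp page gets handed out and referenced. */
static void toy_alloc_from_pcp(struct toy_block *block)
{
	size_t i;

	for (i = 0; i < TOY_BLOCK_PAGES; i++) {
		if (block->pages[i].on_pcp) {
			block->pages[i].on_pcp = false;
			block->pages[i].count = 1;
			return;
		}
	}
}

static bool toy_rescue_block(struct toy_block *block, toy_race_fn race)
{
	bool ok;

	assert(!block->isolated);

	block->isolated = true;			/* 1. isolate */
	ok = toy_block_is_clean(block);		/* 2. check */
	if (ok) {
		block->type = TOY_MOVABLE;	/* 3. convert */
		if (race)
			race(block);		/* a page slips through here */
		toy_drain_pcp(block);		/* 4. drain pcp lists */
		ok = toy_block_is_clean(block);	/* 5. verify */
		if (!ok)
			block->type = TOY_UNMOVABLE;	/* 6. revert */
	}
	block->isolated = false;

	return ok;
}
```

Passing toy_alloc_from_pcp as the race hook models a page that slips
through between the first check and the drain; the second check then
notices it and the block type is reverted.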

However I worry that this still won't cover all races, as we can have
a page in a "transient state": no longer on a pcp list but not yet used,
simply still being processed by buffered_rmqueue() while we count it
as a "good" one in the pageblock transition verification code.

[*] BTW please see http://marc.info/?l=linux-mm&m=133775797022645&w=2
for CMA related fixes

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-06 12:55     ` Bartlomiej Zolnierkiewicz
@ 2012-06-06 15:52       ` Michal Nazarewicz
  -1 siblings, 0 replies; 31+ messages in thread
From: Michal Nazarewicz @ 2012-06-06 15:52 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Minchan Kim, Hugh Dickins, Linus Torvalds,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, Markus Trippelsdorf

On Wed, 06 Jun 2012 14:55:28 +0200, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:

> On Monday 04 June 2012 16:22:51 Michal Nazarewicz wrote:
>> On Mon, 04 Jun 2012 15:43:56 +0200, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:
>> > +/*
>> > + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>> > + * converted to MIGRATE_MOVABLE type, false otherwise.
>> > + */
>> > +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
>> > +{
>> > +	unsigned long pfn, start_pfn, end_pfn;
>> > +	struct page *start_page, *end_page, *cursor_page;
>> > +
>> > +	pfn = page_to_pfn(page);
>> > +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> > +	end_pfn = start_pfn + pageblock_nr_pages - 1;
>> > +
>> > +	start_page = pfn_to_page(start_pfn);
>> > +	end_page = pfn_to_page(end_pfn);
>> > +
>> > +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
>> > +		pfn++, cursor_page++) {
>> > +		struct zone *zone = page_zone(start_page);
>> > +		unsigned long flags;
>> > +
>> > +		if (!pfn_valid_within(pfn))
>> > +			continue;
>> > +
>> > +		/* Do not deal with pageblocks that overlap zones */
>> > +		if (page_zone(cursor_page) != zone)
>> > +			return false;
>> > +
>> > +		if (!locked)
>> > +			spin_lock_irqsave(&zone->lock, flags);
>> > +
>> > +		if (PageBuddy(cursor_page)) {
>> > +			int order = page_order(cursor_page);
>> >-/* Returns true if the page is within a block suitable for migration to */
>> > -static bool suitable_migration_target(struct page *page)
>> > +			pfn += (1 << order) - 1;
>> > +			cursor_page += (1 << order) - 1;
>> > +
>> > +			if (!locked)
>> > +				spin_unlock_irqrestore(&zone->lock, flags);
>> > +			continue;
>> > +		} else if (page_count(cursor_page) == 0 ||
>> > +			   PageLRU(cursor_page)) {
>> > +			if (!locked)
>> > +				spin_unlock_irqrestore(&zone->lock, flags);
>> > +			continue;
>> > +		}
>> > +
>> > +		if (!locked)
>> > +			spin_unlock_irqrestore(&zone->lock, flags);
>>
>> spin_unlock in three places is ugly.  How about adding a flag that holds the
>> result of the function, which you use as the for loop condition and set to
>> false inside an additional else clause?  E.g.:
>>
>> 	bool result = true;
>> 	for (...; result && cursor_page <= end_page; ...) {
>> 		...
>> 		if (!pfn_valid_within(pfn)) continue;
>> 		if (page_zone(cursor_page) != zone) return false;
>> 		if (!locked) spin_lock_irqsave(...);
>> 		
>> 		if (PageBuddy(...)) {
>> 			...
>> 		} else if (page_count(cursor_page) == 0 ||
>> 			   PageLRU(cursor_page)) {
>> 			...
>> 		} else {
>> 			result = false;
>> 		}
>> 		if (!locked) spin_unlock_irqrestore(...);
>> 	}
>> 	return result;
>
> Thanks, I'll use the hint (if still applicable) in the next patch version.
>
>> > +		return false;
>> > +	}
>> > +
>> > +	return true;
>> > +}
>>
>> How do you make sure that a page is not allocated while this runs?  Or you just
>> don't care?  Note that even with the zone lock held, a page may be allocated
>> from a pcp list on (another) CPU.
>
> Ok, I see the issue (i.e. pcp page can be returned by rmqueue_bulk() in
> buffered_rmqueue() and its page count will be increased in prep_new_page()
> a bit later with zone lock dropped so while we may not see the page as
> "bad" one in can_rescue_unmovable_pageblock() it may end up as unmovable
> one in a pageblock that was just changed to MIGRATE_MOVABLE type).

Allocating unmovable pages from a movable pageblock is allowed though.  But
consider these two scenarios:

thread A                               thread B
                                        allocate page from pcp list
call can_rescue_unmovable_pageblock()
  iterate over all pages
   find that one of them is allocated
    so return false

Second one:

thread A                               thread B
call can_rescue_unmovable_pageblock()
  iterate over all pages
   find that all of them are free
                                        allocate page from pcp list
    return true

Note that the second scenario can happen even if the zone lock is
held.  So why does the function return a different result in each of them?
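
The two interleavings can be replayed deterministically in user space.
This is a sketch only: toy_page, toy_scan() and toy_alloc() are
hypothetical stand-ins for the page state, the scan in
can_rescue_unmovable_pageblock() and thread B's allocation, and the
"threads" are simply run one after the other in the stated order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int count;	/* models page_count() */
	bool lru;	/* models PageLRU() */
};

/* Thread A's scan: true if every page has count 0 or is on the LRU. */
static bool toy_scan(const struct toy_page *pages, size_t nr)
{
	size_t i;

	assert(pages != NULL);

	for (i = 0; i < nr; i++) {
		if (pages[i].count != 0 && !pages[i].lru)
			return false;
	}
	return true;
}

/* Thread B's allocation, modelled as a plain reference count bump. */
static void toy_alloc(struct toy_page *page)
{
	page->count = 1;
}

/* First scenario: B allocates, then A scans. */
static bool toy_alloc_then_scan(struct toy_page *pages, size_t nr,
				size_t victim)
{
	toy_alloc(&pages[victim]);
	return toy_scan(pages, nr);
}

/* Second scenario: A scans, B allocates before A acts on the verdict. */
static bool toy_scan_then_alloc(struct toy_page *pages, size_t nr,
				size_t victim)
{
	bool verdict = toy_scan(pages, nr);

	toy_alloc(&pages[victim]);
	return verdict;
}
```

Both helpers leave the block in the same final state (one allocated
page), yet the first reports false and the second reports true, which is
the asymmetry the question above is about.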

> It is basically similar problem to page allocation vs alloc_contig_range()
> races present in CMA so we may deal with it in a similar manner as
> CMA: isolate pageblock so no new allocations will be allowed from it,
> check if we can do pageblock transition to MIGRATE_MOVABLE type and do
> it if so, drain pcp lists, check if the transition was successful and
> if there are some pages that slipped through just revert the operation..

To me this sounds like too much work.

I'm also not sure if you are not overthinking it, which is why I asked
at the beginning “or you just don't care?”  I'm not entirely sure that
you need to make sure that all pages in the pageblock are in fact free.
If some of them slip through, nothing catastrophic happens, does it?

> [*] BTW please see http://marc.info/?l=linux-mm&m=133775797022645&w=2
> for CMA related fixes

Could you mail it to me again?  That would be great, thanks.


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-06 15:52       ` Michal Nazarewicz
  0 siblings, 0 replies; 31+ messages in thread
From: Michal Nazarewicz @ 2012-06-06 15:52 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Minchan Kim, Hugh Dickins, Linus Torvalds,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, Markus Trippelsdorf

On Wed, 06 Jun 2012 14:55:28 +0200, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:

> On Monday 04 June 2012 16:22:51 Michal Nazarewicz wrote:
>> On Mon, 04 Jun 2012 15:43:56 +0200, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> wrote:
>> > +/*
>> > + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>> > + * converted to MIGRATE_MOVABLE type, false otherwise.
>> > + */
>> > +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
>> > +{
>> > +	unsigned long pfn, start_pfn, end_pfn;
>> > +	struct page *start_page, *end_page, *cursor_page;
>> > +
>> > +	pfn = page_to_pfn(page);
>> > +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> > +	end_pfn = start_pfn + pageblock_nr_pages - 1;
>> > +
>> > +	start_page = pfn_to_page(start_pfn);
>> > +	end_page = pfn_to_page(end_pfn);
>> > +
>> > +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
>> > +		pfn++, cursor_page++) {
>> > +		struct zone *zone = page_zone(start_page);
>> > +		unsigned long flags;
>> > +
>> > +		if (!pfn_valid_within(pfn))
>> > +			continue;
>> > +
>> > +		/* Do not deal with pageblocks that overlap zones */
>> > +		if (page_zone(cursor_page) != zone)
>> > +			return false;
>> > +
>> > +		if (!locked)
>> > +			spin_lock_irqsave(&zone->lock, flags);
>> > +
>> > +		if (PageBuddy(cursor_page)) {
>> > +			int order = page_order(cursor_page);
>> >-/* Returns true if the page is within a block suitable for migration to */
>> > -static bool suitable_migration_target(struct page *page)
>> > +			pfn += (1 << order) - 1;
>> > +			cursor_page += (1 << order) - 1;
>> > +
>> > +			if (!locked)
>> > +				spin_unlock_irqrestore(&zone->lock, flags);
>> > +			continue;
>> > +		} else if (page_count(cursor_page) == 0 ||
>> > +			   PageLRU(cursor_page)) {
>> > +			if (!locked)
>> > +				spin_unlock_irqrestore(&zone->lock, flags);
>> > +			continue;
>> > +		}
>> > +
>> > +		if (!locked)
>> > +			spin_unlock_irqrestore(&zone->lock, flags);
>>
>> spin_unlock in three spaces is ugly.  How about adding a flag that holds the
>> result of the function which you use as for loop condition and you set it to
>> false inside an additional else clause?  Eg.:
>>
>> 	bool result = true;
>> 	for (...; result && cursor_page <= end_page; ...) {
>> 		...
>> 		if (!pfn_valid_within(pfn)) continue;
>> 		if (page_zone(cursor_page) != zone) return false;
>> 		if (!locked) spin_lock_irqsave(...);
>> 		
>> 		if (PageBuddy(...)) {
>> 			...
>> 		} else if (page_count(cursor_page) == 0 ||
>> 			   PageLRU(cursor_page)) {
>> 			...
>> 		} else {
>> 			result = false;
>> 		}
>> 		if (!locked) spin_unlock_irqsave(...);
>> 	}
>> 	return result;
>
> Thanks, I'll use the hint (if still applicable) in the next patch version.
>
>> > +		return false;
>> > +	}
>> > +
>> > +	return true;
>> > +}
>>
>> How do you make sure that a page is not allocated while this runs?  Or you just
>> don't care?  Not that even with zone lock, page may be allocated from pcp list
>> on (another) CPU.
>
> Ok, I see the issue (i.e. pcp page can be returned by rmqueue_bulk() in
> buffered_rmqueue() and its page count will be increased in prep_new_page()
> a bit later with zone lock dropped so while we may not see the page as
> "bad" one in can_rescue_unmovable_pageblock() it may end up as unmovable
> one in a pageblock that was just changed to MIGRATE_MOVABLE type).

Allocating unmovable pages from movable pageblock is allowed though.  But,
consider those two scenarios:

thread A                               thread B
                                        allocate page from pcp list
call can_rescue_unmovable_pageblock()
  iterate over all pages
   find that one of them is allocated
    so return false

Second one:

thread A                               thread B
call can_rescue_unmovable_pageblock()
  iterate over all pages
   find that all of them are free
                                        allocate page from pcp list
    return true

Note that the second scenario can happen even if zone lock is
held.  So, why in both the function returns different result?

> It is basically similar problem to page allocation vs alloc_contig_range()
> races present in CMA so we may deal with it in a similar manner as
> CMA: isolate pageblock so no new allocations will be allowed from it,
> check if we can do pageblock transition to MIGRATE_MOVABLE type and do
> it if so, drain pcp lists, check if the transition was successful and
> if there are some pages that slipped through just revert the operation..

To me this sounds like too much work.

I'm also not sure if you are not overthinking it, which is why I asked
at the beginning “or you just don't care?”  I'm not entirely sure that
you need to make sure that all pages in the pageblock are in fact free.
If some of them slip through, nothing catastrophic happens, does it?
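
For reference, the isolate / check / convert / drain / verify / revert
sequence quoted above could be modeled roughly as below. This is a user-space
sketch under stated assumptions, not the CMA implementation: every sk_* name
is hypothetical, the "racer" callback merely replays a pcp allocation inside
the window between the check and the drain, and no real locking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NR_PAGES 8	/* hypothetical tiny pageblock */

enum sk_type { SK_UNMOVABLE, SK_MOVABLE };

struct sk_block {
	enum sk_type type;
	bool isolated;			/* no new allocations from the free lists */
	bool in_use[SK_NR_PAGES];	/* page is allocated */
	bool on_pcp[SK_NR_PAGES];	/* page sits on a per-CPU list */
};

static bool sk_all_free(const struct sk_block *b)
{
	for (size_t i = 0; i < SK_NR_PAGES; i++)
		if (b->in_use[i])
			return false;
	return true;
}

/* Fails if the page is taken, or the block is isolated and the page is not on a pcp list. */
static bool sk_alloc(struct sk_block *b, size_t idx)
{
	assert(idx < SK_NR_PAGES);
	if (b->in_use[idx] || (b->isolated && !b->on_pcp[idx]))
		return false;
	b->in_use[idx] = true;
	b->on_pcp[idx] = false;
	return true;
}

/* Return every page still parked on a pcp list to the free pool. */
static void sk_drain_pcp(struct sk_block *b)
{
	for (size_t i = 0; i < SK_NR_PAGES; i++)
		b->on_pcp[i] = false;
}

/* Example racer: another CPU grabs page 2 from its pcp list. */
static void sk_racer_take_page2(struct sk_block *b)
{
	sk_alloc(b, 2);
}

static bool sk_try_rescue(struct sk_block *b, void (*racer)(struct sk_block *))
{
	enum sk_type old = b->type;
	bool ok;

	b->isolated = true;		/* 1. isolate the pageblock */
	ok = sk_all_free(b);		/* 2. can it be converted? */
	if (ok) {
		b->type = SK_MOVABLE;	/* 3. do the transition */
		if (racer)
			racer(b);	/* window where a pcp page can slip through */
		sk_drain_pcp(b);	/* 4. drain pcp lists */
		ok = sk_all_free(b);	/* 5. did anything slip through? */
		if (!ok)
			b->type = old;	/* 6. revert the operation */
	}
	b->isolated = false;
	return ok;
}
```

With no racer the block ends up SK_MOVABLE; with sk_racer_take_page2 and page 2
parked on a pcp list, the verify step fails and the type is reverted.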

> [*] BTW please see http://marc.info/?l=linux-mm&m=133775797022645&w=2
> for CMA related fixes

Could you mail it to me again?  That would be great, thanks.


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-06 15:52       ` Michal Nazarewicz
@ 2012-06-07  4:23         ` Minchan Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-07  4:23 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

On 06/07/2012 12:52 AM, Michal Nazarewicz wrote:

> On Wed, 06 Jun 2012 14:55:28 +0200, Bartlomiej Zolnierkiewicz
> <b.zolnierkie@samsung.com> wrote:
> 
>> On Monday 04 June 2012 16:22:51 Michal Nazarewicz wrote:
>>> On Mon, 04 Jun 2012 15:43:56 +0200, Bartlomiej Zolnierkiewicz
>>> <b.zolnierkie@samsung.com> wrote:
>>> > +/*
>>> > + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>>> > + * converted to MIGRATE_MOVABLE type, false otherwise.
>>> > + */
>>> > +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>>> locked)
>>> > +{
>>> > +    unsigned long pfn, start_pfn, end_pfn;
>>> > +    struct page *start_page, *end_page, *cursor_page;
>>> > +
>>> > +    pfn = page_to_pfn(page);
>>> > +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>>> > +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>> > +
>>> > +    start_page = pfn_to_page(start_pfn);
>>> > +    end_page = pfn_to_page(end_pfn);
>>> > +
>>> > +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
>>> end_page;
>>> > +        pfn++, cursor_page++) {
>>> > +        struct zone *zone = page_zone(start_page);
>>> > +        unsigned long flags;
>>> > +
>>> > +        if (!pfn_valid_within(pfn))
>>> > +            continue;
>>> > +
>>> > +        /* Do not deal with pageblocks that overlap zones */
>>> > +        if (page_zone(cursor_page) != zone)
>>> > +            return false;
>>> > +
>>> > +        if (!locked)
>>> > +            spin_lock_irqsave(&zone->lock, flags);
>>> > +
>>> > +        if (PageBuddy(cursor_page)) {
>>> > +            int order = page_order(cursor_page);
>>> >-/* Returns true if the page is within a block suitable for
>>> migration to */
>>> > -static bool suitable_migration_target(struct page *page)
>>> > +            pfn += (1 << order) - 1;
>>> > +            cursor_page += (1 << order) - 1;
>>> > +
>>> > +            if (!locked)
>>> > +                spin_unlock_irqrestore(&zone->lock, flags);
>>> > +            continue;
>>> > +        } else if (page_count(cursor_page) == 0 ||
>>> > +               PageLRU(cursor_page)) {
>>> > +            if (!locked)
>>> > +                spin_unlock_irqrestore(&zone->lock, flags);
>>> > +            continue;
>>> > +        }
>>> > +
>>> > +        if (!locked)
>>> > +            spin_unlock_irqrestore(&zone->lock, flags);
>>>
>>> spin_unlock in three spaces is ugly.  How about adding a flag that
>>> holds the
>>> result of the function which you use as for loop condition and you
>>> set it to
>>> false inside an additional else clause?  Eg.:
>>>
>>>     bool result = true;
>>>     for (...; result && cursor_page <= end_page; ...) {
>>>         ...
>>>         if (!pfn_valid_within(pfn)) continue;
>>>         if (page_zone(cursor_page) != zone) return false;
>>>         if (!locked) spin_lock_irqsave(...);
>>>        
>>>         if (PageBuddy(...)) {
>>>             ...
>>>         } else if (page_count(cursor_page) == 0 ||
>>>                PageLRU(cursor_page)) {
>>>             ...
>>>         } else {
>>>             result = false;
>>>         }
>>>         if (!locked) spin_unlock_irqrestore(...);
>>>     }
>>>     return result;
>>
>> Thanks, I'll use the hint (if still applicable) in the next patch
>> version.
>>
>>> > +        return false;
>>> > +    }
>>> > +
>>> > +    return true;
>>> > +}
>>>
>>> How do you make sure that a page is not allocated while this runs?
>>> Or you just don't care?  Note that even with zone lock, page may be
>>> allocated from pcp list on (another) CPU.
>>
>> Ok, I see the issue (i.e. pcp page can be returned by rmqueue_bulk() in
>> buffered_rmqueue() and its page count will be increased in
>> prep_new_page()
>> a bit later with zone lock dropped so while we may not see the page as
>> "bad" one in can_rescue_unmovable_pageblock() it may end up as unmovable
>> one in a pageblock that was just changed to MIGRATE_MOVABLE type).
> 
> Allocating unmovable pages from movable pageblock is allowed though.  But,
> consider those two scenarios:
> 
> thread A                               thread B
>                                        allocate page from pcp list
> call can_rescue_unmovable_pageblock()
>  iterate over all pages
>   find that one of them is allocated
>    so return false
> 
> Second one:
> 
> thread A                               thread B
> call can_rescue_unmovable_pageblock()
>  iterate over all pages
>   find that all of them are free
>                                        allocate page from pcp list
>    return true
> 
> Note that the second scenario can happen even if the zone lock is
> held.  So why does the function return a different result in each case?
> 
>> It is basically similar problem to page allocation vs
>> alloc_contig_range()
>> races present in CMA so we may deal with it in a similar manner as
>> CMA: isolate pageblock so no new allocations will be allowed from it,
>> check if we can do pageblock transition to MIGRATE_MOVABLE type and do
>> it if so, drain pcp lists, check if the transition was successful and
>> if there are some pages that slipped through just revert the operation..
> 
> To me this sounds like too much work.
> 
> I'm also not sure if you are not overthinking it, which is why I asked
> at the beginning “or you just don't care?”  I'm not entirely sure that
> you need to make sure that all pages in the pageblock are in fact free.


Free pages aren't the only problem; the PageLRU check is one too.
We can't rely on it without holding lru_lock or isolating the page.

> If some of them slip through, nothing catastrophic happens, does it?
> 


Right. It can regress anti-fragmentation, but I believe it would not be severe.
The bigger problem is using page_count() without pinning the page, which can end up
racing with a THP being freed by another CPU, so the kernel could crash on a dangling
compound_head pointer.


>> [*] BTW please see http://marc.info/?l=linux-mm&m=133775797022645&w=2
>> for CMA related fixes
> 
> Could you mail it to me again, that would be great, thanks.
> 
> 



-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-04 13:43 ` Bartlomiej Zolnierkiewicz
@ 2012-06-04 17:13   ` Dave Jones
  -1 siblings, 0 replies; 31+ messages in thread
From: Dave Jones @ 2012-06-04 17:13 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Minchan Kim, Hugh Dickins, Linus Torvalds,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

On Mon, Jun 04, 2012 at 03:43:56PM +0200, Bartlomiej Zolnierkiewicz wrote:
 > 
 > Dave, could you please test this version?
 > 
 > From: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
 > Subject: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks

Initial testing looks good.

	Dave


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-04 13:43 ` Bartlomiej Zolnierkiewicz
@ 2012-06-04 20:22   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2012-06-04 20:22 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: linux-mm, linux-kernel, Minchan Kim, Hugh Dickins, Linus Torvalds,
	Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel,
	Dave Jones, Andrew Morton, Cong Wang, Markus Trippelsdorf,
	kosaki.motohiro

> +/*
> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
> + * converted to MIGRATE_MOVABLE type, false otherwise.
> + */
> +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
> +{
> +	unsigned long pfn, start_pfn, end_pfn;
> +	struct page *start_page, *end_page, *cursor_page;
> +
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> +	start_page = pfn_to_page(start_pfn);
> +	end_page = pfn_to_page(end_pfn);
> +
> +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
> +		pfn++, cursor_page++) {
> +		struct zone *zone = page_zone(start_page);
> +		unsigned long flags;
> +
> +		if (!pfn_valid_within(pfn))
> +			continue;
> +
> +		/* Do not deal with pageblocks that overlap zones */
> +		if (page_zone(cursor_page) != zone)
> +			return false;
> +
> +		if (!locked)
> +			spin_lock_irqsave(&zone->lock, flags);
> +
> +		if (PageBuddy(cursor_page)) {
> +			int order = page_order(cursor_page);
>
> -/* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +			pfn += (1 << order) - 1;
> +			cursor_page += (1 << order) - 1;
> +
> +			if (!locked)
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +			continue;
> +		} else if (page_count(cursor_page) == 0 ||
> +			   PageLRU(cursor_page)) {
> +			if (!locked)
> +				spin_unlock_irqrestore(&zone->lock, flags);
> +			continue;
> +		}
> +
> +		if (!locked)
> +			spin_unlock_irqrestore(&zone->lock, flags);
> +
> +		return false;
> +	}
> +
> +	return true;
> +}

Minchan, are you interested in this patch? If so, could you please rewrite it? This version
does not fix the issues we pointed out, and can_rescue_unmovable_pageblock() still has plenty
of bugs. We can't ack it.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-04 20:22   ` KOSAKI Motohiro
  (?)
@ 2012-06-05  1:59   ` Minchan Kim
  2012-06-05  2:38       ` Minchan Kim
  -1 siblings, 1 reply; 31+ messages in thread
From: Minchan Kim @ 2012-06-05  1:59 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

[-- Attachment #1: Type: text/plain, Size: 4300 bytes --]

On 06/05/2012 05:22 AM, KOSAKI Motohiro wrote:

>> +/*
>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>> + */
>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>> locked)
>> +{
>> +    unsigned long pfn, start_pfn, end_pfn;
>> +    struct page *start_page, *end_page, *cursor_page;
>> +
>> +    pfn = page_to_pfn(page);
>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>> +
>> +    start_page = pfn_to_page(start_pfn);
>> +    end_page = pfn_to_page(end_pfn);
>> +
>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
>> end_page;
>> +        pfn++, cursor_page++) {
>> +        struct zone *zone = page_zone(start_page);
>> +        unsigned long flags;
>> +
>> +        if (!pfn_valid_within(pfn))
>> +            continue;
>> +
>> +        /* Do not deal with pageblocks that overlap zones */
>> +        if (page_zone(cursor_page) != zone)
>> +            return false;
>> +
>> +        if (!locked)
>> +            spin_lock_irqsave(&zone->lock, flags);
>> +
>> +        if (PageBuddy(cursor_page)) {
>> +            int order = page_order(cursor_page);
>>
>> -/* Returns true if the page is within a block suitable for migration
>> to */
>> -static bool suitable_migration_target(struct page *page)
>> +            pfn += (1 << order) - 1;
>> +            cursor_page += (1 << order) - 1;
>> +
>> +            if (!locked)
>> +                spin_unlock_irqrestore(&zone->lock, flags);
>> +            continue;
>> +        } else if (page_count(cursor_page) == 0 ||
>> +               PageLRU(cursor_page)) {
>> +            if (!locked)
>> +                spin_unlock_irqrestore(&zone->lock, flags);
>> +            continue;
>> +        }
>> +
>> +        if (!locked)
>> +            spin_unlock_irqrestore(&zone->lock, flags);
>> +
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
> 
> Minchan, are you interested in this patch? If so, could you please rewrite it?


I can do it, but I want to give credit to Bartlomiej.
Bartlomiej, if you like my patch, could you resend it as a formal patch after you do broad testing?


> This version does not fix the issues we pointed out, and
> can_rescue_unmovable_pageblock() still has plenty of bugs.
> We can't ack it.
> 
> -- 


Frankly speaking, I don't want to merge it without any data proving that it really helps in practice.

When the patch was first submitted it wasn't complicated, so I was okay with it at the time, but it has
become more complicated than I expected. So if Andrew leaves the decision to me, it's a total NACK from me
unless the author provides some real data or VOC from some client.

1) Any comment?

Anyway, I fixed some bugs and cleaned up a few things I found during review.

Minor things.
1. Change the smt_result naming - I never like such long, inconsistent names. How about this?
2. Fix can_rescue_unmovable_pageblock()
   2.1 pfn validity check for page_zone()

Major things.

   2.2 Add lru_lock to stabilize PageLRU
       If we don't hold lru_lock, an unmovable (non-LRU) page can end up in a movable pageblock,
       which can cause a regression for compaction/CMA. But there is a concern about deadlock between
       lru_lock and zone->lock. Looking at the code, I can't find any allocation attempt made while
       holding lru_lock, so it might be safe (I'm not sure; I didn't test it, and it needs more careful
       review/testing). It does add a new locking dependency, though (I'm not sure about that either;
       maybe we already have such a rule and I just didn't know it until now ;-) ). The reason I worry
       is that, logically, we could do a GFP_ATOMIC allocation while holding lru_lock, which might be
       a crazy idea.

   2.3 Remove zone->lock in the first phase
       We rescue an unmovable pageblock in two phases. In the first phase we only peek at the pages,
       so we don't need locking. If we see an unstable value, it will be caught in the second phase,
       which takes the needed lock, or can_rescue_unmovable_pageblock() may bail out of the loop
       because of a stale page_order(cursor_page). In that case we fail to convert the unmovable
       pageblock to movable, but we can try again next time.
       It's not critical.
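
A rough user-space sketch of the two-phase idea in 2.3, for illustration only.
The tp_* names, the boolean stand-in for zone->lock and the acquisition
counter are assumptions, not part of the attached patch: phase one peeks
without the lock and rejects cheaply, phase two repeats the check with the
lock held before anything is acted upon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TP_NR_PAGES 8	/* hypothetical tiny pageblock */

struct tp_zone {
	bool page_busy[TP_NR_PAGES];	/* page is neither free nor on the LRU */
	bool lock_held;			/* stand-in for zone->lock */
	unsigned int lock_acquisitions;	/* how often the lock was taken */
};

static void tp_lock(struct tp_zone *z)
{
	assert(!z->lock_held);
	z->lock_held = true;
	z->lock_acquisitions++;
}

static void tp_unlock(struct tp_zone *z)
{
	assert(z->lock_held);
	z->lock_held = false;
}

/* The scan itself; callers decide whether the result can be trusted. */
static bool tp_scan(const struct tp_zone *z)
{
	for (size_t i = 0; i < TP_NR_PAGES; i++)
		if (z->page_busy[i])
			return false;
	return true;
}

static bool tp_can_rescue_two_phase(struct tp_zone *z)
{
	bool ok;

	/* Phase 1: unlocked peek. A stale "no" just means we retry later. */
	if (!tp_scan(z))
		return false;

	/* Phase 2: repeat the check with the lock held before acting on it. */
	tp_lock(z);
	ok = tp_scan(z);
	tp_unlock(z);
	return ok;
}
```

A block with a busy page is rejected in phase one without ever touching the
lock; only candidates that pass the peek pay for the locked recheck.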

2) Any comment?

I can't inline the code right now, so sorry, but I've attached the patch.
It's not a formal patch and has never been tested.





-- 
Kind regards,
Minchan Kim

[-- Attachment #2: 1.patch --]
[-- Type: text/x-patch, Size: 13391 bytes --]

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 51a90b7..e988037 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,23 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * compaction supports three modes
+ *
+ * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
+ * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources.
+ *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
+ *    targets and converts them to MIGRATE_MOVABLE if possible
+ * COMPACT_SYNC uses synchronous migration and scans all pageblocks
+ */
+enum compact_mode {
+	COMPACT_ASYNC_MOVABLE,
+	COMPACT_ASYNC_UNMOVABLE,
+	COMPACT_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..5c96391 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACT_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		lruvec = mem_cgroup_page_lruvec(page, zone);
@@ -360,27 +361,117 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+/*
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
+ * converted to MIGRATE_MOVABLE type, false otherwise.
+ */
+static bool can_rescue_unmovable_pageblock(struct page *page, bool need_lrulock)
+{
+	struct zone *zone;
+	unsigned long pfn, start_pfn, end_pfn;
+	struct page *start_page, *end_page, *cursor_page;
+	bool lru_locked = false;
+
+	zone = page_zone(page);
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
+
+	start_page = pfn_to_page(start_pfn);
+	end_page = pfn_to_page(end_pfn);
+
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			goto out;
+
+		if (PageBuddy(cursor_page)) {
+			unsigned long order = page_order(cursor_page);
+
+			pfn += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+			continue;
+		} else if (page_count(cursor_page) == 0) {
+			continue;
+		} else if (PageLRU(cursor_page)) {
+			if (!lru_locked && need_lrulock) {
+				spin_lock(&zone->lru_lock);
+				lru_locked = true;
+				if (PageLRU(cursor_page))
+					continue;
+			}
+		}
+
+		goto out;
+	}
+
+	return true;
+out:
+	if (lru_locked)
+		spin_unlock(&zone->lru_lock);
+
+	return false;
+}
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+static void rescue_unmovable_pageblock(struct page *page)
+{
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+}
+
+/*
+ * MIGRATE_TARGET : good for migration target
+ * RESCUE_UNMOVABLE_TARGET : good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET : can't migrate for other reasons.
+ */
+enum smt_result {
+	MIGRATE_TARGET,
+	RESCUE_UNMOVABLE_TARGET,
+	UNMOVABLE_TARGET,
+	SKIP_TARGET,
+};
+
+/*
+ * Returns MIGRATE_TARGET if the page is within a block
+ * suitable for migration to, UNMOVABLE_TARGET if the page
+ * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
+ */
+static enum smt_result suitable_migration_target(struct page *page,
+			      struct compact_control *cc, bool need_lrulock)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
+		return SKIP_TARGET;
 
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
+		return MIGRATE_TARGET;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
-		return true;
+	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
+	    migrate_async_suitable(migratetype))
+		return MIGRATE_TARGET;
+
+	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE)
+		return UNMOVABLE_TARGET;
+
+	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    can_rescue_unmovable_pageblock(page, need_lrulock))
+		return RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
-	return false;
+	return SKIP_TARGET;
 }
 
 /*
@@ -414,6 +505,13 @@ static void isolate_freepages(struct zone *zone,
 	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
 	/*
+	 * isolate_freepages() may be called more than once during
+	 * compact_zone_order() run and we want only the most recent
+	 * count.
+	 */
+	cc->nr_unmovable_pageblock = 0;
+
+	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -421,6 +519,7 @@ static void isolate_freepages(struct zone *zone,
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
+		enum smt_result ret;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -437,9 +536,12 @@ static void isolate_freepages(struct zone *zone,
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
+			if (ret == UNMOVABLE_TARGET)
+				cc->nr_unmovable_pageblock++;
 			continue;
-
+		}
 		/*
 		 * Found a block suitable for isolating free pages from. Now
 		 * we disabled interrupts, double check things are ok and
@@ -448,12 +550,16 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
+			if (ret == RESCUE_UNMOVABLE_TARGET)
+				rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
-		}
+		} else if (ret == UNMOVABLE_TARGET)
+			cc->nr_unmovable_pageblock++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
@@ -685,8 +791,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)&cc->freepages, false,
+			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -715,7 +822,8 @@ out:
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compact_mode mode,
+				 unsigned long *nr_pageblocks_skipped)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -723,12 +831,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
+	unsigned long rc;
+
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	rc = compact_zone(zone, &cc);
+	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
+
+	return rc;
 }
 
 int sysctl_extfrag_threshold = 500;
@@ -753,6 +866,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
+	unsigned long nr_pageblocks_skipped;
+	enum compact_mode mode;
 
 	/*
 	 * Check whether it is worth even starting compaction. The order check is
@@ -769,12 +884,22 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
+retry:
+		status = compact_zone_order(zone, order, gfp_mask, mode,
+						&nr_pageblocks_skipped);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
+
+		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
+			if (nr_pageblocks_skipped) {
+				mode = COMPACT_ASYNC_UNMOVABLE;
+				goto retry;
+			}
+		}
 	}
 
 	return rc;
@@ -808,7 +933,7 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACT_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -823,7 +948,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACT_ASYNC_MOVABLE,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -833,7 +958,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index 2ba87fb..061fde7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -94,6 +94,9 @@ extern void putback_lru_page(struct page *page);
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct page *page);
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +123,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_unmovable_pageblock;
 };
 
 unsigned long
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 476ae3e..d40e4c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
@@ -954,8 +954,8 @@ static int move_freepages(struct zone *zone,
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5651,7 +5651,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-05  1:59   ` Minchan Kim
@ 2012-06-05  2:38       ` Minchan Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-05  2:38 UTC (permalink / raw)
  Cc: KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

[-- Attachment #1: Type: text/plain, Size: 4596 bytes --]

On 06/05/2012 10:59 AM, Minchan Kim wrote:

> On 06/05/2012 05:22 AM, KOSAKI Motohiro wrote:
> 
>>> +/*
>>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>>> + */
>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
>>> +{
>>> +    unsigned long pfn, start_pfn, end_pfn;
>>> +    struct page *start_page, *end_page, *cursor_page;
>>> +
>>> +    pfn = page_to_pfn(page);
>>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>> +
>>> +    start_page = pfn_to_page(start_pfn);
>>> +    end_page = pfn_to_page(end_pfn);
>>> +
>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
>>> +        pfn++, cursor_page++) {
>>> +        struct zone *zone = page_zone(start_page);
>>> +        unsigned long flags;
>>> +
>>> +        if (!pfn_valid_within(pfn))
>>> +            continue;
>>> +
>>> +        /* Do not deal with pageblocks that overlap zones */
>>> +        if (page_zone(cursor_page) != zone)
>>> +            return false;
>>> +
>>> +        if (!locked)
>>> +            spin_lock_irqsave(&zone->lock, flags);
>>> +
>>> +        if (PageBuddy(cursor_page)) {
>>> +            int order = page_order(cursor_page);
>>>
>>> -/* Returns true if the page is within a block suitable for migration to */
>>> -static bool suitable_migration_target(struct page *page)
>>> +            pfn += (1 << order) - 1;
>>> +            cursor_page += (1 << order) - 1;
>>> +
>>> +            if (!locked)
>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>> +            continue;
>>> +        } else if (page_count(cursor_page) == 0 ||
>>> +               PageLRU(cursor_page)) {
>>> +            if (!locked)
>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>> +            continue;
>>> +        }
>>> +
>>> +        if (!locked)
>>> +            spin_unlock_irqrestore(&zone->lock, flags);
>>> +
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>
>> Minchan, are you interested in this patch? If yes, can you please rewrite it?
> 
> 
> I can do it, but I want to give credit to Bartlomiej.
> Bartlomiej, if you like my patch, could you resend it as a formal patch after you do broad testing?
> 
> 
>> This one does
>> not fix the issue we pointed out, and can_rescue_unmovable_pageblock() still
>> has plenty of bugs.
>> We can't ack it.
>>
>> -- 
> 
> 
> Frankly speaking, I don't want to merge it without any data proving it's really good in real practice.
> 
> When the patch was first submitted it wasn't complicated, so I was okay with it at that time, but it has become more
> complicated than I expected. So if Andrew passes the decision to me, it's a total NACK from me unless the author provides
> some real data or VOC from some client.
> 
> 1) Any comment?
> 
> Anyway, I fixed some bugs and cleaned up a few things I found during review.
> 
> Minor things.
> 1. Change the smt_result naming - I never like such long, inconsistent naming. How about this?
> 2. Fix can_rescue_unmovable_pageblock()
>    2.1 pfn valid check for page_zone
> 
> Major things.
> 
>    2.2 Add lru_lock for stabilizing PageLRU
>        If we don't hold lru_lock, there is a possibility that an unmovable (non-LRU) page ends up in a movable pageblock.
>        That can cause a regression for compaction/CMA. But there is a concern about deadlock between lru_lock and zone->lock.
>        Looking at the code, I can't find any allocation attempt made while holding lru_lock, so it might be safe (but I'm not sure;
>        I didn't test it, and it needs more careful review/testing), but it creates a new locking dependency (not sure about that
>        either; maybe we already had such a rule and I just didn't know it until now ;-) ). The reason I thought so is that
>        logically we could allocate with GFP_ATOMIC while holding lru_lock, which might be a crazy idea.
> 
>    2.3 Remove zone->lock in the first phase.
>        We rescue an unmovable pageblock in two phases. In the first phase we just peek at the pages, so we don't need locking.
>        If we see an unstable value, it will be caught in the second phase, which takes the needed locks, or
>        can_rescue_unmovable_pageblock() can bail out of the loop because of a stale page_order(cursor_page).
>        In that case the unmovable pageblock isn't converted to movable, but we can try again next time.
>        It's not critical.
> 
> 2) Any comment?
> 
> I can't inline the code right now, so sorry, but I'm attaching the patch.
> It's not a formal patch and has never been tested.
> 
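
To make the page walk discussed in 2.1-2.3 easier to follow, here is a minimal userspace sketch of it. It is only a model, not kernel code: struct model_page, the MODEL_* states, MODEL_PAGEBLOCK_NR_PAGES and both function names are hypothetical stand-ins, and the locking is left out on purpose. It shows the pageblock start computed by masking the pfn, the skip over the tail of a free buddy chunk, and the bail-out on the first page that is neither free nor on the LRU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page states the scan distinguishes. */
enum model_state {
	MODEL_BUDDY,		/* head of a free buddy chunk */
	MODEL_REFCOUNT0,	/* page_count() == 0 */
	MODEL_LRU,		/* page on an LRU list, so migratable */
	MODEL_PINNED,		/* anything else, blocks the conversion */
};

struct model_page {
	enum model_state state;
	unsigned int order;	/* only meaningful for MODEL_BUDDY */
};

/* Assumption: a tiny power-of-two pageblock, just for illustration. */
#define MODEL_PAGEBLOCK_NR_PAGES 8UL

/* Round a pfn down to the first pfn of its pageblock. */
static unsigned long model_pageblock_start(unsigned long pfn)
{
	return pfn & ~(MODEL_PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Return true if every page in the pageblock containing pfn is a free
 * buddy page, has a zero refcount, or is on the LRU.
 */
static bool model_can_rescue(const struct model_page *pages, unsigned long pfn)
{
	unsigned long start = model_pageblock_start(pfn);
	unsigned long end = start + MODEL_PAGEBLOCK_NR_PAGES;
	unsigned long cur;

	for (cur = start; cur < end; cur++) {
		const struct model_page *p = &pages[cur];

		switch (p->state) {
		case MODEL_BUDDY:
			/* Skip the tail pages covered by this free chunk. */
			assert(p->order < 8 * sizeof(unsigned long));
			cur += (1UL << p->order) - 1;
			break;
		case MODEL_REFCOUNT0:
		case MODEL_LRU:
			break;
		default:
			return false;
		}
	}
	return true;
}
```

In this model a caller passes an array covering the whole pageblock and any pfn inside it; a false result corresponds to leaving the pageblock as MIGRATE_UNMOVABLE and trying again on a later pass.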
> 


The attached patch had a bug in can_rescue_unmovable_pageblock().
Resending. I hope it is fixed now.
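
For anyone reading the attachment, the per-mode decision made by suitable_migration_target() and the retry condition in try_to_compact_pages() can be modelled roughly as below. This is a userspace sketch under stated assumptions: all model_* names are hypothetical, the large-free-page shortcut is omitted, MIGRATE_ISOLATE and MIGRATE_RESERVE are folded into one MODEL_T_RESERVED type, and the "rescuable" flag stands in for the result of can_rescue_unmovable_pageblock().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of enum compact_mode and enum smt_result. */
enum model_mode { MODEL_ASYNC_MOVABLE, MODEL_ASYNC_UNMOVABLE, MODEL_SYNC };
enum model_type { MODEL_T_MOVABLE, MODEL_T_UNMOVABLE, MODEL_T_RESERVED };
enum model_result { MODEL_MIGRATE, MODEL_RESCUE, MODEL_UNMOVABLE, MODEL_SKIP };

/* Classify one destination pageblock for the given compaction mode. */
static enum model_result model_classify(enum model_mode mode,
					enum model_type type, bool rescuable)
{
	/* Guard against values outside the three modelled modes. */
	assert(mode == MODEL_ASYNC_MOVABLE || mode == MODEL_ASYNC_UNMOVABLE ||
	       mode == MODEL_SYNC);

	/* Never touch isolated or reserved blocks. */
	if (type == MODEL_T_RESERVED)
		return MODEL_SKIP;

	/* Movable blocks are targets, except in the unmovable-only pass. */
	if (mode != MODEL_ASYNC_UNMOVABLE && type == MODEL_T_MOVABLE)
		return MODEL_MIGRATE;

	/* First async pass: only count unmovable blocks, do not scan them. */
	if (mode == MODEL_ASYNC_MOVABLE && type == MODEL_T_UNMOVABLE)
		return MODEL_UNMOVABLE;

	/* Other modes: use an unmovable block only if it can be converted. */
	if (mode != MODEL_ASYNC_MOVABLE && type == MODEL_T_UNMOVABLE && rescuable)
		return MODEL_RESCUE;

	return MODEL_SKIP;
}

/*
 * A second pass in the unmovable mode is attempted only after a complete
 * async-movable pass that counted at least one unmovable pageblock.
 */
static bool model_should_retry(enum model_mode mode, bool pass_completed,
			       unsigned long nr_unmovable_skipped)
{
	return mode == MODEL_ASYNC_MOVABLE && pass_completed &&
	       nr_unmovable_skipped > 0;
}
```

The point of the split is that the first asynchronous pass stays cheap, since it only counts MIGRATE_UNMOVABLE blocks, and the more expensive per-page scan runs only in the second pass or in sync mode.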

 



-- 
Kind regards,
Minchan Kim

[-- Attachment #2: 1.patch --]
[-- Type: text/x-patch, Size: 13434 bytes --]

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 51a90b7..e988037 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,23 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * compaction supports three modes
+ *
+ * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
+ * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources.
+ *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
+ *    targets and converted to MIGRATE_MOVABLE if possible
+ * COMPACT_SYNC uses synchronous migration and scans all pageblocks
+ */
+enum compact_mode {
+	COMPACT_ASYNC_MOVABLE,
+	COMPACT_ASYNC_UNMOVABLE,
+	COMPACT_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..dd02f25 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACT_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		lruvec = mem_cgroup_page_lruvec(page, zone);
@@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+/*
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
+ * converted to MIGRATE_MOVABLE type, false otherwise.
+ */
+static bool can_rescue_unmovable_pageblock(struct page *page, bool need_lrulock)
+{
+	struct zone *zone;
+	unsigned long pfn, start_pfn, end_pfn;
+	struct page *start_page, *end_page, *cursor_page;
+	bool lru_locked = false;
+
+	zone = page_zone(page);
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
+
+	start_page = pfn_to_page(start_pfn);
+	end_page = pfn_to_page(end_pfn);
+
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			goto out;
+
+		if (PageBuddy(cursor_page)) {
+			unsigned long order = page_order(cursor_page);
+
+			pfn += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+			continue;
+		} else if (page_count(cursor_page) == 0) {
+			continue;
+		} else if (PageLRU(cursor_page)) {
+			if (!need_lrulock)
+				continue;
+			else if (lru_locked)
+				continue;
+			else {
+				spin_lock(&zone->lru_lock);
+				lru_locked = true;
+				if (PageLRU(cursor_page))
+					continue;
+			}
+		}
+
+		goto out;
+	}
+
+	return true;
+out:
+	if (lru_locked)
+		spin_unlock(&zone->lru_lock);
+
+	return false;
+}
+
+static void rescue_unmovable_pageblock(struct page *page)
+{
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+}
+
+/*
+ * MIGRATE_TARGET : good for migration target
+ * RESCUE_UNMOVABLE_TARGET : good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET : can't migrate for other reasons.
+ */
+enum smt_result {
+	MIGRATE_TARGET,
+	RESCUE_UNMOVABLE_TARGET,
+	UNMOVABLE_TARGET,
+	SKIP_TARGET,
+};
+
+/*
+ * Returns MIGRATE_TARGET if the page is within a block
+ * suitable for migration to, UNMOVABLE_TARGET if the page
+ * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
+ */
+static enum smt_result suitable_migration_target(struct page *page,
+			      struct compact_control *cc, bool need_lrulock)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
+		return SKIP_TARGET;
 
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
+		return MIGRATE_TARGET;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
-		return true;
+	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
+	    migrate_async_suitable(migratetype))
+		return MIGRATE_TARGET;
+
+	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE)
+		return UNMOVABLE_TARGET;
+
+	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    can_rescue_unmovable_pageblock(page, need_lrulock))
+		return RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
-	return false;
+	return SKIP_TARGET;
 }
 
 /*
@@ -414,6 +509,13 @@ static void isolate_freepages(struct zone *zone,
 	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
 	/*
+	 * isolate_freepages() may be called more than once during
+	 * compact_zone_order() run and we want only the most recent
+	 * count.
+	 */
+	cc->nr_unmovable_pageblock = 0;
+
+	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -421,6 +523,7 @@ static void isolate_freepages(struct zone *zone,
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
+		enum smt_result ret;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -437,9 +540,12 @@ static void isolate_freepages(struct zone *zone,
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
+			if (ret == UNMOVABLE_TARGET)
+				cc->nr_unmovable_pageblock++;
 			continue;
-
+		}
 		/*
 		 * Found a block suitable for isolating free pages from. Now
 		 * we disabled interrupts, double check things are ok and
@@ -448,12 +554,16 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
+			if (ret == RESCUE_UNMOVABLE_TARGET)
+				rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
-		}
+		} else if (ret == UNMOVABLE_TARGET)
+			cc->nr_unmovable_pageblock++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
@@ -685,8 +795,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)&cc->freepages, false,
+			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -715,7 +826,8 @@ out:
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compact_mode mode,
+				 unsigned long *nr_pageblocks_skipped)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -723,12 +835,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
+	unsigned long rc;
+
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	rc = compact_zone(zone, &cc);
+	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
+
+	return rc;
 }
 
 int sysctl_extfrag_threshold = 500;
@@ -753,6 +870,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
+	unsigned long nr_pageblocks_skipped;
+	enum compact_mode mode;
 
 	/*
 	 * Check whether it is worth even starting compaction. The order check is
@@ -769,12 +888,22 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
+retry:
+		status = compact_zone_order(zone, order, gfp_mask, mode,
+						&nr_pageblocks_skipped);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
+
+		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
+			if (nr_pageblocks_skipped) {
+				mode = COMPACT_ASYNC_UNMOVABLE;
+				goto retry;
+			}
+		}
 	}
 
 	return rc;
@@ -808,7 +937,7 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACT_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -823,7 +952,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACT_ASYNC_MOVABLE,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -833,7 +962,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index 2ba87fb..061fde7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -94,6 +94,9 @@ extern void putback_lru_page(struct page *page);
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct page *page);
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +123,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_unmovable_pageblock;
 };
 
 unsigned long
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 476ae3e..d40e4c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
@@ -954,8 +954,8 @@ static int move_freepages(struct zone *zone,
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5651,7 +5651,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-05  2:38       ` Minchan Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-05  2:38 UTC (permalink / raw)
  Cc: KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

[-- Attachment #1: Type: text/plain, Size: 4596 bytes --]

On 06/05/2012 10:59 AM, Minchan Kim wrote:

> On 06/05/2012 05:22 AM, KOSAKI Motohiro wrote:
> 
>>> +/*
>>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>>> + */
>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
>>> +{
>>> +    unsigned long pfn, start_pfn, end_pfn;
>>> +    struct page *start_page, *end_page, *cursor_page;
>>> +
>>> +    pfn = page_to_pfn(page);
>>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>> +
>>> +    start_page = pfn_to_page(start_pfn);
>>> +    end_page = pfn_to_page(end_pfn);
>>> +
>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
>>> +        pfn++, cursor_page++) {
>>> +        struct zone *zone = page_zone(start_page);
>>> +        unsigned long flags;
>>> +
>>> +        if (!pfn_valid_within(pfn))
>>> +            continue;
>>> +
>>> +        /* Do not deal with pageblocks that overlap zones */
>>> +        if (page_zone(cursor_page) != zone)
>>> +            return false;
>>> +
>>> +        if (!locked)
>>> +            spin_lock_irqsave(&zone->lock, flags);
>>> +
>>> +        if (PageBuddy(cursor_page)) {
>>> +            int order = page_order(cursor_page);
>>>
>>> -/* Returns true if the page is within a block suitable for migration to */
>>> -static bool suitable_migration_target(struct page *page)
>>> +            pfn += (1 << order) - 1;
>>> +            cursor_page += (1 << order) - 1;
>>> +
>>> +            if (!locked)
>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>> +            continue;
>>> +        } else if (page_count(cursor_page) == 0 ||
>>> +               PageLRU(cursor_page)) {
>>> +            if (!locked)
>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>> +            continue;
>>> +        }
>>> +
>>> +        if (!locked)
>>> +            spin_unlock_irqrestore(&zone->lock, flags);
>>> +
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>
>> Minchan, are you interested in this patch? If yes, can you please rewrite it?
> 
> 
> I can do it, but I want to give credit to Bartlomiej.
> Bartlomiej, if you like my patch, could you resend it as a formal patch after you do broad testing?
> 
> 
>> This one does
>> not fix the issue we pointed out, and can_rescue_unmovable_pageblock() still
>> has plenty of bugs.
>> We can't ack it.
>>
>> -- 
> 
> 
> Frankly speaking, I don't want to merge it without any data proving it's really good in real practice.
> 
> When the patch was first submitted it wasn't complicated, so I was okay with it at that time, but it has become more
> complicated than I expected. So if Andrew passes the decision to me, it's a total NACK from me unless the author provides
> some real data or VOC from some client.
> 
> 1) Any comment?
> 
> Anyway, I fixed some bugs and cleaned up a few things I found during review.
> 
> Minor things.
> 1. Change the smt_result naming - I never like such long, inconsistent naming. How about this?
> 2. Fix can_rescue_unmovable_pageblock()
>    2.1 pfn valid check for page_zone
> 
> Major things.
> 
>    2.2 Add lru_lock for stabilizing PageLRU
>        If we don't hold lru_lock, there is a possibility that an unmovable (non-LRU) page ends up in a movable pageblock.
>        That can cause a regression for compaction/CMA. But there is a concern about deadlock between lru_lock and zone->lock.
>        Looking at the code, I can't find any allocation attempt made while holding lru_lock, so it might be safe (but I'm not sure;
>        I didn't test it, and it needs more careful review/testing), but it creates a new locking dependency (not sure about that
>        either; maybe we already had such a rule and I just didn't know it until now ;-) ). The reason I thought so is that
>        logically we could allocate with GFP_ATOMIC while holding lru_lock, which might be a crazy idea.
> 
>    2.3 Remove zone->lock in the first phase.
>        We rescue an unmovable pageblock in two phases. In the first phase we just peek at the pages, so we don't need locking.
>        If we see an unstable value, it will be caught in the second phase, which takes the needed locks, or
>        can_rescue_unmovable_pageblock() can bail out of the loop because of a stale page_order(cursor_page).
>        In that case the unmovable pageblock isn't converted to movable, but we can try again next time.
>        It's not critical.
> 
> 2) Any comment?
> 
> I can't inline the code right now, so sorry, but I'm attaching the patch.
> It's not a formal patch and has never been tested.
> 
> 


The attached patch had a bug in can_rescue_unmovable_pageblock().
Resending. I hope it is fixed now.

 



-- 
Kind regards,
Minchan Kim

[-- Attachment #2: 1.patch --]
[-- Type: text/x-patch, Size: 13434 bytes --]

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 51a90b7..e988037 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,23 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * compaction supports three modes
+ *
+ * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
+ * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources.
+ *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
+ *    targets and converts them to MIGRATE_MOVABLE if possible
+ * COMPACT_SYNC uses synchronous migration and scans all pageblocks
+ */
+enum compact_mode {
+	COMPACT_ASYNC_MOVABLE,
+	COMPACT_ASYNC_UNMOVABLE,
+	COMPACT_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..dd02f25 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACT_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		lruvec = mem_cgroup_page_lruvec(page, zone);
@@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+/*
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
+ * converted to MIGRATE_MOVABLE type, false otherwise.
+ */
+static bool can_rescue_unmovable_pageblock(struct page *page, bool need_lrulock)
+{
+	struct zone *zone;
+	unsigned long pfn, start_pfn, end_pfn;
+	struct page *start_page, *end_page, *cursor_page;
+	bool lru_locked = false;
+
+	zone = page_zone(page);
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
+
+	start_page = pfn_to_page(start_pfn);
+	end_page = pfn_to_page(end_pfn);
+
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			goto out;
+
+		if (PageBuddy(cursor_page)) {
+			unsigned long order = page_order(cursor_page);
+
+			pfn += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+			continue;
+		} else if (page_count(cursor_page) == 0) {
+			continue;
+		} else if (PageLRU(cursor_page)) {
+			if (!need_lrulock)
+				continue;
+			else if (lru_locked)
+				continue;
+			else {
+				spin_lock(&zone->lru_lock);
+				lru_locked = true;
+				if (PageLRU(page))
+					continue;
+			}
+		}
+
+		goto out;
+	}
+
+	return true;
+out:
+	if (lru_locked)
+		spin_unlock(&zone->lru_lock);
+
+	return false;
+}
+
+static void rescue_unmovable_pageblock(struct page *page)
+{
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+}
+
+/*
+ * MIGRATE_TARGET : good for migration target
+ * RESCUE_UNMOVABLE_TARGET : good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET : can't migrate for other reasons.
+ */
+enum smt_result {
+	MIGRATE_TARGET,
+	RESCUE_UNMOVABLE_TARGET,
+	UNMOVABLE_TARGET,
+	SKIP_TARGET,
+};
+
+/*
+ * Returns MIGRATE_TARGET if the page is within a block
+ * suitable for migration to, UNMOVABLE_TARGET if the page
+ * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
+ */
+static enum smt_result suitable_migration_target(struct page *page,
+			      struct compact_control *cc, bool need_lrulock)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
+		return SKIP_TARGET;
 
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
+		return MIGRATE_TARGET;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
-		return true;
+	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
+	    migrate_async_suitable(migratetype))
+		return MIGRATE_TARGET;
+
+	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE)
+		return UNMOVABLE_TARGET;
+
+	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    can_rescue_unmovable_pageblock(page, need_lrulock))
+		return RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
-	return false;
+	return SKIP_TARGET;
 }
 
 /*
@@ -414,6 +509,13 @@ static void isolate_freepages(struct zone *zone,
 	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
 	/*
+	 * isolate_freepages() may be called more than once during
+	 * compact_zone_order() run and we want only the most recent
+	 * count.
+	 */
+	cc->nr_unmovable_pageblock = 0;
+
+	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -421,6 +523,7 @@ static void isolate_freepages(struct zone *zone,
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
+		enum smt_result ret;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -437,9 +540,12 @@ static void isolate_freepages(struct zone *zone,
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
+			if (ret == UNMOVABLE_TARGET)
+				cc->nr_unmovable_pageblock++;
 			continue;
-
+		}
 		/*
 		 * Found a block suitable for isolating free pages from. Now
 		 * we disabled interrupts, double check things are ok and
@@ -448,12 +554,16 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
+			if (ret == RESCUE_UNMOVABLE_TARGET)
+				rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
-		}
+		} else if (ret == UNMOVABLE_TARGET)
+			cc->nr_unmovable_pageblock++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
@@ -685,8 +795,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)&cc->freepages, false,
+			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -715,7 +826,8 @@ out:
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compact_mode mode,
+				 unsigned long *nr_pageblocks_skipped)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -723,12 +835,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
+	unsigned long rc;
+
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	rc = compact_zone(zone, &cc);
+	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
+
+	return rc;
 }
 
 int sysctl_extfrag_threshold = 500;
@@ -753,6 +870,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
+	unsigned long nr_pageblocks_skipped;
+	enum compact_mode mode;
 
 	/*
 	 * Check whether it is worth even starting compaction. The order check is
@@ -769,12 +888,22 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
+retry:
+		status = compact_zone_order(zone, order, gfp_mask, mode,
+						&nr_pageblocks_skipped);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
+
+		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
+			if (nr_pageblocks_skipped) {
+				mode = COMPACT_ASYNC_UNMOVABLE;
+				goto retry;
+			}
+		}
 	}
 
 	return rc;
@@ -808,7 +937,7 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACT_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -823,7 +952,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACT_ASYNC_MOVABLE,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -833,7 +962,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index 2ba87fb..061fde7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -94,6 +94,9 @@ extern void putback_lru_page(struct page *page);
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct page *page);
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +123,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_unmovable_pageblock;
 };
 
 unsigned long
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 476ae3e..d40e4c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
@@ -954,8 +954,8 @@ static int move_freepages(struct zone *zone,
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5651,7 +5651,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-05  2:38       ` Minchan Kim
@ 2012-06-05  4:35         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2012-06-05  4:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

>>> Minchan, are you interested in this patch? If yes, can you please rewrite it?
>>
>> I can do it, but I want to give credit to Bartlomiej.
>> Bartlomiej, if you like my patch, could you resend it as a formal patch after you do broad testing?
>>
>> Frankly speaking, I don't want to merge it without any data proving it's really good in real-world practice.
>>
>> When the patch was first submitted it wasn't complicated, so I was okay with it at that time, but it has become more complicated
>> than I expected. So if Andrew passes the decision to me, it's a total NACK from me unless the author provides
>> some real data or VOC from some client.

I agree. And you don't need to bother with this patch if you are not interested in it. I'm sorry.
Let's drop it until the author sends us data.


>> 1) Any comment?
>>
>> Anyway, I fixed some bugs and cleaned up a few things I found during review.
>>
>> Minor things.
>> 1. Change the smt_result naming - I never liked such long, inconsistent names. How about this?
>> 2. Fix can_rescue_unmovable_pageblock
>>     2.1 Add a pfn validity check before calling page_zone
>>
>> Major things.
>>
>>     2.2 Add lru_lock to stabilize PageLRU
>>         If we don't hold lru_lock, an unmovable (non-LRU) page can end up in a movable pageblock.
>>         That can cause a regression for compaction/CMA. But there is a concern about deadlock between lru_lock and zone->lock.
>>         Looking at the code, I can't find any allocation attempt made while holding lru_lock, so it might be safe (but I'm not sure;
>>         I didn't test it, and it needs more careful review/testing). It does create a new locking dependency (not sure about that either;
>>         maybe we already have such a rule and I didn't know it until now ;-) ). The reason for my concern is that, logically, we could
>>         allocate with GFP_ATOMIC while holding lru_lock, crazy as that idea might be.
>>
>>     2.3 Remove zone->lock in the first phase.
>>         We rescue an unmovable pageblock in two phases. In the first phase we just peek at the pages, so we don't need locking.
>>         If we see an unstable value, it will be caught by the second phase, which takes the needed lock, or
>>         can_rescue_unmovable_pageblock can bail out of the loop because of a stale page_order(cursor_page).
>>         In that case the unmovable pageblock isn't converted to movable, but we can try again next time.
>>         It's not critical.
>>
>> 2) Any comment?
>>
>> I can't inline the code right now, so sorry, but I'm attaching the patch.
>> It's not a formal patch and has never been tested.
>>
>
>
> The previously attached patch had a bug in can_rescue_unmovable_pageblock.
> Resending. I hope it is fixed now.
>
>
>
>
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 51a90b7..e988037 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -1,6 +1,8 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>
> +#include <linux/node.h>
> +
>  /* Return values for compact_zone() and try_to_compact_pages() */
>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>  #define COMPACT_SKIPPED		0
> @@ -11,6 +13,23 @@
>  /* The full zone was compacted */
>  #define COMPACT_COMPLETE	3
>
> +/*
> + * compaction supports three modes
> + *
> + * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
> + *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
> + * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
> + *    MIGRATE_MOVABLE pageblocks as migration sources.
> + *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
> + *    targets and converts them to MIGRATE_MOVABLE if possible
> + * COMPACT_SYNC uses synchronous migration and scans all pageblocks
> + */
> +enum compact_mode {
> +	COMPACT_ASYNC_MOVABLE,
> +	COMPACT_ASYNC_UNMOVABLE,
> +	COMPACT_SYNC,
> +};
> +
>  #ifdef CONFIG_COMPACTION
>  extern int sysctl_compact_memory;
>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7ea259d..dd02f25 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  	 */
>  	while (unlikely(too_many_isolated(zone))) {
>  		/* async migration should just abort */
> -		if (!cc->sync)
> +		if (cc->mode != COMPACT_SYNC)
>  			return 0;
>
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
> @@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  		 * satisfies the allocation
>  		 */
>  		pageblock_nr = low_pfn >> pageblock_order;
> -		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> +		if (cc->mode != COMPACT_SYNC &&
> +		    last_pageblock_nr != pageblock_nr &&
>  		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
>  			low_pfn += pageblock_nr_pages;
>  			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
> @@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  			continue;
>  		}
>
> -		if (!cc->sync)
> +		if (cc->mode != COMPACT_SYNC)
>  			mode |= ISOLATE_ASYNC_MIGRATE;
>
>  		lruvec = mem_cgroup_page_lruvec(page, zone);
> @@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>
>  #endif /* CONFIG_COMPACTION || CONFIG_CMA */
>  #ifdef CONFIG_COMPACTION
> +/*
> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
> + * converted to MIGRATE_MOVABLE type, false otherwise.
> + */
> +static bool can_rescue_unmovable_pageblock(struct page *page, bool need_lrulock)
> +{
> +	struct zone *zone;
> +	unsigned long pfn, start_pfn, end_pfn;
> +	struct page *start_page, *end_page, *cursor_page;
> +	bool lru_locked = false;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> +	start_page = pfn_to_page(start_pfn);
> +	end_page = pfn_to_page(end_pfn);
> +
> +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
> +		pfn++, cursor_page++) {
>
> -/* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +		if (!pfn_valid_within(pfn))
> +			continue;
> +
> +		/* Do not deal with pageblocks that overlap zones */
> +		if (page_zone(cursor_page) != zone)
> +			goto out;
> +
> +		if (PageBuddy(cursor_page)) {
> +			unsigned long order = page_order(cursor_page);
> +
> +			pfn += (1 << order) - 1;
> +			cursor_page += (1 << order) - 1;
> +			continue;
> +		} else if (page_count(cursor_page) == 0) {
> +			continue;

Can we assume that a freed tail page always has page_count() == 0? If yes, why do we
need the dangerous PageBuddy(cursor_page) check? OK, that case may be harmless.

But if not, this code is seriously dangerous. Consider the following scenario:

1) the cursor points to a free page

     +----------------+------------------+
     | free (order-1) |  used (order-1)  |
     +----------------+------------------+
     |
    cursor

2) the cursor moves

     +----------------+------------------+
     | free (order-1) |  used (order-1)  |
     +----------------+------------------+
                      |
                      cursor

3) the neighboring block is freed


     +----------------+------------------+
     | free (order-2)                    |
     +----------------+------------------+
                      |
                      cursor

Now the cursor points to the middle of a free block.


Anyway, I recommend avoiding the dangerous game of running without zone->lock, and
changing can_rescue_unmovable_pageblock() so that it is only called with zone->lock held.
I don't see that it is worth adding this much complexity for a mere minor optimization.
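
To make the scenario above concrete, here is a small user-space model of the unlocked scan. It is a sketch under stated assumptions, not kernel code: `toy_page`, `free_and_merge()` and `scan()` are hypothetical names, the block has only four pages, and the concurrent free is injected deterministically at a chosen index instead of happening on another CPU. The `trust_tail_count` flag stands for the open question of whether a freed tail page can be relied on to have page_count() == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct toy_page {
	bool buddy;		/* head of a free buddy block */
	unsigned int order;	/* only meaningful when buddy is set */
	int count;		/* reference count, 0 when free */
};

/* Free pages 2..3 and merge them with free block 0..1 into one order-2 block. */
static void free_and_merge(struct toy_page *p)
{
	p[2].count = 0;
	p[3].count = 0;
	p[0].buddy = true;
	p[0].order = 2;
	p[2].buddy = false;	/* pages 1..3 are now tail pages */
}

/*
 * Walk the block the way the unlocked scan does. If merge_at equals the
 * current index, the concurrent free is injected just before that page is
 * examined. Returns true if every page looked free.
 */
static bool scan(struct toy_page *p, size_t merge_at, bool trust_tail_count)
{
	size_t i;

	for (i = 0; i < NPAGES; i++) {
		if (i == merge_at)
			free_and_merge(p);

		if (p[i].buddy) {
			assert(p[i].order <= 2);
			/* skip the rest of the free block */
			i += ((size_t)1 << p[i].order) - 1;
			continue;
		}
		if (trust_tail_count && p[i].count == 0)
			continue;

		return false;
	}
	return true;
}
```

Starting from a free order-1 block followed by two used pages and injecting the merge at index 2 reproduces step 3: the cursor sits on a tail page in the middle of the new order-2 block. In this model the scan then succeeds only if the tail page's count can be trusted; otherwise it reports failure even though the whole block is free. The model does not cover reading page_order() from a page that is no longer a buddy head, which would be a further hazard of scanning without zone->lock.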


> +		} else if (PageLRU(cursor_page)) {
> +			if (!need_lrulock)
> +				continue;
> +			else if (lru_locked)
> +				continue;
> +			else {
> +				spin_lock(&zone->lru_lock);

Hmm...
I don't like taking lru_lock. 1) Until now, we have carefully avoided taking
both zone->lock and zone->lru_lock together. They are both performance-critical
locks. 2) I think the pageblock migratetype doesn't need to be strictly correct. It
is only an anti-fragmentation optimization. Why do we need to take it?



> +				lru_locked = true;
> +				if (PageLRU(page))
> +					continue;
> +			}
> +		}
> +
> +		goto out;
> +	}
> +

Why don't we need to release lru_lock when returning true?
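
One common way to avoid this kind of leak is to route every return through a single exit that drops the lock if it was taken. The following user-space sketch shows that shape only. It is not a proposed replacement for the function in the patch: `toy_lock`, `toy_state` and `all_free_or_lru()` are hypothetical names, and the toy lock just tracks ownership in place of zone->lru_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock standing in for zone->lru_lock; it only tracks ownership. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical per-page state for the model. */
enum toy_state { TOY_FREE, TOY_LRU, TOY_PINNED };

/*
 * Returns true if every page is either free or on the LRU. The lock is
 * taken lazily on the first LRU page, and both the success and the
 * failure path fall through to the same unlock, so it is dropped
 * exactly once whenever it was taken.
 */
static bool all_free_or_lru(const enum toy_state *state, size_t n,
			    struct toy_lock *lru_lock)
{
	bool locked = false;
	bool ret = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (state[i] == TOY_FREE)
			continue;

		if (state[i] == TOY_LRU) {
			if (!locked) {
				toy_lock_acquire(lru_lock);
				locked = true;
			}
			/* a real scan would re-read the page state here */
			continue;
		}

		ret = false;	/* pinned page: the block cannot be rescued */
		break;
	}

	if (locked)
		toy_lock_release(lru_lock);

	return ret;
}
```

With this structure the caller never has to know whether the lock was taken, at the cost of not holding it across the later migratetype change. Whether that is acceptable depends on the locking question raised above.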


> +	return true;
> +out:
> +	if (lru_locked)
> +		spin_unlock(&zone->lru_lock);
> +
> +	return false;
> +}
> +
> +static void rescue_unmovable_pageblock(struct page *page)
> +{
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +}
> +
> +/*
> + * MIGRATE_TARGET : good for migration target
> + * RESCUE_UNMOVABLE_TARGET : good only if we can rescue the unmovable pageblock.
> + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> + * SKIP_TARGET : can't migrate for other reasons.
> + */
> +enum smt_result {
> +	MIGRATE_TARGET,
> +	RESCUE_UNMOVABLE_TARGET,
> +	UNMOVABLE_TARGET,
> +	SKIP_TARGET,
> +};
> +
> +/*
> + * Returns MIGRATE_TARGET if the page is within a block
> + * suitable for migration to, UNMOVABLE_TARGET if the page
> + * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
> + */
> +static enum smt_result suitable_migration_target(struct page *page,
> +			      struct compact_control *cc, bool need_lrulock)
>  {
>
>  	int migratetype = get_pageblock_migratetype(page);
>
>  	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
> -		return false;
> +		return SKIP_TARGET;
>
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> -		return true;
> +		return MIGRATE_TARGET;
>
>  	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> -	if (migrate_async_suitable(migratetype))
> -		return true;
> +	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
> +	    migrate_async_suitable(migratetype))
> +		return MIGRATE_TARGET;
> +
> +	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
> +	    migratetype == MIGRATE_UNMOVABLE)
> +		return UNMOVABLE_TARGET;
> +
> +	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
> +	    migratetype == MIGRATE_UNMOVABLE &&
> +	    can_rescue_unmovable_pageblock(page, need_lrulock))
> +		return RESCUE_UNMOVABLE_TARGET;
>
>  	/* Otherwise skip the block */
> -	return false;
> +	return SKIP_TARGET;
>  }
>
>  /*
> @@ -414,6 +509,13 @@ static void isolate_freepages(struct zone *zone,
>  	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>
>  	/*
> +	 * isolate_freepages() may be called more than once during
> +	 * compact_zone_order() run and we want only the most recent
> +	 * count.
> +	 */
> +	cc->nr_unmovable_pageblock = 0;
> +
> +	/*
>  	 * Isolate free pages until enough are available to migrate the
>  	 * pages on cc->migratepages. We stop searching if the migrate
>  	 * and free page scanners meet or enough free pages are isolated.
> @@ -421,6 +523,7 @@ static void isolate_freepages(struct zone *zone,
>  	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
> +		enum smt_result ret;
>
>  		if (!pfn_valid(pfn))
>  			continue;
> @@ -437,9 +540,12 @@ static void isolate_freepages(struct zone *zone,
>  			continue;
>
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> +		ret = suitable_migration_target(page, cc, false);
> +		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
> +			if (ret == UNMOVABLE_TARGET)
> +				cc->nr_unmovable_pageblock++;
>  			continue;
> -
> +		}
>  		/*
>  		 * Found a block suitable for isolating free pages from. Now
>  		 * we disabled interrupts, double check things are ok and
> @@ -448,12 +554,16 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		isolated = 0;
>  		spin_lock_irqsave(&zone->lock, flags);
> -		if (suitable_migration_target(page)) {
> +		ret = suitable_migration_target(page, cc, true);
> +		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
> +			if (ret == RESCUE_UNMOVABLE_TARGET)
> +				rescue_unmovable_pageblock(page);
>  			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
>  			isolated = isolate_freepages_block(pfn, end_pfn,
>  							   freelist, false);
>  			nr_freepages += isolated;
> -		}
> +		} else if (ret == UNMOVABLE_TARGET)
> +			cc->nr_unmovable_pageblock++;
>  		spin_unlock_irqrestore(&zone->lock, flags);
>
>  		/*
> @@ -685,8 +795,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>
>  		nr_migrate = cc->nr_migratepages;
>  		err = migrate_pages(&cc->migratepages, compaction_alloc,
> -				(unsigned long)cc, false,
> -				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
> +			(unsigned long)&cc->freepages, false,
> +			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
> +						      : MIGRATE_ASYNC);
>  		update_nr_listpages(cc);
>  		nr_remaining = cc->nr_migratepages;
>
> @@ -715,7 +826,8 @@ out:
>
>  static unsigned long compact_zone_order(struct zone *zone,
>  				 int order, gfp_t gfp_mask,
> -				 bool sync)
> +				 enum compact_mode mode,
> +				 unsigned long *nr_pageblocks_skipped)
>  {
>  	struct compact_control cc = {
>  		.nr_freepages = 0,
> @@ -723,12 +835,17 @@ static unsigned long compact_zone_order(struct zone *zone,
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
> -		.sync = sync,
> +		.mode = mode,
>  	};
> +	unsigned long rc;
> +
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
>
> -	return compact_zone(zone, &cc);
> +	rc = compact_zone(zone, &cc);
> +	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
> +
> +	return rc;
>  }
>
>  int sysctl_extfrag_threshold = 500;
> @@ -753,6 +870,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  	struct zoneref *z;
>  	struct zone *zone;
>  	int rc = COMPACT_SKIPPED;
> +	unsigned long nr_pageblocks_skipped;
> +	enum compact_mode mode;
>
>  	/*
>  	 * Check whether it is worth even starting compaction. The order check is
> @@ -769,12 +888,22 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  								nodemask) {
>  		int status;
>
> -		status = compact_zone_order(zone, order, gfp_mask, sync);
> +		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
> +retry:
> +		status = compact_zone_order(zone, order, gfp_mask, mode,
> +						&nr_pageblocks_skipped);
>  		rc = max(status, rc);
>
>  		/* If a normal allocation would succeed, stop compacting */
>  		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
>  			break;
> +
> +		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
> +			if (nr_pageblocks_skipped) {
> +				mode = COMPACT_ASYNC_UNMOVABLE;
> +				goto retry;
> +			}
> +		}
>  	}
>
>  	return rc;
> @@ -808,7 +937,7 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
>  			if (ok && cc->order > zone->compact_order_failed)
>  				zone->compact_order_failed = cc->order + 1;
>  			/* Currently async compaction is never deferred. */
> -			else if (!ok && cc->sync)
> +			else if (!ok && cc->mode == COMPACT_SYNC)
>  				defer_compaction(zone, cc->order);
>  		}
>
> @@ -823,7 +952,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
>  {
>  	struct compact_control cc = {
>  		.order = order,
> -		.sync = false,
> +		.mode = COMPACT_ASYNC_MOVABLE,
>  	};
>
>  	return __compact_pgdat(pgdat, &cc);
> @@ -833,7 +962,7 @@ static int compact_node(int nid)
>  {
>  	struct compact_control cc = {
>  		.order = -1,
> -		.sync = true,
> +		.mode = COMPACT_SYNC,
>  	};
>
>  	return __compact_pgdat(NODE_DATA(nid), &cc);
> diff --git a/mm/internal.h b/mm/internal.h
> index 2ba87fb..061fde7 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -94,6 +94,9 @@ extern void putback_lru_page(struct page *page);
>  /*
>   * in mm/page_alloc.c
>   */
> +extern void set_pageblock_migratetype(struct page *page, int migratetype);
> +extern int move_freepages_block(struct zone *zone, struct page *page,
> +				int migratetype);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> @@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct page *page);
>  #endif
>
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +#include <linux/compaction.h>
>
>  /*
>   * in mm/compaction.c
> @@ -119,11 +123,14 @@ struct compact_control {
>  	unsigned long nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
> -	bool sync;			/* Synchronous migration */
> +	enum compact_mode mode;		/* Compaction mode */
>
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
>  	struct zone *zone;
> +
> +	/* Number of UNMOVABLE destination pageblocks skipped during scan */
> +	unsigned long nr_unmovable_pageblock;
>  };
>
>  unsigned long
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 476ae3e..d40e4c7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
>
>  int page_group_by_mobility_disabled __read_mostly;
>
> -static void set_pageblock_migratetype(struct page *page, int migratetype)
> +void set_pageblock_migratetype(struct page *page, int migratetype)
>  {
>
>  	if (unlikely(page_group_by_mobility_disabled))
> @@ -954,8 +954,8 @@ static int move_freepages(struct zone *zone,
>  	return pages_moved;
>  }
>
> -static int move_freepages_block(struct zone *zone, struct page *page,
> -				int migratetype)
> +int move_freepages_block(struct zone *zone, struct page *page,
> +			 int migratetype)
>  {
>  	unsigned long start_pfn, end_pfn;
>  	struct page *start_page, *end_page;
> @@ -5651,7 +5651,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  		.nr_migratepages = 0,
>  		.order = -1,
>  		.zone = page_zone(pfn_to_page(start)),
> -		.sync = true,
> +		.mode = COMPACT_SYNC,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-05  4:35         ` KOSAKI Motohiro
  0 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2012-06-05  4:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: ; KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

>>> Minchan, are you interest this patch? If yes, can you please rewrite it?
>>
>> Can do it but I want to give credit to Bartlomiej.
>> Bartlomiej, if you like my patch, could you resend it as formal patch after you do broad testing?
>>
>> Frankly speaking, I don't want to merge it without any data which prove it's really good for real practice.
>>
>> When the patch firstly was submitted, it wasn't complicated so I was okay at that time but it has been complicated
>> than my expectation. So if Andrew might pass the decision to me, I'm totally NACK if author doesn't provide
>> any real data or VOC of some client.

I agree. And you don't need to bother with this patch if you are not interested in it. I'm sorry.
Let's throw it away until the author sends us data.


>> 1) Any comment?
>>
>> Anyway, I fixed some bugs and clean up something I found during review.
>>
>> Minor thing.
>> 1. change smt_result naming - I never like such long non-consistent naming. How about this?
>> 2. fix can_rescue_unmovable_pageblock
>>     2.1 pfn valid check for page_zone
>>
>> Major thing.
>>
>>     2.2 add lru_lock for stablizing PageLRU
>>         If we don't hold lru_lock, there is possibility that unmovable(non-LRU) page can put in movable pageblock.
>>         It can make compaction/CMA's regression. But there is a concern about deadlock between lru_lock and lock.
>>         As I look the code, I can't find allocation trial with holding lru_lock so it might be safe(but not sure,
>>         I didn't test it. It need more careful review/testing) but it makes new locking dependency(not sure, too.
>>         We already made such rule but I didn't know that until now ;-) ) Why I thought so is we can allocate
>>         GFP_ATOMIC with holding lru_lock, logically which might be crazy idea.
>>
>>     2.3 remove zone->lock in first phase.
>>         We do rescue unmovable pageblock by 2-phase. In first-phase, we just peek pages so we don't need locking.
>>         If we see non-stablizing value, it would be caught by 2-phase with needed lock or
>>         can_rescue_unmovable_pageblock can return out of loop by stale page_order(cursor_page).
>>         It couldn't make unmovable pageblock to movable but we can do it next time, again.
>>         It's not critical.
>>
>> 2) Any comment?
>>
>> Now I can't inline the code so sorry but attach patch.
>> It's not a formal patch/never tested.
>>
>
>
> Attached patch has a BUG in can_rescue_unmovable_pageblock.
> Resend. I hope it is fixed.
>
>
>
>
>
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 51a90b7..e988037 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -1,6 +1,8 @@
>  #ifndef _LINUX_COMPACTION_H
>  #define _LINUX_COMPACTION_H
>
> +#include <linux/node.h>
> +
>  /* Return values for compact_zone() and try_to_compact_pages() */
>  /* compaction didn't start as it was not possible or direct reclaim was more suitable */
>  #define COMPACT_SKIPPED		0
> @@ -11,6 +13,23 @@
>  /* The full zone was compacted */
>  #define COMPACT_COMPLETE	3
>
> +/*
> + * compaction supports three modes
> + *
> + * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
> + *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
> + * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
> + *    MIGRATE_MOVABLE pageblocks as migration sources.
> + *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
> + *    targets and converted to MIGRATE_MOVABLE if possible
> + * COMPACT_SYNC uses synchronous migration and scans all pageblocks
> + */
> +enum compact_mode {
> +	COMPACT_ASYNC_MOVABLE,
> +	COMPACT_ASYNC_UNMOVABLE,
> +	COMPACT_SYNC,
> +};
> +
>  #ifdef CONFIG_COMPACTION
>  extern int sysctl_compact_memory;
>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 7ea259d..dd02f25 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  	 */
>  	while (unlikely(too_many_isolated(zone))) {
>  		/* async migration should just abort */
> -		if (!cc->sync)
> +		if (cc->mode != COMPACT_SYNC)
>  			return 0;
>
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
> @@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  		 * satisfies the allocation
>  		 */
>  		pageblock_nr = low_pfn >> pageblock_order;
> -		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
> +		if (cc->mode != COMPACT_SYNC &&
> +		    last_pageblock_nr != pageblock_nr &&
>  		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
>  			low_pfn += pageblock_nr_pages;
>  			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
> @@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  			continue;
>  		}
>
> -		if (!cc->sync)
> +		if (cc->mode != COMPACT_SYNC)
>  			mode |= ISOLATE_ASYNC_MIGRATE;
>
>  		lruvec = mem_cgroup_page_lruvec(page, zone);
> @@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>
>  #endif /* CONFIG_COMPACTION || CONFIG_CMA */
>  #ifdef CONFIG_COMPACTION
> +/*
> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
> + * converted to MIGRATE_MOVABLE type, false otherwise.
> + */
> +static bool can_rescue_unmovable_pageblock(struct page *page, bool need_lrulock)
> +{
> +	struct zone *zone;
> +	unsigned long pfn, start_pfn, end_pfn;
> +	struct page *start_page, *end_page, *cursor_page;
> +	bool lru_locked = false;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	start_pfn = pfn & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> +	start_page = pfn_to_page(start_pfn);
> +	end_page = pfn_to_page(end_pfn);
> +
> +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
> +		pfn++, cursor_page++) {
>
> -/* Returns true if the page is within a block suitable for migration to */
> -static bool suitable_migration_target(struct page *page)
> +		if (!pfn_valid_within(pfn))
> +			continue;
> +
> +		/* Do not deal with pageblocks that overlap zones */
> +		if (page_zone(cursor_page) != zone)
> +			goto out;
> +
> +		if (PageBuddy(cursor_page)) {
> +			unsigned long order = page_order(cursor_page);
> +
> +			pfn += (1 << order) - 1;
> +			cursor_page += (1 << order) - 1;
> +			continue;
> +		} else if (page_count(cursor_page) == 0) {
> +			continue;

Can we assume a freed tail page always has page_count() == 0? If yes, why do we
need the dangerous PageBuddy(cursor_page) check? OK, but this may be harmless.

But if not, this code is seriously dangerous. Consider the following scenario:

1) cursor points to a free page

     +----------------+------------------+
     | free (order-1) |  used (order-1)  |
     +----------------+------------------+
     |
    cursor

2) cursor moves on

     +----------------+------------------+
     | free (order-1) |  used (order-1)  |
     +----------------+------------------+
                      |
                      cursor

3) the neighboring block is freed


     +----------------+------------------+
     | free (order-2)                    |
     +----------------+------------------+
                      |
                      cursor

Now the cursor points to the middle of a free block.


Anyway, I recommend avoiding this dangerous game of running without zone->lock
and changing can_rescue_unmovable_pageblock() so that it is only called with
zone->lock held. I do not see any value in adding this much complexity for a
mere minor optimization.
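
The three steps above can be replayed in a small user-space toy model. This is
only a sketch: struct toy_page and every toy_* name are hypothetical stand-ins,
not kernel structures, and the model assumes that a freed page has a reference
count of 0 and that only the head page of a free block carries the buddy flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; this is NOT the kernel structure. */
struct toy_page {
	bool buddy;		/* set only on the head page of a free block */
	unsigned int order;	/* meaningful only while buddy is true */
	int count;		/* reference count; 0 once the page is freed */
};

enum toy_class { TOY_FREE_HEAD, TOY_ZERO_COUNT, TOY_IN_USE };

/* Classify one page the way a lockless scanner would see it. */
static enum toy_class toy_classify(const struct toy_page *p)
{
	if (p->buddy)
		return TOY_FREE_HEAD;
	if (p->count == 0)
		return TOY_ZERO_COUNT;
	return TOY_IN_USE;
}

/* Steps 1 and 2: pages 0-1 form a free order-1 block, pages 2-3 are in use. */
static void toy_init(struct toy_page pages[4])
{
	int i;

	for (i = 0; i < 4; i++) {
		pages[i].buddy = false;
		pages[i].order = 0;
		pages[i].count = (i < 2) ? 0 : 1;
	}
	pages[0].buddy = true;
	pages[0].order = 1;
}

/* Step 3: another CPU frees pages 2-3, which merge with their buddy at 0. */
static void toy_free_and_merge(struct toy_page pages[4])
{
	assert(pages[0].buddy && pages[0].order == 1);
	pages[2].count = 0;
	pages[3].count = 0;
	pages[0].order = 2;	/* only the merged head carries the buddy flag */
}
```

In this model a scanner whose cursor sits at index 2 sees an in-use page before
the merge and a zero-count page with no buddy flag after it, so it never gets an
order to skip by and has to walk the rest of the free block page by page.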


> +		} else if (PageLRU(cursor_page)) {
> +			if (!need_lrulock)
> +				continue;
> +			else if (lru_locked)
> +				continue;
> +			else {
> +				spin_lock(&zone->lru_lock);

Hmm...
I don't like taking lru_lock. Until now, we have carefully avoided taking
both zone->lock and zone->lru_lock together; they are both performance-critical
locks. And I think the pageblock migratetype doesn't need to be strictly
correct. It is only an anti-fragmentation optimization. Why do we need to take it?



> +				lru_locked = true;
> +				if (PageLRU(page))
> +					continue;
> +			}
> +		}
> +
> +		goto out;
> +	}
> +

Why don't we need to release lru_lock when returning true?
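
One way to make the unlock unconditional is to funnel both outcomes through a
single exit. The following is a user-space sketch under stated assumptions, not
a proposed patch: struct toy_lock, enum toy_kind and toy_can_rescue() are
hypothetical names, the lock is a plain counter rather than a spinlock, and the
model is single-threaded, so it shows only the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: a depth counter that asserts on double lock or double unlock. */
struct toy_lock { int depth; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(l->depth == 0);
	l->depth++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->depth == 1);
	l->depth--;
}

/* What the scan may find at each position of the toy pageblock. */
enum toy_kind { TOY_FREE, TOY_LRU, TOY_PINNED };

/*
 * Returns true if every page is free or on the LRU. The lock is taken
 * lazily on the first LRU page and dropped on every return path.
 */
static bool toy_can_rescue(const enum toy_kind *pages, size_t n,
			   struct toy_lock *lru_lock)
{
	bool locked = false;
	bool ok = true;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i] == TOY_FREE)
			continue;
		if (pages[i] == TOY_LRU) {
			if (!locked) {
				toy_lock_acquire(lru_lock);
				locked = true;
			}
			continue;
		}
		ok = false;	/* anything else blocks the rescue */
		break;
	}

	if (locked)		/* single exit: success and failure both unlock */
		toy_lock_release(lru_lock);
	return ok;
}
```

With this shape the caller can never be left holding the lock, whichever value
comes back.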


> +	return true;
> +out:
> +	if (lru_locked)
> +		spin_unlock(&zone->lru_lock);
> +
> +	return false;
> +}
> +
> +static void rescue_unmovable_pageblock(struct page *page)
> +{
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> +}
> +
> +/*
> + * MIGRATE_TARGET : good for migration target
> + * RESCUE_UNMOVABLE_TARGET : good only if we can rescue the unmovable pageblock.
> + * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
> + * SKIP_TARGET : can't migrate for other reasons.
> + */
> +enum smt_result {
> +	MIGRATE_TARGET,
> +	RESCUE_UNMOVABLE_TARGET,
> +	UNMOVABLE_TARGET,
> +	SKIP_TARGET,
> +};
> +
> +/*
> + * Returns MIGRATE_TARGET if the page is within a block
> + * suitable for migration to, UNMOVABLE_TARGET if the page
> + * is within a MIGRATE_UNMOVABLE block, SKIP_TARGET otherwise.
> + */
> +static enum smt_result suitable_migration_target(struct page *page,
> +			      struct compact_control *cc, bool need_lrulock)
>  {
>
>  	int migratetype = get_pageblock_migratetype(page);
>
>  	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
>  	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
> -		return false;
> +		return SKIP_TARGET;
>
>  	/* If the page is a large free page, then allow migration */
>  	if (PageBuddy(page) && page_order(page) >= pageblock_order)
> -		return true;
> +		return MIGRATE_TARGET;
>
>  	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
> -	if (migrate_async_suitable(migratetype))
> -		return true;
> +	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
> +	    migrate_async_suitable(migratetype))
> +		return MIGRATE_TARGET;
> +
> +	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
> +	    migratetype == MIGRATE_UNMOVABLE)
> +		return UNMOVABLE_TARGET;
> +
> +	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
> +	    migratetype == MIGRATE_UNMOVABLE &&
> +	    can_rescue_unmovable_pageblock(page, need_lrulock))
> +		return RESCUE_UNMOVABLE_TARGET;
>
>  	/* Otherwise skip the block */
> -	return false;
> +	return SKIP_TARGET;
>  }
>
>  /*
> @@ -414,6 +509,13 @@ static void isolate_freepages(struct zone *zone,
>  	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
>
>  	/*
> +	 * isolate_freepages() may be called more than once during
> +	 * compact_zone_order() run and we want only the most recent
> +	 * count.
> +	 */
> +	cc->nr_unmovable_pageblock = 0;
> +
> +	/*
>  	 * Isolate free pages until enough are available to migrate the
>  	 * pages on cc->migratepages. We stop searching if the migrate
>  	 * and free page scanners meet or enough free pages are isolated.
> @@ -421,6 +523,7 @@ static void isolate_freepages(struct zone *zone,
>  	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
>  					pfn -= pageblock_nr_pages) {
>  		unsigned long isolated;
> +		enum smt_result ret;
>
>  		if (!pfn_valid(pfn))
>  			continue;
> @@ -437,9 +540,12 @@ static void isolate_freepages(struct zone *zone,
>  			continue;
>
>  		/* Check the block is suitable for migration */
> -		if (!suitable_migration_target(page))
> +		ret = suitable_migration_target(page, cc, false);
> +		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
> +			if (ret == UNMOVABLE_TARGET)
> +				cc->nr_unmovable_pageblock++;
>  			continue;
> -
> +		}
>  		/*
>  		 * Found a block suitable for isolating free pages from. Now
>  		 * we disabled interrupts, double check things are ok and
> @@ -448,12 +554,16 @@ static void isolate_freepages(struct zone *zone,
>  		 */
>  		isolated = 0;
>  		spin_lock_irqsave(&zone->lock, flags);
> -		if (suitable_migration_target(page)) {
> +		ret = suitable_migration_target(page, cc, true);
> +		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
> +			if (ret == RESCUE_UNMOVABLE_TARGET)
> +				rescue_unmovable_pageblock(page);
>  			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
>  			isolated = isolate_freepages_block(pfn, end_pfn,
>  							   freelist, false);
>  			nr_freepages += isolated;
> -		}
> +		} else if (ret == UNMOVABLE_TARGET)
> +			cc->nr_unmovable_pageblock++;
>  		spin_unlock_irqrestore(&zone->lock, flags);
>
>  		/*
> @@ -685,8 +795,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>
>  		nr_migrate = cc->nr_migratepages;
>  		err = migrate_pages(&cc->migratepages, compaction_alloc,
> -				(unsigned long)cc, false,
> -				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
> +			(unsigned long)&cc->freepages, false,
> +			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
> +						      : MIGRATE_ASYNC);
>  		update_nr_listpages(cc);
>  		nr_remaining = cc->nr_migratepages;
>
> @@ -715,7 +826,8 @@ out:
>
>  static unsigned long compact_zone_order(struct zone *zone,
>  				 int order, gfp_t gfp_mask,
> -				 bool sync)
> +				 enum compact_mode mode,
> +				 unsigned long *nr_pageblocks_skipped)
>  {
>  	struct compact_control cc = {
>  		.nr_freepages = 0,
> @@ -723,12 +835,17 @@ static unsigned long compact_zone_order(struct zone *zone,
>  		.order = order,
>  		.migratetype = allocflags_to_migratetype(gfp_mask),
>  		.zone = zone,
> -		.sync = sync,
> +		.mode = mode,
>  	};
> +	unsigned long rc;
> +
>  	INIT_LIST_HEAD(&cc.freepages);
>  	INIT_LIST_HEAD(&cc.migratepages);
>
> -	return compact_zone(zone, &cc);
> +	rc = compact_zone(zone, &cc);
> +	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
> +
> +	return rc;
>  }
>
>  int sysctl_extfrag_threshold = 500;
> @@ -753,6 +870,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  	struct zoneref *z;
>  	struct zone *zone;
>  	int rc = COMPACT_SKIPPED;
> +	unsigned long nr_pageblocks_skipped;
> +	enum compact_mode mode;
>
>  	/*
>  	 * Check whether it is worth even starting compaction. The order check is
> @@ -769,12 +888,22 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
>  								nodemask) {
>  		int status;
>
> -		status = compact_zone_order(zone, order, gfp_mask, sync);
> +		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
> +retry:
> +		status = compact_zone_order(zone, order, gfp_mask, mode,
> +						&nr_pageblocks_skipped);
>  		rc = max(status, rc);
>
>  		/* If a normal allocation would succeed, stop compacting */
>  		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
>  			break;
> +
> +		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
> +			if (nr_pageblocks_skipped) {
> +				mode = COMPACT_ASYNC_UNMOVABLE;
> +				goto retry;
> +			}
> +		}
>  	}
>
>  	return rc;
> @@ -808,7 +937,7 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
>  			if (ok && cc->order > zone->compact_order_failed)
>  				zone->compact_order_failed = cc->order + 1;
>  			/* Currently async compaction is never deferred. */
> -			else if (!ok && cc->sync)
> +			else if (!ok && cc->mode == COMPACT_SYNC)
>  				defer_compaction(zone, cc->order);
>  		}
>
> @@ -823,7 +952,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
>  {
>  	struct compact_control cc = {
>  		.order = order,
> -		.sync = false,
> +		.mode = COMPACT_ASYNC_MOVABLE,
>  	};
>
>  	return __compact_pgdat(pgdat, &cc);
> @@ -833,7 +962,7 @@ static int compact_node(int nid)
>  {
>  	struct compact_control cc = {
>  		.order = -1,
> -		.sync = true,
> +		.mode = COMPACT_SYNC,
>  	};
>
>  	return __compact_pgdat(NODE_DATA(nid), &cc);
> diff --git a/mm/internal.h b/mm/internal.h
> index 2ba87fb..061fde7 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -94,6 +94,9 @@ extern void putback_lru_page(struct page *page);
>  /*
>   * in mm/page_alloc.c
>   */
> +extern void set_pageblock_migratetype(struct page *page, int migratetype);
> +extern int move_freepages_block(struct zone *zone, struct page *page,
> +				int migratetype);
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
> @@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct page *page);
>  #endif
>
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
> +#include <linux/compaction.h>
>
>  /*
>   * in mm/compaction.c
> @@ -119,11 +123,14 @@ struct compact_control {
>  	unsigned long nr_migratepages;	/* Number of pages to migrate */
>  	unsigned long free_pfn;		/* isolate_freepages search base */
>  	unsigned long migrate_pfn;	/* isolate_migratepages search base */
> -	bool sync;			/* Synchronous migration */
> +	enum compact_mode mode;		/* Compaction mode */
>
>  	int order;			/* order a direct compactor needs */
>  	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
>  	struct zone *zone;
> +
> +	/* Number of UNMOVABLE destination pageblocks skipped during scan */
> +	unsigned long nr_unmovable_pageblock;
>  };
>
>  unsigned long
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 476ae3e..d40e4c7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
>
>  int page_group_by_mobility_disabled __read_mostly;
>
> -static void set_pageblock_migratetype(struct page *page, int migratetype)
> +void set_pageblock_migratetype(struct page *page, int migratetype)
>  {
>
>  	if (unlikely(page_group_by_mobility_disabled))
> @@ -954,8 +954,8 @@ static int move_freepages(struct zone *zone,
>  	return pages_moved;
>  }
>
> -static int move_freepages_block(struct zone *zone, struct page *page,
> -				int migratetype)
> +int move_freepages_block(struct zone *zone, struct page *page,
> +			 int migratetype)
>  {
>  	unsigned long start_pfn, end_pfn;
>  	struct page *start_page, *end_page;
> @@ -5651,7 +5651,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  		.nr_migratepages = 0,
>  		.order = -1,
>  		.zone = page_zone(pfn_to_page(start)),
> -		.sync = true,
> +		.mode = COMPACT_SYNC,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-05  4:35         ` KOSAKI Motohiro
@ 2012-06-05  6:05           ` Minchan Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-05  6:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Bartlomiej Zolnierkiewicz, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

On 06/05/2012 01:35 PM, KOSAKI Motohiro wrote:

>>>> Minchan, are you interest this patch? If yes, can you please rewrite
>>>> it?
>>>
>>> Can do it but I want to give credit to Bartlomiej.
>>> Bartlomiej, if you like my patch, could you resend it as formal patch
>>> after you do broad testing?
>>>
>>> Frankly speaking, I don't want to merge it without any data which
>>> prove it's really good for real practice.
>>>
>>> When the patch firstly was submitted, it wasn't complicated so I was
>>> okay at that time but it has been complicated
>>> than my expectation. So if Andrew might pass the decision to me, I'm
>>> totally NACK if author doesn't provide
>>> any real data or VOC of some client.
> 
> I agree. And you don't need to bother this patch if you are not interest
> this one. I'm sorry.


Never mind.

> Let's throw it away until the author send us data.
> 

I guess it's normally hard to construct a workload that proves it's useful.
But we can't be sure there is no such workload in the world.
So I hope to hear the VOC. At least, Mel might require it.

If nobody provides it, I hope we can add some vmstat-like counters to prove
this patch's effect. If we can't see the benefit through vmstat, we can deprecate
it later.

>>> 1) Any comment?
>>>
>>> Anyway, I fixed some bugs and clean up something I found during review.
>>>
>>> Minor thing.
>>> 1. change smt_result naming - I never like such long non-consistent
>>> naming. How about this?
>>> 2. fix can_rescue_unmovable_pageblock
>>>     2.1 pfn valid check for page_zone
>>>
>>> Major thing.
>>>
>>>     2.2 add lru_lock for stablizing PageLRU
>>>         If we don't hold lru_lock, there is possibility that
>>> unmovable(non-LRU) page can put in movable pageblock.
>>>         It can make compaction/CMA's regression. But there is a
>>> concern about deadlock between lru_lock and lock.
>>>         As I look the code, I can't find allocation trial with
>>> holding lru_lock so it might be safe(but not sure,
>>>         I didn't test it. It need more careful review/testing) but it
>>> makes new locking dependency(not sure, too.
>>>         We already made such rule but I didn't know that until now
>>> ;-) ) Why I thought so is we can allocate
>>>         GFP_ATOMIC with holding lru_lock, logically which might be
>>> crazy idea.
>>>
>>>     2.3 remove zone->lock in first phase.
>>>         We do rescue unmovable pageblock by 2-phase. In first-phase,
>>> we just peek pages so we don't need locking.
>>>         If we see non-stablizing value, it would be caught by 2-phase
>>> with needed lock or
>>>         can_rescue_unmovable_pageblock can return out of loop by
>>> stale page_order(cursor_page).
>>>         It couldn't make unmovable pageblock to movable but we can do
>>> it next time, again.
>>>         It's not critical.
>>>
>>> 2) Any comment?
>>>
>>> Now I can't inline the code so sorry but attach patch.
>>> It's not a formal patch/never tested.
>>>
>>
>>
>> Attached patch has a BUG in can_rescue_unmovable_pageblock.
>> Resend. I hope it is fixed.
>>
>>
>>
>>
>>
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index 51a90b7..e988037 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _LINUX_COMPACTION_H
>>  #define _LINUX_COMPACTION_H
>>
>> +#include <linux/node.h>
>> +
>>  /* Return values for compact_zone() and try_to_compact_pages() */
>>  /* compaction didn't start as it was not possible or direct reclaim
>> was more suitable */
>>  #define COMPACT_SKIPPED        0
>> @@ -11,6 +13,23 @@
>>  /* The full zone was compacted */
>>  #define COMPACT_COMPLETE    3
>>
>> +/*
>> + * compaction supports three modes
>> + *
>> + * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
>> + *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
>> + * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
>> + *    MIGRATE_MOVABLE pageblocks as migration sources.
>> + *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
>> + *    targets and convers them to MIGRATE_MOVABLE if possible
>> + * COMPACT_SYNC uses synchronous migration and scans all pageblocks
>> + */
>> +enum compact_mode {
>> +    COMPACT_ASYNC_MOVABLE,
>> +    COMPACT_ASYNC_UNMOVABLE,
>> +    COMPACT_SYNC,
>> +};
>> +
>>  #ifdef CONFIG_COMPACTION
>>  extern int sysctl_compact_memory;
>>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 7ea259d..dd02f25 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>       */
>>      while (unlikely(too_many_isolated(zone))) {
>>          /* async migration should just abort */
>> -        if (!cc->sync)
>> +        if (cc->mode != COMPACT_SYNC)
>>              return 0;
>>
>>          congestion_wait(BLK_RW_ASYNC, HZ/10);
>> @@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>           * satisfies the allocation
>>           */
>>          pageblock_nr = low_pfn >> pageblock_order;
>> -        if (!cc->sync && last_pageblock_nr != pageblock_nr &&
>> +        if (cc->mode != COMPACT_SYNC &&
>> +            last_pageblock_nr != pageblock_nr &&
>>              !migrate_async_suitable(get_pageblock_migratetype(page))) {
>>              low_pfn += pageblock_nr_pages;
>>              low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
>> @@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>              continue;
>>          }
>>
>> -        if (!cc->sync)
>> +        if (cc->mode != COMPACT_SYNC)
>>              mode |= ISOLATE_ASYNC_MIGRATE;
>>
>>          lruvec = mem_cgroup_page_lruvec(page, zone);
>> @@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>
>>  #endif /* CONFIG_COMPACTION || CONFIG_CMA */
>>  #ifdef CONFIG_COMPACTION
>> +/*
>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>> + */
>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>> need_lrulock)
>> +{
>> +    struct zone *zone;
>> +    unsigned long pfn, start_pfn, end_pfn;
>> +    struct page *start_page, *end_page, *cursor_page;
>> +    bool lru_locked = false;
>> +
>> +    zone = page_zone(page);
>> +    pfn = page_to_pfn(page);
>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>> +
>> +    start_page = pfn_to_page(start_pfn);
>> +    end_page = pfn_to_page(end_pfn);
>> +
>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
>> end_page;
>> +        pfn++, cursor_page++) {
>>
>> -/* Returns true if the page is within a block suitable for migration
>> to */
>> -static bool suitable_migration_target(struct page *page)
>> +        if (!pfn_valid_within(pfn))
>> +            continue;
>> +
>> +        /* Do not deal with pageblocks that overlap zones */
>> +        if (page_zone(cursor_page) != zone)
>> +            goto out;
>> +
>> +        if (PageBuddy(cursor_page)) {
>> +            unsigned long order = page_order(cursor_page);
>> +
>> +            pfn += (1 << order) - 1;
>> +            cursor_page += (1 << order) - 1;
>> +            continue;
>> +        } else if (page_count(cursor_page) == 0) {
>> +            continue;
> 
> Can we assume freed tail page always have page_count()==0? if yes, why
> do we
> need dangerous PageBuddy(cursor_page) check? ok, but this may be harmless.
> 


The page_count check is for pcp pages.
Am I missing your point?

> But if no, this code is seriously dangerous. think following scenario,
> 
> 1) cursor page points free page
> 
>     +----------------+------------------+
>     | free (order-1) |  used (order-1)  |
>     +----------------+------------------+
>     |
>    cursor
> 
> 2) moved cursor
> 
>     +----------------+------------------+
>     | free (order-1) |  used (order-1)  |
>     +----------------+------------------+
>                      |
>                      cursor
> 
> 3) neighbor block was freed
> 
> 
>     +----------------+------------------+
>     | free (order-2)                    |
>     +----------------+------------------+
>                      |
>                      cursor
> 
> now, cursor points to middle of free block.
> 

> 

> Anyway, I recommend to avoid dangerous no zone->lock game and change
> can_rescue_unmovable_pageblock() is only called w/ zone->lock. I have



I can't understand your point.
If the page is in the middle of a free block, what's the problem for the first call
of can_rescue_unmovable_pageblock?
I think we can stabilize it in the second call of can_rescue_unmovable_pageblock, made with zone->lock held.
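
The two-call scheme described here can be sketched in user space as follows.
This is a toy model under stated assumptions: struct toy_block and all toy_*
names are hypothetical, the lock is a counter standing in for zone->lock, and a
callback simulates another CPU acting in the unlocked window. It does not model
the hazard of reading a stale page order during the lockless peek.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy pageblock; all names here are hypothetical, not kernel API. */
struct toy_block {
	int lock_depth;		/* stands in for zone->lock */
	bool rescuable;		/* result the page scan would produce right now */
	int rescued;		/* how many times the block type was converted */
};

static void toy_lock(struct toy_block *b)
{
	assert(b->lock_depth == 0);
	b->lock_depth++;
}

static void toy_unlock(struct toy_block *b)
{
	assert(b->lock_depth == 1);
	b->lock_depth--;
}

/* Simulates another CPU allocating an unmovable page from the block. */
static void toy_concurrent_alloc(struct toy_block *b)
{
	b->rescuable = false;
}

/*
 * Phase 1 peeks without the lock; phase 2 repeats the check under the lock
 * and only then converts. "between" models activity in the unlocked window.
 */
static bool toy_try_rescue(struct toy_block *b,
			   void (*between)(struct toy_block *))
{
	bool done = false;

	if (!b->rescuable)	/* phase 1: cheap, possibly stale */
		return false;

	if (between)
		between(b);

	toy_lock(b);
	if (b->rescuable) {	/* phase 2: authoritative */
		b->rescued++;
		done = true;
	}
	toy_unlock(b);

	return done;
}
```

If the state changes between the two phases, the locked recheck simply declines
to convert the block, and a later compaction pass can try again.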

> no seen any worth to include this high complex for mere minor optimization.

> 

> 
>> +        } else if (PageLRU(cursor_page)) {
>> +            if (!need_lrulock)
>> +                continue;
>> +            else if (lru_locked)
>> +                continue;
>> +            else {
>> +                spin_lock(&zone->lru_lock);
> 
> Hmm...
> I don't like to take lru_lock. 1) Until now, we carefully avoid to take
> both zone->lock and zone->lru_lock. they are both performance critical
> lock. And I think pageblock migratetype don't need strictly correct. It
> is only optimization of anti fragmentation. Why do we need take it?
> 


A movable pageblock that contains an unmovable page can cause an anti-fragmentation regression.
So I did it. I agree it's a sort of optimization.
If others don't want the lock at the cost of an anti-fragmentation regression, we can remove it.

> 
> 
>> +                lru_locked = true;
>> +                if (PageLRU(page))
>> +                    continue;
>> +            }
>> +        }
>> +
>> +        goto out;
>> +    }
>> +
> 
> Why don't we need to release lru_lock when returning true.


Because my brain has gone. :(

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-05  6:05           ` Minchan Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-05  6:05 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: ; Bartlomiej Zolnierkiewicz, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

On 06/05/2012 01:35 PM, KOSAKI Motohiro wrote:

>>>> Minchan, are you interest this patch? If yes, can you please rewrite
>>>> it?
>>>
>>> Can do it but I want to give credit to Bartlomiej.
>>> Bartlomiej, if you like my patch, could you resend it as formal patch
>>> after you do broad testing?
>>>
>>> Frankly speaking, I don't want to merge it without any data which
>>> prove it's really good for real practice.
>>>
>>> When the patch firstly was submitted, it wasn't complicated so I was
>>> okay at that time but it has been complicated
>>> than my expectation. So if Andrew might pass the decision to me, I'm
>>> totally NACK if author doesn't provide
>>> any real data or VOC of some client.
> 
> I agree. And you don't need to bother this patch if you are not interest
> this one. I'm sorry.


Never mind.

> Let's throw it away until the author send us data.
> 

I guess it's normally hard to construct a workload that proves it's useful.
But we can't be sure there is no such workload in the world.
So I hope to hear the VOC. At least, Mel might require it.

If nobody provides it, I hope we can add some vmstat-like counters to prove
this patch's effect. If we can't see the benefit through vmstat, we can deprecate
it later.

>>> 1) Any comment?
>>>
>>> Anyway, I fixed some bugs and clean up something I found during review.
>>>
>>> Minor thing.
>>> 1. change smt_result naming - I never like such long non-consistent
>>> naming. How about this?
>>> 2. fix can_rescue_unmovable_pageblock
>>>     2.1 pfn valid check for page_zone
>>>
>>> Major thing.
>>>
>>>     2.2 add lru_lock for stablizing PageLRU
>>>         If we don't hold lru_lock, there is possibility that
>>> unmovable(non-LRU) page can put in movable pageblock.
>>>         It can make compaction/CMA's regression. But there is a
>>> concern about deadlock between lru_lock and lock.
>>>         As I look the code, I can't find allocation trial with
>>> holding lru_lock so it might be safe(but not sure,
>>>         I didn't test it. It need more careful review/testing) but it
>>> makes new locking dependency(not sure, too.
>>>         We already made such rule but I didn't know that until now
>>> ;-) ) Why I thought so is we can allocate
>>>         GFP_ATOMIC with holding lru_lock, logically which might be
>>> crazy idea.
>>>
>>>     2.3 remove zone->lock in first phase.
>>>         We do rescue unmovable pageblock by 2-phase. In first-phase,
>>> we just peek pages so we don't need locking.
>>>         If we see non-stablizing value, it would be caught by 2-phase
>>> with needed lock or
>>>         can_rescue_unmovable_pageblock can return out of loop by
>>> stale page_order(cursor_page).
>>>         It couldn't make unmovable pageblock to movable but we can do
>>> it next time, again.
>>>         It's not critical.
>>>
>>> 2) Any comment?
>>>
>>> Now I can't inline the code so sorry but attach patch.
>>> It's not a formal patch/never tested.
>>>
>>
>>
>> Attached patch has a BUG in can_rescue_unmovable_pageblock.
>> Resend. I hope it is fixed.
>>
>>
>>
>>
>>
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index 51a90b7..e988037 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _LINUX_COMPACTION_H
>>  #define _LINUX_COMPACTION_H
>>
>> +#include <linux/node.h>
>> +
>>  /* Return values for compact_zone() and try_to_compact_pages() */
>>  /* compaction didn't start as it was not possible or direct reclaim
>> was more suitable */
>>  #define COMPACT_SKIPPED        0
>> @@ -11,6 +13,23 @@
>>  /* The full zone was compacted */
>>  #define COMPACT_COMPLETE    3
>>
>> +/*
>> + * compaction supports three modes
>> + *
>> + * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
>> + *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
>> + * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
>> + *    MIGRATE_MOVABLE pageblocks as migration sources.
>> + *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
>> + *    targets and convers them to MIGRATE_MOVABLE if possible
>> + * COMPACT_SYNC uses synchronous migration and scans all pageblocks
>> + */
>> +enum compact_mode {
>> +    COMPACT_ASYNC_MOVABLE,
>> +    COMPACT_ASYNC_UNMOVABLE,
>> +    COMPACT_SYNC,
>> +};
>> +
>>  #ifdef CONFIG_COMPACTION
>>  extern int sysctl_compact_memory;
>>  extern int sysctl_compaction_handler(struct ctl_table *table, int write,
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 7ea259d..dd02f25 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>       */
>>      while (unlikely(too_many_isolated(zone))) {
>>          /* async migration should just abort */
>> -        if (!cc->sync)
>> +        if (cc->mode != COMPACT_SYNC)
>>              return 0;
>>
>>          congestion_wait(BLK_RW_ASYNC, HZ/10);
>> @@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>           * satisfies the allocation
>>           */
>>          pageblock_nr = low_pfn >> pageblock_order;
>> -        if (!cc->sync && last_pageblock_nr != pageblock_nr &&
>> +        if (cc->mode != COMPACT_SYNC &&
>> +            last_pageblock_nr != pageblock_nr &&
>>              !migrate_async_suitable(get_pageblock_migratetype(page))) {
>>              low_pfn += pageblock_nr_pages;
>>              low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
>> @@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>              continue;
>>          }
>>
>> -        if (!cc->sync)
>> +        if (cc->mode != COMPACT_SYNC)
>>              mode |= ISOLATE_ASYNC_MIGRATE;
>>
>>          lruvec = mem_cgroup_page_lruvec(page, zone);
>> @@ -360,27 +361,121 @@ isolate_migratepages_range(struct zone *zone,
>> struct compact_control *cc,
>>
>>  #endif /* CONFIG_COMPACTION || CONFIG_CMA */
>>  #ifdef CONFIG_COMPACTION
>> +/*
>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>> + */
>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>> need_lrulock)
>> +{
>> +    struct zone *zone;
>> +    unsigned long pfn, start_pfn, end_pfn;
>> +    struct page *start_page, *end_page, *cursor_page;
>> +    bool lru_locked = false;
>> +
>> +    zone = page_zone(page);
>> +    pfn = page_to_pfn(page);
>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>> +
>> +    start_page = pfn_to_page(start_pfn);
>> +    end_page = pfn_to_page(end_pfn);
>> +
>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
>> end_page;
>> +        pfn++, cursor_page++) {
>>
>> -/* Returns true if the page is within a block suitable for migration
>> to */
>> -static bool suitable_migration_target(struct page *page)
>> +        if (!pfn_valid_within(pfn))
>> +            continue;
>> +
>> +        /* Do not deal with pageblocks that overlap zones */
>> +        if (page_zone(cursor_page) != zone)
>> +            goto out;
>> +
>> +        if (PageBuddy(cursor_page)) {
>> +            unsigned long order = page_order(cursor_page);
>> +
>> +            pfn += (1 << order) - 1;
>> +            cursor_page += (1 << order) - 1;
>> +            continue;
>> +        } else if (page_count(cursor_page) == 0) {
>> +            continue;
> 
> Can we assume freed tail page always have page_count()==0? if yes, why
> do we
> need dangerous PageBuddy(cursor_page) check? ok, but this may be harmless.
> 


The page_count check is for pcp pages.
Am I missing your point?

> But if no, this code is seriously dangerous. think following scenario,
> 
> 1) cursor page points free page
> 
>     +----------------+------------------+
>     | free (order-1) |  used (order-1)  |
>     +----------------+------------------+
>     |
>    cursor
> 
> 2) moved cursor
> 
>     +----------------+------------------+
>     | free (order-1) |  used (order-1)  |
>     +----------------+------------------+
>                      |
>                      cursor
> 
> 3) neighbor block was freed
> 
> 
>     +----------------+------------------+
>     | free (order-2)                    |
>     +----------------+------------------+
>                      |
>                      cursor
> 
> now, cursor points to middle of free block.
>
> Anyway, I recommend to avoid dangerous no zone->lock game and change
> can_rescue_unmovable_pageblock() is only called w/ zone->lock. I have



I can't understand your point.
If the page is in the middle of a free block, what is the problem for the first
pass of can_rescue_unmovable_pageblock?
I think we can stabilize it in the second pass of can_rescue_unmovable_pageblock, which holds zone->lock.

> no seen any worth to include this high complex for mere minor optimization.
>
>> +        } else if (PageLRU(cursor_page)) {
>> +            if (!need_lrulock)
>> +                continue;
>> +            else if (lru_locked)
>> +                continue;
>> +            else {
>> +                spin_lock(&zone->lru_lock);
> 
> Hmm...
> I don't like to take lru_lock. 1) Until now, we carefully avoid to take
> both zone->lock and zone->lru_lock. they are both performance critical
> lock. And I think pageblock migratetype don't need strictly correct. It
> is only optimization of anti fragmentation. Why do we need take it?
> 


A movable pageblock that contains an unmovable page can regress anti-fragmentation.
That is why I did it. I agree it's a sort of optimization.
If others don't want the lock, at the cost of some anti-fragmentation regression, we can remove it.

> 
> 
>> +                lru_locked = true;
>> +                if (PageLRU(page))
>> +                    continue;
>> +            }
>> +        }
>> +
>> +        goto out;
>> +    }
>> +
> 
> Why don't we need to release lru_lock when returning true.


Because my brain has gone. :(
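
A minimal user-space sketch of the exit path being asked about, assuming the
function funnels every return through a single label so the lock is dropped
whether the result is true or false. All names here (mock_lock, page_state,
can_rescue_block) are hypothetical stand-ins and not kernel API. The recheck
after taking the lock in the quoted hunk tests `page`, whereas this sketch
rechecks the page under the cursor, on the assumption that this is what was
intended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states; a stand-in for PageBuddy/page_count/PageLRU. */
enum page_state { PG_BUDDY, PG_FREE_PCP, PG_LRU, PG_PINNED };

/* Models zone->lru_lock; depth catches double lock and leaked lock. */
struct mock_lock { int depth; };

static void mock_lock_acquire(struct mock_lock *l)
{
    assert(l->depth == 0);
    l->depth++;
}

static void mock_lock_release(struct mock_lock *l)
{
    assert(l->depth == 1);
    l->depth--;
}

/*
 * Returns true if every page in the block is free or on the LRU.
 * The lock is taken at most once, on the first LRU page, and is
 * released at the single exit label for both results.
 */
static bool can_rescue_block(const enum page_state *pages, size_t nr,
                             bool need_lrulock, struct mock_lock *lru_lock)
{
    bool lru_locked = false;
    bool ret = false;
    size_t i;

    for (i = 0; i < nr; i++) {
        switch (pages[i]) {
        case PG_BUDDY:
        case PG_FREE_PCP:
            continue;
        case PG_LRU:
            if (need_lrulock && !lru_locked) {
                mock_lock_acquire(lru_lock);
                lru_locked = true;
                /* Recheck under the lock; in this model it cannot change. */
                if (pages[i] != PG_LRU)
                    goto out;
            }
            continue;
        default:
            goto out;   /* pinned or otherwise unmovable page */
        }
    }
    ret = true;
out:
    if (lru_locked)
        mock_lock_release(lru_lock);
    return ret;
}
```

With this shape the caller never inherits a held lock, so the "returning
true" path needs no special handling.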

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-05  6:05           ` Minchan Kim
@ 2012-06-05 14:40             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2012-06-05 14:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Mel Gorman, Rik van Riel, Dave Jones,
	Andrew Morton, Cong Wang, Markus Trippelsdorf

(6/5/12 2:05 AM), Minchan Kim wrote:
> On 06/05/2012 01:35 PM, KOSAKI Motohiro wrote:
>
>>>>> Minchan, are you interest this patch? If yes, can you please rewrite
>>>>> it?
>>>>
>>>> Can do it but I want to give credit to Bartlomiej.
>>>> Bartlomiej, if you like my patch, could you resend it as formal patch
>>>> after you do broad testing?
>>>>
>>>> Frankly speaking, I don't want to merge it without any data which
>>>> prove it's really good for real practice.
>>>>
>>>> When the patch firstly was submitted, it wasn't complicated so I was
>>>> okay at that time but it has been complicated
>>>> than my expectation. So if Andrew might pass the decision to me, I'm
>>>> totally NACK if author doesn't provide
>>>> any real data or VOC of some client.
>>
>> I agree. And you don't need to bother this patch if you are not interest
>> this one. I'm sorry.
>
>
> Never mind.
>
>> Let's throw it away until the author send us data.
>>
>
> I guess it's hard to make such workload to prove it's useful normally.
> But we can't make sure there isn't such workload in the world.
> So I hope listen VOC. At least, Mel might require it.
>
> If anyone doesn't support it, I hope let's add some vmstat like stuff for proving
> this patch's effect. If we can't see the benefit through vmstat, we can deprecate
> it later.

Eek, but we cannot deprecate the vmstat. I hope we can make a good decision _before_
inclusion. ;-)


>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>>> need_lrulock)
>>> +{
>>> +    struct zone *zone;
>>> +    unsigned long pfn, start_pfn, end_pfn;
>>> +    struct page *start_page, *end_page, *cursor_page;
>>> +    bool lru_locked = false;
>>> +
>>> +    zone = page_zone(page);
>>> +    pfn = page_to_pfn(page);
>>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>> +
>>> +    start_page = pfn_to_page(start_pfn);
>>> +    end_page = pfn_to_page(end_pfn);
>>> +
>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
>>> end_page;
>>> +        pfn++, cursor_page++) {
>>>
>>> -/* Returns true if the page is within a block suitable for migration
>>> to */
>>> -static bool suitable_migration_target(struct page *page)
>>> +        if (!pfn_valid_within(pfn))
>>> +            continue;
>>> +
>>> +        /* Do not deal with pageblocks that overlap zones */
>>> +        if (page_zone(cursor_page) != zone)
>>> +            goto out;
>>> +
>>> +        if (PageBuddy(cursor_page)) {
>>> +            unsigned long order = page_order(cursor_page);
>>> +
>>> +            pfn += (1 << order) - 1;
>>> +            cursor_page += (1 << order) - 1;
>>> +            continue;
>>> +        } else if (page_count(cursor_page) == 0) {
>>> +            continue;
>>
>> Can we assume freed tail page always have page_count()==0? if yes, why
>> do we
>> need dangerous PageBuddy(cursor_page) check? ok, but this may be harmless.
>
> page_count check is for pcp pages.

Right. But my point is that I doubt we can do a buddy walk without zone->lock.


> Am I missing your point?
>
>
>> But if no, this code is seriously dangerous. think following scenario,
>>
>> 1) cursor page points free page
>>
>>      +----------------+------------------+
>>      | free (order-1) |  used (order-1)  |
>>      +----------------+------------------+
>>      |
>>     cursor
>>
>> 2) moved cursor
>>
>>      +----------------+------------------+
>>      | free (order-1) |  used (order-1)  |
>>      +----------------+------------------+
>>                       |
>>                       cursor
>>
>> 3) neighbor block was freed
>>
>>
>>      +----------------+------------------+
>>      | free (order-2)                    |
>>      +----------------+------------------+
>>                       |
>>                       cursor
>>
>> now, cursor points to middle of free block.
>
>> Anyway, I recommend to avoid dangerous no zone->lock game and change
>> can_rescue_unmovable_pageblock() is only called w/ zone->lock. I have
>
>
>
> I can't understand your point.
> If the page is middle of free block, what's the problem in can_rescue_unmovable_pageblock
> at first trial of can_rescue_xxx?

I'm not sure. But all the other pfn scanning code carefully avoids touching the middle of a free
page block (and it always takes zone->lock).
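
To illustrate the concern, here is a hedged user-space sketch of one defensive
way to consume an order value that was read without zone->lock. It is not part
of the patch under discussion; skip_free_block and MODEL_MAX_ORDER are
hypothetical names, and the value 11 is an assumption that mirrors the usual
MAX_ORDER. The idea is to trust a racy order only if it is in range, the
cursor is aligned like a free block head, and the skip stays inside the
pageblock. Otherwise the cursor advances by a single page.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 11UL   /* assumption: mirrors the usual MAX_ORDER */

/*
 * Returns the next pfn to examine after a page that looked like the
 * head of a free block. racy_order was read without the lock and may
 * be stale, so it is validated before it is used to skip ahead.
 */
static unsigned long skip_free_block(unsigned long pfn, unsigned long end_pfn,
                                     unsigned long racy_order)
{
    unsigned long nr;

    assert(pfn <= end_pfn);

    /* Out-of-range order: the field was probably reused, do not trust it. */
    if (racy_order >= MODEL_MAX_ORDER)
        return pfn + 1;

    nr = 1UL << racy_order;

    /* A free block of this order must start on an nr-aligned pfn. */
    if (pfn & (nr - 1))
        return pfn + 1;

    /* Never let a stale order push the cursor past the pageblock end. */
    if (nr - 1 > end_pfn - pfn)
        return pfn + 1;

    return pfn + nr;
}
```

A scan that falls back to single-page steps loses a little speed on a stale
read but cannot walk out of the block it was asked to inspect.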


> I think we can stabilize it in second trial of can_rescue_unmovable_pageblock with zone->lock.
>
>> no seen any worth to include this high complex for mere minor optimization.
>
>>
>
>>
>>> +        } else if (PageLRU(cursor_page)) {
>>> +            if (!need_lrulock)
>>> +                continue;
>>> +            else if (lru_locked)
>>> +                continue;
>>> +            else {
>>> +                spin_lock(&zone->lru_lock);
>>
>> Hmm...
>> I don't like to take lru_lock. 1) Until now, we carefully avoid to take
>> both zone->lock and zone->lru_lock. they are both performance critical
>> lock. And I think pageblock migratetype don't need strictly correct. It
>> is only optimization of anti fragmentation. Why do we need take it?
>
> movable_block has unmovable page can make regression of anti-fragmentation.
> So I did it. I agree it's a sort of optimization.
> If others don't want it at the cost of regression anti-fragmentation, we can remove the lock.

ok.


>
>>
>>
>>> +                lru_locked = true;
>>> +                if (PageLRU(page))
>>> +                    continue;
>>> +            }
>>> +        }
>>> +
>>> +        goto out;
>>> +    }
>>> +
>>
>> Why don't we need to release lru_lock when returning true.
>
>
> Because my brain has gone. :(

Never mind.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-05  6:05           ` Minchan Kim
@ 2012-06-11 13:06             ` Mel Gorman
  -1 siblings, 0 replies; 31+ messages in thread
From: Mel Gorman @ 2012-06-11 13:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Rik van Riel, Dave Jones, Andrew Morton,
	Cong Wang, Markus Trippelsdorf

On Tue, Jun 05, 2012 at 03:05:40PM +0900, Minchan Kim wrote:
> > Let's throw it away until the author send us data.
> > 
> 
> I guess it's hard to make such workload to prove it's useful normally.
> But we can't make sure there isn't such workload in the world.
> So I hope listen VOC. At least, Mel might require it.
> 

I'm playing a lot of catch-up at the moment after being out for a few days
so sorry for my silence on this and other threads.

My initial support for this patch was based on an artificial load, but one I
felt was plausible to trigger if CMA was being used. In a normal workload
I thought it might be possible to hit if a large process exited, freeing
a lot of pagetable pages from MIGRATE_UNMOVABLE blocks at the same time,
but that is a little unlikely and a test case would also look very artificial.

Hence, I believe that if you require a real workload to demonstrate the
benefit of the patch that it will be very difficult to find. The primary
decision is if CMA needs this or not. I was under the impression that it
was a help for CMA allocation success rates but I may be mistaken.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-11 13:06             ` Mel Gorman
@ 2012-06-11 13:35               ` Rik van Riel
  -1 siblings, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2012-06-11 13:35 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, KOSAKI Motohiro, Bartlomiej Zolnierkiewicz, linux-mm,
	linux-kernel, Hugh Dickins, Linus Torvalds, Kyungmin Park,
	Marek Szyprowski, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

On 06/11/2012 09:06 AM, Mel Gorman wrote:

> My initial support for this patch was based on an artificial load but one I
> felt was plausible to trigger if CMA was being used. In a normal workload
> I thought it might be possible to hit if a large process exited freeing
> a lot of pagetable pages from MIGRATE_UNMOVABLE blocks at the same time
> but that is a little unlikely and a test case would also look very artificial.
>
> Hence, I believe that if you require a real workload to demonstrate the
> benefit of the patch that it will be very difficult to find. The primary
> decision is if CMA needs this or not. I was under the impression that it
> was a help for CMA allocation success rates but I may be mistaken.

If it helps CMA allocation rates, it should also help
allocation rates for transparent hugepages.

Conveniently, THP allocation rates are already exported
in /proc/vmstat.  Now all we need is a test load :)
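
A small sketch of how those counters could be sampled around a test load,
assuming the usual "name value" line format of /proc/vmstat and the counter
names thp_fault_alloc and thp_fault_fallback. The helper vmstat_lookup is a
hypothetical name, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter by exact name in vmstat-formatted text
 * (one "name value" pair per line). Returns 0 and stores the value
 * on success, -1 if the counter is not present.
 */
static int vmstat_lookup(FILE *fp, const char *name, unsigned long long *val)
{
    char key[64];
    unsigned long long v;

    assert(fp && name && val);

    rewind(fp);   /* allow repeated lookups on the same stream */
    while (fscanf(fp, "%63s %llu", key, &v) == 2) {
        if (strcmp(key, name) == 0) {
            *val = v;
            return 0;
        }
    }
    return -1;
}
```

Opening /proc/vmstat with fopen(), reading thp_fault_alloc and
thp_fault_fallback before and after the load, and comparing the deltas gives
a rough THP allocation success ratio for that run.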

-- 
All rights reversed


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-05  2:38       ` Minchan Kim
@ 2012-06-06 10:06         ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 31+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2012-06-06 10:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

On Tuesday 05 June 2012 04:38:53 Minchan Kim wrote:
> On 06/05/2012 10:59 AM, Minchan Kim wrote:
> 
> > On 06/05/2012 05:22 AM, KOSAKI Motohiro wrote:
> > 
> >>> +/*
> >>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
> >>> + * converted to MIGRATE_MOVABLE type, false otherwise.
> >>> + */
> >>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
> >>> locked)
> >>> +{
> >>> +    unsigned long pfn, start_pfn, end_pfn;
> >>> +    struct page *start_page, *end_page, *cursor_page;
> >>> +
> >>> +    pfn = page_to_pfn(page);
> >>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
> >>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
> >>> +
> >>> +    start_page = pfn_to_page(start_pfn);
> >>> +    end_page = pfn_to_page(end_pfn);
> >>> +
> >>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <=
> >>> end_page;
> >>> +        pfn++, cursor_page++) {
> >>> +        struct zone *zone = page_zone(start_page);
> >>> +        unsigned long flags;
> >>> +
> >>> +        if (!pfn_valid_within(pfn))
> >>> +            continue;
> >>> +
> >>> +        /* Do not deal with pageblocks that overlap zones */
> >>> +        if (page_zone(cursor_page) != zone)
> >>> +            return false;
> >>> +
> >>> +        if (!locked)
> >>> +            spin_lock_irqsave(&zone->lock, flags);
> >>> +
> >>> +        if (PageBuddy(cursor_page)) {
> >>> +            int order = page_order(cursor_page);
> >>>
> >>> -/* Returns true if the page is within a block suitable for migration
> >>> to */
> >>> -static bool suitable_migration_target(struct page *page)
> >>> +            pfn += (1 << order) - 1;
> >>> +            cursor_page += (1 << order) - 1;
> >>> +
> >>> +            if (!locked)
> >>> +                spin_unlock_irqrestore(&zone->lock, flags);
> >>> +            continue;
> >>> +        } else if (page_count(cursor_page) == 0 ||
> >>> +               PageLRU(cursor_page)) {
> >>> +            if (!locked)
> >>> +                spin_unlock_irqrestore(&zone->lock, flags);
> >>> +            continue;
> >>> +        }
> >>> +
> >>> +        if (!locked)
> >>> +            spin_unlock_irqrestore(&zone->lock, flags);
> >>> +
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    return true;
> >>> +}
> >>
> >> Minchan, are you interest this patch? If yes, can you please rewrite it?
> > 
> > 
> > Can do it but I want to give credit to Bartlomiej.
> > Bartlomiej, if you like my patch, could you resend it as formal patch after you do broad testing?

Sure.

> >> This one are
> >> not fixed our pointed issue and can_rescue_unmovable_pageblock() still
> >> has plenty bugs.
> >> We can't ack it.
> >>
> > 
> > 
> > Frankly speaking, I don't want to merge it without any data which prove it's really good for real practice.
> > 
> > When the patch firstly was submitted, it wasn't complicated so I was okay at that time but it has been complicated
> > than my expectation. So if Andrew might pass the decision to me, I'm totally NACK if author doesn't provide
> > any real data or VOC of some client.

I found this issue by accident while testing the compaction code, so unfortunately
I don't have a real bug report to back it up.  It is a corner case that
is more likely to happen when there is a rather small number of
pageblocks and quite heavy kernel memory allocation/freeing activity
going on.  I would presume that the issue can happen on some embedded
configurations, but they are not your typical machine, so we are not likely
to see a real bug report for it.

I'm also quite unhappy with the increasing complexity of what initially seemed
to be a quite simple fix, and I tend to agree that the patch may stay
out-of-tree until there is a more proven need for it (maybe with the issue
documented in the code for the time being).  Still, I would like to have
all outstanding issues fixed so I can merge the patch locally (and to -mm
if Andrew agrees) and just wait to see if the patch is ever needed in
practice.

I've attached the code that I use to trigger the issue at the bottom of this
mail so people can reproduce the problem and see for themselves whether it
is worth fixing or not.

> > 1) Any comment?
> > 
> > Anyway, I fixed some bugs and cleaned up a few things I found during review.

Thanks for doing this.

> > Minor things.
> > 1. change the smt_result naming - I never like such long, inconsistent names. How about this?
> > 2. fix can_rescue_unmovable_pageblock
> >    2.1 pfn valid check for page_zone
> > 
> > Major things.
> > 
> >    2.2 add lru_lock for stabilizing PageLRU
> >        If we don't hold lru_lock, there is a possibility that an unmovable (non-LRU) page can end up in a movable pageblock.
> >        That can cause a compaction/CMA regression. But there is a concern about deadlock between lru_lock and zone->lock.
> >        Looking at the code, I can't find any allocation attempt made while holding lru_lock, so it might be safe (but I'm
> >        not sure, I didn't test it; it needs more careful review/testing), but it creates a new locking dependency (not sure
> >        about that either; maybe we already have such a rule and I didn't know it until now ;-) ). The reason I think so is
> >        that we could logically allocate with GFP_ATOMIC while holding lru_lock, which might be a crazy idea.
> > 
> >    2.3 remove zone->lock in the first phase.
> >        We rescue an unmovable pageblock in two phases. In the first phase we just peek at pages, so we don't need locking.
> >        If we see an unstable value, it is either caught by the second phase, which takes the needed lock, or
> >        can_rescue_unmovable_pageblock returns out of the loop because of a stale page_order(cursor_page).
> >        In that case the unmovable pageblock is not converted to movable, but we can try again next time.
> >        It's not critical.
> > 
> > 2) Any comment?
> > 
> > I can't inline the code right now, so sorry, but I'm attaching the patch.
> > It's not a formal patch and has never been tested.
> > 
> 
> 
> The patch I attached has a BUG in can_rescue_unmovable_pageblock.
> Resending. I hope it is fixed.

@@ -399,10 +399,14 @@
 		} else if (page_count(cursor_page) == 0) {
 			continue;
 		} else if (PageLRU(cursor_page)) {
-			if (!lru_locked && need_lrulock) {
+			if (!need_lrulock)
+				continue;
+			else if (lru_locked)
+				continue;
+			else {
 				spin_lock(&zone->lru_lock);
 				lru_locked = true;
-				if (PageLRU(cursor_page))
+				if (PageLRU(page))
 					continue;
 			}
 		}

Could you please explain why we need to check page and not cursor_page
here?
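
To make the question concrete, here is a minimal userspace model of the
"peek without the lock, then take the lock once and re-check" pattern from
the hunk above.  It is only a sketch under assumptions: every sketch_ name
is hypothetical, a counter-carrying flag stands in for zone->lru_lock, the
need_lrulock case is left out, and it assumes the re-check is meant to look
at the page under the cursor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page states: free, on the LRU, or anything else. */
enum sketch_state { SKETCH_FREE, SKETCH_LRU, SKETCH_OTHER };

struct sketch_page {
	enum sketch_state state;
};

/* Stand-in for zone->lru_lock; counts acquisitions for inspection. */
struct sketch_lock {
	bool held;
	unsigned int acquisitions;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = true;
	l->acquisitions++;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	l->held = false;
}

/* Returns true if every page in the block is either free or on the LRU. */
static bool sketch_block_rescuable(const struct sketch_page *pages, size_t nr,
				   struct sketch_lock *lru_lock)
{
	bool lru_locked = false;
	bool ok = true;
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct sketch_page *cursor = &pages[i];

		if (cursor->state == SKETCH_FREE)
			continue;	/* free pages need no lock */

		if (cursor->state != SKETCH_LRU) {
			ok = false;	/* neither free nor LRU: give up */
			break;
		}

		if (!lru_locked) {
			/* First LRU page seen: take the lock once ... */
			sketch_lock_acquire(lru_lock);
			lru_locked = true;
			/* ... and re-check the page under the cursor. */
			if (cursor->state != SKETCH_LRU) {
				ok = false;
				break;
			}
		}
	}

	if (lru_locked)
		sketch_lock_release(lru_lock);
	return ok;
}
```

In this model the lock is first taken at whichever page happens to be the
first LRU page of the block, which need not be the page the caller passed
in, so re-checking a fixed page would test something other than what was
just peeked at.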

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center


My test case (on a 512 MiB machine):
* insmod alloc_frag.ko
* run ./alloc_app and push it to the background with Ctrl-Z
* rmmod alloc_frag.ko
* insmod alloc_test.ko
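
For a rough sense of scale, the sizes involved in this test can be worked
out with a small userspace sketch.  The 4 KiB page size and the pageblock
order of 9 are assumptions (they match the malloc(4096) and order-9 calls
in the code below, but both depend on the architecture and configuration),
and the sketch_ names are hypothetical.

```c
/* Assumptions: 4 KiB base pages and order-9 pageblocks. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGEBLOCK_ORDER	9

/* Number of base pages in one pageblock: 2^order. */
static unsigned long sketch_pages_per_pageblock(void)
{
	return 1UL << SKETCH_PAGEBLOCK_ORDER;
}

/* Number of whole pageblocks that fit in mem_bytes of RAM. */
static unsigned long sketch_nr_pageblocks(unsigned long long mem_bytes)
{
	return (unsigned long)(mem_bytes /
			(SKETCH_PAGE_SIZE * sketch_pages_per_pageblock()));
}

/* Size in KiB of nr_pages base pages. */
static unsigned long sketch_pages_to_kib(unsigned long nr_pages)
{
	return nr_pages * (SKETCH_PAGE_SIZE / 1024);
}
```

Under these assumptions a 512 MiB machine has at most 256 pageblocks,
the 120000 pages requested by alloc_frag come to 480000 KiB (about 469 MiB),
and the 60000 pages touched by alloc_app come to 240000 KiB (about 234 MiB).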

---
 alloc_app.c     |   22 ++++++++++++++++++++++
 mm/Kconfig      |    8 ++++++++
 mm/Makefile     |    3 +++
 mm/alloc_frag.c |   35 +++++++++++++++++++++++++++++++++++
 mm/alloc_test.c |   40 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 108 insertions(+)

Index: b/alloc_app.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/alloc_app.c	2012-04-06 11:49:23.789380700 +0200
@@ -0,0 +1,22 @@
+
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+#define ALLOC_NR_PAGES 60000
+
+int main(void)
+{
+	void *alloc_app_pages[ALLOC_NR_PAGES];
+	int i;
+
+	for (i = 0; i < ALLOC_NR_PAGES; i++) {
+		alloc_app_pages[i] = malloc(4096);
+		if (alloc_app_pages[i])
+			memset(alloc_app_pages[i], 'z', 4096);
+	}
+
+	getchar();
+
+	return 0;
+}
Index: b/mm/Kconfig
===================================================================
--- a/mm/Kconfig	2012-04-05 18:40:36.000000000 +0200
+++ b/mm/Kconfig	2012-04-06 11:49:23.789380700 +0200
@@ -379,3 +379,11 @@
 	  in a negligible performance hit.
 
 	  If unsure, say Y to enable cleancache
+
+config ALLOC_FRAG
+	tristate "alloc frag"
+	help
+
+config ALLOC_TEST
+	tristate "alloc test"
+	help
Index: b/mm/Makefile
===================================================================
--- a/mm/Makefile	2012-04-05 18:40:36.000000000 +0200
+++ b/mm/Makefile	2012-04-06 11:49:23.789380700 +0200
@@ -16,6 +16,9 @@
 			   $(mmu-y)
 obj-y += init-mm.o
 
+obj-$(CONFIG_ALLOC_FRAG) += alloc_frag.o
+obj-$(CONFIG_ALLOC_TEST) += alloc_test.o
+
 ifdef CONFIG_NO_BOOTMEM
 	obj-y		+= nobootmem.o
 else
Index: b/mm/alloc_frag.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/mm/alloc_frag.c	2012-04-06 11:52:43.761439914 +0200
@@ -0,0 +1,35 @@
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#define ALLOC_NR_PAGES 120000
+static struct page *alloc_frag_pages[ALLOC_NR_PAGES];
+
+static int __init alloc_frag_init(void)
+{
+	int i;
+
+	for (i = 0; i < ALLOC_NR_PAGES; i++)
+		alloc_frag_pages[i] = alloc_pages(GFP_KERNEL, 0);
+
+	for (i = 0; i < ALLOC_NR_PAGES; i += 2) {
+		if (alloc_frag_pages[i])
+			__free_pages(alloc_frag_pages[i], 0);
+	}
+
+	return 0;
+}
+module_init(alloc_frag_init);
+
+static void __exit alloc_frag_exit(void)
+{
+	int i;
+
+	for (i = 1; i < ALLOC_NR_PAGES; i += 2)
+		if (alloc_frag_pages[i])
+			__free_pages(alloc_frag_pages[i], 0);
+}
+module_exit(alloc_frag_exit);
+
+MODULE_LICENSE("GPL");
Index: b/mm/alloc_test.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/mm/alloc_test.c	2012-04-06 11:49:23.789380700 +0200
@@ -0,0 +1,40 @@
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#define ALLOC_NR_PAGES 120000
+static struct page *alloc_test_pages[ALLOC_NR_PAGES];
+
+static int __init alloc_test_init(void)
+{
+	int i;
+
+	printk("trying order-9 allocs..\n");
+	for (i = 0; i < 100; i++) {
+		alloc_test_pages[i] = alloc_pages(GFP_KERNEL, 9);
+		if (alloc_test_pages[i])
+			printk("\ttry %d success\n", i);
+		else {
+			printk("\ttry %d failure\n", i);
+			break;
+		}
+	}
+
+	return 0;
+}
+module_init(alloc_test_init);
+
+static void __exit alloc_test_exit(void)
+{
+	int i;
+
+	for (i = 0; i < 100; i++) {
+		if (alloc_test_pages[i])
+			__free_pages(alloc_test_pages[i], 9);
+	}
+
+}
+module_exit(alloc_test_exit);
+
+MODULE_LICENSE("GPL");


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
  2012-06-06 10:06         ` Bartlomiej Zolnierkiewicz
@ 2012-06-07  4:13           ` Minchan Kim
  -1 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-07  4:13 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

[-- Attachment #1: Type: text/plain, Size: 6965 bytes --]

On 06/06/2012 07:06 PM, Bartlomiej Zolnierkiewicz wrote:

> On Tuesday 05 June 2012 04:38:53 Minchan Kim wrote:
>> On 06/05/2012 10:59 AM, Minchan Kim wrote:
>>
>>> On 06/05/2012 05:22 AM, KOSAKI Motohiro wrote:
>>>
>>>>> +/*
>>>>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>>>>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>>>>> + */
>>>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
>>>>> +{
>>>>> +    unsigned long pfn, start_pfn, end_pfn;
>>>>> +    struct page *start_page, *end_page, *cursor_page;
>>>>> +
>>>>> +    pfn = page_to_pfn(page);
>>>>> +    start_pfn = pfn & ~(pageblock_nr_pages - 1);
>>>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>>>> +
>>>>> +    start_page = pfn_to_page(start_pfn);
>>>>> +    end_page = pfn_to_page(end_pfn);
>>>>> +
>>>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
>>>>> +        pfn++, cursor_page++) {
>>>>> +        struct zone *zone = page_zone(start_page);
>>>>> +        unsigned long flags;
>>>>> +
>>>>> +        if (!pfn_valid_within(pfn))
>>>>> +            continue;
>>>>> +
>>>>> +        /* Do not deal with pageblocks that overlap zones */
>>>>> +        if (page_zone(cursor_page) != zone)
>>>>> +            return false;
>>>>> +
>>>>> +        if (!locked)
>>>>> +            spin_lock_irqsave(&zone->lock, flags);
>>>>> +
>>>>> +        if (PageBuddy(cursor_page)) {
>>>>> +            int order = page_order(cursor_page);
>>>>>
>>>>> -/* Returns true if the page is within a block suitable for migration to */
>>>>> -static bool suitable_migration_target(struct page *page)
>>>>> +            pfn += (1 << order) - 1;
>>>>> +            cursor_page += (1 << order) - 1;
>>>>> +
>>>>> +            if (!locked)
>>>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>>>> +            continue;
>>>>> +        } else if (page_count(cursor_page) == 0 ||
>>>>> +               PageLRU(cursor_page)) {
>>>>> +            if (!locked)
>>>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        if (!locked)
>>>>> +            spin_unlock_irqrestore(&zone->lock, flags);
>>>>> +
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>
>>>> Minchan, are you interested in this patch? If yes, can you please rewrite it?
>>>
>>>
>>> I can do it, but I want to give credit to Bartlomiej.
>>> Bartlomiej, if you like my patch, could you resend it as a formal patch after you do broad testing?
> 
> Sure.


Please use the attached one instead of the buggy old version. :(
This patch fixes a race with THP, removes an unnecessary lock and adds more comments.

> 
>>>> This one does not
>>>> fix the issues we pointed out, and can_rescue_unmovable_pageblock() still
>>>> has plenty of bugs.
>>>> We can't ack it.
>>>>
>>>
>>>
>>> Frankly speaking, I don't want to merge it without any data proving that it really helps in practice.
>>>
>>> When the patch was first submitted, it wasn't complicated, so I was okay with it at that time, but it has become
>>> more complicated than I expected. So if Andrew passes the decision to me, it is a NACK from me unless the author
>>> provides some real data or VOC from some client.
> 
> I found this issue by accident while testing the compaction code, so unfortunately
> I don't have any real bug report to back it up.  It is just a corner case which
> is more likely to happen when there is a rather small number of pageblocks and
> quite heavy kernel memory allocation/freeing activity going on.  I would presume
> that the issue can happen on some embedded configurations, but they are not your
> typical machine, so a real bug report for it is unlikely.
> 
> I'm also quite unhappy with the increasing complexity of what initially seemed
> to be a quite simple fix, and I tend to agree that the patch may stay
> out-of-tree until there is a more proven need for it (maybe with the issue
> documented in the code for the time being).  Still, I would like to have
> all outstanding issues fixed so I can merge the patch locally (and to -mm
> if Andrew agrees) and just wait to see if the patch is ever needed in
> practice.
> 
> I've attached the code that I use to trigger the issue at the bottom of this
> mail so people can reproduce the problem and see for themselves whether it
> is worth fixing or not.
> 
>>> 1) Any comment?
>>>
>>> Anyway, I fixed some bugs and cleaned up a few things I found during review.
> 
> Thanks for doing this.
> 
>>> Minor things.
>>> 1. change the smt_result naming - I never like such long, inconsistent names. How about this?
>>> 2. fix can_rescue_unmovable_pageblock
>>>    2.1 pfn valid check for page_zone
>>>
>>> Major things.
>>>
>>>    2.2 add lru_lock for stabilizing PageLRU
>>>        If we don't hold lru_lock, there is a possibility that an unmovable (non-LRU) page can end up in a movable pageblock.
>>>        That can cause a compaction/CMA regression. But there is a concern about deadlock between lru_lock and zone->lock.
>>>        Looking at the code, I can't find any allocation attempt made while holding lru_lock, so it might be safe (but I'm
>>>        not sure, I didn't test it; it needs more careful review/testing), but it creates a new locking dependency (not sure
>>>        about that either; maybe we already have such a rule and I didn't know it until now ;-) ). The reason I think so is
>>>        that we could logically allocate with GFP_ATOMIC while holding lru_lock, which might be a crazy idea.
>>>
>>>    2.3 remove zone->lock in the first phase.
>>>        We rescue an unmovable pageblock in two phases. In the first phase we just peek at pages, so we don't need locking.
>>>        If we see an unstable value, it is either caught by the second phase, which takes the needed lock, or
>>>        can_rescue_unmovable_pageblock returns out of the loop because of a stale page_order(cursor_page).
>>>        In that case the unmovable pageblock is not converted to movable, but we can try again next time.
>>>        It's not critical.
>>>
>>> 2) Any comment?
>>>
>>> I can't inline the code right now, so sorry, but I'm attaching the patch.
>>> It's not a formal patch and has never been tested.
>>>
>>
>>
>> The patch I attached has a BUG in can_rescue_unmovable_pageblock.
>> Resending. I hope it is fixed.
> 
> @@ -399,10 +399,14 @@
>  		} else if (page_count(cursor_page) == 0) {
>  			continue;
>  		} else if (PageLRU(cursor_page)) {
> -			if (!lru_locked && need_lrulock) {
> +			if (!need_lrulock)
> +				continue;
> +			else if (lru_locked)
> +				continue;
> +			else {
>  				spin_lock(&zone->lru_lock);
>  				lru_locked = true;
> -				if (PageLRU(cursor_page))
> +				if (PageLRU(page))
>  					continue;
>  			}
>  		}
> 
> Could you please explain why we need to check page and not cursor_page
> here?


Slaps self.
That was a brain-dead typo on my part.

Please consider the attached one; of course, it's totally untested. :(

> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung Poland R&D Center



> -- 

Kind regards,
Minchan Kim

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-1.patch --]
[-- Type: text/x-patch; name="0001-1.patch", Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v9] mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks
@ 2012-06-07  4:13           ` Minchan Kim
  0 siblings, 0 replies; 31+ messages in thread
From: Minchan Kim @ 2012-06-07  4:13 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: KOSAKI Motohiro, linux-mm, linux-kernel, Hugh Dickins,
	Linus Torvalds, Kyungmin Park, Marek Szyprowski, Mel Gorman,
	Rik van Riel, Dave Jones, Andrew Morton, Cong Wang,
	Markus Trippelsdorf

[-- Attachment #1: Type: text/plain, Size: 6965 bytes --]

On 06/06/2012 07:06 PM, Bartlomiej Zolnierkiewicz wrote:

> On Tuesday 05 June 2012 04:38:53 Minchan Kim wrote:
>> On 06/05/2012 10:59 AM, Minchan Kim wrote:
>>
>>> On 06/05/2012 05:22 AM, KOSAKI Motohiro wrote:
>>>
>>>>> +/*
>>>>> + * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
>>>>> + * converted to MIGRATE_MOVABLE type, false otherwise.
>>>>> + */
>>>>> +static bool can_rescue_unmovable_pageblock(struct page *page, bool
>>>>> locked)
>>>>> +{
>>>>> +    unsigned long pfn, start_pfn, end_pfn;
>>>>> +    struct page *start_page, *end_page, *cursor_page;
>>>>> +
>>>>> +    pfn = page_to_pfn(page);
>>>>> +    start_pfn = pfn&  ~(pageblock_nr_pages - 1);
>>>>> +    end_pfn = start_pfn + pageblock_nr_pages - 1;
>>>>> +
>>>>> +    start_page = pfn_to_page(start_pfn);
>>>>> +    end_page = pfn_to_page(end_pfn);
>>>>> +
>>>>> +    for (cursor_page = start_page, pfn = start_pfn; cursor_page<=
>>>>> end_page;
>>>>> +        pfn++, cursor_page++) {
>>>>> +        struct zone *zone = page_zone(start_page);
>>>>> +        unsigned long flags;
>>>>> +
>>>>> +        if (!pfn_valid_within(pfn))
>>>>> +            continue;
>>>>> +
>>>>> +        /* Do not deal with pageblocks that overlap zones */
>>>>> +        if (page_zone(cursor_page) != zone)
>>>>> +            return false;
>>>>> +
>>>>> +        if (!locked)
>>>>> +            spin_lock_irqsave(&zone->lock, flags);
>>>>> +
>>>>> +        if (PageBuddy(cursor_page)) {
>>>>> +            int order = page_order(cursor_page);
>>>>>
>>>>> -/* Returns true if the page is within a block suitable for migration
>>>>> to */
>>>>> -static bool suitable_migration_target(struct page *page)
>>>>> +            pfn += (1<<  order) - 1;
>>>>> +            cursor_page += (1<<  order) - 1;
>>>>> +
>>>>> +            if (!locked)
>>>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>>>> +            continue;
>>>>> +        } else if (page_count(cursor_page) == 0 ||
>>>>> +               PageLRU(cursor_page)) {
>>>>> +            if (!locked)
>>>>> +                spin_unlock_irqrestore(&zone->lock, flags);
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>> +        if (!locked)
>>>>> +            spin_unlock_irqrestore(&zone->lock, flags);
>>>>> +
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    return true;
>>>>> +}
>>>>
>>>> Minchan, are you interest this patch? If yes, can you please rewrite it?
>>>
>>>
>>> Can do it but I want to give credit to Bartlomiej.
>>> Bartlomiej, if you like my patch, could you resend it as formal patch after you do broad testing?
> 
> Sure.


Please use attached one instead of buggy old version. :(
This patch fix THP racing, remove unnecessary lock and add more comment.

> 
>>>> This one are
>>>> not fixed our pointed issue and can_rescue_unmovable_pageblock() still
>>>> has plenty bugs.
>>>> We can't ack it.
>>>>
>>>
>>>
>>> Frankly speaking, I don't want to merge it without any data which prove it's really good for real practice.
>>>
>>> When the patch firstly was submitted, it wasn't complicated so I was okay at that time but it has been complicated
>>> than my expectation. So if Andrew might pass the decision to me, I'm totally NACK if author doesn't provide
>>> any real data or VOC of some client.
> 
> I found this issue by accident while testing compaction code so unfortunately
> I don't have any real bugreport to back it up.  It is just a corner case which
> is more likely to happen in situation where there is rather small number of
> pageblocks and quite heavy kernel memory allocation/freeing activity in
> kernel going on.  I would presume that the issue can happen on some embedded
> configurations but they are not your typical machine and it is not likely
> to see a real bugreport for it.
> 
> I'm also quite unhappy with the increasing complexity of what seemed as
> a quite simple fix initially and I tend to agree that the patch may stay
> out-of-tree until there is a more proven need for it (maybe with documenting
> the issue in the code for the time being).  Still, I would like to have
> all outstanding issues fixed so I can merge the patch locally (and to -mm
> if Andrew agrees) and just wait to see if the patch is ever needed in
> practice.
> 
> I've attached the code that I use to trigger the issue at the bottom of this
> mail so people can reproduce the problem and see for themselves whether it
> is worth to fix it or not.
> 
>>> 1) Any comment?
>>>
>>> Anyway, I fixed some bugs and clean up something I found during review.
> 
> Thanks for doing this.
> 
>>> Minor thing.
>>> 1. change smt_result naming - I never like such long non-consistent naming. How about this?
>>> 2. fix can_rescue_unmovable_pageblock 
>>>    2.1 pfn valid check for page_zone
>>>
>>> Major thing.
>>>
>>>    2.2 add lru_lock for stablizing PageLRU
>>>        If we don't hold lru_lock, there is possibility that unmovable(non-LRU) page can put in movable pageblock.
>>>        It can make compaction/CMA's regression. But there is a concern about deadlock between lru_lock and lock.
>>>        As I look the code, I can't find allocation trial with holding lru_lock so it might be safe(but not sure,
>>>        I didn't test it. It need more careful review/testing) but it makes new locking dependency(not sure, too.
>>>        We already made such rule but I didn't know that until now ;-) ) Why I thought so is we can allocate
>>>        GFP_ATOMIC with holding lru_lock, logically which might be crazy idea.
>>>
>>>    2.3 remove zone->lock in first phase.
>>>        We rescue an unmovable pageblock in two phases. In the first phase we only peek at pages, so we don't need locking.
>>>        If we see an unstable value, it will be caught in the second phase with the needed lock, or
>>>        can_rescue_unmovable_pageblock can bail out of the loop because of a stale page_order(cursor_page).
>>>        In that case the unmovable pageblock isn't converted to movable, but we can try again next time.
>>>        It's not critical.
>>>
>>> 2) Any comment?
>>>
>>> Sorry, I can't inline the code right now, so I'm attaching the patch.
>>> It's not a formal patch and has never been tested.
>>>
>>
>>
>> The attached patch had a BUG in can_rescue_unmovable_pageblock.
>> Resending. I hope it is fixed now.
> 
> @@ -399,10 +399,14 @@
>  		} else if (page_count(cursor_page) == 0) {
>  			continue;
>  		} else if (PageLRU(cursor_page)) {
> -			if (!lru_locked && need_lrulock) {
> +			if (!need_lrulock)
> +				continue;
> +			else if (lru_locked)
> +				continue;
> +			else {
>  				spin_lock(&zone->lru_lock);
>  				lru_locked = true;
> -				if (PageLRU(cursor_page))
> +				if (PageLRU(page))
>  					continue;
>  			}
>  		}
> 
> Could you please explain why we need to check page and not cursor_page
> here?


Slaps self.
That was a brain-dead typo on my part.

Please consider the attached one; of course, it's totally untested. :(
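
To illustrate the cursor_page point: the scan has to test the page under
the cursor at every step, not the `page` argument that only identifies the
pageblock. Below is a rough userspace model of that walk, not part of the
attached patch. The `model_*` names and the `MODEL_*` states are
hypothetical stand-ins for struct page, PageBuddy(), PageLRU() and a zero
`_count`; no locking or zone checks are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-page state the kernel code inspects. */
enum model_page_state {
	MODEL_BUDDY,	/* head of a free buddy block; 'order' is valid */
	MODEL_LRU,	/* on an LRU list, so it can be migrated */
	MODEL_FREE,	/* reference count is zero */
	MODEL_PINNED,	/* anything else; treated as unmovable */
};

struct model_page {
	enum model_page_state state;
	unsigned int order;	/* only meaningful for MODEL_BUDDY, assumed small */
};

/*
 * Returns true if every page in block[0..nr_pages) is covered by a free
 * buddy block, sits on the LRU, or has a zero reference count.
 */
static bool model_can_rescue(const struct model_page *block, size_t nr_pages)
{
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		/* Always look at the page under the cursor. */
		const struct model_page *cursor = &block[i];

		switch (cursor->state) {
		case MODEL_BUDDY:
			/* Skip the tail pages of this free block. */
			i += ((size_t)1 << cursor->order) - 1;
			break;
		case MODEL_LRU:
		case MODEL_FREE:
			break;
		default:
			/* One pinned page is enough to refuse the rescue. */
			return false;
		}
	}

	return true;
}
```

In this model the pages behind a buddy head are never examined, which
mirrors how the real loop advances both pfn and cursor_page by the block
order.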

> 
> Best regards,
> --
> Bartlomiej Zolnierkiewicz
> Samsung Poland R&D Center



> -- 

Kind regards,
Minchan Kim

[-- Attachment #2: 0001-1.patch --]
[-- Type: text/x-patch, Size: 13928 bytes --]

From ad4f07fe0da971fe8ef841aa4a2a5bc107fa8548 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Thu, 7 Jun 2012 13:12:14 +0900
Subject: [PATCH] 1

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/compaction.h |   19 +++++
 mm/compaction.c            |  170 ++++++++++++++++++++++++++++++++++++++------
 mm/internal.h              |    9 ++-
 mm/page_alloc.c            |    8 +--
 4 files changed, 178 insertions(+), 28 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 51a90b7..e988037 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -1,6 +1,8 @@
 #ifndef _LINUX_COMPACTION_H
 #define _LINUX_COMPACTION_H
 
+#include <linux/node.h>
+
 /* Return values for compact_zone() and try_to_compact_pages() */
 /* compaction didn't start as it was not possible or direct reclaim was more suitable */
 #define COMPACT_SKIPPED		0
@@ -11,6 +13,23 @@
 /* The full zone was compacted */
 #define COMPACT_COMPLETE	3
 
+/*
+ * compaction supports three modes
+ *
+ * COMPACT_ASYNC_MOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources and targets.
+ * COMPACT_ASYNC_UNMOVABLE uses asynchronous migration and only scans
+ *    MIGRATE_MOVABLE pageblocks as migration sources.
+ *    MIGRATE_UNMOVABLE pageblocks are scanned as potential migration
+ *    targets and converts them to MIGRATE_MOVABLE if possible
+ * COMPACT_SYNC uses synchronous migration and scans all pageblocks
+ */
+enum compact_mode {
+	COMPACT_ASYNC_MOVABLE,
+	COMPACT_ASYNC_UNMOVABLE,
+	COMPACT_SYNC,
+};
+
 #ifdef CONFIG_COMPACTION
 extern int sysctl_compact_memory;
 extern int sysctl_compaction_handler(struct ctl_table *table, int write,
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..a5c9141 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -236,7 +236,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 	 */
 	while (unlikely(too_many_isolated(zone))) {
 		/* async migration should just abort */
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			return 0;
 
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -304,7 +304,8 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 		 * satisfies the allocation
 		 */
 		pageblock_nr = low_pfn >> pageblock_order;
-		if (!cc->sync && last_pageblock_nr != pageblock_nr &&
+		if (cc->mode != COMPACT_SYNC &&
+		    last_pageblock_nr != pageblock_nr &&
 		    !migrate_async_suitable(get_pageblock_migratetype(page))) {
 			low_pfn += pageblock_nr_pages;
 			low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
@@ -325,7 +326,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
-		if (!cc->sync)
+		if (cc->mode != COMPACT_SYNC)
 			mode |= ISOLATE_ASYNC_MIGRATE;
 
 		lruvec = mem_cgroup_page_lruvec(page, zone);
@@ -360,27 +361,116 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
+/*
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
+ * converted to MIGRATE_MOVABLE type, false otherwise.
+ */
+static bool can_rescue_unmovable_pageblock(struct page *page)
+{
+	struct zone *zone;
+	unsigned long pfn, start_pfn, end_pfn;
+	struct page *start_page, *end_page, *cursor_page;
+
+	zone = page_zone(page);
+	pfn = page_to_pfn(page);
+	start_pfn = pfn & ~(pageblock_nr_pages - 1);
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
+
+	start_page = pfn_to_page(start_pfn);
+	end_page = pfn_to_page(end_pfn);
+
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
+
+		if (!pfn_valid_within(pfn))
+			continue;
+
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			return false;
+
+		/*
+		 * We can race with the page allocator/reclaimer, so an
+		 * unmovable pageblock may be mistaken for a migratable
+		 * one here. That could hurt anti-fragmentation, but it
+		 * is rare and not critical.
+		 */
+		if (PageBuddy(cursor_page)) {
+			unsigned long order = page_order(cursor_page);
+
+			pfn += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+			continue;
+		} else if (PageLRU(cursor_page)) {
+			continue;
+		/*
+		 * We can't use page_count(), which calls compound_head(),
+		 * as we don't hold a pin on the page.
+		 */
+		} else if (!atomic_read(&cursor_page->_count)) {
+			continue;
+		}
+
+		return false;
+	}
+
+	return true;
+}
+
+static void rescue_unmovable_pageblock(struct page *page)
+{
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+}
+
+/*
+ * MIGRATE_TARGET : good for migration target
+ * RESCUE_UNMOVABLE_TARGET : good only if we can rescue the unmovable pageblock.
+ * UNMOVABLE_TARGET : can't migrate because it's a page in unmovable pageblock.
+ * SKIP_TARGET : can't migrate for other reasons.
+ */
+enum smt_result {
+	MIGRATE_TARGET,
+	RESCUE_UNMOVABLE_TARGET,
+	UNMOVABLE_TARGET,
+	SKIP_TARGET,
+};
 
-/* Returns true if the page is within a block suitable for migration to */
-static bool suitable_migration_target(struct page *page)
+/*
+ * Returns MIGRATE_TARGET if the block is suitable for migration to,
+ * RESCUE_UNMOVABLE_TARGET if a MIGRATE_UNMOVABLE block can be converted,
+ * UNMOVABLE_TARGET if it was skipped for its type, SKIP_TARGET otherwise.
+ */
+static enum smt_result suitable_migration_target(struct page *page,
+			      struct compact_control *cc)
 {
 
 	int migratetype = get_pageblock_migratetype(page);
 
 	/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
-		return false;
+		return SKIP_TARGET;
 
 	/* If the page is a large free page, then allow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
-		return true;
+		return MIGRATE_TARGET;
 
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
-		return true;
+	if (cc->mode != COMPACT_ASYNC_UNMOVABLE &&
+	    migrate_async_suitable(migratetype))
+		return MIGRATE_TARGET;
+
+	if (cc->mode == COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE)
+		return UNMOVABLE_TARGET;
+
+	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
+	    migratetype == MIGRATE_UNMOVABLE &&
+	    can_rescue_unmovable_pageblock(page))
+		return RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
-	return false;
+	return SKIP_TARGET;
 }
 
 /*
@@ -414,6 +504,13 @@ static void isolate_freepages(struct zone *zone,
 	zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
 
 	/*
+	 * isolate_freepages() may be called more than once during
+	 * compact_zone_order() run and we want only the most recent
+	 * count.
+	 */
+	cc->nr_unmovable_pageblock = 0;
+
+	/*
 	 * Isolate free pages until enough are available to migrate the
 	 * pages on cc->migratepages. We stop searching if the migrate
 	 * and free page scanners meet or enough free pages are isolated.
@@ -421,6 +518,7 @@ static void isolate_freepages(struct zone *zone,
 	for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
 					pfn -= pageblock_nr_pages) {
 		unsigned long isolated;
+		enum smt_result ret;
 
 		if (!pfn_valid(pfn))
 			continue;
@@ -437,9 +535,12 @@ static void isolate_freepages(struct zone *zone,
 			continue;
 
 		/* Check the block is suitable for migration */
-		if (!suitable_migration_target(page))
+		ret = suitable_migration_target(page, cc);
+		if (ret != MIGRATE_TARGET && ret != RESCUE_UNMOVABLE_TARGET) {
+			if (ret == UNMOVABLE_TARGET)
+				cc->nr_unmovable_pageblock++;
 			continue;
-
+		}
 		/*
 		 * Found a block suitable for isolating free pages from. Now
 		 * we disabled interrupts, double check things are ok and
@@ -448,12 +549,16 @@ static void isolate_freepages(struct zone *zone,
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		if (suitable_migration_target(page)) {
+		ret = suitable_migration_target(page, cc);
+		if (ret == MIGRATE_TARGET || ret == RESCUE_UNMOVABLE_TARGET) {
+			if (ret == RESCUE_UNMOVABLE_TARGET)
+				rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
 			nr_freepages += isolated;
-		}
+		} else if (ret == UNMOVABLE_TARGET)
+			cc->nr_unmovable_pageblock++;
 		spin_unlock_irqrestore(&zone->lock, flags);
 
 		/*
@@ -685,8 +790,9 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 
 		nr_migrate = cc->nr_migratepages;
 		err = migrate_pages(&cc->migratepages, compaction_alloc,
-				(unsigned long)cc, false,
-				cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC);
+			(unsigned long)&cc->freepages, false,
+			(cc->mode == COMPACT_SYNC) ? MIGRATE_SYNC_LIGHT
+						      : MIGRATE_ASYNC);
 		update_nr_listpages(cc);
 		nr_remaining = cc->nr_migratepages;
 
@@ -715,7 +821,8 @@ out:
 
 static unsigned long compact_zone_order(struct zone *zone,
 				 int order, gfp_t gfp_mask,
-				 bool sync)
+				 enum compact_mode mode,
+				 unsigned long *nr_pageblocks_skipped)
 {
 	struct compact_control cc = {
 		.nr_freepages = 0,
@@ -723,12 +830,17 @@ static unsigned long compact_zone_order(struct zone *zone,
 		.order = order,
 		.migratetype = allocflags_to_migratetype(gfp_mask),
 		.zone = zone,
-		.sync = sync,
+		.mode = mode,
 	};
+	unsigned long rc;
+
 	INIT_LIST_HEAD(&cc.freepages);
 	INIT_LIST_HEAD(&cc.migratepages);
 
-	return compact_zone(zone, &cc);
+	rc = compact_zone(zone, &cc);
+	*nr_pageblocks_skipped = cc.nr_unmovable_pageblock;
+
+	return rc;
 }
 
 int sysctl_extfrag_threshold = 500;
@@ -753,6 +865,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
+	unsigned long nr_pageblocks_skipped;
+	enum compact_mode mode;
 
 	/*
 	 * Check whether it is worth even starting compaction. The order check is
@@ -769,12 +883,22 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 								nodemask) {
 		int status;
 
-		status = compact_zone_order(zone, order, gfp_mask, sync);
+		mode = sync ? COMPACT_SYNC : COMPACT_ASYNC_MOVABLE;
+retry:
+		status = compact_zone_order(zone, order, gfp_mask, mode,
+						&nr_pageblocks_skipped);
 		rc = max(status, rc);
 
 		/* If a normal allocation would succeed, stop compacting */
 		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
 			break;
+
+		if (rc == COMPACT_COMPLETE && mode == COMPACT_ASYNC_MOVABLE) {
+			if (nr_pageblocks_skipped) {
+				mode = COMPACT_ASYNC_UNMOVABLE;
+				goto retry;
+			}
+		}
 	}
 
 	return rc;
@@ -808,7 +932,7 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
 			if (ok && cc->order > zone->compact_order_failed)
 				zone->compact_order_failed = cc->order + 1;
 			/* Currently async compaction is never deferred. */
-			else if (!ok && cc->sync)
+			else if (!ok && cc->mode == COMPACT_SYNC)
 				defer_compaction(zone, cc->order);
 		}
 
@@ -823,7 +947,7 @@ int compact_pgdat(pg_data_t *pgdat, int order)
 {
 	struct compact_control cc = {
 		.order = order,
-		.sync = false,
+		.mode = COMPACT_ASYNC_MOVABLE,
 	};
 
 	return __compact_pgdat(pgdat, &cc);
@@ -833,7 +957,7 @@ static int compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 
 	return __compact_pgdat(NODE_DATA(nid), &cc);
diff --git a/mm/internal.h b/mm/internal.h
index 2ba87fb..061fde7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -94,6 +94,9 @@ extern void putback_lru_page(struct page *page);
 /*
  * in mm/page_alloc.c
  */
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+extern int move_freepages_block(struct zone *zone, struct page *page,
+				int migratetype);
 extern void __free_pages_bootmem(struct page *page, unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned long order);
 #ifdef CONFIG_MEMORY_FAILURE
@@ -101,6 +104,7 @@ extern bool is_free_buddy_page(struct page *page);
 #endif
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
+#include <linux/compaction.h>
 
 /*
  * in mm/compaction.c
@@ -119,11 +123,14 @@ struct compact_control {
 	unsigned long nr_migratepages;	/* Number of pages to migrate */
 	unsigned long free_pfn;		/* isolate_freepages search base */
 	unsigned long migrate_pfn;	/* isolate_migratepages search base */
-	bool sync;			/* Synchronous migration */
+	enum compact_mode mode;		/* Compaction mode */
 
 	int order;			/* order a direct compactor needs */
 	int migratetype;		/* MOVABLE, RECLAIMABLE etc */
 	struct zone *zone;
+
+	/* Number of UNMOVABLE destination pageblocks skipped during scan */
+	unsigned long nr_unmovable_pageblock;
 };
 
 unsigned long
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 476ae3e..d40e4c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
@@ -954,8 +954,8 @@ static int move_freepages(struct zone *zone,
 	return pages_moved;
 }
 
-static int move_freepages_block(struct zone *zone, struct page *page,
-				int migratetype)
+int move_freepages_block(struct zone *zone, struct page *page,
+			 int migratetype)
 {
 	unsigned long start_pfn, end_pfn;
 	struct page *start_page, *end_page;
@@ -5651,7 +5651,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 		.nr_migratepages = 0,
 		.order = -1,
 		.zone = page_zone(pfn_to_page(start)),
-		.sync = true,
+		.mode = COMPACT_SYNC,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
-- 
1.7.9.5
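
For readers following the mode handling, the order of checks in
suitable_migration_target() above can be modelled in userspace roughly as
follows. This is only a sketch: the `MODE_*`, `MT_*` and `RES_*` names are
hypothetical, `MT_RESERVED` stands in for both MIGRATE_ISOLATE and
MIGRATE_RESERVE, `MT_MOVABLE` stands in for anything
migrate_async_suitable() accepts, and the `large_free` and `rescuable`
flags replace the PageBuddy/page_order test and
can_rescue_unmovable_pageblock().

```c
#include <stdbool.h>

enum model_mode { MODE_ASYNC_MOVABLE, MODE_ASYNC_UNMOVABLE, MODE_SYNC };
enum model_type { MT_MOVABLE, MT_UNMOVABLE, MT_RESERVED };
enum model_result { RES_MIGRATE, RES_RESCUE, RES_UNMOVABLE, RES_SKIP };

/* Mirrors the order of the checks in the patched function. */
static enum model_result model_target(enum model_mode mode,
				      enum model_type type,
				      bool large_free, bool rescuable)
{
	/* Never touch isolated or reserved blocks. */
	if (type == MT_RESERVED)
		return RES_SKIP;

	/* A free page spanning the whole block is always usable. */
	if (large_free)
		return RES_MIGRATE;

	/* Movable blocks are targets, except in the UNMOVABLE pass. */
	if (mode != MODE_ASYNC_UNMOVABLE && type == MT_MOVABLE)
		return RES_MIGRATE;

	/* First async pass: only count the unmovable block. */
	if (mode == MODE_ASYNC_MOVABLE && type == MT_UNMOVABLE)
		return RES_UNMOVABLE;

	/* Other modes: try to convert the unmovable block. */
	if (mode != MODE_ASYNC_MOVABLE && type == MT_UNMOVABLE && rescuable)
		return RES_RESCUE;

	return RES_SKIP;
}
```

Note that in this model a movable block is skipped in the
MODE_ASYNC_UNMOVABLE pass, since that pass only runs after the movable
blocks have already been tried.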

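The retry in try_to_compact_pages() above comes down to one decision per
zone, which might be modelled like this. The sketch assumes a per-pass
`complete` flag in place of the COMPACT_COMPLETE comparison (the patch
itself tests the running maximum `rc`, not the per-zone status) and a
`watermark_ok` flag in place of zone_watermark_ok(); the `PASS_*` names
and `model_should_retry()` are hypothetical.

```c
#include <stdbool.h>

enum pass_mode { PASS_ASYNC_MOVABLE, PASS_ASYNC_UNMOVABLE, PASS_SYNC };

/*
 * Returns true if a second pass in PASS_ASYNC_UNMOVABLE mode is worth
 * trying after a pass that ran in 'mode'.
 */
static bool model_should_retry(enum pass_mode mode, bool complete,
			       bool watermark_ok,
			       unsigned long nr_unmovable_skipped)
{
	/* The allocation would succeed now; stop compacting. */
	if (watermark_ok)
		return false;

	/* Only the first async pass is followed by a second one. */
	if (mode != PASS_ASYNC_MOVABLE)
		return false;

	/* The whole zone was scanned and unmovable blocks were skipped. */
	return complete && nr_unmovable_skipped > 0;
}
```

Because the second pass runs in PASS_ASYNC_UNMOVABLE mode, the function
returns false afterwards and the loop cannot retry more than once per zone.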
