linux-mm.kvack.org archive mirror
* [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug
@ 2012-07-04  7:26 Lai Jiangshan
  2012-07-04  7:26 ` [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA Lai Jiangshan
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04  7:26 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Chris Metcalf, --, Len Brown

The 1st patch fixes the allocation of CMA and prepares for movable-like types.

The 2nd patch adds a new migrate type which stands for movable pages
whose type will never be changed to any other type.

I chose the name MIGRATE_HOTREMOVE over MIGRATE_MOVABLE_STABLE just
because the first use case of this new type is hot-remove.

The 3rd patch introduces online_movable. When a memory block is onlined
by "online_movable", the kernel will hold no direct references to the pages
of that memory block, so we can remove the memory at any time when needed.

Different from ZONE_MOVABLE: it can be used for any given memory block.

Lai Jiangshan (3):
  use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  add MIGRATE_HOTREMOVE type
  add online_movable

 arch/tile/mm/init.c            |    2 +-
 drivers/acpi/acpi_memhotplug.c |    3 +-
 drivers/base/memory.c          |   24 +++++++----
 include/linux/memory.h         |    1 +
 include/linux/memory_hotplug.h |    4 +-
 include/linux/mmzone.h         |   37 +++++++++++++++++
 include/linux/page-isolation.h |    2 +-
 mm/compaction.c                |    6 +-
 mm/memory-failure.c            |    8 +++-
 mm/memory_hotplug.c            |   36 +++++++++++++---
 mm/page_alloc.c                |   86 ++++++++++++++++-----------------------
 mm/vmstat.c                    |    3 +
 12 files changed, 136 insertions(+), 76 deletions(-)

-- 
1.7.4.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  2012-07-04  7:26 [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Lai Jiangshan
@ 2012-07-04  7:26 ` Lai Jiangshan
  2012-07-04 10:17   ` Mel Gorman
  2012-07-04  7:26 ` [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type Lai Jiangshan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04  7:26 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Chris Metcalf, --, Len Brown

The pages of MIGRATE_CMA cannot be changed to another type,
nor be moved to another free list.

==>
So when we use __rmqueue_fallback() to borrow memory from MIGRATE_CMA,
one of the highest-order pages is borrowed and split, but the free
pages resulting from the split can NOT be moved to MIGRATE_MOVABLE.

==>
So at the next allocation we need to borrow again: another
highest-order page is borrowed from CMA and split, producing yet more
split free pages.

==>
So when __rmqueue_fallback() borrows (highest-order) memory from
MIGRATE_CMA, it introduces fragmentation and may waste TLB entries
(only one page of a pageblock ends up used).

Conclusion:
We should borrow the smallest-order memory from MIGRATE_CMA in this case.

Result (good):
1) use __rmqueue_smallest() when borrowing memory from MIGRATE_CMA
2) __rmqueue_fallback() no longer handles CMA and becomes much simpler
Result (bad):
__rmqueue_smallest() can no longer be inlined, which adds function
call overhead.
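
The resulting allocation order can be illustrated with a toy model (a
sketch only: single counters stand in for the real per-order buddy free
lists, and __rmqueue_fallback() is elided):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the patched __rmqueue() ordering; counters stand in
 * for the real per-order free lists. */
enum { MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,
       MIGRATE_RESERVE, MIGRATE_CMA, MIGRATE_TYPES };

static int free_pages[MIGRATE_TYPES];

static bool rmqueue_smallest(int mt)
{
	if (free_pages[mt] > 0) {
		free_pages[mt]--;
		return true;
	}
	return false;
}

/* Mirrors the patched order: preferred type first, then the smallest
 * CMA block (movable requests only), then MIGRATE_RESERVE as the last
 * resort.  __rmqueue_fallback() is omitted from this sketch. */
static int rmqueue(int mt)
{
	if (rmqueue_smallest(mt))
		return mt;
	if (mt == MIGRATE_MOVABLE && rmqueue_smallest(MIGRATE_CMA))
		return MIGRATE_CMA;
	if (rmqueue_smallest(MIGRATE_RESERVE))
		return MIGRATE_RESERVE;
	return -1;		/* allocation failed */
}
```

The point of the sketch is that a movable request exhausts CMA via the
smallest suitable block before touching the reserve, instead of splitting
a fresh highest-order CMA block on every fallback.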

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/mmzone.h |    1 +
 mm/page_alloc.c        |   63 ++++++++++++++++--------------------------------
 2 files changed, 22 insertions(+), 42 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bf3404e..979c333 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -40,6 +40,7 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_MOVABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_PRIME_TYPES = MIGRATE_PCPTYPES,
 	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
 #ifdef CONFIG_CMA
 	/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 476ae3e..efc327f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -893,17 +893,10 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
  * This array describes the order lists are fallen back to when
  * the free lists for the desirable migrate type are depleted
  */
-static int fallbacks[MIGRATE_TYPES][4] = {
-	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
-	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
-#ifdef CONFIG_CMA
-	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
-#else
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
-#endif
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
-	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
+static int fallbacks[MIGRATE_PRIME_TYPES][2] = {
+	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE   },
+	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE   },
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
 };
 
 /*
@@ -995,16 +988,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 	struct page *page;
 	int migratetype, i;
 
+	if (WARN_ON_ONCE(start_migratetype >= MIGRATE_PRIME_TYPES))
+		start_migratetype = MIGRATE_UNMOVABLE;
+
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
-		for (i = 0;; i++) {
+		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
 			migratetype = fallbacks[start_migratetype][i];
 
-			/* MIGRATE_RESERVE handled later if necessary */
-			if (migratetype == MIGRATE_RESERVE)
-				break;
-
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
 				continue;
@@ -1018,17 +1010,10 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			 * pages to the preferred allocation list. If falling
 			 * back for a reclaimable kernel allocation, be more
 			 * aggressive about taking ownership of free pages
-			 *
-			 * On the other hand, never change migration
-			 * type of MIGRATE_CMA pageblocks nor move CMA
-			 * pages on different free lists. We don't
-			 * want unmovable pages to be allocated from
-			 * MIGRATE_CMA areas.
 			 */
-			if (!is_migrate_cma(migratetype) &&
-			    (unlikely(current_order >= pageblock_order / 2) ||
-			     start_migratetype == MIGRATE_RECLAIMABLE ||
-			     page_group_by_mobility_disabled)) {
+			if (unlikely(current_order >= pageblock_order / 2) ||
+			    start_migratetype == MIGRATE_RECLAIMABLE ||
+			    page_group_by_mobility_disabled) {
 				int pages;
 				pages = move_freepages_block(zone, page,
 								start_migratetype);
@@ -1047,14 +1032,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			rmv_page_order(page);
 
 			/* Take ownership for orders >= pageblock_order */
-			if (current_order >= pageblock_order &&
-			    !is_migrate_cma(migratetype))
+			if (current_order >= pageblock_order)
 				change_pageblock_range(page, current_order,
 							start_migratetype);
 
 			expand(zone, page, order, current_order, area,
-			       is_migrate_cma(migratetype)
-			     ? migratetype : start_migratetype);
+			       start_migratetype);
 
 			trace_mm_page_alloc_extfrag(page, order, current_order,
 				start_migratetype, migratetype);
@@ -1075,22 +1058,18 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
-retry_reserve:
 	page = __rmqueue_smallest(zone, order, migratetype);
 
-	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
+#ifdef CONFIG_CMA
+	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+#endif
+
+	if (unlikely(!page))
 		page = __rmqueue_fallback(zone, order, migratetype);
 
-		/*
-		 * Use MIGRATE_RESERVE rather than fail an allocation. goto
-		 * is used because __rmqueue_smallest is an inline function
-		 * and we want just one call site
-		 */
-		if (!page) {
-			migratetype = MIGRATE_RESERVE;
-			goto retry_reserve;
-		}
-	}
+	if (unlikely(!page))
+		page = __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
 
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
 	return page;
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type
  2012-07-04  7:26 [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Lai Jiangshan
  2012-07-04  7:26 ` [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA Lai Jiangshan
@ 2012-07-04  7:26 ` Lai Jiangshan
  2012-07-04 10:19   ` Mel Gorman
  2012-07-04  7:26 ` [RFC PATCH 3/3 V1] mm, memory-hotplug: add online_movable Lai Jiangshan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04  7:26 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Chris Metcalf, --, Len Brown

MIGRATE_HOTREMOVE is a special kind of MIGRATE_MOVABLE, but it is stable:
no page of this type can be changed to another type, nor be moved to
another free list.

So the pages of MIGRATE_HOTREMOVE are always movable; this property is
useful for hugepages, hot-remove, etc.

MIGRATE_HOTREMOVE pages are used as the first candidate when we
allocate movable pages.

1) add a small helper is_migrate_movable() for the movable-like types
2) add a small helper is_migrate_stable() for the stable types
3) fix some comments
4) fix get_any_page(): get_any_page() may change a MIGRATE_CMA/HOTREMOVE
   page to MIGRATE_MOVABLE, which could later allow the page to be
   changed to UNMOVABLE.
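
As a sketch, the two helpers partition the migrate types like this (the
toy enum below ignores the CONFIG-dependent layout of the real one in
include/linux/mmzone.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy copy of the migrate types; the real enum in
 * include/linux/mmzone.h is partly CONFIG-dependent. */
enum { MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,
       MIGRATE_RESERVE, MIGRATE_CMA, MIGRATE_HOTREMOVE, MIGRATE_ISOLATE };

/* Movable types: allocations from these pageblocks can always be
 * migrated away. */
static bool is_migrate_movable(int mt)
{
	return mt == MIGRATE_HOTREMOVE || mt == MIGRATE_MOVABLE ||
	       mt == MIGRATE_CMA;
}

/* Stable types: the pageblock never changes type and its free pages
 * are never moved to another free list. */
static bool is_migrate_stable(int mt)
{
	return mt == MIGRATE_HOTREMOVE || mt == MIGRATE_CMA ||
	       mt == MIGRATE_RESERVE;
}
```

MIGRATE_HOTREMOVE and MIGRATE_CMA are both movable and stable; plain
MIGRATE_MOVABLE is movable but not stable, so the allocator may still
convert its pageblocks, which is exactly what hot-remove must avoid.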

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/mmzone.h         |   34 ++++++++++++++++++++++++++++++++++
 include/linux/page-isolation.h |    2 +-
 mm/compaction.c                |    6 +++---
 mm/memory-failure.c            |    8 +++++++-
 mm/page_alloc.c                |   21 +++++++++++++--------
 mm/vmstat.c                    |    3 +++
 6 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 979c333..872f430 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -58,6 +58,15 @@ enum {
 	 */
 	MIGRATE_CMA,
 #endif
+#ifdef CONFIG_MEMORY_HOTREMOVE
+	/*
+	 * MIGRATE_HOTREMOVE migration type is designed to mimic the way
+	 * ZONE_MOVABLE works.  Only movable pages can be allocated
+	 * from MIGRATE_HOTREMOVE pageblocks and page allocator never
+	 * implicitly change migration type of MIGRATE_HOTREMOVE pageblock.
+	 */
+	MIGRATE_HOTREMOVE,
+#endif
 	MIGRATE_ISOLATE,	/* can't allocate from here */
 	MIGRATE_TYPES
 };
@@ -70,6 +79,31 @@ enum {
 #  define cma_wmark_pages(zone) 0
 #endif
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+#define is_migrate_hotremove(migratetype) ((migratetype) == MIGRATE_HOTREMOVE)
+#else
+#define is_migrate_hotremove(migratetype) false
+#endif
+
+/* Is it one of the movable types */
+static inline bool is_migrate_movable(int migratetype)
+{
+	return is_migrate_hotremove(migratetype) ||
+	       migratetype == MIGRATE_MOVABLE ||
+	       is_migrate_cma(migratetype);
+}
+
+/*
+ * Stable types: any page of the type can NOT be changed to
+ * the other type nor be moved to the other free list.
+ */
+static inline bool is_migrate_stable(int migratetype)
+{
+	return is_migrate_hotremove(migratetype) ||
+	       is_migrate_cma(migratetype) ||
+	       migratetype == MIGRATE_RESERVE;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3bdcab3..b1d6d92 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -15,7 +15,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			 unsigned migratetype);
 
 /*
- * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
+ * Changes MIGRATE_ISOLATE to migratetype.
  * target range is [start_pfn, end_pfn)
  */
 extern int
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..e8da894 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -47,7 +47,7 @@ static void map_pages(struct list_head *list)
 
 static inline bool migrate_async_suitable(int migratetype)
 {
-	return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
+	return is_migrate_movable(migratetype);
 }
 
 /*
@@ -375,8 +375,8 @@ static bool suitable_migration_target(struct page *page)
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return true;
 
-	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(migratetype))
+	/* If the block is movable, allow migration */
+	if (is_migrate_movable(migratetype))
 		return true;
 
 	/* Otherwise skip the block */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ab1e714..f5e300d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1367,6 +1367,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
 static int get_any_page(struct page *p, unsigned long pfn, int flags)
 {
 	int ret;
+	int mt;
 
 	if (flags & MF_COUNT_INCREASED)
 		return 1;
@@ -1377,6 +1378,11 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 	 */
 	lock_memory_hotplug();
 
+	/* Don't move page of stable type to MIGRATE_MOVABLE */
+	mt = get_pageblock_migratetype(p);
+	if (!is_migrate_stable(mt))
+		mt = MIGRATE_MOVABLE;
+
 	/*
 	 * Isolate the page, so that it doesn't get reallocated if it
 	 * was free.
@@ -1404,7 +1410,7 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
 		/* Not a free page */
 		ret = 1;
 	}
-	unset_migratetype_isolate(p, MIGRATE_MOVABLE);
+	unset_migratetype_isolate(p, mt);
 	unlock_memory_hotplug();
 	return ret;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index efc327f..7a4a03b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -667,7 +667,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_entry(list->prev, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
-			/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
+			/* MIGRATE_MOVABLE list may include other types */
 			__free_one_page(page, zone, 0, page_private(page));
 			trace_mm_page_pcpu_drain(page, 0, page_private(page));
 		} while (--to_free && --batch_free && !list_empty(list));
@@ -1058,6 +1058,14 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
 	struct page *page;
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+	if (migratetype == MIGRATE_MOVABLE) {
+		page = __rmqueue_smallest(zone, order, MIGRATE_HOTREMOVE);
+		if (likely(page))
+			goto done;
+	}
+#endif
+
 	page = __rmqueue_smallest(zone, order, migratetype);
 
 #ifdef CONFIG_CMA
@@ -1071,6 +1079,7 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 	if (unlikely(!page))
 		page = __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
 
+done:
 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
 	return page;
 }
@@ -1105,11 +1114,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			list_add(&page->lru, list);
 		else
 			list_add_tail(&page->lru, list);
-		if (IS_ENABLED(CONFIG_CMA)) {
-			mt = get_pageblock_migratetype(page);
-			if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
-				mt = migratetype;
-		}
+		mt = get_pageblock_migratetype(page);
 		set_page_private(page, mt);
 		list = &page->lru;
 	}
@@ -1392,7 +1397,7 @@ int split_free_page(struct page *page)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (mt != MIGRATE_ISOLATE && !is_migrate_cma(mt))
+			if (mt != MIGRATE_ISOLATE && !is_migrate_stable(mt))
 				set_pageblock_migratetype(page,
 							  MIGRATE_MOVABLE);
 		}
@@ -5465,7 +5470,7 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count)
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return true;
 	mt = get_pageblock_migratetype(page);
-	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
+	if (is_migrate_movable(mt))
 		return true;
 
 	pfn = page_to_pfn(page);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1bbbbd9..44a3b7f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -616,6 +616,9 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
 #ifdef CONFIG_CMA
 	"CMA",
 #endif
+#ifdef CONFIG_MEMORY_HOTREMOVE
+	"Hotremove",
+#endif
 	"Isolate",
 };
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 3/3 V1] mm, memory-hotplug: add online_movable
  2012-07-04  7:26 [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Lai Jiangshan
  2012-07-04  7:26 ` [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA Lai Jiangshan
  2012-07-04  7:26 ` [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type Lai Jiangshan
@ 2012-07-04  7:26 ` Lai Jiangshan
  2012-07-04 14:58   ` Greg Kroah-Hartman
  2012-07-04  7:35 ` [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Minchan Kim
  2012-07-05  9:05 ` Mel Gorman
  4 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04  7:26 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Chris Metcalf, --, Len Brown

When a memory block is onlined by "online_movable", the kernel will hold
no direct references to the pages of that memory block, so we can remove
the memory at any time when needed.

This makes dynamic hot-add/hot-remove of memory easier and allows better
utilization of memory.
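
From userspace, the new mode is driven through the existing memory sysfs
interface. A sketch (the block number 8 is a made-up example, and the
write is guarded so the snippet is harmless on kernels without this
patch):

```shell
# Hypothetical example: online memory block 8 in movable mode.
state_file=/sys/devices/system/memory/memory8/state
echo "target: $state_file"
# Guard: only write if the state file actually exists and is writable.
if [ -w "$state_file" ]; then
    echo online_movable > "$state_file"
fi
```

Offlining later uses the normal `echo offline > .../state`; the stored
`mem->movable` flag makes offline_pages() isolate the range as
MIGRATE_HOTREMOVE instead of MIGRATE_MOVABLE.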

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 arch/tile/mm/init.c            |    2 +-
 drivers/acpi/acpi_memhotplug.c |    3 ++-
 drivers/base/memory.c          |   24 +++++++++++++++---------
 include/linux/memory.h         |    1 +
 include/linux/memory_hotplug.h |    4 ++--
 include/linux/mmzone.h         |    2 ++
 mm/memory_hotplug.c            |   36 +++++++++++++++++++++++++++++-------
 mm/page_alloc.c                |    2 +-
 8 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index 630dd2c..624d397 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -943,7 +943,7 @@ int arch_add_memory(u64 start, u64 size)
 	return __add_pages(zone, start_pfn, nr_pages);
 }
 
-int remove_memory(u64 start, u64 size)
+int remove_memory(u64 start, u64 size, int movable)
 {
 	return -EINVAL;
 }
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index d985713..8a9c039 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -318,7 +318,8 @@ static int acpi_memory_disable_device(struct acpi_memory_device *mem_device)
 	 */
 	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
 		if (info->enabled) {
-			result = remove_memory(info->start_addr, info->length);
+			result = remove_memory(info->start_addr,
+					info->length, 0);
 			if (result)
 				return result;
 		}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..cc6c5d2 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,7 +246,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
  * OK to have direct references to sparsemem variables in here.
  */
 static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int movable)
 {
 	unsigned long start_pfn, start_paddr;
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -262,12 +262,12 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 			if (!pages_correctly_reserved(start_pfn, nr_pages))
 				return -EBUSY;
 
-			ret = online_pages(start_pfn, nr_pages);
+			ret = online_pages(start_pfn, nr_pages, movable);
 			break;
 		case MEM_OFFLINE:
 			start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
 			ret = remove_memory(start_paddr,
-					    nr_pages << PAGE_SHIFT);
+					    nr_pages << PAGE_SHIFT, movable);
 			break;
 		default:
 			WARN(1, KERN_WARNING "%s(%ld, %ld) unknown action: "
@@ -279,7 +279,8 @@ memory_block_action(unsigned long phys_index, unsigned long action)
 }
 
 static int memory_block_change_state(struct memory_block *mem,
-		unsigned long to_state, unsigned long from_state_req)
+		unsigned long to_state, unsigned long from_state_req,
+		int movable)
 {
 	int ret = 0;
 
@@ -290,16 +291,19 @@ static int memory_block_change_state(struct memory_block *mem,
 		goto out;
 	}
 
-	if (to_state == MEM_OFFLINE)
+	if (to_state == MEM_OFFLINE) {
+		movable = mem->movable;
 		mem->state = MEM_GOING_OFFLINE;
+	}
 
-	ret = memory_block_action(mem->start_section_nr, to_state);
+	ret = memory_block_action(mem->start_section_nr, to_state, movable);
 
 	if (ret) {
 		mem->state = from_state_req;
 		goto out;
 	}
 
+	mem->movable = movable;
 	mem->state = to_state;
 	switch (mem->state) {
 	case MEM_OFFLINE:
@@ -325,10 +329,12 @@ store_mem_state(struct device *dev,
 
 	mem = container_of(dev, struct memory_block, dev);
 
-	if (!strncmp(buf, "online", min((int)count, 6)))
-		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
+	if (!strncmp(buf, "online_movable", min((int)count, 14)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, 1);
+	else if (!strncmp(buf, "online", min((int)count, 6)))
+		ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, 0);
 	else if(!strncmp(buf, "offline", min((int)count, 7)))
-		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+		ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, 0);
 
 	if (ret)
 		return ret;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1ac7f6e..90eae9c 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -26,6 +26,7 @@ struct memory_block {
 	unsigned long end_section_nr;
 	unsigned long state;
 	int section_count;
+	int movable;
 
 	/*
 	 * This serializes all state change requests.  It isn't
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..0e6501c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -70,7 +70,7 @@ extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
 extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
 extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 /* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long);
+extern int online_pages(unsigned long, unsigned long, int);
 extern void __offline_isolated_pages(unsigned long, unsigned long);
 
 typedef void (*online_page_callback_t)(struct page *page);
@@ -233,7 +233,7 @@ static inline int is_mem_section_removable(unsigned long pfn,
 extern int mem_online_node(int nid);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
-extern int remove_memory(u64 start, u64 size);
+extern int remove_memory(u64 start, u64 size, int);
 extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
 								int nr_pages);
 extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 872f430..458bd0b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -115,6 +115,8 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
 
+extern void set_pageblock_migratetype(struct page *page, int migratetype);
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0d7e3ec..cb49893 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -457,7 +457,7 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
 }
 
 
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int movable)
 {
 	unsigned long onlined_pages = 0;
 	struct zone *zone;
@@ -466,6 +466,12 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	int ret;
 	struct memory_notify arg;
 
+	/* at least, alignment against pageblock is necessary */
+	if (!IS_ALIGNED(pfn, pageblock_nr_pages))
+		return -EINVAL;
+	if (!IS_ALIGNED(nr_pages, pageblock_nr_pages))
+		return -EINVAL;
+
 	lock_memory_hotplug();
 	arg.start_pfn = pfn;
 	arg.nr_pages = nr_pages;
@@ -497,6 +503,21 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
 	if (!populated_zone(zone))
 		need_zonelists_rebuild = 1;
 
+#ifdef CONFIG_MEMORY_HOTREMOVE
+	if (movable) {
+		unsigned long offset;
+
+		for (offset = 0;
+		     offset < nr_pages;
+		     offset += pageblock_nr_pages) {
+			spin_lock_irq(&zone->lock);
+			set_pageblock_migratetype(pfn_to_page(pfn + offset),
+					MIGRATE_HOTREMOVE);
+			spin_unlock_irq(&zone->lock);
+		}
+	}
+#endif
+
 	ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
 		online_pages_range);
 	if (ret) {
@@ -866,13 +887,14 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
 }
 
 static int __ref offline_pages(unsigned long start_pfn,
-		  unsigned long end_pfn, unsigned long timeout)
+		  unsigned long end_pfn, unsigned long timeout, int movable)
 {
 	unsigned long pfn, nr_pages, expire;
 	long offlined_pages;
 	int ret, drain, retry_max, node;
 	struct zone *zone;
 	struct memory_notify arg;
+	int origin_mt = movable ? MIGRATE_HOTREMOVE : MIGRATE_MOVABLE;
 
 	BUG_ON(start_pfn >= end_pfn);
 	/* at least, alignment against pageblock is necessary */
@@ -892,7 +914,7 @@ static int __ref offline_pages(unsigned long start_pfn,
 	nr_pages = end_pfn - start_pfn;
 
 	/* set above range as isolated */
-	ret = start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	ret = start_isolate_page_range(start_pfn, end_pfn, origin_mt);
 	if (ret)
 		goto out;
 
@@ -983,23 +1005,23 @@ failed_removal:
 	       ((unsigned long long) end_pfn << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn, origin_mt);
 
 out:
 	unlock_memory_hotplug();
 	return ret;
 }
 
-int remove_memory(u64 start, u64 size)
+int remove_memory(u64 start, u64 size, int movable)
 {
 	unsigned long start_pfn, end_pfn;
 
 	start_pfn = PFN_DOWN(start);
 	end_pfn = start_pfn + PFN_DOWN(size);
-	return offline_pages(start_pfn, end_pfn, 120 * HZ);
+	return offline_pages(start_pfn, end_pfn, 120 * HZ, movable);
 }
 #else
-int remove_memory(u64 start, u64 size)
+int remove_memory(u64 start, u64 size, int movable)
 {
 	return -EINVAL;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a4a03b..801772c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -219,7 +219,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-static void set_pageblock_migratetype(struct page *page, int migratetype)
+void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
 	if (unlikely(page_group_by_mobility_disabled))
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug
  2012-07-04  7:26 [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Lai Jiangshan
                   ` (2 preceding siblings ...)
  2012-07-04  7:26 ` [RFC PATCH 3/3 V1] mm, memory-hotplug: add online_movable Lai Jiangshan
@ 2012-07-04  7:35 ` Minchan Kim
  2012-07-04  8:23   ` Lai Jiangshan
  2012-07-05  9:05 ` Mel Gorman
  4 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2012-07-04  7:35 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, Chris Metcalf --, Len Brown --, Greg Kroah-Hartman --,
	Andi Kleen --, Julia Lawall --, David Howells --,
	Benjamin Herrenschmidt --, Kay Sievers --, Ingo Molnar --,
	Paul Gortmaker --, Daniel Kiper --, Andrew Morton --,
	Konrad Rzeszutek Wilk --, Michal Hocko --, KAMEZAWA Hiroyuki --,
	Michal Nazarewicz --, Marek Szyprowski --, Rik van Riel --,
	Bjorn Helgaas --, Christoph Lameter --, David Rientjes --,
	linux-kernel, linux-acpi, linux-mm

Hello,

I am not sure when I will be able to review this series because of
other urgent work.
At a glance, it looks interesting.
But unfortunately, when I read the description in the cover letter, I
can't find "What's the problem?".
If you provide that, it would help the many people you are Cc'ing
judge whether to dive into the code or not.

Thanks!

Side-Note: What's the "--" in the email addresses?

On Wed, Jul 4, 2012 at 4:26 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> The 1st patch fixes the allocation of CMA and prepares for movable-like types.
>
> The 2nd patch add a new migrate type which stands for the movable types which
> pages will not be changed to the other type.
>
> I chose the name MIGRATE_HOTREMOVE from MIGRATE_HOTREMOVE
> and MIGRATE_MOVABLE_STABLE, it just because the first usecase of this new type
> is for hotremove.
>
> The 3th path introduces online_movable. When a memoryblock is onlined
> by "online_movable", the kernel will not have directly reference to the page
> of the memoryblock, thus we can remove that memory any time when needed.
>
> Different from ZONE_MOVABLE: it can be used for any given memroyblock.
>
> Lai Jiangshan (3):
>   use __rmqueue_smallest when borrow memory from MIGRATE_CMA
>   add MIGRATE_HOTREMOVE type
>   add online_movable
>
>  arch/tile/mm/init.c            |    2 +-
>  drivers/acpi/acpi_memhotplug.c |    3 +-
>  drivers/base/memory.c          |   24 +++++++----
>  include/linux/memory.h         |    1 +
>  include/linux/memory_hotplug.h |    4 +-
>  include/linux/mmzone.h         |   37 +++++++++++++++++
>  include/linux/page-isolation.h |    2 +-
>  mm/compaction.c                |    6 +-
>  mm/memory-failure.c            |    8 +++-
>  mm/memory_hotplug.c            |   36 +++++++++++++---
>  mm/page_alloc.c                |   86 ++++++++++++++++-----------------------
>  mm/vmstat.c                    |    3 +
>  12 files changed, 136 insertions(+), 76 deletions(-)
>
> --
> 1.7.4.4
>



-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug
  2012-07-04  7:35 ` [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Minchan Kim
@ 2012-07-04  8:23   ` Lai Jiangshan
  2012-07-04  8:43     ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04  8:23 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Chris Metcalf --, Len Brown --, Greg Kroah-Hartman --,
	Andi Kleen --, Julia Lawall --, David Howells --,
	Benjamin Herrenschmidt --, Kay Sievers --, Ingo Molnar --,
	Paul Gortmaker --, Daniel Kiper --, Andrew Morton --,
	Konrad Rzeszutek Wilk --, Michal Hocko --, KAMEZAWA Hiroyuki --,
	Michal Nazarewicz --, Marek Szyprowski --, Rik van Riel --,
	Bjorn Helgaas --, Christoph Lameter --, David Rientjes --, LKML,
	linux-acpi, linux-mm

On 07/04/2012 03:35 PM, Minchan Kim wrote:
> Hello,
> 
> I am not sure when I can review this series because of other urgent work.
> At a glance, it looks attractive.
> But unfortunately, when I read the description in the cover letter, I can't
> find "What's the problem?".
> If you provide that, it would help the many people you are Cc'ing
> judge whether to dive into the code or not.

This patchset adds a stable movable migrate type for memory management.
It is used for anti-fragmentation (hugepages, high-order allocations, ...) and
for hot-removal of memory (virtualization, power saving, moving memory
between systems). It is like ZONE_MOVABLE, but more elastic.

Besides that, it fixes some CMA code.
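For illustration, the stable/movable distinction could be expressed with small helpers like the is_migrate_movable()/is_migrate_stable() routines mentioned in patch 2. This is a minimal user-space sketch; the enum values are illustrative only and do not match the real kernel enum layout:

```c
#include <stdbool.h>

/* Illustrative migrate types; not the kernel's actual enum layout. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_CMA,       /* stable: never converted to another type */
	MIGRATE_HOTREMOVE, /* stable: never converted (proposed here) */
	MIGRATE_RESERVE,
};

/* Movable-like types: pages allocated from them remain migratable. */
bool is_migrate_movable(enum migratetype t)
{
	return t == MIGRATE_MOVABLE || t == MIGRATE_CMA ||
	       t == MIGRATE_HOTREMOVE;
}

/* Stable types: pageblocks whose type must never change. */
bool is_migrate_stable(enum migratetype t)
{
	return t == MIGRATE_CMA || t == MIGRATE_HOTREMOVE;
}
```

The point of the split is that every stable type is also movable-like, but not vice versa: MIGRATE_MOVABLE pageblocks may still be converted or borrowed, while MIGRATE_CMA/MIGRATE_HOTREMOVE ones may not.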

Thanks,
Lai


> 
> Thanks!
> 
> Side-Note: What's the "--" of email addresses?

A wrong script; I will resend it.

> 
> On Wed, Jul 4, 2012 at 4:26 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>> The 1st patch fixes the allocation of CMA and prepares for movable-like types.
>>
>> The 2nd patch adds a new migrate type which stands for movable pages
>> that will not be changed to any other type.
>>
>> I chose the name MIGRATE_HOTREMOVE over MIGRATE_MOVABLE_STABLE simply
>> because the first use case of this new type is hot-remove.
>>
>> The 3rd patch introduces online_movable. When a memory block is onlined
>> via "online_movable", the kernel keeps no direct references to the pages
>> of that memory block, so we can remove that memory at any time when needed.
>>
>> Different from ZONE_MOVABLE: it can be used for any given memory block.
>>
>> Lai Jiangshan (3):
>>   use __rmqueue_smallest when borrow memory from MIGRATE_CMA
>>   add MIGRATE_HOTREMOVE type
>>   add online_movable
>>
>>  arch/tile/mm/init.c            |    2 +-
>>  drivers/acpi/acpi_memhotplug.c |    3 +-
>>  drivers/base/memory.c          |   24 +++++++----
>>  include/linux/memory.h         |    1 +
>>  include/linux/memory_hotplug.h |    4 +-
>>  include/linux/mmzone.h         |   37 +++++++++++++++++
>>  include/linux/page-isolation.h |    2 +-
>>  mm/compaction.c                |    6 +-
>>  mm/memory-failure.c            |    8 +++-
>>  mm/memory_hotplug.c            |   36 +++++++++++++---
>>  mm/page_alloc.c                |   86 ++++++++++++++++-----------------------
>>  mm/vmstat.c                    |    3 +
>>  12 files changed, 136 insertions(+), 76 deletions(-)
>>
>> --
>> 1.7.4.4
>>
> 
> 
> 



* Re: [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug
  2012-07-04  8:23   ` Lai Jiangshan
@ 2012-07-04  8:43     ` Lai Jiangshan
  0 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04  8:43 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Mel Gorman, Chris Metcalf, Len Brown, Greg Kroah-Hartman,
	Andi Kleen, Julia Lawall, David Howells, Benjamin Herrenschmidt,
	Kay Sievers, Ingo Molnar, Paul Gortmaker, Daniel Kiper,
	Andrew Morton, Konrad Rzeszutek Wilk, Michal Hocko,
	KAMEZAWA Hiroyuki, Michal Nazarewicz, Marek Szyprowski,
	Rik van Riel, Bjorn Helgaas, Christoph Lameter, David Rientjes,
	LKML, linux-acpi, linux-mm

On 07/04/2012 04:23 PM, Lai Jiangshan wrote:
> On 07/04/2012 03:35 PM, Minchan Kim wrote:
>> Hello,
>>
>> I am not sure when I can review this series because of other urgent work.
>> At a glance, it looks attractive.

Patches have been resent with an updated cover letter and a corrected Cc list.

Thank you very much.
Lai


>> But unfortunately, when I read the description in the cover letter, I can't
>> find "What's the problem?".
>> If you provide that, it would help the many people you are Cc'ing
>> judge whether to dive into the code or not.
> 
> This patchset adds a stable movable migrate type for memory management.
> It is used for anti-fragmentation (hugepages, high-order allocations, ...) and
> for hot-removal of memory (virtualization, power saving, moving memory
> between systems). It is like ZONE_MOVABLE, but more elastic.
> 
> Besides that, it fixes some CMA code.
> 
> Thanks,
> Lai
> 



* Re: [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  2012-07-04  7:26 ` [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA Lai Jiangshan
@ 2012-07-04 10:17   ` Mel Gorman
  2012-07-04 10:43     ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2012-07-04 10:17 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On Wed, Jul 04, 2012 at 03:26:16PM +0800, Lai Jiangshan wrote:
> The pages of MIGRATE_CMA cannot be changed to another type,
> nor be moved to another free list.
> 
> ==>
> So when we use __rmqueue_fallback() to borrow memory from MIGRATE_CMA,
> one of the highest-order pages is borrowed and split.
> But the free pages resulting from the split can NOT
> be moved to MIGRATE_MOVABLE.
> 
> ==>
> So at the next allocation we NEED to borrow again:
> another one of the highest-order pages is borrowed from CMA and split,
> producing yet more split free pages.
> 

Then special case __rmqueue_fallback() to move pages stolen from
MIGRATE_CMA to the MIGRATE_MOVABLE lists but do not change the pageblock
type.

> ==>
> So when __rmqueue_fallback() borrows (highest-order) memory from MIGRATE_CMA,
> it introduces fragmentation and may waste TLB entries (only one page is
> used in a pageblock).
> 
> Conclusion:
> We should borrow the smallest-order memory from MIGRATE_CMA in such cases.
> 

That's excessive and unnecessary special casing. Just move the
MIGRATE_CMA pages to the MIGRATE_MOVABLE lists.

> Result(good):
> 1) use __rmqueue_smallest when borrowing memory from MIGRATE_CMA
> 2) __rmqueue_fallback() doesn't handle CMA, so it becomes much simpler
> Result(bad):
> __rmqueue_smallest() cannot be inlined, adding function-call overhead.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  include/linux/mmzone.h |    1 +
>  mm/page_alloc.c        |   63 ++++++++++++++++--------------------------------
>  2 files changed, 22 insertions(+), 42 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index bf3404e..979c333 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -40,6 +40,7 @@ enum {
>  	MIGRATE_RECLAIMABLE,
>  	MIGRATE_MOVABLE,
>  	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
> +	MIGRATE_PRIME_TYPES = MIGRATE_PCPTYPES,

No explanation why this new name is necessary.

>  	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>  #ifdef CONFIG_CMA
>  	/*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 476ae3e..efc327f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -893,17 +893,10 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>   * This array describes the order lists are fallen back to when
>   * the free lists for the desirable migrate type are depleted
>   */
> -static int fallbacks[MIGRATE_TYPES][4] = {
> -	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
> -	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
> -#ifdef CONFIG_CMA
> -	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> -	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
> -#else
> -	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
> -#endif
> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
> -	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
> +static int fallbacks[MIGRATE_PRIME_TYPES][2] = {
> +	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE   },
> +	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE   },
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
>  };
>  

And it's completely unclear why it was necessary to rip out the existing
fallback lists. It reworks how fallback lists work for no clear benefit.

>  /*
> @@ -995,16 +988,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  	struct page *page;
>  	int migratetype, i;
>  
> +	if (WARN_ON_ONCE(start_migratetype >= MIGRATE_PRIME_TYPES))
> +		start_migratetype = MIGRATE_UNMOVABLE;
> +

This should be completely unnecessary. If this warning is hit, the
callers are severely broken.

>  	/* Find the largest possible block of pages in the other list */
>  	for (current_order = MAX_ORDER-1; current_order >= order;
>  						--current_order) {
> -		for (i = 0;; i++) {
> +		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
>  			migratetype = fallbacks[start_migratetype][i];
>  
> -			/* MIGRATE_RESERVE handled later if necessary */
> -			if (migratetype == MIGRATE_RESERVE)
> -				break;
> -
>  			area = &(zone->free_area[current_order]);
>  			if (list_empty(&area->free_list[migratetype]))
>  				continue;
> @@ -1018,17 +1010,10 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			 * pages to the preferred allocation list. If falling
>  			 * back for a reclaimable kernel allocation, be more
>  			 * aggressive about taking ownership of free pages
> -			 *
> -			 * On the other hand, never change migration
> -			 * type of MIGRATE_CMA pageblocks nor move CMA
> -			 * pages on different free lists. We don't
> -			 * want unmovable pages to be allocated from
> -			 * MIGRATE_CMA areas.
>  			 */
> -			if (!is_migrate_cma(migratetype) &&
> -			    (unlikely(current_order >= pageblock_order / 2) ||
> -			     start_migratetype == MIGRATE_RECLAIMABLE ||
> -			     page_group_by_mobility_disabled)) {
> +			if (unlikely(current_order >= pageblock_order / 2) ||
> +			    start_migratetype == MIGRATE_RECLAIMABLE ||
> +			    page_group_by_mobility_disabled) {
>  				int pages;
>  				pages = move_freepages_block(zone, page,
>  								start_migratetype);
> @@ -1047,14 +1032,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>  			rmv_page_order(page);
>  
>  			/* Take ownership for orders >= pageblock_order */
> -			if (current_order >= pageblock_order &&
> -			    !is_migrate_cma(migratetype))
> +			if (current_order >= pageblock_order)
>  				change_pageblock_range(page, current_order,
>  							start_migratetype);
>  
>  			expand(zone, page, order, current_order, area,
> -			       is_migrate_cma(migratetype)
> -			     ? migratetype : start_migratetype);
> +			       start_migratetype);
>  
>  			trace_mm_page_alloc_extfrag(page, order, current_order,
>  				start_migratetype, migratetype);
> @@ -1075,22 +1058,18 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  {
>  	struct page *page;
>  
> -retry_reserve:
>  	page = __rmqueue_smallest(zone, order, migratetype);
>  
> -	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
> +#ifdef CONFIG_CMA
> +	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +#endif
> +
> +	if (unlikely(!page))
>  		page = __rmqueue_fallback(zone, order, migratetype);
>  
> -		/*
> -		 * Use MIGRATE_RESERVE rather than fail an allocation. goto
> -		 * is used because __rmqueue_smallest is an inline function
> -		 * and we want just one call site
> -		 */
> -		if (!page) {
> -			migratetype = MIGRATE_RESERVE;
> -			goto retry_reserve;
> -		}
> -	}
> +	if (unlikely(!page))
> +		page = __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
>  
>  	trace_mm_page_alloc_zone_locked(page, order, migratetype);
>  	return page;

I didn't read this closely. It seems way more complex than necessary to
solve the problem described in the changelog. All you should need is to
special case that a __GFP_MOVABLE allocation using MIGRATE_CMA should
place the unused buddies on the MIGRATE_MOVABLE lists.

-- 
Mel Gorman
SUSE Labs



* Re: [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type
  2012-07-04  7:26 ` [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type Lai Jiangshan
@ 2012-07-04 10:19   ` Mel Gorman
  2012-07-04 10:50     ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2012-07-04 10:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On Wed, Jul 04, 2012 at 03:26:17PM +0800, Lai Jiangshan wrote:
> MIGRATE_HOTREMOVE is a special kind of MIGRATE_MOVABLE, but it is stable:
> any page of this type can NOT be changed to another type nor be moved to
> another free list.
> 
> So the pages of MIGRATE_HOTREMOVE are always movable; this ability is
> useful for hugepages, hot-remove, etc.
> 
> MIGRATE_HOTREMOVE pages are used as the first candidate when
> we allocate movable pages.
> 
> 1) add small routine is_migrate_movable() for movable-like types
> 2) add small routine is_migrate_stable() for stable types
> 3) fix some comments
> 4) fix get_any_page(). The get_any_page() may change a
>    MIGRATE_CMA/HOTREMOVE page to MOVABLE, which may cause this page
>    to be changed to UNMOVABLE.
> 

Reuse MIGRATE_CMA. Even if the pages are on the movable lists it should
not be a problem for memory hot-remove.

-- 
Mel Gorman
SUSE Labs



* Re: [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  2012-07-04 10:17   ` Mel Gorman
@ 2012-07-04 10:43     ` Lai Jiangshan
  2012-07-04 11:19       ` Mel Gorman
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04 10:43 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On 07/04/2012 06:17 PM, Mel Gorman wrote:
> On Wed, Jul 04, 2012 at 03:26:16PM +0800, Lai Jiangshan wrote:
>> The pages of MIGRATE_CMA cannot be changed to another type,
>> nor be moved to another free list.
>>
>> ==>
>> So when we use __rmqueue_fallback() to borrow memory from MIGRATE_CMA,
>> one of the highest-order pages is borrowed and split.
>> But the free pages resulting from the split can NOT
>> be moved to MIGRATE_MOVABLE.
>>
>> ==>
>> So at the next allocation we NEED to borrow again:
>> another one of the highest-order pages is borrowed from CMA and split,
>> producing yet more split free pages.
>>
> 
> Then special case __rmqueue_fallback() to move pages stolen from
> MIGRATE_CMA to the MIGRATE_MOVABLE lists but do not change the pageblock
> type.

Because an unmovable-page request can allocate pages from the
MIGRATE_MOVABLE free list, we cannot move MIGRATE_CMA pages
to the MIGRATE_MOVABLE free list.

See here:

MOVABLE list is empty
UNMOVABLE list is empty
movable-page request
	borrow from the CMA list
	split it; the remainders are put on the MOVABLE list
unmovable-page request
	borrow from the MOVABLE list
	NOW, it is a BUG: we use CMA pages for unmovable usage.
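The sequence above can be run as a tiny user-space simulation. This is a hypothetical, vastly simplified model (plain counters instead of the kernel's buddy free lists), only meant to show how a CMA-origin page reaches an unmovable user:

```c
/* Simplified migrate types; values are illustrative only. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_CMA, MT_NR };

/*
 * Run the sequence from the mail: all lists empty except CMA, a movable
 * allocation steals from CMA and parks the split remainder on the MOVABLE
 * list, then an unmovable allocation falls back to the MOVABLE list.
 * Returns 1 if an unmovable allocation ends up holding CMA memory.
 */
int run_scenario(void)
{
	int nr_free[MT_NR] = {0};   /* pages on each free list */
	int cma_on_movable = 0;     /* CMA-origin pages parked on MOVABLE */

	nr_free[MT_CMA] = 2;        /* only CMA has free memory */

	/* movable-page request: MOVABLE empty, borrow from CMA and split */
	if (nr_free[MT_MOVABLE] == 0 && nr_free[MT_CMA] > 0) {
		nr_free[MT_CMA]--;
		nr_free[MT_MOVABLE]++;  /* split remainder */
		cma_on_movable++;       /* ...but it is still CMA memory */
	}

	/* unmovable-page request: UNMOVABLE empty, fall back to MOVABLE */
	if (nr_free[MT_UNMOVABLE] == 0 && nr_free[MT_MOVABLE] > 0) {
		nr_free[MT_MOVABLE]--;
		if (cma_on_movable > 0)
			return 1;  /* BUG: CMA page pinned by an unmovable user */
	}
	return 0;
}
```

run_scenario() returns 1 here, i.e. the unmovable allocation ends up holding CMA memory, which is exactly the objection being made.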



> 
>> ==>
>> So when __rmqueue_fallback() borrows (highest-order) memory from MIGRATE_CMA,
>> it introduces fragmentation and may waste TLB entries (only one page is
>> used in a pageblock).
>>
>> Conclusion:
>> We should borrow the smallest-order memory from MIGRATE_CMA in such cases.
>>
> 
> That's excessive and unnecessary special casing. Just move the
> MIGRATE_CMA pages to the MIGRATE_MOVABLE lists.

...

> 
>> Result(good):
>> 1) use __rmqueue_smallest when borrowing memory from MIGRATE_CMA
>> 2) __rmqueue_fallback() doesn't handle CMA, so it becomes much simpler
>> Result(bad):
>> __rmqueue_smallest() cannot be inlined, adding function-call overhead.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>>  include/linux/mmzone.h |    1 +
>>  mm/page_alloc.c        |   63 ++++++++++++++++--------------------------------
>>  2 files changed, 22 insertions(+), 42 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index bf3404e..979c333 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -40,6 +40,7 @@ enum {
>>  	MIGRATE_RECLAIMABLE,
>>  	MIGRATE_MOVABLE,
>>  	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
>> +	MIGRATE_PRIME_TYPES = MIGRATE_PCPTYPES,
> 
> No explanation why this new name is necessary.

These three types are the basic types; reusing the name MIGRATE_PCPTYPES
in the fallback array would be confusing.


> 
>>  	MIGRATE_RESERVE = MIGRATE_PCPTYPES,
>>  #ifdef CONFIG_CMA
>>  	/*
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 476ae3e..efc327f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -893,17 +893,10 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>>   * This array describes the order lists are fallen back to when
>>   * the free lists for the desirable migrate type are depleted
>>   */
>> -static int fallbacks[MIGRATE_TYPES][4] = {
>> -	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>> -	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>> -#ifdef CONFIG_CMA
>> -	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
>> -	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
>> -#else
>> -	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
>> -#endif
>> -	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>> -	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
>> +static int fallbacks[MIGRATE_PRIME_TYPES][2] = {
>> +	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE   },
>> +	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE   },
>> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE },
>>  };
>>  
> 
> And it's completely unclear why it was necessary to rip out the existing
> fallback lists. It reworks how fallback lists work for no clear benefit.
> 
>>  /*
>> @@ -995,16 +988,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>>  	struct page *page;
>>  	int migratetype, i;
>>  
>> +	if (WARN_ON_ONCE(start_migratetype >= MIGRATE_PRIME_TYPES))
>> +		start_migratetype = MIGRATE_UNMOVABLE;
>> +
> 
> This should be completely unnecessary. If this warning is hit, the
> callers are severely broken.

will be removed.

> 
>>  	/* Find the largest possible block of pages in the other list */
>>  	for (current_order = MAX_ORDER-1; current_order >= order;
>>  						--current_order) {
>> -		for (i = 0;; i++) {
>> +		for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) {
>>  			migratetype = fallbacks[start_migratetype][i];
>>  
>> -			/* MIGRATE_RESERVE handled later if necessary */
>> -			if (migratetype == MIGRATE_RESERVE)
>> -				break;
>> -
>>  			area = &(zone->free_area[current_order]);
>>  			if (list_empty(&area->free_list[migratetype]))
>>  				continue;
>> @@ -1018,17 +1010,10 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>>  			 * pages to the preferred allocation list. If falling
>>  			 * back for a reclaimable kernel allocation, be more
>>  			 * aggressive about taking ownership of free pages
>> -			 *
>> -			 * On the other hand, never change migration
>> -			 * type of MIGRATE_CMA pageblocks nor move CMA
>> -			 * pages on different free lists. We don't
>> -			 * want unmovable pages to be allocated from
>> -			 * MIGRATE_CMA areas.
>>  			 */
>> -			if (!is_migrate_cma(migratetype) &&
>> -			    (unlikely(current_order >= pageblock_order / 2) ||
>> -			     start_migratetype == MIGRATE_RECLAIMABLE ||
>> -			     page_group_by_mobility_disabled)) {
>> +			if (unlikely(current_order >= pageblock_order / 2) ||
>> +			    start_migratetype == MIGRATE_RECLAIMABLE ||
>> +			    page_group_by_mobility_disabled) {
>>  				int pages;
>>  				pages = move_freepages_block(zone, page,
>>  								start_migratetype);
>> @@ -1047,14 +1032,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
>>  			rmv_page_order(page);
>>  
>>  			/* Take ownership for orders >= pageblock_order */
>> -			if (current_order >= pageblock_order &&
>> -			    !is_migrate_cma(migratetype))
>> +			if (current_order >= pageblock_order)
>>  				change_pageblock_range(page, current_order,
>>  							start_migratetype);
>>  
>>  			expand(zone, page, order, current_order, area,
>> -			       is_migrate_cma(migratetype)
>> -			     ? migratetype : start_migratetype);
>> +			       start_migratetype);
>>  
>>  			trace_mm_page_alloc_extfrag(page, order, current_order,
>>  				start_migratetype, migratetype);
>> @@ -1075,22 +1058,18 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
>>  {
>>  	struct page *page;
>>  
>> -retry_reserve:
>>  	page = __rmqueue_smallest(zone, order, migratetype);
>>  
>> -	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
>> +#ifdef CONFIG_CMA
>> +	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
>> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
>> +#endif
>> +
>> +	if (unlikely(!page))
>>  		page = __rmqueue_fallback(zone, order, migratetype);
>>  
>> -		/*
>> -		 * Use MIGRATE_RESERVE rather than fail an allocation. goto
>> -		 * is used because __rmqueue_smallest is an inline function
>> -		 * and we want just one call site
>> -		 */
>> -		if (!page) {
>> -			migratetype = MIGRATE_RESERVE;
>> -			goto retry_reserve;
>> -		}
>> -	}
>> +	if (unlikely(!page))
>> +		page = __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
>>  
>>  	trace_mm_page_alloc_zone_locked(page, order, migratetype);
>>  	return page;
> 
> I didn't read this closely. It seems way more complex than necessary to
> solve the problem described in the changelog. All you should need is to
> special case that a __GFP_MOVABLE allocation using MIGRATE_CMA should
> place the unused buddies on the MIGRATE_MOVABLE lists.
> 



* Re: [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type
  2012-07-04 10:19   ` Mel Gorman
@ 2012-07-04 10:50     ` Lai Jiangshan
  0 siblings, 0 replies; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-04 10:50 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On 07/04/2012 06:19 PM, Mel Gorman wrote:
> On Wed, Jul 04, 2012 at 03:26:17PM +0800, Lai Jiangshan wrote:
>> MIGRATE_HOTREMOVE is a special kind of MIGRATE_MOVABLE, but it is stable:
>> any page of this type can NOT be changed to another type nor be moved to
>> another free list.
>>
>> So the pages of MIGRATE_HOTREMOVE are always movable; this ability is
>> useful for hugepages, hot-remove, etc.
>>
>> MIGRATE_HOTREMOVE pages are used as the first candidate when
>> we allocate movable pages.
>>
>> 1) add small routine is_migrate_movable() for movable-like types
>> 2) add small routine is_migrate_stable() for stable types
>> 3) fix some comments
>> 4) fix get_any_page(). The get_any_page() may change a
>>    MIGRATE_CMA/HOTREMOVE page to MOVABLE, which may cause this page
>>    to be changed to UNMOVABLE.
>>
> 
> Reuse MIGRATE_CMA. 

Will do it.

> Even if the pages are on the movable lists it should
> not be a problem for memory hot-remove.

It does have a problem: unmovable pages may be allocated from it.

The movable lists can be used by other types when those types' lists are empty.
Or we could rename the current movable lists to movable-preference lists.

Thanks,
Lai



* Re: [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  2012-07-04 10:43     ` Lai Jiangshan
@ 2012-07-04 11:19       ` Mel Gorman
  2012-07-05  1:36         ` Lai Jiangshan
  0 siblings, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2012-07-04 11:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On Wed, Jul 04, 2012 at 06:43:47PM +0800, Lai Jiangshan wrote:
> On 07/04/2012 06:17 PM, Mel Gorman wrote:
> > On Wed, Jul 04, 2012 at 03:26:16PM +0800, Lai Jiangshan wrote:
> >> The pages of MIGRATE_CMA cannot be changed to another type,
> >> nor be moved to another free list.
> >>
> >> ==>
> >> So when we use __rmqueue_fallback() to borrow memory from MIGRATE_CMA,
> >> one of the highest-order pages is borrowed and split.
> >> But the free pages resulting from the split can NOT
> >> be moved to MIGRATE_MOVABLE.
> >>
> >> ==>
> >> So at the next allocation we NEED to borrow again:
> >> another one of the highest-order pages is borrowed from CMA and split,
> >> producing yet more split free pages.
> >>
> > 
> > Then special case __rmqueue_fallback() to move pages stolen from
> > MIGRATE_CMA to the MIGRATE_MOVABLE lists but do not change the pageblock
> > type.
> 
> Because unmovable-page-requirement can allocate page from
> MIGRATE_MOVABLE free list. So We can not move MIGRATE_CMA pages
> to the MIGRATE_MOVABLE free list.
> 

Ok, good point.

> See here:
> 
> MOVABLE list is empty
> UNMOVABLE list is empty
> movable-page request
> 	borrow from the CMA list
> 	split it; the remainders are put on the MOVABLE list
> unmovable-page request
> 	borrow from the MOVABLE list
> 	NOW, it is a BUG: we use CMA pages for unmovable usage.
> 

The patch still looks unnecessarily complex for what you are trying to
achieve and as a result I'm not reviewing it as carefully as I should.
It looks like the entire patch boiled down to this hunk here

+#ifdef CONFIG_CMA
+	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
+		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
+#endif
+

With that in place, this would would need to change from

[MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },

to

[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },

because the fallback is already being handled as a special case. Leave
the other fallback logic as it is.

This is not tested at all and is only meant to illustrate why I think
your patch looks excessively complex for what you are trying to
achieve.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4beb7ae..0063e93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -895,11 +895,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
 #ifdef CONFIG_CMA
-	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
 	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
-#else
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
 #endif
 	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
 	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
@@ -1076,6 +1074,20 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 
 retry_reserve:
 	page = __rmqueue_smallest(zone, order, migratetype);
+#ifdef CONFIG_CMA
+	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE) {
+
+		/*
+		 * CMA is a special case where we want to use
+		 * the smallest available page instead of splitting
+		 * the largest chunks. We still must avoid the pages
+		 * moving to MIGRATE_MOVABLE where they might be
+		 * used for RECLAIMABLE or UNMOVABLE allocations
+		 */
+		migratetype = MIGRATE_CMA;
+		goto retry_reserve;
+	}
+#endif /* CONFIG_CMA */
 
 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
 		page = __rmqueue_fallback(zone, order, migratetype);



* Re: [RFC PATCH 3/3 V1] mm, memory-hotplug: add online_movable
  2012-07-04  7:26 ` [RFC PATCH 3/3 V1] mm, memory-hotplug: add online_movable Lai Jiangshan
@ 2012-07-04 14:58   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 16+ messages in thread
From: Greg Kroah-Hartman @ 2012-07-04 14:58 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Mel Gorman, Chris Metcalf, --, Len Brown, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On Wed, Jul 04, 2012 at 03:26:18PM +0800, Lai Jiangshan wrote:
> When a memoryblock is onlined by "online_movable", the kernel will not
> have direct references to the pages of the memoryblock,
> thus we can remove that memory at any time when needed.
> 
> It makes things easier when we dynamically hot-add/remove memory, and
> makes better use of memory.

As you are changing the kernel/user API here, please update
Documentation/ABI with the proper sysfs file changes at the same time.

thanks,

greg k-h



* Re: [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  2012-07-04 11:19       ` Mel Gorman
@ 2012-07-05  1:36         ` Lai Jiangshan
  2012-07-05  8:38           ` Mel Gorman
  0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2012-07-05  1:36 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On 07/04/2012 07:19 PM, Mel Gorman wrote:
> On Wed, Jul 04, 2012 at 06:43:47PM +0800, Lai Jiangshan wrote:
>> On 07/04/2012 06:17 PM, Mel Gorman wrote:
>>> On Wed, Jul 04, 2012 at 03:26:16PM +0800, Lai Jiangshan wrote:
>>>> The pages of MIGRATE_CMA cannot be changed to another type,
>>>> nor be moved to another free list.
>>>>
>>>> ==>
>>>> So when we use __rmqueue_fallback() to borrow memory from MIGRATE_CMA,
>>>> one of the highest-order pages is borrowed and it is split.
>>>> But the free pages resulting from the split can NOT
>>>> be moved to MIGRATE_MOVABLE.
>>>>
>>>> ==>
>>>> So at the next allocation, we NEED to borrow again:
>>>> another one of the highest-order pages is borrowed from CMA and split,
>>>> which results in yet more split free pages.
>>>>
>>>
>>> Then special case __rmqueue_fallback() to move pages stolen from
>>> MIGRATE_CMA to the MIGRATE_MOVABLE lists but do not change the pageblock
>>> type.
>>
>> Because an unmovable-page request can allocate a page from the
>> MIGRATE_MOVABLE free list, we cannot move MIGRATE_CMA pages
>> to the MIGRATE_MOVABLE free list.
>>
> 
> Ok, good point.
> 
>> See here:
>>
>> MOVABLE list is empty
>> UNMOVABLE list is empty
>> movable-page-requirement
>> 	borrow from CMA list
>> 	split it, others are put into UNMOVABLE list
>> unmovable-page-requirement
>> 	borrow from UNMOVABLE list
>> 	NOW, it is a BUG: we use CMA pages for unmovable usage.
>>
> 
> The patch still looks unnecessarily complex for what you are trying to

Which part is complex in my code? __rmqueue_smallest()? __rmqueue_fallback()?

__rmqueue_smallest()? I think it is required.

__rmqueue_fallback()?
That part is just a cleanup: CMA is removed from __rmqueue_fallback(),
so we can clean up the fallback path. I will remove/split the cleanup
part of the patch in the next round.

> achieve and as a result I'm not reviewing it as carefully as I should.
> It looks like the entire patch boiled down to this hunk here
> 
> +#ifdef CONFIG_CMA
> +	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
> +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> +#endif
> +
> 
> With that in place, this would need to change from
> 
> [MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> 
> to
> 
> [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> 
> because the fallback is already being handled as a special case. Leave
> the other fallback logic as it is.
> 
> This is not tested at all and is only meant to illustrate why I think
> your patch looks excessively complex for what you are trying to
> achieve.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4beb7ae..0063e93 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -895,11 +895,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
>  static int fallbacks[MIGRATE_TYPES][4] = {
>  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
>  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
> +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
>  #ifdef CONFIG_CMA
> -	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
>  	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
> -#else
> -	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
>  #endif
>  	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
>  	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
> @@ -1076,6 +1074,20 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
>  
>  retry_reserve:
>  	page = __rmqueue_smallest(zone, order, migratetype);
> +#ifdef CONFIG_CMA
> +	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE) {
> +
> +		/*
> +		 * CMA is a special case where we want to use
> +		 * the smallest available page instead of splitting
> +		 * the largest chunks. We still must avoid the pages
> +		 * moving to MIGRATE_MOVABLE where they might be
> +		 * used for UNRECLAIMABLE or UNMOVABLE allocations
> +		 */
> +		migratetype = MIGRATE_CMA;
> +		goto retry_reserve;
> +	}
> +#endif /* CONFIG_CMA */
>  
>  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {


need to add

+		if (migratetype == MIGRATE_CMA)
+			migratetype = MIGRATE_MOVABLE;

to restore the migratetype for fallback.

And your code is the same as mine from the CPU's point of view:
__rmqueue_smallest(MIGRATE_MOVABLE)
if failed: __rmqueue_smallest(MIGRATE_CMA)
if failed: __rmqueue_fallback()
if failed: __rmqueue_smallest(MIGRATE_RESERVE)

The differences are:
you just use "goto" instead of "if" for control flow.
your code is longer.
the number of branches in your code = mine + 1.

My code has better readability:
Mine:

========================================================================
	page = __rmqueue_smallest(zone, order, migratetype);

#ifdef CONFIG_CMA
	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
#endif

	if (unlikely(!page))
		page = __rmqueue_fallback(zone, order, migratetype);

	if (unlikely(!page))
		page = __rmqueue_smallest(zone, order, MIGRATE_RESERVE);

	trace_mm_page_alloc_zone_locked(page, order, migratetype);
	return page;
=====================================================================

Yours:

==================================================================
retry_reserve:
	page = __rmqueue_smallest(zone, order, migratetype);

#ifdef CONFIG_CMA
	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE) {

		/*
		 * CMA is a special case where we want to use
		 * the smallest available page instead of splitting
		 * the largest chunks. We still must avoid the pages
		 * moving to MIGRATE_MOVABLE where they might be
		 * used for UNRECLAIMABLE or UNMOVABLE allocations
		 */
		migratetype = MIGRATE_CMA;
		goto retry_reserve;
	}
#endif /* CONFIG_CMA */


	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
		if (migratetype == MIGRATE_CMA)
			migratetype = MIGRATE_MOVABLE;

		page = __rmqueue_fallback(zone, order, migratetype);

		/*
		 * Use MIGRATE_RESERVE rather than fail an allocation. goto
		 * is used because __rmqueue_smallest is an inline function
		 * and we want just one call site
		 */
		if (!page) {
			migratetype = MIGRATE_RESERVE;
			goto retry_reserve;
		}
	}

	trace_mm_page_alloc_zone_locked(page, order, migratetype);
	return page;
==========================================================================

How about this one? (I just typed it in the email client)

#define RMQUEUE_FALLBACK 1024
int rmqueue_list[3][4] = {
	[MIGRATE_UNMOVABLE] = { MIGRATE_UNMOVABLE, RMQUEUE_FALLBACK, MIGRATE_RESERVE},
	[MIGRATE_RECLAIMABLE] = { MIGRATE_RECLAIMABLE, RMQUEUE_FALLBACK, MIGRATE_RESERVE},
	[MIGRATE_MOVABLE] = {MIGRATE_MOVABLE, MIGRATE_CMA, RMQUEUE_FALLBACK, MIGRATE_RESERVE},
};

static struct page *__rmqueue(struct zone *zone, unsigned int order,
						int migratetype)
{
	struct page *page;
	int i, mt;

	for (i = 0; ; i++) {
		mt = rmqueue_list[migratetype][i];
		if (likely(mt != RMQUEUE_FALLBACK))
			page = __rmqueue_smallest(zone, order, mt);
		else
			page = __rmqueue_fallback(zone, order, migratetype);

		/* MIGRATE_RESERVE is always the last one */
		if (likely(page) || (mt == MIGRATE_RESERVE))
			break;
	}

	trace_mm_page_alloc_zone_locked(page, order, migratetype);
	return page;
}

Thanks,
Lai


>  		page = __rmqueue_fallback(zone, order, migratetype);



* Re: [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA
  2012-07-05  1:36         ` Lai Jiangshan
@ 2012-07-05  8:38           ` Mel Gorman
  0 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2012-07-05  8:38 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On Thu, Jul 05, 2012 at 09:36:14AM +0800, Lai Jiangshan wrote:
> On 07/04/2012 07:19 PM, Mel Gorman wrote:
> > On Wed, Jul 04, 2012 at 06:43:47PM +0800, Lai Jiangshan wrote:
> >> On 07/04/2012 06:17 PM, Mel Gorman wrote:
> >>> On Wed, Jul 04, 2012 at 03:26:16PM +0800, Lai Jiangshan wrote:
> >>>> The pages of MIGRATE_CMA cannot be changed to another type,
> >>>> nor be moved to another free list.
> >>>>
> >>>> ==>
> >>>> So when we use __rmqueue_fallback() to borrow memory from MIGRATE_CMA,
> >>>> one of the highest-order pages is borrowed and it is split.
> >>>> But the free pages resulting from the split can NOT
> >>>> be moved to MIGRATE_MOVABLE.
> >>>>
> >>>> ==>
> >>>> So at the next allocation, we NEED to borrow again:
> >>>> another one of the highest-order pages is borrowed from CMA and split,
> >>>> which results in yet more split free pages.
> >>>>
> >>>
> >>> Then special case __rmqueue_fallback() to move pages stolen from
> >>> MIGRATE_CMA to the MIGRATE_MOVABLE lists but do not change the pageblock
> >>> type.
> >>
> >> Because an unmovable-page request can allocate a page from the
> >> MIGRATE_MOVABLE free list, we cannot move MIGRATE_CMA pages
> >> to the MIGRATE_MOVABLE free list.
> >>
> > 
> > Ok, good point.
> > 
> >> See here:
> >>
> >> MOVABLE list is empty
> >> UNMOVABLE list is empty
> >> movable-page-requirement
> >> 	borrow from CMA list
> >> 	split it, others are put into UNMOVABLE list
> >> unmovable-page-requirement
> >> 	borrow from UNMOVABLE list
> >> 	NOW, it is a BUG: we use CMA pages for unmovable usage.
> >>
> > 
> > The patch still looks unnecessarily complex for what you are trying to
> 
> Which part is complex in my code? __rmqueue_smallest()? __rmqueue_fallback()?
> 

It's the patch as a whole that made review confusing. It churned a lot of code, altered the
fallback lists, added a misleading name (MIGRATE_PRIME_TYPES because PRIME
in this context does not mean anything useful), added a warning that made
no sense and stopped __rmqueue_smallest from being inlined. In cases
like this it is better to split your patch into the part you want
followed by a cleanup patch if that is necessary.

> __rmqueue_smallest()? I think it is required.
> 
> __rmqueue_fallback()?
> That part is just a cleanup: CMA is removed from __rmqueue_fallback(),
> so we can clean up the fallback path. I will remove/split the cleanup
> part of the patch in the next round.
> 
> > achieve and as a result I'm not reviewing it as carefully as I should.
> > It looks like the entire patch boiled down to this hunk here
> > 
> > +#ifdef CONFIG_CMA
> > +	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
> > +		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> > +#endif
> > +
> > 
> > With that in place, this would need to change from
> > 
> > [MIGRATE_MOVABLE]     = { MIGRATE_CMA, MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> > 
> > to
> > 
> > [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> > 
> > because the fallback is already being handled as a special case. Leave
> > the other fallback logic as it is.
> > 
> > This is not tested at all and is only meant to illustrate why I think
> > your patch looks excessively complex for what you are trying to
> > achieve.
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 4beb7ae..0063e93 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -895,11 +895,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
> >  static int fallbacks[MIGRATE_TYPES][4] = {
> >  	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
> >  	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
> > +	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
> >  #ifdef CONFIG_CMA
> > -	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
> >  	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
> > -#else
> > -	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,   MIGRATE_RESERVE },
> >  #endif
> >  	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE }, /* Never used */
> >  	[MIGRATE_ISOLATE]     = { MIGRATE_RESERVE }, /* Never used */
> > @@ -1076,6 +1074,20 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
> >  
> >  retry_reserve:
> >  	page = __rmqueue_smallest(zone, order, migratetype);
> > +#ifdef CONFIG_CMA
> > +	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE) {
> > +
> > +		/*
> > +		 * CMA is a special case where we want to use
> > +		 * the smallest available page instead of splitting
> > +		 * the largest chunks. We still must avoid the pages
> > +		 * moving to MIGRATE_MOVABLE where they might be
> > +		 * used for UNRECLAIMABLE or UNMOVABLE allocations
> > +		 */
> > +		migratetype = MIGRATE_CMA;
> > +		goto retry_reserve;
> > +	}
> > +#endif /* CONFIG_CMA */
> >  
> >  	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
> 
> 
> need to add
> 
> +		if (migratetype == MIGRATE_CMA)
> +			migratetype = MIGRATE_MOVABLE;
> 
> to restore the migratetype for fallback.
> 

True, but it'd still be easier to review and __rmqueue_smallest would
still be inlined.

> And your code is the same as mine from the CPU's point of view:
> __rmqueue_smallest(MIGRATE_MOVABLE)
> if failed: __rmqueue_smallest(MIGRATE_CMA)
> if failed: __rmqueue_fallback()
> if failed: __rmqueue_smallest(MIGRATE_RESERVE)
> 
> The differences are:
> you just use "goto" instead of "if" for control flow.
> your code is longer.
> the number of branches in your code = mine + 1.
> 
> My code has better readability:
> Mine:
> 
> ========================================================================
> 	page = __rmqueue_smallest(zone, order, migratetype);
> 
> #ifdef CONFIG_CMA
> 	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE)
> 		page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
> #endif
> 
> 	if (unlikely(!page))
> 		page = __rmqueue_fallback(zone, order, migratetype);
> 
> 	if (unlikely(!page))
> 		page = __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
> 
> 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
> 	return page;
> =====================================================================
> 
> Yours:
> 
> ==================================================================
> retry_reserve:
> 	page = __rmqueue_smallest(zone, order, migratetype);
> 
> #ifdef CONFIG_CMA
> 	if (unlikely(!page) && migratetype == MIGRATE_MOVABLE) {
> 
> 		/*
> 		 * CMA is a special case where we want to use
> 		 * the smallest available page instead of splitting
> 		 * the largest chunks. We still must avoid the pages
> 		 * moving to MIGRATE_MOVABLE where they might be
> 		 * used for UNRECLAIMABLE or UNMOVABLE allocations
> 		 */
> 		migratetype = MIGRATE_CMA;
> 		goto retry_reserve;
> 	}
> #endif /* CONFIG_CMA */
> 
> 
> 	if (unlikely(!page) && migratetype != MIGRATE_RESERVE) {
> 		if (migratetype == MIGRATE_CMA)
> 			migratetype = MIGRATE_MOVABLE;
> 
> 		page = __rmqueue_fallback(zone, order, migratetype);
> 
> 		/*
> 		 * Use MIGRATE_RESERVE rather than fail an allocation. goto
> 		 * is used because __rmqueue_smallest is an inline function
> 		 * and we want just one call site
> 		 */
> 		if (!page) {
> 			migratetype = MIGRATE_RESERVE;
> 			goto retry_reserve;
> 		}
> 	}
> 
> 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
> 	return page;
> ==========================================================================
> 
> How about this one? (I just typed it in the email client)
> 
> #define RMQUEUE_FALLBACK 1024

Use -1 to be clear this is an impossible index. Add a comment explaining
that.

> int rmqueue_list[3][4] = {

Use MIGRATE_PCPTYPES and MIGRATE_PCPTYPES+1 to size the array.

> 	[MIGRATE_UNMOVABLE] = { MIGRATE_UNMOVABLE, RMQUEUE_FALLBACK, MIGRATE_RESERVE},
> 	[MIGRATE_RECLAIMABLE] = { MIGRATE_RECLAIMABLE, RMQUEUE_FALLBACK, MIGRATE_RESERVE},
> 	[MIGRATE_MOVABLE] = {MIGRATE_MOVABLE, MIGRATE_CMA, RMQUEUE_FALLBACK, MIGRATE_RESERVE},
> };
> 
> static struct page *__rmqueue(struct zone *zone, unsigned int order,
> 						int migratetype)
> {
> 	struct page *page;
> 	int i, mt;
> 
> 	for (i = 0; ; i++) {
> 		mt = rmqueue_list[migratetype][i];
> 		if (likely(mt != RMQUEUE_FALLBACK))
> 			page = __rmqueue_smallest(zone, order, mt);
> 		else
> 			page = __rmqueue_fallback(zone, order, migratetype);
> 
> 		/* MIGRATE_RESERVE is always the last one */
> 		if (likely(page) || (mt == MIGRATE_RESERVE))
> 			break;
> 	}
> 
> 	trace_mm_page_alloc_zone_locked(page, order, migratetype);
> 	return page;
> }

That would indeed churn the code a lot less and preserve the inlining of
__rmqueue_smallest. It comes at the cost of an additional static array
but the flow is clear at least.

-- 
Mel Gorman
SUSE Labs



* Re: [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug
  2012-07-04  7:26 [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Lai Jiangshan
                   ` (3 preceding siblings ...)
  2012-07-04  7:35 ` [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Minchan Kim
@ 2012-07-05  9:05 ` Mel Gorman
  4 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2012-07-05  9:05 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Chris Metcalf, Len Brown, Greg Kroah-Hartman, Andi Kleen,
	Julia Lawall, David Howells, Benjamin Herrenschmidt, Kay Sievers,
	Ingo Molnar, Paul Gortmaker, Daniel Kiper, Andrew Morton,
	Konrad Rzeszutek Wilk, Michal Hocko, KAMEZAWA Hiroyuki,
	Minchan Kim, Michal Nazarewicz, Marek Szyprowski, Rik van Riel,
	Bjorn Helgaas, Christoph Lameter, David Rientjes, linux-kernel,
	linux-acpi, linux-mm

On Wed, Jul 04, 2012 at 03:26:15PM +0800, Lai Jiangshan wrote:
> > <SNIP>
> 
> Different from ZONE_MOVABLE: it can be used for any given memoryblock.
> 
> Lai Jiangshan (3):
>   use __rmqueue_smallest when borrow memory from MIGRATE_CMA
>   add MIGRATE_HOTREMOVE type
>   add online_movable
> 
>  arch/tile/mm/init.c            |    2 +-
>  drivers/acpi/acpi_memhotplug.c |    3 +-
>  drivers/base/memory.c          |   24 +++++++----
>  include/linux/memory.h         |    1 +
>  include/linux/memory_hotplug.h |    4 +-
>  include/linux/mmzone.h         |   37 +++++++++++++++++
>  include/linux/page-isolation.h |    2 +-
>  mm/compaction.c                |    6 +-
>  mm/memory-failure.c            |    8 +++-
>  mm/memory_hotplug.c            |   36 +++++++++++++---
>  mm/page_alloc.c                |   86 ++++++++++++++++-----------------------
>  mm/vmstat.c                    |    3 +
>  12 files changed, 136 insertions(+), 76 deletions(-)
> 

I apologise for my crap review of the first patch to date. It was
atrociously bad form, and one of the reasons my review was so superficial
was that I was aware of the problem below. It's pretty severe; we've
encountered it on other occasions, and it led me to dismiss the series
quickly without adequate explanation or close review when I should have
taken the time to explain it.

The reason ZONE_MOVABLE exists is because of page reclaim. MIGRATE_CMA,
or any migrate type that is MIGRATE_CMA-like, is not understood by reclaim
currently and, judging from the diffstat (no changes to vmscan.c), is not
addressed by this series. In low-memory situations it's actually fine
and the system appears to work well. The problem arises when the
MIGRATE_CMA-like area is large and is either completely free or the
only source of pages that can be reclaimed.

In these cases, MIGRATE_UNMOVABLE and MIGRATE_RECLAIMABLE allocations fail
because their lists and fallback lists are empty. However, if it enters
direct reclaim or wakes kswapd the watermarks are fine and reclaim does
nothing. Depending on implementation details this causes either a loop
or OOM.

Minimally the watermark checks need to take the MIGRATE_CMA area into account
but even then it is still fragile. If kswapd and direct reclaim are forced
to reclaim pages, there is no guarantee they will reclaim pages that are
usable by MIGRATE_UNMOVABLE or MIGRATE_RECLAIMABLE. To handle this you must
either keep reclaiming pages until it works (easy to implement but disruptive
to the system) or scan the LRU lists searching for suitable pages (higher
CPU usage, LRU age disruption, will require the entire zone to be scanned
in the OOM case which will be slow and subject to races and possible false
OOMs). When these situations occur, it is very difficult to debug because it
just looks like a hang and the exact triggering situations will be different.

If the allocation then fails due to insufficient usable memory, the
resulting OOM message will be harder to read because it'll show OOM when
there are plenty of pages free. This can be addressed by clear accounting and
informative messages of course but to be very clear it might be necessary
to walk all the buddy lists to identify how many of the free pages were
MIGRATE_CMA. You could use separate accounting of course but then you have
accounting and memory overhead instead.

In the case of CMA, this issue is less of a problem but it was discussed
before CMA was merged. CMA's use case means that it is not likely to suffer
severely because of the expected size of the region, how it's used and how
many slab allocations are expected on the systems it targets. It's far worse
for memory hotplug because if the bulk of your memory is memory hotplugged,
you may not be able to use it for metadata-intensive workloads, for example,
which will result in bug reports. You could have 90% free memory and
be unable to use any of it because you cannot increase the size of slab,
leading to odd corner cases.

ZONE_MOVABLE is not great, but it handles the reclaim issues in a
straight-forward manner, OOM is handled quickly because the whole system
does not have to be scanned to detect the situation and the OOM messages
are easy to read. If you want to replace it with MIGRATE_CMA or
something MIGRATE_CMA-like, you need to take these issues into account
or at the very least explain in detail why it is not an issue.

-- 
Mel Gorman
SUSE Labs



end of thread, other threads:[~2012-07-05  9:05 UTC | newest]

Thread overview: 16+ messages
2012-07-04  7:26 [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Lai Jiangshan
2012-07-04  7:26 ` [RFC PATCH 1/3 V1] mm, page_alloc: use __rmqueue_smallest when borrow memory from MIGRATE_CMA Lai Jiangshan
2012-07-04 10:17   ` Mel Gorman
2012-07-04 10:43     ` Lai Jiangshan
2012-07-04 11:19       ` Mel Gorman
2012-07-05  1:36         ` Lai Jiangshan
2012-07-05  8:38           ` Mel Gorman
2012-07-04  7:26 ` [RFC PATCH 2/3 V1] mm, page migrate: add MIGRATE_HOTREMOVE type Lai Jiangshan
2012-07-04 10:19   ` Mel Gorman
2012-07-04 10:50     ` Lai Jiangshan
2012-07-04  7:26 ` [RFC PATCH 3/3 V1] mm, memory-hotplug: add online_movable Lai Jiangshan
2012-07-04 14:58   ` Greg Kroah-Hartman
2012-07-04  7:35 ` [RFC PATCH 0/3 V1] mm: add new migrate type and online_movable for hotplug Minchan Kim
2012-07-04  8:23   ` Lai Jiangshan
2012-07-04  8:43     ` Lai Jiangshan
2012-07-05  9:05 ` Mel Gorman
