linux-mm.kvack.org archive mirror
* [PATCH 0/3] introduce a new state 'isolate' for memblock to split the isolation and migration steps
@ 2018-09-19  3:17 Pingfan Liu
  2018-09-19  3:17 ` [PATCH 1/3] mm/isolation: separate the isolation and migration ops in offline memblock Pingfan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Pingfan Liu @ 2018-09-19  3:17 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Andrew Morton, KAMEZAWA Hiroyuki, Mel Gorman,
	Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko, Bharata B Rao,
	Dan Williams, H. Peter Anvin, Kirill A . Shutemov

Currently, pages are offlined in units of memory blocks, and normally this
is done one block at a time. If there is only one NUMA node, the
destination pages for migration may be allocated from the very next memory
block due to be offlined, which wastes time during memory offline. A
system with multiple NUMA nodes faces the same issue when only part of the
memory on a node is replaced and the migration destination pages are
allocated from the local node (which is enabled by [3/3]).
This series introduces a new state named 'isolate'; the state transitions
are online -> isolate and the reverse. Another slight benefit of the
'isolate' state is that no further allocation happens on such a memory
block, which prevents unmovable pages from being allocated from it again
for a long time.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Pingfan Liu (3):
  mm/isolation: separate the isolation and migration ops in offline
    memblock
  drivers/base/memory: introduce a new state 'isolate' for memblock
  drivers/base/node: create a partial offline hints under each node

 drivers/base/memory.c           | 31 ++++++++++++++++++++++++++++++-
 drivers/base/node.c             | 33 +++++++++++++++++++++++++++++++++
 include/linux/memory.h          |  1 +
 include/linux/mmzone.h          |  1 +
 include/linux/page-isolation.h  |  4 ++--
 include/linux/pageblock-flags.h |  2 ++
 mm/memory_hotplug.c             | 37 ++++++++++++++++++++++---------------
 mm/page_alloc.c                 |  4 ++--
 mm/page_isolation.c             | 28 +++++++++++++++++++++++-----
 9 files changed, 116 insertions(+), 25 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] mm/isolation: separate the isolation and migration ops in offline memblock
  2018-09-19  3:17 [PATCH 0/3] introduce a new state 'isolate' for memblock to split the isolation and migration steps Pingfan Liu
@ 2018-09-19  3:17 ` Pingfan Liu
  2018-09-19  3:17 ` [PATCH 2/3] drivers/base/memory: introduce a new state 'isolate' for memblock Pingfan Liu
  2018-09-19  3:17 ` [PATCH 3/3] drivers/base/node: create a partial offline hints under each node Pingfan Liu
  2 siblings, 0 replies; 6+ messages in thread
From: Pingfan Liu @ 2018-09-19  3:17 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Andrew Morton, KAMEZAWA Hiroyuki, Mel Gorman,
	Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko, Bharata B Rao,
	Dan Williams, H. Peter Anvin, Kirill A . Shutemov

The current design of start_isolate_page_range() relies on MIGRATE_ISOLATE
to guard against other threads, hence each caller of
start_isolate_page_range() must perform the isolation by itself.
This series suggests a memory-offline sequence that splits a memblock's
pageblock isolation from the migration, i.e.:
  1. call start_isolate_page_range() on a batch of memblocks
  2. call __offline_pages() on each memblock
This requires the ability for __offline_pages() to reuse an isolation set
up earlier.

As for recording the isolation, it is not preferable to do it at the
memblock level, because isolation works on pageblocks and the memblock
abstraction should stay hidden there. On the other hand, since isolation
and compaction cannot run in parallel, the PB_migrate_skip bit can be
reused to record the result of a previous isolation, which is what this
patch does. The prototype of start_isolate_page_range() is also changed to
distinguish the __offline_pages() case from temporary isolation, e.g.
alloc_contig_range().
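The bit reuse described above can be illustrated with a minimal userspace
model (illustrative only; the kernel's get/set_pageblock_flags_group()
operate on a packed per-zone bitmap, and the bit position here is
arbitrary):

```c
#include <stdbool.h>

/* PB_migrate_skip is reused as PB_isolate_skip: compaction and memory
 * offline never work on the same pageblock concurrently, so one bit
 * can safely carry both meanings. */
#define PB_MIGRATE_SKIP		(1u << 3)
#define PB_ISOLATE_SKIP		PB_MIGRATE_SKIP	/* same bit, two roles */

static inline void set_isolate_skip(unsigned int *flags)
{
	*flags |= PB_ISOLATE_SKIP;
}

static inline void clear_isolate_skip(unsigned int *flags)
{
	*flags &= ~PB_ISOLATE_SKIP;
}

static inline bool get_isolate_skip(unsigned int flags)
{
	return flags & PB_ISOLATE_SKIP;
}
```

With `reuse` set, set_migratetype_isolate() treats an already-isolated
pageblock carrying this bit as success (ret = 0) instead of -EBUSY, which
is what lets __offline_pages() pick up an isolation done earlier.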

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/page-isolation.h  |  4 ++--
 include/linux/pageblock-flags.h |  2 ++
 mm/memory_hotplug.c             |  6 +++---
 mm/page_alloc.c                 |  4 ++--
 mm/page_isolation.c             | 28 +++++++++++++++++++++++-----
 5 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 4ae347c..dcc2bd1 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -47,7 +47,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, bool skip_hwpoisoned_pages);
+	unsigned int migratetype, bool skip_hwpoisoned_pages, bool reuse);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
@@ -55,7 +55,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  */
 int
 undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			unsigned migratetype);
+	unsigned int migratetype, bool reuse);
 
 /*
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index 9132c5c..80c5341 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -31,6 +31,8 @@ enum pageblock_bits {
 	PB_migrate_end = PB_migrate + 3 - 1,
 			/* 3 bits required for migrate types */
 	PB_migrate_skip,/* If set the block is skipped by compaction */
+	PB_isolate_skip = PB_migrate_skip,
+			/* isolation and compaction do not concur */
 
 	/*
 	 * Assume the bits will always align on a word. If this assumption
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9eea6e8..228de4d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1616,7 +1616,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
-				       MIGRATE_MOVABLE, true);
+				       MIGRATE_MOVABLE, true, true);
 	if (ret)
 		return ret;
 
@@ -1662,7 +1662,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	   We cannot do rollback at this point. */
 	offline_isolated_pages(start_pfn, end_pfn);
 	/* reset pagetype flags and makes migrate type to be MOVABLE */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true);
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
 	zone->present_pages -= offlined_pages;
@@ -1697,7 +1697,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		 ((unsigned long long) end_pfn << PAGE_SHIFT) - 1);
 	memory_notify(MEM_CANCEL_OFFLINE, &arg);
 	/* pushback to free area */
-	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE, true);
 	return ret;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05e983f..a0ae259 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7882,7 +7882,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	ret = start_isolate_page_range(pfn_max_align_down(start),
 				       pfn_max_align_up(end), migratetype,
-				       false);
+				       false, false);
 	if (ret)
 		return ret;
 
@@ -7967,7 +7967,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 done:
 	undo_isolate_page_range(pfn_max_align_down(start),
-				pfn_max_align_up(end), migratetype);
+		pfn_max_align_up(end), migratetype, false);
 	return ret;
 }
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 43e0856..36858ab 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,8 +15,18 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/page_isolation.h>
 
+#define get_pageblock_isolate_skip(page) \
+			get_pageblock_flags_group(page, PB_isolate_skip,     \
+							PB_isolate_skip)
+#define clear_pageblock_isolate_skip(page) \
+			set_pageblock_flags_group(page, 0, PB_isolate_skip,  \
+							PB_isolate_skip)
+#define set_pageblock_isolate_skip(page) \
+			set_pageblock_flags_group(page, 1, PB_isolate_skip,  \
+							PB_isolate_skip)
+
 static int set_migratetype_isolate(struct page *page, int migratetype,
-				bool skip_hwpoisoned_pages)
+				bool skip_hwpoisoned_pages, bool reuse)
 {
 	struct zone *zone;
 	unsigned long flags, pfn;
@@ -33,8 +43,11 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
 	 * If it is already set, then someone else must have raced and
 	 * set it before us.  Return -EBUSY
 	 */
-	if (is_migrate_isolate_page(page))
+	if (is_migrate_isolate_page(page)) {
+		if (reuse && get_pageblock_isolate_skip(page))
+			ret = 0;
 		goto out;
+	}
 
 	pfn = page_to_pfn(page);
 	arg.start_pfn = pfn;
@@ -75,6 +88,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
 		int mt = get_pageblock_migratetype(page);
 
 		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+		if (reuse)
+			set_pageblock_isolate_skip(page);
 		zone->nr_isolate_pageblock++;
 		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
 									NULL);
@@ -185,7 +200,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * prevents two threads from simultaneously working on overlapping ranges.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     unsigned migratetype, bool skip_hwpoisoned_pages)
+	unsigned int migratetype, bool skip_hwpoisoned_pages, bool reuse)
 {
 	unsigned long pfn;
 	unsigned long undo_pfn;
@@ -199,7 +214,8 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page &&
-		    set_migratetype_isolate(page, migratetype, skip_hwpoisoned_pages)) {
+		    set_migratetype_isolate(page, migratetype,
+			skip_hwpoisoned_pages, reuse)) {
 			undo_pfn = pfn;
 			goto undo;
 		}
@@ -222,7 +238,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  * Make isolated pages available again.
  */
 int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			    unsigned migratetype)
+	unsigned int migratetype, bool reuse)
 {
 	unsigned long pfn;
 	struct page *page;
@@ -236,6 +252,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (!page || !is_migrate_isolate_page(page))
 			continue;
+		if (reuse)
+			clear_pageblock_isolate_skip(page);
 		unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
-- 
2.7.4


* [PATCH 2/3] drivers/base/memory: introduce a new state 'isolate' for memblock
  2018-09-19  3:17 [PATCH 0/3] introduce a new state 'isolate' for memblock to split the isolation and migration steps Pingfan Liu
  2018-09-19  3:17 ` [PATCH 1/3] mm/isolation: separate the isolation and migration ops in offline memblock Pingfan Liu
@ 2018-09-19  3:17 ` Pingfan Liu
  2018-09-19  6:49   ` kbuild test robot
  2018-09-19  3:17 ` [PATCH 3/3] drivers/base/node: create a partial offline hints under each node Pingfan Liu
  2 siblings, 1 reply; 6+ messages in thread
From: Pingfan Liu @ 2018-09-19  3:17 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Andrew Morton, KAMEZAWA Hiroyuki, Mel Gorman,
	Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko, Bharata B Rao,
	Dan Williams, H. Peter Anvin, Kirill A . Shutemov

Currently, pages are offlined in units of memory blocks, and normally this
is done one block at a time. If there is only one NUMA node, the
destination pages for migration may be allocated from the very next memory
block due to be offlined, which wastes time during memory offline. A
system with multiple NUMA nodes faces the same issue when only part of the
memory on a node is replaced and the migration destination pages are
allocated from the local node (which is enabled by [3/3]).
This patch introduces a new state named 'isolate'; the state transitions
are online -> isolate and the reverse. Another slight benefit of the
'isolate' state is that no further allocation happens on such a memory
block, which prevents unmovable pages from being allocated from it again
for a long time.

After this patch, the suggested sequence to offline pages looks like:
  for i in {s..e}; do echo isolate > memory$i/state; done
  for i in {s..e}; do echo offline > memory$i/state; done

Since this patch does not change the original offline path,
  for i in {s..e}; do echo offline > memory$i/state; done
still works.
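One detail worth noting about the sysfs interface: `echo isolate >
.../state` writes a trailing newline, which the kernel's sysfs_streq()
tolerates. A userspace model of that comparison (a sketch, not the kernel
implementation):

```c
#include <stdbool.h>

/* Like the kernel's sysfs_streq(): strings are equal if they match
 * exactly, or match up to a single trailing newline on either side,
 * since sysfs writes typically carry one ("isolate\n"). */
static bool sysfs_streq_model(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	if (*s2 == '\n' && !s2[1] && !*s1)
		return true;
	return false;
}
```

This is why `echo -n isolate` and `echo isolate` both select the new
"isolate" branch in store_mem_state().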

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/base/memory.c  | 31 ++++++++++++++++++++++++++++++-
 include/linux/memory.h |  1 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c8a1cb0..3b714be 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -19,6 +19,7 @@
 #include <linux/memory.h>
 #include <linux/memory_hotplug.h>
 #include <linux/mm.h>
+#include <linux/page-isolation.h>
 #include <linux/mutex.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
@@ -166,6 +167,9 @@ static ssize_t show_mem_state(struct device *dev,
 	case MEM_GOING_OFFLINE:
 		len = sprintf(buf, "going-offline\n");
 		break;
+	case MEM_ISOLATED:
+		len = sprintf(buf, "isolated\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -323,6 +327,9 @@ store_mem_state(struct device *dev,
 {
 	struct memory_block *mem = to_memory_block(dev);
 	int ret, online_type;
+	int isolated = 0;
+	unsigned long start_pfn;
+	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 
 	ret = lock_device_hotplug_sysfs();
 	if (ret)
@@ -336,7 +343,13 @@ store_mem_state(struct device *dev,
 		online_type = MMOP_ONLINE_KEEP;
 	else if (sysfs_streq(buf, "offline"))
 		online_type = MMOP_OFFLINE;
-	else {
+	else if (sysfs_streq(buf, "isolate")) {
+		isolated = 1;
+		goto memblock_isolated;
+	} else if (sysfs_streq(buf, "unisolate")) {
+		isolated = -1;
+		goto memblock_isolated;
+	} else {
 		ret = -EINVAL;
 		goto err;
 	}
@@ -366,6 +379,20 @@ store_mem_state(struct device *dev,
 
 	mem_hotplug_done();
 err:
+memblock_isolated:
+	if (isolated == 1 && mem->state == MEM_ONLINE) {
+		start_pfn = section_nr_to_pfn(mem->start_section_nr);
+		ret = start_isolate_page_range(start_pfn, start_pfn + nr_pages,
+			MIGRATE_MOVABLE, true, true);
+		if (!ret)
+			mem->state = MEM_ISOLATED;
+	} else if (isolated == -1 && mem->state == MEM_ISOLATED) {
+		start_pfn = section_nr_to_pfn(mem->start_section_nr);
+		ret = undo_isolate_page_range(start_pfn, start_pfn + nr_pages,
+			MIGRATE_MOVABLE, true);
+		if (!ret)
+			mem->state = MEM_ONLINE;
+	}
 	unlock_device_hotplug();
 
 	if (ret < 0)
@@ -455,6 +482,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
 static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
 static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
+//static DEVICE_ATTR(isolate, 0600, show_mem_isolate, store_mem_isolate);
 
 /*
  * Block size attribute stuff
@@ -631,6 +659,7 @@ static struct attribute *memory_memblk_attrs[] = {
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
 #endif
+	//&dev_attr_isolate.attr,
 	NULL
 };
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a6ddefc..e00f22c 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -47,6 +47,7 @@ int set_memory_block_size_order(unsigned int order);
 #define	MEM_GOING_ONLINE	(1<<3)
 #define	MEM_CANCEL_ONLINE	(1<<4)
 #define	MEM_CANCEL_OFFLINE	(1<<5)
+#define	MEM_ISOLATED	(1<<6)
 
 struct memory_notify {
 	unsigned long start_pfn;
-- 
2.7.4


* [PATCH 3/3] drivers/base/node: create a partial offline hints under each node
  2018-09-19  3:17 [PATCH 0/3] introduce a new state 'isolate' for memblock to split the isolation and migration steps Pingfan Liu
  2018-09-19  3:17 ` [PATCH 1/3] mm/isolation: separate the isolation and migration ops in offline memblock Pingfan Liu
  2018-09-19  3:17 ` [PATCH 2/3] drivers/base/memory: introduce a new state 'isolate' for memblock Pingfan Liu
@ 2018-09-19  3:17 ` Pingfan Liu
  2018-09-19  4:36   ` kbuild test robot
  2 siblings, 1 reply; 6+ messages in thread
From: Pingfan Liu @ 2018-09-19  3:17 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Andrew Morton, KAMEZAWA Hiroyuki, Mel Gorman,
	Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko, Bharata B Rao,
	Dan Williams, H. Peter Anvin, Kirill A . Shutemov

When offlining memory there are two cases: first, all memblocks under a
node are offlined; second, only part of the memory under a node is
offlined and replaced. In the second case there is no need to allocate new
pages from other nodes, which may incur extra NUMA faults later to resolve
the misplacement and puts unnecessary memory pressure on those nodes. This
patch introduces an interface,
 /sys/../node/nodeX/partial_offline, to let the user choose where
migration destination pages are allocated, i.e. from the local node or
from other nodes.
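The destination-node policy this patch applies in do_migrate_range() can
be sketched with plain bitmasks (a simplified model; the kernel uses
nodemask_t with node_clear()/nodes_empty(), and node numbering here is
arbitrary):

```c
#include <stdbool.h>

/* Pick the nodemask used to allocate migration destination pages.
 * With partial_offline set on the node, keep the full mask so the
 * local node stays preferred; otherwise exclude the local node,
 * falling back to it only when it is the sole node with memory. */
static unsigned int pick_dst_mask(unsigned int online_mask, int nid,
				  bool partial_offline)
{
	unsigned int mask = online_mask;

	if (!partial_offline) {
		mask &= ~(1u << nid);	/* prefer other nodes */
		if (!mask)		/* only one node exists: reuse it */
			mask = 1u << nid;
	}
	return mask;
}
```

For example, offlining part of node 0 on a two-node machine with
partial_offline=0 targets node 1 only, while partial_offline=1 leaves both
nodes eligible so pages can stay local.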

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/base/node.c    | 33 +++++++++++++++++++++++++++++++++
 include/linux/mmzone.h |  1 +
 mm/memory_hotplug.c    | 31 +++++++++++++++++++------------
 3 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 1ac4c36..64b0cb8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -25,6 +25,36 @@ static struct bus_type node_subsys = {
 	.dev_name = "node",
 };
 
+static ssize_t read_partial_offline(struct device *dev,
+	struct device_attribute *attr, char *buf)
+{
+	int nid = dev->id;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	ssize_t len = 0;
+
+	if (pgdat->partial_offline)
+		len = sprintf(buf, "1\n");
+	else
+		len = sprintf(buf, "0\n");
+
+	return len;
+}
+
+static ssize_t write_partial_offline(struct device *dev,
+	struct device_attribute *attr, const char *buf, size_t count)
+{
+	int nid = dev->id;
+	struct pglist_data *pgdat = NODE_DATA(nid);
+
+	if (sysfs_streq(buf, "1"))
+		pgdat->partial_offline = true;
+	else if (sysfs_streq(buf, "0"))
+		pgdat->partial_offline = false;
+	else
+		return -EINVAL;
+
+	return strlen(buf);
+}
 
 static ssize_t node_read_cpumap(struct device *dev, bool list, char *buf)
 {
@@ -56,6 +86,8 @@ static inline ssize_t node_read_cpulist(struct device *dev,
 	return node_read_cpumap(dev, true, buf);
 }
 
+static DEVICE_ATTR(partial_offline, 0600, read_partial_offline,
+	write_partial_offline);
 static DEVICE_ATTR(cpumap,  S_IRUGO, node_read_cpumask, NULL);
 static DEVICE_ATTR(cpulist, S_IRUGO, node_read_cpulist, NULL);
 
@@ -235,6 +267,7 @@ static struct attribute *node_dev_attrs[] = {
 	&dev_attr_numastat.attr,
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
+	&dev_attr_partial_offline.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(node_dev);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1e22d96..80c44c8 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -722,6 +722,7 @@ typedef struct pglist_data {
 	/* Per-node vmstats */
 	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
+	bool	partial_offline;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 228de4d..3c66075 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1346,18 +1346,10 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 
 static struct page *new_node_page(struct page *page, unsigned long private)
 {
-	int nid = page_to_nid(page);
-	nodemask_t nmask = node_states[N_MEMORY];
-
-	/*
-	 * try to allocate from a different node but reuse this node if there
-	 * are no other online nodes to be used (e.g. we are offlining a part
-	 * of the only existing node)
-	 */
-	node_clear(nid, nmask);
-	if (nodes_empty(nmask))
-		node_set(nid, nmask);
+	nodemask_t nmask = *(nodemask_t *)private;
+	int nid;
 
+	nid = page_to_nid(page);
 	return new_page_nodemask(page, nid, &nmask);
 }
 
@@ -1371,6 +1363,8 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 	int not_managed = 0;
 	int ret = 0;
 	LIST_HEAD(source);
+	int nid;
+	nodemask_t nmask = node_states[N_MEMORY];
 
 	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
 		if (!pfn_valid(pfn))
@@ -1430,8 +1424,21 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			goto out;
 		}
 
+		page = list_entry(source.next, struct page, lru);
+		nid = page_to_nid(page);
+		if (!NODE_DATA(nid)->partial_offline) {
+			/*
+			 * try to allocate from a different node but reuse this
+			 * node if there are no other online nodes to be used
+			 * (e.g. we are offlining a part of the only existing
+			 * node)
+			 */
+			node_clear(nid, nmask);
+			if (nodes_empty(nmask))
+				node_set(nid, nmask);
+		}
 		/* Allocate a new page from the nearest neighbor node */
-		ret = migrate_pages(&source, new_node_page, NULL, 0,
+		ret = migrate_pages(&source, new_node_page, NULL, &nmask,
 					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
 		if (ret)
 			putback_movable_pages(&source);
-- 
2.7.4


* Re: [PATCH 3/3] drivers/base/node: create a partial offline hints under each node
  2018-09-19  3:17 ` [PATCH 3/3] drivers/base/node: create a partial offline hints under each node Pingfan Liu
@ 2018-09-19  4:36   ` kbuild test robot
  0 siblings, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2018-09-19  4:36 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kbuild-all, linux-mm, Andrew Morton, KAMEZAWA Hiroyuki,
	Mel Gorman, Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko,
	Bharata B Rao, Dan Williams, H. Peter Anvin, Kirill A . Shutemov

[-- Attachment #1: Type: text/plain, Size: 4562 bytes --]

Hi Pingfan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc4 next-20180918]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/introduce-a-new-state-isolate-for-memblock-to-split-the-isolation-and-migration-steps/20180919-112650
config: x86_64-randconfig-x018-201837 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   mm/memory_hotplug.c: In function 'do_migrate_range':
>> mm/memory_hotplug.c:1442:53: warning: passing argument 4 of 'migrate_pages' makes integer from pointer without a cast [-Wint-conversion]
      ret = migrate_pages(&source, new_node_page, NULL, &nmask,
                                                        ^
   In file included from mm/memory_hotplug.c:27:0:
   include/linux/migrate.h:68:12: note: expected 'long unsigned int' but argument is of type 'nodemask_t * {aka struct <anonymous> *}'
    extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
               ^~~~~~~~~~~~~

vim +/migrate_pages +1442 mm/memory_hotplug.c

  1356	
  1357	#define NR_OFFLINE_AT_ONCE_PAGES	(256)
  1358	static int
  1359	do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
  1360	{
  1361		unsigned long pfn;
  1362		struct page *page;
  1363		int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
  1364		int not_managed = 0;
  1365		int ret = 0;
  1366		LIST_HEAD(source);
  1367		int nid;
  1368		nodemask_t nmask = node_states[N_MEMORY];
  1369	
  1370		for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
  1371			if (!pfn_valid(pfn))
  1372				continue;
  1373			page = pfn_to_page(pfn);
  1374	
  1375			if (PageHuge(page)) {
  1376				struct page *head = compound_head(page);
  1377				pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
  1378				if (compound_order(head) > PFN_SECTION_SHIFT) {
  1379					ret = -EBUSY;
  1380					break;
  1381				}
  1382				if (isolate_huge_page(page, &source))
  1383					move_pages -= 1 << compound_order(head);
  1384				continue;
  1385			} else if (PageTransHuge(page))
  1386				pfn = page_to_pfn(compound_head(page))
  1387					+ hpage_nr_pages(page) - 1;
  1388	
  1389			if (!get_page_unless_zero(page))
  1390				continue;
  1391			/*
  1392			 * We can skip free pages. And we can deal with pages on
  1393			 * LRU and non-lru movable pages.
  1394			 */
  1395			if (PageLRU(page))
  1396				ret = isolate_lru_page(page);
  1397			else
  1398				ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
  1399			if (!ret) { /* Success */
  1400				put_page(page);
  1401				list_add_tail(&page->lru, &source);
  1402				move_pages--;
  1403				if (!__PageMovable(page))
  1404					inc_node_page_state(page, NR_ISOLATED_ANON +
  1405							    page_is_file_cache(page));
  1406	
  1407			} else {
  1408	#ifdef CONFIG_DEBUG_VM
  1409				pr_alert("failed to isolate pfn %lx\n", pfn);
  1410				dump_page(page, "isolation failed");
  1411	#endif
  1412				put_page(page);
  1413				/* Because we don't have big zone->lock. we should
  1414				   check this again here. */
  1415				if (page_count(page)) {
  1416					not_managed++;
  1417					ret = -EBUSY;
  1418					break;
  1419				}
  1420			}
  1421		}
  1422		if (!list_empty(&source)) {
  1423			if (not_managed) {
  1424				putback_movable_pages(&source);
  1425				goto out;
  1426			}
  1427	
  1428			page = list_entry(source.next, struct page, lru);
  1429			nid = page_to_nid(page);
  1430			if (!NODE_DATA(nid)->partial_offline) {
  1431				/*
  1432				 * try to allocate from a different node but reuse this
  1433				 * node if there are no other online nodes to be used
  1434				 * (e.g. we are offlining a part of the only existing
  1435				 * node)
  1436				 */
  1437				node_clear(nid, nmask);
  1438				if (nodes_empty(nmask))
  1439					node_set(nid, nmask);
  1440			}
  1441			/* Allocate a new page from the nearest neighbor node */
> 1442			ret = migrate_pages(&source, new_node_page, NULL, &nmask,
  1443						MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
  1444			if (ret)
  1445				putback_movable_pages(&source);
  1446		}
  1447	out:
  1448		return ret;
  1449	}
  1450	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33239 bytes --]


* Re: [PATCH 2/3] drivers/base/memory: introduce a new state 'isolate' for memblock
  2018-09-19  3:17 ` [PATCH 2/3] drivers/base/memory: introduce a new state 'isolate' for memblock Pingfan Liu
@ 2018-09-19  6:49   ` kbuild test robot
  0 siblings, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2018-09-19  6:49 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: kbuild-all, linux-mm, Andrew Morton, KAMEZAWA Hiroyuki,
	Mel Gorman, Greg Kroah-Hartman, Pavel Tatashin, Michal Hocko,
	Bharata B Rao, Dan Williams, H. Peter Anvin, Kirill A . Shutemov

[-- Attachment #1: Type: text/plain, Size: 3823 bytes --]

Hi Pingfan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.19-rc4 next-20180918]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pingfan-Liu/introduce-a-new-state-isolate-for-memblock-to-split-the-isolation-and-migration-steps/20180919-112650
config: x86_64-randconfig-s0-09191204 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/base/memory.o: In function `store_mem_state':
>> drivers/base/memory.c:385: undefined reference to `start_isolate_page_range'
>> drivers/base/memory.c:391: undefined reference to `undo_isolate_page_range'

vim +385 drivers/base/memory.c

   323	
   324	static ssize_t
   325	store_mem_state(struct device *dev,
   326			struct device_attribute *attr, const char *buf, size_t count)
   327	{
   328		struct memory_block *mem = to_memory_block(dev);
   329		int ret, online_type;
   330		int isolated = 0;
   331		unsigned long start_pfn;
   332		unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
   333	
   334		ret = lock_device_hotplug_sysfs();
   335		if (ret)
   336			return ret;
   337	
   338		if (sysfs_streq(buf, "online_kernel"))
   339			online_type = MMOP_ONLINE_KERNEL;
   340		else if (sysfs_streq(buf, "online_movable"))
   341			online_type = MMOP_ONLINE_MOVABLE;
   342		else if (sysfs_streq(buf, "online"))
   343			online_type = MMOP_ONLINE_KEEP;
   344		else if (sysfs_streq(buf, "offline"))
   345			online_type = MMOP_OFFLINE;
   346		else if (sysfs_streq(buf, "isolate")) {
   347			isolated = 1;
   348			goto memblock_isolated;
   349		} else if (sysfs_streq(buf, "unisolate")) {
   350			isolated = -1;
   351			goto memblock_isolated;
   352		} else {
   353			ret = -EINVAL;
   354			goto err;
   355		}
   356	
   357		/*
   358		 * Memory hotplug needs to hold mem_hotplug_begin() for probe to find
   359		 * the correct memory block to online before doing device_online(dev),
   360		 * which will take dev->mutex.  Take the lock early to prevent an
   361		 * inversion, memory_subsys_online() callbacks will be implemented by
   362		 * assuming it's already protected.
   363		 */
   364		mem_hotplug_begin();
   365	
   366		switch (online_type) {
   367		case MMOP_ONLINE_KERNEL:
   368		case MMOP_ONLINE_MOVABLE:
   369		case MMOP_ONLINE_KEEP:
   370			mem->online_type = online_type;
   371			ret = device_online(&mem->dev);
   372			break;
   373		case MMOP_OFFLINE:
   374			ret = device_offline(&mem->dev);
   375			break;
   376		default:
   377			ret = -EINVAL; /* should never happen */
   378		}
   379	
   380		mem_hotplug_done();
   381	err:
   382	memblock_isolated:
   383		if (isolated == 1 && mem->state == MEM_ONLINE) {
   384			start_pfn = section_nr_to_pfn(mem->start_section_nr);
 > 385			ret = start_isolate_page_range(start_pfn, start_pfn + nr_pages,
   386				MIGRATE_MOVABLE, true, true);
   387			if (!ret)
   388				mem->state = MEM_ISOLATED;
   389		} else if (isolated == -1 && mem->state == MEM_ISOLATED) {
   390			start_pfn = section_nr_to_pfn(mem->start_section_nr);
 > 391			ret = undo_isolate_page_range(start_pfn, start_pfn + nr_pages,
   392				MIGRATE_MOVABLE, true);
   393			if (!ret)
   394				mem->state = MEM_ONLINE;
   395		}
   396		unlock_device_hotplug();
   397	
   398		if (ret < 0)
   399			return ret;
   400		if (ret)
   401			return -EINVAL;
   402	
   403		return count;
   404	}
   405	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27621 bytes --]


end of thread, other threads:[~2018-09-19  6:50 UTC | newest]
