From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: akpm@linux-foundation.org, mgorman@suse.de, dave@sr71.net,
hannes@cmpxchg.org, tony.luck@intel.com,
matthew.garrett@nebula.com, riel@redhat.com,
arjan@linux.intel.com, srinivas.pandruvada@linux.intel.com,
willy@linux.intel.com, kamezawa.hiroyu@jp.fujitsu.com,
lenb@kernel.org, rjw@sisk.pl
Cc: gargankita@gmail.com, paulmck@linux.vnet.ibm.com,
svaidy@linux.vnet.ibm.com, andi@firstfloor.org,
isimatu.yasuaki@jp.fujitsu.com, santosh.shilimkar@ti.com,
kosaki.motohiro@gmail.com, srivatsa.bhat@linux.vnet.ibm.com,
linux-pm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 35/40] mm: Add infrastructure to evacuate memory regions using compaction
Date: Thu, 26 Sep 2013 04:51:39 +0530
Message-ID: <20130925232136.26184.28161.stgit@srivatsabhat.in.ibm.com>
In-Reply-To: <20130925231250.26184.31438.stgit@srivatsabhat.in.ibm.com>

To improve memory power savings, we need to be able to completely evacuate
lightly allocated regions and move their in-use pages to lower regions,
which consolidates all allocations into the minimum number of regions.
This can be done using some of the memory compaction and reclaim algorithms.
Develop the infrastructure to evacuate memory regions completely.

The traditional compaction algorithm uses a pfn walker to find free pages
for compaction, but that would be too costly here. So we do a pfn walk
only to isolate the used pages; to get free pages, we rely on the fast
buddy allocator itself. We are careful to abort the compaction run as soon
as the buddy allocator starts handing out free pages from this region itself
or from higher regions, because proceeding in that case would defeat the
purpose of the entire effort.
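
The abort rule described above can be illustrated with a minimal userspace
sketch. This is not the kernel code: sim_page, sim_freelist,
sim_alloc_target() and sim_evacuate() are hypothetical names invented for
illustration, and the array-backed freelist is an assumed stand-in for the
buddy allocator. Only the "target must come from a strictly lower region"
check mirrors what power_mgmt_alloc() does in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the region id matters here. */
struct sim_page {
	int region_id;
};

/* Hypothetical stand-in for the buddy freelists: hands out pages in order. */
struct sim_freelist {
	struct sim_page *pages;
	size_t nr_pages;
	size_t next;		/* index of the next page to hand out */
};

/*
 * Return a migration target for 'src', or NULL to abort the run.
 * The target is accepted only if it lies in a strictly lower region
 * than the page being migrated.
 */
static struct sim_page *sim_alloc_target(struct sim_freelist *fl,
					 const struct sim_page *src)
{
	struct sim_page *candidate;

	assert(fl && src);

	if (fl->next >= fl->nr_pages)
		return NULL;	/* the allocator has nothing left */

	candidate = &fl->pages[fl->next];
	if (candidate->region_id >= src->region_id)
		return NULL;	/* same or higher region: abort */

	fl->next++;
	return candidate;
}

/* Migrate used pages until a target is refused; return how many moved. */
static size_t sim_evacuate(struct sim_freelist *fl,
			   const struct sim_page *used, size_t nr_used)
{
	size_t moved = 0;

	for (size_t i = 0; i < nr_used; i++) {
		if (!sim_alloc_target(fl, &used[i]))
			break;	/* stop the whole run, as the patch does */
		moved++;
	}
	return moved;
}
```

In this sketch a refused candidate stays on the freelist, much as the patch
leaves the page on cc->freepages so that it can be released afterwards.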
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/compaction.h | 7 +++
include/linux/gfp.h | 2 +
include/linux/migrate.h | 3 +
include/linux/mm.h | 1
include/trace/events/migrate.h | 3 +
mm/compaction.c | 99 ++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 23 +++++++--
7 files changed, 130 insertions(+), 8 deletions(-)
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 091d72e..6be2b08 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -26,6 +26,7 @@ extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern unsigned long compaction_suitable(struct zone *zone, int order);
+extern int evacuate_mem_region(struct zone *z, struct zone_mem_region *zmr);
/* Do not skip compaction more than 64 times */
#define COMPACT_MAX_DEFER_SHIFT 6
@@ -102,6 +103,12 @@ static inline bool compaction_deferred(struct zone *zone, int order)
return true;
}
+static inline int evacuate_mem_region(struct zone *z,
+ struct zone_mem_region *zmr)
+{
+ return 0;
+}
+
#endif /* CONFIG_COMPACTION */
#if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 9b4dd49..dab3c78 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -351,6 +351,8 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order);
extern unsigned long get_zeroed_page(gfp_t gfp_mask);
+int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count,
+ struct list_head *list, int migratetype, int cold);
void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
void free_pages_exact(void *virt, size_t size);
/* This is different from alloc_pages_exact_node !!! */
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 8d3c57f..5ab1d48 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -30,7 +30,8 @@ enum migrate_reason {
MR_SYSCALL, /* also applies to cpusets */
MR_MEMPOLICY_MBIND,
MR_NUMA_MISPLACED,
- MR_CMA
+ MR_CMA,
+ MR_PWR_MGMT
};
#ifdef CONFIG_MIGRATION
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4286a75..f49acb0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -470,6 +470,7 @@ void put_pages_list(struct list_head *pages);
void split_page(struct page *page, unsigned int order);
int split_free_page(struct page *page);
+void __split_free_page(struct page *page, unsigned int order);
/*
* Compound pages have a destructor function. Provide a
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index ec2a6cc..e6892c0 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -15,7 +15,8 @@
{MR_MEMORY_HOTPLUG, "memory_hotplug"}, \
{MR_SYSCALL, "syscall_or_cpuset"}, \
{MR_MEMPOLICY_MBIND, "mempolicy_mbind"}, \
- {MR_CMA, "cma"}
+ {MR_CMA, "cma"}, \
+ {MR_PWR_MGMT, "power_management"}
TRACE_EVENT(mm_migrate_pages,
diff --git a/mm/compaction.c b/mm/compaction.c
index c775066..9449b7f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1168,6 +1168,105 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
return rc;
}
+static struct page *power_mgmt_alloc(struct page *migratepage,
+ unsigned long data, int **result)
+{
+ struct compact_control *cc = (struct compact_control *)data;
+ struct page *freepage;
+
+ /*
+ * Try to allocate pages from lower memory regions. If it fails,
+ * abort.
+ */
+ if (list_empty(&cc->freepages)) {
+ struct zone *z = page_zone(migratepage);
+ unsigned int i, count, order = 0;
+ struct page *page, *tmp;
+ LIST_HEAD(list);
+
+ /* Get a bunch of order-0 pages from the buddy freelists */
+ count = rmqueue_bulk(z, order, cc->nr_migratepages, &list,
+ MIGRATE_MOVABLE, 1);
+
+ cc->nr_freepages = count * (1ULL << order);
+
+ if (list_empty(&list))
+ return NULL;
+
+ list_for_each_entry_safe(page, tmp, &list, lru) {
+ __split_free_page(page, order);
+
+ list_move_tail(&page->lru, &cc->freepages);
+
+ /*
+ * Now add all the order-0 subdivisions of this page
+ * to the freelist as well.
+ */
+ for (i = 1; i < (1ULL << order); i++) {
+ page++;
+ list_add(&page->lru, &cc->freepages);
+ }
+
+ }
+
+ VM_BUG_ON(!list_empty(&list));
+
+ /* Now map all the order-0 pages on the freelist. */
+ map_pages(&cc->freepages);
+ }
+
+ freepage = list_entry(cc->freepages.next, struct page, lru);
+
+ if (page_zone_region_id(freepage) >= page_zone_region_id(migratepage))
+ return NULL; /* Freepage is not from lower region, so abort */
+
+ list_del(&freepage->lru);
+ cc->nr_freepages--;
+
+ return freepage;
+}
+
+static unsigned long power_mgmt_release_freepages(unsigned long info)
+{
+ struct compact_control *cc = (struct compact_control *)info;
+
+ return release_freepages(&cc->freepages);
+}
+
+int evacuate_mem_region(struct zone *z, struct zone_mem_region *zmr)
+{
+ unsigned long start_pfn = zmr->start_pfn;
+ unsigned long end_pfn = zmr->end_pfn;
+
+ struct compact_control cc = {
+ .nr_migratepages = 0,
+ .order = -1,
+ .zone = page_zone(pfn_to_page(start_pfn)),
+ .sync = false, /* Async migration */
+ .ignore_skip_hint = true,
+ };
+
+ struct aggression_control ac = {
+ .isolate_unevictable = false,
+ .prep_all = false,
+ .reclaim_clean = true,
+ .max_tries = 1,
+ .reason = MR_PWR_MGMT,
+ };
+
+ struct free_page_control fc = {
+ .free_page_alloc = power_mgmt_alloc,
+ .alloc_data = (unsigned long)&cc,
+ .release_freepages = power_mgmt_release_freepages,
+ .free_data = (unsigned long)&cc,
+ };
+
+ INIT_LIST_HEAD(&cc.migratepages);
+ INIT_LIST_HEAD(&cc.freepages);
+
+ return compact_range(&cc, &ac, &fc, start_pfn, end_pfn);
+}
+
/* Compact all zones within a node */
static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 70c3d7a..4571d30 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1793,9 +1793,8 @@ retry:
* a single hold of the lock, for efficiency. Add them to the supplied list.
* Returns the number of new pages which were placed at *list.
*/
-static int rmqueue_bulk(struct zone *zone, unsigned int order,
- unsigned long count, struct list_head *list,
- int migratetype, int cold)
+int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count,
+ struct list_head *list, int migratetype, int cold)
{
int mt = migratetype, i;
@@ -2111,6 +2110,20 @@ static int __isolate_free_page(struct page *page, unsigned int order)
return 1UL << order;
}
+
+/*
+ * The page is already free and isolated (removed) from the buddy system.
+ * Set up the refcounts appropriately. Note that we can't use page_order()
+ * here, since the buddy system would have invoked rmv_page_order() before
+ * giving the page.
+ */
+void __split_free_page(struct page *page, unsigned int order)
+{
+ /* Split into individual pages */
+ set_page_refcounted(page);
+ split_page(page, order);
+}
+
/*
* Similar to split_page except the page is already free. As this is only
* being used for migration, the migratetype of the block also changes.
@@ -2132,9 +2145,7 @@ int split_free_page(struct page *page)
if (!nr_pages)
return 0;
- /* Split into individual pages */
- set_page_refcounted(page);
- split_page(page, order);
+ __split_free_page(page, order);
return nr_pages;
}