[tip:numa/core] mm/migration: Improve migrate_misplaced_page()


From: tip-bot for Mel Gorman <mgorman@suse.de>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
	torvalds@linux-foundation.org, a.p.zijlstra@chello.nl,
	hannes@cmpxchg.org, hughd@google.com, riel@redhat.com,
	Lee.Schermerhorn@hp.com, aarcange@redhat.com, mgorman@suse.de,
	tglx@linutronix.de, linux-mm@kvack.org
Subject: [tip:numa/core] mm/migration: Improve migrate_misplaced_page()
Date: Mon, 19 Nov 2012 11:44:22 -0800
Message-ID: <tip-292c8cf52d4c65e1f8744e5c7ce774516d868ee8@git.kernel.org>
In-Reply-To: <1353064973-26082-14-git-send-email-mgorman@suse.de>

Commit-ID:  292c8cf52d4c65e1f8744e5c7ce774516d868ee8
Gitweb:     http://git.kernel.org/tip/292c8cf52d4c65e1f8744e5c7ce774516d868ee8
Author:     Mel Gorman <mgorman@suse.de>
AuthorDate: Fri, 16 Nov 2012 11:22:23 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 19 Nov 2012 03:31:22 +0100

mm/migration: Improve migrate_misplaced_page()

Fix, improve and clean up migrate_misplaced_page() so that it
reuses migrate_pages() and checks the zone watermarks to make
sure we don't overload the target node.

This was originally based on Peter's patch "mm/migrate: Introduce
migrate_misplaced_page()" but borrows extremely heavily from Andrea's
"autonuma: memory follows CPU algorithm and task/mm_autonuma stats
collection".

Based-on-work-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Based-on-work-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Based-on-work-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux-MM <linux-mm@kvack.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/1353064973-26082-14-git-send-email-mgorman@suse.de
[ Adapted to the numa/core tree. Kept Mel's patch separate to retain
  original authorship. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/migrate_mode.h |   3 -
 mm/memory.c                  |  13 ++--
 mm/migrate.c                 | 143 +++++++++++++++++++++++++++----------------
 3 files changed, 95 insertions(+), 64 deletions(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 40b37dc..ebf3d89 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -6,14 +6,11 @@
  *	on most operations but not ->writepage as the potential stall time
  *	is too significant
  * MIGRATE_SYNC will block when migrating pages
- * MIGRATE_FAULT called from the fault path to migrate-on-fault for mempolicy
- *	this path has an extra reference count
  */
 enum migrate_mode {
 	MIGRATE_ASYNC,
 	MIGRATE_SYNC_LIGHT,
 	MIGRATE_SYNC,
-	MIGRATE_FAULT,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/mm/memory.c b/mm/memory.c
index 23ad2eb..52ad29d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3492,28 +3492,25 @@ out_pte_upgrade_unlock:
 
 out_unlock:
 	pte_unmap_unlock(ptep, ptl);
-out:
+
 	if (page) {
 		task_numa_fault(page_nid, last_cpu, 1);
 		put_page(page);
 	}
-
+out:
 	return 0;
 
 migrate:
 	pte_unmap_unlock(ptep, ptl);
 
-	if (!migrate_misplaced_page(page, node)) {
-		page_nid = node;
+	if (migrate_misplaced_page(page, node)) {
 		goto out;
 	}
+	page = NULL;
 
 	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (!pte_same(*ptep, entry)) {
-		put_page(page);
-		page = NULL;
+	if (!pte_same(*ptep, entry))
 		goto out_unlock;
-	}
 
 	goto out_pte_upgrade_unlock;
 }
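The mm/migrate.c changes below remove MIGRATE_FAULT's extra expected reference. This works because migrate_misplaced_page() now releases the caller's reference after isolate_lru_page() has taken its own. The following model restates the reference-count test in migrate_page_move_mapping() as it stands after the patch. The model_* names are hypothetical and are not kernel functions. Under this test, an anonymous page without a mapping must hold exactly one reference. A page-cache page must hold two, plus one more when page_has_private() is true.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical restatement of the expected page count after this patch. */
static int model_expected_count(bool has_mapping, bool has_private)
{
	if (!has_mapping)
		return 1;			/* anonymous: the isolation reference only */
	return 2 + (has_private ? 1 : 0);	/* plus page cache ref (and buffers) */
}

/* Returns 0 when migration may proceed, -EAGAIN on an unexpected count. */
static int model_move_mapping_check(int page_count, bool has_mapping,
				    bool has_private)
{
	assert(page_count >= 0);
	if (page_count != model_expected_count(has_mapping, has_private))
		return -EAGAIN;
	return 0;
}
```

Under the old MIGRATE_FAULT scheme, the fault path still held its own reference, so an anonymous page arrived with a count of 2. In this model that count yields -EAGAIN. That is why the caller's reference has to be dropped before migrate_pages() runs.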
diff --git a/mm/migrate.c b/mm/migrate.c
index b89062d..16a4709 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -225,7 +225,7 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
 	struct buffer_head *bh = head;
 
 	/* Simple case, sync compaction */
-	if (mode != MIGRATE_ASYNC && mode != MIGRATE_FAULT) {
+	if (mode != MIGRATE_ASYNC) {
 		do {
 			get_bh(bh);
 			lock_buffer(bh);
@@ -282,19 +282,9 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	int expected_count = 0;
 	void **pslot;
 
-	if (mode == MIGRATE_FAULT) {
-		/*
-		 * MIGRATE_FAULT has an extra reference on the page and
-		 * otherwise acts like ASYNC, no point in delaying the
-		 * fault, we'll try again next time.
-		 */
-		expected_count++;
-	}
-
 	if (!mapping) {
 		/* Anonymous page without mapping */
-		expected_count += 1;
-		if (page_count(page) != expected_count)
+		if (page_count(page) != 1)
 			return -EAGAIN;
 		return 0;
 	}
@@ -304,7 +294,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
  					page_index(page));
 
-	expected_count += 2 + page_has_private(page);
+	expected_count = 2 + page_has_private(page);
 	if (page_count(page) != expected_count ||
 		radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) {
 		spin_unlock_irq(&mapping->tree_lock);
@@ -323,7 +313,7 @@ static int migrate_page_move_mapping(struct address_space *mapping,
 	 * the mapping back due to an elevated page count, we would have to
 	 * block waiting on other references to be dropped.
 	 */
-	if ((mode == MIGRATE_ASYNC || mode == MIGRATE_FAULT) && head &&
+	if (mode == MIGRATE_ASYNC && head &&
 			!buffer_migrate_lock_buffers(head, mode)) {
 		page_unfreeze_refs(page, expected_count);
 		spin_unlock_irq(&mapping->tree_lock);
@@ -531,7 +521,7 @@ int buffer_migrate_page(struct address_space *mapping,
 	 * with an IRQ-safe spinlock held. In the sync case, the buffers
 	 * need to be locked now
 	 */
-	if (mode != MIGRATE_ASYNC && mode != MIGRATE_FAULT)
+	if (mode != MIGRATE_ASYNC)
 		BUG_ON(!buffer_migrate_lock_buffers(head, mode));
 
 	ClearPagePrivate(page);
@@ -697,7 +687,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	struct anon_vma *anon_vma = NULL;
 
 	if (!trylock_page(page)) {
-		if (!force || mode == MIGRATE_ASYNC || mode == MIGRATE_FAULT)
+		if (!force || mode == MIGRATE_ASYNC)
 			goto out;
 
 		/*
@@ -1415,55 +1405,102 @@ int migrate_vmas(struct mm_struct *mm, const nodemask_t *to,
 }
 
 /*
+ * Returns true if this is a safe migration target node for misplaced NUMA
+ * pages. Currently it only checks the watermarks, which is a bit crude.
+ */
+static bool migrate_balanced_pgdat(struct pglist_data *pgdat,
+				   int nr_migrate_pages)
+{
+	int z;
+
+	for (z = pgdat->nr_zones - 1; z >= 0; z--) {
+		struct zone *zone = pgdat->node_zones + z;
+
+		if (!populated_zone(zone))
+			continue;
+
+		if (zone->all_unreclaimable)
+			continue;
+
+		/* Avoid waking kswapd by allocating nr_migrate_pages pages. */
+		if (!zone_watermark_ok(zone, 0,
+				       high_wmark_pages(zone) +
+				       nr_migrate_pages,
+				       0, 0))
+			continue;
+		return true;
+	}
+	return false;
+}
+
+static struct page *alloc_misplaced_dst_page(struct page *page,
+					   unsigned long data,
+					   int **result)
+{
+	int nid = (int) data;
+	struct page *newpage;
+
+	newpage = alloc_pages_exact_node(nid,
+					 (GFP_HIGHUSER_MOVABLE | GFP_THISNODE |
+					  __GFP_NOMEMALLOC | __GFP_NORETRY |
+					  __GFP_NOWARN) &
+					 ~GFP_IOFS, 0);
+	return newpage;
+}
+
+/*
  * Attempt to migrate a misplaced page to the specified destination
- * node.
+ * node. Caller is expected to have an elevated reference count on
+ * the page that will be dropped by this function before returning.
  */
 int migrate_misplaced_page(struct page *page, int node)
 {
-	struct address_space *mapping = page_mapping(page);
-	int page_lru = page_is_file_cache(page);
-	struct page *newpage;
-	int ret = -EAGAIN;
-	gfp_t gfp = GFP_HIGHUSER_MOVABLE;
+	int isolated = 0;
+	LIST_HEAD(migratepages);
 
 	/*
-	 * Never wait for allocations just to migrate on fault, but don't dip
-	 * into reserves. And, only accept pages from the specified node. No
-	 * sense migrating to a different "misplaced" page!
+	 * Don't migrate pages that are mapped in multiple processes.
+	 * TODO: Handle false sharing detection instead of this hammer
 	 */
-	if (mapping)
-		gfp = mapping_gfp_mask(mapping);
-	gfp &= ~__GFP_WAIT;
-	gfp |= __GFP_NOMEMALLOC | GFP_THISNODE;
-
-	newpage = alloc_pages_node(node, gfp, 0);
-	if (!newpage) {
-		ret = -ENOMEM;
+	if (page_mapcount(page) != 1)
 		goto out;
-	}
 
-	if (isolate_lru_page(page)) {
-		ret = -EBUSY;
-		goto put_new;
+	/* Avoid migrating to a node that is nearly full */
+	if (migrate_balanced_pgdat(NODE_DATA(node), 1)) {
+		int page_lru;
+
+		if (isolate_lru_page(page)) {
+			put_page(page);
+			goto out;
+		}
+		isolated = 1;
+
+		/*
+		 * The page is isolated, which takes a reference count, so the
+		 * caller's reference can now be safely dropped without the page
+		 * disappearing underneath us during migration.
+		 */
+		put_page(page);
+
+		page_lru = page_is_file_cache(page);
+		inc_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
+		list_add(&page->lru, &migratepages);
 	}
 
-	inc_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
-	ret = __unmap_and_move(page, newpage, 0, 0, MIGRATE_FAULT);
-	/*
-	 * A page that has been migrated has all references removed and will be
-	 * freed. A page that has not been migrated will have kepts its
-	 * references and be restored.
-	 */
-	dec_zone_page_state(page, NR_ISOLATED_ANON + page_lru);
-	putback_lru_page(page);
-put_new:
-	/*
-	 * Move the new page to the LRU. If migration was not successful
-	 * then this will free the page.
-	 */
-	putback_lru_page(newpage);
+	if (isolated) {
+		int nr_remaining;
+
+		nr_remaining = migrate_pages(&migratepages,
+				alloc_misplaced_dst_page,
+				node, false, MIGRATE_ASYNC);
+		if (nr_remaining) {
+			putback_lru_pages(&migratepages);
+			isolated = 0;
+		}
+	}
+	BUG_ON(!list_empty(&migratepages));
 out:
-	return ret;
+	return isolated;
 }
 
 #endif /* CONFIG_NUMA */
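alloc_misplaced_dst_page() builds its mask from composite GFP flags, so it is easy to miss which bits survive the final `& ~GFP_IOFS`. The sketch below recomputes that expression with illustrative bit values. The M_* names and bit positions are assumptions, not the kernel's __GFP_* values. It expands GFP_HIGHUSER_MOVABLE, GFP_THISNODE (in its CONFIG_NUMA form) and GFP_IOFS, assuming the 3.7-era definitions in include/linux/gfp.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; the real __GFP_* values differ. */
#define M_WAIT		(1u << 0)
#define M_IO		(1u << 1)
#define M_FS		(1u << 2)
#define M_HARDWALL	(1u << 3)
#define M_HIGHMEM	(1u << 4)
#define M_MOVABLE	(1u << 5)
#define M_THISNODE	(1u << 6)
#define M_NOWARN	(1u << 7)
#define M_NORETRY	(1u << 8)
#define M_NOMEMALLOC	(1u << 9)

/* Composite masks, assuming the 3.7-era expansions. */
#define M_GFP_IOFS		(M_IO | M_FS)
#define M_GFP_HIGHUSER_MOVABLE	(M_WAIT | M_IO | M_FS | M_HARDWALL | \
				 M_HIGHMEM | M_MOVABLE)
#define M_GFP_THISNODE		(M_THISNODE | M_NOWARN | M_NORETRY)

/* Same expression shape as the mask in alloc_misplaced_dst_page(). */
static unsigned int model_misplaced_dst_gfp(void)
{
	return (M_GFP_HIGHUSER_MOVABLE | M_GFP_THISNODE |
		M_NOMEMALLOC | M_NORETRY | M_NOWARN) & ~M_GFP_IOFS;
}

/* True if every bit of flag is set in mask. */
static bool model_has(unsigned int mask, unsigned int flag)
{
	assert(flag != 0);	/* a zero flag would match any mask */
	return (mask & flag) == flag;
}
```

The model makes one consequence visible. The wait bit survives the mask, whereas the removed code cleared __GFP_WAIT explicitly. Reclaim is instead restricted by clearing __GFP_IO and __GFP_FS and by setting __GFP_NORETRY.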

