From: Mel Gorman <mgorman@suse.de>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hugh Dickins <hughd@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>, Mel Gorman <mgorman@suse.de>
Subject: [PATCH 20/31] mm: numa: Add pte updates, hinting and migration stats
Date: Tue, 13 Nov 2012 11:12:49 +0000
Message-ID: <1352805180-1607-21-git-send-email-mgorman@suse.de>
In-Reply-To: <1352805180-1607-1-git-send-email-mgorman@suse.de>

It is tricky to quantify the basic cost of automatic NUMA placement in a
meaningful manner. This patch adds some vmstats that can be used as part
of a basic costing model.

u    = basic unit = sizeof(void *)
Ca   = cost of struct page access = sizeof(struct page) / u
Cpte = Cost of PTE access = Ca
Cupdate = Cost of PTE update = (2 * Cpte) + (2 * Wlock)
	where Cpte is incurred twice for a read and a write and Wlock
	is a constant representing the cost of taking or releasing a
	lock
Cnumahint = Cost of a minor page fault = some high constant e.g. 1000
Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u
Ci = Cost of page isolation = Ca + Wi
	where Wi is a constant that should reflect the approximate cost
	of the locking operation
Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma)
	where Wnuma is the approximate NUMA factor. 1 is local. 1.2
	would imply that remote accesses are 20% more expensive

Balancing cost = Cpte * numa_pte_updates +
		Cnumahint * numa_hint_faults +
		Ci * numa_pages_migrated +
		Cpagecopy * numa_pages_migrated

Note that numa_pages_migrated is used as a measure of how many pages
were isolated even though it would miss pages that failed to migrate. A
vmstat counter could have been added for it but the isolation cost is
pretty marginal in comparison to the overall cost so it seemed overkill.
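
As a rough illustration, here is a minimal userspace sketch that plugs a
sample of the counters into the model above. The counter values and the
constants Wlock, Wi and Wnuma are made-up assumptions for illustration
only; real values would be read from /proc/vmstat and tuned per machine.

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical counter sample, e.g. read from /proc/vmstat */
		double numa_pte_updates    = 1000000;
		double numa_hint_faults    = 200000;
		double numa_pages_migrated = 50000;

		/* Model constants: 64-bit pointers, 64-byte struct page assumed */
		double u     = 8;		/* sizeof(void *) */
		double Ca    = 64 / u;		/* struct page access */
		double Cpte  = Ca;		/* PTE access */
		double Cnumahint = 1000;	/* minor fault, "some high constant" */
		double Cpagerw = Ca + 4096 / u;	/* read or write a full page */
		double Wi    = 20;		/* assumed isolation locking cost */
		double Ci    = Ca + Wi;		/* page isolation */
		double Wnuma = 1.2;		/* remote access 20% more expensive */
		double Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma);

		double cost = Cpte      * numa_pte_updates +
			      Cnumahint * numa_hint_faults +
			      Ci        * numa_pages_migrated +
			      Cpagecopy * numa_pages_migrated;

		printf("balancing cost (arbitrary units): %.0f\n", cost);
		return 0;
	}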

The ideal way to measure automatic placement benefit would be to count
the number of remote accesses versus local accesses and do something like

	benefit = (remote_accesses_before - remote_accesses_after) * Wnuma

but the information is not readily available. As a workload converges, the
expectation is that the number of remote NUMA hinting faults drops to 0.

	convergence = numa_hint_faults_local / numa_hint_faults
		where this is measured over the last N NUMA hinting
		faults recorded. When the workload is fully converged
		the value is 1.

This measures whether the placement policy is converging and how quickly it
is doing so.
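
For reference, a minimal sketch of sampling the convergence metric from
userspace, assuming the counter names added by this patch appear in
/proc/vmstat. A single read gives the ratio over everything since boot;
in practice two samples would be taken and the deltas used so the ratio
reflects only the most recent interval.

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char name[64];
		unsigned long long val, faults = 0, faults_local = 0;
		FILE *fp = fopen("/proc/vmstat", "r");

		if (!fp)
			return 1;

		while (fscanf(fp, "%63s %llu", name, &val) == 2) {
			if (!strcmp(name, "numa_hint_faults"))
				faults = val;
			else if (!strcmp(name, "numa_hint_faults_local"))
				faults_local = val;
		}
		fclose(fp);

		if (faults)
			printf("convergence = %.3f\n",
			       (double)faults_local / faults);
		return 0;
	}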

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
---
 include/linux/vm_event_item.h |    6 ++++++
 mm/huge_memory.c              |    1 +
 mm/memory.c                   |   12 ++++++++++++
 mm/mempolicy.c                |    5 +++++
 mm/migrate.c                  |    3 ++-
 mm/vmstat.c                   |    6 ++++++
 6 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index a1f750b..dded0af 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -38,6 +38,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
 		KSWAPD_SKIP_CONGESTION_WAIT,
 		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+#ifdef CONFIG_BALANCE_NUMA
+		NUMA_PTE_UPDATES,
+		NUMA_HINT_FAULTS,
+		NUMA_HINT_FAULTS_LOCAL,
+		NUMA_PAGE_MIGRATE,
+#endif
 #ifdef CONFIG_MIGRATION
 		PGMIGRATE_SUCCESS, PGMIGRATE_FAIL,
 #endif
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 833a601..f45f25b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1033,6 +1033,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	page = pmd_page(pmd);
 	get_page(page);
 	spin_unlock(&mm->page_table_lock);
+	count_vm_event(NUMA_HINT_FAULTS);
 
 	target_nid = mpol_misplaced(page, vma, haddr);
 	if (target_nid == -1)
diff --git a/mm/memory.c b/mm/memory.c
index 73fa203..95c9abb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3457,11 +3457,14 @@ int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(!pte_same(*ptep, pte)))
 		goto out_unlock;
 
+	count_vm_event(NUMA_HINT_FAULTS);
 	page = vm_normal_page(vma, addr, pte);
 	BUG_ON(!page);
 
 	get_page(page);
 	current_nid = page_to_nid(page);
+	if (current_nid == numa_node_id())
+		count_vm_event(NUMA_HINT_FAULTS_LOCAL);
 	target_nid = mpol_misplaced(page, vma, addr);
 	if (target_nid == -1) {
 		/*
@@ -3520,6 +3523,9 @@ int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long offset;
 	spinlock_t *ptl;
 	bool numa = false;
+	int local_nid = numa_node_id();
+	unsigned long nr_faults = 0;
+	unsigned long nr_faults_local = 0;
 
 	spin_lock(&mm->page_table_lock);
 	pmd = *pmdp;
@@ -3566,10 +3572,16 @@ int do_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		curr_nid = page_to_nid(page);
 		task_numa_fault(curr_nid, 1, false);
 
+		nr_faults++;
+		if (curr_nid == local_nid)
+			nr_faults_local++;
+
 		pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 	}
 	pte_unmap_unlock(orig_pte, ptl);
 
+	count_vm_events(NUMA_HINT_FAULTS, nr_faults);
+	count_vm_events(NUMA_HINT_FAULTS_LOCAL, nr_faults_local);
 	return 0;
 }
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 11052ea..860341e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -583,6 +583,7 @@ change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long _address, end;
 	spinlock_t *ptl;
 	int ret = 0;
+	int nr_pte_updates = 0;
 
 	VM_BUG_ON(address & ~PAGE_MASK);
 
@@ -626,6 +627,7 @@ change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		set_pmd_at(mm, address, pmd, pmd_mknuma(*pmd));
 		ret += HPAGE_PMD_NR;
+		nr_pte_updates++;
 		/* defer TLB flush to lower the overhead */
 		spin_unlock(&mm->page_table_lock);
 		goto out;
@@ -652,6 +654,7 @@ change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 
 		set_pte_at(mm, _address, _pte, pte_mknuma(pteval));
+		nr_pte_updates++;
 
 		/* defer TLB flush to lower the overhead */
 		ret++;
@@ -666,6 +669,8 @@ change_prot_numa_range(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 
 out:
+	if (nr_pte_updates)
+		count_vm_events(NUMA_PTE_UPDATES, nr_pte_updates);
 	return ret;
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 631b2c5..a890429 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1517,7 +1517,8 @@ struct page *migrate_misplaced_page(struct page *page, int node)
 		if (nr_remaining) {
 			putback_lru_pages(&migratepages);
 			req.newpage = NULL;
-		}
+		} else
+			count_vm_event(NUMA_PAGE_MIGRATE);
 	}
 	BUG_ON(!list_empty(&migratepages));
 out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3a067fa..cfa386da 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -774,6 +774,12 @@ const char * const vmstat_text[] = {
 
 	"pgrotated",
 
+#ifdef CONFIG_BALANCE_NUMA
+	"numa_pte_updates",
+	"numa_hint_faults",
+	"numa_hint_faults_local",
+	"numa_pages_migrated",
+#endif
 #ifdef CONFIG_MIGRATION
 	"pgmigrate_success",
 	"pgmigrate_fail",
-- 
1.7.9.2


Thread overview: 45+ messages
2012-11-13 11:12 [RFC PATCH 00/31] Foundation for automatic NUMA balancing V2 Mel Gorman
2012-11-13 11:12 ` [PATCH 01/31] mm: compaction: Move migration fail/success stats to migrate.c Mel Gorman
2012-11-13 11:12 ` [PATCH 02/31] mm: migrate: Add a tracepoint for migrate_pages Mel Gorman
2012-11-13 11:12 ` [PATCH 03/31] mm: compaction: Add scanned and isolated counters for compaction Mel Gorman
2012-11-13 11:12 ` [PATCH 04/31] mm: numa: define _PAGE_NUMA Mel Gorman
2012-11-13 11:12 ` [PATCH 05/31] mm: numa: pte_numa() and pmd_numa() Mel Gorman
2012-11-13 11:12 ` [PATCH 06/31] mm: numa: teach gup_fast about pmd_numa Mel Gorman
2012-11-13 11:12 ` [PATCH 07/31] mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte Mel Gorman
2012-11-14 17:13   ` Rik van Riel
2012-11-13 11:12 ` [PATCH 08/31] mm: numa: Create basic numa page hinting infrastructure Mel Gorman
2012-11-13 11:12 ` [PATCH 09/31] mm: mempolicy: Make MPOL_LOCAL a real policy Mel Gorman
2012-11-13 11:12 ` [PATCH 10/31] mm: mempolicy: Add MPOL_MF_NOOP Mel Gorman
2012-11-13 11:12 ` [PATCH 11/31] mm: mempolicy: Check for misplaced page Mel Gorman
2012-11-13 11:12 ` [PATCH 12/31] mm: migrate: Introduce migrate_misplaced_page() Mel Gorman
2012-11-13 11:12 ` [PATCH 13/31] mm: mempolicy: Use _PAGE_NUMA to migrate pages Mel Gorman
2012-11-13 11:12 ` [PATCH 14/31] mm: mempolicy: Add MPOL_MF_LAZY Mel Gorman
2012-11-13 11:12 ` [PATCH 15/31] mm: numa: Add fault driven placement and migration Mel Gorman
2012-11-13 11:12 ` [PATCH 16/31] mm: numa: Only call task_numa_placement for misplaced pages Mel Gorman
2012-11-14 17:58   ` Rik van Riel
2012-11-14 18:18     ` Mel Gorman
2012-11-13 11:12 ` [PATCH 17/31] mm: numa: Avoid double faulting after migrating misplaced page Mel Gorman
2012-11-14 18:00   ` Rik van Riel
2012-11-13 11:12 ` [PATCH 18/31] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate Mel Gorman
2012-11-13 11:12 ` [PATCH 19/31] mm: sched: numa: Implement slow start for working set sampling Mel Gorman
2012-11-13 11:12 ` Mel Gorman [this message]
2012-11-13 11:12 ` [PATCH 21/31] mm: numa: Migrate on reference policy Mel Gorman
2012-11-13 11:12 ` [PATCH 22/31] x86: mm: only do a local tlb flush in ptep_set_access_flags() Mel Gorman
2012-11-13 11:12 ` [PATCH 23/31] x86: mm: drop TLB flush from ptep_set_access_flags Mel Gorman
2012-11-13 11:12 ` [PATCH 24/31] mm,generic: only flush the local TLB in ptep_set_access_flags Mel Gorman
2012-11-13 11:12 ` [PATCH 25/31] sched: numa: Introduce tsk_home_node() Mel Gorman
2012-11-13 11:12 ` [PATCH 26/31] sched: numa: Make mempolicy home-node aware Mel Gorman
2012-11-14 18:22   ` Rik van Riel
2012-11-14 18:50     ` Mel Gorman
2012-11-13 11:12 ` [PATCH 27/31] sched: numa: Make find_busiest_queue() a method Mel Gorman
2012-11-14 18:25   ` Rik van Riel
2012-11-13 11:12 ` [PATCH 28/31] sched: numa: Implement home-node awareness Mel Gorman
2012-11-13 11:12 ` [PATCH 29/31] sched: numa: CPU follows memory Mel Gorman
2012-11-14 11:20   ` Mel Gorman
2012-11-13 11:12 ` [PATCH 30/31] mm: numa: Introduce last_nid to the page frame Mel Gorman
2012-11-13 11:13 ` [PATCH 31/31] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships Mel Gorman
2012-11-13 15:14 ` [RFC PATCH 00/31] Foundation for automatic NUMA balancing V2 Ingo Molnar
2012-11-13 15:42   ` Mel Gorman
2012-11-13 17:27     ` Ingo Molnar
2012-11-14  4:09       ` Rik van Riel
2012-11-14 12:24       ` Mel Gorman
