From: Andrea Arcangeli <aarcange@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Hillf Danton <dhillf@gmail.com>, Dan Smith <danms@us.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
Paul Turner <pjt@google.com>,
Suresh Siddha <suresh.b.siddha@intel.com>,
Mike Galbraith <efault@gmx.de>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Bharata B Rao <bharata.rao@gmail.com>,
Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
Christoph Lameter <cl@linux.com>, Alex Shi <alex.shi@intel.com>,
Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
Don Morris <don.morris@hp.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: [PATCH 04/36] autonuma: pte_numa() and pmd_numa()
Date: Wed, 22 Aug 2012 16:58:48 +0200
Message-ID: <1345647560-30387-5-git-send-email-aarcange@redhat.com>
In-Reply-To: <1345647560-30387-1-git-send-email-aarcange@redhat.com>
Implement pte_numa and pmd_numa.
We must atomically set the numa bit and clear the present bit to
define a pte_numa or pmd_numa.
Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
a thread touches a virtual address in the corresponding virtual range,
a NUMA hinting page fault will trigger. The NUMA hinting page fault
will clear the NUMA bit and set the present bit again to resolve the
page fault.
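As a rough user-space illustration of the state transitions described above (not the kernel code itself), the sketch below models a pte as a plain flags word. The demo_* names and the DEMO_PAGE_* bit positions are assumptions made up for this example; the real bit definitions come from the previous patch in the series and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (not the kernel's). */
#define DEMO_PAGE_PRESENT   (UINT64_C(1) << 0)
#define DEMO_PAGE_ACCESSED  (UINT64_C(1) << 5)
#define DEMO_PAGE_PROTNONE  (UINT64_C(1) << 8)
#define DEMO_PAGE_NUMA_PTE  (UINT64_C(1) << 9)

typedef struct { uint64_t flags; } demo_pte_t;

/* Like pte_present(): a NUMA hinting pte still counts as present. */
static inline int demo_pte_present(demo_pte_t pte)
{
	return (pte.flags & (DEMO_PAGE_PRESENT | DEMO_PAGE_PROTNONE |
			     DEMO_PAGE_NUMA_PTE)) != 0;
}

/* Like pte_numa(): NUMA bit set and present bit clear. */
static inline int demo_pte_numa(demo_pte_t pte)
{
	return (pte.flags & (DEMO_PAGE_NUMA_PTE | DEMO_PAGE_PRESENT)) ==
		DEMO_PAGE_NUMA_PTE;
}

/* Arm the hinting fault: set the NUMA bit, clear the present bit. */
static inline demo_pte_t demo_pte_mknuma(demo_pte_t pte)
{
	pte.flags |= DEMO_PAGE_NUMA_PTE;
	pte.flags &= ~DEMO_PAGE_PRESENT;
	return pte;
}

/* Resolve the hinting fault: clear the NUMA bit, restore present. */
static inline demo_pte_t demo_pte_mknonnuma(demo_pte_t pte)
{
	pte.flags &= ~DEMO_PAGE_NUMA_PTE;
	pte.flags |= DEMO_PAGE_PRESENT | DEMO_PAGE_ACCESSED;
	return pte;
}

/* Walk one pte through the mark/fault cycle and return the final flags. */
static inline uint64_t demo_pte_roundtrip(void)
{
	demo_pte_t pte = { DEMO_PAGE_PRESENT };

	pte = demo_pte_mknuma(pte);
	assert(demo_pte_numa(pte) && demo_pte_present(pte));

	pte = demo_pte_mknonnuma(pte);
	assert(!demo_pte_numa(pte));
	return pte.flags;
}
```

Calling demo_pte_roundtrip() from a small test program exercises the full cycle. Note that this model updates the two bits in separate steps on a private copy; the atomicity requirement stated above applies to the live page table entry and is not modelled here.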
NUMA hinting page faults are used:
1) to fill in the per-thread NUMA statistics kept in the
current->task_autonuma data structure
2) to track the per-node last_nid information in the page structure to
detect false sharing
3) to queue the page mapped by the pte_numa or pmd_numa for async
migration if there have been enough NUMA hinting page faults on the
page coming from remote CPUs
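The three uses listed above could be pictured with the simplified user-space model below. Everything in it is an assumption for illustration: the demo_* structures, the DEMO_MIGRATE_THRESHOLD value and the exact last_nid policy are invented here and are not taken from the later patches in this series, which contain the real logic.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_NODES         4
#define DEMO_MIGRATE_THRESHOLD 2	/* hypothetical "enough faults" value */

/* Hypothetical stand-in for the per-thread statistics. */
struct demo_task_autonuma {
	unsigned long numa_fault[DEMO_MAX_NODES];
};

/* Hypothetical stand-in for the per-page tracking fields. */
struct demo_page {
	int nid;			/* node the page currently lives on */
	int last_nid;			/* node of the last hinting fault */
	unsigned int remote_faults;	/* consecutive remote faults seen */
	bool queued;			/* already queued for migration */
};

/*
 * Model of one NUMA hinting fault taken by a thread running on this_nid.
 * Returns true if the page is queued for migration afterwards.
 * No blocking and no allocation: only counters and flags are updated.
 */
static inline bool demo_numa_hinting_fault(struct demo_task_autonuma *ta,
					   struct demo_page *page,
					   int this_nid)
{
	assert(this_nid >= 0 && this_nid < DEMO_MAX_NODES);
	assert(page->nid >= 0 && page->nid < DEMO_MAX_NODES);

	/* 1) per-thread statistics: which node's memory the thread touches */
	ta->numa_fault[page->nid]++;

	/*
	 * 2) last_nid: a fault from a different node than last time is
	 * treated as possible false sharing, so the count starts over.
	 */
	if (page->last_nid != this_nid) {
		page->last_nid = this_nid;
		page->remote_faults = 0;
	}

	/* 3) queue the page once enough remote faults have accumulated */
	if (this_nid != page->nid && !page->queued) {
		if (++page->remote_faults >= DEMO_MIGRATE_THRESHOLD)
			page->queued = true;
	}
	return page->queued;
}
```

In this model a page touched twice in a row from the same remote node gets queued, while a page bounced between two remote nodes keeps resetting its counter and stays where it is.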
NUMA hinting page faults collect information and possibly add pages to
migrate queues. They are extremely quick, absolutely non-blocking and
do not allocate memory.
The generic implementation is used when CONFIG_AUTONUMA=n.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
arch/x86/include/asm/pgtable.h | 65 ++++++++++++++++++++++++++++++++++++++-
include/asm-generic/pgtable.h | 12 +++++++
2 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b49e70d..bfe42aa 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -405,7 +405,8 @@ static inline int pte_same(pte_t a, pte_t b)
static inline int pte_present(pte_t a)
{
- return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
+ return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+ _PAGE_NUMA_PTE);
}
static inline int pte_hidden(pte_t pte)
@@ -421,7 +422,63 @@ static inline int pmd_present(pmd_t pmd)
* the _PAGE_PSE flag will remain set at all times while the
* _PAGE_PRESENT bit is clear).
*/
- return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
+ return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
+ _PAGE_NUMA_PMD);
+}
+
+#ifdef CONFIG_AUTONUMA
+/*
+ * _PAGE_NUMA_PTE and _PAGE_NUMA_PMD work identically to
+ * _PAGE_PROTNONE. They're set only when _PAGE_PRESENT is not
+ * set and they're never set if _PAGE_PRESENT is set.
+ *
+ * pte/pmd_present() returns true if pte/pmd_numa returns true. Page
+ * fault triggers on those regions if pte/pmd_numa returns true
+ * (because _PAGE_PRESENT is not set).
+ */
+static inline int pte_numa(pte_t pte)
+{
+ return (pte_flags(pte) &
+ (_PAGE_NUMA_PTE|_PAGE_PRESENT)) == _PAGE_NUMA_PTE;
+}
+
+static inline int pmd_numa(pmd_t pmd)
+{
+ return (pmd_flags(pmd) &
+ (_PAGE_NUMA_PMD|_PAGE_PRESENT)) == _PAGE_NUMA_PMD;
+}
+#endif
+
+/*
+ * pte/pmd_mknonnuma set the _PAGE_ACCESSED bitflag automatically
+ * because they're called by the NUMA hinting minor page fault. If we
+ * didn't set the _PAGE_ACCESSED bitflag here, the TLB miss handler
+ * would be forced to set it later while filling the TLB after we
+ * return to userland. That would trigger a second write to memory
+ * that we optimize away by setting _PAGE_ACCESSED here.
+ */
+static inline pte_t pte_mknonnuma(pte_t pte)
+{
+ pte = pte_clear_flags(pte, _PAGE_NUMA_PTE);
+ return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED);
+}
+
+static inline pmd_t pmd_mknonnuma(pmd_t pmd)
+{
+ pmd = pmd_clear_flags(pmd, _PAGE_NUMA_PMD);
+ return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED);
+}
+
+static inline pte_t pte_mknuma(pte_t pte)
+{
+ pte = pte_set_flags(pte, _PAGE_NUMA_PTE);
+ return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+
+static inline pmd_t pmd_mknuma(pmd_t pmd)
+{
+ pmd = pmd_set_flags(pmd, _PAGE_NUMA_PMD);
+ return pmd_clear_flags(pmd, _PAGE_PRESENT);
}
static inline int pmd_none(pmd_t pmd)
@@ -480,6 +537,10 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
static inline int pmd_bad(pmd_t pmd)
{
+#ifdef CONFIG_AUTONUMA
+ if (pmd_numa(pmd))
+ return 0;
+#endif
return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE;
}
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index ff4947b..0ff87ec 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -530,6 +530,18 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
#endif
}
+#ifndef CONFIG_AUTONUMA
+static inline int pte_numa(pte_t pte)
+{
+ return 0;
+}
+
+static inline int pmd_numa(pmd_t pmd)
+{
+ return 0;
+}
+#endif /* CONFIG_AUTONUMA */
+
#endif /* CONFIG_MMU */
#endif /* !__ASSEMBLY__ */