From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hugh Dickins <hughd@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 00/31] Foundation for automatic NUMA balancing V2
Date: Tue, 13 Nov 2012 15:42:15 +0000
Message-ID: <20121113154215.GD8218@suse.de>
In-Reply-To: <20121113151416.GA20044@gmail.com>

On Tue, Nov 13, 2012 at 04:14:16PM +0100, Ingo Molnar wrote:
> 
> * Mel Gorman <mgorman@suse.de> wrote:
> 
> > (Since I wrote this changelog there has been another release 
> > of schednuma. I had delayed releasing this series long enough 
> > and decided not to delay further. Of course, I plan to dig 
> > into that new revision and see what has changed.)
> 
> Thanks, I've picked up a number of cleanups from your series and 
> propagated them into tip:numa/core tree.
> 

Cool.

> FYI, in addition to the specific patches to which I replied to 
> earlier today, I've also propagated all your:
> 
>    CONFIG_SCHED_NUMA -> CONFIG_BALANCE_NUMA
> 
> renames throughout the patches - I fundamentally agree that 
> CONFIG_BALANCE_NUMA is a better, more generic name.
> 
> My structural criticism of the architecture specific bits of 
> your patch-queue still applies to this version as well. That 
> change inflicted much of the changes that you had to do to 
> Peter's patches. It blew up the size of your tree and forks the 
> code into per architecture variants for no good reason.
> 

Should be fairly easy to do what you described -- move the helpers to
generic code and make them weak functions. _PAGE_NUMA still has to be
defined per architecture because each architecture needs to update its
pte_present, pmd_present and pmd_bad to match, but I do not necessarily
consider that a bad thing. Initially, enabling automatic NUMA support
should be a careful per-architecture choice until we can be 100% sure that
treating _PAGE_NUMA like PROT_NONE is equivalent in all cases. A prototype
that moves the definitions to mm/pgtable-generic.c is below.
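
For context, the kind of per-architecture update I mean is along the lines
of what the x86 bits of this series already do, where _PAGE_NUMA aliases
_PAGE_PROTNONE and pte/pmd_present() must keep reporting such entries as
present. Roughly (a sketch, not the exact hunks from the series):

static inline int pte_present(pte_t a)
{
	/* A _PAGE_NUMA page stays "present" although the hardware bit is clear */
	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
			       _PAGE_NUMA);
}

static inline int pmd_present(pmd_t pmd)
{
	/* _PAGE_PSE covers the THP splitting window, as before */
	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE |
				 _PAGE_PSE | _PAGE_NUMA);
}

pmd_bad() then needs the matching check so a _PAGE_NUMA pmd is not treated
as bad, which is what the CONFIG_BALANCE_NUMA block in the pmd_bad() hunk
below is for.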

There is still the task of converting change_prot_numa() to reuse
change_protection() if _PAGE_NUMA == PROT_NONE, but that should be
straightforward.
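
To illustrate the direction -- a sketch only, assuming change_protection()
gets exported and grows a prot_numa argument (that argument does not exist
in either tree yet, it is just how I would expect the reuse to look):

unsigned long change_prot_numa(struct vm_area_struct *vma,
			       unsigned long addr, unsigned long end)
{
	unsigned long nr_updated;

	/* Only valid while the NUMA bit behaves exactly like PROT_NONE */
	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);

	/*
	 * Reuse the ordinary protection changer instead of a separate
	 * walker; prot_numa tells it to lay down _PAGE_NUMA rather than
	 * the supplied protection bits.
	 */
	nr_updated = change_protection(vma, addr, end, vma->vm_page_prot,
				       0 /* dirty_accountable */,
				       1 /* prot_numa */);

	return nr_updated;
}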

> Had you not done that and had you kept the code generic you'd 
> essentially end up close to where tip:numa/core is today.
> 
> So if we can clear that core issue up we'll have quite a bit of 
> agreement.
> 
> I'd also like to add another, structural side note: you mixed 
> new vm-stats bits into the whole queue, needlessly blowing up 
> the size and the mm/ specific portions of the tree. I'd suggest 
> to post and keep those bits separately, preferably on top of 
> what we have already once it has settled down. I'm keeping the 
> 'perf bench numa' bits separate as well.

The stats patches are fairly late in the queue. I noticed they break the
build for !CONFIG_BALANCE_NUMA but that was trivially resolved. I feel they
are important because history has shown the cost of all the balancing
implementations to be fairly high. One can use profiles to see where some
of the cost is, but I also find the vmstats helpful in figuring out how
much work the balancer is doing. They can be dropped again if they are not
considered generally useful.

> 
> Anyway, I've applied all applicable cleanups from you and picked 
> up Peter's latest code with the modifications I've indicated in 
> that thread, to the latest tip:numa/core tree, which I'll send 
> out for review in the next hour or so.
> 

Ok.

> This version is supposed to address all review feedback received 
> so far: it refines the MM specific split-up of the patches, 
> fixes regressions - see the changelogs for more details.
> 
> I'll (re-)send the full series of the latest patches and any 
> additional feedback will be welcome.
> 

Thanks

---8<---
mm: numa: Make pte_numa() and pmd_numa() a generic implementation

It was pointed out by Ingo Molnar that defining the NUMA PTE helper
functions per architecture means that each supporting architecture has to
cut and paste them, which is unfortunate. He suggested instead that the
helpers should be weak functions that can be overridden by the
architecture.

This patch moves the helpers to mm/pgtable-generic.c and makes them weak
functions. Architectures wishing to use this will still be required to
define _PAGE_NUMA and potentially update their p[te|md]_present and
pmd_bad helpers if they choose to make _PAGE_NUMA similar to PROT_NONE.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 arch/x86/include/asm/pgtable.h |   56 +---------------------------------------
 include/asm-generic/pgtable.h  |   17 +++++-------
 mm/pgtable-generic.c           |   53 +++++++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+), 66 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e075d57..4a4c11c 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -425,61 +425,6 @@ static inline int pmd_present(pmd_t pmd)
 				 _PAGE_NUMA);
 }
 
-#ifdef CONFIG_BALANCE_NUMA
-/*
- * _PAGE_NUMA works identical to _PAGE_PROTNONE (it's actually the
- * same bit too). It's set only when _PAGE_PRESET is not set and it's
- * never set if _PAGE_PRESENT is set.
- *
- * pte/pmd_present() returns true if pte/pmd_numa returns true. Page
- * fault triggers on those regions if pte/pmd_numa returns true
- * (because _PAGE_PRESENT is not set).
- */
-static inline int pte_numa(pte_t pte)
-{
-	return (pte_flags(pte) &
-		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
-}
-
-static inline int pmd_numa(pmd_t pmd)
-{
-	return (pmd_flags(pmd) &
-		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
-}
-#endif
-
-/*
- * pte/pmd_mknuma sets the _PAGE_ACCESSED bitflag automatically
- * because they're called by the NUMA hinting minor page fault. If we
- * wouldn't set the _PAGE_ACCESSED bitflag here, the TLB miss handler
- * would be forced to set it later while filling the TLB after we
- * return to userland. That would trigger a second write to memory
- * that we optimize away by setting _PAGE_ACCESSED here.
- */
-static inline pte_t pte_mknonnuma(pte_t pte)
-{
-	pte = pte_clear_flags(pte, _PAGE_NUMA);
-	return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED);
-}
-
-static inline pmd_t pmd_mknonnuma(pmd_t pmd)
-{
-	pmd = pmd_clear_flags(pmd, _PAGE_NUMA);
-	return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED);
-}
-
-static inline pte_t pte_mknuma(pte_t pte)
-{
-	pte = pte_set_flags(pte, _PAGE_NUMA);
-	return pte_clear_flags(pte, _PAGE_PRESENT);
-}
-
-static inline pmd_t pmd_mknuma(pmd_t pmd)
-{
-	pmd = pmd_set_flags(pmd, _PAGE_NUMA);
-	return pmd_clear_flags(pmd, _PAGE_PRESENT);
-}
-
 static inline int pmd_none(pmd_t pmd)
 {
 	/* Only check low word on 32-bit platforms, since it might be
@@ -534,6 +479,7 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
 	return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
 }
 
+extern int pmd_numa(pmd_t pmd);
 static inline int pmd_bad(pmd_t pmd)
 {
 #ifdef CONFIG_BALANCE_NUMA
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 896667e..da3e761 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -554,17 +554,12 @@ static inline int pmd_trans_unstable(pmd_t *pmd)
 #endif
 }
 
-#ifndef CONFIG_BALANCE_NUMA
-static inline int pte_numa(pte_t pte)
-{
-	return 0;
-}
-
-static inline int pmd_numa(pmd_t pmd)
-{
-	return 0;
-}
-#endif /* CONFIG_BALANCE_NUMA */
+extern int pte_numa(pte_t pte);
+extern int pmd_numa(pmd_t pmd);
+extern pte_t pte_mknonnuma(pte_t pte);
+extern pmd_t pmd_mknonnuma(pmd_t pmd);
+extern pte_t pte_mknuma(pte_t pte);
+extern pmd_t pmd_mknuma(pmd_t pmd);
 
 #endif /* CONFIG_MMU */
 
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index e642627..6b6507f 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -170,3 +170,56 @@ void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
+
+/*
+ * _PAGE_NUMA works identically to _PAGE_PROTNONE (it's actually the
+ * same bit too). It's set only when _PAGE_PRESENT is not set and it's
+ * never set if _PAGE_PRESENT is set.
+ *
+ * pte/pmd_present() returns true if pte/pmd_numa returns true. Page
+ * faults trigger on those regions if pte/pmd_numa returns true
+ * (because _PAGE_PRESENT is not set).
+ */
+__weak int pte_numa(pte_t pte)
+{
+	return (pte_flags(pte) &
+		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+
+__weak int pmd_numa(pmd_t pmd)
+{
+	return (pmd_flags(pmd) &
+		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
+}
+
+/*
+ * pte/pmd_mknuma set the _PAGE_ACCESSED bitflag automatically
+ * because they are called from the NUMA hinting minor page fault. If
+ * we did not set the _PAGE_ACCESSED bitflag here, the TLB miss handler
+ * would be forced to set it later while filling the TLB after we
+ * return to userland. That would trigger a second write to memory
+ * that we optimize away by setting _PAGE_ACCESSED here.
+ */
+__weak pte_t pte_mknonnuma(pte_t pte)
+{
+	pte = pte_clear_flags(pte, _PAGE_NUMA);
+	return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED);
+}
+
+__weak pmd_t pmd_mknonnuma(pmd_t pmd)
+{
+	pmd = pmd_clear_flags(pmd, _PAGE_NUMA);
+	return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED);
+}
+
+__weak pte_t pte_mknuma(pte_t pte)
+{
+	pte = pte_set_flags(pte, _PAGE_NUMA);
+	return pte_clear_flags(pte, _PAGE_PRESENT);
+}
+
+__weak pmd_t pmd_mknuma(pmd_t pmd)
+{
+	pmd = pmd_set_flags(pmd, _PAGE_NUMA);
+	return pmd_clear_flags(pmd, _PAGE_PRESENT);
+}

