* [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA
@ 2014-04-07 15:10 Mel Gorman
2014-04-07 15:10 ` [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing Mel Gorman
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 15:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
Aliasing _PAGE_NUMA and _PAGE_PROTNONE had some convenient properties, but
it ultimately gave Xen a headache and pisses off almost everybody who looks
closely at it. Two discussions on "why this makes sense" is one discussion
too many, so rather than having a third there is this series.
Conceptually it's simple -- use an unused physical address bit for _PAGE_NUMA
and make it a 64-bit-only feature on x86. This had been avoided before
because if the physical address space expands we are back to square one,
but let's worry about that when it happens, unless the x86 maintainers or
hardware people warn us that we're about to run headlong into a wall.
Testing was minimal -- short lived JVM and autonumabench tests that trigger
the relevant paths for NUMA balancing. Functionally it did not die miserably.
Performance looks as expected with no major changes.
arch/x86/Kconfig | 2 +-
arch/x86/include/asm/pgtable.h | 8 +++----
arch/x86/include/asm/pgtable_types.h | 44 ++++++++++++++++++++----------------
mm/memory.c | 12 ----------
4 files changed, 29 insertions(+), 37 deletions(-)
--
1.8.4.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>
* [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing
2014-04-07 15:10 [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA Mel Gorman
@ 2014-04-07 15:10 ` Mel Gorman
2014-04-07 15:10 ` [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Mel Gorman
2014-04-07 15:10 ` [PATCH 3/3] mm: Allow FOLL_NUMA on FOLL_FORCE Mel Gorman
2 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 15:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
Automatic NUMA balancing currently depends on reusing the PROT_NONE
bit which has caused problems on Xen. In preparation for using one of
the unused physical address bits this patch requires x86-64 for automatic
NUMA balancing. 32-bit support for NUMA on x86 is no longer interesting
and the loss of automatic NUMA balancing support should be no surprise.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..084b1c1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,7 +26,7 @@ config X86
select ARCH_MIGHT_HAVE_PC_SERIO
select HAVE_AOUT if X86_32
select HAVE_UNSTABLE_SCHED_CLOCK
- select ARCH_SUPPORTS_NUMA_BALANCING
+ select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_INT128 if X86_64
select ARCH_WANTS_PROT_NUMA_PROT_NONE
select HAVE_IDE
--
1.8.4.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 15:10 [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA Mel Gorman
2014-04-07 15:10 ` [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing Mel Gorman
@ 2014-04-07 15:10 ` Mel Gorman
2014-04-07 15:32 ` David Vrabel
2014-04-07 17:37 ` Dave Hansen
2014-04-07 15:10 ` [PATCH 3/3] mm: Allow FOLL_NUMA on FOLL_FORCE Mel Gorman
2 siblings, 2 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 15:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
_PAGE_NUMA is currently an alias of _PAGE_PROTNONE to trap NUMA hinting
faults. As the bit is shared, care is taken that _PAGE_NUMA is only used in
places where _PAGE_PROTNONE could not reach, but this still causes problems
on Xen and is conceptually difficult.
Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults. Due to physical address limitations, bits 52:62
are currently free, so we can use one of them. As the present bit is cleared
when making a NUMA PTE, the hinting faults will still be trapped. It means
that 32-bit x86 cannot use automatic NUMA balancing, but it is improbable
that anyone cares about that configuration.
In the future there will be a problem when the physical address space
expands because the bits may no longer be free. There is also the risk that
the hardware people are planning to use these bits for some other purpose.
When/if this happens then an option would be to use bit 11 and disable
kmemcheck if automatic NUMA balancing is enabled assuming bit 11 has not
been used for something else in the meantime.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
arch/x86/include/asm/pgtable.h | 8 +++----
arch/x86/include/asm/pgtable_types.h | 44 ++++++++++++++++++++----------------
2 files changed, 28 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..58fa7d1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -447,8 +447,8 @@ static inline int pte_same(pte_t a, pte_t b)
static inline int pte_present(pte_t a)
{
- return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
- _PAGE_NUMA);
+ return (pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+ _PAGE_NUMA)) != 0;
}
#define pte_accessible pte_accessible
@@ -477,8 +477,8 @@ static inline int pmd_present(pmd_t pmd)
* the _PAGE_PSE flag will remain set at all times while the
* _PAGE_PRESENT bit is clear).
*/
- return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
- _PAGE_NUMA);
+ return (pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
+ _PAGE_NUMA)) != 0;
}
static inline int pmd_none(pmd_t pmd)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 1aa9ccd..f3eafd2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -25,6 +25,15 @@
#define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
+/*
+ * Software bits ignored by the page table walker
+ * At the time of writing, different levels have bits that are ignored. Due
+ * to physical address limitations, bits 52:62 should be ignored for the PMD
+ * and PTE levels and are available for use by software. Be aware that this
+ * may change if the physical address space expands.
+ */
+#define _PAGE_BIT_NUMA 62
+
/* If _PAGE_BIT_PRESENT is clear, we use these: */
/* - if the user mapped it with PROT_NONE; pte_present gives true */
#define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
@@ -56,6 +65,21 @@
#endif
/*
+ * _PAGE_NUMA distinguishes between a numa hinting minor fault and a page
+ * that is not present. The hinting fault gathers numa placement statistics
+ * (see pte_numa()). The bit is always zero when the PTE is not present.
+ *
+ * The bit picked must be always zero when the pmd is present and not
+ * present, so that we don't lose information when we set it while
+ * atomically clearing the present bit.
+ */
+#ifdef CONFIG_NUMA_BALANCING
+#define _PAGE_NUMA (_AT(pteval_t, 1) << _PAGE_BIT_NUMA)
+#else
+#define _PAGE_NUMA (_AT(pteval_t, 0))
+#endif
+
+/*
* The same hidden bit is used by kmemcheck, but since kmemcheck
* works on kernel pages while soft-dirty engine on user space,
* they do not conflict with each other.
@@ -94,26 +118,6 @@
#define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE)
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
-/*
- * _PAGE_NUMA indicates that this page will trigger a numa hinting
- * minor page fault to gather numa placement statistics (see
- * pte_numa()). The bit picked (8) is within the range between
- * _PAGE_FILE (6) and _PAGE_PROTNONE (8) bits. Therefore, it doesn't
- * require changes to the swp entry format because that bit is always
- * zero when the pte is not present.
- *
- * The bit picked must be always zero when the pmd is present and not
- * present, so that we don't lose information when we set it while
- * atomically clearing the present bit.
- *
- * Because we shared the same bit (8) with _PAGE_PROTNONE this can be
- * interpreted as _PAGE_NUMA only in places that _PAGE_PROTNONE
- * couldn't reach, like handle_mm_fault() (see access_error in
- * arch/x86/mm/fault.c, the vma protection must not be PROT_NONE for
- * handle_mm_fault() to be invoked).
- */
-#define _PAGE_NUMA _PAGE_PROTNONE
-
#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_DIRTY)
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
--
1.8.4.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] mm: Allow FOLL_NUMA on FOLL_FORCE
2014-04-07 15:10 [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA Mel Gorman
2014-04-07 15:10 ` [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing Mel Gorman
2014-04-07 15:10 ` [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Mel Gorman
@ 2014-04-07 15:10 ` Mel Gorman
2 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 15:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
As _PAGE_NUMA is no longer aliased to _PAGE_PROTNONE there should be no
confusion between them. It should be possible to kick away the special
casing in __get_user_pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/memory.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 22dfa61..b9c35a7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1714,18 +1714,6 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
vm_flags &= (gup_flags & FOLL_FORCE) ?
(VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
- /*
- * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault
- * would be called on PROT_NONE ranges. We must never invoke
- * handle_mm_fault on PROT_NONE ranges or the NUMA hinting
- * page faults would unprotect the PROT_NONE ranges if
- * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd
- * bitflag. So to avoid that, don't set FOLL_NUMA if
- * FOLL_FORCE is set.
- */
- if (!(gup_flags & FOLL_FORCE))
- gup_flags |= FOLL_NUMA;
-
i = 0;
do {
--
1.8.4.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 15:10 ` [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Mel Gorman
@ 2014-04-07 15:32 ` David Vrabel
2014-04-07 15:49 ` Mel Gorman
2014-04-07 17:37 ` Dave Hansen
1 sibling, 1 reply; 29+ messages in thread
From: David Vrabel @ 2014-04-07 15:32 UTC (permalink / raw)
To: Mel Gorman
Cc: Linus Torvalds, Cyrill Gorcunov, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML
On 07/04/14 16:10, Mel Gorman wrote:
> _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
> faults. As the bit is shared care is taken that _PAGE_NUMA is only used in
> places where _PAGE_PROTNONE could not reach but this still causes problems
> on Xen and conceptually difficult.
The problem with Xen guests occurred because mprotect() /was/ confusing
PROTNONE mappings with _PAGE_NUMA and clearing the non-existent NUMA hints.
David
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 15:32 ` David Vrabel
@ 2014-04-07 15:49 ` Mel Gorman
2014-04-07 16:19 ` Cyrill Gorcunov
0 siblings, 1 reply; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 15:49 UTC (permalink / raw)
To: David Vrabel
Cc: Linus Torvalds, Cyrill Gorcunov, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML
On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote:
> On 07/04/14 16:10, Mel Gorman wrote:
> > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
> > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in
> > places where _PAGE_PROTNONE could not reach but this still causes problems
> > on Xen and conceptually difficult.
>
> The problem with Xen guests occurred because mprotect() /was/ confusing
> PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints.
>
I didn't bother spelling it out in case I gave the impression that I was
blaming Xen for the problem. As the bit has now changed, does it help
the Xen problem or cause another collision of some sort? There is no
guarantee _PAGE_NUMA will remain as bit 62, but at worst it'll use bit 11
and NUMA_BALANCING will depend on !KMEMCHECK.
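For reference, that worst-case fallback would presumably look something like
this in arch/x86/Kconfig (hypothetical -- the series as posted only adds the
X86_64 condition):

```kconfig
# Hypothetical: if _PAGE_NUMA had to fall back to bit 11, which
# kmemcheck also uses for its hidden pages, the two would have to be
# mutually exclusive at config time.
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 && !KMEMCHECK
```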
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 15:49 ` Mel Gorman
@ 2014-04-07 16:19 ` Cyrill Gorcunov
2014-04-07 18:28 ` Mel Gorman
0 siblings, 1 reply; 29+ messages in thread
From: Cyrill Gorcunov @ 2014-04-07 16:19 UTC (permalink / raw)
To: Mel Gorman
Cc: David Vrabel, Linus Torvalds, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 07, 2014 at 04:49:35PM +0100, Mel Gorman wrote:
> On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote:
> > On 07/04/14 16:10, Mel Gorman wrote:
> > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
> > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in
> > > places where _PAGE_PROTNONE could not reach but this still causes problems
> > > on Xen and conceptually difficult.
> >
> > The problem with Xen guests occurred because mprotect() /was/ confusing
> > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints.
>
> I didn't bother spelling it out in case I gave the impression that I was
> blaming Xen for the problem. As the bit is now changes, does it help
> the Xen problem or cause another collision of some sort? There is no
> guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11
> and NUMA_BALANCING will depend in !KMEMCHECK.
Fwiw, we're using bit 11 for soft-dirty tracking, so I really hope the worst
case never happens. (At the moment I'm trying to figure out whether, with this
set, it would be possible to clean up the ugly macros in pgoff_to_pte for
2-level page tables.)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 15:10 ` [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Mel Gorman
2014-04-07 15:32 ` David Vrabel
@ 2014-04-07 17:37 ` Dave Hansen
1 sibling, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2014-04-07 17:37 UTC (permalink / raw)
To: Mel Gorman, Linus Torvalds
Cc: Cyrill Gorcunov, Peter Anvin, Ingo Molnar, Steven Noonan,
Rik van Riel, David Vrabel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML
On 04/07/2014 08:10 AM, Mel Gorman wrote:
> +/*
> + * Software bits ignored by the page table walker
> + * At the time of writing, different levels have bits that are ignored. Due
> + * to physical address limitations, bits 52:62 should be ignored for the PMD
> + * and PTE levels and are available for use by software. Be aware that this
> + * may change if the physical address space expands.
> + */
> +#define _PAGE_BIT_NUMA 62
Doesn't moving it up to the high bits break pte_modify()'s assumptions?
I was thinking of this nugget from change_pte_range():
ptent = ptep_modify_prot_start(mm, addr, pte);
if (pte_numa(ptent))
ptent = pte_mknonnuma(ptent);
ptent = pte_modify(ptent, newprot);
pte_modify() pulls off all the high bits out of 'ptent' and only adds
them back if they're in newprot (which as far as I can tell comes from
the VMA). So I _think_ it'll axe the _PAGE_NUMA out of 'ptent'.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 16:19 ` Cyrill Gorcunov
@ 2014-04-07 18:28 ` Mel Gorman
2014-04-07 19:16 ` Cyrill Gorcunov
2014-04-07 19:27 ` H. Peter Anvin
0 siblings, 2 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 18:28 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: David Vrabel, Linus Torvalds, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 07, 2014 at 08:19:10PM +0400, Cyrill Gorcunov wrote:
> On Mon, Apr 07, 2014 at 04:49:35PM +0100, Mel Gorman wrote:
> > On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote:
> > > On 07/04/14 16:10, Mel Gorman wrote:
> > > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
> > > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in
> > > > places where _PAGE_PROTNONE could not reach but this still causes problems
> > > > on Xen and conceptually difficult.
> > >
> > > The problem with Xen guests occurred because mprotect() /was/ confusing
> > > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints.
> >
> > I didn't bother spelling it out in case I gave the impression that I was
> > blaming Xen for the problem. As the bit is now changes, does it help
> > the Xen problem or cause another collision of some sort? There is no
> > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11
> > and NUMA_BALANCING will depend in !KMEMCHECK.
>
> Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case
> never happen. (At the moment I'm trying to figure out if with this set
> it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages).
I had considered the soft-dirty tracking usage of the same bit. I thought I'd
be able to swizzle around it, or in a further worst case make soft-dirty and
automatic NUMA balancing mutually exclusive. Unfortunately, upon examination
it's not obvious how to have both of them share a bit, and I suspect any
attempt to do so will break CRIU. In my current tree, NUMA_BALANCING cannot be
set if MEM_SOFT_DIRTY is, which is not particularly satisfactory. Next on the
list is examining whether _PAGE_BIT_IOMAP can be used.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 18:28 ` Mel Gorman
@ 2014-04-07 19:16 ` Cyrill Gorcunov
2014-04-07 19:27 ` H. Peter Anvin
1 sibling, 0 replies; 29+ messages in thread
From: Cyrill Gorcunov @ 2014-04-07 19:16 UTC (permalink / raw)
To: Mel Gorman
Cc: David Vrabel, Linus Torvalds, Peter Anvin, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 07, 2014 at 07:28:54PM +0100, Mel Gorman wrote:
> > > I didn't bother spelling it out in case I gave the impression that I was
> > > blaming Xen for the problem. As the bit is now changes, does it help
> > > the Xen problem or cause another collision of some sort? There is no
> > > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11
> > > and NUMA_BALANCING will depend in !KMEMCHECK.
> >
> > Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case
> > never happen. (At the moment I'm trying to figure out if with this set
> > it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages).
>
> I had considered the soft-dirty tracking usage of the same bit. I thought I'd
> be able to swizzle around it or a further worst case of having soft-dirty and
> automatic NUMA balancing mutually exclusive. Unfortunately upon examination
> it's not obvious how to have both of them share a bit and I suspect any
> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> list is examining if _PAGE_BIT_IOMAP can be used.
Thanks for the info, Mel! It seems that if no more space is left on x86-64 (in
the very worst case, which I still think won't happen anytime soon) we'll
have to make them mutually exclusive. But for now (with bit 62 used for NUMA)
they can live together, right?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 18:28 ` Mel Gorman
2014-04-07 19:16 ` Cyrill Gorcunov
@ 2014-04-07 19:27 ` H. Peter Anvin
2014-04-07 19:36 ` Cyrill Gorcunov
2014-04-07 21:19 ` Mel Gorman
1 sibling, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2014-04-07 19:27 UTC (permalink / raw)
To: Mel Gorman, Cyrill Gorcunov
Cc: David Vrabel, Linus Torvalds, Ingo Molnar, Steven Noonan,
Rik van Riel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli,
Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On 04/07/2014 11:28 AM, Mel Gorman wrote:
>
> I had considered the soft-dirty tracking usage of the same bit. I thought I'd
> be able to swizzle around it or a further worst case of having soft-dirty and
> automatic NUMA balancing mutually exclusive. Unfortunately upon examination
> it's not obvious how to have both of them share a bit and I suspect any
> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> list is examining if _PAGE_BIT_IOMAP can be used.
>
Didn't we smoke the last user of _PAGE_BIT_IOMAP?
-hpa
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 19:27 ` H. Peter Anvin
@ 2014-04-07 19:36 ` Cyrill Gorcunov
2014-04-07 19:42 ` H. Peter Anvin
2014-04-08 9:31 ` David Vrabel
2014-04-07 21:19 ` Mel Gorman
1 sibling, 2 replies; 29+ messages in thread
From: Cyrill Gorcunov @ 2014-04-07 19:36 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Mel Gorman, David Vrabel, Linus Torvalds, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
> On 04/07/2014 11:28 AM, Mel Gorman wrote:
> >
> > I had considered the soft-dirty tracking usage of the same bit. I thought I'd
> > be able to swizzle around it or a further worst case of having soft-dirty and
> > automatic NUMA balancing mutually exclusive. Unfortunately upon examination
> > it's not obvious how to have both of them share a bit and I suspect any
> > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> > list is examining if _PAGE_BIT_IOMAP can be used.
>
> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
Seems so, at least for non-kernel pages (not considering the references to
this bit in Xen code, which I simply don't know, but I guess it's used for
kernel pages only).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 19:36 ` Cyrill Gorcunov
@ 2014-04-07 19:42 ` H. Peter Anvin
2014-04-07 21:25 ` Mel Gorman
2014-04-08 9:31 ` David Vrabel
1 sibling, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2014-04-07 19:42 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Mel Gorman, David Vrabel, Linus Torvalds, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
> On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>>>
>>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd
>>> be able to swizzle around it or a further worst case of having soft-dirty and
>>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination
>>> it's not obvious how to have both of them share a bit and I suspect any
>>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
>>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
>>> list is examining if _PAGE_BIT_IOMAP can be used.
>>
>> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>
> Seems so, at least for non-kernel pages (not considering this bit references in
> xen code, which i simply don't know but i guess it's used for kernel pages only).
>
David Vrabel has a patchset which I presumed would be pulled through the
Xen tree this merge window:
[PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove
_PAGE_IOMAP)
That frees up this bit.
-hpa
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 19:27 ` H. Peter Anvin
2014-04-07 19:36 ` Cyrill Gorcunov
@ 2014-04-07 21:19 ` Mel Gorman
1 sibling, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 21:19 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Cyrill Gorcunov, David Vrabel, Linus Torvalds, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
> On 04/07/2014 11:28 AM, Mel Gorman wrote:
> >
> > I had considered the soft-dirty tracking usage of the same bit. I thought I'd
> > be able to swizzle around it or a further worst case of having soft-dirty and
> > automatic NUMA balancing mutually exclusive. Unfortunately upon examination
> > it's not obvious how to have both of them share a bit and I suspect any
> > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> > list is examining if _PAGE_BIT_IOMAP can be used.
> >
>
> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>
There are still some users of _PAGE_IOMAP, with Xen being the main one.
For x86 on bare metal it looks like userspace should never have a PTE with
_PAGE_IOMAP set, so it should be usable as _PAGE_NUMA. Patches that do that
are currently being tested, but a side-effect was that I had to disable
support on Xen, as Xen appears to use the bit to distinguish between Xen PTEs
and MFNs. It's unclear what automatic NUMA balancing on Xen even means --
are NUMA nodes always mapped to the physical topology? What is sensible
behaviour if guest and host both run it? etc. If they need it, we can then
examine the proper way to support _PAGE_NUMA on Xen.
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 19:42 ` H. Peter Anvin
@ 2014-04-07 21:25 ` Mel Gorman
2014-04-08 4:04 ` Steven Noonan
0 siblings, 1 reply; 29+ messages in thread
From: Mel Gorman @ 2014-04-07 21:25 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Cyrill Gorcunov, David Vrabel, Linus Torvalds, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote:
> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
> >> On 04/07/2014 11:28 AM, Mel Gorman wrote:
> >>>
> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd
> >>> be able to swizzle around it or a further worst case of having soft-dirty and
> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination
> >>> it's not obvious how to have both of them share a bit and I suspect any
> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> >>> list is examining if _PAGE_BIT_IOMAP can be used.
> >>
> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
> >
> > Seems so, at least for non-kernel pages (not considering this bit references in
> > xen code, which i simply don't know but i guess it's used for kernel pages only).
> >
>
> David Vrabel has a patchset which I presumed would be pulled through the
> Xen tree this merge window:
>
> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove
> _PAGE_IOMAP)
>
> That frees up this bit.
>
Thanks, I was not aware of that patch. Based on it, I intend to force
automatic NUMA balancing to depend on !XEN and see what the reaction is. If
support for Xen is really required then it can potentially be re-enabled if/when
that series is merged, assuming they do not need the bit for something else.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 21:25 ` Mel Gorman
@ 2014-04-08 4:04 ` Steven Noonan
2014-04-08 15:16 ` H. Peter Anvin
0 siblings, 1 reply; 29+ messages in thread
From: Steven Noonan @ 2014-04-08 4:04 UTC (permalink / raw)
To: Mel Gorman
Cc: H. Peter Anvin, Cyrill Gorcunov, David Vrabel, Linus Torvalds,
Ingo Molnar, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman <mgorman@suse.de> wrote:
> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote:
>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>> >>>
>> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd
>> >>> be able to swizzle around it or a further worst case of having soft-dirty and
>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination
>> >>> it's not obvious how to have both of them share a bit and I suspect any
>> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
>> >>> list is examining if _PAGE_BIT_IOMAP can be used.
>> >>
>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>> >
>> > Seems so, at least for non-kernel pages (not considering this bit references in
>> > xen code, which i simply don't know but i guess it's used for kernel pages only).
>> >
>>
>> David Vrabel has a patchset which I presumed would be pulled through the
>> Xen tree this merge window:
>>
>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove
>> _PAGE_IOMAP)
>>
>> That frees up this bit.
>>
>
> Thanks, I was not aware of that patch. Based on it, I intend to force
> automatic NUMA balancing to depend on !XEN and see what the reaction is. If
> support for Xen is really required then it potentially be re-enabled if/when
> that series is merged assuming they do not need the bit for something else.
>
Amazon EC2 does have large memory instance types with NUMA exposed to
the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
(to me anyway) if we didn't require !XEN.
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-07 19:36 ` Cyrill Gorcunov
2014-04-07 19:42 ` H. Peter Anvin
@ 2014-04-08 9:31 ` David Vrabel
1 sibling, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-08 9:31 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: H. Peter Anvin, Mel Gorman, Linus Torvalds, Ingo Molnar,
Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On 07/04/14 20:36, Cyrill Gorcunov wrote:
> On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>>>
>>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd
>>> be able to swizzle around it or a further worst case of having soft-dirty and
>>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination
>>> it's not obvious how to have both of them share a bit and I suspect any
>>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
>>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
>>> list is examining if _PAGE_BIT_IOMAP can be used.
>>
>> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
Not yet.
A last-minute regression with mapping of I/O regions from userspace was
found, so I had to drop the series from 3.15. It should be back for 3.16.
> Seems so, at least for non-kernel pages (not considering this bit references in
> xen code, which i simply don't know but i guess it's used for kernel pages only).
Xen uses it for all I/O mappings, both kernel and for userspace.
David
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 4:04 ` Steven Noonan
@ 2014-04-08 15:16 ` H. Peter Anvin
2014-04-08 16:02 ` Konrad Rzeszutek Wilk
2014-04-08 20:51 ` Steven Noonan
0 siblings, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2014-04-08 15:16 UTC (permalink / raw)
To: Steven Noonan, Mel Gorman
Cc: Cyrill Gorcunov, David Vrabel, Linus Torvalds, Ingo Molnar,
Rik van Riel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli,
Linux-MM, Linux-X86, LKML, Pavel Emelyanov
<snark>
Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :(
On April 7, 2014 9:04:53 PM PDT, Steven Noonan <steven@uplinklabs.net> wrote:
>On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman <mgorman@suse.de> wrote:
>> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote:
>>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
>>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>>> >>>
>>> >>> I had considered the soft-dirty tracking usage of the same bit.
>I thought I'd
>>> >>> be able to swizzle around it or a further worst case of having
>soft-dirty and
>>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon
>examination
>>> >>> it's not obvious how to have both of them share a bit and I
>suspect any
>>> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING
>cannot be
>>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory.
>Next on the
>>> >>> list is examining if _PAGE_BIT_IOMAP can be used.
>>> >>
>>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>>> >
>>> > Seems so, at least for non-kernel pages (not considering this bit
>references in
>>> > xen code, which i simply don't know but i guess it's used for
>kernel pages only).
>>> >
>>>
>>> David Vrabel has a patchset which I presumed would be pulled through
>the
>>> Xen tree this merge window:
>>>
>>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and
>remove
>>> _PAGE_IOMAP)
>>>
>>> That frees up this bit.
>>>
>>
>> Thanks, I was not aware of that patch. Based on it, I intend to
>force
>> automatic NUMA balancing to depend on !XEN and see what the reaction
>is. If
>> support for Xen is really required then it potentially be re-enabled
>if/when
>> that series is merged assuming they do not need the bit for something
>else.
>>
>
>Amazon EC2 does have large memory instance types with NUMA exposed to
>the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>(to me anyway) if we didn't require !XEN.
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 15:16 ` H. Peter Anvin
@ 2014-04-08 16:02 ` Konrad Rzeszutek Wilk
2014-04-08 16:16 ` H. Peter Anvin
2014-04-08 16:51 ` Mel Gorman
2014-04-08 20:51 ` Steven Noonan
1 sibling, 2 replies; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-08 16:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Steven Noonan, Mel Gorman, Cyrill Gorcunov, David Vrabel,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
.snip..
> >>> David Vrabel has a patchset which I presumed would be pulled through
> >the
> >>> Xen tree this merge window:
> >>>
> >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and
> >remove
> >>> _PAGE_IOMAP)
> >>>
> >>> That frees up this bit.
> >>>
> >>
> >> Thanks, I was not aware of that patch. Based on it, I intend to
> >force
> >> automatic NUMA balancing to depend on !XEN and see what the reaction
> >is. If
> >> support for Xen is really required then it potentially be re-enabled
> >if/when
> >> that series is merged assuming they do not need the bit for something
> >else.
> >>
> >
> >Amazon EC2 does have large memory instance types with NUMA exposed to
> >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
> >(to me anyway) if we didn't require !XEN.
What about the patch that David Vrabel posted:
http://osdir.com/ml/general/2014-03/msg41979.html
Has anybody taken it for a spin?
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 16:02 ` Konrad Rzeszutek Wilk
@ 2014-04-08 16:16 ` H. Peter Anvin
2014-04-08 16:47 ` Mel Gorman
2014-04-08 16:50 ` David Vrabel
2014-04-08 16:51 ` Mel Gorman
1 sibling, 2 replies; 29+ messages in thread
From: H. Peter Anvin @ 2014-04-08 16:16 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Steven Noonan, Mel Gorman, Cyrill Gorcunov, David Vrabel,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote:
>>>
>>> Amazon EC2 does have large memory instance types with NUMA exposed to
>>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>>> (to me anyway) if we didn't require !XEN.
>
> What about the patch that David Vrabel posted:
>
> http://osdir.com/ml/general/2014-03/msg41979.html
>
> Has anybody taken it for a spin?
>
Oh lovely, more pvops in low level paths. I'm so thrilled.
Incidentally, I wasn't even Cc:'d on that patch and was only added to
the thread by Linus, but never saw the early bits of the thread
including the actual patch.
-hpa
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 16:16 ` H. Peter Anvin
@ 2014-04-08 16:47 ` Mel Gorman
2014-04-08 16:50 ` David Vrabel
1 sibling, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-08 16:47 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, Steven Noonan, Cyrill Gorcunov,
David Vrabel, Linus Torvalds, Ingo Molnar, Rik van Riel,
Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM,
Linux-X86, LKML, Pavel Emelyanov
On Tue, Apr 08, 2014 at 09:16:49AM -0700, H. Peter Anvin wrote:
> On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote:
> >>>
> >>> Amazon EC2 does have large memory instance types with NUMA exposed to
> >>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
> >>> (to me anyway) if we didn't require !XEN.
> >
> > What about the patch that David Vrabel posted:
> >
> > http://osdir.com/ml/general/2014-03/msg41979.html
> >
> > Has anybody taken it for a spin?
> >
>
> Oh lovely, more pvops in low level paths. I'm so thrilled.
>
> Incidentally, I wasn't even Cc:'d on that patch and was only added to
> the thread by Linus, but never saw the early bits of the thread
> including the actual patch.
>
I posted an alternative to that patch that confines the damage to the
NUMA pte helpers.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 16:16 ` H. Peter Anvin
2014-04-08 16:47 ` Mel Gorman
@ 2014-04-08 16:50 ` David Vrabel
1 sibling, 0 replies; 29+ messages in thread
From: David Vrabel @ 2014-04-08 16:50 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, Steven Noonan, Mel Gorman, Cyrill Gorcunov,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On 08/04/14 17:16, H. Peter Anvin wrote:
> On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> Amazon EC2 does have large memory instance types with NUMA exposed to
>>>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>>>> (to me anyway) if we didn't require !XEN.
>>
>> What about the patch that David Vrabel posted:
>>
>> http://osdir.com/ml/general/2014-03/msg41979.html
>>
>> Has anybody taken it for a spin?
>>
>
> Oh lovely, more pvops in low level paths. I'm so thrilled.
>
> Incidentally, I wasn't even Cc:'d on that patch and was only added to
> the thread by Linus, but never saw the early bits of the thread
> including the actual patch.
I did resend a version CC'd to all the x86 maintainers and included some
performance figures for native (~1 extra clock cycle).
I've included it again below.
My preference would be to take this patch as it fixes the problem for both
NUMA balancing and any future uses that want to set/clear _PAGE_PRESENT.
David
8<--------------
x86: use pv-ops in {pte, pmd}_{set,clear}_flags()
Instead of using native functions to operate on the PTEs in
pte_set_flags(), pte_clear_flags(), pmd_set_flags(), pmd_clear_flags()
use the PV aware ones.
This fixes a regression in Xen PV guests introduced by 1667918b6483
(mm: numa: clear numa hinting information on mprotect).
This has negligible performance impact on native since the pte_val()
and __pte() (etc.) calls are patched at runtime when running on bare
metal. Measurements on a 3 GHz AMD 4284 give approx. 0.3 ns (~1 clock
cycle) of additional time for each function.
Xen PV guest page tables require that their entries use machine
addresses if the present bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses. This is because on migration only the MFNs in present PTEs
are translated to PFNs (canonicalised) so that they may be translated
back to the new MFNs in the destination domain (uncanonicalised).
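The canonicalisation rule can be sketched as a toy model (illustrative
only, not Xen code; the m2p()/p2m() mappings and their offsets are
invented for the example):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_PRESENT 0x1ULL
#define ADDR_MASK    (~0xfffULL)

/* Hypothetical stand-ins for the machine-to-physical and
 * physical-to-machine tables; the offsets are invented. */
static uint64_t m2p(uint64_t mfn) { return mfn - 1000; }
static uint64_t p2m(uint64_t pfn) { return pfn + 2000; }

/* On save, only present PTEs hold MFNs, so only those are rewritten
 * to PFNs; non-present PTEs already hold PFNs and are left alone. */
static uint64_t canonicalise(uint64_t pte)
{
    if (!(pte & PAGE_PRESENT))
        return pte;
    return (m2p(pte >> 12) << 12) | (pte & ~ADDR_MASK);
}

/* On restore, present PTEs are rewritten to the destination
 * domain's (different) MFNs. */
static uint64_t uncanonicalise(uint64_t pte)
{
    if (!(pte & PAGE_PRESENT))
        return pte;
    return (p2m(pte >> 12) << 12) | (pte & ~ADDR_MASK);
}
```

A non-present PTE passes through both functions unchanged, which is
exactly why a helper that clears _PAGE_PRESENT must leave a
pseudo-physical address behind.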
pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set
and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.
In a Xen PV guest, these functions must translate MFNs to PFNs when
clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.12+]
---
arch/x86/include/asm/pgtable.h | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..323e5e2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -174,16 +174,16 @@ static inline int has_transparent_hugepage(void)
static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
{
- pteval_t v = native_pte_val(pte);
+ pteval_t v = pte_val(pte);
- return native_make_pte(v | set);
+ return __pte(v | set);
}
static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
{
- pteval_t v = native_pte_val(pte);
+ pteval_t v = pte_val(pte);
- return native_make_pte(v & ~clear);
+ return __pte(v & ~clear);
}
static inline pte_t pte_mkclean(pte_t pte)
@@ -248,14 +248,14 @@ static inline pte_t pte_mkspecial(pte_t pte)
static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
{
- pmdval_t v = native_pmd_val(pmd);
+ pmdval_t v = pmd_val(pmd);
return __pmd(v | set);
}
static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
{
- pmdval_t v = native_pmd_val(pmd);
+ pmdval_t v = pmd_val(pmd);
return __pmd(v & ~clear);
}
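The effect of the change can be sketched with a toy model (illustrative
only, not kernel or Xen code; the translation offset and helper names
are invented): clearing _PAGE_PRESENT through the raw native helpers
leaves a machine address behind in a now non-present PTE, while going
through the pv-aware pte_val()/__pte() pair yields the pseudo-physical
address that migration expects.

```c
#include <assert.h>

typedef unsigned long long pteval_t;

#define PTE_PRESENT 0x1ULL
#define MFN_OFFSET  (0x100ULL << 12) /* invented PFN -> MFN offset */

/* Toy pv hooks: reading a present PTE translates its machine address
 * back to pseudo-physical; constructing one does the reverse.
 * Non-present PTEs are left alone. */
static pteval_t toy_pte_val(pteval_t v)
{
    return (v & PTE_PRESENT) ? v - MFN_OFFSET : v;
}
static pteval_t toy_make_pte(pteval_t v)
{
    return (v & PTE_PRESENT) ? v + MFN_OFFSET : v;
}

/* Pre-patch behaviour: operate on the raw (machine) value. */
static pteval_t native_clear_flags(pteval_t pte, pteval_t clear)
{
    return pte & ~clear;
}

/* Post-patch behaviour: unwrap, modify, rewrap through the pv hooks. */
static pteval_t pv_clear_flags(pteval_t pte, pteval_t clear)
{
    pteval_t v = toy_pte_val(pte);
    return toy_make_pte(v & ~clear);
}
```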
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 16:02 ` Konrad Rzeszutek Wilk
2014-04-08 16:16 ` H. Peter Anvin
@ 2014-04-08 16:51 ` Mel Gorman
2014-04-09 15:18 ` Konrad Rzeszutek Wilk
1 sibling, 1 reply; 29+ messages in thread
From: Mel Gorman @ 2014-04-08 16:51 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: H. Peter Anvin, Steven Noonan, Cyrill Gorcunov, David Vrabel,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote:
> .snip..
> > >>> David Vrabel has a patchset which I presumed would be pulled through
> > >the
> > >>> Xen tree this merge window:
> > >>>
> > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and
> > >remove
> > >>> _PAGE_IOMAP)
> > >>>
> > >>> That frees up this bit.
> > >>>
> > >>
> > >> Thanks, I was not aware of that patch. Based on it, I intend to
> > >force
> > >> automatic NUMA balancing to depend on !XEN and see what the reaction
> > >is. If
> > >> support for Xen is really required then it potentially be re-enabled
> > >if/when
> > >> that series is merged assuming they do not need the bit for something
> > >else.
> > >>
> > >
> > >Amazon EC2 does have large memory instance types with NUMA exposed to
> > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
> > >(to me anyway) if we didn't require !XEN.
>
> What about the patch that David Vrabel posted:
>
> http://osdir.com/ml/general/2014-03/msg41979.html
>
> Has anybody taken it for a spin?
Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA
hinting ptes" which modifies the NUMA pte helpers instead of the main
set/clear ones.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 15:16 ` H. Peter Anvin
2014-04-08 16:02 ` Konrad Rzeszutek Wilk
@ 2014-04-08 20:51 ` Steven Noonan
2014-04-08 20:59 ` H. Peter Anvin
1 sibling, 1 reply; 29+ messages in thread
From: Steven Noonan @ 2014-04-08 20:51 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Mel Gorman, Cyrill Gorcunov, David Vrabel, Linus Torvalds,
Ingo Molnar, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> <snark>
>
> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :(
Well Amazon doesn't expose NUMA on PV, only on HVM guests.
> On April 7, 2014 9:04:53 PM PDT, Steven Noonan <steven@uplinklabs.net> wrote:
>>On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman <mgorman@suse.de> wrote:
>>> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote:
>>>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote:
>>>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
>>>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote:
>>>> >>>
>>>> >>> I had considered the soft-dirty tracking usage of the same bit.
>>I thought I'd
>>>> >>> be able to swizzle around it or a further worst case of having
>>soft-dirty and
>>>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon
>>examination
>>>> >>> it's not obvious how to have both of them share a bit and I
>>suspect any
>>>> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING
>>cannot be
>>>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory.
>>Next on the
>>>> >>> list is examining if _PAGE_BIT_IOMAP can be used.
>>>> >>
>>>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>>>> >
>>>> > Seems so, at least for non-kernel pages (not considering this bit
>>references in
>>>> > xen code, which i simply don't know but i guess it's used for
>>kernel pages only).
>>>> >
>>>>
>>>> David Vrabel has a patchset which I presumed would be pulled through
>>the
>>>> Xen tree this merge window:
>>>>
>>>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and
>>remove
>>>> _PAGE_IOMAP)
>>>>
>>>> That frees up this bit.
>>>>
>>>
>>> Thanks, I was not aware of that patch. Based on it, I intend to
>>force
>>> automatic NUMA balancing to depend on !XEN and see what the reaction
>>is. If
>>> support for Xen is really required then it potentially be re-enabled
>>if/when
>>> that series is merged assuming they do not need the bit for something
>>else.
>>>
>>
>>Amazon EC2 does have large memory instance types with NUMA exposed to
>>the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>>(to me anyway) if we didn't require !XEN.
>
> --
> Sent from my mobile phone. Please pardon brevity and lack of formatting.
>
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 20:51 ` Steven Noonan
@ 2014-04-08 20:59 ` H. Peter Anvin
2014-04-09 15:04 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 29+ messages in thread
From: H. Peter Anvin @ 2014-04-08 20:59 UTC (permalink / raw)
To: Steven Noonan
Cc: Mel Gorman, Cyrill Gorcunov, David Vrabel, Linus Torvalds,
Ingo Molnar, Rik van Riel, Andrew Morton, Peter Zijlstra,
Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov
On 04/08/2014 01:51 PM, Steven Noonan wrote:
> On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> <snark>
>>
>> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :(
>
> Well Amazon doesn't expose NUMA on PV, only on HVM guests.
>
Yes, but Amazon is one of the main things keeping Xen PV alive as far as
I can tell, which means the support gets built in, and so on.
-hpa
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 20:59 ` H. Peter Anvin
@ 2014-04-09 15:04 ` Konrad Rzeszutek Wilk
2014-04-09 15:09 ` Peter Zijlstra
0 siblings, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-09 15:04 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Steven Noonan, Mel Gorman, Cyrill Gorcunov, David Vrabel,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On Tue, Apr 08, 2014 at 01:59:09PM -0700, H. Peter Anvin wrote:
> On 04/08/2014 01:51 PM, Steven Noonan wrote:
> > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> >> <snark>
> >>
> >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :(
> >
> > Well Amazon doesn't expose NUMA on PV, only on HVM guests.
> >
>
> Yes, but Amazon is one of the main things keeping Xen PV alive as far as
> I can tell, which means the support gets built in, and so on.
Snarkiness aside, the issue here is that the problem shows up even on
guests without NUMA exposed. That is, the 'mknuma' helpers are still
being called even if the guest topology is not NUMA!
Which raises a question: why aren't mknuma and its friends gated by
jump_label machinery or such?
Mel, any particular reason why it couldn't be done this way?
>
> -hpa
>
>
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-09 15:04 ` Konrad Rzeszutek Wilk
@ 2014-04-09 15:09 ` Peter Zijlstra
0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2014-04-09 15:09 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: H. Peter Anvin, Steven Noonan, Mel Gorman, Cyrill Gorcunov,
David Vrabel, Linus Torvalds, Ingo Molnar, Rik van Riel,
Andrew Morton, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On Wed, Apr 09, 2014 at 11:04:48AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 01:59:09PM -0700, H. Peter Anvin wrote:
> > On 04/08/2014 01:51 PM, Steven Noonan wrote:
> > > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> > >> <snark>
> > >>
> > >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :(
> > >
> > > Well Amazon doesn't expose NUMA on PV, only on HVM guests.
> > >
> >
> > Yes, but Amazon is one of the main things keeping Xen PV alive as far as
> > I can tell, which means the support gets built in, and so on.
>
> Taking the snarkiness aside, the issue here is that even on guests
> without NUMA exposed the problem shows up. That is the 'mknuma' are
> still being called even if the guest topology is not NUMA!
>
> Which brings a question - why isn't the mknuma and its friends gatted by
> an jump_label machinery or such?
>
> Mel, any particular reasons why it couldn't be done this way?
Hmm, I thought we disabled all that when there is only one node. All
this should be driven from task_tick_numa(), which only gets called when
numabalancing_enabled, and that _should_ be false when nr_nodes == 1.
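That gating can be sketched roughly as follows (a toy model; the kernel
gates this behind static keys rather than a plain flag, and these
function bodies are invented stand-ins for the real ones):

```c
#include <assert.h>
#include <stdbool.h>

/* Names modelled loosely on the kernel's; bodies are invented. */
static int nr_online_nodes = 1;
static bool numabalancing_enabled = false;

/* Balancing is only enabled when there is more than one node. */
static void check_numabalancing_enable(void)
{
    numabalancing_enabled = (nr_online_nodes > 1);
}

/* Returns 1 if NUMA-hinting work would be scheduled this tick. */
static int task_tick_numa(void)
{
    if (!numabalancing_enabled)
        return 0; /* single node: all hinting work is skipped */
    return 1;
}
```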
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-08 16:51 ` Mel Gorman
@ 2014-04-09 15:18 ` Konrad Rzeszutek Wilk
2014-04-09 15:39 ` Mel Gorman
0 siblings, 1 reply; 29+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-04-09 15:18 UTC (permalink / raw)
To: Mel Gorman
Cc: H. Peter Anvin, Steven Noonan, Cyrill Gorcunov, David Vrabel,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On Tue, Apr 08, 2014 at 05:51:23PM +0100, Mel Gorman wrote:
> On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote:
> > .snip..
> > > >>> David Vrabel has a patchset which I presumed would be pulled through
> > > >the
> > > >>> Xen tree this merge window:
> > > >>>
> > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and
> > > >remove
> > > >>> _PAGE_IOMAP)
> > > >>>
> > > >>> That frees up this bit.
> > > >>>
> > > >>
> > > >> Thanks, I was not aware of that patch. Based on it, I intend to
> > > >force
> > > >> automatic NUMA balancing to depend on !XEN and see what the reaction
> > > >is. If
> > > >> support for Xen is really required then it potentially be re-enabled
> > > >if/when
> > > >> that series is merged assuming they do not need the bit for something
> > > >else.
> > > >>
> > > >
> > > >Amazon EC2 does have large memory instance types with NUMA exposed to
> > > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
> > > >(to me anyway) if we didn't require !XEN.
> >
> > What about the patch that David Vrabel posted:
> >
> > http://osdir.com/ml/general/2014-03/msg41979.html
> >
> > Has anybody taken it for a spin?
>
> Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA
> hinting ptes" which modifies the NUMA pte helpers instead of the main
> set/clear ones.
Ah nice! Looking forward to it being posted as non-RFC and could you also
please CC 'xen-devel@lists.xenproject.org' on it?
Thank you!
* Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
2014-04-09 15:18 ` Konrad Rzeszutek Wilk
@ 2014-04-09 15:39 ` Mel Gorman
0 siblings, 0 replies; 29+ messages in thread
From: Mel Gorman @ 2014-04-09 15:39 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: H. Peter Anvin, Steven Noonan, Cyrill Gorcunov, David Vrabel,
Linus Torvalds, Ingo Molnar, Rik van Riel, Andrew Morton,
Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML,
Pavel Emelyanov
On Wed, Apr 09, 2014 at 11:18:27AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 05:51:23PM +0100, Mel Gorman wrote:
> > On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote:
> > > .snip..
> > > > >>> David Vrabel has a patchset which I presumed would be pulled through the
> > > > >>> Xen tree this merge window:
> > > > >>>
> > > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove
> > > > >>> _PAGE_IOMAP)
> > > > >>>
> > > > >>> That frees up this bit.
> > > > >>>
> > > > >>
> > > > >> Thanks, I was not aware of that patch. Based on it, I intend to force
> > > > >> automatic NUMA balancing to depend on !XEN and see what the reaction is.
> > > > >> If support for Xen is really required then it could potentially be
> > > > >> re-enabled if/when that series is merged, assuming they do not need the
> > > > >> bit for something else.
> > > > >>
> > > > >
> > > > > Amazon EC2 does have large memory instance types with NUMA exposed to
> > > > > the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
> > > > > (to me anyway) if we didn't require !XEN.
> > >
> > > What about the patch that David Vrabel posted:
> > >
> > > http://osdir.com/ml/general/2014-03/msg41979.html
> > >
> > > Has anybody taken it for a spin?
> >
> > Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA
> > hinting ptes" which modifies the NUMA pte helpers instead of the main
> > set/clear ones.
>
> Ah nice! Looking forward to it being posted as non-RFC and could you also
> please CC 'xen-devel@lists.xenproject.org' on it?
>
Yes I will. Unless the x86 maintainers push for it on the grounds that
it is a functional fix for xen, I'm going to wait until after the merge
window to resend it. That'd give it some chance of being tested in -next
before hitting mainline.
--
Mel Gorman
SUSE Labs
end of thread, other threads:[~2014-04-09 15:39 UTC | newest]
Thread overview: 29+ messages
2014-04-07 15:10 [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA Mel Gorman
2014-04-07 15:10 ` [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing Mel Gorman
2014-04-07 15:10 ` [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Mel Gorman
2014-04-07 15:32 ` David Vrabel
2014-04-07 15:49 ` Mel Gorman
2014-04-07 16:19 ` Cyrill Gorcunov
2014-04-07 18:28 ` Mel Gorman
2014-04-07 19:16 ` Cyrill Gorcunov
2014-04-07 19:27 ` H. Peter Anvin
2014-04-07 19:36 ` Cyrill Gorcunov
2014-04-07 19:42 ` H. Peter Anvin
2014-04-07 21:25 ` Mel Gorman
2014-04-08 4:04 ` Steven Noonan
2014-04-08 15:16 ` H. Peter Anvin
2014-04-08 16:02 ` Konrad Rzeszutek Wilk
2014-04-08 16:16 ` H. Peter Anvin
2014-04-08 16:47 ` Mel Gorman
2014-04-08 16:50 ` David Vrabel
2014-04-08 16:51 ` Mel Gorman
2014-04-09 15:18 ` Konrad Rzeszutek Wilk
2014-04-09 15:39 ` Mel Gorman
2014-04-08 20:51 ` Steven Noonan
2014-04-08 20:59 ` H. Peter Anvin
2014-04-09 15:04 ` Konrad Rzeszutek Wilk
2014-04-09 15:09 ` Peter Zijlstra
2014-04-08 9:31 ` David Vrabel
2014-04-07 21:19 ` Mel Gorman
2014-04-07 17:37 ` Dave Hansen
2014-04-07 15:10 ` [PATCH 3/3] mm: Allow FOLL_NUMA on FOLL_FORCE Mel Gorman