* [PATCH RFC] x86,mm: use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA
@ 2014-11-06 17:48 Wei Liu
From: Wei Liu @ 2014-11-06 17:48 UTC
To: linux-mm, xen-devel
Cc: Wei Liu, H. Peter Anvin, Andrew Morton, Mel Gorman, David Vrabel,
Konrad Rzeszutek Wilk, Cyrill Gorcunov, Hugh Dickins,
Rik van Riel
In b38af4721 ("x86,mm: fix pte_special versus pte_numa") pte_special()
(SPECIAL with PRESENT or PROTNONE) was made to complement pte_numa()
(SPECIAL with neither PRESENT nor PROTNONE). That broke Xen PV guests
with NUMA balancing enabled.
That's because the Xen hypervisor sets _PAGE_GLOBAL (_PAGE_GLOBAL /
_PAGE_PROTNONE in Linux) for guest user space mappings. So in a Xen PV
guest, when NUMA balancing is enabled, a NUMA hinted PTE ends up
"SPECIAL (in fact NUMA) with PROTNONE but not PRESENT", which makes
pte_special() return true when it shouldn't.
Fundamentally we only need _PAGE_NUMA and _PAGE_PRESENT to tell the
difference between an unmapped entry and an entry protected for a NUMA
hinting fault. So use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA, and adjust
_PAGE_NUMA_MASK and SWP_OFFSET_SHIFT as needed.
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
---
arch/x86/include/asm/pgtable.h | 5 -----
arch/x86/include/asm/pgtable_64.h | 2 +-
arch/x86/include/asm/pgtable_types.h | 8 ++++----
3 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index aa97a07..8dee3ed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -131,11 +131,6 @@ static inline int pte_exec(pte_t pte)
static inline int pte_special(pte_t pte)
{
- /*
- * See CONFIG_NUMA_BALANCING pte_numa in include/asm-generic/pgtable.h.
- * On x86 we have _PAGE_BIT_NUMA == _PAGE_BIT_GLOBAL+1 ==
- * __PAGE_BIT_SOFTW1 == _PAGE_BIT_SPECIAL.
- */
return (pte_flags(pte) & _PAGE_SPECIAL) &&
(pte_flags(pte) & (_PAGE_PRESENT|_PAGE_PROTNONE));
}
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 4572b2f..26f2ade 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -148,7 +148,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
#define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
#ifdef CONFIG_NUMA_BALANCING
/* Automatic NUMA balancing needs to be distinguishable from swap entries */
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 2)
+#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 3)
#else
#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
#endif
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0778964..bc82d6b 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -31,9 +31,9 @@
* Swap offsets on configurations that allow automatic NUMA balancing use the
* bits after _PAGE_BIT_GLOBAL. To uniquely distinguish NUMA hinting PTEs from
* swap entries, we use the first bit after _PAGE_BIT_GLOBAL and shrink the
- * maximum possible swap space from 16TB to 8TB.
+ * maximum possible swap space from 16TB to 4TB.
*/
-#define _PAGE_BIT_NUMA (_PAGE_BIT_GLOBAL+1)
+#define _PAGE_BIT_NUMA _PAGE_BIT_SOFTW2
/* If _PAGE_BIT_PRESENT is clear, we use these: */
/* - if the user mapped it with PROT_NONE; pte_present gives true */
@@ -325,8 +325,8 @@ static inline pteval_t pte_flags(pte_t pte)
}
#ifdef CONFIG_NUMA_BALANCING
-/* Set of bits that distinguishes present, prot_none and numa ptes */
-#define _PAGE_NUMA_MASK (_PAGE_NUMA|_PAGE_PROTNONE|_PAGE_PRESENT)
+/* Set of bits that distinguishes present and numa ptes */
+#define _PAGE_NUMA_MASK (_PAGE_NUMA|_PAGE_PRESENT)
static inline pteval_t ptenuma_flags(pte_t pte)
{
return pte_flags(pte) & _PAGE_NUMA_MASK;
--
1.7.10.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [PATCH RFC] x86,mm: use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA
@ 2014-11-07 8:52 ` Mel Gorman
From: Mel Gorman @ 2014-11-07 8:52 UTC
To: Wei Liu
Cc: linux-mm, xen-devel, H. Peter Anvin, Andrew Morton, David Vrabel,
Konrad Rzeszutek Wilk, Cyrill Gorcunov, Hugh Dickins,
Rik van Riel
On Thu, Nov 06, 2014 at 05:48:16PM +0000, Wei Liu wrote:
> In b38af4721 ("x86,mm: fix pte_special versus pte_numa") pte_special()
> (SPECIAL with PRESENT or PROTNONE) was made to complement pte_numa()
> (SPECIAL with neither PRESENT nor PROTNONE). That broke Xen PV guests
> with NUMA balancing enabled.
>
> That's because the Xen hypervisor sets _PAGE_GLOBAL (_PAGE_GLOBAL /
> _PAGE_PROTNONE in Linux) for guest user space mappings. So in a Xen PV
> guest, when NUMA balancing is enabled, a NUMA hinted PTE ends up
> "SPECIAL (in fact NUMA) with PROTNONE but not PRESENT", which makes
> pte_special() return true when it shouldn't.
>
> Fundamentally we only need _PAGE_NUMA and _PAGE_PRESENT to tell the
> difference between an unmapped entry and an entry protected for a NUMA
> hinting fault. So use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA, and adjust
> _PAGE_NUMA_MASK and SWP_OFFSET_SHIFT as needed.
>
> Suggested-by: David Vrabel <david.vrabel@citrix.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
I suggest instead that you force automatic NUMA balancing to be disabled
on Xen PV guests until I or someone else finds time to implement Linus'
idea to remove _PAGE_NUMA entirely. It's been on my TODO list for a few
weeks but I still have not reached the point where I'm back working on
upstream material properly.
--
Mel Gorman
SUSE Labs
* Re: [PATCH RFC] x86,mm: use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA
@ 2014-11-07 10:45 ` Wei Liu
From: Wei Liu @ 2014-11-07 10:45 UTC
To: Mel Gorman
Cc: Wei Liu, linux-mm, xen-devel, H. Peter Anvin, Andrew Morton,
David Vrabel, Konrad Rzeszutek Wilk, Cyrill Gorcunov,
Hugh Dickins, Rik van Riel
On Fri, Nov 07, 2014 at 08:52:10AM +0000, Mel Gorman wrote:
> On Thu, Nov 06, 2014 at 05:48:16PM +0000, Wei Liu wrote:
> > In b38af4721 ("x86,mm: fix pte_special versus pte_numa") pte_special()
> > (SPECIAL with PRESENT or PROTNONE) was made to complement pte_numa()
> > (SPECIAL with neither PRESENT nor PROTNONE). That broke Xen PV guests
> > with NUMA balancing enabled.
> >
> > That's because the Xen hypervisor sets _PAGE_GLOBAL (_PAGE_GLOBAL /
> > _PAGE_PROTNONE in Linux) for guest user space mappings. So in a Xen PV
> > guest, when NUMA balancing is enabled, a NUMA hinted PTE ends up
> > "SPECIAL (in fact NUMA) with PROTNONE but not PRESENT", which makes
> > pte_special() return true when it shouldn't.
> >
> > Fundamentally we only need _PAGE_NUMA and _PAGE_PRESENT to tell the
> > difference between an unmapped entry and an entry protected for a NUMA
> > hinting fault. So use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA, and adjust
> > _PAGE_NUMA_MASK and SWP_OFFSET_SHIFT as needed.
> >
> > Suggested-by: David Vrabel <david.vrabel@citrix.com>
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>
> I suggest instead that you force automatic NUMA balancing to be disabled
> on Xen PV guests until I or someone else finds time to implement Linus'
> idea to remove _PAGE_NUMA entirely. It's been on my TODO list for a few
> weeks but I still have not reached the point where I'm back working on
> upstream material properly.
>
No problem. Thanks for the suggestion.
Wei.
> --
> Mel Gorman
> SUSE Labs