linux-mm.kvack.org archive mirror
From: Wei Liu <wei.liu2@citrix.com>
To: linux-mm@kvack.org, xen-devel@lists.xenproject.org
Cc: Wei Liu <wei.liu2@citrix.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	David Vrabel <david.vrabel@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>
Subject: [PATCH RFC] x86,mm: use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA
Date: Thu, 6 Nov 2014 17:48:16 +0000
Message-ID: <1415296096-22873-1-git-send-email-wei.liu2@citrix.com>

In b38af4721 ("x86,mm: fix pte_special versus pte_numa") pte_special()
(SPECIAL with PRESENT or PROTNONE) was made to complement pte_numa()
(SPECIAL with neither PRESENT nor PROTNONE). That broke Xen PV guests
with NUMA balancing support.

That's because the Xen hypervisor sets _PAGE_GLOBAL (the same bit as
_PAGE_PROTNONE in Linux) on guest user space mappings. So in a Xen PV
guest, when NUMA balancing is enabled, a NUMA hinted PTE ends up
"SPECIAL (in fact NUMA) with PROTNONE but not PRESENT", which makes
pte_special() return true when it shouldn't.
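
The misclassification can be reproduced outside the kernel. The sketch
below is a user-space model, not kernel code: the PTE_* macros, the old_*
helpers and the two *_numa_hint() constructors are illustrative names, and
the bit positions are assumed from the pre-patch x86 layout (PRESENT is
bit 0, GLOBAL/PROTNONE is bit 8, SPECIAL and NUMA both sit on bit 9).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed pre-patch x86 bit layout (illustrative names, not kernel macros). */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_PROTNONE  (1ULL << 8)	/* same bit as _PAGE_GLOBAL */
#define PTE_SPECIAL   (1ULL << 9)	/* _PAGE_BIT_SOFTW1 */
#define PTE_NUMA_OLD  (1ULL << 9)	/* _PAGE_BIT_GLOBAL + 1, aliases SPECIAL */
#define OLD_NUMA_MASK (PTE_NUMA_OLD | PTE_PROTNONE | PTE_PRESENT)

/* Model of pte_special() after b38af4721. */
static inline int old_pte_special(pteval_t flags)
{
	return (flags & PTE_SPECIAL) &&
		(flags & (PTE_PRESENT | PTE_PROTNONE));
}

/* Model of pte_numa() with the pre-patch _PAGE_NUMA_MASK. */
static inline int old_pte_numa(pteval_t flags)
{
	return (flags & OLD_NUMA_MASK) == PTE_NUMA_OLD;
}

/* NUMA hinted PTE on bare metal: NUMA set, PRESENT cleared. */
static inline pteval_t native_numa_hint(void)
{
	return PTE_NUMA_OLD;
}

/* Same PTE in a Xen PV guest: the hypervisor has also set bit 8. */
static inline pteval_t xen_pv_numa_hint(void)
{
	return PTE_NUMA_OLD | PTE_PROTNONE;
}
```

For native_numa_hint(), old_pte_numa() is true and old_pte_special() is
false. For xen_pv_numa_hint() the two results flip, which is the failure
described above.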

Fundamentally we only need _PAGE_NUMA and _PAGE_PRESENT to tell the
difference between an unmapped entry and an entry protected for a NUMA
hinting fault. So use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA, and adjust
_PAGE_NUMA_MASK and SWP_OFFSET_SHIFT as needed.
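
As a rough check of the proposed layout, the user-space sketch below
models the post-patch definitions. It is not kernel code: the PTE_* and
BIT_* macros and the new_* and swp_offset_field() helpers are illustrative
names, and _PAGE_BIT_SOFTW2 is assumed to be bit 10 (one above
_PAGE_BIT_SOFTW1, which is bit 9).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed post-patch x86 bit layout (illustrative names, not kernel macros). */
#define BIT_PROTNONE  8			/* same bit as _PAGE_BIT_GLOBAL */
#define PTE_PRESENT   (1ULL << 0)
#define PTE_PROTNONE  (1ULL << BIT_PROTNONE)
#define PTE_SPECIAL   (1ULL << 9)	/* _PAGE_BIT_SOFTW1 */
#define PTE_NUMA      (1ULL << 10)	/* _PAGE_BIT_SOFTW2 */
#define NUMA_MASK     (PTE_NUMA | PTE_PRESENT)
#define SWP_OFFSET_SHIFT (BIT_PROTNONE + 3)

/* pte_special() is unchanged by the patch. */
static inline int new_pte_special(pteval_t flags)
{
	return (flags & PTE_SPECIAL) &&
		(flags & (PTE_PRESENT | PTE_PROTNONE));
}

/* pte_numa() no longer looks at PROTNONE. */
static inline int new_pte_numa(pteval_t flags)
{
	return (flags & NUMA_MASK) == PTE_NUMA;
}

/* Position a swap offset in the PTE; it must stay above the NUMA bit. */
static inline pteval_t swp_offset_field(uint64_t offset)
{
	return offset << SWP_OFFSET_SHIFT;
}
```

With this layout a NUMA hinted PTE is recognised whether or not bit 8 is
set, so (PTE_NUMA | PTE_PROTNONE) gives new_pte_numa() true and
new_pte_special() false. Swap offsets start at bit 11 and cannot alias
the NUMA bit, at the cost of one more bit of swap offset space.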

Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Cc: xen-devel@lists.xenproject.org
---
 arch/x86/include/asm/pgtable.h       |    5 -----
 arch/x86/include/asm/pgtable_64.h    |    2 +-
 arch/x86/include/asm/pgtable_types.h |    8 ++++----
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index aa97a07..8dee3ed 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -131,11 +131,6 @@ static inline int pte_exec(pte_t pte)
 
 static inline int pte_special(pte_t pte)
 {
-	/*
-	 * See CONFIG_NUMA_BALANCING pte_numa in include/asm-generic/pgtable.h.
-	 * On x86 we have _PAGE_BIT_NUMA == _PAGE_BIT_GLOBAL+1 ==
-	 * __PAGE_BIT_SOFTW1 == _PAGE_BIT_SPECIAL.
-	 */
 	return (pte_flags(pte) & _PAGE_SPECIAL) &&
 		(pte_flags(pte) & (_PAGE_PRESENT|_PAGE_PROTNONE));
 }
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 4572b2f..26f2ade 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -148,7 +148,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #ifdef CONFIG_NUMA_BALANCING
 /* Automatic NUMA balancing needs to be distinguishable from swap entries */
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 2)
+#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 3)
 #else
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
 #endif
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0778964..bc82d6b 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -31,9 +31,9 @@
  * Swap offsets on configurations that allow automatic NUMA balancing use the
  * bits after _PAGE_BIT_GLOBAL. To uniquely distinguish NUMA hinting PTEs from
  * swap entries, we use the first bit after _PAGE_BIT_GLOBAL and shrink the
- * maximum possible swap space from 16TB to 8TB.
+ * maximum possible swap space from 16TB to 4TB.
  */
-#define _PAGE_BIT_NUMA		(_PAGE_BIT_GLOBAL+1)
+#define _PAGE_BIT_NUMA		_PAGE_BIT_SOFTW2
 
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
@@ -325,8 +325,8 @@ static inline pteval_t pte_flags(pte_t pte)
 }
 
 #ifdef CONFIG_NUMA_BALANCING
-/* Set of bits that distinguishes present, prot_none and numa ptes */
-#define _PAGE_NUMA_MASK (_PAGE_NUMA|_PAGE_PROTNONE|_PAGE_PRESENT)
+/* Set of bits that distinguishes present and numa ptes */
+#define _PAGE_NUMA_MASK (_PAGE_NUMA|_PAGE_PRESENT)
 static inline pteval_t ptenuma_flags(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_NUMA_MASK;
-- 
1.7.10.4



Thread overview: 3+ messages
2014-11-06 17:48 Wei Liu [this message]
2014-11-07  8:52 ` [PATCH RFC] x86,mm: use _PAGE_BIT_SOFTW2 as _PAGE_BIT_NUMA Mel Gorman
2014-11-07 10:45   ` Wei Liu
