xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration
@ 2014-03-21 18:18 David Vrabel
  2014-03-21 18:18 ` [PATCH 1/2] Revert "xen: properly account for _PAGE_NUMA during xen pte translations" David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: David Vrabel @ 2014-03-21 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: Steven Noonan, Mel Gorman, Elena Ufimtseva, Boris Ostrovsky,
	David Vrabel

This series should properly fix the Xen PV guest regression introduced
by 1667918b6483 (mm: numa: clear numa hinting information on
mprotect).  The previous fix a9c8e4beeeb6 (xen: properly account for
_PAGE_NUMA during xen pte translations) breaks save/restore
(migration) and needs to be reverted.

I've only given this series a minimal amount of testing and would
appreciate testing by someone who experienced/reproduced the original
regression.

David

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] Revert "xen: properly account for _PAGE_NUMA during xen pte translations"
  2014-03-21 18:18 [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
@ 2014-03-21 18:18 ` David Vrabel
  2014-03-21 18:18 ` [PATCH 2/2] x86: use pv-ops in {pte, pmd}_{set, clear}_flags() David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: David Vrabel @ 2014-03-21 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: Steven Noonan, Mel Gorman, Elena Ufimtseva, Boris Ostrovsky,
	David Vrabel

This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.

PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses is _PAGE_PRESENT is clear.

This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and uncanonicalised". i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.

This change resulted in writing PTEs with MFNs if _PAGE_NUMA was set
but _PAGE_PRESENT was clear.  These PTEs would be migrated as-is which
would result in unexpected behaviour in the destination domain.
Either a) the MFN would be translated to the wrong PFN/page; b)
setting the _PAGE_PRESENT bit would clear the PTE because the MFN is
no longer owned by the domain; or c) the present bit would not get
set.

Symptoms include "Bad page" reports when unmapping after migrating a
domain.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>        [3.12+]
---
 arch/x86/include/asm/pgtable.h |   14 ++------------
 arch/x86/xen/mmu.c             |    4 ++--
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5ad38ad..bbc8b12 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -445,20 +445,10 @@ static inline int pte_same(pte_t a, pte_t b)
 	return a.pte == b.pte;
 }
 
-static inline int pteval_present(pteval_t pteval)
-{
-	/*
-	 * Yes Linus, _PAGE_PROTNONE == _PAGE_NUMA. Expressing it this
-	 * way clearly states that the intent is that protnone and numa
-	 * hinting ptes are considered present for the purposes of
-	 * pagetable operations like zapping, protection changes, gup etc.
-	 */
-	return pteval & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
-}
-
 static inline int pte_present(pte_t a)
 {
-	return pteval_present(pte_flags(a));
+	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+			       _PAGE_NUMA);
 }
 
 #define pte_accessible pte_accessible
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 256282e..2423ef0 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -365,7 +365,7 @@ void xen_ptep_modify_prot_commit(struct mm_struct *mm, unsigned long addr,
 /* Assume pteval_t is equivalent to all the other *val_t types. */
 static pteval_t pte_mfn_to_pfn(pteval_t val)
 {
-	if (pteval_present(val)) {
+	if (val & _PAGE_PRESENT) {
 		unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
 		unsigned long pfn = mfn_to_pfn(mfn);
 
@@ -381,7 +381,7 @@ static pteval_t pte_mfn_to_pfn(pteval_t val)
 
 static pteval_t pte_pfn_to_mfn(pteval_t val)
 {
-	if (pteval_present(val)) {
+	if (val & _PAGE_PRESENT) {
 		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
 		pteval_t flags = val & PTE_FLAGS_MASK;
 		unsigned long mfn;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] x86: use pv-ops in {pte, pmd}_{set, clear}_flags()
  2014-03-21 18:18 [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
  2014-03-21 18:18 ` [PATCH 1/2] Revert "xen: properly account for _PAGE_NUMA during xen pte translations" David Vrabel
@ 2014-03-21 18:18 ` David Vrabel
  2014-03-24 11:28   ` David Vrabel
  2014-03-26 19:10 ` [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
  2014-04-15  8:24 ` David Sutton
  3 siblings, 1 reply; 6+ messages in thread
From: David Vrabel @ 2014-03-21 18:18 UTC (permalink / raw)
  To: xen-devel
  Cc: Steven Noonan, Mel Gorman, Elena Ufimtseva, Boris Ostrovsky,
	David Vrabel

Instead of using native functions to operate on the PTEs in
pte_set_flags(), pte_clear_flags(), pmd_set_flags(), pmd_clear_flags()
use the PV aware ones.

This fixes a regression in Xen PV guests introduced by 1667918b6483
(mm: numa: clear numa hinting information on mprotect).

Xen PV guest page tables require that their entries use machine
addresses if the preset bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses.  This is because on migration MFNs in present PTEs are
translated to PFNs (canonicalised) so they may be translated back to
the new MFN in the destination domain (uncanonicalised).

pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set
and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.

In a Xen PV guest, these functions must translate MFNs to PFNs when
clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Elena Ufimtseva <ufimtseva@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>        [3.12+]
---
 arch/x86/include/asm/pgtable.h |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..323e5e2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -174,16 +174,16 @@ static inline int has_transparent_hugepage(void)
 
 static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
 {
-	pteval_t v = native_pte_val(pte);
+	pteval_t v = pte_val(pte);
 
-	return native_make_pte(v | set);
+	return __pte(v | set);
 }
 
 static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 {
-	pteval_t v = native_pte_val(pte);
+	pteval_t v = pte_val(pte);
 
-	return native_make_pte(v & ~clear);
+	return __pte(v & ~clear);
 }
 
 static inline pte_t pte_mkclean(pte_t pte)
@@ -248,14 +248,14 @@ static inline pte_t pte_mkspecial(pte_t pte)
 
 static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
 {
-	pmdval_t v = native_pmd_val(pmd);
+	pmdval_t v = pmd_val(pmd);
 
 	return __pmd(v | set);
 }
 
 static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 {
-	pmdval_t v = native_pmd_val(pmd);
+	pmdval_t v = pmd_val(pmd);
 
 	return __pmd(v & ~clear);
 }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] x86: use pv-ops in {pte, pmd}_{set, clear}_flags()
  2014-03-21 18:18 ` [PATCH 2/2] x86: use pv-ops in {pte, pmd}_{set, clear}_flags() David Vrabel
@ 2014-03-24 11:28   ` David Vrabel
  0 siblings, 0 replies; 6+ messages in thread
From: David Vrabel @ 2014-03-24 11:28 UTC (permalink / raw)
  To: David Vrabel
  Cc: Steven Noonan, Mel Gorman, Elena Ufimtseva, xen-devel,
	Boris Ostrovsky

On 21/03/14 18:18, David Vrabel wrote:
> Instead of using native functions to operate on the PTEs in
> pte_set_flags(), pte_clear_flags(), pmd_set_flags(), pmd_clear_flags()
> use the PV aware ones.

Looking at the history of this changes, these were introduced to avoid
PV ops for performance reasons.

I believe this can be fixed by making the numa-related PTE modifier
functions (pte_mknuma(), pte_mknonnuma()) use pte_modify() which is the
correct function to use when adjusting page protection.

I really do not understand how you're supposed to distinguish between a
PTE for a PROT_NONE page and one with _PAGE_NUMA -- they're identical.
i.e., pte_numa() will return true for a PROT_NONE protected page which
just seems wrong to me.

David

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration
  2014-03-21 18:18 [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
  2014-03-21 18:18 ` [PATCH 1/2] Revert "xen: properly account for _PAGE_NUMA during xen pte translations" David Vrabel
  2014-03-21 18:18 ` [PATCH 2/2] x86: use pv-ops in {pte, pmd}_{set, clear}_flags() David Vrabel
@ 2014-03-26 19:10 ` David Vrabel
  2014-04-15  8:24 ` David Sutton
  3 siblings, 0 replies; 6+ messages in thread
From: David Vrabel @ 2014-03-26 19:10 UTC (permalink / raw)
  To: David Vrabel
  Cc: Boris Ostrovsky, xen-devel, Steven Noonan, Mel Gorman,
	Elena Ufimtseva

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

On 21/03/14 18:18, David Vrabel wrote:
> This series should properly fix the Xen PV guest regression introduced
> by 1667918b6483 (mm: numa: clear numa hinting information on
> mprotect).  The previous fix a9c8e4beeeb6 (xen: properly account for
> _PAGE_NUMA during xen pte translations) breaks save/restore
> (migration) and needs to be reverted.
> 
> I've only given this series a minimal amount of testing and would
> appreciate testing by someone who experienced/reproduced the original
> regression.

Attached is a simple test program that triggers the bug.

Patch #2 fixes the regression and I think that it is the correct fix but
I need to confirm it doesn't cause any measurable performance problems
on native x86 (I think it should not).

I don't have any more time to work on this until next week.

David

[-- Attachment #2: mprotect.c --]
[-- Type: text/x-csrc, Size: 1407 bytes --]

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define MAGIC 0x79128347ul

int main(void)
{
    void *p;
    volatile unsigned long *m;
    int ret;

    /* Map a frame for the test. */
    m = p = mmap(NULL, PAGE_SIZE, PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }

    /* Write to the frame. */
    *m = MAGIC;

    /* Set PROT_NONE on the mapping. */
    ret = mprotect(p, PAGE_SIZE, PROT_NONE);
    if (ret < 0) {
        perror("mprotect");
        exit(1);
    }

    /*
     * Set PROT_READ (or any other non-PROT_NONE protection) on the
     * mapping.
     *
     * On a buggy system pte_mknonnuma() will clear _PAGE_PROTNONE (==
     * _PAGE_NUMA) and set _PAGE_PRESENT /without/ doing a PFN to MFN
     * conversion.
     *
     * The resulting mapping will be to the wrong MFN.
     */
    ret = mprotect(p, PAGE_SIZE, PROT_READ);
    if (ret < 0) {
        perror("mprotect");
        exit(1);
    }

    /* Check we're still mapping the original frame. */
    if (*m != MAGIC) {
        printf("FAIL (0x%lx != 0x%lx)\n", *m, MAGIC);
        ret = 1;
    }

    /*
     * On a buggy system, the munmap() will usually trigger a "Bad
     * page" error since the mapping isn't for the original page.
     */
    munmap(p, PAGE_SIZE);

    return ret;
}

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration
  2014-03-21 18:18 [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
                   ` (2 preceding siblings ...)
  2014-03-26 19:10 ` [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
@ 2014-04-15  8:24 ` David Sutton
  3 siblings, 0 replies; 6+ messages in thread
From: David Sutton @ 2014-04-15  8:24 UTC (permalink / raw)
  To: xen-devel

David,

David Vrabel <david.vrabel <at> citrix.com> writes:

> 
> This series should properly fix the Xen PV guest regression introduced
> by 1667918b6483 (mm: numa: clear numa hinting information on
> mprotect).  The previous fix a9c8e4beeeb6 (xen: properly account for
> _PAGE_NUMA during xen pte translations) breaks save/restore
> (migration) and needs to be reverted.
> 
> I've only given this series a minimal amount of testing and would
> appreciate testing by someone who experienced/reproduced the original
> regression.
>
I just upgraded to the Archlinux linux 3.14.1-1 packages and it started to
show the same symptoms as before the original patch was applied (the key one
I noticed being major instability running firefox under dom0. I then applied
the 2nd patch in this set and recompiled; initial testing (doing things
which would almost certainly cause the issue to trigger, such as viewing a
youtube video through firefox) indicated that the patch is working as I
haven't been able to trigger the bug yet.
> 
> David
> 
Regards,

  David

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-04-15  8:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-21 18:18 [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
2014-03-21 18:18 ` [PATCH 1/2] Revert "xen: properly account for _PAGE_NUMA during xen pte translations" David Vrabel
2014-03-21 18:18 ` [PATCH 2/2] x86: use pv-ops in {pte, pmd}_{set, clear}_flags() David Vrabel
2014-03-24 11:28   ` David Vrabel
2014-03-26 19:10 ` [RFC PATCH 0/2] x86: fix Xen PV regression caused by NUMA page migration David Vrabel
2014-04-15  8:24 ` David Sutton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).