linux-kernel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@suse.de>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Wei Liu <wei.liu2@citrix.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: NUMA_BALANCING and Xen PV guest regression in 3.20-rc0
Date: Thu, 19 Feb 2015 17:01:04 +0000
Message-ID: <20150219170104.GS3087@suse.de>
In-Reply-To: <54E5DFED.9050700@citrix.com>

On Thu, Feb 19, 2015 at 01:06:53PM +0000, David Vrabel wrote:
> Mel,
> 
> The NUMA_BALANCING series beginning with 5d833062139d (mm: numa: do not
> dereference pmd outside of the lock during NUMA hinting fault) and
> specifically 8a0516ed8b90 (mm: convert p[te|md]_numa users to
> p[te|md]_protnone_numa) breaks Xen 64-bit PV guests.
> 
> Any fault on a present userspace mapping (e.g., a write to a read-only
> mapping) is being misinterpreted as a NUMA hinting fault and not handled
> correctly.  All userspace programs end up continuously faulting.
> 
> This is because the hypervisor sets _PAGE_GLOBAL (== _PAGE_PROTNONE) on
> all present userspace page table entries.
> 

I see, this is a variation of the problem where the NUMA hinted PTE was
treated as special due to the paravirt interfaces not being used.
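
To make the aliasing concrete, here is a minimal userspace sketch. It is not
kernel code: the bit positions follow x86 asm/pgtable_types.h, but the macro
names drop the leading underscore and both helpers are hypothetical. It also
assumes the regressed check looks only at the _PAGE_PROTNONE bit, which is
what the report implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as on x86; names are userspace stand-ins for the
 * kernel's _PAGE_* macros. */
#define PAGE_PRESENT  (UINT64_C(1) << 0)
#define PAGE_RW       (UINT64_C(1) << 1)
#define PAGE_USER     (UINT64_C(1) << 2)
#define PAGE_GLOBAL   (UINT64_C(1) << 8)
#define PAGE_PROTNONE PAGE_GLOBAL	/* same bit, as quoted above */

/* Hypothetical stand-in for a check that tests only the PROTNONE bit. */
static inline bool protnone_bit_only(uint64_t flags)
{
	return (flags & PAGE_PROTNONE) != 0;
}

/* Hypothetical helper: flags of a present, read-only user mapping.
 * hypervisor_sets_global models a 64-bit Xen PV guest. */
static inline uint64_t present_ro_user_flags(bool hypervisor_sets_global)
{
	uint64_t flags = PAGE_PRESENT | PAGE_USER;	/* no PAGE_RW */

	if (hypervisor_sets_global)
		flags |= PAGE_GLOBAL;
	return flags;
}

/* Illustrates the report: the same mapping is classified differently
 * depending on whether the hypervisor set the global bit. */
static inline void demo_misclassification(void)
{
	assert(!protnone_bit_only(present_ro_user_flags(false)));
	assert(protnone_bit_only(present_ro_user_flags(true)));
}
```

With the global bit set, a write fault on this read-only mapping would look
like a NUMA hinting fault to such a check, which matches the endless
faulting described in the report.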

> Note that the comment in asm/pgtable_types.h says that
> _PAGE_BIT_PROTNONE is only valid on non-present entries.
> 
>   /* If _PAGE_BIT_PRESENT is clear, we use these: */
>   /* - if the user mapped it with PROT_NONE; pte_present gives true */
>   #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
> 
> Adjusting pte_protnone() and pmd_protnone() to check for the absence of
> _PAGE_PRESENT allows 64-bit Xen PV guests to work correctly again (see
> following patch), but I'm not sure if NUMA_BALANCING would correctly
> work with this change.
> 

Thanks for the analysis and the reminder of some of the details from the
previous discussion.

> 
> 8<---------------------------
> x86: pte_protnone() and pmd_protnone() must check entry is
>  not present
> 
> Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
> _PAGE_PRESENT is clear.  Make pte_protnone() and pmd_protnone() check
> for this.
> 
> This fixes a 64-bit Xen PV guest regression introduced by
> 8a0516ed8b90c95ffa1363b420caa37418149f21 (mm: convert p[te|md]_numa
> users to p[te|md]_protnone_numa).  Any userspace process would
> endlessly fault.
> 
> In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL
> set by the hypervisor.  This meant that any fault on a present
> userspace entry (e.g., a write to a read-only mapping) would be
> misinterpreted as a NUMA hinting fault and the fault would not be
> correctly handled, resulting in the access endlessly faulting.
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Mel Gorman <mgorman@suse.de>

I cannot think of a reason why this would fail for NUMA balancing on bare
metal. The PAGE_NONE protection clears the present bit on p[te|md]_modify
so the expectations are met both before and after the patch is applied. So,
for bare metal at least:

Acked-by: Mel Gorman <mgorman@suse.de>
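
The bare-metal reasoning can be sketched in userspace as follows. This is an
illustration under stated assumptions, not the patch itself: the macro names
drop the kernel's leading underscore, FLAGS_MASK and all three helpers are
hypothetical, and make_page_none() is a simplified stand-in for
pte_modify(pte, PAGE_NONE) that keeps only the frame bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_PRESENT  (UINT64_C(1) << 0)
#define PAGE_USER     (UINT64_C(1) << 2)
#define PAGE_ACCESSED (UINT64_C(1) << 5)
#define PAGE_GLOBAL   (UINT64_C(1) << 8)
#define PAGE_PROTNONE PAGE_GLOBAL	/* aliases the global bit */

/* Hypothetical: treat the low 12 bits as flags, the rest as the frame. */
#define FLAGS_MASK    UINT64_C(0xfff)

/* Check that looks only at the PROTNONE bit. */
static inline bool protnone_bit_only(uint64_t pte)
{
	return (pte & PAGE_PROTNONE) != 0;
}

/* Check that also requires the present bit to be clear. */
static inline bool protnone_and_not_present(uint64_t pte)
{
	return (pte & (PAGE_PROTNONE | PAGE_PRESENT)) == PAGE_PROTNONE;
}

/* Simplified stand-in for applying PAGE_NONE: keep the frame bits and
 * replace the flags, so the present bit ends up clear. */
static inline uint64_t make_page_none(uint64_t pte)
{
	return (pte & ~FLAGS_MASK) | PAGE_PROTNONE | PAGE_ACCESSED;
}

static inline void demo_checks_agree_on_bare_metal(void)
{
	uint64_t bare = (UINT64_C(0x1234) << 12) | PAGE_PRESENT | PAGE_USER;
	uint64_t hinted = make_page_none(bare);
	uint64_t xen_pv = bare | PAGE_GLOBAL;	/* set by the hypervisor */

	/* Bare metal: both checks agree before and after hinting. */
	assert(!protnone_bit_only(bare) && !protnone_and_not_present(bare));
	assert(protnone_bit_only(hinted) && protnone_and_not_present(hinted));

	/* Present entry with the global bit set: only the stricter
	 * check classifies it as an ordinary present mapping. */
	assert(protnone_bit_only(xen_pv));
	assert(!protnone_and_not_present(xen_pv));
}
```

Since a hinted entry never has the present bit set in this model, requiring
it to be clear changes nothing on bare metal; the two checks differ only for
present entries that carry the global bit.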

I *think* this will work ok with Xen but I cannot 100% convince myself.
I'm adding Wei Liu to the cc who may have a Xen PV setup handy that
supports NUMA and may be able to test the patch to confirm.

-- 
Mel Gorman
SUSE Labs


Thread overview: 11+ messages
2015-02-19 13:06 NUMA_BALANCING and Xen PV guest regression in 3.20-rc0 David Vrabel
2015-02-19 17:01 ` Mel Gorman [this message]
2015-02-23 15:13   ` [Xen-devel] " Dario Faggioli
2015-02-23 15:46     ` Mel Gorman
2015-02-19 23:09 ` Linus Torvalds
2015-02-20 10:28   ` [Xen-devel] " David Vrabel
2015-02-20  1:05 ` Kirill A. Shutemov
2015-02-20  1:49   ` Linus Torvalds
2015-02-20 10:47     ` [Xen-devel] " Andrew Cooper
2015-02-20 11:29       ` Kirill A. Shutemov
2015-02-20 11:54         ` Andrew Cooper
