From: Mel Gorman <mgorman@suse.de>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Wei Liu <wei.liu2@citrix.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: NUMA_BALANCING and Xen PV guest regression in 3.20-rc0
Date: Thu, 19 Feb 2015 17:01:04 +0000
Message-ID: <20150219170104.GS3087@suse.de>
In-Reply-To: <54E5DFED.9050700@citrix.com>

On Thu, Feb 19, 2015 at 01:06:53PM +0000, David Vrabel wrote:
> Mel,
>
> The NUMA_BALANCING series beginning with 5d833062139d (mm: numa: do not
> dereference pmd outside of the lock during NUMA hinting fault) and
> specifically 8a0516ed8b90 (mm: convert p[te|md]_numa users to
> p[te|md]_protnone_numa) breaks Xen 64-bit PV guests.
>
> Any fault on a present userspace mapping (e.g., a write to a read-only
> mapping) is being misinterpreted as a NUMA hinting fault and not handled
> correctly. All userspace programs end up continuously faulting.
>
> This is because the hypervisor sets _PAGE_GLOBAL (== _PAGE_PROTNONE) on
> all present userspace page table entries.
>

I see. This is a variation of the problem where the NUMA hinted PTE was
treated as special because the paravirt interfaces were not being used.

> Note that the comment in asm/pgtable_types.h that says that
> _PAGE_BIT_PROTNONE is only valid on non-present entries.
>
> /* If _PAGE_BIT_PRESENT is clear, we use these: */
> /* - if the user mapped it with PROT_NONE; pte_present gives true */
> #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL
>
> Adjusting pte_protnone() and pmd_protnone() to check for the absence of
> _PAGE_PRESENT allows 64-bit Xen PV guests to work correctly again (see
> following patch), but I'm not sure if NUMA_BALANCING would correctly
> work with this change.
>

Thanks for the analysis and the reminder of some of the details from the
previous discussion.

>
> 8<---------------------------
> x86: pte_protnone() and pmd_protnone() must check entry is
> not present
>
> Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
> _PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check
> for this.
>
> This fixes a 64-bit Xen PV guest regression introduced by
> 8a0516ed8b90c95ffa1363b420caa37418149f21 (mm: convert p[te|md]_numa
> users to p[te|md]_protnone_numa). Any userspace process would
> endlessly fault.
>
> In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL
> set by the hypervisor. This meant that any fault on a present
> userspace entry (e.g., a write to a read-only mapping) would be
> misinterpreted as a NUMA hinting fault and the fault would not be
> correctly handled, resulting in the access endlessly faulting.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Mel Gorman <mgorman@suse.de>

I cannot think of a reason why this would fail for NUMA balancing on bare
metal. The PAGE_NONE protection clears the present bit on p[te|md]_modify,
so the expectations are met both before and after the patch is applied. So,
for bare metal at least:

Acked-by: Mel Gorman <mgorman@suse.de>

I *think* this will work ok with Xen but I cannot 100% convince myself.
I'm adding Wei Liu to the cc, who may have a Xen PV setup handy that
supports NUMA and may be able to test the patch to confirm.
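
For reference, a minimal userspace sketch of the check under discussion.
This is not the kernel code. It models a PTE as a bare 64-bit flags word,
and the helper names protnone_before() and protnone_after(), the three
example PTE values, and self_check() are hypothetical names chosen for
illustration. Only the bit positions and the _PAGE_PROTNONE/_PAGE_GLOBAL
aliasing are taken from x86. The "after" form is one way to express "only
valid if _PAGE_PRESENT is clear" and is assumed, since the patch body is
not quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bit positions; PROTNONE aliases GLOBAL as in the quoted header. */
#define _PAGE_BIT_PRESENT   0
#define _PAGE_BIT_GLOBAL    8
#define _PAGE_BIT_PROTNONE  _PAGE_BIT_GLOBAL

#define _PAGE_PRESENT   (UINT64_C(1) << _PAGE_BIT_PRESENT)
#define _PAGE_GLOBAL    (UINT64_C(1) << _PAGE_BIT_GLOBAL)
#define _PAGE_PROTNONE  (UINT64_C(1) << _PAGE_BIT_PROTNONE)

/* Hypothetical example entries, flags only (no frame number). */
#define BARE_METAL_USER_PTE  (_PAGE_PRESENT)                 /* present, not global */
#define XEN_PV_USER_PTE      (_PAGE_PRESENT | _PAGE_GLOBAL)  /* hypervisor sets GLOBAL */
#define NUMA_HINT_PTE        (_PAGE_PROTNONE)                /* present bit cleared */

/* Assumed pre-patch behaviour: looks at the aliased bit alone. */
static bool protnone_before(uint64_t pte_flags)
{
	return (pte_flags & _PAGE_PROTNONE) != 0;
}

/* Assumed post-patch behaviour: aliased bit set AND present bit clear. */
static bool protnone_after(uint64_t pte_flags)
{
	return (pte_flags & (_PAGE_PROTNONE | _PAGE_PRESENT)) == _PAGE_PROTNONE;
}

/* Walk the three cases discussed in the thread. */
static void self_check(void)
{
	/* Bare metal: both forms agree on present and on hinting entries. */
	assert(!protnone_before(BARE_METAL_USER_PTE));
	assert(!protnone_after(BARE_METAL_USER_PTE));
	assert(protnone_before(NUMA_HINT_PTE));
	assert(protnone_after(NUMA_HINT_PTE));

	/* Xen PV: a present entry is misclassified only by the old form. */
	assert(protnone_before(XEN_PV_USER_PTE));
	assert(!protnone_after(XEN_PV_USER_PTE));
}
```

Calling self_check() from a main() exercises all three cases. The two forms
only disagree on an entry that is both present and has the global bit set,
which is the Xen PV userspace case reported above.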
--
Mel Gorman
SUSE Labs