From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wg0-f49.google.com (mail-wg0-f49.google.com [74.125.82.49]) by kanga.kvack.org (Postfix) with ESMTP id 228C16B0031 for ; Mon, 7 Apr 2014 11:10:51 -0400 (EDT)
Received: by mail-wg0-f49.google.com with SMTP id a1so7090440wgh.32 for ; Mon, 07 Apr 2014 08:10:50 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id bw16si5259030wib.115.2014.04.07.08.10.49 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 08:10:49 -0700 (PDT)
From: Mel Gorman
Subject: [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA
Date: Mon, 7 Apr 2014 16:10:40 +0100
Message-Id: <1396883443-11696-1-git-send-email-mgorman@suse.de>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML

Aliasing _PAGE_NUMA and _PAGE_PROTNONE had some convenient properties but
it ultimately gave Xen a headache and pisses off almost everybody who looks
closely at it. Two discussions on "why this makes sense" is one discussion
too many, so rather than having a third there is this series.

Conceptually it's simple -- use an unused physical address bit for
_PAGE_NUMA and make it a 64-bit only feature on x86. This had been avoided
before because if the physical address space expands we are back to square
one, but let's worry about that when it happens unless the x86 maintainers
or hardware people warn us that we're about to run headlong into a wall.

Testing was minimal -- short-lived JVM and autonumabench tests that trigger
the relevant paths for NUMA balancing. Functionally it did not die
miserably. Performance looks as expected with no major changes.
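
To make the idea concrete, here is a minimal userspace sketch (not kernel code) of how a dedicated high bit lets a NUMA hinting entry be told apart from both an unmapped entry and a PROT_NONE entry. The bit positions 0, 8 and 62 follow the series; every name prefixed with model_ or MODEL_ is hypothetical and exists only for this illustration. The last check hints at why a presence test on a 64-bit value presumably wants an explicit "!= 0" before being returned as an int: bit 62 does not survive truncation to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Bit positions as used in the series; the names are illustrative only. */
#define MODEL_PAGE_PRESENT  ((pteval_t)1 << 0)
#define MODEL_PAGE_PROTNONE ((pteval_t)1 << 8)   /* meaningful only when not present */
#define MODEL_PAGE_NUMA     ((pteval_t)1 << 62)  /* assumed-unused physical address bit */

/* Hypothetical helper: mark an entry for NUMA hinting by clearing present. */
static inline pteval_t model_mknuma(pteval_t pte)
{
	return (pte & ~MODEL_PAGE_PRESENT) | MODEL_PAGE_NUMA;
}

/* A hinting entry has the NUMA bit set and the present bit clear. */
static inline int model_pte_numa(pteval_t pte)
{
	return (pte & (MODEL_PAGE_NUMA | MODEL_PAGE_PRESENT)) == MODEL_PAGE_NUMA;
}

/* A PROT_NONE entry has bit 8 set and the present bit clear. */
static inline int model_pte_protnone(pteval_t pte)
{
	return (pte & (MODEL_PAGE_PROTNONE | MODEL_PAGE_PRESENT)) == MODEL_PAGE_PROTNONE;
}

/* Compare against zero so that a lone high bit still yields a true int. */
static inline int model_pte_present(pteval_t pte)
{
	return (pte & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE | MODEL_PAGE_NUMA)) != 0;
}

/* Sanity checks for the model; call this from a test harness. */
static inline void model_selfcheck(void)
{
	pteval_t mapped = 0x1234000ULL | MODEL_PAGE_PRESENT;
	pteval_t hinting = model_mknuma(mapped);
	pteval_t protnone = 0x1234000ULL | MODEL_PAGE_PROTNONE;

	/* With separate bits the two non-present states no longer alias. */
	assert(model_pte_numa(hinting) && !model_pte_protnone(hinting));
	assert(model_pte_protnone(protnone) && !model_pte_numa(protnone));

	/* A hinting entry still counts as present; an empty entry does not. */
	assert(model_pte_present(hinting) && !model_pte_present(0));

	/* Bit 62 vanishes from the low 32 bits of the value. */
	assert((uint32_t)MODEL_PAGE_NUMA == 0);
}
```

With the old aliasing, the first two checks could not both hold, since a single bit would have to mean both things.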
arch/x86/Kconfig | 2 +- arch/x86/include/asm/pgtable.h | 8 +++---- arch/x86/include/asm/pgtable_types.h | 44 ++++++++++++++++++++---------------- mm/memory.c | 12 ---------- 4 files changed, 29 insertions(+), 37 deletions(-) -- 1.8.4.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f170.google.com (mail-wi0-f170.google.com [209.85.212.170]) by kanga.kvack.org (Postfix) with ESMTP id 24EAE6B0036 for ; Mon, 7 Apr 2014 11:10:51 -0400 (EDT) Received: by mail-wi0-f170.google.com with SMTP id bs8so6342101wib.5 for ; Mon, 07 Apr 2014 08:10:50 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id h9si6352935wjb.42.2014.04.07.08.10.49 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 08:10:49 -0700 (PDT) From: Mel Gorman Subject: [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing Date: Mon, 7 Apr 2014 16:10:41 +0100 Message-Id: <1396883443-11696-2-git-send-email-mgorman@suse.de> In-Reply-To: <1396883443-11696-1-git-send-email-mgorman@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Cyrill Gorcunov , Mel Gorman , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML Automatic NUMA balancing currently depends on reusing the PROT_NONE bit which has caused problems on Xen. In preparation for using one of the unused physical address bits this patch requires x86-64 for automatic NUMA balancing. 32-bit support for NUMA on x86 is no longer interesting and the loss of automatic NUMA balancing support should be no surprise. 
Signed-off-by: Mel Gorman
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..084b1c1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,7 +26,7 @@ config X86
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select HAVE_AOUT if X86_32
 	select HAVE_UNSTABLE_SCHED_CLOCK
-	select ARCH_SUPPORTS_NUMA_BALANCING
+	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_INT128 if X86_64
 	select ARCH_WANTS_PROT_NUMA_PROT_NONE
 	select HAVE_IDE
-- 
1.8.4.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-we0-f170.google.com (mail-we0-f170.google.com [74.125.82.170]) by kanga.kvack.org (Postfix) with ESMTP id A15FC6B0037 for ; Mon, 7 Apr 2014 11:10:51 -0400 (EDT)
Received: by mail-we0-f170.google.com with SMTP id w61so6943070wes.29 for ; Mon, 07 Apr 2014 08:10:50 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 12si6338456wjw.205.2014.04.07.08.10.49 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 08:10:49 -0700 (PDT)
From: Mel Gorman
Subject: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
Date: Mon, 7 Apr 2014 16:10:42 +0100
Message-Id: <1396883443-11696-3-git-send-email-mgorman@suse.de>
In-Reply-To: <1396883443-11696-1-git-send-email-mgorman@suse.de>
References: <1396883443-11696-1-git-send-email-mgorman@suse.de>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML

_PAGE_NUMA is currently an alias of _PAGE_PROTNONE to trap NUMA hinting faults.
As the bit is shared, care is taken that _PAGE_NUMA is only used in places
where _PAGE_PROTNONE could not reach, but this still causes problems on Xen
and is conceptually difficult.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected for
NUMA hinting faults. Due to physical address limitations bits 52:62 are
free so we can currently use them. As the present bit is cleared when
making a NUMA PTE, the hinting faults will still be trapped. It means that
32-bit NUMA cannot use automatic NUMA balancing but it is improbable that
anyone cares about that configuration.

In the future there will be a problem when the physical address space
expands because the bits may no longer be free. There is also the risk
that the hardware people are planning to use these bits for some other
purpose. When/if this happens then an option would be to use bit 11 and
disable kmemcheck if automatic NUMA balancing is enabled, assuming bit 11
has not been used for something else in the meantime.

Signed-off-by: Mel Gorman
---
 arch/x86/include/asm/pgtable.h       |  8 +++----
 arch/x86/include/asm/pgtable_types.h | 44 ++++++++++++++++++++----------------
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..58fa7d1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -447,8 +447,8 @@ static inline int pte_same(pte_t a, pte_t b)
 
 static inline int pte_present(pte_t a)
 {
-	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
-			       _PAGE_NUMA);
+	return (pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+			       _PAGE_NUMA)) != 0;
 }
 
 #define pte_accessible pte_accessible
@@ -477,8 +477,8 @@ static inline int pmd_present(pmd_t pmd)
 	 * the _PAGE_PSE flag will remain set at all times while the
 	 * _PAGE_PRESENT bit is clear).
*/ - return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE | - _PAGE_NUMA); + return (pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE | + _PAGE_NUMA)) != 0; } static inline int pmd_none(pmd_t pmd) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 1aa9ccd..f3eafd2 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -25,6 +25,15 @@ #define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */ #define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */ +/* + * Software bits ignored by the page table walker + * At the time of writing, different levels have bits that are ignored. Due + * to physical address limitations, bits 52:62 should be ignored for the PMD + * and PTE levels and are available for use by software. Be aware that this + * may change if the physical address space expands. + */ +#define _PAGE_BIT_NUMA 62 + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -56,6 +65,21 @@ #endif /* + * _PAGE_NUMA distinguishes between a numa hinting minor fault and a page + * that is not present. The hinting fault gathers numa placement statistics + * (see pte_numa()). The bit is always zero when the PTE is not present. + * + * The bit picked must be always zero when the pmd is present and not + * present, so that we don't lose information when we set it while + * atomically clearing the present bit. + */ +#ifdef CONFIG_NUMA_BALANCING +#define _PAGE_NUMA (_AT(pteval_t, 1) << _PAGE_BIT_NUMA) +#else +#define _PAGE_NUMA (_AT(pteval_t, 0)) +#endif + +/* * The same hidden bit is used by kmemcheck, but since kmemcheck * works on kernel pages while soft-dirty engine on user space, * they do not conflict with each other. 
@@ -94,26 +118,6 @@ #define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE) #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) -/* - * _PAGE_NUMA indicates that this page will trigger a numa hinting - * minor page fault to gather numa placement statistics (see - * pte_numa()). The bit picked (8) is within the range between - * _PAGE_FILE (6) and _PAGE_PROTNONE (8) bits. Therefore, it doesn't - * require changes to the swp entry format because that bit is always - * zero when the pte is not present. - * - * The bit picked must be always zero when the pmd is present and not - * present, so that we don't lose information when we set it while - * atomically clearing the present bit. - * - * Because we shared the same bit (8) with _PAGE_PROTNONE this can be - * interpreted as _PAGE_NUMA only in places that _PAGE_PROTNONE - * couldn't reach, like handle_mm_fault() (see access_error in - * arch/x86/mm/fault.c, the vma protection must not be PROT_NONE for - * handle_mm_fault() to be invoked). - */ -#define _PAGE_NUMA _PAGE_PROTNONE - #define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ _PAGE_ACCESSED | _PAGE_DIRTY) #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ -- 1.8.4.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f171.google.com (mail-we0-f171.google.com [74.125.82.171]) by kanga.kvack.org (Postfix) with ESMTP id E4E066B0038 for ; Mon, 7 Apr 2014 11:10:51 -0400 (EDT) Received: by mail-we0-f171.google.com with SMTP id t61so6805527wes.2 for ; Mon, 07 Apr 2014 08:10:51 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id jt9si6341492wjc.180.2014.04.07.08.10.50 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 08:10:50 -0700 (PDT) From: Mel Gorman Subject: [PATCH 3/3] mm: Allow FOLL_NUMA on FOLL_FORCE Date: Mon, 7 Apr 2014 16:10:43 +0100 Message-Id: <1396883443-11696-4-git-send-email-mgorman@suse.de> In-Reply-To: <1396883443-11696-1-git-send-email-mgorman@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Cyrill Gorcunov , Mel Gorman , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML As _PAGE_NUMA is no longer aliased to _PAGE_PROTNONE there should be no confusion between them. It should be possible to kick away the special casing in __get_user_pages. Signed-off-by: Mel Gorman --- mm/memory.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 22dfa61..b9c35a7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1714,18 +1714,6 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, vm_flags &= (gup_flags & FOLL_FORCE) ? (VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE); - /* - * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault - * would be called on PROT_NONE ranges. We must never invoke - * handle_mm_fault on PROT_NONE ranges or the NUMA hinting - * page faults would unprotect the PROT_NONE ranges if - * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd - * bitflag. So to avoid that, don't set FOLL_NUMA if - * FOLL_FORCE is set. - */ - if (!(gup_flags & FOLL_FORCE)) - gup_flags |= FOLL_NUMA; - i = 0; do { -- 1.8.4.5 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-yk0-f181.google.com (mail-yk0-f181.google.com [209.85.160.181]) by kanga.kvack.org (Postfix) with ESMTP id 5FBAD6B0031 for ; Mon, 7 Apr 2014 11:32:43 -0400 (EDT)
Received: by mail-yk0-f181.google.com with SMTP id 131so5677058ykp.12 for ; Mon, 07 Apr 2014 08:32:43 -0700 (PDT)
Received: from SMTP.CITRIX.COM (smtp.citrix.com. [66.165.176.89]) by mx.google.com with ESMTPS id a63si19606354yhk.189.2014.04.07.08.32.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 08:32:42 -0700 (PDT)
Message-ID: <5342C517.2020305@citrix.com>
Date: Mon, 7 Apr 2014 16:32:39 +0100
From: David Vrabel
MIME-Version: 1.0
Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de>
In-Reply-To: <1396883443-11696-3-git-send-email-mgorman@suse.de>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: Linus Torvalds, Cyrill Gorcunov, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML

On 07/04/14 16:10, Mel Gorman wrote:
> _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
> faults. As the bit is shared care is taken that _PAGE_NUMA is only used in
> places where _PAGE_PROTNONE could not reach but this still causes problems
> on Xen and conceptually difficult.

The problem with Xen guests occurred because mprotect() /was/ confusing
PROTNONE mappings with _PAGE_NUMA and clearing the nonexistent NUMA hints.

David

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f173.google.com (mail-wi0-f173.google.com [209.85.212.173]) by kanga.kvack.org (Postfix) with ESMTP id E53A66B0031 for ; Mon, 7 Apr 2014 11:49:41 -0400 (EDT) Received: by mail-wi0-f173.google.com with SMTP id z2so5358775wiv.0 for ; Mon, 07 Apr 2014 08:49:41 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id vj2si6395842wjc.184.2014.04.07.08.49.39 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 08:49:40 -0700 (PDT) Date: Mon, 7 Apr 2014 16:49:35 +0100 From: Mel Gorman Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407154935.GD7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <5342C517.2020305@citrix.com> Sender: owner-linux-mm@kvack.org List-ID: To: David Vrabel Cc: Linus Torvalds , Cyrill Gorcunov , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote: > On 07/04/14 16:10, Mel Gorman wrote: > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in > > places where _PAGE_PROTNONE could not reach but this still causes problems > > on Xen and conceptually difficult. > > The problem with Xen guests occurred because mprotect() /was/ confusing > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. > I didn't bother spelling it out in case I gave the impression that I was blaming Xen for the problem. 
As the bit has now changed, does it help the Xen problem or cause another
collision of some sort? There is no guarantee _PAGE_NUMA will remain as
bit 62 but at worst it'll use bit 11 and NUMA_BALANCING will depend on
!KMEMCHECK.

-- 
Mel Gorman
SUSE Labs

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-la0-f47.google.com (mail-la0-f47.google.com [209.85.215.47]) by kanga.kvack.org (Postfix) with ESMTP id 07BA96B0031 for ; Mon, 7 Apr 2014 12:19:14 -0400 (EDT)
Received: by mail-la0-f47.google.com with SMTP id pn19so4914619lab.6 for ; Mon, 07 Apr 2014 09:19:13 -0700 (PDT)
Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com [2a00:1450:4010:c03::22a]) by mx.google.com with ESMTPS id y6si12662332lal.131.2014.04.07.09.19.11 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 09:19:12 -0700 (PDT)
Received: by mail-la0-f42.google.com with SMTP id ec20so5090279lab.1 for ; Mon, 07 Apr 2014 09:19:11 -0700 (PDT)
Date: Mon, 7 Apr 2014 20:19:10 +0400
From: Cyrill Gorcunov
Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
Message-ID: <20140407161910.GJ1444@moon>
References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140407154935.GD7292@suse.de>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman
Cc: David Vrabel, Linus Torvalds, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML, Pavel Emelyanov

On Mon, Apr 07, 2014 at 04:49:35PM +0100, Mel Gorman wrote:
> On Mon, Apr 07,
2014 at 04:32:39PM +0100, David Vrabel wrote: > > On 07/04/14 16:10, Mel Gorman wrote: > > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting > > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in > > > places where _PAGE_PROTNONE could not reach but this still causes problems > > > on Xen and conceptually difficult. > > > > The problem with Xen guests occurred because mprotect() /was/ confusing > > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. > > I didn't bother spelling it out in case I gave the impression that I was > blaming Xen for the problem. As the bit is now changes, does it help > the Xen problem or cause another collision of some sort? There is no > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 > and NUMA_BALANCING will depend in !KMEMCHECK. Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case never happen. (At the moment I'm trying to figure out if with this set it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f48.google.com (mail-ee0-f48.google.com [74.125.83.48]) by kanga.kvack.org (Postfix) with ESMTP id 6D6006B0072 for ; Mon, 7 Apr 2014 14:29:01 -0400 (EDT) Received: by mail-ee0-f48.google.com with SMTP id b57so898960eek.7 for ; Mon, 07 Apr 2014 11:29:00 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id g45si25046793eev.10.2014.04.07.11.28.59 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 11:28:59 -0700 (PDT) Date: Mon, 7 Apr 2014 19:28:54 +0100 From: Mel Gorman Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407182854.GH7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140407161910.GJ1444@moon> Sender: owner-linux-mm@kvack.org List-ID: To: Cyrill Gorcunov Cc: David Vrabel , Linus Torvalds , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Mon, Apr 07, 2014 at 08:19:10PM +0400, Cyrill Gorcunov wrote: > On Mon, Apr 07, 2014 at 04:49:35PM +0100, Mel Gorman wrote: > > On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote: > > > On 07/04/14 16:10, Mel Gorman wrote: > > > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting > > > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in > > > > places where _PAGE_PROTNONE could not reach but this still causes problems > > > > on Xen and conceptually difficult. > > > > > > The problem with Xen guests occurred because mprotect() /was/ confusing > > > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. > > > > I didn't bother spelling it out in case I gave the impression that I was > > blaming Xen for the problem. As the bit is now changes, does it help > > the Xen problem or cause another collision of some sort? 
There is no > > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 > > and NUMA_BALANCING will depend in !KMEMCHECK. > > Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case > never happen. (At the moment I'm trying to figure out if with this set > it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages). I had considered the soft-dirty tracking usage of the same bit. I thought I'd be able to swizzle around it or a further worst case of having soft-dirty and automatic NUMA balancing mutually exclusive. Unfortunately upon examination it's not obvious how to have both of them share a bit and I suspect any attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the list is examining if _PAGE_BIT_IOMAP can be used. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-f177.google.com (mail-lb0-f177.google.com [209.85.217.177]) by kanga.kvack.org (Postfix) with ESMTP id 852166B006E for ; Mon, 7 Apr 2014 15:16:26 -0400 (EDT) Received: by mail-lb0-f177.google.com with SMTP id z11so5112824lbi.22 for ; Mon, 07 Apr 2014 12:16:25 -0700 (PDT) Received: from mail-lb0-x22f.google.com (mail-lb0-x22f.google.com [2a00:1450:4010:c04::22f]) by mx.google.com with ESMTPS id u5si12983825laa.52.2014.04.07.12.16.23 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 12:16:24 -0700 (PDT) Received: by mail-lb0-f175.google.com with SMTP id w7so5187134lbi.34 for ; Mon, 07 Apr 2014 12:16:23 -0700 (PDT) Date: Mon, 7 Apr 2014 23:16:22 +0400 From: Cyrill Gorcunov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407191622.GA23983@moon> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140407182854.GH7292@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: David Vrabel , Linus Torvalds , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Mon, Apr 07, 2014 at 07:28:54PM +0100, Mel Gorman wrote: > > > I didn't bother spelling it out in case I gave the impression that I was > > > blaming Xen for the problem. As the bit is now changes, does it help > > > the Xen problem or cause another collision of some sort? There is no > > > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 > > > and NUMA_BALANCING will depend in !KMEMCHECK. 
> > > > Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case > > never happen. (At the moment I'm trying to figure out if with this set > > it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages). > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > be able to swizzle around it or a further worst case of having soft-dirty and > automatic NUMA balancing mutually exclusive. Unfortunately upon examination > it's not obvious how to have both of them share a bit and I suspect any > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > list is examining if _PAGE_BIT_IOMAP can be used. Thanks for info, Mel! It seems indeed if no more space left on x86-64 (in the very worst case which I still think won't happen anytime soon) we'll have to make them mut. exclusive. But for now (with 62 bit used for numa) they can live together, right? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f51.google.com (mail-wg0-f51.google.com [74.125.82.51]) by kanga.kvack.org (Postfix) with ESMTP id 870366B0036 for ; Mon, 7 Apr 2014 15:28:10 -0400 (EDT) Received: by mail-wg0-f51.google.com with SMTP id k14so7355157wgh.22 for ; Mon, 07 Apr 2014 12:28:09 -0700 (PDT) Received: from mail.zytor.com (terminus.zytor.com. [2001:1868:205::10]) by mx.google.com with ESMTPS id dh4si6643348wjc.167.2014.04.07.12.28.04 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Apr 2014 12:28:05 -0700 (PDT) Message-ID: <5342FC0E.9080701@zytor.com> Date: Mon, 07 Apr 2014 12:27:10 -0700 From: "H. 
Peter Anvin" MIME-Version: 1.0 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> In-Reply-To: <20140407182854.GH7292@suse.de> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman , Cyrill Gorcunov Cc: David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On 04/07/2014 11:28 AM, Mel Gorman wrote: > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > be able to swizzle around it or a further worst case of having soft-dirty and > automatic NUMA balancing mutually exclusive. Unfortunately upon examination > it's not obvious how to have both of them share a bit and I suspect any > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > list is examining if _PAGE_BIT_IOMAP can be used. > Didn't we smoke the last user of _PAGE_BIT_IOMAP? -hpa -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f42.google.com (mail-la0-f42.google.com [209.85.215.42]) by kanga.kvack.org (Postfix) with ESMTP id 1892C6B0031 for ; Mon, 7 Apr 2014 15:36:49 -0400 (EDT) Received: by mail-la0-f42.google.com with SMTP id ec20so5222769lab.15 for ; Mon, 07 Apr 2014 12:36:49 -0700 (PDT) Received: from mail-lb0-x22e.google.com (mail-lb0-x22e.google.com [2a00:1450:4010:c04::22e]) by mx.google.com with ESMTPS id e6si13012013lah.79.2014.04.07.12.36.48 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 12:36:48 -0700 (PDT) Received: by mail-lb0-f174.google.com with SMTP id u14so5183767lbd.19 for ; Mon, 07 Apr 2014 12:36:48 -0700 (PDT) Date: Mon, 7 Apr 2014 23:36:46 +0400 From: Cyrill Gorcunov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407193646.GC23983@moon> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5342FC0E.9080701@zytor.com> Sender: owner-linux-mm@kvack.org List-ID: To: "H. Peter Anvin" Cc: Mel Gorman , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: > On 04/07/2014 11:28 AM, Mel Gorman wrote: > > > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > > be able to swizzle around it or a further worst case of having soft-dirty and > > automatic NUMA balancing mutually exclusive. 
Unfortunately upon examination > > it's not obvious how to have both of them share a bit and I suspect any > > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > > list is examining if _PAGE_BIT_IOMAP can be used. > > Didn't we smoke the last user of _PAGE_BIT_IOMAP? Seems so, at least for non-kernel pages (not considering this bit references in xen code, which i simply don't know but i guess it's used for kernel pages only). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f50.google.com (mail-ee0-f50.google.com [74.125.83.50]) by kanga.kvack.org (Postfix) with ESMTP id C5E256B0031 for ; Mon, 7 Apr 2014 15:43:18 -0400 (EDT) Received: by mail-ee0-f50.google.com with SMTP id c13so54372eek.23 for ; Mon, 07 Apr 2014 12:43:18 -0700 (PDT) Received: from mail.zytor.com (terminus.zytor.com. [2001:1868:205::10]) by mx.google.com with ESMTPS id l41si25285717eef.38.2014.04.07.12.43.16 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 07 Apr 2014 12:43:17 -0700 (PDT) Message-ID: <5342FFB0.6010501@zytor.com> Date: Mon, 07 Apr 2014 12:42:40 -0700 From: "H. 
Peter Anvin" MIME-Version: 1.0 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> In-Reply-To: <20140407193646.GC23983@moon> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Cyrill Gorcunov Cc: Mel Gorman , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd >>> be able to swizzle around it or a further worst case of having soft-dirty and >>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination >>> it's not obvious how to have both of them share a bit and I suspect any >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the >>> list is examining if _PAGE_BIT_IOMAP can be used. >> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? > > Seems so, at least for non-kernel pages (not considering this bit references in > xen code, which i simply don't know but i guess it's used for kernel pages only). > David Vrabel has a patchset which I presumed would be pulled through the Xen tree this merge window: [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) That frees up this bit. 
-hpa

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pb0-f42.google.com (mail-pb0-f42.google.com [209.85.160.42]) by kanga.kvack.org (Postfix) with ESMTP id 112C16B0031 for ; Mon, 7 Apr 2014 15:58:41 -0400 (EDT)
Received: by mail-pb0-f42.google.com with SMTP id rr13so7256917pbb.29 for ; Mon, 07 Apr 2014 12:58:40 -0700 (PDT)
Received: from mga09.intel.com (mga09.intel.com. [134.134.136.24]) by mx.google.com with ESMTP id ic8si8856327pad.259.2014.04.07.12.58.38 for ; Mon, 07 Apr 2014 12:58:39 -0700 (PDT)
Message-ID: <5342E273.4070308@intel.com>
Date: Mon, 07 Apr 2014 10:37:55 -0700
From: Dave Hansen
MIME-Version: 1.0
Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de>
In-Reply-To: <1396883443-11696-3-git-send-email-mgorman@suse.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Mel Gorman , Linus Torvalds
Cc: Cyrill Gorcunov , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML

On 04/07/2014 08:10 AM, Mel Gorman wrote:
> +/*
> + * Software bits ignored by the page table walker
> + * At the time of writing, different levels have bits that are ignored. Due
> + * to physical address limitations, bits 52:62 should be ignored for the PMD
> + * and PTE levels and are available for use by software. Be aware that this
> + * may change if the physical address space expands.
> + */
> +#define _PAGE_BIT_NUMA		62

Doesn't moving it up to the high bits break pte_modify()'s assumptions?
I was thinking of this nugget from change_pte_range():

	ptent = ptep_modify_prot_start(mm, addr, pte);
	if (pte_numa(ptent))
		ptent = pte_mknonnuma(ptent);
	ptent = pte_modify(ptent, newprot);

pte_modify() pulls off all the high bits out of 'ptent' and only adds
them back if they're in newprot (which as far as I can tell comes from
the VMA). So I _think_ it'll axe the _PAGE_NUMA out of 'ptent'.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) by kanga.kvack.org (Postfix) with ESMTP id 7210C6B0031 for ; Mon, 7 Apr 2014 17:19:53 -0400 (EDT)
Received: by mail-wi0-f181.google.com with SMTP id hm4so253822wib.14 for ; Mon, 07 Apr 2014 14:19:52 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id z42si25548129eel.122.2014.04.07.14.19.51 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 14:19:51 -0700 (PDT)
Date: Mon, 7 Apr 2014 22:19:44 +0100
From: Mel Gorman
Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
Message-ID: <20140407211944.GI7292@suse.de>
References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <5342FC0E.9080701@zytor.com>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "H. Peter Anvin"
Cc: Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov

On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote:
> On 04/07/2014 11:28 AM, Mel Gorman wrote:
> >
> > I had considered the soft-dirty tracking usage of the same bit. I thought I'd
> > be able to swizzle around it or a further worst case of having soft-dirty and
> > automatic NUMA balancing mutually exclusive. Unfortunately upon examination
> > it's not obvious how to have both of them share a bit and I suspect any
> > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> > list is examining if _PAGE_BIT_IOMAP can be used.
> >
>
> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
>

There are still some users of _PAGE_IOMAP with Xen being the main user.
For x86 on bare metal it looks like userspace should never have a PTE
with _PAGE_IOMAP set so it should be usable as _PAGE_NUMA. Patches that
do that are currently being tested but a side-effect was that I had to
disable support on Xen as Xen appears to use it to distinguish between
Xen PTEs and MFNs. It's unclear what automatic NUMA balancing on Xen even
means -- are NUMA nodes always mapped to the physical topology? What is
sensible behaviour if guest and host both run it? etc. If they need it,
we can then examine what the proper way to support _PAGE_NUMA on Xen is.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
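
A simplified sketch of the concern in Dave Hansen's message above: if pte_modify() keeps only a fixed set of bits from the old entry and takes everything else from newprot, a software bit placed at 62 is dropped unless newprot carries it. All names with the SK_/sk_ prefix, the flag layout, and the contents of the preserved-bit mask are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Hypothetical, simplified flag layout; not the kernel's real definitions. */
#define SK_PAGE_PRESENT   (1ULL << 0)
#define SK_PAGE_RW        (1ULL << 1)
#define SK_PAGE_ACCESSED  (1ULL << 5)
#define SK_PAGE_DIRTY     (1ULL << 6)
#define SK_PAGE_NUMA      (1ULL << 62)  /* bit number from the quoted patch */

/* Page frame number field, assumed here to occupy bits 12..51. */
#define SK_PFN_MASK       0x000ffffffffff000ULL

/* Bits of the old entry that survive a protection change in this sketch. */
#define SK_CHG_MASK       (SK_PFN_MASK | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)

/*
 * Keep only the preserved bits of the old entry and take every other
 * bit from newprot. Anything outside SK_CHG_MASK, including
 * SK_PAGE_NUMA, is lost unless newprot sets it again.
 */
static inline pteval_t sk_pte_modify(pteval_t old, pteval_t newprot)
{
	return (old & SK_CHG_MASK) | (newprot & ~SK_CHG_MASK);
}
```

Under these assumptions, the bit would only survive if it were added to the preserved mask or supplied through newprot.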
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) by kanga.kvack.org (Postfix) with ESMTP id E0DFD6B0031 for ; Mon, 7 Apr 2014 17:25:42 -0400 (EDT) Received: by mail-wg0-f42.google.com with SMTP id y10so33389wgg.1 for ; Mon, 07 Apr 2014 14:25:42 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id x46si25544176eea.269.2014.04.07.14.25.41 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 14:25:41 -0700 (PDT) Date: Mon, 7 Apr 2014 22:25:35 +0100 From: Mel Gorman Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407212535.GJ7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <5342FFB0.6010501@zytor.com> Sender: owner-linux-mm@kvack.org List-ID: To: "H. Peter Anvin" Cc: Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: > On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: > > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: > >> On 04/07/2014 11:28 AM, Mel Gorman wrote: > >>> > >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd > >>> be able to swizzle around it or a further worst case of having soft-dirty and > >>> automatic NUMA balancing mutually exclusive. 
Unfortunately upon examination
> >>> it's not obvious how to have both of them share a bit and I suspect any
> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
> >>> list is examining if _PAGE_BIT_IOMAP can be used.
> >>
> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP?
> >
> > Seems so, at least for non-kernel pages (not considering this bit references in
> > xen code, which i simply don't know but i guess it's used for kernel pages only).
> >
>
> David Vrabel has a patchset which I presumed would be pulled through the
> Xen tree this merge window:
>
> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove
> _PAGE_IOMAP)
>
> That frees up this bit.
>

Thanks, I was not aware of that patch. Based on it, I intend to force
automatic NUMA balancing to depend on !XEN and see what the reaction is. If
support for Xen is really required then it could potentially be re-enabled
if/when that series is merged assuming they do not need the bit for
something else.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
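
The soft-dirty conflict quoted above can be illustrated with a small sketch of what goes wrong when two features share one software bit. The bit number and every SK_/sk_ name below are hypothetical; the thread does not say which bit was tried, and this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Hypothetical: both features are assigned the same software bit. */
#define SK_SHARED_BIT       11
#define SK_PAGE_SOFT_DIRTY  (1ULL << SK_SHARED_BIT)
#define SK_PAGE_NUMA_ALIAS  (1ULL << SK_SHARED_BIT)

static inline pteval_t sk_mk_soft_dirty(pteval_t v) { return v | SK_PAGE_SOFT_DIRTY; }
static inline pteval_t sk_mknuma(pteval_t v)        { return v | SK_PAGE_NUMA_ALIAS; }
static inline pteval_t sk_mknonnuma(pteval_t v)     { return v & ~SK_PAGE_NUMA_ALIAS; }

static inline bool sk_is_soft_dirty(pteval_t v) { return (v & SK_PAGE_SOFT_DIRTY) != 0; }
static inline bool sk_is_numa(pteval_t v)       { return (v & SK_PAGE_NUMA_ALIAS) != 0; }
```

With this layout, an entry marked soft-dirty also tests as a NUMA hinting entry, and clearing the NUMA marking silently discards the soft-dirty state. A userspace tracker that relies on soft-dirty bits would then miss writes, which fits the worry about breaking CRIU.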
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-f48.google.com (mail-la0-f48.google.com [209.85.215.48]) by kanga.kvack.org (Postfix) with ESMTP id 8C0256B0068 for ; Tue, 8 Apr 2014 00:04:55 -0400 (EDT) Received: by mail-la0-f48.google.com with SMTP id gf5so321367lab.7 for ; Mon, 07 Apr 2014 21:04:54 -0700 (PDT) Received: from mail-lb0-f180.google.com (mail-lb0-f180.google.com [209.85.217.180]) by mx.google.com with ESMTPS id sz4si367710lbb.204.2014.04.07.21.04.53 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 07 Apr 2014 21:04:54 -0700 (PDT) Received: by mail-lb0-f180.google.com with SMTP id 10so318731lbg.25 for ; Mon, 07 Apr 2014 21:04:53 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20140407212535.GJ7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> Date: Mon, 7 Apr 2014 21:04:53 -0700 Message-ID: Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels From: Steven Noonan Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: "H. Peter Anvin" , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman wrote: > On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: >> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: >> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >> >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >> >>> >> >>> I had considered the soft-dirty tracking usage of the same bit. 
I thought I'd >> >>> be able to swizzle around it or a further worst case of having soft-dirty and >> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination >> >>> it's not obvious how to have both of them share a bit and I suspect any >> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be >> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the >> >>> list is examining if _PAGE_BIT_IOMAP can be used. >> >> >> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? >> > >> > Seems so, at least for non-kernel pages (not considering this bit references in >> > xen code, which i simply don't know but i guess it's used for kernel pages only). >> > >> >> David Vrabel has a patchset which I presumed would be pulled through the >> Xen tree this merge window: >> >> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove >> _PAGE_IOMAP) >> >> That frees up this bit. >> > > Thanks, I was not aware of that patch. Based on it, I intend to force > automatic NUMA balancing to depend on !XEN and see what the reaction is. If > support for Xen is really required then it potentially be re-enabled if/when > that series is merged assuming they do not need the bit for something else. > Amazon EC2 does have large memory instance types with NUMA exposed to the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable (to me anyway) if we didn't require !XEN. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f171.google.com (mail-yk0-f171.google.com [209.85.160.171]) by kanga.kvack.org (Postfix) with ESMTP id 46E756B0095 for ; Tue, 8 Apr 2014 05:31:40 -0400 (EDT) Received: by mail-yk0-f171.google.com with SMTP id q9so552636ykb.30 for ; Tue, 08 Apr 2014 02:31:38 -0700 (PDT) Received: from SMTP02.CITRIX.COM (smtp02.citrix.com. [66.165.176.63]) by mx.google.com with ESMTPS id m27si1719479yha.60.2014.04.08.02.31.38 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 08 Apr 2014 02:31:38 -0700 (PDT) Message-ID: <5343C1F6.4090600@citrix.com> Date: Tue, 8 Apr 2014 10:31:34 +0100 From: David Vrabel MIME-Version: 1.0 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> In-Reply-To: <20140407193646.GC23983@moon> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Cyrill Gorcunov Cc: "H. Peter Anvin" , Mel Gorman , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On 07/04/14 20:36, Cyrill Gorcunov wrote: > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd >>> be able to swizzle around it or a further worst case of having soft-dirty and >>> automatic NUMA balancing mutually exclusive. 
Unfortunately upon examination
>>> it's not obvious how to have both of them share a bit and I suspect any
>>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be
>>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the
>>> list is examining if _PAGE_BIT_IOMAP can be used.
>>
>> Didn't we smoke the last user of _PAGE_BIT_IOMAP?

Not yet. A last minute regression with mapping of I/O regions from
userspace was found so I had to drop the series from 3.15. It should be
back for 3.16.

> Seems so, at least for non-kernel pages (not considering this bit references in
> xen code, which i simply don't know but i guess it's used for kernel pages only).

Xen uses it for all I/O mappings, both kernel and for userspace.

David

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-we0-f170.google.com (mail-we0-f170.google.com [74.125.82.170]) by kanga.kvack.org (Postfix) with ESMTP id 411596B0036 for ; Tue, 8 Apr 2014 11:17:16 -0400 (EDT)
Received: by mail-we0-f170.google.com with SMTP id w61so1126637wes.15 for ; Tue, 08 Apr 2014 08:17:11 -0700 (PDT)
Received: from mail.zytor.com (terminus.zytor.com.
[2001:1868:205::10]) by mx.google.com with ESMTPS id gt3si1063666wib.8.2014.04.08.08.17.08 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 08 Apr 2014 08:17:09 -0700 (PDT) In-Reply-To: References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels From: "H. Peter Anvin" Date: Tue, 08 Apr 2014 08:16:14 -0700 Message-ID: Sender: owner-linux-mm@kvack.org List-ID: To: Steven Noonan , Mel Gorman Cc: Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( On April 7, 2014 9:04:53 PM PDT, Steven Noonan wrote: >On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman wrote: >> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: >>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: >>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>> >>> >>> >>> I had considered the soft-dirty tracking usage of the same bit. >I thought I'd >>> >>> be able to swizzle around it or a further worst case of having >soft-dirty and >>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon >examination >>> >>> it's not obvious how to have both of them share a bit and I >suspect any >>> >>> attempt to will break CRIU. 
In my current tree, NUMA_BALANCING >cannot be >>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. >Next on the >>> >>> list is examining if _PAGE_BIT_IOMAP can be used. >>> >> >>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? >>> > >>> > Seems so, at least for non-kernel pages (not considering this bit >references in >>> > xen code, which i simply don't know but i guess it's used for >kernel pages only). >>> > >>> >>> David Vrabel has a patchset which I presumed would be pulled through >the >>> Xen tree this merge window: >>> >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and >remove >>> _PAGE_IOMAP) >>> >>> That frees up this bit. >>> >> >> Thanks, I was not aware of that patch. Based on it, I intend to >force >> automatic NUMA balancing to depend on !XEN and see what the reaction >is. If >> support for Xen is really required then it potentially be re-enabled >if/when >> that series is merged assuming they do not need the bit for something >else. >> > >Amazon EC2 does have large memory instance types with NUMA exposed to >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >(to me anyway) if we didn't require !XEN. -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f177.google.com (mail-ig0-f177.google.com [209.85.213.177]) by kanga.kvack.org (Postfix) with ESMTP id 4DBB66B0031 for ; Tue, 8 Apr 2014 12:03:29 -0400 (EDT) Received: by mail-ig0-f177.google.com with SMTP id ur14so1242071igb.4 for ; Tue, 08 Apr 2014 09:03:29 -0700 (PDT) Received: from userp1040.oracle.com (userp1040.oracle.com. 
[156.151.31.81]) by mx.google.com with ESMTPS id ql8si3934960igc.9.2014.04.08.09.03.28 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Tue, 08 Apr 2014 09:03:28 -0700 (PDT) Date: Tue, 8 Apr 2014 12:02:50 -0400 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140408160250.GE31554@phenom.dumpdata.com> References: <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: "H. Peter Anvin" Cc: Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov .snip.. > >>> David Vrabel has a patchset which I presumed would be pulled through > >the > >>> Xen tree this merge window: > >>> > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > >remove > >>> _PAGE_IOMAP) > >>> > >>> That frees up this bit. > >>> > >> > >> Thanks, I was not aware of that patch. Based on it, I intend to > >force > >> automatic NUMA balancing to depend on !XEN and see what the reaction > >is. If > >> support for Xen is really required then it potentially be re-enabled > >if/when > >> that series is merged assuming they do not need the bit for something > >else. > >> > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > >(to me anyway) if we didn't require !XEN. What about the patch that David Vrabel posted: http://osdir.com/ml/general/2014-03/msg41979.html Has anybody taken it for a spin? 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f44.google.com (mail-ee0-f44.google.com [74.125.83.44]) by kanga.kvack.org (Postfix) with ESMTP id 4AF946B0035 for ; Tue, 8 Apr 2014 12:18:18 -0400 (EDT) Received: by mail-ee0-f44.google.com with SMTP id e49so893056eek.3 for ; Tue, 08 Apr 2014 09:18:17 -0700 (PDT) Received: from mail.zytor.com (terminus.zytor.com. [2001:1868:205::10]) by mx.google.com with ESMTPS id u5si3421205een.113.2014.04.08.09.18.15 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 08 Apr 2014 09:18:16 -0700 (PDT) Message-ID: <534420F1.3030301@zytor.com> Date: Tue, 08 Apr 2014 09:16:49 -0700 From: "H. Peter Anvin" MIME-Version: 1.0 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> In-Reply-To: <20140408160250.GE31554@phenom.dumpdata.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Konrad Rzeszutek Wilk Cc: Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote: >>> >>> Amazon EC2 does have large memory instance types with NUMA exposed to >>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >>> (to me anyway) if we didn't require !XEN. 
> > What about the patch that David Vrabel posted: > > http://osdir.com/ml/general/2014-03/msg41979.html > > Has anybody taken it for a spin? > Oh lovely, more pvops in low level paths. I'm so thrilled. Incidentally, I wasn't even Cc:'d on that patch and was only added to the thread by Linus, but never saw the early bits of the thread including the actual patch. -hpa -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f47.google.com (mail-ee0-f47.google.com [74.125.83.47]) by kanga.kvack.org (Postfix) with ESMTP id 7C5C86B0031 for ; Tue, 8 Apr 2014 12:47:50 -0400 (EDT) Received: by mail-ee0-f47.google.com with SMTP id b15so906252eek.34 for ; Tue, 08 Apr 2014 09:47:48 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 43si3505313eei.265.2014.04.08.09.47.47 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 08 Apr 2014 09:47:47 -0700 (PDT) Date: Tue, 8 Apr 2014 17:47:44 +0100 From: Mel Gorman Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140408164744.GM7292@suse.de> References: <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <534420F1.3030301@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <534420F1.3030301@zytor.com> Sender: owner-linux-mm@kvack.org List-ID: To: "H. 
Peter Anvin" Cc: Konrad Rzeszutek Wilk , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Tue, Apr 08, 2014 at 09:16:49AM -0700, H. Peter Anvin wrote: > On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote: > >>> > >>> Amazon EC2 does have large memory instance types with NUMA exposed to > >>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > >>> (to me anyway) if we didn't require !XEN. > > > > What about the patch that David Vrabel posted: > > > > http://osdir.com/ml/general/2014-03/msg41979.html > > > > Has anybody taken it for a spin? > > > > Oh lovely, more pvops in low level paths. I'm so thrilled. > > Incidentally, I wasn't even Cc:'d on that patch and was only added to > the thread by Linus, but never saw the early bits of the thread > including the actual patch. > I posted an alternative to that patch that confines the damage to the NUMA pte helpers. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yk0-f177.google.com (mail-yk0-f177.google.com [209.85.160.177]) by kanga.kvack.org (Postfix) with ESMTP id 4DD4A6B0031 for ; Tue, 8 Apr 2014 12:51:15 -0400 (EDT) Received: by mail-yk0-f177.google.com with SMTP id q200so1049079ykb.8 for ; Tue, 08 Apr 2014 09:51:14 -0700 (PDT) Received: from SMTP02.CITRIX.COM (smtp02.citrix.com. 
[66.165.176.63]) by mx.google.com with ESMTPS id k25si3226164yhl.54.2014.04.08.09.51.12 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 08 Apr 2014 09:51:12 -0700 (PDT) Message-ID: <534428F2.2040205@citrix.com> Date: Tue, 8 Apr 2014 17:50:58 +0100 From: David Vrabel MIME-Version: 1.0 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <534420F1.3030301@zytor.com> In-Reply-To: <534420F1.3030301@zytor.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: "H. Peter Anvin" Cc: Konrad Rzeszutek Wilk , Steven Noonan , Mel Gorman , Cyrill Gorcunov , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On 08/04/14 17:16, H. Peter Anvin wrote: > On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote: >>>> >>>> Amazon EC2 does have large memory instance types with NUMA exposed to >>>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >>>> (to me anyway) if we didn't require !XEN. >> >> What about the patch that David Vrabel posted: >> >> http://osdir.com/ml/general/2014-03/msg41979.html >> >> Has anybody taken it for a spin? >> > > Oh lovely, more pvops in low level paths. I'm so thrilled. > > Incidentally, I wasn't even Cc:'d on that patch and was only added to > the thread by Linus, but never saw the early bits of the thread > including the actual patch. I did resend a version CC'd to all the x86 maintainers and included some performance figures for native (~1 extra clock cycle). I've included it again below. 
My preference would be to take this patch as it fixes it for both NUMA
rebalancing and any future uses that want to set/clear _PAGE_PRESENT.

David

8<--------------
x86: use pv-ops in {pte, pmd}_{set,clear}_flags()

Instead of using native functions to operate on the PTEs in
pte_set_flags(), pte_clear_flags(), pmd_set_flags(), pmd_clear_flags()
use the PV aware ones.

This fixes a regression in Xen PV guests introduced by 1667918b6483
(mm: numa: clear numa hinting information on mprotect).

This has negligible performance impact on native since the pte_val()
and __pte() (etc.) calls are patched at runtime when running on bare
metal. Measurements on a 3 GHz AMD 4284 give approx. 0.3 ns (~1 clock
cycle) of additional time for each function.

Xen PV guest page tables require that their entries use machine
addresses if the present bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses. This is because on migration only the MFNs in present PTEs
are translated to PFNs (canonicalised) so that they may be translated
back to the new MFN in the destination domain (uncanonicalised).

pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set
and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.

In a Xen PV guest, these functions must translate MFNs to PFNs when
clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.

Signed-off-by: David Vrabel
Cc: Steven Noonan
Cc: Elena Ufimtseva
Cc: Mel Gorman
Cc: [3.12+]
---
 arch/x86/include/asm/pgtable.h | 12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..323e5e2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -174,16 +174,16 @@ static inline int has_transparent_hugepage(void)
 
 static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
 {
-	pteval_t v = native_pte_val(pte);
+	pteval_t v = pte_val(pte);
 
-	return native_make_pte(v | set);
+	return __pte(v | set);
 }
 
 static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 {
-	pteval_t v = native_pte_val(pte);
+	pteval_t v = pte_val(pte);
 
-	return native_make_pte(v & ~clear);
+	return __pte(v & ~clear);
 }
 
 static inline pte_t pte_mkclean(pte_t pte)
@@ -248,14 +248,14 @@ static inline pte_t pte_mkspecial(pte_t pte)
 
 static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
 {
-	pmdval_t v = native_pmd_val(pmd);
+	pmdval_t v = pmd_val(pmd);
 
 	return __pmd(v | set);
 }
 
 static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 {
-	pmdval_t v = native_pmd_val(pmd);
+	pmdval_t v = pmd_val(pmd);
 
 	return __pmd(v & ~clear);
 }

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ee0-f50.google.com (mail-ee0-f50.google.com [74.125.83.50]) by kanga.kvack.org (Postfix) with ESMTP id ABD576B0037 for ; Tue, 8 Apr 2014 12:51:28 -0400 (EDT)
Received: by mail-ee0-f50.google.com with SMTP id c13so904599eek.23 for ; Tue, 08 Apr 2014 09:51:28 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id x47si3524515eel.223.2014.04.08.09.51.27 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 08 Apr 2014 09:51:27 -0700 (PDT) Date: Tue, 8 Apr 2014 17:51:23 +0100 From: Mel Gorman Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140408165123.GN7292@suse.de> References: <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140408160250.GE31554@phenom.dumpdata.com> Sender: owner-linux-mm@kvack.org List-ID: To: Konrad Rzeszutek Wilk Cc: "H. Peter Anvin" , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote: > .snip.. > > >>> David Vrabel has a patchset which I presumed would be pulled through > > >the > > >>> Xen tree this merge window: > > >>> > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > > >remove > > >>> _PAGE_IOMAP) > > >>> > > >>> That frees up this bit. > > >>> > > >> > > >> Thanks, I was not aware of that patch. Based on it, I intend to > > >force > > >> automatic NUMA balancing to depend on !XEN and see what the reaction > > >is. If > > >> support for Xen is really required then it potentially be re-enabled > > >if/when > > >> that series is merged assuming they do not need the bit for something > > >else. > > >> > > > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > > >the guest (e.g. 
c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > > >(to me anyway) if we didn't require !XEN. > > What about the patch that David Vrabel posted: > > http://osdir.com/ml/general/2014-03/msg41979.html > > Has anybody taken it for a spin? Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA hinting ptes" which modifies the NUMA pte helpers instead of the main set/clear ones. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-bk0-f42.google.com (mail-bk0-f42.google.com [209.85.214.42]) by kanga.kvack.org (Postfix) with ESMTP id E92C86B0036 for ; Tue, 8 Apr 2014 16:51:31 -0400 (EDT) Received: by mail-bk0-f42.google.com with SMTP id mx12so1308392bkb.1 for ; Tue, 08 Apr 2014 13:51:31 -0700 (PDT) Received: from mail-bk0-f50.google.com (mail-bk0-f50.google.com [209.85.214.50]) by mx.google.com with ESMTPS id nr10si1684866bkb.71.2014.04.08.13.51.29 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 08 Apr 2014 13:51:30 -0700 (PDT) Received: by mail-bk0-f50.google.com with SMTP id w10so1266202bkz.23 for ; Tue, 08 Apr 2014 13:51:28 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> Date: Tue, 8 Apr 2014 13:51:28 -0700 Message-ID: Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels From: Steven Noonan Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: "H. 
Peter Anvin" Cc: Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: > > > Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( Well Amazon doesn't expose NUMA on PV, only on HVM guests. > On April 7, 2014 9:04:53 PM PDT, Steven Noonan wrote: >>On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman wrote: >>> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: >>>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: >>>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >>>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>>> >>> >>>> >>> I had considered the soft-dirty tracking usage of the same bit. >>I thought I'd >>>> >>> be able to swizzle around it or a further worst case of having >>soft-dirty and >>>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon >>examination >>>> >>> it's not obvious how to have both of them share a bit and I >>suspect any >>>> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING >>cannot be >>>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. >>Next on the >>>> >>> list is examining if _PAGE_BIT_IOMAP can be used. >>>> >> >>>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? >>>> > >>>> > Seems so, at least for non-kernel pages (not considering this bit >>references in >>>> > xen code, which i simply don't know but i guess it's used for >>kernel pages only). >>>> > >>>> >>>> David Vrabel has a patchset which I presumed would be pulled through >>the >>>> Xen tree this merge window: >>>> >>>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and >>remove >>>> _PAGE_IOMAP) >>>> >>>> That frees up this bit. >>>> >>> >>> Thanks, I was not aware of that patch. 
Based on it, I intend to >>force >>> automatic NUMA balancing to depend on !XEN and see what the reaction >>is. If >>> support for Xen is really required then it potentially be re-enabled >>if/when >>> that series is merged assuming they do not need the bit for something >>else. >>> >> >>Amazon EC2 does have large memory instance types with NUMA exposed to >>the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >>(to me anyway) if we didn't require !XEN. > > -- > Sent from my mobile phone. Please pardon brevity and lack of formatting. > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ee0-f48.google.com (mail-ee0-f48.google.com [74.125.83.48]) by kanga.kvack.org (Postfix) with ESMTP id BACE66B0036 for ; Tue, 8 Apr 2014 16:59:55 -0400 (EDT) Received: by mail-ee0-f48.google.com with SMTP id b57so1131432eek.7 for ; Tue, 08 Apr 2014 13:59:55 -0700 (PDT) Received: from mail.zytor.com (terminus.zytor.com. [2001:1868:205::10]) by mx.google.com with ESMTPS id z42si4409072eel.2.2014.04.08.13.59.53 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 08 Apr 2014 13:59:54 -0700 (PDT) Message-ID: <5344631D.1050203@zytor.com> Date: Tue, 08 Apr 2014 13:59:09 -0700 From: "H. 
Peter Anvin" MIME-Version: 1.0 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Steven Noonan Cc: Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On 04/08/2014 01:51 PM, Steven Noonan wrote: > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: >> >> >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( > > Well Amazon doesn't expose NUMA on PV, only on HVM guests. > Yes, but Amazon is one of the main things keeping Xen PV alive as far as I can tell, which means the support gets built in, and so on. -hpa -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f178.google.com (mail-ie0-f178.google.com [209.85.223.178]) by kanga.kvack.org (Postfix) with ESMTP id AAF0F6B0035 for ; Wed, 9 Apr 2014 11:05:35 -0400 (EDT) Received: by mail-ie0-f178.google.com with SMTP id lx4so2583257iec.9 for ; Wed, 09 Apr 2014 08:05:34 -0700 (PDT) Received: from userp1040.oracle.com (userp1040.oracle.com. 
[156.151.31.81]) by mx.google.com with ESMTPS id m10si1554615icu.169.2014.04.09.08.05.28 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 09 Apr 2014 08:05:28 -0700 (PDT) Date: Wed, 9 Apr 2014 11:04:48 -0400 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409150448.GE5860@phenom.dumpdata.com> References: <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <5344631D.1050203@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5344631D.1050203@zytor.com> Sender: owner-linux-mm@kvack.org List-ID: To: "H. Peter Anvin" Cc: Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Tue, Apr 08, 2014 at 01:59:09PM -0700, H. Peter Anvin wrote: > On 04/08/2014 01:51 PM, Steven Noonan wrote: > > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: > >> > >> > >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( > > > > Well Amazon doesn't expose NUMA on PV, only on HVM guests. > > > > Yes, but Amazon is one of the main things keeping Xen PV alive as far as > I can tell, which means the support gets built in, and so on. Taking the snarkiness aside, the issue here is that even on guests without NUMA exposed the problem shows up. That is, the 'mknuma' helpers are still being called even if the guest topology is not NUMA! Which raises a question - why aren't mknuma and its friends gated by jump_label machinery or such? Mel, any particular reasons why it couldn't be done this way? > > -hpa > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org.
For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f180.google.com (mail-ie0-f180.google.com [209.85.223.180]) by kanga.kvack.org (Postfix) with ESMTP id 2C8C46B0031 for ; Wed, 9 Apr 2014 11:10:00 -0400 (EDT) Received: by mail-ie0-f180.google.com with SMTP id as1so2458459iec.25 for ; Wed, 09 Apr 2014 08:09:59 -0700 (PDT) Received: from merlin.infradead.org (merlin.infradead.org. [2001:4978:20e::2]) by mx.google.com with ESMTPS id nv5si10778558igb.41.2014.04.09.08.09.56 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 09 Apr 2014 08:09:56 -0700 (PDT) Date: Wed, 9 Apr 2014 17:09:37 +0200 From: Peter Zijlstra Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409150935.GC10526@twins.programming.kicks-ass.net> References: <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <5344631D.1050203@zytor.com> <20140409150448.GE5860@phenom.dumpdata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140409150448.GE5860@phenom.dumpdata.com> Sender: owner-linux-mm@kvack.org List-ID: To: Konrad Rzeszutek Wilk Cc: "H. Peter Anvin" , Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Wed, Apr 09, 2014 at 11:04:48AM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 08, 2014 at 01:59:09PM -0700, H. 
Peter Anvin wrote: > > On 04/08/2014 01:51 PM, Steven Noonan wrote: > > > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: > > >> > > >> > > >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( > > > > > > Well Amazon doesn't expose NUMA on PV, only on HVM guests. > > > > > > > Yes, but Amazon is one of the main things keeping Xen PV alive as far as > > I can tell, which means the support gets built in, and so on. > > Taking the snarkiness aside, the issue here is that even on guests > without NUMA exposed the problem shows up. That is, the 'mknuma' helpers are > still being called even if the guest topology is not NUMA! > > Which raises a question - why aren't mknuma and its friends gated by > jump_label machinery or such? > > Mel, any particular reasons why it couldn't be done this way? Hmm, I thought we disabled all that when there was only the 1 node. All this should be driven from task_tick_numa() which only gets called when numabalancing_enabled, and that _should_ be false when nr_nodes == 1. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yh0-f42.google.com (mail-yh0-f42.google.com [209.85.213.42]) by kanga.kvack.org (Postfix) with ESMTP id DEA0A6B0035 for ; Wed, 9 Apr 2014 11:19:07 -0400 (EDT) Received: by mail-yh0-f42.google.com with SMTP id t59so2512480yho.1 for ; Wed, 09 Apr 2014 08:19:05 -0700 (PDT) Received: from userp1040.oracle.com (userp1040.oracle.com.
[156.151.31.81]) by mx.google.com with ESMTPS id i71si1390047yhb.101.2014.04.09.08.19.04 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 09 Apr 2014 08:19:04 -0700 (PDT) Date: Wed, 9 Apr 2014 11:18:27 -0400 From: Konrad Rzeszutek Wilk Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409151827.GA6445@phenom.dumpdata.com> References: <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <20140408165123.GN7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140408165123.GN7292@suse.de> Sender: owner-linux-mm@kvack.org List-ID: To: Mel Gorman Cc: "H. Peter Anvin" , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Tue, Apr 08, 2014 at 05:51:23PM +0100, Mel Gorman wrote: > On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote: > > .snip.. > > > >>> David Vrabel has a patchset which I presumed would be pulled through > > > >the > > > >>> Xen tree this merge window: > > > >>> > > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > > > >remove > > > >>> _PAGE_IOMAP) > > > >>> > > > >>> That frees up this bit. > > > >>> > > > >> > > > >> Thanks, I was not aware of that patch. Based on it, I intend to > > > >force > > > >> automatic NUMA balancing to depend on !XEN and see what the reaction > > > >is. If > > > >> support for Xen is really required then it potentially be re-enabled > > > >if/when > > > >> that series is merged assuming they do not need the bit for something > > > >else. 
> > > >> > > > > > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > > > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > > > >(to me anyway) if we didn't require !XEN. > > > > What about the patch that David Vrabel posted: > > > > http://osdir.com/ml/general/2014-03/msg41979.html > > > > Has anybody taken it for a spin? > > Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA > hinting ptes" which modifies the NUMA pte helpers instead of the main > set/clear ones. Ah nice! Looking forward to it being posted as non-RFC and could you also please CC 'xen-devel@lists.xenproject.org' on it? Thank you! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f171.google.com (mail-wi0-f171.google.com [209.85.212.171]) by kanga.kvack.org (Postfix) with ESMTP id D5C516B0036 for ; Wed, 9 Apr 2014 11:39:25 -0400 (EDT) Received: by mail-wi0-f171.google.com with SMTP id q5so9149079wiv.4 for ; Wed, 09 Apr 2014 08:39:24 -0700 (PDT) Received: from mx2.suse.de (cantor2.suse.de. 
[195.135.220.15]) by mx.google.com with ESMTPS id cy3si2996434wib.39.2014.04.09.08.39.22 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 09 Apr 2014 08:39:23 -0700 (PDT) Date: Wed, 9 Apr 2014 16:39:17 +0100 From: Mel Gorman Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409153916.GT7292@suse.de> References: <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <20140408165123.GN7292@suse.de> <20140409151827.GA6445@phenom.dumpdata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140409151827.GA6445@phenom.dumpdata.com> Sender: owner-linux-mm@kvack.org List-ID: To: Konrad Rzeszutek Wilk Cc: "H. Peter Anvin" , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov On Wed, Apr 09, 2014 at 11:18:27AM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 08, 2014 at 05:51:23PM +0100, Mel Gorman wrote: > > On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote: > > > .snip.. > > > > >>> David Vrabel has a patchset which I presumed would be pulled through > > > > >the > > > > >>> Xen tree this merge window: > > > > >>> > > > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > > > > >remove > > > > >>> _PAGE_IOMAP) > > > > >>> > > > > >>> That frees up this bit. > > > > >>> > > > > >> > > > > >> Thanks, I was not aware of that patch. Based on it, I intend to > > > > >force > > > > >> automatic NUMA balancing to depend on !XEN and see what the reaction > > > > >is. 
If > > > > >> support for Xen is really required then it potentially be re-enabled > > > > >if/when > > > > >> that series is merged assuming they do not need the bit for something > > > > >else. > > > > >> > > > > > > > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > > > > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > > > > >(to me anyway) if we didn't require !XEN. > > > > > > What about the patch that David Vrabel posted: > > > > > > http://osdir.com/ml/general/2014-03/msg41979.html > > > > > > Has anybody taken it for a spin? > > > > Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA > > hinting ptes" which modifies the NUMA pte helpers instead of the main > > set/clear ones. > > Ah nice! Looking forward to it being posted as non-RFC and could you also > please CC 'xen-devel@lists.xenproject.org' on it? > Yes I will. Unless the x86 maintainers push for it on the grounds that it is a functional fix for xen, I'm going to wait until after the merge window to resend it. That'd give it some chance of being tested in -next before hitting mainline. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755812AbaDGPLH (ORCPT ); Mon, 7 Apr 2014 11:11:07 -0400 Received: from cantor2.suse.de ([195.135.220.15]:50826 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755758AbaDGPKu (ORCPT ); Mon, 7 Apr 2014 11:10:50 -0400 From: Mel Gorman To: Linus Torvalds Cc: Cyrill Gorcunov , Mel Gorman , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML Subject: [RFC PATCH 0/3] Use an alternative to _PAGE_PROTNONE for _PAGE_NUMA Date: Mon, 7 Apr 2014 16:10:40 +0100 Message-Id: <1396883443-11696-1-git-send-email-mgorman@suse.de> X-Mailer: git-send-email 1.8.4.5 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Aliasing _PAGE_NUMA and _PAGE_PROTNONE had some convenient properties but it ultimately gave Xen a headache and pisses almost everybody off that looks closely at it. Two discussions on "why this makes sense" is one discussion too many so rather than having a third there is this series. Conceptually it's simple -- use an unused physical address bit for _PAGE_NUMA and make it a 64-bit only feature on x86. This had been avoided before because if the physical address space expands we are back to square one but lets worry about that when it happens unless the x86 maintainers or hardware people warn us that we're about to run headlong into a wall. Testing was minimal -- short lived JVM and autonumabench tests that trigger the relevant paths for NUMA balancing. Functionally it did not die miserably. Performance looks as expected with no major changes. 
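As a rough illustration of the idea in this cover letter, the following user-space C sketch models a dedicated hinting bit that no longer aliases _PAGE_PROTNONE. Bit 62 and the PROTNONE/GLOBAL aliasing are the values used in patch 2/3 of this series; the toy_* helpers, TOY_PFN_MASK and the assumption of 4K pages with a 52-bit physical address are made up for the example and are not the kernel's definitions.

```c
/* Illustrative sketch only; not the kernel's pgtable code. */
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;

#define _PAGE_BIT_PRESENT   0
#define _PAGE_BIT_GLOBAL    8
#define _PAGE_BIT_PROTNONE  _PAGE_BIT_GLOBAL
#define _PAGE_BIT_NUMA      62	/* value proposed in patch 2/3 */

#define _PAGE_PRESENT   ((pteval_t)1 << _PAGE_BIT_PRESENT)
#define _PAGE_PROTNONE  ((pteval_t)1 << _PAGE_BIT_PROTNONE)
#define _PAGE_NUMA      ((pteval_t)1 << _PAGE_BIT_NUMA)

/* Assumption for this sketch: 4K pages and a 52-bit physical address. */
#define TOY_PFN_MASK    ((((pteval_t)1 << 52) - 1) & ~(pteval_t)0xfff)

/* Present for the VM's purposes: mapped, PROT_NONE, or a NUMA hinting entry. */
static int toy_pte_present(pteval_t pte)
{
	return (pte & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA)) != 0;
}

/* A hinting entry has the NUMA bit set and the hardware present bit clear. */
static int toy_pte_numa(pteval_t pte)
{
	return (pte & (_PAGE_NUMA | _PAGE_PRESENT)) == _PAGE_NUMA;
}

/* Arm a hinting fault: hide the page from the MMU but remember why. */
static pteval_t toy_pte_mknuma(pteval_t pte)
{
	return (pte & ~_PAGE_PRESENT) | _PAGE_NUMA;
}

/* Undo it once the hinting fault has been handled. */
static pteval_t toy_pte_mknonnuma(pteval_t pte)
{
	return (pte & ~_PAGE_NUMA) | _PAGE_PRESENT;
}

/* Sanity check: the hint bit must not overlap the frame-number field. */
static void toy_check_layout(void)
{
	assert((_PAGE_NUMA & TOY_PFN_MASK) == 0);
}
```

With a separate bit, a PROT_NONE entry is never mistaken for a hinting entry in this model. The layout check would start failing if the assumed physical address width grew to cover bit 62, which is the risk the cover letter mentions.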
 arch/x86/Kconfig                     |  2 +-
 arch/x86/include/asm/pgtable.h       |  8 +++----
 arch/x86/include/asm/pgtable_types.h | 44 ++++++++++++++++++++----------------
 mm/memory.c                          | 12 ----------
 4 files changed, 29 insertions(+), 37 deletions(-)

--
1.8.4.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755797AbaDGPLD (ORCPT ); Mon, 7 Apr 2014 11:11:03 -0400
Received: from cantor2.suse.de ([195.135.220.15]:50824 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755759AbaDGPKu (ORCPT ); Mon, 7 Apr 2014 11:10:50 -0400
From: Mel Gorman
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
Subject: [PATCH 1/3] x86: Require x86-64 for automatic NUMA balancing
Date: Mon, 7 Apr 2014 16:10:41 +0100
Message-Id: <1396883443-11696-2-git-send-email-mgorman@suse.de>
X-Mailer: git-send-email 1.8.4.5
In-Reply-To: <1396883443-11696-1-git-send-email-mgorman@suse.de>
References: <1396883443-11696-1-git-send-email-mgorman@suse.de>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Automatic NUMA balancing currently depends on reusing the PROT_NONE bit which has caused problems on Xen. In preparation for using one of the unused physical address bits this patch requires x86-64 for automatic NUMA balancing. 32-bit support for NUMA on x86 is no longer interesting and the loss of automatic NUMA balancing support should be no surprise.
Signed-off-by: Mel Gorman
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..084b1c1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -26,7 +26,7 @@ config X86
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select HAVE_AOUT if X86_32
 	select HAVE_UNSTABLE_SCHED_CLOCK
-	select ARCH_SUPPORTS_NUMA_BALANCING
+	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_INT128 if X86_64
 	select ARCH_WANTS_PROT_NUMA_PROT_NONE
 	select HAVE_IDE
--
1.8.4.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755778AbaDGPK4 (ORCPT ); Mon, 7 Apr 2014 11:10:56 -0400
Received: from cantor2.suse.de ([195.135.220.15]:50829 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755760AbaDGPKu (ORCPT ); Mon, 7 Apr 2014 11:10:50 -0400
From: Mel Gorman
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
Subject: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
Date: Mon, 7 Apr 2014 16:10:42 +0100
Message-Id: <1396883443-11696-3-git-send-email-mgorman@suse.de>
X-Mailer: git-send-email 1.8.4.5
In-Reply-To: <1396883443-11696-1-git-send-email-mgorman@suse.de>
References: <1396883443-11696-1-git-send-email-mgorman@suse.de>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

_PAGE_NUMA is currently an alias of _PAGE_PROTNONE to trap NUMA hinting faults. As the bit is shared, care is taken that _PAGE_NUMA is only used in places where _PAGE_PROTNONE could not reach, but this still causes problems on Xen and is conceptually difficult.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference between an entry that is really unmapped and a page that is protected for NUMA hinting faults. Due to physical address limitations bits 52:62 are free so we can currently use them. As the present bit is cleared when making a NUMA PTE, the hinting faults will still be trapped. It means that 32-bit NUMA cannot use automatic NUMA balancing but it is improbable that anyone cares about that configuration.

In the future there will be a problem when the physical address space expands because the bits may no longer be free. There is also the risk that the hardware people are planning to use these bits for some other purpose. When/if this happens then an option would be to use bit 11 and disable kmemcheck if automatic NUMA balancing is enabled assuming bit 11 has not been used for something else in the meantime.

Signed-off-by: Mel Gorman
---
 arch/x86/include/asm/pgtable.h       |  8 +++----
 arch/x86/include/asm/pgtable_types.h | 44 ++++++++++++++++++++----------------
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..58fa7d1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -447,8 +447,8 @@ static inline int pte_same(pte_t a, pte_t b)
 
 static inline int pte_present(pte_t a)
 {
-	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
-			       _PAGE_NUMA);
+	return (pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+				_PAGE_NUMA)) != 0;
 }
 
 #define pte_accessible pte_accessible
@@ -477,8 +477,8 @@ static inline int pmd_present(pmd_t pmd)
 	 * the _PAGE_PSE flag will remain set at all times while the
 	 * _PAGE_PRESENT bit is clear).
 	 */
-	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
-				 _PAGE_NUMA);
+	return (pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
+				  _PAGE_NUMA)) != 0;
 }
 
 static inline int pmd_none(pmd_t pmd)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 1aa9ccd..f3eafd2 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -25,6 +25,15 @@
 #define _PAGE_BIT_SPLITTING	_PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */
 #define _PAGE_BIT_NX		63	/* No execute: only valid after cpuid check */
 
+/*
+ * Software bits ignored by the page table walker
+ * At the time of writing, different levels have bits that are ignored. Due
+ * to physical address limitations, bits 52:62 should be ignored for the PMD
+ * and PTE levels and are available for use by software. Be aware that this
+ * may change if the physical address space expands.
+ */
+#define _PAGE_BIT_NUMA		62
+
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
 #define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
@@ -56,6 +65,21 @@
 #endif
 
 /*
+ * _PAGE_NUMA distinguishes between a numa hinting minor fault and a page
+ * that is not present. The hinting fault gathers numa placement statistics
+ * (see pte_numa()). The bit is always zero when the PTE is not present.
+ *
+ * The bit picked must be always zero when the pmd is present and not
+ * present, so that we don't lose information when we set it while
+ * atomically clearing the present bit.
+ */
+#ifdef CONFIG_NUMA_BALANCING
+#define _PAGE_NUMA	(_AT(pteval_t, 1) << _PAGE_BIT_NUMA)
+#else
+#define _PAGE_NUMA	(_AT(pteval_t, 0))
+#endif
+
+/*
  * The same hidden bit is used by kmemcheck, but since kmemcheck
  * works on kernel pages while soft-dirty engine on user space,
  * they do not conflict with each other.
@@ -94,26 +118,6 @@
 #define _PAGE_FILE	(_AT(pteval_t, 1) << _PAGE_BIT_FILE)
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
-/*
- * _PAGE_NUMA indicates that this page will trigger a numa hinting
- * minor page fault to gather numa placement statistics (see
- * pte_numa()). The bit picked (8) is within the range between
- * _PAGE_FILE (6) and _PAGE_PROTNONE (8) bits. Therefore, it doesn't
- * require changes to the swp entry format because that bit is always
- * zero when the pte is not present.
- *
- * The bit picked must be always zero when the pmd is present and not
- * present, so that we don't lose information when we set it while
- * atomically clearing the present bit.
- *
- * Because we shared the same bit (8) with _PAGE_PROTNONE this can be
- * interpreted as _PAGE_NUMA only in places that _PAGE_PROTNONE
- * couldn't reach, like handle_mm_fault() (see access_error in
- * arch/x86/mm/fault.c, the vma protection must not be PROT_NONE for
- * handle_mm_fault() to be invoked).
- */
-#define _PAGE_NUMA	_PAGE_PROTNONE
-
 #define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
 			 _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
--
1.8.4.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755850AbaDGPMX (ORCPT ); Mon, 7 Apr 2014 11:12:23 -0400
Received: from cantor2.suse.de ([195.135.220.15]:50834 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755478AbaDGPKu (ORCPT ); Mon, 7 Apr 2014 11:10:50 -0400
From: Mel Gorman
To: Linus Torvalds
Cc: Cyrill Gorcunov, Mel Gorman, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
Subject: [PATCH 3/3] mm: Allow FOLL_NUMA on FOLL_FORCE
Date: Mon, 7 Apr 2014 16:10:43 +0100
Message-Id: <1396883443-11696-4-git-send-email-mgorman@suse.de>
X-Mailer: git-send-email 1.8.4.5
In-Reply-To: <1396883443-11696-1-git-send-email-mgorman@suse.de>
References: <1396883443-11696-1-git-send-email-mgorman@suse.de>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

As _PAGE_NUMA is no longer aliased to _PAGE_PROTNONE there should be no confusion between them. It should be possible to kick away the special casing in __get_user_pages.

Signed-off-by: Mel Gorman
---
 mm/memory.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 22dfa61..b9c35a7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1714,18 +1714,6 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 	vm_flags &= (gup_flags & FOLL_FORCE) ?
 			(VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
 
-	/*
-	 * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault
-	 * would be called on PROT_NONE ranges. We must never invoke
-	 * handle_mm_fault on PROT_NONE ranges or the NUMA hinting
-	 * page faults would unprotect the PROT_NONE ranges if
-	 * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd
-	 * bitflag. So to avoid that, don't set FOLL_NUMA if
-	 * FOLL_FORCE is set.
-	 */
-	if (!(gup_flags & FOLL_FORCE))
-		gup_flags |= FOLL_NUMA;
-
 	i = 0;
 
 	do {
--
1.8.4.5

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754455AbaDGPco (ORCPT ); Mon, 7 Apr 2014 11:32:44 -0400
Received: from smtp.citrix.com ([66.165.176.89]:17698 "EHLO SMTP.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750965AbaDGPcn (ORCPT ); Mon, 7 Apr 2014 11:32:43 -0400
X-IronPort-AV: E=Sophos;i="4.97,810,1389744000"; d="scan'208";a="118724411"
Message-ID: <5342C517.2020305@citrix.com>
Date: Mon, 7 Apr 2014 16:32:39 +0100
From: David Vrabel
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20121215 Iceowl/1.0b1 Icedove/3.0.11
MIME-Version: 1.0
To: Mel Gorman
CC: Linus Torvalds, Cyrill Gorcunov, Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, Andrew Morton, Peter Zijlstra, Andrea Arcangeli, Linux-MM, Linux-X86, LKML
Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels
References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de>
In-Reply-To: <1396883443-11696-3-git-send-email-mgorman@suse.de>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.80.2.76]
X-DLP: MIA2
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/04/14 16:10, Mel Gorman wrote:
> _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
> faults.
As the bit is shared care is taken that _PAGE_NUMA is only used in > places where _PAGE_PROTNONE could not reach but this still causes problems > on Xen and conceptually difficult. The problem with Xen guests occurred because mprotect() /was/ confusing PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. David From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754586AbaDGPtn (ORCPT ); Mon, 7 Apr 2014 11:49:43 -0400 Received: from cantor2.suse.de ([195.135.220.15]:51914 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751946AbaDGPtl (ORCPT ); Mon, 7 Apr 2014 11:49:41 -0400 Date: Mon, 7 Apr 2014 16:49:35 +0100 From: Mel Gorman To: David Vrabel Cc: Linus Torvalds , Cyrill Gorcunov , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407154935.GD7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <5342C517.2020305@citrix.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote: > On 07/04/14 16:10, Mel Gorman wrote: > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in > > places where _PAGE_PROTNONE could not reach but this still causes problems > > on Xen and conceptually difficult. 
> > The problem with Xen guests occurred because mprotect() /was/ confusing > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. > I didn't bother spelling it out in case I gave the impression that I was blaming Xen for the problem. As the bit is now changes, does it help the Xen problem or cause another collision of some sort? There is no guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 and NUMA_BALANCING will depend in !KMEMCHECK. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754760AbaDGQTR (ORCPT ); Mon, 7 Apr 2014 12:19:17 -0400 Received: from mail-la0-f54.google.com ([209.85.215.54]:61986 "EHLO mail-la0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753057AbaDGQTM (ORCPT ); Mon, 7 Apr 2014 12:19:12 -0400 Date: Mon, 7 Apr 2014 20:19:10 +0400 From: Cyrill Gorcunov To: Mel Gorman Cc: David Vrabel , Linus Torvalds , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407161910.GJ1444@moon> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140407154935.GD7292@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 04:49:35PM +0100, Mel Gorman wrote: > On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote: > > On 07/04/14 16:10, Mel Gorman wrote: > > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA 
hinting > > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in > > > places where _PAGE_PROTNONE could not reach but this still causes problems > > > on Xen and conceptually difficult. > > > > The problem with Xen guests occurred because mprotect() /was/ confusing > > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. > > I didn't bother spelling it out in case I gave the impression that I was > blaming Xen for the problem. As the bit is now changes, does it help > the Xen problem or cause another collision of some sort? There is no > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 > and NUMA_BALANCING will depend in !KMEMCHECK. Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case never happen. (At the moment I'm trying to figure out if with this set it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages). From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755884AbaDGS3D (ORCPT ); Mon, 7 Apr 2014 14:29:03 -0400 Received: from cantor2.suse.de ([195.135.220.15]:54731 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753197AbaDGS3A (ORCPT ); Mon, 7 Apr 2014 14:29:00 -0400 Date: Mon, 7 Apr 2014 19:28:54 +0100 From: Mel Gorman To: Cyrill Gorcunov Cc: David Vrabel , Linus Torvalds , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407182854.GH7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 
Content-Disposition: inline In-Reply-To: <20140407161910.GJ1444@moon> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 08:19:10PM +0400, Cyrill Gorcunov wrote: > On Mon, Apr 07, 2014 at 04:49:35PM +0100, Mel Gorman wrote: > > On Mon, Apr 07, 2014 at 04:32:39PM +0100, David Vrabel wrote: > > > On 07/04/14 16:10, Mel Gorman wrote: > > > > _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting > > > > faults. As the bit is shared care is taken that _PAGE_NUMA is only used in > > > > places where _PAGE_PROTNONE could not reach but this still causes problems > > > > on Xen and conceptually difficult. > > > > > > The problem with Xen guests occurred because mprotect() /was/ confusing > > > PROTNONE mappings with _PAGE_NUMA and clearing the non-existant NUMA hints. > > > > I didn't bother spelling it out in case I gave the impression that I was > > blaming Xen for the problem. As the bit is now changes, does it help > > the Xen problem or cause another collision of some sort? There is no > > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 > > and NUMA_BALANCING will depend in !KMEMCHECK. > > Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case > never happen. (At the moment I'm trying to figure out if with this set > it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages). I had considered the soft-dirty tracking usage of the same bit. I thought I'd be able to swizzle around it or a further worst case of having soft-dirty and automatic NUMA balancing mutually exclusive. Unfortunately upon examination it's not obvious how to have both of them share a bit and I suspect any attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the list is examining if _PAGE_BIT_IOMAP can be used. 
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755718AbaDGTQ0 (ORCPT ); Mon, 7 Apr 2014 15:16:26 -0400 Received: from mail-lb0-f180.google.com ([209.85.217.180]:45544 "EHLO mail-lb0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755610AbaDGTQZ (ORCPT ); Mon, 7 Apr 2014 15:16:25 -0400 Date: Mon, 7 Apr 2014 23:16:22 +0400 From: Cyrill Gorcunov To: Mel Gorman Cc: David Vrabel , Linus Torvalds , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407191622.GA23983@moon> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140407182854.GH7292@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 07:28:54PM +0100, Mel Gorman wrote: > > > I didn't bother spelling it out in case I gave the impression that I was > > > blaming Xen for the problem. As the bit is now changes, does it help > > > the Xen problem or cause another collision of some sort? There is no > > > guarantee _PAGE_NUMA will remain as bit 62 but at worst it'll use bit 11 > > > and NUMA_BALANCING will depend in !KMEMCHECK. > > > > Fwiw, we're using bit 11 for soft-dirty tracking, so i really hope worst case > > never happen. 
(At the moment I'm trying to figure out if with this set > > it would be possible to clean up ugly macros in pgoff_to_pte for 2 level pages). > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > be able to swizzle around it or a further worst case of having soft-dirty and > automatic NUMA balancing mutually exclusive. Unfortunately upon examination > it's not obvious how to have both of them share a bit and I suspect any > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > list is examining if _PAGE_BIT_IOMAP can be used. Thanks for info, Mel! It seems indeed if no more space left on x86-64 (in the very worst case which I still think won't happen anytime soon) we'll have to make them mut. exclusive. But for now (with 62 bit used for numa) they can live together, right? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755886AbaDGT3N (ORCPT ); Mon, 7 Apr 2014 15:29:13 -0400 Received: from terminus.zytor.com ([198.137.202.10]:57905 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755830AbaDGT3L (ORCPT ); Mon, 7 Apr 2014 15:29:11 -0400 Message-ID: <5342FC0E.9080701@zytor.com> Date: Mon, 07 Apr 2014 12:27:10 -0700 From: "H. 
Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Mel Gorman , Cyrill Gorcunov CC: David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> In-Reply-To: <20140407182854.GH7292@suse.de> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/07/2014 11:28 AM, Mel Gorman wrote: > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > be able to swizzle around it or a further worst case of having soft-dirty and > automatic NUMA balancing mutually exclusive. Unfortunately upon examination > it's not obvious how to have both of them share a bit and I suspect any > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > list is examining if _PAGE_BIT_IOMAP can be used. > Didn't we smoke the last user of _PAGE_BIT_IOMAP? -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755883AbaDGTgu (ORCPT ); Mon, 7 Apr 2014 15:36:50 -0400 Received: from mail-la0-f54.google.com ([209.85.215.54]:42829 "EHLO mail-la0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755217AbaDGTgt (ORCPT ); Mon, 7 Apr 2014 15:36:49 -0400 Date: Mon, 7 Apr 2014 23:36:46 +0400 From: Cyrill Gorcunov To: "H. 
Peter Anvin" Cc: Mel Gorman , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407193646.GC23983@moon> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5342FC0E.9080701@zytor.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: > On 04/07/2014 11:28 AM, Mel Gorman wrote: > > > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > > be able to swizzle around it or a further worst case of having soft-dirty and > > automatic NUMA balancing mutually exclusive. Unfortunately upon examination > > it's not obvious how to have both of them share a bit and I suspect any > > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > > list is examining if _PAGE_BIT_IOMAP can be used. > > Didn't we smoke the last user of _PAGE_BIT_IOMAP? Seems so, at least for non-kernel pages (not considering this bit references in xen code, which i simply don't know but i guess it's used for kernel pages only). 
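
For reference, the aliasing problem that motivates this series (the removed comment in patch 2/3 and David Vrabel's note about mprotect()) can be modelled in a few lines of C. This is only a sketch under stated assumptions: the AL_* names and the al_is_numa() helper are hypothetical, the layout is reduced to a present bit, bit 8 and bit 62, and the check mirrors the "NUMA bit set, present bit clear" idea rather than reproducing the kernel's pte_numa().

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Hypothetical, reduced bit layout used only for this illustration. */
#define AL_PRESENT        ((pteval_t)1 << 0)
#define AL_PROTNONE       ((pteval_t)1 << 8)

/* Old scheme: the NUMA hint shares bit 8 with PROT_NONE. */
#define AL_NUMA_ALIASED   AL_PROTNONE
/* Scheme proposed in this series: a separate high bit (62). */
#define AL_NUMA_SEPARATE  ((pteval_t)1 << 62)

/*
 * Model of a NUMA-hint test: the hint bit is set and the entry is
 * not marked present. numa_bit selects which scheme is being modelled.
 */
static int al_is_numa(pteval_t pte, pteval_t numa_bit)
{
	return (pte & (numa_bit | AL_PRESENT)) == numa_bit;
}
```

With the aliased definition, a plain PROT_NONE entry (bit 8 set, present clear) passes al_is_numa() even though no hint was ever placed on it; with a separate bit the two states can be told apart from the entry alone.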
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755720AbaDGTnb (ORCPT ); Mon, 7 Apr 2014 15:43:31 -0400 Received: from terminus.zytor.com ([198.137.202.10]:58062 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755510AbaDGTn3 (ORCPT ); Mon, 7 Apr 2014 15:43:29 -0400 Message-ID: <5342FFB0.6010501@zytor.com> Date: Mon, 07 Apr 2014 12:42:40 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Cyrill Gorcunov CC: Mel Gorman , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> In-Reply-To: <20140407193646.GC23983@moon> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd >>> be able to swizzle around it or a further worst case of having soft-dirty and >>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination >>> it's not obvious how to have both of them share a bit and I suspect any >>> attempt to will break CRIU. 
In my current tree, NUMA_BALANCING cannot be >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the >>> list is examining if _PAGE_BIT_IOMAP can be used. >> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? > > Seems so, at least for non-kernel pages (not considering this bit references in > xen code, which i simply don't know but i guess it's used for kernel pages only). > David Vrabel has a patchset which I presumed would be pulled through the Xen tree this merge window: [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove _PAGE_IOMAP) That frees up this bit. -hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755525AbaDGT6k (ORCPT ); Mon, 7 Apr 2014 15:58:40 -0400 Received: from mga09.intel.com ([134.134.136.24]:59861 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753765AbaDGT6j (ORCPT ); Mon, 7 Apr 2014 15:58:39 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,811,1389772800"; d="scan'208";a="516301608" Message-ID: <5342E273.4070308@intel.com> Date: Mon, 07 Apr 2014 10:37:55 -0700 From: Dave Hansen User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Mel Gorman , Linus Torvalds CC: Cyrill Gorcunov , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> In-Reply-To: <1396883443-11696-3-git-send-email-mgorman@suse.de> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/07/2014 08:10 AM, Mel Gorman wrote: > +/* 
> + * Software bits ignored by the page table walker > + * At the time of writing, different levels have bits that are ignored. Due > + * to physical address limitations, bits 52:62 should be ignored for the PMD > + * and PTE levels and are available for use by software. Be aware that this > + * may change if the physical address space expands. > + */ > +#define _PAGE_BIT_NUMA 62 Doesn't moving it up to the high bits break pte_modify()'s assumptions? I was thinking of this nugget from change_pte_range(): ptent = ptep_modify_prot_start(mm, addr, pte); if (pte_numa(ptent)) ptent = pte_mknonnuma(ptent); ptent = pte_modify(ptent, newprot); pte_modify() pulls off all the high bits out of 'ptent' and only adds them back if they're in newprot (which as far as I can tell comes from the VMA). So I _think_ it'll axe the _PAGE_NUMA out of 'ptent'. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755508AbaDGVTx (ORCPT ); Mon, 7 Apr 2014 17:19:53 -0400 Received: from cantor2.suse.de ([195.135.220.15]:60816 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755034AbaDGVTw (ORCPT ); Mon, 7 Apr 2014 17:19:52 -0400 Date: Mon, 7 Apr 2014 22:19:44 +0100 From: Mel Gorman To: "H. 
Peter Anvin" Cc: Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407211944.GI7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <5342FC0E.9080701@zytor.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: > On 04/07/2014 11:28 AM, Mel Gorman wrote: > > > > I had considered the soft-dirty tracking usage of the same bit. I thought I'd > > be able to swizzle around it or a further worst case of having soft-dirty and > > automatic NUMA balancing mutually exclusive. Unfortunately upon examination > > it's not obvious how to have both of them share a bit and I suspect any > > attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > > set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > > list is examining if _PAGE_BIT_IOMAP can be used. > > > > Didn't we smoke the last user of _PAGE_BIT_IOMAP? > There are still some users of _PAGE_IOMAP with Xen being the main user. For x86 on bare metal it looks like userspace should never have a PTE with _PAGE_IO set so it should be usable as _PAGE_NUMA. Patches that do that are currently being tested but a side-effect was that I had to disable support on Xen as Xen appears to use it to distinguish between Xen PTEs and MFNs. 
It's unclear what automatic NUMA balancing on Xen even means -- are NUMA nodes always mapped to the physical topology? What is sensible behaviour if guest and host both run it? etc. If they need it, we can then examine what the proper way to support _PAGE_NUMA on Xen is. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755498AbaDGVZn (ORCPT ); Mon, 7 Apr 2014 17:25:43 -0400 Received: from cantor2.suse.de ([195.135.220.15]:35102 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753771AbaDGVZl (ORCPT ); Mon, 7 Apr 2014 17:25:41 -0400 Date: Mon, 7 Apr 2014 22:25:35 +0100 From: Mel Gorman To: "H. Peter Anvin" Cc: Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140407212535.GJ7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <5342FFB0.6010501@zytor.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: > On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: > > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: > >> On 04/07/2014 11:28 AM, Mel Gorman wrote: > >>> > >>> I had considered the soft-dirty tracking usage of the same bit. 
I thought I'd > >>> be able to swizzle around it or a further worst case of having soft-dirty and > >>> automatic NUMA balancing mutually exclusive. Unfortunately upon examination > >>> it's not obvious how to have both of them share a bit and I suspect any > >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be > >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the > >>> list is examining if _PAGE_BIT_IOMAP can be used. > >> > >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? > > > > Seems so, at least for non-kernel pages (not considering this bit references in > > xen code, which i simply don't know but i guess it's used for kernel pages only). > > > > David Vrabel has a patchset which I presumed would be pulled through the > Xen tree this merge window: > > [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove > _PAGE_IOMAP) > > That frees up this bit. > Thanks, I was not aware of that patch. Based on it, I intend to force automatic NUMA balancing to depend on !XEN and see what the reaction is. If support for Xen is really required then it potentially be re-enabled if/when that series is merged assuming they do not need the bit for something else. 
-- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755034AbaDHEE4 (ORCPT ); Tue, 8 Apr 2014 00:04:56 -0400 Received: from mail-la0-f42.google.com ([209.85.215.42]:42585 "EHLO mail-la0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750708AbaDHEEy (ORCPT ); Tue, 8 Apr 2014 00:04:54 -0400 MIME-Version: 1.0 In-Reply-To: <20140407212535.GJ7292@suse.de> References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> Date: Mon, 7 Apr 2014 21:04:53 -0700 Message-ID: Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels From: Steven Noonan To: Mel Gorman Cc: "H. Peter Anvin" , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman wrote: > On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: >> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: >> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >> >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >> >>> >> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd >> >>> be able to swizzle around it or a further worst case of having soft-dirty and >> >>> automatic NUMA balancing mutually exclusive. 
Unfortunately upon examination >> >>> it's not obvious how to have both of them share a bit and I suspect any >> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be >> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the >> >>> list is examining if _PAGE_BIT_IOMAP can be used. >> >> >> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? >> > >> > Seems so, at least for non-kernel pages (not considering this bit references in >> > xen code, which i simply don't know but i guess it's used for kernel pages only). >> > >> >> David Vrabel has a patchset which I presumed would be pulled through the >> Xen tree this merge window: >> >> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and remove >> _PAGE_IOMAP) >> >> That frees up this bit. >> > > Thanks, I was not aware of that patch. Based on it, I intend to force > automatic NUMA balancing to depend on !XEN and see what the reaction is. If > support for Xen is really required then it potentially be re-enabled if/when > that series is merged assuming they do not need the bit for something else. > Amazon EC2 does have large memory instance types with NUMA exposed to the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable (to me anyway) if we didn't require !XEN. 
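
For reference, the pte_modify() question Dave Hansen raised earlier in this thread can be modelled in C as well. This is a simplified sketch, not the kernel code: the SK_* names and sk_pte_modify() are hypothetical, the PFN field is assumed to be bits 12..51, and the change mask is reduced to the PFN plus accessed/dirty. The real _PAGE_CHG_MASK contains more bits, so this only illustrates the masking pattern.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Hypothetical, simplified flag layout for illustration only. */
#define SK_PAGE_PRESENT   ((pteval_t)1 << 0)
#define SK_PAGE_RW        ((pteval_t)1 << 1)
#define SK_PAGE_ACCESSED  ((pteval_t)1 << 5)
#define SK_PAGE_DIRTY     ((pteval_t)1 << 6)
#define SK_PAGE_NUMA      ((pteval_t)1 << 62)	/* bit 62, as in patch 2/3 */

/* Assumed PFN field: bits 12..51 inclusive. */
#define SK_PFN_MASK \
	((((pteval_t)1 << 52) - 1) & ~(((pteval_t)1 << 12) - 1))

/* Bits preserved across a protection change in this model. */
#define SK_CHG_MASK  (SK_PFN_MASK | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)

/*
 * Keep only the bits in SK_CHG_MASK from the old entry, then take
 * every other bit from the new protection value.
 */
static pteval_t sk_pte_modify(pteval_t pte, pteval_t newprot)
{
	pteval_t val = pte & SK_CHG_MASK;

	val |= newprot & ~SK_CHG_MASK;
	return val;
}

/* True if the modelled NUMA hint bit is set. */
static int sk_pte_numa(pteval_t pte)
{
	return (pte & SK_PAGE_NUMA) != 0;
}
```

In this model any bit outside the change mask, including bit 62, survives only if the new protection value carries it, which is the behaviour the question is about. Whether that matters in change_pte_range() depends on the real mask and on the pte_mknonnuma() call that precedes pte_modify() there.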
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756297AbaDHJbj (ORCPT ); Tue, 8 Apr 2014 05:31:39 -0400 Received: from smtp02.citrix.com ([66.165.176.63]:41518 "EHLO SMTP02.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756065AbaDHJbi (ORCPT ); Tue, 8 Apr 2014 05:31:38 -0400 X-IronPort-AV: E=Sophos;i="4.97,816,1389744000"; d="scan'208";a="117796899" Message-ID: <5343C1F6.4090600@citrix.com> Date: Tue, 8 Apr 2014 10:31:34 +0100 From: David Vrabel User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20121215 Iceowl/1.0b1 Icedove/3.0.11 MIME-Version: 1.0 To: Cyrill Gorcunov CC: "H. Peter Anvin" , Mel Gorman , Linus Torvalds , Ingo Molnar , Steven Noonan , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> In-Reply-To: <20140407193646.GC23983@moon> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.80.2.76] X-DLP: MIA1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 07/04/14 20:36, Cyrill Gorcunov wrote: > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>> >>> I had considered the soft-dirty tracking usage of the same bit. I thought I'd >>> be able to swizzle around it or a further worst case of having soft-dirty and >>> automatic NUMA balancing mutually exclusive. 
Unfortunately upon examination >>> it's not obvious how to have both of them share a bit and I suspect any >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING cannot be >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. Next on the >>> list is examining if _PAGE_BIT_IOMAP can be used. >> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? Not yet. A last minute regression with mapping of I/O regions from userspace was found so I had to drop the series from 3.15. It should be back for 3.16. > Seems so, at least for non-kernel pages (not considering this bit references in > xen code, which i simply don't know but i guess it's used for kernel pages only). Xen uses it for all I/O mappings, both kernel and for userspace. David From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757317AbaDHPSB (ORCPT ); Tue, 8 Apr 2014 11:18:01 -0400 Received: from terminus.zytor.com ([198.137.202.10]:59925 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756533AbaDHPR7 (ORCPT ); Tue, 8 Apr 2014 11:17:59 -0400 User-Agent: K-9 Mail for Android In-Reply-To: References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels From: "H. 
Peter Anvin" Date: Tue, 08 Apr 2014 08:16:14 -0700 To: Steven Noonan , Mel Gorman CC: Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Message-ID: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( On April 7, 2014 9:04:53 PM PDT, Steven Noonan wrote: >On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman wrote: >> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: >>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: >>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. Peter Anvin wrote: >>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>> >>> >>> >>> I had considered the soft-dirty tracking usage of the same bit. >I thought I'd >>> >>> be able to swizzle around it or a further worst case of having >soft-dirty and >>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon >examination >>> >>> it's not obvious how to have both of them share a bit and I >suspect any >>> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING >cannot be >>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. >Next on the >>> >>> list is examining if _PAGE_BIT_IOMAP can be used. >>> >> >>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? >>> > >>> > Seems so, at least for non-kernel pages (not considering this bit >references in >>> > xen code, which i simply don't know but i guess it's used for >kernel pages only). >>> > >>> >>> David Vrabel has a patchset which I presumed would be pulled through >the >>> Xen tree this merge window: >>> >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and >remove >>> _PAGE_IOMAP) >>> >>> That frees up this bit. >>> >> >> Thanks, I was not aware of that patch. 
Based on it, I intend to >force >> automatic NUMA balancing to depend on !XEN and see what the reaction >is. If >> support for Xen is really required then it potentially be re-enabled >if/when >> that series is merged assuming they do not need the bit for something >else. >> > >Amazon EC2 does have large memory instance types with NUMA exposed to >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >(to me anyway) if we didn't require !XEN. -- Sent from my mobile phone. Please pardon brevity and lack of formatting. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932355AbaDHQDk (ORCPT ); Tue, 8 Apr 2014 12:03:40 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:46071 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756709AbaDHQDi (ORCPT ); Tue, 8 Apr 2014 12:03:38 -0400 Date: Tue, 8 Apr 2014 12:02:50 -0400 From: Konrad Rzeszutek Wilk To: "H. Peter Anvin" Cc: Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140408160250.GE31554@phenom.dumpdata.com> References: <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Source-IP: acsinet21.oracle.com [141.146.126.237] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org .snip.. 
> >>> David Vrabel has a patchset which I presumed would be pulled through > >the > >>> Xen tree this merge window: > >>> > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > >remove > >>> _PAGE_IOMAP) > >>> > >>> That frees up this bit. > >>> > >> > >> Thanks, I was not aware of that patch. Based on it, I intend to > >force > >> automatic NUMA balancing to depend on !XEN and see what the reaction > >is. If > >> support for Xen is really required then it potentially be re-enabled > >if/when > >> that series is merged assuming they do not need the bit for something > >else. > >> > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > >(to me anyway) if we didn't require !XEN. What about the patch that David Vrabel posted: http://osdir.com/ml/general/2014-03/msg41979.html Has anybody taken it for a spin? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756914AbaDHQSr (ORCPT ); Tue, 8 Apr 2014 12:18:47 -0400 Received: from terminus.zytor.com ([198.137.202.10]:33264 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756329AbaDHQSo (ORCPT ); Tue, 8 Apr 2014 12:18:44 -0400 Message-ID: <534420F1.3030301@zytor.com> Date: Tue, 08 Apr 2014 09:16:49 -0700 From: "H. 
Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Konrad Rzeszutek Wilk CC: Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> In-Reply-To: <20140408160250.GE31554@phenom.dumpdata.com> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote: >>> >>> Amazon EC2 does have large memory instance types with NUMA exposed to >>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >>> (to me anyway) if we didn't require !XEN. > > What about the patch that David Vrabel posted: > > http://osdir.com/ml/general/2014-03/msg41979.html > > Has anybody taken it for a spin? > Oh lovely, more pvops in low level paths. I'm so thrilled. Incidentally, I wasn't even Cc:'d on that patch and was only added to the thread by Linus, but never saw the early bits of the thread including the actual patch. 
-hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757367AbaDHQrt (ORCPT ); Tue, 8 Apr 2014 12:47:49 -0400 Received: from cantor2.suse.de ([195.135.220.15]:38032 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756894AbaDHQrs (ORCPT ); Tue, 8 Apr 2014 12:47:48 -0400 Date: Tue, 8 Apr 2014 17:47:44 +0100 From: Mel Gorman To: "H. Peter Anvin" Cc: Konrad Rzeszutek Wilk , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140408164744.GM7292@suse.de> References: <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <534420F1.3030301@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <534420F1.3030301@zytor.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 08, 2014 at 09:16:49AM -0700, H. Peter Anvin wrote: > On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote: > >>> > >>> Amazon EC2 does have large memory instance types with NUMA exposed to > >>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > >>> (to me anyway) if we didn't require !XEN. > > > > What about the patch that David Vrabel posted: > > > > http://osdir.com/ml/general/2014-03/msg41979.html > > > > Has anybody taken it for a spin? > > > > Oh lovely, more pvops in low level paths. I'm so thrilled. 
> > Incidentally, I wasn't even Cc:'d on that patch and was only added to > the thread by Linus, but never saw the early bits of the thread > including the actual patch. > I posted an alternative to that patch that confines the damage to the NUMA pte helpers. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757417AbaDHQvP (ORCPT ); Tue, 8 Apr 2014 12:51:15 -0400 Received: from smtp02.citrix.com ([66.165.176.63]:17239 "EHLO SMTP02.CITRIX.COM" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752768AbaDHQvN (ORCPT ); Tue, 8 Apr 2014 12:51:13 -0400 X-IronPort-AV: E=Sophos;i="4.97,819,1389744000"; d="scan'208";a="117962145" Message-ID: <534428F2.2040205@citrix.com> Date: Tue, 8 Apr 2014 17:50:58 +0100 From: David Vrabel User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20121215 Iceowl/1.0b1 Icedove/3.0.11 MIME-Version: 1.0 To: "H. Peter Anvin" CC: Konrad Rzeszutek Wilk , Steven Noonan , Mel Gorman , Cyrill Gorcunov , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <534420F1.3030301@zytor.com> In-Reply-To: <534420F1.3030301@zytor.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.80.2.76] X-DLP: MIA2 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/04/14 17:16, H. 
Peter Anvin wrote:
> On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> Amazon EC2 does have large memory instance types with NUMA exposed to
>>>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>>>> (to me anyway) if we didn't require !XEN.
>>
>> What about the patch that David Vrabel posted:
>>
>> http://osdir.com/ml/general/2014-03/msg41979.html
>>
>> Has anybody taken it for a spin?
>>
>
> Oh lovely, more pvops in low level paths. I'm so thrilled.
>
> Incidentally, I wasn't even Cc:'d on that patch and was only added to
> the thread by Linus, but never saw the early bits of the thread
> including the actual patch.

I did resend a version CC'd to all the x86 maintainers and included some
performance figures for native (~1 extra clock cycle).

I've included it again below.

My preference would be to take this patch as it fixes it for both NUMA
rebalancing and any future uses that want to set/clear _PAGE_PRESENT.

David

8<--------------
x86: use pv-ops in {pte, pmd}_{set,clear}_flags()

Instead of using native functions to operate on the PTEs in
pte_set_flags(), pte_clear_flags(), pmd_set_flags(), pmd_clear_flags()
use the PV aware ones.

This fixes a regression in Xen PV guests introduced by 1667918b6483
(mm: numa: clear numa hinting information on mprotect).

This has negligible performance impact on native since the pte_val()
and __pte() (etc.) calls are patched at runtime when running on bare
metal. Measurements on a 3 GHz AMD 4284 give approx. 0.3 ns (~1 clock
cycle) of additional time for each function.

Xen PV guest page tables require that their entries use machine
addresses if the present bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses. This is because on migration only MFNs in present PTEs are
translated to PFNs (canonicalised) so they may be translated back to
the new MFN in the destination domain (uncanonicalised).

pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set
and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.

In a Xen PV guest, these functions must translate MFNs to PFNs when
clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.

Signed-off-by: David Vrabel
Cc: Steven Noonan
Cc: Elena Ufimtseva
Cc: Mel Gorman
Cc: [3.12+]
---
 arch/x86/include/asm/pgtable.h |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..323e5e2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -174,16 +174,16 @@ static inline int has_transparent_hugepage(void)

 static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
 {
-	pteval_t v = native_pte_val(pte);
+	pteval_t v = pte_val(pte);

-	return native_make_pte(v | set);
+	return __pte(v | set);
 }

 static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
 {
-	pteval_t v = native_pte_val(pte);
+	pteval_t v = pte_val(pte);

-	return native_make_pte(v & ~clear);
+	return __pte(v & ~clear);
 }

 static inline pte_t pte_mkclean(pte_t pte)
@@ -248,14 +248,14 @@ static inline pte_t pte_mkspecial(pte_t pte)

 static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
 {
-	pmdval_t v = native_pmd_val(pmd);
+	pmdval_t v = pmd_val(pmd);

 	return __pmd(v | set);
 }

 static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
 {
-	pmdval_t v = native_pmd_val(pmd);
+	pmdval_t v = pmd_val(pmd);

 	return __pmd(v & ~clear);
 }

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756433AbaDHQva (ORCPT ); Tue, 8 Apr 2014 12:51:30 -0400 Received: from cantor2.suse.de ([195.135.220.15]:38098 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756377AbaDHQv1 (ORCPT ); Tue, 8 Apr 2014 12:51:27 -0400 Date: Tue, 8 Apr 2014 17:51:23 +0100 From: Mel Gorman To: Konrad Rzeszutek Wilk
Cc: "H. Peter Anvin" , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140408165123.GN7292@suse.de> References: <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140408160250.GE31554@phenom.dumpdata.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote: > .snip.. > > >>> David Vrabel has a patchset which I presumed would be pulled through > > >the > > >>> Xen tree this merge window: > > >>> > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > > >remove > > >>> _PAGE_IOMAP) > > >>> > > >>> That frees up this bit. > > >>> > > >> > > >> Thanks, I was not aware of that patch. Based on it, I intend to > > >force > > >> automatic NUMA balancing to depend on !XEN and see what the reaction > > >is. If > > >> support for Xen is really required then it potentially be re-enabled > > >if/when > > >> that series is merged assuming they do not need the bit for something > > >else. > > >> > > > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > > >(to me anyway) if we didn't require !XEN. > > What about the patch that David Vrabel posted: > > http://osdir.com/ml/general/2014-03/msg41979.html > > Has anybody taken it for a spin? 
Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA hinting ptes" which modifies the NUMA pte helpers instead of the main set/clear ones. -- Mel Gorman SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757214AbaDHUvb (ORCPT ); Tue, 8 Apr 2014 16:51:31 -0400 Received: from mail-bk0-f43.google.com ([209.85.214.43]:44390 "EHLO mail-bk0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757064AbaDHUv3 (ORCPT ); Tue, 8 Apr 2014 16:51:29 -0400 MIME-Version: 1.0 In-Reply-To: References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> Date: Tue, 8 Apr 2014 13:51:28 -0700 Message-ID: Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels From: Steven Noonan To: "H. Peter Anvin" Cc: Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: > > > Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( Well Amazon doesn't expose NUMA on PV, only on HVM guests. > On April 7, 2014 9:04:53 PM PDT, Steven Noonan wrote: >>On Mon, Apr 7, 2014 at 2:25 PM, Mel Gorman wrote: >>> On Mon, Apr 07, 2014 at 12:42:40PM -0700, H. Peter Anvin wrote: >>>> On 04/07/2014 12:36 PM, Cyrill Gorcunov wrote: >>>> > On Mon, Apr 07, 2014 at 12:27:10PM -0700, H. 
Peter Anvin wrote: >>>> >> On 04/07/2014 11:28 AM, Mel Gorman wrote: >>>> >>> >>>> >>> I had considered the soft-dirty tracking usage of the same bit. >>I thought I'd >>>> >>> be able to swizzle around it or a further worst case of having >>soft-dirty and >>>> >>> automatic NUMA balancing mutually exclusive. Unfortunately upon >>examination >>>> >>> it's not obvious how to have both of them share a bit and I >>suspect any >>>> >>> attempt to will break CRIU. In my current tree, NUMA_BALANCING >>cannot be >>>> >>> set if MEM_SOFT_DIRTY which is not particularly satisfactory. >>Next on the >>>> >>> list is examining if _PAGE_BIT_IOMAP can be used. >>>> >> >>>> >> Didn't we smoke the last user of _PAGE_BIT_IOMAP? >>>> > >>>> > Seems so, at least for non-kernel pages (not considering this bit >>references in >>>> > xen code, which i simply don't know but i guess it's used for >>kernel pages only). >>>> > >>>> >>>> David Vrabel has a patchset which I presumed would be pulled through >>the >>>> Xen tree this merge window: >>>> >>>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and >>remove >>>> _PAGE_IOMAP) >>>> >>>> That frees up this bit. >>>> >>> >>> Thanks, I was not aware of that patch. Based on it, I intend to >>force >>> automatic NUMA balancing to depend on !XEN and see what the reaction >>is. If >>> support for Xen is really required then it potentially be re-enabled >>if/when >>> that series is merged assuming they do not need the bit for something >>else. >>> >> >>Amazon EC2 does have large memory instance types with NUMA exposed to >>the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable >>(to me anyway) if we didn't require !XEN. > > -- > Sent from my mobile phone. Please pardon brevity and lack of formatting.
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757256AbaDHVAs (ORCPT ); Tue, 8 Apr 2014 17:00:48 -0400 Received: from terminus.zytor.com ([198.137.202.10]:37101 "EHLO mail.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756753AbaDHVAq (ORCPT ); Tue, 8 Apr 2014 17:00:46 -0400 Message-ID: <5344631D.1050203@zytor.com> Date: Tue, 08 Apr 2014 13:59:09 -0700 From: "H. Peter Anvin" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Steven Noonan CC: Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels References: <1396883443-11696-1-git-send-email-mgorman@suse.de> <1396883443-11696-3-git-send-email-mgorman@suse.de> <5342C517.2020305@citrix.com> <20140407154935.GD7292@suse.de> <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 04/08/2014 01:51 PM, Steven Noonan wrote: > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: >> >> >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( > > Well Amazon doesn't expose NUMA on PV, only on HVM guests. > Yes, but Amazon is one of the main things keeping Xen PV alive as far as I can tell, which means the support gets built in, and so on.
-hpa From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964785AbaDIPFn (ORCPT ); Wed, 9 Apr 2014 11:05:43 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:31336 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933427AbaDIPFl (ORCPT ); Wed, 9 Apr 2014 11:05:41 -0400 Date: Wed, 9 Apr 2014 11:04:48 -0400 From: Konrad Rzeszutek Wilk To: "H. Peter Anvin" Cc: Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409150448.GE5860@phenom.dumpdata.com> References: <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <5344631D.1050203@zytor.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5344631D.1050203@zytor.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Source-IP: acsinet22.oracle.com [141.146.126.238] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 08, 2014 at 01:59:09PM -0700, H. Peter Anvin wrote: > On 04/08/2014 01:51 PM, Steven Noonan wrote: > > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote: > >> > >> > >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :( > > > > Well Amazon doesn't expose NUMA on PV, only on HVM guests. > > > > Yes, but Amazon is one of the main things keeping Xen PV alive as far as > I can tell, which means the support gets built in, and so on. Taking the snarkiness aside, the issue here is that even on guests without NUMA exposed the problem shows up. 
That is, the 'mknuma' helpers are still being called even if the guest topology is not NUMA!

Which brings a question - why aren't mknuma and its friends gated by
a jump_label machinery or such?

Mel, any particular reasons why it couldn't be done this way?

>
> -hpa

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933848AbaDIPKN (ORCPT ); Wed, 9 Apr 2014 11:10:13 -0400 Received: from merlin.infradead.org ([205.233.59.134]:45671 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933514AbaDIPKJ (ORCPT ); Wed, 9 Apr 2014 11:10:09 -0400 Date: Wed, 9 Apr 2014 17:09:37 +0200 From: Peter Zijlstra To: Konrad Rzeszutek Wilk Cc: "H. Peter Anvin" , Steven Noonan , Mel Gorman , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409150935.GC10526@twins.programming.kicks-ass.net> References: <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <5344631D.1050203@zytor.com> <20140409150448.GE5860@phenom.dumpdata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140409150448.GE5860@phenom.dumpdata.com> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 09, 2014 at 11:04:48AM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 08, 2014 at 01:59:09PM -0700, H.
Peter Anvin wrote:
> > On 04/08/2014 01:51 PM, Steven Noonan wrote:
> > > On Tue, Apr 8, 2014 at 8:16 AM, H. Peter Anvin wrote:
> > >>
> > >>
> > >> Of course, it would also be preferable if Amazon (or anything else) didn't need Xen PV :(
> > >
> > > Well Amazon doesn't expose NUMA on PV, only on HVM guests.
> > >
> >
> > Yes, but Amazon is one of the main things keeping Xen PV alive as far as
> > I can tell, which means the support gets built in, and so on.
>
> Taking the snarkiness aside, the issue here is that even on guests
> without NUMA exposed the problem shows up. That is, the 'mknuma' helpers are
> still being called even if the guest topology is not NUMA!
>
> Which brings a question - why aren't mknuma and its friends gated by
> a jump_label machinery or such?
>
> Mel, any particular reasons why it couldn't be done this way?

Hmm. I thought we disabled all that when there was only the 1 node.

All this should be driven from task_tick_numa() which only gets called
when numabalancing_enabled, and that _should_ be false when nr_nodes ==
1.

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933945AbaDIPTT (ORCPT ); Wed, 9 Apr 2014 11:19:19 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:40569 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933420AbaDIPTO (ORCPT ); Wed, 9 Apr 2014 11:19:14 -0400 Date: Wed, 9 Apr 2014 11:18:27 -0400 From: Konrad Rzeszutek Wilk To: Mel Gorman Cc: "H.
Peter Anvin" , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409151827.GA6445@phenom.dumpdata.com> References: <20140407161910.GJ1444@moon> <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <20140408165123.GN7292@suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20140408165123.GN7292@suse.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-Source-IP: acsinet21.oracle.com [141.146.126.237] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Apr 08, 2014 at 05:51:23PM +0100, Mel Gorman wrote: > On Tue, Apr 08, 2014 at 12:02:50PM -0400, Konrad Rzeszutek Wilk wrote: > > .snip.. > > > >>> David Vrabel has a patchset which I presumed would be pulled through > > > >the > > > >>> Xen tree this merge window: > > > >>> > > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > > > >remove > > > >>> _PAGE_IOMAP) > > > >>> > > > >>> That frees up this bit. > > > >>> > > > >> > > > >> Thanks, I was not aware of that patch. Based on it, I intend to > > > >force > > > >> automatic NUMA balancing to depend on !XEN and see what the reaction > > > >is. If > > > >> support for Xen is really required then it potentially be re-enabled > > > >if/when > > > >> that series is merged assuming they do not need the bit for something > > > >else. > > > >> > > > > > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > > > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > > > >(to me anyway) if we didn't require !XEN. 
> > > > What about the patch that David Vrabel posted: > > > > http://osdir.com/ml/general/2014-03/msg41979.html > > > > Has anybody taken it for a spin? > > Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA > hinting ptes" which modifies the NUMA pte helpers instead of the main > set/clear ones. Ah nice! Looking forward to it being posted as non-RFC and could you also please CC 'xen-devel@lists.xenproject.org' on it? Thank you! From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933911AbaDIPj0 (ORCPT ); Wed, 9 Apr 2014 11:39:26 -0400 Received: from cantor2.suse.de ([195.135.220.15]:58975 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932898AbaDIPjX (ORCPT ); Wed, 9 Apr 2014 11:39:23 -0400 Date: Wed, 9 Apr 2014 16:39:17 +0100 From: Mel Gorman To: Konrad Rzeszutek Wilk Cc: "H. Peter Anvin" , Steven Noonan , Cyrill Gorcunov , David Vrabel , Linus Torvalds , Ingo Molnar , Rik van Riel , Andrew Morton , Peter Zijlstra , Andrea Arcangeli , Linux-MM , Linux-X86 , LKML , Pavel Emelyanov Subject: Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels Message-ID: <20140409153916.GT7292@suse.de> References: <20140407182854.GH7292@suse.de> <5342FC0E.9080701@zytor.com> <20140407193646.GC23983@moon> <5342FFB0.6010501@zytor.com> <20140407212535.GJ7292@suse.de> <20140408160250.GE31554@phenom.dumpdata.com> <20140408165123.GN7292@suse.de> <20140409151827.GA6445@phenom.dumpdata.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20140409151827.GA6445@phenom.dumpdata.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 09, 2014 at 11:18:27AM -0400, Konrad Rzeszutek Wilk wrote: > On Tue, Apr 08, 2014 at 05:51:23PM +0100, Mel Gorman wrote: > > On Tue, Apr 08, 2014 at 
12:02:50PM -0400, Konrad Rzeszutek Wilk wrote: > > > .snip.. > > > > >>> David Vrabel has a patchset which I presumed would be pulled through > > > > >the > > > > >>> Xen tree this merge window: > > > > >>> > > > > >>> [PATCHv5 0/8] x86/xen: fixes for mapping high MMIO regions (and > > > > >remove > > > > >>> _PAGE_IOMAP) > > > > >>> > > > > >>> That frees up this bit. > > > > >>> > > > > >> > > > > >> Thanks, I was not aware of that patch. Based on it, I intend to > > > > >force > > > > >> automatic NUMA balancing to depend on !XEN and see what the reaction > > > > >is. If > > > > >> support for Xen is really required then it potentially be re-enabled > > > > >if/when > > > > >> that series is merged assuming they do not need the bit for something > > > > >else. > > > > >> > > > > > > > > > >Amazon EC2 does have large memory instance types with NUMA exposed to > > > > >the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable > > > > >(to me anyway) if we didn't require !XEN. > > > > > > What about the patch that David Vrabel posted: > > > > > > http://osdir.com/ml/general/2014-03/msg41979.html > > > > > > Has anybody taken it for a spin? > > > > Alternatively "[PATCH 4/5] mm: use paravirt friendly ops for NUMA > > hinting ptes" which modifies the NUMA pte helpers instead of the main > > set/clear ones. > > Ah nice! Looking forward to it being posted as non-RFC and could you also > please CC 'xen-devel@lists.xenproject.org' on it? > Yes I will. Unless the x86 maintainers push for it on the grounds that it is a functional fix for xen, I'm going to wait until after the merge window to resend it. That'd give it some chance of being tested in -next before hitting mainline. -- Mel Gorman SUSE Labs
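
[Archive note] The MFN/PFN argument behind David Vrabel's patch, and behind the paravirt-friendly NUMA helpers discussed above, can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not code from either patch: the translation table toy_p2m, the pv_pte_val()/pv_make_pte() helpers and the *_native/*_pv function names are hypothetical stand-ins for the kernel's pte_val()/__pte() paths, and only bit 0 (_PAGE_PRESENT) and a 12-bit page shift are modelled.

```c
#include <stdint.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;

#define PAGE_SHIFT     12
#define _PAGE_PRESENT  ((pteval_t)1)
#define PTE_FLAGS_MASK ((((pteval_t)1) << PAGE_SHIFT) - 1)

/* Hypothetical toy P2M table: guest frame (PFN) -> machine frame (MFN). */
#define TOY_NR_FRAMES 4
static const uint64_t toy_p2m[TOY_NR_FRAMES] = { 7, 5, 6, 4 };

static inline uint64_t toy_pfn_to_mfn(uint64_t pfn)
{
	return toy_p2m[pfn];
}

static inline uint64_t toy_mfn_to_pfn(uint64_t mfn)
{
	for (uint64_t pfn = 0; pfn < TOY_NR_FRAMES; pfn++)
		if (toy_p2m[pfn] == mfn)
			return pfn;
	return UINT64_MAX;	/* unknown MFN; not reached in this sketch */
}

/* Native accessors: the raw value is used untouched. */
static inline pteval_t native_pte_val(pte_t pte)
{
	return pte.pte;
}

static inline pte_t native_make_pte(pteval_t v)
{
	pte_t p = { v };
	return p;
}

/* PV-style read: a present entry holds an MFN, hand back a PFN view. */
static inline pteval_t pv_pte_val(pte_t pte)
{
	pteval_t v = pte.pte;

	if (v & _PAGE_PRESENT)
		v = (toy_mfn_to_pfn(v >> PAGE_SHIFT) << PAGE_SHIFT) |
		    (v & PTE_FLAGS_MASK);
	return v;
}

/* PV-style write: only a present entry is translated PFN -> MFN. */
static inline pte_t pv_make_pte(pteval_t v)
{
	if (v & _PAGE_PRESENT)
		v = (toy_pfn_to_mfn(v >> PAGE_SHIFT) << PAGE_SHIFT) |
		    (v & PTE_FLAGS_MASK);
	return native_make_pte(v);
}

/* Pre-patch shape: flags are cleared on the raw value. */
static inline pte_t pte_clear_flags_native(pte_t pte, pteval_t clear)
{
	return native_make_pte(native_pte_val(pte) & ~clear);
}

/* Post-patch shape: go through the PV-aware accessors. */
static inline pte_t pte_clear_flags_pv(pte_t pte, pteval_t clear)
{
	return pv_make_pte(pv_pte_val(pte) & ~clear);
}

static inline pte_t pte_set_flags_pv(pte_t pte, pteval_t set)
{
	return pv_make_pte(pv_pte_val(pte) | set);
}

/* Frame number stored in the entry, whatever it currently means. */
static inline uint64_t pte_frame(pte_t pte)
{
	return pte.pte >> PAGE_SHIFT;
}
```

In this model PFN 1 maps to MFN 5. Clearing _PAGE_PRESENT with the native-style helper leaves frame 5 (an MFN) in a non-present entry, whereas the PV-aware helper leaves frame 1 (a PFN), which is the form the commit message says migration requires for non-present PTEs. Setting the bit again through the PV-aware helper restores the original entry.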