From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-la0-f53.google.com (mail-la0-f53.google.com [209.85.215.53])
	by kanga.kvack.org (Postfix) with ESMTP id E863F6B0031
	for ; Thu, 3 Apr 2014 15:09:55 -0400 (EDT)
Received: by mail-la0-f53.google.com with SMTP id b8so1669054lan.40
	for ; Thu, 03 Apr 2014 12:09:55 -0700 (PDT)
Received: from mail-la0-x22f.google.com (mail-la0-x22f.google.com [2a00:1450:4010:c03::22f])
	by mx.google.com with ESMTPS id on7si4106224lbb.221.2014.04.03.12.09.53
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Thu, 03 Apr 2014 12:09:54 -0700 (PDT)
Received: by mail-la0-f47.google.com with SMTP id pn19so1699606lab.34
	for ; Thu, 03 Apr 2014 12:09:53 -0700 (PDT)
Message-Id: <20140403184844.260532690@openvz.org>
Date: Thu, 03 Apr 2014 22:48:44 +0400
From: Cyrill Gorcunov
Subject: [rfc 0/3] Cleaning up soft-dirty bit usage
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-kernel@vger.kernel.org
Cc: gorcunov@openvz.org, linux-mm@kvack.org

Hi! I've been trying to clean up soft-dirty bit usage. I can't cleanup
"ridiculous macros in pgtable-2level.h" completely because I need to
define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner
like

#define _PAGE_BIT_FILE (_PAGE_BIT_PRESENT + 1) /* _PAGE_BIT_RW */
#define _PAGE_BIT_NUMA (_PAGE_BIT_PRESENT + 2) /* _PAGE_BIT_USER */
#define _PAGE_BIT_PROTNONE (_PAGE_BIT_PRESENT + 3) /* _PAGE_BIT_PWT */

which can't be done right now because numa code needs to save original
pte bits for example in __split_huge_page_map, if I'm not missing something
obvious.

Also if we ever redefine the bits above we will need to update PAT code
which uses _PAGE_GLOBAL + _PAGE_PRESENT to make pte_present return true
or false.

Another weird thing I found is the following sequence:

mprotect_fixup
 change_protection (passes @prot_numa = 0 which finally ends up in)
  ...
  change_pte_range(..., prot_numa)

	if (!prot_numa) {
		...
	} else {
		... this seems to be dead code branch ...
	}

is it intentional, and @prot_numa argument is supposed to be passed
with prot_numa = 1 one day, or it's leftover from old times?

Note I've not yet tested the series; I'm building it now and hope to
finish testing in a couple of hours.

Linus, by saying "define the bits we use when PAGE_PRESENT==0 separately
and explicitly" you meant complete rework of the bits, right? Not simply
group them in one place in a header?

	Cyrill

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-la0-f46.google.com (mail-la0-f46.google.com [209.85.215.46])
	by kanga.kvack.org (Postfix) with ESMTP id 1CEF86B0035
	for ; Thu, 3 Apr 2014 15:09:57 -0400 (EDT)
Received: by mail-la0-f46.google.com with SMTP id hr17so1683903lab.5
	for ; Thu, 03 Apr 2014 12:09:57 -0700 (PDT)
Received: from mail-la0-x233.google.com (mail-la0-x233.google.com [2a00:1450:4010:c03::233])
	by mx.google.com with ESMTPS id sz4si4124004lbb.36.2014.04.03.12.09.55
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Thu, 03 Apr 2014 12:09:55 -0700 (PDT)
Received: by mail-la0-f51.google.com with SMTP id pv20so1683511lab.38
	for ; Thu, 03 Apr 2014 12:09:55 -0700 (PDT)
Message-Id: <20140403190952.552007526@openvz.org>
Date: Thu, 03 Apr 2014 22:48:45 +0400
From: Cyrill Gorcunov
Subject: [rfc 1/3] mm: pgtable -- Drop unneeded preprocessor ifdef
References: <20140403184844.260532690@openvz.org>
Content-Disposition: inline; filename=pgbits-drop-if
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-kernel@vger.kernel.org
Cc: gorcunov@openvz.org, linux-mm@kvack.org, Linus Torvalds, Mel Gorman,
	Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel,
	Andrew Morton, Peter Zijlstra, Pavel Emelyanov

_PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8)
so drop redundant #ifdef.
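
As a standalone illustration of the ordering argument, here is a minimal
userspace sketch (not kernel code). The enum names mirror the kernel's
_PAGE_BIT_* macros without the leading underscore, and the numeric values
are assumptions taken from my reading of pgtable_types.h, with FILE
aliasing the dirty bit and PROTNONE aliasing the global bit. The helper
functions swp_type_bits() and swp_offset_shift() are hypothetical and exist
only so the values can be inspected. The argument only needs FILE to sit
below PROTNONE.

```c
#include <assert.h>

/*
 * Userspace sketch, not kernel code. Bit positions are assumptions
 * made for illustration; check them against pgtable_types.h.
 */
enum {
	PAGE_BIT_PRESENT  = 0,
	PAGE_BIT_FILE     = 6,	/* assumed: aliases the hardware dirty bit */
	PAGE_BIT_PROTNONE = 8,	/* assumed: aliases the hardware global bit */
};

/* The ordering the dropped #if used to test, checked at compile time. */
static_assert(PAGE_BIT_FILE < PAGE_BIT_PROTNONE,
	      "pte layout assumes FILE sits below PROTNONE");

/* Same arithmetic as the branch of the #if that the patch keeps. */
#define SWP_TYPE_BITS    (PAGE_BIT_FILE - PAGE_BIT_PRESENT - 1)
#define SWP_OFFSET_SHIFT (PAGE_BIT_PROTNONE + 1)

/* Hypothetical helpers that expose the computed constants. */
static inline int swp_type_bits(void)    { return SWP_TYPE_BITS; }
static inline int swp_offset_shift(void) { return SWP_OFFSET_SHIFT; }
```

If the assumed values ever stopped satisfying the ordering, the
static_assert would fail the build, which is roughly the safety net a
BUILD_BUG_ON could provide in place of the removed #else branch.
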

CC: Linus Torvalds
CC: Mel Gorman
CC: Peter Anvin
CC: Ingo Molnar
CC: Steven Noonan
CC: Rik van Riel
CC: David Vrabel
CC: Andrew Morton
CC: Peter Zijlstra
CC: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
---
 arch/x86/include/asm/pgtable-2level.h | 10 ----------
 arch/x86/include/asm/pgtable_64.h     |  5 -----
 2 files changed, 15 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -115,13 +115,8 @@ static __always_inline pte_t pgoff_to_pt
  */
 #define PTE_FILE_MAX_BITS 29
 #define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1)
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1)
 #define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1)
-#else
-#define PTE_FILE_SHIFT2 (_PAGE_BIT_PROTNONE + 1)
-#define PTE_FILE_SHIFT3 (_PAGE_BIT_FILE + 1)
-#endif
 #define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
 #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
 
@@ -153,13 +148,8 @@ static __always_inline pte_t pgoff_to_pt
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
 /* Encode and de-code a swap entry */
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
-#else
-#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1)
-#endif
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
Index: linux-2.6.git/arch/x86/include/asm/pgtable_64.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_64.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_64.h
@@ -143,13 +143,8 @@ static inline int pgd_large(pgd_t pgd) {
 #define pte_unmap(pte) ((void)(pte))/* NOP */
 
 /* Encode and de-code a swap entry */
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
-#else
-#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1)
-#endif
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-lb0-f177.google.com (mail-lb0-f177.google.com [209.85.217.177])
	by kanga.kvack.org (Postfix) with ESMTP id 9287F6B0035
	for ; Thu, 3 Apr 2014 15:09:58 -0400 (EDT)
Received: by mail-lb0-f177.google.com with SMTP id z11so1651871lbi.8
	for ; Thu, 03 Apr 2014 12:09:57 -0700 (PDT)
Received: from mail-la0-x22c.google.com (mail-la0-x22c.google.com [2a00:1450:4010:c03::22c])
	by mx.google.com with ESMTPS id iz10si3231835lbc.249.2014.04.03.12.09.56
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Thu, 03 Apr 2014 12:09:56 -0700 (PDT)
Received: by mail-la0-f44.google.com with SMTP id c6so1699606lan.17
	for ; Thu, 03 Apr 2014 12:09:56 -0700 (PDT)
Message-Id: <20140403190952.661204455@openvz.org>
Date: Thu, 03 Apr 2014 22:48:46 +0400
From: Cyrill Gorcunov
Subject: [rfc 2/3] mm: pgtable -- Require X86_64 for soft-dirty tracker
References: <20140403184844.260532690@openvz.org>
Content-Disposition: inline; filename=pgbits-drop-softdirty-non-x86-64
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-kernel@vger.kernel.org
Cc: gorcunov@openvz.org, linux-mm@kvack.org, Linus Torvalds, Mel Gorman,
	Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel,
	Andrew Morton, Peter Zijlstra, Pavel Emelyanov

Tracking dirty status on 2 level pages requires very ugly macros and,
taking into account how old the machines that can only operate without
PAE mode are, let's drop soft dirty
tracker from them for code simplicity (note I can't drop all the macros
from 2 level pages for now since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE
are still used even without the tracker).

Linus proposed to completely rip out softdirty support on x86-32 (even
with PAE) and since for CRIU we're not planning to support native x86-32
mode, let's do that.

(Softdirty tracker is a relatively new feature which is mostly used by
CRIU, so I don't expect such an API change to cause problems in userspace).

CC: Linus Torvalds
CC: Mel Gorman
CC: Peter Anvin
CC: Ingo Molnar
CC: Steven Noonan
CC: Rik van Riel
CC: David Vrabel
CC: Andrew Morton
CC: Peter Zijlstra
CC: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
---
 arch/x86/Kconfig                      |  2 -
 arch/x86/include/asm/pgtable-2level.h | 49 ----------------------------------
 2 files changed, 1 insertion(+), 50 deletions(-)

Index: linux-2.6.git/arch/x86/Kconfig
===================================================================
--- linux-2.6.git.orig/arch/x86/Kconfig
+++ linux-2.6.git/arch/x86/Kconfig
@@ -104,7 +104,7 @@ config X86
 	select HAVE_ARCH_SECCOMP_FILTER
 	select BUILDTIME_EXTABLE_SORT
 	select GENERIC_CMOS_UPDATE
-	select HAVE_ARCH_SOFT_DIRTY
+	select HAVE_ARCH_SOFT_DIRTY if X86_64
 	select CLOCKSOURCE_WATCHDOG
 	select GENERIC_CLOCKEVENTS
 	select ARCH_CLOCKSOURCE_DATA if X86_64
Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h
@@ -62,53 +62,6 @@ static inline unsigned long pte_bitop(un
 	return ((value >> rightshift) & mask) << leftshift;
 }
 
-#ifdef CONFIG_MEM_SOFT_DIRTY
-
-/*
- * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE, _PAGE_BIT_SOFT_DIRTY and
- * _PAGE_BIT_PROTNONE are taken, split up the 28 bits of offset
- * into this range.
- */
-#define PTE_FILE_MAX_BITS 28
-#define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1)
-#define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1)
-#define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1)
-#define PTE_FILE_SHIFT4 (_PAGE_BIT_SOFT_DIRTY + 1)
-#define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
-#define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
-#define PTE_FILE_BITS3 (PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1)
-
-#define PTE_FILE_MASK1 ((1U << PTE_FILE_BITS1) - 1)
-#define PTE_FILE_MASK2 ((1U << PTE_FILE_BITS2) - 1)
-#define PTE_FILE_MASK3 ((1U << PTE_FILE_BITS3) - 1)
-
-#define PTE_FILE_LSHIFT2 (PTE_FILE_BITS1)
-#define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2)
-#define PTE_FILE_LSHIFT4 (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
-
-static __always_inline pgoff_t pte_to_pgoff(pte_t pte)
-{
-	return (pgoff_t)
-		(pte_bitop(pte.pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT4, -1UL, PTE_FILE_LSHIFT4));
-}
-
-static __always_inline pte_t pgoff_to_pte(pgoff_t off)
-{
-	return (pte_t){
-		.pte_low =
-			pte_bitop(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) +
-			pte_bitop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) +
-			pte_bitop(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3) +
-			pte_bitop(off, PTE_FILE_LSHIFT4, -1UL, PTE_FILE_SHIFT4) +
-			_PAGE_FILE,
-	};
-}
-
-#else /* CONFIG_MEM_SOFT_DIRTY */
-
 /*
  * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
  * split up the 29 bits of offset into this range.
@@ -145,8 +98,6 @@ static __always_inline pte_t pgoff_to_pt
 	};
 }
 
-#endif /* CONFIG_MEM_SOFT_DIRTY */
-
 /* Encode and de-code a swap entry */
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-la0-f47.google.com (mail-la0-f47.google.com [209.85.215.47])
	by kanga.kvack.org (Postfix) with ESMTP id EF1A76B0037
	for ; Thu, 3 Apr 2014 15:09:58 -0400 (EDT)
Received: by mail-la0-f47.google.com with SMTP id pn19so1667690lab.6
	for ; Thu, 03 Apr 2014 12:09:58 -0700 (PDT)
Received: from mail-la0-x234.google.com (mail-la0-x234.google.com [2a00:1450:4010:c03::234])
	by mx.google.com with ESMTPS id g7si4109914lab.166.2014.04.03.12.09.57
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Thu, 03 Apr 2014 12:09:57 -0700 (PDT)
Received: by mail-la0-f52.google.com with SMTP id ec20so1711862lab.11
	for ; Thu, 03 Apr 2014 12:09:56 -0700 (PDT)
Message-Id: <20140403190952.766500364@openvz.org>
Date: Thu, 03 Apr 2014 22:48:47 +0400
From: Cyrill Gorcunov
Subject: [rfc 3/3] mm: pgtable -- Use _PAGE_SOFT_DIRTY for swap entries
References: <20140403184844.260532690@openvz.org>
Content-Disposition: inline; filename=pgbits-drop-pse-for-dirty-swap
Sender: owner-linux-mm@kvack.org
List-ID:
To: linux-kernel@vger.kernel.org
Cc: gorcunov@openvz.org, linux-mm@kvack.org, Linus Torvalds, Mel Gorman,
	Peter Anvin, Ingo Molnar, Steven Noonan, Rik van Riel, David Vrabel,
	Andrew Morton, Peter Zijlstra, Pavel Emelyanov

Since soft-dirty is now supported on x86-64 only, we can release the
_PAGE_PSE bit used to track dirty swap entries and reuse the already
existing _PAGE_SOFT_DIRTY. Thus for all soft-dirty needs we use the same
pte bit.
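
To make the "same pte bit" idea concrete, here is a minimal userspace
sketch (not kernel code) of a non-present swap pte that carries the
soft-dirty mark in the one shared bit, with the swap offset shifted to
start just above it. The bit positions are assumptions for illustration
(soft-dirty is taken to be bit 11, my reading of _PAGE_BIT_HIDDEN), and
every helper name below is hypothetical rather than the kernel's own API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed bit positions, for illustration only. */
enum {
	PAGE_BIT_PRESENT    = 0,
	PAGE_BIT_FILE       = 6,
	PAGE_BIT_SOFT_DIRTY = 11,	/* assumed value of _PAGE_BIT_HIDDEN */
};

/* One bit serves both present ptes and swap ptes. */
#define PAGE_SOFT_DIRTY     ((pteval_t)1 << PAGE_BIT_SOFT_DIRTY)
#define PAGE_SWP_SOFT_DIRTY PAGE_SOFT_DIRTY

#define SWP_TYPE_BITS    (PAGE_BIT_FILE - PAGE_BIT_PRESENT - 1)
#define SWP_OFFSET_SHIFT (PAGE_BIT_SOFT_DIRTY + 1)

/* Hypothetical helper: pack a swap type and offset into a non-present pte. */
static pteval_t swp_entry_to_pte(unsigned int type, uint64_t offset)
{
	return ((pteval_t)type << (PAGE_BIT_PRESENT + 1)) |
	       ((pteval_t)offset << SWP_OFFSET_SHIFT);
}

/* Hypothetical helpers: unpack the two fields again. */
static unsigned int swp_pte_type(pteval_t pte)
{
	return (pte >> (PAGE_BIT_PRESENT + 1)) & ((1u << SWP_TYPE_BITS) - 1);
}

static uint64_t swp_pte_offset(pteval_t pte)
{
	return pte >> SWP_OFFSET_SHIFT;
}

/* Hypothetical helpers: set and test the soft-dirty mark on a swap pte. */
static pteval_t swp_pte_mksoft_dirty(pteval_t pte)
{
	return pte | PAGE_SWP_SOFT_DIRTY;
}

static bool swp_pte_soft_dirty(pteval_t pte)
{
	return (pte & PAGE_SWP_SOFT_DIRTY) != 0;
}
```

Because the offset field starts at SWP_OFFSET_SHIFT, setting the
soft-dirty mark leaves both the type and the offset intact, at the cost
of the offset bits that the higher shift gives up.
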

CC: Linus Torvalds
CC: Mel Gorman
CC: Peter Anvin
CC: Ingo Molnar
CC: Steven Noonan
CC: Rik van Riel
CC: David Vrabel
CC: Andrew Morton
CC: Peter Zijlstra
CC: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
---
 arch/x86/include/asm/pgtable_64.h    | 12 ++++++++++--
 arch/x86/include/asm/pgtable_types.h | 19 ++++---------------
 2 files changed, 14 insertions(+), 17 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/pgtable_64.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_64.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_64.h
@@ -142,9 +142,17 @@ static inline int pgd_large(pgd_t pgd) {
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))/* NOP */
 
-/* Encode and de-code a swap entry */
+/*
+ * Encode and de-code a swap entry. When soft-dirty memory tracker is
+ * enabled we need to borrow _PAGE_BIT_SOFT_DIRTY bit for own needs,
+ * which limits the max size of swap partition to about 1T.
+ */
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
+#ifdef CONFIG_MEM_SOFT_DIRTY
+# define SWP_OFFSET_SHIFT (_PAGE_BIT_SOFT_DIRTY + 1)
+#else
+# define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
+#endif
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h
===================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h
+++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h
@@ -59,29 +59,18 @@
  * The same hidden bit is used by kmemcheck, but since kmemcheck
  * works on kernel pages while soft-dirty engine on user space,
  * they do not conflict with each other.
+ *
+ * Because soft-dirty is limited to x86-64 only we can reuse this
+ * bit to track swap entries as well.
  */
 
 #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_HIDDEN
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY)
+#define _PAGE_SWP_SOFT_DIRTY _PAGE_SOFT_DIRTY
 #else
 #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0))
-#endif
-
-/*
- * Tracking soft dirty bit when a page goes to a swap is tricky.
- * We need a bit which can be stored in pte _and_ not conflict
- * with swap entry format. On x86 bits 6 and 7 are *not* involved
- * into swap entry computation, but bit 6 is used for nonlinear
- * file mapping, so we borrow bit 7 for soft dirty tracking.
- *
- * Please note that this bit must be treated as swap dirty page
- * mark if and only if the PTE has present bit clear!
- */
-#ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE
-#else
 #define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0))
 #endif
 

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-la0-f51.google.com (mail-la0-f51.google.com [209.85.215.51])
	by kanga.kvack.org (Postfix) with ESMTP id 573C26B0031
	for ; Mon, 7 Apr 2014 09:24:41 -0400 (EDT)
Received: by mail-la0-f51.google.com with SMTP id pv20so4674515lab.24
	for ; Mon, 07 Apr 2014 06:24:40 -0700 (PDT)
Received: from mail-la0-x233.google.com (mail-la0-x233.google.com [2a00:1450:4010:c03::233])
	by mx.google.com with ESMTPS id 1si12291386lam.90.2014.04.07.06.24.39
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Mon, 07 Apr 2014 06:24:39 -0700 (PDT)
Received: by mail-la0-f51.google.com with SMTP id pv20so4736169lab.38
	for ; Mon, 07 Apr 2014 06:24:39 -0700 (PDT)
Date: Mon, 7 Apr 2014 17:24:37 +0400
From: Cyrill Gorcunov
Subject: Re: [rfc 0/3] Cleaning up soft-dirty bit usage
Message-ID: <20140407132437.GH1444@moon>
References: <20140403184844.260532690@openvz.org> <20140407130701.GA16677@node.dhcp.inet.fi>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140407130701.GA16677@node.dhcp.inet.fi>
Sender: owner-linux-mm@kvack.org
List-ID:
To: "Kirill A. Shutemov"
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Apr 07, 2014 at 04:07:01PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 03, 2014 at 10:48:44PM +0400, Cyrill Gorcunov wrote:
> > Hi! I've been trying to clean up soft-dirty bit usage. I can't cleanup
> > "ridiculous macros in pgtable-2level.h" completely because I need to
> > define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner
> > like
> >
> > #define _PAGE_BIT_FILE (_PAGE_BIT_PRESENT + 1) /* _PAGE_BIT_RW */
> > #define _PAGE_BIT_NUMA (_PAGE_BIT_PRESENT + 2) /* _PAGE_BIT_USER */
> > #define _PAGE_BIT_PROTNONE (_PAGE_BIT_PRESENT + 3) /* _PAGE_BIT_PWT */
> >
> > which can't be done right now because numa code needs to save original
> > pte bits for example in __split_huge_page_map, if I'm not missing something
> > obvious.
>
> Sorry, I didn't get this. How does __split_huge_page_map() depend on pte
> bits order?

__split_huge_page_map
	...
	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
		...
		here we modify with pte bits
		entry = pte_mknuma(entry);
		--> clean _PAGE_PRESENT and set _PAGE_NUMA

		pte bits must remain valid and meaningful, for example
		we might have set _PAGE_RW here

> > is it intentional, and @prot_numa argument is supposed to be passed
> > with prot_numa = 1 one day, or it's leftover from old times?
>
> I see one more user of change_protection() -- change_prot_numa(), which
> has .prot_numa == 1.

Yeah, thanks, managed to miss this.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Kirill A. Shutemov"
Subject: Re: [rfc 0/3] Cleaning up soft-dirty bit usage
Date: Mon, 7 Apr 2014 16:07:01 +0300
Message-ID: <20140407130701.GA16677@node.dhcp.inet.fi>
References: <20140403184844.260532690@openvz.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <20140403184844.260532690@openvz.org>
Sender: linux-kernel-owner@vger.kernel.org
To: Cyrill Gorcunov
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
List-Id: linux-mm.kvack.org

On Thu, Apr 03, 2014 at 10:48:44PM +0400, Cyrill Gorcunov wrote:
> Hi! I've been trying to clean up soft-dirty bit usage. I can't cleanup
> "ridiculous macros in pgtable-2level.h" completely because I need to
> define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner
> like
>
> #define _PAGE_BIT_FILE (_PAGE_BIT_PRESENT + 1) /* _PAGE_BIT_RW */
> #define _PAGE_BIT_NUMA (_PAGE_BIT_PRESENT + 2) /* _PAGE_BIT_USER */
> #define _PAGE_BIT_PROTNONE (_PAGE_BIT_PRESENT + 3) /* _PAGE_BIT_PWT */
>
> which can't be done right now because numa code needs to save original
> pte bits for example in __split_huge_page_map, if I'm not missing something
> obvious.

Sorry, I didn't get this. How does __split_huge_page_map() depend on pte
bits order?

>
> Also if we ever redefine the bits above we will need to update PAT code
> which uses _PAGE_GLOBAL + _PAGE_PRESENT to make pte_present return true
> or false.
>
> Another weird thing I found is the following sequence:
>
> mprotect_fixup
>  change_protection (passes @prot_numa = 0 which finally ends up in)
>   ...
>   change_pte_range(..., prot_numa)
>
> 	if (!prot_numa) {
> 		...
> 	} else {
> 		... this seems to be dead code branch ...
> 	}
>
> is it intentional, and @prot_numa argument is supposed to be passed
> with prot_numa = 1 one day, or it's leftover from old times?

I see one more user of change_protection() -- change_prot_numa(), which
has .prot_numa == 1.

--
 Kirill A.
Shutemov From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753413AbaDCTJ5 (ORCPT ); Thu, 3 Apr 2014 15:09:57 -0400 Received: from mail-lb0-f178.google.com ([209.85.217.178]:42961 "EHLO mail-lb0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753348AbaDCTJz (ORCPT ); Thu, 3 Apr 2014 15:09:55 -0400 Message-Id: <20140403184844.260532690@openvz.org> User-Agent: quilt/0.60-1 Date: Thu, 03 Apr 2014 22:48:44 +0400 From: Cyrill Gorcunov To: linux-kernel@vger.kernel.org Cc: gorcunov@openvz.org, linux-mm@kvack.org Subject: [rfc 0/3] Cleaning up soft-dirty bit usage Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi! I've been trying to clean up soft-dirty bit usage. I can't cleanup "ridiculous macros in pgtable-2level.h" completely because I need to define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner like #define _PAGE_BIT_FILE (_PAGE_BIT_PRESENT + 1) /* _PAGE_BIT_RW */ #define _PAGE_BIT_NUMA (_PAGE_BIT_PRESENT + 2) /* _PAGE_BIT_USER */ #define _PAGE_BIT_PROTNONE (_PAGE_BIT_PRESENT + 3) /* _PAGE_BIT_PWT */ which can't be done right now because numa code needs to save original pte bits for example in __split_huge_page_map, if I'm not missing something obvious. Also if we ever redefine the bits above we will need to update PAT code which uses _PAGE_GLOBAL + _PAGE_PRESENT to make pte_present return true or false. Another weird thing I found is the following sequence: mprotect_fixup change_protection (passes @prot_numa = 0 which finally ends up in) ... change_pte_range(..., prot_numa) if (!prot_numa) { ... } else { ... this seems to be dead code branch ... } is it intentional, and @prot_numa argument is supposed to be passed with prot_numa = 1 one day, or it's leftover from old times? Note I've not yet tested the series building it now, hopefully finish testing in a couple of hours. 
Linus, by saying "define the bits we use when PAGE_PRESENT==0 separately and explicitly" you meant complete rework of the bits, right? Not simply group them in once place in a header? Cyrill From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753486AbaDCTKe (ORCPT ); Thu, 3 Apr 2014 15:10:34 -0400 Received: from mail-la0-f43.google.com ([209.85.215.43]:34293 "EHLO mail-la0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753356AbaDCTJ4 (ORCPT ); Thu, 3 Apr 2014 15:09:56 -0400 Message-Id: <20140403190952.552007526@openvz.org> User-Agent: quilt/0.60-1 Date: Thu, 03 Apr 2014 22:48:45 +0400 From: Cyrill Gorcunov To: linux-kernel@vger.kernel.org Cc: gorcunov@openvz.org, linux-mm@kvack.org, Linus Torvalds , Mel Gorman , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Pavel Emelyanov Subject: [rfc 1/3] mm: pgtable -- Drop unneeded preprocessor ifdef References: <20140403184844.260532690@openvz.org> Content-Disposition: inline; filename=pgbits-drop-if Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org _PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 9) so drop redundant #ifdef. 
CC: Linus Torvalds CC: Mel Gorman CC: Peter Anvin CC: Ingo Molnar CC: Steven Noonan CC: Rik van Riel CC: David Vrabel CC: Andrew Morton CC: Peter Zijlstra CC: Pavel Emelyanov Signed-off-by: Cyrill Gorcunov --- arch/x86/include/asm/pgtable-2level.h | 10 ---------- arch/x86/include/asm/pgtable_64.h | 5 ----- 2 files changed, 15 deletions(-) Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h +++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h @@ -115,13 +115,8 @@ static __always_inline pte_t pgoff_to_pt */ #define PTE_FILE_MAX_BITS 29 #define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1) -#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE #define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1) #define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1) -#else -#define PTE_FILE_SHIFT2 (_PAGE_BIT_PROTNONE + 1) -#define PTE_FILE_SHIFT3 (_PAGE_BIT_FILE + 1) -#endif #define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1) #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1) @@ -153,13 +148,8 @@ static __always_inline pte_t pgoff_to_pt #endif /* CONFIG_MEM_SOFT_DIRTY */ /* Encode and de-code a swap entry */ -#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1) #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1) -#else -#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1) -#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1) -#endif #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS) Index: linux-2.6.git/arch/x86/include/asm/pgtable_64.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_64.h +++ linux-2.6.git/arch/x86/include/asm/pgtable_64.h @@ -143,13 +143,8 @@ static inline int pgd_large(pgd_t pgd) { #define pte_unmap(pte) ((void)(pte))/* NOP */ /* Encode and de-code a swap entry */ -#if _PAGE_BIT_FILE < 
_PAGE_BIT_PROTNONE #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1) #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1) -#else -#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1) -#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1) -#endif #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753470AbaDCTKb (ORCPT ); Thu, 3 Apr 2014 15:10:31 -0400 Received: from mail-la0-f52.google.com ([209.85.215.52]:45507 "EHLO mail-la0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753421AbaDCTJ6 (ORCPT ); Thu, 3 Apr 2014 15:09:58 -0400 Message-Id: <20140403190952.766500364@openvz.org> User-Agent: quilt/0.60-1 Date: Thu, 03 Apr 2014 22:48:47 +0400 From: Cyrill Gorcunov To: linux-kernel@vger.kernel.org Cc: gorcunov@openvz.org, linux-mm@kvack.org, Linus Torvalds , Mel Gorman , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Pavel Emelyanov Subject: [rfc 3/3] mm: pgtable -- Use _PAGE_SOFT_DIRTY for swap entries References: <20140403184844.260532690@openvz.org> Content-Disposition: inline; filename=pgbits-drop-pse-for-dirty-swap Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Since we support soft-dirty on x86-64 now we can release _PAGE_PSE bit used to track dirty swap entries and reuse ealready existing _PAGE_SOFT_DIRTY. Thus for all soft-dirty needs we use same pte bit. 
CC: Linus Torvalds CC: Mel Gorman CC: Peter Anvin CC: Ingo Molnar CC: Steven Noonan CC: Rik van Riel CC: David Vrabel CC: Andrew Morton CC: Peter Zijlstra CC: Pavel Emelyanov Signed-off-by: Cyrill Gorcunov --- arch/x86/include/asm/pgtable_64.h | 12 ++++++++++-- arch/x86/include/asm/pgtable_types.h | 19 ++++--------------- 2 files changed, 14 insertions(+), 17 deletions(-) Index: linux-2.6.git/arch/x86/include/asm/pgtable_64.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_64.h +++ linux-2.6.git/arch/x86/include/asm/pgtable_64.h @@ -142,9 +142,17 @@ static inline int pgd_large(pgd_t pgd) { #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address)) #define pte_unmap(pte) ((void)(pte))/* NOP */ -/* Encode and de-code a swap entry */ +/* + * Encode and de-code a swap entry. When soft-dirty memory tracker is + * enabled we need to borrow _PAGE_BIT_SOFT_DIRTY bit for own needs, + * which limits the max size of swap partiotion about to 1T. + */ #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1) -#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1) +#ifdef CONFIG_MEM_SOFT_DIRTY +# define SWP_OFFSET_SHIFT (_PAGE_BIT_SOFT_DIRTY + 1) +#else +# define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1) +#endif #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS) Index: linux-2.6.git/arch/x86/include/asm/pgtable_types.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable_types.h +++ linux-2.6.git/arch/x86/include/asm/pgtable_types.h @@ -59,29 +59,18 @@ * The same hidden bit is used by kmemcheck, but since kmemcheck * works on kernel pages while soft-dirty engine on user space, * they do not conflict with each other. + * + * Because soft-dirty is limited to x86-64 only we can reuse this + * bit to track swap entries as well. 
*/ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_HIDDEN #ifdef CONFIG_MEM_SOFT_DIRTY #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY) +#define _PAGE_SWP_SOFT_DIRTY _PAGE_SOFT_DIRTY #else #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) -#endif - -/* - * Tracking soft dirty bit when a page goes to a swap is tricky. - * We need a bit which can be stored in pte _and_ not conflict - * with swap entry format. On x86 bits 6 and 7 are *not* involved - * into swap entry computation, but bit 6 is used for nonlinear - * file mapping, so we borrow bit 7 for soft dirty tracking. - * - * Please note that this bit must be treated as swap dirty page - * mark if and only if the PTE has present bit clear! - */ -#ifdef CONFIG_MEM_SOFT_DIRTY -#define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE -#else #define _PAGE_SWP_SOFT_DIRTY (_AT(pteval_t, 0)) #endif From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753447AbaDCTK1 (ORCPT ); Thu, 3 Apr 2014 15:10:27 -0400 Received: from mail-lb0-f178.google.com ([209.85.217.178]:48267 "EHLO mail-lb0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753419AbaDCTJ6 (ORCPT ); Thu, 3 Apr 2014 15:09:58 -0400 Message-Id: <20140403190952.661204455@openvz.org> User-Agent: quilt/0.60-1 Date: Thu, 03 Apr 2014 22:48:46 +0400 From: Cyrill Gorcunov To: linux-kernel@vger.kernel.org Cc: gorcunov@openvz.org, linux-mm@kvack.org, Linus Torvalds , Mel Gorman , Peter Anvin , Ingo Molnar , Steven Noonan , Rik van Riel , David Vrabel , Andrew Morton , Peter Zijlstra , Pavel Emelyanov Subject: [rfc 2/3] mm: pgtable -- Require X86_64 for soft-dirty tracker References: <20140403184844.260532690@openvz.org> Content-Disposition: inline; filename=pgbits-drop-softdirty-non-x86-64 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Tracking dirty status on 2 level pages requires very ugly macros and taking into account how old the 
machines who can operate without PAE mode only are, lets drop soft dirty tracker from them for code simplicity (note I can't drop all the macros from 2 level pages by now since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without tracker). Linus proposed to completely rip off softdirty support on x86-32 (even with PAE) and since for CRIU we're not planning to support native x86-32 mode, lets do that. (Softdirty tracker is relatively new feature which mostly used by CRIU so I don't expect if such API change would cause problems on userspace). CC: Linus Torvalds CC: Mel Gorman CC: Peter Anvin CC: Ingo Molnar CC: Steven Noonan CC: Rik van Riel CC: David Vrabel CC: Andrew Morton CC: Peter Zijlstra CC: Pavel Emelyanov Signed-off-by: Cyrill Gorcunov --- arch/x86/Kconfig | 2 - arch/x86/include/asm/pgtable-2level.h | 49 ---------------------------------- 2 files changed, 1 insertion(+), 50 deletions(-) Index: linux-2.6.git/arch/x86/Kconfig =================================================================== --- linux-2.6.git.orig/arch/x86/Kconfig +++ linux-2.6.git/arch/x86/Kconfig @@ -104,7 +104,7 @@ config X86 select HAVE_ARCH_SECCOMP_FILTER select BUILDTIME_EXTABLE_SORT select GENERIC_CMOS_UPDATE - select HAVE_ARCH_SOFT_DIRTY + select HAVE_ARCH_SOFT_DIRTY if X86_64 select CLOCKSOURCE_WATCHDOG select GENERIC_CLOCKEVENTS select ARCH_CLOCKSOURCE_DATA if X86_64 Index: linux-2.6.git/arch/x86/include/asm/pgtable-2level.h =================================================================== --- linux-2.6.git.orig/arch/x86/include/asm/pgtable-2level.h +++ linux-2.6.git/arch/x86/include/asm/pgtable-2level.h @@ -62,53 +62,6 @@ static inline unsigned long pte_bitop(un return ((value >> rightshift) & mask) << leftshift; } -#ifdef CONFIG_MEM_SOFT_DIRTY - -/* - * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE, _PAGE_BIT_SOFT_DIRTY and - * _PAGE_BIT_PROTNONE are taken, split up the 28 bits of offset - * into this range. 
- */
-#define PTE_FILE_MAX_BITS 28
-#define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1)
-#define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1)
-#define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1)
-#define PTE_FILE_SHIFT4 (_PAGE_BIT_SOFT_DIRTY + 1)
-#define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
-#define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
-#define PTE_FILE_BITS3 (PTE_FILE_SHIFT4 - PTE_FILE_SHIFT3 - 1)
-
-#define PTE_FILE_MASK1 ((1U << PTE_FILE_BITS1) - 1)
-#define PTE_FILE_MASK2 ((1U << PTE_FILE_BITS2) - 1)
-#define PTE_FILE_MASK3 ((1U << PTE_FILE_BITS3) - 1)
-
-#define PTE_FILE_LSHIFT2 (PTE_FILE_BITS1)
-#define PTE_FILE_LSHIFT3 (PTE_FILE_BITS1 + PTE_FILE_BITS2)
-#define PTE_FILE_LSHIFT4 (PTE_FILE_BITS1 + PTE_FILE_BITS2 + PTE_FILE_BITS3)
-
-static __always_inline pgoff_t pte_to_pgoff(pte_t pte)
-{
-	return (pgoff_t)
-		(pte_bitop(pte.pte_low, PTE_FILE_SHIFT1, PTE_FILE_MASK1, 0) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT2, PTE_FILE_MASK2, PTE_FILE_LSHIFT2) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT3, PTE_FILE_MASK3, PTE_FILE_LSHIFT3) +
-		 pte_bitop(pte.pte_low, PTE_FILE_SHIFT4, -1UL, PTE_FILE_LSHIFT4));
-}
-
-static __always_inline pte_t pgoff_to_pte(pgoff_t off)
-{
-	return (pte_t){
-		.pte_low =
-			pte_bitop(off, 0, PTE_FILE_MASK1, PTE_FILE_SHIFT1) +
-			pte_bitop(off, PTE_FILE_LSHIFT2, PTE_FILE_MASK2, PTE_FILE_SHIFT2) +
-			pte_bitop(off, PTE_FILE_LSHIFT3, PTE_FILE_MASK3, PTE_FILE_SHIFT3) +
-			pte_bitop(off, PTE_FILE_LSHIFT4, -1UL, PTE_FILE_SHIFT4) +
-			_PAGE_FILE,
-	};
-}
-
-#else /* CONFIG_MEM_SOFT_DIRTY */
-
 /*
  * Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
  * split up the 29 bits of offset into this range.
@@ -145,8 +98,6 @@ static __always_inline pte_t pgoff_to_pt
 	};
 }
 
-#endif /* CONFIG_MEM_SOFT_DIRTY */
-
 /* Encode and de-code a swap entry */
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755313AbaDGNYm (ORCPT ); Mon, 7 Apr 2014 09:24:42 -0400
Received: from mail-lb0-f179.google.com ([209.85.217.179]:34209 "EHLO mail-lb0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754790AbaDGNYk (ORCPT ); Mon, 7 Apr 2014 09:24:40 -0400
Date: Mon, 7 Apr 2014 17:24:37 +0400
From: Cyrill Gorcunov
To: "Kirill A. Shutemov"
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [rfc 0/3] Cleaning up soft-dirty bit usage
Message-ID: <20140407132437.GH1444@moon>
References: <20140403184844.260532690@openvz.org> <20140407130701.GA16677@node.dhcp.inet.fi>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140407130701.GA16677@node.dhcp.inet.fi>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 07, 2014 at 04:07:01PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 03, 2014 at 10:48:44PM +0400, Cyrill Gorcunov wrote:
> > Hi! I've been trying to clean up soft-dirty bit usage.
> > I can't cleanup
> > "ridiculous macros in pgtable-2level.h" completely because I need to
> > define _PAGE_FILE,_PAGE_PROTNONE,_PAGE_NUMA bits in sequence manner
> > like
> >
> > #define _PAGE_BIT_FILE (_PAGE_BIT_PRESENT + 1) /* _PAGE_BIT_RW */
> > #define _PAGE_BIT_NUMA (_PAGE_BIT_PRESENT + 2) /* _PAGE_BIT_USER */
> > #define _PAGE_BIT_PROTNONE (_PAGE_BIT_PRESENT + 3) /* _PAGE_BIT_PWT */
> >
> > which can't be done right now because numa code needs to save original
> > pte bits for example in __split_huge_page_map, if I'm not missing something
> > obvious.
>
> Sorry, I didn't get this. How __split_huge_page_map() does depend on pte
> bits order?

__split_huge_page_map
	...
	for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
		...
		here we modify the pte bits
		entry = pte_mknuma(entry); --> clear _PAGE_PRESENT and set _PAGE_NUMA

The other pte bits must remain valid and meaningful after that; for
example, _PAGE_RW might be set here, so _PAGE_NUMA can't be moved onto a
bit that is already in use.

> > is it intentional, and @prot_numa argument is supposed to be passed
> > with prot_numa = 1 one day, or it's leftover from old times?
>
> I see one more user of change_protection() -- change_prot_numa(), which
> has .prot_numa == 1.

Yeah, thanks, I managed to miss this.
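
To make the pte_mknuma() point above a bit more concrete, here is a small
user-space sketch. It is not kernel code: the DEMO_* names and both helpers
are hypothetical, the helpers are simplified versions of what the real
pte_mknuma()/pte_mknonnuma() do, and DEMO_NUMA_SPARE is just an assumed
free software bit. It only illustrates why the NUMA marker can't be moved
onto a bit position (like the _PAGE_BIT_USER slot in the proposed
sequence) that the original pte may already be using.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pteval_t;

/*
 * Hypothetical masks for this sketch only. PRESENT, RW and USER use the
 * x86 hardware positions (bits 0, 1, 2); DEMO_NUMA_SPARE is an assumed
 * free software bit and is not taken from the kernel headers.
 */
#define DEMO_PRESENT	((pteval_t)1 << 0)
#define DEMO_RW		((pteval_t)1 << 1)
#define DEMO_USER	((pteval_t)1 << 2)
#define DEMO_NUMA_SPARE	((pteval_t)1 << 8)

/* Simplified pte_mknuma(): clear PRESENT, set the chosen NUMA marker bit. */
static inline pteval_t demo_mknuma(pteval_t pte, pteval_t numa_bit)
{
	return (pte & ~DEMO_PRESENT) | numa_bit;
}

/* Simplified inverse: clear the NUMA marker bit, set PRESENT again. */
static inline pteval_t demo_mknonnuma(pteval_t pte, pteval_t numa_bit)
{
	assert(!(pte & DEMO_PRESENT));	/* only meaningful for non-present ptes */
	return (pte & ~numa_bit) | DEMO_PRESENT;
}

/*
 * Returns 1 if every bit of a present pte survives a mknuma/mknonnuma
 * round trip with the given marker bit, 0 if some original bit is lost.
 */
static inline int demo_roundtrip_ok(pteval_t pte, pteval_t numa_bit)
{
	return demo_mknonnuma(demo_mknuma(pte, numa_bit), numa_bit) == pte;
}
```

With pte = DEMO_PRESENT | DEMO_RW | DEMO_USER, the round trip through
DEMO_NUMA_SPARE restores the original value, while the round trip with
DEMO_USER as the marker drops the USER bit, because the marker and the
original permission bit can no longer be told apart.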