* [PATCH v1 0/2] arm64/mm: Enable userfaultfd write-protect
@ 2024-04-24 11:10 Ryan Roberts
  2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts
  2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts
  0 siblings, 2 replies; 21+ messages in thread
From: Ryan Roberts @ 2024-04-24 11:10 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel,
	Mark Rutland, Anshuman Khandual, David Hildenbrand, Peter Xu,
	Mike Rapoport, Shivansh Vij
  Cc: Ryan Roberts, linux-arm-kernel, linux-kernel

Hi All,

This series adds uffd write-protect support for arm64. The patches are
extracted, without change, from a wider RFC I posted at [1].

Previous attempts to add uffd-wp (and soft-dirty) have failed because of a
perceived lack of available PTE SW bits. However, it turns out that there
are two available; they are just hidden. PTE_PROT_NONE was previously
occupying a SW bit, but it only applies when PTE_VALID is clear, so it is
moved to overlay PTE_UXN in patch 1, freeing up the SW bit. Bit 63 is
marked as "IGNORED" in the Arm ARM, but it does not currently indicate
"reserved for SW use" like it does for the other SW bits. I've confirmed
with the spec owner that this is an oversight; the bit is intended to be
reserved for SW use and the spec will clarify this in a future update.

So now we have two spare bits; patch 2 enables uffd-wp on arm64, using the
SW bit freed up by moving PTE_PROT_NONE. This leaves bit 63 spare for
future use (e.g. soft-dirty - see RFC at [1] - or some other usage).

This applies on top of v6.9-rc3. All mm selftests involving uffd-wp now
run and pass, and no other selftest regressions are observed.
[1] https://lore.kernel.org/all/20240419074344.2643212-1-ryan.roberts@arm.com/

Thanks,
Ryan

Ryan Roberts (2):
  arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  arm64/mm: Add uffd write-protect support

 arch/arm64/Kconfig                    |  1 +
 arch/arm64/include/asm/pgtable-prot.h | 12 ++++-
 arch/arm64/include/asm/pgtable.h      | 71 ++++++++++++++++++++++++---
 3 files changed, 75 insertions(+), 9 deletions(-)

--
2.25.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  2024-04-24 11:10 [PATCH v1 0/2] arm64/mm: Enable userfaultfd write-protect Ryan Roberts
@ 2024-04-24 11:10 ` Ryan Roberts
  2024-04-24 16:43   ` Catalin Marinas
  2024-04-25  9:16   ` David Hildenbrand
  2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts
  1 sibling, 2 replies; 21+ messages in thread
From: Ryan Roberts @ 2024-04-24 11:10 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel,
	Mark Rutland, Anshuman Khandual, David Hildenbrand, Peter Xu,
	Mike Rapoport, Shivansh Vij
  Cc: Ryan Roberts, linux-arm-kernel, linux-kernel

Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
for SW use when the PTE is valid. This is a waste of those precious SW
bits since PTE_PROT_NONE can only ever be set when valid is clear.
Instead let's overlay it on what would be a HW bit if valid was set.

We need to be careful about which HW bit to choose since some of them
must be preserved; when pte_present() is true (as it is for a
PTE_PROT_NONE pte), it is legitimate for the core to call various
accessors, e.g. pte_dirty(), pte_write() etc. There are also some
accessors that are private to the arch which must continue to be
honoured, e.g. pte_user(), pte_user_exec() etc.

So we choose to overlay PTE_UXN. This effectively means that whenever a
pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
false, which is obviously always correct.

As a result of this change, we must shuffle the layout of the
arch-specific swap pte so that PTE_PROT_NONE is always zero and not
overlapping with any other field. Consequently, there is no way to keep
the `type` field contiguous without conflicting with PMD_PRESENT_INVALID
(bit 59), which must also be 0 for a swap pte. So let's move
PMD_PRESENT_INVALID to bit 60.

In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
soft-dirty or uffd-wp).
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 arch/arm64/include/asm/pgtable-prot.h |  4 ++--
 arch/arm64/include/asm/pgtable.h      | 16 +++++++++-------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index dd9ee67d1d87..ef952d69fd04 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -18,14 +18,14 @@
 #define PTE_DIRTY		(_AT(pteval_t, 1) << 55)
 #define PTE_SPECIAL		(_AT(pteval_t, 1) << 56)
 #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
-#define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
+#define PTE_PROT_NONE		(PTE_UXN)		 /* Reuse PTE_UXN; only when !PTE_VALID */

 /*
  * This bit indicates that the entry is present i.e. pmd_page()
  * still points to a valid huge page in memory even if the pmd
  * has been invalidated.
  */
-#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
+#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */

 #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
 #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..23aabff4fa6f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
  * Encode and decode a swap entry:
  *	bits 0-1:	present (must be zero)
  *	bits 2:		remember PG_anon_exclusive
- *	bits 3-7:	swap type
- *	bits 8-57:	swap offset
- *	bit  58:	PTE_PROT_NONE (must be zero)
+ *	bits 4-53:	swap offset
+ *	bit  54:	PTE_PROT_NONE (overlays PTE_UXN) (must be zero)
+ *	bits 55-59:	swap type
+ *	bit  60:	PMD_PRESENT_INVALID (must be zero)
  */
-#define __SWP_TYPE_SHIFT	3
+#define __SWP_TYPE_SHIFT	55
 #define __SWP_TYPE_BITS		5
-#define __SWP_OFFSET_BITS	50
 #define __SWP_TYPE_MASK		((1 << __SWP_TYPE_BITS) - 1)
-#define __SWP_OFFSET_SHIFT	(__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
+#define __SWP_OFFSET_SHIFT	4
+#define __SWP_OFFSET_BITS	50
 #define __SWP_OFFSET_MASK	((1UL << __SWP_OFFSET_BITS) - 1)

 #define __swp_type(x)		(((x).val >> __SWP_TYPE_SHIFT) & __SWP_TYPE_MASK)
 #define __swp_offset(x)		(((x).val >> __SWP_OFFSET_SHIFT) & __SWP_OFFSET_MASK)
-#define __swp_entry(type,offset) ((swp_entry_t) { ((type) << __SWP_TYPE_SHIFT) | ((offset) << __SWP_OFFSET_SHIFT) })
+#define __swp_entry(type, offset) ((swp_entry_t) { ((unsigned long)(type) << __SWP_TYPE_SHIFT) | \
+						   ((unsigned long)(offset) << __SWP_OFFSET_SHIFT) })

 #define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) })
 #define __swp_entry_to_pte(swp)	((pte_t) { (swp).val })
--
2.25.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts
@ 2024-04-24 16:43   ` Catalin Marinas
  2024-04-25  8:40     ` Ryan Roberts
  2024-04-25  9:16   ` David Hildenbrand
  1 sibling, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2024-04-24 16:43 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland,
	Anshuman Khandual, David Hildenbrand, Peter Xu, Mike Rapoport,
	Shivansh Vij, linux-arm-kernel, linux-kernel

On Wed, Apr 24, 2024 at 12:10:16PM +0100, Ryan Roberts wrote:
> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
> for SW use when the PTE is valid. This is a waste of those precious SW
> bits since PTE_PROT_NONE can only ever be set when valid is clear.
> Instead let's overlay it on what would be a HW bit if valid was set.
>
> We need to be careful about which HW bit to choose since some of them
> must be preserved; when pte_present() is true (as it is for a
> PTE_PROT_NONE pte), it is legitimate for the core to call various
> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
> accessors that are private to the arch which must continue to be
> honoured, e.g. pte_user(), pte_user_exec() etc.
>
> So we choose to overlay PTE_UXN; This effectively means that whenever a
> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
> false, which is obviously always correct.
>
> As a result of this change, we must shuffle the layout of the
> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
> overlapping with any other field. As a result of this, there is no way
> to keep the `type` field contiguous without conflicting with
> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
> let's move PMD_PRESENT_INVALID to bit 60.

I think we discussed but forgot the details.
What was the reason for not using, say, bit 60 for PTE_PROT_NONE to
avoid all the swap bits reshuffling? Clearing or setting of the
PTE_PROT_NONE bit is done via pte_modify() and this gets all the new
permission bits anyway. With POE support (on the list for now),
PTE_PROT_NONE would overlap with POIndex[0] but I don't think we ever
plan to read this field (other than maybe ptdump). The POIndex field is
set from the vma->vm_page_prot (Joey may need to adjust
vm_get_page_prot() in his patches to avoid setting a pkey on a
PROT_NONE mapping).

--
Catalin

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  2024-04-24 16:43   ` Catalin Marinas
@ 2024-04-25  8:40     ` Ryan Roberts
  0 siblings, 0 replies; 21+ messages in thread
From: Ryan Roberts @ 2024-04-25  8:40 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland,
	Anshuman Khandual, David Hildenbrand, Peter Xu, Mike Rapoport,
	Shivansh Vij, linux-arm-kernel, linux-kernel

On 24/04/2024 17:43, Catalin Marinas wrote:
> On Wed, Apr 24, 2024 at 12:10:16PM +0100, Ryan Roberts wrote:
>> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
>> for SW use when the PTE is valid. This is a waste of those precious SW
>> bits since PTE_PROT_NONE can only ever be set when valid is clear.
>> Instead let's overlay it on what would be a HW bit if valid was set.
>>
>> We need to be careful about which HW bit to choose since some of them
>> must be preserved; when pte_present() is true (as it is for a
>> PTE_PROT_NONE pte), it is legitimate for the core to call various
>> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
>> accessors that are private to the arch which must continue to be
>> honoured, e.g. pte_user(), pte_user_exec() etc.
>>
>> So we choose to overlay PTE_UXN; This effectively means that whenever a
>> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
>> false, which is obviously always correct.
>>
>> As a result of this change, we must shuffle the layout of the
>> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
>> overlapping with any other field. As a result of this, there is no way
>> to keep the `type` field contiguous without conflicting with
>> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
>> let's move PMD_PRESENT_INVALID to bit 60.
>
> I think we discussed but forgot the details. What was the reason for not
> using, say, bit 60 for PTE_PROT_NONE to avoid all the swap bits
> reshuffling? Clearing or setting of the PTE_PROT_NONE bit is done via
> pte_modify() and this gets all the new permission bits anyway. With POE
> support (on the list for now), PTE_PROT_NONE would overlap with
> POIndex[0] but I don't think we ever plan to read this field (other than
> maybe ptdump). The POIndex field is set from the vma->vm_page_prot (Joey
> may need to adjust vm_get_page_prot() in his patches to avoid setting a
> pkey on a PROT_NONE mapping).

Copy/pasting your comment from the other patch into this one since it's
easier to discuss it all together:

  Ah, I did not realise we need to free up bit 3 from the swap pte as
  well. Though maybe patch 1 is fine as is but for the record, it would
  be good to justify the decision to go with PTE_UXN.

While we need a new bit in the swap pte for uffd-wp, it's just a SW bit -
it could go anywhere. I chose bit 3 because it was free after all the
other shuffling.

As for choosing PTE_UXN for PTE_PROT_NONE, I wanted to choose a bit that
would definitely never lead to confusion if ever interpreted as its HW
meaning, since as far as the core-mm is concerned, the pte is either
present or it's not, and if it is present, then it is completely valid
to call all the pte_*() helpers. By definition, if PTE_PROT_NONE is set,
then the PTE is not executable in user space, so any helpers that
continue to interpret the bit position as UXN will still give sensible
answers.

Yes, I could have just put PTE_PROT_NONE in bit position 60 and avoided
all the shuffling. But in the past you have pushed back on using the
PBHA bits due to out-of-tree patches using them. I thought it was better
to just sidestep having to think about it by not using them.
Additionally, as you point out, I didn't want to risk overlapping with
the POIndex and that causing subtle bugs.

But then... PMD_PRESENT_INVALID. Which already turns out to be violating
my above considerations. Ugh. I considered moving that to NS, but it
would have required splitting the offset field into 2 discontiguous
parts in the swap pte. In the end, I decided it's ok in any position
because it's transient; it's just a temp marker and the pte will soon
get set again from scratch, so it doesn't matter if adding the marker is
destructive.

Personally I think there is less risk of a future/subtle bug by putting
PTE_PROT_NONE over PTE_UXN. But if you prefer to reduce churn by putting
it at bit 60 then I'm ok with that.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts
  2024-04-24 16:43   ` Catalin Marinas
@ 2024-04-25  9:16   ` David Hildenbrand
  2024-04-25 10:29     ` Ryan Roberts
  1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2024-04-25  9:16 UTC (permalink / raw)
  To: Ryan Roberts, Catalin Marinas, Will Deacon, Joey Gouly,
	Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu,
	Mike Rapoport, Shivansh Vij
  Cc: linux-arm-kernel, linux-kernel

On 24.04.24 13:10, Ryan Roberts wrote:
> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
> for SW use when the PTE is valid. This is a waste of those precious SW
> bits since PTE_PROT_NONE can only ever be set when valid is clear.
> Instead let's overlay it on what would be a HW bit if valid was set.
>
> We need to be careful about which HW bit to choose since some of them
> must be preserved; when pte_present() is true (as it is for a
> PTE_PROT_NONE pte), it is legitimate for the core to call various
> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
> accessors that are private to the arch which must continue to be
> honoured, e.g. pte_user(), pte_user_exec() etc.
>
> So we choose to overlay PTE_UXN; This effectively means that whenever a
> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
> false, which is obviously always correct.
>
> As a result of this change, we must shuffle the layout of the
> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
> overlapping with any other field. As a result of this, there is no way
> to keep the `type` field contiguous without conflicting with
> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
> let's move PMD_PRESENT_INVALID to bit 60.

A note that some archs split/re-combine type and/or offset, to make use
of every bit possible :) But that's mostly relevant for 32bit.

(and as long as PFNs can still fit into the swp offset for migration
entries etc.)

>
> In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
> soft-dirty or uffd-wp).

I was briefly confused about how you would use these bits as SW bits for
swap PTEs (which you can't as they overlay the type). See below
regarding bit 3.

I would have said here "proper SW bit for present PTEs".

>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  arch/arm64/include/asm/pgtable-prot.h |  4 ++--
>  arch/arm64/include/asm/pgtable.h      | 16 +++++++++-------
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index dd9ee67d1d87..ef952d69fd04 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -18,14 +18,14 @@
>  #define PTE_DIRTY		(_AT(pteval_t, 1) << 55)
>  #define PTE_SPECIAL		(_AT(pteval_t, 1) << 56)
>  #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
> -#define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> +#define PTE_PROT_NONE		(PTE_UXN)		 /* Reuse PTE_UXN; only when !PTE_VALID */
>
>  /*
>   * This bit indicates that the entry is present i.e. pmd_page()
>   * still points to a valid huge page in memory even if the pmd
>   * has been invalidated.
>   */
> -#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> +#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */
>
>  #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>  #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..23aabff4fa6f 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>   * Encode and decode a swap entry:
>   *	bits 0-1:	present (must be zero)
>   *	bits 2:		remember PG_anon_exclusive
> - *	bits 3-7:	swap type
> - *	bits 8-57:	swap offset
> - *	bit  58:	PTE_PROT_NONE (must be zero)

Reading this patch alone: what happened to bit 3? Please mention that it
will be used as a swap pte metadata bit (uffd-wp).

> + *	bits 4-53:	swap offset

So we'll still have 50bit for the offset, good. We could even use 61-63
if ever required to store bigger PFNs.

LGTM

Reviewed-by: David Hildenbrand <david@redhat.com>

--
Cheers,

David / dhildenb

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  2024-04-25  9:16   ` David Hildenbrand
@ 2024-04-25 10:29     ` Ryan Roberts
  2024-04-25 10:37       ` Ryan Roberts
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Roberts @ 2024-04-25 10:29 UTC (permalink / raw)
  To: David Hildenbrand, Catalin Marinas, Will Deacon, Joey Gouly,
	Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu,
	Mike Rapoport, Shivansh Vij
  Cc: linux-arm-kernel, linux-kernel

On 25/04/2024 10:16, David Hildenbrand wrote:
> On 24.04.24 13:10, Ryan Roberts wrote:
>> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
>> for SW use when the PTE is valid. This is a waste of those precious SW
>> bits since PTE_PROT_NONE can only ever be set when valid is clear.
>> Instead let's overlay it on what would be a HW bit if valid was set.
>>
>> We need to be careful about which HW bit to choose since some of them
>> must be preserved; when pte_present() is true (as it is for a
>> PTE_PROT_NONE pte), it is legitimate for the core to call various
>> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
>> accessors that are private to the arch which must continue to be
>> honoured, e.g. pte_user(), pte_user_exec() etc.
>>
>> So we choose to overlay PTE_UXN; This effectively means that whenever a
>> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
>> false, which is obviously always correct.
>>
>> As a result of this change, we must shuffle the layout of the
>> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
>> overlapping with any other field. As a result of this, there is no way
>> to keep the `type` field contiguous without conflicting with
>> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
>> let's move PMD_PRESENT_INVALID to bit 60.
>
> A note that some archs split/re-combine type and/or offset, to make use
> of every bit possible :) But that's mostly relevant for 32bit.
>
> (and as long as PFNs can still fit into the swp offset for migration
> entries etc.)

Yeah, I considered splitting the type or offset field to avoid moving
PMD_PRESENT_INVALID, but thought it was better to avoid the extra mask
and shift.

>
>> In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
>> soft-dirty or uffd-wp).
>
> I was briefly confused about how you would use these bits as SW bits for
> swap PTEs (which you can't as they overlay the type). See below
> regarding bit 3.
>
> I would have said here "proper SW bit for present PTEs".

Yes; I'll clarify in the next version.

>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  arch/arm64/include/asm/pgtable-prot.h |  4 ++--
>>  arch/arm64/include/asm/pgtable.h      | 16 +++++++++-------
>>  2 files changed, 11 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>> index dd9ee67d1d87..ef952d69fd04 100644
>> --- a/arch/arm64/include/asm/pgtable-prot.h
>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>> @@ -18,14 +18,14 @@
>>  #define PTE_DIRTY		(_AT(pteval_t, 1) << 55)
>>  #define PTE_SPECIAL		(_AT(pteval_t, 1) << 56)
>>  #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
>> -#define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>> +#define PTE_PROT_NONE		(PTE_UXN)		 /* Reuse PTE_UXN; only when !PTE_VALID */
>>
>>  /*
>>   * This bit indicates that the entry is present i.e. pmd_page()
>>   * still points to a valid huge page in memory even if the pmd
>>   * has been invalidated.
>>   */
>> -#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>> +#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */
>>
>>  #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>>  #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index afdd56d26ad7..23aabff4fa6f 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>>   * Encode and decode a swap entry:
>>   *	bits 0-1:	present (must be zero)
>>   *	bits 2:		remember PG_anon_exclusive
>>  - *	bits 3-7:	swap type
>>  - *	bits 8-57:	swap offset
>>  - *	bit  58:	PTE_PROT_NONE (must be zero)
>
> Reading this patch alone: what happened to bit 3? Please mention that it
> will be used as a swap pte metadata bit (uffd-wp).

Will do. It's all a bit arbitrary though. I could have put offset in
3-52, and then 53 would have been spare for uffd-wp. I'm not sure there
is any advantage to either option.

>
>> + *	bits 4-53:	swap offset
>
> So we'll still have 50bit for the offset, good. We could even use 61-63
> if ever required to store bigger PFNs.

yep, or more sw bits.

>
> LGTM
>
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  2024-04-25 10:29     ` Ryan Roberts
@ 2024-04-25 10:37       ` Ryan Roberts
  2024-04-26 14:48         ` Catalin Marinas
  0 siblings, 1 reply; 21+ messages in thread
From: Ryan Roberts @ 2024-04-25 10:37 UTC (permalink / raw)
  To: David Hildenbrand, Catalin Marinas, Will Deacon, Joey Gouly,
	Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu,
	Mike Rapoport, Shivansh Vij
  Cc: linux-arm-kernel, linux-kernel

On 25/04/2024 11:29, Ryan Roberts wrote:
> On 25/04/2024 10:16, David Hildenbrand wrote:
>> On 24.04.24 13:10, Ryan Roberts wrote:
>>> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
>>> for SW use when the PTE is valid. This is a waste of those precious SW
>>> bits since PTE_PROT_NONE can only ever be set when valid is clear.
>>> Instead let's overlay it on what would be a HW bit if valid was set.
>>>
>>> We need to be careful about which HW bit to choose since some of them
>>> must be preserved; when pte_present() is true (as it is for a
>>> PTE_PROT_NONE pte), it is legitimate for the core to call various
>>> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
>>> accessors that are private to the arch which must continue to be
>>> honoured, e.g. pte_user(), pte_user_exec() etc.
>>>
>>> So we choose to overlay PTE_UXN; This effectively means that whenever a
>>> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
>>> false, which is obviously always correct.
>>>
>>> As a result of this change, we must shuffle the layout of the
>>> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
>>> overlapping with any other field. As a result of this, there is no way
>>> to keep the `type` field contiguous without conflicting with
>>> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
>>> let's move PMD_PRESENT_INVALID to bit 60.
>>
>> A note that some archs split/re-combine type and/or offset, to make use
>> of every bit possible :) But that's mostly relevant for 32bit.
>>
>> (and as long as PFNs can still fit into the swp offset for migration
>> entries etc.)
>
> Yeah, I considered splitting the type or offset field to avoid moving
> PMD_PRESENT_INVALID, but thought it was better to avoid the extra mask
> and shift.

Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for
swap ptes; it would be cleaner to have one bit that defines "present"
when valid is clear (similar to PTE_PROT_NONE today) then another bit
which is only defined when "present && !valid" which tells us if this is
PTE_PROT_NONE or PMD_PRESENT_INVALID (I don't think you can ever have
both at the same time?).

But there is a problem with this: __split_huge_pmd_locked() calls
pmdp_invalidate() for a pmd before it determines that it is
pmd_present(). So the PMD_PRESENT_INVALID can be set in a swap pte
today. That feels wrong to me, but I was trying to avoid the whole thing
unravelling so didn't pursue it.

>
>>
>>> In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
>>> soft-dirty or uffd-wp).
>>
>> I was briefly confused about how you would use these bits as SW bits for
>> swap PTEs (which you can't as they overlay the type). See below
>> regarding bit 3.
>>
>> I would have said here "proper SW bit for present PTEs".
>
> Yes; I'll clarify in the next version.
>
>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  arch/arm64/include/asm/pgtable-prot.h |  4 ++--
>>>  arch/arm64/include/asm/pgtable.h      | 16 +++++++++-------
>>>  2 files changed, 11 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>> index dd9ee67d1d87..ef952d69fd04 100644
>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>> @@ -18,14 +18,14 @@
>>>  #define PTE_DIRTY		(_AT(pteval_t, 1) << 55)
>>>  #define PTE_SPECIAL		(_AT(pteval_t, 1) << 56)
>>>  #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
>>> -#define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>> +#define PTE_PROT_NONE		(PTE_UXN)		 /* Reuse PTE_UXN; only when !PTE_VALID */
>>>
>>>  /*
>>>   * This bit indicates that the entry is present i.e. pmd_page()
>>>   * still points to a valid huge page in memory even if the pmd
>>>   * has been invalidated.
>>>   */
>>> -#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>> +#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */
>>>
>>>  #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>>>  #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index afdd56d26ad7..23aabff4fa6f 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>>>   * Encode and decode a swap entry:
>>>   *	bits 0-1:	present (must be zero)
>>>   *	bits 2:		remember PG_anon_exclusive
>>>  - *	bits 3-7:	swap type
>>>  - *	bits 8-57:	swap offset
>>>  - *	bit  58:	PTE_PROT_NONE (must be zero)
>>
>> Reading this patch alone: what happened to bit 3? Please mention that it
>> will be used as a swap pte metadata bit (uffd-wp).
>
> Will do. It's all a bit arbitrary though. I could have put offset in
> 3-52, and then 53 would have been spare for uffd-wp. I'm not sure there
> is any advantage to either option.
>
>>
>>> + *	bits 4-53:	swap offset
>>
>> So we'll still have 50bit for the offset, good. We could even use 61-63
>> if ever required to store bigger PFNs.
>
> yep, or more sw bits.
>
>>
>> LGTM
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>
> Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-25 10:37 ` Ryan Roberts @ 2024-04-26 14:48 ` Catalin Marinas 2024-04-29 10:04 ` Ryan Roberts 0 siblings, 1 reply; 21+ messages in thread From: Catalin Marinas @ 2024-04-26 14:48 UTC (permalink / raw) To: Ryan Roberts Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: > Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap > ptes; it would be cleaner to have one bit that defines "present" when valid is > clear (similar to PTE_PROT_NONE today) then another bit which is only defined > when "present && !valid" which tells us if this is PTE_PROT_NONE or > PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). I think this make sense, maybe rename the above to PTE_PRESENT_INVALID and use it for both ptes and pmds. > But there is a problem with this: __split_huge_pmd_locked() calls > pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So > the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me, > but was trying to avoid the whole thing unravelling so didn't persue. Maybe what's wrong is the arm64 implementation setting this bit on a swap/migration pmd (though we could handle this in the core code as well, it depends what the other architectures do). The only check for the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed into the pmd_present() check. I think it is currently broken as pmd_present() can return true for a swap pmd after pmd_mkinvalid(). So I don't think we lose anything if pmd_mkinvalid() skips any bit setting when !PTE_VALID. Maybe it even fixes some corner case we never hit yet (like pmd_present() on a swap/migration+invalid pmd). 
-- Catalin
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-26 14:48 ` Catalin Marinas @ 2024-04-29 10:04 ` Ryan Roberts 2024-04-29 12:38 ` Catalin Marinas 0 siblings, 1 reply; 21+ messages in thread From: Ryan Roberts @ 2024-04-29 10:04 UTC (permalink / raw) To: Catalin Marinas Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On 26/04/2024 15:48, Catalin Marinas wrote: > On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: >> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap >> ptes; it would be cleaner to have one bit that defines "present" when valid is >> clear (similar to PTE_PROT_NONE today) then another bit which is only defined >> when "present && !valid" which tells us if this is PTE_PROT_NONE or >> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). > > I think this make sense, maybe rename the above to PTE_PRESENT_INVALID > and use it for both ptes and pmds. Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in core-mm so will now fix that before I can validate my change. see https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/ With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1) with both PTE_WRITE=0 and PTE_RDONLY=0. While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit modification", this is not a problem as the pte is invalid, so the HW doesn't interpret it. And SW always uses the PTE_WRITE bit to interpret the writability of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination that we now repurpose for PROT_NONE. This will subtly change behaviour in an edge case though. 
Imagine:

  pte_t pte;

  pte = pte_modify(pte, PAGE_NONE);
  pte = pte_mkwrite_novma(pte);
  WARN_ON(pte_protnone(pte));

Should that warning fire or not? Previously, because we had a dedicated bit for PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me it's more intuitive if it doesn't fire. Regardless, there is no core code that ever does this. Once you have a protnone pte, it's terminal - nothing ever modifies it with these helpers AFAICS. Personally I think this is a nice tidy up that saves a SW bit in both present and swap ptes. What do you think? (I'll just post the series if it's easier to provide feedback in that context.) > >> But there is a problem with this: __split_huge_pmd_locked() calls >> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So >> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me, >> but was trying to avoid the whole thing unravelling so didn't persue. > > Maybe what's wrong is the arm64 implementation setting this bit on a > swap/migration pmd (though we could handle this in the core code as > well, it depends what the other architectures do). The only check for > the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed > into the pmd_present() check. I think it is currently broken as > pmd_present() can return true for a swap pmd after pmd_mkinvalid(). I've posted a fix here: https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/ My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd. > > So I don't think we lose anything if pmd_mkinvalid() skips any bit > setting when !PTE_VALID. Maybe it even fixes some corner case we never > hit yet (like pmd_present() on a swap/migration+invalid pmd).
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-29 10:04 ` Ryan Roberts @ 2024-04-29 12:38 ` Catalin Marinas 2024-04-29 13:01 ` Ryan Roberts 0 siblings, 1 reply; 21+ messages in thread From: Catalin Marinas @ 2024-04-29 12:38 UTC (permalink / raw) To: Ryan Roberts Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote: > On 26/04/2024 15:48, Catalin Marinas wrote: > > On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: > >> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap > >> ptes; it would be cleaner to have one bit that defines "present" when valid is > >> clear (similar to PTE_PROT_NONE today) then another bit which is only defined > >> when "present && !valid" which tells us if this is PTE_PROT_NONE or > >> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). > > > > I think this make sense, maybe rename the above to PTE_PRESENT_INVALID > > and use it for both ptes and pmds. > > Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in > core-mm so will now fix that before I can validate my change. see > https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/ > > With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead > represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1) > with both PTE_WRITE=0 and PTE_RDONLY=0. > > While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit > modification", this is not a problem as the pte is invalid, so the HW doesn't > interpret it. And SW always uses the PTE_WRITE bit to interpret the writability > of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination > that we now repurpose for PROT_NONE. 
Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be set) and check PTE_USER|PTE_UXN == 0b01, which is a unique combination for PAGE_NONE (bar the kernel mappings)? For ptes, it doesn't matter; we can assume that PTE_PRESENT_INVALID means pte_protnone(). For pmds, however, we can end up with pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_* permissions encoded into a valid pmd. That's where a dedicated PTE_PROT_NONE bit helped. Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*() first to set PTE_PRESENT_INVALID. A different CPU gets a fault and, since the pmd is present, it goes and checks pmd_protnone(), which returns true, ending up on the do_huge_pmd_numa_page() path. Maybe some locks help but it looks fragile to rely on them. So I think for protnone we need to check some other bits (like USER and UXN) in addition to PTE_PRESENT_INVALID. > This will subtly change behaviour in an edge case though. Imagine: > > pte_t pte; > > pte = pte_modify(pte, PAGE_NONE); > pte = pte_mkwrite_novma(pte); > WARN_ON(pte_protnone(pte)); > > Should that warning fire or not? Previously, because we had a dedicated bit for > PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me > it's more intuitive if it doesn't fire. Regardless there is no core code that > ever does this. Once you have a protnone pte, its terminal - nothing ever > modifies it with these helpers AFAICS. I don't think any core code should try to make a PAGE_NONE pte writeable. > Personally I think this is a nice tidy up that saves a SW bit in both present > and swap ptes. What do you think? (I'll just post the series if its easier to > provide feedback in that context). It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as it doesn't affect the pmd case I mentioned above. > >> But there is a problem with this: __split_huge_pmd_locked() calls > >> pmdp_invalidate() for a pmd before it determines that it is pmd_present().
So > >> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me, > >> but was trying to avoid the whole thing unravelling so didn't persue. > > > > Maybe what's wrong is the arm64 implementation setting this bit on a > > swap/migration pmd (though we could handle this in the core code as > > well, it depends what the other architectures do). The only check for > > the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed > > into the pmd_present() check. I think it is currently broken as > > pmd_present() can return true for a swap pmd after pmd_mkinvalid(). > > I've posted a fix here: > https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/ > > My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd. I agree, thanks. -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-29 12:38 ` Catalin Marinas @ 2024-04-29 13:01 ` Ryan Roberts 2024-04-29 13:23 ` Ryan Roberts 0 siblings, 1 reply; 21+ messages in thread From: Ryan Roberts @ 2024-04-29 13:01 UTC (permalink / raw) To: Catalin Marinas Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On 29/04/2024 13:38, Catalin Marinas wrote: > On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote: >> On 26/04/2024 15:48, Catalin Marinas wrote: >>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: >>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap >>>> ptes; it would be cleaner to have one bit that defines "present" when valid is >>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined >>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or >>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). >>> >>> I think this make sense, maybe rename the above to PTE_PRESENT_INVALID >>> and use it for both ptes and pmds. >> >> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in >> core-mm so will now fix that before I can validate my change. see >> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/ >> >> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead >> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1) >> with both PTE_WRITE=0 and PTE_RDONLY=0. >> >> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit >> modification", this is not a problem as the pte is invalid, so the HW doesn't >> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability >> of the pte. 
So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination >> that we now repurpose for PROT_NONE. > > Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be > set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination > for PAGE_NONE (bar the kernel mappings). Yes I guess that works. I personally prefer my proposal because it is more intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is preferable, then I'll go with that. > > For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID > means pte_protnone(). For pmds, however, we can end up with > pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_* > permissions encoded into a valid pmd. That's where a dedicated > PTE_PROT_NONE bit helped. Yes agreed. > > Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*() > first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since > the pmd is present, it goes and checks pmd_protnone() which returns > true, ending up on do_huge_pmd_numa_page() path. Maybe some locks help > but it looks fragile to rely on them. > > So I think for protnone we need to check some other bits (like USER and > UXN) in addition to PTE_PRESENT_INVALID. Yes 100% agree. But using PTE_WRITE|PTE_RDONLY==0b00 is just as valid for that purpose, I think? > >> This will subtly change behaviour in an edge case though. Imagine: >> >> pte_t pte; >> >> pte = pte_modify(pte, PAGE_NONE); >> pte = pte_mkwrite_novma(pte); >> WARN_ON(pte_protnone(pte)); >> >> Should that warning fire or not? Previously, because we had a dedicated bit for >> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me >> it's more intuitive if it doesn't fire. Regardless there is no core code that >> ever does this. Once you have a protnone pte, its terminal - nothing ever >> modifies it with these helpers AFAICS. 
> > I don't think any core code should try to make page a PAGE_NONE pte > writeable. I looked at some other arches; some (at least alpha and hexagon) will not fire this warning because they have R and W bits and 0b00 means NONE. Others (x86) will fire it because they have an explicit NONE bit and don't remove it on permission change. So I conclude it's UB and fine to do either. > >> Personally I think this is a nice tidy up that saves a SW bit in both present >> and swap ptes. What do you think? (I'll just post the series if its easier to >> provide feedback in that context). > > It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as > it doesn't affect the pmd case I mentioned above. > >>>> But there is a problem with this: __split_huge_pmd_locked() calls >>>> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So >>>> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me, >>>> but was trying to avoid the whole thing unravelling so didn't persue. >>> >>> Maybe what's wrong is the arm64 implementation setting this bit on a >>> swap/migration pmd (though we could handle this in the core code as >>> well, it depends what the other architectures do). The only check for >>> the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed >>> into the pmd_present() check. I think it is currently broken as >>> pmd_present() can return true for a swap pmd after pmd_mkinvalid(). >> >> I've posted a fix here: >> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/ >> >> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd. > > I agree, thanks.
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-29 13:01 ` Ryan Roberts @ 2024-04-29 13:23 ` Ryan Roberts 2024-04-29 14:18 ` Catalin Marinas 0 siblings, 1 reply; 21+ messages in thread From: Ryan Roberts @ 2024-04-29 13:23 UTC (permalink / raw) To: Catalin Marinas Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On 29/04/2024 14:01, Ryan Roberts wrote: > On 29/04/2024 13:38, Catalin Marinas wrote: >> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote: >>> On 26/04/2024 15:48, Catalin Marinas wrote: >>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: >>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap >>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is >>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined >>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or >>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). >>>> >>>> I think this make sense, maybe rename the above to PTE_PRESENT_INVALID >>>> and use it for both ptes and pmds. >>> >>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in >>> core-mm so will now fix that before I can validate my change. see >>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/ >>> >>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead >>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1) >>> with both PTE_WRITE=0 and PTE_RDONLY=0. >>> >>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit >>> modification", this is not a problem as the pte is invalid, so the HW doesn't >>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability >>> of the pte. 
So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination >>> that we now repurpose for PROT_NONE. >> >> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be >> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination >> for PAGE_NONE (bar the kernel mappings). > > Yes I guess that works. I personally prefer my proposal because it is more > intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But > if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is > preferable, then I'll go with that. Ignore this - I looked at your proposed approach and agree it's better. I'll use `PTE_USER|PTE_UXN==0b01`. Posting shortly... > >> >> For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID >> means pte_protnone(). For pmds, however, we can end up with >> pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_* >> permissions encoded into a valid pmd. That's where a dedicated >> PTE_PROT_NONE bit helped. > > Yes agreed. > >> >> Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*() >> first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since >> the pmd is present, it goes and checks pmd_protnone() which returns >> true, ending up on do_huge_pmd_numa_page() path. Maybe some locks help >> but it looks fragile to rely on them. >> >> So I think for protnone we need to check some other bits (like USER and >> UXN) in addition to PTE_PRESENT_INVALID. > > Yes 100% agree. But using PTE_WRITE|PTE_RDONLY==0b00 is just as valid for that > purpose, I think? > >> >>> This will subtly change behaviour in an edge case though. Imagine: >>> >>> pte_t pte; >>> >>> pte = pte_modify(pte, PAGE_NONE); >>> pte = pte_mkwrite_novma(pte); >>> WARN_ON(pte_protnone(pte)); >>> >>> Should that warning fire or not? Previously, because we had a dedicated bit for >>> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me >>> it's more intuitive if it doesn't fire. 
Regardless there is no core code that >>> ever does this. Once you have a protnone pte, its terminal - nothing ever >>> modifies it with these helpers AFAICS. >> >> I don't think any core code should try to make page a PAGE_NONE pte >> writeable. > > I looked at some other arches; some (at least alpha and hexagon) will not fire > this warning because they have R and W bits and 0b00 means NONE. Others (x86) > will fire it because they have an explicit NONE bit and don't remove it on > permission change. So I conclude its UB and fine to do either. > >> >>> Personally I think this is a nice tidy up that saves a SW bit in both present >>> and swap ptes. What do you think? (I'll just post the series if its easier to >>> provide feedback in that context). >> >> It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as >> it doesn't affect the pmd case I mentioned above. >> >>>>> But there is a problem with this: __split_huge_pmd_locked() calls >>>>> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So >>>>> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me, >>>>> but was trying to avoid the whole thing unravelling so didn't persue. >>>> >>>> Maybe what's wrong is the arm64 implementation setting this bit on a >>>> swap/migration pmd (though we could handle this in the core code as >>>> well, it depends what the other architectures do). The only check for >>>> the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed >>>> into the pmd_present() check. I think it is currently broken as >>>> pmd_present() can return true for a swap pmd after pmd_mkinvalid(). >>> >>> I've posted a fix here: >>> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/ >>> >>> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd. >> >> I agree, thanks. 
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-29 13:23 ` Ryan Roberts @ 2024-04-29 14:18 ` Catalin Marinas 2024-04-29 15:04 ` Ryan Roberts 0 siblings, 1 reply; 21+ messages in thread From: Catalin Marinas @ 2024-04-29 14:18 UTC (permalink / raw) To: Ryan Roberts Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On Mon, Apr 29, 2024 at 02:23:35PM +0100, Ryan Roberts wrote: > On 29/04/2024 14:01, Ryan Roberts wrote: > > On 29/04/2024 13:38, Catalin Marinas wrote: > >> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote: > >>> On 26/04/2024 15:48, Catalin Marinas wrote: > >>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: > >>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap > >>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is > >>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined > >>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or > >>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). > >>>> > >>>> I think this make sense, maybe rename the above to PTE_PRESENT_INVALID > >>>> and use it for both ptes and pmds. > >>> > >>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in > >>> core-mm so will now fix that before I can validate my change. see > >>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/ > >>> > >>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead > >>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1) > >>> with both PTE_WRITE=0 and PTE_RDONLY=0. 
> >>> > >>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit > >>> modification", this is not a problem as the pte is invalid, so the HW doesn't > >>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability > >>> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination > >>> that we now repurpose for PROT_NONE. > >> > >> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be > >> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination > >> for PAGE_NONE (bar the kernel mappings). > > > > Yes I guess that works. I personally prefer my proposal because it is more > > intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But > > if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is > > preferable, then I'll go with that. > > Ignore this - I looked at your proposed approach and agree it's better. I'll use > `PTE_USER|PTE_UXN==0b01`. Posting shortly... You nearly convinced me until I read your second reply ;). The PTE_WRITE|PTE_RDONLY == 0b00 scheme still has the mkwrite problem, if we care about it (I don't think it can happen, though). -- Catalin
* Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID 2024-04-29 14:18 ` Catalin Marinas @ 2024-04-29 15:04 ` Ryan Roberts 0 siblings, 0 replies; 21+ messages in thread From: Ryan Roberts @ 2024-04-29 15:04 UTC (permalink / raw) To: Catalin Marinas Cc: David Hildenbrand, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On 29/04/2024 15:18, Catalin Marinas wrote: > On Mon, Apr 29, 2024 at 02:23:35PM +0100, Ryan Roberts wrote: >> On 29/04/2024 14:01, Ryan Roberts wrote: >>> On 29/04/2024 13:38, Catalin Marinas wrote: >>>> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote: >>>>> On 26/04/2024 15:48, Catalin Marinas wrote: >>>>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote: >>>>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap >>>>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is >>>>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined >>>>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or >>>>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?). >>>>>> >>>>>> I think this make sense, maybe rename the above to PTE_PRESENT_INVALID >>>>>> and use it for both ptes and pmds. >>>>> >>>>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in >>>>> core-mm so will now fix that before I can validate my change. see >>>>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/ >>>>> >>>>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead >>>>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1) >>>>> with both PTE_WRITE=0 and PTE_RDONLY=0. 
>>>>> >>>>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit >>>>> modification", this is not a problem as the pte is invalid, so the HW doesn't >>>>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability >>>>> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination >>>>> that we now repurpose for PROT_NONE. >>>> >>>> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be >>>> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination >>>> for PAGE_NONE (bar the kernel mappings). >>> >>> Yes I guess that works. I personally prefer my proposal because it is more >>> intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But >>> if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is >>> preferable, then I'll go with that. >> >> Ignore this - I looked at your proposed approach and agree it's better. I'll use >> `PTE_USER|PTE_UXN==0b01`. Posting shortly... > > You nearly convinced me until I read your second reply ;). The > PTE_WRITE|PTE_RDONLY == 0b00 still has the mkwrite problem if we care > about (I don't think it can happen though). Yes, just to clearly enumerate the reasons I prefer your approach:

- PTE_RDONLY is also used for the HW dirty bit. I had to add a conditional to pte_mkclean() for my scheme to prevent pte_mkclean() on a PROT_NONE pte erroneously making it RO. No such problem with your scheme.

- With my scheme, we have the mkwrite problem, as you call it. Although, as I said, some arches already have this semantic, so I don't think it's a problem. But with your scheme we keep the existing arm64 semantics, so it reduces the risk of a problem in a corner I overlooked.

Anyway, I've posted the v2. Take a look when you get time - perhaps we can get it into v6.10?
* [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-24 11:10 [PATCH v1 0/2] arm64/mm: Enable userfaultfd write-protect Ryan Roberts 2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts @ 2024-04-24 11:10 ` Ryan Roberts 2024-04-24 11:57 ` Peter Xu 2024-04-24 16:46 ` Catalin Marinas 1 sibling, 2 replies; 21+ messages in thread From: Ryan Roberts @ 2024-04-24 11:10 UTC (permalink / raw) To: Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Peter Xu, Mike Rapoport, Shivansh Vij Cc: Ryan Roberts, linux-arm-kernel, linux-kernel Let's use the newly-free PTE SW bit (58) to add support for uffd-wp. The standard handlers are implemented for set/test/clear for both pte and pmd. Additionally we must also track the uffd-wp state as a pte swp bit, so use a free swap entry pte bit (3). Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/pgtable-prot.h | 8 ++++ arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++ 3 files changed, 64 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b11c98b3e84..763e221f2169 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -255,6 +255,7 @@ config ARM64 select SYSCTL_EXCEPTION_TRACE select THREAD_INFO_IN_TASK select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD select TRACE_IRQFLAGS_SUPPORT select TRACE_IRQFLAGS_NMI_SUPPORT select HAVE_SOFTIRQ_ON_OWN_STACK diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index ef952d69fd04..f1e1f6306e03 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -20,6 +20,14 @@ #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) #define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */ +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +#define PTE_UFFD_WP (_AT(pteval_t, 1) << 
58) /* uffd-wp tracking */ +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ +#else +#define PTE_UFFD_WP (_AT(pteval_t, 0)) +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0)) +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + /* * This bit indicates that the entry is present i.e. pmd_page() * still points to a valid huge page in memory even if the pmd diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 23aabff4fa6f..3f4748741fdb 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -271,6 +271,34 @@ static inline pte_t pte_mkdevmap(pte_t pte) return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL)); } +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline int pte_uffd_wp(pte_t pte) +{ + bool wp = !!(pte_val(pte) & PTE_UFFD_WP); + +#ifdef CONFIG_DEBUG_VM + /* + * Having write bit for wr-protect-marked present ptes is fatal, because + * it means the uffd-wp bit will be ignored and write will just go + * through. See comment in x86 implementation. 
+ */ + WARN_ON_ONCE(wp && pte_write(pte)); +#endif + + return wp; +} + +static inline pte_t pte_mkuffd_wp(pte_t pte) +{ + return pte_wrprotect(set_pte_bit(pte, __pgprot(PTE_UFFD_WP))); +} + +static inline pte_t pte_clear_uffd_wp(pte_t pte) +{ + return clear_pte_bit(pte, __pgprot(PTE_UFFD_WP)); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + static inline void __set_pte(pte_t *ptep, pte_t pte) { WRITE_ONCE(*ptep, pte); @@ -463,6 +491,23 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte) return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE)); } +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +static inline pte_t pte_swp_mkuffd_wp(pte_t pte) +{ + return set_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP)); +} + +static inline int pte_swp_uffd_wp(pte_t pte) +{ + return !!(pte_val(pte) & PTE_SWP_UFFD_WP); +} + +static inline pte_t pte_swp_clear_uffd_wp(pte_t pte) +{ + return clear_pte_bit(pte, __pgprot(PTE_SWP_UFFD_WP)); +} +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ + #ifdef CONFIG_NUMA_BALANCING /* * See the comment in include/linux/pgtable.h @@ -508,6 +553,15 @@ static inline int pmd_trans_huge(pmd_t pmd) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP +#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd)) +#define pmd_mkuffd_wp(pmd) pte_pmd(pte_mkuffd_wp(pmd_pte(pmd))) +#define pmd_clear_uffd_wp(pmd) pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd))) +#define pmd_swp_uffd_wp(pmd) pte_swp_uffd_wp(pmd_pte(pmd)) +#define pmd_swp_mkuffd_wp(pmd) pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd))) +#define pmd_swp_clear_uffd_wp(pmd) \ + pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd))) +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ static inline pmd_t pmd_mkinvalid(pmd_t pmd) { @@ -1248,6 +1302,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma, * Encode and decode a swap entry: * bits 0-1: present (must be zero) * bits 
2: remember PG_anon_exclusive + * bit 3: remember uffd-wp state * bits 4-53: swap offset * bit 54: PTE_PROT_NONE (overlays PTE_UXN) (must be zero) * bits 55-59: swap type -- 2.25.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts @ 2024-04-24 11:57 ` Peter Xu 2024-04-24 12:51 ` Ryan Roberts 2024-04-26 13:17 ` Ryan Roberts 2024-04-24 16:46 ` Catalin Marinas 1 sibling, 2 replies; 21+ messages in thread From: Peter Xu @ 2024-04-24 11:57 UTC (permalink / raw) To: Ryan Roberts Cc: Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel Hi, Ryan, On Wed, Apr 24, 2024 at 12:10:17PM +0100, Ryan Roberts wrote: > Let's use the newly-free PTE SW bit (58) to add support for uffd-wp. > > The standard handlers are implemented for set/test/clear for both pte > and pmd. Additionally we must also track the uffd-wp state as a pte swp > bit, so use a free swap entry pte bit (3). > > Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Looks all sane here from userfault perspective, just one comment below. 
> --- > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/pgtable-prot.h | 8 ++++ > arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++ > 3 files changed, 64 insertions(+) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 7b11c98b3e84..763e221f2169 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -255,6 +255,7 @@ config ARM64 > select SYSCTL_EXCEPTION_TRACE > select THREAD_INFO_IN_TASK > select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD > + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD > select TRACE_IRQFLAGS_SUPPORT > select TRACE_IRQFLAGS_NMI_SUPPORT > select HAVE_SOFTIRQ_ON_OWN_STACK > diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h > index ef952d69fd04..f1e1f6306e03 100644 > --- a/arch/arm64/include/asm/pgtable-prot.h > +++ b/arch/arm64/include/asm/pgtable-prot.h > @@ -20,6 +20,14 @@ > #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) > #define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */ > > +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP > +#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ > +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ > +#else > +#define PTE_UFFD_WP (_AT(pteval_t, 0)) > +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0)) > +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ > + > /* > * This bit indicates that the entry is present i.e. 
pmd_page() > * still points to a valid huge page in memory even if the pmd > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h > index 23aabff4fa6f..3f4748741fdb 100644 > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -271,6 +271,34 @@ static inline pte_t pte_mkdevmap(pte_t pte) > return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL)); > } > > +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP > +static inline int pte_uffd_wp(pte_t pte) > +{ > + bool wp = !!(pte_val(pte) & PTE_UFFD_WP); > + > +#ifdef CONFIG_DEBUG_VM > + /* > + * Having write bit for wr-protect-marked present ptes is fatal, because > + * it means the uffd-wp bit will be ignored and write will just go > + * through. See comment in x86 implementation. > + */ > + WARN_ON_ONCE(wp && pte_write(pte)); > +#endif Feel free to drop this line, see: https://lore.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com It's still in mm-unstable only. AFAICT ARM64 also is supported by check_page_table, I also checked ARM's ptep_modify_prot_commit() which uses set_pte_at(), so it should cover everything in a superior way already. With that dropped, feel free to add: Acked-by: Peter Xu <peterx@redhat.com> Thanks, -- Peter Xu _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-24 11:57 ` Peter Xu @ 2024-04-24 12:51 ` Ryan Roberts 2024-04-26 13:17 ` Ryan Roberts 1 sibling, 0 replies; 21+ messages in thread From: Ryan Roberts @ 2024-04-24 12:51 UTC (permalink / raw) To: Peter Xu Cc: Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On 24/04/2024 12:57, Peter Xu wrote: > Hi, Ryan, > > On Wed, Apr 24, 2024 at 12:10:17PM +0100, Ryan Roberts wrote: >> Let's use the newly-free PTE SW bit (58) to add support for uffd-wp. >> >> The standard handlers are implemented for set/test/clear for both pte >> and pmd. Additionally we must also track the uffd-wp state as a pte swp >> bit, so use a free swap entry pte bit (3). >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > > Looks all sane here from userfault perspective, just one comment below. > >> --- >> arch/arm64/Kconfig | 1 + >> arch/arm64/include/asm/pgtable-prot.h | 8 ++++ >> arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++ >> 3 files changed, 64 insertions(+) >> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >> index 7b11c98b3e84..763e221f2169 100644 >> --- a/arch/arm64/Kconfig >> +++ b/arch/arm64/Kconfig >> @@ -255,6 +255,7 @@ config ARM64 >> select SYSCTL_EXCEPTION_TRACE >> select THREAD_INFO_IN_TASK >> select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD >> + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD >> select TRACE_IRQFLAGS_SUPPORT >> select TRACE_IRQFLAGS_NMI_SUPPORT >> select HAVE_SOFTIRQ_ON_OWN_STACK >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h >> index ef952d69fd04..f1e1f6306e03 100644 >> --- a/arch/arm64/include/asm/pgtable-prot.h >> +++ b/arch/arm64/include/asm/pgtable-prot.h >> @@ -20,6 +20,14 @@ >> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) >> #define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID 
*/ >> >> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP >> +#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ >> +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ >> +#else >> +#define PTE_UFFD_WP (_AT(pteval_t, 0)) >> +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0)) >> +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ >> + >> /* >> * This bit indicates that the entry is present i.e. pmd_page() >> * still points to a valid huge page in memory even if the pmd >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h >> index 23aabff4fa6f..3f4748741fdb 100644 >> --- a/arch/arm64/include/asm/pgtable.h >> +++ b/arch/arm64/include/asm/pgtable.h >> @@ -271,6 +271,34 @@ static inline pte_t pte_mkdevmap(pte_t pte) >> return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL)); >> } >> >> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP >> +static inline int pte_uffd_wp(pte_t pte) >> +{ >> + bool wp = !!(pte_val(pte) & PTE_UFFD_WP); >> + >> +#ifdef CONFIG_DEBUG_VM >> + /* >> + * Having write bit for wr-protect-marked present ptes is fatal, because >> + * it means the uffd-wp bit will be ignored and write will just go >> + * through. See comment in x86 implementation. >> + */ >> + WARN_ON_ONCE(wp && pte_write(pte)); >> +#endif > > Feel free to drop this line, see: > > https://lore.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com Ahh nice! In that case, I'm going to convert this to a macro, which is the arm64 style for these getters (for some reason...): #define pte_uffd_wp(pte) (!!(pte_val(pte) & PTE_UFFD_WP)) Will send out a v2 once others have had time to comment. > > It's still in mm-unstable only. > > AFAICT ARM64 also is supported by check_page_table, I also checked ARM's > ptep_modify_prot_commit() which uses set_pte_at(), so it should cover > everything in a superior way already. > > With that dropped, feel free to add: > > Acked-by: Peter Xu <peterx@redhat.com> Thanks!
> > Thanks, >
* Re: [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-24 11:57 ` Peter Xu 2024-04-24 12:51 ` Ryan Roberts @ 2024-04-26 13:17 ` Ryan Roberts 2024-04-26 13:54 ` Peter Xu 1 sibling, 1 reply; 21+ messages in thread From: Ryan Roberts @ 2024-04-26 13:17 UTC (permalink / raw) To: Peter Xu, Muhammad Usama Anjum Cc: Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel + Muhammad Usama Anjum <usama.anjum@collabora.com> Hi Peter, Muhammad, On 24/04/2024 12:57, Peter Xu wrote: > Hi, Ryan, > > On Wed, Apr 24, 2024 at 12:10:17PM +0100, Ryan Roberts wrote: >> Let's use the newly-free PTE SW bit (58) to add support for uffd-wp. >> >> The standard handlers are implemented for set/test/clear for both pte >> and pmd. Additionally we must also track the uffd-wp state as a pte swp >> bit, so use a free swap entry pte bit (3). >> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > > Looks all sane here from userfault perspective, just one comment below. 
> >> --- >> arch/arm64/Kconfig | 1 + >> arch/arm64/include/asm/pgtable-prot.h | 8 ++++ >> arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++ >> 3 files changed, 64 insertions(+) >> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >> index 7b11c98b3e84..763e221f2169 100644 >> --- a/arch/arm64/Kconfig >> +++ b/arch/arm64/Kconfig >> @@ -255,6 +255,7 @@ config ARM64 >> select SYSCTL_EXCEPTION_TRACE >> select THREAD_INFO_IN_TASK >> select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD >> + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD >> select TRACE_IRQFLAGS_SUPPORT >> select TRACE_IRQFLAGS_NMI_SUPPORT >> select HAVE_SOFTIRQ_ON_OWN_STACK >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h >> index ef952d69fd04..f1e1f6306e03 100644 >> --- a/arch/arm64/include/asm/pgtable-prot.h >> +++ b/arch/arm64/include/asm/pgtable-prot.h >> @@ -20,6 +20,14 @@ >> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) >> #define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */ >> >> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP >> +#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ >> +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ I've just noticed code in task_mmu.c: static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start, unsigned long end, struct mm_walk *walk) { ... if (!p->arg.category_anyof_mask && !p->arg.category_inverted && p->arg.category_mask == PAGE_IS_WRITTEN && p->arg.return_mask == PAGE_IS_WRITTEN) { for (addr = start; addr < end; pte++, addr += PAGE_SIZE) { unsigned long next = addr + PAGE_SIZE; if (pte_uffd_wp(ptep_get(pte))) <<<<<< continue; ... } } } As far as I can see, you don't know that the pte is present when you do this. So does this imply that the UFFD-WP bit is expected to be in the same position for both present ptes and swap ptes? I had assumed pte_uffd_wp() was for present ptes and pte_swp_uffd_wp() was for swap ptes. 
As you can see, the way I've implemented this for arm64, the bit is in a different position for these 2 cases. I've just done a slightly different implementation that changes the first patch in this series quite a bit, and a bunch of pagemap_ioctl mm kselftests are now failing. I think this is the root cause, but haven't proven it definitively yet. I'm inclined towards thinking the above is a bug and should be fixed so that I can store the bit in different places. What do you think? Thanks, Ryan >> +#else >> +#define PTE_UFFD_WP (_AT(pteval_t, 0)) >> +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 0)) >> +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ >> + >> /* >> * This bit indicates that the entry is present i.e. pmd_page() >> * still points to a valid huge page in memory even if the pmd >> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h >> index 23aabff4fa6f..3f4748741fdb 100644 >> --- a/arch/arm64/include/asm/pgtable.h >> +++ b/arch/arm64/include/asm/pgtable.h >> @@ -271,6 +271,34 @@ static inline pte_t pte_mkdevmap(pte_t pte) >> return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL)); >> } >> >> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP >> +static inline int pte_uffd_wp(pte_t pte) >> +{ >> + bool wp = !!(pte_val(pte) & PTE_UFFD_WP); >> + >> +#ifdef CONFIG_DEBUG_VM >> + /* >> + * Having write bit for wr-protect-marked present ptes is fatal, because >> + * it means the uffd-wp bit will be ignored and write will just go >> + * through. See comment in x86 implementation. >> + */ >> + WARN_ON_ONCE(wp && pte_write(pte)); >> +#endif > > Feel free to drop this line, see: > > https://lore.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com > > It's still in mm-unstable only. > > AFAICT ARM64 also is supported by check_page_table, I also checked ARM's > ptep_modify_prot_commit() which uses set_pte_at(), so it should cover > everything in a superior way already.
> > With that dropped, feel free to add: > > Acked-by: Peter Xu <peterx@redhat.com> > > Thanks, >
* Re: [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-26 13:17 ` Ryan Roberts @ 2024-04-26 13:54 ` Peter Xu 2024-04-29 9:39 ` Ryan Roberts 0 siblings, 1 reply; 21+ messages in thread From: Peter Xu @ 2024-04-26 13:54 UTC (permalink / raw) To: Ryan Roberts Cc: Muhammad Usama Anjum, Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On Fri, Apr 26, 2024 at 02:17:41PM +0100, Ryan Roberts wrote: > + Muhammad Usama Anjum <usama.anjum@collabora.com> > > Hi Peter, Muhammad, > > > On 24/04/2024 12:57, Peter Xu wrote: > > Hi, Ryan, > > > > On Wed, Apr 24, 2024 at 12:10:17PM +0100, Ryan Roberts wrote: > >> Let's use the newly-free PTE SW bit (58) to add support for uffd-wp. > >> > >> The standard handlers are implemented for set/test/clear for both pte > >> and pmd. Additionally we must also track the uffd-wp state as a pte swp > >> bit, so use a free swap entry pte bit (3). > >> > >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> > > > > Looks all sane here from userfault perspective, just one comment below. 
> > > >> --- > >> arch/arm64/Kconfig | 1 + > >> arch/arm64/include/asm/pgtable-prot.h | 8 ++++ > >> arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++ > >> 3 files changed, 64 insertions(+) > >> > >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > >> index 7b11c98b3e84..763e221f2169 100644 > >> --- a/arch/arm64/Kconfig > >> +++ b/arch/arm64/Kconfig > >> @@ -255,6 +255,7 @@ config ARM64 > >> select SYSCTL_EXCEPTION_TRACE > >> select THREAD_INFO_IN_TASK > >> select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD > >> + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD > >> select TRACE_IRQFLAGS_SUPPORT > >> select TRACE_IRQFLAGS_NMI_SUPPORT > >> select HAVE_SOFTIRQ_ON_OWN_STACK > >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h > >> index ef952d69fd04..f1e1f6306e03 100644 > >> --- a/arch/arm64/include/asm/pgtable-prot.h > >> +++ b/arch/arm64/include/asm/pgtable-prot.h > >> @@ -20,6 +20,14 @@ > >> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) > >> #define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */ > >> > >> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP > >> +#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ > >> +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ > > I've just noticed code in task_mmu.c: > > static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start, > unsigned long end, struct mm_walk *walk) > { > ... > > if (!p->arg.category_anyof_mask && !p->arg.category_inverted && > p->arg.category_mask == PAGE_IS_WRITTEN && > p->arg.return_mask == PAGE_IS_WRITTEN) { > for (addr = start; addr < end; pte++, addr += PAGE_SIZE) { > unsigned long next = addr + PAGE_SIZE; > > if (pte_uffd_wp(ptep_get(pte))) <<<<<< > continue; > > ... > } > } > } > > As far as I can see, you don't know that the pte is present when you do this. So > does this imply that the UFFD-WP bit is expected to be in the same position for > both present ptes and swap ptes? 
I had assumed pte_uffd_wp() was for present > ptes and pte_swp_uffd_wp() was for swap ptes. > > As you can see, the way I've implemented this for arm64 the bit is in a > different position for these 2 cases. I've just done a slightly different > implementation that changes the first patch in this series quite a bit and a > bunch of pagemap_ioctl mm kselftests are now failing. I think this is the root > cause, but haven't proven it definitively yet. > > I'm inclined towords thinking the above is a bug and should be fixed so that I > can store the bit in different places. What do you think? Yep I agree. Even on x86_64 they should be defined differently. It looks like some sheer luck the test constantly pass on x86 even if it checked the wrong one. Worth checking all the relevant paths in the pagemap code to make sure it's checked, e.g. I also see one fast path above this chunk of code which looks like to have the same issue. Thanks, -- Peter Xu _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-26 13:54 ` Peter Xu @ 2024-04-29 9:39 ` Ryan Roberts 0 siblings, 0 replies; 21+ messages in thread From: Ryan Roberts @ 2024-04-29 9:39 UTC (permalink / raw) To: Peter Xu Cc: Muhammad Usama Anjum, Catalin Marinas, Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On 26/04/2024 14:54, Peter Xu wrote: > On Fri, Apr 26, 2024 at 02:17:41PM +0100, Ryan Roberts wrote: >> + Muhammad Usama Anjum <usama.anjum@collabora.com> >> >> Hi Peter, Muhammad, >> >> >> On 24/04/2024 12:57, Peter Xu wrote: >>> Hi, Ryan, >>> >>> On Wed, Apr 24, 2024 at 12:10:17PM +0100, Ryan Roberts wrote: >>>> Let's use the newly-free PTE SW bit (58) to add support for uffd-wp. >>>> >>>> The standard handlers are implemented for set/test/clear for both pte >>>> and pmd. Additionally we must also track the uffd-wp state as a pte swp >>>> bit, so use a free swap entry pte bit (3). >>>> >>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> >>> >>> Looks all sane here from userfault perspective, just one comment below. 
>>> >>>> --- >>>> arch/arm64/Kconfig | 1 + >>>> arch/arm64/include/asm/pgtable-prot.h | 8 ++++ >>>> arch/arm64/include/asm/pgtable.h | 55 +++++++++++++++++++++++++++ >>>> 3 files changed, 64 insertions(+) >>>> >>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >>>> index 7b11c98b3e84..763e221f2169 100644 >>>> --- a/arch/arm64/Kconfig >>>> +++ b/arch/arm64/Kconfig >>>> @@ -255,6 +255,7 @@ config ARM64 >>>> select SYSCTL_EXCEPTION_TRACE >>>> select THREAD_INFO_IN_TASK >>>> select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD >>>> + select HAVE_ARCH_USERFAULTFD_WP if USERFAULTFD >>>> select TRACE_IRQFLAGS_SUPPORT >>>> select TRACE_IRQFLAGS_NMI_SUPPORT >>>> select HAVE_SOFTIRQ_ON_OWN_STACK >>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h >>>> index ef952d69fd04..f1e1f6306e03 100644 >>>> --- a/arch/arm64/include/asm/pgtable-prot.h >>>> +++ b/arch/arm64/include/asm/pgtable-prot.h >>>> @@ -20,6 +20,14 @@ >>>> #define PTE_DEVMAP (_AT(pteval_t, 1) << 57) >>>> #define PTE_PROT_NONE (PTE_UXN) /* Reuse PTE_UXN; only when !PTE_VALID */ >>>> >>>> +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP >>>> +#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */ >>>> +#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */ >> >> I've just noticed code in task_mmu.c: >> >> static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start, >> unsigned long end, struct mm_walk *walk) >> { >> ... >> >> if (!p->arg.category_anyof_mask && !p->arg.category_inverted && >> p->arg.category_mask == PAGE_IS_WRITTEN && >> p->arg.return_mask == PAGE_IS_WRITTEN) { >> for (addr = start; addr < end; pte++, addr += PAGE_SIZE) { >> unsigned long next = addr + PAGE_SIZE; >> >> if (pte_uffd_wp(ptep_get(pte))) <<<<<< >> continue; >> >> ... >> } >> } >> } >> >> As far as I can see, you don't know that the pte is present when you do this. 
So >> does this imply that the UFFD-WP bit is expected to be in the same position for >> both present ptes and swap ptes? I had assumed pte_uffd_wp() was for present >> ptes and pte_swp_uffd_wp() was for swap ptes. >> >> As you can see, the way I've implemented this for arm64 the bit is in a >> different position for these 2 cases. I've just done a slightly different >> implementation that changes the first patch in this series quite a bit and a >> bunch of pagemap_ioctl mm kselftests are now failing. I think this is the root >> cause, but haven't proven it definitively yet. >> >> I'm inclined towords thinking the above is a bug and should be fixed so that I >> can store the bit in different places. What do you think? > > Yep I agree. OK great - I'll spin a patch to fix this. > > Even on x86_64 they should be defined differently. It looks like some > sheer luck the test constantly pass on x86 even if it checked the wrong one. > > Worth checking all the relevant paths in the pagemap code to make sure it's > checked, e.g. I also see one fast path above this chunk of code which looks > like to have the same issue. Yes, spotted that one. I'll audit other sites too. Thanks! > > Thanks, > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v1 2/2] arm64/mm: Add uffd write-protect support 2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts 2024-04-24 11:57 ` Peter Xu @ 2024-04-24 16:46 ` Catalin Marinas 1 sibling, 0 replies; 21+ messages in thread From: Catalin Marinas @ 2024-04-24 16:46 UTC (permalink / raw) To: Ryan Roberts Cc: Will Deacon, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, David Hildenbrand, Peter Xu, Mike Rapoport, Shivansh Vij, linux-arm-kernel, linux-kernel On Wed, Apr 24, 2024 at 12:10:17PM +0100, Ryan Roberts wrote: > @@ -1248,6 +1302,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma, > * Encode and decode a swap entry: > * bits 0-1: present (must be zero) > * bits 2: remember PG_anon_exclusive > + * bit 3: remember uffd-wp state > * bits 4-53: swap offset > * bit 54: PTE_PROT_NONE (overlays PTE_UXN) (must be zero) > * bits 55-59: swap type Ah, I did not realise we need to free up bit 3 from the swap pte as well. Though maybe patch 1 is fine as is but for the record, it would be good to justify the decision to go with PTE_UXN. For this patch: Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 21+ messages in thread