Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <ljs@kernel.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	 david@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com,  ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com,
	 pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com,
	sj@kernel.org,  usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	 kernel-team@meta.com, "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Subject: Re: [PATCH v5 08/18] mm: add VM_UFFD_RWP VMA flag
Date: Fri, 29 May 2026 08:24:55 +0100	[thread overview]
Message-ID: <ahk60ViRq4q2g4uz@lucifer> (raw)
In-Reply-To: <20260526130509.2748441-9-kirill@shutemov.name>

On Tue, May 26, 2026 at 02:04:56PM +0100, Kiryl Shutsemau wrote:
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> Preparatory patch for userfaultfd read-write protection (RWP). RWP
> extends userfaultfd protection from plain write-protection (WP) to
> full read-write protection: accesses to an RWP-protected range --
> reads as well as writes -- trap through userfaultfd.
>
> Reserve VM_UFFD_RWP, add the userfaultfd_rwp() and
> userfaultfd_protected() helpers, and wire up the smaps "ur" entry and
> the trace-flag table the rest of the series will use. The flag is
> gated on CONFIG_USERFAULTFD_RWP, which is introduced together with the
> UAPI in a later patch; until then VM_UFFD_RWP aliases VM_NONE and
> every downstream check folds to dead code.
>
> Nothing sets or queries the flag yet.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6

Hm, if you've just used claude to bounce ideas off, I'm really not sure if
it's necessary to disclose, though I respect your thoroughness for doing so
:)

I guess determining the threshold at which it makes sense to do so is still
a WIP for us in the kernel.

> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Reviewed-by: SeongJae Park <sj@kernel.org>
> ---
>  Documentation/filesystems/proc.rst |  1 +
>  fs/proc/task_mmu.c                 |  3 +++
>  include/linux/mm.h                 | 28 +++++++++++++++++----------
>  include/linux/userfaultfd_k.h      | 31 +++++++++++++++++++++++++-----
>  include/trace/events/mmflags.h     |  7 +++++++
>  5 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index db6167befb7b..db28207c5290 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -607,6 +607,7 @@ encoded manner. The codes are the following:
>      um    userfaultfd missing tracking
>      uw    userfaultfd wr-protect tracking
>      ui    userfaultfd minor fault
> +    ur    userfaultfd read-write-protect tracking
>      ss    shadow/guarded control stack page
>      sl    sealed
>      lf    lock on fault pages
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 1e5f6ee8a3b6..974c5f4aa533 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1237,6 +1237,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
>  		[ilog2(VM_UFFD_MINOR)]	= "ui",
>  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
> +#ifdef CONFIG_USERFAULTFD_RWP
> +		[ilog2(VM_UFFD_RWP)]	= "ur",
> +#endif
>  #ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
>  		[ilog2(VM_SHADOW_STACK)] = "ss",
>  #endif
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 71b11945e4fc..6499cfb61dc4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -362,6 +362,7 @@ enum {
>  #endif
>  	DECLARE_VMA_BIT(UFFD_MINOR, 41),
>  	DECLARE_VMA_BIT(SEALED, 42),
> +	DECLARE_VMA_BIT(UFFD_RWP, 43),

I'm guessing CONFIG_USERFAULTFD_RWP is predicated on CONFIG_64BIT?

It's a silly situation and once my VMA flags stuff is done it'll be
eliminated but for now... :)

>  	/* Flags that reuse flags above. */
>  	DECLARE_VMA_BIT_ALIAS(PKEY_BIT0, HIGH_ARCH_0),
>  	DECLARE_VMA_BIT_ALIAS(PKEY_BIT1, HIGH_ARCH_1),
> @@ -505,6 +506,11 @@ enum {
>  #else
>  #define VM_UFFD_MINOR	VM_NONE
>  #endif
> +#ifdef CONFIG_USERFAULTFD_RWP
> +#define VM_UFFD_RWP		INIT_VM_FLAG(UFFD_RWP)
> +#else
> +#define VM_UFFD_RWP		VM_NONE
> +#endif
>  #ifdef CONFIG_64BIT
>  #define VM_ALLOW_ANY_UNCACHED	INIT_VM_FLAG(ALLOW_ANY_UNCACHED)
>  #define VM_SEALED		INIT_VM_FLAG(SEALED)
> @@ -642,22 +648,24 @@ enum {
>   * reconsistuted upon page fault, so necessitate page table copying upon fork.
>   *
>   * Note that these flags should be compared with the DESTINATION VMA not the
> - * source, as VM_UFFD_WP may not be propagated to destination, while all other
> - * flags will be.
> + * source: VM_UFFD_WP and VM_UFFD_RWP may be cleared on the destination
> + * (dup_userfaultfd() -> userfaultfd_reset_ctx() when the parent context did
> + * not negotiate UFFD_FEATURE_EVENT_FORK), while all other flags propagate.
>   *
>   * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot be
>   *                           reasonably reconstructed on page fault.
>   *
>   *              VM_UFFD_WP - Encodes metadata about an installed uffd
> - *                           write protect handler, which cannot be
> - *                           reconstructed on page fault.
> + *              VM_UFFD_RWP  write- or read-write-protect handler, which
> + *                           cannot be reconstructed on page fault.
>   *
> - *                           We always copy pgtables when dst_vma has uffd-wp
> - *                           enabled even if it's file-backed
> - *                           (e.g. shmem). Because when uffd-wp is enabled,
> - *                           pgtable contains uffd-wp protection information,
> - *                           that's something we can't retrieve from page cache,
> - *                           and skip copying will lose those info.
> + *                           We always copy pgtables when dst_vma has the
> + *                           uffd PTE bit in use even if it's file-backed
> + *                           (e.g. shmem). Because when the uffd bit is
> + *                           in use, the pgtable contains the protection
> + *                           information, that's something we can't
> + *                           retrieve from page cache, and skip copying
> + *                           will lose those info.
>   *
>   *          VM_MAYBE_GUARD - Could contain page guard region markers which
>   *                           by design are a property of the page tables
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index f4cf5763f92c..0aef628514df 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -21,10 +21,11 @@
>  #include <linux/hugetlb_inline.h>
>
>  /* The set of all possible UFFD-related VM flags. */
> -#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
> +#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_MINOR | \
> +			 VM_UFFD_WP | VM_UFFD_RWP)
>
>  #define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT, \
> -				      VMA_UFFD_MINOR_BIT)
> +				      VMA_UFFD_MINOR_BIT, VMA_UFFD_RWP_BIT)
>
>  /*
>   * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
> @@ -178,7 +179,7 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
>   */
>  static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
>  {
> -	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
> +	return vma->vm_flags & (VM_UFFD_MINOR | VM_UFFD_WP | VM_UFFD_RWP);

While we're here we might as well switch to using the new API?

Can do:

	return vma_test_any_mask(vma, __VMA_UFFD_FLAGS);

One unfortunate thing is using bit values means we can't do the VM_NONE
trick, but if !CONFIG_USERFAULTFD_RWP then VMA_UFFD_RWP_BIT wouldn't be set
anyway, same for minor so this should be fine?

>  }
>
>  /*
> @@ -208,6 +209,16 @@ static inline bool userfaultfd_minor(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_UFFD_MINOR;
>  }
>
> +static inline bool userfaultfd_rwp(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & VM_UFFD_RWP;
> +}

Can be:

	return vma_test(vma, VMA_UFFD_RWP_BIT);

> +
> +static inline bool userfaultfd_protected(struct vm_area_struct *vma)
> +{
> +	return userfaultfd_wp(vma) || userfaultfd_rwp(vma);
> +}
> +
>  static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
>  				      pte_t pte)
>  {
> @@ -328,6 +339,16 @@ static inline bool userfaultfd_minor(struct vm_area_struct *vma)
>  	return false;
>  }
>
> +static inline bool userfaultfd_rwp(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +
> +static inline bool userfaultfd_protected(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +
>  static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
>  				      pte_t pte)
>  {
> @@ -421,8 +442,8 @@ static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma)
>  }
>
>  /*
> - * Returns true if this is a swap pte and was uffd-wp wr-protected in either
> - * forms (pte marker or a normal swap pte), false otherwise.
> + * Returns true if this swap pte carries uffd-tracked state in either
> + * form (pte marker or a normal swap pte), false otherwise.
>   */
>  static inline bool pte_swp_uffd_any(pte_t pte)
>  {
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index a6e5a44c9b42..bfface3d0203 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -194,6 +194,12 @@ IF_HAVE_PG_ARCH_3(arch_3)
>  # define IF_HAVE_UFFD_MINOR(flag, name)
>  #endif
>
> +#ifdef CONFIG_USERFAULTFD_RWP
> +# define IF_HAVE_UFFD_RWP(flag, name) {flag, name},
> +#else
> +# define IF_HAVE_UFFD_RWP(flag, name)
> +#endif
> +
>  #if defined(CONFIG_64BIT) || defined(CONFIG_PPC32)
>  # define IF_HAVE_VM_DROPPABLE(flag, name) {flag, name},
>  #else
> @@ -215,6 +221,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR,	"uffd_minor"	)		\
>  	{VM_PFNMAP,			"pfnmap"	},		\
>  	{VM_MAYBE_GUARD,		"maybe_guard"	},		\
>  	{VM_UFFD_WP,			"uffd_wp"	},		\
> +IF_HAVE_UFFD_RWP(VM_UFFD_RWP,		"uffd_rwp"	)		\
>  	{VM_LOCKED,			"locked"	},		\
>  	{VM_IO,				"io"		},		\
>  	{VM_SEQ_READ,			"seqread"	},		\
> --
> 2.54.0
>

Cheers, Lorenzo


  reply	other threads:[~2026-05-29  7:25 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-26 13:04 [PATCH v5 00/18] userfaultfd: working set tracking for VM guest memory Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 01/18] fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 02/18] mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 03/18] userfaultfd: gate must_wait writability check on pte_present() Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 04/18] mm: skip out-of-range bits in mk_vma_flags() Kiryl Shutsemau
2026-05-29 14:00   ` Lorenzo Stoakes
2026-05-29 16:09     ` Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 05/18] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 06/18] mm: rename uffd-wp PTE bit macros to uffd Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 07/18] mm: rename uffd-wp PTE accessors " Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 08/18] mm: add VM_UFFD_RWP VMA flag Kiryl Shutsemau
2026-05-29  7:24   ` Lorenzo Stoakes [this message]
2026-05-29 13:07     ` Kiryl Shutsemau
2026-05-29 14:00       ` Lorenzo Stoakes
2026-05-26 13:04 ` [PATCH v5 09/18] mm: add MM_CP_UFFD_RWP change_protection() flag Kiryl Shutsemau
2026-05-29  1:19   ` SeongJae Park
2026-05-26 13:04 ` [PATCH v5 10/18] mm: preserve RWP marker across PTE rewrites Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 11/18] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 12/18] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 13/18] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 14/18] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 15/18] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 16/18] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 17/18] selftests/mm: add userfaultfd RWP tests Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 18/18] Documentation/userfaultfd: document RWP working set tracking Kiryl Shutsemau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahk60ViRq4q2g4uz@lucifer \
    --to=ljs@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@kernel.org \
    --cc=jthoughton@google.com \
    --cc=kas@kernel.org \
    --cc=kernel-team@meta.com \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=sj@kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox