From: Lorenzo Stoakes <ljs@kernel.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	 david@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com,  ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com,
	 pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com,
	sj@kernel.org,  usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	 kernel-team@meta.com, "Kiryl Shutsemau (Meta)" <kas@kernel.org>
Subject: Re: [PATCH v5 08/18] mm: add VM_UFFD_RWP VMA flag
Date: Fri, 29 May 2026 08:24:55 +0100
Message-ID: <ahk60ViRq4q2g4uz@lucifer>
In-Reply-To: <20260526130509.2748441-9-kirill@shutemov.name>

On Tue, May 26, 2026 at 02:04:56PM +0100, Kiryl Shutsemau wrote:
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> Preparatory patch for userfaultfd read-write protection (RWP). RWP
> extends userfaultfd protection from plain write-protection (WP) to
> full read-write protection: accesses to an RWP-protected range --
> reads as well as writes -- trap through userfaultfd.
>
> Reserve VM_UFFD_RWP, add the userfaultfd_rwp() and
> userfaultfd_protected() helpers, and wire up the smaps "ur" entry and
> the trace-flag table the rest of the series will use. The flag is
> gated on CONFIG_USERFAULTFD_RWP, which is introduced together with the
> UAPI in a later patch; until then VM_UFFD_RWP aliases VM_NONE and
> every downstream check folds to dead code.
>
> Nothing sets or queries the flag yet.
>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Assisted-by: Claude:claude-opus-4-6

Hm, if you've just used Claude to bounce ideas off, I'm really not sure it's
necessary to disclose that, though I respect your thoroughness in doing so
:)

I guess determining the threshold at which it makes sense to do so is still
a WIP for us in the kernel.

> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Reviewed-by: SeongJae Park <sj@kernel.org>
> ---
>  Documentation/filesystems/proc.rst |  1 +
>  fs/proc/task_mmu.c                 |  3 +++
>  include/linux/mm.h                 | 28 +++++++++++++++++----------
>  include/linux/userfaultfd_k.h      | 31 +++++++++++++++++++++++++-----
>  include/trace/events/mmflags.h     |  7 +++++++
>  5 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index db6167befb7b..db28207c5290 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -607,6 +607,7 @@ encoded manner. The codes are the following:
>      um    userfaultfd missing tracking
>      uw    userfaultfd wr-protect tracking
>      ui    userfaultfd minor fault
> +    ur    userfaultfd read-write-protect tracking
>      ss    shadow/guarded control stack page
>      sl    sealed
>      lf    lock on fault pages
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 1e5f6ee8a3b6..974c5f4aa533 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1237,6 +1237,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
>  		[ilog2(VM_UFFD_MINOR)]	= "ui",
>  #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
> +#ifdef CONFIG_USERFAULTFD_RWP
> +		[ilog2(VM_UFFD_RWP)]	= "ur",
> +#endif
>  #ifdef CONFIG_ARCH_HAS_USER_SHADOW_STACK
>  		[ilog2(VM_SHADOW_STACK)] = "ss",
>  #endif
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 71b11945e4fc..6499cfb61dc4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -362,6 +362,7 @@ enum {
>  #endif
>  	DECLARE_VMA_BIT(UFFD_MINOR, 41),
>  	DECLARE_VMA_BIT(SEALED, 42),
> +	DECLARE_VMA_BIT(UFFD_RWP, 43),

I'm guessing CONFIG_USERFAULTFD_RWP is predicated on CONFIG_64BIT? Bit 43
doesn't fit in a 32-bit vm_flags_t.

It's a silly situation, and once my VMA flags work is done it'll be
eliminated, but for now... :)

>  	/* Flags that reuse flags above. */
>  	DECLARE_VMA_BIT_ALIAS(PKEY_BIT0, HIGH_ARCH_0),
>  	DECLARE_VMA_BIT_ALIAS(PKEY_BIT1, HIGH_ARCH_1),
> @@ -505,6 +506,11 @@ enum {
>  #else
>  #define VM_UFFD_MINOR	VM_NONE
>  #endif
> +#ifdef CONFIG_USERFAULTFD_RWP
> +#define VM_UFFD_RWP		INIT_VM_FLAG(UFFD_RWP)
> +#else
> +#define VM_UFFD_RWP		VM_NONE
> +#endif
>  #ifdef CONFIG_64BIT
>  #define VM_ALLOW_ANY_UNCACHED	INIT_VM_FLAG(ALLOW_ANY_UNCACHED)
>  #define VM_SEALED		INIT_VM_FLAG(SEALED)
> @@ -642,22 +648,24 @@ enum {
>   * reconsistuted upon page fault, so necessitate page table copying upon fork.
>   *
>   * Note that these flags should be compared with the DESTINATION VMA not the
> - * source, as VM_UFFD_WP may not be propagated to destination, while all other
> - * flags will be.
> + * source: VM_UFFD_WP and VM_UFFD_RWP may be cleared on the destination
> + * (dup_userfaultfd() -> userfaultfd_reset_ctx() when the parent context did
> + * not negotiate UFFD_FEATURE_EVENT_FORK), while all other flags propagate.
>   *
>   * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot be
>   *                           reasonably reconstructed on page fault.
>   *
>   *              VM_UFFD_WP - Encodes metadata about an installed uffd
> - *                           write protect handler, which cannot be
> - *                           reconstructed on page fault.
> + *              VM_UFFD_RWP  write- or read-write-protect handler, which
> + *                           cannot be reconstructed on page fault.
>   *
> - *                           We always copy pgtables when dst_vma has uffd-wp
> - *                           enabled even if it's file-backed
> - *                           (e.g. shmem). Because when uffd-wp is enabled,
> - *                           pgtable contains uffd-wp protection information,
> - *                           that's something we can't retrieve from page cache,
> - *                           and skip copying will lose those info.
> + *                           We always copy pgtables when dst_vma has the
> + *                           uffd PTE bit in use even if it's file-backed
> + *                           (e.g. shmem). Because when the uffd bit is
> + *                           in use, the pgtable contains the protection
> + *                           information, that's something we can't
> + *                           retrieve from page cache, and skip copying
> + *                           will lose those info.
>   *
>   *          VM_MAYBE_GUARD - Could contain page guard region markers which
>   *                           by design are a property of the page tables
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index f4cf5763f92c..0aef628514df 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -21,10 +21,11 @@
>  #include <linux/hugetlb_inline.h>
>
>  /* The set of all possible UFFD-related VM flags. */
> -#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
> +#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_MINOR | \
> +			 VM_UFFD_WP | VM_UFFD_RWP)
>
>  #define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT, \
> -				      VMA_UFFD_MINOR_BIT)
> +				      VMA_UFFD_MINOR_BIT, VMA_UFFD_RWP_BIT)
>
>  /*
>   * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
> @@ -178,7 +179,7 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
>   */
>  static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
>  {
> -	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
> +	return vma->vm_flags & (VM_UFFD_MINOR | VM_UFFD_WP | VM_UFFD_RWP);

While we're here we might as well switch to using the new API?

Can do:

	return vma_test_any_mask(vma, __VMA_UFFD_FLAGS);

One unfortunate thing is that using bit values means we can't do the VM_NONE
trick, but if !CONFIG_USERFAULTFD_RWP then VMA_UFFD_RWP_BIT would never be
set anyway (same for minor), so this should be fine?

>  }
>
>  /*
> @@ -208,6 +209,16 @@ static inline bool userfaultfd_minor(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_UFFD_MINOR;
>  }
>
> +static inline bool userfaultfd_rwp(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & VM_UFFD_RWP;
> +}

Can be:

	return vma_test(vma, VMA_UFFD_RWP_BIT);

> +
> +static inline bool userfaultfd_protected(struct vm_area_struct *vma)
> +{
> +	return userfaultfd_wp(vma) || userfaultfd_rwp(vma);
> +}
> +
>  static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
>  				      pte_t pte)
>  {
> @@ -328,6 +339,16 @@ static inline bool userfaultfd_minor(struct vm_area_struct *vma)
>  	return false;
>  }
>
> +static inline bool userfaultfd_rwp(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +
> +static inline bool userfaultfd_protected(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
> +
>  static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
>  				      pte_t pte)
>  {
> @@ -421,8 +442,8 @@ static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma)
>  }
>
>  /*
> - * Returns true if this is a swap pte and was uffd-wp wr-protected in either
> - * forms (pte marker or a normal swap pte), false otherwise.
> + * Returns true if this swap pte carries uffd-tracked state in either
> + * form (pte marker or a normal swap pte), false otherwise.
>   */
>  static inline bool pte_swp_uffd_any(pte_t pte)
>  {
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index a6e5a44c9b42..bfface3d0203 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -194,6 +194,12 @@ IF_HAVE_PG_ARCH_3(arch_3)
>  # define IF_HAVE_UFFD_MINOR(flag, name)
>  #endif
>
> +#ifdef CONFIG_USERFAULTFD_RWP
> +# define IF_HAVE_UFFD_RWP(flag, name) {flag, name},
> +#else
> +# define IF_HAVE_UFFD_RWP(flag, name)
> +#endif
> +
>  #if defined(CONFIG_64BIT) || defined(CONFIG_PPC32)
>  # define IF_HAVE_VM_DROPPABLE(flag, name) {flag, name},
>  #else
> @@ -215,6 +221,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR,	"uffd_minor"	)		\
>  	{VM_PFNMAP,			"pfnmap"	},		\
>  	{VM_MAYBE_GUARD,		"maybe_guard"	},		\
>  	{VM_UFFD_WP,			"uffd_wp"	},		\
> +IF_HAVE_UFFD_RWP(VM_UFFD_RWP,		"uffd_rwp"	)		\
>  	{VM_LOCKED,			"locked"	},		\
>  	{VM_IO,				"io"		},		\
>  	{VM_SEQ_READ,			"seqread"	},		\
> --
> 2.54.0
>

Cheers, Lorenzo
