Linux Documentation
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <ljs@kernel.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	 david@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com,  ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com,
	 pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com,
	sj@kernel.org,  usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	 kernel-team@meta.com, "Kiryl Shutsemau (Meta)" <kas@kernel.org>,
	 stable@vger.kernel.org
Subject: Re: [PATCH v5 04/18] mm: skip out-of-range bits in mk_vma_flags()
Date: Fri, 29 May 2026 15:00:14 +0100	[thread overview]
Message-ID: <ahmQvfNk7S4F0LBj@lucifer> (raw)
In-Reply-To: <20260526130509.2748441-5-kirill@shutemov.name>

On Tue, May 26, 2026 at 02:04:52PM +0100, Kiryl Shutsemau wrote:
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> vma_flags_t is one unsigned long on 32-bit -- NUM_VMA_FLAG_BITS ==
> BITS_PER_LONG by design, so VM_xxx-declared bits sit in the first
> word and hit the single-long fast path. But the bit enum declares
> some bits unconditionally above BITS_PER_LONG (VMA_UFFD_MINOR_BIT
> == 41 today, with VM_UFFD_MINOR == VM_NONE on 32-bit so no VMA
> actually carries the bit).

Yeah ugh.

>
> Passing such a bit to mk_vma_flags() goes through __set_bit(41,
> &one_long) and writes one word past the end. The compiler folds
> the OOB store with wraparound (1UL << (41 % 32) == bit 9) into
> the first word. Bit 9 is already in __VMA_UFFD_FLAGS so the mask
> happens to come out right today, but any high-numbered bit whose

That is... helpful :) but not great that this is the situation, an
oversight, clearly! How I hate 32-bit kernels :)

> mod-BITS_PER_LONG position is otherwise unused would silently OR
> an extra bit into the mask.
>
> Add VMA_NO_BIT and have DECLARE_VMA_BIT() resolve any bitnum out
> of range to it. vma_flags_set_flag() drops negative bit values.
> The ternary collapses at compile time, the runtime check folds
> away when the bit is in range, and the common path is unchanged.

Hmm are you sure it does?

A key design goal was that mk_vma_flags() generates compile-time constants
the same as if the bitmap were constructed independently.

This surely must generate code? Or at least runs a significant risk of it?

Setting a precedent here would probably lead to VMA_NO_BIT being used more
and therefore generating code in more places I think.

And I'm not sure I really want to bend over backwards here to work around
issues with 32-bit kernels when in the long run the intent is that we move
to making these values 64-bit anyway.

Perhaps it's even wise possibly to just make these values 64-bit already,
ahead of time?

(I did look at this in terms of wanting something like a VMA_NO_BIT so we
could get VM_NONE-like behaviour but nothing I tried failed to generate
code.)

A simple solution that doesn't require change to the core is to just uglify
userfaultfd_k.h a bit with:

#ifdef HAVE_ARCH_USERFAULTFD_MINOR
#define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT, \
				      VMA_UFFD_MINOR_BIT)
#else
#define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT)
#endif

But of course that becomes much more horrible with your changes...

Another alternative, which I used for VMA_DROPPABLE is to add something
like this in mm.h:

#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
#define VM_UFFD_MINOR	INIT_VM_FLAG(UFFD_MINOR)
+define VMA_UFFD_MINOR	mk_vma_flags(VMA_UFFD_MINOR_BIT)
#else
#define VM_UFFD_MINOR	VM_NONE
+define VMA_UFFD_MINOR	EMPTY_VMA_FLAGS
#endif

Then we can do:

#define __VMA_UFFD_FLAGS append_vma_flags(VMA_MINOR, VMA_UFFD_MISSING_BIT, \
					  VMA_UFFD_WP_BIT)

With you changes in 08/18 on top it'd get hairier, but we could make our
life easier by implementing something like:

static __always_inline vma_flags_t __mk_vma_flags_from_masks(size_t count,
	const vma_flags_t *masks)
{
	vma_flags_t flags = EMPTY_VMA_FLAGS;
	int i;

	for (i = 0; i < count; i++)
		mask = vma_flags_set_mask(&flags, masks[i]);

	return flags;
}

#define mk_vma_flags_from_masks(...) __mk_vma_flags_from_masks(, \
		COUNT_ARGS(__VA_ARGS__), (const vma_flags_t []){__VA_ARGS__})

(untested code - you would need to ensure it generated equivalent
constants as VM_xxx would now :)

Then you could get VMA_UFFD_RWP with:

#ifdef CONFIG_USERFAULTFD_RWP
#define VMA_UFFD_RWP	mk_vma_flags(VMA_UFFD_RWP_BIT)
#else
#define VMA_UFFD_RWP	EMPTY_VMA_FLAGS
#endif

And then:


/* Always available if CONFIG_USERFAULTFD set. */
#define __VMA_UFFD_DEFAULT_FLAGS \
		mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT)

#define __VMA_UFFD_FLAGS mk_vma_flags_from_masks(__VMA_UFFD_DEFAULT_FLAGS \
		VMA_MINOR, VMA_RWP)

Which is kind ok-ish right? I mean it's all a bit fugly obviously.

>
> Bits declared in the enum are now safe to pass to mk_vma_flags()
> regardless of arch.

I mean another issue here is we're then codifying behaviour that's legacy
ahead of time. I really want to avoid that.

>
> Fixes: 9ea35a25d51b ("mm: introduce VMA flags bitmap type")
> Cc: stable@vger.kernel.org
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
>  include/linux/mm.h | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0f2612a70fb1..71b11945e4fc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -286,8 +286,17 @@ extern unsigned int kobjsize(const void *objp);
>   */
>  typedef int __bitwise vma_flag_t;
>
> -#define DECLARE_VMA_BIT(name, bitnum) \
> -	VMA_ ## name ## _BIT = ((__force vma_flag_t)bitnum)
> +/*
> + * VMA_NO_BIT means "no bit"; mk_vma_flags() skips it. DECLARE_VMA_BIT()
> + * below uses it for any bit number that doesn't fit in the bitmap, so
> + * callers don't need to track which bits are valid on the current build.
> + */
> +#define VMA_NO_BIT	((__force vma_flag_t)-1)
> +
> +#define DECLARE_VMA_BIT(name, bitnum)					\
> +	VMA_ ## name ## _BIT = (((bitnum) < NUM_VMA_FLAG_BITS) ?	\
> +				((__force vma_flag_t)(bitnum)) :	\
> +				VMA_NO_BIT)
>  #define DECLARE_VMA_BIT_ALIAS(name, aliased) \
>  	VMA_ ## name ## _BIT = (VMA_ ## aliased ## _BIT)
>  enum {
> @@ -1081,6 +1090,8 @@ static __always_inline void vma_flags_set_flag(vma_flags_t *flags,
>  {
>  	unsigned long *bitmap = flags->__vma_flags;
>
> +	if ((__force int)bit < 0)
> +		return;
>  	__set_bit((__force int)bit, bitmap);
>  }
>
> --
> 2.54.0
>

Either way, I think we should break out any fix like this from the series.

Andrew is I think also not a fan of fixes patches in the middle of series
anyway :P

Cheers, Lorenzo

  reply	other threads:[~2026-05-29 14:00 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-26 13:04 [PATCH v5 00/18] userfaultfd: working set tracking for VM guest memory Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 01/18] fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 02/18] mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 03/18] userfaultfd: gate must_wait writability check on pte_present() Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 04/18] mm: skip out-of-range bits in mk_vma_flags() Kiryl Shutsemau
2026-05-29 14:00   ` Lorenzo Stoakes [this message]
2026-05-29 16:09     ` Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 05/18] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 06/18] mm: rename uffd-wp PTE bit macros to uffd Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 07/18] mm: rename uffd-wp PTE accessors " Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 08/18] mm: add VM_UFFD_RWP VMA flag Kiryl Shutsemau
2026-05-29  7:24   ` Lorenzo Stoakes
2026-05-29 13:07     ` Kiryl Shutsemau
2026-05-29 14:00       ` Lorenzo Stoakes
2026-05-26 13:04 ` [PATCH v5 09/18] mm: add MM_CP_UFFD_RWP change_protection() flag Kiryl Shutsemau
2026-05-29  1:19   ` SeongJae Park
2026-05-26 13:04 ` [PATCH v5 10/18] mm: preserve RWP marker across PTE rewrites Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 11/18] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 12/18] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 13/18] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 14/18] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 15/18] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 16/18] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 17/18] selftests/mm: add userfaultfd RWP tests Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 18/18] Documentation/userfaultfd: document RWP working set tracking Kiryl Shutsemau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahmQvfNk7S4F0LBj@lucifer \
    --to=ljs@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@kernel.org \
    --cc=jthoughton@google.com \
    --cc=kas@kernel.org \
    --cc=kernel-team@meta.com \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=rppt@kernel.org \
    --cc=seanjc@google.com \
    --cc=sj@kernel.org \
    --cc=skhan@linuxfoundation.org \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox