From: Lorenzo Stoakes <ljs@kernel.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
david@kernel.org, surenb@google.com, vbabka@kernel.org,
Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
skhan@linuxfoundation.org, seanjc@google.com,
pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com,
sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
kernel-team@meta.com, "Kiryl Shutsemau (Meta)" <kas@kernel.org>,
stable@vger.kernel.org
Subject: Re: [PATCH v5 04/18] mm: skip out-of-range bits in mk_vma_flags()
Date: Fri, 29 May 2026 15:00:14 +0100 [thread overview]
Message-ID: <ahmQvfNk7S4F0LBj@lucifer> (raw)
In-Reply-To: <20260526130509.2748441-5-kirill@shutemov.name>
On Tue, May 26, 2026 at 02:04:52PM +0100, Kiryl Shutsemau wrote:
> From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>
>
> vma_flags_t is one unsigned long on 32-bit -- NUM_VMA_FLAG_BITS ==
> BITS_PER_LONG by design, so VM_xxx-declared bits sit in the first
> word and hit the single-long fast path. But the bit enum declares
> some bits unconditionally above BITS_PER_LONG (VMA_UFFD_MINOR_BIT
> == 41 today, with VM_UFFD_MINOR == VM_NONE on 32-bit so no VMA
> actually carries the bit).
Yeah ugh.
>
> Passing such a bit to mk_vma_flags() goes through __set_bit(41,
> &one_long) and writes one word past the end. The compiler folds
> the OOB store with wraparound (1UL << (41 % 32) == bit 9) into
> the first word. Bit 9 is already in __VMA_UFFD_FLAGS so the mask
> happens to come out right today, but any high-numbered bit whose
That is... helpful :) but not great that this is the situation, an
oversight, clearly! How I hate 32-bit kernels :)
> mod-BITS_PER_LONG position is otherwise unused would silently OR
> an extra bit into the mask.
>
> Add VMA_NO_BIT and have DECLARE_VMA_BIT() resolve any bitnum out
> of range to it. vma_flags_set_flag() drops negative bit values.
> The ternary collapses at compile time, the runtime check folds
> away when the bit is in range, and the common path is unchanged.
Hmm are you sure it does?
A key design goal was that mk_vma_flags() generates compile-time constants
the same as if the bitmap were constructed independently.
This surely must generate code? Or at least runs a significant risk of it?
Setting a precedent here would probably lead to VMA_NO_BIT being used more
and therefore generating code in more places I think.
And I'm not sure I really want to bend over backwards here to work around
issues with 32-bit kernels when in the long run the intent is that we move
to making these values 64-bit anyway.
Perhaps it's even wise possibly to just make these values 64-bit already,
ahead of time?
(I did look at this in terms of wanting something like a VMA_NO_BIT so we
could get VM_NONE-like behaviour but nothing I tried failed to generate
code.)
A simple solution that doesn't require change to the core is to just uglify
userfaultfd_k.h a bit with:
#ifdef HAVE_ARCH_USERFAULTFD_MINOR
#define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT, \
VMA_UFFD_MINOR_BIT)
#else
#define __VMA_UFFD_FLAGS mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT)
#endif
But of course that becomes much more horrible with your changes...
Another alternative, which I used for VMA_DROPPABLE is to add something
like this in mm.h:
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
#define VM_UFFD_MINOR INIT_VM_FLAG(UFFD_MINOR)
+define VMA_UFFD_MINOR mk_vma_flags(VMA_UFFD_MINOR_BIT)
#else
#define VM_UFFD_MINOR VM_NONE
+define VMA_UFFD_MINOR EMPTY_VMA_FLAGS
#endif
Then we can do:
#define __VMA_UFFD_FLAGS append_vma_flags(VMA_MINOR, VMA_UFFD_MISSING_BIT, \
VMA_UFFD_WP_BIT)
With you changes in 08/18 on top it'd get hairier, but we could make our
life easier by implementing something like:
static __always_inline vma_flags_t __mk_vma_flags_from_masks(size_t count,
const vma_flags_t *masks)
{
vma_flags_t flags = EMPTY_VMA_FLAGS;
int i;
for (i = 0; i < count; i++)
mask = vma_flags_set_mask(&flags, masks[i]);
return flags;
}
#define mk_vma_flags_from_masks(...) __mk_vma_flags_from_masks(, \
COUNT_ARGS(__VA_ARGS__), (const vma_flags_t []){__VA_ARGS__})
(untested code - you would need to ensure it generated equivalent
constants as VM_xxx would now :)
Then you could get VMA_UFFD_RWP with:
#ifdef CONFIG_USERFAULTFD_RWP
#define VMA_UFFD_RWP mk_vma_flags(VMA_UFFD_RWP_BIT)
#else
#define VMA_UFFD_RWP EMPTY_VMA_FLAGS
#endif
And then:
/* Always available if CONFIG_USERFAULTFD set. */
#define __VMA_UFFD_DEFAULT_FLAGS \
mk_vma_flags(VMA_UFFD_MISSING_BIT, VMA_UFFD_WP_BIT)
#define __VMA_UFFD_FLAGS mk_vma_flags_from_masks(__VMA_UFFD_DEFAULT_FLAGS \
VMA_MINOR, VMA_RWP)
Which is kind ok-ish right? I mean it's all a bit fugly obviously.
>
> Bits declared in the enum are now safe to pass to mk_vma_flags()
> regardless of arch.
I mean another issue here is we're then codifying behaviour that's legacy
ahead of time. I really want to avoid that.
>
> Fixes: 9ea35a25d51b ("mm: introduce VMA flags bitmap type")
> Cc: stable@vger.kernel.org
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
> include/linux/mm.h | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0f2612a70fb1..71b11945e4fc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -286,8 +286,17 @@ extern unsigned int kobjsize(const void *objp);
> */
> typedef int __bitwise vma_flag_t;
>
> -#define DECLARE_VMA_BIT(name, bitnum) \
> - VMA_ ## name ## _BIT = ((__force vma_flag_t)bitnum)
> +/*
> + * VMA_NO_BIT means "no bit"; mk_vma_flags() skips it. DECLARE_VMA_BIT()
> + * below uses it for any bit number that doesn't fit in the bitmap, so
> + * callers don't need to track which bits are valid on the current build.
> + */
> +#define VMA_NO_BIT ((__force vma_flag_t)-1)
> +
> +#define DECLARE_VMA_BIT(name, bitnum) \
> + VMA_ ## name ## _BIT = (((bitnum) < NUM_VMA_FLAG_BITS) ? \
> + ((__force vma_flag_t)(bitnum)) : \
> + VMA_NO_BIT)
> #define DECLARE_VMA_BIT_ALIAS(name, aliased) \
> VMA_ ## name ## _BIT = (VMA_ ## aliased ## _BIT)
> enum {
> @@ -1081,6 +1090,8 @@ static __always_inline void vma_flags_set_flag(vma_flags_t *flags,
> {
> unsigned long *bitmap = flags->__vma_flags;
>
> + if ((__force int)bit < 0)
> + return;
> __set_bit((__force int)bit, bitmap);
> }
>
> --
> 2.54.0
>
Either way, I think we should break out any fix like this from the series.
Andrew is I think also not a fan of fixes patches in the middle of series
anyway :P
Cheers, Lorenzo
next prev parent reply other threads:[~2026-05-29 14:00 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 13:04 [PATCH v5 00/18] userfaultfd: working set tracking for VM guest memory Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 01/18] fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race Kiryl Shutsemau
2026-05-26 13:46 ` sashiko-bot
2026-05-26 13:04 ` [PATCH v5 02/18] mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade Kiryl Shutsemau
2026-05-26 13:43 ` sashiko-bot
2026-05-26 13:04 ` [PATCH v5 03/18] userfaultfd: gate must_wait writability check on pte_present() Kiryl Shutsemau
2026-05-26 13:44 ` sashiko-bot
2026-05-26 13:04 ` [PATCH v5 04/18] mm: skip out-of-range bits in mk_vma_flags() Kiryl Shutsemau
2026-05-29 14:00 ` Lorenzo Stoakes [this message]
2026-05-29 16:09 ` Kiryl Shutsemau
2026-06-01 9:37 ` Lorenzo Stoakes
2026-05-30 16:52 ` Mike Rapoport
2026-06-01 7:42 ` Lorenzo Stoakes
2026-06-01 14:08 ` Kiryl Shutsemau
2026-06-01 14:28 ` Mike Rapoport
2026-05-26 13:04 ` [PATCH v5 05/18] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 06/18] mm: rename uffd-wp PTE bit macros to uffd Kiryl Shutsemau
2026-05-26 13:04 ` [PATCH v5 07/18] mm: rename uffd-wp PTE accessors " Kiryl Shutsemau
2026-05-26 13:29 ` sashiko-bot
2026-05-26 13:04 ` [PATCH v5 08/18] mm: add VM_UFFD_RWP VMA flag Kiryl Shutsemau
2026-05-26 14:37 ` sashiko-bot
2026-05-29 7:24 ` Lorenzo Stoakes
2026-05-29 13:07 ` Kiryl Shutsemau
2026-05-29 14:00 ` Lorenzo Stoakes
2026-05-26 13:04 ` [PATCH v5 09/18] mm: add MM_CP_UFFD_RWP change_protection() flag Kiryl Shutsemau
2026-05-26 14:07 ` sashiko-bot
2026-05-29 1:19 ` SeongJae Park
2026-05-26 13:04 ` [PATCH v5 10/18] mm: preserve RWP marker across PTE rewrites Kiryl Shutsemau
2026-05-26 14:15 ` sashiko-bot
2026-05-26 13:04 ` [PATCH v5 11/18] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP Kiryl Shutsemau
2026-05-26 15:04 ` sashiko-bot
2026-05-26 13:05 ` [PATCH v5 12/18] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing Kiryl Shutsemau
2026-05-26 14:45 ` sashiko-bot
2026-05-26 13:05 ` [PATCH v5 13/18] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP Kiryl Shutsemau
2026-05-26 14:33 ` sashiko-bot
2026-05-26 13:05 ` [PATCH v5 14/18] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking Kiryl Shutsemau
2026-05-26 14:37 ` sashiko-bot
2026-05-26 13:05 ` [PATCH v5 15/18] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 16/18] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Kiryl Shutsemau
2026-05-26 15:07 ` sashiko-bot
2026-05-26 13:05 ` [PATCH v5 17/18] selftests/mm: add userfaultfd RWP tests Kiryl Shutsemau
2026-05-26 13:05 ` [PATCH v5 18/18] Documentation/userfaultfd: document RWP working set tracking Kiryl Shutsemau
2026-05-26 14:51 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahmQvfNk7S4F0LBj@lucifer \
--to=ljs@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=david@kernel.org \
--cc=jthoughton@google.com \
--cc=kas@kernel.org \
--cc=kernel-team@meta.com \
--cc=kirill@shutemov.name \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=rppt@kernel.org \
--cc=seanjc@google.com \
--cc=sj@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=usama.arif@linux.dev \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.