From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Peter Xu <peterx@redhat.com>, Lorenzo Stoakes <ljs@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Wed, 22 Apr 2026 20:39:50 +0200
Message-ID: <b77d559b-215e-460a-a268-e63b8273ef42@kernel.org>
In-Reply-To: <aed6fHLrIdahbdY3@thinkstation>
On 4/21/26 16:33, Kiryl Shutsemau wrote:
> On Tue, Apr 21, 2026 at 03:03:56PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/19/26 16:33, Kiryl Shutsemau wrote:
>>>
>>> See https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git uffd/rfc-v3
>>>
>>
>> Quick feedback from skimming over it:
>>
>>
>> 1) ARCH_SUPPORTS_PROT_NONE needs some thought, because I am pretty sure all
>> architectures support something like mprotect(PROT_NONE), and the config
>> option might be misleading.
>>
>> So you very likely want to express different semantics here. You want to
>> know whether pte_protnone()/pmd_protnone() works.
>
> We do support mprotect(PROT_NONE) everywhere, but we don't always have a
> way to distinguish such entries from others without the VMA at hand. For
> example, there are other PTEs that don't have the present bit set. In my
> case and in the NUMA balancing case we cannot rely on the VMA, because we
> want to install PAGE_NONE entries into an accessible VMA.
Exactly. So it's not ARCH_SUPPORTS_PROT_NONE.
>
> So we need two things: the pte/pmd_protnone() checks and PAGE_NONE itself.
> The first is to test a PTE for PAGE_NONE, the second is for
> pte/pmd_modify() to make the entry protnone.
>
> Currently, generic code only uses this functionality for NUMA balancing,
> and it is gated by the NUMA balancing config option. So I moved it under
> a separate config option.
>
> Do you want it to be named differently?
Would ARCH_SUPPORTS_PXX_PROTNONE or something like that better describe that
pte_protnone()/pmd_protnone() do what we want?
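
To make the naming question concrete, here is a rough userspace model of
the property the config option would advertise. This is not kernel code:
the toy_* names and the bit positions are invented for illustration, and
real PTE layouts are per-architecture. It only shows that a protnone
entry can be told apart from other non-present entries (for example
swap-like ones) by looking at the entry alone, without a VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE layout, invented for illustration only. */
#define TOY_PTE_PRESENT   (1ULL << 0)  /* hardware may walk this entry */
#define TOY_PTE_PROTNONE  (1ULL << 1)  /* page is there, access is revoked */
#define TOY_PTE_PFN_SHIFT 12

typedef uint64_t toy_pte_t;

/* Rough analogue of pte_modify(pte, PAGE_NONE): keep the PFN, revoke access. */
static toy_pte_t toy_pte_mkprotnone(toy_pte_t pte)
{
	return (pte & ~TOY_PTE_PRESENT) | TOY_PTE_PROTNONE;
}

/* Rough analogue of pte_protnone(): decided from the entry alone, no VMA. */
static bool toy_pte_protnone(toy_pte_t pte)
{
	return (pte & (TOY_PTE_PRESENT | TOY_PTE_PROTNONE)) == TOY_PTE_PROTNONE;
}

/* A swap-like entry: also not present, but it must not look like protnone. */
static toy_pte_t toy_mk_swap_entry(uint64_t offset)
{
	assert(offset < (1ULL << 52));  /* must still fit after the shift */
	return offset << TOY_PTE_PFN_SHIFT;
}
```

In this model, an architecture without a spare bit to play the role of
TOY_PTE_PROTNONE could still implement mprotect(PROT_NONE) through the
VMA, but could not offer a working toy_pte_protnone().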
>
>> 2) The other stuff is really just an extension of existing WP handling.
>> I suspect we want to have some reasonable cleanups to not end up in
>> common code with
>>
>> @@ -1841,7 +1841,7 @@ static void copy_huge_non_present_pmd(
>> add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> mm_inc_nr_ptes(dst_mm);
>> pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> - if (!userfaultfd_wp(dst_vma))
>> + if (!userfaultfd_wp(dst_vma) && !userfaultfd_rwp(dst_vma))
>> pmd = pmd_swp_clear_uffd_wp(pmd);
>> set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>>
>> All the uffd handling should be better isolated (i.e., a single vma check?),
>> and likely the uffd bit should be abstracted away from being called "wp" to
>> something more generic.
>>
>> Maybe it's simply a "uffd" flag whose semantics depend
>> on the vma flags.
>>
>> Maybe something like:
>>
>> @@ -1841,7 +1841,7 @@ static void copy_huge_non_present_pmd(
>> add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>> mm_inc_nr_ptes(dst_mm);
>> pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> if (!userfaultfd_uses_pte_bit(dst_vma))
>> pmd = pmd_swp_clear_uffd(pmd);
>> set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>>
>> Not sure, needs another thought. But I think there are some decent
>> cleanups to be had.
>
> That's fair. Maybe userfaultfd_protected() name is better for the VMA
> check?
Yes, something like that could also work.
>
> And about UFFD_WP bit name. Maybe we can just drop _WP: _PAGE_UFFD_WP ->
> _PAGE_UFFD, pte_uffd_wp() -> pte_uffd()?
Yes, I hinted at the above with pmd_swp_clear_uffd().
>
> But it is a lot of changes. Can I do the bit rename as a follow-up
> patchset?
Let's get this clean. There is no need to rush that in ;)
I suspect it's a fairly mechanical change.
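
As a rough userspace model of the cleanup (not kernel code; the toy_*
types, flag values and helper names are hypothetical stand-ins for
whatever names we settle on), a single VMA check plus one generically
named marker bit could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VMA flags; TOY_VM_UFFD_RWP stands in for the proposed mode. */
#define TOY_VM_UFFD_WP   (1UL << 0)
#define TOY_VM_UFFD_RWP  (1UL << 1)

/* One marker bit in the entry; its meaning is given by the VMA flags. */
#define TOY_PMD_UFFD     (1ULL << 2)

struct toy_vma {
	unsigned long flags;
};

typedef uint64_t toy_pmd_t;

/* The single VMA check: does any uffd protection mode own the marker bit? */
static bool toy_userfaultfd_uses_pte_bit(const struct toy_vma *vma)
{
	return (vma->flags & (TOY_VM_UFFD_WP | TOY_VM_UFFD_RWP)) != 0;
}

static toy_pmd_t toy_pmd_clear_uffd(toy_pmd_t pmd)
{
	return pmd & ~TOY_PMD_UFFD;
}

/* Fork-style copy: drop the marker unless the destination VMA still uses it. */
static toy_pmd_t toy_copy_pmd(const struct toy_vma *dst_vma, toy_pmd_t pmd)
{
	assert(dst_vma);
	if (!toy_userfaultfd_uses_pte_bit(dst_vma))
		pmd = toy_pmd_clear_uffd(pmd);
	return pmd;
}
```

The point of the model is that the copy path no longer enumerates the
protection modes; adding another mode would only touch the one helper.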
>
>> 3) Some other stuff needs a second thought, like
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 8e7dc2c6ee738..08fc18f1290d4 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -695,7 +695,8 @@ static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
>> /* ... and a write-fault isn't required for other reasons. */
>> if (pmd_needs_soft_dirty_wp(vma, pmd))
>> return false;
>> - return !userfaultfd_huge_pmd_wp(vma, pmd);
>> + return !userfaultfd_huge_pmd_wp(vma, pmd) &&
>> + !userfaultfd_huge_pmd_rwp(vma, pmd);
>> }
>>
>> How can a pte be writable and prot_none at the same time? Maybe just confused AI
>> output that you should carefully double check before sending that out officially.
>
> Note that this path is for the !pmd_write() case to begin with. It serves
> the FOLL_FORCE case. I believe this check is correct: we don't want to
> allow writes to such pages even with FOLL_FORCE.
>
> But looking around, I missed the gup_can_follow_protnone() modification.
> It has to return false for RWP.
Right, read-permission checks come before the write-permission checks.
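
A rough userspace model of that ordering (not kernel code; the toy_*
names are hypothetical, and the protnone check is simplified to depend
only on an assumed per-VMA RWP flag rather than on the real GUP flags):

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FOLL_WRITE (1u << 0)
#define TOY_FOLL_FORCE (1u << 1)

struct toy_entry {
	bool protnone;     /* access revoked, as with PAGE_NONE */
	bool writable;     /* like pmd_write() */
	bool uffd_marked;  /* uffd marker bit set on the entry */
};

struct toy_vma {
	bool uffd_rwp;     /* hypothetical: VMA registered in the RWP mode */
};

/* Step 1, read permission: an RWP-protected entry must always fault. */
static bool toy_can_follow_protnone(const struct toy_vma *vma)
{
	return !vma->uffd_rwp;
}

/* Step 2, write permission: only reached by entries that passed step 1. */
static bool toy_can_follow_write(const struct toy_entry *e, unsigned int flags)
{
	if (e->writable)
		return true;
	if (!(flags & TOY_FOLL_FORCE))
		return false;
	/* FOLL_FORCE must not bypass a uffd marker. */
	return !e->uffd_marked;
}

static bool toy_gup_follow(const struct toy_vma *vma,
			   const struct toy_entry *e, unsigned int flags)
{
	assert(vma && e);
	if (e->protnone && !toy_can_follow_protnone(vma))
		return false;
	if ((flags & TOY_FOLL_WRITE) && !toy_can_follow_write(e, flags))
		return false;
	return true;
}
```

In this model the protnone entry in an RWP VMA is rejected in step 1, so
the write-side check never sees it, with or without FOLL_FORCE.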
>
>> 4) How do we want to handle PM_UFFD_WP?
>>
>> We will be pretty much out of flags soon. Overloading PM_UFFD_WP means that
>> we will not be able to easily support using a separate bit.
>>
>> But our internal design will not easily allow that either, and I am not really
>> sure we want to go down that path any time soon.
>>
>> Maybe we could document this for now as "In WP VMAs, indicates WP PTEs.
>> Otherwise, in RWP VMAs, indicates RWP.". Whenever we would allow both at the
>> same time, we could change the semantics. User space would fail to create one
>> with both protection types for now either way.
>
> Yeah. I'm thinking about doing a documentation-only update for PM_UFFD_WP
> for now.
Ok, good!
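
For illustration, this is roughly how a pagemap consumer would have to
read the bit under the documented semantics. It is a userspace sketch
under assumptions: bit 57 is the existing PM_UFFD_WP pagemap bit, while
the toy_* names and the idea that the caller knows the VMA's
registration mode are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 57 of a /proc/<pid>/pagemap entry, reported today for uffd-wp PTEs. */
#define TOY_PM_UFFD_WP (1ULL << 57)

/* Hypothetical per-VMA mode; only one protection type can be registered. */
enum toy_uffd_mode { TOY_MODE_NONE, TOY_MODE_WP, TOY_MODE_RWP };

enum toy_page_state { TOY_PAGE_UNPROTECTED, TOY_PAGE_WP, TOY_PAGE_RWP };

/* Interpret the overloaded bit according to the mode of the covering VMA. */
static enum toy_page_state toy_decode_pagemap(uint64_t entry,
					      enum toy_uffd_mode mode)
{
	assert(mode == TOY_MODE_NONE || mode == TOY_MODE_WP ||
	       mode == TOY_MODE_RWP);

	if (!(entry & TOY_PM_UFFD_WP))
		return TOY_PAGE_UNPROTECTED;
	if (mode == TOY_MODE_WP)
		return TOY_PAGE_WP;
	if (mode == TOY_MODE_RWP)
		return TOY_PAGE_RWP;
	/* Bit set outside a uffd VMA: treated as unprotected in this sketch. */
	return TOY_PAGE_UNPROTECTED;
}
```

The bit alone is ambiguous; the reader needs the VMA's mode to interpret
it, which is why allowing both modes at once would need new semantics.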
--
Cheers,
David