From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Xu <peterx@redhat.com>, Lorenzo Stoakes <ljs@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Wed, 22 Apr 2026 20:39:50 +0200
Message-ID: <b77d559b-215e-460a-a268-e63b8273ef42@kernel.org>
In-Reply-To: <aed6fHLrIdahbdY3@thinkstation>

On 4/21/26 16:33, Kiryl Shutsemau wrote:
> On Tue, Apr 21, 2026 at 03:03:56PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/19/26 16:33, Kiryl Shutsemau wrote:
>>>
>>> See https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git uffd/rfc-v3
>>>
>>
>> Quick feedback from skimming over it:
>>
>>
>> 1) ARCH_SUPPORTS_PROT_NONE needs some thought, because I am pretty sure all 
>> architectures support something like mprotect(PROT_NONE), and the config
>> option might be misleading.
>>
>> So you very likely want to express different semantics here. You want to
>> know whether pte_protnone()/pmd_protnone() works.
> 
> We do support mprotect(PROT_NONE) everywhere, but we don't always have a
> way to distinguish such entries from others without a VMA in hand. For
> example, there are other PTEs that don't have the present bit set. In my
> case and in the NUMA balancing context we cannot rely on the VMA, because
> we want to install PAGE_NONE entries into an accessible VMA.

Exactly. So it's not ARCH_SUPPORTS_PROT_NONE.

> 
> So we need two things: the pte/pmd_protnone() checks and PAGE_NONE itself.
> The first tests a PTE for PAGE_NONE, the second lets pte/pmd_modify()
> make the entry protnone.
> 
> Currently, generic code only uses this functionality for NUMA balancing,
> and it is gated by the NUMA balancing config option. So I moved it under
> a separate config option.
> 
> Do you want it to be named differently?

Would ARCH_SUPPORTS_PXX_PROTNONE or something like that better describe that
pte_protnone()/pmd_protnone() do what we want?
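
To illustrate the distinction discussed above, here is a minimal userspace
sketch of telling a protnone entry apart from other non-present entries
without looking at the VMA. The bit layout and all toy_* names are
assumptions made up for this example; real layouts are architecture-specific
and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE layout for illustration only; not any architecture's real layout. */
typedef uint64_t toy_pte_t;

#define TOY_PRESENT   (1ULL << 0)  /* hardware may use the entry */
#define TOY_PROTNONE  (1ULL << 1)  /* software bit: mapped, but access revoked */
#define TOY_PFN_SHIFT 12

/* True only for PAGE_NONE-style entries: marked protnone and not present. */
bool toy_pte_protnone(toy_pte_t pte)
{
	return (pte & (TOY_PRESENT | TOY_PROTNONE)) == TOY_PROTNONE;
}

/* Rough analogue of pte_modify(pte, PAGE_NONE): keep the PFN, swap the bits. */
toy_pte_t toy_pte_mkprotnone(toy_pte_t pte)
{
	assert(pte & TOY_PRESENT);  /* this sketch only converts present entries */
	pte &= ~TOY_PRESENT;
	return pte | TOY_PROTNONE;
}
```

An empty entry or a swap-style entry has neither bit set, so the check
rejects it even though it is also "not present".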

> 
>> 2) The other stuff is really just an extension of existing WP handling.
>> I suspect we want to have some reasonable cleanups to not end up in
>> common code with
>>
>> @@ -1841,7 +1841,7 @@ static void copy_huge_non_present_pmd(
>>  	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>>  	mm_inc_nr_ptes(dst_mm);
>>  	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> -	if (!userfaultfd_wp(dst_vma))
>> +	if (!userfaultfd_wp(dst_vma) && !userfaultfd_rwp(dst_vma))
>>  		pmd = pmd_swp_clear_uffd_wp(pmd);
>>  	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>>
>> All the uffd handling should be better isolated (i.e., a single vma check?),
>> and likely the uffd bit should be abstracted away from being called "wp" to
>> something more generic.
>>
>> Maybe it's simply a "uffd" flag whose semantics depend
>> on the vma flags.
>>
>> Maybe something like:
>>
>> @@ -1841,7 +1841,7 @@ static void copy_huge_non_present_pmd(
>>  	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
>>  	mm_inc_nr_ptes(dst_mm);
>>  	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
>> 	if (!userfaultfd_uses_pte_bit(dst_vma))
>>  		pmd = pmd_swp_clear_uffd(pmd);
>>  	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>>
>> Not sure, needs another thought. But I think there are some decent
>> cleanups to be had.
> 
> That's fair. Maybe userfaultfd_protected() name is better for the VMA
> check?

Yes, something like that could also work.
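
Just to sketch the shape of such a single VMA check in plain userspace C:
the struct, the flag values, and VM_UFFD_RWP are mock stand-ins (the RWP
flag is assumed from the RFC), and userfaultfd_protected() is only the name
floated above, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock flag values for illustration only; not the kernel's definitions. */
#define VM_UFFD_WP   (1UL << 0)
#define VM_UFFD_RWP  (1UL << 1)  /* hypothetical, assumed from the RFC */

struct vm_area_struct {          /* mock with only the field used here */
	unsigned long vm_flags;
};

bool userfaultfd_wp(const struct vm_area_struct *vma)
{
	assert(vma);
	return (vma->vm_flags & VM_UFFD_WP) != 0;
}

bool userfaultfd_rwp(const struct vm_area_struct *vma)
{
	assert(vma);
	return (vma->vm_flags & VM_UFFD_RWP) != 0;
}

/* Single check for "this VMA uses the uffd PTE bit", whatever the mode. */
bool userfaultfd_protected(const struct vm_area_struct *vma)
{
	assert(vma);
	return (vma->vm_flags & (VM_UFFD_WP | VM_UFFD_RWP)) != 0;
}
```

Callers such as the copy path quoted above would then need one check
instead of one per protection mode.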

> 
> And about the UFFD_WP bit name: maybe we can just drop _WP: _PAGE_UFFD_WP ->
> _PAGE_UFFD, pte_uffd_wp() -> pte_uffd()?

Yes, I hinted at the above with pmd_swp_clear_uffd().

> 
> But it is a lot of changes. Can I do the bit rename as a follow-up
> patchset?

Let's get this clean. There is no need to rush that in ;)

I suspect it's a fairly mechanical change.

> 
>> 3) Some other stuff needs a second thought, like
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 8e7dc2c6ee738..08fc18f1290d4 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -695,7 +695,8 @@ static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
>>  	/* ... and a write-fault isn't required for other reasons. */
>>  	if (pmd_needs_soft_dirty_wp(vma, pmd))
>>  		return false;
>> -	return !userfaultfd_huge_pmd_wp(vma, pmd);
>> +	return !userfaultfd_huge_pmd_wp(vma, pmd) &&
>> +	       !userfaultfd_huge_pmd_rwp(vma, pmd);
>>  }
>>
>> How can a pte be writable and prot_none at the same time? Maybe just confused AI
>> output that you should carefully double check before sending that out officially.
> 
> Note that this path is for the !pmd_write() case to begin with. It serves
> the FOLL_FORCE case. I believe this check is correct: we don't want to
> allow writes to such pages even with FOLL_FORCE.
> 
> But looking around, I missed the gup_can_follow_protnone() modification.
> It has to return false for RWP.

Right, read-permission checks come before the write-permission checks.
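
As a toy model of that ordering (not mm/gup.c: the types, field names, and
toy_* functions are all invented for this sketch, and the force handling is
heavily simplified), the read-side protnone check runs first, and the write
side is only consulted afterwards:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; not the kernel's pte_t/pmd_t or vm_area_struct. */
struct toy_entry {
	bool protnone;  /* entry is a PAGE_NONE-style mapping */
	bool writable;  /* entry has write permission */
};

struct toy_vma {
	bool uffd_rwp;  /* hypothetical RWP registration, assumed from the RFC */
};

/* Read-side check: a protnone entry in an RWP VMA must take the fault. */
bool toy_can_follow_protnone(const struct toy_vma *vma)
{
	return !vma->uffd_rwp;
}

bool toy_can_follow(const struct toy_vma *vma, const struct toy_entry *entry,
		    bool want_write, bool force)
{
	assert(vma && entry);

	/* 1) Read permission comes first. */
	if (entry->protnone && !toy_can_follow_protnone(vma))
		return false;
	if (!want_write)
		return true;

	/* 2) Write permission is only checked afterwards. */
	if (entry->writable)
		return true;

	/* Forced write to a non-writable entry: still refused in RWP VMAs. */
	return force && !vma->uffd_rwp;
}
```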

> 
>> 4) How do we want to handle PM_UFFD_WP?
>>
>> We will be pretty much out of flags soon. Overloading PM_UFFD_WP means that we will not
>> be able to easily support using a separate bit.
>>
>> But our internal design will not easily allow that either, and I am not really
>> sure we want to go down that path any time soon.
>>
>> Maybe we could document this for now as "In WP VMAs, indicates WP PTEs.
>> Otherwise, in RWP VMAs, indicates RWP.". Whenever we would allow both at the
>> same time, we could change the semantics. User space would fail to create one
>> with both protection types for now either way.
> 
> Yeah. I am thinking about doing a documentation-only update for PM_UFFD_WP
> for now.

Ok, good!
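
From the user space side, the documented semantics could then be consumed
roughly as in the sketch below. PM_UFFD_WP is bit 57 of a pagemap entry; the
enums and pagemap_uffd_prot() are hypothetical names for this example, and
the RWP mode is assumed from the RFC. The caller is expected to know how it
registered the range, since the bit alone does not say which mode applies.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 57 of a /proc/<pid>/pagemap entry: uffd-wp write-protected. */
#define PM_UFFD_WP (1ULL << 57)
static_assert(PM_UFFD_WP == 0x0200000000000000ULL, "PM_UFFD_WP must be bit 57");

/* How the caller registered the range; names are made up for this sketch. */
enum uffd_vma_mode { UFFD_VMA_NONE, UFFD_VMA_WP, UFFD_VMA_RWP };

/* What the single pagemap bit means for a page in that range. */
enum uffd_page_state { UFFD_PAGE_UNPROTECTED, UFFD_PAGE_WP, UFFD_PAGE_RWP };

enum uffd_page_state pagemap_uffd_prot(uint64_t entry, enum uffd_vma_mode mode)
{
	if (!(entry & PM_UFFD_WP))
		return UFFD_PAGE_UNPROTECTED;

	switch (mode) {
	case UFFD_VMA_WP:
		return UFFD_PAGE_WP;   /* in WP VMAs the bit indicates WP */
	case UFFD_VMA_RWP:
		return UFFD_PAGE_RWP;  /* in RWP VMAs the same bit indicates RWP */
	default:
		return UFFD_PAGE_UNPROTECTED;
	}
}
```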

-- 
Cheers,

David
