From: Kiryl Shutsemau <kas@kernel.org>
To: akpm@linux-foundation.org, rppt@kernel.org, peterx@redhat.com,
	 david@kernel.org
Cc: ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	 Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org,  seanjc@google.com,
	pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com,
	 sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org,  kvm@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH 07/14] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP
Date: Thu, 30 Apr 2026 17:31:17 +0100
Message-ID: <afODmDGimJ4QUkkv@thinkstation>
In-Reply-To: <afOCAZ-dTQiLFN0y@thinkstation>

On Thu, Apr 30, 2026 at 05:28:17PM +0100, Kiryl Shutsemau wrote:
> sashiko.dev -- https://sashiko.dev/#/patchset/20260427114607.4068647-1-kas@kernel.org -- wrote:
> > > @@ -1084,9 +1092,29 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
> > >  		pte_t pte, unsigned long addr, int nr)
> > >  {
> > >  	struct mm_struct *src_mm = src_vma->vm_mm;
> > > +	bool writable;
> > > +
> > > +	/*
> > > +	 * Snapshot writability before the RWP-disarm rewrite below: when the
> > > +	 * child is not RWP-armed, pte_modify(pte, dst_vma->vm_page_prot) can
> > > +	 * silently drop _PAGE_RW from a resolved (no-marker) writable PTE,
> > > +	 * so a later pte_write(pte) check would skip the COW wrprotect and
> > > +	 * leave the parent writable over a folio shared with the child.
> > > +	 */
> > > +	writable = pte_write(pte);
> > > +
> > > +	/*
> > > +	 * Child is not RWP-armed: restore accessible protection so the
> > > +	 * inherited PAGE_NONE does not cost a fault on first read.
> > > +	 */
> > > +	if (!userfaultfd_protected(dst_vma)) {
> > > +		if (userfaultfd_rwp(src_vma))
> > > +			pte = pte_modify(pte, dst_vma->vm_page_prot);
> > > +		pte = pte_clear_uffd(pte);
> > > +	}
> > Does this unconditional pte_modify() create invalid clean and writable PTEs
> > for shared mappings?
> >
> > Without checking pte_uffd(pte) first, this blindly modifies every present PTE
> > if the source VMA had RWP enabled. For shared writable mappings, vm_page_prot
> > includes _PAGE_RW. If a PTE was clean and mapped read-only to intercept the
> > first write for filesystem dirty-tracking, pte_modify() forces the write bit
> > on while preserving the clean state.
> 
> Two reasons this is safe in practice:
> 
> 1. RWP cannot reach a backing that uses the clean-RO + page_mkwrite()
>    dirty-tracking pattern. vma_can_userfault() admits RWP only on anon,
>    shmem and hugetlbfs (the ones with vm_uffd_ops); regular file-backed
>    shared mappings are rejected at register time.
> 
> 2. For backings that *do* support shared writable + dirty tracking (i.e.
>    the ones RWP isn't enabled on), the fs sets vma->vm_page_prot to RO
>    precisely so the first write traps page_mkwrite(). pte_modify(pte,
>    dst_vma->vm_page_prot) would not add _PAGE_RW in that configuration
>    either.
> 
> > Hardware writes could then bypass page_mkwrite(), leaving the filesystem
> > unaware that the page is dirty, and potentially causing silent data loss
> > upon reclaim.
> > Could this also overwrite other important PTE states like NUMA hinting
> > or soft-dirty tracking?
> 
> pte_modify() on the supported architectures preserves _PAGE_CHG_MASK,
> which covers _PAGE_DIRTY, _PAGE_ACCESSED, _PAGE_SOFT_DIRTY, and the PFN.
> NUMA hinting is encoded in the protection bits, so a transient hint
> state on a resolved (no-marker) PTE would be lost across this rewrite,
> but that's just a re-prime on the next NUMA scan, not a correctness
> issue.
> 
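To make the _PAGE_CHG_MASK point above concrete, here is a toy userspace
model of the masking that pte_modify() performs. It is only a sketch: the
TOY_* bit positions and the toy_pte_modify() helper are made up for
illustration and do not match any real architecture's layout. The shape
(keep the bits in the change mask, take everything else from the new
protection) follows x86's pte_modify().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define TOY_PRESENT    (UINT64_C(1) << 0)
#define TOY_RW         (UINT64_C(1) << 1)
#define TOY_ACCESSED   (UINT64_C(1) << 5)
#define TOY_DIRTY      (UINT64_C(1) << 6)
#define TOY_SOFT_DIRTY (UINT64_C(1) << 11)
#define TOY_PFN_MASK   UINT64_C(0x000ffffffffff000)

/* Bits that survive a protection change, in the spirit of _PAGE_CHG_MASK. */
#define TOY_CHG_MASK \
	(TOY_PFN_MASK | TOY_ACCESSED | TOY_DIRTY | TOY_SOFT_DIRTY)

/* Keep the preserved bits from the old PTE, take the rest from newprot. */
static uint64_t toy_pte_modify(uint64_t pte, uint64_t newprot)
{
	return (pte & TOY_CHG_MASK) | (newprot & ~TOY_CHG_MASK);
}

/* Self-check: dirty/accessed/soft-dirty and the PFN survive both directions. */
static void toy_pte_modify_selfcheck(void)
{
	uint64_t state = TOY_ACCESSED | TOY_DIRTY | TOY_SOFT_DIRTY;
	uint64_t none_pte = UINT64_C(0x12345000) | state;  /* PAGE_NONE-like */
	uint64_t rw = toy_pte_modify(none_pte, TOY_PRESENT | TOY_RW);

	assert((rw & TOY_CHG_MASK) == (none_pte & TOY_CHG_MASK));
	assert((rw & TOY_RW) && (rw & TOY_PRESENT));

	/* Going back to a PAGE_NONE-like protection drops RW and PRESENT only. */
	uint64_t back = toy_pte_modify(rw, 0);
	assert(back == none_pte);
}
```

In this model the protection bits (TOY_PRESENT, TOY_RW) sit outside the
change mask, which is also why anything encoded purely in the protection
bits does not survive the rewrite.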
> > @@ -298,6 +298,16 @@ static __always_inline void change_present_ptes(struct mmu_gather *tlb,
> >      else if (uffd_wp_resolve || uffd_rwp_resolve)
> >          ptent = pte_clear_uffd(ptent);
> >
> > +    /*
> > +     * The uffd bit on a VM_UFFD_RWP VMA carries PROT_NONE
> > +     * semantics. If mprotect() or NUMA hinting changed the
> > +     * base protection, restore PAGE_NONE so the PTE still
> > +     * traps on any access. pte_modify() preserves
> > +     * _PAGE_UFFD.
> > +     */
> > +    if (userfaultfd_rwp(vma) && pte_uffd(ptent))
> > +        ptent = pte_modify(ptent, PAGE_NONE);
> >
> > For VMAs registered with both UFFD WP and RWP, could this silently upgrade
> > Write-Protected pages to Read-Write-Protected?
> 
> VM_UFFD_WP and VM_UFFD_RWP are mutually exclusive per VMA. The combination
> is rejected at registration time:
> 
>   fs/userfaultfd.c, userfaultfd_register():
>         if ((vm_flags & VM_UFFD_WP) && (vm_flags & VM_UFFD_RWP))
>                 goto out;       /* -EINVAL */
> 
> and re-registration cannot drop either bit (that is rejected with
> -EBUSY in the same function), so a PTE in a WP-armed VMA can never
> satisfy userfaultfd_rwp(vma); the new branch never fires there.
> 
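The mutual-exclusion check quoted above can be modelled in userspace as
follows. This is a sketch under assumptions: the TOY_VM_UFFD_* values and
toy_validate_uffd_flags() are hypothetical stand-ins, not the kernel's flag
values or its registration path, and only the "both bits set is -EINVAL"
rule is taken from the snippet.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values, for illustration only. */
#define TOY_VM_UFFD_WP  (1UL << 0)
#define TOY_VM_UFFD_RWP (1UL << 1)

/* Reject a registration that asks for WP and RWP on the same VMA. */
static int toy_validate_uffd_flags(unsigned long vm_flags)
{
	if ((vm_flags & TOY_VM_UFFD_WP) && (vm_flags & TOY_VM_UFFD_RWP))
		return -EINVAL;
	return 0;
}

/* Self-check: either mode alone is accepted, the combination is not. */
static void toy_validate_uffd_flags_selfcheck(void)
{
	assert(toy_validate_uffd_flags(TOY_VM_UFFD_WP) == 0);
	assert(toy_validate_uffd_flags(TOY_VM_UFFD_RWP) == 0);
	assert(toy_validate_uffd_flags(TOY_VM_UFFD_WP | TOY_VM_UFFD_RWP) == -EINVAL);
}
```

With the combination ruled out up front, a check of the form "VMA is
RWP-registered and the PTE carries the uffd bit" cannot match a PTE in a
WP-only VMA in this model.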
> [ ... ]
> > +        /* Re-arm RWP on the moved PTE if dst_vma is RWP-registered. */
> > +        if (userfaultfd_rwp(dst_vma)) {
> > +            orig_dst_pte = pte_modify(orig_dst_pte, PAGE_NONE);
> > +            orig_dst_pte = pte_mkuffd(orig_dst_pte);
> > +        }
> > +
> >          set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
> >
> > Could applying PAGE_NONE unconditionally to newly moved pages trap
> > applications in an infinite fault loop?
> 
> No -- the post-MOVE access is delivered as a normal RWP fault. In sync
> mode it goes to the registered handler, which resolves it with
> UFFDIO_RWPROTECT clearing MODE_RWP; in async mode the kernel resolves
> it in-kernel and the faulting thread continues. There is no loop.
> 
> The semantics here are intentional: a VM_UFFD_RWP VMA has the contract
> that every present PTE is either an active marker or a tracked-and-
> resolved PTE whose next access will re-trap. UFFDIO_MOVE into such a
> VMA must keep that contract, otherwise the moved-in page would be a
> silent hole in the working-set view. UFFDIO_MOVE has no mode flag for
> "skip protection", by design -- the same way it has no flag to skip
> WP arming if dst_vma were WP-armed (and the equivalent could be added
> there if we ever decide UFFDIO_MOVE should preserve markers in WP
> VMAs too).
> 

Oopsie. 

I put it in reply to the wrong patch. It was supposed to be for 06/14.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Thread overview: 20+ messages
2026-04-27 11:45 [PATCH 00/14] userfaultfd: working set tracking for VM guest memory Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 01/14] mm: decouple protnone helpers from CONFIG_NUMA_BALANCING Kiryl Shutsemau (Meta)
2026-04-30  4:47   ` SeongJae Park
2026-04-27 11:45 ` [PATCH 02/14] mm: rename uffd-wp PTE bit macros to uffd Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 03/14] mm: rename uffd-wp PTE accessors " Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 04/14] mm: add VM_UFFD_RWP VMA flag Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 05/14] mm: add MM_CP_UFFD_RWP change_protection() flag Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 06/14] mm: preserve RWP marker across PTE rewrites Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 07/14] mm: handle VM_UFFD_RWP in khugepaged, rmap, and GUP Kiryl Shutsemau (Meta)
2026-04-30 16:28   ` Kiryl Shutsemau
2026-04-30 16:31     ` Kiryl Shutsemau [this message]
2026-04-27 11:45 ` [PATCH 08/14] userfaultfd: add UFFDIO_REGISTER_MODE_RWP and UFFDIO_RWPROTECT plumbing Kiryl Shutsemau (Meta)
2026-04-30 16:46   ` Kiryl Shutsemau
2026-04-27 11:45 ` [PATCH 09/14] mm/userfaultfd: add RWP fault delivery and expose UFFDIO_REGISTER_MODE_RWP Kiryl Shutsemau (Meta)
2026-04-30 16:51   ` Kiryl Shutsemau
2026-04-27 11:45 ` [PATCH 10/14] mm/pagemap: add PAGE_IS_ACCESSED for RWP tracking Kiryl Shutsemau (Meta)
2026-04-27 11:45 ` [PATCH 11/14] userfaultfd: add UFFD_FEATURE_RWP_ASYNC for async fault resolution Kiryl Shutsemau (Meta)
2026-04-27 11:46 ` [PATCH 12/14] userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle Kiryl Shutsemau (Meta)
2026-04-27 11:46 ` [PATCH 13/14] selftests/mm: add userfaultfd RWP tests Kiryl Shutsemau (Meta)
2026-04-27 11:46 ` [PATCH 14/14] Documentation/userfaultfd: document RWP working set tracking Kiryl Shutsemau (Meta)
