From: Kiryl Shutsemau <kas@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
Cc: akpm@linux-foundation.org, peterx@redhat.com, david@kernel.org,
	ljs@kernel.org, surenb@google.com, vbabka@kernel.org,
	Liam.Howlett@oracle.com, ziy@nvidia.com, corbet@lwn.net,
	skhan@linuxfoundation.org, seanjc@google.com,
	pbonzini@redhat.com, jthoughton@google.com, aarcange@redhat.com,
	sj@kernel.org, usama.arif@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, kvm@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v2 14/14] Documentation/userfaultfd: document RWP working set tracking
Date: Fri, 22 May 2026 13:37:51 +0100
Message-ID: <ahBMtyua-5v8GyxS@thinkstation>
In-Reply-To: <agQZiTUQNviaGIim@kernel.org>

On Wed, May 13, 2026 at 09:26:17AM +0300, Mike Rapoport wrote:
> On Fri, May 08, 2026 at 04:55:26PM +0100, Kiryl Shutsemau (Meta) wrote:
> > Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP:
> > 
> >   - sync and async fault models;
> >   - UFFDIO_RWPROTECT semantics;
> >   - UFFD_FEATURE_RWP_ASYNC;
> >   - UFFDIO_SET_MODE runtime mode flips.
> > 
> > It also covers typical VMM working-set-tracking workflow from detection
> > loop through sync-mode eviction and back to async.
> 
> We'd also need man page update at some point :)

Will add a patch for man-pages in v3.

> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Assisted-by: Claude:claude-opus-4-6
> > ---
> >  Documentation/admin-guide/mm/userfaultfd.rst | 226 ++++++++++++++++++-
> >  1 file changed, 220 insertions(+), 6 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> > index 1e533639fd50..5ac4ae3dff1b 100644
> > --- a/Documentation/admin-guide/mm/userfaultfd.rst
> > +++ b/Documentation/admin-guide/mm/userfaultfd.rst
> > @@ -275,16 +275,16 @@ tracking and it can be different in a few ways:
> >    - Dirty information will not get lost if the pte was zapped due to
> >      various reasons (e.g. during split of a shmem transparent huge page).
> >  
> > -  - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit
> > -    set; dirty when uffd-wp bit cleared), it has different semantics on
> > -    some of the memory operations.  For example: ``MADV_DONTNEED`` on
> > +  - Due to a reverted meaning of soft-dirty (page clean when the uffd bit
> > +    is set; dirty when the uffd bit is cleared), it has different semantics
> > +    on some of the memory operations.  For example: ``MADV_DONTNEED`` on
> >      anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
> > -    dirtying of memory by dropping uffd-wp bit during the procedure.
> > +    dirtying of memory by dropping the uffd bit during the procedure.
> >  
> >  The user app can collect the "written/dirty" status by looking up the
> > -uffd-wp bit for the pages being interested in /proc/pagemap.
> > +uffd bit for the pages being interested in /proc/pagemap.
> >  
> > -The page will not be under track of uffd-wp async mode until the page is
> > +The page will not be under track of userfaultfd-wp async mode until the page is
> >  explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
> >  flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set.  Trying to resolve a page fault
> >  that was tracked by async mode userfaultfd-wp is invalid.
> > @@ -307,6 +307,220 @@ transparent to the guest, we want that same address range to act as if it was
> >  still poisoned, even though it's on a new physical host which ostensibly
> >  doesn't have a memory error in the exact same spot.
> >  
> > +Read-Write Protection
> > +---------------------
> > +
> > +``UFFDIO_REGISTER_MODE_RWP`` enables read-write protection tracking on a
> > +memory range. It is similar to (but faster than) ``mprotect(PROT_NONE)``
> > +combined with a signal handler; unlike ``mprotect(PROT_NONE)``, RWP only
> > +traps accesses to *present* PTEs, so accesses to unpopulated addresses in a
> > +protected range fall through to the normal missing-page path. It uses the
> > +PROT_NONE hinting mechanism (same as NUMA balancing) to make pages
> > +inaccessible while keeping them resident in memory. Works on anonymous,
> > +shmem, and hugetlbfs memory.
> > +
> > +This is designed for VM memory managers that need to track the working set
> 
> This feature? Or RWP mode?

RWP.

> > +of guest memory for cold page eviction to tiered or remote storage.
> > +
> > +**Setup:**
> > +
> > +1. Open a userfaultfd and enable ``UFFD_FEATURE_RWP`` via ``UFFDIO_API``.
> > +   Optionally request ``UFFD_FEATURE_RWP_ASYNC`` as well — it requires
> > +   ``UFFD_FEATURE_RWP`` to be set in the same ``UFFDIO_API`` call.
> > +
> > +2. Register the guest memory range with ``UFFDIO_REGISTER_MODE_RWP``
> > +   (and ``UFFDIO_REGISTER_MODE_MISSING`` if evicted pages will need to be
> > +   fetched back from storage).
> > +
> > +**Feature availability:**
> > +
> > +RWP is built on top of two kernel primitives: a spare PTE bit owned by
> > +userfaultfd (``CONFIG_HAVE_ARCH_USERFAULTFD_WP``) and arch support for
> 
> Please spell out architecture.

Ack.

> > +present-but-inaccessible PTEs (``CONFIG_ARCH_HAS_PTE_PROTNONE``). When both
> > +are available on a 64-bit kernel, the build selects
> > +``CONFIG_USERFAULTFD_RWP=y`` and the ``VM_UFFD_RWP`` VMA flag becomes
> > +available.
> > +
> > +``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC`` are masked out of the
> > +features returned by ``UFFDIO_API`` when the running kernel or architecture
> > +cannot support them — for example 32-bit kernels (where ``VM_UFFD_RWP`` is
> > +unavailable), kernels built without ``CONFIG_USERFAULTFD_RWP``, and
> > +architectures whose ptes cannot carry the uffd bit at runtime (e.g. riscv
> > +without the ``SVRSW60T59B`` extension). ``UFFDIO_API`` does not fail;
> > +unsupported bits are simply absent from ``uffdio_api.features`` on return.
> > +VMMs should inspect the returned ``features`` after ``UFFDIO_API`` and fall
> 
> Lets s/VMM/Callers/.
> Although RWP is designed for VMMs, it's not limited to them and I expect
> other use-cases will be coming along.

Okay.
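
To illustrate the feature check described in the quoted paragraph, here is a
minimal sketch of how a caller might interpret the ``features`` value returned
by ``UFFDIO_API``. It is not taken from the series: the ``EX_`` bit values are
placeholders (the real ``UFFD_FEATURE_RWP`` and ``UFFD_FEATURE_RWP_ASYNC``
definitions would come from <linux/userfaultfd.h> once the patches land), and
``ex_pick_rwp_mode()`` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Placeholder bit positions, for illustration only. These are NOT the
 * values from the series; real code would use UFFD_FEATURE_RWP and
 * UFFD_FEATURE_RWP_ASYNC from <linux/userfaultfd.h>.
 */
#define EX_UFFD_FEATURE_RWP        (UINT64_C(1) << 0)
#define EX_UFFD_FEATURE_RWP_ASYNC  (UINT64_C(1) << 1)

/* Hypothetical result type: what the caller can rely on after UFFDIO_API. */
enum ex_rwp_mode {
	EX_RWP_NONE,   /* RWP bit absent: caller must use a fallback */
	EX_RWP_SYNC,   /* RWP present, async absent */
	EX_RWP_ASYNC,  /* both RWP and RWP_ASYNC present */
};

/*
 * Inspect the feature mask returned in uffdio_api.features. UFFDIO_API
 * does not fail for unsupported bits; they are simply missing, so the
 * caller has to test each bit explicitly. RWP_ASYNC is only meaningful
 * together with RWP, so an async bit without RWP is treated as "none".
 */
static enum ex_rwp_mode ex_pick_rwp_mode(uint64_t returned_features)
{
	if (!(returned_features & EX_UFFD_FEATURE_RWP))
		return EX_RWP_NONE;
	if (returned_features & EX_UFFD_FEATURE_RWP_ASYNC)
		return EX_RWP_ASYNC;
	return EX_RWP_SYNC;
}
```

A caller would pass ``uffdio_api.features`` to the helper right after the
``UFFDIO_API`` ioctl returns and pick its tracking strategy from the result.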


-- 
  Kiryl Shutsemau / Kirill A. Shutemov

