Kernel KVM virtualization development
From: sashiko-bot@kernel.org
To: "Kiryl Shutsemau" <kirill@shutemov.name>
Cc: kvm@vger.kernel.org
Subject: Re: [PATCH v5 18/18] Documentation/userfaultfd: document RWP working set tracking
Date: Tue, 26 May 2026 14:51:14 +0000
Message-ID: <20260526145115.12F551F000E9@smtp.kernel.org>
In-Reply-To: <20260526130509.2748441-19-kirill@shutemov.name>

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [Medium] PAGEMAP_SCAN example matches unpopulated pages as "cold", leading to an infinite loop of evicting memory holes.
- [Medium] The documentation falsely claims that UFFDIO_API does not fail when unsupported features are requested.
--

commit a8b39975821d1562e1c0e6bd26ea6ff5f6e32ba6
Author: Kiryl Shutsemau (Meta) <kas@kernel.org>

Documentation/userfaultfd: document RWP working set tracking

This patch adds an admin-guide section covering UFFDIO_REGISTER_MODE_RWP.
It details sync and async fault models, UFFDIO_RWPROTECT semantics, and
covers a typical VMM working-set-tracking workflow.

> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> index 1e533639fd509..1db99b5355f71 100644
> --- a/Documentation/admin-guide/mm/userfaultfd.rst
> +++ b/Documentation/admin-guide/mm/userfaultfd.rst

[ ... ]

> +Feature availability:
> +
> +RWP is built on top of two kernel primitives: a spare PTE bit owned by
> +userfaultfd (CONFIG_HAVE_ARCH_USERFAULTFD_WP) and architecture support
> +for present-but-inaccessible PTEs (CONFIG_ARCH_HAS_PTE_PROTNONE). When both
> +are available on a 64-bit kernel, the build selects
> +CONFIG_USERFAULTFD_RWP=y and the VM_UFFD_RWP VMA flag becomes
> +available.
> +
> +UFFD_FEATURE_RWP and UFFD_FEATURE_RWP_ASYNC are masked out of the
> +features returned by UFFDIO_API when the running kernel or architecture
> +cannot support them — for example 32-bit kernels (where VM_UFFD_RWP is
> +unavailable), kernels built without CONFIG_USERFAULTFD_RWP, and
> +architectures whose ptes cannot carry the uffd bit at runtime (e.g. riscv
> +without the SVRSW60T59B extension). UFFDIO_API does not fail;
> +unsupported bits are simply absent from uffdio_api.features on return.

[Severity: Medium]
Is this description of UFFDIO_API behavior accurate?

Looking at userfaultfd_api() in mm/userfaultfd.c, the kernel validates
the requested features against what is supported, and branches to an
error path returning -EINVAL if unsupported features are requested:

	if (features & ~uffdio_api.features)
		goto err_out;

Because the ioctl fails, the userfaultfd context remains uninitialized,
and any subsequent ioctls on that fd (such as UFFDIO_REGISTER) will fail.
Should the documentation advise users to first query available features
with a zero-feature UFFDIO_API call rather than assuming it will not fail?
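
For illustration, a minimal sketch of that two-step handshake. It assumes
a context accepts only one successful UFFDIO_API call, so the probe fd is
closed and a second fd is opened for the real request. The helper names
(usable_features, api_would_fail, uffd_open_negotiated) are hypothetical
and not part of the patch:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Bits from `wanted` that the kernel also reported as supported. */
uint64_t usable_features(uint64_t wanted, uint64_t supported)
{
	return wanted & supported;
}

/* Models the quoted kernel check: any unsupported bit fails the call. */
int api_would_fail(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) != 0;
}

/*
 * Hypothetical helper: returns an initialized userfaultfd, or -1 on error.
 * On success, *enabled holds the subset of `wanted` that was requested.
 */
int uffd_open_negotiated(uint64_t wanted, uint64_t *enabled)
{
	assert(enabled != NULL);

	/* Step 1: probe. features = 0 always passes the subset check. */
	int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (fd < 0)
		return -1;
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		close(fd);
		return -1;
	}
	uint64_t supported = api.features;
	close(fd);

	/* Step 2: fresh fd, request only what the probe reported. */
	fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (fd < 0)
		return -1;
	uint64_t request = usable_features(wanted, supported);
	api = (struct uffdio_api){ .api = UFFD_API, .features = request };
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		close(fd);
		return -1;
	}
	*enabled = request;
	return fd;
}
```

A caller would pass UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC (from the
patched uapi header) as `wanted` and inspect *enabled to decide whether
RWP tracking can be used at all.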

[ ... ]

> +Cold Page Detection with PAGEMAP_SCAN:
> +
> +RWP-protected PTEs carry the uffd PTE bit; the fault-resolution path
> +clears it. PAGEMAP_SCAN reports PAGE_IS_ACCESSED once the bit is
> +clear on a VM_UFFD_RWP VMA, so inverting it efficiently reports the
> +still-protected (cold) pages::
> +
> +    struct pm_scan_arg arg = {
> +        .size = sizeof(arg),
> +        .start = guest_mem_start,
> +        .end = guest_mem_end,
> +        .vec = (uint64_t)regions,
> +        .vec_len = regions_len,
> +        .category_mask = PAGE_IS_ACCESSED,
> +        .category_inverted = PAGE_IS_ACCESSED,
> +        .return_mask = PAGE_IS_ACCESSED,
> +    };

[Severity: Medium]
Will this configuration also match unpopulated pages (memory holes) as
"cold" regions?

Since pagemap_page_category() returns 0 for unpopulated pages,
pagemap_scan_is_interesting_page() evaluates
(0 ^ PAGE_IS_ACCESSED) & PAGE_IS_ACCESSED, which is true, so the hole is
reported as a match.

Later in the documentation, the VMM loop example uses this same mask:

[ ... ]

> +        /* 3. Find cold pages (uffd bit still set) */
> +        ioctl(pagemap_fd, PAGEMAP_SCAN, &(struct pm_scan_arg){
> +            .category_mask = PAGE_IS_ACCESSED,
> +            .category_inverted = PAGE_IS_ACCESSED,
> +            .return_mask = PAGE_IS_ACCESSED,
> +            ...
> +        });
> +
> +        /* 4. Evict cold pages (vCPU faults block on guest_mem) */
> +        for each cold range:
> +            /* Read from io_mem -- bypasses RWP, no fault. */
> +            pwrite(storage_fd, (char *)io_mem + cold_offset,
> +                   len, cold_offset);
> +            /* Drop the page from the shared file. */
> +            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                      cold_offset, len);

If unpopulated ranges are included in the results, does this create an
infinite loop where the VMM continually reads zeroes from io_mem (allocating
new zero pages), writes them to storage, and calls fallocate() on an existing
hole? 

Since a hole never acquires the PAGE_IS_ACCESSED bit, the next scan would
find the exact same ranges. Should PAGE_IS_PRESENT be included in
category_mask (but not category_inverted) to safely skip memory holes?
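
To make the difference concrete, here is a small userspace model of the
match test. It is a sketch, not the kernel code: the DEMO_* bit values are
placeholders rather than the uapi values, demo_is_interesting() is a
hypothetical stand-in for pagemap_scan_is_interesting_page() without the
anyof mask, and it assumes a populated RWP-protected page still reports
PAGE_IS_PRESENT:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for illustration; real values live in <linux/fs.h>. */
#define DEMO_PAGE_IS_PRESENT  (UINT64_C(1) << 0)
#define DEMO_PAGE_IS_ACCESSED (UINT64_C(1) << 1)

/* Flip the inverted bits, then require every bit in mask to be set. */
bool demo_is_interesting(uint64_t categories, uint64_t mask,
			 uint64_t inverted)
{
	categories ^= inverted;
	return (categories & mask) == mask;
}

void demo_check(void)
{
	const uint64_t hole = 0;
	const uint64_t cold = DEMO_PAGE_IS_PRESENT;
	const uint64_t hot  = DEMO_PAGE_IS_PRESENT | DEMO_PAGE_IS_ACCESSED;

	/* Documented config: mask = ACCESSED, inverted = ACCESSED. */
	uint64_t mask = DEMO_PAGE_IS_ACCESSED;
	uint64_t inv  = DEMO_PAGE_IS_ACCESSED;
	assert(demo_is_interesting(hole, mask, inv));	/* hole matches */
	assert(demo_is_interesting(cold, mask, inv));
	assert(!demo_is_interesting(hot, mask, inv));

	/* Suggested config: PRESENT added to mask, not to inverted. */
	mask = DEMO_PAGE_IS_PRESENT | DEMO_PAGE_IS_ACCESSED;
	assert(!demo_is_interesting(hole, mask, inv));	/* hole skipped */
	assert(demo_is_interesting(cold, mask, inv));
	assert(!demo_is_interesting(hot, mask, inv));
}
```

Under these assumptions only the hole changes outcome between the two
configurations; populated cold and hot pages are classified the same way.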

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260526130509.2748441-1-kirill@shutemov.name?part=18
