* Re: [PATCH v4 14/14] Documentation/userfaultfd: document RWP working set tracking
[not found] <20260525113737.1942478-15-kas@kernel.org>
@ 2026-05-25 12:24 ` sashiko-bot
2026-05-25 15:45 ` Kiryl Shutsemau
0 siblings, 1 reply; 2+ messages in thread
From: sashiko-bot @ 2026-05-25 12:24 UTC (permalink / raw)
Cc: kvm
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] The documented VMM working set tracking workflow disables async mode AFTER performing PAGEMAP_SCAN, creating a TOCTOU data loss race condition.
- [High] UFFDIO_SET_MODE relies solely on mmap_write_lock() to synchronize with in-flight faults, failing to exclude or drain per-VMA locked faults.
- [Medium] The example code uses an anonymous compound literal for the UFFDIO_API ioctl argument, making it impossible to inspect returned features.
- [Low] The example C code contains standard C compliance issues and an undefined variable in the `pwrite` call.
--
commit ae370f680f6102fabefea63e5943a13cca9daab8
Author: Kiryl Shutsemau (Meta) <kas@kernel.org>
Documentation/userfaultfd: document RWP working set tracking
Add an admin-guide section covering UFFDIO_REGISTER_MODE_RWP:
- sync and async fault models;
- UFFDIO_RWPROTECT semantics;
- UFFD_FEATURE_RWP_ASYNC;
- UFFDIO_SET_MODE runtime mode flips.
It also covers typical VMM working-set-tracking workflow from detection
loop through sync-mode eviction and back to async.
> diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
> index 1e533639fd509..cb5d0e0c9fff1 100644
> --- a/Documentation/admin-guide/mm/userfaultfd.rst
> +++ b/Documentation/admin-guide/mm/userfaultfd.rst
[ ... ]
> +**Runtime Mode Switching:**
> +
> +``UFFDIO_SET_MODE`` toggles ``UFFD_FEATURE_RWP_ASYNC`` at runtime, allowing
> +the VMM to switch between lightweight async detection and safe sync
> +eviction without re-registering. The toggle takes ``mmap_write_lock()`` to
> +ensure all in-flight faults complete before the mode change takes effect.
[Severity: High]
Does taking only mmap_write_lock() safely synchronize with in-flight faults
here?
mmap_write_lock() alone does not exclude or drain existing per-VMA read locks.
Since modern page faults (including userfaultfd async and sync faults)
execute under the per-VMA lock, UFFDIO_SET_MODE could return while async
faults are still resolving if the implementation only relies on
mmap_write_lock().
Would the implementation need to call vma_start_write(vma) to properly
drain per-VMA locked faults?
[ ... ]
> + uffd = userfaultfd(O_CLOEXEC | O_NONBLOCK);
> + ioctl(uffd, UFFDIO_API, &(struct uffdio_api){
> + .api = UFFD_API,
> + .features = UFFD_FEATURE_RWP | UFFD_FEATURE_RWP_ASYNC,
> + });
[Severity: Medium]
Since UFFDIO_API modifies the features field on return, how can the caller
inspect the returned features to fall back when RWP is unavailable?
Using an anonymous compound literal here means the struct lifetime expires
after the ioctl() statement, making it impossible to check the negotiated
features as recommended in the "Feature availability" section.
Should this use a named local variable instead?
> + ioctl(uffd, UFFDIO_REGISTER, &(struct uffdio_register){
> + .range = { guest_mem, guest_size },
> + .mode = UFFDIO_REGISTER_MODE_RWP |
> + UFFDIO_REGISTER_MODE_MISSING,
> + });
> +
> + /* Tracking loop */
> + while (vm_running) {
> + /* 1. Detection phase (async — no vCPU stalls) */
> + ioctl(uffd, UFFDIO_RWPROTECT, &(struct uffdio_rwprotect){
> + .range = full_range,
> + .mode = UFFDIO_RWPROTECT_MODE_RWP });
> + sleep(tracking_interval);
> +
> + /* 2. Find cold pages (uffd bit still set) */
> + ioctl(pagemap_fd, PAGEMAP_SCAN, &(struct pm_scan_arg){
> + .category_mask = PAGE_IS_ACCESSED,
> + .category_inverted = PAGE_IS_ACCESSED,
> + .return_mask = PAGE_IS_ACCESSED,
> + ...
> + });
> +
> + /* 3. Switch to sync for safe eviction */
> + ioctl(uffd, UFFDIO_SET_MODE,
> + &(struct uffdio_set_mode){
> + .disable = UFFD_FEATURE_RWP_ASYNC });
[Severity: Critical]
Does scanning for cold pages before switching to sync mode introduce a data
loss race condition?
If a vCPU accesses a "cold" page between PAGEMAP_SCAN and UFFDIO_SET_MODE,
the async fault handler will silently remove RWP protection and grant write
access without blocking.
The VMM would then evict this page (copying it with pwrite and dropping it
with fallocate), and any concurrent vCPU writes would be discarded by the
hole punch, leading to guest data corruption.
Should the VMM switch to sync mode before calling PAGEMAP_SCAN to safely
freeze the working set?
> +
> + /* 4. Evict cold pages (vCPU faults block on guest_mem) */
> + for each cold range:
> + /* Read from io_mem -- bypasses RWP, no fault. */
> + pwrite(storage_fd, io_mem + cold_offset, len, offset);
[Severity: Low]
Does this code compile cleanly?
The pwrite() call uses "offset" as its fourth argument, but the file offset
variable seems to be "cold_offset".
Also, io_mem is a void* from mmap(), and pointer arithmetic on void*
(io_mem + cold_offset) is a GCC extension rather than standard C.
Should io_mem be explicitly cast to a byte pointer, and offset be changed
to cold_offset?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260525113737.1942478-1-kas@kernel.org?part=14
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH v4 14/14] Documentation/userfaultfd: document RWP working set tracking
2026-05-25 12:24 ` [PATCH v4 14/14] Documentation/userfaultfd: document RWP working set tracking sashiko-bot
@ 2026-05-25 15:45 ` Kiryl Shutsemau
0 siblings, 0 replies; 2+ messages in thread
From: Kiryl Shutsemau @ 2026-05-25 15:45 UTC (permalink / raw)
To: sashiko-reviews
Cc: kvm, akpm, rppt, peterx, david, ljs, surenb, vbabka, Liam.Howlett,
ziy, corbet, skhan, seanjc, pbonzini, jthoughton, aarcange, sj,
usama.arif, linux-mm, linux-kernel, linux-doc, linux-kselftest,
kernel-team
On Mon, May 25, 2026 at 12:24:08PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
> - [Critical] The documented VMM working set tracking workflow disables
> async mode AFTER performing PAGEMAP_SCAN, creating a TOCTOU data loss
> race condition.
> - [High] UFFDIO_SET_MODE relies solely on mmap_write_lock() to
> synchronize with in-flight faults, failing to exclude or drain per-VMA
> locked faults.
> - [Medium] The example code uses an anonymous compound literal for the
> UFFDIO_API ioctl argument, making it impossible to inspect returned
> features.
> - [Low] The example C code contains standard C compliance issues and
> an undefined variable in the `pwrite` call.
All valid. Will address in v5.
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260525113737.1942478-1-kas@kernel.org?part=14
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-05-25 15:45 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260525113737.1942478-15-kas@kernel.org>
2026-05-25 12:24 ` [PATCH v4 14/14] Documentation/userfaultfd: document RWP working set tracking sashiko-bot
2026-05-25 15:45 ` Kiryl Shutsemau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox