From: Kiryl Shutsemau <kas@kernel.org>
To: Peter Xu <peterx@redhat.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Fri, 24 Apr 2026 12:37:35 +0100
Message-ID: <aetHWMZyEEIEzsJZ@thinkstation>
In-Reply-To: <aep8tsYFfr_Xe54q@x1.local>
On Thu, Apr 23, 2026 at 04:10:30PM -0400, Peter Xu wrote:
> On Thu, Apr 23, 2026 at 09:25:30PM +0200, David Hildenbrand (Arm) wrote:
> > >
> > > The other thing is, as I mentioned in the other email, I still don't know
> > > how the current RW protection would work for anonymous. I don't yet think
> > > the user swapper can read the anon page with RW-protected pgtables. So far
> > > my understanding is maybe you only care about shmem so it's fine, but it'll
> > > always be great to confirm with you.

That's true. We use vhost and therefore shmem in our setup.

One idea I had for making eviction atomic for anon is to extend
process_vm_readv() and process_madvise():

 - Add a flag to process_vm_readv() to bypass the protnone check on
   accessible (or only RWP?) VMAs.

 - Allow process_madvise(MADV_DONTNEED) when the caller already has
   ptrace write access to the target.

The standing objection to remote DONTNEED has been that it is
"destructive", but process_vm_writev() already lets a ptrace-capable
caller overwrite arbitrary anon with attacker-chosen content. DONTNEED
is strictly weaker: it zeroes, it does not inject. So the trust model
is already established.
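
To make the read-then-drop sequence concrete, here is a minimal sketch of
the non-atomic version, restricted to the caller's own mappings. It is not
the proposed interface: the flag for reading through protnone and remote
DONTNEED do not exist in this form, so the sketch uses plain
process_vm_readv() on the caller's own pid and a local
madvise(MADV_DONTNEED). The helper names evict_self_range() and
demo_evict_self() are made up for illustration, and the memcpy() fallback
is a demo-only convenience for sandboxes that filter process_vm_readv().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* process_vm_readv() is a GNU extension */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy [addr, addr + len) into 'out', then drop the
 * pages. Self-only stand-in for the remote read + remote DONTNEED pair
 * discussed in the mail.
 */
static int evict_self_range(void *addr, size_t len, void *out)
{
	struct iovec local  = { .iov_base = out,  .iov_len = len };
	struct iovec remote = { .iov_base = addr, .iov_len = len };
	ssize_t n;

	n = process_vm_readv(getpid(), &local, 1, &remote, 1, 0);
	if (n < 0 && (errno == EPERM || errno == ENOSYS)) {
		/* Demo-only fallback: for a self-read, memcpy() is equivalent. */
		memcpy(out, addr, len);
		n = (ssize_t)len;
	}
	if (n != (ssize_t)len)
		return -1;

	/*
	 * Not atomic: a write landing between the read above and the drop
	 * below is lost. Closing this window is what the proposal is about.
	 */
	return madvise(addr, len, MADV_DONTNEED);
}

/* Returns 0 if the copy holds the old data and the page reads back as zero. */
static int demo_evict_self(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p, *copy;
	int ok;

	assert(page > 0);
	p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	copy = malloc(page);
	if (!copy) {
		munmap(p, page);
		return -1;
	}

	memset(p, 0xAB, page);
	ok = evict_self_range(p, page, copy) == 0 &&
	     copy[0] == 0xAB && copy[page - 1] == 0xAB &&	/* saved contents */
	     p[0] == 0 && p[page - 1] == 0;			/* zero-filled after drop */

	free(copy);
	munmap(p, page);
	return ok ? 0 : -1;
}
```

Calling demo_evict_self() from main() and checking for 0 is enough to try
it. On private anonymous memory the page reads back zero-filled after the
drop, which is the "zeroes, does not inject" property argued above.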
> > I wonder if uffdio_move could be used for a swapper implementation instead?
I considered it. UFFDIO_MOVE can in principle relocate the cold folio
into a staging VMA inside the VMM, which then reads it and drops it.
The downside is the VMM has to maintain a second address range and
serialise eviction through it. A purpose-built primitive — something
like UFFDIO_EVICT that zaps the PTE and returns the folio contents
(optionally to an fd for io_uring) — seems cleaner.
> If RW is justified to be useful first, maybe.
>
> I had a gut feeling Kirill's use case doesn't use anon at all, then if
> nobody needs it we can still decide to not support anon.
>
> >
> > If we ever have to read from a protnone page, maybe we could teach ptrace access
> > to do it, or have something that can read from prot_none areas -- like
> > uffdio_copy, which can write to prot-none areas.
>
> Something like swap_access() in my proposal can also partly achieve that.
>
> https://lore.kernel.org/all/aYuad2k75iD9bnBE@x1.local/
A maccess()-style primitive that reads through PROT_NONE is a reasonable
building block and overlaps with part of what UFFDIO_EVICT would need.
> There, it was only about reading from swap so far, though. But that one
> might be easier to be extended to read PROT_NONE and directly put data into
> buffer user specified (ps: in my local tree impl I named it maccess() to
> pair with mincore(), but it doesn't really matter; it doesn't even need to
> be a syscall..).
>
> To me, the interfacing is not a major issue. The major question I have is
> why RW protection can help in swap system impl when we already have uffd-wp.
>
> So I want to make sure the use case can't be implemented by uffd-wp already.
> Because that's really what we might do for QEMU.
Race-free eviction can definitely be implemented with uffd-wp already.
But not proper working set discovery.
--
Kiryl Shutsemau / Kirill A. Shutemov