From: Peter Xu <peterx@redhat.com>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Fri, 24 Apr 2026 08:59:57 -0400
Message-ID: <aetpTUc9ojwF6Is7@x1.local>
In-Reply-To: <aetHWMZyEEIEzsJZ@thinkstation>

On Fri, Apr 24, 2026 at 12:37:35PM +0100, Kiryl Shutsemau wrote:
> On Thu, Apr 23, 2026 at 04:10:30PM -0400, Peter Xu wrote:
> > On Thu, Apr 23, 2026 at 09:25:30PM +0200, David Hildenbrand (Arm) wrote:
> > > >
> > > > The other thing is, as I mentioned in the other email, I still don't know
> > > > how the current RW protection would work for anonymous. I don't yet think
> > > > the user swapper can read the anon page with RW-protected pgtables. So far
> > > > my understanding is maybe you only care about shmem so it's fine, but it'll
> > > > always be great to confirm with you.
>
>
> That's true. We use vhost and therefore shmem in our setup.

I see, thanks for confirming.

Side note: I believe vhost works for anon too, since GUP works for anon, but
it doesn't matter as long as we know anon isn't a must.

>
> One idea I had about how to make atomic eviction for anon is extending
> process_vm_readv() and process_madvise():
>
> - Add a flag to process_vm_readv() to bypass the protnone check on
>   accessible (or only RWP?) VMAs.
>
> - Allow process_madvise(MADV_DONTNEED) when the caller already has
>   ptrace write access to the target.
>
> The standing objection to remote DONTNEED has been "destructive", but
> process_vm_writev() already lets a ptrace-capable caller overwrite
> arbitrary anon with attacker-chosen content. DONTNEED is strictly
> weaker — it zeroes, it does not inject — so the trust model is already
> established.
>
> > > I wonder if uffdio_move could be used for a swapper implementation instead?
>
> I considered it. UFFDIO_MOVE can in principle relocate the cold folio
> into a staging VMA inside the VMM, which then reads it and drops it.
> The downside is the VMM has to maintain a second address range and
> serialise eviction through it. A purpose-built primitive — something
> like UFFDIO_EVICT that zaps the PTE and returns the folio contents
> (optionally to an fd for io_uring) — seems cleaner.

Right. The other thing is the unnecessary overhead of the extra pgtable
operations when moving to the staging VMA (e.g. TLB flushes).

>
>
> > If RW is justified to be useful first, maybe.
> >
> > I had a gut feeling Kirill's use case doesn't use anon at all, then if
> > nobody needs it we can still decide to not support anon.
> >
> > >
> > > If we ever have to read from a protnone page, maybe we could teach ptrace access
> > > to do it, or have something that can read from prot_none areas -- like
> > > uffdio_copy, which can write to prot-none areas.
> >
> > Something like swap_access() in my proposal can also partly achieve that.
> >
> > https://lore.kernel.org/all/aYuad2k75iD9bnBE@x1.local/
>
> A maccess()-style primitive that reads through PROT_NONE is a reasonable
> building block and overlaps with part of what UFFDIO_EVICT would need.
>
> > There, it was only about reading from swap so far, though. But that one
> > might be easier to extend to read PROT_NONE and put the data directly into
> > a user-specified buffer (ps: in my local tree impl I named it maccess() to
> > pair with mincore(), but it doesn't really matter; it doesn't even need to
> > be a syscall..).
> >
> > To me, the interfacing is not a major issue. The major question I have is
> > why RW protection can help in swap system impl when we already have uffd-wp.
> >
> > So I want to make sure the use case can't be implemented by uffd-wp already.
> > Because that's really what we might do for QEMU.
>
> Race-free eviction can definitely be implemented with uffd-wp already.
> But not proper working set discovery.

Good. Then we can focus the discussion on hotness tracking with RWP and
its benefits, and compare it with a purely access-bit-based tracking system
(as I mentioned in the other reply).

Thanks,

-- 
Peter Xu