From: Peter Xu <peterx@redhat.com>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Lorenzo Stoakes <ljs@kernel.org>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Sean Christopherson <seanjc@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Fri, 24 Apr 2026 11:55:39 -0400
Message-ID: <aeuSe0PY-g10KNUF@x1.local>
In-Reply-To: <aetyhki-UD70dyRL@thinkstation>

On Fri, Apr 24, 2026 at 02:49:58PM +0100, Kiryl Shutsemau wrote:
> On Fri, Apr 24, 2026 at 07:51:44AM -0400, Peter Xu wrote:
> > On Fri, Apr 24, 2026 at 11:34:48AM +0100, Kiryl Shutsemau wrote:
> > > Both page_idle and the LRUs (legacy or MGLRU) track accesses on physical
> > > memory. We need visibility in the virtual address space domain.
> >
> > Yes they are, but ACCESS bit isn't.
>
> A-bit is not a reliable signal for userspace working-set tracking
> because the kernel itself is a concurrent consumer. It is exactly why
> page_idle needs PG_young on top of the A-bit: PG_young is the "kernel

I assume you meant PG_idle. I actually don't know whether PG_young is
still actively used anywhere in the current code base.

> ate the A-bit but the page was actually touched" escape hatch. And
> bringing PG_young into the picture puts us right back into physical-side
> tracking.
>
> > For migration, see e.g. remove_migration_pte() has:
> >
> > 	if (!softleaf_is_migration_young(entry))
> > 		pte = pte_mkold(pte);
>
> remove_migration_pte() only propagates young-at-unmap. It does not
> cover the common case: A-bit cleared by reclaim before migration
> started. The concurrent-consumer problem is what breaks the signal,
> not the migration boundary.

IMHO it's a separate problem, and AFAIU it was solved well with PG_idle, at
least for the old LRUs. It's just slightly unfortunate that it doesn't work
with MGLRU yet. Also, when the extra bit is in folio->flags, it only works
if both consumers report per-folio, not per-mm.

I'm actually curious whether there are numbers or solid proof showing that,
in your case, per-folio performance is already bad enough to justify a new
per-mm API like RWP. So far this proposal is still very much about "let's
implement a swap system"; it doesn't yet have much to show from the
hotness-tracking POV.

I'm not asking for a time-consuming test right away, but IMHO there should
really be solid evidence of the overhead of the current rmap approach in
production to justify this first.

For us, we know the overhead in theory, but we never really measured how
much it is.

Even if it is significant, I don't think it's unsolvable.

I want to explore whether there's something that can still be generic and
work for per-mm tracking. I believe that if we can have some bit in the
ptes, then when the mm reclaim code walks the ptes clearing the ACCESS bit
and sees that a vma is being tracked, instead of setting PG_idle it can just
move the access bit over to that special pte bit, only for this pte in this
vma. IIUC that would get the best of both worlds: the fast HW-accelerated
access bit, and no minor faults.

Would something like that be worth exploring?
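
To make the idea concrete, here is a minimal userspace model of it, for
illustration only. It is not kernel code: the bit positions, struct
vma_model, PTE_SOFT_YOUNG and all three helper names are hypothetical and
exist only in this sketch. It also assumes a spare software bit is available
in the pte, which is not a given on every architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPTES 4

/* Hypothetical bit layout for this model only, not a real architecture. */
#define PTE_ACCESSED   (1ULL << 5)   /* stands in for the HW access bit */
#define PTE_SOFT_YOUNG (1ULL << 58)  /* hypothetical "saved access" SW bit */

struct vma_model {
	bool tracked;          /* per-mm working-set tracking enabled here */
	uint64_t pte[NPTES];
};

/* Stands in for hardware setting the access bit on a memory access. */
static void touch(struct vma_model *vma, size_t i)
{
	assert(i < NPTES);
	vma->pte[i] |= PTE_ACCESSED;
}

/*
 * Reclaim side: test and clear the access bit. On a tracked vma the
 * access is preserved in the software bit of this pte only, so the
 * tracker does not lose it.
 */
static bool reclaim_test_and_clear_young(struct vma_model *vma, size_t i)
{
	assert(i < NPTES);
	bool young = (vma->pte[i] & PTE_ACCESSED) != 0;

	if (young) {
		vma->pte[i] &= ~PTE_ACCESSED;
		if (vma->tracked)
			vma->pte[i] |= PTE_SOFT_YOUNG;
	}
	return young;
}

/*
 * Tracker side: the pte counts as accessed if either bit is set.
 * Both bits are cleared to start a new sampling interval. Note that
 * clearing the access bit here hides the access from reclaim; this
 * sketch does not address that reverse direction.
 */
static bool tracker_test_and_clear_accessed(struct vma_model *vma, size_t i)
{
	assert(i < NPTES);
	bool accessed =
		(vma->pte[i] & (PTE_ACCESSED | PTE_SOFT_YOUNG)) != 0;

	vma->pte[i] &= ~(PTE_ACCESSED | PTE_SOFT_YOUNG);
	return accessed;
}
```

In this model, touching a pte in a tracked vma, letting the reclaim helper
scan it, and then calling the tracker helper still reports the pte as
accessed. With tracked set to false the same sequence reports nothing to the
tracker, which is the concurrent-consumer problem discussed above.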
--
Peter Xu