From: Peter Xu <peterx@redhat.com>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Zi Yan <ziy@nvidia.com>, Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [RFC, PATCH 00/12] userfaultfd: working set tracking for VM guest memory
Date: Thu, 23 Apr 2026 14:57:34 -0400
Message-ID: <aeprnnccJeyHB2rt@x1.local>
In-Reply-To: <aeo5IPpQi7onyjTF@thinkstation>

On Thu, Apr 23, 2026 at 07:08:00PM +0100, Kiryl Shutsemau wrote:
> On Thu, Apr 23, 2026 at 10:50:06AM -0400, Peter Xu wrote:
> > Hello, Kiryl,
> > 
> > On Thu, Apr 23, 2026 at 03:27:11PM +0100, Kiryl Shutsemau wrote:
> > > The patchset is in pretty good shape in my eyes and I will probably drop
> > > the RFC tag.
> > 
> > I still have some high-level questions that haven't been answered yet.
> > Do you want to answer them?
> > 
> > https://lore.kernel.org/all/ad59TxAHNwFWH7Cc@x1.local/
> 
> Sorry, my reply to this got lost in my TODO list.

No worries.

> 
> > In summary, it's about:
> > 
> > - Whether we have explored other approaches to page hotness tracking
> 
> So, for read/write tracking we have clear_refs=1, page_idle and DAMON.
> Did I miss something?
> 
> clear_refs is a process-wide hammer. And you can miss a hot page if it
> races with LRU rotation.
> 
> page_idle needs rmap. It will not scale.

Yes. If you would benefit from a per-mm page_idle, it may apply to us too,
in case we are forced to implement a full userspace swap in QEMU.

That's also why I suggested (in my previous reply) that we split the
requirement in two: one part is hotness tracking, the other is
read-inclusive trapping (vs. write-protect-only traps).
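For context on why a per-mm variant keeps coming up: the existing page_idle
interface (/sys/kernel/mm/page_idle/bitmap) is indexed by physical frame
number, so a userspace tracker first has to translate each virtual address
through /proc/PID/pagemap, and the kernel then walks rmap for the pages it is
asked about. A minimal sketch of that address arithmetic, following the
pagemap and idle page tracking admin-guide documents; the helper names are
hypothetical and no files are actually opened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pagemap: one 64-bit entry per virtual page; bit 63 = present,
 * bits 0-54 = PFN (reads as 0 without CAP_SYS_ADMIN). */
#define PM_PRESENT  (UINT64_C(1) << 63)
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)

/* Byte offset of the pagemap entry describing a virtual address. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    assert(page_size != 0);
    return (vaddr / page_size) * sizeof(uint64_t);
}

/* Extract the PFN from a pagemap entry; false if the page is not present. */
static bool pagemap_pfn(uint64_t entry, uint64_t *pfn)
{
    if (!(entry & PM_PRESENT))
        return false;
    *pfn = entry & PM_PFN_MASK;
    return true;
}

/* page_idle bitmap: PFN i maps to bit i % 64 of the 8-byte word i / 64. */
static uint64_t idle_bitmap_offset(uint64_t pfn)
{
    return (pfn / 64) * sizeof(uint64_t);
}

static unsigned idle_bitmap_bit(uint64_t pfn)
{
    return (unsigned)(pfn % 64);
}
```

Every lookup goes virtual address, then PFN, then a global bitmap word, which
is the physical-memory view (and rmap walk) that a per-mm tracker would avoid.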

> 
> DAMON is built around sampling. It is good for working set estimation,
> but I don't think it is directly useful for eviction decision. It can
> miss hot pages. LRU rotation will also lose info.

Exactly.  If we need to collect the ACCESS bit (or anything similar) for
eviction accuracy, IIUC we need per-page info; we can't estimate it by
sampling.
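To make the per-page point concrete, a minimal C model of ACCESS-bit aging,
assuming a scanner that can test-and-clear an accessed bit for every page of
the mm (roughly what a per-mm idle page tracker would provide). The struct
and function names are hypothetical, and the bits are simulated in an array
rather than read from real page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated per-page state; 'accessed' stands in for the hardware ACCESS
 * bit that a real tracker would read and clear in the PTE. */
struct page_track {
    bool accessed;
    unsigned idle_scans;   /* consecutive scans that found the bit clear */
};

/* One scan over every page: test-and-clear the ACCESS bit, and age the
 * pages that were not touched since the previous scan. */
static void scan_all(struct page_track *pt, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pt[i].accessed) {
            pt[i].accessed = false;
            pt[i].idle_scans = 0;
        } else {
            pt[i].idle_scans++;
        }
    }
}

/* Eviction candidate: the page idle for the most scans (lowest index on ties). */
static size_t pick_victim(const struct page_track *pt, size_t n)
{
    assert(n > 0);
    size_t victim = 0;
    for (size_t i = 1; i < n; i++)
        if (pt[i].idle_scans > pt[victim].idle_scans)
            victim = i;
    return victim;
}
```

A sampling scheme would run the same test-and-clear on only a subset of pages
per interval, leaving the unchecked pages without their own idle count, which
is the gap for eviction decisions discussed above.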

> 
> None of them gives comparable capabilities.

I want to see whether some of your work can be generalized so that we can
use it too, and so that we can work on this together.

> 
> We also need a mechanism to atomically evict pages.

Yes, this is the 2nd question below, and btw uffd-wp can also achieve this.
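A minimal C model of a uffd-wp based eviction sequence, assuming the swapper
write-protects the page (UFFDIO_WRITEPROTECT), copies it out, then zaps it
(e.g. with MADV_DONTNEED), and serves the later refault from its saved copy
(e.g. via a MISSING-mode registration). No userfaultfd calls are made; the
states and function names are hypothetical stand-ins. It shows the property
in question: a racing write traps and cancels the eviction, so the saved copy
is never stale, while reads during the window go through untrapped.

```c
#include <assert.h>
#include <stdbool.h>

enum pg_state { PG_PRESENT, PG_WRPROTECTED, PG_EVICTED };

struct model_page {
    enum pg_state state;
    int data;                  /* stand-in for the page contents */
    int saved;                 /* copy held by the userspace swapper */
    unsigned unobserved_reads; /* counted only to show what the swapper misses */
    unsigned refaults;         /* accesses served back from 'saved' */
};

/* Swapper decides to evict: write-protect first, then copy the contents out. */
static void evict_begin(struct model_page *p)
{
    if (p->state != PG_PRESENT)
        return;
    p->state = PG_WRPROTECTED;
    p->saved = p->data;
}

/* Swapper finishes: zap the page only if no write raced with the copy. */
static bool evict_finish(struct model_page *p)
{
    if (p->state != PG_WRPROTECTED)
        return false;
    p->state = PG_EVICTED;
    p->data = 0;
    return true;
}

/* Any access to an evicted page faults and is resolved from the saved copy. */
static void refault_if_evicted(struct model_page *p)
{
    if (p->state == PG_EVICTED) {
        p->data = p->saved;
        p->state = PG_PRESENT;
        p->refaults++;
    }
}

static int guest_read(struct model_page *p)
{
    if (p->state == PG_WRPROTECTED)
        p->unobserved_reads++;  /* uffd-wp does not trap reads */
    refault_if_evicted(p);
    return p->data;
}

/* A write to a wr-protected page traps; the swapper cancels the eviction
 * (unprotects) before the write proceeds, so no update is lost. */
static void guest_write(struct model_page *p, int v)
{
    if (p->state == PG_WRPROTECTED)
        p->state = PG_PRESENT;
    refault_if_evicted(p);
    p->data = v;
}
```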

> 
> > - Whether read protection is required for a userspace swap system
> >   (e.g. did you get time to have a look at umap?)
> 
> I looked at it briefly, so I may have missed details.
> 
> IIUC, in the absence of read tracking it doesn't collect hotness information
> at all. The eviction is based on fault-in time: the oldest faulted-in
> page gets evicted first. I guess it is fine if you don't care much about
> refault cost. Like, if your workload fits into memory completely and
> refaults are rare.

For example, let's imagine we could have a per-mm idle page tracker: would
it work for you to collect hotness info?

The other idea is: no matter whether we use MGLRU or the legacy LRU, if we
could expose a better interface to share hotness info from the kernel to
userspace, would that be possible?

One thing to mention: if we have any of the hotness tracking facilities
above ready (e.g. per-mm idle page tracking), we _will_ trap reads too; it's
just that it'll be much faster (when it's the hardware ACCESS bit).

So, if I'm not wrong, the full userspace swap system I'm trying to discuss
will also trap reads in most cases.

The difference is only that 5ms (in the 30s+5ms example I gave in the
other email).  Your RW protection will also trap accesses during that 5ms;
what I described won't: once a decision is made, we wr-protect the page,
and any read on it still goes through untrapped, so it will trigger a
refault after eviction.  My point is that missing 5ms out of a 30s
sampling window (in reality maybe longer than 30s), which did cover read
accesses, isn't a major issue, and IMHO it's not a strong enough reason to
include the whole RW feature.
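Putting numbers on the 30s+5ms example, assuming it means an observation
window W = 30 s followed by a gap d = 5 ms between the eviction decision and
the actual eviction, and that accesses are spread roughly uniformly over
time, the share of reads a write-protect-only scheme could miss is about the
share of time spent in the gap:

```latex
f_{\mathrm{miss}} = \frac{d}{W + d}
  = \frac{5\ \mathrm{ms}}{30\,000\ \mathrm{ms} + 5\ \mathrm{ms}}
  = \frac{5}{30\,005} \approx 1.7 \times 10^{-4} \approx 0.017\%
```

Since f_miss decreases as W grows, a window longer than 30 s only makes it
smaller.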

The other thing is, as I mentioned in the other email, I still don't know
how the current RW protection would work for anonymous memory.  I don't yet
see how the userspace swapper can read the anon page through RW-protected
pgtables.  So far my understanding is that maybe you only care about shmem,
so it's fine, but it would be great to confirm with you.

Thanks,

> 
> That's not my case.
> 
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov
> 

-- 
Peter Xu

