Re: [RFC PATCH 0/2] mm: memfd with write notifications

Linux-mm Archive on lore.kernel.org

From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Mattias Nissler <mattias.nissler@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>,
	Mattias Nissler <mnissler@meta.com>,
	linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
	"Lorenzo Stoakes (Oracle)" <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [RFC PATCH 0/2] mm: memfd with write notifications
Date: Wed, 17 Jun 2026 11:14:39 +0200
Message-ID: <7e77c739-a279-4875-aa2a-e68430aa7404@kernel.org>
In-Reply-To: <CAERLvmSc6Z9JN-Ufc6_NJvuk4S8oLD-HfDdWS2k1JNo90vhjQQ@mail.gmail.com>

On 6/16/26 13:32, Mattias Nissler wrote:
> On Tue, Jun 16, 2026 at 10:20 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 6/16/26 10:02, Mattias Nissler wrote:
>>> On Mon, Jun 15, 2026 at 7:43 PM David Hildenbrand (Arm)
>>> <david@kernel.org> wrote:
>>>
>>> Thanks for bringing up that context. So this is mostly dirty page
>>> tracking? I reckon you'll need a way to learn which pages have been
>>> written, rather than just a notification if any page in the region
>>> gets hit?
>>
>> Yes, for QEMU to use it as replacement for uffd targeted at multi-process setups
>> (i.e., vhost-user), I think you'd need something similar like uffd, but on the
>> file level, and less uffd-like :)
> 
> This was probably a bit tongue-in-cheek, but if I may ask, what
> aspects of uffd are problematic?

Heh, there are several.

Let's ignore the implementation-wise issues that people keep complaining
about. Two (related) problems I am aware of:

1) User-space handling

Right now you always need a user-space handler that polls for events and handles
them. For each event, you have to context-switch to user space, sometimes a
couple of times. Scalability problems (many threads faulting at the same time,
in-kernel locking) were raised in the past.

2) Blocking nature

If you look into the history of UFFD_USER_MODE_ONLY and
/proc/sys/vm/unprivileged_userfaultfd, the problem is that userfaultfd can block
in various places, some of which can potentially hurt the kernel.

We inherently rely on user space to make progress.
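To make the two problems above concrete, here is a minimal Python sketch. It is a toy in-process model, not the real userfaultfd(2) interface; MissingPageRegion, handler_loop and run are hypothetical names. A faulting thread queues an event and blocks until a separate handler thread (standing in for the user-space process polling the uffd) places the page, so every fault costs a round trip through the handler. Without a running handler, the faulting thread never makes progress; a timeout stands in for "blocked forever".

```python
import queue
import threading

PAGE_SIZE = 4096

class MissingPageRegion:
    """Toy model (hypothetical, not the kernel API) of a region whose pages
    are populated on first access by a user-space handler."""

    def __init__(self, nr_pages):
        self.pages = [None] * nr_pages
        self.events = queue.Queue()      # stands in for the fd the handler polls
        self.lock = threading.Lock()
        self.resolved = [threading.Event() for _ in range(nr_pages)]
        self.handled = 0                 # number of fault events processed

    def access(self, idx, timeout=None):
        """Called by a 'faulting' thread; blocks until the page is present."""
        with self.lock:
            present = self.pages[idx] is not None
        if not present:
            self.events.put(idx)         # report the fault to user space
            # The faulting thread cannot continue until the handler resolves it.
            if not self.resolved[idx].wait(timeout):
                raise TimeoutError(f"page {idx} was never resolved")
        return self.pages[idx]

def handler_loop(region, fetch, stop):
    """User-space handler: wait for an event, place the page, wake the waiter."""
    while not stop.is_set():
        try:
            idx = region.events.get(timeout=0.05)
        except queue.Empty:
            continue
        with region.lock:
            if region.pages[idx] is None:    # another event may have filled it
                region.pages[idx] = fetch(idx)
        region.resolved[idx].set()
        region.handled += 1

def run(nr_pages=8, with_handler=True, timeout=1.0):
    region = MissingPageRegion(nr_pages)
    stop = threading.Event()
    fetch = lambda i: bytes([i % 256]) * PAGE_SIZE   # e.g. data from a migration source
    handler = threading.Thread(target=handler_loop, args=(region, fetch, stop))
    if with_handler:
        handler.start()
    results, errors = {}, []

    def worker(i):
        try:
            results[i] = region.access(i, timeout)
        except TimeoutError as exc:
            errors.append(exc)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(nr_pages)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    stop.set()
    if with_handler:
        handler.join()
    return region, results, errors
```

With a handler, each of the faults is resolved through exactly one handler round trip; with run(4, False, 0.1) all four faulting threads stay stuck until their timeouts expire.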

Using BPF[1] could avoid both problems in some scenarios, but there are
certainly use cases where you would still have to block.

[1] https://dl.acm.org/doi/10.1145/3672197.3673432

> Obviously it being process-scoped is
> a major mismatch for your use case, but are there any other challenges
> you have in mind?

Right, that's a conceptual thing: userfaultfd protects VMA ranges, not file
ranges. To emulate protecting file ranges, you have to protect all mmaps of the
file in all involved processes.

Obviously, accesses via read() or write() instead of mmap() cannot be handled by
userfaultfd.
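The mismatch can be sketched in Python under simplifying assumptions: each shared mapping of the file is described by a hypothetical Mapping record holding the pid, the VMA bounds and the file page mapped at vm_start (roughly the information in a /proc/<pid>/maps line). Protecting one file range then means computing, and separately protecting, the virtual address range in every process that maps it; accesses through read()/write() never appear in this list at all.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096

@dataclass(frozen=True)
class Mapping:
    """One shared mapping of the file in some process (hypothetical record)."""
    pid: int
    vm_start: int    # first virtual address of the VMA
    vm_end: int      # exclusive end address of the VMA
    pgoff: int       # file page mapped at vm_start

def vaddr_ranges_for_file_range(mappings, start_pgoff, nr_pages):
    """Return (pid, start, end) for every VA range that maps file pages
    [start_pgoff, start_pgoff + nr_pages). With VMA-based protection, each
    of these would have to be protected on its own, in its own process."""
    end_pgoff = start_pgoff + nr_pages
    ranges = []
    for m in mappings:
        first = max(start_pgoff, m.pgoff)
        last = min(end_pgoff, m.pgoff + (m.vm_end - m.vm_start) // PAGE_SIZE)
        if first >= last:
            continue  # this mapping does not cover any page of the range
        ranges.append((m.pid,
                       m.vm_start + (first - m.pgoff) * PAGE_SIZE,
                       m.vm_start + (last - m.pgoff) * PAGE_SIZE))
    return ranges
```

For example, with one process mapping file pages 0-15 and another mapping only pages 8-11, protecting file pages 4-9 yields two separate address ranges in two processes.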

> 
>>
>> But in general, protecting/unprotecting file ranges (read-only, read-write),
>> notifications on access, notifications when filling holes etc.
> 
> Do you need notifications to be synchronous (stopping the faulting
> process) or is asynchronous (firing a notification and
> write-unprotecting automatically) sufficient?

Most use cases I am aware of need to be synchronous. Only some could be relaxed
to asynchronous handling.

With postcopy live-migration, you really have to place the page with the right
content before the faulting thread can continue running. You cannot just place
zero-filled pages. Similarly with CRIU.

With VM background snapshots, you really have to save away page content before
un-protecting and modifying the page.
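The difference between synchronous and asynchronous handling for the background-snapshot case can be illustrated with a minimal Python sketch, assuming a hypothetical ProtectedFile model (not an existing kernel interface). In synchronous mode the write is held until the handler has copied the old page content; in asynchronous mode the page is unprotected automatically and only an event is recorded, so by the time the event is consumed the old content is already gone.

```python
class ProtectedFile:
    """Hypothetical model of a file whose pages can be write-protected."""

    def __init__(self, pages, synchronous=True, on_write_fault=None):
        self.pages = list(pages)
        self.protected = set()
        self.synchronous = synchronous
        self.on_write_fault = on_write_fault
        self.pending = []                 # asynchronous notifications

    def protect_all(self):
        self.protected = set(range(len(self.pages)))

    def write(self, idx, data):
        if idx in self.protected:
            if self.synchronous:
                # The writer is stopped until the handler returns; the handler
                # still sees the old content of the page.
                self.on_write_fault(self, idx)
            else:
                # Asynchronous: just record the event and let the write proceed.
                self.pending.append(idx)
            self.protected.discard(idx)   # write-unprotect the page
        self.pages[idx] = data

def background_snapshot(f, guest_writes):
    """Snapshot f as of protect_all() while the 'guest' keeps writing."""
    saved = {}

    def save_old_content(file, idx):
        saved[idx] = file.pages[idx]      # copy before the page gets modified

    f.on_write_fault = save_old_content
    f.protect_all()
    for idx, data in guest_writes:
        f.write(idx, data)
    # Pages that were never written still hold their original content.
    return [saved.get(i, f.pages[i]) for i in range(len(f.pages))]
```

With synchronous=False the same routine returns the already-modified content for every written page: the notification arrives, but the data the snapshot needed is lost.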

For electric fences[2], it might be sufficient to just detect "wrong page
accessed" asynchronously. But it doesn't really work on files.

Protecting unplugged virtio-mem memory in VMs from re-access (sparse memory
regions where some parts should not be accessed by the VM) could likely get away
with asynchronous handling, but, similar to electric fences, synchronous events
might make it easier to debug what actually did something wrong.

I assume garbage collection similarly requires synchronous notifications (but I
would suspect that this is usually anonymous memory).

There are some upcoming use cases around working-set tracking [3]. IIRC,
asynchronous handling is usually fine there (or even no notification at all,
instead inspecting the accessed state).
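For the working-set case, here is a minimal Python sketch of inspecting the accessed state instead of receiving notifications. It assumes a hypothetical AccessTracker with one accessed bit per page, similar in spirit to the kernel's idle page tracking but not its interface: an access only sets a bit and never blocks, and a periodic scan reads and clears the bits to estimate which pages were recently used.

```python
class AccessTracker:
    """Hypothetical model of working-set estimation via accessed bits."""

    def __init__(self, nr_pages):
        self.accessed = [False] * nr_pages
        self.idle_scans = [0] * nr_pages   # consecutive scans without access

    def touch(self, idx):
        # Set on access (by "hardware"); no notification, nobody blocks.
        self.accessed[idx] = True

    def scan(self):
        # Periodic, asynchronous inspection: read and clear every bit.
        for i, was_accessed in enumerate(self.accessed):
            self.idle_scans[i] = 0 if was_accessed else self.idle_scans[i] + 1
            self.accessed[i] = False

    def working_set(self, max_idle=1):
        # Pages accessed within the last max_idle scan intervals.
        return [i for i, n in enumerate(self.idle_scans) if n < max_idle]
```

The accessed pages only need to be observed eventually, so nothing here has to stop the accessing thread.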

[2] https://gitlab.com/efency/efency
[3] https://lore.kernel.org/r/20260529172716.357179-1-kas@kernel.org

> 
>>
>>> uffd gives you that of course, but at the cost of the kernel
>>> having to perform more bookkeeping. Trying to wrap my head around what
>>> a good trade-off for complexity / user space API could be.
>>
>> Yes. For a simple doorbell mechanism (IIUC your proposal correctly), it feels
>> rather odd to embed it like that in memfd.
> 
> So what you have in mind would operate at the file level, but work for
> any kind of file?

VMs with file-backed memory usually rely on shmem/hugetlb/memfd (+guest_memfd in
the near future). Other file systems are uncommon.

For CRIU, I could imagine that other file systems could be reasonable, but I
don't know enough about how they handle files.

> 
> Btw. thanks for bringing a different perspective to this conversation
> to help explore the design space, this is exactly what I was hoping
> for.

Sure! I guess my main point is: most use cases I am aware of would need
synchronous handling (and a way to fix up the fault, as userfaultfd provides).
For a simple doorbell, this might not really be what you want.

-- 
Cheers,

David


Thread overview: 7+ messages
2026-06-03 12:55 [RFC PATCH 0/2] mm: memfd with write notifications Mattias Nissler
2026-06-03 12:55 ` [RFC PATCH 1/2] mm: `memfd_tripwire` proof-of-concept Mattias Nissler
2026-06-03 12:55 ` [RFC PATCH 2/2] selftests: `memfd_tripwire` selftest Mattias Nissler
2026-06-11  1:36 ` [RFC PATCH 0/2] mm: memfd with write notifications Baolin Wang
2026-06-11 12:40   ` Mattias Nissler
     [not found]   ` <40381f8a-47e3-4f97-a9ad-f6f868fe0392@kernel.org>
     [not found]     ` <CAERLvmQyOAvCN971uUx1PDqTXExOv-BHbNgo-oByaHavUmLgfw@mail.gmail.com>
     [not found]       ` <ee858321-7407-423a-adca-caab5ad9e2b8@kernel.org>
2026-06-16 11:32         ` Mattias Nissler
2026-06-17  9:14           ` David Hildenbrand (Arm) [this message]
