linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joanne Koong <joannelkoong@gmail.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: miklos@szeredi.hu, axboe@kernel.dk, bschubert@ddn.com,
	 asml.silence@gmail.com, io-uring@vger.kernel.org,
	xiaobing.li@samsung.com,  linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 07/30] io_uring/rsrc: add fixed buffer table pinning/unpinning
Date: Sat, 13 Dec 2025 15:07:57 +0900	[thread overview]
Message-ID: <CAJnrk1ZorOeUkUkCsZ6xYj8CKFPEa5WbACTNgm7_kZ4LLvLoCA@mail.gmail.com> (raw)
In-Reply-To: <CADUfDZqmj0X0e+DzFG-8W7cOkjj1emNzovitCL4E8SyLHmR4Ag@mail.gmail.com>

On Wed, Dec 10, 2025 at 12:35 PM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Thu, Dec 4, 2025 at 12:07 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 5:24 PM Caleb Sander Mateos
> > <csander@purestorage.com> wrote:
> > >
> > > On Wed, Dec 3, 2025 at 2:52 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > On Tue, Dec 2, 2025 at 8:49 PM Caleb Sander Mateos
> > > > <csander@purestorage.com> wrote:
> > > > >
> > > > > On Tue, Dec 2, 2025 at 4:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > >
> > > > > > Add kernel APIs to pin and unpin the buffer table for fixed buffers,
> > > > > > preventing userspace from unregistering or updating the fixed buffers
> > > > > > table while it is pinned by the kernel.

After discussing this a bit with Jens at lpc, he's not a fan of the
pinning idea for fixed buffers, so for v2 I am going to abandon this
and instead just do the lookup every time.

This ends up making the discussion below irrelevant, but adding my
comments below anyways for completeness.

> > > > > >
> > > > > > This has two advantages:
> > > > > > a) Eliminating the overhead of having to fetch and construct an iter for
> > > > > > a fixed buffer per every cmd. Instead, the caller can pin the buffer
> > > > > > table, fetch/construct the iter once, and use that across cmds for
> > > > > > however long it needs to until it is ready to unpin the buffer table.
> > > > > >
> > > > > > b) Allowing a fixed buffer lookup at any index. The buffer table must be
> > > > > > pinned in order to allow this, otherwise we would have to keep track of
> > > > > > all the nodes that have been looked up by the io_kiocb so that we can
> > > > > > properly adjust the refcounts for those nodes. Ensuring that the buffer
> > > > > > table must first be pinned before being able to fetch a buffer at any
> > > > > > index makes things logistically a lot neater.
> > > > >
> > > > > Why is it necessary to pin the entire buffer table rather than
> > > > > specific entries? That's the purpose of the existing io_rsrc_node refs
> > > > > field.
> > > >
> > > > How would this work with userspace buffer unregistration (which works
> > > > at the table level)? If buffer unregistration should still succeed
> > > > then fuse would need a way to be notified that the buffer has been
> > > > unregistered since the buffer belongs to userspace (eg it would be
> > > > wrong if fuse continues using it even though fuse retains a refcount
> > > > on it). If buffer unregistration should fail, then we would need to
> > > > track this pinned state inside the node instead of relying just on the
> > > > refs field, as buffers can be unregistered even if there are in-flight
> > > > refs (eg we would need to differentiate the ref being from a pin vs
> > > > from not a pin), and I think this would make unregistration more
> > > > cumbersome as well (eg we would have to iterate through all the
> > > > entries looking to see if any are pinned before iterating through them
> > > > again to do the actual unregistration).
> > >
> > > Not sure I would say buffer unregistration operates on the table as a
> > > whole. Each registered buffer node is unregistered individually and
> >
> > I'm looking at the liburing interface for it and I'm only seeing
> > io_uring_unregister_buffers() / IORING_UNREGISTER_BUFFERS which works
> > on the entire table, so I'm wondering how that interface would work if
> > pinning/unpinning was at the entry level?
>
> IORING_REGISTER_BUFFERS_UPDATE can be used to update individual
> registered buffers. For each updated slot, if there is an existing
> buffer, it will be unregistered (decrementing the buffer node's
> reference count). A new buffer may or may not be registered in its
> place; it can be skipped by specifying a struct iovec with a NULL
> iov_base.
>

I think we would still need to account for the
IORING_UNREGISTER_BUFFERS path since that is callable by users.

> >
> > > stores its own reference count. io_put_rsrc_node() will be called on
> > > each buffer node in the table. However, io_put_rsrc_node() just
> > > removes the one reference from the buffer node. If there are other
> > > references on the buffer node (such as an inflight io_uring request
> > > using it), io_free_rsrc_node() won't be called to free the buffer node
> > > until all those references are dropped too. So fuse holding a
> > > reference on the buffer node would allow it to be unregistered, but
> > > prevent it from being freed until fuse dropped its reference.
> > > I'm not sure I understand the problem with fuse continuing to hold
> > > onto a registered buffer node after userspace has unregistered it from
> > > the buffer table. (It looks like the buffer node in question is the
> >
> > For fuse, it holds the reference to the buffer for the lifetime of the
> > connection, which could be a very long time. I'm not seeing how we
> > could let userspace succeed in unregistering with fuse continuing to
> > hold that reference, since as I understand it conceptually,
> > unregistering the buffer should give ownership of the buffer
> > completely back to userspace.
>
> I'm not quite sure what you mean by "give ownership of the buffer
> completely back to userspace". My understanding is that registering a
> buffer with io_uring just pins the physical pages as a perf
> optimization. I'm not aware of a way for userspace to observe directly
> whether or not certain physical pages are pinned. There's already no
> guarantee that the physical pages are unpinned as soon as a buffer is
> unregistered; if there are any inflight io_uring requests using the
> registered buffer, they will continue to hold a reference count on the
> buffer, preventing it from being released. The only guarantee is that
> the unregistered slot in the buffer table is now empty.
>
> Presumably fuse must release its reference count (or pin) eventually,
> or otherwise there would be a resource leak? I don't see an issue with
> holding references to registered buffers for the lifetime of a fuse
> connection. As long as there's a way for the fuse server to tell the
> kernel to release those resources if they are no longer needed (which
> it sounds like already exists from your description of aborting a fuse
> connection).

The way I view it is that conceptually, userspace "owns" the buffer.
Userspace allocated the buffer and is later responsible for freeing it
when it's done using it. When userspace registers the buffer to the
ring, it can expect the buffer to be written to / read from by the
kernel code, but it should also be able to expect that when it
unregisters the buffer from the ring and any of the concurrent
inflight requests are fulfilled, the kernel will not be accessing that
buffer anymore and userspace has complete sovereignty over the buffer.

With fuse continuing to hold the reference on the buffer (for the
lifetime of the connection) and continuing to read to / write from it
after unregistration, that violates that contract.

>
> >
> > > one at FUSE_URING_FIXED_HEADERS_INDEX?) Wouldn't pinning the buffer
> >
> > Yep you have that right, the buffer node in question is the one at
> > FUSE_URING_FIXED_HEADERS_INDEX which is where all the headers for
> > requests are placed.
> >
> > > table present similar issues? How would userspace get fuse to drop its
> >
> > I don't think pinning the buffer table has a similar issue because we
> > disallow unregistration if it's pinned.
>
> It sounds like you're saying that buffer unregistration just isn't
> expected to work together with fuse's use of registered buffers, is
> that accurate? Does it matter then whether the buffer unregistration
> returns -EBUSY because the buffer table is pinned vs. succeeding but
> not actually releasing the buffer resource because fuse still holds a
> reference on it? My preference would be to just use the existing

Yes, I think the difference is that returning -EBUSY signals to the
caller that the buffer cannot be unregistered and will still be used.
Whereas succeeding but not actually releasing the buffer resource and
still continuing to use it beyond the inflight cmds is
contrary/deceptive behavior to what the user should be able to expect.

> reference counting mechanism rather than introducing another mechanism
> if both provide safety equally well.
>
> >
> > > pin if it wants to modify the buffer registrations? I would imagine
> >
> > For the fuse use case, the server never really modifies its buffer
> > registrations as it sets up everything before initiating the
> > connection. But if it wanted to in the future, the server could send a
> > fuse notification to the kernel to unpin the buf table.
>
> This seems to assume that all the registered buffers are used
> exclusively for fuse purposes. But it seems plausible that a fuse
> server might want to use io_uring for other non-fuse purposes and want
> to use registered buffers there too as a perf optimization? Then it
> would be nice for the process to be able to update the other
> registered buffers even while the kernel pins (or holds a reference
> count on) the ones used for fuse.

I don't think this is likely. I think it's more likely that a
high-performance fuse server would use a separate ring for
non-fuse-related I/O, as doing it on the same ring it's using to
communicate with the fuse kernel would lead to more contention for
ring entries which would result in higher latency for servicing
incoming/outgoing fuse requests. I guess the caller could make the
ring very big to accommodate for that, but then that leads to more
memory overhead (eg allocating the headers buffers for all the
potential entries).

>
> Best,
> Caleb

Thanks,
Joanne

  reply	other threads:[~2025-12-13  6:08 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-03  0:34 [PATCH v1 00/30] fuse/io-uring: add kernel-managed buffer rings and zero-copy Joanne Koong
2025-12-03  0:34 ` [PATCH v1 01/30] io_uring/kbuf: refactor io_buf_pbuf_register() logic into generic helpers Joanne Koong
2025-12-03  0:34 ` [PATCH v1 02/30] io_uring/kbuf: rename io_unregister_pbuf_ring() to io_unregister_buf_ring() Joanne Koong
2025-12-03  0:34 ` [PATCH v1 03/30] io_uring/kbuf: add support for kernel-managed buffer rings Joanne Koong
2025-12-03  0:34 ` [PATCH v1 04/30] io_uring/kbuf: add mmap " Joanne Koong
2025-12-03  0:35 ` [PATCH v1 05/30] io_uring/kbuf: support kernel-managed buffer rings in buffer selection Joanne Koong
2025-12-03  0:35 ` [PATCH v1 06/30] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
2025-12-03  4:13   ` Caleb Sander Mateos
2025-12-04 18:41     ` Joanne Koong
2025-12-03  0:35 ` [PATCH v1 07/30] io_uring/rsrc: add fixed buffer table pinning/unpinning Joanne Koong
2025-12-03  4:49   ` Caleb Sander Mateos
2025-12-03 22:52     ` Joanne Koong
2025-12-04  1:24       ` Caleb Sander Mateos
2025-12-04 20:07         ` Joanne Koong
2025-12-10  3:35           ` Caleb Sander Mateos
2025-12-13  6:07             ` Joanne Koong [this message]
2025-12-03  0:35 ` [PATCH v1 08/30] io_uring/kbuf: add recycling for pinned kernel managed buffer rings Joanne Koong
2025-12-03  0:35 ` [PATCH v1 09/30] io_uring: add io_uring_cmd_import_fixed_index() Joanne Koong
2025-12-03 21:43   ` Caleb Sander Mateos
2025-12-04 18:56     ` Joanne Koong
2025-12-05 16:56       ` Caleb Sander Mateos
2025-12-05 23:28         ` Joanne Koong
2025-12-11  2:57           ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 10/30] io_uring/kbuf: add io_uring_is_kmbuf_ring() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 11/30] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
2025-12-03 21:53   ` Caleb Sander Mateos
2025-12-04 19:22     ` Joanne Koong
2025-12-04 21:57       ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 12/30] io_uring/kbuf: export io_ring_buffer_select() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 13/30] io_uring/cmd: set selected buffer index in __io_uring_cmd_done() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 14/30] io_uring: add release callback for ring death Joanne Koong
2025-12-03 22:25   ` Caleb Sander Mateos
2025-12-03 22:54     ` Joanne Koong
2025-12-03  0:35 ` [PATCH v1 15/30] fuse: refactor io-uring logic for getting next fuse request Joanne Koong
2025-12-03  0:35 ` [PATCH v1 16/30] fuse: refactor io-uring header copying to ring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 17/30] fuse: refactor io-uring header copying from ring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 18/30] fuse: use enum types for header copying Joanne Koong
2025-12-03  0:35 ` [PATCH v1 19/30] fuse: refactor setting up copy state for payload copying Joanne Koong
2025-12-03  0:35 ` [PATCH v1 20/30] fuse: support buffer copying for kernel addresses Joanne Koong
2025-12-03  0:35 ` [PATCH v1 21/30] fuse: add io-uring kernel-managed buffer ring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 22/30] io_uring/rsrc: refactor io_buffer_register_bvec()/io_buffer_unregister_bvec() Joanne Koong
2025-12-07  8:33   ` Caleb Sander Mateos
2025-12-13  5:11     ` Joanne Koong
2025-12-03  0:35 ` [PATCH v1 23/30] io_uring/rsrc: split io_buffer_register_request() logic Joanne Koong
2025-12-07  8:41   ` Caleb Sander Mateos
2025-12-13  5:24     ` Joanne Koong
2025-12-03  0:35 ` [PATCH v1 24/30] io_uring/rsrc: Allow buffer release callback to be optional Joanne Koong
2025-12-07  8:42   ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 25/30] io_uring/rsrc: add io_buffer_register_bvec() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 26/30] io_uring/rsrc: export io_buffer_unregister Joanne Koong
2025-12-03  0:35 ` [PATCH v1 27/30] fuse: rename fuse_set_zero_arg0() to fuse_zero_in_arg0() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 28/30] fuse: enforce op header for every payload reply Joanne Koong
2025-12-03  0:35 ` [PATCH v1 29/30] fuse: add zero-copy over io-uring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 30/30] docs: fuse: add io-uring bufring and zero-copy documentation Joanne Koong
2025-12-13  7:52   ` Askar Safin
2025-12-13  9:14 ` [PATCH v1 00/30] fuse/io-uring: add kernel-managed buffer rings and zero-copy Askar Safin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJnrk1ZorOeUkUkCsZ6xYj8CKFPEa5WbACTNgm7_kZ4LLvLoCA@mail.gmail.com \
    --to=joannelkoong@gmail.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bschubert@ddn.com \
    --cc=csander@purestorage.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=xiaobing.li@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).