linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Caleb Sander Mateos <csander@purestorage.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, axboe@kernel.dk, bschubert@ddn.com,
	 asml.silence@gmail.com, io-uring@vger.kernel.org,
	xiaobing.li@samsung.com,  linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 07/30] io_uring/rsrc: add fixed buffer table pinning/unpinning
Date: Tue, 9 Dec 2025 19:35:35 -0800	[thread overview]
Message-ID: <CADUfDZqmj0X0e+DzFG-8W7cOkjj1emNzovitCL4E8SyLHmR4Ag@mail.gmail.com> (raw)
In-Reply-To: <CAJnrk1ZmJ8V3MOpcEYks5i3dG431C7PmB14J7N2k22rn41ROuw@mail.gmail.com>

On Thu, Dec 4, 2025 at 12:07 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 5:24 PM Caleb Sander Mateos
> <csander@purestorage.com> wrote:
> >
> > On Wed, Dec 3, 2025 at 2:52 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Tue, Dec 2, 2025 at 8:49 PM Caleb Sander Mateos
> > > <csander@purestorage.com> wrote:
> > > >
> > > > On Tue, Dec 2, 2025 at 4:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > Add kernel APIs to pin and unpin the buffer table for fixed buffers,
> > > > > preventing userspace from unregistering or updating the fixed buffers
> > > > > table while it is pinned by the kernel.
> > > > >
> > > > > This has two advantages:
> > > > > a) Eliminating the overhead of having to fetch and construct an iter for
> > > > > a fixed buffer per every cmd. Instead, the caller can pin the buffer
> > > > > table, fetch/construct the iter once, and use that across cmds for
> > > > > however long it needs to until it is ready to unpin the buffer table.
> > > > >
> > > > > b) Allowing a fixed buffer lookup at any index. The buffer table must be
> > > > > pinned in order to allow this, otherwise we would have to keep track of
> > > > > all the nodes that have been looked up by the io_kiocb so that we can
> > > > > properly adjust the refcounts for those nodes. Ensuring that the buffer
> > > > > table must first be pinned before being able to fetch a buffer at any
> > > > > index makes things logistically a lot neater.
> > > >
> > > > Why is it necessary to pin the entire buffer table rather than
> > > > specific entries? That's the purpose of the existing io_rsrc_node refs
> > > > field.
> > >
> > > How would this work with userspace buffer unregistration (which works
> > > at the table level)? If buffer unregistration should still succeed
> > > then fuse would need a way to be notified that the buffer has been
> > > unregistered since the buffer belongs to userspace (eg it would be
> > > wrong if fuse continues using it even though fuse retains a refcount
> > > on it). If buffer unregistration should fail, then we would need to
> > > track this pinned state inside the node instead of relying just on the
> > > refs field, as buffers can be unregistered even if there are in-flight
> > > refs (eg we would need to differentiate the ref being from a pin vs
> > > from not a pin), and I think this would make unregistration more
> > > cumbersome as well (eg we would have to iterate through all the
> > > entries looking to see if any are pinned before iterating through them
> > > again to do the actual unregistration).
> >
> > Not sure I would say buffer unregistration operates on the table as a
> > whole. Each registered buffer node is unregistered individually and
>
> I'm looking at the liburing interface for it and I'm only seeing
> io_uring_unregister_buffers() / IORING_UNREGISTER_BUFFERS which works
> on the entire table, so I'm wondering how that interface would work if
> pinning/unpinning was at the entry level?

IORING_REGISTER_BUFFERS_UPDATE can be used to update individual
registered buffers. For each updated slot, if there is an existing
buffer, it will be unregistered (decrementing the buffer node's
reference count). A new buffer may or may not be registered in its
place; it can be skipped by specifying a struct iovec with a NULL
iov_base.

>
> > stores its own reference count. io_put_rsrc_node() will be called on
> > each buffer node in the table. However, io_put_rsrc_node() just
> > removes the one reference from the buffer node. If there are other
> > references on the buffer node (such as an inflight io_uring request
> > using it), io_free_rsrc_node() won't be called to free the buffer node
> > until all those references are dropped too. So fuse holding a
> > reference on the buffer node would allow it to be unregistered, but
> > prevent it from being freed until fuse dropped its reference.
> > I'm not sure I understand the problem with fuse continuing to hold
> > onto a registered buffer node after userspace has unregistered it from
> > the buffer table. (It looks like the buffer node in question is the
>
> For fuse, it holds the reference to the buffer for the lifetime of the
> connection, which could be a very long time. I'm not seeing how we
> could let userspace succeed in unregistering with fuse continuing to
> hold that reference, since as I understand it conceptually,
> unregistering the buffer should give ownership of the buffer
> completely back to userspace.

I'm not quite sure what you mean by "give ownership of the buffer
completely back to userspace". My understanding is that registering a
buffer with io_uring just pins the physical pages as a perf
optimization. I'm not aware of a way for userspace to observe directly
whether or not certain physical pages are pinned. There's already no
guarantee that the physical pages are unpinned as soon as a buffer is
unregistered; if there are any inflight io_uring requests using the
registered buffer, they will continue to hold a reference count on the
buffer, preventing it from being released. The only guarantee is that
the unregistered slot in the buffer table is now empty.

Presumably fuse must release its reference count (or pin) eventually,
or otherwise there would be a resource leak? I don't see an issue with
holding references to registered buffers for the lifetime of a fuse
connection. As long as there's a way for the fuse server to tell the
kernel to release those resources if they are no longer needed (which
it sounds like already exists from your description of aborting a fuse
connection).

>
> > one at FUSE_URING_FIXED_HEADERS_INDEX?) Wouldn't pinning the buffer
>
> Yep you have that right, the buffer node in question is the one at
> FUSE_URING_FIXED_HEADERS_INDEX which is where all the headers for
> requests are placed.
>
> > table present similar issues? How would userspace get fuse to drop its
>
> I don't think pinning the buffer table has a similar issue because we
> disallow unregistration if it's pinned.

It sounds like you're saying that buffer unregistration just isn't
expected to work together with fuse's use of registered buffers, is
that accurate? Does it matter then whether the buffer unregistration
returns -EBUSY because the buffer table is pinned vs. succeeding but
not actually releasing the buffer resource because fuse still holds a
reference on it? My preference would be to just use the existing
reference counting mechanism rather than introducing another mechanism
if both provide safety equally well.

>
> > pin if it wants to modify the buffer registrations? I would imagine
>
> For the fuse use case, the server never really modifies its buffer
> registrations as it sets up everything before initiating the
> connection. But if it wanted to in the future, the server could send a
> fuse notification to the kernel to unpin the buf table.

This seems to assume that all the registered buffers are used
exclusively for fuse purposes. But it seems plausible that a fuse
server might want to use io_uring for other non-fuse purposes and want
to use registered buffers there too as a perf optimization? Then it
would be nice for the process to be able to update the other
registered buffers even while the kernel pins (or holds a reference
count on) the ones used for fuse.

Best,
Caleb

>
> > the code path that calls io_uring_buf_table_unpin() currently could
> > instead call into io_put_rsrc_node() (maybe by completing an io_uring
> > request that has imported the registered buffer) to release its
> > reference on the buffer node. For ublk, userspace can request to stop
> > a ublk device or the kernel will do so automatically if userspace
> > drops its file handle (e.g. if the process exits), which will release
> > any io_uring resources the ublk device is using.
>
> Fuse has something similar where the server can abort the connection,
> and that will release the pin / other io uring resources.
>
> Thanks,
> Joanne
>
> >
> > >
> > > >
> > > > >

  reply	other threads:[~2025-12-10  3:35 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-03  0:34 [PATCH v1 00/30] fuse/io-uring: add kernel-managed buffer rings and zero-copy Joanne Koong
2025-12-03  0:34 ` [PATCH v1 01/30] io_uring/kbuf: refactor io_buf_pbuf_register() logic into generic helpers Joanne Koong
2025-12-03  0:34 ` [PATCH v1 02/30] io_uring/kbuf: rename io_unregister_pbuf_ring() to io_unregister_buf_ring() Joanne Koong
2025-12-03  0:34 ` [PATCH v1 03/30] io_uring/kbuf: add support for kernel-managed buffer rings Joanne Koong
2025-12-03  0:34 ` [PATCH v1 04/30] io_uring/kbuf: add mmap " Joanne Koong
2025-12-03  0:35 ` [PATCH v1 05/30] io_uring/kbuf: support kernel-managed buffer rings in buffer selection Joanne Koong
2025-12-03  0:35 ` [PATCH v1 06/30] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
2025-12-03  4:13   ` Caleb Sander Mateos
2025-12-04 18:41     ` Joanne Koong
2025-12-03  0:35 ` [PATCH v1 07/30] io_uring/rsrc: add fixed buffer table pinning/unpinning Joanne Koong
2025-12-03  4:49   ` Caleb Sander Mateos
2025-12-03 22:52     ` Joanne Koong
2025-12-04  1:24       ` Caleb Sander Mateos
2025-12-04 20:07         ` Joanne Koong
2025-12-10  3:35           ` Caleb Sander Mateos [this message]
2025-12-03  0:35 ` [PATCH v1 08/30] io_uring/kbuf: add recycling for pinned kernel managed buffer rings Joanne Koong
2025-12-03  0:35 ` [PATCH v1 09/30] io_uring: add io_uring_cmd_import_fixed_index() Joanne Koong
2025-12-03 21:43   ` Caleb Sander Mateos
2025-12-04 18:56     ` Joanne Koong
2025-12-05 16:56       ` Caleb Sander Mateos
2025-12-05 23:28         ` Joanne Koong
2025-12-11  2:57           ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 10/30] io_uring/kbuf: add io_uring_is_kmbuf_ring() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 11/30] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
2025-12-03 21:53   ` Caleb Sander Mateos
2025-12-04 19:22     ` Joanne Koong
2025-12-04 21:57       ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 12/30] io_uring/kbuf: export io_ring_buffer_select() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 13/30] io_uring/cmd: set selected buffer index in __io_uring_cmd_done() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 14/30] io_uring: add release callback for ring death Joanne Koong
2025-12-03 22:25   ` Caleb Sander Mateos
2025-12-03 22:54     ` Joanne Koong
2025-12-03  0:35 ` [PATCH v1 15/30] fuse: refactor io-uring logic for getting next fuse request Joanne Koong
2025-12-03  0:35 ` [PATCH v1 16/30] fuse: refactor io-uring header copying to ring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 17/30] fuse: refactor io-uring header copying from ring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 18/30] fuse: use enum types for header copying Joanne Koong
2025-12-03  0:35 ` [PATCH v1 19/30] fuse: refactor setting up copy state for payload copying Joanne Koong
2025-12-03  0:35 ` [PATCH v1 20/30] fuse: support buffer copying for kernel addresses Joanne Koong
2025-12-03  0:35 ` [PATCH v1 21/30] fuse: add io-uring kernel-managed buffer ring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 22/30] io_uring/rsrc: refactor io_buffer_register_bvec()/io_buffer_unregister_bvec() Joanne Koong
2025-12-07  8:33   ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 23/30] io_uring/rsrc: split io_buffer_register_request() logic Joanne Koong
2025-12-07  8:41   ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 24/30] io_uring/rsrc: Allow buffer release callback to be optional Joanne Koong
2025-12-07  8:42   ` Caleb Sander Mateos
2025-12-03  0:35 ` [PATCH v1 25/30] io_uring/rsrc: add io_buffer_register_bvec() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 26/30] io_uring/rsrc: export io_buffer_unregister Joanne Koong
2025-12-03  0:35 ` [PATCH v1 27/30] fuse: rename fuse_set_zero_arg0() to fuse_zero_in_arg0() Joanne Koong
2025-12-03  0:35 ` [PATCH v1 28/30] fuse: enforce op header for every payload reply Joanne Koong
2025-12-03  0:35 ` [PATCH v1 29/30] fuse: add zero-copy over io-uring Joanne Koong
2025-12-03  0:35 ` [PATCH v1 30/30] docs: fuse: add io-uring bufring and zero-copy documentation Joanne Koong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADUfDZqmj0X0e+DzFG-8W7cOkjj1emNzovitCL4E8SyLHmR4Ag@mail.gmail.com \
    --to=csander@purestorage.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=bschubert@ddn.com \
    --cc=io-uring@vger.kernel.org \
    --cc=joannelkoong@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=xiaobing.li@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).