From: Pavel Begunkov <asml.silence@gmail.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, axboe@kernel.dk,
	linux-fsdevel@vger.kernel.org, bschubert@ddn.com,
	io-uring@vger.kernel.org, xiaobing.li@samsung.com,
	csander@purestorage.com, kernel-team@meta.com
Subject: Re: [PATCH v2 1/8] io_uring/uring_cmd: add io_uring_cmd_import_fixed_full()
Date: Thu, 30 Oct 2025 18:06:15 +0000
Message-ID: <9f0debb1-ce0e-4085-a3fe-0da7a8fd76a6@gmail.com>
In-Reply-To: <CAJnrk1ZaGkEdWwhR=4nQe4kQOp6KqQQHRoS7GbTRcwnKrR5A3g@mail.gmail.com>

On 10/29/25 18:37, Joanne Koong wrote:
> On Wed, Oct 29, 2025 at 7:01 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 10/27/25 22:28, Joanne Koong wrote:
>>> Add an API for fetching the registered buffer associated with a
>>> io_uring cmd. This is useful for callers who need access to the buffer
>>> but do not have prior knowledge of the buffer's user address or length.
>>
>> Joanne, is it needed because you don't want to pass {offset,size}
>> via the fuse uapi? It's often more convenient to allocate and
>> register one large buffer and let requests use subchunks. It
>> shouldn't make a difference for performance, but if you e.g. try to
>> overlay it onto huge pages, the memory will be severely overaccounted.
>>
> 
> Hi Pavel,
> 
> Yes, I was thinking this would be a simpler interface than the
> userspace caller having to pass in the uaddr and size on every
> request. Right now the way it is structured is that userspace
> allocates a buffer per request, then registers all those buffers. On
> the kernel side when it fetches the buffer, it'll always fetch the
> whole buffer (e.g. offset is 0 and size is the full size).
> 
> Do you think it is better to allocate one large buffer and have the
> requests use subchunks? 

I think so, but that's general advice; I don't know the fuse
implementation details, and it's not a strong opinion. It would be
great if you took a look at what other server implementations might
want and do, whether this approach is flexible enough, and how
amenable it is to change later on. E.g. how many registered buffers
might it need? io_uring caps the table at some thousands of entries.
How large are the buffers? Each separate buffer has a memory
footprint, and with too many of them that footprint can also cause
cache misses. Can you always predict the max number of buffers, to
avoid resizing the table? Do you ever want to use huge pages while
being restricted by mlock limits? And so on.
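
For illustration, here is a minimal userspace sketch of the
one-large-buffer approach; the chunk size and count are made-up
parameters, and only standard liburing calls are used:

  #include <errno.h>
  #include <liburing.h>
  #include <sys/mman.h>

  #define CHUNK_SIZE (64 * 1024)
  #define NR_CHUNKS  256

  static void *region;

  static int register_one_region(struct io_uring *ring)
  {
          struct iovec iov;

          region = mmap(NULL, (size_t)CHUNK_SIZE * NR_CHUNKS,
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (region == MAP_FAILED)
                  return -errno;

          iov.iov_base = region;
          iov.iov_len  = (size_t)CHUNK_SIZE * NR_CHUNKS;
          /* One buffer-table entry regardless of the request count:
           * the table never needs resizing, and a huge-page backing
           * would be accounted once instead of per buffer. */
          return io_uring_register_buffers(ring, &iov, 1);
  }

  /* Request i uses the subrange [i * CHUNK_SIZE, (i+1) * CHUNK_SIZE). */
  static inline void *chunk_addr(unsigned int i)
  {
          return (char *)region + (size_t)i * CHUNK_SIZE;
  }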

In either case, I don't have a problem with this patch; it just
looked a bit off to me.

> My worry with this is that it would lead to suboptimal cache
> locality when servers offload request handling to separate worker
> threads. From a code perspective it seems a bit

It wouldn't affect the locality of the user buffers; that depends on
the userspace implementation. Are you sharing an io_uring instance
between threads?
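
To make that concrete, extending the sketch above (CHUNKS_PER_WORKER
is another made-up parameter): with a single registered region,
per-thread locality is purely a userspace layout decision, e.g. each
worker can own a disjoint slice:

  #define CHUNKS_PER_WORKER 32

  /* Worker w's slot s maps to a chunk inside w's contiguous slice,
   * so each thread keeps touching only its own part of the region. */
  static inline void *worker_chunk(unsigned int w, unsigned int s)
  {
          size_t per_worker = (size_t)CHUNKS_PER_WORKER * CHUNK_SIZE;

          return (char *)region + w * per_worker + (size_t)s * CHUNK_SIZE;
  }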

> simpler to have each request have its own buffer, but it wouldn't be
> much more complicated to have it all be part of one large buffer.
> 
> Right now, we are fetching the bvec iter every time there's a request
> because of the possibility that the buffer might have been
> unregistered (libfuse will not do this, but some other rogue userspace
> program could). If we added a flag to tell io_uring that attempts at
> unregistration should return -EBUSY, then we could just fetch the bvec
> iter once and use it for the lifetime of the server connection
> instead of having to fetch it on every request, and then when the
> connection is aborted, we could unset the flag so that userspace can
> then successfully unregister their buffers. Do you think this is a
> good idea to have in io-uring? If this is fine to add then I'll add
> this to v3.

The devil is in the details, i.e. synchronisation. Taking a long-term
node reference might be fine. Does this change the uapi for this
patchset? If not, I'd do it as a follow-up. It also sounds like you
can apply this optimisation regardless of whether you take the full
registered buffer or go with sub-ranges.
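
That follow-up might look roughly like the sketch below on the fuse
side. This is hypothetical: the struct, the field names, and the
io_uring flag that makes unregistration fail with -EBUSY do not exist
today. The only real property it relies on is that struct iov_iter is
copied by value, so a cached iter can be handed out per request.

  /* Hypothetical caching of the imported iter for the connection
   * lifetime; assumes unregistration is blocked (-EBUSY) meanwhile. */
  struct fuse_ring_cache {                /* made-up type */
          struct iov_iter cached_iter;    /* imported once at setup */
          bool            valid;
  };

  static int fuse_uring_request_iter(struct fuse_ring_cache *rc,
                                     struct iov_iter *iter)
  {
          if (!rc->valid)
                  return -ENXIO;
          /* By-value copy: each request advances its own iter
           * without touching the cached one. */
          *iter = rc->cached_iter;
          return 0;
  }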

-- 
Pavel Begunkov


Thread overview: 37+ messages
2025-10-27 22:27 [PATCH v2 0/8] fuse: support io-uring registered buffers Joanne Koong
2025-10-27 22:28 ` [PATCH v2 1/8] io_uring/uring_cmd: add io_uring_cmd_import_fixed_full() Joanne Koong
2025-10-28  1:28   ` Caleb Sander Mateos
2025-10-29 14:01   ` Pavel Begunkov
2025-10-29 18:37     ` Joanne Koong
2025-10-29 19:59       ` Bernd Schubert
2025-10-30 17:42         ` Pavel Begunkov
2025-10-30 18:06       ` Pavel Begunkov [this message]
2025-10-30 22:23         ` Bernd Schubert
2025-10-30 23:50           ` Joanne Koong
2025-10-31 10:27             ` Bernd Schubert
2025-10-31 21:19               ` Joanne Koong
2025-10-30 23:13         ` Joanne Koong
2025-10-27 22:28 ` [PATCH v2 2/8] fuse: refactor io-uring logic for getting next fuse request Joanne Koong
2025-10-30 23:07   ` Bernd Schubert
2025-10-27 22:28 ` [PATCH v2 3/8] fuse: refactor io-uring header copying to ring Joanne Koong
2025-10-30 23:15   ` Bernd Schubert
2025-10-30 23:52     ` Joanne Koong
2025-10-27 22:28 ` [PATCH v2 4/8] fuse: refactor io-uring header copying from ring Joanne Koong
2025-10-27 22:28 ` [PATCH v2 5/8] fuse: use enum types for header copying Joanne Koong
2025-11-05 23:01   ` Bernd Schubert
2025-11-06 21:59     ` Joanne Koong
2025-11-07 22:11       ` Bernd Schubert
2025-10-27 22:28 ` [PATCH v2 6/8] fuse: add user_ prefix to userspace headers and payload fields Joanne Koong
2025-10-28  1:32   ` Caleb Sander Mateos
2025-10-28 23:56     ` Joanne Koong
2025-11-06 13:35   ` Bernd Schubert
2025-10-27 22:28 ` [PATCH v2 7/8] fuse: refactor setting up copy state for payload copying Joanne Koong
2025-11-06 16:53   ` Bernd Schubert
2025-11-06 22:01     ` Joanne Koong
2025-10-27 22:28 ` [PATCH v2 8/8] fuse: support io-uring registered buffers Joanne Koong
2025-10-28  1:42   ` Caleb Sander Mateos
2025-10-28 23:56     ` Joanne Koong
2025-11-06 19:48   ` Bernd Schubert
2025-11-06 23:09     ` Joanne Koong
2025-11-07 22:16       ` Bernd Schubert
2025-11-07 22:23         ` Bernd Schubert
