From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, axboe@kernel.dk, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 14/14] docs: fuse: add io-uring bufring and zero-copy documentation
Date: Wed, 15 Apr 2026 12:55:02 +0200 [thread overview]
Message-ID: <ff596299-38c1-4c5a-8f1d-14931dd84ef0@bsbernd.com> (raw)
In-Reply-To: <CAJnrk1ZAcbYkewWz=CbxgzkvXnk2-QKLjSk-oA0hywbnVUSM9A@mail.gmail.com>
On 4/15/26 03:10, Joanne Koong wrote:
> On Tue, Apr 14, 2026 at 2:05 PM Bernd Schubert <bernd@bsbernd.com> wrote:
>>
>> On 4/2/26 18:28, Joanne Koong wrote:
>>> Add documentation for fuse over io-uring usage of buffer rings and
>>> zero-copy.
>>>
>>> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
>>> ---
>>> .../filesystems/fuse/fuse-io-uring.rst | 189 ++++++++++++++++++
>>> 1 file changed, 189 insertions(+)
>>>
>>> diff --git a/Documentation/filesystems/fuse/fuse-io-uring.rst b/Documentation/filesystems/fuse/fuse-io-uring.rst
>>> index d73dd0dbd238..bc47686c023f 100644
>>> --- a/Documentation/filesystems/fuse/fuse-io-uring.rst
>>> +++ b/Documentation/filesystems/fuse/fuse-io-uring.rst
>>> @@ -95,5 +95,194 @@ Sending requests with CQEs
>>> | <fuse_unlink() |
>>> | <sys_unlink() |
>>>
>>> +Buffer rings
>>> +============
>>>
>>> +Buffer rings have two main advantages:
>>>
>>> +* Reduced memory usage: payload buffers are pooled and selected on demand
>>> + rather than dedicated per-entry, allowing fewer buffers than entries. This
Then don't register that many entries? An entry is useless if it cannot
carry data - why do you need to register that many entries then?
>>> + infrastructure also allows for future optimizations like incremental buffer
>>> + consumption where non-overlapping parts of a buffer may be used across
>>> + concurrent requests.
>>> +* Foundation for pinned buffers: contiguous buffer allocations allow the
>>> + kernel to pin and vmap the entire region, avoiding per-request page
>>> + resolution overhead
Pinning can be done per buffer as well. The part that is harder is
pinning of the headers - this is why libfuse currently allocates 4K for
every header, in preparation for pinning. From my point of view, we
_should_ make use of that: set at registration time that the header is
allocated as 4K, and then small requests can be inlined into the
remaining part of those 4K. With that, ring bufs become useful, because
most metadata requests no longer need a separate payload buffer.
However, I think in your current design headers are mapped into one large
region and there is no way to use the extra space. I think that is fine,
as long as we have the capability to have multi-size buf pools.

Contiguous buffer allocation can be done for entries as well - userspace
just needs to assign it to buffers that way. It becomes a bit harder
with dynamic entry registration - entry buffers should then be allocated
in sizes of system huge pages.

In fact I initially had that in libfuse and had allocated all userspace
buffers as one big memory region. I then 'temporarily' removed it because
I had development stability issues - the single region needs to be marked
with ASAN areas in order to catch issues. For initial development that
was just overkill, but it could be added back now, in combination with
ASAN buffer marking. For pools it would be good to think about ASAN as
well.
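To make the header-inlining idea concrete, here is a toy model; all
names and sizes below are my own assumptions for the sketch, not
existing uapi:

```c
/*
 * Toy model (assumed sizes/names): with every header allocated as a
 * full 4K page, a small payload can be inlined into the space left
 * after the header instead of consuming a pool buffer.
 */
#include <stddef.h>

#define HEADER_PAGE_SZ 4096
#define HEADER_SZ      256 /* assumed header size */
#define INLINE_SPACE   (HEADER_PAGE_SZ - HEADER_SZ)

/* nonzero if the payload fits into the header page remainder */
static int payload_fits_inline(size_t payload_len)
{
	return payload_len <= INLINE_SPACE;
}
```

With something like this, a metadata reply that fits into the remainder
would never have to touch the payload pool at all.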
>>> +
>>> +At a high-level, this is how fuse uses buffer rings:
>>> +
>>> +* The first REGISTER SQE for a queue creates the queue and sets up the
>>> + buffer ring. The server provides two iovecs: one for headers and one for
>>> + payload buffers. Each entry gets a fixed ID (sqe->buf_index) that maps
>>> + to a specific header slot.
>>
>> Hi Joanne,
>>
>> thanks a lot for this document! Could we discuss if we could just hook
>> in here and allow SQEs with different iovecs for the payload buffer?
>> Let's say fuse-server wants multiple IO sizes - it could easily do that
>> via different pBufs and just needs to specify the dedicated IO size per
>> pBuf. Those buffers could then get sorted into an array - we could
>> define either via FUSE init the number of buf sizes or use a fixed size
>> array. Fuse requests then would just need to pick the right array.
>> This is basically what I'm currently working on for ublk.
>>
>> I think it would be good to agree on the design before it gets merged so
>> that uapi doesn't change.
>
> Hi Bernd,
>
> I'm not certain I fully see the use case for a fuse server preferring
> a static preallocation of multiple IO sizes over using incremental
> buffer consumption, but in my mind to support multiple IO size
I have to admit that I don't see why we need pbufs for dynamic
allocation. While the io-uring ring has a fixed number of SQEs/CQEs, and
while libfuse currently strongly couples these to fuse buffers, there is
no technical reason for that coupling. Initially there was, because I had
taken the 'tags' from the ublk design, but then Miklos asked to make it
lists that just get appended whenever a FUSE_IO_URING_CMD_REGISTER is
sent. Which means libfuse _could_ add new entries at any time. You could
start with 1 entry per queue; additionally, with the reduce-nr-queue
patches you could even start with a single queue and a single entry - and
then extend that at any time to whatever libfuse or the application
believes is needed.

I.e. except for io-uring setup, adding or even removing ring entries and
their buffers is mainly a missing userspace feature. In order to remove
idle entries, we could add another notification type like
FUSE_NOTIFY_WAKE_RING_ENTRIES; it would then wake a given number of
entries per queue, maybe sent via a new op code like FUSE_NOOP. All of
that seems to be easy.
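As a rough userspace-side illustration of that append-only entry list
(struct and function names are made up for the sketch, not actual
libfuse code):

```c
/*
 * Sketch of the append-only entry list (names invented for the sketch):
 * each REGISTER-style call simply appends one more entry to the queue,
 * so a queue can start with a single entry and grow at any time.
 */
#include <stdlib.h>

struct ring_ent {
	struct ring_ent *next;
	unsigned int id;
};

struct ring_queue {
	struct ring_ent *ents; /* singly linked, newest first */
	unsigned int nr_ents;
};

/* model of handling one FUSE_IO_URING_CMD_REGISTER for this queue */
static int queue_add_entry(struct ring_queue *q, unsigned int id)
{
	struct ring_ent *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->id = id;
	e->next = q->ents;
	q->ents = e;
	q->nr_ents++;
	return 0;
}
```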
> payloads, I was thinking something like this might work best:
>
> * iov[0] for the headers stays the same. no matter how many IO size
> payloads there are, the ent always maps to a header and the headers
> are a fixed size
> * iov[1...x] are the payload buffers. From the uapi perspective, in
> the fuse_uring_cmd_req init struct, there would need to be an array of
> uint32_t buf_sizes. Each index in the array would correspond to index
> + 1 in the iov[] payloads passed
> * on the fuse side, each of the buffer pools has its own ring. I think
> this makes managing the different buffers a lot easier and gets rid of
> having to do any array sorting, and makes buffer selection/recycling
> O(1).
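A minimal model of such a per-pool free ring - hypothetical names and a
fixed pool size, purely to illustrate the O(1) buffer selection and
recycling - could look like:

```c
/*
 * Minimal model of a per-pool free-buffer ring (hypothetical names,
 * fixed size): both selecting and recycling a buffer are O(1).
 */
#include <stdint.h>

#define POOL_NBUFS 8 /* assumed pool size */

struct pool_ring {
	uint16_t free[POOL_NBUFS]; /* indices of free buffers */
	unsigned int head;         /* next index to hand out */
	unsigned int tail;         /* next slot for a returned index */
	unsigned int count;
};

static void pool_ring_init(struct pool_ring *r)
{
	unsigned int i;

	for (i = 0; i < POOL_NBUFS; i++)
		r->free[i] = (uint16_t)i;
	r->head = r->tail = 0;
	r->count = POOL_NBUFS;
}

/* returns a free buffer index, or -1 if the pool is exhausted */
static int pool_ring_get(struct pool_ring *r)
{
	int idx;

	if (!r->count)
		return -1;
	idx = r->free[r->head];
	r->head = (r->head + 1) % POOL_NBUFS;
	r->count--;
	return idx;
}

static void pool_ring_put(struct pool_ring *r, uint16_t idx)
{
	r->free[r->tail] = idx;
	r->tail = (r->tail + 1) % POOL_NBUFS;
	r->count++;
}
```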
Let's say we would have per queue:

    struct fuse_bufring {
            bool use_pinned_headers:1;
            bool use_zero_copy:1;
            /* headers buffer capacity; frozen at first REGISTER */
            unsigned int max_queue_depth;
            union {
                    void __user *headers;
                    struct fuse_bufring_pinned pinned_headers;
            };
            unsigned int nr_pools;
            struct fuse_bufring_pool *pools[FUSE_URING_MAX_POOLS];
            /* lookup: order (req size) -> pool */
            struct fuse_bufring_pool *order_map[FUSE_URING_NR_ORDERS];
    };
The order map is then dynamically created at buf pool registration time,
and then we would eventually get to:

    struct fuse_bufring_pool *pool = order_map[get_order(fuse_len_args())];

(Obviously the final code needs a check that we don't exceed the max
payload size.) The looked-up pool can be stored into ring_ent for buf
recycling.
And then:

    struct fuse_uring_cmd_req {
            ...
            union {
                    struct {
                            __u32 max_queue_depth; /* renamed from queue_depth */
                            __u32 buf_size;
                            __u8 pool_idx;
                            __u8 _pad[3];
                    } init;
                    ...
            };
    };
I think pool_idx is needed one way or the other, because the io-uring
ring owner might have other pools for its own purposes.
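To make the order_map behavior concrete, here is a standalone userspace
model; get_order is modeled by a local helper, and all names and limits
are assumptions for the sketch:

```c
/*
 * Standalone model of the order_map lookup (get_order modeled locally;
 * names and limits are assumptions): pools registered with a buf_size
 * fill an order-indexed table, and a request length then selects its
 * pool in O(1).
 */
#include <stddef.h>

#define NR_ORDERS 8 /* assumed: orders 0..7 relative to 4K, up to 512K */

struct pool {
	size_t buf_size;
};

static struct pool *order_map[NR_ORDERS];

/* order of len relative to a 4K base size; -1 if too large */
static int len_to_order(size_t len)
{
	size_t sz = 4096;
	int order = 0;

	while (sz < len) {
		sz <<= 1;
		order++;
	}
	return order < NR_ORDERS ? order : -1;
}

static int register_pool(struct pool *p)
{
	int order = len_to_order(p->buf_size);

	if (order < 0 || order_map[order])
		return -1; /* too large, or size class already taken */
	order_map[order] = p;
	return 0;
}

static struct pool *pool_for_len(size_t len)
{
	int order = len_to_order(len);

	return order < 0 ? NULL : order_map[order];
}
```

Note that a request length with no registered size class simply finds a
NULL slot, which is where the max-payload check mentioned above would
report an error.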
Thanks,
Bernd