From: Bernd Schubert <bernd@bsbernd.com>
To: Joanne Koong <joannelkoong@gmail.com>, miklos@szeredi.hu
Cc: axboe@kernel.dk, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 14/14] docs: fuse: add io-uring bufring and zero-copy documentation
Date: Tue, 14 Apr 2026 23:05:01 +0200
Message-ID: <0f2b87c5-2c98-4463-9a9c-bfca91e83cfc@bsbernd.com>
In-Reply-To: <20260402162840.2989717-15-joannelkoong@gmail.com>
On 4/2/26 18:28, Joanne Koong wrote:
> Add documentation for fuse over io-uring usage of buffer rings and
> zero-copy.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
> .../filesystems/fuse/fuse-io-uring.rst | 189 ++++++++++++++++++
> 1 file changed, 189 insertions(+)
>
> diff --git a/Documentation/filesystems/fuse/fuse-io-uring.rst b/Documentation/filesystems/fuse/fuse-io-uring.rst
> index d73dd0dbd238..bc47686c023f 100644
> --- a/Documentation/filesystems/fuse/fuse-io-uring.rst
> +++ b/Documentation/filesystems/fuse/fuse-io-uring.rst
> @@ -95,5 +95,194 @@ Sending requests with CQEs
> | <fuse_unlink() |
> | <sys_unlink() |
>
> +Buffer rings
> +============
>
> +Buffer rings have two main advantages:
>
> +* Reduced memory usage: payload buffers are pooled and selected on demand
> + rather than dedicated per-entry, allowing fewer buffers than entries. This
> + infrastructure also allows for future optimizations like incremental buffer
> + consumption where non-overlapping parts of a buffer may be used across
> + concurrent requests.
> +* Foundation for pinned buffers: contiguous buffer allocations allow the
> + kernel to pin and vmap the entire region, avoiding per-request page
> +  resolution overhead.
> +
> +At a high level, this is how fuse uses buffer rings:
> +
> +* The first REGISTER SQE for a queue creates the queue and sets up the
> + buffer ring. The server provides two iovecs: one for headers and one for
> + payload buffers. Each entry gets a fixed ID (sqe->buf_index) that maps
> + to a specific header slot.
Hi Joanne,

thanks a lot for this document! Could we discuss hooking in here and
allowing SQEs with different iovecs for the payload buffer?

Say the fuse server wants multiple IO sizes - it could easily do that
via different pBufs (payload buffers) and would just need to specify the
dedicated IO size per pBuf. Those buffers could then be sorted into
per-size arrays - the number of buffer sizes could be defined via FUSE
init, or we could use a fixed-size array. Fuse requests would then just
need to pick the right array.
This is basically what I'm currently working on for ublk.

I think it would be good to agree on the design before this gets
merged, so that the uapi doesn't change later.

Thanks,
Bernd
> +* When a client request arrives, the kernel selects a payload buffer from
> + the ring (if the request has copyable data), copies headers and payload
> +  data, and completes the SQE.
> +* The buf_id of the selected payload buffer is communicated to the server
> + via the fuse_uring_ent_in_out header. The server uses this to locate the
> + payload data in its buffer.
> +* The server processes the request and sends a COMMIT_AND_FETCH SQE with
> + the reply. The kernel processes the reply and recycles the buffer.
> +
> +Visually, this looks like::
> +
> + Headers buffer:
> + +-----------------------+-----------------------+-----+
> + | fuse_uring_req_header | fuse_uring_req_header | ... |
> + | [ent 0] | [ent 1] | |
> + +-----------------------+-----------------------+-----+
> + ^ ^
> + | |
> + ent 0 header slot ent 1 header slot
> + (sqe->buf_index=0) (sqe->buf_index=1)
> +
> + Payload buffer pool:
> + +-----------+-----------+-----------+-----+
> + | buf 0 | buf 1 | buf 2 | ... |
> + | (buf_size)| (buf_size)| (buf_size)| |
> + +-----------+-----------+-----------+-----+
> + selected on demand, recycled after each request
> +
> +Buffer ring request flow
> +------------------------
> +
> +::
> +
> +| Kernel | FUSE daemon
> +| |
> +| [client request arrives] |
> +| >fuse_uring_send() |
> +| [select payload buf from ring] |
> +| >fuse_uring_select_buffer() |
> +| [copy headers to ent's header slot] |
> +| >copy_header_to_ring() |
> +| [copy payload to selected buf] |
> +| >fuse_uring_copy_to_ring() |
> +| [set buf_id in ent_in_out header] |
> +| >io_uring_cmd_done() |
> +| | [CQE received]
> +| | [read headers from header slot]
> +| | [read payload from buf_id]
> +| | [process request]
> +| | [write reply to header slot]
> +| | [write reply payload to buf]
> +| | >io_uring_submit()
> +| | COMMIT_AND_FETCH
> +| >fuse_uring_commit_fetch() |
> +| >fuse_uring_commit() |
> +| [copy reply from ring] |
> +| >fuse_uring_recycle_buffer() |
> +| >fuse_uring_get_next_fuse_req() |
> +
> +Pinned buffers
> +==============
> +
> +Servers can optionally pin their header and/or payload buffers by setting
> +FUSE_URING_PINNED_HEADERS and/or FUSE_URING_PINNED_BUFFERS flags. When
> +set, the kernel pins the user pages and vmaps them during queue setup,
> +enabling memcpy to/from the kernel virtual address instead of
> +copy_to_user/copy_from_user.
> +
> +This avoids the per-request cost of pinning/unpinning user pages and
> +translating virtual addresses. Buffers must be page-aligned. The pinned pages
> +are accounted against RLIMIT_MEMLOCK (bypassable with CAP_IPC_LOCK).
> +
> +Zero-copy
> +=========
> +
> +Fuse io-uring zero-copy allows the server to directly read from / write to
> +the client's pages, bypassing any intermediary buffer copies. This requires
> +the FUSE_URING_ZERO_COPY flag, buffer rings with pinned headers and buffers,
> +and CAP_SYS_ADMIN.
> +
> +The kernel registers the client's underlying pages as a sparse buffer at
> +the entry's fixed id via io_buffer_register_bvec(). The fuse server can
> +then perform io_uring read/write operations directly on these pages.
> +Non-page-backed args (e.g. out headers) go through the payload buffer as
> +normal. Pages are unregistered when the request completes.
> +
> +The request flow for the zero-copy write path (client writes data, server
> +reads it) is as follows:
> +
> +Zero-copy write
> +---------------
> +
> +::
> +
> +| Kernel | FUSE server
> +| |
> +| "write(fd, buf, 1MB)" |
> +| |
> +| >sys_write() |
> +| >fuse_file_write_iter() |
> +| >fuse_send_one() |
> +| [req->args->in_pages = true] |
> +| [folios hold client write data] |
> +| |
> +| >fuse_uring_copy_to_ring() |
> +| >copy_header_to_ring(IN_OUT) |
> +| [memcpy fuse_in_header to |
> +| pinned headers buf via kaddr] |
> +| >copy_header_to_ring(OP) |
> +| [memcpy write_in header] |
> +| |
> +| >fuse_uring_args_to_ring() |
> +| >setup_fuse_copy_state() |
> +| [is_kaddr = true] |
> +| [skip_folio_copy = true] |
> +| |
> +| >fuse_uring_set_up_zero_copy() |
> +| [folio_get for each client folio] |
> +| [build bio_vec array from folios] |
> +| >io_buffer_register_bvec() |
> +| [register pages at ent->id] |
> +| [ent->zero_copied = true] |
> +| |
> +| >fuse_copy_args() |
> +| [skip_folio_copy => return 0 |
> +| for page arg, skip data copy] |
> +| |
> +| >copy_header_to_ring(RING_ENT) |
> +| [memcpy ent_in_out] |
> +| >io_uring_cmd_done() |
> +| |
> +| | [CQE received]
> +| |
> +| | [issue io_uring READ at
> +| | ent->id]
> +| | [reads directly from
> +| | client's pages (ZERO_COPY)]
> +| |
> +| | [write data to backing
> +| | store]
> +| | [submit COMMIT AND FETCH]
> +| |
> +| >fuse_uring_commit_fetch() |
> +| >fuse_uring_commit() |
> +| >fuse_uring_copy_from_ring() |
> +| >fuse_uring_req_end() |
> +| >io_buffer_unregister(ent->id) |
> +| [unregister sparse buffer] |
> +| >fuse_zero_copy_release() |
> +| [folio_put for each folio] |
> +| [ent->zero_copied = false] |
> +| >fuse_request_end() |
> +| [wake up client] |
> +
> +The zero-copy read path is analogous.
> +
> +Some requests may have both page-backed args and non-page-backed args.
> +For these requests, the page-backed args are zero-copied while the
> +non-page-backed args are copied to the buffer selected from the buffer
> +ring::
> +
> +  zero-copy: pages registered via io_buffer_register_bvec()
> +  non-page-backed: copied to payload buffer via fuse_copy_args()
> +
> +For a request whose payload is zero-copied, the registration/unregistration
> +path looks like::
> +
> +  register: fuse_uring_set_up_zero_copy()
> +              folio_get() for each folio
> +              io_buffer_register_bvec(ent->id)
> +
> +  [server accesses pages via io_uring fixed buf at ent->id]
> +
> +  unregister: fuse_uring_req_end()
> +                io_buffer_unregister(ent->id)
> +                -> fuse_zero_copy_release() callback
> +                     folio_put() for each folio