From: Joanne Koong <joannelkoong@gmail.com>
To: miklos@szeredi.hu
Cc: bernd@bsbernd.com, axboe@kernel.dk, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 14/14] docs: fuse: add io-uring bufring and zero-copy documentation
Date: Thu, 2 Apr 2026 09:28:40 -0700
Message-ID: <20260402162840.2989717-15-joannelkoong@gmail.com>
In-Reply-To: <20260402162840.2989717-1-joannelkoong@gmail.com>
Add documentation for fuse over io-uring usage of buffer rings and
zero-copy.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/fuse/fuse-io-uring.rst | 189 ++++++++++++++++++
1 file changed, 189 insertions(+)
diff --git a/Documentation/filesystems/fuse/fuse-io-uring.rst b/Documentation/filesystems/fuse/fuse-io-uring.rst
index d73dd0dbd238..bc47686c023f 100644
--- a/Documentation/filesystems/fuse/fuse-io-uring.rst
+++ b/Documentation/filesystems/fuse/fuse-io-uring.rst
@@ -95,5 +95,194 @@ Sending requests with CQEs
| <fuse_unlink() |
| <sys_unlink() |
+Buffer rings
+============
+Buffer rings have two main advantages:
+
+* Reduced memory usage: payload buffers are pooled and selected on demand
+ rather than dedicated per-entry, allowing fewer buffers than entries. This
+ infrastructure also allows for future optimizations like incremental buffer
+ consumption where non-overlapping parts of a buffer may be used across
+ concurrent requests.
+* Foundation for pinned buffers: contiguous buffer allocations allow the
+ kernel to pin and vmap the entire region, avoiding per-request page
+ resolution overhead.
+
+At a high level, fuse uses buffer rings as follows:
+
+* The first REGISTER SQE for a queue creates the queue and sets up the
+ buffer ring. The server provides two iovecs: one for headers and one for
+ payload buffers. Each entry gets a fixed ID (sqe->buf_index) that maps
+ to a specific header slot.
+* When a client request arrives, the kernel selects a payload buffer from
+ the ring (if the request has copyable data), copies headers and payload
+ data, and completes the SQE.
+* The buf_id of the selected payload buffer is communicated to the server
+ via the fuse_uring_ent_in_out header. The server uses this to locate the
+ payload data in its buffer.
+* The server processes the request and sends a COMMIT_AND_FETCH SQE with
+ the reply. The kernel processes the reply and recycles the buffer.
+
+Visually, this looks like::
+
+  Headers buffer:
+  +-----------------------+-----------------------+-----+
+  | fuse_uring_req_header | fuse_uring_req_header | ... |
+  | [ent 0]               | [ent 1]               |     |
+  +-----------------------+-----------------------+-----+
+  ^                       ^
+  |                       |
+  ent 0 header slot       ent 1 header slot
+  (sqe->buf_index=0)      (sqe->buf_index=1)
+
+  Payload buffer pool:
+  +-----------+-----------+-----------+-----+
+  | buf 0     | buf 1     | buf 2     | ... |
+  | (buf_size)| (buf_size)| (buf_size)|     |
+  +-----------+-----------+-----------+-----+
+  selected on demand, recycled after each request
+
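As a rough illustration of the layout above, a server could find an entry's header slot and the selected payload buffer with plain offset arithmetic. This is only a sketch under stated assumptions: `struct ring_layout`, `header_slot()` and `payload_buf()` are hypothetical names that do not come from the FUSE uapi, `header_size` stands in for sizeof(struct fuse_uring_req_header), and the payload buffers are assumed to be equal-sized, laid out back to back, and indexed by buf_id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical server-side description of the two regions handed to the
 * kernel at REGISTER time. None of these names come from the FUSE uapi.
 */
struct ring_layout {
	uint8_t *headers;   /* base of the headers region */
	size_t header_size; /* stand-in for sizeof(struct fuse_uring_req_header) */
	uint32_t nr_ents;   /* number of ring entries (one header slot each) */
	uint8_t *payload;   /* base of the payload buffer pool */
	size_t buf_size;    /* size of one payload buffer */
	uint32_t nr_bufs;   /* number of payload buffers; may be < nr_ents */
};

/* Header slot for a ring entry, indexed by the entry's fixed ID. */
static void *header_slot(const struct ring_layout *l, uint32_t ent_id)
{
	assert(l != NULL);
	if (ent_id >= l->nr_ents)
		return NULL;
	return l->headers + (size_t)ent_id * l->header_size;
}

/* Payload buffer for the buf_id reported with a request (assumed layout). */
static void *payload_buf(const struct ring_layout *l, uint32_t buf_id)
{
	assert(l != NULL);
	if (buf_id >= l->nr_bufs)
		return NULL;
	return l->payload + (size_t)buf_id * l->buf_size;
}
```

Both helpers bounds-check the index first, since the header slot index and the payload buf_id are independent of each other.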
+Buffer ring request flow
+------------------------
+
+::
+
+ | Kernel                               | FUSE daemon
+ |                                      |
+ | [client request arrives]             |
+ | >fuse_uring_send()                   |
+ | [select payload buf from ring]       |
+ | >fuse_uring_select_buffer()          |
+ | [copy headers to ent's header slot]  |
+ | >copy_header_to_ring()               |
+ | [copy payload to selected buf]       |
+ | >fuse_uring_copy_to_ring()           |
+ | [set buf_id in ent_in_out header]    |
+ | >io_uring_cmd_done()                 |
+ |                                      | [CQE received]
+ |                                      | [read headers from header slot]
+ |                                      | [read payload from buf_id]
+ |                                      | [process request]
+ |                                      | [write reply to header slot]
+ |                                      | [write reply payload to buf]
+ |                                      | >io_uring_submit()
+ |                                      | COMMIT_AND_FETCH
+ | >fuse_uring_commit_fetch()           |
+ | >fuse_uring_commit()                 |
+ | [copy reply from ring]               |
+ | >fuse_uring_recycle_buffer()         |
+ | >fuse_uring_get_next_fuse_req()      |
+
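The select/recycle steps in the flow above can be pictured with a toy userspace model. It is not how the kernel implements buffer selection; `struct buf_pool`, `pool_select()` and `pool_recycle()` are hypothetical names, and the model only reports that the pool is empty without describing what the kernel does in that case.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_MAX_BUFS 64

/* Toy model of a payload buffer pool: a stack of free buffer IDs. */
struct buf_pool {
	uint16_t free_ids[POOL_MAX_BUFS];
	uint32_t nr_free;
};

/* Mark all nr_bufs buffers free; IDs are handed out as 0, 1, 2, ... */
static void pool_init(struct buf_pool *p, uint32_t nr_bufs)
{
	assert(nr_bufs <= POOL_MAX_BUFS);
	p->nr_free = nr_bufs;
	for (uint32_t i = 0; i < nr_bufs; i++)
		p->free_ids[i] = (uint16_t)(nr_bufs - 1 - i);
}

/* Pick a free buffer for a request; returns -1 if all are in use. */
static int pool_select(struct buf_pool *p)
{
	if (p->nr_free == 0)
		return -1;
	return p->free_ids[--p->nr_free];
}

/* Return a buffer to the pool once the reply has been processed. */
static void pool_recycle(struct buf_pool *p, uint16_t buf_id)
{
	assert(p->nr_free < POOL_MAX_BUFS);
	p->free_ids[p->nr_free++] = buf_id;
}
```

With two buffers and more than two entries, a third concurrent request finds the pool empty until one of the first two replies has been committed and its buffer recycled.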
+Pinned buffers
+==============
+
+Servers can optionally pin their header and/or payload buffers by setting
+FUSE_URING_PINNED_HEADERS and/or FUSE_URING_PINNED_BUFFERS flags. When
+set, the kernel pins the user pages and vmaps them during queue setup,
+enabling memcpy to/from the kernel virtual address instead of
+copy_to_user/copy_from_user.
+
+This avoids the per-request cost of pinning/unpinning user pages and
+translating virtual addresses. Buffers must be page-aligned. The pinned pages
+are accounted against RLIMIT_MEMLOCK (bypassable with CAP_IPC_LOCK).
+
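A server preparing buffers for pinning might allocate them along the following lines. This is a minimal sketch using only POSIX calls; the helper names are hypothetical, rounding the length up to a whole number of pages is a conservative choice rather than a documented requirement, and the RLIMIT_MEMLOCK check ignores memory the process has already locked.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/* Round len up to a multiple of page (page must be a power of two). */
static size_t round_up_to_page(size_t len, size_t page)
{
	assert(page != 0 && (page & (page - 1)) == 0);
	return (len + page - 1) & ~(page - 1);
}

/*
 * Allocate a zeroed, page-aligned region of at least len bytes.
 * The rounded length is stored in *out_len. Returns NULL on failure.
 */
static void *alloc_page_aligned(size_t len, size_t *out_len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = NULL;
	size_t rounded;

	if (page <= 0 || len == 0)
		return NULL;
	rounded = round_up_to_page(len, (size_t)page);
	if (posix_memalign(&p, (size_t)page, rounded) != 0)
		return NULL;
	memset(p, 0, rounded);
	*out_len = rounded;
	return p;
}

/*
 * Rough pre-check against the soft RLIMIT_MEMLOCK limit.
 * Returns 1 if len fits, 0 if it does not, -1 if the limit is unreadable.
 */
static int fits_memlock_limit(size_t len)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	return (rlim_t)len <= rl.rlim_cur;
}
```

A failing pre-check does not by itself mean registration will fail, since a process with CAP_IPC_LOCK bypasses the limit.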
+Zero-copy
+=========
+
+Fuse io-uring zero-copy allows the server to directly read from / write to
+the client's pages, bypassing any intermediary buffer copies. This requires
+the FUSE_URING_ZERO_COPY flag, buffer rings with pinned headers and buffers,
+and CAP_SYS_ADMIN.
+
+The kernel registers the client's underlying pages as a sparse buffer at
+the entry's fixed ID via io_buffer_register_bvec(). The fuse server can
+then perform io_uring read/write operations directly on these pages.
+Non-page-backed args (e.g. out headers) go through the payload buffer as
+normal. Pages are unregistered when the request completes.
+
+The request flow for the zero-copy write path (client writes data, server
+reads it) is as follows:
+
+Zero-copy write
+---------------
+
+::
+
+ | Kernel                               | FUSE server
+ |                                      |
+ | "write(fd, buf, 1MB)"                |
+ |                                      |
+ | >sys_write()                         |
+ | >fuse_file_write_iter()              |
+ | >fuse_send_one()                     |
+ | [req->args->in_pages = true]         |
+ | [folios hold client write data]      |
+ |                                      |
+ | >fuse_uring_copy_to_ring()           |
+ | >copy_header_to_ring(IN_OUT)         |
+ | [memcpy fuse_in_header to            |
+ |  pinned headers buf via kaddr]       |
+ | >copy_header_to_ring(OP)             |
+ | [memcpy write_in header]             |
+ |                                      |
+ | >fuse_uring_args_to_ring()           |
+ | >setup_fuse_copy_state()             |
+ | [is_kaddr = true]                    |
+ | [skip_folio_copy = true]             |
+ |                                      |
+ | >fuse_uring_set_up_zero_copy()       |
+ | [folio_get for each client folio]    |
+ | [build bio_vec array from folios]    |
+ | >io_buffer_register_bvec()           |
+ | [register pages at ent->id]          |
+ | [ent->zero_copied = true]            |
+ |                                      |
+ | >fuse_copy_args()                    |
+ | [skip_folio_copy => return 0         |
+ |  for page arg, skip data copy]       |
+ |                                      |
+ | >copy_header_to_ring(RING_ENT)       |
+ | [memcpy ent_in_out]                  |
+ | >io_uring_cmd_done()                 |
+ |                                      |
+ |                                      | [CQE received]
+ |                                      |
+ |                                      | [issue io_uring READ at
+ |                                      |  ent->id]
+ |                                      | [reads directly from
+ |                                      |  client's pages (ZERO_COPY)]
+ |                                      |
+ |                                      | [write data to backing
+ |                                      |  store]
+ |                                      | [submit COMMIT_AND_FETCH]
+ |                                      |
+ | >fuse_uring_commit_fetch()           |
+ | >fuse_uring_commit()                 |
+ | >fuse_uring_copy_from_ring()         |
+ | >fuse_uring_req_end()                |
+ | >io_buffer_unregister(ent->id)       |
+ | [unregister sparse buffer]           |
+ | >fuse_zero_copy_release()            |
+ | [folio_put for each folio]           |
+ | [ent->zero_copied = false]           |
+ | >fuse_request_end()                  |
+ | [wake up client]                     |
+
+The zero-copy read path is analogous.
+
+Some requests may have both page-backed args and non-page-backed args.
+For these requests, the page-backed args are zero-copied while the
+non-page-backed args are copied to the buffer selected from the buffer
+ring::
+
+  zero-copy:        pages registered via io_buffer_register_bvec()
+  non-page-backed:  copied to payload buffer via fuse_copy_args()
+
+For a request whose payload is zero-copied, the registration/unregistration
+path looks like::
+
+  register:    fuse_uring_set_up_zero_copy()
+                 folio_get() for each folio
+                 io_buffer_register_bvec(ent->id)
+
+  [server accesses pages via io_uring fixed buf at ent->id]
+
+  unregister:  fuse_uring_req_end()
+                 io_buffer_unregister(ent->id)
+                   -> fuse_zero_copy_release() callback
+                      folio_put() for each folio
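The pairing of references in the register/unregister path can be illustrated with a small userspace model. It only mirrors the ordering described above (take a reference per page, register, then drop the references from a release step triggered by unregistration); every type and function name in it is hypothetical and none of it is kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 8

/* Stand-in for a folio: only the reference count is modelled. */
struct model_page {
	int refcount;
};

/* Stand-in for the sparse buffer registered at one entry's fixed ID. */
struct model_registration {
	struct model_page *pages[MODEL_MAX_PAGES];
	size_t nr_pages;
	int registered;
};

/* Take one reference per page, then mark the slot as registered. */
static int model_register(struct model_registration *reg,
			  struct model_page **pages, size_t nr)
{
	if (reg->registered || nr > MODEL_MAX_PAGES)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		pages[i]->refcount++;
		reg->pages[i] = pages[i];
	}
	reg->nr_pages = nr;
	reg->registered = 1;
	return 0;
}

/* Release step: drop the reference taken for each page at registration. */
static void model_release(struct model_registration *reg)
{
	for (size_t i = 0; i < reg->nr_pages; i++)
		reg->pages[i]->refcount--;
	reg->nr_pages = 0;
}

/* Unregister the slot, which in turn runs the release step. */
static void model_unregister(struct model_registration *reg)
{
	assert(reg->registered);
	reg->registered = 0;
	model_release(reg);
}
```

The point of the model is that every reference taken at registration is dropped exactly once, and only after the slot is no longer registered.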
--
2.52.0