Linux filesystem development
From: Joanne Koong <joannelkoong@gmail.com>
To: miklos@szeredi.hu
Cc: bernd@bsbernd.com, axboe@kernel.dk, linux-fsdevel@vger.kernel.org
Subject: [PATCH v2 14/14] docs: fuse: add io-uring bufring and zero-copy documentation
Date: Thu,  2 Apr 2026 09:28:40 -0700
Message-ID: <20260402162840.2989717-15-joannelkoong@gmail.com>
In-Reply-To: <20260402162840.2989717-1-joannelkoong@gmail.com>

Add documentation for fuse over io-uring usage of buffer rings and
zero-copy.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 .../filesystems/fuse/fuse-io-uring.rst        | 189 ++++++++++++++++++
 1 file changed, 189 insertions(+)

diff --git a/Documentation/filesystems/fuse/fuse-io-uring.rst b/Documentation/filesystems/fuse/fuse-io-uring.rst
index d73dd0dbd238..bc47686c023f 100644
--- a/Documentation/filesystems/fuse/fuse-io-uring.rst
+++ b/Documentation/filesystems/fuse/fuse-io-uring.rst
@@ -95,5 +95,194 @@ Sending requests with CQEs
  |    <fuse_unlink()                         |
  |  <sys_unlink()                            |
 
+Buffer rings
+============
 
+Buffer rings have two main advantages:
 
+* Reduced memory usage: payload buffers are pooled and selected on demand
+  rather than dedicated per-entry, allowing fewer buffers than entries. This
+  infrastructure also allows for future optimizations like incremental buffer
+  consumption where non-overlapping parts of a buffer may be used across
+  concurrent requests.
+* Foundation for pinned buffers: contiguous buffer allocations allow the
+  kernel to pin and vmap the entire region, avoiding per-request page
+  resolution overhead.
+
+At a high level, this is how fuse uses buffer rings:
+
+* The first REGISTER SQE for a queue creates the queue and sets up the
+  buffer ring. The server provides two iovecs: one for headers and one for
+  payload buffers. Each entry gets a fixed ID (sqe->buf_index) that maps
+  to a specific header slot.
+* When a client request arrives, the kernel selects a payload buffer from
+  the ring (if the request has copyable data), copies headers and payload
+  data, and completes the SQE.
+* The buf_id of the selected payload buffer is communicated to the server
+  via the fuse_uring_ent_in_out header. The server uses this to locate the
+  payload data in its buffer.
+* The server processes the request and sends a COMMIT_AND_FETCH SQE with
+  the reply. The kernel processes the reply and recycles the buffer.
+
+Visually, this looks like::
+
+ Headers buffer:
+ +-----------------------+-----------------------+-----+
+ | fuse_uring_req_header | fuse_uring_req_header | ... |
+ | [ent 0]               | [ent 1]               |     |
+ +-----------------------+-----------------------+-----+
+ ^                       ^
+ |                       |
+ ent 0 header slot       ent 1 header slot
+ (sqe->buf_index=0)      (sqe->buf_index=1)
+
+ Payload buffer pool:
+ +-----------+-----------+-----------+-----+
+ | buf 0     | buf 1     | buf 2     | ... |
+ | (buf_size)| (buf_size)| (buf_size)|     |
+ +-----------+-----------+-----------+-----+
+ selected on demand, recycled after each request
+
+Buffer ring request flow
+------------------------
+
+::
+
+|  Kernel                                  |  FUSE daemon
+|                                          |
+|  [client request arrives]                |
+|  >fuse_uring_send()                      |
+|    [select payload buf from ring]        |
+|    >fuse_uring_select_buffer()           |
+|    [copy headers to ent's header slot]   |
+|    >copy_header_to_ring()                |
+|    [copy payload to selected buf]        |
+|    >fuse_uring_copy_to_ring()            |
+|    [set buf_id in ent_in_out header]     |
+|    >io_uring_cmd_done()                  |
+|                                          |  [CQE received]
+|                                          |  [read headers from header slot]
+|                                          |  [read payload from buf_id]
+|                                          |  [process request]
+|                                          |  [write reply to header slot]
+|                                          |  [write reply payload to buf]
+|                                          |  >io_uring_submit()
+|                                          |   COMMIT_AND_FETCH
+|  >fuse_uring_commit_fetch()              |
+|    >fuse_uring_commit()                  |
+|     [copy reply from ring]               |
+|     >fuse_uring_recycle_buffer()         |
+|    >fuse_uring_get_next_fuse_req()       |
+
+Pinned buffers
+==============
+
+Servers can optionally pin their header and/or payload buffers by setting the
+FUSE_URING_PINNED_HEADERS and/or FUSE_URING_PINNED_BUFFERS flags. When
+set, the kernel pins the user pages and vmaps them during queue setup,
+enabling memcpy() to/from the kernel virtual address instead of
+copy_to_user()/copy_from_user().
+
+This avoids the per-request cost of pinning/unpinning user pages and
+translating virtual addresses. Buffers must be page-aligned. The pinned pages
+are accounted against RLIMIT_MEMLOCK (bypassable with CAP_IPC_LOCK).
+
+Zero-copy
+=========
+
+Fuse io-uring zero-copy allows the server to directly read from / write to
+the client's pages, bypassing any intermediary buffer copies. This requires
+the FUSE_URING_ZERO_COPY flag, buffer rings with pinned headers and buffers,
+and CAP_SYS_ADMIN.
+
+The kernel registers the client's underlying pages as a sparse buffer at
+the entry's fixed id via io_buffer_register_bvec(). The fuse server can
+then perform io_uring read/write operations directly on these pages.
+Non-page-backed args (e.g. out headers) go through the payload buffer as
+normal. Pages are unregistered when the request completes.
+
+Zero-copy write
+---------------
+
+The request flow for the zero-copy write path (client writes data, server
+reads it) is as follows::
+
+|  Kernel                                   |  FUSE server
+|                                           |
+|  "write(fd, buf, 1MB)"                    |
+|                                           |
+|  >sys_write()                             |
+|    >fuse_file_write_iter()                |
+|      >fuse_send_one()                     |
+|        [req->args->in_pages = true]       |
+|        [folios hold client write data]    |
+|                                           |
+|  >fuse_uring_copy_to_ring()               |
+|    >copy_header_to_ring(IN_OUT)           |
+|      [memcpy fuse_in_header to            |
+|       pinned headers buf via kaddr]       |
+|    >copy_header_to_ring(OP)               |
+|      [memcpy write_in header]             |
+|                                           |
+|    >fuse_uring_args_to_ring()             |
+|      >setup_fuse_copy_state()             |
+|        [is_kaddr = true]                  |
+|        [skip_folio_copy = true]           |
+|                                           |
+|      >fuse_uring_set_up_zero_copy()       |
+|        [folio_get for each client folio]  |
+|        [build bio_vec array from folios]  |
+|        >io_buffer_register_bvec()         |
+|          [register pages at ent->id]      |
+|        [ent->zero_copied = true]          |
+|                                           |
+|      >fuse_copy_args()                    |
+|        [skip_folio_copy => return 0       |
+|         for page arg, skip data copy]     |
+|                                           |
+|    >copy_header_to_ring(RING_ENT)         |
+|      [memcpy ent_in_out]                  |
+|    >io_uring_cmd_done()                   |
+|                                           |
+|                                           |  [CQE received]
+|                                           |
+|                                           |  [issue io_uring READ at
+|                                           |   ent->id]
+|                                           |  [reads directly from
+|                                           |   client's pages (ZERO_COPY)]
+|                                           |
+|                                           |  [write data to backing
+|                                           |   store]
+|                                           |  [submit COMMIT_AND_FETCH]
+|                                           |
+|  >fuse_uring_commit_fetch()               |
+|    >fuse_uring_commit()                   |
+|      >fuse_uring_copy_from_ring()         |
+|    >fuse_uring_req_end()                  |
+|      >io_buffer_unregister(ent->id)       |
+|        [unregister sparse buffer]         |
+|      >fuse_zero_copy_release()            |
+|        [folio_put for each folio]         |
+|      [ent->zero_copied = false]           |
+|      >fuse_request_end()                  |
+|        [wake up client]                   |
+
+The zero-copy read path is analogous.
+
+Some requests may have both page-backed args and non-page-backed args. For
+these requests, the page-backed args are zero-copied while the non-page-backed
+args are copied to the buffer selected from the buffer ring::
+
+  zero-copy: pages registered via io_buffer_register_bvec()
+  non-page-backed: copied to payload buffer via fuse_copy_args()
+
+For a request whose payload is zero-copied, the registration/unregistration
+path looks like::
+
+ register:   fuse_uring_set_up_zero_copy()
+               folio_get() for each folio
+               io_buffer_register_bvec(ent->id)
+
+ [server accesses pages via io_uring fixed buf at ent->id]
+
+ unregister: fuse_uring_req_end()
+               io_buffer_unregister(ent->id)
+               -> fuse_zero_copy_release() callback
+                  folio_put() for each folio
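
Illustration only, not part of the patch: the pooled-buffer idea from the
"Buffer rings" section can be modelled in userspace roughly as follows. Every
name here (struct payload_pool, pool_select(), pool_recycle(), header_slot(),
NR_BUFS) is hypothetical, and the free-id stack is a simplification. The
kernel selects buffers through an io_uring buffer ring, whose ordering and
bookkeeping differ. The index arithmetic assumes the contiguous layout shown
in the diagrams of that section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the patch. */
#define NR_BUFS 4 /* payload buffers in the pool, may be < number of entries */

struct payload_pool {
	uint8_t *base;              /* start of the contiguous payload region */
	size_t buf_size;            /* size of each payload buffer */
	unsigned int nr_free;       /* number of ids currently on the free stack */
	uint16_t free_ids[NR_BUFS]; /* stack of free buffer ids */
};

static void pool_init(struct payload_pool *p, void *base, size_t buf_size)
{
	unsigned int i;

	p->base = base;
	p->buf_size = buf_size;
	p->nr_free = NR_BUFS;
	/* Push ids in reverse so that the first select returns id 0. */
	for (i = 0; i < NR_BUFS; i++)
		p->free_ids[i] = (uint16_t)(NR_BUFS - 1 - i);
}

/* Pick a free payload buffer; returns its id, or -1 if the pool is empty. */
static int pool_select(struct payload_pool *p)
{
	if (p->nr_free == 0)
		return -1;
	return p->free_ids[--p->nr_free];
}

/* Return a buffer to the pool once the reply has been processed. */
static void pool_recycle(struct payload_pool *p, int buf_id)
{
	assert(buf_id >= 0 && buf_id < NR_BUFS);
	assert(p->nr_free < NR_BUFS);
	p->free_ids[p->nr_free++] = (uint16_t)buf_id;
}

/* Address of payload buffer buf_id inside the contiguous region. */
static uint8_t *pool_buf_addr(const struct payload_pool *p, int buf_id)
{
	return p->base + (size_t)buf_id * p->buf_size;
}

/* Address of the header slot for entry ent_id (the fixed sqe->buf_index). */
static uint8_t *header_slot(uint8_t *headers_base, size_t header_size,
			    unsigned int ent_id)
{
	return headers_base + (size_t)ent_id * header_size;
}
```

In this model a header slot is tied to its entry for the lifetime of the
queue, while a payload buffer is held only between pool_select() and
pool_recycle(). That is what lets the pool be smaller than the number of
entries.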
-- 
2.52.0
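
Also outside the patch: the "Pinned buffers" section requires page-aligned
buffers and accounts pinned pages against RLIMIT_MEMLOCK. A server might
prepare for that roughly as sketched below. The helper names are hypothetical,
and rounding the length up to whole pages is an assumption, since the text
only states the alignment requirement. The limit check is a coarse pre-check
at best, because other locked memory counts against the same limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Round len up to a multiple of page_size (hypothetical helper). */
static size_t round_up_to_page(size_t len, size_t page_size)
{
	assert(page_size > 0);
	return (len + page_size - 1) / page_size * page_size;
}

/*
 * Allocate a page-aligned buffer of at least len bytes.
 * On success, returns the buffer and stores the rounded size in *alloc_len.
 * Returns NULL on failure. Release with free().
 */
static void *alloc_page_aligned(size_t len, size_t *alloc_len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *buf = NULL;

	if (page_size <= 0)
		return NULL;
	*alloc_len = round_up_to_page(len, (size_t)page_size);
	if (posix_memalign(&buf, (size_t)page_size, *alloc_len) != 0)
		return NULL;
	return buf;
}

/*
 * Coarse check of len against the soft RLIMIT_MEMLOCK.
 * Returns 1 if len fits, 0 if it does not, -1 if the limit cannot be read.
 */
static int fits_memlock_limit(size_t len)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	return (rlim_t)len <= rl.rlim_cur;
}
```

A server would call alloc_page_aligned() once for the headers region and once
for the payload region before registering the queue, and could fall back to
unpinned buffers if fits_memlock_limit() reports that the regions do not fit.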


