* [PATCH v1 0/9] fuse: add io-uring buffer rings and zero-copy
@ 2026-03-24 22:45 Joanne Koong
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
This series adds buffer ring and zero-copy capabilities to fuse over io-uring.
Using buffer rings has two benefits:
a) eliminates the overhead of pinning/unpinning user pages and
translating virtual addresses for every server-kernel interaction
b) reduces the amount of memory needed for the buffers per queue.
Incremental buffer consumption, when added, will allow non-overlapping
regions of a buffer to be used simultaneously across multiple requests.
Using zero-copy (only for privileged servers) eliminates the memory copies
between kernel and userspace for read/write/payload-heavy operations by
allowing the server to directly operate on the client's underlying pages.
This series has a dependency on io-uring kernel-managed buffer rings [1] and
the io-uring registered bvec buffers changes in [2]. It contains the fuse
changes carried over from the larger series in [3]. Because of the heavy
dependency on the io-uring changes, this series is submitted against Jens's
io-uring tree instead of Miklos's fuse tree.
This series has the following changes from [3]:
* uses a registered memory region for headers instead of a fixed buffer.
This avoids the overhead of fixed buffer lookups and having to do a
refcounting dance on every i/o
* adds folio reference acquisition for pages registered into the bvec (and the
corresponding release callback) to account for the case where a server is
forcibly unmounted while non-fuse io-uring requests using those pages are
in-flight
Benchmarks for zero-copy showed approximately the following differences in
throughput for bs=1M:
direct randreads: ~20% increase (~2100 MB/s -> ~2600 MB/s)
buffered randreads: ~25% increase (~1900 MB/s -> ~2400 MB/s)
direct randwrites: no difference (~750 MB/s)
buffered randwrites: ~10% increase (~950 MB/s -> ~1050 MB/s)
The benchmark was run using fio on the passthrough_hp server:
fio --name=test_run --ioengine=sync --rw=rand{read,write} --bs=1M
--size=1G --numjobs=2 --ramp_time=30 --group_reporting=1
The libfuse changes can be found in [4]. This has a dependency on the liburing
changes in [5]. To test the server, you can run it with:
sudo ~/libfuse/build/example/passthrough_hp ~/src ~/mounts/tmp
--nopassthrough -o io_uring_zero_copy -o io_uring_q_depth=8
Once this series is merged, the libfuse changes will be tidied up and
submitted upstream.
Thanks,
Joanne
[1] https://lore.kernel.org/linux-fsdevel/20260306003224.3620942-1-joannelkoong@gmail.com/
[2] https://lore.kernel.org/io-uring/20260324221426.3436334-1-joannelkoong@gmail.com/
[3] https://lore.kernel.org/linux-fsdevel/20260116233044.1532965-1-joannelkoong@gmail.com/
[4] https://github.com/joannekoong/libfuse/tree/zero_copy
[5] https://github.com/joannekoong/liburing/commits/pbuf_kernel_managed/
Joanne Koong (9):
fuse: separate next request fetching from sending logic
fuse: refactor io-uring header copying to ring
fuse: refactor io-uring header copying from ring
fuse: use enum types for header copying
fuse: refactor setting up copy state for payload copying
fuse: support buffer copying for kernel addresses
fuse: add io-uring kernel-managed buffer ring
fuse: add zero-copy over io-uring
docs: fuse: add io-uring bufring and zero-copy documentation
.../filesystems/fuse/fuse-io-uring.rst | 61 +-
fs/fuse/dev.c | 32 +-
fs/fuse/dev_uring.c | 696 ++++++++++++++----
fs/fuse/dev_uring_i.h | 50 +-
fs/fuse/fuse_dev_i.h | 8 +-
include/uapi/linux/fuse.h | 20 +-
6 files changed, 707 insertions(+), 160 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v1 1/9] fuse: separate next request fetching from sending logic
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Simplify the logic for fetching and sending the next request.
This gets rid of fuse_uring_send_next_to_ring(), which duplicated
logic from fuse_uring_send(). Decoupling request fetching from the
send operation makes the control flow clearer and reduces unnecessary
parameter passing.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 78 ++++++++++++++++-----------------------------
1 file changed, 28 insertions(+), 50 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 3a38b61aac26..54436d3fda4d 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -714,34 +714,6 @@ static int fuse_uring_prepare_send(struct fuse_ring_ent *ent,
return err;
}
-/*
- * Write data to the ring buffer and send the request to userspace,
- * userspace will read it
- * This is comparable with classical read(/dev/fuse)
- */
-static int fuse_uring_send_next_to_ring(struct fuse_ring_ent *ent,
- struct fuse_req *req,
- unsigned int issue_flags)
-{
- struct fuse_ring_queue *queue = ent->queue;
- int err;
- struct io_uring_cmd *cmd;
-
- err = fuse_uring_prepare_send(ent, req);
- if (err)
- return err;
-
- spin_lock(&queue->lock);
- cmd = ent->cmd;
- ent->cmd = NULL;
- ent->state = FRRS_USERSPACE;
- list_move_tail(&ent->list, &queue->ent_in_userspace);
- spin_unlock(&queue->lock);
-
- io_uring_cmd_done(cmd, 0, issue_flags);
- return 0;
-}
-
/*
* Make a ring entry available for fuse_req assignment
*/
@@ -838,11 +810,13 @@ static void fuse_uring_commit(struct fuse_ring_ent *ent, struct fuse_req *req,
}
/*
- * Get the next fuse req and send it
+ * Get the next fuse req.
+ *
+ * Returns true if the next fuse request has been assigned to the ent.
+ * Else, there is no next fuse request and this returns false.
*/
-static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ent,
- struct fuse_ring_queue *queue,
- unsigned int issue_flags)
+static bool fuse_uring_get_next_fuse_req(struct fuse_ring_ent *ent,
+ struct fuse_ring_queue *queue)
{
int err;
struct fuse_req *req;
@@ -854,10 +828,12 @@ static void fuse_uring_next_fuse_req(struct fuse_ring_ent *ent,
spin_unlock(&queue->lock);
if (req) {
- err = fuse_uring_send_next_to_ring(ent, req, issue_flags);
+ err = fuse_uring_prepare_send(ent, req);
if (err)
goto retry;
}
+
+ return req != NULL;
}
static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
@@ -875,6 +851,20 @@ static int fuse_ring_ent_set_commit(struct fuse_ring_ent *ent)
return 0;
}
+static void fuse_uring_send(struct fuse_ring_ent *ent, struct io_uring_cmd *cmd,
+ ssize_t ret, unsigned int issue_flags)
+{
+ struct fuse_ring_queue *queue = ent->queue;
+
+ spin_lock(&queue->lock);
+ ent->state = FRRS_USERSPACE;
+ list_move_tail(&ent->list, &queue->ent_in_userspace);
+ ent->cmd = NULL;
+ spin_unlock(&queue->lock);
+
+ io_uring_cmd_done(cmd, ret, issue_flags);
+}
+
/* FUSE_URING_CMD_COMMIT_AND_FETCH handler */
static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
struct fuse_conn *fc)
@@ -947,7 +937,8 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
* and fetching is done in one step vs legacy fuse, which has separated
* read (fetch request) and write (commit result).
*/
- fuse_uring_next_fuse_req(ent, queue, issue_flags);
+ if (fuse_uring_get_next_fuse_req(ent, queue))
+ fuse_uring_send(ent, cmd, 0, issue_flags);
return 0;
}
@@ -1196,20 +1187,6 @@ int fuse_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
return -EIOCBQUEUED;
}
-static void fuse_uring_send(struct fuse_ring_ent *ent, struct io_uring_cmd *cmd,
- ssize_t ret, unsigned int issue_flags)
-{
- struct fuse_ring_queue *queue = ent->queue;
-
- spin_lock(&queue->lock);
- ent->state = FRRS_USERSPACE;
- list_move_tail(&ent->list, &queue->ent_in_userspace);
- ent->cmd = NULL;
- spin_unlock(&queue->lock);
-
- io_uring_cmd_done(cmd, ret, issue_flags);
-}
-
/*
* This prepares and sends the ring request in fuse-uring task context.
* User buffers are not mapped yet - the application does not have permission
@@ -1226,8 +1203,9 @@ static void fuse_uring_send_in_task(struct io_tw_req tw_req, io_tw_token_t tw)
if (!tw.cancel) {
err = fuse_uring_prepare_send(ent, ent->fuse_req);
if (err) {
- fuse_uring_next_fuse_req(ent, queue, issue_flags);
- return;
+ if (!fuse_uring_get_next_fuse_req(ent, queue))
+ return;
+ err = 0;
}
} else {
err = -ECANCELED;
--
2.52.0
* [PATCH v1 2/9] fuse: refactor io-uring header copying to ring
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Move the logic for copying headers to the ring into a new
copy_header_to_ring() helper. This makes the copy_to_user() call sites
clearer and centralizes error handling and rate-limited logging.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 54436d3fda4d..5fc8ca330595 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -575,6 +575,18 @@ static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
return err;
}
+static __always_inline int copy_header_to_ring(void __user *ring,
+ const void *header,
+ size_t header_size)
+{
+ if (copy_to_user(ring, header, header_size)) {
+ pr_info_ratelimited("Copying header to ring failed.\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
struct fuse_req *req,
struct fuse_ring_ent *ent)
@@ -637,13 +649,11 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
* Some op code have that as zero size.
*/
if (args->in_args[0].size > 0) {
- err = copy_to_user(&ent->headers->op_in, in_args->value,
- in_args->size);
- if (err) {
- pr_info_ratelimited(
- "Copying the header failed.\n");
- return -EFAULT;
- }
+ err = copy_header_to_ring(&ent->headers->op_in,
+ in_args->value,
+ in_args->size);
+ if (err)
+ return err;
}
in_args++;
num_args--;
@@ -659,9 +669,8 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
}
ent_in_out.payload_sz = cs.ring.copied_sz;
- err = copy_to_user(&ent->headers->ring_ent_in_out, &ent_in_out,
- sizeof(ent_in_out));
- return err ? -EFAULT : 0;
+ return copy_header_to_ring(&ent->headers->ring_ent_in_out, &ent_in_out,
+ sizeof(ent_in_out));
}
static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
@@ -690,14 +699,8 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
}
/* copy fuse_in_header */
- err = copy_to_user(&ent->headers->in_out, &req->in.h,
- sizeof(req->in.h));
- if (err) {
- err = -EFAULT;
- return err;
- }
-
- return 0;
+ return copy_header_to_ring(&ent->headers->in_out, &req->in.h,
+ sizeof(req->in.h));
}
static int fuse_uring_prepare_send(struct fuse_ring_ent *ent,
--
2.52.0
* [PATCH v1 3/9] fuse: refactor io-uring header copying from ring
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Move the logic for copying headers from the ring into a new
copy_header_from_ring() helper. This makes the copy_from_user() call
sites clearer and centralizes error handling and rate-limited logging.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 5fc8ca330595..86f9bb94b45a 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -587,6 +587,18 @@ static __always_inline int copy_header_to_ring(void __user *ring,
return 0;
}
+static __always_inline int copy_header_from_ring(void *header,
+ const void __user *ring,
+ size_t header_size)
+{
+ if (copy_from_user(header, ring, header_size)) {
+ pr_info_ratelimited("Copying header from ring failed.\n");
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
struct fuse_req *req,
struct fuse_ring_ent *ent)
@@ -597,10 +609,10 @@ static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
int err;
struct fuse_uring_ent_in_out ring_in_out;
- err = copy_from_user(&ring_in_out, &ent->headers->ring_ent_in_out,
- sizeof(ring_in_out));
+ err = copy_header_from_ring(&ring_in_out, &ent->headers->ring_ent_in_out,
+ sizeof(ring_in_out));
if (err)
- return -EFAULT;
+ return err;
err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
&iter);
@@ -794,10 +806,10 @@ static void fuse_uring_commit(struct fuse_ring_ent *ent, struct fuse_req *req,
struct fuse_conn *fc = ring->fc;
ssize_t err = 0;
- err = copy_from_user(&req->out.h, &ent->headers->in_out,
- sizeof(req->out.h));
+ err = copy_header_from_ring(&req->out.h, &ent->headers->in_out,
+ sizeof(req->out.h));
if (err) {
- req->out.h.error = -EFAULT;
+ req->out.h.error = err;
goto out;
}
--
2.52.0
* [PATCH v1 4/9] fuse: use enum types for header copying
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Use enum types to identify which part of the header needs to be copied.
This improves the interface and will simplify copying headers to both
kernel-space and user-space addresses once kernel-managed buffer rings
are added.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 62 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 52 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 86f9bb94b45a..daf0c3dffcdb 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -31,6 +31,15 @@ struct fuse_uring_pdu {
static const struct fuse_iqueue_ops fuse_io_uring_ops;
+enum fuse_uring_header_type {
+ /* struct fuse_in_header / struct fuse_out_header */
+ FUSE_URING_HEADER_IN_OUT,
+ /* per op code header */
+ FUSE_URING_HEADER_OP,
+ /* struct fuse_uring_ent_in_out header */
+ FUSE_URING_HEADER_RING_ENT,
+};
+
static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
struct fuse_ring_ent *ring_ent)
{
@@ -575,10 +584,34 @@ static int fuse_uring_out_header_has_err(struct fuse_out_header *oh,
return err;
}
-static __always_inline int copy_header_to_ring(void __user *ring,
+static int ring_header_type_offset(enum fuse_uring_header_type type)
+{
+ switch (type) {
+ case FUSE_URING_HEADER_IN_OUT:
+ return 0;
+ case FUSE_URING_HEADER_OP:
+ return offsetof(struct fuse_uring_req_header, op_in);
+ case FUSE_URING_HEADER_RING_ENT:
+ return offsetof(struct fuse_uring_req_header, ring_ent_in_out);
+ default:
+ WARN_ONCE(1, "Invalid header type: %d\n", type);
+ return -EINVAL;
+ }
+}
+
+static __always_inline int copy_header_to_ring(struct fuse_ring_ent *ent,
+ enum fuse_uring_header_type type,
const void *header,
size_t header_size)
{
+ int offset = ring_header_type_offset(type);
+ void __user *ring;
+
+ if (offset < 0)
+ return offset;
+
+ ring = (char __user *)ent->headers + offset;
+
if (copy_to_user(ring, header, header_size)) {
pr_info_ratelimited("Copying header to ring failed.\n");
return -EFAULT;
@@ -587,10 +620,19 @@ static __always_inline int copy_header_to_ring(void __user *ring,
return 0;
}
-static __always_inline int copy_header_from_ring(void *header,
- const void __user *ring,
+static __always_inline int copy_header_from_ring(struct fuse_ring_ent *ent,
+ enum fuse_uring_header_type type,
+ void *header,
size_t header_size)
{
+ int offset = ring_header_type_offset(type);
+ const void __user *ring;
+
+ if (offset < 0)
+ return offset;
+
+ ring = (char __user *)ent->headers + offset;
+
if (copy_from_user(header, ring, header_size)) {
pr_info_ratelimited("Copying header from ring failed.\n");
return -EFAULT;
@@ -609,8 +651,8 @@ static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
int err;
struct fuse_uring_ent_in_out ring_in_out;
- err = copy_header_from_ring(&ring_in_out, &ent->headers->ring_ent_in_out,
- sizeof(ring_in_out));
+ err = copy_header_from_ring(ent, FUSE_URING_HEADER_RING_ENT,
+ &ring_in_out, sizeof(ring_in_out));
if (err)
return err;
@@ -661,7 +703,7 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
* Some op code have that as zero size.
*/
if (args->in_args[0].size > 0) {
- err = copy_header_to_ring(&ent->headers->op_in,
+ err = copy_header_to_ring(ent, FUSE_URING_HEADER_OP,
in_args->value,
in_args->size);
if (err)
@@ -681,8 +723,8 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
}
ent_in_out.payload_sz = cs.ring.copied_sz;
- return copy_header_to_ring(&ent->headers->ring_ent_in_out, &ent_in_out,
- sizeof(ent_in_out));
+ return copy_header_to_ring(ent, FUSE_URING_HEADER_RING_ENT,
+ &ent_in_out, sizeof(ent_in_out));
}
static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
@@ -711,7 +753,7 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
}
/* copy fuse_in_header */
- return copy_header_to_ring(&ent->headers->in_out, &req->in.h,
+ return copy_header_to_ring(ent, FUSE_URING_HEADER_IN_OUT, &req->in.h,
sizeof(req->in.h));
}
@@ -806,7 +848,7 @@ static void fuse_uring_commit(struct fuse_ring_ent *ent, struct fuse_req *req,
struct fuse_conn *fc = ring->fc;
ssize_t err = 0;
- err = copy_header_from_ring(&req->out.h, &ent->headers->in_out,
+ err = copy_header_from_ring(ent, FUSE_URING_HEADER_IN_OUT, &req->out.h,
sizeof(req->out.h));
if (err) {
req->out.h.error = err;
--
2.52.0
* [PATCH v1 5/9] fuse: refactor setting up copy state for payload copying
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Add a new helper function setup_fuse_copy_state() to contain the logic
for setting up the copy state for payload copying.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dev_uring.c | 38 ++++++++++++++++++++++++--------------
1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index daf0c3dffcdb..81cd20e0d50b 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -641,6 +641,27 @@ static __always_inline int copy_header_from_ring(struct fuse_ring_ent *ent,
return 0;
}
+static int setup_fuse_copy_state(struct fuse_copy_state *cs,
+ struct fuse_ring *ring, struct fuse_req *req,
+ struct fuse_ring_ent *ent, int dir,
+ struct iov_iter *iter)
+{
+ int err;
+
+ err = import_ubuf(dir, ent->payload, ring->max_payload_sz, iter);
+ if (err) {
+ pr_info_ratelimited("fuse: Import of user buffer failed\n");
+ return err;
+ }
+
+ fuse_copy_init(cs, dir == ITER_DEST, iter);
+
+ cs->is_uring = true;
+ cs->req = req;
+
+ return 0;
+}
+
static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
struct fuse_req *req,
struct fuse_ring_ent *ent)
@@ -656,15 +677,10 @@ static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
if (err)
return err;
- err = import_ubuf(ITER_SOURCE, ent->payload, ring->max_payload_sz,
- &iter);
+ err = setup_fuse_copy_state(&cs, ring, req, ent, ITER_SOURCE, &iter);
if (err)
return err;
- fuse_copy_init(&cs, false, &iter);
- cs.is_uring = true;
- cs.req = req;
-
err = fuse_copy_out_args(&cs, args, ring_in_out.payload_sz);
fuse_copy_finish(&cs);
return err;
@@ -687,15 +703,9 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
.commit_id = req->in.h.unique,
};
- err = import_ubuf(ITER_DEST, ent->payload, ring->max_payload_sz, &iter);
- if (err) {
- pr_info_ratelimited("fuse: Import of user buffer failed\n");
+ err = setup_fuse_copy_state(&cs, ring, req, ent, ITER_DEST, &iter);
+ if (err)
return err;
- }
-
- fuse_copy_init(&cs, true, &iter);
- cs.is_uring = true;
- cs.req = req;
if (num_args > 0) {
/*
--
2.52.0
* [PATCH v1 6/9] fuse: support buffer copying for kernel addresses
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel, Bernd Schubert
This is a preparatory patch needed to support kernel-managed ring
buffers in fuse-over-io-uring. For kernel-managed ring buffers, we get
the vmapped address of the buffer, which we can use directly.
Currently, buffer copying in fuse only supports extracting the
underlying pages from an iov_iter and kmapping them. This commit allows
buffer copying to work directly on a kernel address (kaddr).
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Bernd Schubert <bernd@bsbernd.com>
---
fs/fuse/dev.c | 25 +++++++++++++++++++------
fs/fuse/fuse_dev_i.h | 7 ++++++-
2 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 0b0241f47170..7fee4ff64348 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -848,6 +848,9 @@ void fuse_copy_init(struct fuse_copy_state *cs, bool write,
/* Unmap and put previous page of userspace buffer */
void fuse_copy_finish(struct fuse_copy_state *cs)
{
+ if (cs->is_kaddr)
+ return;
+
if (cs->currbuf) {
struct pipe_buffer *buf = cs->currbuf;
@@ -873,6 +876,11 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
struct page *page;
int err;
+ if (cs->is_kaddr) {
+ WARN_ON_ONCE(!cs->len);
+ return 0;
+ }
+
err = unlock_request(cs->req);
if (err)
return err;
@@ -931,15 +939,20 @@ static int fuse_copy_do(struct fuse_copy_state *cs, void **val, unsigned *size)
{
unsigned ncpy = min(*size, cs->len);
if (val) {
- void *pgaddr = kmap_local_page(cs->pg);
- void *buf = pgaddr + cs->offset;
+ void *pgaddr, *buf;
+ if (!cs->is_kaddr) {
+ pgaddr = kmap_local_page(cs->pg);
+ buf = pgaddr + cs->offset;
+ } else {
+ buf = cs->kaddr + cs->offset;
+ }
if (cs->write)
memcpy(buf, *val, ncpy);
else
memcpy(*val, buf, ncpy);
-
- kunmap_local(pgaddr);
+ if (!cs->is_kaddr)
+ kunmap_local(pgaddr);
*val += ncpy;
}
*size -= ncpy;
@@ -1127,7 +1140,7 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop,
}
while (count) {
- if (cs->write && cs->pipebufs && folio) {
+ if (cs->write && cs->pipebufs && folio && !cs->is_kaddr) {
/*
* Can't control lifetime of pipe buffers, so always
* copy user pages.
@@ -1139,7 +1152,7 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop,
} else {
return fuse_ref_folio(cs, folio, offset, count);
}
- } else if (!cs->len) {
+ } else if (!cs->len && !cs->is_kaddr) {
if (cs->move_folios && folio &&
offset == 0 && count == size) {
err = fuse_try_move_folio(cs, foliop);
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index 134bf44aff0d..aa1d25421054 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -28,12 +28,17 @@ struct fuse_copy_state {
struct pipe_buffer *currbuf;
struct pipe_inode_info *pipe;
unsigned long nr_segs;
- struct page *pg;
+ union {
+ struct page *pg;
+ void *kaddr;
+ };
unsigned int len;
unsigned int offset;
bool write:1;
bool move_folios:1;
bool is_uring:1;
+ /* if set, use kaddr; otherwise use pg */
+ bool is_kaddr:1;
struct {
unsigned int copied_sz; /* copied size into the user buffer */
} ring;
--
2.52.0
* [PATCH v1 7/9] fuse: add io-uring kernel-managed buffer ring
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Add io-uring kernel-managed buffer ring capability for fuse servers
communicating through the io-uring interface.
This has two benefits:
a) eliminates the overhead of pinning/unpinning user pages and
translating virtual addresses for every server-kernel interaction
b) reduces the amount of memory needed for the buffers per queue.
Incremental buffer consumption, when added, will allow non-overlapping
regions of a buffer to be used simultaneously across multiple requests.
Buffer ring usage is set on a per-queue basis. To use it, the server
must have preregistered a kernel-managed buffer ring and a memory
region large enough to hold the headers before queue creation is
initiated. The kernel-managed buffer ring is pinned for the lifetime
of the io_uring ring.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/dev_uring.c | 384 +++++++++++++++++++++++++++++++-------
fs/fuse/dev_uring_i.h | 46 ++++-
include/uapi/linux/fuse.h | 19 +-
3 files changed, 376 insertions(+), 73 deletions(-)
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index 81cd20e0d50b..cd45b6a8e6c6 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -10,6 +10,7 @@
#include "fuse_trace.h"
#include <linux/fs.h>
+#include <linux/io_uring.h>
#include <linux/io_uring/cmd.h>
static bool __read_mostly enable_uring;
@@ -19,6 +20,7 @@ MODULE_PARM_DESC(enable_uring,
#define FUSE_URING_IOV_SEGS 2 /* header and payload */
+#define FUSE_URING_BUFRING_GROUP 0
bool fuse_uring_enabled(void)
{
@@ -40,6 +42,11 @@ enum fuse_uring_header_type {
FUSE_URING_HEADER_RING_ENT,
};
+static inline bool bufring_enabled(struct fuse_ring_queue *queue)
+{
+ return queue->bufring.enabled;
+}
+
static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
struct fuse_ring_ent *ring_ent)
{
@@ -276,20 +283,89 @@ static struct fuse_ring *fuse_uring_create(struct fuse_conn *fc)
return res;
}
-static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
- int qid)
+static void *fuse_uring_get_buf_ring_headers(struct io_uring_cmd *cmd,
+ u64 headers_offset,
+ unsigned int queue_depth,
+ unsigned int issue_flags)
+{
+ void *mem_reg_addr;
+ u64 mem_reg_size, headers_size;
+ unsigned nr_pages;
+
+ /* find the headers in the registered memory region */
+ mem_reg_addr = io_uring_registered_mem_region_get(cmd, &nr_pages,
+ issue_flags);
+
+ /* server does not have a registered memory region set up */
+ if (!mem_reg_addr)
+ return NULL;
+
+ mem_reg_size = (u64)nr_pages * PAGE_SIZE;
+ if (headers_offset > mem_reg_size)
+ return NULL;
+
+ /* verify headers fit within memory region bounds */
+ headers_size = (u64)queue_depth * sizeof(struct fuse_uring_req_header);
+ if (mem_reg_size - headers_offset < headers_size)
+ return NULL;
+
+ return mem_reg_addr + headers_offset;
+}
+
+static int fuse_uring_buf_ring_setup(struct io_uring_cmd *cmd,
+ struct fuse_ring_queue *queue,
+ unsigned int issue_flags)
+{
+ const struct fuse_uring_cmd_req *cmd_req =
+ io_uring_sqe128_cmd(cmd->sqe, struct fuse_uring_cmd_req);
+ u64 headers_offset = READ_ONCE(cmd_req->init.headers_offset);
+ unsigned queue_depth = READ_ONCE(cmd_req->init.queue_depth);
+ void *headers;
+ int err;
+
+ if (!queue_depth)
+ return -EINVAL;
+
+ headers = fuse_uring_get_buf_ring_headers(cmd, headers_offset,
+ queue_depth, issue_flags);
+ if (!headers)
+ return -EINVAL;
+
+ err = io_uring_buf_ring_pin(cmd, FUSE_URING_BUFRING_GROUP, issue_flags,
+ &queue->bufring.list);
+ if (err)
+ return err;
+
+ if (!io_uring_is_kmbuf_ring(cmd, FUSE_URING_BUFRING_GROUP, issue_flags))
+ goto error;
+
+ queue->bufring.headers = headers;
+ queue->bufring.queue_depth = queue_depth;
+ queue->bufring.enabled = true;
+
+ return 0;
+
+error:
+ io_uring_buf_ring_unpin(cmd, FUSE_URING_BUFRING_GROUP, issue_flags);
+ return -EINVAL;
+}
+
+static struct fuse_ring_queue *
+fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
+ int qid, bool use_bufring, unsigned int issue_flags)
{
struct fuse_conn *fc = ring->fc;
struct fuse_ring_queue *queue;
struct list_head *pq;
+ int err;
queue = kzalloc_obj(*queue, GFP_KERNEL_ACCOUNT);
if (!queue)
- return NULL;
+ return ERR_PTR(-ENOMEM);
pq = kzalloc_objs(struct list_head, FUSE_PQ_HASH_SIZE);
if (!pq) {
kfree(queue);
- return NULL;
+ return ERR_PTR(-ENOMEM);
}
queue->qid = qid;
@@ -307,12 +383,29 @@ static struct fuse_ring_queue *fuse_uring_create_queue(struct fuse_ring *ring,
queue->fpq.processing = pq;
fuse_pqueue_init(&queue->fpq);
+ if (use_bufring) {
+ err = fuse_uring_buf_ring_setup(cmd, queue, issue_flags);
+ if (err) {
+ kfree(pq);
+ kfree(queue);
+ return ERR_PTR(err);
+ }
+ }
+
spin_lock(&fc->lock);
+ /* check if the queue creation raced with another thread */
if (ring->queues[qid]) {
spin_unlock(&fc->lock);
- kfree(queue->fpq.processing);
+ if (use_bufring)
+ io_uring_buf_ring_unpin(cmd, FUSE_URING_BUFRING_GROUP,
+ issue_flags);
+ kfree(pq);
kfree(queue);
- return ring->queues[qid];
+
+ queue = ring->queues[qid];
+ if (bufring_enabled(queue) != use_bufring)
+ return ERR_PTR(-EINVAL);
+ return queue;
}
/*
@@ -605,16 +698,25 @@ static __always_inline int copy_header_to_ring(struct fuse_ring_ent *ent,
size_t header_size)
{
int offset = ring_header_type_offset(type);
- void __user *ring;
if (offset < 0)
return offset;
- ring = (char __user *)ent->headers + offset;
+ if (bufring_enabled(ent->queue)) {
+ int buf_offset =
+ sizeof(struct fuse_uring_req_header) * ent->id;
- if (copy_to_user(ring, header, header_size)) {
- pr_info_ratelimited("Copying header to ring failed.\n");
- return -EFAULT;
+ memcpy(ent->queue->bufring.headers + buf_offset + offset,
+ header, header_size);
+ } else {
+ void __user *ring = (char __user *)ent->headers + offset;
+
+ if (copy_to_user(ring, header, header_size)) {
+ pr_info_ratelimited("Copying header to ring failed: "
+ "header_type=%u, header_size=%zu\n",
+ type, header_size);
+ return -EFAULT;
+ }
}
return 0;
@@ -626,16 +728,25 @@ static __always_inline int copy_header_from_ring(struct fuse_ring_ent *ent,
size_t header_size)
{
int offset = ring_header_type_offset(type);
- const void __user *ring;
if (offset < 0)
return offset;
- ring = (char __user *)ent->headers + offset;
+ if (bufring_enabled(ent->queue)) {
+ int buf_offset =
+ sizeof(struct fuse_uring_req_header) * ent->id;
- if (copy_from_user(header, ring, header_size)) {
- pr_info_ratelimited("Copying header from ring failed.\n");
- return -EFAULT;
+ memcpy(header, ent->queue->bufring.headers + buf_offset + offset,
+ header_size);
+ } else {
+ const void __user *ring = (char __user *)ent->headers + offset;
+
+ if (copy_from_user(header, ring, header_size)) {
+ pr_info_ratelimited("Copying header from ring failed: "
+ "header_type=%u, header_size=%zu\n",
+ type, header_size);
+ return -EFAULT;
+ }
}
return 0;
@@ -648,14 +759,23 @@ static int setup_fuse_copy_state(struct fuse_copy_state *cs,
{
int err;
- err = import_ubuf(dir, ent->payload, ring->max_payload_sz, iter);
- if (err) {
- pr_info_ratelimited("fuse: Import of user buffer failed\n");
- return err;
+ if (!bufring_enabled(ent->queue)) {
+ err = import_ubuf(dir, ent->payload, ring->max_payload_sz, iter);
+ if (err) {
+ pr_info_ratelimited("fuse: Import of user buffer "
+ "failed\n");
+ return err;
+ }
}
fuse_copy_init(cs, dir == ITER_DEST, iter);
+ if (bufring_enabled(ent->queue)) {
+ cs->is_kaddr = true;
+ cs->len = ent->payload_kvec.iov_len;
+ cs->kaddr = ent->payload_kvec.iov_base;
+ }
+
cs->is_uring = true;
cs->req = req;
@@ -767,6 +887,91 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
sizeof(req->in.h));
}
+static bool fuse_uring_req_has_payload(struct fuse_req *req)
+{
+ struct fuse_args *args = req->args;
+
+ return args->in_numargs > 1 || args->out_numargs;
+}
+
+static int fuse_uring_select_buffer(struct fuse_ring_ent *ent,
+ unsigned int issue_flags)
+ __must_hold(&queue->lock)
+{
+ struct io_br_sel sel;
+ size_t len = 0;
+
+ lockdep_assert_held(&ent->queue->lock);
+
+ /* Get a buffer to use for the payload */
+ sel = io_ring_buffer_select(cmd_to_io_kiocb(ent->cmd), &len,
+ ent->queue->bufring.list, issue_flags);
+ if (sel.val)
+ return sel.val;
+ if (!sel.kaddr)
+ return -ENOENT;
+
+ ent->payload_kvec.iov_base = sel.kaddr;
+ ent->payload_kvec.iov_len = len;
+ ent->buf_id = sel.buf_id;
+
+ return 0;
+}
+
+static void fuse_uring_recycle_buffer(struct fuse_ring_ent *ent,
+ unsigned int issue_flags)
+ __must_hold(&queue->lock)
+{
+ struct kvec *kvec = &ent->payload_kvec;
+
+ lockdep_assert_held(&ent->queue->lock);
+
+ if (!bufring_enabled(ent->queue) || !kvec->iov_base)
+ return;
+
+ WARN_ON_ONCE(io_uring_kmbuf_recycle(ent->cmd, FUSE_URING_BUFRING_GROUP,
+ (u64)(uintptr_t)kvec->iov_base,
+ kvec->iov_len, ent->buf_id,
+ issue_flags));
+
+ memset(kvec, 0, sizeof(*kvec));
+}
+
+static int fuse_uring_next_req_update_buffer(struct fuse_ring_ent *ent,
+ struct fuse_req *req,
+ unsigned int issue_flags)
+{
+ bool buffer_selected;
+ bool has_payload;
+
+ if (!bufring_enabled(ent->queue))
+ return 0;
+
+ buffer_selected = ent->payload_kvec.iov_base != NULL;
+ has_payload = fuse_uring_req_has_payload(req);
+
+ if (has_payload && !buffer_selected)
+ return fuse_uring_select_buffer(ent, issue_flags);
+
+ if (!has_payload && buffer_selected)
+ fuse_uring_recycle_buffer(ent, issue_flags);
+
+ return 0;
+}
+
+static int fuse_uring_prep_buffer(struct fuse_ring_ent *ent,
+ struct fuse_req *req, unsigned issue_flags)
+{
+ if (!bufring_enabled(ent->queue))
+ return 0;
+
+ /* no payload to copy, can skip selecting a buffer */
+ if (!fuse_uring_req_has_payload(req))
+ return 0;
+
+ return fuse_uring_select_buffer(ent, issue_flags);
+}
+
static int fuse_uring_prepare_send(struct fuse_ring_ent *ent,
struct fuse_req *req)
{
@@ -829,21 +1034,29 @@ static void fuse_uring_add_req_to_ring_ent(struct fuse_ring_ent *ent,
}
/* Fetch the next fuse request if available */
-static struct fuse_req *fuse_uring_ent_assign_req(struct fuse_ring_ent *ent)
+static struct fuse_req *fuse_uring_ent_assign_req(struct fuse_ring_ent *ent,
+ unsigned int issue_flags)
__must_hold(&queue->lock)
{
struct fuse_req *req;
struct fuse_ring_queue *queue = ent->queue;
struct list_head *req_queue = &queue->fuse_req_queue;
+ int err;
lockdep_assert_held(&queue->lock);
/* get and assign the next entry while it is still holding the lock */
req = list_first_entry_or_null(req_queue, struct fuse_req, list);
- if (req)
- fuse_uring_add_req_to_ring_ent(ent, req);
+ if (req) {
+ err = fuse_uring_next_req_update_buffer(ent, req, issue_flags);
+ if (!err) {
+ fuse_uring_add_req_to_ring_ent(ent, req);
+ return req;
+ }
+ }
- return req;
+ fuse_uring_recycle_buffer(ent, issue_flags);
+ return NULL;
}
/*
@@ -883,7 +1096,8 @@ static void fuse_uring_commit(struct fuse_ring_ent *ent, struct fuse_req *req,
* Else, there is no next fuse request and this returns false.
*/
static bool fuse_uring_get_next_fuse_req(struct fuse_ring_ent *ent,
- struct fuse_ring_queue *queue)
+ struct fuse_ring_queue *queue,
+ unsigned int issue_flags)
{
int err;
struct fuse_req *req;
@@ -891,7 +1105,7 @@ static bool fuse_uring_get_next_fuse_req(struct fuse_ring_ent *ent,
retry:
spin_lock(&queue->lock);
fuse_uring_ent_avail(ent, queue);
- req = fuse_uring_ent_assign_req(ent);
+ req = fuse_uring_ent_assign_req(ent, issue_flags);
spin_unlock(&queue->lock);
if (req) {
@@ -1003,8 +1217,14 @@ static int fuse_uring_commit_fetch(struct io_uring_cmd *cmd, int issue_flags,
* fuse requests would otherwise not get processed - committing
* and fetching is done in one step vs legacy fuse, which has separated
* read (fetch request) and write (commit result).
+ *
+	 * If the server is using bufrings and has populated the ring with
+	 * fewer payload buffers than ents, there may not be an available
+	 * buffer for the next request. If so, the fetch is a
+ * no-op and the next request will be handled when a buffer becomes
+ * available.
*/
- if (fuse_uring_get_next_fuse_req(ent, queue))
+ if (fuse_uring_get_next_fuse_req(ent, queue, issue_flags))
fuse_uring_send(ent, cmd, 0, issue_flags);
return 0;
}
@@ -1100,39 +1320,51 @@ fuse_uring_create_ring_ent(struct io_uring_cmd *cmd,
struct iovec iov[FUSE_URING_IOV_SEGS];
int err;
- err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
- if (err) {
- pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
- err);
- return ERR_PTR(err);
- }
-
- err = -EINVAL;
- if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
- pr_info_ratelimited("Invalid header len %zu\n", iov[0].iov_len);
- return ERR_PTR(err);
- }
-
- payload_size = iov[1].iov_len;
- if (payload_size < ring->max_payload_sz) {
- pr_info_ratelimited("Invalid req payload len %zu\n",
- payload_size);
- return ERR_PTR(err);
- }
-
- err = -ENOMEM;
ent = kzalloc_obj(*ent, GFP_KERNEL_ACCOUNT);
if (!ent)
- return ERR_PTR(err);
+ return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&ent->list);
ent->queue = queue;
- ent->headers = iov[0].iov_base;
- ent->payload = iov[1].iov_base;
+
+ if (bufring_enabled(queue)) {
+ ent->id = READ_ONCE(cmd->sqe->buf_index);
+ if (ent->id >= queue->bufring.queue_depth) {
+ err = -EINVAL;
+ goto error;
+ }
+ } else {
+ err = fuse_uring_get_iovec_from_sqe(cmd->sqe, iov);
+ if (err) {
+ pr_info_ratelimited("Failed to get iovec from sqe, err=%d\n",
+ err);
+ goto error;
+ }
+
+ err = -EINVAL;
+ if (iov[0].iov_len < sizeof(struct fuse_uring_req_header)) {
+ pr_info_ratelimited("Invalid header len %zu\n",
+ iov[0].iov_len);
+ goto error;
+ }
+
+ payload_size = iov[1].iov_len;
+ if (payload_size < ring->max_payload_sz) {
+ pr_info_ratelimited("Invalid req payload len %zu\n",
+ payload_size);
+ goto error;
+ }
+ ent->headers = iov[0].iov_base;
+ ent->payload = iov[1].iov_base;
+ }
atomic_inc(&ring->queue_refs);
return ent;
+
+error:
+ kfree(ent);
+ return ERR_PTR(err);
}
/*
@@ -1144,6 +1376,8 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
{
const struct fuse_uring_cmd_req *cmd_req = io_uring_sqe128_cmd(cmd->sqe,
struct fuse_uring_cmd_req);
+ unsigned int init_flags = READ_ONCE(cmd_req->flags);
+ bool use_bufring = init_flags & FUSE_URING_BUF_RING;
struct fuse_ring *ring = smp_load_acquire(&fc->ring);
struct fuse_ring_queue *queue;
struct fuse_ring_ent *ent;
@@ -1164,9 +1398,13 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
queue = ring->queues[qid];
if (!queue) {
- queue = fuse_uring_create_queue(ring, qid);
- if (!queue)
- return err;
+ queue = fuse_uring_create_queue(cmd, ring, qid, use_bufring,
+ issue_flags);
+ if (IS_ERR(queue))
+ return PTR_ERR(queue);
+ } else {
+ if (bufring_enabled(queue) != use_bufring)
+ return -EINVAL;
}
/*
@@ -1270,7 +1508,8 @@ static void fuse_uring_send_in_task(struct io_tw_req tw_req, io_tw_token_t tw)
if (!tw.cancel) {
err = fuse_uring_prepare_send(ent, ent->fuse_req);
if (err) {
- if (!fuse_uring_get_next_fuse_req(ent, queue))
+ if (!fuse_uring_get_next_fuse_req(ent, queue,
+ issue_flags))
return;
err = 0;
}
@@ -1332,14 +1571,19 @@ void fuse_uring_queue_fuse_req(struct fuse_iqueue *fiq, struct fuse_req *req)
req->ring_queue = queue;
ent = list_first_entry_or_null(&queue->ent_avail_queue,
struct fuse_ring_ent, list);
- if (ent)
- fuse_uring_add_req_to_ring_ent(ent, req);
- else
- list_add_tail(&req->list, &queue->fuse_req_queue);
- spin_unlock(&queue->lock);
+ if (ent) {
+ err = fuse_uring_prep_buffer(ent, req, IO_URING_F_UNLOCKED);
+ if (!err) {
+ fuse_uring_add_req_to_ring_ent(ent, req);
+ spin_unlock(&queue->lock);
+ fuse_uring_dispatch_ent(ent);
+ return;
+ }
+ WARN_ON_ONCE(err != -ENOENT);
+ }
- if (ent)
- fuse_uring_dispatch_ent(ent);
+ list_add_tail(&req->list, &queue->fuse_req_queue);
+ spin_unlock(&queue->lock);
return;
@@ -1357,6 +1601,7 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
struct fuse_ring *ring = fc->ring;
struct fuse_ring_queue *queue;
struct fuse_ring_ent *ent = NULL;
+ int err;
queue = fuse_uring_task_to_queue(ring);
if (!queue)
@@ -1389,14 +1634,15 @@ bool fuse_uring_queue_bq_req(struct fuse_req *req)
req = list_first_entry_or_null(&queue->fuse_req_queue, struct fuse_req,
list);
if (ent && req) {
- fuse_uring_add_req_to_ring_ent(ent, req);
- spin_unlock(&queue->lock);
-
- fuse_uring_dispatch_ent(ent);
- } else {
- spin_unlock(&queue->lock);
+ err = fuse_uring_prep_buffer(ent, req, IO_URING_F_UNLOCKED);
+ if (!err) {
+ fuse_uring_add_req_to_ring_ent(ent, req);
+ spin_unlock(&queue->lock);
+ fuse_uring_dispatch_ent(ent);
+ return true;
+ }
}
-
+ spin_unlock(&queue->lock);
return true;
}
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 51a563922ce1..36496013e3e8 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -38,9 +38,31 @@ enum fuse_ring_req_state {
/** A fuse ring entry, part of the ring queue */
struct fuse_ring_ent {
- /* userspace buffer */
- struct fuse_uring_req_header __user *headers;
- void __user *payload;
+ union {
+ /* queue->bufring.enabled == false */
+ struct {
+ /* userspace buffers */
+ struct fuse_uring_req_header __user *headers;
+ void __user *payload;
+ };
+ /* queue->bufring.enabled == true */
+ struct {
+ /*
+ * unique fixed id for the ent. Used by kernel/server to
+ * locate the header data.
+ */
+ unsigned int id;
+ /*
+ * id of the bufring buffer the ent is using for the
+ * current request. May differ per-request.
+ *
+ * this needs to be tracked so we can recycle the buffer
+ * back to the ring when the request is done.
+ */
+ unsigned int buf_id;
+ struct kvec payload_kvec;
+ };
+ };
/* the ring queue that owns the request */
struct fuse_ring_queue *queue;
@@ -99,6 +121,24 @@ struct fuse_ring_queue {
unsigned int active_background;
bool stopped;
+
+ /*
+ * kernel-managed buffer ring support
+ *
+ * the following fields are only used if the server chooses to use
+ * bufrings
+ */
+ struct {
+ bool enabled: 1;
+ unsigned int queue_depth;
+ /*
+ * pointer to where the headers reside in the registered memory
+ * region
+ */
+ void *headers;
+ /* synchronized by the queue lock */
+ struct io_buffer_list *list;
+ } bufring;
};
/**
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index c13e1f9a2f12..8f6ab0693a3d 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -240,6 +240,10 @@
* - add FUSE_COPY_FILE_RANGE_64
* - add struct fuse_copy_file_range_out
* - add FUSE_NOTIFY_PRUNE
+ *
+ * 7.46
+ * - add FUSE_URING_BUF_RING flag
+ * - add fuse_uring_cmd_req init struct
*/
#ifndef _LINUX_FUSE_H
@@ -1294,6 +1298,9 @@ enum fuse_uring_cmd {
FUSE_IO_URING_CMD_COMMIT_AND_FETCH = 2,
};
+/* fuse_uring_cmd_req flags */
+#define FUSE_URING_BUF_RING (1 << 0)
+
/**
* In the 80B command area of the SQE.
*/
@@ -1305,7 +1312,17 @@ struct fuse_uring_cmd_req {
/* queue the command is for (queue index) */
uint16_t qid;
- uint8_t padding[6];
+
+ union {
+ struct {
+ /*
+ * Byte offset into the ring's registered memory region.
+ * This is where the headers must reside.
+ */
+ uint64_t headers_offset;
+ uint16_t queue_depth;
+ } init;
+ };
};
#endif /* _LINUX_FUSE_H */
--
2.52.0
* [PATCH v1 8/9] fuse: add zero-copy over io-uring
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Implement zero-copy data transfer for fuse over io-uring, eliminating
memory copies between kernel and userspace for read/write operations.
This is only allowed on privileged servers and requires the server to
preregister the following:
a) a kernel-managed buffer ring
b) a registered memory region large enough to hold the headers
c) a sparse buffer corresponding to the queue depth
The sparse buffer is where the client's pages reside. The registered
memory region is where the headers (struct fuse_uring_req_header) are
placed. The kernel-managed buffer ring is where any non-zero-copied args
reside (for example, out headers).
Benchmarks with bs=1M showed approximately the following differences in
throughput:
direct randreads: ~20% increase (~2100 MB/s -> ~2600 MB/s)
buffered randreads: ~25% increase (~1900 MB/s -> ~2400 MB/s)
direct randwrites: no difference (~750 MB/s)
buffered randwrites: ~10% increase (~950 MB/s -> ~1050 MB/s)
The benchmark was run using fio on the passthrough_hp server:
fio --name=test_run --ioengine=sync --rw=rand{read,write} --bs=1M
--size=1G --numjobs=2 --ramp_time=30 --group_reporting=1
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
fs/fuse/dev.c | 7 +-
fs/fuse/dev_uring.c | 163 ++++++++++++++++++++++++++++++++------
fs/fuse/dev_uring_i.h | 6 +-
fs/fuse/fuse_dev_i.h | 1 +
include/uapi/linux/fuse.h | 3 +-
5 files changed, 151 insertions(+), 29 deletions(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 7fee4ff64348..c94a8cc3a996 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1231,10 +1231,13 @@ int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs,
for (i = 0; !err && i < numargs; i++) {
struct fuse_arg *arg = &args[i];
- if (i == numargs - 1 && argpages)
+ if (i == numargs - 1 && argpages) {
+ if (cs->skip_folio_copy)
+ return 0;
err = fuse_copy_folios(cs, arg->size, zeroing);
- else
+ } else {
err = fuse_copy_one(cs, arg->value, arg->size);
+ }
}
return err;
}
diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
index cd45b6a8e6c6..7409fccf93ca 100644
--- a/fs/fuse/dev_uring.c
+++ b/fs/fuse/dev_uring.c
@@ -31,6 +31,11 @@ struct fuse_uring_pdu {
struct fuse_ring_ent *ent;
};
+struct fuse_zero_copy_bvs {
+ unsigned nr_bvs;
+ struct bio_vec bvs[];
+};
+
static const struct fuse_iqueue_ops fuse_io_uring_ops;
enum fuse_uring_header_type {
@@ -47,6 +52,11 @@ static inline bool bufring_enabled(struct fuse_ring_queue *queue)
return queue->bufring.enabled;
}
+static inline bool zero_copy_enabled(struct fuse_ring_queue *queue)
+{
+ return queue->bufring.zero_copy;
+}
+
static void uring_cmd_set_ring_ent(struct io_uring_cmd *cmd,
struct fuse_ring_ent *ring_ent)
{
@@ -92,8 +102,14 @@ static void fuse_uring_flush_bg(struct fuse_ring_queue *queue)
}
}
+static bool can_zero_copy_req(struct fuse_ring_ent *ent, struct fuse_req *req)
+{
+ return zero_copy_enabled(ent->queue) &&
+ (req->args->in_pages || req->args->out_pages);
+}
+
static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req,
- int error)
+ int error, unsigned issue_flags)
{
struct fuse_ring_queue *queue = ent->queue;
struct fuse_ring *ring = queue->ring;
@@ -112,6 +128,11 @@ static void fuse_uring_req_end(struct fuse_ring_ent *ent, struct fuse_req *req,
spin_unlock(&queue->lock);
+ if (ent->zero_copied) {
+ io_buffer_unregister(ent->cmd, ent->id, issue_flags);
+ ent->zero_copied = false;
+ }
+
if (error)
req->out.h.error = error;
@@ -314,6 +335,7 @@ static void *fuse_uring_get_buf_ring_headers(struct io_uring_cmd *cmd,
static int fuse_uring_buf_ring_setup(struct io_uring_cmd *cmd,
struct fuse_ring_queue *queue,
+ bool zero_copy,
unsigned int issue_flags)
{
const struct fuse_uring_cmd_req *cmd_req =
@@ -323,7 +345,7 @@ static int fuse_uring_buf_ring_setup(struct io_uring_cmd *cmd,
void *headers;
int err;
- if (!queue_depth)
+ if (!queue_depth || (zero_copy && !capable(CAP_SYS_ADMIN)))
return -EINVAL;
headers = fuse_uring_get_buf_ring_headers(cmd, headers_offset,
@@ -342,6 +364,7 @@ static int fuse_uring_buf_ring_setup(struct io_uring_cmd *cmd,
queue->bufring.headers = headers;
queue->bufring.queue_depth = queue_depth;
queue->bufring.enabled = true;
+ queue->bufring.zero_copy = zero_copy;
return 0;
@@ -352,7 +375,8 @@ static int fuse_uring_buf_ring_setup(struct io_uring_cmd *cmd,
static struct fuse_ring_queue *
fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
- int qid, bool use_bufring, unsigned int issue_flags)
+ int qid, bool use_bufring, bool use_zero_copy,
+ unsigned int issue_flags)
{
struct fuse_conn *fc = ring->fc;
struct fuse_ring_queue *queue;
@@ -384,12 +408,13 @@ fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
fuse_pqueue_init(&queue->fpq);
if (use_bufring) {
- err = fuse_uring_buf_ring_setup(cmd, queue, issue_flags);
- if (err) {
- kfree(pq);
- kfree(queue);
- return ERR_PTR(err);
- }
+ err = fuse_uring_buf_ring_setup(cmd, queue, use_zero_copy,
+ issue_flags);
+ if (err)
+ goto cleanup;
+ } else if (use_zero_copy) {
+ err = -EINVAL;
+ goto cleanup;
}
spin_lock(&fc->lock);
@@ -403,7 +428,8 @@ fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
kfree(queue);
queue = ring->queues[qid];
- if (bufring_enabled(queue) != use_bufring)
+ if (bufring_enabled(queue) != use_bufring ||
+ zero_copy_enabled(queue) != use_zero_copy)
return ERR_PTR(-EINVAL);
return queue;
}
@@ -415,6 +441,11 @@ fuse_uring_create_queue(struct io_uring_cmd *cmd, struct fuse_ring *ring,
spin_unlock(&fc->lock);
return queue;
+
+cleanup:
+ kfree(pq);
+ kfree(queue);
+ return ERR_PTR(err);
}
static void fuse_uring_stop_fuse_req_end(struct fuse_req *req)
@@ -774,6 +805,7 @@ static int setup_fuse_copy_state(struct fuse_copy_state *cs,
cs->is_kaddr = true;
cs->len = ent->payload_kvec.iov_len;
cs->kaddr = ent->payload_kvec.iov_base;
+ cs->skip_folio_copy = can_zero_copy_req(ent, req);
}
cs->is_uring = true;
@@ -806,11 +838,70 @@ static int fuse_uring_copy_from_ring(struct fuse_ring *ring,
return err;
}
+static void fuse_zero_copy_release(void *priv)
+{
+ struct fuse_zero_copy_bvs *zc_bvs = priv;
+ unsigned int i;
+
+ for (i = 0; i < zc_bvs->nr_bvs; i++)
+ folio_put(page_folio(zc_bvs->bvs[i].bv_page));
+
+ kfree(zc_bvs);
+}
+
+static int fuse_uring_set_up_zero_copy(struct fuse_ring_ent *ent,
+ struct fuse_req *req,
+ unsigned issue_flags)
+{
+ struct fuse_args_pages *ap;
+ int err, i, ddir = 0;
+ struct fuse_zero_copy_bvs *zc_bvs;
+ struct bio_vec *bvs;
+
+ /* out_pages indicates a read, in_pages indicates a write */
+ if (req->args->out_pages)
+ ddir |= IO_BUF_DEST;
+ if (req->args->in_pages)
+ ddir |= IO_BUF_SOURCE;
+
+ WARN_ON_ONCE(!ddir);
+
+ ap = container_of(req->args, typeof(*ap), args);
+
+ zc_bvs = kmalloc(struct_size(zc_bvs, bvs, ap->num_folios),
+ GFP_KERNEL_ACCOUNT);
+ if (!zc_bvs)
+ return -ENOMEM;
+
+ zc_bvs->nr_bvs = ap->num_folios;
+ bvs = zc_bvs->bvs;
+ for (i = 0; i < ap->num_folios; i++) {
+ bvs[i].bv_page = folio_page(ap->folios[i], 0);
+ bvs[i].bv_offset = ap->descs[i].offset;
+ bvs[i].bv_len = ap->descs[i].length;
+ folio_get(page_folio(bvs[i].bv_page));
+ }
+
+ err = io_buffer_register_bvec(ent->cmd, bvs, ap->num_folios,
+ fuse_zero_copy_release, zc_bvs,
+ ddir, ent->id,
+ issue_flags);
+ if (err) {
+ fuse_zero_copy_release(zc_bvs);
+ return err;
+ }
+
+ ent->zero_copied = true;
+
+ return 0;
+}
+
/*
* Copy data from the req to the ring buffer
*/
static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
- struct fuse_ring_ent *ent)
+ struct fuse_ring_ent *ent,
+ unsigned int issue_flags)
{
struct fuse_copy_state cs;
struct fuse_args *args = req->args;
@@ -843,6 +934,11 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
num_args--;
}
+ if (can_zero_copy_req(ent, req)) {
+ err = fuse_uring_set_up_zero_copy(ent, req, issue_flags);
+ if (err)
+ return err;
+ }
/* copy the payload */
err = fuse_copy_args(&cs, num_args, args->in_pages,
(struct fuse_arg *)in_args, 0);
@@ -853,12 +949,17 @@ static int fuse_uring_args_to_ring(struct fuse_ring *ring, struct fuse_req *req,
}
ent_in_out.payload_sz = cs.ring.copied_sz;
+ if (cs.skip_folio_copy && args->in_pages)
+ ent_in_out.payload_sz +=
+ args->in_args[args->in_numargs - 1].size;
+
return copy_header_to_ring(ent, FUSE_URING_HEADER_RING_ENT,
&ent_in_out, sizeof(ent_in_out));
}
static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
- struct fuse_req *req)
+ struct fuse_req *req,
+ unsigned int issue_flags)
{
struct fuse_ring_queue *queue = ent->queue;
struct fuse_ring *ring = queue->ring;
@@ -876,7 +977,7 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
return err;
/* copy the request */
- err = fuse_uring_args_to_ring(ring, req, ent);
+ err = fuse_uring_args_to_ring(ring, req, ent, issue_flags);
if (unlikely(err)) {
pr_info_ratelimited("Copy to ring failed: %d\n", err);
return err;
@@ -887,11 +988,20 @@ static int fuse_uring_copy_to_ring(struct fuse_ring_ent *ent,
sizeof(req->in.h));
}
-static bool fuse_uring_req_has_payload(struct fuse_req *req)
+static bool fuse_uring_req_has_copyable_payload(struct fuse_ring_ent *ent,
+ struct fuse_req *req)
{
struct fuse_args *args = req->args;
- return args->in_numargs > 1 || args->out_numargs;
+ if (!can_zero_copy_req(ent, req))
+ return args->in_numargs > 1 || args->out_numargs;
+
+ if ((args->in_numargs > 1) && (!args->in_pages || args->in_numargs > 2))
+ return true;
+ if (args->out_numargs && (!args->out_pages || args->out_numargs > 1))
+ return true;
+
+ return false;
}
static int fuse_uring_select_buffer(struct fuse_ring_ent *ent,
@@ -948,7 +1058,7 @@ static int fuse_uring_next_req_update_buffer(struct fuse_ring_ent *ent,
return 0;
buffer_selected = ent->payload_kvec.iov_base != NULL;
- has_payload = fuse_uring_req_has_payload(req);
+ has_payload = fuse_uring_req_has_copyable_payload(ent, req);
if (has_payload && !buffer_selected)
return fuse_uring_select_buffer(ent, issue_flags);
@@ -966,22 +1076,23 @@ static int fuse_uring_prep_buffer(struct fuse_ring_ent *ent,
return 0;
/* no payload to copy, can skip selecting a buffer */
- if (!fuse_uring_req_has_payload(req))
+ if (!fuse_uring_req_has_copyable_payload(ent, req))
return 0;
return fuse_uring_select_buffer(ent, issue_flags);
}
static int fuse_uring_prepare_send(struct fuse_ring_ent *ent,
- struct fuse_req *req)
+ struct fuse_req *req,
+ unsigned int issue_flags)
{
int err;
- err = fuse_uring_copy_to_ring(ent, req);
+ err = fuse_uring_copy_to_ring(ent, req, issue_flags);
if (!err)
set_bit(FR_SENT, &req->flags);
else
- fuse_uring_req_end(ent, req, err);
+ fuse_uring_req_end(ent, req, err, issue_flags);
return err;
}
@@ -1086,7 +1197,7 @@ static void fuse_uring_commit(struct fuse_ring_ent *ent, struct fuse_req *req,
err = fuse_uring_copy_from_ring(ring, req, ent);
out:
- fuse_uring_req_end(ent, req, err);
+ fuse_uring_req_end(ent, req, err, issue_flags);
}
/*
@@ -1109,7 +1220,7 @@ static bool fuse_uring_get_next_fuse_req(struct fuse_ring_ent *ent,
spin_unlock(&queue->lock);
if (req) {
- err = fuse_uring_prepare_send(ent, req);
+ err = fuse_uring_prepare_send(ent, req, issue_flags);
if (err)
goto retry;
}
@@ -1378,6 +1489,7 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
struct fuse_uring_cmd_req);
unsigned int init_flags = READ_ONCE(cmd_req->flags);
bool use_bufring = init_flags & FUSE_URING_BUF_RING;
+ bool use_zero_copy = init_flags & FUSE_URING_ZERO_COPY;
struct fuse_ring *ring = smp_load_acquire(&fc->ring);
struct fuse_ring_queue *queue;
struct fuse_ring_ent *ent;
@@ -1399,11 +1511,12 @@ static int fuse_uring_register(struct io_uring_cmd *cmd,
queue = ring->queues[qid];
if (!queue) {
queue = fuse_uring_create_queue(cmd, ring, qid, use_bufring,
- issue_flags);
+ use_zero_copy, issue_flags);
if (IS_ERR(queue))
return PTR_ERR(queue);
} else {
- if (bufring_enabled(queue) != use_bufring)
+ if (bufring_enabled(queue) != use_bufring ||
+ zero_copy_enabled(queue) != use_zero_copy)
return -EINVAL;
}
@@ -1506,7 +1619,7 @@ static void fuse_uring_send_in_task(struct io_tw_req tw_req, io_tw_token_t tw)
int err;
if (!tw.cancel) {
- err = fuse_uring_prepare_send(ent, ent->fuse_req);
+ err = fuse_uring_prepare_send(ent, ent->fuse_req, issue_flags);
if (err) {
if (!fuse_uring_get_next_fuse_req(ent, queue,
issue_flags))
diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
index 36496013e3e8..2fdb733a4170 100644
--- a/fs/fuse/dev_uring_i.h
+++ b/fs/fuse/dev_uring_i.h
@@ -49,7 +49,7 @@ struct fuse_ring_ent {
struct {
/*
* unique fixed id for the ent. Used by kernel/server to
- * locate the header data.
+ * locate the header data and zero-copy backing pages.
*/
unsigned int id;
/*
@@ -61,6 +61,8 @@ struct fuse_ring_ent {
*/
unsigned int buf_id;
struct kvec payload_kvec;
+ /* true if the request's pages are being zero-copied */
+ bool zero_copied;
};
};
@@ -130,6 +132,8 @@ struct fuse_ring_queue {
*/
struct {
bool enabled: 1;
+ /* this is only allowed on privileged servers */
+ bool zero_copy: 1;
unsigned int queue_depth;
/*
* pointer to where the headers reside in the registered memory
diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h
index aa1d25421054..67b5bed451fe 100644
--- a/fs/fuse/fuse_dev_i.h
+++ b/fs/fuse/fuse_dev_i.h
@@ -39,6 +39,7 @@ struct fuse_copy_state {
bool is_uring:1;
/* if set, use kaddr; otherwise use pg */
bool is_kaddr:1;
+ bool skip_folio_copy:1;
struct {
unsigned int copied_sz; /* copied size into the user buffer */
} ring;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 8f6ab0693a3d..57499f25d65c 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -242,7 +242,7 @@
* - add FUSE_NOTIFY_PRUNE
*
* 7.46
- * - add FUSE_URING_BUF_RING flag
+ * - add FUSE_URING_BUF_RING and FUSE_URING_ZERO_COPY flag
* - add fuse_uring_cmd_req init struct
*/
@@ -1300,6 +1300,7 @@ enum fuse_uring_cmd {
/* fuse_uring_cmd_req flags */
#define FUSE_URING_BUF_RING (1 << 0)
+#define FUSE_URING_ZERO_COPY (1 << 1)
/**
* In the 80B command area of the SQE.
--
2.52.0
* [PATCH v1 9/9] docs: fuse: add io-uring bufring and zero-copy documentation
From: Joanne Koong @ 2026-03-24 22:45 UTC (permalink / raw)
To: axboe; +Cc: miklos, bschubert, linux-fsdevel
Add documentation for fuse over io-uring usage of kernel-managed
bufrings and zero-copy.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
.../filesystems/fuse/fuse-io-uring.rst | 61 ++++++++++++++++++-
1 file changed, 60 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/fuse/fuse-io-uring.rst b/Documentation/filesystems/fuse/fuse-io-uring.rst
index d73dd0dbd238..f2522541e4ba 100644
--- a/Documentation/filesystems/fuse/fuse-io-uring.rst
+++ b/Documentation/filesystems/fuse/fuse-io-uring.rst
@@ -95,5 +95,64 @@ Sending requests with CQEs
| <fuse_unlink() |
| <sys_unlink() |
+Kernel-managed buffer rings
+===========================
-
+Kernel-managed buffer rings have two main advantages:
+
+* eliminates the overhead of pinning/unpinning user pages and translating
+ virtual addresses for every server-kernel interaction
+* reduces buffer memory allocation requirements
+
+In order to use buffer rings, the server must preregister the following:
+
+* a kernel-managed buffer ring
+* a registered memory region large enough to hold the headers for the ents
+
+Please refer to libfuse's lib/fuse_uring.c for an example of how to set this
+up.
+
+At a high level, this is how fuse uses buffer rings:
+
+* The server registers a kernel-managed buffer ring. In the kernel this
+ allocates the pages needed for the buffers and vmaps them. The server
+ obtains the virtual address for the buffers through an mmap call on the ring
+ fd.
+* When the server creates a queue, it passes in the headers offset, which
+ specifies the offset into the registered memory region where the headers
+ reside.
+* When there is a request from a client, fuse will select a buffer from the
+ ring if there is any payload that needs to be copied, copy over the payload
+ to the selected buffer, and copy over the headers to the registered memory
+ region. Each ent has its own id (passed through sqe->buf_index at setup
+ time) that communicates where in the memory region the headers for that ent
+  reside.
+* The server obtains a cqe representing the request. The cqe flags will have
+ IORING_CQE_F_BUFFER set if a selected buffer was used for the payload. The
+ buffer id is stashed in cqe->flags (through IORING_CQE_BUFFER_SHIFT). The
+ server can directly access the payload by using that buffer id to calculate
+ the offset into the virtual address obtained for the buffers.
+* The server processes the request and then sends a
+ FUSE_URING_CMD_COMMIT_AND_FETCH sqe with the reply.
+* When the kernel handles the sqe, it will process the reply and if there is a
+ next request, it will reuse the same selected buffer for the request. If
+ there is no next request, it will recycle the buffer back to the ring.
+
+Zero-copy
+=========
+
+Fuse io-uring zero-copy allows the server to directly read from / write to the
+client's pages and bypass any intermediary buffer copies. This is only allowed
+on privileged servers. Access to these pages (e.g. reads/writes by the server)
+must go through the io-uring interface.
+
+In order to use zero-copy, the server must use kernel-managed buffer rings. It
+will also need to preregister a sparse buffer for every ent in the queue. This
+is where the client's pages will reside.
+
+When the client issues a read/write, fuse stores the client's underlying pages
+in the sparse buffer entry corresponding to the ent in the queue. The server
+can then issue reads/writes on these pages through io_uring rw operations.
+The pages are unregistered once the server replies to the request.
+Non-zero-copyable payload (if needed) is placed in a buffer from the
+kernel-managed buffer ring.
--
2.52.0